Multi-Tenant Customer Routing for Amazon RDS and Amazon Redshift with Heimdall Data

By Erik Brandsberg, Chief Architect – Heimdall Data
By Antony Prasad Thevaraj, Partner Solutions Architect – AWS

Heimdall Data

Software-as-a-service (SaaS) platforms can have challenges scaling up the customer count while managing their data. To isolate a customer’s data, a distinct database per customer is created, and the application simply accesses the proper database as needed for a given customer.

There are limits a single database can handle, however. For light workloads, this may be in the thousands, and for heavier workloads it may be as low as tens of thousands per database.

One solution is to modify the application to look up customer data in the database cluster. Unfortunately, this places the large burden of development and maintenance on the application team. Another way is to use a database proxy to route queries automatically to the proper cluster, which can vastly simplify the process.

For third-party software, modifying the application is typically not possible—implementing a proxy would be the only solution.

In this post, we will explore how the Heimdall Database Proxy can be configured to provide customer query routing for Amazon Relational Database Service (Amazon RDS) for Postgres. Although, any SQL database type including Amazon Redshift can be supported.

Heimdall Data is an AWS Partner with Service Ready designations for Amazon RDS and Amazon Redshift. Heimdall Data offers a database proxy that offloads SQL and improves database scale; deployment does not require code changes.

Solution Implementation

This provides a single Heimdall Proxy endpoint to access data for multiple customers from multiple databases, transparently.

Figure 1 – Topology of customer query routing with the Heimdall Database Proxy.

This implementation requires the following:

Multiple database clusters, each with their own non-overlapping databases and/or users,
One or more updated Heimdall Proxy instances.
Virtual private cloud (VPC) routing configured so the proxies can access the databases and that application can access proxies.

Step 1: Configure a Baseline Cluster Configuration

In the Heimdall Central Manager, there’s an option to configure the system using the configuration wizard. For an Amazon RDS environment, the “AWS Detect” pre-populates many of the configuration fields. For Amazon Redshift, select “Manual Configuration.”

The Data Source screen should be configured as per Figure 2. For Redshift users, configure as needed.

Figure 2 – Example of a Postgres data source configuration.

Note in Figure 2 the use of the variable placeholder ${host}, which maps to a host field stored in a database table; we’ll explain more below.

The default database specified should be the one being used to control the routing of queries; we will be adding a table to this database to provide the customer routing information. Complete the configuration wizard using the defaults, as appropriate.

Step 2: Modify Data Source with the defaultHost Option

After the configuration wizard is complete, in the Data Source tab, under Connection Properties configure the “defaultHost” option with the hostname (excluding domain) of the “primary” database instance as per Figure 3. This instance will be used to store authentication information for all of the users, and will be the source of routing information.

Figure 3 – Modified Postgres data source with defaultHost set.

Step 3: Configure an Authentication Table on the Database

For the routing to work, we will need to configure SQL authentication on the proxy. This will configure the proxy to a per-user or database basis by adjusting the data returned from a SQL query.

The first step is to build the authentication table. In order to conform to general conventions, the Heimdall Proxy uses a table that generally matches the Postgres hba.conf file, although not all columns are needed. For our example, we show two formats: Amazon Redshift and Postgres.

Please note that Redshift doesn’t provide array variables. Instead, we’ll use a simple text field that emulates an array; a username would be stored as “{user1}” or “{user1,user2}”. The Heimdall Proxy parses this into an array and the table definition for Amazon Redshift will be:

CREATE TABLE IF NOT EXISTS heimdall.pg_hba
(
	line_number INTEGER,
enabled BOOLEAN NOT NULL DEFAULT true,
	address VARCHAR(256),
	netmask VARCHAR(256),
	"database" VARCHAR(256),
	user_name VARCHAR(256),
	"host" VARCHAR(256),
	auth_method VARCHAR(256) NOT NULL DEFAULT 'trust'::character varying
)

This table also supports IP network-based controls, if desired.

Next, for Postgres, we will use:

CREATE TABLE public.pg_hba (
	line_number int4 NULL,
	enabled bool NOT NULL DEFAULT true,
	"database" _text NULL,
	host text NULL,
	auth_method varchar NULL DEFAULT 'trust'::character varying
);

In this case, we have simplified the table to the minimum.

Now, we can add in data for the routing. For Amazon Redshift:

line_number	enabled	address	netmask	database	user_name	host	auth_method
5	true	10.0.0.0	255.0.0.0	{all}	{test1}	test1	trust
10	true	0.0.0.0	0.0.0.0	{dev}	{test2}	test2	trust

For Postgres:

line_number	enabled	database	host	auth_method
10	true	{database1}	postgres1	trust
20	true	{database2}	postgres2	trust
30	true	{database3}	postgres3	trust

Step 4: Configure SQL Authentication

In the Virtual Database (VDB) tab, under proxy settings, configure the SQL authentication parameter, providing a privileged user that will be used to access the authentication table (to be created), as per Figure 4 or 5.

Figure 4 – SQL authentication with routing via username.

Figure 5 – SQL authentication with routing only by database on Postgres.

For username-based routing, the authentication query could be:

select * from heimdall.pg_hba where enabled = true and user_name like '%${user}%' order by line_number asc

For database-based routing on Postgres, the query can be:

select * from pg_hba where enabled = true and '${database}' = ANY(database) order by line_number asc

These queries support both array values and non-array values for the user and database column, as Postgres can use an array value to match multiple names in a row, but Amazon Redshift cannot.

Step 5: Testing

Now, everything should be configured and we can start verifying the credentials via the authentication test graphical user interface (GUI) under the VDB authentication settings.

First, use a valid IP address range in our Amazon Redshift example:

Figure 6 – Authentication test passing for user test1 from IP 10.0.10.1.

Now, test with an invalid IP address (IP address not within the 10.x.x.x range):

Figure 7 – Authentication test failing due to the test IP being out of range.

If you connect using both configured users via the proxy, you can do a “show pools” command to show which connection pools are mapped to which servers:

$ psql --host=127.0.0.1 --port=5439 --user=test2 dev
Password for user test2: …

dev=> show pools;
URL or catalog:user         | busy | connecting | idle | wait count | created |...
----------------------------+------+------------+------+------------+---------+
 test1.…:5439/              | 0/0  | 0          | 2    | 0          | 2       |
    NA:test1                | 0/0  | 0          | 1/0  | 0          | 1       |
    dev:test1               | 0/0  | 0          | 1/0  | 0          | 1       |
 test2.…:5439/              | 1/0  | 0          | 0    | 0          | 1       |
    dev:test2               | 1/0  | 0          | 0/0  | 0          | 1       |
(5 rows)

Here, we can see that the “test1” user was routed to the “test1” server, while the “test2” user was routed to “test2”.

In the Postgres example, we used DBeaver, a frequently used SQL management tool. Here, we have eight different connections configured in DBeaver:

Figure 8 – Example DBeaver connection configurations for testing.

On the Postgres 1-3 databases, each has one database, named database1 to database 3. If you connect directly, you find the proper database:

Figure 9 – Direct connection to postgres1 database.

Likewise, if you connect to the heimdall-database1 connection, it will route you to the backend database postgres1:

Figure 10 – Configuration for the heimdall-database1 connection.

Figure 11 – Connecting via the proxy, routed appropriate for access to database1.

If you connect to database2, you get routed to the postgres2 server:

Figure 12 – Routing to database2 via the proxy.

Conclusion

Heimdall Data provides functionality needed to support many complex database environments, and solves many of the challenges in scaling database access.

In this post, we have shown how to horizontally scale a SaaS environment, but we also assist in horizontally scaling Amazon RDS with no code changes required. Contact Heimdall Data to find a solution to your challenges, and download a free trial from AWS Marketplace.

Resources:

AWS Blog: Active Directory authentication and authorization
AWS Blog: Automated query caching
AWS Blog: Advanced connection pooling
Heimdall Data technical documentation

.

.

Heimdall Data – AWS Partner Spotlight

Heimdall Data is an AWS Partner with Service Ready designations for Amazon RDS and Amazon Redshift. Heimdall Data offers a database proxy that offloads SQL and improves database scale while securing database access. Proxy deployment does not require code changes.

Contact Heimdall Data | Partner Overview | AWS Marketplace