How to set up a single pgpool endpoint for reads and writes with Amazon Aurora PostgreSQL
Amazon Aurora provides endpoints for the primary DB instance (the cluster endpoint) and for Read Replicas (the reader endpoint). Aurora updates the cluster endpoint automatically so that it always points to the primary instance. The reader endpoint load balances read operations across all available Read Replicas.
Amazon Aurora Replicas typically have less than 100 ms of replication lag. So if your application can tolerate that latency, it can use both the cluster and reader endpoints to take advantage of a horizontally scaled database (Figure 1).
However, managing two database endpoints, one for reads and one for writes, adds complexity to the application. In this post, I show how to use pgpool to provide a single PostgreSQL-compatible Aurora endpoint that automatically directs write traffic to the cluster endpoint and read traffic to the reader endpoint (Figure 2).
Pgpool is a BSD-licensed middleware that sits between a PostgreSQL database and the database clients. In this example, we deploy it using the architecture in Figure 3.
The Amazon Aurora cluster consists of one primary instance and two Aurora Read Replicas that are deployed across two Availability Zones and two private subnets. The cluster has a security group that allows ingress only from the pgpool instances.
Pgpool is deployed in an Auto Scaling group with a single active instance for failover purposes. It is also isolated in private subnets with a security group that permits access only from allowed Classless Inter-Domain Routing (CIDR) blocks.
The access subnets host a Network Load Balancer that provides a consistent endpoint for pgpool. So your database endpoint doesn’t change if the pgpool instance fails and the Auto Scaling group creates a new one.
Deploying with AWS CloudFormation
The AWS CloudFormation templates that are used in this example are located in this GitHub repository. The example uses several nested AWS CloudFormation templates to deploy the virtual private cloud (VPC) infrastructure, the security groups, the Aurora cluster, and the pgpool middleware. Using nested stacks lets you break up a large stack into several reusable components. If you aren’t familiar with AWS CloudFormation, review the AWS CloudFormation documentation.
For complete instructions on deploying the templates, see the README file on GitHub. The following sections contain some of the highlights.
Creating the Amazon Aurora cluster
The following AWS CloudFormation snippets show how to create a three-node Amazon Aurora cluster when the VPC infrastructure and security groups are ready. Amazon Aurora has one of the three nodes as the primary and the other two as Read Replicas.
The blog post Using pgpool and Amazon ElastiCache for query caching with Amazon Redshift is a good reference for deploying pgpool on AWS. In addition, the pgpool documentation contains a section that is specifically for Amazon Aurora.
In the AWS CloudFormation template that deploys pgpool, you’ll set up an ELB load balancer and an Auto Scaling group. The launch configuration for the Auto Scaling group deploys and configures pgpool using the AWS CloudFormation cfn-init tool.
First, you need to install a few packages and unpack the pgpool installation file:
Next, build pgpool, and create the log and PID directories:
Amazon Aurora uses MD5 authentication, so you need to enter your master database user in a local authentication file.
You also need to configure MD5 authentication in
Finally, register pgpool as a service, start it, and use
chkconfig to start it automatically.
Most of the pgpool configuration is done in
/usr/local/etc/pgpool.conf. The following table shows the important settings.
|listen_addresses||‘*’||Allow incoming connections on all interfaces.|
|backend_hostname0||The Amazon Aurora cluster endpoint|
|backend_port0||3306||Amazon Aurora in PostgreSQL mode uses port 3306.|
|backend_flag0||ALWAYS_MASTER||Don’t let pgpool try to fail over.|
|backend_hostname1||The Amazon Aurora reader endpoint|
|backend_port1||3306||Amazon Aurora in PostgreSQL mode uses port 3306.|
|enable_pool_hba||On||Amazon Aurora requires this for authentication.|
|pool_passwd||‘pool_passwd’||Set location of authentication file.|
|Ssl||On||Amazon Aurora uses Secure Sockets Layer (SSL) connections.|
|health_check_*||Configure with master account credentials.|
Testing the configuration
Now you can issue a few SQL statements and confirm that pgpool is directing traffic as expected. Start by creating a table and inserting a few rows:
/var/log/pgpool.log, you can see that these statements are directed to the Amazon Aurora primary (node 0):
Next, issue a query:
/var/log/pgpool.log, you can see that this statement is directed to the Amazon Aurora reader endpoint (node 1):
As a final test, confirm that pgpool handles SELECT statements embedded in a transaction:
These statements should all hit the primary node, as the embedded
SELECT is querying rows that are written in the transaction. You can again see from the log that things are working as expected.
Of course, you should do a full regression test with your own database clients before using this configuration in production. If the scalability of the pgpool middleware concerns you, consider using a clustered pgpool instance where the Auto Scaling group allows multiple instances and scales based on load.
Strive for simplicity
In this post, we saw how to use pgpool to provide a single endpoint for an Amazon Aurora cluster that automatically directs read traffic to the reader endpoint. You can use this technique to simplify how your application handles connections to Amazon Aurora.
If you need more advanced routing options, such as the ability to route database connections based on custom logic, check out a solution called pgbouncer-rr. With custom logic, you can handle more advanced routing such as sending a small percentage of queries to a newer schema for canary testing.
In the future, keep an eye on Amazon Aurora Serverless, which scales the database backend automatically without requiring the configuration of replicas. This option is still in preview and does not yet support PostgreSQL compatibility.
About the Author
Randy DeFauw is a principal solutions architect at Amazon Web Services. He works with the AWS customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.