AWS Database Blog
Use pgactive for rolling major version upgrades in Amazon RDS for PostgreSQL
In this post, we explore how pgactive can perform rolling major version upgrades for Amazon Relational Database Service (Amazon RDS) for PostgreSQL, allowing for a smoother transition with reduced impact on your applications.
PostgreSQL database upgrades in Amazon RDS
Database upgrades are essential for maintaining optimal performance and security. Each new major version release introduces enhanced features, performance optimizations, and critical security updates that protect against vulnerabilities.
RDS for PostgreSQL databases support both minor and major version upgrades. Minor version upgrades, which typically include bug fixes and security patches, are released periodically within a major version. These updates are backward-compatible, making sure that existing applications remain unaffected. You can initiate a minor version upgrade manually by modifying your database, or you can enable the Auto minor version upgrade option.
Unlike minor upgrades, major version upgrades introduce new, non-backward-compatible features and database changes. Major versions are typically released annually and are supported for about 5 years. Major upgrades can take significantly longer due to factors such as the number of database objects.
You can minimize downtime during minor version upgrades by using Amazon RDS Proxy or an open source database proxy such as PgBouncer. When upgrading minor versions of RDS Multi-AZ DB clusters, it’s possible to achieve downtime of 1 second or less by using the Advanced JDBC Wrapper Driver.
Amazon RDS for PostgreSQL provides multiple upgrade options, including in-place upgrades. For customers who require short, predictable downtime, Amazon RDS supports a fully managed blue/green deployment option. This approach involves provisioning a new DB instance with the target PostgreSQL version, validating the upgrade, and then swapping the database endpoints to minimize disruption.
Rolling upgrades with fallback using pgactive
Although blue/green deployment is effective for minimizing downtime during major version upgrades, some customers opt for a rolling upgrade approach to maintain high availability. In this method, new nodes running the updated version can join with an existing node, configured to accept writes, and set up for bidirectional data replication. This enables a gradual and smooth transition to the new database version while minimizing disruption to the production environment.
The pgactive extension enables active-active replication for Amazon RDS for PostgreSQL, allowing all participating nodes to handle read and write operations simultaneously. Data changes are automatically synchronized across all nodes in the cluster using PostgreSQL logical replication. pgactive includes built-in conflict detection, automatic resolution methods (such as last-write-wins), and conflict monitoring.
The following are some of the key benefits of using a logical replication mechanism like pgactive for a database upgrade:
- Multi-tenant support – If you’re running an application in a multi-tenant environment, pgactive allows you to selectively replicate only the required schemas or databases to a higher version. Starting with pgactive version 2.1.4, it’s now possible to include or exclude specific tables.
- Minimal downtime – Upgrading with pgactive significantly reduces downtime. The only interruption required is updating the application’s connection string to the new database.
- Seamless rollback – By setting up active-active replication with pgactive, you can roll back your application to an older database version with minimal downtime and operational complexity.
Solution overview
For this post, we show an example for a major version upgrade. We use the source database version 14.12 and target version 16.3. The following diagram illustrates the solution architecture.
The process consists of the following steps:
- Set up the v14.12 source database.
- Set up the v16.3 (or higher) version database.
- Set up pgactive parameters and endpoints on both the v14.12 and v16.3 databases.
- Verify the data by bi-directionally replicating between v14.12 and v16.3 and then point the application to the higher version v16.3.
In the following sections, we walk through these steps with code examples.
Prerequisites
To follow along with this post, you should have the following prerequisites:
- An AWS account with permissions to create resources.
- Two RDS instances. For this post, we use v14.12 and v16.3, but pgactive is compatible with v11.22 and higher.
- An Amazon Elastic Compute Cloud (Amazon EC2) instance with connectivity to the DB instance. For more information, refer to Creating and connecting to a PostgreSQL DB instance.
Create the source and target databases
In a real-world use case, you should already have a production database set up. For this post, we set up our source and target databases using the AWS Command Line Interface (AWS CLI). Complete the following steps:
- Use the following code to create a v14.12 database as the source database:
- Connect to the database using psql and run the following query to verify the database version:
- Set up a higher version (v16.3) database environment for upgrade purposes:
- Connect to the new database using psql and run the following query to verify the database version:
Set up the pgactive extension between v14.12 and v16.3
To set up pgactive on both the source and target databases, follow these steps:
- Configure database parameters and create the pgactive extension on both source and target databases.
- Set the
enable_pgactive
parameter to 1 in a DB parameter group and assign this parameter group to your DB instance. - Create a DB parameter group with
enable_pgactive
set to 1.
Run the following commands on both databases:You can also modify the parameter group directly from the AWS Management Console. For more information, see Working with DB parameter groups.
If you’re adding the parameter to an existing RDS instance, you must restart the database for the settings to take effect. For example, to restart an RDS instance named
pgactive-inst1
, use the following command: - Retrieve the endpoints for the two RDS for PostgreSQL instances.
For the purposes of this post, we call theseendpoint-pgactive14
andendpoint-pgactive16
. - To check if the
shared_preload_libraries
parameter contains pgactive, run the following command on both thepg-14
andpg-16
instances: - Install the pgactive extension:
Initialize the replication group
Next, you set up the connection information and credentials for the replication accounts to log in to each node. pgactive uses the foreign data interface to securely manage and store these credentials.
Make sure that the shared_preload_libraries
parameter is set, and the pgactive extension mentioned in the previous section is created on both v14.12 and v16.3.
- Run the following commands to set up the connection information for the
pg-14
endpoint: - To check if the DB instance is ready, use the following command:
If the command succeeds, you should see the following output:
- Run the following commands to set up the connection information for the
pg-16
endpoint:If the commands succeed, pgactive tries to synchronize the database. If the command runs successfully, you should see output similar to the following:
On pg-16
, check to see data is present and insert sample data and verify it was replicated on pg-14
. Now your database is logically synced to v16.3 using pgactive. You can perform functional and pressure or stress testing to make sure the new version is functionally good and giving optimal performance.
Prepare for the switchover
Now you can point your application by redirecting traffic to the new database (v16.3) by updating the connection endpoint in the application. A major advantage of using pgactive is the ability to gradually shift traffic from the old database to the new one. For example, if the same database endpoint is used across multiple microservice applications, you don’t need to update all microservices simultaneously. Instead, you can progressively change the endpoint to the new database (v16.3) as you test and validate each application. If you’re running a multi-tenant environment, then you could only replicate the required tables or database and point the application to the new version.
Limitations
1. Handling Sequences:
The pgActive extension is built on PostgreSQL’s logical replication system, ensuring automatic replication of tables and data. However, it comes with certain limitations, including the fact that sequence values are not synchronized after a node’s initial synchronization, which may require additional handling in distributed environments.
As a one-time operation, you need to generate sequence update SQL using the following code:
Apply the output of the above command to the joining node. This will set the correct last_value for sequences on the joining node.
NOTE: pgactive has a global sequence mapping function. This function generates globally-unique values by combining three components: the sequence number, a unique node ID, and a timestamp.
2. Materialized views:
When a new node joins, materialized views will be created with data up to the time when the initial COPY was completed. To get up-to-date data, it is recommended to refresh the materialized views
3. Support for DDL replication
If a new table is added to any node, PgActive will automatically add it to PgActive’s replication set, but it is the user’s responsibility to create the new table on each of the participating nodes. pgactive has the same limitations as PostgreSQL logical replication.
4. Handling of existing logical replication
Pgactive prohibits the use of pglogical plugin and subscription functionality, in other words no inbound replication is allowed. If you have existing slot(s) replicating to downstream logical subscribers, these are permitted. However, these downstream subscribers will need to be resynchronized after cutover to a newer version node.
5. Handling of tables without Primary Key
Tables without primary keys will replicate in pgactive. However, update/delete operations on these tables will not replicate. As a workaround, you have two options: either create a unique index and use it as the replication identity, or REPLICA IDENTITY FULL on the table.
Migrate a multi-tenant environment
The following example demonstrates a multi-tenant database migration. This approach shows how to gradually migrate specific applications in a multi-tenant environment from an older to a newer database version, eliminating the need to upgrade all applications simultaneously. For our use case, the current DB instance (v14.12) is serving applications 1, 2, 3, and 4. Complete the following steps to perform the migration:
- Deploy the new DB instance (v16.3).
- Configure the pgactive extension to enable bidirectional replication between v14.12 and v16.3.
- Selectively migrate applications 3 and 4 to the new database instance (v16.3), while applications 1 and 2 remain connected to the old instance (v14.12).
The following diagram illustrates this setup.
Monitor replication lag
Monitoring and alerting replication lag is crucial for pgactive. pgactive can have lag at the decoding node or the applying node. Monitoring replication lag lets you diagnose potential issues with your active-active replication and helps mitigate the risk of introducing conflicting changes into your system or running out of disk space.
When the receiving node is down due to maintenance, networking issues, or hardware issues, the write-ahead log (WAL) will accumulate on the WAL sender node and if issue is not rectified on time, the sender node might run out of disk space or the WAL can get to a point where the receiving node might never catch up.
When at the receiving node, if the WAL applies results in error due to schema differences, unique or primary key violation, or other reasons, the WAL will get accumulated on the receiving node and eventually the node will run out of disk space if the issue isn’t rectified in time.
The following output is from a two-node pgactive setup where pgactive.pgactive_get_replication_lag_info()
was executed on the pgactive1
node.
In this output, lag between pgactive1
and pagactive2
, and pgactive2
and pgactive1
is shown:
At a minimum, the following alerts should be monitored:
- active – It should alert when
active
isfalse
, which indicates that the slot is not currently being used actively (the subscriber instance has disconnected from the publisher). - pending_wal_decoding – In PostgreSQL’s logical replication, WAL files are stored in binary format. The publisher must decode these WAL changes and convert them into logical changes (such as insert, update, or delete operations). The metric
pending_wal_decoding
indicates the number of WAL files waiting to be decoded on the publisher side. Several factors can cause this number to increase:- When the subscriber is not connected,
active
status will befalse
andpending_wal_decoding
will increase. - The slot is active, but the publisher isn’t keeping up with the volume of changes in the WAL.
- When the subscriber is not connected,
- pending_wal_to_apply – The metric
pending_wal_apply
indicates the number of WAL files waiting to be applied on the subscriber side. This number can increase due to several factors that prevent the subscriber from applying changes and can potentially cause a disk full scenario:- Schema differences. For example, when you have changes in the WAL stream for a table named
sample
, but that table doesn’t exist on the subscriber side. - Values in the primary key columns were updated.
- Secondary unique indexes can cause data divergence.
- Schema differences. For example, when you have changes in the WAL stream for a table named
Rollback
Because the pgactive extension is already set up between the new primary and old production database, transactions are still replicating to the old primary (v14.12) and it will be in sync. If you need to roll back, you can simply point the application back to the old primary.
Clean up
To avoid incurring future charges, delete the resources you created. If you want to delete the old database after the upgrade, complete the following steps:
- Detach the old database from the replication group:
- Drop the pgactive plugin:
- Delete the old DB instance.
Conclusion
In this post, we demonstrated how to use the pgactive replication extension to upgrade your database, reduce downtime, and increase application availability. Active-active replication is a powerful database replication technique commonly used for high availability and other use cases.
If you have any questions, comments, or feedback, please leave them in the comments section.
About the Authors
Santosh Bhupathi is a Senior Database Specialist Solutions Architect based in Philadelphia. With a focus on relational and NoSQL databases, he provides guidance and technical assistance to customers to help them design, deploy, and optimize database workloads on AWS.
Chirag Daveis a Principal Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He maintains technical relationships with customers, making recommendations on security, cost, performance, reliability, operational efficiency, and best practice architectures.