Using pgactive: Active-active Replication Extension for PostgreSQL on Amazon RDS for PostgreSQL
Database replication is a method that copies changes between databases instances, and is a key component for use cases like high availability, reducing latency between an application and its source data, moving data between systems such as production and test, infrastructure migration, and others. Relational databases such as PostgreSQL typically support an active-standby model of replication, where one database instance accepts changes and makes them available to one or more read-only databases. Because changes in an active-standby cluster can only occur on a single instance, it’s more straightforward to build applications that work with this deployment topology because there is a single source of truth that contains all of the latest changes.
Sometimes it’s preferable to have a database cluster that uses an active-active replication topology, where it’s possible to write data on multiple instances in the same cluster. In an asynchronous active-active replication deployment, multiple databases in a cluster can accept changes and replicate them to other databases, but this means the database cluster doesn’t have a single source of truth. Use cases for this include running a multi-Region high availability database cluster, reducing write latency between applications and databases, performing blue/green updates of applications, and migrating data between systems that must both be writable. Applications that work with active-active database clusters must be designed to handle situations that occur in this deployment topology, including conflicting changes, replication lag, and the lack of certain convenient database features such as incremental integer sequences.
A fundamental component of active-active replication is logical replication. Logical replication uses a data format that allows external systems to interpret changes before applying them to a target database. This lets the target system perform additional actions, such as detecting and resolving write conflicts or converting the statement into something that is supported in the target database software. PostgreSQL added native support for logical replication in PostgreSQL 10 in 2017, but still requires additional features to fully support an active-active replication topology. PostgreSQL’s design makes it possible to build the necessary components for supporting active-active replication in an extension while the development community continues to add it into the upstream project.
For workloads that require active-active replication for PostgreSQL, we’re pleased to announce the general availability of
pgactive, an active-active replication extension for PostgreSQL, on Amazon Relational Database Service (Amazon RDS) for PostgreSQL starting with version 15.4-R3.
pgactive is based on the open-source BDR project and includes functions that let you set up an active-active PostgreSQL cluster, conflict detection and automatic resolution methods (for example, last-write-wins), conflict monitoring, and more.
In this post, we walk through an example for creating an active-active replication setup between two RDS for PostgreSQL instances. We also show how to monitor replication lag between instances, and review how to detect and resolve conflicts. Finally, we discuss strategies to design applications to work with
To run the examples in this post, you’ll need to provision at least two RDS for PostgreSQL instances running PostgreSQL 15.4-R3 or higher. To run the multi-Region high availability example, you’ll need to deploy an RDS for PostgreSQL instance in two different Regions and set up VPC peering. Note that sending traffic between multiple Regions may incur additional costs.
Additionally, you’ll need to set the
rds.enable_pgactive and the
rds.custom_dns_resolution parameters to
1 in a DB parameter group, and assign the DB parameter group to your database instance.
Using the AWS Command Line Interface (AWS CLI), you can create a DB parameter group that sets
Finally, if you’re adding this parameter to an existing RDS instance, you’ll need to restart the database for the settings to take effect. For example, if you have an RDS instance named
pgactive-inst1, you can restart it using the following command:
Now that we have covered different application design considerations for running an active-active PostgreSQL cluster using
pgactive, let’s learn how to set up active-active replication between two RDS for PostgreSQL instances.
Use pgactive to deploy an active-active RDS for PostgreSQL cluster
For this example, ensure that you set up two RDS for PostgreSQL instances that are network accessible to each other. If the two instances are in the same Region, you can do this by having both instances within the same subnet or by configuring security groups. For more information, refer to Scenarios for accessing a DB instance in a VPC and Control traffic to your AWS resources using security groups. If these instances are in different Regions, you will need to take some additional steps. For details, refer to What is VPC peering?
pgactive uses PostgreSQL logical replication, which requires a privileged user. For this example, we use the default
postgres user, but you can use any user that has the
rds_superuser privilege. Additionally, we create a database on each RDS for PostgreSQL instance that’s specific for this example. Complete the following steps:
- On each RDS for PostgreSQL instance, run the following command:
- Retrieve the network endpoints for the two RDS for PostgreSQL instances.
For the purposes of this example, we call these
- To set up the RDS for PostgreSQL instance at
endpoint1, log in to the instance at
endpoint1and first ensure that the
shared_preload_libraries, you’ll observe the following output:
- Use the following commands to create a table that contains product information and add several products:
- Now install
pgactiveinto the database with the following command:
Next, you set up the connection information and credentials for the replication accounts to log into each node. You need to set up the connection information for both nodes, including the node you’re running the commands on, because
pgactiveneeds information to connect both to the remote node and back to itself.
pgactiveuses the foreign data interface to securely manage and store these credentials.
endpoint1, run the following commands to set up the connection information, substituting the placeholders with your actual values:
- Now you can initialize the replication group and add this first instance:
If the commands succeed, you’ll see the following output:
Another option (mandatory on RDS for PostgreSQL 15.4) instead of steps 6 and 7 is to create the group that way:
Now let’s set up the RDS for PostgreSQL instance at
- Log in to the instance at
endpoint2and first ensure that the
shared_preload_libraries, the preceding command will return the following:
pgactiveis installed in the database with the following command:
Next, you set up the connection information and credentials for the replication accounts to log into each node.
pgactiveuses the foreign data interface to securely manage and store these credentials.
endpoint2, run the following commands to set up the connection information:
- Join the RDS for PostgreSQL instance at
endpoint2to the active-active replication group:
If the commands succeed,
pgactivewill try to synchronize the database. If the command runs successfully, you will see output similar to the following:
Another option (mandatory on RDS for PostgreSQL 15.4) instead of steps 10 and 11 is to join the group that way:
- After a minute, at
endpoint2, check to see that the products are present:
If the data successfully synchronized, you’ll see the following results:
Let’s test that our setup works.
endpoint2, run the following command:
- Connect to the
endpoint1and run the following query:
You should see the following result:
Your active-active RDS for PostgreSQL cluster is now initialized! The rest of this post demonstrates different scenarios that can benefit from an active-active replication topology.
Monitoring replication lag
Replication lag measures the difference in the current state of data between instances. When using asynchronous active-active replication, a larger replication lag increases the risk of a conflict occurring if the same row is updated on different nodes increases. Monitoring replication lag lets you diagnose potential issues with your active-active replication setup and helps mitigate the risk of introducing conflicting changes into your system.
pgactive provides two ways of measuring replication lag: transaction ID and time-based. Transaction IDs are used to determine the transactions that are committed on the originating active instance and what transactions are committed on the destination active instance. The difference in transaction IDs, represented as log sequence numbers (LSNs), shows the difference in total data size to be committed. Time-based replication lag is the total amount of time that the destination active instance is lagging behind the transactions that were committed on the origin instance. Measuring replication lag using time is generally preferred because it’s easier to understand if an instance is 2 seconds behind instead of 300 MB behind.
pgactive makes both replication lag methods available using the
pgactive.pgactive_node_slots view. You can get the replication stats on the node using the following query:
You can use the following query to calculate the replication lag using both the transaction ID and time-based method:
Reviewing and correcting write conflicts
Conflicts in asynchronous active-active replication can occur when two active instances simultaneously modify the same row. Using the data in our
endpoint2 clusters as an example, let’s suppose a transaction on
soap to be
sapone while a transaction on
soap to be
sapone is applied.
pgactive logs all conflicts and uses the last-update-wins strategy of resolving conflicts, where it will accept the changes from the transaction with the latest timestamp. In our example, the change from
endpoint1 was made at
t=1 and the change from
savon was made at
pgactive will resolve the conflict on
endpoint1 and change
savon is the later update.
You can view all the conflicting transactions and how they were resolved in the
pgactive.pgactive_conflict_history table, as shown in the following code:
If you don’t like how the conflicting change was resolved, you can manually fix the conflict by running your own query to update the value in the affected row.
Considerations for building application with pgactive
pgactive is not a drop-in solution for all applications. Applications that use
pgactive for an active-active database cluster must make specific design decisions to ensure they can operate safely. Even if you are routing write traffic to a single
pgactive instance, you must make sure your application is architected to support an active-active replication topology. This section provides an overview of pgactive functionality and what design considerations you must make for your applications that use
pgactive, you must plan for conflicts and how to resolve them. Each database instance that uses
pgactive is an independent instance and can accept data changes from any source. When a change is sent to a database instance, PostgreSQL commits it locally and then uses
pgactive to replicate the change asynchronously to another system. Although using asynchronous replication between two PostgreSQL instances with
pgactive is helpful for performance and availability, it’s possible for two PostgreSQL instances to update the same record at the same time.
pgactive provides mechanisms for conflict detection and automatic resolution.
pgactive tracks the timestamp of when the transaction was committed on both systems, and will automatically apply the change of the latest timestamp.
pgactive also logs when a conflict occurs in the
pgactive.pgactive_conflict_history table, so you can manually choose a different solution. Even though
pgactive has automatic conflict resolution, part of your application may have received stale data, so you should design your application to account for this case.
As part of conflict detection,
pgactive must inspect any rows that have changes. This requires loading any affected row to perform the conflict detection.
pgactive does this through the use of a primary key, which is a constraint that acts as a unique identifier for every row in a table, because using a primary key is a performant way to retrieve a row.
pgactive requires that every table use a primary key, so your application must ensure all tables include a primary key constraint. Additionally, to prevent a certain kind of conflict that could produce duplicate or missing rows,
pgactive prevents updates directly to a column that has a primary key constraint. Although most application development frameworks will automatically add primary keys to tables, you should ensure that applications you intend to use with
pgactive include this constraint.
Developers often use integer sequences as their primary keys in PostgreSQL, for example, the PostgreSQL
bigserial types or via the
GENERATED BY DEFAULT AS IDENTITY SQL-standard syntax. However, integer sequences that increment by 1, which is the default in PostgreSQL, will introduce conflicts in an active-active replication deployment.
pgactive has functions to create nonconflicting sequences and convert sequences to nonconflicting sequences (
SELECT pgactive.convert_local_seqs_to_snowflake_id_seqs()). Additionally,
pgactive requires that integer-based sequences are 64-bit (
bigint) to mitigate the risk of sequence wraparound. You can also choose to use other identifiers, such as UUIDs.
pgactive doesn’t allow you to replicate schema changes (data definition language or DDL commands). Applying changes to schemas requires either a coordinated rollout using a two-phase commit (2PC).
pgactive doesn’t provide any support for large object commands, so your application should use another storage mechanism such as a
bytea type or Amazon Simple Storage Service (Amazon S3) for storing large objects.
pgactive ensures that all instances use the same version of any text collation providers, and will prevent an instance from joining if the provider version doesn’t match.
Finally, you’ll need to make decisions about your network topology that can help optimize performance and correctness. Network latency between PostgreSQL instances using
pgactive for active-active replication can directly impact correctness, because higher latency increases the risk of introducing conflicts. Additionally, large transactions can impact replication lag, because each affected row must be replayed and applied on a target instance. You should not use
pgactive for load balancing writes between multiple instances, because this doesn’t improve write performance and greatly increases the risk of introducing conflicts.
If you created RDS for PostgreSQL instances to run this example and no longer need to use them, you can delete them at this time. In this case, you don’t need to complete any of the following steps.
To remove a
pgactive instance from an application set and prepare to drop the
pgactive extension, you must first detach the node from the group. For example, to detach both nodes from the earlier example, you can run the following command:
After you have detached the nodes, you can run the
pgactive.remove_pgactive() command on each instance to disable
After you have successfully run these commands, you can drop the
Active-active replication is a powerful data transfer technique that can be used to maintain availability throughout different database operations and reduce write latency for applications distributed in multiple Regions.
pgactive makes it possible to deploy RDS for PostgreSQL instances using an active-active topology, and provides the foundational features for managing active-active replication such as conflict detection and resolution. However, although active-active replication can help maximize uptime, you must design your application to work with an active-active setup and use techniques to minimize write conflicts.
We are continuing to develop and add new features to
pgactive and we invite you to give the team feedback on how you are planning to use
pgactive. If you have feedback or feature requests, reach out to firstname.lastname@example.org and let us know about your use case!
We invite you to leave feedback in the comments section in this post.
About the author
Jonathan Katz is a Principal Product Manager – Technical on the Amazon RDS team and is based in New York. He is a Core Team member of the open source PostgreSQL project and an active open source contributor.