AWS Database Blog
Using pgactive: Active-active Replication Extension for PostgreSQL on Amazon RDS for PostgreSQL
Database replication is a method that copies changes between databases instances, and is a key component for use cases like high availability, reducing latency between an application and its source data, moving data between systems such as production and test, infrastructure migration, and others. Relational databases such as PostgreSQL typically support an active-standby model of replication, where one database instance accepts changes and makes them available to one or more read-only databases. Because changes in an active-standby cluster can only occur on a single instance, it’s more straightforward to build applications that work with this deployment topology because there is a single source of truth that contains all of the latest changes.
Sometimes it’s preferable to have a database cluster that uses an active-active replication topology, where it’s possible to write data on multiple instances in the same cluster. In an asynchronous active-active replication deployment, multiple databases in a cluster can accept changes and replicate them to other databases, but this means the database cluster doesn’t have a single source of truth. Use cases for this include running a multi-Region high availability database cluster, reducing write latency between applications and databases, performing blue/green updates of applications, and migrating data between systems that must both be writable. Applications that work with active-active database clusters must be designed to handle situations that occur in this deployment topology, including conflicting changes, replication lag, and the lack of certain convenient database features such as incremental integer sequences.
A fundamental component of active-active replication is logical replication. Logical replication uses a data format that allows external systems to interpret changes before applying them to a target database. This lets the target system perform additional actions, such as detecting and resolving write conflicts or converting the statement into something that is supported in the target database software. PostgreSQL added native support for logical replication in PostgreSQL 10 in 2017, but still requires additional features to fully support an active-active replication topology. PostgreSQL’s design makes it possible to build the necessary components for supporting active-active replication in an extension while the development community continues to add it into the upstream project.
For workloads that require active-active replication for PostgreSQL, we’re pleased to announce the general availability of pgactive
, an active-active replication extension for PostgreSQL, on Amazon Relational Database Service (Amazon RDS) for PostgreSQL starting with version 15.4-R3. pgactive
is based on the open-source BDR project and includes functions that let you set up an active-active PostgreSQL cluster, conflict detection and automatic resolution methods (for example, last-write-wins), conflict monitoring, and more.
In this post, we walk through an example for creating an active-active replication setup between two RDS for PostgreSQL instances. We also show how to monitor replication lag between instances, and review how to detect and resolve conflicts. Finally, we discuss strategies to design applications to work with pgactive
.
Prerequisites
To run the examples in this post, you’ll need to provision at least two RDS for PostgreSQL instances running PostgreSQL 15.4-R3 or higher. To run the multi-Region high availability example, you’ll need to deploy an RDS for PostgreSQL instance in two different Regions and set up VPC peering. Note that sending traffic between multiple Regions may incur additional costs.
Additionally, you’ll need to set the rds.enable_pgactive
and the rds.custom_dns_resolution
parameters to 1
in a DB parameter group, and assign the DB parameter group to your database instance.
Using the AWS Command Line Interface (AWS CLI), you can create a DB parameter group that sets rds.enable_pgactive
to 1
:
You can also modify the parameter group directly from the AWS Management Console. For more information, see Working with DB parameter groups.
Finally, if you’re adding this parameter to an existing RDS instance, you’ll need to restart the database for the settings to take effect. For example, if you have an RDS instance named pgactive-inst1
, you can restart it using the following command:
Now that we have covered different application design considerations for running an active-active PostgreSQL cluster using pgactive
, let’s learn how to set up active-active replication between two RDS for PostgreSQL instances.
Use pgactive to deploy an active-active RDS for PostgreSQL cluster
For this example, ensure that you set up two RDS for PostgreSQL instances that are network accessible to each other. If the two instances are in the same Region, you can do this by having both instances within the same subnet or by configuring security groups. For more information, refer to Scenarios for accessing a DB instance in a VPC and Control traffic to your AWS resources using security groups. If these instances are in different Regions, you will need to take some additional steps. For details, refer to What is VPC peering?
pgactive
uses PostgreSQL logical replication, which requires a privileged user. For this example, we use the default postgres
user, but you can use any user that has the rds_superuser
privilege. Additionally, we create a database on each RDS for PostgreSQL instance that’s specific for this example. Complete the following steps:
- On each RDS for PostgreSQL instance, run the following command:
- Retrieve the network endpoints for the two RDS for PostgreSQL instances.
For the purposes of this example, we call theseendpoint1
andendpoint2
. - To set up the RDS for PostgreSQL instance at
endpoint1
, log in to the instance atendpoint1
and first ensure that theshared_preload_libraries
parameter containspgactive
:If
pgactive
is inshared_preload_libraries
, you’ll observe the following output: - Use the following commands to create a table that contains product information and add several products:
- Now install
pgactive
into the database with the following command:Next, you set up the connection information and credentials for the replication accounts to log into each node. You need to set up the connection information for both nodes, including the node you’re running the commands on, because
pgactive
needs information to connect both to the remote node and back to itself.pgactive
uses the foreign data interface to securely manage and store these credentials. - On
endpoint1
, run the following commands to set up the connection information, substituting the placeholders with your actual values: - Now you can initialize the replication group and add this first instance:
If the commands succeed, you’ll see the following output:
Another option (mandatory on RDS for PostgreSQL 15.4) instead of steps 6 and 7 is to create the group that way:
Now let’s set up the RDS for PostgreSQL instance at
endpoint2
. - Log in to the instance at
endpoint2
and first ensure that theshared_preload_libraries
parameter containspgactive
:If
pgactive
is inshared_preload_libraries
, the preceding command will return the following: - Ensure
pgactive
is installed in the database with the following command:Next, you set up the connection information and credentials for the replication accounts to log into each node.
pgactive
uses the foreign data interface to securely manage and store these credentials. - On
endpoint2
, run the following commands to set up the connection information: - Join the RDS for PostgreSQL instance at
endpoint2
to the active-active replication group:If the commands succeed,
pgactive
will try to synchronize the database. If the command runs successfully, you will see output similar to the following:Another option (mandatory on RDS for PostgreSQL 15.4) instead of steps 10 and 11 is to join the group that way:
- After a minute, at
endpoint2
, check to see that the products are present:If the data successfully synchronized, you’ll see the following results:
Let’s test that our setup works.
- From
endpoint2
, run the following command: - Connect to the
app
database atendpoint1
and run the following query:You should see the following result:
Your active-active RDS for PostgreSQL cluster is now initialized! The rest of this post demonstrates different scenarios that can benefit from an active-active replication topology.
Monitoring replication lag
Replication lag measures the difference in the current state of data between instances. When using asynchronous active-active replication, a larger replication lag increases the risk of a conflict occurring if the same row is updated on different nodes increases. Monitoring replication lag lets you diagnose potential issues with your active-active replication setup and helps mitigate the risk of introducing conflicting changes into your system.
pgactive
provides two ways of measuring replication lag: transaction ID and time-based. Transaction IDs are used to determine the transactions that are committed on the originating active instance and what transactions are committed on the destination active instance. The difference in transaction IDs, represented as log sequence numbers (LSNs), shows the difference in total data size to be committed. Time-based replication lag is the total amount of time that the destination active instance is lagging behind the transactions that were committed on the origin instance. Measuring replication lag using time is generally preferred because it’s easier to understand if an instance is 2 seconds behind instead of 300 MB behind.
pgactive
makes both replication lag methods available using the pgactive.pgactive_node_slots
view. You can get the replication stats on the node using the following query:
You can use the following query to calculate the replication lag using both the transaction ID and time-based method:
Reviewing and correcting write conflicts
Conflicts in asynchronous active-active replication can occur when two active instances simultaneously modify the same row. Using the data in our endpoint1
and endpoint2
clusters as an example, let’s suppose a transaction on endpoint1
modifies soap
to be sapone
while a transaction on endpoint2
modifies soap
to be savon
before sapone
is applied.
By default, pgactive
logs all conflicts and uses the last-update-wins strategy of resolving conflicts, where it will accept the changes from the transaction with the latest timestamp. In our example, the change from soap
to sapone
on endpoint1
was made at t=1
and the change from soap
to savon
was made at t=2
on endpoint2
, so pgactive
will resolve the conflict on endpoint1
and change sapone
to savon
because savon
is the later update.
You can view all the conflicting transactions and how they were resolved in the pgactive.pgactive_conflict_history
table, as shown in the following code:
If you don’t like how the conflicting change was resolved, you can manually fix the conflict by running your own query to update the value in the affected row.
Considerations for building application with pgactive
pgactive
is not a drop-in solution for all applications. Applications that use pgactive
for an active-active database cluster must make specific design decisions to ensure they can operate safely. Even if you are routing write traffic to a single pgactive
instance, you must make sure your application is architected to support an active-active replication topology. This section provides an overview of pgactive functionality and what design considerations you must make for your applications that use pgactive
.
When using pgactive
, you must plan for conflicts and how to resolve them. Each database instance that uses pgactive
is an independent instance and can accept data changes from any source. When a change is sent to a database instance, PostgreSQL commits it locally and then uses pgactive
to replicate the change asynchronously to another system. Although using asynchronous replication between two PostgreSQL instances with pgactive
is helpful for performance and availability, it’s possible for two PostgreSQL instances to update the same record at the same time.
pgactive
provides mechanisms for conflict detection and automatic resolution. pgactive
tracks the timestamp of when the transaction was committed on both systems, and will automatically apply the change of the latest timestamp. pgactive
also logs when a conflict occurs in the pgactive.pgactive_conflict_history
table, so you can manually choose a different solution. Even though pgactive
has automatic conflict resolution, part of your application may have received stale data, so you should design your application to account for this case.
As part of conflict detection, pgactive
must inspect any rows that have changes. This requires loading any affected row to perform the conflict detection. pgactive
does this through the use of a primary key, which is a constraint that acts as a unique identifier for every row in a table, because using a primary key is a performant way to retrieve a row. pgactive
requires that every table use a primary key, so your application must ensure all tables include a primary key constraint. Additionally, to prevent a certain kind of conflict that could produce duplicate or missing rows, pgactive
prevents updates directly to a column that has a primary key constraint. Although most application development frameworks will automatically add primary keys to tables, you should ensure that applications you intend to use with pgactive
include this constraint.
Developers often use integer sequences as their primary keys in PostgreSQL, for example, the PostgreSQL serial
and bigserial
types or via the GENERATED BY DEFAULT AS IDENTITY
SQL-standard syntax. However, integer sequences that increment by 1, which is the default in PostgreSQL, will introduce conflicts in an active-active replication deployment. pgactive
has functions to create nonconflicting sequences and convert sequences to nonconflicting sequences (SELECT pgactive.convert_local_seqs_to_snowflake_id_seqs()
). Additionally, pgactive
requires that integer-based sequences are 64-bit (bigint
) to mitigate the risk of sequence wraparound. You can also choose to use other identifiers, such as UUIDs.
pgactive
doesn’t allow you to replicate schema changes (data definition language or DDL commands). Applying changes to schemas requires either a coordinated rollout using a two-phase commit (2PC).
pgactive
doesn’t provide any support for large object commands, so your application should use another storage mechanism such as a bytea
type or Amazon Simple Storage Service (Amazon S3) for storing large objects. pgactive
ensures that all instances use the same version of any text collation providers, and will prevent an instance from joining if the provider version doesn’t match.
Finally, you’ll need to make decisions about your network topology that can help optimize performance and correctness. Network latency between PostgreSQL instances using pgactive
for active-active replication can directly impact correctness, because higher latency increases the risk of introducing conflicts. Additionally, large transactions can impact replication lag, because each affected row must be replayed and applied on a target instance. You should not use pgactive
for load balancing writes between multiple instances, because this doesn’t improve write performance and greatly increases the risk of introducing conflicts.
Clean up
If you created RDS for PostgreSQL instances to run this example and no longer need to use them, you can delete them at this time. In this case, you don’t need to complete any of the following steps.
To remove a pgactive
instance from an application set and prepare to drop the pgactive
extension, you must first detach the node from the group. For example, to detach both nodes from the earlier example, you can run the following command:
After you have detached the nodes, you can run the pgactive.pgactive_remove()
command on each instance to disable pgactive
:
After you have successfully run these commands, you can drop the pgactive
extension:
Conclusion
Active-active replication is a powerful data transfer technique that can be used to maintain availability throughout different database operations and reduce write latency for applications distributed in multiple Regions. pgactive
makes it possible to deploy RDS for PostgreSQL instances using an active-active topology, and provides the foundational features for managing active-active replication such as conflict detection and resolution. However, although active-active replication can help maximize uptime, you must design your application to work with an active-active setup and use techniques to minimize write conflicts.
We are continuing to develop and add new features to pgactive
and we invite you to give the team feedback on how you are planning to use pgactive
. If you have feedback or feature requests, reach out to rds-postgres-extensions-request@amazon.com and let us know about your use case!
We invite you to leave feedback in the comments section in this post.
About the author
Jonathan Katz is a Principal Product Manager – Technical on the Amazon RDS team and is based in New York. He is a Core Team member of the open source PostgreSQL project and an active open source contributor.