AWS Database Blog
Replicate data between partitioned and non-partitioned tables using trigger functions in Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL
In this post, we show you how to replicate data between partitioned and non-partitioned tables using trigger functions in Amazon Aurora PostgreSQL-Compatible Edition. The solution also works with Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
Customers often decide to migrate partial applications and their associated database objects to AWS, which requires synchronizing data from AWS back to on-premises until the entire application is migrated to AWS. If the database schema is modernized or modified in the AWS environment, data replication tools may not be able to directly replicate back to the source schema. For example, when using Blu Age to migrate an IBM Db2 for iSeries database to PostgreSQL the tables in PostgreSQL will be partitioned. Data replication tools cannot directly replicate changes in the partitioned tables back to non-partitioned tables in Db2. This post describes an approach to create and automatically populate a compatible copy of the partitioned tables that can be used to synchronize data back to the source database.
Data replication tools, such as AWS Database Migration Service (AWS DMS) or Qlik Replicate, synchronize data from PostgreSQL to a different database engine do not support replicating from partitioned tables in PostgreSQL to a non-partition table. For example, in order to replicate data back from Amazon Aurora PostgreSQL or Amazon RDS for PostgreSQL to an IBM Db2 for iSeries on-premises, the partitioned table must first be converted to a non-partitioned table. The non-partitioned tables are then replicated back to an IBM Db2 for iSeries on-premises until the entire database is migrated to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL. For more information about performing a complete data migration from an on-premises IBM Db2 for iSeries to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL, see Migrate an IBM Db2 for iSeries database to Amazon Aurora PostgreSQL using Blu Age and Qlik Replicate.
Solution overview
PostgreSQL partitioning refers to splitting what is logically one large table into smaller physical pieces. In this solution, we replicate data from partitioned tables to non-partitioned tables using trigger functions to insert, update, and delete. Whenever the data get inserted, deleted, or updated to the partitioned table, the trigger functions automatically synchronize those changes to the non-partitioned tables, and the data can then be used for data replication back to the Db2 for iSeries database using AWS DMS or another data replication tool.
The following diagram illustrates the architecture of data replication using trigger functions.
In the previous diagram, the user application communicates only with the partitioned tables, and trigger functions are deployed to collect the changes from partitioned tables and apply them to non-partitioned tables, which are used as a source to replicate data back to an on-premises database.
After you deploy the trigger functions on partitioned tables, no further action is needed. The transactions committed to partitioned tables are automatically replicated to non-partitioned tables.
To implement this solution, we use the following high-level steps:
- Create a PostgreSQL database and schema
- Create a partitioned table with a primary key
- Create a non-partitioned table
- Create an insert function and trigger
- Create an update function and trigger
- Create a delete function and trigger
- Perform data replication testing
Prerequisites
You must have the following prerequisites in place to implement this solution:
- An active AWS account
- A bastion host to access the database: Windows or Linux, either hosted on premises or in the AWS Cloud
- An RDS for PostgreSQL or Aurora PostgreSQL instance with user permissions in place to create and deploy database objects. For more details, see Managing PostgreSQL users and roles
- PostgreSQL client (such as pgAdmin client ,psql, or DBeaver) is in place on the bastion host that is accessible for the RDS for PostgreSQL or Aurora PostgreSQL DB instance
Create a PostgreSQL database and schema
After you have connected to the database instance, complete the following steps:
- Run the following query to create a demo database:
Create database demo;
- Disconnect from the default database session and connect to the newly created database demo query tool.
- Run the following query to create a
demo_user
under the demo database:
A new database named demo
and new schema named demo_use
are now in the database instance.
Create a partitioned table with a primary key
Complete the following steps:
- Run the following SQL command to create a table with a primary key in the demo database:
A new table named city_part
is created under the schema demo_use
.
- Use the following SQL commands to create individual partitions of the partitioned table:
Create a non-partitioned table with a primary key
Run the following code to create a non-partitioned table with a primary key:
Create an insert function
Run the following code to create an insert SQL function. This function will automatically insert the data from the partitioned table to the non-partitioned table.
Create a trigger for the insert function
This trigger is a special type of stored procedure that automatically runs when an insert event occurs in the partitioned table. Create the trigger with the following code:
Create an update function
Run the following code to create an update SQL function. This function will automatically update data from the partitioned table to the non-partitioned table.
Create a trigger for the update function
This trigger is a special type of stored procedure that automatically runs when an update event occurs in the partitioned table. Create the trigger with the following code:
Create a delete function
Run the following code to create a delete SQL function. This function will automatically delete records from non-partitioned table when records are deleted from the partitioned table.
Create a trigger for the delete function
This trigger is a special type of stored procedure that automatically runs when a delete event occurs in the partitioned table. Create the trigger with the following code:
The following screenshot illustrates the database objects after the solution deployment using pgAdmin.
Data replication testing
Complete the following steps to test the solution:
- Run the following commands to insert the data in the partitioned table:
- Run the following queries to verify that data is inserted in the partitioned and non-partitioned tables automatically:
The following screenshot illustrates that after inserting the data in the partitioned table, the trigger function automatically runs and inserts the data in the non-partitioned table.
- Run the following command to update the data in the partitioned table:
- Run the following queries to verify that data is updated in the partitioned and non-partitioned tables automatically:
The following screenshot illustrates that after updating the data in the partitioned table, the trigger function automatically runs and updates the data in the non-partitioned table.
- Run the following command to delete the data from the partitioned table:
- Run the following queries to verify that data is deleted from the partitioned and non-partitioned tables automatically:
The following screenshot illustrates that after deleting the data from the partitioned table, the trigger function automatically runs and deletes the data from the non-partitioned table.
After the data replication is working as expected from the partitioned table to the non-partitioned table, you can use the non-partitioned table as a source to replicate data back to the on-premises database as the migration from the on-premises database to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL is complete.
Clean up
Use a PostgreSQL client to connect to your PostgreSQL DB cluster to connect to the RDS for PostgreSQL or Aurora PostgreSQL database instance and run the following query to remove the database:
For information about using pgAdmin to remove the database, see DROP DATABASE.
Additionally, delete the AWS resources (bastion host and database instance) if you no longer need them.
Conclusion
In this post, we showed you how to automate data replication from partitioned tables to non-partitioned tables for Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL using triggers. With this solution, no additional configuration is needed to run CDC between partitioned tables and non-partitioned tables. This post also discussed PostgreSQL key features, different operations associated with PostgreSQL triggers, their syntax, and example use cases.
If you have any questions, comments, or suggestions, leave them in the comments section.
About the Authors
Amit Upadhyay is a Senior Database Consultant with Amazon Web Services based in Texas, helps customers to deliver strategies, solution designs , proof of concepts & best practices and make the best out of their cloud journey. In his spare time, he enjoys seeking out new cultures, social activities, and staying up to date with the latest technology trends. You can find him on Linkedin.
Abhishek Pandey is a Senior Solutions Architect with Amazon Web Services based in Houston, TX. Outside of work, he loves to hang out with family and friends.
Sankaran Kannappan is a Database Consultant with Amazon Web Services, focusing on data center migration to the AWS Cloud, portfolio analysis, and modernizing commercial databases to AWS Cloud native and open source databases on AWS.