AWS Database Blog

Replicate data between partitioned and non-partitioned tables using trigger functions in Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL

In this post, we show you how to replicate data between partitioned and non-partitioned tables using trigger functions in Amazon Aurora PostgreSQL-Compatible Edition. The solution also works with Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

Customers often migrate parts of an application and its associated database objects to AWS, which requires synchronizing data from AWS back to on premises until the entire application is migrated. If the database schema is modernized or modified in the AWS environment, data replication tools may not be able to replicate directly back to the source schema. For example, when you use Blu Age to migrate an IBM Db2 for iSeries database to PostgreSQL, the tables in PostgreSQL are partitioned. Data replication tools can't directly replicate changes in the partitioned tables back to non-partitioned tables in Db2. This post describes an approach to create and automatically populate a compatible copy of the partitioned tables that you can use to synchronize data back to the source database.

Data replication tools such as AWS Database Migration Service (AWS DMS) and Qlik Replicate, which synchronize data from PostgreSQL to a different database engine, don't support replicating from partitioned tables in PostgreSQL to non-partitioned tables. For example, to replicate data back from Amazon Aurora PostgreSQL or Amazon RDS for PostgreSQL to an on-premises IBM Db2 for iSeries database, the partitioned table must first be converted to a non-partitioned table. The non-partitioned tables are then replicated back to the on-premises IBM Db2 for iSeries database until the entire database is migrated to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL. For more information about performing a complete data migration from an on-premises IBM Db2 for iSeries to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL, see Migrate an IBM Db2 for iSeries database to Amazon Aurora PostgreSQL using Blu Age and Qlik Replicate.

Solution overview

PostgreSQL partitioning refers to splitting what is logically one large table into smaller physical pieces. In this solution, we replicate data from partitioned tables to non-partitioned tables using insert, update, and delete trigger functions. Whenever data is inserted into, updated in, or deleted from the partitioned table, the trigger functions automatically synchronize those changes to the non-partitioned tables, which can then serve as the source for replication back to the Db2 for iSeries database using AWS DMS or another data replication tool.

The following diagram illustrates the architecture of data replication using trigger functions.
Scope of the solution

In the previous diagram, the user application communicates only with the partitioned tables, and trigger functions are deployed to collect the changes from partitioned tables and apply them to non-partitioned tables, which are used as a source to replicate data back to an on-premises database.

After you deploy the trigger functions on partitioned tables, no further action is needed. The transactions committed to partitioned tables are automatically replicated to non-partitioned tables.

To implement this solution, we use the following high-level steps:

  1. Create a PostgreSQL database and schema
  2. Create a partitioned table with a primary key
  3. Create a non-partitioned table with a primary key
  4. Create an insert function and trigger
  5. Create an update function and trigger
  6. Create a delete function and trigger
  7. Perform data replication testing

Prerequisites

You must have the following prerequisites in place to implement this solution:

  • An active AWS account
  • A bastion host to access the database: Windows or Linux, either hosted on premises or in the AWS Cloud
  • An RDS for PostgreSQL or Aurora PostgreSQL instance with user permissions in place to create and deploy database objects. For more details, see Managing PostgreSQL users and roles
  • A PostgreSQL client (such as pgAdmin, psql, or DBeaver) installed on the bastion host, with access to the RDS for PostgreSQL or Aurora PostgreSQL DB instance

Create a PostgreSQL database and schema

After you have connected to the database instance, complete the following steps:

  1. Run the following query to create a demo database:
    CREATE DATABASE demo;
  2. Disconnect from the default database session and open a query tool connected to the newly created demo database.
  3. Run the following query to create a demo_user schema in the demo database:
    CREATE SCHEMA demo_user;

A new database named demo and a new schema named demo_user are now in the database instance.
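
Optionally, you can confirm that the schema exists by querying information_schema. This verification query is an addition to the original procedure:

SELECT schema_name
FROM information_schema.schemata
WHERE schema_name = 'demo_user';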

Create a partitioned table with a primary key

Complete the following steps:

  1. Run the following SQL command to create a table with a primary key in the demo database:
    CREATE TABLE demo_user.city_part (
    id int4 NOT NULL PRIMARY KEY,
    name varchar(30) NOT NULL,
    state varchar(20),
    population int4
    )
    PARTITION BY RANGE (id);

A new table named city_part is created under the schema demo_user.

  2. Use the following SQL commands to create the individual partitions of the partitioned table (an optional verification query follows this step):
    CREATE TABLE demo_user.city_id1 PARTITION OF demo_user.city_part
    FOR VALUES FROM (MINVALUE) TO (10);
    CREATE TABLE demo_user.city_id2 PARTITION OF demo_user.city_part
    FOR VALUES FROM (10) TO (20);
    CREATE TABLE demo_user.city_id3 PARTITION OF demo_user.city_part
    FOR VALUES FROM (20) TO (30);
    CREATE TABLE demo_user.city_id4 PARTITION OF demo_user.city_part
    FOR VALUES FROM (30) TO (MAXVALUE);
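
To confirm that all four partitions are attached to the parent table, you can optionally query the pg_inherits catalog; this check is an addition to the original steps:

SELECT inhrelid::regclass AS partition_name
FROM pg_inherits
WHERE inhparent = 'demo_user.city_part'::regclass;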

Create a non-partitioned table with a primary key

Run the following code to create a non-partitioned table with a primary key:

CREATE TABLE demo_user.city_nopart (
id int4 NOT NULL PRIMARY KEY,
name varchar(30) NOT NULL,
state varchar(20),
population int4
);
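
The triggers you create in the next steps only capture changes made after they are deployed. If your partitioned table already contains rows at that point, you can optionally run a one-time backfill first; this step is an addition to the original walkthrough:

INSERT INTO demo_user.city_nopart (id, name, state, population)
SELECT id, name, state, population
FROM demo_user.city_part;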

Create an insert function

Run the following code to create an insert function. This trigger function automatically copies rows inserted into the partitioned table to the non-partitioned table.

CREATE OR REPLACE FUNCTION demo_user.function_insert_data() RETURNS TRIGGER AS
$BODY$
BEGIN
INSERT INTO
demo_user.city_nopart(id, name, state, population)
VALUES(NEW.id, NEW.name, NEW.state, NEW.population);
RETURN NEW;
END;
$BODY$
language plpgsql;
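
The preceding function assumes each id value arrives only once. If the same row might be replayed (for example, during a reload after an interrupted batch), the following optional variant uses ON CONFLICT to make the insert idempotent. This is a sketch beyond the original solution, not a required change:

CREATE OR REPLACE FUNCTION demo_user.function_insert_data() RETURNS TRIGGER AS
$BODY$
BEGIN
-- Skip rows whose id already exists in the non-partitioned table
INSERT INTO
demo_user.city_nopart(id, name, state, population)
VALUES(NEW.id, NEW.name, NEW.state, NEW.population)
ON CONFLICT (id) DO NOTHING;
RETURN NEW;
END;
$BODY$
language plpgsql;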

Create a trigger for the insert function

This trigger automatically runs the insert function after an insert event occurs on the partitioned table. Create the trigger with the following code:

CREATE TRIGGER trig_insert_data
AFTER INSERT ON demo_user.city_part
FOR EACH ROW
EXECUTE PROCEDURE demo_user.function_insert_data();

Create an update function

Run the following code to create an update function. This trigger function automatically applies updates made in the partitioned table to the non-partitioned table.

CREATE OR REPLACE FUNCTION demo_user.function_update_data() RETURNS TRIGGER AS
$BODY$
BEGIN
UPDATE demo_user.city_nopart
SET name = NEW.name, state = NEW.state, population = NEW.population WHERE id = NEW.id;
RETURN NEW;
END;
$BODY$
language plpgsql;
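
The preceding function matches the target row on NEW.id, which assumes the primary key itself is never updated. If the id column can change within the same partition, the following optional variant matches on OLD.id so the corresponding row is still found; this is an addition to the original walkthrough. (When an UPDATE moves a row to a different partition, PostgreSQL performs it as a DELETE followed by an INSERT, so the insert and delete triggers in this solution cover that case.)

CREATE OR REPLACE FUNCTION demo_user.function_update_data() RETURNS TRIGGER AS
$BODY$
BEGIN
-- Match on the old key value so key updates are propagated too
UPDATE demo_user.city_nopart
SET id = NEW.id, name = NEW.name, state = NEW.state, population = NEW.population
WHERE id = OLD.id;
RETURN NEW;
END;
$BODY$
language plpgsql;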

Create a trigger for the update function

This trigger automatically runs the update function after an update event occurs on the partitioned table. Create the trigger with the following code:

CREATE TRIGGER trig_update_data
AFTER UPDATE ON demo_user.city_part
FOR EACH ROW
EXECUTE PROCEDURE demo_user.function_update_data();

Create a delete function

Run the following code to create a delete function. This trigger function automatically deletes records from the non-partitioned table when records are deleted from the partitioned table.

CREATE OR REPLACE FUNCTION demo_user.function_delete_data() RETURNS TRIGGER AS
$BODY$
BEGIN
DELETE from demo_user.city_nopart WHERE id = OLD.id;
-- NEW is not assigned in a delete trigger, so return OLD instead
RETURN OLD;
END;
$BODY$
language plpgsql;

Create a trigger for the delete function

This trigger automatically runs the delete function after a delete event occurs on the partitioned table. Create the trigger with the following code:

CREATE TRIGGER trig_delete_data
AFTER DELETE ON demo_user.city_part
FOR EACH ROW
EXECUTE PROCEDURE demo_user.function_delete_data();
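
Before testing, you can optionally confirm that all three triggers are attached to the partitioned table by querying the pg_trigger catalog; this check is an addition to the original steps:

SELECT tgname AS trigger_name, tgrelid::regclass AS table_name
FROM pg_trigger
WHERE tgrelid = 'demo_user.city_part'::regclass
AND NOT tgisinternal;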

The following screenshot illustrates the database objects after the solution deployment using pgAdmin.

Trigger status post deployment

Data replication testing

Complete the following steps to test the solution:

  1. Run the following commands to insert the data in the partitioned table:
    INSERT INTO demo_user.city_part(
    id, name, state, population)
    VALUES (1, 'Amit', 'TX', 21000000);
    
    INSERT INTO demo_user.city_part(
    id, name, state, population)
    VALUES (2, 'AmitU', 'NY', 21000);
  2. Run the following queries to verify that data is inserted in the partitioned and non-partitioned tables automatically:
    SELECT * FROM demo_user.city_part;
    SELECT * FROM demo_user.city_nopart;

The following screenshot illustrates that after inserting the data in the partitioned table, the trigger function automatically runs and inserts the data in the non-partitioned table.

Testing for insert

  3. Run the following command to update the data in the partitioned table:
    UPDATE demo_user.city_part
    SET name = 'AmitUpadhyay' WHERE id = 2;
  4. Run the following queries to verify that data is updated in the partitioned and non-partitioned tables automatically:
    SELECT * FROM demo_user.city_part;
    SELECT * FROM demo_user.city_nopart;

The following screenshot illustrates that after updating the data in the partitioned table, the trigger function automatically runs and updates the data in the non-partitioned table.

Testing for Update

  5. Run the following command to delete the data from the partitioned table:
    DELETE FROM demo_user.city_part WHERE id = 2;
  6. Run the following queries to verify that data is deleted from the partitioned and non-partitioned tables automatically:
    SELECT * FROM demo_user.city_part;
    SELECT * FROM demo_user.city_nopart;

The following screenshot illustrates that after deleting the data from the partitioned table, the trigger function automatically runs and deletes the data from the non-partitioned table.

Testing for delete
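
As an optional final check beyond the original tests, you can compare the two tables with EXCEPT. If the triggers have kept them in sync, both queries return zero rows:

SELECT * FROM demo_user.city_part
EXCEPT
SELECT * FROM demo_user.city_nopart;

SELECT * FROM demo_user.city_nopart
EXCEPT
SELECT * FROM demo_user.city_part;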

After you confirm that data replication is working as expected from the partitioned table to the non-partitioned table, you can use the non-partitioned table as a source to replicate data back to the on-premises database until the migration from the on-premises database to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL is complete.

Clean up

Use a PostgreSQL client to connect to your RDS for PostgreSQL or Aurora PostgreSQL DB instance and run the following query to remove the database. Connect to a different database first (such as the default postgres database), because PostgreSQL can't drop the database you're currently connected to:

DROP DATABASE demo;

For information about using pgAdmin to remove the database, see DROP DATABASE.
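
Alternatively, if you want to keep the demo database and remove only the objects created in this post, dropping the schema with CASCADE removes the tables, trigger functions, and triggers in one step:

DROP SCHEMA demo_user CASCADE;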

Additionally, delete the AWS resources (bastion host and database instance) if you no longer need them.

Conclusion

In this post, we showed you how to automate data replication from partitioned tables to non-partitioned tables in Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL using triggers. With this solution, no additional configuration is needed to run change data capture (CDC) between partitioned and non-partitioned tables. We also discussed key PostgreSQL features, the operations associated with PostgreSQL triggers, their syntax, and example use cases.

If you have any questions, comments, or suggestions, leave them in the comments section.


About the Authors

Amit Upadhyay is a Senior Database Consultant with Amazon Web Services based in Texas. He helps customers deliver strategies, solution designs, proofs of concept, and best practices to make the most of their cloud journey. In his spare time, he enjoys seeking out new cultures, social activities, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Abhishek Pandey is a Senior Solutions Architect with Amazon Web Services based in Houston, TX. Outside of work, he loves to hang out with family and friends.

Sankaran Kannappan is a Database Consultant with Amazon Web Services, focusing on data center migration to the AWS Cloud, portfolio analysis, and modernizing commercial databases to AWS Cloud native and open source databases on AWS.