AWS Database Blog

Migrate ROW CHANGE TIMESTAMP from IBM Db2 for z/OS to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition

Customers migrate their mission-critical legacy on-premises databases based on IBM Db2 for z/OS to Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition for its scalability, performance, agility and availability.

In heterogeneous database migrations, not all features in the source database are supported or available in the target database. One such feature that is available in Db2 for z/OS is ROW CHANGE TIMESTAMP, which returns the last time when a row was inserted or modified.

In this post, we show you how to migrate ROW CHANGE TIMESTAMP from Db2 for z/OS to Amazon RDS for PostgreSQL or Aurora PostgreSQL-Compatible, which helps you record the last time when a row was inserted or modified in a table and retain similar source functionality on the target database. Additionally, we show you how to preserve the ROW CHANGE TIMESTAMP column value while copying the data to a similar structure table within the target database.

Understanding Db2 ROW CHANGE TIMESTAMP

When we define ROW CHANGE TIMESTAMP on a column, Db2 generates a timestamp value for each row as the row is inserted, and for any row in which any column is updated. If multiple records are inserted or updated with a single batch statement, the value for the ROW CHANGE TIMESTAMP column might be different for each row depending on when the record was processed.

As an example use case, we create the following sample employee tables in Db2.

CREATE TABLE EMP_INFO
(EMPNO CHAR(6) NOT NULL,
EMP_ADDRESS VARCHAR(300),
EMP_PHONENO CHAR(20),
EMP_INFOCHANGE NOT NULL
GENERATED ALWAYS FOR EACH ROW ON UPDATE
AS ROW CHANGE TIMESTAMP,
PRIMARY KEY (EMPNO)
);

ROW CHANGE TIMESTAMP is defined on the column EMP_INFOCHANGE, as shown in the following SQL statement, which automatically stores the timestamp value of when a particular record was created or modified.

The following query returns all the records that have changed in the last 30 days:

SELECT * FROM EMP_INFO
WHERE EMP_INFOCHANGE <= CURRENT TIMESTAMP
AND EMP_INFOCHANGE >= CURRENT TIMESTAMP - 30 DAYS;

ROW CHANGE TIMESTAMP in PostgreSQL

PostgreSQL doesn’t provide the ROW CHANGE TIMESTAMP feature to determine the timestamp when the row was last updated or inserted. However, you can implement the following solution.

  1. Connect to your PostgreSQL database.
  2. In the following DDL, EMP_INFOCHANGE represents the ROW CHANGE TIMESTAMP column in the target table, which is equivalent to the respective Db2 column in the source:
CREATE TABLE source_1.EMP_INFO
(EMPNO CHAR(6) NOT NULL,
EMP_ADDRESS VARCHAR(300),
EMP_PHONENO CHAR(20),
EMP_INFOCHANGE timestamp(6) without time zone DEFAULT clock_timestamp(),
CONSTRAINT pk_src_1 PRIMARY KEY (EMPNO)
);

The default value for the EMP_INFOCHANGE column must be clock_timestamp(), which gives the current date and time and changes for every DML statement run on the table.

  1. Define a trigger function as shown in the following code so that the EMP_INFOCHANGE column is updated with the latest timestamp whenever the trigger operation is an update or insert:
CREATE OR REPLACE FUNCTION source_1.tgf_emp_info_rct()
RETURNS trigger
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
BEGIN
-- Identify row update & insert and alter RCT (Row Change Timestamp)
-- Update timestamp to identify when was the row updated
NEW.EMP_INFOCHANGE := clock_timestamp();
RETURN NEW;
END;
$BODY$;
  1. Define a row level before the trigger at the table level to run during an insert or update operation:
-- Trigger: tgr_emp_info_rct

CREATE TRIGGER tgr_emp_info_rct
BEFORE INSERT OR UPDATE
ON source_1.emp_info
FOR EACH ROW
EXECUTE FUNCTION source_1.tgf_emp_info_rct();

This trigger is invoked one time for every row that is inserted (or updated).

Now, when we insert or update any record in the table source_1.emp_info, the EMP_INFOCHANGE column is generated or modified automatically, as shown in the following examples.

The following is code for an example insert:

INSERT INTO source_1.emp_info(
EMPNO,
EMP_ADDRESS,
EMP_PHONENO
)
VALUES (1, 'RCT-Test-1', '1234-1234'),
(2, 'RCT-Test-2', '4567-4567');

We get the following output:

# select * from source_1.emp_info;
-------+--------------+-----------------+----------------------------
 empno | emp_address  | emp_phoneno     | emp_infochange             
-------+--------------+-----------------+----------------------------
    1  | RCT-Test-1   | 1234-1234       | 2022-02-19 00:41:56.558682
    2  | RCT-Test-2   | 4567-4567       | 2022-02-19 00:41:56.558719
(2 rows)

The following code shows an example update:

UPDATE source_1.emp_info SET EMP_ADDRESS ='RCT-Val-11', EMP_PHONENO ='0000-1111' WHERE empno=1;

We get the following output:

# select * from source_1.emp_info;
-------+--------------+-----------------+----------------------------
 empno | emp_address  | emp_phoneno     | emp_infochange             
-------+--------------+-----------------+----------------------------
  1    | RCT-Val-11   | 0000-1111       | 2022-02-19 00:49:25.351306
  2    | RCT-Test-2   | 4567-4567       | 2022-02-19 00:41:56.558719
(2 rows)

If a user tries to override the timestamp value for the EMP_INFOCHANGE column, the operation is ignored because the trigger overrides the value with clock_timestamp(). We show you how to handle this case later in this post.

Preserving ROW CHANGE TIMESTAMP values in the target table while copying data from the source

There are two ways to transfer or migrate data from a source database table to a new target database table with ROW CHANGE TIMESTAMP column defined while maintaining the original timestamp value in the target table.

Let’s take our case, where we are migrating data from source Db2 for z/OS table to the target table (source_1.emp_info) on PostgreSQL database. Since we have used TRIGGER to implement ROW CHANGE TIMESTAMP functionality in EMP_INFOCHANGE column, our intent is that these triggers do not get executed until we copy data fully from Db2 to PostgreSQL.

If these triggers are not disabled then it will result in new values in EMP_INFOCHANGE column causing reconciliation breaks between source and target.

To solve this issue, you can implement one of the following 2 solutions on the target server.

Solution 1: Use the ALTER command

In this solution we explicitly disable the triggers on the table to avoid any change in ROW CHANGE TIMESTAMP column and enable it back once data copy activity is complete.

You can enable or disable triggers using the following ALTER commands before copying data:

  1. Run the following command:
ALTER TABLE source_1.emp_info DISABLE TRIGGER tgr_emp_info_rct ;
  1. Enter the data copy commands from source to target.
  2. Run the following command:
ALTER TABLE source_1.emp_info ENABLE TRIGGER tgr_emp_info_rct ;

You can use this same solution if you’re copying data from PostgreSQL as source.

Solution 2: Use session_replication_role

This setting controls firing of replication-related triggers and rules for the current session. The effect of this will be that triggers, foreign key checks, and rules will not fire if it is set to replica .

So, you can set session_replication_role to replica for the source_1.tgf_emp_info_rct trigger function while data copy is going on, as shown in the following code.

This setting can also be changed at server level by altering value in parameter-group.

Note: Reset the value of session_replication_role to ‘origin’ (default) once data copy is complete or you can remove this setting from function definition altogether and re-create it.

Command :

SET session_replication_role= ‘replica’;

See the following example:

CREATE OR REPLACE FUNCTION source_1.tgf_emp_info_rct()
RETURNS trigger
LANGUAGE 'plpgsql'
SET session_replication_role = ‘replica’
AS $BODY$
DECLARE
BEGIN
-- Identify row update & insert and alter RCT (Row Change Timestamp)
-- Update timestamp to identify when was the row updated
NEW.EMP_INFOCHANGE := clock_timestamp();
RETURN NEW;
END;
$BODY$;

Solution 2 is better suited during initial data-migration activity where you can disable all triggers and foreign-keys at server level. However, in Solution 1 you can disable trigger for specific tables without affecting other functionality at server level.

Limitations for this solution

The solution provided in this post doesn’t cover the ROW CHANGE TIMESTAMP expression feature in Db2 for z/OS. This allows you to use the expression in a SQL query to return the last time a row was changed or as a filter predicate to restrict rows without the need for defining the ROW CHANGE TIMESTAMP column on the underlying table.

The following table summarizes the feature comparison between Db2 and the solution provided in this post while migrating to PostgreSQL.

Feature Db2 PostgreSQL
If multiple rows are inserted or updated with a single statement, the value for the ROW CHANGE TIMESTAMP column might be different for each row Yes Yes
If the data type is specified, it must be TIMESTAMP WITHOUT TIME ZONE with a precision of 6 Yes Yes
Use the ROW CHANGE TIMESTAMP expression to read data directly in the select command Yes No

Conclusion

In this post, we discussed the solution to migrate ROW CHANGE TIMESTAMP from Db2 for z/OS to RDS for PostgreSQL or Aurora PostgreSQL target databases. We also discussed the use case in replicating data across a similar table structure in the target database. You can apply this solution while migrating from Db2 LUW as well.

If you have questions or feedback, leave a comment.


About the Authors

Sai Parthasaradhi is a Database Migration Specialist with AWS Professional Services. He works closely with customers to help them migrate and modernize their databases on AWS.

Vikas Gupta is a Lead Database Migration Consultant with AWS Professional Services. He loves to automate manual processes and enhance the user experience. He helps customers migrate and modernize workloads in the AWS Cloud, with a special focus on modern application architectures and development best practices.