AWS Database Blog

Migrating tables from IBM Db2 for z/OS to Amazon RDS for Db2

Amazon Relational Database Service (Amazon RDS) for Db2 is the latest addition to the Amazon RDS family of database engines. This service offers a fully managed solution on scalable hardware, designed to deliver optimal performance within minutes. It features an optional Multi-AZ deployment, which synchronously replicates data to a standby database instance in a different Availability Zone, ensuring high availability and reliability. Amazon Web Services (AWS) takes care of provisioning, patching, backups, and monitoring of Amazon RDS for Db2 instances, significantly reducing operational overhead. This allows database administrators (DBAs) to focus on enhancing application performance instead of handling routine maintenance tasks.

AWS Mainframe Modernization is designed to modernize mainframe applications, delivering numerous benefits such as cost efficiency, scalability, flexibility, enhanced security, improved integration, support for DevOps and CI/CD, resource optimization, global reach, and continuous updates and innovation.

When it comes to selecting a relational database to migrate data to from Db2 for z/OS, RDS for Db2 provides the best compatibility among relational database management systems (RDBMSs), thereby minimizing the migration effort and timelines. Data types, SQL queries, Java or SQL stored procedures, user-defined functions, and triggers can be migrated from Db2 for z/OS with fewer changes to run on RDS for Db2.

In this post, we outline the migration methodology for converting schema and data from Db2 on z/OS to RDS for Db2.

Key RDS for Db2 features applicable to mainframe migration

For the complete list, refer to RDS for Db2 features; the following features are particularly relevant for migrating data from the mainframe to RDS for Db2.

  • Code page – EBCDIC 037 is the most commonly used code page in Db2 for z/OS. When migrating applications off the mainframe, you might want to use the ASCII single-byte character set (SBCS) code page ISO-8859-1 (Latin-1) for compatibility. RDS for Db2 supports many code pages, including ISO-8859-1.
  • Collating sequence – When using an EBCDIC code page, the default collating sequence in Db2 for z/OS is EBCDIC. When you replatform an application, it might be important to have the EBCDIC collating sequence on the target database to get consistent results, especially for SQL queries using ORDER BY. RDS for Db2 supports the EBCDIC collating sequence, which is helpful in this case.
  • CPU, memory, and IOPS – RDS for Db2 supports general purpose (m6i series), memory optimized (r6i and x2iedn series), and burstable (t3 series) instance classes. The x2iedn instance classes provide a 1:32 vCPU-to-memory ratio, with the largest instance, x2iedn.32xlarge, offering 128 vCPUs and 4 TiB of memory. You can choose the recently announced io2 Block Express EBS volumes for RDS, which provide sub-millisecond latency and can scale up to 256,000 IOPS with a maximum of 4,000 MB/s throughput, per your mainframe workload requirements. When it comes to converting mainframe MIPS (million instructions per second) to vCPUs, there is no empirical relationship between the two. As a starting point, you can treat 75–300 MIPS as equivalent to 1 vCPU depending on the type of workload; for example, a 3,000 MIPS workload maps to roughly 10–40 vCPUs. The instance size can be easily scaled up and down per your test results.
  • Connectivity to migration tools – You can use RDS for Db2 as a target for many data replication tools that support Db2 for Linux, UNIX, and Windows (LUW) as a target. We discuss some of these in the Tools to migrate data from Db2 for z/OS to RDS for Db2 section in this post.
  • Homogeneous Db2 federation – RDS for Db2 supports homogeneous federation, which enables connection to a Db2 for z/OS database from within your RDS for Db2 database. Use this facility to copy data directly from the mainframe. Refer to the Db2 Federation section in this post.

In the next sections, we discuss the process of converting the database and schema for migrating data from Db2 for z/OS to RDS for Db2.

Converting Db2 for z/OS database to RDS for Db2

A subsystem on Db2 on z/OS is an instance of the Db2 system software that manages databases and their related resources. Each subsystem is independent, and applications connect to it to perform data processing tasks like querying, updating, and managing data.

In Db2 for z/OS, a database is a logical grouping of tablespaces, tables, indexes, and other related database objects, whereas in RDS for Db2, a database is a physical entity to which applications connect to access data.

Consult the following table for database conversion:

Feature | Db2 on z/OS | RDS for Db2
Subsystem | Separate schemas for applications | Move the relevant schemas to an Amazon RDS for Db2 database
Subsystem | Separate schemas for environments (for example, development, test, and production) | Use a separate Amazon RDS for Db2 database for each environment
Subsystem | Separate subsystem for each environment (for example, development, test, and production) | Convert each environment to an equivalent RDS for Db2 database
Database | Databases are logical | Ignore the database names
Schema.table | Schema and table names are unique across a subsystem | Migrate them as-is

Schema conversion

Tools for schema migration include Db2 for z/OS tools (for example, BMC Catalog Manager, RC/Migrator, ADB2GEN), third-party tools (for example, Erwin, Toad for DB2, DBeaver), and IBM native tools (for example, IBM Data Studio, db2look). These tools help generate DDL for Db2 for z/OS tables, enabling object creation in RDS for Db2. However, many tools don't convert advanced DDLs, such as those for partitioned tables, and might not accurately reflect constraints, views, stored procedures, and triggers. Therefore, evaluate each tool carefully.

In this post, we use the ADB2GEN utility, part of the IBM Db2 Administration Tool for z/OS, to extract DDLs for tables and associated objects such as indexes, views, triggers, check constraints, referential constraints, nested or typed tables, and identity columns and sequences with their proper starting values.

We provide a Python script that takes the DDL generated by the ADB2GEN utility as input, removes syntax that isn't appropriate for RDS for Db2, and produces RDS for Db2 compatible DDLs to create tables, indexes, views, and constraints. The Python script accepts command-line parameters for the target schema, data tablespace, and index tablespace, and updates the syntax appropriately for RDS for Db2.

The Python script makes the following adjustments to the DDLs for RDS for Db2:

  1. Removes the CREATE TABLESPACE statement if it was extracted. We advise creating tablespaces beforehand with the right page size for your workload and table sizes. Refer to Common tasks for tablespaces for how to create bufferpools and tablespaces in RDS for Db2. You can specify the table and index tablespaces as inputs to the Python script.
  2. Removes the FOR SBCS DATA clause because it isn't supported for CHAR and VARCHAR data types in RDS for Db2.
  3. Removes the unsupported WITHOUT TIME ZONE clause from TIMESTAMP columns.
  4. Removes the VOLATILE and NOT VOLATILE clauses because they aren't supported in the CREATE TABLE statement. If you need to declare a table VOLATILE, you can use an ALTER TABLE statement after the table creation, as shown in the sketch after this list.
  5. Makes the required changes to all the extracted CREATE INDEX statements. The DDL extracted from Db2 for z/OS might have an explicit CREATE UNIQUE INDEX statement for a primary key. If that statement exists, it will fail with the warning SQL0605W (an index with a matching definition already exists) because RDS for Db2 automatically creates a unique index when a primary key constraint is defined in the CREATE TABLE statement. RDS for Db2 creates the primary key index with a naming convention of SQLxxxxxxx, which can lead to object validation discrepancies. To address this, you can rename the automatically generated index to match the original index name defined in Db2 for z/OS, as shown in the sketch after this list.
  6. Removes the CLUSTER and NOT CLUSTER keywords from the CREATE INDEX statements. All tables in Db2 for z/OS have one implicit or explicit clustering index. You can modify the script to keep the CLUSTER keyword, which RDS for Db2 supports; the NOT CLUSTER keyword, however, isn't supported.
  7. Adds COMPRESS YES to all tables. In Db2 for z/OS, table compression is defined at the tablespace level, whereas it is defined at the table level in Db2 LUW. We recommend adaptive compression for tables and indexes. For indexes, the script keeps the compression property as captured in the CREATE INDEX statement on Db2 for z/OS. You might want to reconsider this property for certain tables and modify the generated DDL statements manually.
  8. Preserves the extracted GRANT statements in their original form because no syntax change is needed. In RDS for Db2, you can create local users, groups, and roles or use Kerberos authentication, and ADB2GEN can extract the privileges granted on the objects.
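
The following is a minimal SQL sketch of the manual follow-ups mentioned in items 4 and 5. The schema, table, and generated index names are illustrative:

-- Item 4: declare a table VOLATILE after it has been created
ALTER TABLE SCHEMA1.EMPLOYEE VOLATILE CARDINALITY;

-- Item 5: find the automatically generated primary key index...
SELECT INDSCHEMA, INDNAME
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'SCHEMA1' AND TABNAME = 'EMPLOYEE' AND UNIQUERULE = 'P';

-- ...and rename it to match the original index name from Db2 for z/OS
RENAME INDEX SCHEMA1."SQL240101103045000" TO EMPLOYE1;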

The steps for extracting and converting DDLs are as follows:

  1. Run ADB2GEN on the mainframe to extract DDLs for the tables and their dependent objects. The utility can be run for individual tables or for an entire schema.
  2. Transfer the output from the ADB2GEN utility to an Amazon Elastic Compute Cloud (Amazon EC2) instance or a local Linux machine. Ensure that Python version 3.8 or later is installed on this machine. If you intend to execute the converted DDLs from this machine, install Db2 client version 11.5.9 or newer.
  3. Download the Python script to the Linux machine.
  4. Run the Python script.

Syntax:

python zos_to_luw_ddl_conversion.py <zos_ddl_file> <target_schema> <target_data_tablespace> <target_index_tablespace> <output_file>

Example:

python zos_to_luw_ddl_conversion.py EMPLOYEE.DDL SCHEMA1 TS_DATA_8K TS_INDEX_8K EMPLOYEE.out

  5. Execute the converted DDL directly from the Linux machine after connecting to the RDS for Db2 database, or use an administration tool of your choice.
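
For example, you can catalog the RDS for Db2 endpoint and run the converted DDL file using the Db2 command line processor. The endpoint, node, database alias, and user shown here are illustrative; adjust them for your environment:

db2 "CATALOG TCPIP NODE RDSNODE REMOTE mydb.abcdefgh12ij.us-east-1.rds.amazonaws.com SERVER 50000"
db2 "CATALOG DATABASE RDSDB AT NODE RDSNODE"
db2 "CONNECT TO RDSDB USER db2user"
db2 -tvf EMPLOYEE.out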

Let’s explore an example of a DDL conversion.

The following is a table DDL extracted from Db2 for z/OS v12 using ADB2GEN.

------ Table ------
CREATE TABLE MFZOSDB2.EMPLOYEE
(EMPNO               INTEGER NOT NULL,
NAME                VARCHAR(200) FOR SBCS DATA NOT NULL,
DEPTNO                           VARCHAR(50) FOR SBCS DATA NOT NULL,
POSITION            CHAR(10) FOR SBCS DATA NOT NULL,
DOB                              DATE NOT NULL ,
DOJ                         TIMESTAMP (6) WITHOUT TIME ZONE NOT NULL WITH DEFAULT,
JOIN_MONTH          SMALLINT NOT NULL,
SALARY               DECIMAL(10, 2) NOT NULL,
BONUS         DECIMAL(10, 2) NOT NULL,
CONSTRAINT JOIN_MONTH
CHECK (JOIN_MONTH BETWEEN 1 AND 12),
CONSTRAINT EMPNO
PRIMARY KEY (EMPNO))
IN MFZOSDB2.TBSP1
AUDIT NONE
DATA CAPTURE CHANGES
WITH RESTRICT ON DROP
CCSID      EBCDIC
APPEND NO ;
GRANT DELETE,INSERT,SELECT,UPDATE ON TABLE MFZOSDB2.EMPLOYEE TO DB2USER;
COMMIT;
------ Index ------
CREATE UNIQUE INDEX MFZOSDB2.EMPLOYE1
ON MFZOSDB2.EMPLOYEE
(EMPNO        ASC)
USING STOGROUP SMSDSNG
PRIQTY 720 SECQTY -1
ERASE  NO
FREEPAGE 30 PCTFREE 18
GBPCACHE CHANGED
CLUSTER
INCLUDE NULL KEYS
BUFFERPOOL BP1
CLOSE NO
COPY NO
DEFER NO
DEFINE YES
PIECESIZE 2 G;
COMMIT;
CREATE INDEX MFZOSDB2.EMPLOYE2
ON MFZOSDB2.EMPLOYEE
(EMPNO         ASC,
SALARY        ASC,
BONUS          ASC)
USING STOGROUP SMSDSNG
PRIQTY 720 SECQTY 720
ERASE  NO
FREEPAGE 31 PCTFREE 20
GBPCACHE CHANGED
NOT CLUSTER
INCLUDE NULL KEYS
BUFFERPOOL BP1
CLOSE NO
COPY NO
DEFER NO
DEFINE YES
PIECESIZE 2 G;
COMMIT;
------ View ------
SET CURRENT SQLID='DB0VDB20';
SET CURRENT SCHEMA='DBSYSADM';
SET CURRENT PATH = "SYSIBM","SYSFUN","SYSPROC","SYSIBMADM","CA7DB0" ;
SET CURRENT APPLICATION COMPATIBILITY = 'V12R1';
CREATE VIEW DB0VDB20.V_EMPLYE AS
SELECT EMPID, NAME, POSITION, DOJ
FROM MFZOSDB2.EMPLOYEE ;
SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M503';
SET CURRENT SQLID='DBSYSADM';
GRANT ALL ON TABLE DB0VDB20.V_EMPLYE TO DB2USER;
COMMIT;
------ Foreign Key ------
ALTER TABLE MFZOSDB2.EMPLOYEE ADD FOREIGN KEY EMP$DEP$
(DEPTNO)
REFERENCES MFZOSDB2.DEPT
(DEPTNO)
ON DELETE RESTRICT ENFORCED;
COMMIT;

The following is the DDL for RDS for Db2 that the Python script produces:

------ Table ------
CREATE TABLE SCHEMA1.EMPLOYEE
(EMPNO               INTEGER NOT NULL,
NAME                VARCHAR(200)  NOT NULL,
DEPTNO                           VARCHAR(50)  NOT NULL,
POSITION            CHAR(10)  NOT NULL,
DOB                              DATE NOT NULL ,
DOJ                         TIMESTAMP (6)  NOT NULL WITH DEFAULT,
JOIN_MONTH          SMALLINT NOT NULL,
SALARY               DECIMAL(10, 2) NOT NULL,
BONUS         DECIMAL(10, 2) NOT NULL,
CONSTRAINT JOIN_MONTH
CHECK (JOIN_MONTH BETWEEN 1 AND 12),
CONSTRAINT EMPNO
PRIMARY KEY (EMPNO))
IN TS_DATA_8K INDEX IN TS_INDEX_8K
COMPRESS YES;
GRANT DELETE,INSERT,SELECT,UPDATE ON TABLE SCHEMA1.EMPLOYEE TO DB2USER;
------ Index ------
CREATE UNIQUE INDEX SCHEMA1.EMPLOYE1
ON SCHEMA1.EMPLOYEE
(EMPNO        ASC)
INCLUDE NULL KEYS
;
CREATE INDEX SCHEMA1.EMPLOYE2
ON SCHEMA1.EMPLOYEE
(EMPNO         ASC,
SALARY        ASC,
BONUS          ASC)
INCLUDE NULL KEYS
;
------ View ------
CREATE VIEW SCHEMA1.V_EMPLYE AS
SELECT EMPID, NAME, POSITION, DOJ
FROM SCHEMA1.EMPLOYEE ;
GRANT ALL ON TABLE SCHEMA1.V_EMPLYE TO DB2USER;
------ Foreign Key ------
ALTER TABLE SCHEMA1.EMPLOYEE ADD CONSTRAINT EMP$DEP$ FOREIGN KEY
(DEPTNO)
REFERENCES SCHEMA1.DEPT
(DEPTNO)
ON DELETE RESTRICT ENFORCED;

After preparing DDLs for RDS for Db2, we now discuss tools to migrate data from Db2 on z/OS to RDS for Db2.

Tools to migrate data from Db2 for z/OS to RDS for Db2

There are many tools available for data migration from Db2 for z/OS tables to RDS for Db2. The choice of a tool depends on whether you need a one-time migration or change data capture (CDC) from Db2 for z/OS to RDS for Db2.

Use a one-time migration when you can take downtime on the Db2 for z/OS tables and migrate data through a full load. When it isn't possible to take downtime, use an initial load followed by CDC.

For large tables in a one-time migration strategy, we suggest migrating data in parts by using partition keys or date, timestamp, or ID columns. Take cold and hot data into consideration: cold data can be migrated earlier, while hot data can be migrated closer to the cutover date.
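
For example, the following predicate extracts a single slice of a large table; the table, column, and date range are illustrative, and each slice can then be unloaded with any of the tools described in the next section:

SELECT * FROM MFZOSDB2.ORDERS
WHERE ORDER_DATE >= '2023-01-01' AND ORDER_DATE < '2024-01-01';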

The following table describes which tools are suited to each migration strategy. Note that many tools can be used for both strategies.

Tool | Full or initial load | Change data capture (CDC)
AWS DMS | Yes | No
Precisely Mainframe Replication | Yes | Yes
Mainframe tools | Yes | No
Db2 Export | Yes | No
Db2 Federation | Yes | No
Qlik Replicate | Yes | Yes
IBM Data Replication – Q Replication | Yes | Yes
IBM Data Replication – SQL Replication | Yes | Yes

Let’s discuss a few of these tools along with their benefits and limitations.

AWS DMS

AWS Database Migration Service (AWS DMS) supports data migration for IBM Db2 for z/OS database version 12 as a source and RDS for Db2 as a target.

AWS DMS can be used only for a full or initial data load because it doesn’t support CDC from Db2 for z/OS to RDS for Db2.

AWS Mainframe Modernization Data Replication with Precisely

AWS Mainframe Modernization Data Replication with Precisely enables the data replication capability of the AWS Mainframe Modernization service. It allows near real-time replication of data from mainframe (IBM z/OS) and midrange (IBM i) data sources, such as Db2, IMS, and VSAM, to a wide range of AWS Cloud databases.

AWS Mainframe Modernization Data Replication supports RDS for Db2 as a target when you use Db2 for z/OS or Db2 on IBM i (AS/400) as a source. See Unlock Mainframe Data with Precisely Connect and Amazon Aurora for information on how to set up Precisely. Although that post uses Aurora as the target, it gives good insight into the process of setting up this offering.

AWS Mainframe Modernization Data Replication supports both initial load and CDC; however, the initial load uses inserts, not bulk load. You can use the methods described earlier to load tables initially and then start CDC based on a timestamp.

AWS Mainframe Modernization Data Replication is available through AWS Marketplace AMIs.

These AMIs facilitate rapid provisioning of Amazon EC2 instances, creating a tailored environment for data replication from mainframe and midrange systems to AWS using Precisely.

To get started, refer to AWS Mainframe Modernization Data Replication with Precisely documentation.

Mainframe tools

You can extract data from tables in Db2 for z/OS in a delimited format using various tools available for the mainframe. Some popular tools include Db2 High Performance Unload (HPU), File-AID, the Db2 UNLOAD utility, and Fast Unload; HPU is generally considered the fastest option. The extracted data files might need to be converted from EBCDIC to ASCII format. This conversion can be performed using various tools and utilities available on the mainframe.

After conversion, copy the data files to a Linux machine with a Db2 client installed or to an Amazon Simple Storage Service (Amazon S3) bucket.

You can install the AWS Command Line Interface (AWS CLI) on the mainframe and use it to copy files to Amazon S3.

These files can be loaded using the Db2 LOAD CLIENT command from the Linux machine, or using the ADMIN_CMD stored procedure for files in the Amazon S3 bucket. The LOAD CLIENT method is useful for migrating small tables; for large tables, Amazon S3 is the preferred approach.
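
The following is a minimal sketch of both load paths. The file, table, and storage access alias names are illustrative; create the alias for your S3 bucket as described in the RDS for Db2 documentation before running the ADMIN_CMD version:

-- Small tables: load a delimited file from the Linux machine through the Db2 client
LOAD CLIENT FROM /tmp/employee.del OF DEL INSERT INTO SCHEMA1.EMPLOYEE;

-- Large tables: run LOAD on the server side through ADMIN_CMD, reading the file
-- staged in Amazon S3 via a storage access alias (S3ALIAS here)
CALL SYSPROC.ADMIN_CMD('LOAD FROM "DB2REMOTE://S3ALIAS//employee.del" OF DEL INSERT INTO SCHEMA1.EMPLOYEE');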

You can use these tools in two scenarios:

  • Full data migration when log replication isn’t needed.
  • Initial bulk load of large tables, with subsequent data synchronization using a CDC tool.

Db2 Export

The Db2 EXPORT command can be run from an on-premises Linux machine with a Db2 client installed and connectivity to the Db2 for z/OS tables.

To avoid potential issues caused by data containing the same character as the column delimiter in the delimited (DEL) format, use the Integration Exchange Format (IXF) as the output file format for your data. That said, IXF files are comparatively larger than DEL files, so in cases where you don’t have delimiter conflicts, use DEL files.
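
For example, while connected to the Db2 for z/OS database, you can export a table in either format; the file and table names are illustrative:

-- IXF avoids delimiter conflicts but produces larger files
EXPORT TO employee.ixf OF IXF MESSAGES employee.msg SELECT * FROM MFZOSDB2.EMPLOYEE;

-- DEL is more compact when the data cannot contain the column delimiter
EXPORT TO employee.del OF DEL MESSAGES employee.msg SELECT * FROM MFZOSDB2.EMPLOYEE;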

Smaller data files can be loaded directly from the on-premises machine into the RDS for Db2 database. However, to mitigate network latency delays, large data files should first be transferred to Amazon S3 before loading them into the database.

This method is best suited for a full or initial load of small to medium-sized tables.

Db2 Federation

As mentioned previously, RDS for Db2 supports homogeneous federation, which enables you to access data from Db2 for z/OS. After establishing the federation, you can use the LOAD command with the CURSOR file type to select data from a table in Db2 for z/OS and load it directly into a table in RDS for Db2. This mechanism facilitates data loading without the additional steps of exporting data into files and transferring them to the target location. This approach can be used for a full data migration or the initial load of large tables.
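
The following is a minimal sketch, assuming the federated server definition (ZOSDB here) for Db2 for z/OS is already in place; the server, nickname, and table names are illustrative:

-- Create a nickname over the remote Db2 for z/OS table
CREATE NICKNAME SCHEMA1.ZOS_EMPLOYEE FOR ZOSDB."MFZOSDB2"."EMPLOYEE";

-- Select from the nickname through a cursor and load directly into the local table
DECLARE SRC CURSOR FOR SELECT * FROM SCHEMA1.ZOS_EMPLOYEE;
LOAD FROM SRC OF CURSOR INSERT INTO SCHEMA1.EMPLOYEE NONRECOVERABLE;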

Qlik Replicate

Qlik Replicate is a data integration and replication tool that facilitates data ingestion and real-time CDC across a variety of databases, including Db2 for z/OS as a source and Db2 LUW as a target.

While Db2 for z/OS is supported natively by Qlik, for Db2 LUW you must use an ODBC endpoint.

Qlik’s ODBC driver endpoint for Db2 LUW doesn’t support bulk load, which means the data ingestion operation uses inserts, which are slow and generate transaction logs. We advise loading the initial data for large tables using Db2 federation or the mainframe tools described previously. After the initial data load, CDC can be triggered using a timestamp.

IBM Data Replication (IIDR)

IBM Data Replication (IIDR) is a data synchronization tool that keeps multiple data stores in sync in near real time. It provides two different solutions: SQL Replication and Q Replication. In SQL Replication, committed source changes are staged in relational tables before being replicated to the target database, while in Q Replication, committed source changes are written to messages that are transported through message queues (IBM MQ) to the target database.

SQL Replication is easier to implement than Q Replication, which requires the setup and management of an IBM MQ infrastructure. However, the overhead on the source database is higher with SQL Replication compared to Q Replication.

If your organization already has IBM MQ infrastructure in place, Q Replication might be more natural to integrate into your existing systems. Refer to Near zero-downtime migrations from self-managed Db2 on AIX or Windows to RDS for Db2 using IBM Q Replication for more information.

If your data migration demands high performance, low latency, and the ability to handle large volumes of data reliably, Q Replication might be the better choice. However, if simplicity, lower cost, and moderate performance are key considerations, SQL Replication could be more suitable.

Conclusion

In this post, we provided a comprehensive guide on how to migrate both the schema and data from Db2 for z/OS to RDS for Db2. The migration process breaks down into two main parts: schema conversion and data migration.

First, we introduced a Python script designed to facilitate the schema conversion. This script automates the transformation of Db2 for z/OS schema definitions into a format compatible with RDS for Db2, simplifying what would otherwise be a complex and error-prone task.

Following the schema conversion, we discussed several tools available for data migration. Each tool’s benefits and limitations were highlighted to help you choose the most suitable one for your specific migration scenario. Some tools offer high performance and automation, while others provide greater control and customization options.

By using the provided Python script and selecting the appropriate data migration tool, you can streamline the migration process, minimize downtime, and ensure data integrity.

We invite you to comment on this post and leave suggestions for future posts about Amazon RDS for Db2. Review the Amazon RDS for Db2 User Guide to learn more about the service.


About the authors

Divaker Goel is a Senior Database Consultant at Amazon Web Services’ Professional Services team with over 20 years’ experience in databases. He specializes in guiding customers through their migration journey to AWS cloud, helping them optimize and modernize their applications and databases.

Vikram S Khatri is a Sr. Product Manager for Amazon RDS for Db2. Vikram has over 20 years of experience in Db2. He enjoys developing new products from the ground up. In his spare time, he practices meditation and enjoys listening to podcasts.