Migrate an IBM Db2 for iSeries database to Amazon Aurora PostgreSQL using Blu Age and Qlik Replicate

One of the migration patterns that you can use to modernize and migrate mainframe workloads is automated refactoring using Blue Age tooling. This pattern often requires you to unlock mainframe data from on-premises databases (like IBM Db2 for iSeries) in order to be able to move to the cloud and the technical constraints and steps they have to go through to migrate the data. With the increasing demand for mainframe modernization and specifically for the automated refactoring approach, it’s important to be able to guide our customers and partners throughout this process.

In this post, we show you the steps to migrate an IBM Db2 for iSeries database to Aurora PostgreSQL by using Qlik Replicate during a mainframe modernization project that uses Blu Age automated transformation technology.

To implement this solution, you need the following services:

Blu Age – Blu Age powers the automated refactoring pattern within AWS Mainframe Migration service. Blu Age Refactor automatically creates modern applications from legacy monolithic mainframes or mid-range source code. Applications in COBOL, generated COBOL, PL/1, NATURAL, RPG/400, and COBOL/400, and their respective underlying databases and data files Db2, Db2/400, VSAM, IMS, and IDMS are transformed into modern distributed applications relying on Angular, Java/Spring, PostgreSQL, or other databases such as Amazon Aurora, Amazon Relational Database Service (Amazon RDS) for PostgreSQLOracle Database, and IBM Db2.
Qlik – Qlik Replicate is a solution that provides replication between various endpoints such as source IBM Db2 for iSeries and target Aurora PostgreSQL. Qlik Replicate is a good fit to complete a full data migration followed by CDC, especially when you want to migrate a partial application from on premises to AWS and need to replicate changes back from AWS to on premises until the entire application moves to AWS.

Solution overview

In the following sections, we explain the two-step process for migrating an IBM Db2 for iSeries database to Aurora PostgreSQL:

Refactor the source database to the target database with Blu Age refactoring tools.
Migrate data (full and CDC) from the source database to the target database using Qlik Replicate.

In this post, we first refactor a source IBM Db2 for iSeries database to a target Aurora PostgreSQL database using Blu Age modernization tools. The refactoring engine generates a target data model, which is functionally equivalent to the original database. To achieve this, some changes are introduced by the engine to the target model:

Database schema – The target tables belong to single database and a unique database schema. In addition, database scripts produced by the engine offer the flexibility to specify the target schema name.
Physical files – Each physical file of the original database is reflected as a table in the target data model, and each field is reflected as a column in the target table.
Logical files – Each logical file of the original database is reflected as an index in the target data model.
Multi-member tables – PostgreSQL partitions are attached by default to each table. This enables us to reflect the multi-member tables, which are very common in mainframe databases.
Primary keys – Five columns are added to each table in the target model:
- Technical primary key – The new primary key for each table in the target model.
- Record insertion date – The time and date when the record was inserted.
- Record update date – The time and date when updates happened in the record.
- Source file – The source table name that the record belongs to.
- Member file – The member name that the record belongs to.

The following diagram illustrates the architecture for database modernization using Blu Age tools.

Then we show you how to migrate data one time (full) and with ongoing replication (CDC) from the source IBM Db2 for iSeries database to the target Aurora PostgreSQL database using Qlik Replicate. In this step, the migration teams look for ways to load the data as fast as possible and also ensure data quality in the target. Therefore, it’s important to take the following points into consideration during strategy definition:

Table set definition – You need to define the specific list of tables to be loaded into the target.
Amount of data – After you can confirm the whole table set, the amount of data becomes critical for the strategy. It’s relevant to mention that mainframe databases often store both historic and operational data. Therefore, it’s valuable and cost-saving to analyze which data really needs to be loaded into the main database and which data could be stored or archived in a low-cost storage tier.
Migration approach –There are two approaches to migrate data:
- Big bang – This involves migrating the whole dataset in one go. It’s very common when dealing with small databases and non-critical workloads.
- Iterative migration – This involves moving data in iterations. This path requires more effort from the business as well as from the data engineers. However, this strategy allows you to identify errors and challenges in the early stages of the migration.
Tooling – Database migration scripts are also generated as part of the modernization process and are included in the artifacts delivered. However, these scripts require the data in a specific format like binary or CSV as input of the migration. This means that the data should be exported and loaded into a staging area before the final migration. This is not always possible because of legal or cost constraints. To address this challenge, you can use other extract, transform, and load (ETL) tools like Qlik Replicate to migrate the data directly from the source to the target. Finally, as a technical constraint for using Qlik Replicate, the landing data model must be previously created by using Blu Age migration scripts.
Data preparation – Data to be migrated should be loaded into a secondary repository to be examined and audited for inconsistencies, quality issues, duplicates, and data conflicts. This is a time-consuming activity, so it’s usually performed using automatic tools. It’s recommended as a good practice to take backups before the final migration.
Testing – This activity occurs throughout the whole migration process. It’s important to test every dataset for errors and ensure data quality. After all the issues have been fixed, additional iterations of functional testing should be run to guarantee the application behaves as expected. Finally, when all the data and functional issues are closed, the final migration can take place.
Security – Security should be job zero. Therefore, it’s also recommended to include the following controls in the migration strategy:
- Create specific database replication users for both ends and grant permissions based on least-privilege access.
- Set proper firewall rules only for the specific ports and directions involved in the replication.
- Set specific security groups along with proper inbound rules for the Amazon Elastic Compute Cloud (Amazon EC2) replication server in the AWS Cloud.
- Use AWS Key Management Service (AWS KMS) keys to encrypt the data at rest in the Amazon Elastic Block Store (Amazon EBS) volumes both for EC2 and Aurora clusters.
- Use AWS Identity and Access Management (IAM) roles for Amazon EC2 to limit the actions that the EC2 replication server can perform.
- Use Qlik Replicate-specific security options like users and groups to limit the actions that different teams can perform over replication tasks.
- Finally, use private connections like AWS Direct Connect to establish communication between the on-premises database and Aurora cluster.

The following diagram illustrates the architecture for data migration using Blu Age scripts (blue) and data migration using Qlik Replicate (green).

The architecture consists of the following key components (numbered in the preceding diagram):

IBM Db2 – This is the main server where the Db2 mainframe database resides.
Qlik Replicate – Qlik Replicate is deployed on Windows or Linux EC2 instances. It helps you boost data replication, ingestion, and streaming across a wide variety of databases. In this particular scenario, we used Blu Age database scripts to create the landing data model (tables, functions, indexes, schemas, and so on) and Qlik Replicate was the key component used to move the data from Db2 to Aurora PostgreSQL (initial load) and to set CDC tasks to establish ongoing data replication from on premises to AWS and vice versa.
Blu Age data migration scripts – Blu Age database migration scripts are also generated as part of the modernization process and are included in the artifacts delivered. Finally, as a technical constraint for using replication tools, the landing data model must be previously created using Blu Age database scripts.
Database layer- Aurora PostgreSQL is a fully managed, PostgreSQL and ACID–compliant relational database engine that combines the speed, reliability, and manageability of Aurora with the simplicity and cost-effectiveness of open-source databases. Aurora PostgreSQL is a drop-in replacement for PostgreSQL and makes it simple and cost-effective to set up, operate, and scale your new and existing PostgreSQL deployments, thus freeing you to focus on your business and applications. To learn more about Aurora in general, see What is Amazon Aurora?.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An active AWS account
A bastion host to access the database: Windows or Linux, either on premises or in the AWS Cloud
An on-premises Db2 iSeries database with user permissions to read data
A target Aurora DB instance provisioned with user permissions to create and deploy database objects
Proper configuration of firewalls in place to establish a connection between the on-premises database and Amazon EC2 and the target database
Qlik Replicate set up.
Database access tools pgAdmin, AS400/Db2 client IBMiAccess_v1r1 and IBMiAccess_v1r1_WindowsAP_English
Database credentials to create source and target endpoints:

- Source Db2 – Server name, user name, password, journal name, journal library (for Qlik only)
- Target Aurora – Host, port, user name, password, database name

To run the database migration, the migration scripts must be run on a machine with Java 8. This machine must have access to the Aurora PostgreSQL database server PostgreSQL14.

Refactor the source IBM Db2 for iSeries database to the target Aurora PostgreSQL database using Blu Age

We use Blu Age database modernization and velocity utilities for code refactoring. The application delivery contains the database migration scripts and inputs in the file Database-modernization-XXXX.zip.

The Blu Age refactoring team is responsible for providing the artifacts (DDL scripts) to create a target database along with the refactoring steps. The following screenshot shows the Blu Age artifacts and scripts for database migration.

The following screenshot illustrates the folder structure of BluaAge artifacts.

The .zip file contains the following folders:

database – The migration scripts and the conversion configuration from binary EBCDIC to PostgreSQL
scripts – The configuration and implementation of the migration

Complete the following steps for database migration:

Connect to the jump server EC2 instance. For instructions, refer to Connect using EC2 Instance Connect.
Create the target Aurora PostgreSQL database using the pgAdmin tool.
Open the script file delivered by the Blu Age team: …\Database-Modernization\scripts.

Customize the database config file DB-before.prefs located under \scripts\configs\db-

config\DB-before.prefs:
databaseType=POSTGRE
adminDataBaseName=postgres
adminDataBaseUser=[Enter the user name]
adminDataBasePassword=[Enter the password]
dataBaseHost=[Enter the DB host name]
dataBasePort=[Enter the DB port no]
dataBaseName=[Enter the DB name]
dataBaseUser=[Enter the user name]
dataBasePassword=[Please enter the password]

The following screenshot illustrates the configuration files.

Run the migrate-before.cmd command.
Monitor the log file _migration-logs.
Monitor and validate the objects on target database.

Open a pgAdmin session to the target Aurora database and run the following query to check the records count:

WITH tbl AS
(SELECT table_schema,
TABLE_NAME
FROM information_schema.tables
WHERE TABLE_NAME not like 'pg_%'
AND table_schema in ('public'))

SELECT table_schema,
TABLE_NAME,
(xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;

For more details about data purity, refer to Validate database objects after migrating from IBM Db2 z/OS to RDS for PostgreSQL or Aurora PostgreSQL

To grant permissions to the application user, run the following query on the target Aurora database:

CREATE ROLE [Enter the Role Name] WITH
LOGIN
NOSUPERUSER
NOINHERIT
NOCREATEDB
NOCREATEROLE
NOREPLICATION;
GRANT [Enter the pgadmin user] TO [Enter the Role Name];
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO [Enter the Role Name];

Migrate data (full and CDC) from the source IBM Db2 for iSeries database to the target Aurora PostgreSQL database using Qlik Replicate

In this section, we walk through the detailed steps to complete the data migration and replication using Qlik Replicate.

Install and configure Qlik Replicate

Follow the Qlik Replicate setup and user guide for installation and configuration for the data replication tool on a machine.
Install Qlik Replicate using the QlikReplicate_<version-build>_X64.exe installation kit. This kit runs on Windows 64-bit (x64) environments. For a list of the Windows versions supported by Qlik Replicate, see Windows software requirements.
Keep in mind the limitations and considerations while using Qlik Replicate for data replication and migration.
Follow the instructions in the setup wizard to install Qlik Replicate.
Later, if you need to start or stop the Qlik Replicate Server service, refer to Starting and stopping the Qlik Replicate Server service on Windows.

Register Qlik Replicate licenses

If this is the first time you’re using Qlik Replicate, you’re prompted to register a Replication Management license when the console opens. You may also need to register a Replication Analytics license, depending on whether you have obtained such a license from your Qlik Sales Representative.

Apply the license on the License page, as shown in the following screenshot.

The following screenshot illustrates how to register and manage Qlik Replicate licenses.

Define and manage Qlik Replicate user permissions

You can grant Qlik Replicate user’s different permissions according to the tasks you want them to perform. Four predefined roles are available: Admin, Designer, Operator, and Viewer. Each role has its own set of permissions, which are described in the following table.

You can set user permissions using Active Directory groups or local groups. To set user permissions using Active Directory groups, you can either create Active Directory groups with the names listed in the following table, or you can create Active Directory groups with different names. Then, add users to the groups according to the role you want them to perform.

Role	Active Directory Groups
Administrator	`AttunityReplicateAdmins`
Designer	`AttunityReplicateDesigners`
Operator	`AttunityReplicateOperators`
Viewer	`AttunityReplicateViewers`

The following screenshot illustrates how to manage user permissions of Qlik Replicate .

Qlik Replicate console

You can browse to the Qlik Replicate console using a supported web browser from a computer in the same network as the computer on which the Qlik Replicate server is installed. For information on supported browsers, see Supported browsers.
You can open the console from the Start menu of the computer where you installed Qlik Replicate.
To enable and control access to Qlik Replicate, you can create user roles as described in User permissions.
To open Qlik Replicate, choose the Start menu and in the All-Programs section, point to Qlik Replicate and choose Qlik Replicate Console.

The following screenshot illustrates how to launch Qlik Replicate console.

Define a Qlik Replicate endpoint

Before you can begin to design a task, you must add endpoints for the source IBM Db2 and target Aurora PostgreSQL databases to the Qlik Replicate server. To use an endpoint, you must have access to it somewhere in your system. When you add the endpoint to the Qlik Replicate server, you must provide the connection information and proper user credentials.
For a list of supported databases, see Supported Platforms and Endpoints.
Follow the steps to add an IBM Db2 for iSeries source endpoint to Qlik Replicate.

The following screenshot illustrates how to define endpoint of Qlik Replicate task.

On the Advanced tab, select Skip Journal Validation and Use table and schema system names. Under Internal Parameters, select UseShortColObjectNames, KeepTraillingSpaces, and set additionalConnectionProperties to QRYSTGLMT=*NOMAX;QUERYTIMEOUT=0.

You need to configure a PostgreSQL-based endpoint when replicating to any of the following databases:

PostgreSQL
Aurora PostgreSQL
Amazon RDS for PostgreSQL

Follow the steps to add a PostgreSQL-Based database as a target to Qlik Replicate.

The following screenshot illustrates how to customize endpoints for Qlik Replicate task.

Create and manage a Qlik Replicate task for data migration

After you add endpoints to the Qlik Replicate server, you can use them to build a replication task. For information on how to add an endpoint to a replication task, see Adding a source and target endpoint to a task.
The first step in this process is to define the source endpoint where your data is currently stored and the target endpoints where you want to replicate the data. To do this, drag one of the endpoints you added into the task map (in Designer mode). The next step in creating a replication task is selecting tables and views for replication.
To add source and target endpoints to a task, see Adding tasks.
For additional information about Qlik Replicate data types, see Replicate data types
See the following for information on how to work with tables and views:

The following screenshot illustrates how to create and manage Qlik Replicate task.

Define Qlik Replicate global rule

Global rules are a useful way of making changes across multiple tables and columns in the same task. You can define transformation rules that modify the source data or metadata before it reaches the target, and you can define filter rules that determine which records will be replicated to the target.

The following screenshot illustrates how to create Qlik Replicate global rules.

Configure a Qlik Replicate task for data migration

After you design a task, you can run and monitor its progress with one click in Qlik Replicate.

The following options are available:

Start Processing – This is available the first time you run the task only. This will run the initial full load operation. If change processing is also enabled for the task or if it is an apply changes only task type, change processing will start as soon as any changes are made to the source endpoint (the option switches to Resume Processing on the Run menu after the task has started).
Resume Processing – This resumes the task from the point that it was stopped. You can also resume processing by choosing Run if the task has been stopped.
Reload Target – This is only available when the Full Load or Full Load and Apply Changes replication options are enabled. It starts the full load and change processing (if enabled) from the beginning. Tables that have already been processed are handled according to the relevant Target table preparation setting. To replicate tables that were added to the local file channel task after the initial full load, you need to reload both the local and remote file channel tasks.

The following screenshot illustrates a full data migration using Qlik Replicate.

The following screenshot illustrates CDC using Qlik Replicate.

Export and import Qlik Replicate tasks

You can export replication tasks to a file (*.json) and import it to another instance of the Qlik Replicate server. This lets you use a task that you created in Qlik Replicate in a different environment—for example, if you created tasks in a development environment and now want to use the task in a production environment. When you export a task to a different environment, you may need to edit the task information. For example, you may need to change the connection string for an endpoint.

For more information, refer to Exporting tasks and Editing an exported (json) file.

Schedule a Qlik Replicate task for data replication

Use the Qlik Replicate Scheduler to schedule a one-time job or a recurring job for specific Qlik Replicate task operations. A job is essentially an operation that can be scheduled to occur once, daily, weekly, or monthly.

The following operations can be scheduled:

Run or resume a task
Stop a task
Reload a task

For more details, refer to Scheduling jobs.

The following screenshot illustrates how to schedule the jobs for Qlik Replicate task.

Clean up

To clean up your resources, complete the following steps:
Connect to the Aurora PostgreSQL DB instance using pgAdmin and drop the database using the following syntax:
```
Drop database 'database_name'
```
Connect to the EC2 instance where Qlik Replicate is running and uninstall the replicate services.
Drop the bastion host if newly created for this exercise.

Conclusion

In this post, we discussed an approach and the steps to modernize and migrate an IBM Db2 for iSeries database to Aurora PostgreSQL with help of Blu Age purpose-built tools and Qlik Data Replication tools.

We depicted how a mainframe modernization project can use Qlik Replicate to unlock the data from mainframe databases. Qlik Replicate helps you boost data replication, ingestion, and streaming across a wide variety of databases. In this particular scenario, we used database modernization scripts (produced by Blu Age tooling) to create the base database model (tables, functions, indexes, schemas, and so on) and used Qlik Replicate to move the data from Db2 to Aurora PostgreSQL (initial load) and set CDC tasks to establish ongoing data replication from on premises to AWS and vice versa.

If you have any questions, comments, or suggestions, leave a comment.

About the authors

Amit Upadhyay is a Senior Database Consultant with Amazon Web Services based in Texas, USA. Amit helps customers to modernize commercial and community database workloads on AWS to achieve their business goals.

Julian Camilo Hernandez Marin is a Cloud Infrastructure Architect with Amazon Web Services based in Medellin, Colombia. Outside of work, he loves to hang out with family and friends.

Abhishek Pandey is a Senior Solutions Architect with Amazon Web Services based in Houston, TX. Abhishek is passionate about diving deep with customers to architect creative solutions that support business innovation across different industries. Outside of work, he loves to travel with family and friends.

Sankaran Kannappan is a Database Consultant with Amazon Web Services, focusing on data center migration to the AWS Cloud, portfolio analysis, and modernizing commercial databases to AWS Cloud native and open-source databases on AWS.