AWS Big Data Blog

Stream mainframe data to AWS in near real time with Precisely and Amazon MSK

This is a guest post by Supreet Padhi, Technology Architect, and Manasa Ramesh, Technology Architect at Precisely in partnership with AWS.

Enterprises rely on mainframes to run mission-critical applications and store essential data, enabling real-time operations that help achieve business objectives. These organizations face a common challenge: how to unlock the value of their mainframe data in today’s cloud-first world while maintaining system stability and data quality. Modernizing these systems is critical for competitiveness and innovation.

The digital transformation imperative has made mainframe data integration with cloud services a strategic priority for enterprises worldwide. Organizations that can seamlessly bridge their mainframe environments with modern cloud platforms gain significant competitive advantages through improved agility, reduced operational costs, and enhanced analytics capabilities. However, implementing such integrations presents unique technical challenges that require specialized solutions. These challenges include converting EBCDIC data to ASCII and handling data types unique to the mainframe, such as binary and COMP fields. Data stored in Virtual Storage Access Method (VSAM) files can also be complex because of the common practice of storing multiple record types in a single file. To address these challenges, Precisely, a global leader in data integrity serving over 12,000 customers, has partnered with Amazon Web Services (AWS) to enable real-time synchronization between mainframe systems and Amazon Relational Database Service (Amazon RDS). For more on this collaboration, check out our previous blog post: Unlock Mainframe Data with Precisely Connect and Amazon Aurora.

In this post, we introduce an alternative architecture to synchronize mainframe data to the cloud using Amazon Managed Streaming for Apache Kafka (Amazon MSK) for greater flexibility and scalability. This event-driven approach provides additional possibilities for mainframe data integration and modernization strategies.

A key enhancement in this solution is the use of the AWS Mainframe Modernization – Data Replication for IBM z/OS Amazon Machine Image (AMI) available in AWS Marketplace, which simplifies deployment and reduces implementation time.

Real-time processing and event-driven architecture benefits

Real-time processing makes data actionable within seconds rather than waiting for batch processing cycles. For example, financial institutions such as Global Payments have used this solution to modernize mission-critical banking operations, including payments processing. By migrating these operations to the AWS Cloud, they enhanced the user experience and improved scalability and maintainability while enabling advanced fraud detection, all without impacting the performance of existing mainframe systems. Change data capture (CDC) enables this by identifying database changes and delivering them in real time to cloud environments.

CDC offers two key advantages for mainframe modernization:

  • Incremental data movement – Eliminates disruptive bulk extracts by streaming only changed data to cloud targets, minimizing system impact and ensuring data currency
  • Real-time synchronization – Keeps cloud applications in sync with mainframe systems, enabling immediate insights and responsive operations

Solution overview

In this post, we provide a detailed implementation guide for streaming mainframe data changes from DB2z through the AWS Mainframe Modernization – Data Replication for IBM z/OS AMI to Amazon MSK, and then applying those changes to Amazon RDS for PostgreSQL using MSK Connect with the Confluent JDBC Sink Connector.

By introducing Amazon MSK into the architecture and streamlining deployment through the AWS Marketplace AMI, we create new possibilities for data distribution, transformation, and consumption that expand upon our previously demonstrated direct replication approach. This streaming-based architecture offers several additional benefits:

  • Simplified deployment – Accelerate implementation using the preconfigured AWS Marketplace AMI
  • Decoupled systems – Separate the concern of data extraction from data consumption, allowing both sides to scale independently
  • Multi-consumer support – Enable multiple downstream applications and services to consume the same data stream according to their own requirements
  • Extensibility – Create a foundation that can be extended to support additional mainframe data sources such as IMS and VSAM, as well as additional AWS targets using MSK Connect sink connectors

The following diagram illustrates the solution architecture.

Precisely MSK architecture diagram

  1. Capture/Publisher – Connect CDC Capture/Publisher captures Db2 changes from Db2 logs using IFI 306 Read and communicates captured data changes to a target engine through TCP/IP.
  2. Controller Daemon – The Controller Daemon authenticates all connection requests, managing secure communication between the source and target environments.
  3. Apply Engine – The Apply Engine is a multifaceted and multifunctional component in the target environment. It receives the changes from the Publisher agent and publishes the changed data to the target Amazon MSK topic.
  4. Connect CDC Single Message Transform (SMT) – Performs all necessary data filtering, transformation, and augmentation required by the sink connector.
  5. JDBC Sink Connector – As data arrives in the MSK topic, an instance of the JDBC Sink Connector running on MSK Connect writes the data to target tables in Amazon RDS.

This architecture provides a clean separation between the data capture process and the data consumption process, allowing each to scale independently. The use of MSK as an intermediary enables multiple systems to consume the same data stream, opening possibilities for complex event processing, real-time analytics, and integration with other AWS services.
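For example, because the changes land on a Kafka topic, multiple independent applications can read the same change stream by subscribing with different consumer group IDs. The following is a minimal sketch using the Kafka console consumer and the client configuration file created later in this walkthrough; the group names are illustrative:

# Hypothetical analytics consumer; reads the full change stream independently
kafka/bin/kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVERS --consumer.config kafka/config/client-config.properties --topic pgsql-sink-topic --group analytics-app --from-beginning

# Hypothetical audit consumer; maintains its own offsets in a separate consumer group
kafka/bin/kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVERS --consumer.config kafka/config/client-config.properties --topic pgsql-sink-topic --group audit-app --from-beginning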

Prerequisites

To complete the solution, you need the following prerequisites:

  1. Install AWS Mainframe Modernization – Data Replication for IBM z/OS
  2. Have access to Db2z on the mainframe from AWS using your approved connectivity between AWS and your mainframe

Solution walkthrough

The following code content shouldn’t be deployed to production environments without additional security testing.

Configure the AWS Mainframe Modernization Data Replication with Precisely AMI on Amazon EC2

Follow the steps defined at Precisely AWS Mainframe Modernization Data Replication. Upon the initial launch of the AMI, use the following command to connect to the Amazon Elastic Compute Cloud (Amazon EC2) instance:

ssh -i ami-ec2-user.pem ec2-user@$AWS_AMI_HOST

Configure the Aurora PostgreSQL serverless cluster

To create an Amazon Aurora PostgreSQL-Compatible Edition Serverless v2 cluster, complete the following steps:

  1. Create a DB cluster by using the following AWS Command Line Interface (AWS CLI) command. Replace the placeholder strings with your DB subnet group name and VPC security group ID.
    aws rds create-db-cluster \
       --db-cluster-identifier cdc-serverless-pg-cluster \
       --engine aurora-postgresql \
       --serverless-v2-scaling-configuration MinCapacity=1,MaxCapacity=2 \
       --master-username connectcdcuser \
       --manage-master-user-password \
       --db-subnet-group-name "<db-subnet-group-name>" \
       --vpc-security-group-ids "<cluster-security-group-id>"
  2. Verify the status of the cluster by using the following command:
    aws rds describe-db-clusters --db-cluster-identifier cdc-serverless-pg-cluster
  3. Add a writer DB instance to the Aurora cluster:
    aws rds create-db-instance \
       --db-cluster-identifier cdc-serverless-pg-cluster \
       --db-instance-identifier cdc-serverless-pg-instance \
       --db-instance-class db.serverless \
       --engine aurora-postgresql
  4. Verify the status of the writer instance (or wait for it to become available, as shown after these steps):
    aws rds describe-db-instances --db-instance-identifier cdc-serverless-pg-instance
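Optionally, instead of polling describe-db-instances, you can block until the writer instance is ready by using the AWS CLI waiter. This is a minimal sketch; the command returns when the instance reaches the available state:

# Wait until the Aurora writer instance reports a status of available
aws rds wait db-instance-available --db-instance-identifier cdc-serverless-pg-instance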

Create a database in the PostgreSQL cluster

After your Aurora Serverless v2 cluster is running, you need to create a database for your replicated mainframe data. Follow these steps:

  1. Install the psql client:
    sudo yum install postgresql16
  2. Retrieve the password from Secrets Manager:
    aws secretsmanager get-secret-value --secret-id '<cdc-serverless-pg-cluster-secret ARN>' --query 'SecretString' --output text
  3. Create a new database in PostgreSQL:
    PGPASSWORD="password" psql --host=<DATABASE-HOST> --username=connectcdcuser --dbname=postgres -c "CREATE DATABASE dbcdc"

Configure the serverless MSK cluster

To create a serverless MSK cluster, complete the following steps:

  1. Copy the following JSON and paste it into a new file create-msk-serverless-cluster.json. Replace the placeholder strings with values that correspond to your cluster’s subnet and security group IDs.
       {
         "VpcConfigs": [
           {
             "subnets": [
               "<cluster-subnet-1>",
               "<cluster-subnet-2>",
               "<cluster-subnet-3>"
             ],
             "securityGroups": ["<cluster-security-group-id>"]
           }
         ],
         "ClientAuthentication": {
           "Sasl": {
             "Iam": {
               "Enabled": true
             }
           }
         }
       }
  2. Invoke the following AWS CLI command in the folder where you saved the JSON file in the previous step:
    aws kafka create-cluster-v2 --cluster-name pgsqlmsk --serverless file://create-msk-serverless-cluster.json
  3. Verify cluster status by invoking the following AWS CLI command:
    aws kafka list-clusters-v2 --cluster-type-filter SERVERLESS
  4. Get the bootstrap broker address by invoking the following AWS CLI command:
    aws kafka get-bootstrap-brokers --cluster-arn "<msk-serverless-cluster-arn>"
  5. Define an environment variable to store the bootstrap servers of the MSK cluster; the Kafka CLI that you install locally in the next section uses this variable:
    export BOOTSTRAP_SERVERS=<kafka_bootstrap_servers_with_ports>

Create a topic on the MSK cluster

To create a Kafka topic, you need to install the Kafka CLI first. Follow these steps:

  1. Download the binary distribution of Apache Kafka, extract the archive, and create a symbolic link named kafka to the extracted folder:
    wget https://dlcdn.apache.org/kafka/3.9.0/kafka_2.13-3.9.0.tgz
       tar -xzf kafka_2.13-3.9.0.tgz
       ln -sfn kafka_2.13-3.9.0 kafka
  2. To use IAM to authenticate with the MSK cluster, download the Amazon MSK Library for IAM and copy to the local Kafka library directory as shown in the following code. For complete instructions, refer to Configure clients for IAM access control.
    wget https://github.com/aws/aws-msk-iam-auth/releases/download/v2.3.1/aws-msk-iam-auth-2.3.1-all.jar
    cp aws-msk-iam-auth-2.3.1-all.jar kafka/libs
  3. In the kafka/config directory, create a file named client-config.properties to configure the Kafka client to use IAM authentication for the Kafka console producer and consumers:
    security.protocol=SASL_SSL
       sasl.mechanism=AWS_MSK_IAM
        sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
        sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
  4. Create the Kafka topic that is referenced later in the connector configuration:
    kafka/bin/kafka-topics.sh --create --bootstrap-server $BOOTSTRAP_SERVERS --command-config kafka/config/client-config.properties --partitions 1 --topic pgsql-sink-topic
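To confirm that the topic was created with the expected settings, you can describe it using the same client configuration:

kafka/bin/kafka-topics.sh --describe --bootstrap-server $BOOTSTRAP_SERVERS --command-config kafka/config/client-config.properties --topic pgsql-sink-topic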

Configure the MSK Connect plugin

Next, create a custom plugin from the archive available in the AMI at /opt/precisely/di/packages/sqdata-msk_connect_1.0.1.zip, which contains the following:

  • JDBC Sink Connector from Confluent
  • MSK Config provider
  • AWS Mainframe Modernization – Data Replication for IBM z/OS Custom SMT
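Optionally, you can inspect the archive before uploading it to confirm these contents; the exact file names inside the package can vary by release:

# List the jars bundled in the custom plugin archive
unzip -l /opt/precisely/di/packages/sqdata-msk_connect_1.0.1.zip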

Follow these steps:

  1. Invoke the following to upload the .zip file to an S3 bucket to which you have access:
    aws s3 cp /opt/precisely/di/packages/sqdata-msk_connect_1.0.1.zip s3://<bucket>/
  2. Copy the following JSON and paste it into a new file create-custom-plugin.json. Replace the placeholder strings with values that correspond to your bucket.
    {
         "contentType": "ZIP",
         "description": "jdbc sink connector",
         "location": {
           "s3Location": {
             "bucketArn": "arn:aws:s3:::<bucket>",
             "fileKey": "sqdata-msk_connect_1.0.1.zip"
           }
         },
         "name": "jdbc-sink-connector"
       }
  3. Invoke the following AWS CLI command in the folder where you saved the JSON file in the previous step:
    aws kafkaconnect create-custom-plugin --cli-input-json file://create-custom-plugin.json
  4. Verify plugin status by invoking the following AWS CLI command:
    aws kafkaconnect list-custom-plugins

Configure the JDBC Sink Connector

To configure the JDBC Sink Connector, follow these steps:

  1. Copy the following JSON and paste it into a new file create-connector.json. Replace the placeholder strings with appropriate values. The connection.user and connection.password entries are resolved at runtime from AWS Secrets Manager through the config provider included in the custom plugin (see the note after these steps):
    {
         "connectorConfiguration": {
           "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
           "connection.url": "jdbc:postgresql://<postgresql-endpoint>
    /dbcdc?currentSchema=public",
           "config.providers": "secretsmanager",
           "config.providers.secretsmanager.class": "com.amazonaws.kafka.config.providers.SecretsManagerConfigProvider",
           "connection.user": "${secretsmanager:MySecret-1234:username}",
           "connection.password": "${secretsmanager:MySecret-1234:password}",
           "config.providers.secretsmanager.param.region": "<region>",
           "tasks.max": "1",
           "topics": "pgsql-sink-topic",
           "insert.mode": "upsert",
           "delete.enabled": "true",
           "pk.mode": "record_key",
           "auto.evolve": "true",
           "auto.create": "true",
           "value.converter": "org.apache.kafka.connect.storage.StringConverter",
           "key.converter": "org.apache.kafka.connect.storage.StringConverter",
           "transforms": "ConnectCDCConverter",
           "transforms.ConnectCDCConverter.type": "com.precisely.kafkaconnect.ConnectCDCConverter",
           "transforms.ConnectCDCConverter.cdc.multiple.tables.enabled": "true",
           "transforms.ConnectCDCConverter.cdc.source.table.name.ignore.schema": "true"
         },
         "connectorName": "pssql-sink-connector",
         "kafkaCluster": {
           "apacheKafkaCluster": {
             "bootstrapServers": "<msk-bootstrap-servers-string>",
             "vpc": {
               "subnets": [
                 "<cluster-subnet-1>",
                 "<cluster-subnet-2>",
                 "<cluster-subnet-3>"
               ],
               "securityGroups": ["<cluster-security-group-id>"]
             }
           }
         },
         "capacity": {
           "provisionedCapacity": {
             "mcuCount": 1,
             "workerCount": 1
           }
         },
         "kafkaConnectVersion": "3.7.x",
         "serviceExecutionRoleArn": "<arn-of-a-role-that-msk-connect-can-assume>",
         "plugins": [
           {
             "customPlugin": {
               "customPluginArn": "<arn-of-custom-plugin-that-contains-connector-code>",
               "revision": 1
             }
           }
         ],
         "kafkaClusterEncryptionInTransit": {"encryptionType": "TLS"},
         "kafkaClusterClientAuthentication": {"authenticationType": "IAM"},
         "logDelivery": {
           "workerLogDelivery": {
             "cloudWatchLogs": {
               "enabled": true,
               "logGroup": "<loggroup>"
             }
           }
         }
       }
  2. Invoke the following AWS CLI command in the folder where you saved the JSON file in the previous step:
    aws kafkaconnect create-connector --cli-input-json file://create-connector.json
  3. Verify connector status by invoking the following AWS CLI command:
    aws kafkaconnect list-connectors
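The connection.user and connection.password values in the connector configuration are resolved at runtime by the Secrets Manager config provider bundled in the custom plugin. The following is a minimal sketch of creating a compatible secret, assuming the example secret name MySecret-1234 referenced in the configuration and the connectcdcuser database user; if you instead reuse the RDS-managed master user secret, adjust the secret references in the configuration accordingly:

# Store the database credentials as JSON keys that the config provider can resolve
aws secretsmanager create-secret \
    --name MySecret-1234 \
    --secret-string '{"username":"connectcdcuser","password":"<database-password>"}'

The MSK Connect service execution role must also be allowed to call secretsmanager:GetSecretValue on this secret.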

Set up Db2 Capture/Publisher on Mainframe

To establish the Db2 Capture/Publisher on the mainframe for capturing changes to the DEPT table, follow these structured steps that build upon our previous blog post, Unlock Mainframe Data with Precisely Connect and Amazon Aurora:

  1. Prepare the source table. Before configuring the Capture/Publisher, ensure the DEPT source table exists on your mainframe Db2 system. The table definition should match the structure defined at $SQDATA_VAR_DIR/templates/dept.ddl. If you need to create this table on your mainframe, use the DDL from this file as a reference to ensure compatibility with the replication process.
  2. Access the Interactive System Productivity Facility (ISPF) interface. Sign in to your mainframe system and access the AWS Mainframe Modernization – Data Replication for IBM z/OS ISPF panels through the supplied ISPF application menu. Select option 3 (CDC) to access the CDC configuration panels, as demonstrated in our previous blog post.
  3. Add source tables for capture:
    1. From the CDC Primary Option Menu, choose option 2 (Define Subscriptions).
    2. Choose option 1 (Define Db2 Tables) to add source tables.
    3. On the Add DB2 Source Table to CAB File panel, enter a wildcard value (%) or the specific table name DEPT in the Table Name field.
    4. Press Enter to display the list of available tables.
    5. Type S next to the DEPT table to select it for replication, then press Enter to confirm.

This process is similar to the table selection process shown in figure 3 and figure 4 of our previous post but now focuses specifically on the DEPT table structure.

With the completion of both the Db2 Capture/Publisher setup on the mainframe and the AWS environment configuration (Amazon MSK, Apply Engine, and MSK Connect JDBC Sink Connector), you now have a fully functional pipeline ready to capture data changes from the mainframe and stream them to the MSK topic. Inserts, updates, or deletions to the DEPT table on the mainframe will be automatically captured and pushed to the MSK topic in near real time. From there, the MSK Connect JDBC Sink Connector and the custom SMT will process these messages and apply the changes to the PostgreSQL database on Amazon RDS, completing the end-to-end replication flow.

Configure Apply Engine for Amazon MSK integration

Configure the AWS side components to receive data from the mainframe and forward it to Amazon MSK. Follow these steps to define and manage a new CDC pipeline from DB2 z/OS to Amazon MSK:

  1. Use the following command to switch to the connect user:
    sudo su connect
  2. Create the apply engine directories:
     mkdir -p $SQDATA_VAR_DIR/apply/DB2ZTOMSK/ddl
     mkdir -p $SQDATA_VAR_DIR/apply/DB2ZTOMSK/scripts
  3. Copy the sample dept.ddl file to the apply engine ddl directory:
     cp $SQDATA_VAR_DIR/templates/dept.ddl $SQDATA_VAR_DIR/apply/DB2ZTOMSK/ddl/
  4. Copy the following content and paste it in a new file $SQDATA_VAR_DIR/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.sqd. Replace the placeholder strings with values that correspond to the DB2z endpoint:
     -----------------------------------------------------------------------
     -- Name: DB2TOKAF: Z/OS DB2 To Kafka
     -----------------------------------------------------------------------
     -- SUBSTITUTION PARMS USED IN THIS SCRIPT:
     -----------------------------------------------------------------------
     JOBNAME DB2TOKAFKA;
     -----------------------------
     -- TABLE DESCRIPTIONS
     -----------------------------
     BEGIN GROUP SOURCE_TABLES;
     DESCRIPTION Db2SQL /var/precisely/di/sqdata/apply/DB2ZTOMSK/ddl/dept.ddl AS DEPT KEY IS DEPTNO;
     END GROUP;
     -------------------------------------------------------------
     -- DATASTORE SECTION
     -------------------------------------------------------------
     -- SOURCE DATASTORE
     DATASTORE cdc://<DB2z endpoint with port>/dbcg/DBCG_TBTSS388T6 OF UTSCDC AS CDCIN DESCRIBED BY GROUP SOURCE_TABLES;
     -- TARGET DATASTORE
     DATASTORE kafka:///pgsql-sink-topic/table_key OF JSON AS TARGET KEY IS DEPTNO DESCRIBED BY GROUP SOURCE_TABLES;
     ---------------------------------
     PROCESS INTO TARGET
     SELECT { REPLICATE(TARGET) } FROM CDCIN;
  5. Create the working directory:
    mkdir -p /var/precisely/di/sqdata_logs/apply/DB2ZTOMSK
  6. Add the following to $SQDATA_DAEMON_DIR/cfg/sqdagents.cfg:
    [DB2ZTOMSK]
       type=engine
       program=sqdata
       args=/var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.prc --log-level=8
       working_directory=/var/precisely/di/sqdata_logs/apply/DB2ZTOMSK
       stdout_file=stdout.txt
       stderr_file=stderr.txt
       auto_start=0
       comment=Apply Engine for MSK from Db2z
  7. After the preceding code is added to the sqdagents.cfg section, reload for the changes to take effect:
    sqdmon reload
  8. Validate the apply engine job script by using the SQData parse command to create the compiled file expected by the SQData engine:
    sqdparse $SQDATA_VAR_DIR/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.sqd $SQDATA_VAR_DIR/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.prc

    The following is an example of the output that you get when you invoke the command successfully:

    SQDC042I mounting/running sqdparse with arguments:
    SQDC041I args[0]:sqdparse
    SQDC041I args[1]:/var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.sqd
    SQDC041I args[2]:/var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.prc
    SQDC000I *******************************************************
    SQDC021I sqdparse Version 5.0.1-rel (Linux-x86_64)
    SQDC022I Build-id 4f2d7c16728aa2e40c610db7d5a6e373476a9889
    SQDC023I (c) 2001, 2025 Syncsort Incorporated. All rights reserved.
    SQDC000I *******************************************************
    SQDC000I
    SQD0000I 2025-03-31 00:59:10
    >>> Start Preprocessed /var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.sqd
    000001 ----------------------------------------------------------------------
    000002 -- Name: DB2TOKAF:  Z/OS DB2 To Kafka
    000003 ----------------------------------------------------------------------
    000004 --  SUBSTITUTION PARMS USED IN THIS SCRIPT:
    000005 ----------------------------------------------------------------------
    000006
    000007 JOBNAME DB2TOKAFKA;
    000008
    000009 ----------------------------
    000010 -- TABLE DESCRIPTIONS
    000011 ----------------------------
    000012 BEGIN GROUP SOURCE_TABLES;
    000013 DESCRIPTION Db2SQL /var/precisely/di/sqdata/apply/DB2ZTOMSK/ddl/dept.ddl  AS DEPT
    000014 KEY IS DEPTNO;
    000015 END GROUP;
    000016
    000017 ------------------------------------------------------------
    000018 --       DATASTORE SECTION
    000019 ------------------------------------------------------------
    000020
    000021 -- SOURCE DATASTORE
    000022 DATASTORE /var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB0A.ENGINE3.DEPT.COPY
    000023           OF UTSCDC
    000024           AS CDCIN
    000025           DESCRIBED BY GROUP SOURCE_TABLES;
    000026
    000027 -- TARGET DATASTORE
    000028 DATASTORE 
    000029           OF JSON
    000030           AS TARGET
    000031           KEY IS DEPTNO
    000032           DESCRIBED BY GROUP SOURCE_TABLES;
    000033
    000034 ----------------------------------
    000035
    000036 PROCESS INTO TARGET
    000037 SELECT
    000038 {
    000039     REPLICATE(TARGET)
    000040 }
    000041 FROM CDCIN;
    <<< End Preprocessed /var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB2ZTOMSK.sqd
    >>> Start Preprocessed /var/precisely/di/sqdata/apply/DB2ZTOMSK/ddl/dept.ddl
    000001 CREATE TABLE DEPARTMENT
    000002 (
    000003    DEPTNO char(3) NOT NULL,
    000004    DEPTNAME varchar(36) NOT NULL,
    000005    MGRNO char(6),
    000006    ADMRDEPT char(3) NOT NULL,
    000007    LOCATION char(16),
    000008    CONSTRAINT PK_DEPTNO PRIMARY KEY (DEPTNO)
    000009 ) ;
    <<< End Preprocessed /var/precisely/di/sqdata/apply/DB2ZTOMSK/ddl/dept.ddl
    Number of Data Stores...................: 2
    Data Store..............................: /var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB0A.ENGINE3.DEPT.COPY
      Alias.................................: CDCIN
      Type..................................: UTS Change Data Capture
      Number of Records.....................: 1
        Record Name.........................: DEPARTMENT
        Record Description Alias............: DEPT
        Record Description Length...........: 72
        Number of Fields....................: 5
          ................................... TYPE            OFF   LEN   XLEN  EXT
          ................................... ---------- ----- ----- ----- -----
          DEPTNO............................: CHAR(3)             0     3     3
          DEPTNAME..........................: VARCHAR(36)         3    38    38
          MGRNO.............................: CHAR(6)             7     6     6
          ADMRDEPT..........................: CHAR(3)            14     3     3
          LOCATION..........................: CHAR(16)           17    16    16
    Data Store..............................: 
      Alias.................................: TARGET
      Type..................................: JSON
      Number of Records.....................: 1
        Record Name.........................: DEPARTMENT
        Record Description Alias............: DEPT
        Record Description Length...........: 70
        Number of Fields....................: 5
          ................................... TYPE            OFF   LEN   XLEN  EXT
          ................................... ---------- ----- ----- ----- -----
          DEPTNO............................: CHAR(3)             0     3     3
          DEPTNAME..........................: VARCHAR(36)         3    38    38
          MGRNO.............................: CHAR(6)            41     6     6
          ADMRDEPT..........................: CHAR(3)            47     3     3
          LOCATION..........................: CHAR(16)           50    16    16
    Section.................................: SQDSTP000
      Number of steps.......................: 1
    SQDC017I sqdparse(pid=4023) terminated successfully
  9. Copy the following content and paste it in a new file /var/precisely/di/sqdata_logs/apply/DB2ZTOMSK/sqdata_kafka_producer.conf. Replace the placeholder strings with values that correspond to your bootstrap server and AWS Region.
    metadata.broker.list=<kafka_bootstrap_servers_with_ports>
         security.protocol=SASL_SSL
         sasl.mechanism=OAUTHBEARER
         sasl.oauthbearer.config="extension_AWSMSKCB=python3,/usr/lib64/python3.9/site-packages/aws_msk_iam_sasl_signer/cli.py,--region,<region>"
         sasl.oauthbearer.method="default"
  10. Start the apply engine through the controller daemon by using the following command:
    sqdmon start ///DB2ZTOMSK
  11. Monitor the apply engine through the controller daemon by using the following command:
    sqdmon display ///DB2ZTOMSK --format=details

    The following is an example of the output that you get when you invoke the command successfully:

    Engine..................................: DB2ZTOMSK
    version.................................: 5.0.1-rel (Linux-x86_64)
    git.....................................: f021c29a84c1a99f59144288aeeb2cb8fa494485
    jobname.................................: DB2TOKAFKA
    parsed..................................: 20250320172610278108
    started.................................: 2025-03-20.17.47.23.444474
    started (UTC)...........................: 2025-03-20.17.47.23.444474 (1742492843444)
    updated (UTC)...........................: 2025-03-20.17.47.25.901018 (1742492845901)
    Input Datastore.........................: /var/precisely/di/sqdata/apply/DB2ZTOMSK/scripts/DB0A.ENGINE3.DEPT.COPY
    Alias...................................: CDCIN
    Type....................................: UTS Change Data Capture
      Records Read..........................: 14
      Records Selected......................: 14
      Bytes Read............................: 2892
    Output Datastore........................: kafka:///pgsql-sink-topic/table_key
    Alias...................................: TARGET
    Type....................................: JSON
      Records Inserted......................: 14
      Records Updated.......................: 0
      Records Deleted.......................: 0
      Formatted bytes.......................: 3458
      Unformatted bytes.....................: 448
    Total Output Formatted bytes............: 3458
    Total Output Unformatted bytes..........: 448
    SQDC017I sqdmon(pid=123540) terminated successfully

    Logs can also be found at /var/precisely/di/sqdata_logs/apply/DB2ZTOMSK.

Verify data in the MSK topic

Invoke the Kafka CLI command to verify the JSON data in the MSK topic:

kafka/bin/kafka-console-consumer.sh --bootstrap-server $BOOTSTRAP_SERVERS --consumer.config kafka/config/client-config.properties --topic pgsql-sink-topic --from-beginning --property print.key=true

Verify data in the PostgreSQL database

Invoke the following command to verify the data in the PostgreSQL database:

PGPASSWORD="password" psql --host=<DATABASE-HOST> --username=<user> --dbname=<database> -c "select * from \"DEPT\""

With these steps completed, you’ve successfully set up end-to-end data replication from DB2z to Amazon RDS for PostgreSQL using the AWS Mainframe Modernization – Data Replication for IBM z/OS AMI, Amazon MSK, MSK Connect, and the Confluent JDBC Sink Connector.

Cleanup

When you’re finished testing this solution, you can clean up the resources to avoid incurring additional charges. Follow these steps in sequence to ensure proper cleanup.

Step 1: Delete the MSK Connect components

Follow these steps:

  1. List existing connectors:
    aws kafkaconnect list-connectors
  2. Delete the sink connector:
    aws kafkaconnect delete-connector --connector-arn "<arn-of-connector>"
  3. List custom plugins:
    aws kafkaconnect list-custom-plugins
  4. Delete the custom plugin:
    aws kafkaconnect delete-custom-plugin --custom-plugin-arn "<arn-of-custom-plugin>"

Step 2: Delete the MSK cluster

Follow these steps:

  1. List MSK clusters:
    aws kafka list-clusters-v2 --cluster-type-filter SERVERLESS
  2. Delete the MSK serverless cluster:
    aws kafka delete-cluster --cluster-arn "<arn-of-msk-serverless-cluster>"

Step 3: Delete the Aurora resources

Follow these steps:

  1. Delete the Aurora DB instance:
    aws rds delete-db-instance --db-instance-identifier cdc-serverless-pg-instance --skip-final-snapshot
  2. Delete the Aurora DB cluster:
    aws rds delete-db-cluster --db-cluster-identifier cdc-serverless-pg-cluster --skip-final-snapshot
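If you created a separate Secrets Manager secret for the connector credentials, delete it as well. This sketch assumes the example secret name used earlier in this post:

# Permanently delete the connector credentials secret
aws secretsmanager delete-secret --secret-id MySecret-1234 --force-delete-without-recovery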

Conclusion

By capturing changed data from DB2z and streaming it to AWS targets, organizations can modernize their legacy mainframe data stores, enabling operational insights and AI initiatives. Businesses can use this solution to power cloud-based applications with mainframe data, gaining scalability, cost-efficiency, and enhanced performance.

The integration of the AWS Mainframe Modernization – Data Replication for IBM z/OS AMI with Amazon MSK and Amazon RDS for PostgreSQL provides an enhanced framework for real-time data synchronization that maintains data integrity. This architecture can be extended to support additional mainframe data sources such as VSAM and IMS, as well as other AWS targets, so organizations can tailor their data integration strategy to specific business needs. Data consistency and latency challenges can be effectively managed through AWS and Precisely’s monitoring capabilities. By adopting this architecture, organizations keep their mainframe data continually available for analytics, machine learning (ML), and other advanced applications.

Streaming mainframe data to AWS in near real time, with data transfers completing in under a second, represents a strategic step toward modernizing legacy systems while unlocking new opportunities for innovation. With Precisely and AWS, organizations can effectively navigate their modernization journey and maintain their competitive advantage.

Learn more about AWS Mainframe Modernization – Data Replication for IBM z/OS AMI in the Precisely documentation. AWS Mainframe Modernization Data Replication is available for purchase in AWS Marketplace. For more information about the solution or to see a demonstration, contact Precisely.


About the authors

Supreet Padhi

Supreet is a Technology Architect at Precisely. He has been with Precisely for more than 14 years, specializing in streaming data use cases and technologies, with an emphasis on data warehouse architecture. He is responsible for research and development in areas such as change data capture (CDC), streaming ETL, metadata management, and vector databases.

Manasa Ramesh

Manasa is a Technology Architect at Precisely with over 15 years of experience in software development. She has worked on several innovation-driven projects in the metadata management, data governance, and data integration space. She is currently responsible for the research, design, and development of a metadata discovery framework.

Tamara Astakhova

Tamara is a Sr. Partner Solutions Architect in Data and Analytics at AWS who brings over two decades of expertise in architecting and developing large-scale data analytics systems. In her current role, she collaborates with strategic partners to design and implement sophisticated AWS-optimized architectures. Her deep technical knowledge and experience make her an invaluable resource in helping organizations transform their data infrastructure and analytics capabilities.