Synchronizing mainframe data from Db2 for z/OS with IBM Data Gate
Customers in different industries rely on IBM Z mainframes to run their mission-critical applications and process large-scale transactions. These systems generate and store vast amounts of valuable data that needs to be tracked and managed efficiently using tools like IBM Knowledge Catalog.
A hybrid solution bridges the gap between the mainframe and AWS. It combines IBM Knowledge Catalog and IBM Data Gate (Data Gate), both running on Red Hat OpenShift Service on AWS (ROSA), with Amazon Athena, a serverless analytics service for querying data using SQL, and Amazon QuickSight for creating interactive dashboards and gaining insights quickly.
This blog post shows you how to combine your mainframe and cloud data using these tools. You’ll learn how to identify metadata using IBM Knowledge Catalog, how to extract mainframe data to Db2 on AWS using Data Gate, and how Amazon Athena and Amazon QuickSight provide analytics and visualization capabilities on AWS.
IBM Tools Overview
Data Gate copies your mainframe data to the cloud in real time. It creates a connection between a Db2 for z/OS subsystem and Db2 databases running on Cloud Pak for Data and continuously synchronizes data changes as they occur on Db2 for z/OS. This synchronization lets you use cloud services for analytics and machine learning while keeping your mainframe systems secure and reliable.
IBM Knowledge Catalog serves as your central hub for enterprise data management and governance. The catalog automatically tracks and organizes metadata across your organization, making data easy to find and use. Through its governance and compliance features, you can ensure your data remains trustworthy for analytics and machine learning projects. Teams across your organization can quickly locate and understand available data, leading to faster and better business decisions.
These two tools work together to modernize your data architecture. Data Gate handles the secure movement of data, while IBM Knowledge Catalog helps you organize and govern it effectively.
Data Management Process
Data Gate enables a hybrid cloud architecture in which the data sources remain on the mainframe in corporate data centers while machine learning, analytics, and visualization applications run in the cloud. No additional software needs to be installed on the mainframe because Data Gate uses built-in Db2 for z/OS facilities for log reading and data unload. On the cloud side, the data management, data catalog, and data store services run on Red Hat OpenShift Service on AWS (ROSA), which serves as the standardized orchestration and management layer for the data and integration pipeline (Figure 1).
![Solution architecture spanning the corporate data center and AWS. The architecture uses IBM Data Gate for data replication from Db2 for z/OS to Db2 on AWS and uses IBM Knowledge Catalog as the metadata store for the mainframe and cloud data assets.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2024/12/18/MainframeDataIntegrationAWSDb2zOSDataGateIBMKnowledgeCatalog-2.png)
Figure 1. Mainframe data integration with AWS using Db2 for z/OS, Data Gate and IBM Knowledge Catalog.
Synchronizing data requires a secure TCP/IP network connection between the Db2 for z/OS source system and the Data Gate instance on IBM Cloud Pak for Data. For more details, refer to the Configuring network access between Data Gate and IBM Z documentation.
Data Gate on IBM Cloud Pak for Data (CP4D) running on ROSA supports Persistent Volume Claims (PVCs) backed by Amazon Elastic File System (Amazon EFS) for scalable shared file storage and Amazon Elastic Block Store (Amazon EBS) for high-performance block storage, to meet different workload needs. Data at rest is encrypted using AWS-managed or customer-managed keys through AWS Key Management Service (AWS KMS).
Prerequisites
We set up our environment with several key components. First, we installed IBM Cloud Pak for Data (CP4D) on ROSA. You can do this through AWS Marketplace or by following the installation guide in this blog post.
Next, we installed and configured Data Gate within CP4D to handle our mainframe data synchronization. We also installed IBM Knowledge Catalog in CP4D to manage our metadata. For our cloud database, we installed IBM Db2 on CP4D to store our synchronized mainframe data.
Finally, we established the connection between our mainframe environment and Data Gate on AWS.
Data Synchronization
Now that our environment is ready, we use Data Gate to copy our mainframe data to Db2 on CP4D. The replication happens in two phases. First, Data Gate performs an initial load to copy our existing data. Then it moves into continuous replication mode to keep our cloud data synchronized with the mainframe.
Initial Data Load Phase
In this phase, we perform an initial data load operation. A significant volume of data is transferred from Db2 for z/OS on the mainframe to Db2 on CP4D. As we transfer the data, Data Gate automatically converts characters from mainframe format (EBCDIC) to cloud format (Unicode) and adjusts data types as needed.
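To make the character conversion concrete, here is a minimal Python sketch of an EBCDIC-to-Unicode round trip. It is only an illustration: Data Gate performs this conversion internally, and the code page used here (`cp037`, a common US/Canada EBCDIC code page) is an assumption. Your Db2 for z/OS subsystem may use a different CCSID.

```python
# Illustrative only: Data Gate handles this conversion for you.
# Assumption: the source data uses EBCDIC code page 037; other CCSIDs exist.
EBCDIC_CODEC = "cp037"


def ebcdic_to_unicode(raw: bytes, codec: str = EBCDIC_CODEC) -> str:
    """Decode EBCDIC-encoded bytes into a Python (Unicode) string."""
    return raw.decode(codec)


def unicode_to_ebcdic(text: str, codec: str = EBCDIC_CODEC) -> bytes:
    """Encode a Python string into EBCDIC bytes (the reverse direction)."""
    return text.encode(codec)


if __name__ == "__main__":
    mainframe_bytes = unicode_to_ebcdic("ORDERS")
    # The same text has different byte values in EBCDIC and in UTF-8.
    print("EBCDIC bytes:", mainframe_bytes.hex())
    print("UTF-8 bytes: ", "ORDERS".encode("utf-8").hex())
    print("Decoded text:", ebcdic_to_unicode(mainframe_bytes))
```

The point of the sketch is that the text itself does not change, only its byte representation does, which is why the conversion can be applied transparently during the load.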
We use the Db2 UNLOAD utility to transfer data efficiently from the mainframe. We process each table partition in parallel to maximize throughput. The mainframe tables remain fully available for reading and writing during this process because we run the UNLOAD utility under isolation level CS (Cursor Stability).
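The idea of processing partitions in parallel can be sketched in a few lines of Python. This is a conceptual illustration, not how Data Gate or the UNLOAD utility is implemented: `unload_partition` is a hypothetical stand-in that just returns a label, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def unload_partition(table: str, partition: int) -> str:
    """Hypothetical stand-in for unloading one table partition."""
    return f"{table}.part{partition}"


def unload_table_in_parallel(table: str, partitions: int,
                             max_workers: int = 4) -> List[str]:
    """Unload every partition of a table using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order even though the work
        # itself runs concurrently across the workers.
        return list(pool.map(lambda p: unload_partition(table, p),
                             range(1, partitions + 1)))


if __name__ == "__main__":
    print(unload_table_in_parallel("ORDERS", partitions=3))
```

Because each partition is independent, total load time is bounded by the slowest partition rather than the sum of all partitions, as long as enough workers are available.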
This initial load has minimal impact on our mainframe performance. Once we complete this phase, the core mainframe data becomes available in the cloud for further use.
Continuous Data Replication Phase
After completing our initial data load, we set up continuous data replication. We use the built-in asynchronous log reading feature of Db2 for z/OS, which doesn’t require any additional installation or a separate Started Task on z/OS.
We achieve maximum performance and cost efficiency by enabling log reading on our System z Integrated Information Processor (zIIP) with 96% eligibility for offload. This means we can read and update our tables without impacting existing applications.
Data is encrypted at all times once it leaves the mainframe. SSL/TLS encryption is used for connections to Db2 for z/OS and between all components handling data on the receiving end. When data is written to Amazon Simple Storage Service (Amazon S3), the default server-side encryption method of the S3 bucket is applied.
As we make changes to our mainframe data, Data Gate automatically synchronizes these updates with our Db2 on CP4D database in real time. We now have a fully replicated dataset in the cloud that we can use for multiple purposes, such as:
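For readers who write their own clients against the replicated database, the following sketch shows what a certificate-verifying TLS setup looks like with Python's standard `ssl` module. It is a generic client-side example, not Data Gate configuration; the `ca_file` parameter and the choice of TLS 1.2 as a minimum version are assumptions for illustration.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context that verifies the server certificate.

    ca_file is an optional path to a CA bundle (hypothetical here); when it
    is omitted, the system's default trusted certificates are used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Assumption: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = build_tls_context()
    print("Certificate verification required:", ctx.verify_mode == ssl.CERT_REQUIRED)
    print("Hostname checking enabled:", ctx.check_hostname)
```

The default context already enables certificate verification and hostname checking, so the sketch only tightens the minimum protocol version.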
- Machine learning models
- Analytics applications
- Big data processing
- Business intelligence tools
This setup lets us leverage AWS capabilities while maintaining the reliability of our mainframe system.
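The two phases described in this section, an initial snapshot followed by continuous application of logged changes, can be modeled with a small in-memory sketch. Everything here is a simplification for illustration: the `Change` record, the dictionary "tables", and the operation names are hypothetical and do not reflect Data Gate's internal formats.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class Change:
    """Hypothetical change record read from a database log."""
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[Row] = None  # new row image (unused for deletes)


def initial_load(source: Dict[int, Row]) -> Dict[int, Row]:
    """Phase 1: copy a snapshot of every existing source row."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(target: Dict[int, Row], changes: Iterable[Change]) -> None:
    """Phase 2: apply logged changes to the target in log order."""
    for change in changes:
        if change.op == "delete":
            target.pop(change.key, None)
        elif change.op in ("insert", "update"):
            target[change.key] = dict(change.row or {})
        else:
            raise ValueError(f"unknown operation: {change.op}")


if __name__ == "__main__":
    source = {1: {"status": "OPEN"}}
    target = initial_load(source)
    apply_changes(target, [
        Change("update", 1, {"status": "SHIPPED"}),
        Change("insert", 2, {"status": "OPEN"}),
        Change("delete", 1),
    ])
    print(target)
```

Applying changes strictly in log order is what keeps the target consistent with the source: the same set of changes applied in a different order could leave a deleted row behind or overwrite a newer value.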
Managing Metadata Automatically
We streamline our metadata management by integrating Db2 for z/OS Data Gate with IBM Knowledge Catalog. This integration helps us maintain a complete enterprise data catalog with minimal effort.
Using the Data Gate interface, we synchronize all our metadata with a single click. This process captures everything we need, including our table structures, schemas, and the connections between our mainframe and cloud databases.
The automated synchronization saves us significant time compared to manual metadata management. Once synchronized, we use this metadata to support several key tasks:
- Enforcing data governance policies
- Running quality checks
- Implementing data masking rules
- Managing data access controls
This automation ensures our metadata stays current and accurate across both environments.
Connecting Data Gate to Db2 for z/OS Data
We begin by connecting Data Gate to our source Db2 for z/OS subsystem. In the Data Gate service interface, we enter the required connection details for our mainframe system. These include:
- Host address: DNS or IP address
- Username credentials
- Password
- Security certificates for authentication against the source database
For detailed instructions and more information on this setup, refer to the Data Gate documentation.
The pairing screen in Data Gate is where you enter this information, including host name, port, subsystem, credentials, and certificate details (Figure 2).
![Data Gate mask for defining connectivity configuration to a Db2 for z/OS subsystem. It contains information like host, port, subsystem, credentials and certificate information.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/16/DataGatePairingConfiguration.png)
Figure 2. IBM Data Gate pairing configuration.
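If you keep pairing details in automation scripts, it can help to treat them as a small validated record. The sketch below is a hypothetical Python structure that mirrors the fields visible in Figure 2; the class name, field names, and sample values are assumptions and are not part of any Data Gate API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PairingConfig:
    """Hypothetical record of the fields shown on the pairing screen."""
    host: str                          # DNS name or IP address
    port: int
    subsystem: str
    username: str
    password: str = field(repr=False)  # keep the secret out of reprs and logs
    certificate_path: str = field(repr=True)

    def validate(self) -> None:
        """Raise ValueError if a required field is empty or out of range."""
        for name in ("host", "subsystem", "username", "password",
                     "certificate_path"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")


if __name__ == "__main__":
    # All values below are placeholders, not real connection details.
    config = PairingConfig("zos.example.com", 446, "DB2A",
                           "dguser", "change-me", "/certs/zos-ca.pem")
    config.validate()
    print(config)  # the password is omitted from the printed representation
```

Validating before submitting catches simple mistakes, such as an empty host or an out-of-range port, before they turn into a failed pairing attempt.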
After completing the pairing process, the table selection screen appears automatically. Select the source Db2 for z/OS tables from the left panel and their corresponding schema tables from the right panel (Figure 3).
![Data Gate screen for selecting source tables and schemas which have to be replicated to the cloud database. Here we have selected the schema LOANDEMO and the tables ACCOUNT, BORROWER_HISTORY, CREDIT and ORDERS from the schema.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/16/SelectingSchemasTablesDataSynchronization.png)
Figure 3. Selecting schemas and tables for data synchronization.
Data Gate first loads an initial snapshot of the selected Db2 for z/OS tables to Db2 on AWS, then continuously synchronizes any subsequent data changes.
Figure 4 shows the Table tab, which displays all tables selected in the previous step. The Status column indicates that synchronization is active for all tables. Data Gate has completed the initial data snapshot to Db2 on AWS and now continuously synchronizes changes from the source Db2 for z/OS database.
![The Data Gate “Table” tab shows detailed information about all replicated tables, their first synchronization date, as well as their states. Here, we see the four selected tables from the previous screen which all have the state “Active”, meaning that data is continuously kept up to date.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/16/DataGateDashboardOverviewSynchronizedTables.png)
Figure 4. Data Gate dashboard – overview of synchronized tables.
The Overview tab shows a summary of all tables that Data Gate maintains and their current states, as shown in Figure 5.
![The Data Gate “Overview” tab shows information about the replication process and all replicated tables.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/16/DataGateDashboardMainOverview.png)
Figure 5. IBM Data Gate dashboard – main overview.
Publishing metadata into IBM Knowledge Catalog
After successfully synchronizing Db2 for z/OS tables to Db2 on AWS, the next step is to register and maintain the database and table metadata in the enterprise data catalog. Data Gate provides one-click integration with IBM Knowledge Catalog through the Publish to catalog button in the IBM Knowledge Catalog integration panel (Figure 6).
![The Data Gate “Overview” tab also lets you publish metadata to IBM Knowledge Catalog. This can be done by clicking “Publish to catalog” in the IBM Knowledge Catalog panel.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/16/PublishingMetadataIBMKnowledgeCatalog.png)
Figure 6. Publishing metadata to IBM Knowledge Catalog.
After choosing the Publish to catalog button, Data Gate publishes database connectivity details, database and table information, and relationships between mainframe and cloud tables to the catalog. The Data Gate dashboard displays a link which allows you to access the populated catalog (Figure 7).
![After clicking the “Publish to catalog” button the IBM Knowledge Catalog panel provides a link to the target catalog. Information about the target catalog is also presented as an Info message at the top of the screen.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/16/LinkMetadataCatalogDataGateDashboard-.png)
Figure 7. Link to the metadata catalog in the Data Gate dashboard.
You can access and use the published assets through IBM Knowledge Catalog (Figure 8) and other cloud tools like IBM Watson Studio (e.g. in a Jupyter Notebook).
![The IBM Knowledge Catalog dashboard shows all data assets that were created by Data Gate. Those are Connections to the target and source databases as well as Data Assets representing the source and the target tables.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/21/OverviewMetadataAssetsIBMKnowledgeCatalog-1.png)
Figure 8. Overview of metadata assets in IBM Knowledge Catalog.
For example, selecting a specific data asset shown in Figure 8 (such as the ORDERS data asset) and opening its Profile tab lets us start an automated profiling job.
After the job is finished, the table profiling provides statistical analysis about the data, including value ranges, data distribution, and data quality scores for each column of the data (Figure 9).
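To give a feel for what column-level profiling produces, here is a minimal Python sketch that computes a few basic statistics for one column. It is not IBM Knowledge Catalog's profiling algorithm, which is far more extensive; the function name, the chosen statistics, and the sample values are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, object]:
    """Compute simple profile statistics for one column of values.

    None represents a missing (NULL) value.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": Counter(present).most_common(1)[0][0] if present else None,
    }


if __name__ == "__main__":
    # Made-up sample values standing in for a numeric column.
    order_amounts = [10.0, 25.5, None, 10.0]
    for name, value in profile_column(order_amounts).items():
        print(f"{name}: {value}")
```

Even these few numbers surface typical data quality questions, such as how many values are missing and whether the value range is plausible for the column.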
![On the Profiling tab of a specific data asset (in this case, the Orders table), you can trigger a profiling job and see the results of it, summarizing statistical information about the table data.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2025/01/21/ExampleDataProfileIBMKnowledgeCatalog.png)
Figure 9. Example data profile in IBM Knowledge Catalog.
In addition, data lineage, data protection rules that prevent access across jurisdictions, and data masking rules can now be applied based on the metadata synchronized to the catalog.
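As a rough illustration of what a masking rule does to a value, the following Python sketch redacts all but the last few characters of a string. It is a generic example, not the masking implementation in IBM Knowledge Catalog; the function name and the default of four visible characters are assumptions.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` characters with a mask character.

    Values that are not longer than `visible` are masked completely so that
    short values are never shown in full.
    """
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    # Made-up account number used only to show the effect of the rule.
    print(mask_value("1234567890"))
    print(mask_value("123"))
```

In a governed catalog, a rule like this would be attached to metadata (for example, a column classified as an account number) rather than hard-coded in each application.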
Analyzing Db2 Data with Athena and QuickSight
Once our Db2 for z/OS data has been synchronized to Db2 on CP4D, we can use Amazon Athena Federated Query, combined with the AthenaDb2Connector, to query the data stored in our database. We can then leverage Amazon QuickSight to visualize the data on interactive dashboards (Figure 10).
![Architecture diagram showing how to use the Amazon Athena data source connector for IBM Db2 to query the Db2 for z/OS data synchronized with Db2 on Cloud Pak for Data, and how to use Amazon QuickSight to build dashboards and gain insights into the data.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2024/12/11/AnalyzingDb2DataAthenaQuickSight.png)
Figure 10. Analyzing IBM Db2 data with Amazon Athena and Amazon QuickSight.
The Amazon Athena IBM Db2 connector (AthenaDb2Connector) enables Athena to access and query Db2 data directly, along with data from other sources, without requiring ETL jobs. This approach helps consolidate your data retrieval process, making it more efficient and reducing complexity.
To get started, we deployed the AthenaDb2Connector from the AWS Serverless Application Repository (Figure 11). The serverless application automates deployment and configuration, setting up the connector in your environment with minimal effort. Once deployed, the connector is immediately ready for use with Amazon Athena.
![Screenshot of the AWS Serverless Application Repository used to deploy the AthenaDb2Connector.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2024/12/11/DeployAthenaDb2ConnectorSAM.png)
Figure 11. Deploy the AthenaDb2Connector from the Serverless Application Repository.
With the connector in place, we can use SQL in Athena to query data from our Db2 instance. This direct connection reduces data movement and provides a fast, convenient way to analyze data within a single interface (Figure 12).
![Screenshot shows how to run federated queries against our Db2 instance with Amazon Athena.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2024/12/11/RunFederatedQueriesDb2Athena.png)
Figure 12. Running federated queries against our Db2 instance with Amazon Athena.
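The same kind of federated query can also be submitted programmatically. The sketch below builds a query against the LOANDEMO.ORDERS table from the earlier example and submits it through the AWS SDK for Python (boto3). The Lambda function name `db2connector`, the S3 output location, and the identifier casing are assumptions: use the name you chose when deploying the connector, and check how your connector exposes schema and table names.

```python
def build_federated_query(lambda_name: str, schema: str, table: str,
                          limit: int = 10) -> str:
    """Build an Athena query that addresses a connector as "lambda:<name>"."""
    for identifier in (lambda_name, schema, table):
        if not identifier or '"' in identifier:
            raise ValueError(f"invalid identifier: {identifier!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (f'SELECT * FROM "lambda:{lambda_name}"."{schema}"."{table}" '
            f"LIMIT {limit}")


def run_query(sql: str, output_location: str) -> str:
    """Submit the query to Athena and return its query execution ID.

    Requires boto3 and AWS credentials; output_location is an S3 URI
    where Athena writes query results.
    """
    import boto3  # imported here so the query builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # "db2connector" is a hypothetical Lambda function name.
    print(build_federated_query("db2connector", "LOANDEMO", "ORDERS"))
```

Only the query builder runs in the example at the bottom; calling `run_query` starts a real query in your account, so it is left for you to invoke with your own output location.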
By integrating Amazon QuickSight with Amazon Athena Federated Query, we can visualize and analyze our Db2 data without additional setup. QuickSight connects to Athena, which retrieves data from Db2 through the AthenaDb2Connector. This configuration lets us create dashboards, develop visualizations, and gain insights in real time (Figure 13). This integration supports interactive analytics, helping us make faster, data-driven decisions.
![Amazon QuickSight screenshot showing a dashboard to visualize and gain insights into Db2 data replicated from the Mainframe to AWS.](https://d2908q01vomqb2.cloudfront.net/c097638f92de80ba8d6c696b26e6e601a5f61eb7/2024/12/11/GainInsightsMainframeDb2DataAmazonQuickSight.png)
Figure 13. Gaining insights into our mainframe data by analyzing the data replicated to Db2 on AWS.
Summary
IBM and AWS provide tools that help customers in heterogeneous, hybrid cloud environments collate data into a single source for analysis. This enables customers to maximize the value of data residing both on premises and in the cloud.
By combining data from IBM mainframe workloads with data from applications running on the AWS Cloud, businesses can have the best of both worlds. They can maximize existing on-premises investments while taking advantage of the many benefits of cloud computing on AWS.
Additional Content:
- IBM on AWS Partner Page
- Build a Modern Data Architecture on AWS with your IBM Z Mainframe
- Accelerate Data Modernization and AI with IBM Databases on AWS
- Deploying IBM Cloud Pak for Data on Red Hat OpenShift Service on AWS (ROSA)
- Announcing Amazon RDS for Db2 with license through AWS Marketplace
- Modern data architecture on AWS with IBM Db2 for z/OS Data Gate
- Introducing Db2 Warehouse on AWS
- Amazon RDS for Db2 Licensing Options
Visit the AWS Marketplace for IBM solutions on AWS:
- IBM Data Gate on Cloud (BYOL)
- IBM Cloud Pak for Data
- IBM Cloud Pak for Data on Managed Openshift (BYOL)
- IBM Db2 for IBM Cloud Pak for Data
- IBM Db2 Standard Edition Hourly License Subscription (Amazon RDS)
- IBM Db2 Advanced Edition Hourly License Subscription (Amazon RDS)
- IBM Db2 Warehouse as a Service