IBM & Red Hat on AWS

Build a Modern Data Architecture on AWS with your IBM Z Mainframe

Customers across most industries run critical applications like bulk data processing, industry and consumer statistics, enterprise resource planning, and large-scale transaction processing on mainframe architectures such as IBM Z. Much of this data is created and stored using Db2 for z/OS. At the same time, these customers are also performing their modernized data operations on AWS Cloud, leaving a potential disconnect in their complex data landscapes.

In this blog we will discuss an overall architecture for modernizing your mainframe data on AWS, highlighting the importance of mainframe data by examining a specific use case of credit card analytics. We will show two different approaches to implementing a modern data architecture: first with IBM Db2 for z/OS Data Gate and IBM Knowledge Catalog, and second with AWS Glue.

Modern data architectures with mainframe

To efficiently analyze data that is spread across different data stores and locations, customers often move data. This data movement can get complex as the data grows. To address this, customers need a scalable and cost-effective data architecture that simplifies governance and data movement between the various data stores and data lakes on AWS.

To address this, AWS proposes the modern data architecture, which can be extended to include mainframe data (Figure 1).

Diagram showing AWS Modern Data Architecture with connections to an IBM mainframe.

Figure 1. Modern data architecture on AWS with IBM Mainframe.

Use Case – Credit card analytics

Let’s take the example of a credit card business to examine common questions customers need to answer regularly:

  • What is the average balance per cardholder?
  • What is the average credit limit per cardholder?
  • What is the expense-to-income ratio?
  • What is the average transaction size and frequency?
  • How have these figures changed since last month?

Millions of credit card users and billions of transactions are stored on Db2 for z/OS. To get accurate and current answers to the questions above, customers need to access that data store. These same customers have built business intelligence workloads on AWS by leveraging different analytics services such as AWS Glue. They also need efficient ways to integrate their analytics workloads on AWS with the Db2 for z/OS data, without impacting the performance of transactions currently running on the mainframe.
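The first questions above reduce to simple aggregations once the data is accessible. The following sketch computes them in plain Python over a few illustrative records; the field names and sample values are assumptions for demonstration, not the actual Db2 for z/OS schema.

```python
# Sketch: cardholder metrics over illustrative sample records.
# Field names and data are placeholders, not a real mainframe schema.
from collections import defaultdict

accounts = [
    {"cardholder": "A", "balance": 1200.0, "credit_limit": 5000.0},
    {"cardholder": "B", "balance": 300.0,  "credit_limit": 2000.0},
    {"cardholder": "C", "balance": 4500.0, "credit_limit": 8000.0},
]

transactions = [
    {"cardholder": "A", "amount": 40.0},
    {"cardholder": "A", "amount": 60.0},
    {"cardholder": "B", "amount": 25.0},
]

def average(values):
    """Arithmetic mean, 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0

avg_balance = average([a["balance"] for a in accounts])
avg_credit_limit = average([a["credit_limit"] for a in accounts])

# Group transactions per cardholder to get size and frequency.
per_holder = defaultdict(list)
for t in transactions:
    per_holder[t["cardholder"]].append(t["amount"])

avg_txn_size = average([amt for amts in per_holder.values() for amt in amts])
avg_txn_frequency = average([len(amts) for amts in per_holder.values()])

print(avg_balance)       # 2000.0
print(avg_credit_limit)  # 5000.0
print(avg_txn_frequency) # 1.5
```

In production these aggregations would run as SQL or Spark jobs against the replicated data, but the logic is the same.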

Solution overview

Figure 2 illustrates a conceptual data infrastructure for a credit card company. Later we’ll see two different implementations of this architectural concept.

High-level diagram showing the mainframe data connection with AWS services. Each step is labeled with a number.

Figure 2. Mainframe data integration with AWS.

In this scenario, the account and transaction data reside in Db2 for z/OS, IBM Information Management System (IMS), or Virtual Storage Access Method (VSAM) files on the mainframe, and are undergoing exponential growth.

To build modern data solutions with a trustworthy analytics foundation for business purposes, the solution needs to enable integration between your on-premises mainframe data and services on the AWS Cloud.

To allow for secure and reliable data transfer, the solution needs to configure the network connectivity between your Amazon Virtual Private Cloud (Amazon VPC) and your on-premises network using AWS Site-to-Site VPN or AWS Direct Connect (DX).

Your on-premises firewall must be configured to allow communication between AWS Glue and your Db2 for z/OS running on premises.

The steps below help you implement the solution shown in Figure 2:

  1. Establish a secure connection (1) to encrypt data in transit between your corporate data center and AWS.
  2. With a secured connection, your mainframe data can be virtualized (2.1), replicated, and subjected to ETL processes (2.2).
  3. Enrich your data with pertinent business context and catalogue it using technical and business metadata (3).
  4. Use encryption at rest, access controls, governance artifacts with quality rules, and data catalog tools to protect your data and reduce the risk of unauthorized access (3).

With the mainframe data available, catalogued, and governed on AWS, you can analyze it with different cloud-native tools (4) to generate real-time insights for users such as:

    • Business analysts, who can more easily search catalogs and retrieve data such as balances and transaction sizes.
    • Software engineers, who can implement mobile banking applications that retrieve data directly from a data store in the cloud.
    • Machine learning engineers, who can use the mainframe sources for data exploration tasks and to build machine learning (ML) models.

Two different implementation approaches

Example Approach: IBM Db2 for z/OS Data Gate

If your business demands real-time or near-real-time mainframe data, you can leverage IBM Db2 for z/OS Data Gate on IBM Cloud Pak for Data deployed on AWS (Figure 3).

A version of Figure 2, with Steps 2 and 3 shown as parts of the IBM Cloud Pak for Data software package, running on Red Hat OpenShift.

Figure 3. Handling Db2 for z/OS data and metadata in Cloud Pak for Data on AWS.

This implementation offers a modern solution for synchronizing and accessing Db2 for z/OS data, facilitating analytics and AI initiatives with minimal effort and expense. It also allows workloads to access a transactionally consistent copy of Db2 for z/OS data hosted on the AWS Cloud.

With Db2 for z/OS Data Gate, you can seamlessly replicate mainframe data to destinations such as Amazon Relational Database Service (RDS) for Db2 or IBM Db2 on Cloud Pak for Data. This provides continuous data synchronization, ensuring your data is always current while minimizing the impact on mainframe transactions.

IBM Cloud Pak for Data

IBM Db2 for z/OS Data Gate (Db2 Data Gate) simplifies delivering synchronized data from IBM Db2 for z/OS to IBM Cloud Pak for Data on AWS for direct access. Its capabilities provide:

  • One-click metadata integration with IBM Knowledge Catalog on Cloud Pak for Data
  • Query acceleration support for analytical Db2 for z/OS queries
  • Simplified installation and handling of stored procedures on Db2 for z/OS when both Db2 Data Gate and IBM Db2 for z/OS Analytics Accelerator are used
  • Optimizations to integrate within hybrid cloud deployment models to position Db2 for z/OS data into the center of the IBM data fabric strategy

This continuous data synchronization process minimizes lag between the mainframe and the cloud, increasing data accuracy and availability. However, it’s important to note that this solution requires expertise in mainframe technologies and significant upfront effort to get up and running.

Metadata processing plays a key role in effective data governance and cataloguing. IBM Knowledge Catalog is an enterprise data catalogue solution that helps you automate metadata processing and simplify the cataloguing of data assets, making it easier for users to discover and understand available data sets. IBM Knowledge Catalog is available on AWS as part of IBM Cloud Pak for Data.

Example Approach: AWS Glue

Another option you can adopt is using AWS services, especially if you are already using AWS Glue or rely on other analytics services on AWS to derive insights (Figure 4).

Diagram from Figure 2, but organizing Steps 2 and 3 using AWS native services and AWS Glue.

Figure 4. Using AWS Glue to transfer data from Db2 for z/OS over to AWS.

AWS Glue runs extract, transform, and load (ETL) jobs to transfer data from Db2 for z/OS over JDBC connections, persisting the processed data to an Amazon S3 data store in Parquet or other supported data formats.
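One way to set this up programmatically is to define the Glue ETL job through the AWS Glue API. The sketch below builds the keyword arguments for a `create_job` call; the job name, IAM role ARN, S3 paths, and connection name are all placeholders, not a working configuration.

```python
# Sketch: defining an AWS Glue ETL job that reads Db2 for z/OS over JDBC.
# All names, ARNs, and S3 paths below are illustrative placeholders.
def build_glue_job_definition(job_name, role_arn, script_s3_path, connection_name):
    """Return keyword arguments suitable for glue.create_job()."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                # Spark-based ETL job
            "ScriptLocation": script_s3_path, # the PySpark extract script
            "PythonVersion": "3",
        },
        # The catalog connection that holds the JDBC details for Db2 for z/OS.
        "Connections": {"Connections": [connection_name]},
        "DefaultArguments": {
            "--TempDir": "s3://example-bucket/glue-temp/",  # placeholder bucket
        },
        "GlueVersion": "4.0",
    }

job_def = build_glue_job_definition(
    "db2-zos-to-s3",
    "arn:aws:iam::123456789012:role/GlueJobRole",
    "s3://example-bucket/scripts/db2_extract.py",
    "db2-zos-jdbc",
)
# In a real account you would then call:
#   boto3.client("glue").create_job(**job_def)
```

The script referenced by `ScriptLocation` would read the source tables over the JDBC connection and write Parquet to Amazon S3.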

AWS Transfer Family and AWS DataSync move non-relational data sources, such as VSAM files, from on-premises corporate data centers to Amazon S3. Once the raw data from flat files is available on Amazon S3, customers can run AWS Glue ETL jobs to transform and prepare the data for consumption by other workloads.

Once the mainframe relational and non-relational data is available in Amazon S3 data lakes, customers trigger AWS Glue crawlers to scan and classify it, extract schema information, and store the metadata automatically in the AWS Glue Data Catalog.
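A crawler can likewise be registered through the Glue API. The sketch below assembles the arguments for `create_crawler`; the crawler name, role ARN, S3 path, database name, and schedule are assumptions for illustration.

```python
# Sketch: registering a Glue crawler over S3 data landed from the mainframe.
# Names, role ARN, S3 path, and schedule are illustrative placeholders.
def build_crawler_definition(name, role_arn, s3_path, database_name):
    """Return keyword arguments suitable for glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database_name,           # catalog database for the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Re-crawl on a schedule so newly landed files and partitions
        # are classified and catalogued automatically.
        "Schedule": "cron(0 2 * * ? *)",         # daily at 02:00 UTC
    }

crawler_def = build_crawler_definition(
    "mainframe-data-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "s3://example-bucket/mainframe/",
    "mainframe_db",
)
# In a real account:
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_def)
#   glue.start_crawler(Name=crawler_def["Name"])
```

Once the crawler has run, the discovered tables are queryable from services that read the Data Catalog.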

AWS Glue Data Catalog allows you to seamlessly store, annotate, and share your metadata on AWS, with a highly scalable collection of tables organized into databases. This helps you keep a uniform repository, where different systems can store and find metadata to keep track of data in data silos.

Customers can build these types of workflows from AWS Glue blueprints, or create them manually one component at a time using the AWS Management Console or the AWS Glue API.

AWS Glue provides a one-stop integration option for modern workloads, such as ecommerce, mobile, data virtualization, and lake house construction, that operate in complex hybrid-cloud environments with governance requirements. It provides a comprehensive data catalog solution to help manage metadata at scale, data source connectivity using your own JDBC drivers, protection of your sensitive data source credentials with AWS Secrets Manager, and comprehensive audit and governance capabilities.

Network requirements in this example

In this scenario, the network connectivity between your Amazon Virtual Private Cloud (Amazon VPC) and your on-premises network must be configured using AWS Site-to-Site VPN or AWS Direct Connect (DX). Your on-premises firewall must be configured to allow communication between AWS Glue and your Db2 for z/OS running on premises.

To access your on-premises data sources, AWS Glue uses elastic network interfaces (ENIs) in an Amazon VPC private subnet to provide network connectivity through your VPC. Security groups attached to the ENIs and configured by the selected JDBC connection control the traffic allowed to leave and reach your ENIs.
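The connection that ties these pieces together can be captured as a Glue catalog connection. The sketch below builds a `ConnectionInput` for `create_connection`; the subnet ID, security group, JDBC URL, and Secrets Manager secret name are placeholders, and keeping credentials in a secret (rather than in the connection itself) is the assumed design here.

```python
# Sketch: a Glue JDBC connection whose ENIs land in a private subnet.
# Subnet, security group, JDBC URL, and secret name are placeholders.
def build_jdbc_connection(name, subnet_id, security_group_ids, jdbc_url, secret_id):
    """Return the ConnectionInput for glue.create_connection()."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            # Credentials live in AWS Secrets Manager, not in the catalog.
            "SECRET_ID": secret_id,
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,                   # private subnet for the ENIs
            "SecurityGroupIdList": security_group_ids,
        },
    }

conn_input = build_jdbc_connection(
    "db2-zos-jdbc",
    "subnet-0abc1234",
    ["sg-0abc1234"],
    "jdbc:db2://mainframe.example.com:446/DB2LOC",
    "db2-zos-credentials",
)
# In a real account:
#   boto3.client("glue").create_connection(ConnectionInput=conn_input)
```

The security groups listed here are the ones that must permit traffic to and from the ENIs, matching the firewall rules on the mainframe side.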

Summary

To enhance agility, maximize investment value, and accelerate innovation, customers can choose between different approaches to implementing mainframe data modernization solutions.

By comparing these methods, your business can make informed decisions based on its specific needs and priorities. Whether you prioritize data currency or a simplified, ready-to-use experience, the combination of IBM software on the AWS Cloud and AWS services provides flexible options for building modern applications that take advantage of your mainframe data.

Additional resources