Migration & Modernization
Integration architectures between mainframe and AWS for coexistence
In this blog post, we show you how to design integration patterns and solutions for a hybrid architecture during the transition phase.
Mainframe environments typically involve complex and tightly coupled integrations between applications which share data and/or code. As mainframe applications are migrated to the AWS Cloud, an incremental approach using the Strangler fig pattern is recommended for large migrations. The incremental approach results in building integrations that create a hybrid architecture between the mainframe and AWS Cloud during the transitional (migration) or transformational (modernization) phase.
Overview
Mainframe workloads are typically defined as a set of programs, middleware, data stores, dependencies, and resources that execute a cohesive set of business functions.
AWS proposes multiple patterns to modernize mainframe workloads, depending on the customer’s business and technical strategy objectives. Broadly, we can categorize these options into two groups.
- Migration & Modernization (Figure 1.1 – left)
- Augmentation & Integration (Figure 1.2 – right)
A workload migration or modernization aims to offload components from the mainframe and bring them to the AWS Cloud using a series of strategies chosen according to the application and migration objectives (replatform, refactor, rewrite, repurchase).
A workload augmentation aims to build new business functions on AWS, leveraging the data of the mainframe.
Both approaches require an integration architecture to facilitate the coexistence between the mainframe and the AWS environment. This involves managing the interaction between the workloads that will remain on the mainframe during the transition phase or permanently and the workloads created or migrated to the AWS Cloud.
Approach
Typically, large mainframe workloads run in parallel and are tightly coupled to each other. With the strangler fig pattern, each workload is migrated separately. At a high level, workloads are migrated one after the other. We prioritize workload migrations based on business value, application complexity, integration points, and business criticality. Over time, we decouple the mainframe workload by workload.
As we start migrating a mainframe workload, other workloads remain tightly integrated with it. They have application to application, data to data, or application to data integrations. Figure 2 illustrates a scenario where some of the workloads are migrated to AWS while the others remain on the mainframe.
Figure 3 describes the three different types of integration.
- Application to application
- Application to data
- Data to data
The various integration types are not mutually exclusive; rather, they can complement each other. The selection of the integration type is primarily influenced by the existing integration setup on the mainframe between the workloads. For example, if workload 1 interacts with workload 2 through an application component (such as CICS, COBOL, or MQ calls), then an application-to-application pattern must be established. Conversely, if workload 1 requires access to the data of workload 2, then either a data-to-data or an application-to-data pattern needs to be implemented. The decision between these patterns and the associated technical implementation is based mainly on three key criteria: throughput, performance, and the location of the primary data.
Integration patterns
The patterns below help explain the application integrations in a coexistence scenario and the solutions available for them. There are multiple products on the market that provide these capabilities; a few of them are discussed here.
Pattern 1 – Application to application integration pattern
Integration between applications, called the application to application integration pattern, refers to the process of connecting two software applications or systems to enable them to work together. There are several types of architecture and integration, each serving different purposes and catering to various needs.
Architecturally, there are multiple patterns like Hub-and-Spoke, Enterprise Service Bus (ESB), or API management to integrate applications. These architecture patterns involve a central integration hub or a middleware platform that acts as an intermediary between the mainframe and other environments. Each application only needs to integrate with the hub, ESB, or the API management layer, which then manages data routing and transformation between the connected systems. This approach can simplify integration management and maintenance. The connectivity between the central hub, ESB, or API management layer and the mainframe will rely on the point-to-point integration patterns described in Figure 4.
Here are some of the most common integration types between the AWS Cloud and a mainframe.
Point-to-point using JCA connectors
In this type of integration, the two applications are directly connected to each other to exchange data. Point-to-point integration using Java Connector Architecture (JCA) connectors involves establishing direct connections between Java EE applications and mainframe subsystems such as CICS, IMS TM, or Db2 stored procedures. Point-to-point integration with JCA connectors offers benefits such as improved performance, scalability, support for transactionality, and security by establishing direct connections between Java applications and the mainframe. However, it can also introduce tight coupling between the integrated systems, making it less flexible and harder to maintain than more loosely coupled integration approaches such as messaging or APIs.
The three main point-to-point solutions to integrate with CICS, IMS, and Db2 are:
- CICS Transaction Gateway (CTG) to integrate with CICS. CTG can be deployed on z/OS or on open systems.
- IMS Connect to integrate with IMS. IMS Connect must be deployed on z/OS.
- Direct JDBC connection from an external application to trigger Db2 z/OS stored procedures.
Another notable aspect of point-to-point integration using JCA connectors is its unidirectional nature. This means that the integration flows from the AWS Cloud to the mainframe and not vice versa, except in the case of IMS Connect, which supports bidirectional communication.
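To illustrate the Db2 case, the following is a minimal sketch of an application on AWS calling a Db2 for z/OS stored procedure over a direct JDBC connection. It assumes the IBM Db2 JDBC (JCC) driver is on the classpath; the host, port, location, credentials, and procedure name are hypothetical placeholders, not values from a real environment.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

// Minimal sketch: calling a Db2 for z/OS stored procedure over JDBC from an
// application running on AWS. Host, port, location, credentials, and the
// procedure name GET_ACCOUNT_BALANCE are placeholders for illustration only.
public class Db2StoredProcClient {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:db2://mainframe.example.com:446/DB2LOC"; // type-4 JCC URL (assumed values)
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
             CallableStatement stmt = conn.prepareCall("{CALL SPROCS.GET_ACCOUNT_BALANCE(?, ?)}")) {
            stmt.setString(1, "ACCT-0001");                        // IN parameter: account id
            stmt.registerOutParameter(2, java.sql.Types.DECIMAL);  // OUT parameter: balance
            stmt.execute();
            System.out.println("Balance: " + stmt.getBigDecimal(2));
        }
    }
}
```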
API-based integration
RESTful API-based integration offers a flexible and standardized approach to integrating software systems. It enables interoperability, scalability, and ease of development. RESTful APIs are widely used in various domains, including web development, mobile apps, cloud computing, and the Internet of Things (IoT). Applications using RESTful API-based integration must be designed so that the lack of transactional context propagation across the two environments is mitigated (for example, using patterns such as saga or compensation mechanisms). Failure to do so can lead to inconsistencies and synchronization issues.
Solutions such as IBM z/OS Connect or OpenLegacy are available to API-enable mainframe subsystems. z/OS Connect allows mainframe assets, such as programs, data, and transactions, to be exposed as RESTful APIs. This enables these assets to be accessed and consumed by a wide range of modern applications and services in the cloud. One of the significant advantages of z/OS Connect is its bidirectional integration capability: it enables communication between modern applications and mainframe systems in both directions. This means that not only can modern applications consume services and data from the mainframe, but mainframe transactions and applications can also consume applications and services from the AWS Cloud.
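As an illustration, here is a minimal sketch of a Java service on AWS invoking a mainframe program exposed as a RESTful API (for example, through z/OS Connect), using the standard Java HTTP client. The endpoint URL, resource path, and response handling are assumptions, and authentication is omitted for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch: an AWS-hosted Java service invoking a mainframe program exposed as a
// RESTful API (for example through z/OS Connect). The endpoint URL and path are
// hypothetical; authentication (TLS client certificates, tokens) is omitted.
public class MainframeApiClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://zosconnect.example.com/accounts/ACCT-0001"))
                .header("Accept", "application/json")
                .GET()
                .build();
        // The API layer maps this REST call to the underlying CICS/IMS transaction.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```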
Message oriented and event driven integration
Applications communicate asynchronously through messages, which are queued and delivered reliably between systems. Message-oriented and event-driven integration can support various messaging patterns, such as publish-subscribe or request-reply. IBM MQ is one of the key messaging middleware products that facilitate communication and data exchange between the mainframe and the AWS Cloud. It can be used to integrate with the mainframe by leveraging the publish-subscribe or request-reply pattern.
Another option is integrating Kafka with the mainframe through IBM MQ. This involves using Kafka Connect with the appropriate connectors to establish communication between Kafka and MQ. Kafka Connect can run on the mainframe or in the cloud. It simplifies the integration process by providing a framework for building and deploying connectors that stream data between Kafka and mainframe applications. Kafka allows additional consumers to subscribe to topics relevant to their domain without additional integration work between the mainframe and the AWS Cloud.
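The sketch below shows the AWS side of such an integration: a plain Kafka consumer reading from a topic that a Kafka Connect source connector populates from a mainframe MQ queue. The broker address, topic name, and consumer group are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Minimal sketch: an AWS-side consumer reading events published to a Kafka topic that is
// fed from a mainframe MQ queue via Kafka Connect. Broker, topic, and group id are assumed.
public class MainframeEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.example.com:9092");
        props.put("group.id", "mainframe-events");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("mainframe.workload2.events")); // topic fed from MQ (assumed name)
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record carries a message originally put on an MQ queue by a mainframe workload.
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```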
Pattern 2 – Data to data integration pattern
Once a workload is migrated to the AWS Cloud while others remain on the mainframe, there are various methods, based on frequency, to transmit data to or from the mainframe. Figure 5 illustrates the different integration patterns that must be built to support the data transmission need and frequency.
Near real time data transfer
Near real time data transfer is a process that enables data updates to be replicated from one platform to another in near real time. The tools involved use change data capture (CDC) to replicate data in near real time based on change logs. Data transfer needs can be unidirectional, two-directional, or bidirectional.
Unidirectional means that the data needs to be replicated either from mainframe data sources to AWS data sources or vice versa. Two-directional means that data replication needs to happen both ways, but to different and unrelated tables. Bidirectional means that replication needs to happen both ways to related tables. Bidirectional replication should be a last resort because it presents additional challenges of data conflicts due to updates to related tables on both platforms. As we migrate applications from the mainframe to AWS, updates from applications on one platform can be made available on the other in near real time.
The AWS Mainframe Modernization service provides data replication between the mainframe and AWS using AWS Mainframe Modernization Data Replication, powered by Precisely CDC technology. It allows near real-time replication of heterogeneous data from mainframe and IBM i data sources such as Db2, IMS, and VSAM to a wide range of AWS Cloud database destinations and vice versa. The replication leverages low-latency CDC technology in which changes made to the source database are propagated in near real time to the target databases, while also ensuring data consistency, accuracy, freshness, and validity. This capability enables a range of use cases, including coexistence scenarios, analytics, and the creation of new channels.
File based transfers
File-based transfer mechanisms to move data out of the mainframe already exist in most enterprises. Mechanisms such as NDM and SFTP can be used to support file transfer. One of the challenges in file transfer between the mainframe and open systems is the difference in data formats. For files that do not contain mainframe COMP, COMP-3, or other binary fields, SFTP and NDM can transfer the data as is (converting EBCDIC to ASCII or another chosen character set). For files that contain binary fields, specific conversion software is needed. The AWS Mainframe Modernization service offers file transfer capabilities to support a variety of coexistence, augmentation, and migration use cases. With AWS Mainframe Modernization File Transfer, you can transfer and convert datasets and files with a fully managed service to accelerate and simplify modernization, migration, and augmentation use cases to the AWS Mainframe Modernization service and Amazon S3.
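As a simple illustration of the character-set conversion for text-only files, the following sketch decodes a transferred dataset from EBCDIC and writes it back as UTF-8. The IBM037 code page and file names are assumptions (and the code page must be available in the JDK's extended charsets); files with COMP, COMP-3, or other binary fields need dedicated conversion tooling, as noted above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: converting a transferred text-only dataset (no COMP/COMP-3 binary
// fields) from EBCDIC to UTF-8 after an SFTP/NDM transfer. Code page and paths are assumed.
public class EbcdicToAsciiConverter {
    public static void main(String[] args) throws Exception {
        Charset ebcdic = Charset.forName("IBM037");           // common US EBCDIC code page (assumed)
        byte[] raw = Files.readAllBytes(Path.of("CUSTOMER.DAT"));
        String text = new String(raw, ebcdic);                // decode EBCDIC bytes to text
        Files.writeString(Path.of("customer.txt"), text, StandardCharsets.UTF_8);
    }
}
```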
Extract, transform, and load (ETL) based transfers
An ETL based transfer is a data integration and transfer mechanism to move data from the mainframe to AWS. The mainframe source data (for example, VSAM or Db2) is extracted, organized, and cleansed as part of the transform process, then uploaded to AWS. The ETL process typically uses JDBC connections to the source and target. This methodology is supported by specialized ETL tools such as AWS Glue, or ISV products such as IBM DataStage, Informatica, and Precisely Connect, to move data from mainframe data sources to AWS data sources or vice versa.
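The following sketch shows the kind of JDBC-based extract-and-load step such tools perform: rows are read from a Db2 z/OS source table and batch-inserted into a target table on AWS (for example, Amazon RDS for PostgreSQL). Connection strings, credentials, and table names are placeholders, and the transformation logic is reduced to a trivial cleansing step.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch of a JDBC extract-and-load step: read rows from a Db2 z/OS source table
// and batch-insert them into a target table on AWS. All names and URLs are placeholders.
public class JdbcExtractLoad {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection(
                     "jdbc:db2://mainframe.example.com:446/DB2LOC", "srcuser", "srcpwd");
             Connection tgt = DriverManager.getConnection(
                     "jdbc:postgresql://rds.example.com:5432/appdb", "tgtuser", "tgtpwd");
             Statement extract = src.createStatement();
             ResultSet rows = extract.executeQuery("SELECT ACCT_ID, BALANCE FROM PROD.ACCOUNTS");
             PreparedStatement load = tgt.prepareStatement(
                     "INSERT INTO accounts (acct_id, balance) VALUES (?, ?)")) {
            while (rows.next()) {
                load.setString(1, rows.getString("ACCT_ID").trim()); // simple cleansing example
                load.setBigDecimal(2, rows.getBigDecimal("BALANCE"));
                load.addBatch();
            }
            load.executeBatch(); // load the transformed rows into the AWS target
        }
    }
}
```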
Archived data transfer
Mainframe proprietary storage solutions such as virtual tape libraries (VTLs) hold valuable data locked in a platform with complex tools, which can lead to higher compute and storage costs on the mainframe for data retrieval tasks. The archived data transfer pattern helps move this data from mainframe tapes to Amazon S3. For example, BMC AMI Cloud enables customers to move their mainframe tape data to Amazon S3.
Pattern 3 – Application to data integration pattern
This option implements an application to data integration across platforms (Figure 6). By application to data integration, we mean an application running on either AWS or the mainframe that accesses data hosted remotely on the other platform.
It is generally preferable to use data-to-data integration to enable local data access and avoid the latency impacts associated with remote data access. However, if the data is too tightly coupled, implementing a data-to-data integration pattern becomes challenging. In these instances, application-to-data integration might be more suitable.
There are two variations of the application-to-data integration pattern.
- Application to data with one single copy of the data.
- Application to data leveraging dual writes.
Single copy of data pattern
In this variation of the pattern, there is a single source of truth for the data, which exists either on AWS or the mainframe. Any application not local to the data must perform remote access using techniques such as JDBC or a gateway. While this pattern simplifies data management by maintaining a single data copy, it introduces latency for remote applications accessing the data, impacting the overall performance of the application.
- Application on AWS, database on mainframe – In this type of integration, applications in the cloud are directly connected to mainframe databases. Point-to-point integration with Java Connector Architecture (JCA) connectors offers benefits such as standardized interfaces, improved performance, portability, scalability, support for transactionality, and security by establishing direct connections between Java applications in the cloud and databases on the mainframe. However, JCA and JDBC connections introduce tight coupling between the integrated systems, making them less flexible and harder to maintain. Point-to-point integrations using JCA connectors or JDBC are unidirectional in nature, which means the integration flows from the application in the cloud to the mainframe database only.
- Application on mainframe, database on AWS or vice versa – There are various channels for integrating applications on the mainframe directly with a database on AWS or vice versa. Applications on the mainframe can use a Db2 federated server to access a database on AWS and vice versa. This can reduce ambiguity and requires storing only one copy of the data, thereby reducing operational complexity.
Federation is a data access technique that provides real-time access to heterogeneous data in a unified way, allowing consumption by distributed applications and databases on AWS, or vice versa, with minimal configuration overhead. Federated servers introduce a layer of complexity in terms of joining data from different datastores, which can impact query performance and the scalability of the application.
Virtualization is another data management technique that allows applications to access and modify data without needing to know its format or location. IBM Data Virtualization Manager for z/OS (DVM) creates a single representation of data from multiple sources without the need to copy or move the data, allowing distributed applications and databases on AWS to access the various datastores (IMS, IDMS, or Db2) and file types (sequential, VSAM, VSAM CICS, ADABAS, or MQ) on the mainframe. Virtualization masks the underlying data implementation from application developers, safely exposing mainframe assets as APIs to distributed applications and databases on AWS. Compared to data federation, data virtualization is limited to simpler processing, such as database joins and rudimentary transformations.
Dual writes pattern
In this variation of the pattern, there are two copies of the data, one on AWS and one on the mainframe. Instead of using a replication mechanism, the application performs dual writes (inserts/updates) to both locations. This pattern reduces latency impact because read operations are local, while write operations are performed both locally and remotely. It is well suited for applications with infrequent writes and frequent reads. The major drawback is the complexity introduced at the application level to perform dual writes within a single transaction, ensuring data integrity and consistency. This pattern provides real-time data copies at both locations, unlike data-to-data integration, which offers near real-time synchronization.
- Application on AWS and database on AWS and mainframe – In this type of integration, we keep synchronous copies of the data on AWS as well as on the mainframe. The applications on AWS are directly connected to AWS databases and mainframe databases simultaneously. This integration is achieved using JCA (Java Connector Architecture) connectors, establishing direct connections between Java EE applications on AWS, databases on AWS, and mainframe databases via JDBC. The choice of dual writes adds data resiliency to the architecture, but it can introduce performance issues in the application. The characteristics and nature of the integration are otherwise similar to the single copy of data pattern with the application on AWS and the database on the mainframe (see the sketch after this list).
- Application on mainframe and database on mainframe and AWS – The channels for integrating applications on the mainframe directly with a mainframe database as well as a database on AWS are similar to the single copy of data pattern, with the only distinction being that synchronous copies of the data are stored on the mainframe as well as on AWS.
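Here is a minimal sketch of the dual-write approach, assuming both copies are relational and reachable over JDBC. Connection details and SQL are placeholders; a production implementation would typically rely on a transaction manager (for example, XA/two-phase commit) or a compensation mechanism rather than this manual commit-and-rollback logic.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Minimal sketch of a dual write: the application updates both the AWS copy and the
// mainframe copy of a record and rolls both back if either update fails.
// Connection details, table names, and SQL are placeholders for illustration only.
public class DualWriteService {
    public void updateBalance(String acctId, java.math.BigDecimal balance) throws Exception {
        try (Connection aws = DriverManager.getConnection(
                     "jdbc:postgresql://rds.example.com:5432/appdb", "awsuser", "awspwd");
             Connection mainframe = DriverManager.getConnection(
                     "jdbc:db2://mainframe.example.com:446/DB2LOC", "mfuser", "mfpwd")) {
            aws.setAutoCommit(false);
            mainframe.setAutoCommit(false);
            try (PreparedStatement awsStmt = aws.prepareStatement(
                         "UPDATE accounts SET balance = ? WHERE acct_id = ?");
                 PreparedStatement mfStmt = mainframe.prepareStatement(
                         "UPDATE PROD.ACCOUNTS SET BALANCE = ? WHERE ACCT_ID = ?")) {
                awsStmt.setBigDecimal(1, balance);
                awsStmt.setString(2, acctId);
                awsStmt.executeUpdate();          // local write on AWS
                mfStmt.setBigDecimal(1, balance);
                mfStmt.setString(2, acctId);
                mfStmt.executeUpdate();           // remote write on the mainframe
                aws.commit();
                mainframe.commit();               // note: the two commits are not atomic across systems
            } catch (Exception e) {
                aws.rollback();
                mainframe.rollback();
                throw e;
            }
        }
    }
}
```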
Conclusion
As large customers migrate their mainframe applications to AWS, some will adopt an incremental approach using the strangler fig pattern to minimize the risks associated with a big-bang transition. This approach necessitates interoperability between the mainframe and the AWS Cloud. This post summarized the various integration patterns that facilitate this interoperability. There is not a one-size-fits-all solution for all integration scenarios. Each pattern has its own advantages and disadvantages. Careful consideration is required when choosing between these integration patterns. Key factors for the decision include throughput, performance, transactional context propagation, integrity, and the location of primary data.
AWS recommends connecting with an AWS mainframe modernization specialist (mainframe@amazon.com) as you embark on this journey.