Mainframe data integration: Using mainframe data to build cloud native services with AWS
For International Women’s Day and Women’s History Month, we’re featuring more than a week’s worth of posts that highlight female builders and leaders. We’re showcasing women in the industry who are building, creating, and, above all, inspiring, empowering, and encouraging everyone—especially women and girls—in tech.
Many companies in the financial services and insurance industries rely on mainframes for their most business-critical applications and data. But mainframe workloads typically lack agility. This is one reason that organizations struggle to innovate, iterate, and pivot quickly to develop new applications or release new capabilities. Unlocking the value of mainframe data is one option in your modernization journey.
In this blog post, we will discuss some typical data integration patterns. Your goal might be developing new functions using mainframe data or creating new channels. Or, you might want to augment your mainframe capabilities with analytics, artificial intelligence (AI), and machine learning (ML). Your migration may require a hybrid architecture, and you need guidance on how to integrate the data.
The mainframe data integration patterns in this post use software services that facilitate data replication to Amazon Web Services (AWS):
- File-based data synchronization
- Change data capture
- Event-sourced replication
Once data is liberated from the mainframe, you can develop new agile applications for deeper insights using analytics and ML. Or you can extend the mainframe application capabilities by creating a microservices- or voice-based mobile application. For example, a bank that can access its historical mainframe data to analyze customer behavior could develop a new loan-recommendation solution based on customer profiles.
If you are considering mainframe migration and modernization at scale, we recommend a tool-based, industrialized approach using the AWS Mainframe Modernization service. This approach reduces risk and accelerates the business benefits of modernization.
Solution overview: Mainframe data integration
Mainframe integration: Architecture reference patterns
File-based batch integration
Integration scenarios often require replicating files to AWS, or synchronizing between on-premises and AWS. Use cases include:
- Analyzing current and historical data to enhance business analytics
- Providing data for further processing on downstream or upstream dependent systems. This is necessary for exchanging data between applications running on the mainframe and applications running on AWS
File-based batch integration – Batch ingestion for interactive data analytics (Figure 2)
- Data ingestion. In this example, we show how data can be ingested into Amazon S3 using AWS Transfer Family or AWS DataSync. Mainframe data is typically encoded in Extended Binary Coded Decimal Interchange Code (EBCDIC) format. AWS Prescriptive Guidance covers converting EBCDIC data to ASCII (a minimal conversion sketch follows this list).
- Data transformation. Before moving data to AWS data stores, you may need to transform it for analytics. Services such as AWS Glue and AWS Lambda can be used to transform the data. For large-volume processing, use Apache Spark on Amazon EMR, or a custom Spring Boot application running on Amazon EC2, to perform these transformations. This process can be orchestrated using AWS Step Functions or AWS Data Pipeline.
- Data store. Data is transformed into a consumable format that can be stored in Amazon S3.
- Data consumption. You can use AWS analytics services like Amazon Athena for interactive ad-hoc query access, Amazon QuickSight for analytics, and Amazon Redshift for complex reporting and aggregations (a minimal Athena query sketch also follows this list).
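To make the ingestion step concrete, here is a minimal sketch of converting a fixed-width EBCDIC extract to ASCII CSV and landing it in Amazon S3 with boto3. The record layout, field names, and bucket name are hypothetical, and the sketch assumes code page 037 and character data only; real copybooks often contain packed-decimal (COMP-3) fields that need dedicated conversion tooling.

```python
import codecs
import csv
import io

import boto3

# Hypothetical fixed-width record layout (byte offsets per a COBOL copybook).
# Real layouts come from the copybook; packed-decimal (COMP-3) fields need
# dedicated conversion and are not handled here.
FIELDS = [("account_id", 0, 10), ("customer_name", 10, 40), ("balance", 40, 52)]
RECORD_LENGTH = 52


def ebcdic_to_csv(raw_bytes: bytes) -> str:
    """Decode EBCDIC (code page 037) fixed-width records into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([name for name, _, _ in FIELDS])
    for offset in range(0, len(raw_bytes), RECORD_LENGTH):
        record = codecs.decode(raw_bytes[offset:offset + RECORD_LENGTH], "cp037")
        writer.writerow([record[start:end].strip() for _, start, end in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    # File previously transferred from the mainframe (name is illustrative).
    with open("customer_extract.ebcdic", "rb") as f:
        csv_text = ebcdic_to_csv(f.read())

    # Land the converted file in S3 for downstream transformation and analytics.
    boto3.client("s3").put_object(
        Bucket="example-mainframe-landing-bucket",  # assumed bucket name
        Key="converted/customer_extract.csv",
        Body=csv_text.encode("utf-8"),
    )
```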
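For the consumption step, the following sketch runs an ad-hoc Amazon Athena query with boto3 and prints the results. The database, table, column names, and results bucket are assumptions for illustration; in practice they would come from your AWS Glue Data Catalog and environment.

```python
import time

import boto3

athena = boto3.client("athena")

# Start an ad-hoc query against the converted mainframe data.
# The database, table, and output location below are placeholders.
query = athena.start_query_execution(
    QueryString="SELECT account_id, balance FROM customer_extract WHERE balance > 10000",
    QueryExecutionContext={"Database": "mainframe_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results-bucket/"},
)

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query["QueryExecutionId"])
    status = state["QueryExecution"]["Status"]["State"]
    if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if status == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query["QueryExecutionId"])
    for row in rows["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```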
File-based batch integration – File upload to operational data stores for further processing (Figure 3)
- Using AWS Transfer Family, upload CSV files to Amazon S3.
- Once the files are uploaded, an Amazon S3 event notification can invoke an AWS Lambda function that loads the data into Amazon Aurora. For low-latency data access requirements, you can use a scalable serverless import pattern with AWS Lambda and Amazon SQS to load the data into Amazon DynamoDB (a minimal loader sketch follows this list).
- Once the data is in data stores, it can be consumed for further processing.
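Here is a minimal sketch of the serverless import described above: an AWS Lambda handler triggered by an S3 event notification that parses the uploaded CSV file and batch-writes the rows to Amazon DynamoDB. The table name, key attribute, and CSV columns are assumptions; a production loader would add validation, error handling, and, for very high volumes, the SQS-based buffering mentioned above.

```python
import csv
import urllib.parse

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("CustomerAccounts")  # assumed table name


def handler(event, context):
    """Triggered by an S3 event notification when a CSV file is uploaded."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the uploaded CSV file (assumed to have a header row).
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = csv.DictReader(body.splitlines())

        # Batch-write rows to DynamoDB; batch_writer handles batching and retries.
        with table.batch_writer() as batch:
            for row in rows:
                batch.put_item(Item={
                    "account_id": row["account_id"],      # partition key (assumed)
                    "customer_name": row["customer_name"],
                    "balance": row["balance"],            # stored as a string here
                })
    return {"status": "loaded"}
```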
Transactional replication-based integration (Figure 4)
Several integration scenarios require continuous near-real-time replication of relational data to keep a copy of the data in the cloud. Change Data Capture (CDC) for near-real-time transactional replication works by capturing change log activity to drive changes in the target dataset. Use cases include:
- Command Query Responsibility Segregation (CQRS) architectures that use AWS to serve all read-only and retrieval functions
- On-premises systems with tightly coupled applications that require a phased modernization
- Real-time operational analytics
- CDC tools in the AWS Marketplace can be used to manage real-time data movement between the mainframe and AWS.
- You can use a fan-out pattern to read once from the mainframe to reduce processing requirements and replicate data to multiple data stores based on your requirements:
- For low-latency requirements, replicate to Amazon Kinesis Data Streams and use AWS Lambda to store the data in Amazon DynamoDB (a minimal consumer sketch follows this list).
- For critical business functionality with complex logic, use Amazon Aurora or Amazon Relational Database Service (Amazon RDS) as the target.
- To build a data lake or to use S3 as an intermediary for ETL processing, replicate to Amazon S3 as the target.
- Once the data is in AWS, you can build agile microservices for read-only functions.
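As an illustration of the low-latency fan-out path, the sketch below shows an AWS Lambda consumer attached to the Kinesis data stream that applies each CDC change record to an Amazon DynamoDB table. The JSON shape of the change record (an operation type plus a row image keyed by account_id) and the table name are assumptions about what the CDC tool publishes.

```python
import base64
import json

import boto3

table = boto3.resource("dynamodb").Table("AccountBalances")  # assumed target table


def handler(event, context):
    """Lambda consumer for the Kinesis data stream carrying CDC change records."""
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded. The JSON shape below
        # (operation plus row image keyed by account_id) is an assumption about
        # what the CDC tool emits; attribute values are assumed to be strings
        # or integers (DynamoDB rejects Python floats).
        change = json.loads(base64.b64decode(record["kinesis"]["data"]))

        if change["operation"] in ("INSERT", "UPDATE"):
            table.put_item(Item=change["row"])
        elif change["operation"] == "DELETE":
            table.delete_item(Key={"account_id": change["row"]["account_id"]})
```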
Message-oriented middleware (event sourcing) integration (Figure 5)
With message-oriented middleware (MOM) systems like IBM MQ on mainframe, several scenarios require integrating with cloud-based streaming and messaging services. These act as a buffer to keep your data in sync. Use cases include:
- Consume data from AWS data stores to enable new communication channels, such as mobile or voice-based applications and ML-driven innovations
- Migrate the producer (sender) and consumer (receiver) applications that communicate with on-premises MOM platforms to AWS, with the end goal of retiring the on-premises MOM platform
- Mainframe transactions from IBM MQ can be read using a connector or a bridge solution. They can then be published to Amazon MQ queues or Amazon Managed Streaming for Apache Kafka (Amazon MSK) topics.
- Once the data is published to the queue or topic, consumers implemented as AWS Lambda functions or running on Amazon compute services can process, map, transform, or filter the messages. They can store the data in Amazon RDS, Amazon ElastiCache, Amazon S3, or Amazon DynamoDB (a minimal consumer sketch follows this list).
- Now that the data resides in AWS, you can build new cloud-native applications and do the following:
- Push notifications. Use Amazon RDS or Amazon S3 event triggers to call Amazon Simple Notification Service (Amazon SNS) to push notifications to mobile devices.
- Build innovative services. Invoke ML services such as Amazon SageMaker directly from these data stores. Build voice interfaces using Amazon API Gateway, Amazon Lex, or Amazon Alexa skills.
- Enable new functions. Build new applications with business logic residing in microservices hosted by AWS Lambda or in containers within Amazon Elastic Container Service (ECS).
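To illustrate the consumer side of this pattern, here is a minimal sketch of an AWS Lambda function behind an Amazon MSK event source mapping. It decodes the Kafka messages bridged from IBM MQ and stores them in Amazon DynamoDB; the topic, message shape, and table name are assumptions about what the MQ-to-Kafka bridge publishes.

```python
import base64
import json

import boto3

table = boto3.resource("dynamodb").Table("MainframeTransactions")  # assumed table


def handler(event, context):
    """Lambda consumer for an Amazon MSK event source mapping.

    The MSK event groups records by topic-partition, and each record value is
    base64-encoded. The JSON message shape below is an assumption about what
    the MQ-to-Kafka bridge publishes.
    """
    for topic_partition, records in event["records"].items():
        for record in records:
            message = json.loads(base64.b64decode(record["value"]))

            # Filter, map, or transform here as needed, then persist.
            table.put_item(Item={
                "transaction_id": message["transaction_id"],  # partition key (assumed)
                "account_id": message["account_id"],
                # Stored as a string to avoid float-type issues with DynamoDB.
                "amount": str(message["amount"]),
            })
```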
Conclusion
Mainframe data integration using AWS services enables you to reduce cost, create modern architectures, and integrate your mainframe and cloud-native technologies. You’ll be able to inform your business decisions with improved analytics, and create new opportunities for innovation and the development of modern applications.