Real-Time Mainframe Data Replication to AWS with tcVISION from Treehouse Software
By Andy Jones, Senior Technical Representative at Treehouse Software
By Joseph Brady, Director of Business Development at Treehouse Software
By Phil de Valence, Solutions Architect for Mainframe Modernization at AWS
Customers that still have business-critical data locked in mainframes want to exploit this data with Amazon Web Services (AWS) agile services for analytics purposes, to create new communication channels, and for quickly developing new innovations.
Fortunately, Treehouse Software’s tcVISION replicates data in real-time and bi-directionally between mainframes and AWS to allow for these new use cases.
Treehouse Software is an AWS Partner Network (APN) Standard Technology Partner that serves enterprise customers worldwide with solutions for real-time, bi-directional mainframe-to-cloud data migration and replication without any programming effort.
In this post, we describe Treehouse Software’s main customer use cases, explain the tcVISION technical solution on AWS, and share a practical example of how to replicate data in real-time from DB2 z/OS to Amazon Aurora.
Mainframe Data to AWS Customer Use Cases
Mainframe data stores often hold large amounts of complex, critical data in proprietary legacy formats. This data can be difficult to extract and inconsistent with modern databases, data types, and data tools.
The tcVISION replication software solves this problem by replicating data to modern AWS databases in real-time, which enables the following use cases:
As soon as the mainframe data is unlocked and available within an AWS data store, such as Amazon Simple Storage Service (Amazon S3), customers can use the wide array of analytics and machine learning services for easy access to all relevant data, without compromising security or governance. Customers select AWS data services from data catalog and data processing to interactive analytics, real-time analytics, operational analytics, dashboards, and data warehousing.
Once mainframe data is on AWS, customers innovate by creating new functions with cloud speed. For example, some choose to create microservices with a complete serverless stack via AWS Lambda, accessing their mainframe data. Others decide to make mainframe data available to new channels, such as mobile users via Amazon API Gateway or voice devices such as Amazon Alexa. Mainframe data can also be easily moved into machine learning models.
Finally, customers can take advantage of the AWS global infrastructure to deploy applications with key mainframe data globally, quickly delivering innovations worldwide.
When carrying out a large mainframe-to-AWS migration in incremental stages, some customers have to synchronize data between their mainframe and AWS. Bi-directional, real-time data replication allows incremental migration without manually developing data synchronization code.
tcVISION Technical Overview
tcVISION is a data replication software product performing real-time synchronization of mainframe data sources to AWS data stores. It allows critical mainframe data to be consumed by AWS services.
Figure 1 – tcVISION on AWS architecture overview.
tcVISION supports many mainframe data sources for both online and offline scenarios. Data can be replicated from IBM DB2 z/OS, DB2 z/VSE, VSAM, IMS/DB, CA IDMS, CA DATACOM, or Software AG ADABAS. tcVISION can replicate data to many targets including Amazon Aurora, Amazon Relational Database Service (Amazon RDS), or Amazon S3. To learn more, see the complete list of supported tcVISION sources and targets.
tcVISION has software components installed on the mainframe and on a Windows or Linux Amazon Elastic Compute Cloud (Amazon EC2) instance.
Often, customers establish multiple environments, such as development, quality assurance (QA), and production, with each being associated with a different mainframe LPAR hosting a tcVISION Manager and communicating with a corresponding tcVISION Manager installed on an Amazon EC2 instance. These components communicate over TCP/IP or SSL/TLS using a VPN or AWS Direct Connect.
tcVISION stores metadata in a relational database, such as Amazon RDS. The tcVISION Manager components are administered by the tcVISION Control Board, which can be installed on-premises or in an Amazon EC2 instance. This allows tcVISION users to create metadata, create and control replication scripts, and control database interaction. tcVISION’s product architecture is designed to minimize the mainframe resource utilization.
Metadata from the source and target environments is acquired via the tcVISION Control Board. Sources and targets can be mapped one-to-one, one-to-many, many-to-one, and many-to-many. There is built-in intelligence to understand mainframe and relational database management system (RDBMS) data types and stores.
The Control Board facilitates the mapping of the mainframe copybooks, redefines, data dictionaries, data catalogs, codepages, data type mapping, and more via the user-friendly interface. The Repository Editor allows users to control data transformations.
tcVISION Replication Modes
tcVISION’s synchronization process requires an initial bulk load of the mainframe source database into AWS data targets such as Aurora, Amazon RDS, or Amazon S3. After the initial bulk load, tcVISION’s Change Data Capture (CDC) is used to keep the mainframe data and the AWS data target in constant synchronization.
The entire process is designed for minimal impact on the mainframe, meaning no source database outage during the bulk load, and minimal mainframe resource utilization during the bulk load and ongoing replication.
tcVISION’s bulk load performs the initial target database load, using mainframe source data. Source data can be read directly from the mainframe data store, or can be read from a mainframe backup or unload. The bulk load provides automatic translation of mainframe data types such as EBCDIC packed fields.
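To make the data type translation concrete, here is a minimal Python sketch of the kind of conversion involved. It is not tcVISION code: the function names are hypothetical, and we assume the `cp037` EBCDIC codepage and the standard IBM packed-decimal (COMP-3) layout, where each byte holds two digits and the final half-byte holds the sign.

```python
from decimal import Decimal


def decode_ebcdic_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode fixed-width EBCDIC text and strip trailing pad spaces."""
    return raw.decode(codepage).rstrip()


def decode_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field with `scale` implied decimal places."""
    if not raw:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high half-byte
        nibbles.append(byte & 0x0F)  # low half-byte
    sign = nibbles.pop()             # last half-byte carries the sign
    if sign < 0x0A or any(n > 9 for n in nibbles):
        raise ValueError("malformed packed-decimal field")
    value = int("".join(str(n) for n in nibbles) or "0")
    if sign in (0x0B, 0x0D):         # B and D mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)
```

For example, the three bytes `12 34 5C` with two implied decimal places decode to `123.45`.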
Generally, the highest performance is attained by using the backup or unload data rather than a direct read of the mainframe database. Moving unload or backup data to the requisite tcVISION Amazon EC2 instance and using native database loaders minimizes network I/O and reduces load time.
tcVISION CDC enables real-time synchronization between the mainframe and AWS data sources, such as Amazon RDS. tcVISION utilizes native logging associated with each mainframe database to capture the data changes on the mainframe platform. This includes adds, updates, and deletes to specific data records.
For reliability, tcVISION operates on an ACID transactional basis, only applying committed transactions, and can restart CDC automatically.
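The idea of applying only committed transactions can be illustrated with a short Python sketch. This is a generic illustration, not tcVISION’s implementation: the class name, the record kinds (`change`, `commit`, `rollback`), and the in-memory list standing in for the target database are all assumptions.

```python
from collections import defaultdict


class CommittedOnlyApplier:
    """Buffer captured changes per transaction; apply them only on commit."""

    def __init__(self):
        self.pending = defaultdict(list)  # transaction id -> buffered changes
        self.applied = []                 # stands in for the target database

    def on_log_record(self, txid, kind, change=None):
        if kind == "change":
            self.pending[txid].append(change)
        elif kind == "commit":
            # Only now do the buffered changes reach the target.
            self.applied.extend(self.pending.pop(txid, []))
        elif kind == "rollback":
            # Rolled-back work is discarded and never reaches the target.
            self.pending.pop(txid, None)
        else:
            raise ValueError(f"unknown record kind: {kind}")
```

Changes belonging to a transaction that is still open, or that was rolled back, never appear in `applied`.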
When data needs to be replicated from the mainframe to an AWS data source and back from the AWS data source to the mainframe, tcVISION uses CDC on both source and target databases. It has built-in capabilities to fully support bi-directional replication:
- ‘Looping prevention’ ensures only data changes not made by tcVISION are acted upon.
- ‘Conflict detection’ allows users to pre-define specific actions to be taken when there are data conflicts encountered during bi-directional replication. For example, a conflict detection rule can be specified to change an INSERT to an UPDATE when a database record already exists.
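The two capabilities above can be sketched in a few lines of Python. This is only an illustration of the concepts: the `origin` tag, the dictionary standing in for a target table, and the function names are hypothetical and do not reflect how tcVISION is configured.

```python
REPLICATOR_ORIGIN = "replicator"  # hypothetical tag on changes written by the replicator


def should_replicate(change: dict) -> bool:
    """Looping prevention: ignore changes the replicator itself made."""
    return change["origin"] != REPLICATOR_ORIGIN


def apply_change(target: dict, change: dict) -> str:
    """Apply one change to `target`; return the operation actually performed."""
    key, op = change["key"], change["op"]
    if op == "INSERT" and key in target:
        op = "UPDATE"  # conflict rule: the record already exists
    if op in ("INSERT", "UPDATE"):
        target[key] = change["row"]
    elif op == "DELETE":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return op
```

A change that fails `should_replicate` is skipped, so an update applied by the replicator on one side is not captured and sent back to the other.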
Security, High Availability, and Scalability
tcVISION provides the quality of service required by enterprise data workloads for security, availability, and scalability.
From a security perspective, authentication and access control for tcVISION can be controlled by LDAP, Active Directory, or a mainframe SAF product, such as RACF, ACF2, or Top Secret. In-transit data between tcVISION Managers (mainframe-to-AWS) and the Control Board can be encrypted via SSL/TLS. Temporary block storage-based CDC files can reside in encrypted form on disk.
Figure 2 – tcVISION high availability architecture on AWS.
During tcVISION’s CDC processing, high availability must be maintained in the AWS environment. The Amazon EC2 instance that contains the tcVISION Manager is part of an Auto Scaling Group spread across Availability Zones (AZs), with a minimum and maximum of one Amazon EC2 instance.
Upon failure, the replacement Amazon EC2 instance tcVISION Manager is launched and communicates its IP address to the mainframe tcVISION Manager. The mainframe tcVISION Manager then starts communication with the replacement Amazon EC2 tcVISION Manager.
Once the Amazon EC2 tcVISION Manager is restarted, it continues processing at its next logical restart point, using a combination of the LUW and Restart files. LUW files contain committed data transactions not yet applied to the target database. Restart files contain a pointer to the last captured and committed transaction, along with queued uncommitted CDC data. Both file types are stored on a highly available data store, such as Amazon Elastic File System (Amazon EFS).
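The general restart pattern can be sketched in Python as follows. This is a simplified illustration and not the format of tcVISION’s LUW or Restart files: the JSON file, the function names, and the simulated failure are assumptions, and a temporary directory stands in for the shared, highly available file system.

```python
import json
import os
import tempfile
from pathlib import Path


def save_restart_point(path: Path, applied_count: int) -> None:
    """Persist progress atomically: write a temp file, then rename it."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"applied": applied_count}))
    os.replace(tmp, path)


def load_restart_point(path: Path) -> int:
    """Return how many changes were applied before the last stop (0 if none)."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["applied"]


def replay(changes, path, apply, fail_at=None):
    """Apply changes from the saved restart point onward."""
    for index in range(load_restart_point(path), len(changes)):
        if index == fail_at:
            raise RuntimeError("simulated instance failure")
        apply(changes[index])
        # A failure between apply() and this save would re-apply one change
        # on restart, so apply() should be safe to repeat.
        save_restart_point(path, index + 1)


def demo():
    changes = ["c1", "c2", "c3", "c4"]
    applied = []
    with tempfile.TemporaryDirectory() as shared_dir:
        path = Path(shared_dir) / "restart.json"
        try:
            replay(changes, path, applied.append, fail_at=2)
        except RuntimeError:
            pass  # the replacement instance takes over from here
        replay(changes, path, applied.append)
    return applied
```

In `demo()`, the second call to `replay` resumes at the third change rather than starting over, because the restart point survived the failure.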
For production workloads, Treehouse Software recommends enabling Multi-AZ deployment for the target and metadata databases.
tcVISION’s scalability depends on the type of replication process it performs. Bulk loads can run in parallel on a single Amazon EC2 instance or across multiple instances, giving horizontal scalability. Very large tables can be bulk loaded faster by splitting the process into multiple tasks, either by arbitrary intervals or via row filtering on a key, partition key, date, and so on.
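As a rough illustration of splitting a large table by key range, the Python sketch below divides an integer key range into contiguous intervals and loads them concurrently. The function names are hypothetical, a dictionary stands in for the source table, and a thread pool stands in for separate bulk load tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_key_range(low: int, high: int, parts: int):
    """Split the inclusive range [low, high] into contiguous sub-ranges."""
    if parts < 1 or high < low:
        raise ValueError("need parts >= 1 and high >= low")
    total = high - low + 1
    parts = min(parts, total)
    size, extra = divmod(total, parts)
    ranges, start = [], low
    for i in range(parts):
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def bulk_load_parallel(source: dict, low: int, high: int, parts: int):
    """Load each key interval in its own task and concatenate the results."""
    ranges = split_key_range(low, high, parts)

    def load_range(bounds):
        lo, hi = bounds
        # Row filter on the key: each task reads only its own interval.
        return [source[k] for k in sorted(source) if lo <= k <= hi]

    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        chunks = list(pool.map(load_range, ranges))
    return [row for chunk in chunks for row in chunk]
```

Because the intervals do not overlap and together cover the whole range, every row is loaded exactly once.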
tcVISION scaling for CDC processing can be achieved by running multiple parallel replication streams. The first step is to analyze the files included in logical transactions, as these files must be processed together in sequence.
Because tcVISION’s CDC process preserves the integrity of each logical transaction, only work that belongs to different transactions can be separated. For instance, sets of tables that do not participate in common transactions may be divided into parallel tasks by creating multiple processing scripts.
Transactional consistency is maintained within a task, so it’s important that tables in separate tasks do not participate in common transactions. This approach uses multiple tcVISION scripts to create separate replication streams that parallelize reads on the source, data transformation, and writes to the target database.
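The analysis step described above amounts to grouping tables so that any two tables sharing a transaction land in the same stream. A minimal Python sketch of that grouping is shown here. The function name and input shape are hypothetical, the table names are made up for the example, and every table named in a transaction is assumed to appear in `tables`.

```python
def group_tables_into_streams(tables, transactions):
    """Group tables so that tables sharing any transaction stay together."""
    parent = {t: t for t in tables}

    def find(t):
        # Follow parent links to the group's representative table.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for tx_tables in transactions:
        tx_tables = list(tx_tables)
        for other in tx_tables[1:]:
            parent[find(other)] = find(tx_tables[0])  # merge the two groups

    groups = {}
    for t in tables:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())
```

With tables `ORDERS`, `ORDER_ITEMS`, `CUSTOMERS`, and `AUDIT`, and one transaction touching `ORDERS` and `ORDER_ITEMS`, the result is three candidate streams: `AUDIT` alone, `CUSTOMERS` alone, and `ORDERS` together with `ORDER_ITEMS`.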
tcVISION Replication from Mainframe DB2 z/OS to Amazon Aurora
tcVISION’s Control Board is a Windows Graphical User Interface (GUI) that allows users to configure the replication stream between various database platforms, including the IBM mainframe and AWS. Using the Control Board and built-in wizards, users can define the metadata and mappings between the mainframe and AWS database target.
The following sequence of screens shows the steps required to create the tcVISION metadata and scripts for replicating mainframe DB2 z/OS data to Amazon Aurora.
First, we access tcVISION Control Board.
We then log on to Amazon Aurora MySQL-compatible.
Next, we log on to DB2 z/OS.
We create metadata that is specific to the input (DB2) and output (Aurora) and the replication definition. In this example, DB2 tables are mapped to Amazon Aurora MySQL-compatible tables.
The tcVISION metadata wizard asks for the information required for the replication of the mainframe database to AWS. For DB2 z/OS, it asks for the mainframe DB2 Subsystem.
tcVISION presents the tables contained in the DB2 z/OS catalog on the mainframe. We select the schemas and associated tables for replication.
Once we complete the required tcVISION wizard-based screens, the tool automatically defines the mappings between the source and target. tcVISION’s Metadata Import Wizard creates a default mapping that handles data conversion issues, such as EBCDIC-to-ASCII translation, endianness conversion, codepages, redefines, data type mapping, and more.
After creating the tcVISION metadata, tcVISION allows us to automatically create the DDL to create the target database in Aurora.
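To give a sense of what generating target DDL from source metadata involves, here is a small Python sketch. It does not reproduce tcVISION’s output: the type mapping is our own simplified assumption covering a handful of DB2 types, and the function names and column tuple layout are hypothetical.

```python
# Assumed, simplified DB2 -> MySQL type mapping for illustration only.
TYPE_MAP = {
    "SMALLINT": "SMALLINT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "DECIMAL": "DECIMAL",
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIMESTAMP": "DATETIME(6)",  # keeps microsecond precision
}


def to_mysql_type(db2_type: str) -> str:
    """Translate one DB2 column type, keeping length/precision arguments."""
    name, paren, args = db2_type.strip().upper().partition("(")
    name = name.strip()
    if name not in TYPE_MAP:
        raise ValueError(f"no mapping for DB2 type {name}")
    target = TYPE_MAP[name]
    if paren and name in ("DECIMAL", "CHAR", "VARCHAR"):
        target += "(" + args  # args still ends with the closing parenthesis
    return target


def create_table_ddl(table: str, columns) -> str:
    """Build CREATE TABLE from (column name, DB2 type, nullable) tuples."""
    lines = []
    for name, db2_type, nullable in columns:
        null_sql = "" if nullable else " NOT NULL"
        lines.append(f"  `{name}` {to_mysql_type(db2_type)}{null_sql}")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(lines) + "\n);"
```

A real mapping also has to account for keys, indexes, defaults, and character sets, which this sketch leaves out.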
tcVISION data scripts are created through wizards. Data scripts control the replication of data from the source (DB2 z/OS) to the target (Aurora). tcVISION bulk load scripts are a type of data script that performs the initial load of the Aurora database.
The script below shows data being accessed directly from the mainframe DB2 z/OS database. An alternative that reduces MIPS consumption is to read the data from a DB2 image copy.
After execution of the bulk load script, we can view replication statistics of the DB2 bulk load into Aurora.
To capture ongoing changes to DB2 in real-time, we create a DB2 z/OS CDC replication script.
The CDC replication is initiated from the tcVISION Control Board. The mainframe communicates to the Amazon EC2-based tcVISION replication manager. The tcVISION Control Board shows a graphical representation of the replication.
The CDC replication is now active, capturing and replicating data changes whenever they occur on the DB2 z/OS side. We test it by making a change in the DB2 z/OS table.
This change is processed and replicated by tcVISION. The tcVISION Control Board shows the statistics highlighting one update was performed.
Now checking in Aurora, we notice the DB2 z/OS change has successfully been propagated to Aurora.
tcVISION on AWS Marketplace
Customers can launch a pre-configured tcVISION on the AWS Cloud in minutes via AWS Marketplace. There are three tcVISION products on AWS Marketplace:
- tcVISION Mainframe Batch Integration – This product loads data directly from a database unload, image copy, or backup. Consequently, it does not require an active connection to the mainframe. It requires obtaining a license separately from Treehouse Software.
- tcVISION Distributed Database Integration – This product allows replicating data between one source distributed database and one target distributed database. It includes a software license and is billed on a pay-as-you-go basis.
- tcVISION Enterprise Change Data Capture Integration – This product allows replicating mainframe data to AWS targets continuously and in real-time. It supports the many source mainframe databases or data files, and the many target AWS data stores described previously. It requires obtaining a license separately from Treehouse Software.
Learn More About tcVISION
Request a live demonstration of tcVISION replicating data from a mainframe to AWS. Just fill out the Treehouse demo request form, and a representative will be in touch to schedule a convenient date and time.
Treehouse Software – APN Partner Spotlight
Treehouse Software is an APN Standard Technology Partner. They provide solutions for real-time, bi-directional mainframe-to-cloud, open systems, LUW data migration and replication.