How to Enable Mainframe Data Analytics on AWS Using Model9
By Gil Peleg, Founder and CEO at Model9
Data insight is critical for businesses to gain a competitive advantage. Mainframe proprietary storage solutions such as virtual tape libraries (VTLs) hold valuable data locked in a platform with complex tools. This can lead to higher compute and storage costs, and make it harder to retain existing employees or train new ones.
When mainframe data is stored in a cloud storage service, however, it can be accessed by a rich ecosystem of applications and analytics tools.
Model9 is an AWS ISV Partner that enables mainframe customers to benefit from cloud technologies and economics with backup and archive directly to AWS cloud storage services, such as Amazon Glacier and Amazon Simple Storage Service (Amazon S3).
Model9 makes data delivered to Amazon Web Services (AWS) storage services readable and structured, enabling new analytics use cases. In this post, I will describe Model9’s new features and benefits with a step-by-step walkthrough and several customer use cases.
Backup and Archival with Model9 Cloud Data Manager for Mainframe
Model9’s patented technology lets mainframe customers take advantage of AWS storage services, from affordable long-term options like Amazon Glacier and Glacier Deep Archive, to highly durable, geographically dispersed, and flexible low-cost options such as Amazon S3 object storage.
Figure 1 – Model9 Cloud Data Manager for Mainframe.
In the architecture above, you can see the two main components of the Model9 Cloud Data Manager for Mainframe software product: a lightweight agent running on z/OS that securely delivers data directly to Amazon S3 and retrieves it from there, and a management server running on AWS.
Model9 Cloud Data Manager for Mainframe provides storage, backup, restore, archive/migrate, and automatic recall for all mainframe data sets, volume types and z/OS UNIX files, as well as space management, stand-alone restore, and disaster recovery.
Model9 can run side-by-side with existing data management solutions to add cloud capabilities and reduce costs. For even greater savings, Model9 can fully replace on-premises VTLs and legacy data management tools.
Learn more about the Model9 backup and recovery features and benefits in this APN Blog post: How Cloud Backup for Mainframes Cuts Costs with Model9 and AWS.
Mainframe Data Analytics via Model9 Cloud Data Gateway for Mainframe
Model9 recently added new features for delivering mainframe data directly to AWS and transforming it there. These features enable easy and secure integration with popular cloud analytics tools, data lakes, data warehouses, databases, and ETL solutions running on AWS. The Model9 solution unifies data delivery for analytics with backup and space management processes.
Mainframe customers typically use virtual tapes as a secondary storage solution for three types of data:
- Daily incremental data set backups.
- Migrated/archived data sets as part of daily space management processing.
- DB2 database image copies.
When data is stored on proprietary mainframe virtual tapes, tools outside the mainframe ecosystem cannot access it. To process the data, it must be retrieved from tape to the mainframe host, transformed into a readable format, delivered to another platform, and then loaded into analytics tools.
To avoid this data retrieval, the majority of customers also send database updates and data sets to other platforms on a daily basis using ETL tools and data transfer software such as FTP.
This double data movement, intended solely to overcome the locked-in nature of on-premises, proprietary mainframe secondary storage, incurs high costs and wastes CPU cycles.
Model9 offers a new paradigm of writing mainframe data directly to a storage platform where the data can be accessed and consumed by non-mainframe analytics tools, without requiring double data movement.
Data set backups and archives created by Model9 Cloud Data Manager in Amazon S3 can be transformed into readable textual or binary format and processed by analytics tools such as Amazon Athena. DB2 image copies delivered by Model9 directly to S3 can be transformed to CSV or JSON format so that tables can be easily loaded into modern databases or data warehouses such as Amazon Aurora and Amazon Redshift.
How Model9 Cloud Data Gateway for Mainframe Works
Model9 Cloud Data Gateway for Mainframe runs on zIIP engines and delivers data sets directly to Amazon S3 cloud storage. Amazon EFS and Amazon EBS are supported as well.
Compression and encryption can be optionally applied before data is sent over the network. On AWS, the Model9 Data Transformation Service transforms data sets and databases to standard file formats (e.g. CSV or JSON) that can be consumed by analytics services.
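The post does not say which compression or encryption algorithms Model9 applies. As a rough illustration of why compression normally comes before encryption, the following minimal Python sketch compresses a hypothetical set of fixed-length records with zlib and compares that with compressing random bytes. The random bytes stand in (as an assumption) for ciphertext, since well-encrypted output is statistically indistinguishable from random data.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 6)) / len(data)


# Hypothetical payload: 1,000 fixed-length 80-byte records that share a lot of
# structure, as many mainframe data sets do.
records = b"".join(
    f"CUST{i:06d} ACTIVE   BALANCE {i * 3 % 10000:08d}".ljust(80).encode("ascii")
    for i in range(1000)
)

# Stand-in for ciphertext (assumption): encrypted bytes look random, so
# compressing os.urandom() output approximates "encrypt first, then compress".
ciphertext_like = os.urandom(len(records))

plain_ratio = compressed_ratio(records)           # compress, then encrypt
random_ratio = compressed_ratio(ciphertext_like)  # encrypt, then compress

print(f"compressing plaintext:      {plain_ratio:.2f} of original size")
print(f"compressing encrypted-like: {random_ratio:.2f} of original size")
```

The first ratio comes out well below 1, while the second is slightly above 1: deflate cannot shrink random data and adds a few bytes of framing. Encrypting first would therefore forfeit most of the bandwidth savings.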
When used together with Model9 Cloud Data Manager for z/OS, data transformation in the cloud is automatically applied to backed up and archived data, leveraging existing storage management scheduling policies and life cycle management.
Because mainframe data is kept in the cloud in its original format, it can be transformed in multiple ways to support future needs as your application requirements change.
Figure 2 – Model9 Cloud Data Gateway for Mainframe.
For efficient mass data delivery, Model9 leverages storage replication technologies such as FlashCopy, Concurrent Copy, and DFDSS to deliver data to Amazon S3 in its original format. Mainframe data is then organized, indexed, and tagged with metadata to enable identification, fast retrieval, and transformation.
Data sets can be transformed individually or extracted from full-volume dumps. For example, sequential, partitioned, and VSAM data sets are transformed to JSON files, while DB2 image copies are transformed to CSV files.
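The post does not describe how Model9 maps record layouts to JSON. The following minimal Python sketch shows the two decoding problems any such transformation has to solve for a fixed-length sequential or VSAM record: EBCDIC text (decoded here with Python's cp037 codec, assuming code page 037) and packed-decimal (COMP-3) numbers. The three-field layout is hypothetical; in practice, the layout would come from the application's COBOL copybook.

```python
import json
from decimal import Decimal

# Hypothetical record layout, as a COBOL copybook might describe it:
#   CUST-ID  PIC X(8)             -> EBCDIC text, offset 0
#   NAME     PIC X(20)            -> EBCDIC text, offset 8
#   BALANCE  PIC S9(7)V99 COMP-3  -> packed decimal, 5 bytes, offset 28
LAYOUT = [
    ("cust_id", 0, 8, "text", 0),
    ("name", 8, 20, "text", 0),
    ("balance", 28, 5, "packed", 2),
]
LRECL = 33  # fixed record length in bytes


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode IBM packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = [n for byte in raw for n in (byte >> 4, byte & 0x0F)]
    sign = nibbles.pop()
    if any(n > 9 for n in nibbles) or sign not in (0x0C, 0x0D, 0x0F):
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    value = Decimal("".join(map(str, nibbles))).scaleb(-scale)
    return -value if sign == 0x0D else value


def record_to_dict(record: bytes) -> dict:
    out = {}
    for name, offset, length, kind, scale in LAYOUT:
        raw = record[offset:offset + length]
        if kind == "text":
            out[name] = raw.decode("cp037").rstrip()  # cp037 = EBCDIC US/Canada
        else:
            out[name] = str(unpack_comp3(raw, scale))  # keep exact decimal as text
    return out


def dataset_to_json_lines(data: bytes) -> str:
    """Split a fixed-length-record data set and emit one JSON object per line."""
    if len(data) % LRECL:
        raise ValueError("data length is not a multiple of LRECL")
    return "\n".join(
        json.dumps(record_to_dict(data[i:i + LRECL]))
        for i in range(0, len(data), LRECL)
    )


sample = (
    "CUST0001".encode("cp037")
    + "JANE DOE".ljust(20).encode("cp037")
    + bytes.fromhex("012345678C")  # +123456.78
    + "CUST0002".encode("cp037")
    + "JOHN ROE".ljust(20).encode("cp037")
    + bytes.fromhex("000000123D")  # -1.23
)
print(dataset_to_json_lines(sample))
```

Amounts are emitted as strings so that their decimal scale is preserved exactly; converting them to floats would risk rounding.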
Currently supported mainframe data sources for transformation include*:
- DB2 image copies
- VSAM data sets
- Sequential data sets
- Partitioned data sets
- Extended format data sets
* Supported data sources are updated regularly; contact Model9 for the latest list.
Customer Use Cases
In this section, I will describe common customer use cases for leveraging mainframe data for analytics on AWS.
Data Retention Compliance
For companies with regulatory requirements to keep data for long periods, Model9 can securely archive mainframe data to Amazon S3, Amazon Glacier, or Glacier Deep Archive at low cost. Archived data sets remain available to mainframe applications through transparent automatic recall.
For additional protection and compliance with regulations (such as SEC 17a-4), Amazon S3 object lock may be applied to data delivered by Model9. Amazon S3 object lock prevents data from being deleted or overwritten by providing a Write-Once-Read-Many (WORM) protection model.
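As a minimal sketch of what applying Object Lock can look like (assuming boto3 and a bucket created with Object Lock enabled), the following Python code builds the arguments for the S3 PutObject call. ObjectLockMode, ObjectLockRetainUntilDate, and ContentMD5 are real put_object parameters. The bucket, key, and six-year retention period are hypothetical, because the required retention depends on the record type and regulation. The code only builds the request and does not call AWS.

```python
import base64
import hashlib
from datetime import datetime, timezone
from typing import Optional


def add_years(moment: datetime, years: int) -> datetime:
    """Add calendar years, clamping Feb 29 to Feb 28 in non-leap target years."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        return moment.replace(year=moment.year + years, day=28)


def object_lock_put_kwargs(bucket: str, key: str, body: bytes,
                           retention_years: int,
                           now: Optional[datetime] = None) -> dict:
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Integrity checksum for the upload: base64 of the binary MD5 digest.
        "ContentMD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        # COMPLIANCE mode: unlike GOVERNANCE mode, no user (including the root
        # user) can delete or overwrite the object version before this date.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": add_years(now, retention_years),
    }


kwargs = object_lock_put_kwargs(
    bucket="example-mainframe-archive",          # hypothetical bucket
    key="archive/PROD.PAYROLL.HIST/2024-02-29",  # hypothetical key
    body=b"...archived data set bytes...",
    retention_years=6,                           # hypothetical retention period
    now=datetime(2024, 2, 29, tzinfo=timezone.utc),
)
print(kwargs["ObjectLockRetainUntilDate"])
# With boto3 installed and credentials configured, the upload would be:
#   boto3.client("s3").put_object(**kwargs)
```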
Some companies retain data even after their mainframe platform has been decommissioned. The Model9 Management Interface can be used to search for mainframe data sets stored in cloud storage, and then invoke the Model9 Data Transformation Service running on AWS to make data available for applications and analytics tools with no need for a mainframe or for retaining old equipment.
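The post does not document the Model9 Management Interface's search syntax. The following Python sketch assumes filter conventions modeled on z/OS catalog filtering (% matches one character, * matches characters within a qualifier or exactly one whole qualifier, ** matches any number of qualifiers) and applies them to a hypothetical list of data set names. It illustrates how archived data sets might be located without a mainframe.

```python
import re


def filter_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a catalog-style data set name filter into a regular expression.

    Assumed conventions: '%' = one character, '*' inside a qualifier = zero or
    more characters, '*' alone = exactly one qualifier, '**' = zero or more
    qualifiers. Names are matched with a trailing '.' so that every qualifier,
    including the last one, ends in a dot.
    """
    parts = []
    for qualifier in pattern.upper().split("."):
        if qualifier == "**":
            parts.append(r"(?:[^.]+\.)*")
        elif qualifier == "*":
            parts.append(r"[^.]+\.")
        else:
            body = "".join(
                "[^.]" if ch == "%" else "[^.]*" if ch == "*" else re.escape(ch)
                for ch in qualifier
            )
            parts.append(body + r"\.")
    return re.compile("".join(parts))


def search(names, pattern):
    """Return the names matching the filter, in their original order."""
    regex = filter_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name.upper() + ".")]


catalog = [  # hypothetical data set names held in cloud storage
    "PROD.PAYROLL",
    "PROD.PAYROLL.HIST",
    "PROD.PAYROLL.G0001V00",
    "PROD.GL.MASTER",
    "TEST.PAYROLL.HIST",
]
print(search(catalog, "PROD.PAYROLL.**"))
print(search(catalog, "*.PAYROLL.HIST"))
```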
Data Warehouse and Data Lake
As data analytics requirements evolve and business needs change, having the data in its original format enables changing and updating data analytics processes on-demand. However, because mainframe data is stored on proprietary storage systems, it’s very complex to access and manipulate from outside the mainframe platform.
Mainframe ETL and data integration services transform the data on the mainframe before loading it into a target data store. If, in the future, the data is needed in a different format or structure, it has to be transformed again on the mainframe and loaded again to a target data store.
Model9 offers a new approach: it delivers mainframe data to the cloud in its original format, so any transformation, now or in the future, can run outside the mainframe platform without requiring access to a mainframe. This approach is known as Extract, Load, Transform (ELT), in contrast to the traditional Extract, Transform, Load (ETL) used by legacy mainframe tools.
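To make the ELT idea concrete, here is a minimal Python sketch under hypothetical assumptions: an in-memory raw_store dictionary stands in for S3, and the 24-byte record layout is made up. The raw EBCDIC bytes are stored once, exactly as delivered. Two different transformations, one of them written later, are both derived from those same bytes without going back to the mainframe.

```python
import csv
import io
import json

# Hypothetical fixed-length layout: account (6), region (10), status (8).
LAYOUT = [("account", 0, 6), ("region", 6, 10), ("status", 16, 8)]
LRECL = 24

# Extract + Load: the raw EBCDIC bytes are kept exactly as delivered.
raw_store = {
    "PROD.ACCOUNTS": b"".join(
        (acct.ljust(6) + region.ljust(10) + status.ljust(8)).encode("cp037")
        for acct, region, status in [
            ("A00001", "EMEA", "ACTIVE"),
            ("A00002", "AMER", "CLOSED"),
        ]
    )
}


def decode(raw: bytes) -> list:
    """Decode fixed-length EBCDIC records (cp037 is one byte per character)."""
    rows = []
    for start in range(0, len(raw), LRECL):
        text = raw[start:start + LRECL].decode("cp037")
        rows.append({name: text[off:off + size].rstrip() for name, off, size in LAYOUT})
    return rows


# Transform, version 1: a full CSV extract for a data warehouse.
def to_csv(rows: list) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f[0] for f in LAYOUT], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Transform, version 2, written later: JSON of active accounts only. It reads
# the same stored bytes; nothing is re-extracted from the mainframe.
def active_accounts_json(rows: list) -> str:
    return json.dumps([row for row in rows if row["status"] == "ACTIVE"])


rows = decode(raw_store["PROD.ACCOUNTS"])
print(to_csv(rows))
print(active_accounts_json(rows))
```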
To keep data fresh, data delivery can be scheduled at the desired frequency; for example, every 30 minutes. Once loaded into Amazon Redshift or kept within an Amazon S3-based data lake, the mainframe data can be queried and analyzed just like any other data.
Business Intelligence
Mainframes generate and store valuable core business data. Customers gain deep business insights and make better business decisions by using modern business intelligence tools to analyze mainframe data together with other data sources.
Today, loading mainframe data into cloud analytics services is complex and expensive, because the data usually has to be transformed on the mainframe before it is delivered. This transformation consumes expensive mainframe CPU cycles and increases monthly software license charges.
With Model9, data is transformed by the Model9 Data Transformation Service, which runs on AWS and consumes no mainframe CPU cycles. The transformed data can then be queried in place with Amazon Athena or loaded into Amazon Redshift or Amazon Aurora.
Operational Intelligence
As the core business platform on which most business transactions run, mainframes generate vast amounts of machine data such as system logs, security records, and audit statistics. When this data is stored on proprietary on-premises mainframe storage systems, it is hard to use for DevOps, monitoring, and automation processes.
By using Model9, you can send mainframe system, security, and audit data directly to Amazon S3, where it can be transformed from binary machine data into structured data that operational intelligence services running on AWS can load and parse.
For example, System Management Facility (SMF) records, which mainframe customers regularly collect on tape or in generation data sets, can be sent via Model9 Cloud Data Gateway for Mainframe to Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) as part of the standard SMF collection process. There they can be combined with machine data from other platforms to provide a complete monitoring picture.
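SMF records share a standard header: a record descriptor word (RDW) holding the record length, a flag byte, the record type, the time in hundredths of a second since midnight, the date in packed 0cyydddF form, and a four-character system ID. The following minimal Python sketch decodes only that header into a JSON document of the kind a log analytics service could index. The sample record is synthetic, record-type-specific sections and extended headers are ignored, and the timestamp is left without a time zone because SMF time is the system's local time.

```python
import json
from datetime import date, datetime, timedelta


def iter_smf_records(data: bytes):
    """Yield variable-length records, using the length stored in each RDW."""
    offset = 0
    while offset < len(data):
        length = int.from_bytes(data[offset:offset + 2], "big")
        # 18 bytes = RDW + flag + type + time + date + system ID.
        if length < 18 or offset + length > len(data):
            raise ValueError(f"bad RDW length {length} at offset {offset}")
        yield data[offset:offset + length]
        offset += length


def parse_smf_header(record: bytes) -> dict:
    """Decode the standard SMF header fields into a JSON-friendly dict."""
    hundredths = int.from_bytes(record[6:10], "big")  # time since midnight
    packed = record[10:14].hex().upper()              # date as 0cyydddF
    year = 1900 + 100 * int(packed[1]) + int(packed[2:4])
    day = date(year, 1, 1) + timedelta(days=int(packed[4:7]) - 1)
    stamp = datetime(day.year, day.month, day.day) + timedelta(milliseconds=hundredths * 10)
    return {
        "record_type": record[5],
        "flag": record[4],
        "timestamp": stamp.isoformat(),  # local system time, no zone attached
        "system_id": record[14:18].decode("cp037"),
        "length": len(record),
    }


sample_record = (
    (18).to_bytes(2, "big")         # RDW: record length including the RDW
    + b"\x00\x00"                   # RDW: segment descriptor
    + b"\x00"                       # flag byte (value made up)
    + bytes([30])                   # record type 30
    + (4530000).to_bytes(4, "big")  # 12:35:00.00 in hundredths of a second
    + bytes.fromhex("0124075F")     # 0cyydddF: year 2024, day 075
    + "SYSA".encode("cp037")        # system ID in EBCDIC
)
docs = [parse_smf_header(r) for r in iter_smf_records(sample_record)]
print(json.dumps(docs))
```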
Walkthrough
In this section, I will demonstrate how to deliver a DB2 table to AWS, query it with Amazon Athena, and load it into Amazon Redshift. Once the data is on AWS, it can be queried and analyzed by a variety of AWS services.
Task #1: Deliver DB2 image copy to Amazon S3 and transform into a CSV file
The following JCL job is used to deliver DB2 data, stored in a DB2 table image copy, directly to S3 and then transform it into a CSV file that can be queried by Amazon Athena or loaded into Redshift.
This JCL or similar job can be integrated into mainframe automation processes or scheduled by standard job schedulers to ensure ongoing mainframe data delivery to AWS.
Figure 3 – JCL job to copy data to Amazon S3 and transform to CSV format.
The first job step creates an image copy of the specified DB2 table. Image copies (ICs) are a common DB2 table backup format, created with standard DB2 utilities as part of the daily DB2 backup process. ICs can be either full or incremental.
In the second job step, the DB2 image copy is delivered by Model9 to S3. For simplicity, this job uses a predefined data delivery policy, but many controls can be set such as the S3 region, bucket name, object name prefix, and AWS credentials. This step may also be performed from the Model9 graphical management interface.
The last job step invokes the Model9 Data Transformation Service on AWS to transform the image copy stored in S3 into a CSV file.
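Decoding the image copy itself is the job of the Model9 service invoked in this step, so it is not shown here. The following minimal Python sketch covers only the final part of such a transformation, writing already-decoded rows as CSV. It rests on hypothetical assumptions: a four-column EMP-style table, NULL written as an empty field, and DATE values written in ISO 8601 (YYYY-MM-DD) form.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical columns of a sample DB2 EMP table.
COLUMNS = ["EMPNO", "LASTNAME", "HIREDATE", "SALARY"]


def to_csv_field(value) -> str:
    """Render one column value: NULL -> empty field, DATE -> ISO 8601."""
    if value is None:
        return ""
    if isinstance(value, date):
        return value.isoformat()
    return str(value)  # Decimal keeps its exact scale, e.g. 52750.00


def rows_to_csv(rows, columns=COLUMNS, header=True) -> str:
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields containing the delimiter, a quote,
    # or a line break.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    if header:
        writer.writerow(columns)
    for row in rows:
        writer.writerow([to_csv_field(value) for value in row])
    return buf.getvalue()


rows = [  # already-decoded table rows (hypothetical values)
    ("000010", "HAAS", date(1995, 1, 1), Decimal("52750.00")),
    ("000020", "O'BRIEN, JR", None, Decimal("41250.00")),
]
print(rows_to_csv(rows), end="")
```

The second row contains a comma and is therefore quoted. Amazon Redshift's CSV format and Athena's OpenCSVSerDe understand such quoting, but a plain comma-delimited table definition would split that field in two.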
Task #2: Query data in Amazon S3 using Amazon Athena
The screenshot below shows how to define a table in Amazon Athena over the transformed DB2 table in S3. The pictured DDL statement defines an external table over data stored in S3 and specifies the schema used to read it.
In this case, the file has been created by Model9 Cloud Data Manager backup of a DB2 table and transformed to CSV format by Model9 Cloud Data Gateway transformation service.
Figure 4 – Amazon Athena query to define table from DB2 CSV file.
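The DDL itself appears only in the screenshot. As a hedged reconstruction of the general shape of such a statement, not the exact one pictured, the following Python helper generates Athena DDL for comma-separated files with a header row. The database, table, column names, and S3 prefix are hypothetical. Because Athena reads every object under LOCATION, the helper insists on a prefix rather than a single file.

```python
def athena_csv_ddl(database: str, table: str, columns, s3_location: str,
                   skip_header: bool = True) -> str:
    """Build a CREATE EXTERNAL TABLE statement over CSV files under an S3 prefix."""
    # Athena reads every object under LOCATION, so it must be a prefix, not a file.
    if not (s3_location.startswith("s3://") and s3_location.endswith("/")):
        raise ValueError("LOCATION must be an s3:// prefix ending in '/'")
    column_sql = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    properties = "\nTBLPROPERTIES ('skip.header.line.count'='1')" if skip_header else ""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
        f"{properties};"
    )


ddl = athena_csv_ddl(
    database="mainframe",                               # hypothetical names
    table="emp",
    columns=[("empno", "string"), ("lastname", "string"),
             ("hiredate", "date"), ("salary", "decimal(9,2)")],
    s3_location="s3://example-bucket/model9/db2/emp/",  # hypothetical prefix
)
print(ddl)
```

If transformed fields may contain quoted commas, replace the ROW FORMAT DELIMITED and FIELDS TERMINATED BY lines with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde', keeping in mind that this SerDe has its own rules for parsing non-string types such as DATE.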
After the table is defined, it can be queried just like any other table and there’s no difference between mainframe data and data that originated from other platforms.
The screenshot below shows the result of querying all columns in the table.
Figure 5 – Amazon Athena query result showing DB2 data.
Task #3: Load DB2 table into Amazon Redshift data warehouse
The following screenshot demonstrates how to load a table in CSV format from Amazon S3 to Redshift.
The pictured SQL creates a table with multiple columns and then uses the COPY command to load the records from a file stored in S3 into that table, according to the defined schema. The file was stored in Amazon S3 by a Model9 Cloud Data Manager backup of a DB2 table and transformed to CSV format.
Figure 6 – Amazon Redshift table creation and DB2 CSV data load.
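Similarly, the following Python helper sketches the two statements such a screenshot typically contains: a CREATE TABLE and a COPY that reads CSV files from S3. Table, column, bucket, and IAM role names are hypothetical; COPY, IAM_ROLE, FORMAT AS CSV, and IGNOREHEADER are standard Amazon Redshift COPY syntax.

```python
def redshift_load_sql(table: str, columns, s3_prefix: str, iam_role_arn: str,
                      skip_header: bool = True):
    """Return (CREATE TABLE, COPY) statements for loading CSV files from S3."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("COPY source must be an s3:// URI")
    column_sql = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    create = f"CREATE TABLE IF NOT EXISTS {table} (\n{column_sql}\n);"
    copy_lines = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",         # loads every object whose key has this prefix
        f"IAM_ROLE '{iam_role_arn}'",  # role must be associated with the cluster
        "FORMAT AS CSV",               # handles quoted fields with embedded commas
    ]
    if skip_header:
        copy_lines.append("IGNOREHEADER 1")
    copy = "\n".join(copy_lines) + ";"
    return create, copy


create_sql, copy_sql = redshift_load_sql(
    table="emp",  # hypothetical table and columns
    columns=[("empno", "VARCHAR(6)"), ("lastname", "VARCHAR(15)"),
             ("hiredate", "DATE"), ("salary", "DECIMAL(9,2)")],
    s3_prefix="s3://example-bucket/model9/db2/emp/",  # hypothetical prefix
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(create_sql)
print(copy_sql)
```

Running the COPY requires an IAM role associated with the cluster that can read the bucket. Because FROM takes a key prefix, every transformed file under that prefix is loaded in a single command.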
After the table is created, it can be queried just like any other table and there is no difference between mainframe data and data that originated from other platforms.
The screenshot below shows the result of querying all columns in the table.
Figure 7 – Amazon Redshift query result showing DB2 data.
After completing the tasks above, you can use mainframe data in standard analytics processes on AWS, with services such as Amazon Athena and Amazon Redshift.
Summary
In this post, we discussed how to securely and efficiently deliver mainframe data directly to storage and analytics services on AWS using Model9 Cloud Data Manager and Model9 Cloud Data Gateway.
These software solutions help you avoid duplicate data movement for backup and data analytics, enabling you to leverage mainframe data to gain deep business insights.
I invite you to try Model9 for free and see that copying mainframe data sets to/from AWS can be as easy as running an IEBCOPY job. For more information, download our free Model9 cloud copy tool.
The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.
Model9 – AWS Partner Spotlight
Model9 is an AWS ISV Partner. Its software connects the mainframe directly over TCP/IP to cloud storage and allows customers to supplement or eliminate the need for VTLs, physical tapes, and existing data management products.