AWS Cloud Operations Blog
Selecting File Systems for AWS Mainframe Modernization
Mainframe applications often execute business-critical functions, which must be resilient, scalable, and cost-efficient. This imperative applies to the multiple layers and components supporting the application, including files, datasets, and their supporting storage systems. When modernizing these applications and files with AWS, choosing the right file-system for each application data profile is essential. In addition, file systems must meet stringent functional and non-functional requirements that balance multiple dimensions, such as performance, availability, and cost. This post describes the typical requirements of mainframe applications, reviews the AWS file-system options, and provides an approach for choosing an appropriate AWS file-system.
Mainframe applications file-system use-cases and requirements
Mainframe applications use various data stores and storage types for business data: relational databases, datasets, files, hierarchical databases, inverted list databases, network model databases, etc. Datasets (or data sets) are structured data files with a defined organization, generally composed of logical records. In the context of AWS mainframe modernization, files and datasets are common and require thorough analysis when designing and choosing the right target solution. This post focuses on file-based use-cases that rely on file-system solutions.
There are many mainframe dataset types that can be accessed for read or write purposes: Physical Sequential (PS), Partitioned Datasets (PDS), Generational Data Groups (GDG), and Virtual Storage Access Method (VSAM). PS, PDS, and GDG datasets are accessed record by record sequentially. VSAM has multiple organizations with different access methods: VSAM ESDS datasets are accessed sequentially. VSAM KSDS and RRDS datasets are accessed sequentially, randomly, or dynamically. VSAM LDS datasets are rarely accessed directly by applications.
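To make these access patterns concrete, here is a minimal Python sketch contrasting sequential (PS/GDG/ESDS-style) and keyed (KSDS-style) reads. The record layout, field names, and helper functions are illustrative assumptions, not part of any AWS or vendor API.

```python
# Minimal sketch of the two dominant mainframe data access patterns
# after migration: sequential and keyed (indexed). Names and record
# layout are illustrative only.

def read_sequential(records):
    """Yield records one by one in stored order (PS, GDG, ESDS style)."""
    for record in records:
        yield record

def read_by_key(index, key):
    """Random read by primary key, as in a KSDS lookup against its index."""
    return index.get(key)

# A tiny "dataset" of records keyed by a 6-character id.
records = [
    {"key": "CUST01", "name": "ACME", "balance": 1200},
    {"key": "CUST02", "name": "GLOBEX", "balance": 340},
    {"key": "CUST03", "name": "INITECH", "balance": 87},
]
index = {r["key"]: r for r in records}  # KSDS-like index on the key field

sequential_names = [r["name"] for r in read_sequential(records)]
random_hit = read_by_key(index, "CUST02")
```

A real migration tool handles fixed-length records, alternate indexes, and concurrency, but the distinction between walking every record and jumping straight to one by key is the core of the pattern.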
There are two main methods for executing mainframe applications: First, users can interactively make online real-time requests or transactions; Second, batch jobs can be automated and scheduled for bulk processing. When migrating and modernizing mainframe online applications, the corresponding data is often moved into relational databases for high availability and distributed locking management. On the other hand, batch applications processing records in bulk have Input/Output (I/O) intensive storage needs, and they favor storing batch data in file-systems because of their high performance for I/O Operations Per Second (IOPS), throughput, and low latency.
Mainframe batch jobs, which are often single-threaded, read input datasets and write output datasets via batch utilities or programs. In most cases, a single batch thread processes a dataset, though a few batch jobs or steps may run in parallel and access the same dataset.
For the high availability requirements of mainframe applications, the AWS Well-Architected best practice is to deploy applications across AWS Availability Zones (AZ), and to consider designing a solution across AWS Regions. Therefore, it’s a common requirement to deploy the storage solution across multiple Availability Zones with fast failover. Because mainframe applications often host core-business data, solutions must usually preserve data integrity with no data loss during AZ failover or failback. If the application runs on multiple compute nodes across multiple AZs, then data integrity must be preserved during read/write access from any AZ with mechanisms such as distributed locking.
Datasets migrated with AWS Mainframe Modernization service
The new AWS Mainframe Modernization service is an AWS Cloud native platform for migrating, modernizing, and executing mainframe applications on AWS. It includes on-demand tools and a managed runtime environment with extensive automation and simplified interfaces. It supports two popular migration and modernization patterns: Automated Refactoring and Replatforming. In addition, this AWS service allows datasets to be migrated and modernized into a relational database, or stored in file-systems.
For example, with the AWS Mainframe Modernization Blu Age solution, one option is migrating datasets to sequential or indexed files stored in a file-system. To facilitate this, the BluSAM data access layer exposes the migrated data to the modernized application business logic. In addition, BluSAM includes optimizations for low latency, caching, and compression.
The AWS Mainframe Modernization Micro Focus solution allows the datasets to be stored as sequential, indexed, or relative datasets in a file-system. It supports many dataset organizations, such as PS, PDS, GDG, and VSAM. Migrated programs call the Micro Focus File Handler Application Programming Interface (API), by default, to perform all I/O operations on all of the standard file organizations.
With both AWS Mainframe Modernization patterns, the requirements and considerations for file-system availability, IOPS, throughput, latency, and cost efficiency described previously apply.
AWS file-system options for modernized mainframe applications
Millions of customers use AWS storage services to streamline their business, increase agility, reduce costs, and speed up innovation. AWS offers a broad portfolio of data storage services, including block storage, file storage, and object storage. In this post, we focus on the following primary high-performance storage services and file-systems, and on their differentiated characteristics for mainframe applications. These file-systems are AWS managed services that automate hardware management and provide a simple interface in the AWS Console.
Amazon FSx makes it easy and cost-effective to launch, run, and scale feature-rich, high-performance file-systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Users can choose between four widely-used file-systems: NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. We focus here on Linux-compatible file-systems, which are NetApp ONTAP, OpenZFS, and Lustre.
Amazon Elastic File System (Amazon EFS) is a simple, serverless, set-and-forget elastic file-system. Amazon EFS file-systems can automatically scale from gigabytes to petabytes of data without needing to provision storage. Thousands of compute instances can access an Amazon EFS file-system simultaneously, and Amazon EFS provides consistent performance to each one. The service is highly durable and highly available. With Amazon EFS, there are no minimum fees or setup costs, and users pay only for what they use.
Amazon Elastic Block Store (Amazon EBS) is an easy-to-use, scalable, and high-performance block-storage service which can host file-systems. It provides multiple volume types that allow customers to optimize storage performance and costs for a broad range of applications. These volume types are divided into two major categories: SSD-backed storage (gp2, gp3, io1, and io2) for transactional workloads, and HDD-backed storage (sc1 and st1) for throughput intensive workloads. This post details Amazon EBS io2, gp2, and st1.
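When sizing an Amazon EBS gp3 volume, IOPS and throughput are provisioned independently of capacity. The sketch below checks a requested configuration against the gp3 baseline (3,000 IOPS, 125 MiB/s) and maximum (16,000 IOPS, 1,000 MiB/s) values; this is a simplified subset of the real constraints, so confirm the current limits in the Amazon EBS documentation before relying on them.

```python
# Illustrative validation of an Amazon EBS gp3 performance request
# against its published baseline and maximum limits. This omits other
# real constraints (e.g., IOPS-to-size and throughput-to-IOPS ratios);
# verify current values in the EBS documentation.

GP3_BASELINE_IOPS = 3000
GP3_BASELINE_THROUGHPUT_MIBS = 125
GP3_MAX_IOPS = 16000
GP3_MAX_THROUGHPUT_MIBS = 1000

def validate_gp3(iops, throughput_mibs):
    """Return True if the requested gp3 performance is within limits."""
    return (GP3_BASELINE_IOPS <= iops <= GP3_MAX_IOPS
            and GP3_BASELINE_THROUGHPUT_MIBS <= throughput_mibs
                <= GP3_MAX_THROUGHPUT_MIBS)
```

For an I/O-intensive batch workload, this kind of check helps catch a configuration that silently falls back to baseline performance.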
Amazon EC2 Instance Store provides temporary block-level storage for your Amazon Elastic Compute Cloud (Amazon EC2) instance, often preformatted with a file-system. This storage is located on disks that are physically attached to the host computer. The data in an instance store persists only during the lifetime of its associated instance, so it’s better suited for intermediate storage for transient datasets used temporarily across steps within a batch job. The Amazon EC2 instance type determines the size of the instance store available and the type of hardware used for the instance store volumes. Instance store volumes are included as part of the instance’s usage cost.
Contrasting file-system functional capabilities for mainframe applications
The choice of AWS storage offerings allows you to optimize for specific mainframe application requirements. Each file-system option has its benefits and limitations. The following table compares some important functional capabilities for Amazon EFS, Amazon FSx for Lustre, FSx for NetApp ONTAP, FSx for OpenZFS, and Amazon EBS (io2, io2 Block Express, gp2, and st1).
If the application must be executed on compute nodes across multiple Availability Zones for high availability with fast failover, then the Multi-AZ deployment option becomes important, and both FSx for NetApp ONTAP and Amazon EFS can meet that requirement.
Durability is the probability that a file will remain intact and accessible after a period of one year. We can improve durability by spreading and copying data across locations.
Availability is the probability that the storage service itself is available to access the data when needed.
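The durability benefit of copying data across locations can be sketched with simple probability. Assuming each copy has an independent annual loss probability p, keeping n copies drops the loss probability to roughly p**n. Real storage services use more sophisticated schemes and failures are never fully independent, so treat this only as intuition, not as a service SLA; the single-copy loss probability below is a made-up assumption.

```python
# Back-of-the-envelope durability math under an independence assumption.
import math

def annual_loss_probability(p_single_copy, n_copies):
    """Probability of losing all n independent copies within a year."""
    return p_single_copy ** n_copies

def durability_nines(p_loss):
    """Express durability as a number of 'nines', e.g. 0.001 loss -> 3 nines."""
    return -math.log10(p_loss)

p = 1e-4  # assumed annual loss probability for a single copy (illustrative)
p3 = annual_loss_probability(p, 3)   # three copies -> ~1e-12
print(p3, durability_nines(p3))      # roughly twelve nines of durability
```

This is why spreading copies across Availability Zones raises durability so dramatically: each additional independent copy multiplies the nines rather than adding to them.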
New features and capabilities are added regularly to AWS file-system and storage options. Prior to making a file-system selection, you should review the latest capabilities on the storage services’ respective websites. For Amazon FSx, you can review the latest selection guidance in Choosing an Amazon FSx File System. For Amazon EFS, you can review When to choose Amazon EFS. And for Amazon EBS, you can review the use cases and characteristics in Amazon EBS volume types.
Contrasting file-system relative performance for mainframe applications
To compare the performance of the AWS file-system options, we recommend running performance benchmark tests with the specific application logic and data format for which you’re evaluating the file-systems. Because these AWS file-systems are available on-demand in minutes with pay-as-you-go pricing, you can quickly test multiple options and identify the most suitable ones.
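Such a benchmark can start small. The following Python sketch times sequential reads versus random record reads over a local test file, using a 1000-byte record size in the spirit of the tests in this post; the file size, path, and helper names are illustrative assumptions. To test a candidate file-system, point the path at one of its mounts, and prefer a dedicated tool such as fio for rigorous numbers.

```python
# Rough micro-benchmark: sequential reads vs. random seeks over the
# same file of fixed-length 1000-byte records. Illustrative only.
import os
import random
import tempfile
import time

RECORD_LEN = 1000
NUM_RECORDS = 10_000  # ~10 MB test file; scale up for a real test

def make_test_file(path):
    with open(path, "wb") as f:
        for i in range(NUM_RECORDS):
            f.write(i.to_bytes(4, "big") * (RECORD_LEN // 4))

def time_sequential(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(RECORD_LEN):
            pass
    return time.perf_counter() - start

def time_random(path, samples=NUM_RECORDS):
    offsets = [random.randrange(NUM_RECORDS) * RECORD_LEN
               for _ in range(samples)]
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(RECORD_LEN)
    return time.perf_counter() - start

path = os.path.join(tempfile.gettempdir(), "seqrnd_test.dat")
make_test_file(path)
seq_s, rnd_s = time_sequential(path), time_random(path)
print(f"sequential: {seq_s:.3f}s  random: {rnd_s:.3f}s")
os.remove(path)
```

Note that small files may sit entirely in the page cache, so size the test data well above instance memory when measuring real storage latency.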
For example, we performed a benchmark to measure the performance of multiple AWS file-system options for a specific AWS-owned COBOL application. This application executes various batch workload data access patterns. This batch application simulates bulk data processing with a single execution thread with exclusive access to the datasets (no sharing). Datasets in this test have a record length of 1000 bytes. The results of these performance tests are unique to the tested AWS-owned applications and configurations and shouldn’t be generalized to other applications. Your mileage may vary. The following figures show the performance results obtained for multiple data access types with each tested file-system option. Note that ec2_instance represents Amazon EC2 Instance Store in these figures.
The figure above shows the average elapsed time for sequential reads of 1000-byte records on a 20 Gigabyte sequential dataset (SEQREAD). A lower elapsed time is better. It also shows the average elapsed time for sequential reads of 1000-byte records on a 20 Gigabyte indexed dataset (SEQ-ISAMREAD). In this example, FSx for NetApp ONTAP combines the Multi-AZ deployment capability with strong performance for sequential reads.
The figure above shows the average elapsed time for random reads of 1000-byte records on a 20 Gigabyte indexed dataset (RND-ISAMREAD). A lower elapsed time is better. Across these example performance tests, the Amazon EC2 Instance Store shows high performance, but its use-cases are limited to situations where the data is local and temporary within the lifetime of the Amazon EC2 instance itself.
The figure above shows the average elapsed time for writing sequential 1000-byte records to a 100 Gigabyte sequential dataset. A lower elapsed time is better. In this example, Amazon EBS volumes show high performance suitable for applications deployed and running in one Availability Zone. One Amazon EBS use-case is batch applications that don’t require multi-Availability Zone file-system availability.
Contrasting file-system relative cost for mainframe applications
Cost optimization is an important pillar of an AWS Well-Architected design. In addition to functional capabilities and performance characteristics, we must balance the file-system selection decision with the cost profile for each option. Therefore, we recommend that you run some cost evaluations with the specific configurations and requirements for which you’re evaluating the file-systems. You can find the pricing for each file-system option on their respective website, or estimate the cost using the AWS Pricing Calculator.
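A first-pass estimate is often simple per-GB arithmetic. The sketch below compares monthly storage cost for a few options; the unit prices are placeholder assumptions, not current AWS prices, so look up actual rates on each service's pricing page or in the AWS Pricing Calculator before deciding.

```python
# Illustrative monthly storage cost comparison. Prices below are
# made-up placeholders, NOT current AWS rates -- and real bills also
# include provisioned throughput/IOPS, backups, and data transfer.

ASSUMED_PRICE_PER_GB_MONTH = {
    "efs_standard": 0.30,   # assumed $/GB-month
    "ebs_gp3": 0.08,        # assumed $/GB-month
    "ebs_st1": 0.045,       # assumed $/GB-month
}

def monthly_storage_cost(option, size_gb):
    """Capacity-only monthly cost under the assumed prices above."""
    return ASSUMED_PRICE_PER_GB_MONTH[option] * size_gb

for option in ASSUMED_PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(option, 500)
    print(f"{option}: ${cost:.2f}/month for 500 GB")
```

Even this crude model makes trade-offs visible: a cheaper per-GB option may require paying separately for the IOPS or throughput that a pricier option includes.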
For example, we have estimated the monthly costs of multiple AWS file-system options for the specific configuration that we evaluated for our performance benchmark with specific assumptions. These results in Table 2 should not be generalized to other configurations. Your mileage may vary.
Approach for selecting a file-system for mainframe applications
The optimal storage solution varies based on many dimensions, including application type, data access pattern, file size, access method, required performance, availability, durability, and costs. There is no one-size-fits-all solution. An AWS Well-Architected design for a mainframe application may use one or multiple storage solutions.
When evaluating file-systems for mainframe applications, we recommend that you first identify file-system options which meet functional requirements, such as availability, deployment, durability, replication, and throughput. Second, you should perform a cost estimate for the specific technical sizing required by the application. Third, you should evaluate the file-system performance with application-specific real performance tests on AWS, benefiting from the fact that you can quickly test AWS on-demand file-systems. Finally, if a mainframe application has stringent I/O performance requirements, then you can validate file-system options by performing a Proof Of Concept (POC) early in the migration and modernization effort.
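The filter-then-rank flow above can be sketched in a few lines. Every capability flag, cost, and latency below is a made-up placeholder to show the mechanics; substitute your own functional requirements, pricing estimates, and benchmark results.

```python
# Sketch of the recommended selection flow: filter options on hard
# functional requirements (here, only multi-AZ support), then rank the
# survivors by estimated cost and measured latency. All data is
# illustrative placeholder data, not real measurements.

options = [
    {"name": "fsx_ontap", "multi_az": True,  "monthly_cost": 400, "avg_write_ms": 1.2},
    {"name": "efs",       "multi_az": True,  "monthly_cost": 350, "avg_write_ms": 2.5},
    {"name": "ebs_io2",   "multi_az": False, "monthly_cost": 300, "avg_write_ms": 0.4},
]

def select(options, need_multi_az):
    # Step 1: hard functional filter.
    eligible = [o for o in options if o["multi_az"] or not need_multi_az]
    # Steps 2-3: rank by estimated cost, then by benchmarked latency.
    return sorted(eligible, key=lambda o: (o["monthly_cost"], o["avg_write_ms"]))

best = select(options, need_multi_az=True)[0]["name"]
```

In practice you would weight cost against latency for your workload rather than sorting lexicographically, but the structure — eliminate on requirements first, optimize among the rest — is the point.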
For example, we combined the performance and cost analysis for the specific AWS-owned COBOL application that we tested in the performance benchmark described in the previous sections. These results shouldn’t be generalized to other applications. Nonetheless, they provide an example of how you can combine the evaluation dimensions to make a better selection decision.
The figure above shows the positioning of the different file-system options considering their annual costs and their average response times to write a 1000-byte record to a dataset. The size of the bubble represents the variance of the performance: the bigger the bubble, the smaller the standard deviation, and the more stable the performance.
Go build
AWS provides various file-system options and configurations that can satisfy numerous requirements of mainframe applications. Selecting a file-system for a specific application involves meeting functional requirements, evaluating performance and cost, and balancing the results to define the optimal file-system. By combining AWS managed file-systems with the AWS Mainframe Modernization service and its cloud native runtime environments, you can quickly create environments on-demand in minutes, and test optimal application and file-system configurations. You can experiment with the managed service yourself and build an AWS Cloud native runtime environment by following the tutorials.
About the authors: