This Guidance helps customers design a resilient batch processing application using AWS services. The batch application is deployed across two AWS Regions for automated failover and failback from one Region to the other, and it leverages Amazon Simple Storage Service (Amazon S3) Multi-Region Access Points (MRAPs). With this architecture, you can obtain insights from your applications that help you decide when to fail over batch applications from a primary to a standby Region. While single-Region architectures are sufficient to support most customers' resilience requirements, the multi-Region architecture in this Guidance is ideal for customers with more demanding resiliency needs.
Architecture Diagram
Primary Region
Please note: This architecture shows the multi-Region, event-driven workload when running in the primary Region.
Step 1
Add the file to an Amazon Simple Storage Service (Amazon S3) bucket using the MRAP. MRAP routes the file to one of the S3 buckets, which will replicate the object to the other bucket.
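The upload might look like the following sketch from a client, assuming a placeholder MRAP ARN and key layout; boto3 signs MRAP requests with SigV4A, which requires the AWS CRT extra (pip install "boto3[crt]").

# Minimal sketch: uploading a batch input file through the S3
# Multi-Region Access Point. The ARN below is a placeholder for the
# MRAP created by this Guidance.
import boto3

s3 = boto3.client("s3")

MRAP_ARN = "arn:aws:s3::111122223333:accesspoint/example.mrap"  # hypothetical

# The MRAP routes the request to one of the underlying buckets; S3
# replication then copies the object to the bucket in the other Region.
s3.upload_file("batch-input.csv", MRAP_ARN, "input/batch-input.csv")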
Step 2
The Amazon S3 putObject event invokes an AWS Lambda function in both Regions.
Step 3
The Lambda function resolves the TXT record in an Amazon Route 53 private hosted zone to determine whether it is running in the active Region. If it is, the workflow continues; if not, the function exits and takes no further action. The function in the active Region writes metadata about the file to the Amazon DynamoDB batch state table, including that processing has started, and starts the first AWS Step Functions workflow.
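As a rough illustration of that check, the following handler resolves a hypothetical TXT record; the record name and the dnspython dependency are assumptions, not names from the Guidance.

# Minimal sketch of the active-Region check, assuming the TXT record
# value is the name of the active Region and that dnspython is packaged
# with the function.
import os
import dns.resolver

def is_active_region() -> bool:
    answers = dns.resolver.resolve("failover.batch.internal", "TXT")  # hypothetical record
    active_region = answers[0].strings[0].decode()
    return active_region == os.environ["AWS_REGION"]

def handler(event, context):
    if not is_active_region():
        return  # standby Region: exit without taking further action
    # ... write file metadata to the DynamoDB batch state table and
    # start the main orchestrator Step Functions workflow ...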
Step 4
The main orchestrator Step Functions workflow orchestrates file processing by splitting the file into small chunks and passing each chunk to the chunk file processor Step Functions workflow.
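A simplified sketch of the split step is below; the event shape, chunk size, and key layout are illustrative assumptions rather than the Guidance's actual implementation.

# Hypothetical split step invoked by the main orchestrator: the input
# file is cut into fixed-size chunks that the chunk file processor
# workflow consumes, one execution per chunk.
import csv
import io
import boto3

s3 = boto3.client("s3")
CHUNK_SIZE = 500  # rows per chunk; tune for your payloads

def handler(event, context):
    bucket, key = event["bucket"], event["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()
    rows = list(csv.reader(io.StringIO(body)))
    header, data = rows[0], rows[1:]

    chunk_keys = []
    for i in range(0, len(data), CHUNK_SIZE):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerows([header] + data[i:i + CHUNK_SIZE])
        chunk_key = f"chunks/{key}/{i // CHUNK_SIZE}.csv"
        s3.put_object(Bucket=bucket, Key=chunk_key, Body=out.getvalue())
        chunk_keys.append(chunk_key)

    # The orchestrator's Map state fans these out to the chunk processor.
    return {"chunks": chunk_keys}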
Step 5
The chunk file processor Step Functions workflow is responsible for processing each row from the chunk file.
Step 6
The merged file is written to Amazon S3, which replicates it to the standby Region's bucket.
Step 7
A pre-signed URL is generated using the MRAP so the user can retrieve the file from the closest Amazon S3 bucket. The routing logic is abstracted from the client.
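A minimal sketch of the presigning call follows, assuming a placeholder MRAP ARN and object key. Presigned MRAP URLs are signed with SigV4A, so boto3 again needs the AWS CRT extra.

# Sketch: presigning a GET against the MRAP so the recipient is routed
# to the closest bucket.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "arn:aws:s3::111122223333:accesspoint/example.mrap",  # hypothetical
        "Key": "output/merged-result.csv",  # hypothetical
    },
    ExpiresIn=3600,  # 60 minutes, matching the expiry used in this Guidance
)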
Step 8
Amazon Simple Email Service (Amazon SES) emails the pre-signed URL to recipients so they can retrieve the file from one of the S3 buckets through the MRAP.
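The notification might look like the following sketch; the addresses are placeholders, and both must be verified while the SES account is in sandbox mode.

# Hypothetical notification step: SES emails the pre-signed URL to the
# recipients.
import boto3

presigned_url = "<pre-signed MRAP URL from the previous step>"

ses = boto3.client("ses")
ses.send_email(
    Source="batch-notifications@example.com",
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Your batch output is ready"},
        "Body": {"Text": {"Data": f"Download the processed file: {presigned_url}"}},
    },
)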
Standby Region
Please note: This architecture shows the multi-Region, event-driven workload when failing over to the standby Region.
Step 1
An AWS Systems Manager runbook in the standby Region is manually triggered, initiating a failover to the standby Region.
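Triggering the runbook could look like this sketch; the standby Region and runbook name are hypothetical.

# Sketch: an operator (or script) starts the Systems Manager automation
# runbook in the standby Region to begin the failover.
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")  # standby Region (example)
response = ssm.start_automation_execution(
    DocumentName="BatchFailoverRunbook",  # hypothetical runbook name
)
print(response["AutomationExecutionId"])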
Step 2
The runbook invokes a Lambda function that connects to the Amazon Route 53 Application Recovery Controller (ARC) cluster to toggle the TXT record in the Route 53 private hosted zone.
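For illustration only, the following sketch flips the TXT record through the standard Route 53 control plane; the Guidance itself performs the change through the highly available ARC cluster data plane, and the zone ID, record name, and value here are placeholders.

# Simplified sketch: update the TXT record so its value names the new
# active Region. TXT record values must be enclosed in double quotes.
import boto3

route53 = boto53 = boto3.client("route53")
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",  # hypothetical private hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "failover.batch.internal",  # hypothetical record
                "Type": "TXT",
                "TTL": 30,
                "ResourceRecords": [{"Value": '"us-west-2"'}],
            },
        }]
    },
)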
Step 3
The runbook waits 15 minutes for Amazon S3 replication to finish and then invokes a second, reconciliation Lambda function that reads the batch state DynamoDB global table to determine the names of the objects to start processing. The function then re-copies those objects within the standby Region's bucket into the "input" directory. It also logs any objects that were unfinished (according to the DynamoDB table status) but not present in the S3 bucket in the standby Region.
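A rough sketch of that reconciliation logic follows, with assumed table, bucket, key, and attribute names.

# Hypothetical reconciliation function: scan the batch state global
# table for unfinished files, re-copy those that replicated to the
# standby bucket into the input/ prefix (re-triggering Step 4), and log
# the ones that never arrived.
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

BUCKET = "standby-batch-bucket"  # hypothetical standby bucket

def handler(event, context):
    table = dynamodb.Table("batch-state")  # hypothetical table name
    unfinished = table.scan(
        FilterExpression=Attr("status").ne("COMPLETED")
    )["Items"]

    for item in unfinished:
        key = item["object_key"]
        try:
            # Copying within the standby bucket emits a fresh putObject
            # event, which restarts processing for this file.
            s3.copy_object(
                Bucket=BUCKET,
                CopySource={"Bucket": BUCKET, "Key": key},
                Key=f"input/{key.split('/')[-1]}",
            )
        except ClientError:
            print(f"Unfinished per DynamoDB but missing from standby bucket: {key}")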
Step 4
The newly created Amazon S3 putObject event invokes the Lambda function.
Step 5
The function resolves the TXT record in the Route 53 private hosted zone to determine whether it is running in the active Region. Because the failover function in Step 2 altered the TXT record, the workflow continues. The function writes metadata about the file to the DynamoDB batch state table, including that processing has started, and starts the first Step Functions workflow.
Step 6
The main orchestrator Step Functions workflow orchestrates file processing by splitting the file into small chunks and passing each chunk to the chunk file processor Step Functions workflow.
Step 7
The chunk file processor Step Functions workflow is responsible for processing each row from the chunk file.
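The per-row work might resemble this sketch; the event shape, key layout, and transformation are placeholders for your business logic.

# Hypothetical per-row work inside the chunk file processor workflow:
# read a chunk from S3, transform each row, and write the processed
# chunk back for the merge step.
import csv
import io
import boto3

s3 = boto3.client("s3")

def process_row(row):
    # Placeholder business logic; replace with your per-record processing.
    return [field.strip().upper() for field in row]

def handler(event, context):
    bucket, chunk_key = event["bucket"], event["chunk_key"]
    body = s3.get_object(Bucket=bucket, Key=chunk_key)["Body"].read().decode()
    reader = csv.reader(io.StringIO(body))
    header = next(reader)

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(process_row(row) for row in reader)

    processed_key = chunk_key.replace("chunks/", "processed/")
    s3.put_object(Bucket=bucket, Key=processed_key, Body=out.getvalue())
    return {"processed_key": processed_key}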
Step 8
The merged file is written to Amazon S3, which replicates it to the other Region's bucket.
Step 9
A pre-signed URL is generated using the MRAP so the user can retrieve the file from the closest S3 bucket. The routing logic is abstracted from the client.
Step 10
Amazon SES mails the pre-signed URL to recipients so they can retrieve the file from one of the S3 buckets through the MRAP.
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
You can deploy this Guidance with infrastructure as code (IaC) to make any modifications. We also provide a dashboard that helps you understand performance and iterate on the Guidance so you can achieve your desired performance characteristics.
Security
We implemented least-privilege access on the AWS Identity and Access Management (IAM) roles attached to the Lambda functions, so these roles only have permission to access the resources they need. You can use a pre-signed Amazon S3 URL to access the S3 buckets. In this Guidance, these URLs have a set expiration time of 60 minutes to protect resources from unrestricted access.
Reliability
This Guidance replicates data across Regions to allow for full redundancy in the standby Region. This multi-Region approach allows you to fail over to another Region in disaster recovery scenarios. Within a Region, you can use retry logic and decoupled processing.
Performance Efficiency
We chose the services in this Guidance based on their abilities to reduce cost and complexity and enhance performance. You can test the Guidance with the provided example files and modify processes based on your specific use case.
Cost Optimization
This Guidance uses serverless services that allow you to pay only for the resources you consume during batch processing. With these services, your costs scale directly with the number of items processed in each batch job.
Sustainability
The serverless and managed services scale to meet changes in demand. AWS handles the provisioning of the underlying resources. This helps you avoid provisioning unneeded resources.
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.