AWS Cloud Operations Blog
Resiliency Journey: Exploring how AWS Resilience Hub and the Migration Acceleration Program come together
In today’s rapidly evolving digital landscape, the cloud has become the backbone of innovation, scalability, and efficiency for businesses worldwide. Whether your migration is motivated by accelerating innovation, reducing operational and infrastructure costs, or exiting your on-premises data center, moving to the cloud presents an important opportunity to increase the resilience of your critical applications. Resilience is what allows a workload (a collection of resources and code that delivers business value, such as a customer-facing application or a backend process) to respond to and quickly recover from failures.
Establishing thorough disaster recovery plans in on-premises environments has traditionally faced numerous obstacles: high costs, heavy resource demands, difficulty scaling, and the challenge of testing effectively. The result was often longer recovery times, manual intervention, and technological barriers. These issues frequently arose because organizations addressed disaster recovery late in their planning and decision-making processes.
The shift to cloud infrastructure gives organizations an opportunity to embrace a ‘shift left’ mindset in their disaster recovery and resilience approach, and a key success factor is making decisions about resilient architecture and testing early in the migration process. By adopting this approach, businesses can proactively avoid potential challenges, reduce costs, and establish a more resilient and robust cloud infrastructure.
Under the AWS Shared Responsibility Model for Resiliency, AWS is responsible for the hardware, software, networking, and facilities that run our cloud services, providing a reliable Global Cloud Infrastructure that helps isolate and protect customers from issues such as natural disasters and power outages. It also relieves customers of routine outage-related operations, such as replacing failed hard drives or network switches. As a customer, you remain responsible for making deliberate decisions about how your applications are architected to use the AWS Global Cloud Infrastructure, and for operating and monitoring your applications’ behavior to ensure they meet the availability levels the business requires.
This post describes a mechanism for assessing your application’s architecture for resilience by validating its resilience posture at migration (or post-migration) time with AWS Resilience Hub. With the recent addition of Resilience Hub to the Migration Acceleration Program 2.0 (MAP 2.0) included services list, customers can extend the program benefits to Resilience Hub usage fees (see the Resilience Hub What’s New announcement).
Key building blocks for a resilient migration plan
The AWS Migration Acceleration Program (MAP) is a comprehensive and proven cloud migration program based on our experience in migrating thousands of enterprise customers to the cloud. MAP provides customers with tools that reduce costs and automate and accelerate execution, tailored training approaches and content, expertise from Partners in the AWS Partner Network, a global partner community, and AWS investment. MAP also uses a proven three-phased framework (Assess, Mobilize, and Migrate and Modernize) to help you achieve your migration goals.
During the Mobilize phase, the key focus is on crafting a comprehensive migration plan and refining the business case. This phase addresses the organizational readiness gaps identified in the Assess phase. Emphasis is placed on establishing the baseline environment, often referred to as the “landing zone,” while also improving operational readiness and building cloud skills.
A strong migration plan starts with a deeper understanding of the interdependencies between applications and evaluates migration strategies against your business case objectives. It must include steps to define the criticality of each application being migrated, so that solutions architects can design it in the cloud to withstand minor disruptions and to be fully recoverable after a catastrophic event.
Integral to the planning process are Recovery Time Objective (RTO) and Recovery Point Objective (RPO) goals. RTO is the maximum acceptable delay between a service interruption and the restoration of service; RPO is the maximum acceptable amount of time since the last data recovery point, which defines how much data loss is tolerable. These metrics serve as benchmarks that let solutions architects design applications that endure disruptions while maintaining a defined level of recoverability. Because applications have varying degrees of criticality and require different availability levels, organizations often implement tiering systems (such as gold and silver, or tier 1 and tier 2) to meet diverse business requirements.
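To make tiering concrete, here is a minimal Python sketch that encodes a hypothetical tier catalogue (the tier names and RTO/RPO values are illustrative, not recommendations) and checks it against application interdependencies. The reasoning it encodes: an application with a hard dependency generally cannot recover faster, or with less data loss, than the application it depends on, so a dependency placed in a tier with a longer RTO or RPO is worth flagging while planning. The application names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    rto_secs: int  # maximum acceptable time to restore service
    rpo_secs: int  # maximum acceptable data-loss window


# Hypothetical tier catalogue; real values come from your business requirements.
TIERS = {
    "gold": Targets(rto_secs=15 * 60, rpo_secs=5 * 60),
    "silver": Targets(rto_secs=4 * 3600, rpo_secs=3600),
    "bronze": Targets(rto_secs=24 * 3600, rpo_secs=24 * 3600),
}


def weaker_dependencies(app_tiers: dict, depends_on: dict) -> list:
    """Return (app, dependency) pairs where the dependency's tier has a longer RTO or RPO."""
    findings = []
    for app, deps in depends_on.items():
        mine = TIERS[app_tiers[app]]
        for dep in deps:
            theirs = TIERS[app_tiers[dep]]
            # An app is only as recoverable as the apps it hard-depends on.
            if theirs.rto_secs > mine.rto_secs or theirs.rpo_secs > mine.rpo_secs:
                findings.append((app, dep))
    return findings


# Example: a gold checkout service that depends on a silver inventory service.
tiers = {"checkout": "gold", "inventory": "silver", "reporting": "bronze"}
deps = {"checkout": ["inventory"], "reporting": ["inventory"]}
print(weaker_dependencies(tiers, deps))  # [('checkout', 'inventory')]
```

Each finding is a planning decision: either move the dependency to a higher tier, or accept the weaker targets for the dependent application and record that in its RTO and RPO.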
Customers must also articulate how they intend to operationalize control of their organization’s resiliency requirements. Defining and enforcing processes, particularly in later stages, makes it possible to verify that application teams adhere to established guidelines. Expressing system resilience requirements as RPO and RTO is an effective starting point in the resilience journey. Once these objectives are defined for an application, we recommend configuring them in Resilience Hub as the first step toward ensuring a resilient architecture.
AWS Resilience Hub
AWS Resilience Hub gives you a central place to define, validate, and track the resiliency of your AWS applications. It helps you protect your applications from disruptions, reduce recovery costs, and optimize business continuity to help meet compliance and regulatory requirements.
You can use AWS Resilience Hub to assess your infrastructure and get architectural recommendations to improve the resiliency of your applications. Additional recommendations provide code (as AWS CloudFormation templates) for meeting your resiliency policy and for implementing tests, alarms, and standard operating procedures (SOPs) that you can deploy and run with your application through your continuous integration and continuous delivery (CI/CD) pipeline. After an application is in production, you can add Resilience Hub to your CI/CD pipeline to validate every build before it is released.
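As a sketch of that CI/CD validation step, the Python below uses boto3 (the AWS SDK for Python) to start an assessment of an application’s released version and turn the result into a pipeline exit code. It assumes the application already exists in Resilience Hub and that its latest version has been published; the function names, the assessment naming scheme, the polling interval, and the timeout are our own illustrative choices rather than a prescribed integration.

```python
import time


def gate_exit_code(assessment: dict) -> int:
    """Map a Resilience Hub assessment record to a pipeline exit code.

    0 = policy met, 1 = policy not met, 2 = the assessment itself failed.
    """
    if assessment.get("assessmentStatus") == "Failed":
        return 2
    return 0 if assessment.get("complianceStatus") == "PolicyMet" else 1


def run_gate(app_arn: str, poll_seconds: int = 30, timeout_seconds: int = 3600) -> int:
    """Assess the released version of an application and wait for the verdict."""
    import boto3  # imported lazily so gate_exit_code can be tested without AWS access

    client = boto3.client("resiliencehub")
    started = client.start_app_assessment(
        appArn=app_arn,
        appVersion="release",
        assessmentName=f"ci-{int(time.time())}",
    )
    assessment_arn = started["assessment"]["assessmentArn"]
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        assessment = client.describe_app_assessment(
            assessmentArn=assessment_arn)["assessment"]
        if assessment["assessmentStatus"] in ("Success", "Failed"):
            return gate_exit_code(assessment)
        time.sleep(poll_seconds)
    return 2  # treat a timeout like a failed assessment
```

A pipeline step could call `sys.exit(run_gate(app_arn))` so that a policy breach stops the release. Keeping `gate_exit_code` separate from the AWS calls makes the pass/fail rule easy to test and adjust, for example if you decide to treat some compliance statuses as warnings.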
Getting started
This section provides a step-by-step guide to how we see Resilience Hub integrating with each phase of the Migration Acceleration Program, as shown in the following diagram (Figure 1):
In the Assess phase you:
- Define your resilience operational structure within your organization
- Create an initial backlog of business-critical applications that are going to be migrated, and the initial expectations of availability and criticality for those applications
In the Mobilize phase you:
- Refine your application criticality and tiering. Each tier or application must have clearly defined RPO and RTO goals, aligned with the business’s availability expectations for those applications.
- Start working with Resilience Hub to define your organization’s resiliency policies
You can use the AWS Well-Architected Framework Migration Lens as guidance to define and review the required resilience information during the Assess and Mobilize phases.
In the Resilience Hub console, you will find a section for defining your resiliency policies (Figure 2), including RPO and RTO targets for application, infrastructure, Availability Zone, and Regional disruptions.
To create policies in Resilience Hub, follow the steps in the AWS Resilience Hub User Guide.
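If you prefer to script policy creation, for example to create one policy per tier, a minimal boto3 sketch can build the request for `create_resiliency_policy` from your tier targets. The four disruption types map to the console categories (application = `Software`, infrastructure = `Hardware`, `AZ`, and `Region`); the policy name and target values below are hypothetical, and the helper names are our own.

```python
# Resilience Hub disruption types: application (Software), infrastructure (Hardware),
# Availability Zone (AZ), and Region.
DISRUPTION_TYPES = ("Software", "Hardware", "AZ", "Region")


def policy_request(name: str, tier: str, targets: dict) -> dict:
    """Build keyword arguments for resiliencehub.create_resiliency_policy.

    targets maps a disruption type to an (rto_secs, rpo_secs) tuple.
    """
    unknown = set(targets) - set(DISRUPTION_TYPES)
    if unknown:
        raise ValueError(f"unknown disruption types: {sorted(unknown)}")
    return {
        "policyName": name,
        "tier": tier,  # API tier value, e.g. "MissionCritical", "Critical", "Important"
        "policy": {kind: {"rtoInSecs": rto, "rpoInSecs": rpo}
                   for kind, (rto, rpo) in targets.items()},
    }


def create_policy(request: dict) -> str:
    """Create the policy in the current account and Region and return its ARN."""
    import boto3  # imported lazily so policy_request can be used without AWS access

    response = boto3.client("resiliencehub").create_resiliency_policy(**request)
    return response["policy"]["policyArn"]


# Hypothetical "gold" targets: 15-minute RTO and 5-minute RPO for every disruption type.
gold = policy_request("gold-policy", "MissionCritical",
                      {kind: (900, 300) for kind in DISRUPTION_TYPES})
```

Generating every policy from one tier catalogue keeps the policies in Resilience Hub consistent with the RPO and RTO goals agreed during the Mobilize phase.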
In the Migrate and Modernize phase, with your policies created in Resilience Hub, you add applications to Resilience Hub as they are migrated. There are several ways to define which resources make up an application in Resilience Hub: AWS CloudFormation stacks, AWS Resource Groups, AWS Service Catalog AppRegistry applications, Terraform state files, or Amazon Elastic Kubernetes Service (Amazon EKS) cluster resources. If you already use one of these methods as part of your migration process, adding new applications to Resilience Hub is straightforward.
You can create a new application in Resilience Hub (Figure 3) and specify which type of resource collection represents it. Resilience Hub automatically imports all resources that are part of that application’s structure.
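The same steps can be scripted. This minimal boto3 sketch creates an application bound to an existing policy, imports resources from CloudFormation stack, Resource Group, or AppRegistry ARNs (or Terraform state files in Amazon S3), waits for the asynchronous import, and publishes a version that assessments can run against. EKS sources are left out for brevity; the function names and polling interval are illustrative assumptions.

```python
import time


def import_request(app_arn: str, source_arns=(), terraform_state_urls=()) -> dict:
    """Build keyword arguments for import_resources_to_draft_app_version.

    source_arns: CloudFormation stack, Resource Group, or AppRegistry application ARNs.
    terraform_state_urls: S3 URLs of Terraform state files.
    """
    if not source_arns and not terraform_state_urls:
        raise ValueError("specify at least one resource collection")
    request = {"appArn": app_arn}
    if source_arns:
        request["sourceArns"] = list(source_arns)
    if terraform_state_urls:
        request["terraformSources"] = [{"s3StateFileUrl": url} for url in terraform_state_urls]
    return request


def register_app(name: str, policy_arn: str, source_arns=(), terraform_state_urls=(),
                 poll_seconds: int = 10) -> str:
    """Create the app, import its resources, publish a version, and return the app ARN."""
    import boto3  # imported lazily so import_request can be used without AWS access

    client = boto3.client("resiliencehub")
    app_arn = client.create_app(name=name, policyArn=policy_arn)["app"]["appArn"]
    client.import_resources_to_draft_app_version(
        **import_request(app_arn, source_arns, terraform_state_urls))
    # The import runs asynchronously, so poll until the draft version is ready.
    while True:
        status = client.describe_draft_app_version_resources_import_status(
            appArn=app_arn)["status"]
        if status in ("Success", "Failed"):
            break
        time.sleep(poll_seconds)
    if status == "Failed":
        raise RuntimeError(f"resource import failed for {app_arn}")
    # Publishing the draft produces the released version that assessments target.
    client.publish_app_version(appArn=app_arn)
    return app_arn
```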
For large migrations involving many applications, we encourage automating the process of adding applications to Resilience Hub. Building guardrails lets you alert your cloud operations teams when applications without a defined resilience posture are deployed in your AWS account. Good examples of this automation are published in the aws-resilience-hub-tools GitHub repository, demonstrating ways to integrate Resilience Hub with AWS CodePipeline, Amazon EventBridge, AWS Resource Groups, Jenkins, and GitHub Actions.
If you participate in MAP and signed your MAP terms after September 28, 2023, AWS Resilience Hub is on the included services list, so you can extend program credits to Resilience Hub service fees. When creating applications in Resilience Hub, make sure you add the correct tag key/value combination for your MAP agreement (Figure 4) so that MAP credits are applied to your account.
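When applications are created programmatically, the MAP tag can be merged into the `create_app` request. This sketch assumes the MAP tag key is `map-migrated`; confirm both the key and the value issued for your agreement (Figure 4 shows the combination in the console). The tag value and application name below are placeholders, and `with_map_tag` is a hypothetical helper.

```python
MAP_TAG_KEY = "map-migrated"  # assumption: verify the key against your MAP agreement


def with_map_tag(create_app_kwargs: dict, map_tag_value: str) -> dict:
    """Return a copy of a create_app request with the MAP tag merged into its tags."""
    request = dict(create_app_kwargs)
    request["tags"] = {**create_app_kwargs.get("tags", {}), MAP_TAG_KEY: map_tag_value}
    return request


# Placeholder value; use the tag value issued for your MAP agreement.
request = with_map_tag({"name": "orders", "tags": {"team": "payments"}}, "migEXAMPLE123")
print(request["tags"])  # {'team': 'payments', 'map-migrated': 'migEXAMPLE123'}
```

The resulting dictionary, together with a `policyArn`, could then be passed to `boto3.client("resiliencehub").create_app(**request, policyArn=...)`. Building the tag into the same automation that registers applications avoids relying on someone remembering to add it by hand.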
Conclusion
Assessing your resilience posture at migration time lets you quickly correct application architectures to meet the planned RPO and RTO targets. It can reduce risk and lower the overall reengineering cost of improving an application’s resilience before it goes into production.
If you are currently going through a cloud migration, we recommend using AWS Resilience Hub as part of your migration journey. You can integrate Resilience Hub into your CI/CD pipeline to validate the resiliency of your applications right after deployment. Taking a proactive approach to building resilience into your applications from the start pays dividends in the long run.
To get started with Resilience Hub, review the AWS Resilience Hub User Guide to define your resiliency policies. Then, integrate Resilience Hub into your migration process using the automation examples in the aws-resilience-hub-tools GitHub repository.
This helps your migrated applications meet your defined resilience requirements, and helps ensure your operations teams have the standard operating procedures, monitoring, and alerting in place to support the applications and react in a timely fashion, reducing the overall time to recover when an application is having a bad day.