AWS Storage Blog

Ensure workload resilience with AWS Elastic Disaster Recovery and Arpio

Planning for disaster recovery (DR) is a critical component of any IT operations practice. Applications that run on AWS benefit from the reliability that is built into the platform. However, they can still be impacted by natural disasters, technical failures, and accidental or malicious human actions. Consequently, implementing best practices around disaster recovery is an important element of the reliability pillar of the AWS Well-Architected Framework.

AWS Elastic Disaster Recovery (DRS) is a block-level replication DR solution for physical and virtual workloads that helps accelerate and automate DR failover. For AWS workloads, DRS can continuously replicate data stored on Amazon Elastic Compute Cloud (EC2) instances to a disaster recovery environment in near real time, keeping data loss during a failover to a minimum.

For workloads running in the cloud, multiple AWS services beyond Amazon EC2 may also need to be recovered. Arpio is an AWS partner solution that orchestrates disaster recovery of many core services in AWS, including AWS Identity and Access Management (IAM) and Amazon Virtual Private Cloud (VPC). Arpio uses the cloud-native backup, replication, and recovery mechanisms required for the data and infrastructure of each service. For Amazon EC2 workloads, Arpio complements DRS by ensuring replication and recovery of dependent and adjacent services utilized by the Amazon EC2 workload.

In this post, we discuss how Arpio and Elastic Disaster Recovery deliver a disaster recovery solution for AWS workloads that is both comprehensive and easy to implement, helping you ensure the resilience of your AWS workloads.

Integrated recovery of infrastructure and data

DR of an entire AWS workload requires more than the ability to recover the data on which the workload depends. The entire cloud environment must be recreated in the recovery environment. In a recovery scenario, Arpio’s capability to back up and restore infrastructure complements Elastic Disaster Recovery’s ability to restore Amazon EC2 instance data while ensuring low Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).

At the network level, Arpio replicates the full configuration of your Amazon VPC, ensuring that subnets, routing, gateways, endpoints, and Elastic IP addresses match their configuration in the primary environment. By default, Arpio replicates the private IP address space within these elements so that the recovery environment is an exact replica of the primary environment. If you need alternate IP addresses in the recovery environment, you can configure them.
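
As a rough illustration of what using alternate IP addresses can mean in practice, the following Python sketch maps a private address from a primary CIDR block to the same host offset in a recovery CIDR block. It is not Arpio's implementation; the function name `remap_address` and the example CIDR blocks are assumptions made for this example.

```python
import ipaddress


def remap_address(address: str, primary_cidr: str, recovery_cidr: str) -> str:
    """Map an address in primary_cidr to the same host offset in recovery_cidr.

    Hypothetical helper for illustration only; not part of Arpio or AWS APIs.
    """
    primary = ipaddress.ip_network(primary_cidr)
    recovery = ipaddress.ip_network(recovery_cidr)
    if primary.prefixlen != recovery.prefixlen:
        raise ValueError("primary and recovery CIDR blocks must be the same size")

    ip = ipaddress.ip_address(address)
    if ip not in primary:
        raise ValueError(f"{address} is not inside {primary_cidr}")

    # Keep the host portion, swap the network portion.
    offset = int(ip) - int(primary.network_address)
    return str(recovery.network_address + offset)


# Default behavior described above: same address space, so the address is unchanged.
same = remap_address("10.0.1.25", "10.0.0.0/16", "10.0.0.0/16")

# Alternate address space in the recovery environment.
alternate = remap_address("10.0.1.25", "10.0.0.0/16", "172.31.0.0/16")
```

Preserving the host offset keeps the relative layout of servers within a subnet the same, even when the network prefix differs.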

At the security and access-management level, Arpio replicates network access control lists (ACLs) and security groups, as well as IAM entities such as instance profiles, IAM roles, and IAM managed policies. All IAM policies are translated during the replication process so that they are appropriate for the recovery environment.
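
To make the idea of policy translation more concrete, here is a minimal Python sketch that rewrites the Region and account ID fields of any ARN found in an IAM policy document. The post does not describe how Arpio performs its translation, so this is only one plausible approach; the names `EnvironmentMapping`, `translate_arn`, and `translate_policy`, as well as the account IDs and Regions, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EnvironmentMapping:
    """Hypothetical description of the primary and recovery environments."""
    primary_region: str
    recovery_region: str
    primary_account: str
    recovery_account: str


def translate_arn(value: str, mapping: EnvironmentMapping) -> str:
    """Rewrite the Region and account fields of an ARN; return other strings as-is."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = value.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        return value
    if parts[3] == mapping.primary_region:
        parts[3] = mapping.recovery_region
    if parts[4] == mapping.primary_account:
        parts[4] = mapping.recovery_account
    return ":".join(parts)


def translate_policy(node: Any, mapping: EnvironmentMapping) -> Any:
    """Walk a parsed policy document and translate every ARN string in it."""
    if isinstance(node, dict):
        return {key: translate_policy(val, mapping) for key, val in node.items()}
    if isinstance(node, list):
        return [translate_policy(item, mapping) for item in node]
    if isinstance(node, str):
        return translate_arn(node, mapping)
    return node


mapping = EnvironmentMapping("us-east-1", "us-west-2", "111111111111", "222222222222")
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sqs:SendMessage",
        "Resource": [
            "arn:aws:sqs:us-east-1:111111111111:orders",
            "arn:aws:s3:::example-bucket/*",  # no Region or account, left unchanged
        ],
    }],
}
translated = translate_policy(policy, mapping)
```

A real translation would likely also need to handle resource names that differ between environments and condition keys that embed Regions, which this sketch does not attempt.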

This replicated infrastructure forms the foundation of the environment into which Amazon EC2 instances and other AWS services can be recovered. Once the underlying infrastructure is recovered, Arpio can use Elastic Disaster Recovery to restore the Amazon EC2 instances.

The configuration of these restored Amazon EC2 instances is driven by an EC2 launch template that Elastic Disaster Recovery maintains for each Amazon EC2 server in the primary environment. Arpio first updates this Amazon EC2 launch template to have the appropriate settings to create a new server that is an identical clone of the primary environment server. These settings include instance type and size, the instance’s private IP address, and the Amazon Elastic Block Store (EBS) volume configuration of all replicated volumes. Arpio then requests that Elastic Disaster Recovery launch the recovered server.
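
The settings listed above can be pictured as a launch template data structure. The following Python sketch builds such a structure from a description of the primary server, using key names that follow the shape of EC2 launch template data (`InstanceType`, `NetworkInterfaces`, `BlockDeviceMappings`). The `PrimaryServer` and `Volume` classes, the function name, and the sample values are assumptions for illustration, and the exact fields that Arpio sets are not specified in this post.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    """Hypothetical description of one replicated EBS volume."""
    device_name: str
    size_gib: int
    volume_type: str


@dataclass
class PrimaryServer:
    """Hypothetical description of a server in the primary environment."""
    instance_type: str
    private_ip: str
    volumes: List[Volume]


def build_launch_template_data(server: PrimaryServer, recovery_subnet_id: str) -> dict:
    """Build launch template data that mirrors the primary server's settings."""
    return {
        "InstanceType": server.instance_type,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": recovery_subnet_id,
            # Reuse the primary server's private IP in the recovery subnet.
            "PrivateIpAddress": server.private_ip,
        }],
        "BlockDeviceMappings": [
            {
                "DeviceName": vol.device_name,
                "Ebs": {"VolumeSize": vol.size_gib, "VolumeType": vol.volume_type},
            }
            for vol in server.volumes
        ],
    }


server = PrimaryServer(
    instance_type="m5.large",
    private_ip="10.0.1.25",
    volumes=[Volume("/dev/xvda", 50, "gp3"), Volume("/dev/xvdb", 200, "gp3")],
)
template_data = build_launch_template_data(server, "subnet-0123456789abcdef0")
```

In a real environment, a structure like this could be submitted as a new launch template version through the EC2 API before the recovery launch is requested. That step is omitted here so the example runs without AWS credentials.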

Amazon EC2 instances can be launched in parallel or in a coordinated sequence based upon server-level dependencies. You can configure these dependencies in Arpio, which then launches the servers in the appropriate order. Once Elastic Disaster Recovery has recovered all the Amazon EC2 instances, Arpio can recover other resources that depend upon these servers. For example, Elastic Load Balancing (ELB) load balancers and their target groups can be recovered to sit in front of these servers as necessary to serve traffic.
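
Dependency-aware launch ordering of this kind can be sketched as a topological sort that groups servers into waves: every server in a wave can launch in parallel, and a wave starts only after the previous one has finished. The example below uses Python's standard `graphlib` module (Python 3.9 or later). The server names and the `launch_waves` function are hypothetical, and this is not a description of how Arpio schedules launches internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def launch_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group servers into launch waves.

    `dependencies` maps each server to the set of servers that must be
    running before it starts. Raises graphlib.CycleError on circular
    dependencies.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()

    waves = []
    while sorter.is_active():
        # Every server whose dependencies have all been launched is ready now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical three-tier workload: web depends on app, app on db and cache.
waves = launch_waves({
    "web": {"app"},
    "app": {"db", "cache"},
    "db": set(),
    "cache": set(),
})
# waves == [["cache", "db"], ["app"], ["web"]]
```

Here the database and cache servers launch together in the first wave, followed by the application server and then the web server.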

The following screenshot shows the Arpio console:

[Screenshot: Arpio console]

Automated DR of adjacent services

Many cloud workloads store data in other managed services, and they often rely on compute services that extend beyond Amazon EC2 instances. These resources also need to be recovered during a disaster.

Arpio’s disaster recovery capabilities extend beyond Amazon EC2 to recover managed data services like Amazon Simple Storage Service (S3), Amazon Elastic File System (EFS), and Amazon Relational Database Service (RDS). Arpio replicates and recovers managed compute services such as Amazon EC2 Auto Scaling and AWS Elastic Beanstalk, as well as containerized compute resources including Amazon Elastic Container Service (ECS), AWS Fargate, and Amazon Elastic Container Registry (ECR). Arpio can also recover serverless workloads that rely on AWS Lambda.

Conclusion

Responsibly operating any IT infrastructure, including workloads that run in AWS, requires thoughtful investment in DR to ensure that no natural disaster, infrastructure failure, or malicious actor can interrupt service. In this post, we detailed how Arpio and Elastic Disaster Recovery deliver an integrated solution that is easily implemented to protect a complete AWS environment. Arpio is a SaaS solution available in the AWS Marketplace.

If you have any comments or questions, you can add them in the comments section.

Alex Berkov

Alex is the manager of the CloudEndure Solutions Architecture team. He joined AWS in early 2019 as part of the CloudEndure acquisition. Alex is focused on helping customers shift and operate their disaster recovery strategy in AWS. A native New Englander, Alex spends his time off with his family on the slopes during the winter and at the beach during the summer.

Doug Neumann

Doug is the co-founder and CEO at Arpio, where his team is eliminating the undifferentiated heavy lifting associated with disaster recovery for cloud-native workloads. Arpio's technology helps companies like SAS, Finnair, and Scotts/Miracle-Gro mitigate disaster risk, streamline compliance, and maximize service levels by fully automating the backup and disaster recovery process for their AWS environments. Before Arpio, Doug led software engineering teams building highly resilient cloud architectures at Bandwidth and Microsoft.