AWS Storage Blog

Automated storage cost optimization in CloudEndure Disaster Recovery

In a previous blog post, I introduced several concepts for optimizing operational costs in CloudEndure Disaster Recovery. When you use CloudEndure Disaster Recovery, offered by AWS, the main cost driver is Amazon EBS. The challenge is choosing the right volume type for each of the multiple volumes that may be attached to a workload. This often becomes a trial-and-error process, or it requires monitoring write intensity to identify volumes with a low change rate.

AWS follows the Amazon virtuous cycle model to reduce prices on services. The newly introduced cost optimization feature for CloudEndure Disaster Recovery provides a way to reduce operational costs while maintaining recovery point objectives (RPO).

In this blog, I focus on enabling the cost optimization feature in CloudEndure Disaster Recovery so that it automatically selects lower-cost EBS volumes based on replication requirements. This new feature dynamically changes the EBS volume type used in the staging area to a lower-cost volume that supports real-time asynchronous replication. As many replicated volumes have low input/output (I/O) utilization, you can take advantage of lower-cost volumes, such as Cold HDD (SC1).

Using cost optimization

By default, CloudEndure Disaster Recovery uses magnetic EBS volumes for disks smaller than 500 GB and GP2 volumes for disks larger than 500 GB. In the project replication settings, this default appears as Use fast SSD data disks. You can reduce costs by selecting Use slower, low-cost standard disks from the drop-down menu instead, which uses ST1 for all volumes larger than 500 GB.

For finer control, use the machine-level replication settings, where you can choose the disk type for each volume individually. This lets you match the EBS volume type to the specific requirements of each volume. When you enable cost optimization, CloudEndure Disaster Recovery monitors replication reads and writes, calculates throughput, and dynamically sets the volume type to optimize costs. Cost optimization applies only to volumes of 125 GB and larger.
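To make the default selection rules concrete, here is a minimal Python sketch of which EBS volume type the staging area would use for a given disk. It encodes only the thresholds stated above. The enum and function names are hypothetical, the optional override stands in for the machine-level per-volume setting, and treating a disk of exactly 500 GB as "large" is an assumption because that case isn't covered. "standard" is the EBS API name for magnetic volumes.

```python
from enum import Enum
from typing import Optional

class DiskSetting(Enum):
    """Project-level disk options (hypothetical names for the two menu choices)."""
    FAST_SSD = "Use fast SSD data disks"
    LOW_COST_STANDARD = "Use slower, low-cost standard disks"

LARGE_DISK_GB = 500          # size threshold between magnetic and GP2/ST1
MIN_COST_OPTIMIZE_GB = 125   # smallest volume that cost optimization applies to

def staging_volume_type(size_gb: int,
                        setting: DiskSetting = DiskSetting.FAST_SSD,
                        override: Optional[str] = None) -> str:
    """Return the EBS volume type used for one replicated disk (sketch)."""
    if override is not None:
        # A machine-level, per-volume choice takes precedence over project defaults.
        return override
    # Assumption: a disk of exactly 500 GB is treated as large.
    if size_gb < LARGE_DISK_GB:
        return "standard"    # magnetic
    return "gp2" if setting is DiskSetting.FAST_SSD else "st1"

def cost_optimization_eligible(size_gb: int) -> bool:
    """Cost optimization only applies to volumes of 125 GB and larger."""
    return size_gb >= MIN_COST_OPTIMIZE_GB

# Usage: a 1,000 GB disk under each project-level setting
for setting in DiskSetting:
    print(setting.name, staging_volume_type(1000, setting))
```

Small disks stay on magnetic volumes under either project setting; only disks of 500 GB and above move between GP2 and ST1.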

If the combined read and write throughput over the past 6 hours is lower than the baseline throughput of a lower-cost volume type, CloudEndure Disaster Recovery dynamically switches the volume to that type. A volume reverts to its default type in any of the following cases:

  • The write rate is consistently greater than the baseline throughput, or the daily peak rate exceeds the burst rate for more than 15 minutes.
  • The burst balance drops below 10%, either consistently or for a period greater than 15 minutes.
  • I/O utilization during the initial synchronization exceeds what the lower-cost volume type supports. Utilization is often higher during the initial sync because of the additional reads the process requires, so default volume types may be used during this time.
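The switching and revert rules above can be sketched in Python as follows. This is an illustrative interpretation, not CloudEndure's implementation. It assumes per-minute samples covering the past 6 hours, reads "consistently greater" as the average write rate exceeding the baseline, and reads "for more than 15 minutes" as a consecutive run of more than 15 samples. VolumeMetrics, TypeLimits, and the limit values in the usage lines are hypothetical; consult the Amazon EBS documentation for the actual baseline and burst throughput of ST1 and SC1 volumes.

```python
from dataclasses import dataclass
from typing import List

SUSTAINED_MINUTES = 15         # "for more than 15 minutes"
MIN_BURST_BALANCE_PCT = 10.0   # "burst balance drops below 10%"

@dataclass
class VolumeMetrics:
    """Hypothetical per-minute samples (non-empty) covering the past 6 hours."""
    read_mibps: List[float]
    write_mibps: List[float]
    burst_balance_pct: List[float]
    initial_sync: bool = False

@dataclass
class TypeLimits:
    """Throughput limits of the candidate lower-cost volume type (illustrative)."""
    baseline_mibps: float
    burst_mibps: float

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

def longest_run(flags: List[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def should_revert(m: VolumeMetrics, low: TypeLimits) -> bool:
    """True if any of the three revert conditions holds."""
    if m.initial_sync:
        return True
    if mean(m.write_mibps) > low.baseline_mibps:
        return True
    if longest_run([w > low.burst_mibps for w in m.write_mibps]) > SUSTAINED_MINUTES:
        return True
    low_balance = [b < MIN_BURST_BALANCE_PCT for b in m.burst_balance_pct]
    return all(low_balance) or longest_run(low_balance) > SUSTAINED_MINUTES

def should_use_low_cost(m: VolumeMetrics, low: TypeLimits) -> bool:
    """Switch only if combined read+write throughput stays under the baseline."""
    combined = [r + w for r, w in zip(m.read_mibps, m.write_mibps)]
    return mean(combined) < low.baseline_mibps and not should_revert(m, low)

# Usage with illustrative limits and 360 one-minute samples
low = TypeLimits(baseline_mibps=6.0, burst_mibps=40.0)
quiet = VolumeMetrics([0.5] * 360, [1.0] * 360, [100.0] * 360)
spiky = VolumeMetrics([0.5] * 360, [1.0] * 340 + [50.0] * 20, [100.0] * 360)
print(should_use_low_cost(quiet, low))  # True
print(should_use_low_cost(spiky, low))  # False: 20 minutes above the burst rate
```

Note that the spiky volume's 6-hour average is still below the baseline; it is the sustained burst, not the average, that keeps it on the default type.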

Enabling the cost optimization feature

Enable the cost optimization feature when you set up a new project by selecting Cost optimize by dynamically changing disk type in the project replication settings. Project-level settings affect only new agent installations.

Project replication settings

You can also enable the cost optimization feature for machines that already have an agent installed. Select Cost optimize by dynamically changing disk type within the machine replication settings.

Machine replication settings

Finally, to support this feature, we made several changes to the AWS Identity and Access Management (IAM) policy used by CloudEndure Disaster Recovery. You can view the updated policy here. Specifically, we added the cloudwatch:GetMetricData and ec2:ModifyVolume actions so that CloudEndure Disaster Recovery can monitor and modify EBS volumes.
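As a quick way to audit your own setup, the following minimal Python sketch builds an IAM policy statement granting the two added actions and checks whether an existing policy document already allows them. It is not the full CloudEndure Disaster Recovery policy, and the Sid and helper names are hypothetical. "Resource": "*" is a simplification: cloudwatch:GetMetricData does not support resource-level permissions, while ec2:ModifyVolume can be scoped to volume ARNs.

```python
import json
from typing import Dict, Set

# The two actions added to support cost optimization
REQUIRED_ACTIONS = {"cloudwatch:GetMetricData", "ec2:ModifyVolume"}

def cost_optimization_statement() -> Dict:
    """One Allow statement for the two actions (hypothetical Sid)."""
    return {
        "Sid": "CostOptimizationVolumeChanges",
        "Effect": "Allow",
        "Action": sorted(REQUIRED_ACTIONS),
        "Resource": "*",
    }

def missing_actions(policy: Dict) -> Set[str]:
    """Required actions not explicitly allowed by the policy document.

    Simplified check: exact action names only (no wildcard, Deny, or
    NotAction handling). IAM action names are compared case-insensitively.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    granted = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(a.lower() for a in actions)
    return {a for a in REQUIRED_ACTIONS if a.lower() not in granted}

policy = {"Version": "2012-10-17", "Statement": [cost_optimization_statement()]}
print(json.dumps(policy, indent=2))
print(missing_actions(policy))  # set()
```

Because the check ignores wildcards such as "ec2:*", a policy that grants the actions through a wildcard would be reported as missing them; treat a non-empty result as a prompt to review the policy, not as proof that a permission is absent.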

Conclusion

In this blog, I reviewed how to optimize storage automatically with CloudEndure Disaster Recovery to reduce your overall operating costs. Because most replicated disks have low I/O utilization, there are opportunities to use lower-performance EBS volume types for replication. This new feature is quick to implement and can significantly reduce your EBS costs. As workload requirements vary, automating the selection of the appropriate resource reduces not only infrastructure costs but also operational costs.

Visit the CloudEndure Disaster Recovery page to get started and for case studies of customers that have shifted their recovery site to AWS. You can view additional best practices in the CloudEndure documentation.

Alex Berkov

Alex is the manager of the CloudEndure Solutions Architecture team. He joined AWS in early 2019 as part of the CloudEndure acquisition. Alex is focused on helping customers shift and operate their disaster recovery strategy in AWS. A native New Englander, Alex spends his time off with his family on the slopes in the winter and at the beach in the summer.