AWS Partner Network (APN) Blog

Meet Your Recovery Time Objectives with Druva and AWS

By Girish Chanchlani, Storage Partner Solutions Architect at AWS
By Peter Elliman, Director of Product Marketing at Druva


When choosing between a software-as-a-service (SaaS) backup and recovery solution and an on-premises one, the most common question that comes up is about the speed of recovery.

Since they host backup data in the public cloud and not in local data centers, can SaaS solutions be used to meet customers’ business-critical recovery time objective (RTO) needs?

To find out, we ran a series of data restore tests with Druva’s SaaS backup and recovery solution for data center workloads: Druva Phoenix. In this post, we’ll walk through the results so you can get an idea of the speed of recovery possible from a cloud backup and recovery solution.

Druva is an AWS Advanced Technology Partner with AWS Competencies in Storage, Government, and Digital Workplace. Druva is a 100 percent SaaS platform built on Amazon Web Services (AWS) that provides data protection and management across endpoints, data centers, and cloud workloads.

Druva Phoenix protects file systems and applications, including Windows and Linux servers, VMware and Hyper-V based virtual machines (VMs), network-attached storage (NAS) filers, and SQL and Oracle databases.

It also protects cloud-based file systems and workloads such as Amazon Elastic File System (Amazon EFS), Amazon FSx for Windows File Server, and Amazon Elastic Compute Cloud (Amazon EC2) instances.

About Druva Phoenix Backup and Recovery Solution

Druva’s Phoenix solution is designed to protect on-premises and cloud-based workloads. As shown in Figure 1, it has two main components:

  • An agent that’s installed on applications that need to be protected.
  • Druva’s backup and recovery service running on AWS.

At backup time, the agent reads the data from the source application, compresses and deduplicates it, and sends it to Druva’s service where the data gets stored on Amazon Simple Storage Service (Amazon S3).
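
As a rough illustration of the compress-and-deduplicate step, the sketch below stores each unique chunk only once and compresses it before storage. It is a generic example, not Druva’s implementation: the fixed-size chunking, the SHA-256 digests, the zlib compression, the in-memory `store` dictionary, and the function names are all assumptions made for the example.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def backup(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks, store each unique chunk once
    (compressed), and return the ordered list of chunk digests."""
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # deduplication: skip chunks already stored
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reverse the flow: fetch each chunk by digest, decompress, reassemble."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


store = {}  # hypothetical stand-in for the backup service's object storage
original = b"ABCDABCDABCDWXYZ"
recipe = backup(original, store)
restored = restore(recipe, store)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

Because three of the four chunks are identical, only two chunks end up in the store. This is also why the test data set described later was kept mostly unique: highly duplicated data would transfer less and make restores look faster.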

This flow reverses during a restore operation: the backed-up data is sent to the agent, which then restores it to the application.

Figure 1 – Druva Phoenix backup and recovery solution.

Druva’s backup and recovery service is available in 15 AWS regions today, including AWS GovCloud (US). Customers can easily elect to use the service in their region of choice during the configuration process.

Backup and Recovery Setup and Tests

The main objective of these tests is to measure the restore performance from Druva’s service over different distances. To achieve that, we backed up file-based data from an Amazon EC2 instance running in one AWS region, and tested restores, over the public internet, to EC2 instances running in three AWS regions.

For each of these three AWS regions, we also injected network packet loss to artificially simulate latency. The goal was to approximate restores to customers’ on-premises environments, which may be connected over less reliable or low-bandwidth networks.

To simulate network latency during restores, we injected packet loss at the Linux driver level using the traffic control (tc) netem command. We chose packet losses of 1 percent and 5 percent. For example, we used this command to inject a packet loss of 1 percent:

tc qdisc add dev ens5 root netem loss 1%
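
To step through both loss levels and then return the interface to normal, the same netem qdisc can be changed in place and finally deleted. The helper below is a small sketch that only builds the command lines. The function name is hypothetical, and the interface name `ens5` is taken from the command above and will differ on other hosts.

```python
def netem_loss_commands(device: str, loss_percents: list) -> list:
    """Return the tc command lines that apply each loss level in turn,
    followed by the command that removes the root qdisc again."""
    commands = []
    for index, loss in enumerate(loss_percents):
        # "add" creates the root netem qdisc; later levels "change" it in place.
        action = "add" if index == 0 else "change"
        commands.append(f"tc qdisc {action} dev {device} root netem loss {loss}%")
    # Deleting the root qdisc restores the interface's default behavior.
    commands.append(f"tc qdisc del dev {device} root")
    return commands


commands = netem_loss_commands("ens5", [1, 5])
for line in commands:
    print(line)
```

The commands need root privileges when they are actually run, and a restore test would be executed between each step.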

We backed up around 880 GB of file data from an instance in the US-East-1 (N. Virginia) region to Druva’s backup and recovery service running in the same region. File sizes ranged from 1 KB to 16 MB.

File data was mostly unique within the backup set to eliminate the impact of deduplication. We restored this data back to instances running in US-East-1 (N. Virginia), US-East-2 (Ohio), and US-West-2 (Oregon).

We artificially introduced packet loss of 1 percent and 5 percent on those instances.

Restore instance characteristics were:

  • Amazon EC2 instance of type c5.9xlarge running Ubuntu Server 16.04.
  • 10 Gbps network bandwidth.
  • Three 1 TB io1 Amazon Elastic Block Store (Amazon EBS) volumes, each providing 3,000 IOPS, striped in a RAID 0 configuration and used as the data drive.
  • Restore hosts were intentionally given beefy configurations to remove any contention and ensure the best possible RTO.

Druva’s Phoenix agent was installed on all instances to enable backups and restores.

These were the baseline restore test results with no injected packet loss:

| Backup Data Storage Location | Recovery Target | Packet Loss | Data (GB) | Duration (h:mm:ss) | Restore Throughput (GB/hr) | Avg. Ping Time to Source (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| US-East-1 | US-East-1 | 0% | 882 | 0:25:49 | 2,050 | 0.96 |
| US-East-1 | US-East-2 | 0% | 882 | 0:29:37 | 1,787 | 43 |
| US-East-1 | US-West-2 | 0% | 882 | 2:09:42 | 408 | 68.66 |
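
The throughput column follows directly from the data size and the restore duration. The short sketch below reproduces it from the baseline figures. The function names are illustrative only.

```python
def duration_to_hours(hms: str) -> float:
    """Convert an h:mm:ss duration string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def throughput_gb_per_hr(data_gb: float, hms: str) -> float:
    """Restore throughput = data restored / elapsed time."""
    return data_gb / duration_to_hours(hms)


# Baseline durations from the table above (0% packet loss, 882 GB restored).
baseline_durations = {
    "US-East-1": "0:25:49",
    "US-East-2": "0:29:37",
    "US-West-2": "2:09:42",
}

throughputs = {
    target: round(throughput_gb_per_hr(882, duration))
    for target, duration in baseline_durations.items()
}
print(throughputs)
```

Rounded to whole GB/hr, the results match the table: 2,050 for US-East-1, 1,787 for US-East-2, and 408 for US-West-2.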

The results indicate that distance between the source and destination—and hence network latency—plays a big part in determining the restore throughput. As the distance increases, restore throughput goes down.

When we introduced packet loss of 1 percent and 5 percent for each of the restore tests, we got these results:

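| Backup Data Storage Location | Recovery Target | Packet Loss | Data (GB) | Duration (h:mm:ss) | Restore Throughput (GB/hr) | Avg. Ping Time to Source (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| US-East-1 | US-East-1 | 0% | 882 | 0:25:49 | 2,050 | 0.96 |
| US-East-1 | US-East-1 | 1% | 882 | 0:25:20 | 2,089 | 0.95 |
| US-East-1 | US-East-1 | 5% | 882 | 0:26:25 | 2,008 | 0.94 |
| US-East-1 | US-East-2 | 0% | 882 | 0:29:37 | 1,787 | 43 |
| US-East-1 | US-East-2 | 1% | 882 | 0:30:49 | 1,717 | 43 |
| US-East-1 | US-East-2 | 5% | 882 | 0:32:06 | 1,649 | 43 |
| US-East-1 | US-West-2 | 0% | 882 | 2:09:42 | 408 | 73 |
| US-East-1 | US-West-2 | 1% | 882 | 2:15:00 | 391.43 | 73 |
| US-East-1 | US-West-2 | 5% | 882 | 2:29:55 | 353.09 | 73 |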
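
To compare the runs, it can help to express each throughput as a percentage change from the 0 percent baseline for the same recovery target. The sketch below does that using only the throughput values from the table. The dictionary layout and function name are choices made for this example.

```python
# Restore throughput in GB/hr, keyed by recovery target and packet loss (%).
results = {
    "US-East-1": {0: 2050, 1: 2089, 5: 2008},
    "US-East-2": {0: 1787, 1: 1717, 5: 1649},
    "US-West-2": {0: 408, 1: 391.43, 5: 353.09},
}


def percent_change(value: float, baseline: float) -> float:
    """Relative change versus the baseline, in percent (negative = slower)."""
    return (value - baseline) / baseline * 100


changes = {
    target: {
        loss: round(percent_change(throughput, by_loss[0]), 1)
        for loss, throughput in by_loss.items()
        if loss != 0
    }
    for target, by_loss in results.items()
}

for target, by_loss in changes.items():
    for loss, change in by_loss.items():
        print(f"{target} at {loss}% loss: {change:+.1f}% vs. baseline")
```

With the table’s figures, 5 percent packet loss works out to about -7.7 percent for US-East-2 and -13.5 percent for US-West-2, while the same-region restore stays within about 2 percent of its baseline.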

The impact of latency simulation is most apparent in the restore tests to instances in US-East-2 and US-West-2. With 5 percent packet loss, restore throughput to these regions fell by roughly 8 percent and 13 percent respectively compared with the baseline, and the duration of recovery grew accordingly.


Figure 2 – Graph of restore test results at baseline and with 1 percent and 5 percent injected packet loss.

The tests show that restore performance is best when the restore destination is in the same region as the one hosting the backups. As the distance increases, restore throughput goes down, and it drops further as network latency increases.

When choosing a SaaS-based data protection solution, what most determines the speed of recovery—and, hence, the recovery time objective (RTO)—is the region chosen for storing the backups.

For critical applications with low RTO requirements, choose the service running in the AWS region closest to those applications for faster performance. For example, if your production site is located on the US East Coast, choose Druva’s service running in US-East-1, as the service in other regions may not provide optimal recovery times due to the physical distance.
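
One way to sanity-check a region choice against an RTO is to divide the amount of data to restore by an expected throughput. The sketch below uses the baseline throughputs measured in this post, for backups stored in US-East-1, as stand-in values. The 500 GB data size and 30-minute RTO are hypothetical, and the assumption that restore time scales linearly with data size is a simplification. Real throughput depends on your own data, network, and hosts.

```python
# Baseline restore throughput (GB/hr) measured in this post for backups
# stored in US-East-1, keyed by the region of the recovery target.
MEASURED_GB_PER_HR = {"US-East-1": 2050, "US-East-2": 1787, "US-West-2": 408}


def estimated_restore_hours(data_gb: float, throughput_gb_per_hr: float) -> float:
    """Estimate restore time, assuming it scales linearly with data size."""
    return data_gb / throughput_gb_per_hr


def targets_meeting_rto(data_gb: float, rto_hours: float,
                        throughputs: dict = MEASURED_GB_PER_HR) -> list:
    """Return the recovery targets whose estimated restore time fits the RTO."""
    return [
        target
        for target, throughput in throughputs.items()
        if estimated_restore_hours(data_gb, throughput) <= rto_hours
    ]


# Hypothetical workload: 500 GB to restore within a 30-minute RTO.
candidates = targets_meeting_rto(data_gb=500, rto_hours=0.5)
print(candidates)
```

For this hypothetical workload, restores to US-East-1 and US-East-2 fit the 30-minute window, while a restore to US-West-2 from backups held in US-East-1 would take over an hour.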

Other factors that impact restore performance are backup data type, network latency, network bandwidth, and restore host machine characteristics.

For protecting on-premises applications, one way to improve network connection speed is to use a service such as AWS Direct Connect. With this service, you can establish private connectivity between AWS and your data center, office, or colocation environment. In many cases, this can reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections.

Conclusion

SaaS-based data protection solutions offer compelling value by giving customers a simple, easy-to-manage solution, often at an attractive price point compared with solutions deployed on-premises.

They can be used to meet a wide range of RTO requirements, from recovering mission-critical applications with RTOs measured in minutes to recovering non-critical applications. When using these solutions to meet your data management needs, carefully consider factors such as the region hosting the backups, network latency and bandwidth, restore host characteristics, and backup data type.

Learn more about Druva’s solutions and read one of the Druva on AWS customer success stories.

We encourage you to contact the AWS or Druva teams to talk about moving backup to the cloud. With a free trial, you can get started in under 15 minutes.



Druva – AWS Partner Spotlight

Druva is an AWS Competency Partner and SaaS platform that provides data protection and management across endpoints, data centers, and cloud workloads.
