AWS Storage Blog

Use cases for Amazon EFS Replication

Organizations in regulated industries are frequently required to maintain a copy of their business-critical file data in another location to meet corporate compliance or disaster recovery (DR) mandates. Furthermore, in highly distributed environments, applications need a reliable method to move or copy their file data to a different location for low latency access or to use a copy of the file data for development or testing.

Amazon EFS is a serverless, fully elastic file system that provides high levels of durability and availability for your workloads and applications, including big data and analytics, media processing workflows, content management, web serving, and home directories. With Amazon EFS Replication, announced last year, you can create a replica of your Amazon EFS file system in the AWS Region of your choice.

In this blog, we go over the different use cases that require a copy of data and demonstrate how you can use Amazon EFS Replication, along with monitoring and cost optimization strategies, to satisfy these needs. To learn how to configure replication, review the blog “New – Replication for Amazon Elastic File System (EFS)” and the Amazon EFS Replication documentation.

Overview of common use cases

Let’s review some of the scenarios where you might want to consider using Amazon EFS replication.

Disaster recovery (DR)

Establishing the correct recovery objective targets at an application level is a critical part of business continuity planning. Recovery Time Objective (RTO) is the maximum acceptable delay between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable. Recovery Point Objective (RPO) is the maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.

As part of your DR strategy, you may need to plan for an event when data is unavailable to applications and users. Even though Amazon EFS is a highly durable and available file system, you may need to securely create a copy of your critical data in a different AWS Region or Availability Zone (AZ). Amazon EFS Replication allows you to natively create a copy of your file system in an AWS Region or AZ of your choice. It automatically and transparently copies your data from the source file system to the destination and can maintain an RPO of 15 minutes for most file systems. In the event of downtime, you can fail over your application, for example from US-EAST-1 to US-WEST-2. Within minutes, your destination file system is made available in writable mode, allowing you to continue running your application with minimal interruption. For more information about monitoring when the last replication successfully finished, see the Monitoring replication status section in this blog and the failover documentation.

Figure 1: Disaster recovery between regions
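To make the RPO discussion concrete, here is a minimal sketch (a hypothetical helper, not part of any AWS SDK) that checks whether a last-sync timestamp still falls within the 15-minute window:

```python
from datetime import datetime, timedelta, timezone

# 15 minutes: the RPO that EFS Replication can maintain for most file systems
RPO = timedelta(minutes=15)

def within_rpo(last_synced, now=None):
    """Return True if the last successful sync still falls inside the RPO window."""
    now = now or datetime.now(timezone.utc)
    return now - last_synced <= RPO

# A sync 10 minutes ago meets a 15-minute RPO; one 20 minutes ago does not.
now = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
print(within_rpo(now - timedelta(minutes=10), now))  # True
print(within_rpo(now - timedelta(minutes=20), now))  # False
```

In a real failover runbook, `last_synced` would come from the monitoring options described later in this post.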

Compliance

In addition to your DR strategy, your data handling needs to follow the compliance laws that apply in your geographic region, like GDPR, HIPAA, and SEC regulations. These laws often mandate maintaining a copy of your data in a secondary location that is hundreds of miles away from the primary location. You can use Amazon EFS Replication for such use cases to copy your data to a Region of your choice.


Figure 2: Replication for compliance needs

Data migration

As DevOps or database admins, you may receive requests to create temporary data stores for your development teams or you may be upgrading or refactoring your apps that require the same dataset. You may need to create a copy of a Git repository, an unstructured data set, or your database backups that are stored on your Amazon EFS file systems into a sandboxed environment such as dev or alpha. Amazon EFS Replication aids in migrating your data or creating copies of it in as little as two clicks with no management overhead. Once the data migration is complete, you can delete the replication and use your replica file system independent of your source. Finally, when you have completed your testing, you can even delete the replica and create a new copy when necessary.


Figure 3: Migration of data from one instance to another
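If you drive this workflow from code rather than the console, the two steps might look like the boto3 sketch below. The client is injected so the functions can be exercised without an AWS account (in practice you would pass `boto3.client("efs")`), and the file system ID and Region values are placeholders:

```python
def start_replication(efs_client, source_fs_id, destination_region):
    """Create a replication configuration; EFS provisions the replica for you."""
    return efs_client.create_replication_configuration(
        SourceFileSystemId=source_fs_id,
        Destinations=[{"Region": destination_region}],
    )

def stop_replication(efs_client, source_fs_id):
    """Delete the replication configuration; the replica then becomes an
    independent, writable file system you can use for dev or test."""
    efs_client.delete_replication_configuration(SourceFileSystemId=source_fs_id)
```

Deleting the replication configuration is what detaches the replica from the source, mirroring the "delete the replication and use your replica independently" step above.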

Data distribution

Certain workloads require you to create a read-only copy of your data for your users or applications to consume in a different Region. You may have a Machine Learning (ML) application that uses a large data set to train and build a model that is then distributed for read-only consumption, or you may have critical data that needs to be distributed to a different location for analysis. You can use Amazon EFS Replication to fulfill this need because, while replication is enabled, your replica file system is available for read-only access. Your applications and users can not only access the data in a different Region in read-only mode, but they also automatically receive updates to it. Since the replica file system is in a read-only state, you can rest assured that the data cannot be updated and its integrity is maintained.


Figure 4: Data Distribution

Data co-location

Some critical workloads and applications are extremely latency sensitive. Sometimes, the data that these applications require is physically stored in a different Region thousands of miles away. These applications may not tolerate the longer network round-trip time (RTT) for each request. In this case, you can use Amazon EFS Replication to create a copy of your data in an AWS Region closer to the application to reduce the network RTT and take advantage of the sub-millisecond read latencies and simultaneous access from thousands of clients that Amazon EFS already provides. You can also use this strategy when designing applications to avoid cross-Region data transfer charges by keeping a local copy for read-only purposes.

Figure 5: Low latency file access

Monitoring replication status

Amazon EFS offers options to determine when the last successful sync occurred for a given replication configuration. This time stamp is important for understanding whether your RPO needs are met. Any changes to data on the source file system that occurred before this time have been successfully replicated, and any changes after this time might not be fully replicated. You can check this via the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS API, or Amazon CloudWatch. For more details, refer to the monitoring replication status documentation.

Check last sync time

You can check the last sync time through the AWS Management Console, AWS CLI, or Amazon CloudWatch.

AWS Management Console

  1. Sign in to the AWS Management Console, search for EFS in the search bar, and select the service.
  2. In the left navigation pane, select File systems. Select your file system’s Name radio button, and then select View details at the top. Then select the Replication tab and check Last synced.


Figure 6: ‘Last synced’ time through the AWS Management Console

AWS CLI

To do this with the AWS CLI, run the following command, replacing <fs_id> with your file system ID. You can refer to the documentation for more details.

aws efs describe-replication-configurations --file-system-id <fs_id>


Figure 7: ‘LastReplicatedTimestamp’ through the AWS CLI

The LastReplicatedTimestamp property in the Destinations object shows the time that the last successful sync was completed. DescribeReplicationConfigurations is the equivalent API operation.
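A sketch of reading that property from the API response in Python: the sample dictionary below mirrors the Replications/Destinations structure the operation returns (the IDs are placeholders), and the helper itself is illustrative, not an SDK function:

```python
from datetime import datetime, timezone

def last_replicated(response):
    """Return the oldest LastReplicatedTimestamp across all destinations,
    or None if no destination has completed a sync yet."""
    stamps = [
        dest["LastReplicatedTimestamp"]
        for repl in response.get("Replications", [])
        for dest in repl.get("Destinations", [])
        if "LastReplicatedTimestamp" in dest
    ]
    return min(stamps) if stamps else None

# Trimmed sample response with the same shape as the real API output
sample = {
    "Replications": [{
        "SourceFileSystemId": "fs-0914c772",
        "Destinations": [{
            "Status": "ENABLED",
            "FileSystemId": "fs-0abcd1234ef567890",
            "Region": "us-west-2",
            "LastReplicatedTimestamp": datetime(2023, 6, 1, 11, 48, tzinfo=timezone.utc),
        }],
    }]
}
print(last_replicated(sample))  # 2023-06-01 11:48:00+00:00
```

Taking the oldest timestamp is a conservative choice: if a configuration had multiple destinations, it reports the destination that is furthest behind.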

Amazon CloudWatch

In Amazon CloudWatch, the TimeSinceLastSync metric for Amazon EFS shows the time that has elapsed since the last successful sync was completed. For more information, see Amazon CloudWatch metrics for Amazon EFS.
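If you want to be notified before replication lag threatens your RPO, you could alarm on this metric. The boto3 sketch below is a hedged example: the alarm name and threshold are illustrative choices, the client is injected so the call can be stubbed, and a real alarm would also include an SNS action for notifications:

```python
def create_lag_alarm(cloudwatch, file_system_id, threshold_seconds=900):
    """Alarm when replication lag (TimeSinceLastSync, in seconds) exceeds
    the threshold; 900 seconds matches the 15-minute RPO discussed above."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"efs-replication-lag-{file_system_id}",  # illustrative name
        Namespace="AWS/EFS",
        MetricName="TimeSinceLastSync",
        Dimensions=[{"Name": "FileSystemId", "Value": file_system_id}],
        Statistic="Maximum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=threshold_seconds,
        ComparisonOperator="GreaterThanThreshold",
    )
```

In practice you would pass `boto3.client("cloudwatch")` and the ID of the source file system.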

Cost optimization

Cost becomes an important element when planning for replication of your data set. There are several ways to optimize cost; let’s take a look at some of the most common strategies:

Amazon EFS lifecycle management

Using Amazon EFS lifecycle management is a common cost optimization strategy for Amazon EFS deployments. It automatically migrates your files into the cost-effective EFS Standard-Infrequent Access (Standard-IA) or One Zone-Infrequent Access (One Zone-IA) storage classes after a set period of time. You define that period of time by using the Transition into IA lifecycle policy.

You can modify Amazon EFS lifecycle management settings at any time using either the AWS Management Console or AWS CLI. For more information, refer to the documentation.


Figure 8: Lifecycle management
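Lifecycle policies can also be set from code. A boto3 sketch, again with an injected client and a placeholder file system ID; the `AFTER_7_DAYS` default here matches the short age-off policy used for the destination in the example below:

```python
def enable_ia_transition(efs_client, file_system_id, policy="AFTER_7_DAYS"):
    """Transition files that have not been accessed for the given period
    into the Infrequent Access storage class."""
    efs_client.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=[{"TransitionToIA": policy}],
    )
```

Other valid policy values include AFTER_14_DAYS, AFTER_30_DAYS, AFTER_60_DAYS, and AFTER_90_DAYS; check the PutLifecycleConfiguration API reference for the full list.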

You can configure lifecycle management for the source and target file systems independently. For example, in the DR or compliance use case, the target data will typically not be accessed unless you are performing gameday testing or there is an outage. For those use cases, you can achieve cost efficiency by enabling lifecycle management on the target file system with the shortest possible age-off policy. Note that there is a per-GB transfer charge for accessing data in the Infrequent Access class, as noted in EFS pricing. In the example below, both source and target file systems are assumed to be on the Standard storage class, with the lifecycle policy settings shown in the table.

                                 Source (US-EAST-1)   Destination (US-WEST-2)
 Total storage                   100 TB               100 TB
 Storage in Infrequent Access    80 TB*               95 TB*
 Blended storage price           $0.08/GB-month       $0.039/GB-month
 Lifecycle management policy     30 days              7 days
 Estimated monthly storage cost  $8,192               $3,968
 Cost savings                                         Up to 52%

* Assumes 80% of files in Infrequent Access for source and 95% for destination
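The estimates above follow from blending the per-class prices by the share of data in each class. A quick check in Python, assuming list prices of $0.30/GB-month for Standard and $0.025/GB-month for Standard-IA (consistent with the blended prices in the table; verify against current EFS pricing):

```python
TB = 1024  # GB per TB

def monthly_cost(std_tb, ia_tb, std_price=0.30, ia_price=0.025):
    """Blended monthly cost for a file system split between Standard and
    Infrequent Access storage, at assumed per-GB list prices."""
    return (std_tb * std_price + ia_tb * ia_price) * TB

source = monthly_cost(20, 80)  # 30-day policy: 80 of 100 TB aged into IA
dest = monthly_cost(5, 95)     # 7-day policy: 95 of 100 TB in IA
print(round(source), round(dest))           # 8192 3968
print(f"savings: {1 - dest / source:.0%}")  # savings: 52%
```

The destination saves because a shorter age-off policy keeps a larger share of rarely accessed replica data in the cheaper IA class.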

One Zone storage classes

The second cost saving strategy involves using One Zone storage classes for your destination file system. Amazon EFS Replication allows you to configure different storage classes for the source and destination file systems. You can use this capability when a One Zone file system is suitable at the target site. As illustrated in the example below, by using a One Zone file system as your destination in addition to a short age-off policy like the one above, you can save up to 75% in costs. Example use cases include a target site that serves read-only data or holds a copy of the data to reduce latency for local users.

                                 Source (US-EAST-1)   Destination (US-WEST-2)
 Total storage                   100 TB               100 TB
 Storage in Infrequent Access    80 TB*               95 TB*
 Blended storage price           $0.08/GB-month       $0.021/GB-month
 Lifecycle management policy     30 days              7 days
 Estimated monthly storage cost  $8,192               $2,113
 Cost savings                                         Up to 75%

* Assumes 80% of files in Infrequent Access for source and 95% for destination file systems

Conclusion

In this blog post, we’ve covered use cases for compliance, DR, data migration, data distribution, and data co-location where you want to maintain a copy of your critical data using Amazon EFS Replication. We demonstrated how Amazon EFS Replication can meet the need of maintaining a copy, outlined sample design patterns, showed how to monitor the status of your replication through the AWS CLI, AWS Management Console, and Amazon CloudWatch, and discussed cost saving strategies. These approaches allow you to meet your business needs while improving data accessibility and availability.

Thanks for reading this blog post. If you have any comments or questions, feel free to leave them in the comments section.

Bimal Gajjar

Bimal Gajjar is a Senior Storage Solutions Architect for AWS. Bimal has over 23 years of experience in information technology. At AWS, Bimal focuses on helping Global Accounts architect, adopt, and deploy cloud storage solutions. He started his career in infrastructure solutions with GSIs and spent over two decades with OEMs at HPE, Dell EMC, and Pure Storage as Global Solutions Architect for large accounts.

Anderson dos Santos

Anderson Hiraoka is a Solutions Architect at Amazon Web Services. He works with AWS Enterprise customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS.

Pratik Parekh

Pratik Parekh is a Product Manager on the Amazon Elastic File System team. He is passionate about building solutions that help clients achieve their data protection goals. Prior to joining AWS, Pratik worked as a software engineer building financial products. He holds an MBA from Stern School of Business.