Scheduling automated backups using Amazon EFS and AWS Backup
A shared file system is an important component of many computing infrastructures. On Linux systems, this is typically a network file system (NFS) that is mounted from the Linux hosts. Users can store data in their home directories and can share data with other users across the file system. Amazon Elastic File System (Amazon EFS) is a cloud-native service that can serve this purpose for your Linux hosts.
While Amazon EFS provides a highly available and durable backing store for your files, it is also useful to create backups of the file system in case files are inadvertently deleted, or if prior versions of a file need to be restored. You can use AWS Backup to centrally manage and automate backups of your Amazon EFS file system and to restore recovery points.
This post will walk you through launching a CloudFormation stack that will create an Amazon EFS file system and an AWS Backup vault to securely store Amazon EFS backups. It will then guide you through taking an EFS file system backup and then restoring the backup from a specific recovery point.
Amazon EFS provides a simple, scalable, fully managed elastic NFS file system for use with AWS services as well as on-premises resources. It is built to scale elastically to exabytes on demand, growing and shrinking automatically as you add and remove files, which eliminates the need to provision and manage capacity to accommodate growth.
EFS is well suited to a broad spectrum of use cases, ranging from highly parallelized, scale-out workloads that require the highest possible throughput to single-threaded, latency-sensitive workloads.
Consider these Amazon EFS use cases:
- Lift-and-shift enterprise applications
- Big data analytics
- Web serving and content management
- Application development and testing
- Media and entertainment workflows
- Database backups
- Container storage
AWS Backup is a fully managed backup service that makes it easy to centralize and automate the protection of data across AWS services in the cloud as well as on premises using the AWS Storage Gateway. Using AWS Backup, you can centrally configure backup policies and monitor backup activity for AWS resources, such as Amazon EBS volumes, Amazon RDS databases, Amazon DynamoDB tables, Amazon EFS file systems, and AWS Storage Gateway volumes. AWS Backup automates and consolidates backup tasks previously performed service-by-service, removing the need to create custom scripts and manual processes.
Linux users often share a common file system on-premises or in the cloud. Third-party solutions exist for this, but they typically rely on a single file server or an infrastructure of file servers that need to be managed and maintained. Instead, we will use a fully managed service for this purpose and will only pay for the actual storage that we use.
This solution will create an encrypted Amazon EFS file system in an AWS Region that you specify. Amazon EFS mount targets will be created in each of the private subnets that you select in your existing Virtual Private Cloud (VPC). Your EC2 instances will access the mount target in their own Availability Zone, which reduces the latency between your instances and the file system. The file system will also include a lifecycle rule that transitions files to a lower-cost storage tier if they haven’t been accessed for a defined period of time (ranging from seven days to ninety days). After the file system is provisioned, you will be able to mount it from your EC2 instances.
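The lifecycle rule described above can also be inspected or changed outside the template. Here is a minimal sketch, assuming the AWS CLI is installed and configured. The helper name `efs_ia_policy` and the placeholder file system ID are mine for illustration and are not part of the solution's template:

```shell
# Map a number of days to the shorthand value accepted by
# `aws efs put-lifecycle-configuration --lifecycle-policies`.
# Only the periods in the seven-to-ninety-day range mentioned above
# are handled in this sketch.
efs_ia_policy() {
  case "$1" in
    7|14|30|60|90) printf 'TransitionToIA=AFTER_%s_DAYS\n' "$1" ;;
    *) echo "unsupported period: $1 (use 7, 14, 30, 60, or 90)" >&2; return 1 ;;
  esac
}

# Example (requires AWS credentials; fs-XXXXXXXX is a placeholder):
# aws efs put-lifecycle-configuration \
#   --file-system-id fs-XXXXXXXX \
#   --lifecycle-policies "$(efs_ia_policy 30)"
```

Keep in mind that a change made this way will drift from the CloudFormation template, so updating the template is usually the better long-term choice.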
The solution will also create a backup vault using AWS Backup. The backup vault serves as the repository for your Amazon EFS backups. The vault includes a resource policy that prevents the vault itself (which contains the backups) from being deleted, inadvertently or otherwise. Along with AWS Identity and Access Management (IAM) policies, the resource policy allows you to define who has the ability to delete the vault. The solution will also include a scheduled backup plan with the following characteristics:
- Daily backups that will be retained for 35 days.
- Weekly backups that will be retained for 90 days.
- Monthly backups that will be retained for 5 years.
The monthly backups will be moved to cold storage, where they will incur a reduced price. (Please see this documentation for pricing information for your region.)
You can change these values based on your own needs by modifying the FileSystemBackupPlan section in the CloudFormation template.
This scheduled backup plan applies to the entire file system, but you can modify the scope of the file system backup by modifying the FileSystemBackupSelection section in the CloudFormation template.
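To make the schedule above concrete, here is a sketch of an equivalent backup plan document in the form accepted by `aws backup create-backup-plan`. The plan name, rule names, and cron times are assumptions of mine, and five years is approximated as 1825 days. The template's FileSystemBackupPlan section may use different values:

```shell
# Print a backup plan document that mirrors the daily, weekly, and
# monthly rules described above. The target vault name is the first
# argument. Names and schedule times are illustrative assumptions.
backup_plan_json() {
  cat <<EOF
{
  "BackupPlanName": "efs-backup-plan",
  "Rules": [
    {
      "RuleName": "daily",
      "TargetBackupVaultName": "$1",
      "ScheduleExpression": "cron(0 5 ? * * *)",
      "Lifecycle": { "DeleteAfterDays": 35 }
    },
    {
      "RuleName": "weekly",
      "TargetBackupVaultName": "$1",
      "ScheduleExpression": "cron(0 5 ? * 1 *)",
      "Lifecycle": { "DeleteAfterDays": 90 }
    },
    {
      "RuleName": "monthly",
      "TargetBackupVaultName": "$1",
      "ScheduleExpression": "cron(0 5 1 * ? *)",
      "Lifecycle": { "MoveToColdStorageAfterDays": 90, "DeleteAfterDays": 1825 }
    }
  ]
}
EOF
}

# Example (requires AWS credentials; my-efs-vault is a placeholder):
# aws backup create-backup-plan --backup-plan "$(backup_plan_json my-efs-vault)"
```

Only the monthly rule sets a cold storage transition, because AWS Backup requires a backup to stay in cold storage for at least 90 days before it is deleted.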
Before launching the solution, you will need an AWS account with permissions to create an Amazon EFS file system and mount targets (please see documentation here); a backup vault, backup plan, and backup selection; and an EC2 security group. To test the solution, you will also need an EC2 instance from which to mount the file system.
Launching the solution
To deploy the solution, click the Launch Stack button. If you have not already signed in to the AWS Management Console, you will be prompted to do so. Then enter the parameters that are appropriate for your environment.
|Parameter|Default|Description|
|---|---|---|
|VpcId||The VPC in which the file system will reside.|
|IngressCidrBlock|10.0.0.0/16|Ingress IP address CIDR of SOURCE resources for Amazon EFS ingress security group rule. You can scope this down to be more specific to the IP address range of your instances.|
|Number of Private Subnets in VPC|2|This value will determine how many mount points to create and MUST match the number of private subnets in the “Private Subnets” parameter.|
|Private Subnets (ONLY ONE PER AVAILABILITY ZONE)||The list of PRIVATE subnets in which to place the mount targets. EACH SUBNET *MUST* BE IN A SEPARATE AVAILABILITY ZONE.|
Once the stack creation is complete, navigate to the EFS console to view the results.
Similarly, navigate to the AWS Backup console and view the properties of the newly provisioned vault.
Note in particular the Access policy. As an administrator, you would want to ensure that the ability to delete backup vaults is limited to users who have privileged access.
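To illustrate what such an access policy can look like, here is a sketch of a vault policy that denies vault deletion to every principal. This is an illustrative policy written for this example, not necessarily the exact one in the template, and the vault name in the commented commands is a placeholder:

```shell
# Print a vault access policy that denies backup:DeleteBackupVault
# to all principals. Illustrative only; compare it with the policy
# shown in the console before relying on it.
vault_deny_delete_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "backup:DeleteBackupVault",
      "Resource": "*"
    }
  ]
}
EOF
}

# Inspect the policy currently attached to a vault (requires AWS credentials):
# aws backup get-backup-vault-access-policy --backup-vault-name my-efs-vault
#
# Attach the sketch policy:
# aws backup put-backup-vault-access-policy \
#   --backup-vault-name my-efs-vault \
#   --policy "$(vault_deny_delete_policy)"
```

With a policy like this in place, the vault can only be deleted after someone with permission to change the access policy removes it first, which is why the ability to edit the policy should also be restricted through IAM.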
Mounting the File System
Now that you have a file system, let’s access it from an EC2 Linux instance. You can use SSH for this; I prefer to use AWS Systems Manager Session Manager to remotely access instances.
1. If you’re using an Amazon Linux EC2 instance, run the following command to install the EFS mount helper (see these instructions if you’re not using Amazon Linux):
sudo yum install -y amazon-efs-utils
2. Create a new directory on your EC2 instance, such as “efs”:
sudo mkdir efs
3. Mount the file system using the TLS mount option to encrypt the data in transit:
sudo mount -t efs -o tls fs-XXXXXXXX:/ efs
Replace the fs-XXXXXXXX with your file system ID. You can find this in the output of your CloudFormation stack or from the EFS console.
Please note that if you’re mounting the file system from another VPC (such as through a Transit Gateway or through VPC peering), then you’ll have to specify the mount target IP address for the target in the respective availability zone, like this:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport X.X.X.X:/ efs
Please see this documentation for more information on mounting a file system from another VPC.
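If you mount by IP address, you first need the address of the mount target in your instance's Availability Zone. The sketch below assumes the AWS CLI is configured. The helper name `efs_nfs_mount_cmd` and the sample IP address are made up for illustration, and the helper only prints the command so that you can review it before running it:

```shell
# Print the NFS mount command for a given mount target IP address ($1)
# and local mount point ($2), using the mount options shown above.
efs_nfs_mount_cmd() {
  printf 'sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport %s:/ %s\n' "$1" "$2"
}

# List mount target IP addresses by Availability Zone
# (requires AWS credentials; fs-XXXXXXXX is a placeholder):
# aws efs describe-mount-targets --file-system-id fs-XXXXXXXX \
#   --query 'MountTargets[].[AvailabilityZoneName,IpAddress]' --output text
#
# Then print the mount command for the address in your zone:
# efs_nfs_mount_cmd 10.0.1.25 efs
```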
4. Change to the efs directory and start writing some test files. These files will persist even if you unmount the file system. You can verify this by unmounting and remounting the file system, or by mounting the file system from a different EC2 instance.
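The persistence check can be scripted roughly as follows. This is a sketch with hypothetical helper names and file names, and it assumes that your user is allowed to write to the mount point (on a new file system the root directory is owned by root, so you may need to adjust permissions or run the helpers as root):

```shell
# Write a few small test files into the directory given as $1.
write_test_files() {
  for i in 1 2 3; do
    echo "test file $i" > "$1/efs-test-$i.txt" || return 1
  done
}

# Succeed only if every test file exists with the expected content.
verify_test_files() {
  for i in 1 2 3; do
    [ "$(cat "$1/efs-test-$i.txt" 2>/dev/null)" = "test file $i" ] || return 1
  done
}

# Example on the instance (fs-XXXXXXXX is a placeholder):
# write_test_files efs
# sudo umount efs
# sudo mount -t efs -o tls fs-XXXXXXXX:/ efs
# verify_test_files efs && echo "files persisted"
```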
The file system will be backed up on the schedule that we specified earlier. You can always take an on-demand backup of your file system outside of the schedule. To do this, navigate to the AWS Backup console and select Protected Resources. Click the Create on-demand backup button and select the options for your EFS file system.
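An on-demand backup can also be started from the AWS CLI. The sketch below assumes the standard `aws` partition and that the default AWS Backup service role exists in your account. The account ID, Region, vault name, and helper name are placeholders of mine:

```shell
# Build the ARN of an EFS file system from a Region ($1),
# an account ID ($2), and a file system ID ($3).
efs_arn() {
  printf 'arn:aws:elasticfilesystem:%s:%s:file-system/%s\n' "$1" "$2" "$3"
}

# Example (requires AWS credentials; all identifiers are placeholders):
# aws backup start-backup-job \
#   --backup-vault-name my-efs-vault \
#   --resource-arn "$(efs_arn us-east-1 111122223333 fs-XXXXXXXX)" \
#   --iam-role-arn arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole
```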
5. To restore a backup, navigate to the AWS Backup console, select Protected Resources, and locate and click on your file system ID. If you have any backups at this point (created either on demand or by the schedule), you will see the respective recovery points listed here. Select the one you want (for example, by its creation date) and click the Restore button.
You can elect to restore the backup in the existing file system or into a new file system. Restoring it into the existing file system will create a directory in the file system root with the recovery point ID as the directory name.
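The same restore can be started from the AWS CLI by passing restore metadata to `aws backup start-restore-job`. The sketch below builds that metadata. The performance mode and the creation token are assumptions, and the exact set of required keys can vary, so treat the output of `aws backup get-recovery-point-restore-metadata` as the authoritative starting point:

```shell
# Print EFS restore metadata as JSON.
#   $1: file system ID of the backed-up file system
#   $2: "true" to restore into a new file system, "false" to restore
#       into the existing one
#   $3: a unique token of your choosing (hypothetical value)
# Encrypted is "true" because this solution creates an encrypted file
# system; PerformanceMode is an assumption for this sketch.
restore_metadata_json() {
  cat <<EOF
{
  "file-system-id": "$1",
  "newFileSystem": "$2",
  "CreationToken": "$3",
  "Encrypted": "true",
  "PerformanceMode": "generalPurpose"
}
EOF
}

# Find recovery points for the file system (requires AWS credentials):
# aws backup list-recovery-points-by-resource --resource-arn <file-system-arn>
#
# Start the restore into the existing file system:
# aws backup start-restore-job \
#   --recovery-point-arn <recovery-point-arn> \
#   --iam-role-arn <backup-service-role-arn> \
#   --metadata "$(restore_metadata_json fs-XXXXXXXX false my-restore-token)"
```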
To clean up this solution, follow these steps:
- Navigate to the CloudFormation console and delete the stack. Everything except for the Backup vault will be deleted; this is to ensure that your backups aren’t deleted by accident. To delete the vault, navigate to the AWS Backup console, click the Backup Vaults link, and select your vault.
- In the Access policy panel, click the Delete Policy button.
- Click the Delete button to delete the entire vault, including all backups.
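If you prefer to clean up the vault from the AWS CLI, note that the DeleteBackupVault API only succeeds on an empty vault, so the recovery points have to be deleted first. The helper below is a sketch with a made-up name. It prints the commands in order instead of running them, so that the destructive steps can be reviewed first:

```shell
# Print the AWS CLI commands that remove the access policy, delete the
# given recovery points, and then delete the vault.
#   $1: vault name; remaining arguments: recovery point ARNs
vault_cleanup_cmds() {
  vault="$1"
  shift
  echo "aws backup delete-backup-vault-access-policy --backup-vault-name $vault"
  for arn in "$@"; do
    echo "aws backup delete-recovery-point --backup-vault-name $vault --recovery-point-arn $arn"
  done
  echo "aws backup delete-backup-vault --backup-vault-name $vault"
}

# List the recovery point ARNs to pass in (requires AWS credentials;
# my-efs-vault is a placeholder):
# aws backup list-recovery-points-by-backup-vault --backup-vault-name my-efs-vault \
#   --query 'RecoveryPoints[].RecoveryPointArn' --output text
```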
In this post, I demonstrated how AWS Backup can complement Amazon EFS to create multiple backups of your file systems. The solution takes advantage of several cost-optimization features, including:
- Amazon EFS lifecycle rules that transition data to a lower-cost storage tier after thirty days
- AWS Backup lifecycle rules that delete daily backups after thirty-five days
- AWS Backup lifecycle rules that delete weekly backups after ninety days
- AWS Backup lifecycle rules that transition monthly backups to a lower-cost storage tier after ninety days and delete them after five years
Thanks for taking the time to read this blog post and learn more about Amazon EFS and AWS Backup. Drop any comments or questions you may have below!
If you would like to read more about AWS Backup, check out this other post on the AWS Storage Blog: “Protecting your data with AWS Backup.”