AWS Storage Blog

Achieve highly available and durable database backup workflows with Amazon EFS

Deciding what storage to use as part of your database backup and restore workflows requires considering multiple factors: format compliance, scalability, availability and durability of data, and cost. Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. Amazon EFS is an effective solution for database backup workflows. It offers fully managed POSIX-compliant file storage that scales to petabytes on demand for changing backup sizes. It also provides the availability and durability needed for restores at an effective price of just $0.08/GB-month*. In this blog post, I share some considerations and best practices for using EFS in your database backup workflow.

*pricing in US East (N. Virginia) Region, assumes 80% of your storage in EFS Infrequent Access
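The effective price in the footnote is a weighted average of the two storage classes. A minimal sketch of that arithmetic follows. The EFS IA price of $0.025/GB-month comes from this post; the EFS Standard price of $0.30/GB-month is an assumption used for illustration, so check the current pricing page for your Region.

```python
def effective_price(standard_price: float, ia_price: float, ia_fraction: float) -> float:
    """Blended $/GB-month when `ia_fraction` of the data sits in EFS IA."""
    if not 0.0 <= ia_fraction <= 1.0:
        raise ValueError("ia_fraction must be between 0 and 1")
    return (1.0 - ia_fraction) * standard_price + ia_fraction * ia_price


def ia_savings(standard_price: float, ia_price: float) -> float:
    """Fractional storage price reduction of EFS IA relative to EFS Standard."""
    return 1.0 - ia_price / standard_price


# Assumed price for EFS Standard (not stated in this post).
STANDARD_PRICE = 0.30
# EFS IA price quoted in this post.
IA_PRICE = 0.025

blended = effective_price(STANDARD_PRICE, IA_PRICE, ia_fraction=0.80)
print(f"effective price: ${blended:.3f}/GB-month")
print(f"IA savings vs Standard: {ia_savings(STANDARD_PRICE, IA_PRICE):.0%}")
```

With these inputs, 20% of the data is billed at the Standard price and 80% at the IA price, which is how the blended figure is reached.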

Why should I consider using Amazon EFS for my database backup workflow?

Database workflows often benefit from POSIX-compliant file storage for backups. The storage must scale up and down to changing needs and must be highly available and durable to support restores in case of an emergency. Amazon EFS provides all these capabilities without having to provision and manage infrastructure at effective prices of pennies per GB-month.

Storage Interface

First, you should consider the type of storage interface that works well with your database. Many customers use Amazon EFS for database backups, because most backup software has standard support for writing backups to files. This means you can simply mount your EFS file system on your database server and back up to a path on EFS. In addition, most databases have default support for POSIX-compliant storage. EFS provides POSIX-compliant shared file storage, while other storage options may require creating specialized code to enable backups and restores.
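Because EFS appears as an ordinary POSIX directory once mounted, a backup script needs nothing beyond standard file operations. The sketch below assumes the file system is already mounted and that your database tool can produce the dump as a byte stream. The function name, file naming scheme, and the `/mnt/efs/backups` path in the usage note are hypothetical.

```python
import gzip
import os
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import BinaryIO, Optional


def write_backup(dump: BinaryIO, backup_dir: Path, name: str,
                 now: Optional[datetime] = None) -> Path:
    """Compress a dump stream into backup_dir and return the final path."""
    now = now or datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    final_path = backup_dir / f"{name}-{now:%Y%m%dT%H%M%SZ}.dump.gz"
    tmp_path = final_path.with_name(final_path.name + ".partial")

    # Write to a temporary name first so a restore never sees a half-written file.
    with open(tmp_path, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            shutil.copyfileobj(dump, gz)
        raw.flush()
        os.fsync(raw.fileno())  # ask the OS to push the data to the file system

    # Rename within the same directory to publish the completed backup.
    os.replace(tmp_path, final_path)
    return final_path
```

A caller might pipe a dump tool's standard output into `write_backup(proc.stdout, Path("/mnt/efs/backups"), "orders")`. Restoring is the reverse: open the file under the same mount point and stream it back into the database.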

Availability and durability

You should also consider the availability and durability you need for your backups. We often see that customers are more likely to restore data within the first few weeks of backing up. Therefore, having a reliable, yet performant target is critical. While the time frame can vary across businesses, we have seen many workflows with a two to four-week retention policy for data in Amazon EFS. If data is needed, it is often needed quickly for disaster recovery purposes, which means your storage should be highly available and durable. EFS provides high availability and durability by storing data within and across multiple Availability Zones. The zones are fully isolated partitions with their own power infrastructure and are physically separated by a meaningful distance. Each zone is many kilometers from any other zone, although all zones are within 100 km (60 miles) of each other. All Availability Zones are also interconnected over fully redundant, dedicated metro fiber that provides high-throughput, low-latency networking between zones. By storing across multiple zones, data is protected from issues that can impact single data centers.

Size of backups

In addition, since the size of backups can vary over time, the elastic scalability of storage is important for many businesses’ ability to respond to on-demand needs and optimize cost. Amazon EFS can scale elastically on demand to petabytes without disrupting applications. You can grow and shrink your usage automatically as you add and remove files, eliminating the need to provision and manage capacity. In addition, this provides the added benefit of eliminating the management complexity of storage volumes and paying for only what you need.

Cost

As a next step in the workflow, many businesses opt to move backups to a less frequently accessed storage option to optimize cost once the data is less likely to be needed. Some workflows may stop here or further optimize costs based on data retention policies. We see some workflows that move backups to colder tiers of storage due to data retention policies that are years in length. For example, regulated businesses may be required to keep data for up to several years. Amazon EFS Infrequent Access (EFS IA) is a storage class that is cost-optimized for files not accessed every day, with storage prices up to 92% lower compared to Amazon EFS Standard. The EFS IA storage class costs only $0.025/GB-month*. (*pricing in US East (N. Virginia) Region). Transitioning data to EFS IA is easy with EFS Lifecycle Management policies. You simply select a lifecycle policy that matches your needs, and EFS will automatically move your files from the EFS Standard storage class to the lower-cost EFS IA storage class.

A lifecycle policy defines the amount of time to keep data in Amazon EFS Standard based on when that file was last accessed. If you don’t access the file for the given time period, it is moved to the EFS IA storage class. EFS transparently serves data from both storage classes, so you don’t have to manage which storage class your data is located in, and you receive automatic cost savings for older backups. While other storage options are available to optimize costs, they may not provide a single transparent POSIX interface and may require creating and maintaining special code to move backups to other storage tiers and to make them accessible for restores.
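As an illustration, a lifecycle policy can be set programmatically as well as in the console. The sketch below builds the request for the EFS `PutLifecycleConfiguration` API and, in a separate function, sends it with boto3. The helper names and the `fs-12345678` file system ID are hypothetical, and the list of transition periods reflects values I know the API has accepted; newer values may exist, so treat it as an assumption.

```python
from typing import Any, Dict

# Transition periods known to be accepted by the TransitionToIA field
# (assumption: the service may support additional values).
_TRANSITIONS = {
    7: "AFTER_7_DAYS",
    14: "AFTER_14_DAYS",
    30: "AFTER_30_DAYS",
    60: "AFTER_60_DAYS",
    90: "AFTER_90_DAYS",
}


def lifecycle_request(file_system_id: str, days_since_last_access: int) -> Dict[str, Any]:
    """Build keyword arguments for put_lifecycle_configuration."""
    try:
        transition = _TRANSITIONS[days_since_last_access]
    except KeyError:
        raise ValueError(
            f"unsupported period {days_since_last_access}; choose one of {sorted(_TRANSITIONS)}"
        ) from None
    return {
        "FileSystemId": file_system_id,
        "LifecyclePolicies": [{"TransitionToIA": transition}],
    }


def apply_lifecycle_policy(file_system_id: str, days_since_last_access: int) -> None:
    """Send the policy to EFS. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder works without it

    efs = boto3.client("efs")
    efs.put_lifecycle_configuration(**lifecycle_request(file_system_id, days_since_last_access))
```

For a workflow with a two to four-week restore window, `apply_lifecycle_policy("fs-12345678", 30)` would keep recent backups in EFS Standard and let older, untouched ones move to EFS IA.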

Can I encrypt my data?

With Amazon EFS, you can encrypt both data in transit and data at rest. You can enable encryption of data at rest when you create your file system, using either an AWS managed or a customer managed customer master key (CMK). Encryption of data in transit is configured on a per-connection basis.
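The two settings live in different places, which the following sketch tries to make concrete: encryption at rest is a parameter of the `CreateFileSystem` API call, while encryption in transit is an option on each mount. It assumes boto3 for the API call and the EFS mount helper from the amazon-efs-utils package for the `tls` mount option. The helper names, creation token, and IDs are hypothetical.

```python
from typing import Any, Dict, List, Optional


def encrypted_file_system_params(creation_token: str,
                                 kms_key_id: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for boto3's efs.create_file_system with encryption at rest.

    When kms_key_id is omitted, EFS uses the AWS managed key for the service;
    pass a key ID or ARN to use a customer managed key instead.
    """
    params: Dict[str, Any] = {"CreationToken": creation_token, "Encrypted": True}
    if kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id
    return params


def tls_mount_command(file_system_id: str, mount_point: str) -> List[str]:
    """Command line that mounts the file system with encryption in transit.

    Relies on the EFS mount helper (amazon-efs-utils) being installed on the client.
    """
    return ["mount", "-t", "efs", "-o", "tls", f"{file_system_id}:/", mount_point]


print(encrypted_file_system_params("db-backups"))
print(" ".join(tls_mount_command("fs-12345678", "/mnt/efs")))
```

The dictionary can be passed as `boto3.client("efs").create_file_system(**params)`, and the mount command would typically be run with root privileges on the database host.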

Which performance mode should I use?

Amazon EFS has two performance modes: General Purpose and Max I/O. We see customers using both options. However, many databases use a single client for backup and restore operations and generate relatively few metadata operations, which aligns well with General Purpose (GP) mode.

Which throughput mode should I use?

When considering which Amazon EFS throughput mode to use, you should look at the amount of data you will be backing up or restoring and the anticipated frequency. In most cases, we see customers using Amazon EFS in the standard Bursting Throughput mode, since they are typically doing backups only once or twice a day using a single client, which can drive up to 250 MB/s. With Bursting Throughput mode, throughput on Amazon EFS scales as the amount of file system data stored in the Standard storage class grows. Backups are typically spiky, driving high levels of throughput for shorter periods of time, and low levels of throughput the rest of the time. To accommodate this, Amazon EFS is designed to burst to high throughput levels for periods of time.

Amazon EFS uses a credit system to determine when file systems can burst. Each file system earns credits over time at a baseline rate that is determined by the amount of data stored in the Standard storage class. A file system uses credits whenever it reads or writes data. The baseline rate is 50 MiB/s per TiB of storage (equivalently, 50 KiB/s per GiB of storage). If you want to get the most out of your file system, you can provision 250 MB/s, which is the amount a single backup client can drive, for an additional charge with Provisioned Throughput (PT). If the amount of data you store already allows you to drive more than 250 MB/s, then you will not be charged extra for the PT.
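To get a feel for whether Bursting Throughput is enough for a given backup window, the baseline rate above can be turned into a rough daily budget. This is a simplified model, not the service's exact accounting: it uses only the 50 MiB/s per TiB baseline from this post, ignores any cap on accumulated credits and any limit on burst rate, and treats MB/s and MiB/s as interchangeable. The function names are hypothetical.

```python
BASELINE_MIBPS_PER_TIB = 50.0  # baseline rate stated in this post
SECONDS_PER_DAY = 24 * 60 * 60


def baseline_mibps(standard_tib: float) -> float:
    """Baseline throughput earned by data stored in the Standard storage class."""
    return BASELINE_MIBPS_PER_TIB * standard_tib


def credit_neutral_hours_per_day(standard_tib: float, burst_mibps: float) -> float:
    """Hours per day the file system can be driven at burst_mibps while
    spending no more credits than it earns in that day (simplified model)."""
    if burst_mibps <= 0:
        raise ValueError("burst_mibps must be positive")
    baseline = baseline_mibps(standard_tib)
    if burst_mibps <= baseline:
        return 24.0  # at or below baseline, credits never run down
    credits_earned_mib = baseline * SECONDS_PER_DAY
    return credits_earned_mib / burst_mibps / 3600.0


def tib_for_baseline(target_mibps: float) -> float:
    """Standard storage needed for the baseline alone to reach target_mibps."""
    return target_mibps / BASELINE_MIBPS_PER_TIB


print(baseline_mibps(1.0))
print(credit_neutral_hours_per_day(1.0, 250.0))
print(tib_for_baseline(250.0))
```

Under this model, a file system holding 1 TiB in Standard storage earns 50 MiB/s, which supports roughly 4.8 hours per day at 250 MiB/s, and about 5 TiB in Standard storage would earn a 250 MiB/s baseline on its own. If your backup window needs more than the credit-neutral time, that is a signal to look at Provisioned Throughput.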

In summary, Amazon EFS provides an effective solution for database backup workflows. It offers fully managed POSIX-compliant file storage that scales to petabytes on demand for changing backup sizes. It also has the high availability and durability needed for restores. Best of all, this solution provides effective prices of pennies per GB-month without having to provision and manage infrastructure. Leave a comment telling us about how you have used Amazon EFS, whether it be for database backup workflows or otherwise!