AWS Storage Blog
Back up your on-premises applications to the cloud using AWS Storage Gateway
Data continues to grow exponentially worldwide, and protecting and preserving data is critical for all organizations. Some of our customers are engaged in multi-year journeys to the cloud, and they continue to manage and maintain on-premises storage infrastructure in a hybrid cloud model. For data protection and to meet long-term retention requirements, many organizations spend considerable time and resources to manage and maintain on-premises physical tape infrastructure, backup infrastructure, and off-site storage. With the need to preserve data for business and/or regulatory compliance, customers often run into storage capacity challenges on premises. This often delays the ability to meet new demands, hinders agility, and ultimately results in potential risks to data protection strategies.
Customers often ask how they should transform their on-premises data protection infrastructure to take advantage of AWS in order to address cost and scaling challenges. Using AWS Storage Gateway, customers can back up on-premises applications to virtually unlimited cloud storage. This enables customers to free up on-premises storage, while durably storing data in AWS.
In this post, I show how customers can back up and restore their on-premises applications to the cloud using Storage Gateway. I discuss use cases for File Gateway, Tape Gateway, and Volume Gateway, and how each of the gateways can be used as a storage target to cost-effectively back up on-premises applications to AWS.
AWS Storage Gateway
Customers use Storage Gateway to protect their on-premises applications and to reduce backup infrastructure and administration costs. They use Storage Gateway to back up files, applications, databases, and volumes to Amazon S3, Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, and Amazon EBS – through files, volumes, snapshots, and virtual tapes in AWS. Some customers use Storage Gateway to seamlessly complement their existing storage infrastructure to offload and/or expand their on-premises storage capacity. Because no hardware procurement is needed, these deployments take less time to complete.
Using Storage Gateway, customers are able to seamlessly connect on-premises applications to AWS to leverage cloud storage scalability, reliability, durability, and economics. Storage Gateway supports standard storage protocols such as NFS, SMB, iSCSI, and iSCSI-VTL. Minimal changes are required to existing applications, and it’s an ideal solution for customers that must support a hybrid storage environment that bridges both on-premises and cloud. Storage Gateway uses a highly optimized data transfer mechanism, bandwidth management, and automated network resilience for efficient data transfer. All data is encrypted in-transit and at-rest in the cloud.
Depending on the use case, Storage Gateway provides three types of storage interfaces to back up customers’ on-premises applications:
- File Gateway lets customers store and access objects in Amazon S3 from file-based applications with local caching. Client access is provided via SMB and NFS, and each file is stored as an object in Amazon S3 with a one-to-one mapping.
- Tape Gateway is a drop-in replacement for physical tape infrastructure backed by cloud storage with local caching for low-latency data access.
- Volume Gateway provides on-premises block storage over iSCSI, backed by Amazon S3, with local caching, Amazon EBS snapshots, and clones.
On-premises backup and restore using File Gateway
Databases and applications are often backed up directly to a file server on premises. File Gateway presents a file interface that enables customers to store database and application files as durable objects in Amazon S3 using the NFS and SMB file protocols. From your on-premises data center or Amazon EC2, you can seamlessly access your backed-up database and application files in Amazon S3 via NFS and SMB.
Customers use File Gateway to store on-premises backup files in Amazon S3 for applications such as SAP, SQL Server, and Oracle, reducing their on-premises backup storage footprint. Customers no longer have to worry about running out of storage capacity or about expensive hardware refresh cycles. See how Kellogg’s uses File Gateway for Oracle and SQL Server database backups to achieve an RPO of 15 minutes and an RTO of less than 10 minutes.
File Gateway includes a local cache to temporarily hold changed data that must be transferred to AWS and to locally cache data for low-latency read access. When you perform backups to your file share using File Gateway, data is stored locally first and then asynchronously uploaded to your Amazon S3 bucket. These backup files are stored as objects in the Amazon S3 buckets with a one-to-one file to object mapping.
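The one-to-one file-to-object mapping can be pictured with a small sketch. It assumes the path relative to the share root becomes the object key, optionally under a bucket prefix; the mount point, file names, and the `object_key_for` helper are hypothetical and not part of any AWS SDK.

```python
from pathlib import PurePosixPath


def object_key_for(file_path: str, share_mount: str, bucket_prefix: str = "") -> str:
    """Return the S3 object key that a file on the share would map to.

    Assumption: the path relative to the share root becomes the key,
    optionally placed under a bucket prefix.
    """
    # relative_to raises ValueError if the file is not under the share mount.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(share_mount))
    key = PurePosixPath(bucket_prefix) / relative if bucket_prefix else relative
    return str(key)


# Hypothetical mount point and backup file names, for illustration only.
print(object_key_for("/mnt/backups/sql/db1_full.bak", "/mnt/backups"))
print(object_key_for("/mnt/backups/oracle/arch_001.log", "/mnt/backups", "prod"))
```

Each backup file written to the share corresponds to exactly one object, so the same key can be used later to find the backup directly in the bucket.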
Once the data is in Amazon S3, you can optionally use S3 Lifecycle policies to automatically archive data to lower-cost storage classes, such as Amazon S3 Glacier or Amazon S3 Glacier Deep Archive. Before setting up S3 Lifecycle policies, we recommend that you understand the differences in the two cold storage classes – especially in terms of storage costs, retrieval times, and retrieval costs. Amazon S3 Glacier is well suited for data archiving, and Amazon S3 Glacier Deep Archive is suitable for long-term retention of data that is infrequently accessed. Amazon S3 Glacier offers three choices for access to archives, from a few minutes to many hours – depending on your retrieval needs. On the other hand, S3 Glacier Deep Archive offers two access choices varying from 12 to 48 hours.
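As a minimal sketch of such a policy, the following builds a lifecycle configuration in the dictionary shape that the boto3 S3 client accepts for `put_bucket_lifecycle_configuration`. The rule name, the `sql/` prefix, and the 30-day and 180-day thresholds are assumptions chosen for illustration; pick values that match your own retention and retrieval requirements.

```python
import json


def backup_lifecycle_configuration(prefix: str, glacier_after_days: int,
                                   deep_archive_after_days: int) -> dict:
    """Build an S3 Lifecycle configuration with two archive transitions."""
    if not 0 <= glacier_after_days < deep_archive_after_days:
        raise ValueError("Deep Archive transition must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "archive-backups",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


# Hypothetical prefix and thresholds, for illustration only.
config = backup_lifecycle_configuration("sql/", 30, 180)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the configuration could be
# applied to a bucket (the bucket name below is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-backup-bucket", LifecycleConfiguration=config)
```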
You may choose to archive data to a lower-cost storage class such as Amazon S3 Glacier or S3 Glacier Deep Archive. If you do, you must typically restore the object back to Amazon S3 Standard manually before it can be accessed through a File Gateway. To automate restoring the archived files, you can use a combination of Amazon CloudWatch and an AWS Lambda function to trigger a restore request to Amazon S3. This process is detailed in this blog.
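A restore request of the kind a Lambda function might send can be sketched as follows. The helper only builds the arguments for the boto3 S3 client's `restore_object` call and checks the retrieval tier against the storage class, following the access choices described above. The function name, bucket, key, and default values are hypothetical.

```python
# Retrieval tiers per storage class: three choices for S3 Glacier,
# two for S3 Glacier Deep Archive.
ALLOWED_TIERS = {
    "GLACIER": {"Expedited", "Standard", "Bulk"},
    "DEEP_ARCHIVE": {"Standard", "Bulk"},
}


def restore_request_kwargs(bucket: str, key: str, storage_class: str,
                           days: int = 7, tier: str = "Standard") -> dict:
    """Build keyword arguments for an S3 restore_object call."""
    if storage_class not in ALLOWED_TIERS:
        raise ValueError(f"{storage_class} is not an archive storage class")
    if tier not in ALLOWED_TIERS[storage_class]:
        raise ValueError(f"Tier {tier} is not available for {storage_class}")
    return {
        "Bucket": bucket,
        "Key": key,
        "RestoreRequest": {
            "Days": days,  # how long the restored copy stays available
            "GlacierJobParameters": {"Tier": tier},
        },
    }


# Placeholder bucket and key, for illustration only.
kwargs = restore_request_kwargs("my-backup-bucket", "sql/db1_full.bak", "DEEP_ARCHIVE", tier="Bulk")
print(kwargs)

# Inside a Lambda function with boto3 available, this could be sent with:
#   boto3.client("s3").restore_object(**kwargs)
```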
File Gateway can be used with Amazon S3 Object Lock to enable write-once-read-many (WORM) file-based systems to store and access objects in Amazon S3 by using Object Lock’s Compliance or Governance modes. Any modifications such as file edits, deletes, or renames from the gateway’s NFS or SMB clients are stored as new versions of the object, without overwriting or deleting previous versions. This leaves the original, locked version of the object unchanged, enabling you to enforce policies as an added layer of data protection or for regulatory compliance.
For more information about File Gateway, click on the following links:
- Blog: How Bristol Myers Squibb uses Amazon S3 and AWS Storage Gateway to manage scientific data
- Blog: Store SQL Server backups in Amazon S3 using AWS Storage Gateway
- Blog: Automate restore of archived objects through AWS Storage Gateway
- Tech Talk: Migrating archive file data to AWS with File Gateway
Using Tape Gateway for on-premises backup and restore
Many customers use Tape Gateway as an easy drop-in replacement for physical tape infrastructure. Tape Gateway enables them to have a limitless collection of virtual tapes without requiring changes to their existing backup software or archiving workflows. Consequently, customers no longer have to deal with the hassles and challenges associated with physical media – tape loading and unloading, tape degradation, tape media migration, offsite tape vaulting, and magnetic tape library management.
Tape Gateway serves as a virtual tape library (VTL) and supports key backup applications. Tape Gateway presents an iSCSI interface and emulates a magnetic tape library that can be integrated into your backup or archive framework.
When you back up your data to AWS, the virtual tapes are stored in a virtual tape library in a service-managed Amazon S3 bucket. Active virtual tapes are stored in Amazon S3 Standard. You then use your backup application to move the virtual tape from the virtual tape library to the virtual tape shelf by either exporting or ejecting the tapes. Doing so archives your tapes to Amazon S3 Glacier or Amazon S3 Glacier Deep Archive. AWS regularly performs fixity checks to confirm that your data can be read and that no errors have been introduced.
As virtual tapes get used, Tape Gateway automatically creates new virtual tapes to maintain a minimum number of available tapes. Tape Gateway then makes these new tapes available for import by the backup application so that your backup jobs can run without interruption.
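The replenishment idea can be illustrated with a toy model. This is only a sketch of the logic described above, not the service's actual implementation; the `tapes_to_create` helper and the minimum of 10 tapes are assumptions.

```python
def tapes_to_create(available: int, minimum_available: int) -> int:
    """Number of new virtual tapes needed to get back to the minimum."""
    return max(0, minimum_available - available)


available = 10
minimum_available = 10  # hypothetical setting

# Each entry is the number of tapes consumed by one round of backup jobs.
for used in (3, 0, 4):
    available -= used
    new_tapes = tapes_to_create(available, minimum_available)
    available += new_tapes
    print(f"used={used} created={new_tapes} available={available}")
```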
Access to virtual tapes in your virtual tape library is instantaneous. If the virtual tape containing your data is archived, and you must restore data from this tape, you must first retrieve the virtual tape using the AWS Management Console or API. To retrieve the virtual tape, select the virtual tape, then choose the Tape Gateway into which you want the virtual tape to be loaded. You can retrieve a tape archived in Amazon S3 Glacier or Amazon S3 Glacier Deep Archive to the virtual tape library in Amazon S3, typically within 3-5 hours or 12 hours, respectively. Once the virtual tape is available in the virtual tape library, you can use your backup application to restore data from it.
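Retrieval through the API could look roughly like the sketch below. It assumes a boto3 Storage Gateway client and its `retrieve_tape_archive` operation; a stand-in client is used here so the example runs without AWS credentials, and the ARNs are placeholders.

```python
def retrieve_archived_tape(client, tape_arn: str, gateway_arn: str) -> str:
    """Ask Storage Gateway to load an archived tape into a Tape Gateway."""
    response = client.retrieve_tape_archive(TapeARN=tape_arn, GatewayARN=gateway_arn)
    return response["TapeARN"]


class FakeStorageGatewayClient:
    """Stand-in for boto3.client("storagegateway"), for illustration only."""

    def __init__(self):
        self.calls = []

    def retrieve_tape_archive(self, TapeARN, GatewayARN):
        self.calls.append((TapeARN, GatewayARN))
        return {"TapeARN": TapeARN}


# Placeholder ARNs; substitute the values for your own tape and gateway.
tape_arn = "arn:aws:storagegateway:us-east-1:111122223333:tape/EXAMPLE01"
gateway_arn = "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE"

client = FakeStorageGatewayClient()
print(retrieve_archived_tape(client, tape_arn, gateway_arn))
```

With a real client, the call starts the retrieval; the tape becomes usable by the backup application only after the retrieval completes.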
Using Tape Gateway, your backups and archives are compressed and stored durably in Amazon S3. Your data is encrypted at rest using Amazon S3-managed encryption keys (SSE-S3) or your own AWS Key Management Service (AWS KMS) keys.
Every day, customers like Analog Devices, Ryanair, and Southern Oregon University use Tape Gateway to save on backup costs and time. Tape Gateway enables these customers to eliminate the use of physical tape, offsite tape warehousing, and maintenance associated with on-premises physical tape infrastructure.
To learn more about using Tape Gateway for on-premises application backup, check out the following links:
- Blog: How to easily replace physical tape-based backups with Tape Gateway
- Demo: Tape Gateway (VTL) Setup
- Implementation Guide: Replace Tape Backup with Cloud Storage
- Tech Talk: Move from Tape Backups to AWS in 30 Minutes
- Whitepaper: Replacing Tape with Cloud in Backup Workflows
Using Volume Gateway as a backup target for on-premises applications
Volume Gateway presents cloud-backed storage volumes to your on-premises application using iSCSI. On-premises systems mount the iSCSI volumes, and applications interact with the volumes as normal block storage. Data written to these volumes is compressed and can be asynchronously backed up as point-in-time snapshots of your volumes and stored in the cloud as EBS Snapshots.
Volume Gateway is ideally suited for backup and restore of your application data as the point-in-time snapshots are securely stored in Amazon S3. Customers commonly use Volume Gateway to back up their on-premises virtual machines (VMs) and databases as it provides fast volume recovery for applications such as Oracle, SQL Server, PostgreSQL, and many more. Customers also use Volume Gateway for disaster recovery and for protection against ransomware. For example, if you have applications whose hosts have been infected with ransomware, you can quickly restore to a previous application state before the infection. Specifically, the volume snapshots can be restored as EBS volumes on EC2 or as volumes on Volume Gateway, which are then presented as iSCSI volumes.
Volume Gateway supports the following volume modes:
- Cached Volumes: All data is stored in Amazon S3, and your frequently accessed data is cached locally on-premises. The Cached Volume configuration provides substantial cost savings on on-premises storage by minimizing the need to scale your storage on-premises while retaining low-latency access to your frequently accessed data.
- Stored Volumes: You store your data on-premises and asynchronously make point-in-time snapshots of this data to Amazon S3. The Stored Volume configuration is ideally suited for low-latency access to your dataset as Volume Gateway provides durable and inexpensive off-site backups that you can recover locally or from Amazon EC2.
For Cached Volumes, where your volume data is already stored in Amazon S3, you use EBS snapshots to preserve versions of your data. Using this approach, you can revert to a prior version when required or repurpose a point-in-time version as a new volume. You can initiate snapshots on a scheduled or ad hoc basis. When taking a new snapshot, only the data that has changed since your last snapshot is stored. If you have a volume with 100 GB of data, but only 5 GB of data have changed since your last snapshot, only the 5 additional GB of snapshot data is stored in Amazon S3. When you delete a snapshot, only the data not needed for any other snapshot is removed.
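The incremental behavior can be worked through with a toy model of the 100 GB example. It treats the volume as 1 GB blocks and counts only the unique block versions that snapshots reference; this illustrates the accounting and is not how the service is implemented.

```python
BLOCK_GB = 1  # model the volume as 1 GB blocks for simplicity


class SnapshotStore:
    """Toy model: stored data is the set of unique block versions referenced."""

    def __init__(self):
        self.snapshots = {}  # snapshot name -> {block index: block version}

    def stored_gb(self) -> int:
        unique = {(i, v) for snap in self.snapshots.values() for i, v in snap.items()}
        return len(unique) * BLOCK_GB

    def take(self, name: str, volume: dict) -> int:
        """Record a snapshot and return the additional GB stored."""
        before = self.stored_gb()
        self.snapshots[name] = dict(volume)
        return self.stored_gb() - before

    def delete(self, name: str) -> int:
        """Delete a snapshot and return the GB freed."""
        before = self.stored_gb()
        del self.snapshots[name]
        return before - self.stored_gb()


volume = {i: 0 for i in range(100)}  # 100 GB of data, every block at version 0
store = SnapshotStore()

first = store.take("snap-1", volume)
for i in range(5):                   # 5 GB of data changes
    volume[i] = 1
second = store.take("snap-2", volume)
freed = store.delete("snap-1")

print(first, second, freed, store.stored_gb())  # prints: 100 5 5 100
```

Deleting the first snapshot frees only the 5 GB of old block versions that no other snapshot needs; the remaining snapshot still holds a complete 100 GB view of the volume.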
For Cached Volumes, the gateway gives you the ability to clone volumes from the most recent recovery point. Cloning a volume does not require prior creation of EBS snapshots. Cloning enables faster recovery because Volume Gateway presents the cloned volume instantly and copies data from the initial volume in the background. We still recommend using EBS snapshots for backup and recovery, as they provide more specific points in time to recover from. Look at this tutorial to see how to use snapshots and clones with Volume Gateway to recover your volume data on-premises or in-cloud.
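The two recovery paths differ mainly in what the new cached volume is created from. The sketch below builds arguments for the Storage Gateway `create_cached_iscsi_volume` operation, using `SnapshotId` for a snapshot restore and `SourceVolumeARN` for a clone. The helper, its requirement that exactly one source is given, and all ARNs, IDs, and addresses are assumptions for illustration.

```python
import uuid


def cached_volume_kwargs(gateway_arn: str, size_bytes: int, target_name: str,
                         network_interface_ip: str, *, snapshot_id: str = None,
                         source_volume_arn: str = None) -> dict:
    """Build arguments for restoring a cached volume from a snapshot or a clone."""
    # This helper covers recovery only, so it insists on exactly one source.
    if (snapshot_id is None) == (source_volume_arn is None):
        raise ValueError("Provide exactly one of snapshot_id or source_volume_arn")
    kwargs = {
        "GatewayARN": gateway_arn,
        "VolumeSizeInBytes": size_bytes,
        "TargetName": target_name,
        "NetworkInterfaceId": network_interface_ip,  # gateway IPv4 address
        "ClientToken": str(uuid.uuid4()),            # makes the request idempotent
    }
    if snapshot_id is not None:
        kwargs["SnapshotId"] = snapshot_id            # restore a chosen point in time
    else:
        kwargs["SourceVolumeARN"] = source_volume_arn  # clone the latest recovery point
    return kwargs


# Placeholder identifiers, for illustration only.
gateway = "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE"
size = 100 * 1024**3

from_snapshot = cached_volume_kwargs(gateway, size, "restored-db", "10.0.0.10",
                                     snapshot_id="snap-0123456789abcdef0")
from_clone = cached_volume_kwargs(gateway, size, "cloned-db", "10.0.0.10",
                                  source_volume_arn=gateway + "/volume/vol-EXAMPLE")
print(sorted(from_snapshot))
print(sorted(from_clone))

# With boto3 available, either dictionary could be passed as:
#   boto3.client("storagegateway").create_cached_iscsi_volume(**from_snapshot)
```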
For Stored Volumes, where your volume data is stored on-premises, snapshots provide durable, off-site backups in Amazon S3. If you must recover a backup, you can create a new volume from a snapshot. You can also use a snapshot of your volume as the starting point for a new Amazon EBS volume, which you can then attach to an Amazon EC2 instance. Additionally, you can use AWS Backup to schedule these snapshots and manage their retention.
Using AWS Backup with Volume Gateway simplifies and centralizes backup management, thus reducing operational burden and making it easier to meet compliance requirements across all your AWS resources. AWS Backup allows you to set customizable scheduled backup policies that meet your backup requirements. Using AWS Backup, you can set backup retention and expiration rules so you no longer need to develop custom scripts or manually manage the point-in-time backups of your Volume Gateway volumes. Finally, you can manage and monitor backups across multiple Volume Gateways, and other AWS resources such as EBS volumes and Amazon RDS databases, from a central view.
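A scheduled policy with a retention rule might be expressed as in the sketch below, which builds a backup plan in the shape accepted by the boto3 AWS Backup client's `create_backup_plan` call. The plan name, vault name, daily schedule, and 35-day retention are assumptions; volumes would still need to be assigned to the plan separately.

```python
def volume_backup_plan(name: str, vault: str, schedule: str, retain_days: int) -> dict:
    """Build an AWS Backup plan with one scheduled rule and a retention period."""
    if retain_days < 1:
        raise ValueError("retain_days must be at least 1")
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-rule",
                "TargetBackupVaultName": vault,
                "ScheduleExpression": schedule,
                "Lifecycle": {"DeleteAfterDays": retain_days},  # expiration rule
            }
        ],
    }


# Hypothetical names; the cron expression runs daily at 05:00 UTC.
plan = volume_backup_plan("volume-gateway-daily", "Default", "cron(0 5 ? * * *)", 35)
print(plan)

# With boto3 available, the plan could be created with:
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```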
TransferWise, a global financial technology company, has been using Volume Gateway to back up databases, which in turn creates EBS Snapshots that are managed by AWS Backup. Click here to watch how STEMCELL Technologies uses Volume Gateway to back up on-premises Oracle databases.
Getting Started
It’s fast and simple to get started with Storage Gateway. For a step-by-step demo and tutorial on how to set up Storage Gateway, check out this video. The following resources show how to configure and get started with Storage Gateway:
- Blog: Cloud storage in minutes with AWS Storage Gateway
- Blog: Creating and activating File Gateway on VMware
- Blog: Deploying AWS Storage Gateway on Linux KVM hypervisor
- Blog: Deploy a highly available AWS Storage Gateway on a VMware vSphere cluster
- Video: Disaster Recovery Demonstration using AWS Storage Gateway for Cross-Site Failover
- Get started with AWS Storage Gateway
Summary
We discussed three different approaches to using Storage Gateway for backing up and restoring on-premises applications:
- File Gateway is used to seamlessly connect your on-premises applications to the cloud. It enables you to store application data files and backup images as durable objects in Amazon S3, while offering SMB or NFS access to data in Amazon S3 with local caching. Customers use File Gateway to back up on-premises applications including Microsoft SQL Server and Oracle databases and logs, resulting in cost savings while still retaining access from their premises.
- Tape Gateway offers a seamless way to shift physical tape-based backup and archive workflows to AWS – while keeping trusted backup applications and processes in place. It is often used by customers who require a cost-effective, durable, long-term, and offsite way to archive data.
- Volume Gateway provides iSCSI block storage volumes to your on-premises application and is frequently used for backup and disaster recovery purposes. Customers are using Volume Gateway to achieve lower RPO and RTO through the use of EBS snapshots and Cached Volume clones.
Storage Gateway enables you to reduce your on-premises backup infrastructure footprint and associated costs by leveraging Amazon S3. Storage Gateway can be provisioned and deployed within minutes, with minimal changes to your existing backup workflows, to achieve cost savings and to have access to virtually unlimited cloud storage.
Thank you for reading how Storage Gateway can be used to back up your on-premises applications to the cloud. Please leave a comment in the comments section if you have any questions.