AWS Storage Blog

How to easily replace physical tape-based backups with Tape Gateway

AWS has millions of active customers every month[1] and provides a wide array of cloud computing services to meet customers’ needs no matter where they are in their cloud adoption journey. As a member of the AWS Storage Gateway service team that builds hybrid cloud storage solutions for customers, I get to speak with many customers who are in a multi-year migration journey to cloud and are evaluating use of cloud for their storage needs. A common pattern I see is that an enterprise customer’s journey to cloud often starts with moving secondary workloads, such as backups and archives. This allows the customer cloud champion to demonstrate immediate business value of using cloud storage within their organization. As customers gain operational knowledge of using cloud storage for backups, they become comfortable with bringing additional workloads to the cloud and progress toward the end goal of harnessing AWS for all their business needs.

It is often important for customers starting their cloud journey to demonstrate quick wins internally and to do that, in the words of an AWS customer, they usually take “baby steps” to cloud. Taking baby steps is as much about organizational structure, culture, and internal process, as it is about a technological change. A gradual journey can mean maintaining business process fundamentals or changing them only slightly, while still deriving economic value from using cloud storage services.

What is AWS Storage Gateway?

This is where an AWS service such as AWS Storage Gateway comes in. Storage Gateway is a hybrid cloud storage service that gives customers on-premises access to virtually unlimited cloud storage. Storage Gateway provides storage protocols, such as NFS, SMB, and iSCSI through File Gateway, Tape Gateway, and Volume Gateway modes, allowing customers to seamlessly use cloud storage for their traditional on-premises applications. While customers can use all gateway types for backing up their on-premises applications to AWS, Tape Gateway use is often at the top of mind for many of them. Why?

Tape Gateway provides an easy drop-in replacement for customers’ physical tape infrastructure used for backup and archive. Customers can back up on-premises data as virtual tapes to AWS, instead of storing data on physical tapes, and they can do so without changing their current backup workflows or backup applications. Tape Gateway supports most major backup applications and is integrated with Amazon S3 Glacier Deep Archive, the lowest cost storage in the cloud, making it easy and cost-effective to transition backup and archive workloads off of physical tapes and into AWS.

How can customers start their cloud journey using Tape Gateway? In this blog post, I review the benefits of using Tape Gateway, and the steps a customer can take to deploy and use Tape Gateway.

Benefits of using Tape Gateway vs. using physical tape infrastructure

So, what are the benefits of using Tape Gateway compared to using physical tape infrastructure?

First, you remove the cost and complexity of managing a physical tape apparatus. With Tape Gateway, you don’t need to purchase tape libraries, tape media, cleaning tape cartridges, or deploy resources to manage them.

Second, physical tapes require specific environmental conditions for long-term storage and need careful handling. Restoring data from a physical tape that has deteriorated or is broken can be challenging, if not impossible. There is a chance that even if the tape is intact, it cannot be read because of a data decay problem or another issue, and in order to read the tape, you may have to buy a bread oven and bake the tape in it! Hear it directly from one of the AWS customers who experienced this.

Third, you don’t need to manage expensive migrations from older physical tapes to newer generation media. While physical media has a long lifespan, you still don’t want to spend time, effort, and money on a task as undifferentiated as migrating data from one tape to another, especially when a better alternative exists.

Benefits of using Tape Gateway vs. storing tapes offsite

There are several benefits of using Tape Gateway to store virtual tapes in AWS compared to storing tapes offsite.

First, all virtual tapes stored in Amazon S3, Amazon S3 Glacier, and Amazon S3 Glacier Deep Archive are stored across at least three geographically dispersed Availability Zones and are designed for 99.999999999% (11 9s) durability.

Second, AWS performs regular checks to confirm that data on virtual tapes is available to be read by your application when it’s needed.

Third, all virtual tapes stored in S3, S3 Glacier, and S3 Glacier Deep Archive are protected by S3 Server-Side Encryption using either default keys or your own AWS Key Management Service keys. In addition, you also avoid the physical security risk associated with tracking and monitoring the location and condition of physical tapes.

Fourth, restoring data from degraded or broken tapes that have been stored offsite can be a difficult experience as you can’t be sure of getting the data that you need, when you need it the most. With Tape Gateway, you always get the right data and can control restores on your own.

Finally, you can save money in monthly storage costs when using AWS. Storing your data in S3 Glacier Deep Archive costs only USD $1/TB/month in most AWS Regions. That is USD $12/year to store 1 TB of data!
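To make the arithmetic above concrete, here is a minimal sketch of a storage-only cost estimate. It assumes the approximate USD $1/TB/month figure quoted in this post; the function name and constant are hypothetical, and the estimate ignores retrieval, request, and data transfer charges, so treat it as a back-of-the-envelope check rather than a pricing tool.

```python
# Illustrative estimate only: the rate below is the approximate figure quoted
# in this post, not a value read from the AWS price list.
DEEP_ARCHIVE_USD_PER_TB_MONTH = 1.00


def deep_archive_storage_cost(tb_stored, months,
                              usd_per_tb_month=DEEP_ARCHIVE_USD_PER_TB_MONTH):
    """Return the storage-only cost in USD of keeping tb_stored TB for months months."""
    if tb_stored < 0 or months < 0:
        raise ValueError("tb_stored and months must be non-negative")
    return tb_stored * months * usd_per_tb_month


if __name__ == "__main__":
    print(deep_archive_storage_cost(1, 12))    # 1 TB kept for one year
    print(deep_archive_storage_cost(500, 12))  # 500 TB kept for one year
```

Check the current pricing page for your Region before relying on any number this produces.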

Check out this short video (3:43) to learn even more about the benefits of Tape Gateway:

[Video: Switch from tape to cloud with AWS]

In the following few sections, I take you through the steps to deploy a Tape Gateway on your premises, perform backups, and do restores, with color commentary in between on how Tape Gateway works.

Deploy Tape Gateway on-premises

You can think of the Storage Gateway service as having two components, similar to the client-server computing model. The client in this case resides on your premises as a gateway, and the server is the service software that runs on AWS. The gateway itself is deployed in your facilities on either a virtual machine (VM) or a hardware appliance, typically next to the backup application host. You can also choose to run Tape Gateway in Amazon EC2 if you are using the same backup application to protect both on-premises and in-cloud workloads. The service software provides the intelligence to connect the gateway with AWS and make it a native element of the service architecture.

You can manage Tape Gateway using the AWS Management Console, AWS CLI, or AWS Storage Gateway API. To begin, you first access the AWS Management Console. Once in the Management Console, search for ‘Storage Gateway’ and go to the Storage Gateway console.

Once you are in the Storage Gateway console, click on the Get Started button.

At this point, select your desired gateway type. Select Tape Gateway here and click Next.

Next, select the host platform you want to install gateway software on. Your host platform choices are VMware ESXi, Microsoft Hyper-V, Amazon EC2, or hardware appliance.

Select host platform to install gateway software

If you download a VMware ESXi or Microsoft Hyper-V image, you can deploy it on a VM in your data center and assign it an IP address. You must open the necessary firewall ports to allow communication between the gateway and AWS, and allocate disk space for use by the gateway. The Tape Gateway appliance uses a cache disk to locally commit incoming data from backup applications and an upload buffer disk as working storage for uploading data from the gateway to AWS. A full list of configuration requirements is available in the Storage Gateway documentation.

Next, select the service endpoint type you want the on-premises gateway to connect to in AWS, depending on whether you want network traffic from the gateway to traverse the internet en route to AWS, or you want traffic to stay private using Amazon Virtual Private Cloud (VPC). The gateway-to-AWS communication is encrypted over TLS regardless of which option you choose for endpoint type. Storage Gateway added VPC endpoint support in June 2019.

Select service endpoint type (public or VPC)

Once you click Next, you see the screen below to connect the Storage Gateway service in AWS to the local gateway. Here, you enter the same IP address as the one you had assigned earlier to the gateway VM and click Connect to gateway.

Enter gateway VM IP address and then click Connect to gateway

At this point, you see a screen like the one below asking to input gateway-related information along with the backup application you use with the gateway. Here you can select the gateway’s time zone, enter gateway name, and select a backup application from the drop-down list.

Input gateway-related information and backup application

Tape Gateway is qualified with all major backup applications, and AWS continues to expand the ecosystem of supported backup applications. For instance, on September 10, 2019, Storage Gateway added support for IBM Spectrum Protect hosted on Linux OS hosts. The Tape Drive type is set automatically unless you select Other as the Backup application. Once you click on Activate gateway, the gateway is activated and associated with your AWS account.

Select from the expanding ecosystem of supported backup applications

Create tapes on the Tape Gateway

Now, you can create virtual tapes on this gateway’s Virtual Tape Library (VTL) and present them to your backup application. You specify how many tapes you want to create (up to 10 at a time), the tape size, the tape barcode, and the pool. On September 9, 2019, Tape Gateway increased the maximum supported tape size from 2.5 TiB to 5 TiB, allowing you to store twice as much data on a single tape and simplify management by reducing the number of virtual tapes you must administer.

The Pool field specifies where you want to archive virtual tapes to once they are ejected by the backup application. Select Glacier Pool for archiving tapes to S3 Glacier storage class. You use S3 Glacier for more active archives where you can retrieve a tape, typically within 3-5 hours.

Choose Deep Archive Pool if you want to archive tapes to the S3 Glacier Deep Archive storage class. You use S3 Glacier Deep Archive for long-term data retention and digital preservation where data is accessed once or twice a year. You can retrieve a tape archived in S3 Glacier Deep Archive typically within 12 hours.

Select Deep Archive Pool if you want to archive tapes to the S3 Glacier Deep Archive storage class

To draw similarities between physical tape and virtual tape storage, storing virtual tapes in Glacier Pool is akin to storing a physical tape in an onsite vault, while storing virtual tapes in Deep Archive Pool is similar to storing a physical tape offsite. If you are looking for durable and secure long-term storage for data preservation, look no further than S3 Glacier Deep Archive. S3 Glacier Deep Archive is not only the lowest cost cloud storage, but it also offers up to 75% storage cost savings compared to S3 Glacier.
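If you prefer scripting to the console, the tape creation inputs described above (count, size, barcode, pool) can be assembled and validated before any request is sent. The sketch below is a hedged illustration: `build_create_tapes_request` is a hypothetical helper, the ARN is a placeholder, and the dictionary keys follow my understanding of the Storage Gateway CreateTapes API parameter names, which you should verify against the current API reference. The limits enforced are only the ones stated in this post.

```python
import uuid

TIB = 1024 ** 4
MAX_TAPES_PER_REQUEST = 10       # limit mentioned in this post
MAX_TAPE_SIZE_BYTES = 5 * TIB    # maximum tape size mentioned in this post
POOLS = {"GLACIER", "DEEP_ARCHIVE"}  # assumed pool identifiers for the two pools


def build_create_tapes_request(gateway_arn, num_tapes, tape_size_bytes,
                               barcode_prefix, pool_id):
    """Validate the inputs and return a parameter dict for a tape creation call."""
    if not 1 <= num_tapes <= MAX_TAPES_PER_REQUEST:
        raise ValueError("num_tapes must be between 1 and 10")
    if not 0 < tape_size_bytes <= MAX_TAPE_SIZE_BYTES:
        raise ValueError("tape_size_bytes must be positive and at most 5 TiB")
    if pool_id not in POOLS:
        raise ValueError("pool_id must be GLACIER or DEEP_ARCHIVE")
    return {
        "GatewayARN": gateway_arn,
        "TapeSizeInBytes": tape_size_bytes,
        "ClientToken": str(uuid.uuid4()),  # unique token so a retry is not treated as a new request
        "NumTapesToCreate": num_tapes,
        "TapeBarcodePrefix": barcode_prefix,
        "PoolId": pool_id,
    }


if __name__ == "__main__":
    # Placeholder ARN, not a real gateway.
    arn = "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE"
    request = build_create_tapes_request(arn, 5, 5 * TIB, "TST", "DEEP_ARCHIVE")
    print(request)
    # With boto3 installed and credentials configured, this dict could be passed as
    # boto3.client("storagegateway").create_tapes(**request); verify against the API docs first.
```

Keeping validation separate from the API call makes it possible to review a batch of tape requests without touching a live gateway.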

Once you have created virtual tapes, you can see them in the Tapes view of the Storage Gateway console along with tape barcode, status, size of tape, how much the tape is used, associated gateway, and configured archival pool.

Assign tape to pool allows flexibility in moving virtual tapes from Glacier Pool to Deep Archive Pool

Tape Gateway also provides you flexibility to move virtual tapes from Glacier Pool to Deep Archive Pool. When creating new virtual tapes, you can choose to archive the ejected tapes in Glacier Pool and then move the same virtual tapes to Deep Archive Pool after 90 days, using the Assign tape to pool option in the Tape Actions menu.
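The 90-day pattern described above is easy to express as a small selection rule. The following sketch assumes you track the date each tape was archived to Glacier Pool yourself; the function name, the barcodes, and the dictionary shape are hypothetical, and the actual move would still be done with the Assign tape to pool action.

```python
from datetime import date, timedelta


def tapes_ready_for_deep_archive(archived_on_by_barcode, today, min_days_in_glacier=90):
    """Return barcodes of tapes that have been in Glacier Pool for at least min_days_in_glacier days."""
    cutoff = today - timedelta(days=min_days_in_glacier)
    return sorted(
        barcode
        for barcode, archived_on in archived_on_by_barcode.items()
        if archived_on <= cutoff
    )


if __name__ == "__main__":
    # Hypothetical inventory: barcode -> date the tape was archived to Glacier Pool.
    inventory = {
        "TST001": date(2019, 9, 1),
        "TST002": date(2019, 12, 1),
        "TST003": date(2019, 10, 2),
    }
    print(tapes_ready_for_deep_archive(inventory, today=date(2019, 12, 31)))
```

Passing `today` explicitly, rather than reading the clock inside the function, keeps the rule easy to test.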

Back up your data

After virtual tapes are created, you can present them to your backup application over iSCSI. When the backup application writes data to the gateway, the incoming writes are committed to the cache disk, acknowledged back to the application, and copied to the upload buffer disk. The gateway then compresses, encrypts, and asynchronously uploads data on the upload buffer disk to the Tape Gateway’s VTL that stores data in Amazon S3.

With Tape Gateway, the Storage Gateway service creates S3 buckets on your behalf and transitions tapes based on the SCSI protocol exchange between the backup application and the gateway. You don’t need to create and manage your own S3 buckets or set lifecycle policies to transition data between S3 storage classes. You only need to manage the virtual tapes and the gateway itself.

When the backup application is actively writing data to the Tape Gateway, data is committed locally to the gateway’s cache and is then asynchronously uploaded to S3. Once the backup application ejects, exports, or unmounts a tape (different backup applications use different terminology), that tape is marked as read-only and moved from S3 to Glacier Pool or Deep Archive Pool, thereby archiving the tape. This is akin to ejecting a physical tape from the physical tape library and shipping it to an offsite storage facility. Tape Gateway marks the tape as read-only before moving it from VTL on S3 to the Glacier Pool or Deep Archive Pool to protect it from accidentally getting overwritten. You can only read data from, or delete, an archived tape.
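The tape lifecycle described above can be pictured as a small state machine. This is a toy model for illustration only, not how the service is implemented: the class and state names are hypothetical, and it deliberately ignores caching, upload, and the retrieval step that an archived tape needs before it can actually be read.

```python
from enum import Enum


class TapeState(Enum):
    WRITABLE = "in VTL, read-write"
    ARCHIVED = "archived, read-only"
    DELETED = "deleted"


class VirtualTape:
    """Toy model of a virtual tape: writable until ejected, then read-only."""

    def __init__(self, barcode, pool):
        self.barcode = barcode
        self.pool = pool          # e.g. "GLACIER" or "DEEP_ARCHIVE"
        self.state = TapeState.WRITABLE
        self.blocks = []

    def write(self, data):
        # Writes are only accepted while the tape is still in the VTL.
        if self.state is not TapeState.WRITABLE:
            raise PermissionError(f"tape {self.barcode} is {self.state.value}")
        self.blocks.append(data)

    def eject(self):
        # Ejecting marks the tape read-only and archives it to its pool.
        if self.state is not TapeState.WRITABLE:
            raise RuntimeError(f"tape {self.barcode} cannot be ejected twice")
        self.state = TapeState.ARCHIVED

    def delete(self):
        if self.state is TapeState.DELETED:
            raise RuntimeError(f"tape {self.barcode} is already deleted")
        self.state = TapeState.DELETED


if __name__ == "__main__":
    tape = VirtualTape("TST001", "DEEP_ARCHIVE")
    tape.write(b"nightly backup")
    tape.eject()
    try:
        tape.write(b"accidental overwrite")
    except PermissionError as err:
        print("write rejected:", err)
```

The point of the model is the guard in `write`: once a tape is ejected, overwriting it is no longer possible.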

Restore your data

Restoring virtual tapes archived in Glacier Pool or Deep Archive Pool is a two-step process. First, you need to retrieve tapes from Glacier Pool or Deep Archive Pool to the VTL on S3. You can do this from the Storage Gateway console by going to the Tapes view, finding one or more tapes, and selecting Retrieve tape to restore one or more tapes to a gateway of your choice. This operation moves your tapes from Glacier Pool or Deep Archive Pool to S3.

Restoring virtual tapes archived in Glacier Pool or Deep Archive Pool

You can restore archived virtual tapes to the same gateway from which you performed the backup, or to a new gateway, running in your original data center, another site, or in Amazon EC2.

Virtual tapes can be exported through the gateway to a backup application for restore

Once your virtual tapes are in S3, you can export them through the gateway to a backup application for restore.

If you need to restore data on a virtual tape that’s not archived, all you need to do is present that tape through the associated gateway to the backup application. Note that in this case, your virtual tape is both readable and writable.
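For planning purposes, the first step of the two-step restore (retrieving an archived tape to a gateway) can be sketched together with the typical retrieval windows quoted earlier in this post. The helper below is hypothetical, the ARNs are placeholders, and the `TapeARN` and `GatewayARN` keys follow my understanding of the Storage Gateway RetrieveTapeArchive API parameters; please confirm them against the API reference before use.

```python
# Typical upper bounds on retrieval time, in hours, as quoted in this post.
TYPICAL_MAX_RETRIEVAL_HOURS = {
    "GLACIER": 5,        # "typically within 3-5 hours"
    "DEEP_ARCHIVE": 12,  # "typically within 12 hours"
}


def plan_tape_retrieval(tape_arn, gateway_arn, pool_id):
    """Return the retrieval request parameters and the typical maximum wait for the pool."""
    if pool_id not in TYPICAL_MAX_RETRIEVAL_HOURS:
        raise ValueError("pool_id must be GLACIER or DEEP_ARCHIVE")
    return {
        "request": {"TapeARN": tape_arn, "GatewayARN": gateway_arn},
        "typical_max_wait_hours": TYPICAL_MAX_RETRIEVAL_HOURS[pool_id],
    }


if __name__ == "__main__":
    # Placeholder ARNs, not real resources.
    plan = plan_tape_retrieval(
        "arn:aws:storagegateway:us-east-1:111122223333:tape/TST001",
        "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE",
        "DEEP_ARCHIVE",
    )
    print(plan)
    # With boto3 installed and credentials configured, plan["request"] could be passed as
    # boto3.client("storagegateway").retrieve_tape_archive(**plan["request"]).
```

The target gateway does not have to be the one that wrote the tape, which is why the gateway ARN is a separate input.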

Summary

It is easy to get started with Tape Gateway. You can do a proof-of-concept and run Tape Gateway parallel to your physical tape infrastructure, or you can efficiently switch from physical tapes to Tape Gateway.

Individual Tape Gateway customers back up tens of TBs per day and have accumulated PBs of data in archives. Customers are also seeing the economic benefit of using Tape Gateway and S3 Glacier Deep Archive for long-term data retention. For instance, one customer recently told me, “Using Tape Gateway, we are now in a position to power off two tape libraries, two physical servers, stop filing a manifest with our offsite vendor every week, stop loading and reloading the physical tape libraries, and stop paying support on all of it.”

AWS built Tape Gateway to offer you an easy and seamless way to use cloud storage for your backup and archive needs. AWS built Amazon S3 Glacier Deep Archive to offer you an inexpensive, secure, and durable in-cloud storage option. Together, they give you a compelling business case to start replacing your physical tape infrastructure. The time to re-evaluate your long-term backup and archive strategy is now.

Tape Gateway is available in 20 AWS Regions, including South America (Sao Paulo; launched September 24, 2019), AWS GovCloud (US-West), and China (Beijing). Learn more about Tape Gateway features, resources, customers, and pricing by visiting our product page. If you have any questions, please comment on the blog post and I am happy to respond.

[1] As of October 31, 2016. References to AWS customers mean unique AWS customer accounts, which are unique e-mail addresses that are eligible to use AWS services. This includes AWS accounts in the AWS free tier. Multiple users accessing AWS services via one account are counted as a single account. Customers are considered active when they have had AWS usage activity during the preceding one-month period.