AWS Partner Network (APN) Blog

Case Study: ‘Industrial Strength’ Software using Amazon Web Services Storage

This is a guest post by Harley Puckett, Program Director for IBM Spectrum Protect development. IBM is an AWS Technology Partner.

As Amazon Web Services (AWS) continues to grow, software developers are discovering new ways to take advantage of its capabilities. IBM is no exception. My team makes IBM Spectrum Protect, formerly known as Tivoli Storage Manager, a leading backup and recovery product for large and mid-sized organizations. We recently delivered integration with AWS to our customers that's easy to use, fast, and efficient. If you're writing storage-intensive software for use with AWS, this post may help you learn from our experiences and achieve fast results.

Before we began coding, we studied AWS documentation and met with several experts. We built prototypes using both Amazon S3 and OpenStack Swift APIs. We received multiple assurances from experts that our design was on the right track.  Most of our initial design was fine, but there were a few surprises. Our prototype was efficient, but didn’t deliver the scalable high performance that customers expect from IBM. We found ourselves back at the drawing board, with the goal of optimizing performance for heavy workloads while keeping administration simple.

Optimizing writes

When you write to ordinary disk blocks, there is very little abstraction between what you write and where it gets written. When you write to object storage, there is more prep work to be done, because you're really talking to a higher-level API over the network, authenticating, and waiting for confirmation that the object has been written. More operations are needed to write an object than a block. The questions are whether that matters for your application, and what you can do about it. Under normal circumstances, IBM Spectrum Protect can manage dozens of concurrent backup streams, and each stream of data expects to complete its work quickly. As we scaled up, multi-part writes and small delays due to network overhead accumulated, and we found our daily workload limits fell below our design goals. We wanted IBM customers to be able to use AWS storage for all backup workloads, so we refactored our design to optimize write processing.
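
To make the difference concrete, here is a minimal sketch in Python using boto3. It is purely an illustration, not IBM Spectrum Protect code, and the bucket, key, and file names are placeholders; it simply contrasts a local write with an object write to Amazon S3.

```python
import boto3

data = b"backup extent payload"

# Local write: the operating system buffers the bytes and returns almost immediately.
with open("/tmp/extent.bin", "wb") as f:
    f.write(data)

# Object write: the SDK signs the request (authentication), sends it over the
# network, and blocks until Amazon S3 confirms the object has been stored.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-backup-bucket", Key="extents/extent-0001", Body=data)
```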

Figure 1:  Write optimized local cache design in IBM Spectrum Protect v7.1.7

In our improved design, we split write requests into two processes: writes to a cache and writes to Amazon S3. Our backup process now writes to a local object cache, and a new process reads from the local cache and writes to Amazon S3. This change enables the Amazon S3 writer to stream data to AWS without impacting the backup streams. Decoupling the backup streams from the object storage writer also gave us the flexibility to test various object sizes until we found the optimal size for our workload. In combination, these changes enable us to write to the cloud at nearly the same speed as file transfers.
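
The split is essentially a producer-consumer pattern. The sketch below illustrates the idea in Python with boto3; it is a simplified illustration under assumptions, not the actual IBM Spectrum Protect implementation, and the cache directory, object size, queue depth, and bucket name are all placeholders.

```python
import os
import queue
import threading

import boto3

OBJECT_SIZE = 128 * 1024 * 1024          # assumed size; we tuned this until we found the optimum
CACHE_DIR = "/var/cache/backup-objects"  # local object cache (placeholder path)
upload_queue = queue.Queue(maxsize=64)   # hand-off between the two halves

os.makedirs(CACHE_DIR, exist_ok=True)

def backup_stream_writer(stream_id, chunks):
    """Backup stream: writes full-sized objects to the local cache only."""
    buf, seq = bytearray(), 0
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= OBJECT_SIZE:
            path = os.path.join(CACHE_DIR, f"stream{stream_id}-{seq:06d}")
            with open(path, "wb") as f:
                f.write(buf)
            upload_queue.put(path)       # the backup stream never waits on S3
            buf, seq = bytearray(), seq + 1
    # (a real implementation would also flush any remaining partial object)

def s3_uploader(bucket):
    """Uploader: drains the local cache and streams objects to Amazon S3."""
    s3 = boto3.client("s3")
    while True:
        path = upload_queue.get()
        if path is None:                 # sentinel: shut down
            break
        s3.upload_file(path, bucket, os.path.basename(path))
        os.remove(path)                  # free cache space once S3 has the object

threading.Thread(target=s3_uploader, args=("example-backup-bucket",), daemon=True).start()
```

Because the hand-off happens through the local cache, a slow or retried S3 upload never stalls a backup client, and the object size can be tuned independently of backup stream behavior.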

IBM Spectrum Protect cloud storage pools using Amazon S3 can support all workloads: application, database, VM, and file backups, with a daily ingest rate of up to 56 terabytes of changed client data in a typical 8-hour backup window. This helps organizations of all sizes use AWS storage for backup data.

Optimizing reads

When performing a restore, backup software is read-intensive. Restore requests are often time-sensitive; data owners and application users are waiting, and often running low on patience. A slow restore can result in extended downtime, and in pay-per-use cloud environments, an inefficient read operation may result in higher costs.


Figure 2:  Read optimized data space reduction built into IBM Spectrum Protect v7.1.7

We were able to significantly improve read performance, and therefore reduce the time needed to perform a restore, by reducing the amount of data that needs to be retrieved from Amazon S3. This space-saving technology is integral to IBM Spectrum Protect.

Over the years, we've learned that multiple space-saving techniques deliver the best results for business data. Mixed workloads can present a number of space-saving challenges, so it's good to enable more than one technique. Efficiency capabilities built into IBM Spectrum Protect include:

  • Deduplication – Local IBM Spectrum Protect backup servers keep track of duplicate data objects, and only send and retrieve unique data to and from Amazon S3. The behavior is similar to deduplication appliances, except no special hardware is required at the local site or on AWS to take advantage of it (see the sketch after this list).
  • Compression – Yann Collet's LZ4 compression algorithm achieves additional space savings for most data and, for our use case, was the fastest lossless compression algorithm we tested.
  • Incremental 'forever' backups – Only changed data is stored, reducing the amount of data that needs to be deduplicated and compressed.
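
Here is a simplified sketch of how content-based deduplication and LZ4 compression can combine so that only unique, compressed extents travel to Amazon S3. It is an illustration under assumptions, not the product's deduplication engine: the in-memory catalog, SHA-256 chunk hashing, key naming, and bucket name are all placeholders.

```python
import hashlib

import boto3
import lz4.frame                      # pip install lz4

s3 = boto3.client("s3")
catalog = set()                       # the real server keeps a persistent deduplication catalog

def store_extent(bucket, data: bytes) -> str:
    """Send an extent to S3 only if its content hash has not been seen before."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in catalog:
        return digest                 # duplicate extent: nothing travels to S3
    compressed = lz4.frame.compress(data)   # fast lossless compression before upload
    s3.put_object(Bucket=bucket, Key=f"extents/{digest}", Body=compressed)
    catalog.add(digest)
    return digest

def restore_extent(bucket, digest: str) -> bytes:
    """Read back an extent: one GET plus decompression."""
    obj = s3.get_object(Bucket=bucket, Key=f"extents/{digest}")
    return lz4.frame.decompress(obj["Body"].read())
```

The same data reduction that shrinks backups also shrinks restores: fewer and smaller objects have to come back from Amazon S3 before the data can be rebuilt locally.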

Faster restores can enable more business-critical workloads to take advantage of AWS storage and meet their recovery time objectives.

Optimizing administration

To optimize administration, we decided to provide a guided setup with intelligent presets, so backup administrators can be more confident in their choices.

Figure 3:  Guided Storage Pool set up in IBM Spectrum Protect v7.1.7

Step-by-step instructions clearly communicate the required tasks and whether they have been completed. When cloud storage is selected, IBM Spectrum Protect automatically turns on encryption, so data sent to the cloud has end-to-end security, following AWS security best practices. When Amazon S3 is selected, IBM Spectrum Protect automatically selects the closest AWS region, but also lets administrators choose a different AWS region from a drop-down list. The guided setup automatically enables features for optimized product efficiency and performance, further streamlining the process. Once configured, administrators can see data moving to Amazon S3 at a glance from the IBM Spectrum Protect Operations Center.
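
The guided setup handles these choices for you, but for developers curious what the region and encryption selections can correspond to at the Amazon S3 API level, here is an illustrative boto3 sketch. The region, bucket, and key are placeholders, and this is not how IBM Spectrum Protect implements its end-to-end encryption; the product encrypts data before it leaves the server.

```python
import boto3

region = "us-west-2"                             # placeholder: the closest region, or one chosen from the list
s3 = boto3.client("s3", region_name=region)

s3.put_object(
    Bucket="example-backup-bucket",              # placeholder bucket in the chosen region
    Key="extents/extent-0001",
    Body=b"client-side encrypted backup data",   # the product encrypts before sending
    ServerSideEncryption="AES256",               # additionally request encryption at rest in S3
)
```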

Figure 4:  Managing Amazon S3 storage from IBM Spectrum Protect v7.1.7

By streamlining the process to set up and verify AWS cloud storage pools, IBM Spectrum Protect backup administrators can master hybrid cloud data protection in minutes, even if they don't have deep cloud expertise.

If your software is storage-intensive and you want to support cloud storage, take time to optimize for reads, writes, and ease of administration.

Good luck and good programming.

See for yourself how easy it is to add AWS storage to IBM Spectrum Protect.  The following short video shows the steps.

https://www.youtube.com/watch?v=b0c0dV1wZz8&feature=youtu.be

Visit me at AWS re:Invent in Las Vegas, Nov. 28-Dec. 2, in the IBM booth, #434; or check out IBM Spectrum Protect at http://www.ibm.com/systems/storage/spectrum/protect/.

At re:Invent 2016, AWS and Richard Spurlock, CEO and Founder of Cobalt Iron, an IBM Spectrum Protect user, spoke at session STG-212 on Wednesday at 11:00, entitled 'Three Customer Viewpoints: Private Equity, Managed Services, and Government – How These Customers Transformed Business Operations through Storage'. Visit the AWS YouTube page later this week to find a recording of that presentation.

About the author

Harley Puckett is the Program Director for IBM Spectrum Protect development. He regularly presents at client briefings and user conferences. Harley spent 6½ years as an Executive Storage Software Consultant and manager of the Client Workshop Program in the Tucson Executive Briefing Center. He was the Solutions Architect for IBM's Global Archive Solutions Center in Guadalajara, Mexico. Prior to that, he spent 9½ years as a senior development manager for IBM Tivoli Storage Manager (TSM). Harley has been working on storage management at IBM for over 25 years. The posting on this site is my own and doesn't necessarily represent IBM's positions, strategies, or opinions.

LinkedIn:  https://www.linkedin.com/in/harleypuckett


The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.