AWS Big Data Blog

Moving Big Data Into the Cloud using Signiant Flight

by Matt Yanchyshyn

Matt Yanchyshyn is a Principal Solutions Architect with Amazon Web Services

Introduction

In the first two parts of this series we discussed two popular products–out of many possible solutions–for moving big data into the cloud: Tsunami UDP and Data Expedition’s ExpeDat S3 Gateway. Today we’ll look at another option that takes a different approach: Signiant Flight.

Signiant’s products are already very popular for moving large files around, particularly at media companies. Many broadcasters, studios, gaming companies and others use products like Signiant Media Shuttle, Signiant Media Exchange and Signiant Manager+Agents to power their high-speed large file delivery workflows. As analytics workloads continue to expand in the cloud, these large file-delivery solutions are being applied to data transfer for big data.

Signiant Flight

Signiant Flight is a new offering that gives AWS customers an easy way to push large amounts of data into Amazon Simple Storage Service (Amazon S3) without managing any additional cloud infrastructure. Flight is hybrid software as a service (SaaS): Signiant manages the server side (the Amazon Elastic Compute Cloud (Amazon EC2) instances running Flight servers and the Amazon S3 transfer components), while end users run a lightweight client-side agent. All you have to do is install the local client, authenticate with AWS, choose which Amazon S3 bucket to use, and drop a file in a watch folder. For big data workloads where users frequently move large data sets into Amazon S3 for processing with Amazon EMR and Amazon Redshift, Flight can be very handy.

When you use Signiant Flight to send files to Amazon S3, its backend automatically scales during high-volume transfer cycles. Flight’s backend is load-balanced across multiple Amazon EC2 instances spread across multiple AWS Availability Zones, so it is highly reliable and you don’t have to deal with the complexity of setting this up yourself.

Like the solutions discussed in the earlier posts in this series, Signiant’s accelerated file transfer protocol uses a mixture of TCP and UDP. This minimizes the impact of WAN latency on throughput, which results in considerably faster transfers, especially for large files sent over long distances. Signiant advertises transfer rates up to 200 times faster than FTP. Importantly, Signiant’s file transfer protocol also supports two features that Tsunami UDP lacks: 256-bit AES encryption and intelligent file transfer retries. If a transfer is interrupted for any reason, it is restarted (using numerous file retry algorithms) and continues from the point of interruption. If a file already exists in Amazon S3 and hasn’t changed, Flight won’t upload it again.
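To see why WAN latency matters so much, recall that a single TCP stream can have at most one window of data in flight per round trip, so its throughput is bounded by the window size divided by the round-trip time. Here is a rough sketch of that arithmetic. The 64 KiB window (the maximum without TCP window scaling) and the round-trip times are illustrative assumptions, not measurements from this series.

```shell
#!/bin/sh
# Upper bound on single-stream TCP throughput: window size / round-trip time.
# The window size and RTT values used below are illustrative assumptions.
max_tcp_mbps() {
    # $1 = window size in bytes, $2 = round-trip time in milliseconds
    awk -v w="$1" -v rtt="$2" \
        'BEGIN { printf "%.1f\n", (w * 8) / (rtt / 1000) / 1000000 }'
}

# Show how the ceiling drops as the round trip gets longer.
for rtt in 10 50 100 200; do
    printf 'RTT %3d ms -> at most %s Mbps\n' "$rtt" "$(max_tcp_mbps 65536 "$rtt")"
done
```

With these assumed numbers the ceiling falls from about 52 Mbps at 10 ms to about 5 Mbps at 100 ms, regardless of how much bandwidth the link has. UDP-based protocols avoid this per-window wait.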

Signiant Flight also supports batch-file transfers using manifests so you can move a large number of smaller files efficiently. This might be the case if you’ve pre-aggregated and compressed your data into a large number of smaller files to optimize Hadoop performance by closely matching file size to default HDFS block size. And if your data is compressed with a format such as GZIP that isn’t splittable, having multiple smaller files improves Hadoop performance by allowing multiple mapper tasks to process your data set in parallel. Or maybe you’re just loading data into a large number of Amazon Redshift tables, each with a distinct input file.
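As a minimal sketch of that kind of preparation, the function below splits a newline-delimited data file into line-aligned chunks and compresses each one with gzip. The function name, the `part-` prefix, and the chunk size are hypothetical, and `split -C` is a GNU coreutils option. Note that the size argument bounds the uncompressed chunk size, so you would need to tune it upward to land near a target compressed size such as your HDFS block size.

```shell
#!/bin/sh
# Split a newline-delimited file into line-aligned chunks, then gzip each chunk.
# Names, prefix, and sizes are illustrative assumptions.
# Requires GNU split for -C (--line-bytes), which never cuts a record in half.
split_and_compress() {
    # $1 = input file, $2 = output directory, $3 = max uncompressed bytes per chunk
    mkdir -p "$2" &&
    split -C "$3" -d -a 4 "$1" "$2/part-" &&
    gzip "$2"/part-*
}
```

For example, `split_and_compress events.log chunks 128M` would leave files named `chunks/part-0000.gz`, `chunks/part-0001.gz`, and so on, each of which a separate mapper task can process.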

Once files arrive on Signiant Flight’s AWS-based backend, they are securely forwarded to Amazon S3 over HTTPS using the multipart upload API. The data never touches disk and is pipelined directly into the user’s Amazon S3 storage.

Flight comes with a graphical client for Windows and Mac as well as command-line interfaces, and an SDK is available in several programming languages. To learn more, check out Signiant’s developer tools.

Setting up Signiant Flight

  1. Sign up for Signiant Flight via the AWS Marketplace.
  2. Create an IAM user with read/write permissions to the Amazon S3 bucket where you would like to upload your files. There’s a great post on the AWS Security Blog on this topic.
  3. Install the Flight client and add the IAM credentials of the user you just created plus the Amazon S3 bucket where you would like to upload your files.
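For step 2, a policy scoped to a single bucket might look something like the sketch below, which writes the policy document to a file. The function name and file names are hypothetical, and the exact set of actions Flight needs is an assumption (the multipart actions are included because Flight forwards data with the multipart upload API), so check Signiant’s documentation before relying on it.

```shell
#!/bin/sh
# Write an IAM policy document granting read/write access to one S3 bucket.
# The action list is an assumption about what Flight needs, not a confirmed
# requirement; the bucket and file names are placeholders.
write_bucket_policy() {
    # $1 = bucket name, $2 = output file
    cat > "$2" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:ListBucketMultipartUploads"],
      "Resource": "arn:aws:s3:::$1"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}
```

After running something like `write_bucket_policy my-flight-uploads flight-s3-policy.json`, you could attach the result to the IAM user with the AWS CLI: `aws iam put-user-policy --user-name <user> --policy-name <name> --policy-document file://flight-s3-policy.json`.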

Big Data Upload using Signiant SkyDrop

  1. Drop a file into the watch folder that you specified in the Flight configuration. It will appear in Amazon S3 a few moments later.

Setting up the Command Line Interface (CLI)

  1. Configure the Flight CLI by adding your credentials, target Amazon S3 bucket, and key to the config.cfg file.
  2. To transfer a single file with the CLI, use the -d upload parameters. In the example below I used an m3.xlarge Amazon EC2 instance in us-east-1 running the base Amazon Linux AMI with no additional tuning. I transferred a 1 GiB uncompressed file, generated using dd, to an Amazon S3 bucket in US Standard. Importantly, the file was stored on Amazon EC2 instance storage so that Amazon Elastic Block Store (Amazon EBS) throughput wouldn’t become a bottleneck and skew the test. The average transfer rate in this case was around 630 Mbps.
	skydrop -d upload /media/ephemeral0/test-1GiB.img
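If you want to run a similar test, the sketch below shows one way to create the test file and to work out how long a transfer should take at a given rate. The exact dd invocation used for this post isn’t shown, so the command here (including reading from /dev/urandom so the data doesn’t compress) is an assumption, as are the function names.

```shell
#!/bin/sh
# Create a test file of a given size in MiB. Reading /dev/urandom is an
# assumption; the dd command actually used in the post is not shown.
make_test_file() {
    # $1 = output path, $2 = size in MiB
    dd if=/dev/urandom of="$1" bs=1048576 count="$2" 2>/dev/null
}

# Seconds needed to move a file at a given sustained rate.
transfer_seconds() {
    # $1 = file size in bytes, $2 = rate in Mbps (10^6 bits per second)
    awk -v b="$1" -v r="$2" 'BEGIN { printf "%.1f\n", (b * 8) / (r * 1000000) }'
}

# A 1 GiB file at roughly 630 Mbps:
transfer_seconds 1073741824 630    # prints 13.6
```

Calling `make_test_file /media/ephemeral0/test-1GiB.img 1024` would produce a file like the one used above, which at roughly 630 Mbps takes a little under 14 seconds to upload.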

Big Data Upload using Signiant SkyDrop

A more complex file transfer may involve a large number of files listed, one file per line, in a manifest:

	skydrop -d upload @manifest.txt -z -i

In this case, we use interactive mode (-i) to see file transfer statistics in real time, and -z to generate detailed transfer statistics at the end of the transfer.
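A manifest like this is easy to generate. The sketch below lists every regular file under a directory, one path per line. The function name and directory layout are hypothetical, and since this post only establishes the one-file-per-line format, whether Flight expects absolute or relative paths is something to confirm in Signiant’s documentation.

```shell
#!/bin/sh
# Build a manifest of every regular file under a directory, one path per line.
# Paths are written as find reports them (relative to the argument as given);
# the path style Flight expects is an assumption to verify.
build_manifest() {
    # $1 = directory to scan, $2 = manifest file to write
    # Keep the manifest outside the scanned directory so it does not list itself.
    find "$1" -type f | sort > "$2"
}
```

For example, `build_manifest /media/ephemeral0/chunks manifest.txt` followed by the skydrop command above would upload every file in that directory tree.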

Conclusion

Signiant Flight is an easy way to move big data into the cloud at high speed. Because it’s a SaaS solution, it’s easy to use and you don’t have to worry about deploying and maintaining a highly available, high-performance file transfer architecture. Flight uses Signiant’s accelerated file transfer protocol for transfers to its Amazon EC2-based backend and then optimizes transfers to Amazon S3 from there, so you can significantly decrease how long it takes to move your data into the cloud. Lastly, Flight’s encryption in transit and intelligent file transfer retries mean that you can send files securely and reliably.

It’s easy to get started! Just look up Signiant Flight in the AWS Marketplace. A free trial is available.

Question or suggestion? Leave a comment below!
