AWS Blog

Snowball HDFS Import

by Jeff Barr | on | in Amazon Elastic MapReduce, AWS Snowball, Launch | | Comments

If you are running MapReduce jobs on premises and storing data in HDFS (the Hadoop Distributed File System), you can now copy that data directly from HDFS to an AWS Snowball without using an intermediary staging file. Because HDFS is often used for Big Data workloads, this can greatly simplify the process of importing large amounts of data to AWS for further processing.

To use this new feature, download and configure the newest version of the Snowball Client on the on-premises host that is running the desired HDFS cluster. Then use commands like this to copy files from HDFS to S3 via Snowball:

$ snowball cp -n hdfs://HOST:PORT/PATH_TO_FILE_ON_HDFS s3://BUCKET-NAME/DESTINATION-PATH

You can use the -r option to recursively copy an entire folder:

$ snowball cp -n -r hdfs://HOST:PORT/PATH_TO_FOLDER_ON_HDFS s3://BUCKET_NAME/DESTINATION_PATH

To learn more, read Using the HDFS Client.

Jeff;