AWS News Blog

Hadoop Filesystem Using S3

I blogged about Hadoop on EC2 late last year. In a nutshell, Hadoop is an open source implementation of Google’s MapReduce programming model. MapReduce is a simple and efficient way to process large data sets using a whole bunch of processors (you are supposed to start thinking of EC2 at this point).
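If you haven't seen MapReduce before, here is a rough, single-process sketch of the idea in plain Python. It is not Hadoop code and does not use any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and the classic word-count job is just a convenient example. On a real cluster the map and reduce steps would be spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs on EC2", "Hadoop reads data from S3"]
    print(word_count(docs))
```

Because each call to `map_phase` touches only one record and each call to `reduce_phase` touches only one key, both steps can run in parallel, which is where the whole bunch of processors comes in.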

Tom White sent me a note this week to let me know that he had implemented a Hadoop file system on top of S3. This file system can be used as a full or partial replacement for HDFS, the Hadoop Distributed File System.

Because data transfer between EC2 instances and S3 is not metered or billed, this is a very cost-effective way to process large amounts of data.

If you aren’t already running Hadoop on EC2, you can read all about how to do it here.


I would be overjoyed to hear from someone who’s used Hadoop on EC2 to do something really cool. Drop me an email.

— Jeff;

Modified 2/1/2021 – In an effort to ensure a great experience, expired links in this post have been updated or removed from the original post.
Jeff Barr

Jeff Barr is Chief Evangelist for AWS. He started this blog in 2004 and has been writing posts just about non-stop ever since.