AWS Official Blog

Hadoop Filesystem Using S3

by Jeff Barr

I blogged about Hadoop on EC2 late last year. In a nutshell, Hadoop is an open source implementation of Google’s MapReduce programming model. MapReduce is a simple and efficient way to process large data sets in parallel using a whole bunch of processors (you are supposed to start thinking of EC2 at this point).
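
To make the model concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the example. This is plain Python written only for illustration, not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each map_phase call could run on a different machine;
    # here everything runs sequentially in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each map call looks at only one document and each reduce call looks at only one key, the work can be spread over as many processors as you have.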

Tom White sent me a note this week to inform me that he had implemented a Hadoop file system on top of S3. This file system can be used as a full or partial replacement for HDFS, the Hadoop Distributed File System.
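
As a rough idea of what the "full replacement" case might look like, the sketch below generates a Hadoop site configuration that points the default file system at an S3 bucket. The property names (fs.default.name, fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey) and the s3:// URI scheme are assumptions based on early Hadoop S3 documentation rather than anything stated in Tom's note, and they have changed in later releases, so check the documentation for your Hadoop version. The bucket name and keys are placeholders, and build_site_config is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_site_config(bucket, access_key, secret_key):
    """Return Hadoop-style configuration XML that uses S3 as the default file system.

    Property names are assumed from early Hadoop S3 documentation;
    verify them against the Hadoop release you are running.
    """
    properties = {
        "fs.default.name": f"s3://{bucket}",     # assumed: default file system URI
        "fs.s3.awsAccessKeyId": access_key,      # assumed: AWS access key property
        "fs.s3.awsSecretAccessKey": secret_key,  # assumed: AWS secret key property
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values only; never commit real credentials to a config file.
    print(build_site_config("my-hadoop-bucket", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY"))
```

For the "partial replacement" case you would presumably leave HDFS as the default and refer to S3 by full URI only for the job input or output, but that depends on how your cluster is set up.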

Because data transfer between EC2 instances and S3 is not metered or billed, this is a very cost-effective way to process large amounts of data.

If you aren’t already running Hadoop on EC2, you can read all about how to do it here.


I would be overjoyed to hear from someone who’s used Hadoop on EC2 to do something really cool. Drop me an email.

— Jeff;