Apache Hadoop on Amazon EMR

Why Apache Hadoop on EMR?

Apache™ Hadoop® is an open source software project for efficiently processing large datasets. Instead of using one large computer to store and process the data, Hadoop lets you cluster commodity hardware together to analyze massive datasets in parallel.
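To make the idea of parallel analysis concrete, here is a minimal sketch in plain Python. It is not Hadoop code: worker threads stand in for the machines of a cluster, and the function names (count_words, parallel_word_count) are hypothetical, chosen for illustration only. The input is split into chunks, each chunk is counted independently, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count the words in one chunk of lines (the per-worker step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=3):
    """Split the input, count each chunk in parallel, then merge the results."""
    # Deal the lines out into one chunk per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    # Threads are only a stand-in here for the separate machines of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Merge the partial counts into a single total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = [
    "hadoop processes large datasets",
    "hadoop runs on commodity hardware",
    "large datasets are split across the cluster",
]
result = parallel_word_count(lines)
print(result["hadoop"], result["datasets"])  # prints "2 2"
```

On a real cluster the chunks would be blocks of data stored on different nodes and the work would be scheduled by Hadoop itself, but the split, process, and merge shape is the same.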

There are many applications and execution engines in the Hadoop ecosystem, providing a variety of tools to match the needs of your analytics workloads. Amazon EMR makes it easy to create and manage fully configured, elastic clusters of Amazon EC2 instances running Hadoop and other applications in the Hadoop ecosystem.
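As a rough illustration of creating such a cluster programmatically, the sketch below builds a request for the run_job_flow operation of the AWS SDK for Python (boto3). The cluster name, release label, instance types, node counts, region, and the default role names are all assumptions made for this example and should be replaced with values valid in your account. The helper names build_cluster_request and launch_cluster are hypothetical.

```python
def build_cluster_request(name, core_nodes=2):
    """Build keyword arguments for the boto3 EMR run_job_flow call."""
    return {
        "Name": name,
        # Assumed release label; choose one that is available to you.
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": core_nodes,
                },
            ],
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed default EMR roles; these must already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Send the request to Amazon EMR and return the new cluster ID."""
    # Imported here so the request can be built and inspected without boto3.
    import boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


request = build_cluster_request("hadoop-demo", core_nodes=3)
print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

Calling launch_cluster(request) requires boto3, AWS credentials, and the roles named above, and it starts billable EC2 instances, so the sketch only builds and prints the request.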

Applications and frameworks in the Hadoop ecosystem

Hadoop: the basic components

Advantages of Hadoop on Amazon EMR

Use cases

Apache and Hadoop are trademarks of the Apache Software Foundation.