Amazon EMR

Easily run and scale Apache Spark, Hive, Presto, and other big data workloads

Simplify management with rapid cluster provisioning, managed scaling, and automated software installation.

Develop, visualize, and debug big data applications with EMR Studio.

Secure your data and resources through customizable permissions in AWS Identity and Access Management (IAM), AWS Lake Formation, and Apache Ranger.

Run big data applications and petabyte-scale analysis faster, and at less than half the cost of on-premises solutions.

How it works

Amazon EMR is a platform for rapidly processing, analyzing, and applying machine learning (ML) to big data using open-source frameworks.
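As a rough illustration of what provisioning, managed scaling, and software installation might look like in practice, the sketch below assembles the keyword arguments for the `run_job_flow` call in boto3 (the AWS SDK for Python). The helper name `build_cluster_request`, the release label, the instance types, and the node counts are illustrative assumptions, not recommendations; the role names are the default EMR roles and may differ in your account.

```python
def build_cluster_request(name, release_label="emr-6.15.0", core_nodes=2, max_nodes=10):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper: every default here is a placeholder assumption.
    """
    if core_nodes < 1 or max_nodes < core_nodes:
        raise ValueError("need core_nodes >= 1 and max_nodes >= core_nodes")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Frameworks that EMR installs on the cluster at launch.
        "Applications": [{"Name": app} for app in ("Spark", "Hive", "Presto")],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Managed scaling resizes the cluster between these limits.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": core_nodes,
                "MaximumCapacityUnits": max_nodes,
            }
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }
```

With AWS credentials configured, the result could be passed along as `boto3.client("emr").run_job_flow(**build_cluster_request("demo"))`. Building the request separately from sending it keeps the configuration easy to inspect before any resources are launched.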

Use cases

Build scalable data pipelines

Extract data from a variety of sources, process it at scale, and make it available to both applications and users. Use Apache Hudi features to build petabyte-scale data lakes.
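One common way to run a pipeline stage on a cluster is to submit it as a step. The sketch below builds a step definition that runs `spark-submit` through `command-runner.jar`, in the shape accepted by boto3's `add_job_flow_steps`. The helper name `build_spark_step` and the S3 paths in the usage note are hypothetical placeholders.

```python
def build_spark_step(name, script_uri, *script_args):
    """Build an EMR step definition that runs a Spark script.

    Hypothetical helper: script_uri is assumed to point at a PySpark
    script that the cluster can read, for example in Amazon S3.
    """
    return {
        "Name": name,
        # Leave the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *script_args],
        },
    }
```

For example, `build_spark_step("nightly-etl", "s3://example-bucket/jobs/etl.py", "--date", "2024-01-01")` produces a step that could be submitted with `boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, where `cluster_id` is the ID of a running cluster.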

Accelerate data science with ML

Use open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet, which are built into EMR. Connect to Amazon SageMaker Studio for analysis, reporting, and model training.

Process real-time data streams

Analyze events from streaming data sources in real time to create long-running, highly available, and fault-tolerant streaming data pipelines.

Query any dataset

Query datasets from different data stores including object storage, relational databases, NoSQL databases, and more using open-source SQL tools.

How to get started

Find out how Amazon EMR works

Learn more about provisioning clusters, scaling resources, configuring high availability, and more.

Explore Amazon EMR features »

Explore Amazon EMR pricing

Pay as you go, billed by the second, with options to run EMR clusters on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (Amazon EKS), or AWS Outposts.

Learn more about Amazon EMR pricing »

Get started with Amazon EMR

Learn about real-time stream processing, large-scale machine learning, and more using EMR.

Check out Amazon EMR tutorials »
