Amazon EMR
Easily run and scale Apache Spark, Hive, Presto, and other big data workloads
Simplify management with rapid cluster provisioning, managed scaling, and automated software installation.
Develop, visualize, and debug big data applications with EMR Studio.
Secure your data and resources through customizable permissions in AWS Identity and Access Management (IAM), AWS Lake Formation, and Apache Ranger.
Run big data applications and petabyte-scale analysis faster, and at less than half the cost of on-premises solutions.
How it works
Use cases
Build scalable data pipelines
Extract data from a variety of sources, process it at scale, and make it available to both applications and users. Use Apache Hudi features to manage petabyte-scale data lakes.
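As a rough illustration of such a pipeline, the sketch below assembles a request for a transient cluster that runs one Spark job and then shuts down. It assumes the boto3 EMR client's `run_job_flow` operation. The function names, bucket paths, instance types, release label, and default role names are placeholder assumptions for illustration, not values taken from this page.

```python
def build_pipeline_request(script_uri, log_uri, core_nodes=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values (names, instance types, release label) are
    illustrative assumptions; adjust them for a real account.
    """
    return {
        "Name": "example-spark-pipeline",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",          # assumed EMR release
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_nodes,
                },
            ],
            # Terminate the cluster once every step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "run-etl-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar launches spark-submit on the cluster.
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request, region="us-east-1"):
    """Send the request to EMR. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("emr", region_name=region)
    return client.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from `submit` means the configuration can be inspected or unit-tested without AWS credentials. Calling `submit(build_pipeline_request("s3://example-bucket/jobs/etl.py", "s3://example-bucket/logs/"))` would start a billable cluster, so treat it as a sketch rather than a ready-made deployment.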
Accelerate data science with ML
Use open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet built into EMR. Connect to Amazon SageMaker Studio for analysis, reporting, and model training.
Process real-time data streams
Analyze events from streaming data sources in real time to create long-running, highly available, and fault-tolerant streaming data pipelines.
Query any dataset
Query datasets from different data stores including object storage, relational databases, NoSQL databases, and more using open-source SQL tools.
How to get started
Find out how Amazon EMR works
Learn more about provisioning clusters, scaling resources, configuring high availability, and more.
Explore Amazon EMR pricing
Pay as you go, billed by the second, with options to run EMR clusters on Amazon Elastic Compute Cloud (EC2), Amazon EKS, or AWS Outposts.
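To make per-second billing concrete, here is a minimal cost estimate for a cluster on EC2. The hourly rates are made-up placeholders, and the one-minute minimum charge and the "EMR fee on top of the EC2 price" model are assumptions to confirm against the pricing page. Storage, data transfer, and Spot or Reserved discounts are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceRate:
    """Hourly prices in USD for one instance (placeholder values only)."""
    ec2_per_hour: float   # underlying EC2 price
    emr_per_hour: float   # EMR surcharge added on top of EC2


def cluster_cost(rate, instance_count, runtime_seconds, minimum_seconds=60):
    """Estimate the cost of a cluster billed by the second.

    minimum_seconds models an assumed one-minute minimum charge.
    """
    billed_seconds = max(runtime_seconds, minimum_seconds)
    hourly_total = (rate.ec2_per_hour + rate.emr_per_hour) * instance_count
    return hourly_total * billed_seconds / 3600


# Hypothetical rate: $0.20/h for EC2 plus $0.05/h for EMR per instance.
example_rate = InstanceRate(ec2_per_hour=0.20, emr_per_hour=0.05)

# Four instances running for 30 minutes (1800 seconds).
estimate = cluster_cost(example_rate, instance_count=4, runtime_seconds=1800)
print(f"Estimated cost: ${estimate:.2f}")
```

With these placeholder rates, a job that finishes in 30 minutes is charged for 30 minutes rather than a full hour, which is the practical effect of billing by the second.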
Get started with Amazon EMR
Learn about real-time stream processing, large-scale machine learning, and more using EMR.
