Big Data Platform – Amazon EMR – Amazon Web Services

Amazon EMR

Easily run and scale Apache Spark, Trino, and other big data workloads

Get started with Amazon EMR

Why Amazon EMR?

Amazon EMR is a big data processing service that accelerates analytics workloads with unmatched flexibility and scale. EMR features performance-optimized runtimes for Apache Spark, Trino, Apache Flink, and Apache Hive, drastically cutting costs and processing times. The service integrates seamlessly with AWS, simplifying data lake workflows and enterprise-scale architectures. With built-in auto-scaling, intelligent monitoring, and managed infrastructure, EMR lets you focus on extracting insights—not managing clusters—delivering petabyte-scale analytics efficiently without the operational overhead of traditional solutions.

Flexible deployment options

Why EMR Serverless?

Amazon EMR Serverless makes it easy for data analysts and engineers to run open-source big data analytics frameworks like Apache Spark without configuring, managing, and scaling clusters or servers. EMR Serverless is the fastest way to get started with all the features and benefits of Amazon EMR without the need for experts to plan and manage clusters.
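As a rough illustration of running Spark without managing clusters, the sketch below assembles the parameters for one Spark job run on an existing EMR Serverless application. It only builds the request as a Python dictionary and makes no AWS call. The field names follow the EMR Serverless StartJobRun request shape as exposed by boto3; the helper name, application ID, role ARN, and S3 path are hypothetical placeholders, not values from this page.

```python
def build_serverless_spark_job(application_id, execution_role_arn,
                               entry_point, args=(), executor_memory="4g"):
    """Return keyword arguments shaped like an EMR Serverless StartJobRun request.

    This helper is illustrative only: it performs no validation and no AWS call.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                # Location of the PySpark script or JAR to run.
                "entryPoint": entry_point,
                "entryPointArguments": list(args),
                # Extra spark-submit flags are passed as one string.
                "sparkSubmitParameters":
                    f"--conf spark.executor.memory={executor_memory}",
            }
        },
    }


# Hypothetical identifiers, used purely for illustration.
serverless_request = build_serverless_spark_job(
    application_id="00example123",
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleJobRole",
    entry_point="s3://example-bucket/scripts/etl.py",
    args=["--date", "2024-01-01"],
)
```

With AWS credentials configured, a dictionary like this could be passed as `boto3.client("emr-serverless").start_job_run(**serverless_request)`.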

Why Amazon EMR on Amazon EC2?

Amazon EMR on Amazon EC2 provides control over cluster configuration and supports long-running clusters, making it well suited to continuous data processing tasks that require specific hardware setups. You can install custom applications alongside popular frameworks like Apache Spark and Trino, and choose from a wide range of EC2 instance types to optimize for both cost and performance. Integration with other AWS services and the ability to use Spot Instances make it a cost-effective solution for organizations requiring granular control over their big data operations.
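To make the cluster-level control described above more concrete, here is a minimal sketch of a long-running cluster definition that mixes On-Demand and Spot capacity. It builds a dictionary in the shape of the EMR RunJobFlow request as exposed by boto3 and does not contact AWS. The helper name, cluster name, release label, instance types, and counts are assumptions chosen for illustration.

```python
def build_cluster_request(name, release_label, instance_type,
                          core_count, spot_task_count):
    """Return keyword arguments shaped like an EMR RunJobFlow request.

    Primary and core nodes use On-Demand capacity; task nodes use Spot.
    """
    def group(group_name, role, market, count):
        return {
            "Name": group_name,
            "InstanceRole": role,      # MASTER, CORE, or TASK
            "Market": market,          # ON_DEMAND or SPOT
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}, {"Name": "Trino"}],
        "Instances": {
            "InstanceGroups": [
                group("Primary", "MASTER", "ON_DEMAND", 1),
                group("Core", "CORE", "ON_DEMAND", core_count),
                group("Task (Spot)", "TASK", "SPOT", spot_task_count),
            ],
            # Keep the cluster alive between steps (long-running cluster).
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR role names; your account may use different roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Placeholder values for illustration only.
cluster_request = build_cluster_request(
    name="example-long-running-cluster",
    release_label="emr-7.1.0",
    instance_type="m5.xlarge",
    core_count=2,
    spot_task_count=4,
)
```

With credentials in place, such a dictionary could be passed as `boto3.client("emr").run_job_flow(**cluster_request)`. Keeping Spot capacity in task nodes only is one common way to limit the impact of Spot interruptions, since task nodes do not store HDFS data.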

Why Amazon EMR on Amazon EKS?

Amazon EMR on Amazon Elastic Kubernetes Service (EKS) enables you to submit Apache Spark jobs on demand on EKS without provisioning EMR clusters. With EMR on EKS, you can run your analytical workloads on the same Amazon EKS cluster as your other Kubernetes-based applications to improve resource utilization and simplify infrastructure management.
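The on-demand submission model described above can be sketched as a job request aimed at a virtual cluster, which is how EMR on EKS maps onto a namespace in an existing EKS cluster. The snippet only constructs the request dictionary, following the EMR on EKS StartJobRun shape exposed by boto3 under the `emr-containers` client. The helper name, virtual cluster ID, role ARN, release label, and S3 path are hypothetical.

```python
def build_eks_spark_job(name, virtual_cluster_id, execution_role_arn,
                        release_label, entry_point, executor_instances=2):
    """Return keyword arguments shaped like an EMR on EKS StartJobRun request."""
    return {
        "name": name,
        # The virtual cluster is registered against an EKS namespace,
        # so no separate EMR cluster is provisioned for the job.
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "sparkSubmitParameters":
                    f"--conf spark.executor.instances={executor_instances}",
            }
        },
    }


# Placeholder identifiers for illustration only.
eks_request = build_eks_spark_job(
    name="example-spark-job",
    virtual_cluster_id="examplevirtualclusterid",
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleJobRole",
    release_label="emr-7.1.0-latest",
    entry_point="s3://example-bucket/scripts/report.py",
)
```

Given credentials and an existing virtual cluster, this could be submitted with `boto3.client("emr-containers").start_job_run(**eks_request)`.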

Process your data with Amazon EMR in the next generation of Amazon SageMaker

Amazon EMR is available in the next generation of Amazon SageMaker, allowing you to effortlessly run Apache Spark, Trino, and other open-source analytics frameworks in a unified data and AI development environment.

Features

Amazon EMR runs Apache Spark and Iceberg read jobs 4.5x faster than open-source Spark and Iceberg

Amazon EMR runs Apache Spark and Apache Iceberg write jobs over 2x faster than open-source Spark and Iceberg

Benefits

Cost-effective big data processing

Amazon EMR combines performance-optimized Apache Spark for faster, cost-efficient processing with the flexibility to choose instance types, including Spot Instances, and fully managed automatic scaling that dynamically right-sizes clusters, eliminating over-provisioning and reducing overall spend.

Accelerate time-to-insight and optimize performance

Amazon EMR is up to 5.4x faster than open-source Apache Spark while maintaining API compatibility. It enables customers to deploy the open-source frameworks of their choice: Apache Spark, Trino, Apache Flink, or Apache Hive. EMR supports popular open table formats like Iceberg, Hudi, and Delta to accelerate time-to-insight.

Unparalleled deployment flexibility

EMR offers a choice of deployment options: EMR Serverless for fully managed, infrastructure-free processing, EMR on EC2 for fine-grained cluster control, and EMR on EKS for Kubernetes-native big data workloads. Whether running short-term clusters for on-demand jobs or long-running clusters for persistent tasks, EMR adapts to your operational needs while optimizing costs through flexible resource allocation and efficient scaling.

Optimize data processing in Amazon SageMaker

Amazon EMR in the next generation of Amazon SageMaker lets you run open-source frameworks like Apache Spark, Trino, and Apache Flink and scale analytics workloads without provisioning or managing infrastructure. With EMR’s capabilities in Amazon SageMaker, you can unify data processing and model development, enabling end-to-end workflows from raw data transformation to AI deployment in a single collaborative environment.

Accelerate Spark upgrades with AI assistance

Transform months-long Apache Spark upgrades into efficient week-long projects through intelligent automation. The Spark upgrade agent streamlines enterprise-scale migrations by automatically analyzing and validating API changes across your entire codebase, significantly reducing both cost and complexity.

Use cases

Perform big data analytics

Run large-scale data processing and what-if analysis using statistical algorithms and predictive models to uncover hidden patterns, correlations, market trends, and customer preferences.

Build scalable data pipelines

Extract data from a variety of sources, process it at scale, and make it available for applications and users.

Process real-time data streams

Analyze events from streaming data sources in real time to create long-running, highly available, and fault-tolerant streaming data pipelines.

Accelerate data science and ML adoption

Analyze data using open-source ML frameworks such as Apache Spark MLlib, TensorFlow, and Apache MXNet. Connect to Amazon SageMaker Studio for large-scale model training, analysis, and reporting.
