Amazon EMR on EC2 Spot Instances

Performance, scale, and deep cost savings on big data workloads

Amazon EMR reduces the complexity of managing big data frameworks (e.g. Apache Spark and Hive), while taking advantage of cloud best practices such as separating compute and storage.

Because AWS operates at large scale, unused EC2 capacity is offered at up to a 90% discount (vs On-Demand pricing) through Amazon EC2 Spot Instances. EC2 can reclaim Spot capacity with a two-minute warning, but less than 5% of workloads are interrupted. Because big data workloads on EMR are fault tolerant, they can continue processing even when instances are interrupted. Running EMR on Spot Instances drastically reduces the cost of big data, allows for significantly higher compute capacity, and reduces the time to process big data sets.
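The link between a lower price and higher compute capacity is simple arithmetic, sketched below in Python. All figures are hypothetical placeholders chosen for illustration, not real AWS prices, and the helper names are invented for this example.

```python
def spot_hourly_price(on_demand_hourly: float, discount: float) -> float:
    """Return the Spot price implied by a fractional discount off On-Demand."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_hourly * (1.0 - discount)


def instance_hours_for_budget(budget: float, hourly_price: float) -> float:
    """Return how many instance-hours a fixed budget buys at a given price."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return budget / hourly_price


# Hypothetical figures for illustration only; these are not real AWS prices.
ON_DEMAND_HOURLY = 1.00  # assumed On-Demand price, USD per instance-hour
DISCOUNT = 0.70          # assumed Spot discount (the page cites "up to 90%")
BUDGET = 300.00          # assumed budget for one job, USD

spot_price = spot_hourly_price(ON_DEMAND_HOURLY, DISCOUNT)
on_demand_hours = instance_hours_for_budget(BUDGET, ON_DEMAND_HOURLY)
spot_hours = instance_hours_for_budget(BUDGET, spot_price)

print(f"On-Demand: {on_demand_hours:.0f} instance-hours")
print(f"Spot:      {spot_hours:.0f} instance-hours")
```

Under these assumed numbers, the same budget buys several times as many instance-hours on Spot, which can be spent either on a larger cluster that finishes sooner or on a lower bill for the same cluster.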

See Spot Instance price savings vs On-Demand by filtering for “Instance types supported by EMR” on the Spot Instance Advisor page.

Optimize Cost and Performance for Big Data Workloads with EC2 Spot Instances (1:49)

Benefits

Accelerate Compute

Spot’s discounted instance pricing allows you to run parallel tasks on a multitude of instance types, maximizing performance, meeting business SLAs and reducing time to market.

Further Reduce Costs

EMR makes it easier to provision infrastructure, and Spot Instances are available at up to a 90% discount vs On-Demand pricing, so you can run workloads at the lowest possible cost.

Build for Scale

Due to the operating scale of AWS, you can quickly ramp up short-lived but massive data jobs on unused EC2 compute capacity at a low cost.

Features

Instance Flexibility

If Spot Instances in an EMR instance fleet (core or task) are interrupted, EMR attempts to replenish the target capacity by launching Spot Instances from the other capacity pools you specified. When you configure a cluster with multiple instance types, EMR can scale out across those types, which increases the cluster’s resilience.
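As a minimal sketch of what "multiple instance types" can look like in practice, the Python helper below builds a task fleet definition in the shape accepted by the `InstanceFleets` parameter of the boto3 EMR `run_job_flow` call. The helper name, fleet name, instance types, capacities, and timeout values are assumptions chosen for illustration; the block only builds a dictionary and does not call AWS.

```python
from typing import Any, Dict, List


def build_spot_task_fleet(instance_types: List[str],
                          target_spot_capacity: int) -> Dict[str, Any]:
    """Build a Spot-only TASK fleet that may draw from several instance types.

    Hypothetical helper: the returned dict follows the InstanceFleets item
    structure of the EMR RunJobFlow API, but the values are illustrative.
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "Name": "task-fleet",  # assumed fleet name
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": 0,
        "TargetSpotCapacity": target_spot_capacity,
        # One entry per instance type; each type is a separate capacity pool.
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": 1}
            for itype in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # Assumed values: wait 10 minutes for Spot capacity while
                # provisioning, then fall back to On-Demand.
                "TimeoutDurationMinutes": 10,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }


# Assumed example: four similarly sized instance types, eight units of capacity.
task_fleet = build_spot_task_fleet(
    ["m5.xlarge", "m5a.xlarge", "m4.xlarge", "r5.xlarge"], 8
)
print([cfg["InstanceType"] for cfg in task_fleet["InstanceTypeConfigs"]])
```

Listing several similarly sized types gives EMR more pools to draw from when one pool is interrupted or unavailable.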

Capacity Awareness

When you configure cluster nodes with instance fleets, EMR evaluates different Availability Zones to find Spot capacity pools optimized for availability and cost.
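One way to give EMR several Availability Zones to choose from is to list one subnet per zone in the cluster's `Instances` settings. The sketch below assembles such a structure in the shape used by the boto3 EMR `run_job_flow` call. The subnet IDs are placeholders, and the function name, instance types, capacities, and the choice of the `capacity-optimized` allocation strategy are assumptions made for this example. Nothing here contacts AWS.

```python
from typing import Any, Dict, List


def build_instances_config(subnet_ids: List[str],
                           allocation_strategy: str = "capacity-optimized"
                           ) -> Dict[str, Any]:
    """Build an Instances structure with fleets spanning several subnets.

    Hypothetical helper: field names follow the EMR RunJobFlow API; all
    values are illustrative.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")

    def on_demand_fleet(name: str, fleet_type: str) -> Dict[str, Any]:
        # Primary and core nodes stay On-Demand in this sketch.
        return {
            "Name": name,
            "InstanceFleetType": fleet_type,
            "TargetOnDemandCapacity": 1,
            "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
        }

    task_fleet = {
        "Name": "task",
        "InstanceFleetType": "TASK",
        "TargetSpotCapacity": 4,
        "InstanceTypeConfigs": [
            {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
            {"InstanceType": "r5.xlarge", "WeightedCapacity": 1},
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                "TimeoutDurationMinutes": 10,          # assumed value
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "AllocationStrategy": allocation_strategy,
            }
        },
    }
    return {
        # One subnet per Availability Zone that EMR may consider.
        "Ec2SubnetIds": list(subnet_ids),
        "KeepJobFlowAliveWhenNoSteps": False,
        "InstanceFleets": [
            on_demand_fleet("primary", "MASTER"),
            on_demand_fleet("core", "CORE"),
            task_fleet,
        ],
    }


# Placeholder subnet IDs; substitute subnets from your own VPC.
instances = build_instances_config(
    ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"]
)
print(instances["Ec2SubnetIds"])
```

A structure like this could be passed as the `Instances` argument of `run_job_flow`, together with a cluster name, release label, and IAM roles, which are omitted here.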

Learn more about best practices for configuring EMR clusters with Spot Instances for transient and long-running workloads.

Customer Case Studies

Additional Resources

Amazon EMR on EC2 Spot Instances
Setting up EMR Clusters on Spot Instances
Short vs Long-Running EMR Clusters on Spot Instances

Read this blog to learn best practices for running Apache Spark on EMR with Spot Instances.

Webinar

Learn best practices for scaling big data workloads, and for processing, storing, and analyzing big data securely and cost-effectively with Amazon EMR and Amazon EC2 Spot Instances.

10 minute tutorial

Follow this quick tutorial to set up your first EMR cluster with Spot Instances. This self-guided tutorial includes detailed instructions and step-by-step screenshots to help you start running EMR with Spot Instances in minutes.
