Amazon EMR on EC2 Spot Instances

Performance, scale, and deep cost savings on big data workloads

Amazon EMR reduces the complexity of managing big data frameworks (e.g. Apache Spark and Hive), while taking advantage of cloud best practices such as separating compute and storage.

Because AWS operates at very large scale, unused EC2 capacity is offered at up to a 90% discount (vs On-Demand pricing) through Amazon EC2 Spot Instances. While EC2 can reclaim Spot capacity with a two-minute warning, less than 5% of workloads are interrupted. Because big data workloads on EMR are fault tolerant, they can continue processing even when Spot Instances are interrupted. Running EMR on Spot Instances drastically reduces the cost of big data processing, makes significantly more compute capacity affordable, and shortens the time needed to process large data sets.

See Spot Instance price savings vs On-Demand by filtering for “Instance types supported by EMR” on the Spot Instance Advisor page.

Benefits

Accelerate Compute

Spot’s discounted instance pricing allows you to run parallel tasks on a multitude of instance types, maximizing performance, meeting business SLAs and reducing time to market.

Further Reduce Costs

EMR makes it easier to provision infrastructure, and Spot Instances are available at up to a 90% discount vs On-Demand pricing, so you can run workloads at the lowest possible cost.
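
To make the discount concrete, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch: the hourly price, discount percentage, and cluster size below are hypothetical values chosen for illustration, not published AWS prices, and actual Spot discounts vary by instance type, Region, and time.

```python
def spot_hourly_price(on_demand_price: float, discount_percent: float) -> float:
    """Hourly Spot price implied by a given discount off the On-Demand price."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return on_demand_price * (100 - discount_percent) / 100


def cluster_cost(hourly_price: float, instance_count: int, hours: float) -> float:
    """Total instance cost of running `instance_count` instances for `hours`."""
    return hourly_price * instance_count * hours


# Hypothetical inputs (assumptions, not real prices).
ON_DEMAND_PRICE = 0.50   # USD per instance-hour
DISCOUNT_PERCENT = 70    # an assumed discount below the "up to 90%" ceiling
INSTANCES = 20
HOURS = 4

on_demand_total = cluster_cost(ON_DEMAND_PRICE, INSTANCES, HOURS)
spot_total = cluster_cost(
    spot_hourly_price(ON_DEMAND_PRICE, DISCOUNT_PERCENT), INSTANCES, HOURS
)
savings = on_demand_total - spot_total
```

The same function also shows the capacity side of the trade-off: at a fixed budget, a lower hourly price lets the job run on more instances for the same spend.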

Build for Scale

Due to the operating scale of AWS, you can quickly ramp up short-lived but massive data jobs on unused EC2 compute capacity at a low cost.

Features

Instance Flexibility

If Spot Instances in an EMR instance fleet (core or task) are interrupted, EMR attempts to replenish the target capacity by launching Spot Instances from the other capacity pools you specified. When you configure an EMR cluster with multiple instance types, EMR can scale out across those types, which increases the cluster’s resilience.
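
As an illustration of instance flexibility, the sketch below builds a task instance fleet definition that lists several instance types. The field names follow the instance fleet structure of the EMR RunJobFlow API; the helper name `build_task_fleet`, the fleet name, the instance types, and the capacity numbers are assumptions made for this example. The code only constructs a dictionary and does not call AWS.

```python
from typing import Dict, List


def build_task_fleet(instance_types: List[str], target_spot_capacity: int) -> Dict:
    """Return a dict shaped like one entry of `InstanceFleets`.

    Each listed instance type is a separate Spot capacity pool that EMR
    can draw from when it needs to provision or replenish capacity.
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "Name": "task-fleet",            # hypothetical fleet name
        "InstanceFleetType": "TASK",
        "TargetSpotCapacity": target_spot_capacity,
        "InstanceTypeConfigs": [
            # WeightedCapacity of 1 means each instance counts as one unit
            # toward the target capacity, regardless of type.
            {"InstanceType": instance_type, "WeightedCapacity": 1}
            for instance_type in instance_types
        ],
    }


# Illustrative instance types of similar size; pick types that suit the workload.
task_fleet = build_task_fleet(
    ["m5.xlarge", "m5a.xlarge", "m4.xlarge", "r5.xlarge"],
    target_spot_capacity=8,
)
```

Listing more instance types gives EMR more pools to fall back on, which is the mechanism behind the resilience described above.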

Capacity Awareness

When you configure the EMR cluster nodes with instance fleets, EMR evaluates the Availability Zones you make available to it and selects Spot capacity pools optimized for availability and cost.
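
One way this might look in a cluster definition is sketched below: several subnets in different Availability Zones are offered to EMR, and the Spot fleet requests a capacity-optimized allocation strategy. The field names follow the `Instances` structure of the EMR RunJobFlow API; the helper name `build_instances_config`, the subnet IDs, the instance types, and the timeout values are placeholders assumed for this example. The code only constructs a dictionary and does not call AWS.

```python
from typing import Dict, List


def build_instances_config(subnet_ids: List[str]) -> Dict:
    """Return a dict shaped like the `Instances` argument of RunJobFlow."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return {
        # Subnets in different Availability Zones give EMR a choice of
        # where to launch the cluster.
        "Ec2SubnetIds": list(subnet_ids),
        "InstanceFleets": [
            {
                "InstanceFleetType": "MASTER",
                "TargetOnDemandCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
            },
            {
                "InstanceFleetType": "CORE",
                "TargetSpotCapacity": 4,
                "InstanceTypeConfigs": [
                    {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                    {"InstanceType": "r5.xlarge", "WeightedCapacity": 1},
                ],
                "LaunchSpecifications": {
                    "SpotSpecification": {
                        # Assumed values: wait 10 minutes for Spot capacity,
                        # then fall back to On-Demand.
                        "TimeoutDurationMinutes": 10,
                        "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                        "AllocationStrategy": "capacity-optimized",
                    }
                },
            },
        ],
    }


# Placeholder subnet IDs (hypothetical); use subnets from your own VPC.
instances_config = build_instances_config(
    ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"]
)
```

A dictionary like this could be passed as the `Instances` argument when creating a cluster through an AWS SDK, alongside the other required cluster settings, which are omitted here.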

Defined Duration

While less than 5% of Spot Instances are interrupted, EMR workloads that are not fault tolerant can request Spot Instances for a defined duration of 1 to 6 hours at up to a 50% discount vs On-Demand pricing.

Learn more about best practices for configuring EMR clusters with Spot Instances for transient and long-running workloads.

Additional Resources

Setting up EMR Clusters on Spot Instances
Short vs Long-Running EMR Clusters on Spot Instances
Learn more

Read this blog to learn best practices for running Apache Spark on EMR with Spot Instances.

Online Workshop

Learn how to set up Apache Spark apps with EMR on Spot Instances. Become an EMR on Spot Instances expert by completing this self-guided workshop.
