Amazon EMR on EC2 Spot Instances
Amazon EMR reduces the complexity of managing big data frameworks such as Apache Spark and Apache Hive, while letting you take advantage of cloud best practices such as separating compute and storage.
AWS operates at such depth and breadth that unused EC2 capacity is offered through Amazon EC2 Spot Instances at up to a 90% discount compared with On-Demand pricing. Although EC2 can reclaim Spot capacity with a two-minute warning, fewer than 5% of workloads are interrupted. Because big data workloads on EMR are fault tolerant, they can continue processing even when instances are interrupted. Running EMR on Spot Instances drastically reduces the cost of big data processing, lets you provision significantly more compute capacity, and shortens the time needed to process large data sets.
To compare Spot Instance savings against On-Demand pricing, filter for “Instance types supported by EMR” on the Spot Instance Advisor page.
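As a rough illustration of the discount arithmetic, the sketch below compares the cost of the same fleet on On-Demand and on Spot, and how much more instance time the On-Demand budget would buy on Spot. The prices, the 70% discount, and the helper names (`spot_hourly_price`, `fleet_cost`) are hypothetical assumptions for illustration; actual Spot discounts vary by instance type, Region, Availability Zone, and time.

```python
# Hypothetical figures only; check the Spot Instance Advisor for real savings.

def spot_hourly_price(on_demand_price: float, discount: float) -> float:
    """Spot price given a fractional discount off On-Demand (0.9 means 90%)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_price * (1.0 - discount)


def fleet_cost(instance_count: int, hours: float, hourly_price: float) -> float:
    """Total compute cost for a fleet of identically priced instances."""
    return instance_count * hours * hourly_price


ON_DEMAND_PRICE = 1.00  # hypothetical On-Demand price, USD per instance-hour
SPOT_DISCOUNT = 0.70    # hypothetical discount; the page cites "up to 90%"
INSTANCES, HOURS = 20, 5

spot_price = spot_hourly_price(ON_DEMAND_PRICE, SPOT_DISCOUNT)
on_demand_total = fleet_cost(INSTANCES, HOURS, ON_DEMAND_PRICE)
spot_total = fleet_cost(INSTANCES, HOURS, spot_price)
# Instance-hours the same budget buys on Spot, relative to On-Demand.
capacity_multiplier = ON_DEMAND_PRICE / spot_price

print(f"On-Demand: ${on_demand_total:.2f}, Spot: ${spot_total:.2f}, "
      f"savings: ${on_demand_total - spot_total:.2f}")
print(f"Same budget buys about {capacity_multiplier:.1f}x the instance-hours on Spot")
```

With these assumed numbers, the 20-instance, 5-hour job costs $100.00 On-Demand versus $30.00 on Spot, and the same $100 budget buys about 3.3 times as many instance-hours on Spot, which is the "more capacity, less time" trade-off described above.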
Benefits
Accelerate Compute
Spot’s discounted pricing lets you run parallel tasks across many instance types, improving performance, helping you meet business SLAs, and reducing time to market.
Further Reduce Costs
EMR makes it easier to provision infrastructure, and Spot gives you access to instances at up to a 90% discount compared with On-Demand pricing, so you can run workloads at the lowest possible cost.
Build for Scale
Because of the scale at which AWS operates, you can quickly ramp up massive, short-lived data jobs on unused EC2 compute capacity at low cost.
Features
Instance Flexibility
If Spot Instances in an EMR instance fleet (core or task) are interrupted, EMR attempts to replenish the target capacity by launching Spot Instances from the other capacity pools you specified. Configuring an EMR cluster with multiple instance types lets EMR scale out across different instance types, which increases the cluster’s resilience.
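A minimal sketch of what a diversified task fleet can look like, expressed as the dictionary shape used by the EMR `RunJobFlow` API (`InstanceFleets` entries with `InstanceTypeConfigs`). The helper `spot_task_fleet`, the chosen instance types, the target of 64 units, and the assumption that each type's `WeightedCapacity` equals its vCPU count are all illustrative choices, not a recommended configuration. The code only builds the request; it does not call AWS.

```python
from typing import Any


def spot_task_fleet(
    instance_types: dict[str, int],
    target_spot_units: int,
    name: str = "task-fleet",
) -> dict[str, Any]:
    """Build a hypothetical TASK instance fleet that spreads Spot capacity
    across several instance types, each of which is a separate capacity pool.

    instance_types maps an EC2 instance type to its WeightedCapacity
    (assumed here to be its vCPU count).
    """
    if len(instance_types) < 2:
        # Sketch policy: require alternatives so EMR has other pools to draw from.
        raise ValueError("specify several instance types for EMR to fall back on")
    return {
        "Name": name,
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": 0,           # all task capacity on Spot
        "TargetSpotCapacity": target_spot_units,
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": weight}
            for itype, weight in instance_types.items()
        ],
    }


fleet = spot_task_fleet(
    {"m5.xlarge": 4, "m5a.xlarge": 4, "m4.xlarge": 4, "m5.2xlarge": 8},
    target_spot_units=64,
)
print(fleet)
```

In a real request, a dictionary like this would sit alongside primary and core fleets in the `InstanceFleets` list of the `Instances` argument to the boto3 EMR client's `run_job_flow`. Weighting by vCPU means EMR can meet the 64-unit target with any mix of the listed types, so losing one pool does not block replenishment.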
Capacity Awareness
When you configure EMR cluster nodes with instance fleets, EMR analyzes different Availability Zones to find Spot capacity pools optimized for availability and cost.
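The sketch below shows one way to express this in an EMR `RunJobFlow` request: list subnets in several Availability Zones under `Ec2SubnetIds` so EMR can choose among them at launch, and give each fleet a `SpotSpecification` with the `capacity-optimized` allocation strategy. The helper `instances_config`, the placeholder subnet IDs, the 10-minute timeout, and the choice of `SWITCH_TO_ON_DEMAND` as the timeout action are assumptions for illustration. The code builds the dictionary only and does not call AWS.

```python
from typing import Any


def instances_config(
    subnet_ids: list[str], fleets: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build a hypothetical Instances argument for an EMR RunJobFlow request.

    Subnets in different Availability Zones give EMR several AZs to choose
    from; every fleet gets the same Spot provisioning specification.
    """
    spot_spec = {
        "TimeoutDurationMinutes": 10,            # assumption: wait up to 10 minutes
        "TimeoutAction": "SWITCH_TO_ON_DEMAND",  # alternative: "TERMINATE_CLUSTER"
        "AllocationStrategy": "capacity-optimized",
    }
    return {
        "Ec2SubnetIds": list(subnet_ids),
        # Copy each fleet so the caller's dictionaries are not mutated.
        "InstanceFleets": [
            {**fleet, "LaunchSpecifications": {"SpotSpecification": dict(spot_spec)}}
            for fleet in fleets
        ],
    }


core_fleet = {
    "Name": "core-fleet",
    "InstanceFleetType": "CORE",
    "TargetSpotCapacity": 2,
    "InstanceTypeConfigs": [
        {"InstanceType": t} for t in ("r5.xlarge", "r5a.xlarge", "r4.xlarge")
    ],
}
# Placeholder subnet IDs, one per Availability Zone (hypothetical).
config = instances_config(["subnet-0aaa", "subnet-0bbb", "subnet-0ccc"], [core_fleet])
print(config)
```

Passed as `Instances=config` (together with the name, release label, and IAM roles a real cluster needs) to boto3's EMR `run_job_flow`, this gives EMR a choice of three Availability Zones and several instance types. The more subnets and instance types you allow, the more Spot capacity pools EMR can evaluate.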
Learn more about best practices for configuring EMR clusters with Spot Instances for both transient and long-running workloads.
Additional Resources
Read this blog post to learn best practices for running Apache Spark on Amazon EMR with Spot Instances.
Learn best practices for scaling big data workloads and for processing, storing, and analyzing big data securely and cost-effectively with Amazon EMR and Amazon EC2 Spot Instances.
Follow this quick tutorial to set up your first EMR cluster with Spot Instances. This self-guided tutorial includes detailed instructions and step-by-step screenshots to help you start running EMR with Spot Instances in minutes.