AWS Big Data Blog

Author: Ran Sheinberg

Optimizing Amazon EMR for resilience and cost with capacity-optimized Spot Instances

Amazon EMR now supports the capacity-optimized allocation strategy for Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances. This strategy analyzes capacity metrics in real time and launches Spot Instances from the most available Spot Instance capacity pools. You can now specify up to 15 instance types in your EMR task instance fleet configuration. This provides Amazon […]
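As a rough illustration of the configuration described above, the following sketch builds a task instance fleet definition in the shape accepted by the `InstanceFleets` parameter of boto3's EMR `run_job_flow` call. The helper name `build_task_fleet`, the fleet name, the weights, and the timeout values are assumptions chosen for illustration and do not come from the post. Only the capacity-optimized strategy and the 15-type limit are taken from the text.

```python
# Limit on instance types per task instance fleet, as stated in the post.
MAX_TASK_FLEET_INSTANCE_TYPES = 15


def build_task_fleet(instance_types, target_spot_capacity):
    """Build a task instance fleet definition that uses Spot Instances
    with the capacity-optimized allocation strategy.

    Hypothetical helper: it only assembles a dictionary and makes no AWS calls.
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    if len(instance_types) > MAX_TASK_FLEET_INSTANCE_TYPES:
        raise ValueError(
            f"a task fleet accepts at most {MAX_TASK_FLEET_INSTANCE_TYPES} instance types"
        )
    return {
        "Name": "task-fleet",  # illustrative name
        "InstanceFleetType": "TASK",
        "TargetSpotCapacity": target_spot_capacity,
        # Equal weights are an assumption; weights normally reflect instance size.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                "AllocationStrategy": "capacity-optimized",
                # Timeout settings below are illustrative values.
                "TimeoutDurationMinutes": 60,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }


fleet = build_task_fleet(["m5.xlarge", "m5a.xlarge", "m4.xlarge", "r5.xlarge"], 8)
```

The resulting dictionary would be passed as one element of `Instances["InstanceFleets"]` when creating a cluster. Listing more instance types gives the allocation strategy more capacity pools to choose from.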

Best practices for running Apache Spark applications using Amazon EC2 Spot Instances with Amazon EMR

This blog post focuses on running Spark applications on Amazon EMR cost-effectively and efficiently by using Spot Instances. We recommend several best practices that increase the fault tolerance of your Spark applications on Spot Instances without compromising availability or significantly affecting the performance or duration of your jobs.