AWS News Blog

New – Amazon EMR Instance Fleets

Today we’re excited to introduce a new feature for Amazon EMR clusters called instance fleets. Instance fleets gives you a wider variety of options and intelligence around instance provisioning. You can now provide a list of up to 5 instance types with corresponding weighted capacities and spot bid prices (including spot blocks)! EMR will automatically provision On-Demand and Spot capacity across these instance types when creating your cluster. This can make it easier and more cost effective to quickly obtain and maintain your desired capacity for your clusters.

You can also specify a list of Availability Zones and EMR will optimally launch your cluster in one of the AZs. EMR will also continue to rebalance your cluster in the case of Spot instance interruptions by replacing instances with any of the available types in your fleet. This will make it easier to maintain your cluster’s overall capacity. Instance fleets can be used instead of instance groups. Just like groups your cluster will have master, core, and task fleets.

Let’s take a look at the console updates to get an idea of how these fleets work.

We’ll start by navigating to the EMR console and clicking the Create Cluster button. That should bring us to our familiar EMR provisioning console where we can navigate to the advanced options near the top left.

We’ll select the latest EMR version (instance fleets are available for EMR versions 4.8.0 and greater, with the exception of 5.0.x) and click next.

Now we get to the good stuff! We’ll select the new instance fleet option in the hardware options.
Screenshot 2017-03-09 00.30.51

Now what I want to do is modify our core group to have a couple of instance types that will satisfy the needs of my cluster.

CoreFleetScreenshot

EMR will provision capacity in each instance fleet and availability zone to meet my requirements in the most cost effective way possible. The EMR console provides an easy mapping of vCPU to weighted capacity for each instance type, making it easy to use vCPU as the capacity unit (I want 16 total vCPUs in my core fleet). If the vCPU units don’t match my criteria for weighting instance types I can change the “Target capacity” selector to include arbitrary units and define my own weights (this is how the API/CLI consume capacity units as well).

When the cluster is being provisioned if it’s unable to obtain the desired spot capacity within a user defined timeout you can have it terminate or fall back onto On-Demand instances to provision the rest of the capacity.

All this functionality for instance fleets is also available from the AWS SDKs and the CLI. Let’s take a look at how we would provision our own instance fleet.

First we’ll create our configuration json in my-fleet-config.json:

[
  {
    "Name": "MasterFleet",
    "InstanceFleetType": "MASTER",
    "TargetOnDemandCapacity": 1,
    "InstanceTypeConfigs": [{"InstanceType": "m3.xlarge"}]
  },
  {
    "Name": "CoreFleet",
    "InstanceFleetType": "CORE",
    "TargetSpotCapacity": 11,
    "TargetOnDemandCapacity": 11,
    "LaunchSpecifications": {
      "SpotSpecification": {
        "TimeoutDurationMinutes": 20,
        "TimeoutAction": "SWITCH_TO_ON_DEMAND"
      }
    },
    "InstanceTypeConfigs": [
      {
        "InstanceType": "r4.xlarge",
        "BidPriceAsPercentageOfOnDemandPrice": 50,
        "WeightedCapacity": 1
      },
      {
        "InstanceType": "r4.2xlarge",
        "BidPriceAsPercentageOfOnDemandPrice": 50,
        "WeightedCapacity": 2
      },
      {
        "InstanceType": "r4.4xlarge",
        "BidPriceAsPercentageOfOnDemandPrice": 50,
        "WeightedCapacity": 4
      }
    ]
  }
]

Now that we have our configuration we can use the AWS CLI’s ’emr’ subcommand to create a new cluster with that configuration:

aws emr create-cluster --release-label emr-5.4.0 \
--applications Name=Spark,Name=Hive,Name=Zeppelin \
--service-role EMR_DefaultRole \
--ec2-attributes InstanceProfile="EMR_EC2_DefaultRole,SubnetIds=[subnet-1143da3c,subnet-2e27c012]" \
--instance-fleets file://my-fleet-config.json

If you’re eager to get started the feature is available now at no additional cost and you can find detailed documentation to help you get started here.

Thanks to the EMR service team for their help writing this post!

Randall Hunt

Randall Hunt

Randall Hunt

Senior Software Engineer and Technical Evangelist at AWS. Formerly of NASA, SpaceX, and MongoDB.