AWS Blog

New – CloudWatch Metrics for Spot Fleets

by Jeff Barr | in Amazon EC2

You can launch an EC2 Spot fleet with a couple of clicks. Once launched, the fleet allows you to draw resources from multiple pools of capacity, giving you access to cost-effective compute power regardless of the fleet size (from one instance to many thousands). For more information about this important EC2 feature, read my posts "Amazon EC2 Spot Fleet API – Manage Thousands of Spot Instances with One Request" and "Spot Fleet Update – Console Support, Fleet Scaling, CloudFormation".

I like to think of each Spot fleet as a single, collective entity. After a fleet has been launched, it is an autonomous group of EC2 instances. The instances may come and go from time to time as Spot prices change (and your mix of instances is altered in order to deliver results as cost-effectively as possible) or if the fleet’s capacity is updated, but the fleet itself retains its identity and its properties.

New Spot Fleet Metrics
To make it even easier for you to manage, monitor, and scale your Spot fleets as collective entities, we are introducing a new set of Spot fleet CloudWatch metrics.

The metrics are reported across multiple dimensions: for each Spot fleet, for each Availability Zone utilized by each Spot fleet, for each EC2 instance type within the fleet, and for each Availability Zone / instance type combination.
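To make the four views concrete, here is a minimal sketch that builds a CloudWatch-style dimension list for each one. The dimension names (FleetRequestId, AvailabilityZone, InstanceType) are my assumption about how the dimensions are named and should be checked against the CloudWatch documentation; the fleet ID, zone, and instance type are hypothetical values used only for illustration.

```python
from typing import Dict, List, Optional

# Assumed dimension names; verify them against the CloudWatch documentation.
FLEET_DIM = "FleetRequestId"
AZ_DIM = "AvailabilityZone"
TYPE_DIM = "InstanceType"


def fleet_dimensions(
    fleet_id: str,
    availability_zone: Optional[str] = None,
    instance_type: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Build the dimension list for one of the four views of a Spot fleet."""
    dims = [{"Name": FLEET_DIM, "Value": fleet_id}]
    if availability_zone is not None:
        dims.append({"Name": AZ_DIM, "Value": availability_zone})
    if instance_type is not None:
        dims.append({"Name": TYPE_DIM, "Value": instance_type})
    return dims


# Hypothetical identifiers, for illustration only.
fleet_id = "sfr-example"
views = {
    "fleet": fleet_dimensions(fleet_id),
    "fleet_az": fleet_dimensions(fleet_id, availability_zone="us-east-1a"),
    "fleet_type": fleet_dimensions(fleet_id, instance_type="m3.large"),
    "fleet_az_type": fleet_dimensions(
        fleet_id, availability_zone="us-east-1a", instance_type="m3.large"
    ),
}

for name, dims in views.items():
    print(name, [d["Name"] for d in dims])
```

Each list could then be passed as the dimensions argument of whatever CloudWatch client you use to retrieve a metric for that view.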

The following metrics are reported for each Spot fleet (you will need to enable EC2 Detailed Monitoring in order to ensure that they are all published):

  • AvailableInstancePoolsCount
  • BidsSubmittedForCapacity
  • CPUUtilization
  • DiskReadBytes
  • DiskReadOps
  • DiskWriteBytes
  • DiskWriteOps
  • EligibleInstancePoolCount
  • FulfilledCapacity
  • MaxPercentCapacityAllocation
  • NetworkIn
  • NetworkOut
  • PendingCapacity
  • StatusCheckFailed
  • StatusCheckFailed_Instance
  • StatusCheckFailed_System
  • TargetCapacity
  • TerminatingCapacity

Some of these metrics give you insight into the operation of the Spot fleet bidding process. For example:

  • AvailableInstancePoolsCount – Indicates the number of instance pools included in the Spot fleet request.
  • BidsSubmittedForCapacity – Indicates the number of bids that have been made for Spot fleet capacity.
  • EligibleInstancePoolCount – Indicates the number of instance pools that are eligible for Spot instance requests. A pool is ineligible when either (1) the Spot price is higher than the On-Demand price or (2) the bid price is lower than the Spot price.
  • FulfilledCapacity – Indicates the amount of capacity that has been fulfilled for the fleet.
  • PercentCapacityAllocation – Indicates the percent of capacity allocated for the given dimension. You can use this in conjunction with the instance type dimension to determine the percent of capacity allocated to a given instance type.
  • PendingCapacity – The difference between TargetCapacity and FulfilledCapacity.
  • TargetCapacity – The currently requested target capacity for the Spot fleet.
  • TerminatingCapacity – The fleet capacity for instances that have received Spot instance termination notices.
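Two of the definitions above are simple enough to express directly. The sketch below encodes the pool eligibility rule and the pending capacity calculation exactly as described in the list; the function names, prices, and capacity figures are hypothetical and are not taken from any real fleet.

```python
def pool_is_eligible(spot_price: float, on_demand_price: float, bid_price: float) -> bool:
    """Apply the eligibility rule described above to a single instance pool."""
    if spot_price > on_demand_price:
        return False  # (1) the Spot price is higher than the On-Demand price
    if bid_price < spot_price:
        return False  # (2) the bid price is lower than the Spot price
    return True


def pending_capacity(target_capacity: int, fulfilled_capacity: int) -> int:
    """PendingCapacity is the difference between target and fulfilled capacity."""
    return target_capacity - fulfilled_capacity


# Hypothetical pools as (spot_price, on_demand_price, bid_price) in dollars per hour.
pools = [
    (0.03, 0.10, 0.05),  # eligible: Spot is below On-Demand and below the bid
    (0.12, 0.10, 0.15),  # ineligible: Spot is above On-Demand
    (0.06, 0.10, 0.05),  # ineligible: the bid is below the Spot price
]

eligible_count = sum(pool_is_eligible(*pool) for pool in pools)
print("Eligible pools:", eligible_count)

# A fleet that asked for 20 units and currently has 14 fulfilled.
print("Pending capacity:", pending_capacity(20, 14))
```

With these sample values, one of the three pools is eligible and the pending capacity is 6 units.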

These metrics will allow you to determine the overall status and performance of each of your Spot fleets. As you can see from the names of the metrics, you can easily observe the disk, CPU, and network resources consumed by the fleet. You can also get a sense for the work that is happening behind the scenes as bids are placed on your behalf for Spot capacity.

You can further inspect the following metrics across the Availability Zone and/or instance type dimensions:

  • CPUUtilization
  • DiskReadBytes
  • DiskReadOps
  • DiskWriteBytes
  • FulfilledCapacity
  • NetworkIn
  • NetworkOut
  • StatusCheckFailed
  • StatusCheckFailed_Instance
  • StatusCheckFailed_System

These metrics will allow you to see if you have an acceptable distribution of load across Availability Zones and/or instance types.

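You can aggregate these metrics using the Maximum, Minimum, or Average statistics to observe the overall utilization of your fleet. Be aware, however, that Average does not always make sense across a fleet composed of two or more instance types: the same CPUUtilization percentage, for example, represents a different amount of compute on a small instance than on a large one!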
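Here is a small worked example of why averaging across a mixed fleet can mislead. It compares a plain average of CPUUtilization with an average weighted by vCPU count. The weighting scheme is my own illustration, not something CloudWatch computes for you, and the instance sizes and utilization figures are hypothetical.

```python
from typing import List, Tuple

# Each sample is (CPUUtilization percent, vCPU count) for one instance.
Sample = Tuple[float, int]


def plain_average(samples: List[Sample]) -> float:
    """Average the percentages, treating every instance as equal."""
    return sum(cpu for cpu, _ in samples) / len(samples)


def vcpu_weighted_average(samples: List[Sample]) -> float:
    """Average the percentages, weighting each instance by its vCPU count."""
    total_vcpus = sum(vcpus for _, vcpus in samples)
    return sum(cpu * vcpus for cpu, vcpus in samples) / total_vcpus


# Hypothetical fleet: one busy 16-vCPU instance and one idle 2-vCPU instance.
fleet = [(90.0, 16), (10.0, 2)]

print("Plain average:         %.1f%%" % plain_average(fleet))
print("vCPU-weighted average: %.1f%%" % vcpu_weighted_average(fleet))
```

The plain average reports 50%, while the weighted figure is about 81%, which is closer to how much of the fleet's total compute is actually busy.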

Available Now
The new metrics are available now.

Jeff;