Deep dive into Fargate Spot to run your ECS Tasks for up to 70% less
Author: Pritam Pal, Sr. EC2 Spot Specialist SA
AWS launched AWS Fargate Spot in late 2019 for customers looking for a cost-effective way to run containers. This blog dives deep into how to use ECS Fargate Spot and Fargate Tasks to lower the cost of your workloads. I explain existing concepts like Container Stop Timeout and catching SIGTERM for graceful shutdown, and introduce you to new Amazon Elastic Container Service (Amazon ECS) concepts like capacity providers and service events.
In 2017, AWS launched AWS Fargate for Amazon ECS. Fargate allows you to spend less time managing Amazon EC2 instances and more time building. Fargate Spot is a new purchase option that allows customers to launch Tasks on spare capacity at a steep discount. A Spot Task is almost indistinguishable from an On-Demand Task, with the following exceptions:
- The price (per CPU-Hour and GB-Hour) of a Spot Task is variable, ranging between 50% and 70% off the price of an On-Demand Task.
- A Fargate Spot Task may be interrupted (i.e., stopped) when AWS needs the capacity back.
Fargate Spot runs on the same principle as Amazon EC2 Spot Instances. Your tasks run on spare capacity in the AWS Cloud. If you request to run your Task on Fargate Spot, your Tasks will run when capacity for Fargate Spot is available. As these Tasks run on spare capacity, you receive a two-minute notification when AWS needs capacity back, just like Spot Instances.
In this blog, I walk through how to launch a Fargate Spot Task using the AWS Management Console and command line interface (CLI), how to handle termination notices, what the Service Task Placement Failure Event looks like and other best practices to make sure you are a Fargate Spot champion.
ECS Fargate concepts
Before we go deep, let’s explore some ECS concepts, which we will use for Fargate Spot in this blog post.
StopTask and stopTimeout
Because a Fargate Spot task may be interrupted with two minutes' notice, you need to make sure it exits gracefully. For this, you can use the stopTimeout parameter. When StopTask is called on a task, the equivalent of docker stop is issued to the containers running in the task. If the container handles the SIGTERM signal gracefully and exits within the timeout (30 seconds by default) after receiving it, no SIGKILL signal is sent. You typically use the stopTimeout parameter in the container definitions of the task definition to control this behavior. stopTimeout is the time duration (in seconds) to wait before the container is forcefully stopped if it doesn't exit normally on its own. For Fargate platform version 1.3.0 or later, the maximum stop timeout value is 120 seconds. If the parameter is not specified, the default value of 30 seconds is used.
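To make the stopTimeout setting concrete, here is a minimal sketch in Python of a Fargate task definition whose container asks for the full 120 seconds. The family name, container name, image, and CPU/memory sizes are hypothetical placeholders, not values from this post; the resulting dictionary has the shape that the ECS RegisterTaskDefinition API expects.

```python
MAX_FARGATE_STOP_TIMEOUT = 120  # seconds; the maximum quoted above for platform version 1.3.0+


def container_definition(name, image, stop_timeout=MAX_FARGATE_STOP_TIMEOUT):
    """Build one container definition with an explicit stopTimeout."""
    if not 0 <= stop_timeout <= MAX_FARGATE_STOP_TIMEOUT:
        raise ValueError("stopTimeout must be between 0 and 120 seconds on Fargate")
    return {
        "name": name,
        "image": image,
        "essential": True,
        # Seconds between SIGTERM and SIGKILL when the task is stopped.
        "stopTimeout": stop_timeout,
    }


def fargate_task_definition(family, containers, cpu="256", memory="512"):
    """Wrap container definitions in a Fargate-compatible task definition."""
    return {
        "family": family,  # hypothetical name chosen by the caller
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": cpu,
        "memory": memory,
        "containerDefinitions": list(containers),
    }


task_def = fargate_task_definition(
    "sample-fargate-spot",
    [container_definition("web", "example/webapp:latest")],
)
```

If boto3 is available, the dictionary could be registered with `boto3.client("ecs").register_task_definition(**task_def)`.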
Capacity providers
Capacity providers are a new way to manage compute capacity for containers. They allow an application to define its requirements for how it uses capacity. With capacity providers, you can define flexible rules for how containerized workloads run on different types of compute capacity, and manage the scaling of the capacity. Capacity providers improve the availability, scalability, and cost of running tasks and services on ECS. As of now, each cluster can have up to six capacity providers and an optional default capacity provider strategy, which determines how the tasks are spread across the capacity providers. To run your tasks, you can either use the default capacity provider strategy or specify one of your own.
AWS Fargate and AWS Fargate Spot capacity providers do not need to be created. They are available to all accounts, and only need to be associated with a cluster to be available for use. When a new cluster is created using the Amazon ECS console, along with the Networking only cluster template, the FARGATE and FARGATE_SPOT capacity providers are associated with the new cluster automatically.
A cluster may contain a mix of FARGATE, FARGATE_SPOT, and Auto Scaling group (ASG) capacity providers. However, at this moment, a capacity provider strategy may contain either Fargate capacity providers (FARGATE, FARGATE_SPOT) or Auto Scaling group capacity providers, but not both.
Now that I have covered ECS Fargate concepts, let's jump into the technical walkthrough.
Launch ECS Fargate Spot Task using AWS Management Console
1. Open the Amazon ECS console
2. From the navigation bar, select the Region to use
3. In the navigation pane, choose Clusters
4. On the Clusters page, choose Create Cluster
5. Create a Networking only Cluster
With this option, you can launch a cluster with a new VPC to use for Fargate tasks. The FARGATE and FARGATE_SPOT capacity providers are automatically associated with the cluster, as shown in the following image.
6. Click Update Cluster at the top right-hand side to set a capacity provider strategy
In this example, I use a combination of FARGATE_SPOT and FARGATE capacity providers. I selected a Weight of 4 for FARGATE_SPOT and 1 for FARGATE. This means that for every five Tasks, four are started on FARGATE_SPOT and one on FARGATE. You can distribute this however you want. More tasks on Fargate Spot means more savings. But, if your workload requires high availability and you are not comfortable with interruptions, start with a ratio that works for you.
7. Let's create a Task Definition first. Here are some great Task definitions to start with. Find the Task Definition link in the left navigation panel, click Create a New Task Definition, choose the Fargate launch type, scroll down, and find the Configure via JSON button near the bottom of the page. Delete the pre-populated JSON entry, then copy the sample Fargate WebApp task definition from below and paste it in. Click Save. Click Create.
8. Now that the Task definition is ready, run it. Select the Task definition you just created, click Action, then Run Task. Enter the number of Tasks you want to run. In this case I chose 10. After configuring the VPC and security groups, click Run Task.
The Run Task command from the last step starts ten Tasks, of which eight launch on FARGATE_SPOT and two launch on FARGATE (the ratio I set up is 4:1). You can see which capacity provider a Task landed on by clicking on the Task and finding the “Capacity provider” value under the details tab. Currently, there is no option to view all Tasks with a particular “Capacity provider” in the Run Task console.
In the next section, I explore how to launch Fargate Spot Tasks using the AWS CLI.
Launch ECS Fargate Spot Task using AWS CLI
Create a cluster with capacity providers
While creating a new cluster using the CLI, you must specify capacity providers. In the following example, I specified two capacity providers, FARGATE and FARGATE_SPOT.
Enter the following code to specify the capacity providers:
The output of this command should look like this:
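For readers who script this step in Python rather than the CLI, the same cluster creation can be sketched with boto3. This is an illustrative equivalent, not the command used in this post; the cluster name is a hypothetical placeholder, and boto3 is imported lazily so the request builder can be inspected without AWS credentials.

```python
def create_cluster_request(cluster_name):
    """Build the CreateCluster parameters with both Fargate capacity providers."""
    return {
        "clusterName": cluster_name,
        "capacityProviders": ["FARGATE", "FARGATE_SPOT"],
    }


def create_cluster(cluster_name):
    """Send the request to Amazon ECS (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    ecs = boto3.client("ecs")
    response = ecs.create_cluster(**create_cluster_request(cluster_name))
    # The response describes the new cluster, including its capacity providers.
    return response["cluster"]


request = create_cluster_request("FargateSpotDemo")  # hypothetical cluster name
```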
Launching a Task
Once the cluster is created, you can launch a Fargate Spot Task by calling RunTask and providing the Spot capacity provider in the --capacity-provider-strategy field. You also need to specify:
- A task definition
- Weight options for capacity providers
- A network configuration like Subnets, Security Groups
- How many Tasks you want to run
I defined these specifications in the following code:
If you specify count=10 and weight=1 for both providers, it starts five FARGATE_SPOT and five FARGATE Tasks.
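The four inputs listed above map directly onto the RunTask parameters. The following Python sketch builds them for boto3; the cluster name, task definition, subnet, and security group identifiers are hypothetical placeholders, and enabling a public IP is an assumption that suits a public subnet only.

```python
def run_task_request(cluster, task_definition, subnets, security_groups,
                     count=10, spot_weight=1, on_demand_weight=1):
    """Build RunTask parameters that split tasks across Fargate Spot and Fargate."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": count,
        # launchType is omitted on purpose: it cannot be combined
        # with a capacity provider strategy.
        "capacityProviderStrategy": [
            {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
            {"capacityProvider": "FARGATE", "weight": on_demand_weight},
        ],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "ENABLED",  # assumption: tasks run in a public subnet
            }
        },
    }


request = run_task_request(
    cluster="FargateSpotDemo",              # hypothetical
    task_definition="sample-fargate-spot",  # hypothetical
    subnets=["subnet-EXAMPLE"],             # placeholder ID
    security_groups=["sg-EXAMPLE"],         # placeholder ID
)
```

With boto3 installed and credentials configured, `boto3.client("ecs").run_task(**request)` would submit it.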
Creating a service
In the example below, you create a service with FARGATE_SPOT only.
You can also create a service with a mix of Spot and On-Demand Tasks by calling CreateService and providing both Spot and On-Demand capacity providers in the capacity-provider-strategy field.
The base attribute is an optional field; here it says there should be at least four On-Demand Tasks (the default base is 0, and you cannot specify more than one capacity provider with a non-zero base). The weight is another optional field; here it says that for the six remaining Tasks that are not managed by the base attribute, there should be one On-Demand Task for every two Spot Tasks.
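The base and weight arithmetic can be checked with a short Python sketch. It assumes a desired count of ten, which is implied by "four base plus six remaining", and it models only the ratio described in this post; it is not the ECS scheduler's actual placement algorithm. The function name and the tie-breaking rule for uneven splits are my own choices.

```python
def split_tasks(count, strategy):
    """Estimate how many tasks each capacity provider receives.

    strategy is a list of dicts shaped like capacityProviderStrategy items:
    {"capacityProvider": str, "weight": int, "base": int (optional)}.
    """
    based = [s for s in strategy if s.get("base", 0) > 0]
    if len(based) > 1:
        raise ValueError("only one capacity provider may have a non-zero base")

    placed = {s["capacityProvider"]: 0 for s in strategy}
    remaining = count
    # The base is satisfied first.
    for s in based:
        take = min(s["base"], remaining)
        placed[s["capacityProvider"]] += take
        remaining -= take
    if remaining == 0:
        return placed

    total_weight = sum(s.get("weight", 0) for s in strategy)
    if total_weight == 0:
        raise ValueError("at least one capacity provider needs a weight above zero")

    # The rest is shared in proportion to the weights (integer arithmetic).
    remainders = []
    for s in strategy:
        whole, rest = divmod(remaining * s.get("weight", 0), total_weight)
        placed[s["capacityProvider"]] += whole
        remainders.append((rest, s["capacityProvider"]))
    # Hand out tasks lost to rounding, largest fractional share first.
    leftover = count - sum(placed.values())
    for _, name in sorted(remainders, key=lambda r: r[0], reverse=True)[:leftover]:
        placed[name] += 1
    return placed


mixed_strategy = [
    {"capacityProvider": "FARGATE", "base": 4, "weight": 1},
    {"capacityProvider": "FARGATE_SPOT", "weight": 2},
]
result = split_tasks(10, mixed_strategy)
```

Here `result` gives six Tasks to FARGATE (four from the base plus two of the remaining six) and four to FARGATE_SPOT. The same list is what you would pass as the capacity provider strategy when calling CreateService.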
Add the Fargate and Fargate Spot capacity providers to an existing cluster
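An existing cluster is updated through the PutClusterCapacityProviders API. The sketch below builds that request for boto3. The cluster name and weights are hypothetical. Because this call replaces the cluster's whole list of capacity providers, the sketch takes the providers already associated with the cluster as an argument so that they are kept.

```python
FARGATE_PROVIDERS = ["FARGATE", "FARGATE_SPOT"]


def put_capacity_providers_request(cluster, existing_providers=(),
                                   spot_weight=1, on_demand_weight=1):
    """Build PutClusterCapacityProviders parameters for an existing cluster."""
    # Keep what the cluster already has, then add the two Fargate providers once.
    providers = list(existing_providers)
    for name in FARGATE_PROVIDERS:
        if name not in providers:
            providers.append(name)
    return {
        "cluster": cluster,
        "capacityProviders": providers,
        "defaultCapacityProviderStrategy": [
            {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
            {"capacityProvider": "FARGATE", "weight": on_demand_weight},
        ],
    }


request = put_capacity_providers_request("ExistingCluster")  # hypothetical name
```

With boto3, `boto3.client("ecs").put_cluster_capacity_providers(**request)` would apply it.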
How to handle Fargate Spot termination notices
By design, Fargate Spot is an interruptible service. When tasks using FARGATE_SPOT are stopped due to a Spot interruption, a two-minute warning is sent before a task is stopped.
The warning is sent as a task state change event to Amazon EventBridge and a SIGTERM signal to the running task. When using Fargate Spot as part of a service, the service scheduler will receive the interruption signal and attempt to launch additional Tasks on Fargate Spot if capacity is available.
To ensure that our containers exit gracefully before the Task stops, the following can be configured:
- A stopTimeout value of 120 seconds (2 minutes) or less can be specified in the container definition that the task is using. Specifying a stopTimeout value gives us time between the moment the Task state change event is received and the point at which the container is forcefully stopped.
- The SIGTERM signal must be received from within the container to perform any cleanup actions.
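Catching SIGTERM inside the container can be as small as the following Python sketch. It assumes the application is a simple worker loop; `process_one_item` is a hypothetical callable standing in for your own unit of work, and the cleanup step is left as a comment.

```python
import signal
import time


class ShutdownFlag:
    """Remembers that SIGTERM arrived so the main loop can stop between items."""

    def __init__(self):
        self.requested = False

    def install(self):
        signal.signal(signal.SIGTERM, self._on_sigterm)
        return self

    def _on_sigterm(self, signum, frame):
        # Keep the handler tiny: just record the request.
        self.requested = True


def run_worker(process_one_item, flag, poll_seconds=1.0):
    """Process items until SIGTERM is received, then return for cleanup."""
    handled = 0
    while not flag.requested:
        process_one_item()
        handled += 1
        time.sleep(poll_seconds)
    # Flush buffers, close connections, and checkpoint state here,
    # finishing well inside the configured stopTimeout.
    return handled
```

A typical entry point would call `run_worker(do_work, ShutdownFlag().install())`. The process must actually receive the signal, so if the container starts through a shell wrapper, make sure the wrapper forwards SIGTERM (for example by using `exec`).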
The following is a snippet of a Task state change event displaying the stopped reason and stop code for a Fargate Spot interruption.
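A consumer of these events, such as an AWS Lambda function targeted by an EventBridge rule, could recognize a Spot interruption with a check like the Python sketch below. The sample event is abbreviated and illustrative; the ARN is a placeholder. Treat the stop code values as an assumption to verify against the current ECS documentation: the ECS API defines both `SpotInterruption` and `TerminationNotice` as task stop codes, so the sketch accepts either.

```python
# Assumption: either value may mark a Fargate Spot interruption.
SPOT_STOP_CODES = {"SpotInterruption", "TerminationNotice"}


def is_spot_interruption(event):
    """Return True if an EventBridge event is an ECS task stopped by Spot."""
    if event.get("source") != "aws.ecs":
        return False
    if event.get("detail-type") != "ECS Task State Change":
        return False
    detail = event.get("detail", {})
    return detail.get("stopCode") in SPOT_STOP_CODES


# Abbreviated, illustrative event; real events carry many more fields.
sample_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {
        "taskArn": "arn:aws:ecs:REGION:ACCOUNT:task/EXAMPLE",  # placeholder
        "desiredStatus": "STOPPED",
        "stoppedReason": "Your Spot Task was interrupted.",
        "stopCode": "TerminationNotice",
    },
}
interrupted = is_spot_interruption(sample_event)
```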
Example service Task placement failure event
If FARGATE_SPOT can't place a Task due to capacity constraints, service Task placement failure events are delivered.
In the following example, the task attempted to use the FARGATE_SPOT capacity provider, but the service scheduler was unable to acquire any Fargate Spot capacity.
Amazon EventBridge enables you to automate your AWS services, and respond automatically to system events such as application availability issues or resource changes. Events from AWS services are delivered to EventBridge in near real-time. You can write simple rules to indicate which events are of interest to you and what automated actions to take when an event matches a rule.
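As a sketch of such a rule, the following Python snippet defines an event pattern for service Task placement failures and a tiny local matcher for trying it out. The `ECS Service Action` detail type and the `SERVICE_TASK_PLACEMENT_FAILURE` event name are taken from the ECS events documentation as I recall it, so treat them as assumptions to confirm. The matcher handles only exact-value patterns and is not a full implementation of EventBridge matching.

```python
import json

# Assumed field values; verify against the Amazon ECS events documentation.
PLACEMENT_FAILURE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Service Action"],
    "detail": {"eventName": ["SERVICE_TASK_PLACEMENT_FAILURE"]},
}


def matches(pattern, event):
    """Check an event against an exact-value pattern (lists mean 'any of')."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# EventBridge expects the pattern as a JSON string when a rule is created.
pattern_json = json.dumps(PLACEMENT_FAILURE_PATTERN)
```

With boto3, `boto3.client("events").put_rule(Name="...", EventPattern=pattern_json)` would create the rule; the rule name is yours to choose.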
More details on how to use Amazon ECS Events can be found here.
Fargate Spot pricing
With AWS Fargate, there are no upfront payments and you only pay for the resources that you use. You pay for the amount of vCPU and memory resources consumed by your containerized applications.
The price for Spot CPU-Hour and GB-Hour is the same across all Availability Zones and Task Configurations. However, the price varies throughout the day. The latest price is available at Fargate Pricing page. Pricing is per second with a 1-minute minimum. Duration is calculated from the time you start to download your container image (Docker pull) until the Task terminates, rounded up to the nearest second.
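The billing rules above (per-second pricing, rounded up, with a one-minute minimum) translate into a small calculation. The Python sketch below takes the rates as parameters because the Spot price varies; the rates in the usage line are made-up round numbers for illustration, not actual Fargate prices.

```python
import math


def fargate_task_cost(vcpu, memory_gb, duration_seconds,
                      price_per_vcpu_hour, price_per_gb_hour):
    """Estimate the charge for one task from its size, runtime, and hourly rates."""
    # Round up to the next whole second, then apply the one-minute minimum.
    billed_seconds = max(60, math.ceil(duration_seconds))
    hours = billed_seconds / 3600
    return hours * (vcpu * price_per_vcpu_hour + memory_gb * price_per_gb_hour)


# Hypothetical rates, chosen only to keep the arithmetic easy to follow.
one_hour = fargate_task_cost(vcpu=1, memory_gb=2, duration_seconds=3600,
                             price_per_vcpu_hour=0.36, price_per_gb_hour=0.036)
```

Look up the current rates on the Fargate Pricing page before using the function for real estimates.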
Fargate Spot best practices
As I wrap up, I want to focus on a few best practices about Fargate Spot.
- Fargate Spot is great for stateless, fault-tolerant workloads, but don't rely solely on Spot Tasks for critical workloads; configure a mix that includes regular Fargate Tasks
- Applications running on Fargate Spot should be fault-tolerant
- Handle interruptions gracefully by catching SIGTERM signals
Fargate Spot is a great fit for parallelizable workloads like image rendering, Monte Carlo simulations, and genomic processing. However, customers can also use Fargate Spot for Tasks that run as a part of ECS services such as websites and APIs which require high availability.
ECS Fargate has already made it super easy to run containerized workloads without worrying about setting up and managing infrastructure. Fargate Spot makes it more affordable for your price-sensitive workloads. With the right mix of FARGATE and FARGATE_SPOT capacity providers, you can get the optimal capacity for your tasks within budget.
Pritam is a Sr. Specialist Solutions Architect on the EC2 Spot team. For the last 13 years he has evangelized DevOps and Cloud adoption across industries and verticals. He likes to deep dive and find solutions to everyday problems.