Optimizing Amazon Elastic Container Service for cost using scheduled scaling
Elasticity and cost are major factors in an organization's operational efficiency, which in turn drives business transformation and agility. Elasticity is the ability of the infrastructure (including the application) to scale out and scale in seamlessly based on load. This is also called auto scaling. When the scale out and scale in happen on a schedule, it is called scheduled auto scaling. This is useful for customers who spin up resources at the start of an activity and spin them down at the end of it. It helps manage the extra load on the system during peak times and also reduces cost, because the extra infrastructure is scaled in when it is not in use. This post combines Amazon Elastic Container Service (Amazon ECS) scheduled scaling with capacity providers and Spot integration into a simple strategy for cost optimization.
What is an ECS capacity provider?
Amazon ECS capacity providers enable you to manage the infrastructure that the tasks in your clusters use. Each cluster can have one or more capacity providers and an optional default capacity provider strategy. The capacity provider strategy determines how the tasks are spread across the capacity providers in a cluster. When you run a task or create a service, you may use the cluster’s default capacity provider strategy or specify a capacity provider strategy that overrides the cluster’s default strategy.
ECS capacity providers consist of two components: the capacity provider and the capacity provider strategy.
A capacity provider is used in association with a cluster to determine the infrastructure that a task runs on. For Amazon ECS on Fargate users, the FARGATE and FARGATE_SPOT capacity providers are provided automatically. For more information, see using AWS Fargate capacity providers. For Amazon ECS on Amazon EC2 users, a capacity provider consists of a name, an Auto Scaling group, and the settings for managed scaling and managed termination protection. This type of capacity provider is used in cluster auto scaling. For more information, see Auto Scaling group capacity providers. One or more capacity providers are specified in a capacity provider strategy, which is then associated with a cluster as well as a service.
A capacity provider strategy gives you control over how your tasks use one or more capacity providers. When you run a task or create a service, you specify a capacity provider strategy. A capacity provider strategy consists of one or more capacity providers with an optional base and weight specified for each provider. The base value designates how many tasks, at a minimum, to run on the specified capacity provider. Only one capacity provider in a capacity provider strategy can have a base defined. The weight value designates the relative percentage of the total number of launched tasks that should use the specified capacity provider. For example, if you have a strategy that contains two capacity providers, and both have a weight of 1, then after the base is satisfied, the tasks will be split evenly across the two capacity providers. Using that same logic, if you specify a weight of 1 for capacityProviderA and a weight of 4 for capacityProviderB, then for every one task that is run using capacityProviderA, four tasks would use capacityProviderB.
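The base and weight arithmetic described above can be sketched in a few lines of Python. This is a simplified model for illustration only: the function name split_tasks and the largest-remainder rounding are assumptions of this sketch, and Amazon ECS places tasks incrementally, so the real split may differ by a task or so when the counts do not divide evenly.

```python
def split_tasks(total, strategy):
    """Approximate how `total` tasks are spread across capacity providers.

    `strategy` is a list of dicts shaped like the ECS strategy items:
    {"capacityProvider": str, "weight": int, "base": int (optional)}.
    """
    counts = {item["capacityProvider"]: 0 for item in strategy}
    remaining = total

    # Step 1: satisfy the base first (only one provider should define one).
    for item in strategy:
        take = min(item.get("base", 0), remaining)
        counts[item["capacityProvider"]] += take
        remaining -= take

    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining == 0 or total_weight == 0:
        # Nothing left to place, or no provider has a weight to place it with.
        return counts

    # Step 2: split what is left in proportion to the weights.
    remainders = []
    assigned = 0
    for item in strategy:
        whole, rest = divmod(remaining * item.get("weight", 0), total_weight)
        counts[item["capacityProvider"]] += whole
        assigned += whole
        remainders.append((rest, item["capacityProvider"]))

    # Step 3: give leftover tasks to the largest fractional shares
    # (a rounding choice made for this sketch, not documented ECS behavior).
    leftover = remaining - assigned
    ranked = sorted(remainders, key=lambda pair: pair[0], reverse=True)
    for _, name in ranked[:leftover]:
        counts[name] += 1
    return counts


one_to_four = [
    {"capacityProvider": "capacityProviderA", "weight": 1},
    {"capacityProvider": "capacityProviderB", "weight": 4},
]
print(split_tasks(10, one_to_four))
# {'capacityProviderA': 2, 'capacityProviderB': 8}
```

With weights of 1 and 4, ten tasks split 2 to 8, which matches the one-to-four ratio described in the paragraph above.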
A default capacity provider strategy is associated with each Amazon ECS cluster. This determines the capacity provider strategy the cluster will use if no other capacity provider strategy or launch type is specified when running a task or creating a service.
Solution Overview
Infrastructure setup
[Note: All commands below are run in us-east-2. Update the Region to match your requirements.]
1. Save the CloudFormation template below in a file called ecs-cp-infra.yaml.
AWSTemplateFormatVersion: 2010-09-09
Description: This template creates an empty ECS cluster along with a Spot and OnDemand Capacity provider
Parameters:
  InstanceType:
    Type: String
    Default: t2.small
    AllowedValues:
      - t2.micro
      - t2.small
      - m4.large
    Description: Enter t2.micro, t2.small, or m4.large. Default is t2.small
  ECSAMI:
    Description: AMI ID
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id
  ClusterName:
    Description: Cluster Name
    Type: String
    Default: SchTestCluster
Resources:
  myVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: Test
  InternetGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref myVPC
  mySubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId:
        Ref: myVPC
      CidrBlock: 10.0.0.0/24
      MapPublicIpOnLaunch: true
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref myVPC
  DefaultPublicRoute:
    Type: AWS::EC2::Route
    DependsOn: InternetGatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  mySubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref mySubnet1
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow http and https to client host
      VpcId:
        Ref: myVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
  EcsServiceLinkedRole:
    Type: 'AWS::IAM::ServiceLinkedRole'
    Properties:
      AWSServiceName: ecs.amazonaws.com
      Description: "Role to enable Amazon ECS to manage your cluster."
  ecsInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
  RolePolicies:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: ecsInstance
      PolicyDocument:
        Statement:
          - Effect: Allow
            Action:
              - ec2:DescribeTags
              - ecs:CreateCluster
              - ecs:DeregisterContainerInstance
              - ecs:DiscoverPollEndpoint
              - ecs:Poll
              - ecs:RegisterContainerInstance
              - ecs:StartTelemetrySession
              - ecs:UpdateContainerInstancesState
              - ecs:Submit*
              - ecr:GetAuthorizationToken
              - ecr:BatchCheckLayerAvailability
              - ecr:GetDownloadUrlForLayer
              - ecr:BatchGetImage
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: "*"
      Roles:
        - Ref: ecsInstanceRole
  ecsInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
        - Ref: ecsInstanceRole
  ecsTaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: schTestEcsTaskExecRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ecs-tasks.amazonaws.com]
            Action: ['sts:AssumeRole']
      Path: /
      Policies:
        - PolicyName: AmazonECSTaskExecutionRolePolicy
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  # Allow the ECS Tasks to download images from ECR
                  - 'ecr:GetAuthorizationToken'
                  - 'ecr:BatchCheckLayerAvailability'
                  - 'ecr:GetDownloadUrlForLayer'
                  - 'ecr:BatchGetImage'
                  # Allow the ECS tasks to upload logs to CloudWatch
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: '*'
  MyCluster:
    Type: 'AWS::ECS::Cluster'
    Properties:
      ClusterName:
        Ref: "ClusterName"
  OnDemandConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId:
        Ref: "ECSAMI"
      SecurityGroups:
        - Ref: "InstanceSecurityGroup"
      IamInstanceProfile:
        Ref: "ecsInstanceProfile"
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            echo ECS_CLUSTER=${ClusterName} >> /etc/ecs/ecs.config
      InstanceType:
        Ref: "InstanceType"
      BlockDeviceMappings:
        - DeviceName: "/dev/sdk"
          Ebs:
            VolumeSize: '50'
        - DeviceName: "/dev/sdc"
          VirtualName: ephemeral0
  OnDemandServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
        - Ref: "mySubnet1"
      LaunchConfigurationName:
        Ref: OnDemandConfig
      MinSize: '1'
      MaxSize: '3'
  SpotConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId:
        Ref: "ECSAMI"
      SecurityGroups:
        - Ref: "InstanceSecurityGroup"
      IamInstanceProfile:
        Ref: "ecsInstanceProfile"
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            echo ECS_CLUSTER=${ClusterName} >> /etc/ecs/ecs.config
      InstanceType:
        Ref: "InstanceType"
      SpotPrice: "0.05"
  SpotServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
        - Ref: "mySubnet1"
      LaunchConfigurationName:
        Ref: SpotConfig
      MinSize: '1'
      MaxSize: '3'
  OnDemandCapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      Name: OnDemandCapProvider
      AutoScalingGroupProvider:
        AutoScalingGroupArn:
          Ref: OnDemandServerGroup
        ManagedScaling:
          MaximumScalingStepSize: 10
          MinimumScalingStepSize: 1
          Status: ENABLED
          TargetCapacity: 100
      Tags:
        - Key: environment
          Value: test
  SpotCapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      Name: SpotCapProvider
      AutoScalingGroupProvider:
        AutoScalingGroupArn:
          Ref: SpotServerGroup
        ManagedScaling:
          MaximumScalingStepSize: 10
          MinimumScalingStepSize: 1
          Status: ENABLED
          TargetCapacity: 100
      Tags:
        - Key: environment
          Value: test
2. Run the CloudFormation template ecs-cp-infra.yaml [Check the deployment status in the CloudFormation Console]
aws cloudformation create-stack --stack-name cp-cap-provider-stack \
--template-body file://ecs-cp-infra.yaml \
--parameters ParameterKey=InstanceType,ParameterValue=t2.small \
ParameterKey=ECSAMI,ParameterValue=/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
ParameterKey=ClusterName,ParameterValue=SchTestCluster \
--capabilities CAPABILITY_NAMED_IAM --region us-east-2
The template does the following:
- Creates a VPC with a public subnet
- Creates an empty ECS cluster
- Creates an On-Demand Auto Scaling group (ASG)
- Creates a Spot ASG
- Creates an On-Demand capacity provider
- Creates a Spot capacity provider
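As an alternative to watching the CloudFormation console, the deployment status can be checked from a script. The sketch below assumes boto3 is installed and credentials are configured. The helper name wait_for_stack is hypothetical, while the stack_create_complete waiter and describe_stacks are standard CloudFormation client calls.

```python
def wait_for_stack(cfn_client, stack_name):
    """Block until stack creation finishes, then return the final stack status.

    `cfn_client` is expected to be a boto3 CloudFormation client. The waiter
    raises an exception if the stack fails or rolls back.
    """
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
    return stacks[0]["StackStatus"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-east-2")
#   print(wait_for_stack(cfn, "cp-cap-provider-stack"))
```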
3. Associate the Spot and On-Demand capacity providers with the ECS cluster and set a default capacity provider strategy on the ECS cluster
aws ecs put-cluster-capacity-providers \
--cluster SchTestCluster \
--capacity-providers SpotCapProvider OnDemandCapProvider \
--default-capacity-provider-strategy capacityProvider=SpotCapProvider,weight=1,base=2 capacityProvider=OnDemandCapProvider,weight=1 \
--region us-east-2
The default capacity provider strategy set by this command uses:
- a base of 2 and a weight of 1 for the Spot capacity provider [Provider1]
- a base of 0 and a weight of 1 for the On-Demand capacity provider [Provider2]
Note:
- A base value of 2 ensures that the first 2 tasks are always started on Spot instances. A weight of 1 equally distributes the remaining tasks between the Spot and On-Demand capacity providers.
- You can also use a custom strategy in which the On-Demand capacity provider [Provider1] has a base of 2 and a weight of 1, and the Spot capacity provider [Provider2] has a weight of 1.
4. [Optional: only required if the task definition does not exist] Save the JSON below in a file called demo-sleep-taskdef.json, then register it with the register-task-definition command that follows.
{
  "family": "demo-sleep-taskdef",
  "containerDefinitions": [
    {
      "name": "sleep",
      "image": "amazonlinux:2",
      "memory": 20,
      "essential": true,
      "command": [
        "sh",
        "-c",
        "sleep infinity"
      ]
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ]
}
aws ecs register-task-definition --cli-input-json file://demo-sleep-taskdef.json \
--region us-east-2
5. Create a service [it will be created with the Cluster’s Default Capacity Provider Strategy]
aws ecs create-service \
--cluster SchTestCluster \
--service-name SchTestService \
--task-definition demo-sleep-taskdef \
--desired-count 1 \
--region us-east-2
The default capacity provider strategy provides the option of using Spot instances as steady state for your workloads with On-Demand instances for burst traffic. This option is more aggressive in terms of cost savings but with a higher risk profile. This is best suited for applications that can handle the downtime of Spot instance interruptions.
The custom capacity provider strategy enables you to use On-Demand instances as steady state for your workloads with Spot instances for burst traffic. This option has smaller cost savings but also a lower risk profile. This is best suited for applications that have to be running 24/7 and cannot afford any downtime.
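A service that uses the custom strategy, rather than the cluster default, could be created along these lines with boto3. This is a sketch under assumptions: the service name SchTestServiceCustom and both helper names are hypothetical, and the base and weight values mirror the custom strategy described in the earlier note (On-Demand base of 2, equal weights).

```python
def custom_strategy_service_request():
    """Build create_service arguments with On-Demand as the steady state."""
    return {
        "cluster": "SchTestCluster",
        "serviceName": "SchTestServiceCustom",  # hypothetical service name
        "taskDefinition": "demo-sleep-taskdef",
        "desiredCount": 1,
        # An explicit strategy overrides the cluster's default strategy.
        "capacityProviderStrategy": [
            {"capacityProvider": "OnDemandCapProvider", "base": 2, "weight": 1},
            {"capacityProvider": "SpotCapProvider", "weight": 1},
        ],
    }


def create_service_with_custom_strategy(ecs_client):
    """Call ECS CreateService using a boto3 ECS client passed in by the caller."""
    return ecs_client.create_service(**custom_strategy_service_request())


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   ecs = boto3.client("ecs", region_name="us-east-2")
#   create_service_with_custom_strategy(ecs)
```

Under this strategy the first two tasks land on On-Demand capacity, and tasks added by a scheduled scale out are shared between On-Demand and Spot.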
ECS scheduled scaling
To use scheduled scaling, create scheduled actions, which tell Application Auto Scaling to perform scaling activities at specific times. When you create a scheduled action, you specify the scalable target, when the scaling activity should occur, and the minimum and maximum capacity. At the specified time, Application Auto Scaling scales based on the new capacity values.
Before you can create a scheduled action, you must register the scalable target. Use the register-scalable-target command to register a new scalable target. The following command registers the ECS service with Application Auto Scaling, which can then adjust the service's desired count between a minimum of 1 task and a maximum of 10 tasks.
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/SchTestCluster/SchTestService \
--min-capacity 1 --max-capacity 10 \
--region us-east-2
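The same registration can be expressed with boto3. The sketch below builds the request separately from the API call so the resource ID format (service/cluster-name/service-name) is easy to see. The helper names are hypothetical, and the parameter names follow the Application Auto Scaling RegisterScalableTarget API.

```python
def scalable_target_request(cluster, service, min_tasks, max_tasks):
    """Build RegisterScalableTarget arguments for an ECS service."""
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    return {
        "ServiceNamespace": "ecs",
        "ScalableDimension": "ecs:service:DesiredCount",
        "ResourceId": f"service/{cluster}/{service}",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def register_service(autoscaling_client, cluster, service, min_tasks, max_tasks):
    """Register the service using a boto3 'application-autoscaling' client."""
    request = scalable_target_request(cluster, service, min_tasks, max_tasks)
    return autoscaling_client.register_scalable_target(**request)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("application-autoscaling", region_name="us-east-2")
#   register_service(client, "SchTestCluster", "SchTestService", 1, 10)
```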
[Note: Update the dates and times below to match your requirements. The schedule expressions are in UTC.]
To scale out one time to 10 tasks at 3 PM EDT (7:00 PM UTC)
aws application-autoscaling put-scheduled-action --service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/SchTestCluster/SchTestService \
--scheduled-action-name single-scaleout-action \
--schedule "at(2020-08-30T19:00:00)" \
--scalable-target-action MinCapacity=10,MaxCapacity=10 \
--region us-east-2
To scale in one time to 1 task at 4 PM EDT (8:00 PM UTC)
aws application-autoscaling put-scheduled-action --service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/SchTestCluster/SchTestService \
--scheduled-action-name single-scalein-action \
--schedule "at(2020-08-30T20:00:00)" \
--scalable-target-action MinCapacity=1,MaxCapacity=1 \
--region us-east-2
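Because the at() expression is written in UTC, it can help to derive it from a local time instead of converting by hand. The following is a small sketch using only the Python standard library. The helper name at_expression is hypothetical, and the fixed UTC-4 offset stands in for US Eastern daylight time on the example date.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset used for illustration; US Eastern time is UTC-4 in summer.
EDT = timezone(timedelta(hours=-4), "EDT")


def at_expression(local_time):
    """Convert a timezone-aware datetime into a one-time at() schedule in UTC."""
    if local_time.tzinfo is None:
        raise ValueError("local_time must be timezone-aware")
    utc_time = local_time.astimezone(timezone.utc)
    return utc_time.strftime("at(%Y-%m-%dT%H:%M:%S)")


print(at_expression(datetime(2020, 8, 30, 15, 0, tzinfo=EDT)))
# at(2020-08-30T19:00:00)
print(at_expression(datetime(2020, 8, 30, 16, 0, tzinfo=EDT)))
# at(2020-08-30T20:00:00)
```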
To scale out to 10 tasks every day at 8 AM EDT (12:00 PM UTC)
aws application-autoscaling put-scheduled-action --service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/SchTestCluster/SchTestService \
--scheduled-action-name cron-scaleout-action \
--schedule "cron(0 12 * * ? *)" \
--scalable-target-action MinCapacity=10,MaxCapacity=10 \
--region us-east-2
To scale in to 1 task every day at 6 PM EDT (10:00 PM UTC)
aws application-autoscaling put-scheduled-action --service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/SchTestCluster/SchTestService \
--scheduled-action-name cron-scalein-action \
--schedule "cron(0 22 * * ? *)" \
--scalable-target-action MinCapacity=1,MaxCapacity=1 \
--region us-east-2
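A cron schedule written in UTC does not follow daylight saving changes, so the same local time maps to a different UTC hour in summer and winter. The sketch below shows the arithmetic. The helper name daily_utc_cron is hypothetical, and it assumes whole-hour UTC offsets (UTC-4 for US Eastern daylight time, UTC-5 for US Eastern standard time).

```python
def daily_utc_cron(local_hour, local_minute, utc_offset_hours):
    """Return a daily cron() expression in UTC for a given local wall-clock time.

    `utc_offset_hours` is the local offset from UTC, for example -4 or -5.
    The day-of-month and day-of-week fields are left open, so a rollover past
    midnight UTC does not matter for a daily schedule.
    """
    utc_hour = (local_hour - utc_offset_hours) % 24
    return f"cron({local_minute} {utc_hour} * * ? *)"


print(daily_utc_cron(8, 0, -4))   # cron(0 12 * * ? *)
print(daily_utc_cron(18, 0, -4))  # cron(0 22 * * ? *)
print(daily_utc_cron(8, 0, -5))   # cron(0 13 * * ? *)
```

The first two results match the daily scale-out and scale-in schedules used in the commands above. The third shows that keeping 8 AM local time after the switch to standard time would require moving the UTC hour to 13.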
At the date and time specified for --schedule, if the value specified for MaxCapacity is below the current capacity, Application Auto Scaling scales in to MaxCapacity. If the value specified for MinCapacity is above the current capacity, Application Auto Scaling scales out to MinCapacity.
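The rule above amounts to clamping the current capacity into the new minimum and maximum range. A minimal Python sketch of that logic follows. The function name is hypothetical, and it models only the immediate effect of a scheduled action, not any other scaling policies attached to the service.

```python
def capacity_after_scheduled_action(current, min_capacity, max_capacity):
    """Return the task count once a scheduled action's new bounds take effect."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    if current > max_capacity:
        return max_capacity  # scale in to MaxCapacity
    if current < min_capacity:
        return min_capacity  # scale out to MinCapacity
    return current           # already within bounds, nothing changes


print(capacity_after_scheduled_action(1, 10, 10))  # 10
print(capacity_after_scheduled_action(10, 1, 1))   # 1
print(capacity_after_scheduled_action(5, 1, 10))   # 5
```

Setting MinCapacity and MaxCapacity to the same value, as the scheduled actions in this post do, pins the service to exactly that number of tasks.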
Conclusion
In this blog post, I have shown how to set up a scheduled scaling policy for an ECS service using Spot Capacity Provider as the primary provider to reduce cost. Consider using Reserved Instances in your On-Demand Capacity Provider [ASG] to further reduce your costs.
Reference Blogs/Documentation
https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-scheduled-scaling.html
https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduling_tasks.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cluster-capacity-providers.html