Measuring service chargeback in Amazon ECS
Contributed by Subhrangshu Kumar Sarkar, Sr. Technical Account Manager, and Shiva Kumar Subramanian, Sr. Technical Account Manager
Amazon Elastic Container Service (ECS) users have been asking us for a way to allocate cost to the deployed services in a shared Amazon ECS cluster. This blog post can help customers think through different techniques to allocate the costs incurred by running Amazon ECS services to their owners, which can be specific teams or individual users. The post dives into one technique that gives customers a granular way to allocate costs to Amazon ECS service owners.
Amazon ECS pricing models
Amazon ECS has two pricing models. In the Amazon EC2 launch type model, you pay for the AWS resources (e.g., Amazon EC2 instances or Amazon EBS volumes) that you create to store and run your application. Because tasks from different services can share the same container instances, it’s currently difficult to calculate the aggregate cost of an Amazon ECS service that consists of multiple tasks. In the AWS Fargate launch type model, you pay for the vCPU and memory resources that your containerized application requests. Although the user knows the cost that the tasks incur, there is no out-of-the-box way to associate that cost with a service.
There are two possible solutions to this problem.
A. Billing based on the usage of container instances in a partitioned cluster.
One solution for service chargeback is to associate specific container instances with respective teams or customers. Then use task placement constraints to restrict the services that they deploy to only those container instances. The following image shows how this solution works.
Here, user A is allowed to deploy services only on the blue container instances, and user B only on the green ones. Both users can be charged based on the AWS resources that they use, such as the EC2 instances and the ALB.
This solution is useful when you don’t want to host services from different teams or users on the same set of container instances. However, although the Amazon ECS cluster is shared, the end users are still charged for the Amazon EC2 instances and other AWS assets that they’re using rather than for the exact vCPU and memory resources that their services use. The disadvantage of this approach is that you could provision excess capacity for your users and end up wasting resources. You also need to use placement constraints in all of your task definitions.
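As a rough illustration of the partitioning described above, the following sketch builds a task definition request that carries a placement constraint. It assumes that each container instance was registered with a custom attribute named team; that attribute name, the team value, and the helper function names are hypothetical. The memberOf constraint type and the attribute: expression syntax come from the Amazon ECS cluster query language.

```python
def build_placement_constraint(team):
    """Return a memberOf constraint limiting tasks to one team's instances."""
    # Assumes a custom container instance attribute named "team" (hypothetical).
    return {"type": "memberOf", "expression": f"attribute:team == {team}"}


def build_task_definition_request(family, image, team):
    """Assemble keyword arguments for a register_task_definition call."""
    return {
        "family": family,
        "containerDefinitions": [
            {"name": family, "image": image, "cpu": 128, "memory": 256}
        ],
        "placementConstraints": [build_placement_constraint(team)],
    }


request = build_task_definition_request("orders", "nginx:latest", "teamA")
print(request["placementConstraints"])
```

With boto3 installed and credentials configured, a request like this could be passed as boto3.client("ecs").register_task_definition(**request). The sketch stops short of that call so that it runs without an AWS account.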
B. Billing based on resource usage at the task level.
Another solution is to develop a mechanism that lets Amazon ECS cluster owners calculate the aggregate cost of an Amazon ECS service that consists of multiple tasks. The solution would have a metering mechanism and a chargeback measurement. When deployed for Amazon EC2 launch type tasks, the metering mechanism tracks the vCPU and memory that Amazon ECS reserves over the tasks’ lifetime. Then, with the chargeback measurement, the cluster owner can associate a cost with these tasks based on the cost incurred by the container instances that they’re running on. The following image shows how this solution works.
Here, unlike the previous solution, both users can use all the container instances of the ECS cluster.
With this solution, customers can start using a shared Amazon ECS cluster to deploy their tasks on any of the container instances. After the solution has been deployed, the cost for a service can be calculated at any point in time, using the cluster and the service name as input parameters.
With Fargate tasks, the vCPU and memory usage details are already available in vCPU-hours and GB-hours, respectively. The chargeback measurement in the solution aggregates the CPU and memory reservation of all the tasks that ever ran as part of a service. It associates a cost with this aggregated CPU and memory reservation by multiplying it by Fargate’s per-vCPU-per-hour and per-GB-per-hour prices, respectively.
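The Fargate aggregation described above can be sketched in a few lines. This is a minimal illustration, not the ecs-chargeback script itself: the function name and the tuple layout are assumptions, and the prices in the example are placeholders rather than real Fargate rates.

```python
def fargate_service_cost(tasks, vcpu_hour_price, gb_hour_price):
    """Sum the Fargate cost of every task that ran as part of a service.

    tasks: iterable of (vcpu, memory_gb, runtime_seconds) tuples.
    Prices are per vCPU-hour and per GB-hour, supplied by the caller.
    """
    tasks = list(tasks)
    vcpu_hours = sum(vcpu * secs / 3600 for vcpu, _, secs in tasks)
    gb_hours = sum(gb * secs / 3600 for _, gb, secs in tasks)
    return vcpu_hours * vcpu_hour_price + gb_hours * gb_hour_price


# Placeholder prices for illustration only.
example_tasks = [(0.25, 0.5, 7200), (1.0, 2.0, 3600)]
cost = fargate_service_cost(example_tasks, vcpu_hour_price=0.04, gb_hour_price=0.004)
print(round(cost, 4))
```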
This solution has the following considerations:
- Amazon EC2 pricing: For the base price of the container instance, we’re considering the On-Demand price.
- Platform costs: Common costs for the cluster (the Amazon EBS volume that the containers are launched from, Amazon ECR, etc.) are treated as the platform cost for all of the services running on the cluster.
- Networking cost: When you’re using bridge or host networking, there is no mechanism to divide costs among different tasks that are launched on the container instance.
- Elastic Load Balancing or Application Load Balancer costs: If services sit behind multiple target groups of an Application Load Balancer, there is no direct way of dividing costs per target group.
The solution has two components: a metering mechanism and a chargeback measurement.
The metering mechanism consists of the following parts:
- Amazon ECS task state change events
- Amazon CloudWatch Events rule
- AWS Lambda function
- Amazon DynamoDB table
The chargeback measurement consists of the following parts:
- Python script
- AWS Price List Service API
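To show how the Price List Service might feed the chargeback measurement, the sketch below builds a set of GetProducts filters for a Linux On-Demand instance and extracts the hourly price from one returned product document. The helper names are hypothetical, the filter set is one plausible choice rather than the one the script necessarily uses, and the sample document is a trimmed, made-up stand-in with a placeholder price.

```python
import json


def instance_price_filters(instance_type, location):
    """Build TERM_MATCH filters for a shared-tenancy Linux instance."""
    fields = {
        "instanceType": instance_type,
        "location": location,          # for example "US East (N. Virginia)"
        "operatingSystem": "Linux",
        "tenancy": "Shared",
        "preInstalledSw": "NA",
        "capacitystatus": "Used",
    }
    return [{"Type": "TERM_MATCH", "Field": f, "Value": v} for f, v in fields.items()]


def on_demand_hourly_usd(price_item_json):
    """Extract the hourly USD price from one Price List product document."""
    item = json.loads(price_item_json)
    for term in item["terms"]["OnDemand"].values():
        for dimension in term["priceDimensions"].values():
            if dimension["unit"] == "Hrs":
                return float(dimension["pricePerUnit"]["USD"])
    raise ValueError("no hourly On-Demand price found")


# Made-up document that mirrors the nesting of a real response entry.
sample = json.dumps({
    "terms": {"OnDemand": {"SKU.TERM": {"priceDimensions": {
        "SKU.TERM.DIM": {"unit": "Hrs", "pricePerUnit": {"USD": "0.1000000000"}}
    }}}}
})
print(on_demand_hourly_usd(sample))
```

With boto3, the filters could be passed to the pricing client's get_products operation with ServiceCode="AmazonEC2"; each entry of the returned PriceList is a JSON string of the shape that on_demand_hourly_usd expects.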
The following image shows the architecture of the solution’s metering mechanism.
As part of deploying the metering mechanism, you do the following.
- Create a CloudWatch Events rule that triggers a Lambda function on an Amazon ECS task state change event. Typically, a task state change event is generated by a call to the StartTask, RunTask, or StopTask API operation, or when an Amazon ECS service scheduler starts or stops a task.
- Create a DynamoDB table that the Lambda function can update.
- Create a Lambda function that, every time it is invoked, updates the DynamoDB table with details of the Amazon ECS task.
On its first run, the metering mechanism takes stock of all running Amazon ECS tasks across all services and all clusters. This data resides in DynamoDB from then on, and the solution’s chargeback measurement uses it.
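The Lambda side of the metering mechanism could look roughly like the sketch below. It is not the ecsTaskStatus.py code from the solution. The set of attributes copied into the item is an assumption based on fields that ECS task state change events carry, and the table parameter is added only so that the sketch can run without AWS access.

```python
def build_item(event):
    """Map an ECS task state change event to a DynamoDB item."""
    detail = event["detail"]
    item = {
        "taskArn": detail["taskArn"],            # partition key of the table
        "clusterArn": detail["clusterArn"],
        "group": detail.get("group", ""),        # for example "service:orders"
        "launchType": detail.get("launchType", ""),
        "lastStatus": detail["lastStatus"],
    }
    # These fields appear only at certain points in a task's life.
    for key in ("cpu", "memory", "containerInstanceArn", "startedAt", "stoppedAt"):
        if key in detail:
            item[key] = detail[key]
    return item


def handler(event, context, table=None):
    """Lambda entry point; `table` is injectable so the sketch runs offline."""
    if table is None:
        import boto3  # available in the Lambda runtime
        table = boto3.resource("dynamodb").Table("ECSTaskStatus")
    # put_item replaces any earlier item that has the same taskArn.
    table.put_item(Item=build_item(event))


sample_event = {"detail": {
    "taskArn": "arn:aws:ecs:us-east-1:111122223333:task/example",
    "clusterArn": "arn:aws:ecs:us-east-1:111122223333:cluster/shared",
    "group": "service:orders",
    "launchType": "EC2",
    "lastStatus": "RUNNING",
    "cpu": "256",
    "memory": "512",
}}
print(build_item(sample_event)["group"])
```

Because the item is keyed on taskArn, each later event for the same task overwrites the earlier record, so the table ends up holding the most recent state of every task.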
The following image shows the architecture of the chargeback measurement.
When you need to find the cost associated with a service, run the ecs-chargeback Python script with the cluster and service names as parameters. This script performs the following actions.
- Find all the tasks that have ever run or are currently running as part of the service.
- For each task, calculate the up time.
- For each task, find the container instance type (for Amazon EC2 type tasks).
- Find what percentage of the host’s compute or memory resources the task has reserved. If there is no task-level CPU reservation for Amazon EC2 launch type tasks, a CPU reservation of 128 CPU shares (0.125 vCPUs) is assumed. In Amazon EC2 launch type tasks, you have to specify memory reservation at the task or container level during creation of the task definition.
- Associate that percentage with a cost.
- (Optional) Use the following parameters:
- Duration: By default, the script shows the service cost for its complete uptime. You can use the duration parameter to get the cost for a particular month, the month to date, or the last n days.
- Weight: This parameter is a weighted fraction that you can use to disproportionately divide the instance cost between vCPU and memory. By default, this value is 0.5.
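The up time calculation and the duration parameter interact: only the part of a task's life that falls inside the requested period should be billed. A minimal sketch of that clipping follows. The function name and arguments are hypothetical, and all datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def runtime_seconds(started_at, stopped_at, window_start, window_end):
    """Seconds a task ran inside [window_start, window_end].

    stopped_at may be None for a task that is still running.
    """
    end = min(stopped_at or window_end, window_end)
    start = max(started_at, window_start)
    # A task entirely outside the window contributes nothing.
    return max((end - start).total_seconds(), 0.0)


utc = timezone.utc
feb_start = datetime(2019, 2, 1, tzinfo=utc)
mar_start = datetime(2019, 3, 1, tzinfo=utc)

# Task started before the window and stopped two hours into it.
billed = runtime_seconds(
    datetime(2019, 1, 31, 23, 0, tzinfo=utc),
    datetime(2019, 2, 1, 2, 0, tzinfo=utc),
    feb_start,
    mar_start,
)
print(billed)
```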
The vCPU and memory costs are calculated using the following formulas:
- Task vCPU cost = (task vCPU reservation / total vCPUs in the instance) * (cost of the instance per second) * (vCPU/memory weight) * (task run time in seconds)
- Task memory cost = (task memory reservation / total memory in the instance) * (cost of the instance per second) * (1 - vCPU/memory weight) * (task run time in seconds)
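The two formulas can be written out as a short function. This sketch assumes that the instance cost is supplied as an hourly On-Demand price and converted to a per-second rate, which is one reading of "cost of the instance". The function name, argument names, and example price are hypothetical.

```python
DEFAULT_CPU_SHARES = 128  # assumed when the task has no CPU reservation


def ec2_task_cost(task_cpu_shares, task_memory_mib, instance_vcpus,
                  instance_memory_mib, instance_hourly_price,
                  runtime_secs, weight=0.5):
    """Return (vcpu_cost, memory_cost) for one EC2 launch type task."""
    cpu_shares = task_cpu_shares or DEFAULT_CPU_SHARES
    task_vcpus = cpu_shares / 1024               # 1024 CPU shares = 1 vCPU
    price_per_sec = instance_hourly_price / 3600
    vcpu_cost = ((task_vcpus / instance_vcpus)
                 * price_per_sec * weight * runtime_secs)
    memory_cost = ((task_memory_mib / instance_memory_mib)
                   * price_per_sec * (1 - weight) * runtime_secs)
    return vcpu_cost, memory_cost


# 0.5 vCPU and 1 GiB reserved on a 2 vCPU / 4 GiB instance for one hour,
# with a placeholder price of 0.10 USD per hour.
vcpu_cost, memory_cost = ec2_task_cost(512, 1024, 2, 4096, 0.10, 3600)
print(round(vcpu_cost + memory_cost, 4))
```

With the default weight of 0.5, a task that reserves the whole instance for an hour is charged exactly the hourly instance price, split evenly between vCPU and memory.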
Solution deployment and cost measurement
Here are the steps to deploy the solution in your AWS account and then calculate the service chargeback.
1. Create a DynamoDB table named ECSTaskStatus to capture details of an ECS task state change CloudWatch event.
Primary partition key: taskArn. Type: string.
Provision RCUs and WCUs depending on your Amazon ECS usage.
For the rest, keep the default values.
aws dynamodb create-table --table-name ECSTaskStatus \
  --attribute-definitions AttributeName=taskArn,AttributeType=S \
  --key-schema AttributeName=taskArn,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=20
2. Create an IAM policy named LambdaECSTaskStatusPolicy that allows the Lambda function to make the following API calls. Create a local copy of the policy document LambdaECSTaskStatusPolicy.JSON from GitHub.
3. Create an IAM role named LambdaECSTaskStatusRole and attach the policy to the role. Replace <Policy ARN> with the Amazon Resource Name (ARN) of the IAM policy.
4. Create a Lambda function named ecsTaskStatus that PUTs or UPDATEs the Amazon ECS task details to the ECSTaskStatus DynamoDB table. This function has the following details:
- Runtime: Python 3.6.
- Memory setting: 128 MB.
- Timeout: 3 seconds.
- Execution role: LambdaECSTaskStatusRole.
- Code: ecsTaskStatus.py. Use the inline code editor on the Lambda console to author the function.
5. Create a CloudWatch Events rule for Amazon ECS task state change events and configure the Lambda function as the target. The function puts or updates items in the ECSTaskStatus DynamoDB table with every Amazon ECS task’s details.
a. Create the CloudWatch Events rule.
b. Add the Lambda function as a target to the CloudWatch Events rule. Replace <Lambda ARN> with the ARN of the Lambda function that you created in step 4.
c. Add permissions for CloudWatch Events to invoke Lambda. Replace <CW Events Rule ARN> with the ARN of the CloudWatch Events rule that you created in step 5a.
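For step 5a, the rule needs an event pattern that selects Amazon ECS task state change events. The sketch below builds such a pattern and includes a small local matcher, which is a simplification for illustration only and not how CloudWatch Events evaluates patterns internally.

```python
import json

# Selects every ECS task state change event delivered to the rule.
event_pattern = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
}


def matches(pattern, event):
    """Simplified check: each top-level key must hold an allowed value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


pattern_json = json.dumps(event_pattern)
print(pattern_json)
```

The resulting JSON string is what the EventPattern parameter of the put_rule operation (or the --event-pattern option of aws events put-rule) expects. The rule name is left to the reader.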
The solution invokes the Lambda function only when an Amazon ECS task state change event occurs. Therefore, when the solution is first deployed, no event is raised for tasks that are already running, and their details aren’t populated into the DynamoDB table. If you want to meter currently running tasks, you can run the script ecsTaskStatus-FirstRun.py after you create the DynamoDB table. This populates the details of all running tasks into the DynamoDB table. The script is idempotent.
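A first-run pass of this kind could be structured as in the sketch below. It is an assumption about the general approach, not the contents of ecsTaskStatus-FirstRun.py. The function name and the put_item callback are hypothetical, and pagination of list_clusters and list_tasks is omitted for brevity, so the sketch covers only the first page of results.

```python
def snapshot_running_tasks(ecs, put_item):
    """Record every running task in every cluster; safe to run repeatedly."""
    count = 0
    for cluster_arn in ecs.list_clusters()["clusterArns"]:
        task_arns = ecs.list_tasks(
            cluster=cluster_arn, desiredStatus="RUNNING"
        )["taskArns"]
        if not task_arns:
            continue  # nothing to describe in this cluster
        described = ecs.describe_tasks(cluster=cluster_arn, tasks=task_arns)
        for task in described["tasks"]:
            # Keyed on taskArn, so a rerun overwrites rather than duplicates.
            put_item({
                "taskArn": task["taskArn"],
                "clusterArn": cluster_arn,
                "group": task.get("group", ""),
                "lastStatus": task.get("lastStatus", ""),
            })
            count += 1
    return count
```

In a real run, ecs would be boto3.client("ecs") and put_item a thin wrapper around the put_item method of the ECSTaskStatus table. Because writes are keyed on taskArn, running the pass twice leaves the same set of items, which is what makes such a script idempotent.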
To find the cost for running a service, run the Python script ecs-chargeback, which has the following usage and arguments.
To calculate the cost that a service incurs with Amazon EC2 launch type tasks, run the script as follows.
The following is sample output of running this script.
To get the chargeback for Fargate launch type tasks, run the script as follows.
The following is sample output of this script.
This solution can help Amazon ECS users track and allocate costs for their deployed workloads. It might also help them save some costs by letting them share an Amazon ECS cluster among multiple users or teams. We welcome your comments and questions below. Please reach out to us if you would like to contribute to the solution.