Run time-sensitive workloads on ECS Fargate with clock accuracy tracking

Introduction

In part 1 and part 2 of this series, we discussed why measuring time accuracy matters and introduced the relevant concepts. We also covered how to put those concepts into practice, track metrics using Amazon CloudWatch, and implement a practical solution for Amazon Elastic Compute Cloud (Amazon EC2) instances. In part 3, we apply those concepts to the container ecosystem and to containerized workloads running on Amazon Elastic Container Service (Amazon ECS). Amazon ECS uses Amazon Time Sync Service to measure clock accuracy and provide the clock error bound to applications running in containers.

Solution overview

An increasing number of applications are being migrated to or natively built for containers. Amazon Web Services (AWS) offers AWS Fargate as a convenient choice for running containerized workloads without having to manage the underlying servers and clusters of Amazon EC2 instances. As the number of applications hosted on AWS Fargate increases, the service adds features that ease adoption for applications with complex requirements. Time-sensitive workloads, which depend heavily on system time accuracy and synchronization, are one such case: operators now have built-in options in Amazon ECS Fargate that they can rely on.

This post extends the concepts covered in the previous parts of this series and expands the solution to containers deployed using Amazon ECS Fargate. We’ll explore the practical application of this feature in a real-world scenario. Amazon ECS makes time accuracy metrics and calculations available in the Task Metadata endpoint version 4, which containers can consume directly. This post explains how to read these metrics and how to publish them to Amazon CloudWatch from Amazon ECS Fargate applications, achieving results similar to those in part 2.

Walkthrough

The following exercise walks you through the steps to deploy a sample AWS Fargate task and then measure and track its time accuracy. There are two approaches to monitoring time for a time-sensitive workload: you can monitor the time within the application container, or deploy a sidecar container with the time monitoring logic. In this walkthrough, we’ll deploy an Amazon ECS task with both an application container that monitors the time and a sidecar helper container. This walkthrough showcases both patterns, but in a real-world scenario only one may be used:

On-demand checking – The application itself queries the endpoint, checks the drift, and decides how to proceed. For demonstration purposes, our application fetches the current metric values and displays them. In practice, an application would typically use these metrics to decide internally whether to execute business logic rather than show the values in the user interface (UI).
A cron-job approach – A scheduled job regularly publishes the values as an Amazon CloudWatch metric. Running cron jobs within containers lets us apply the concept of a sidecar worker container, which suits our scenario and can also be extended to other use cases. We use this worker as a cron-job manager that regularly executes a script to check the clock synchronization of our containerized application. The same cron-job worker can be reused in any other scenario where the main application relies on additional periodic tasks.

Note – It is highly recommended to test this setup in a testing AWS Account, where you can freely explore the options.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account with necessary permissions to create the resources.
AWS Command Line Interface (AWS CLI) with appropriate credentials.
Basic knowledge of creating and deploying applications using Docker containers on Amazon ECS.
Account permissions to create custom metrics and alerts in Amazon CloudWatch.
An Amazon Simple Notification Service (Amazon SNS) topic configured to deliver notifications.

Step 1. Clone the project repository

Clone the project GitHub repository to your local machine:

$ git clone https://github.com/aws-samples/amazon-ecs-fargate-clock-accuracy

Step 2. Explore and understand the project

Application code

The main application is a simple Python web application that displays a sample website. The application queries the Task metadata endpoint, fetches the current clock timing metrics, and makes decisions based on those values. For example, before starting a critical operation, the application can query the endpoint to confirm that the clock is in sync and that the clock error bound is within acceptable values before proceeding.
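
As a minimal sketch of this on-demand check (not the repository's actual code), the following Python reads the task metadata endpoint and decides whether to proceed. The function names and the 1 ms default threshold are hypothetical. The field names match the ClockDrift object read by the sidecar script later in this post, while the `SYNCHRONIZED` status value and the millisecond unit are assumptions that you should verify against the Amazon ECS documentation.

```python
import json
import os
import urllib.request


def fetch_task_metadata(timeout=2.0):
    """Return the task metadata document as a dict (only works inside an ECS task)."""
    base_url = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base_url}/task", timeout=timeout) as response:
        return json.load(response)


def clock_is_acceptable(task_metadata, max_error_bound_ms=1.0):
    """Return True when the clock is in sync and its error bound is within the threshold.

    Assumes ClockErrorBound is expressed in milliseconds and that a healthy
    clock reports the status string "SYNCHRONIZED".
    """
    drift = task_metadata.get("ClockDrift") or {}
    status = drift.get("ClockSynchronizationStatus")
    error_bound = drift.get("ClockErrorBound")
    if status != "SYNCHRONIZED" or error_bound is None:
        # Missing data is treated as "not safe to proceed".
        return False
    return float(error_bound) <= max_error_bound_ms
```

Inside a running task, an application could call `clock_is_acceptable(fetch_task_metadata())` right before a critical operation and postpone the operation when the result is `False`. Treating missing data as a failure is a conservative design choice for this sketch.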

Sidecar worker

This worker runs alongside the main application and its function is to run periodic scripts using cron. In our scenario, the periodic task is a shell script that collects metadata and timing metrics from the Amazon ECS Fargate Tasks and then publishes them as an Amazon CloudWatch custom metric. The following Dockerfile installs the required packages to filter the data and interact with AWS:

FROM alpine:latest

# Update the package index and install the required packages
RUN apk update && apk add jq aws-cli

# Add our metrics script into a proper destination directory
RUN mkdir -p /opt/customscripts
COPY clockcheck.sh /opt/customscripts
RUN chmod +x /opt/customscripts/clockcheck.sh

# Setup our cron job
RUN crontab -l | { cat ; echo "* * * * * /opt/customscripts/clockcheck.sh" ; } | crontab -

# Run cron daemon in foreground with loglevel 2
CMD [ "crond", "-l", "2", "-f" ]

The script filters the Task metadata in order to publish the metrics under a convenient namespace. For simplicity, we chose to publish them under the Container Insights namespace, alongside all the other relevant Task metrics. The metrics are grouped by Cluster, Task ID, and Family. The script fetches the details from the metadata endpoint:

# Fetch the task metadata once and reuse it for every field
TASK_DETAILS=$(wget -qO- "${ECS_CONTAINER_METADATA_URI_V4}/task")
# Clock accuracy fields
CLOCK_ERROR_BOUND=$(echo "$TASK_DETAILS" | jq -r '.ClockDrift.ClockErrorBound')
REFERENCE_TIMESTAMP=$(echo "$TASK_DETAILS" | jq -r '.ClockDrift.ReferenceTimestamp')
CLOCK_SYNCHRONIZATION_STATUS=$(echo "$TASK_DETAILS" | jq -r '.ClockDrift.ClockSynchronizationStatus')
# Task identity; awk keeps only the last ARN segment for the cluster and task ID
CLUSTER=$(echo "$TASK_DETAILS" | jq -r '.Cluster' | awk -F '/' '{print $NF}')
SERVICE_NAME=$(echo "$TASK_DETAILS" | jq -r '.ServiceName')
FAMILY=$(echo "$TASK_DETAILS" | jq -r '.Family')
TASK_ID=$(echo "$TASK_DETAILS" | jq -r '.TaskARN' | awk -F '/' '{print $NF}')

For the complete script, refer to clockcheck.sh in the project repository.
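
For readers who prefer Python over shell, the same extraction and publishing step could look roughly like the sketch below. It is an illustration rather than a port of the repository script: the helper names, the `ClockErrorBound` metric name, and the `Milliseconds` unit are assumptions, and the `publish` function needs the boto3 package plus a Task IAM role that allows publishing metrics.

```python
def last_arn_segment(arn):
    """Mirror `awk -F '/' '{print $NF}'`: keep the text after the final slash."""
    return arn.rsplit("/", 1)[-1]


def build_metric_data(task_metadata):
    """Build the MetricData list for a CloudWatch PutMetricData call."""
    drift = task_metadata["ClockDrift"]
    return [
        {
            "MetricName": "ClockErrorBound",  # assumed metric name
            "Dimensions": [
                {"Name": "ClusterName", "Value": last_arn_segment(task_metadata["Cluster"])},
                {"Name": "Family", "Value": task_metadata["Family"]},
                {"Name": "TaskID", "Value": last_arn_segment(task_metadata["TaskARN"])},
            ],
            "Value": float(drift["ClockErrorBound"]),
            "Unit": "Milliseconds",  # assumed unit of the reported bound
        }
    ]


def publish(task_metadata, namespace="ECS/ContainerInsights"):
    """Send the metric to CloudWatch using the task's IAM role credentials."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,
        MetricData=build_metric_data(task_metadata),
    )
```

Keeping the payload construction separate from the network call makes the dimension logic easy to check locally, outside of an Amazon ECS task.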

Task definition

The Task definition has the following prerequisites:

Requires the Task Execution Role, which you can create by following the Amazon ECS documentation. You can use the AmazonECSTaskExecutionRolePolicy, which contains all the required permissions for the most common use cases.
Requires a Task IAM role, so that the containers can publish the metrics to Amazon CloudWatch.

We’ve provided a sample reference Task definition that you can use for building your applications.

Step 3. Create the required infrastructure resources

Several resources need to be provisioned before uploading our container images and deploying our Amazon ECS Tasks. The required resources include an Amazon ECS Cluster, an Amazon ECR private repository, Amazon ECS task definitions, and the required AWS IAM roles and policies. To simplify these steps, we’ve provided an AWS CloudFormation template that automates the process.

The project includes a Makefile, which automates both the AWS CloudFormation deployment and container image builds. It can be used by running the following command:

$> make all ACCOUNTID='123456789123' REGION='aa-bbbbb-X'

After the command completes, proceed directly to Step 5.

If you prefer to continue without the Makefile, you can create a stack manually from the AWS CloudFormation template. Alternatively, the following AWS CLI command performs the same deployment from the command line:

$> aws --region=<your-aws-region> cloudformation create-stack \
      --template-body file://ecs-fargate-clock-accuracy.yaml \
      --stack-name ecsclockdemo \
      --disable-rollback \
      --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM

For reference, both the Amazon ECS Cluster and the Amazon ECR repository are named ecsclockdemo by default. If you don’t want to use this template, you can provision the resources mentioned above on your own, following your preferences and customizations, by using the material provided in the GitHub repository. In that case, make sure you replace the placeholders in the sample Task Definition.

Step 4. Build and publish the Docker Images

A. Export relevant environment variables. Please replace the placeholders <your-aws-region> and <your-account-ID> with the AWS Region where you are working and your AWS Account ID, respectively:

$> export REGION=<your-aws-region>
$> export ACCOUNTID=<your-account-ID>
$> export REPONAME=ecsclockdemo

B. Authenticate with Amazon ECR

$> aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${ACCOUNTID}.dkr.ecr.${REGION}.amazonaws.com

C. Build and push the Python application Docker image

$> cd app-python
$> docker build -t app-python .
$> docker tag app-python:latest ${ACCOUNTID}.dkr.ecr.${REGION}.amazonaws.com/${REPONAME}:app-python
$> docker push ${ACCOUNTID}.dkr.ecr.${REGION}.amazonaws.com/${REPONAME}:app-python

D. Build and push the cron worker Docker image

$> cd ../cron-worker
$> docker build -t cron-worker .
$> docker tag cron-worker:latest ${ACCOUNTID}.dkr.ecr.${REGION}.amazonaws.com/${REPONAME}:cron-worker
$> docker push ${ACCOUNTID}.dkr.ecr.${REGION}.amazonaws.com/${REPONAME}:cron-worker
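
The image names used in steps C and D all follow the same pattern, which the following sketch spells out. The helper names are hypothetical, and the URI format assumes a standard commercial AWS Region (other partitions use a different domain suffix). The functions only build the command lists and do not run Docker.

```python
def ecr_image_uri(account_id, region, repo_name, tag):
    """Compose the fully qualified image name used by `docker tag` and `docker push`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"


def build_and_push_commands(account_id, region, repo_name, image):
    """Return the three docker commands from the walkthrough for one image."""
    uri = ecr_image_uri(account_id, region, repo_name, image)
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "tag", f"{image}:latest", uri],
        ["docker", "push", uri],
    ]
```

Each returned list could be passed to `subprocess.run(command, check=True)` from within the matching image directory, which is one way to script both builds in a loop.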

Step 5. Run your Amazon ECS Fargate Tasks

At this stage, all the required resources are ready and our container images are available for deployment. Now, you can either run a single stand-alone Amazon ECS Task or create an Amazon ECS service. In either case, make sure you select the ECS Cluster provisioned for this demonstration and the ecsclockdemo task definition.

As a quick reference, you can create a sample Amazon ECS Service as follows (make sure to replace the subnets and securityGroups placeholders with appropriate values):

aws ecs create-service \
  --cluster ecsclockdemo \
  --task-definition ecsclockdemo \
  --enable-execute-command \
  --service-name service-ecsclockdemo \
  --desired-count 6 \
  --network-configuration "awsvpcConfiguration={subnets=['subnet-123456'],securityGroups=['sg-12345678'],assignPublicIp=ENABLED}"

In either case, pay special attention to the network configuration (the awsvpcConfiguration construct if you are using the CLI), where you specify both the subnets and the security groups. Select public subnets for your setup, set the flag assignPublicIp=ENABLED, and ensure that the selected security group allows access from your IP address to port 80. This ensures that the test web application is reachable from your location over the public internet. These directions assume you are working in a testing AWS account where you are free to explore options. If this isn’t the case, consider more restrictive options in accordance with your internal policies.
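
If you script the deployment with the AWS SDK for Python instead of the CLI, the same settings map onto keyword arguments of the boto3 `create_service` call. The sketch below only assembles those arguments. The helper names are hypothetical, the default values mirror the CLI example above, and the subnet and security group IDs remain placeholders that you must replace.

```python
def awsvpc_configuration(subnets, security_groups, assign_public_ip=True):
    """Build the networkConfiguration structure for an awsvpc-mode service."""
    if not subnets:
        raise ValueError("at least one subnet is required")
    if not security_groups:
        raise ValueError("at least one security group is required")
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            # A public IP is needed to reach the demo web page from the internet.
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }


def create_service_kwargs(subnets, security_groups, desired_count=6):
    """Mirror the CLI example as keyword arguments for boto3's ecs.create_service."""
    return {
        "cluster": "ecsclockdemo",
        "taskDefinition": "ecsclockdemo",
        "serviceName": "service-ecsclockdemo",
        "desiredCount": desired_count,
        "enableExecuteCommand": True,
        "networkConfiguration": awsvpc_configuration(subnets, security_groups),
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("ecs").create_service(**create_service_kwargs([...], [...]))`.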

Step 6. Test the metrics and the application

You can find the published data in Amazon CloudWatch Metrics. By default, metrics are published under the ECS/ContainerInsights namespace and grouped by the ClusterName, Family, and TaskID dimensions. The published metric is the clock error bound, which acts as a proxy for clock accuracy: it describes the worst-case scenario by providing an upper bound on the clock's deviation, which you can compare against your acceptable error margin.

Graph demonstrating true time and instance clock deviation within the acceptable error margin.

Fig 1. The ClockErrorBound provides a boundary for the worst-case offset of a clock relative to true time. Within these boundaries, the clock readings can be considered in sync.
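
To make the bound concrete, the short sketch below treats a clock reading as an interval rather than a point: true time lies somewhere between the reading minus the bound and the reading plus the bound. The function names are hypothetical, and all values are assumed to be in milliseconds.

```python
def true_time_interval(clock_reading_ms, error_bound_ms):
    """Return the (earliest, latest) true time consistent with a clock reading."""
    return (clock_reading_ms - error_bound_ms, clock_reading_ms + error_bound_ms)


def definitely_before(reading_a_ms, bound_a_ms, reading_b_ms, bound_b_ms):
    """True only if event A must have happened before event B in true time.

    The ordering is certain when the two intervals do not overlap, that is,
    when the latest possible time of A precedes the earliest possible time of B.
    """
    _, latest_a = true_time_interval(reading_a_ms, bound_a_ms)
    earliest_b, _ = true_time_interval(reading_b_ms, bound_b_ms)
    return latest_a < earliest_b
```

For example, two readings taken 2 ms apart with a 0.5 ms bound each can be ordered with certainty, while readings taken 0.8 ms apart with the same bound cannot, because their intervals overlap.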

You can create alarms and dashboards based on this deviation to alert you when your tolerated threshold is crossed. For details on how to create those alarms, refer to part 2 of this series.
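
As a rough illustration of such an alarm, the following sketch assembles the arguments for the boto3 `put_metric_alarm` call. The helper name, the `ClockErrorBound` metric name, the alarm naming scheme, and the 1 ms default threshold are assumptions made for this example. The namespace and dimensions follow the defaults described in this step, and the Amazon SNS topic is the one listed in the prerequisites.

```python
def clock_alarm_kwargs(cluster, family, task_id, sns_topic_arn, threshold_ms=1.0):
    """Build keyword arguments for boto3's cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"clock-error-bound-{task_id}",  # hypothetical naming scheme
        "Namespace": "ECS/ContainerInsights",
        "MetricName": "ClockErrorBound",  # assumed metric name
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "Family", "Value": family},
            {"Name": "TaskID", "Value": task_id},
        ],
        "Statistic": "Maximum",  # alert on the worst value seen in the period
        "Period": 60,  # matches the once-per-minute cron schedule
        "EvaluationPeriods": 1,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 installed, the alarm could then be created with `boto3.client("cloudwatch").put_metric_alarm(**clock_alarm_kwargs(...))`. Because Fargate task IDs change with every new task, a per-task alarm like this one suits a demonstration better than a long-running service.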

Image demonstrating a sample custom CloudWatch Dashboard that can be built based on the collected timing metrics.

Fig 2. Dashboard showing a Service with 6 Tasks. We can observe that Task C has exceeded the acceptable threshold of 1 ms.

Finally, you can also reach a recently launched Amazon ECS Task through its public IP address and open the web application. In the Amazon ECS console, navigate to the ecsclockdemo Amazon ECS Cluster and select the Tasks tab. Open the running Task by selecting its ID, and locate the Public IP details within the Configuration section. The hyperlink opens the application in your browser, where you can see the real-time metrics.

Cleaning up

To avoid unnecessary costs, clean up the resources created during this walkthrough. You can delete the AWS CloudFormation Stack once you finish this tutorial. Note that you’ll need to delete the Amazon ECS resources (such as Tasks and Services) and the Docker images in Amazon ECR prior to deleting the AWS CloudFormation Stack, because AWS CloudFormation cannot delete the cluster and the repository if they aren’t empty.

Conclusion

In this post, we discussed time-sensitive applications and showed alternatives for accurately monitoring clock deviations using AWS Fargate built-in features. We also explored Amazon CloudWatch metrics and dashboards that allow you to monitor and alert on those deviations. The solutions covered are flexible, open source, and available in our GitHub repository. We provided templates and blueprints that can be adapted to your needs. Feel free to raise issues and send pull requests in GitHub for bug fixes and improvements. We hope this post helps build your awareness of the options available for migrating time-sensitive applications to Amazon ECS Fargate.
