Improving deployment visibility for Amazon ECS services

When deploying software, it’s critical to have visibility into all stages of the deployment process. Knowing the status of ongoing deployments, troubleshooting issues when things go wrong, and having an audit trail of past deployments are essential for ensuring a safe and reliable release process. Amazon Elastic Container Service (Amazon ECS) now provides enhanced observability features to address these needs. With service deployments and service revisions, you can gain deeper insights into your Amazon ECS-based application deployments.

In Amazon ECS, a service is a resource that runs long-running applications, where a group of identical tasks are deployed, managed, and scaled by Amazon ECS. When it’s time to release a new version of your software, Amazon ECS manages the deployment process, gradually replacing the old tasks with the new ones. Amazon ECS comes with built-in safeguards to facilitate safe software releases, such as circuit breaking, where Amazon ECS can be configured to automatically roll back to the previous version of a service if a new deployment fails. With today’s launch you now have greater visibility into the deployment process. There are two new named Amazon ECS resources: service revisions and service deployments, and a new set of APIs: listServiceDeployments, describeServiceRevisions, and describeServiceDeployments.

First, service revisions provide a record of the workload configuration Amazon ECS is attempting to deploy. This includes the task definition, container image, and service-level parameters such as Amazon Elastic Block Store (Amazon EBS) volumes, load balancers, and service connect configuration.

Service deployments provide a comprehensive view of an ongoing or previous deployment of a service revision. You can observe the starting point of the deployment (the source revision), what is being rolled out (the target revision), and the status of the deployment, including any circuit breakers or Amazon CloudWatch alarms that you have configured.

In the past, it was difficult to track the history of your Amazon ECS deployments. With this release, each service deployment, whether successful or not, is retained for 90 days and accessible through the Amazon ECS console and the listServiceDeployments API. This allows you to review your Amazon ECS service deployment history and understand which service revision was used for each rollout.

By providing these enhanced visibility and traceability features, Amazon ECS empowers you to better monitor, troubleshoot, and manage your application deployments, ensuring a more reliable and transparent release process. The following diagram shows the relationship between the existing Amazon ECS resources and the new service deployment and service revision.

Diagram 1: Showing the relationship between an Amazon ECS service revision and a service deployment.

Solution overview

In this section we provide a hands-on example that demonstrates how service revisions and service deployments deliver more visibility when deploying software onto Amazon ECS. In this walkthrough we create a new service and wait for it to become stable. Next, we create a revision with a known bug and demonstrate how the new service deployment APIs and the new deployments tab in the Amazon ECS console allow us to troubleshoot this error and capture when the circuit breaker is triggered.

Prerequisites

The following prerequisites are necessary to complete this solution:

Walkthrough

1. First, we export the environment variables used throughout the walkthrough. The resources they reference should already exist in your AWS account.

export AWS_REGION=eu-north-1
export ECS_EXECUTION_ROLE="arn:aws:iam::111222333444:role/ecsTaskExecutionRole"
export ECS_CLUSTER="default"
export VPC_SUBNET_ONE="subnet-07bd4d10ea848a008"
export VPC_SUBNET_TWO="subnet-0ebc3139ba5dcf871"
export VPC_SECURITY_GROUP="sg-003bf5ba3cb1a1168"
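
Because every later step depends on these variables, it can help to confirm they are set before continuing. The following is a minimal sketch; require_vars is a hypothetical helper name, not part of the AWS CLI, and it only checks that each variable is non-empty, not that the referenced resource exists.

```shell
# Hypothetical helper: fail fast when a variable the walkthrough relies on
# is unset or empty. It does not verify that the AWS resources exist.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (commented out so that sourcing this snippet has no side effects):
# require_vars AWS_REGION ECS_EXECUTION_ROLE ECS_CLUSTER \
#     VPC_SUBNET_ONE VPC_SUBNET_TWO VPC_SECURITY_GROUP
```

The function returns a non-zero exit code if any variable is missing, so it can guard the remaining steps in a script.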

2. We register a new Amazon ECS task definition consisting of a single container that sleeps indefinitely.

cat <<EOF >taskdefinition_one.json
{
    "family": "deployment-demo",
    "executionRoleArn": "${ECS_EXECUTION_ROLE}",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "demo",
            "image": "public.ecr.aws/amazonlinux/amazonlinux:2023-minimal",
            "command": [
                "/bin/bash",
                "-c",
                "echo 'sleeping' && sleep infinity"
            ]
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512"
}
EOF

aws ecs register-task-definition \
  --cli-input-json file://taskdefinition_one.json

3. After the task definition has been registered, we create a new Amazon ECS service running on AWS Fargate using the existing VPC subnets and security groups.

cat <<EOF >service.json
{
    "cluster": "${ECS_CLUSTER}",
    "serviceName": "deployment-demo",
    "taskDefinition": "deployment-demo",
    "desiredCount": 1,
    "launchType": "FARGATE",
    "deploymentConfiguration": {
        "deploymentCircuitBreaker": {
            "enable": true,
            "rollback": true
        }
    },
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [
                "${VPC_SUBNET_ONE}",
                "${VPC_SUBNET_TWO}"
            ],
            "securityGroups": [
                "${VPC_SECURITY_GROUP}"
            ],
            "assignPublicIp": "DISABLED"
        }
    }
}
EOF

aws ecs create-service \
    --cli-input-json file://service.json

4. Using the new listServiceDeployments API, we can now list the service deployments associated with this service.

aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}"

As it is the first time the service has been created, there will only be one deployment. Within this deployment, note that it is IN_PROGRESS with a target service revision of 8255880275929051605.

{
    "serviceDeployments": [
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/brooVKavhUZYne-zDRzvK",
            "serviceArn": "arn:aws:ecs:eu-north-1:111222333444:service/default/deployment-demo",
            "clusterArn": "arn:aws:ecs:eu-north-1:111222333444:cluster/default",
            "startedAt": "2024-11-04T11:25:58.642000+00:00",
            "createdAt": "2024-11-04T11:25:56.589000+00:00",
            "targetServiceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
            "status": "IN_PROGRESS"
        }
    ]
}
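
In the sample output above, the service revision ID is the final path segment of targetServiceRevisionArn. As a small sketch, assuming that ARN layout holds, the ID can be extracted with plain shell parameter expansion; revision_id_from_arn is a hypothetical helper name.

```shell
# Hypothetical helper: print the last "/"-separated segment of an ARN.
# In the sample output this segment is the service revision ID.
revision_id_from_arn() {
    printf '%s\n' "${1##*/}"
}
```

Passing the targetServiceRevisionArn shown above prints 8255880275929051605.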

The deployments list can also be seen in a new deployments tab in the Amazon ECS console.

5. We can inspect this individual deployment with the new describeServiceDeployments API to get more detail, such as the requested number of tasks for that revision and the status of the circuit breaker.

DEPLOYMENT_ARN=$(aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}" \
    --query 'serviceDeployments[0].serviceDeploymentArn' \
    --output text)

aws ecs describe-service-deployments \
    --service-deployment-arns "$DEPLOYMENT_ARN"

In the output you can observe that the sourceServiceRevisions key is empty, showing that this is the first deployment of the service.

{
    "serviceDeployments": [
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/brooVKavhUZYne-zDRzvK",
            "serviceArn": "arn:aws:ecs:eu-north-1:111222333444:service/default/deployment-demo",
            "clusterArn": "arn:aws:ecs:eu-north-1:111222333444:cluster/default",
            "createdAt": "2024-11-04T11:25:56.589000+00:00",
            "startedAt": "2024-11-04T11:25:58.642000+00:00",
            "updatedAt": "2024-11-04T11:25:59.093000+00:00",
            "sourceServiceRevisions": [],
            "targetServiceRevision": {
                "arn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
                "requestedTaskCount": 0,
                "runningTaskCount": 0,
                "pendingTaskCount": 0
            },
            "status": "IN_PROGRESS",
            "deploymentConfiguration": {
                "deploymentCircuitBreaker": {
                    "enable": true,
                    "rollback": true
                },
                "maximumPercent": 200,
                "minimumHealthyPercent": 100
            },
            "deploymentCircuitBreaker": {
                "status": "MONITORING",
                "failureCount": 0,
                "threshold": 0
            },
            "alarms": {
                "status": "DISABLED"
            }
        }
    ],
    "failures": []
}

This deployment can also be seen within the console.

6. To get more information on what is being deployed, we can describe the service revision using the describeServiceRevisions API.

REVISION_ARN=$(aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}" \
    --query 'serviceDeployments[0].targetServiceRevisionArn' \
    --output text)

aws ecs describe-service-revisions \
    --service-revision-arns "$REVISION_ARN"

The output shows that a service revision is a combination of the task definition revision and any service-level parameters.

{
    "serviceRevisions": [
        {
            "serviceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
            "serviceArn": "arn:aws:ecs:eu-north-1:111222333444:service/default/deployment-demo",
            "clusterArn": "arn:aws:ecs:eu-north-1:111222333444:cluster/default",
            "taskDefinition": "arn:aws:ecs:eu-north-1:111222333444:task-definition/deployment-demo:1",
            "launchType": "FARGATE",
            "platformVersion": "1.4.0",
            "platformFamily": "Linux",
            "networkConfiguration": {...},
            "guardDutyEnabled": false,
            "createdAt": "2024-11-04T11:25:46.617000+00:00"
        }
    ],
    "failures": []
}
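
When only the task definition behind a revision is of interest, the same call can be narrowed with a --query expression. This is a sketch under the assumption that the response has the shape shown above; revision_task_definition is a hypothetical helper name.

```shell
# Hypothetical helper: print the task definition ARN recorded in a service revision.
# $1: service revision ARN (for example the value of REVISION_ARN)
revision_task_definition() {
    aws ecs describe-service-revisions \
        --service-revision-arns "$1" \
        --query 'serviceRevisions[0].taskDefinition' \
        --output text
}
```

For the revision above, calling revision_task_definition "$REVISION_ARN" would be expected to print the task-definition/deployment-demo:1 ARN from the sample output.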

7. Before continuing, we wait for the deployment to become SUCCESSFUL. This may take a couple of minutes, so you may need to rerun this command until the status changes.

aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}" \
    --query 'serviceDeployments[0].status' \
    --output text
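
Instead of rerunning the command by hand, the check can be wrapped in a polling loop. The following is a minimal sketch: wait_for_deployment_status is a hypothetical helper, the attempt count and delay are arbitrary defaults, and it assumes (as the walkthrough does) that the first entry in the list is the deployment of interest. It keeps polling until the wanted status appears or the attempts run out, so a deployment that ends in a different status is only reported as a timeout.

```shell
# Hypothetical helper: poll the first listed service deployment until it
# reports the wanted status, or give up after a fixed number of attempts.
# $1: wanted status, for example SUCCESSFUL
# $2: maximum number of polls (assumed default: 30)
# $3: seconds to sleep between polls (assumed default: 10)
wait_for_deployment_status() {
    wanted="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        current=$(aws ecs list-service-deployments \
            --service deployment-demo \
            --cluster "${ECS_CLUSTER}" \
            --query 'serviceDeployments[0].status' \
            --output text)
        echo "Deployment status: $current"
        if [ "$current" = "$wanted" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Timed out waiting for status $wanted" >&2
    return 1
}
```

For this step, wait_for_deployment_status SUCCESSFUL would block until the initial deployment completes or roughly five minutes have passed.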

8. Next, we roll out a new service revision containing a known bad command (running the exit 1 command in bash). To do so, we first register a second task definition.

cat <<EOF >taskdefinition_two.json
{
    "family": "deployment-demo",
    "executionRoleArn": "${ECS_EXECUTION_ROLE}",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "demo",
            "image": "public.ecr.aws/amazonlinux/amazonlinux:2023-minimal",
            "command": [
                "/bin/bash",
                "-c",
                "echo 'sleeping' && sleep 15 && echo 'exiting' && exit 1"
            ]
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512"
}
EOF

aws ecs register-task-definition \
    --cli-input-json file://taskdefinition_two.json

Then we run the update-service command, creating a new service revision and triggering a deployment.

cat <<EOF >updateservice.json
{
    "cluster": "${ECS_CLUSTER}",
    "service": "deployment-demo",
    "desiredCount": 1,
    "taskDefinition": "deployment-demo"
}
EOF

aws ecs update-service \
    --cli-input-json file://updateservice.json
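
The update above names only the task definition family, so Amazon ECS resolves it to the latest ACTIVE revision, which is the one just registered. If you prefer to see or pin the exact revision, the following sketch looks up its ARN first; latest_task_definition_arn is a hypothetical helper name.

```shell
# Hypothetical helper: print the ARN of the latest ACTIVE task definition
# revision in a family.
# $1: task definition family, for example deployment-demo
latest_task_definition_arn() {
    aws ecs describe-task-definition \
        --task-definition "$1" \
        --query 'taskDefinition.taskDefinitionArn' \
        --output text
}
```

The printed ARN can be used as the taskDefinition value in updateservice.json, or passed to aws ecs update-service with the --task-definition option.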

9. We can again monitor the service deployments associated with the service using the listServiceDeployments API.

aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}"

There should now be two deployments attached to this service: the initial completed deployment and the new IN_PROGRESS deployment that we have just created.

{
    "serviceDeployments": [
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/jQkgVwkt0a2bd7vdlCZeN",
            ...
            "targetServiceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8522095445164283324",
            "status": "IN_PROGRESS"
        },
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/brooVKavhUZYne-zDRzvK",
            ...
            "targetServiceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
            "status": "SUCCESSFUL"
        }
    ]
}

This list is also shown in the deployments tab in the console.

10. If we inspect this new deployment, we can observe that it is moving the service between two different service revisions.

DEPLOYMENT_ARN=$(aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}" \
    --query 'serviceDeployments[0].serviceDeploymentArn' \
    --output text)

aws ecs describe-service-deployments \
    --service-deployment-arns "$DEPLOYMENT_ARN"

First, the source revision has the ID 8255880275929051605 and the target revision has the ID 8522095445164283324. Second, we can see that the circuit breaker is in a MONITORING state because the rollout is ongoing. Finally, we can see the number of failed tasks in this deployment against the failure threshold that triggers a rollback.

{
    "serviceDeployments": [
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/jQkgVwkt0a2bd7vdlCZeN",
            ...
            "sourceServiceRevisions": [
                {
                    "arn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
                    "requestedTaskCount": 0,
                    "runningTaskCount": 1,
                    "pendingTaskCount": 0
                }
            ],
            "targetServiceRevision": {
                "arn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8522095445164283324",
                "requestedTaskCount": 1,
                "runningTaskCount": 1,
                "pendingTaskCount": 0
            },
            "status": "IN_PROGRESS",
            "deploymentConfiguration": {
                "deploymentCircuitBreaker": {
                    "enable": true,
                    "rollback": true
                },
                "maximumPercent": 200,
                "minimumHealthyPercent": 100
            },
            "deploymentCircuitBreaker": {
                "status": "MONITORING",
                "failureCount": 0,
                "threshold": 3
            },
            "alarms": {
                "status": "DISABLED"
            }
        }
    ],
    "failures": []
}
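
To see only the revision transition without reading the whole response, the source and target ARNs can be selected with a --query expression and shortened to their IDs. This is a sketch that assumes the response shape shown above and a single source revision; deployment_transition is a hypothetical helper name.

```shell
# Hypothetical helper: print "<source revision ID> -> <target revision ID>"
# for a service deployment.
# $1: service deployment ARN (for example the value of DEPLOYMENT_ARN)
deployment_transition() {
    aws ecs describe-service-deployments \
        --service-deployment-arns "$1" \
        --query 'serviceDeployments[0].[sourceServiceRevisions[0].arn, targetServiceRevision.arn]' \
        --output text | {
        # Text output places both values on one tab-separated line.
        read -r source_arn target_arn
        echo "${source_arn##*/} -> ${target_arn##*/}"
    }
}
```

For the deployment above, this would be expected to print 8255880275929051605 -> 8522095445164283324.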

In the deployments tab of the Amazon ECS console, there is a new service revision comparison view that allows you to visualize the differences between these revisions.

11. We can monitor this deployment by re-running the describe-service-deployments command.

aws ecs describe-service-deployments \
    --service-deployment-arns "$DEPLOYMENT_ARN"

Over time, the failureCount under the deploymentCircuitBreaker key rises as each task fails. Eventually the circuit breaker becomes TRIGGERED and the service is rolled back to the previous service revision.

{
    "serviceDeployments": [
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/jQkgVwkt0a2bd7vdlCZeN",
            ...
            "sourceServiceRevisions": [
                {
                    "arn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
                    "requestedTaskCount": 1,
                    "runningTaskCount": 1,
                    "pendingTaskCount": 0
                }
            ],
            "targetServiceRevision": {
                "arn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8522095445164283324",
                "requestedTaskCount": 0,
                "runningTaskCount": 0,
                "pendingTaskCount": 0
            },
            "status": "ROLLBACK_SUCCESSFUL",
            "statusReason": "Service deployment rolled back because the circuit breaker threshold was exceeded.",
            "deploymentConfiguration": {
                "deploymentCircuitBreaker": {
                    "enable": true,
                    "rollback": true
                },
                "maximumPercent": 200,
                "minimumHealthyPercent": 100
            },
            "rollback": {
                "reason": "Service deployment rolled back because the circuit breaker threshold was exceeded.",
                "startedAt": "2024-11-04T11:51:20.200000+00:00",
                "serviceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605"
            },
            "deploymentCircuitBreaker": {
                "status": "TRIGGERED",
                "failureCount": 3,
                "threshold": 3
            },
            "alarms": {
                "status": "DISABLED"
            }
        }
    ],
    "failures": []
}
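
While the rollout is in progress, the circuit breaker fields are usually the only part of this response that changes. The following sketch reduces the output to a one-line summary, assuming the response shape shown above; circuit_breaker_summary is a hypothetical helper name.

```shell
# Hypothetical helper: print a one-line summary of the deployment circuit breaker.
# $1: service deployment ARN (for example the value of DEPLOYMENT_ARN)
circuit_breaker_summary() {
    aws ecs describe-service-deployments \
        --service-deployment-arns "$1" \
        --query 'serviceDeployments[0].deploymentCircuitBreaker.[status, failureCount, threshold]' \
        --output text | {
        # Text output places the three values on one tab-separated line.
        read -r cb_status cb_failures cb_threshold
        echo "Circuit breaker ${cb_status}: ${cb_failures} of ${cb_threshold} failed tasks"
    }
}
```

Rerunning circuit_breaker_summary "$DEPLOYMENT_ARN" during the rollout shows the failure count climbing toward the threshold.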

12. Finally, we can use the listServiceDeployments API again to observe a summary of all of the service deployments.

aws ecs list-service-deployments \
    --service deployment-demo \
    --cluster "${ECS_CLUSTER}"

Both the initial deployment and the failed deployment will be shown in the output.

{
    "serviceDeployments": [
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/jQkgVwkt0a2bd7vdlCZeN",
            ...
            "targetServiceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8522095445164283324",
            "status": "ROLLBACK_SUCCESSFUL",
            "statusReason": "Service deployment rolled back because the circuit breaker threshold was exceeded."
        },
        {
            "serviceDeploymentArn": "arn:aws:ecs:eu-north-1:111222333444:service-deployment/default/deployment-demo/brooVKavhUZYne-zDRzvK",
            ...
            "targetServiceRevisionArn": "arn:aws:ecs:eu-north-1:111222333444:service-revision/default/deployment-demo/8255880275929051605",
            "status": "SUCCESSFUL"
        }
    ]
}
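
As the deployment history grows, it can be useful to list only the deployments that ended in a particular status. The sketch below filters on the client side with a JMESPath expression; deployments_with_status is a hypothetical helper name, and the default status is the ROLLBACK_SUCCESSFUL value seen in the output above.

```shell
# Hypothetical helper: list the ARN and status reason of service deployments
# that have a given status, filtering on the client side.
# $1: status to match (assumed default: ROLLBACK_SUCCESSFUL)
deployments_with_status() {
    wanted_status="${1:-ROLLBACK_SUCCESSFUL}"
    aws ecs list-service-deployments \
        --service deployment-demo \
        --cluster "${ECS_CLUSTER}" \
        --query "serviceDeployments[?status=='${wanted_status}'].[serviceDeploymentArn, statusReason]" \
        --output text
}
```

Each matching deployment is printed on its own line, with the ARN and the reason separated by a tab.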

Cleaning up

To remove the service and task definitions used in this walkthrough, you can use the following commands:

aws ecs delete-service \
    --cluster "${ECS_CLUSTER}" \
    --service "deployment-demo" \
    --force

TASK_DEFS=$(aws ecs list-task-definitions \
    --family-prefix "deployment-demo" \
    --query taskDefinitionArns \
    --output text)

for TASK_DEF in $TASK_DEFS
do
    aws ecs deregister-task-definition --task-definition "${TASK_DEF}"
done
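
The walkthrough also wrote four JSON files to the current directory. As a small sketch, assuming you ran the earlier steps from the same directory, they can be removed as follows; remove_walkthrough_files is a hypothetical helper name.

```shell
# Hypothetical helper: delete the JSON input files created during the walkthrough.
# Only these four file names are touched; missing files are ignored.
remove_walkthrough_files() {
    rm -f taskdefinition_one.json \
        taskdefinition_two.json \
        service.json \
        updateservice.json
}
```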

Conclusion

In this post, we’ve explored the new service deployment and service revision features of Amazon ECS. These capabilities provide enhanced visibility and auditability into your software release process, empowering you to deploy with greater confidence. By using service deployments and service revisions, you can now gain deeper insights into the status of ongoing rollouts, troubleshoot issues more effectively, and review the history of past deployments. This level of observability helps create a more reliable and transparent software release lifecycle.

To learn more about the full capabilities of Amazon ECS, make sure to check out the comprehensive Amazon ECS documentation. For a hands-on experience, consider exploring the Amazon ECS workshop, which offers practical guidance and examples.