Implement custom service discovery for Amazon ECS Anywhere tasks
Amazon Elastic Container Service (Amazon ECS) is a managed container orchestration service offered by AWS. It simplifies the deployment, management, and scalability of containerized applications using Amazon ECS task definitions through the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS Software Development Kits (AWS SDKs).
Customers who need to run containerized workloads both on AWS and on premises often encounter challenges due to inconsistent tooling and deployment experiences across environments. Factors such as data gravity, compliance, and latency requirements contribute to these challenges.
To address these issues, Amazon ECS Anywhere extends the capabilities of Amazon ECS, which enables the deployment and management of containerized applications on-premises or in edge locations using a unified container orchestration platform. Amazon ECS Anywhere allows you to utilize the same Amazon ECS Application Programming Interfaces (APIs) and tools for deploying and managing container workloads on physical servers, virtual machines, or even Raspberry Pis, which ensures a consistent deployment experience across all environments.
Technical challenges of Amazon ECS Anywhere
One limitation of Amazon ECS Anywhere is the lack of native support for load balancing and service discovery. This limitation restricts the use case of deploying customer-facing applications with Amazon ECS Anywhere and dynamically scaling the workload similar to what is possible on AWS.
While there are some solutions for load balancing with third-party tools (e.g., F5 and Inlets), it’s also possible to simplify your architecture by using AWS services. This approach minimizes external dependencies, such as vendor licenses and product-specific configurations. In this post, we’ll utilize the AWS Application Load Balancer (ALB) for our solution.
The main challenge in implementing the solution lies in keeping the Amazon ECS service status synchronized with the instance targets behind the load balancer. This means that whenever changes occur in Amazon ECS services, the Amazon ECS tasks should automatically be registered or deregistered from the target group behind the ALB in response to those scaling events. The good news is that we can capture these events from the Amazon EventBridge event bus and utilize an AWS Lambda function with the AWS SDK to adjust the ALB targets accordingly.
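As a rough illustration of the event capture described above, the sketch below (in Python for brevity, whereas the post's AWS Lambda code lives in index.mjs) shows an Amazon EventBridge event pattern for Amazon ECS task state changes together with a simplified matcher. The `source` and `detail-type` values are the standard ones for Amazon ECS events. The account ID, Region, `CLUSTER_ARN`, and the `matches` helper are assumptions made for illustration, and the matcher handles only exact-value lists, not the full EventBridge pattern syntax.

```python
# Hypothetical cluster ARN; replace the Region and account ID with your own.
CLUSTER_ARN = "arn:aws:ecs:us-east-1:111122223333:cluster/ECSA-Demo-Cluster"

# Event pattern in the shape EventBridge expects: each leaf is a list of
# accepted values, and nested objects narrow the match on nested fields.
EVENT_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"clusterArn": [CLUSTER_ARN]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must be present in the event,
    and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

With this pattern, a task state change from the demo cluster matches, while other Amazon ECS event types or events from other clusters do not.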
There are two Amazon Virtual Private Clouds (Amazon VPCs) in this demonstration:
- The first VPC, OnPremVPC, simulates the on-premises environment. Three Linux EC2 instances running Ubuntu are provisioned in it with the Amazon ECS Anywhere agent installed; these simulate the on-premises virtual machines. The same VPC also contains another three Linux EC2 instances (also on Ubuntu) that run an open-source HTTP proxy (Squid) for the outbound HTTPS access that Amazon ECS Anywhere agents require. An internet-facing ALB is also provisioned to simulate the on-premises load balancer.
- The second VPC, LambdaVPC, mainly hosts the AWS Lambda function, which consumes the Amazon ECS Task State Change events that Amazon EventBridge delivers to an Amazon SQS queue. The AWS Lambda function registers or deregisters the IP targets of the ALB target group whenever those Amazon ECS tasks change their Host IP and Port. The VPC peering between OnPremVPC and LambdaVPC is NOT actually required, because the AWS Lambda function can register or deregister the IP and Port in the ALB target group through the VPC endpoint in its own VPC. The VPC peering is provisioned in this demonstration to highlight the typical dependency between LambdaVPC and the on-premises network. For example, if the AWS Lambda function needs to update an on-premises load balancer (e.g., BIG-IP), then connectivity to the on-premises network (e.g., AWS Site-to-Site VPN or AWS Direct Connect) is required.
The following diagram describes the solution architecture of this post:
Diagram 1 – Architecture Diagram for Custom Service Discovery for ECS Anywhere Tasks
- Squid HTTP proxy in public subnets – OnPremVPC has both private and public subnets. The Linux EC2 instances running the Amazon ECS Anywhere agent are placed in private subnets without direct internet access, which simulates a typical on-premises environment with locked-down internet access. Outbound HTTPS requests are proxied by an open-source proxy (Squid) running on other Linux EC2 instances hosted in public subnets. An internal Network Load Balancer (NLB) distributes the proxy requests across the Linux EC2 instances running the HTTP proxy (Squid).
- SSM parameters for Amazon ECS Anywhere Activation ID and Activation Code – Registration of Amazon ECS Anywhere agents requires an Activation ID and an Activation Code. For details of the registration, see Registering an external instance to a cluster. In this solution, both the Activation ID and the Activation Code are retrieved from the Amazon ECS Control Plane and persisted in Parameter Store, a capability of AWS Systems Manager (AWS SSM). This lets AWS CloudFormation automate the registration of Amazon ECS Anywhere agents on the Linux EC2 instances.
- Amazon ECS Anywhere Agent in Private subnets managed by Amazon ECS Control Plane – The Amazon ECS Anywhere agent is placed in private subnets, with its HTTP proxy configuration pointing to the Domain Name System (DNS) name of the internal NLB in front of the HTTP proxy. Sample Amazon ECS tasks and services are deployed, and their containers are managed and run on the Linux EC2 instances. The Amazon ECS Control Plane is responsible for distributing the containers of Amazon ECS tasks evenly among the three Linux EC2 instances. NetworkMode is set to bridge with HostPort set to 0 in the Amazon ECS task definition, which means that containers of Amazon ECS tasks are assigned a host port from the range 32768 to 61000 on demand.
- Amazon ECS Task State Change events fired by Amazon ECS Control Plane – Because the Amazon ECS Control Plane is responsible for task orchestration, it emits Amazon ECS Task State Change events whenever containers of Amazon ECS tasks are launched, destroyed, or relocated on the Linux EC2 instances. Those events are delivered through the Amazon EventBridge event bus.
- AWS Lambda function processes the events in batch mode – An Amazon SQS queue is configured to receive the Amazon ECS Task State Change events from the Amazon EventBridge event bus, and the AWS Lambda function consumes those events from the queue. BatchSize and MaximumBatchingWindowInSeconds are configured in the AWS Lambda Event Source Mapping for the Amazon SQS queue. This enables batch processing of the events, which avoids chatty invocations of the AWS Lambda function and therefore avoids frequent Host IP and Port updates against the load balancer.
- AWS Lambda function retrieves Amazon ECS task information through the VPC endpoints – Once a batch of events arrives, the AWS Lambda function retrieves the Host Instance ID and Port information of every relevant Amazon ECS task through the Amazon ECS VPC endpoint. To resolve the Host IP (which is a private IP), the AWS Lambda function passes the Host Instance ID to AWS SSM through the AWS SSM VPC endpoint. This works because the Amazon ECS Anywhere agent depends on the SSM agent. For details of this dependency, see External instances (Amazon ECS Anywhere).
- AWS Lambda function updates ALB target groups through the VPC endpoints – After the Host IP and Port information is gathered for the relevant Amazon ECS tasks, the AWS Lambda function registers or deregisters the IP targets in the ALB target group. Those IP targets represent the Host IP and Port of the corresponding containers of Amazon ECS tasks running on the Linux EC2 instances. The AWS Lambda function uses the Elastic Load Balancing (ELB) VPC endpoint for those updates.
- ALB dispatches HTTP requests based on the up-to-date target group information – Finally, with the up-to-date IP targets in the ALB target group, HTTP requests received by the internet-facing ALB listener are dispatched to the corresponding containers of Amazon ECS tasks.
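The batch-processing and port-lookup steps in the list above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from the post's repository: the function names `services_in_batch` and `host_ports` are hypothetical, and the input dictionaries follow the documented shapes of an Amazon SQS batch delivered to AWS Lambda (`Records[].body` holding the EventBridge event as JSON) and of a task returned by the Amazon ECS DescribeTasks API (`containers[].networkBindings[].hostPort`).

```python
import json


def services_in_batch(sqs_event: dict) -> set:
    """Return the unique (clusterArn, serviceName) pairs touched by one batch.

    Deduplicating here means the load balancer is reconciled once per
    service, no matter how many task events the batch contains.
    """
    touched = set()
    for record in sqs_event.get("Records", []):
        event = json.loads(record["body"])
        if event.get("detail-type") != "ECS Task State Change":
            continue
        detail = event["detail"]
        group = detail.get("group", "")
        # Tasks started by a service carry the group "service:<service name>".
        if group.startswith("service:"):
            touched.add((detail["clusterArn"], group[len("service:"):]))
    return touched


def host_ports(task: dict) -> list:
    """Collect (containerName, hostPort) pairs from a DescribeTasks-style task.

    With bridge mode and HostPort 0, the dynamically assigned host port
    appears in each container's networkBindings.
    """
    pairs = []
    for container in task.get("containers", []):
        for binding in container.get("networkBindings", []):
            pairs.append((container["name"], binding["hostPort"]))
    return pairs
```

Resolving each task's host to a private IP and registering the resulting IP and port pairs would follow these two steps; those calls are omitted here because they require live AWS APIs.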
There are three AWS CloudFormation templates in total, which you deploy in Steps 1 to 3 below. Those steps provision the AWS components required for the solution in this post. The last step, Step 4, includes commands to update the Amazon ECS service desiredCount manually, which helps us observe how the targets in the ALB target groups are registered automatically with the latest Host IP and Port information.
All the provisioning, verification, and post-configuration commands are also collected in a Markdown file, all-commands.md, in the source code repository for easier reference. Some sample command outputs are trimmed with … (not the full version) to make this post easier to read. For the full command outputs, refer to the Markdown file, all-outputs.md, in the source code repository.
To provision this solution, you need to have the following prerequisites:
- A Git client to clone the source code in a repository, and an AWS Identity and Access Management (AWS IAM) user with Git credentials.
- An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
- The latest version of the AWS Command Line Interface (AWS CLI). For more information, see installing, updating, and uninstalling the AWS CLI.
- A Linux bash shell with jq (a command-line tool for JSON processing) installed. It is used mainly to filter and format the AWS CLI output for easier reading.
With the prerequisites ready, clone the source code repository of this post to a local directory:
Step 1 – Provision the Amazon ECS cluster, VPCs/Subnets, Amazon EC2 Launch Template, and ALB
Execute the following AWS CLI command to deploy the first AWS CloudFormation template, ecsa-svc-disc-1-ecs-vpc-ec2-alb.yml:
The AWS CloudFormation parameter, SecurityGroupIngressAllowedCidrParameter, controls the IP range that can access the SSH port (22) of the HTTP proxy, as well as the HTTP ports (8080-8082) of the ALB. Instead of specifying 0.0.0.0/0, it's recommended to use a narrower public IP range that covers only your testing clients.
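One way to sanity-check a candidate value for this parameter before deploying is sketched below in Python, using only the standard library. The helper name `cidr_is_acceptable` and the sample addresses (taken from the documentation-reserved ranges) are assumptions for illustration and are not part of the post's templates.

```python
import ipaddress


def cidr_is_acceptable(cidr: str, client_ip: str) -> bool:
    """Return True if cidr is not wide open and covers client_ip.

    ip_network(strict=True) raises ValueError when host bits are set,
    for example "203.0.113.5/24", which catches a common typo.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    if network.prefixlen == 0:
        # 0.0.0.0/0 (or ::/0) would expose the ports to the whole internet.
        return False
    return ipaddress.ip_address(client_ip) in network
```

For example, `cidr_is_acceptable("203.0.113.0/24", "203.0.113.25")` accepts the range, whereas `0.0.0.0/0` is rejected regardless of the client address.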
The AWS CloudFormation template will:
- Provision the Amazon ECS cluster, ECSA-Demo-Cluster, for this demonstration. The Activation ID and Activation Code are retrieved and are persisted in the Parameter Store, by using the AWS CloudFormation custom resource (LambdaSSMActivationInvoke).
- Provision the VPC, Subnets, Security Groups and VPC Peering for OnPremVPC and LambdaVPC.
- Provision the Auto Scaling groups (ASGs) and the Amazon EC2 launch templates for the Linux EC2 instances (Ubuntu) for both the Amazon ECS Anywhere agent and the HTTP proxy.
- For both ASGs, the LaunchTemplateData section contains the UserData property with the required initialization for the Linux EC2 instances. For the HTTP proxy, that's the installation and required setup of Squid. For the Amazon ECS Anywhere agent, that's the installation of the agent, as well as the registration using the generated Activation ID and Activation Code. The initialization for the Amazon ECS Anywhere agent also includes the required HTTP proxy configuration for outbound internet access.
- The DesiredCapacity of both ASGs is set to 3. The Linux EC2 instances for the HTTP proxy complete their initialization before those for the Amazon ECS Anywhere agents, which is implemented by using WaitConditionHandle and WaitCondition in the AWS CloudFormation template.
- Provision the Amazon EC2 Key Pair for the Linux EC2 instances for both the HTTP proxy and the Amazon ECS Anywhere agent. The private key of the Amazon EC2 Key Pair is saved in the SSM parameter, /ec2/keypair/<Key Pair ID>.
- Provision the ALB listener and ALB target groups. The target type of the ALB target groups is set to IP, and initially there are no registered targets. The AWS Lambda function (to be provisioned later) will register or deregister targets in those ALB target groups in Step 4.
Step 2 – Provision the Amazon ECS task definitions and services
Execute the following AWS CLI command to deploy the second AWS CloudFormation template, ecsa-svc-disc-2-ecs-service-task.yml:
The AWS CloudFormation template will:
- Provision the following Amazon ECS task definitions and services, with different Initial Desired Count.
| Amazon ECS service | Amazon ECS task definition | Initial Desired Count |
|---|---|---|
| Service-DemoApp1 | DemoApp1 | 1 |
| Service-DemoApp2 | DemoApp2 | 3 |
- The Amazon ECS task definition, DemoApp2, has two containers, container1 and container2, which are used later to demonstrate how the ALB load-balances requests to those containers by using two different frontend ports.
- All three containers in the above two task definitions use the same container image, ecr.aws/aws-containers/ecsdemo-nodejs:latest. This image is a sample Node.js application that shows a hello-world page printing the Host IP and Port information.
Step 3 – Provision the Amazon EventBridge event bus, Amazon SQS queue, and AWS Lambda function
Execute the following AWS CLI command to deploy the third AWS CloudFormation template, ecsa-svc-disc-3-sqs-lambda.yml:
The AWS CloudFormation template will:
- Deploy the Amazon SQS queue and Amazon EventBridge event bus
- Deploy the AWS Lambda function, ECSA-Demo-Cluster-Lambda-ProcessEvent, for processing the Amazon ECS Task State Change events
- Deploy the required VPC endpoints for the AWS Lambda function
The AWS CloudFormation template deploys only the configuration of the AWS Lambda function, without its main code. To deploy the main code, execute the following after the AWS CloudFormation deployment completes.
A mapping is required to link the Amazon ECS service with the ALB target group, so that the AWS Lambda function knows which target to update for the change of the Amazon ECS tasks’ Host IP and Port. We’ll achieve this by using the tag, ecs-a.lbName, associated with the Amazon ECS service.
Execute the following command to set the tag, ecs-a.lbName, for the two Amazon ECS services:
The tag, ecs-a.lbName, of Service-DemoApp1 is set to the Amazon Resource Name (ARN) of an ALB target group because there is only one container in its task definition. For Service-DemoApp2, it is set to the ARNs of the two ALB target groups because there are two containers in its task definition.
| Amazon ECS service | Amazon ECS task definition | Updated Desired Count | Container | ALB target group |
|---|---|---|---|---|
| Service-DemoApp1 | DemoApp1 | 1 → 2 | container0 | ECSA-Demo-Cluster-TargetGroup-0 |
| Service-DemoApp2 | DemoApp2 | 3 → 6 | container1 | |
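The `ecs-a.lbName` lookup could be resolved along the following lines. This Python sketch makes two assumptions that are not confirmed by the post: that multiple target group ARNs are separated by whitespace inside the tag value, and that the helper is called `target_group_arns`. The tag list uses the lowercase `key` and `value` fields returned by the Amazon ECS API; the actual separator and parsing used by the repository's AWS Lambda function may differ.

```python
LB_TAG_KEY = "ecs-a.lbName"


def target_group_arns(service_tags: list) -> list:
    """Return the load balancer identifiers stored in the ecs-a.lbName tag.

    ASSUMPTION: several ARNs are separated by whitespace in one tag value.
    Returns an empty list when the service carries no such tag.
    """
    for tag in service_tags:
        if tag.get("key") == LB_TAG_KEY:
            # split() without arguments tolerates repeated spaces.
            return tag.get("value", "").split()
    return []
```

A service without the tag yields an empty list, which a caller could treat as "not managed by this solution" and skip.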
Verification and post-configuration
Execute the following to see the registered targets (Host IP and Port) of the ALB target group:
In the previous sample output, note that NO targets are shown under the Target Group Health section. This is expected, because no Amazon ECS Task State Change event has been fired since the AWS Lambda function was provisioned. The ALB target groups therefore have not been updated and remain in their initial state with NO targets.
Step 4 – Update Amazon ECS service desiredCount and observe the registered targets in ALB target groups
Execute the following AWS CLI command to:
- Update the Desired Count of Service-DemoApp1 from 1 to 2
- Update the Desired Count of Service-DemoApp2 from 3 to 6
For the last command of aws ecs describe-services below, execute it a few times with a few seconds delay for each run, until the value of runningCount reaches the value of desiredCount.
Sample output (trimmed):
Verification and post-configuration
Wait about a minute for the batching window to expire, so that the AWS Lambda function starts processing the events. Execute ecsa-svc-disc-show-tg-health.sh again, and verify that the targets (Host IP and Port) are registered successfully in the three ALB target groups:
Sample outputs (trimmed):
The URL section from the output of previous command shows the DNS name of the ALB for the three containers, running in these two Amazon ECS services. Make sure the SecurityGroupIngressAllowedCidrParameter, which you provided as the parameter for AWS CloudFormation template in Step 1, covers the Public IP range of your testing clients, before you execute the curl command below.
Execute the curl command to see if the ALB Listeners can dispatch the requests to the underlying Amazon ECS tasks. It may take a minute for the new targets to be effective in ALB, so please run the curl command multiple times (with a few seconds delay for each run), if you receive errors on the curl command initially.
The second line of the HTTP content:
indicates the information about ECS Service Name | Container Name | Host IP and Port.
Execute the curl command several more times. The Host IP and Port in the second line of the HTTP content are expected to change between runs. A change shows the ALB load balancing at work: the HTTP requests have been dispatched to different containers, which report different Host IPs and Ports.
Highlight of required modification for on-premises Load Balancer
For demonstration purposes, this post uses the ALB to simulate an on-premises load balancer in the custom service discovery solution for Amazon ECS Anywhere. The solution is flexible, so you can adapt the sample code to your on-premises load balancer, as long as it provides an API to change its member IPs and ports on demand.
The following provides some high-level directions for the required modifications.
- For each Amazon ECS service running Amazon ECS Anywhere Tasks, you need to add a tag, ecs-a.lbName, where the value would be the identifier of the on-premises Load Balancer.
- For the AWS Lambda function, ECSA-Demo-Cluster-Lambda-ProcessEvent, update the index.mjs to align with the following code:
- Create a new file lb-*.mjs (e.g., lb-your-onprem-lb.mjs) for the AWS Lambda function and provide your implementation. You can use lb-alb.mjs as a reference.
Function to Override
- Get the identifier of the on-premises load balancer from the input parameter lbInfo.lbName
- Call the on-premises load balancer API to get the current member IPs and ports, and return that information
- From the input parameter, targetLbInfo, get the target Host IP and Port, which is the up-to-date information retrieved from the Amazon ECS Control Plane
- Call getCurrentLoadBalancingInfo to get the current member IPs and ports from the on-premises load balancer API
- Compare the two previous items, and return changeLbInfo: the list of member IPs and ports to add or remove
- From the input parameter, changeLbInfo, get the list of member IPs and ports to add or remove
- Add or remove the member IPs and ports by calling the on-premises load balancer API
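The compare-and-apply logic outlined above amounts to a set difference between the members currently on the load balancer and the targets reported by Amazon ECS. The following is a minimal Python sketch of that idea, not the repository's lb-alb.mjs: `Member`, `compute_change`, `sync`, and the `InMemoryLoadBalancer` stand-in are all hypothetical names, and a real implementation would replace the stand-in's methods with calls to your on-premises load balancer API.

```python
from typing import NamedTuple


class Member(NamedTuple):
    """One load balancer member: a host IP and a host port."""
    ip: str
    port: int


def compute_change(current, target) -> dict:
    """Compare current LB members with the targets reported by Amazon ECS."""
    current_set, target_set = set(current), set(target)
    return {
        "add": sorted(target_set - current_set),      # in ECS, missing on the LB
        "remove": sorted(current_set - target_set),   # on the LB, gone from ECS
    }


class InMemoryLoadBalancer:
    """Hypothetical stand-in for an on-premises load balancer API."""

    def __init__(self, members=()):
        self._members = set(members)

    def get_members(self):
        return sorted(self._members)

    def add_member(self, member):
        self._members.add(member)

    def remove_member(self, member):
        self._members.discard(member)


def sync(lb, target) -> dict:
    """Fetch current members, compute the change, and apply it."""
    change = compute_change(lb.get_members(), target)
    for member in change["add"]:
        lb.add_member(member)
    for member in change["remove"]:
        lb.remove_member(member)
    return change
```

Because the change is computed from the full current and target states rather than from individual events, running `sync` twice with the same targets leaves the load balancer untouched the second time, which suits the batched, possibly repeated event delivery used in this solution.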
Before you delete the Amazon ECS cluster, you need to deregister the container instances for Amazon ECS Anywhere. The first AWS CloudFormation template, ecsa-svc-disc-1-ecs-vpc-ec2-alb.yml, contains an AWS CloudFormation custom resource (LambdaECSACleanupInvoke), which uses an AWS Lambda function to perform the deregistration for you automatically when the AWS CloudFormation stack is deleted.
Thus, to avoid incurring future charges, delete the AWS CloudFormation stacks by executing the following command:
In this post, we showed you how to use the ALB for Amazon ECS Anywhere service discovery, which was not natively supported by Amazon ECS Anywhere at the time of publishing this post. The solution captures Amazon ECS Task State Change events using the Amazon EventBridge event bus and stores them in an Amazon SQS queue. An AWS Lambda function then asynchronously updates the ALB targets by comparing the current state of the ALB with the Amazon ECS service targets.
We also addressed the possibility of implementing a similar solution with on-premises load balancers. In such cases, you can make minimal changes to the AWS Lambda function code using your on-premises load balancer API/SDK. The overall architecture remains almost the same.
We hope that this post provides our customers with a reference architecture pattern for implementing service discovery for workloads on Amazon ECS Anywhere. For quick access to all the commands shown in this post, refer to the Markdown file in this GitHub repo. It also contains additional verification commands that help you understand more about the setup of Amazon ECS Anywhere. To learn more about Amazon ECS Anywhere, you can also try this workshop.