AWS Compute Blog

Service Discovery for Amazon ECS Using DNS

My colleagues Peter Dalbhanjan and Javier Ros sent a nice guest post that describes DNS-based service discovery for Amazon ECS.

——

Containers are generating a lot of interest due to benefits such as portability and speed of deployment. Containers are also a good fit for microservices as they offer a thin, modular, self-contained environment that enables rapid innovation in the lifecycle of an application.

One requirement for the microservices design pattern is a reliable framework to describe the relationship between different microservices. For example, if a portal application needs to communicate with a weather service to provide data, it needs to know how to connect with that service. This underlying framework is defined as service discovery.

Service discovery offers features such as the following:

  • application reachability (where is my application, how can I reach it)
  • health checks (is this application healthy)
  • updates (how do I know when a new application comes online)
  • metadata for application configuration such as environment variables

There are several third-party approaches, such as Consul.io, Weave, and Netflix Eureka, but running a third-party tool to manage service discovery presents its own set of challenges and increases operational complexity.

Recently, we proposed a reference architecture for ELB-based service discovery that uses Amazon CloudWatch Events and AWS Lambda to register the service in Amazon Route 53 and uses Elastic Load Balancing functionality to perform health checks and manage request routing. An ELB-based service discovery solution works well for most services, but some services do not need a load balancer.

This post focuses on using a DNS-based approach for service discovery that leverages Amazon Route 53 private hosted zones for the remainder of use cases that don’t require a load balancer. By running a simple agent (ecssd_agent.go) on Amazon ECS container instances, customers do not need to worry about the management and maintenance of a service discovery component. The agent receives all Docker events natively and registers the service into Route 53 private hosted zones.

As containers start and stop, the agent updates Route 53 DNS records. This solution also works if customers use their own third-party schedulers on top of ECS, such as Apache Mesos or Marathon. The agent is written in Go, has a minimal footprint, and is available on GitHub under the Apache license. We encourage customers to modify the code as needed, and provide feedback.

Architecture


In this architecture, there are two ECS container instances running in an ECS cluster with ecssd_agent.go running in the background. This agent is started automatically using upstart configured with the ecssd_agent.conf startup script. The agent listens to Docker events natively; it registers the service name and each task’s metadata, such as IP address and ports info, into the Route 53 private hosted zone. The agent also deregisters the service and its SRV record as soon as the Docker container is stopped, so it detects failures as fast as they happen.
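The register and deregister flow described above can be sketched in a few lines of Go. This is not the code of ecssd_agent.go: the Event type, the in-memory Registry (standing in for the Route 53 private hosted zone), the fixed priority and weight of 1, and the host names are all assumptions made for illustration. The only fixed convention is the SRV data layout, "priority weight port target", from RFC 2782.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for a Docker container event.
type Event struct {
	Action  string // "start" or "die"
	Service string // service name taken from the container's environment
	Host    string // DNS name of the container instance (assumed)
	Port    int    // host port mapped to the container port
}

// Registry is an in-memory stand-in for the Route 53 private hosted zone.
type Registry struct {
	records map[string][]string // service name -> SRV record values
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{records: make(map[string][]string)}
}

// srvValue formats SRV record data as "priority weight port target".
func srvValue(priority, weight, port int, target string) string {
	return fmt.Sprintf("%d %d %d %s", priority, weight, port, target)
}

// Apply adds a value when a container starts and removes it when it dies.
// A service whose last value is removed disappears from the registry.
func (r *Registry) Apply(ev Event) {
	value := srvValue(1, 1, ev.Port, ev.Host)
	switch ev.Action {
	case "start":
		for _, v := range r.records[ev.Service] {
			if v == value {
				return // already registered
			}
		}
		r.records[ev.Service] = append(r.records[ev.Service], value)
	case "die":
		kept := r.records[ev.Service][:0]
		for _, v := range r.records[ev.Service] {
			if v != value {
				kept = append(kept, v)
			}
		}
		if len(kept) == 0 {
			delete(r.records, ev.Service)
		} else {
			r.records[ev.Service] = kept
		}
	}
}

func main() {
	r := NewRegistry()
	r.Apply(Event{Action: "start", Service: "calc", Host: "ip-10-0-0-12.ec2.internal", Port: 32768})
	r.Apply(Event{Action: "start", Service: "calc", Host: "ip-10-0-0-13.ec2.internal", Port: 32769})
	r.Apply(Event{Action: "die", Service: "calc", Host: "ip-10-0-0-12.ec2.internal", Port: 32768})
	fmt.Println(r.records["calc"])
}
```

A real agent would replace the map updates with Route 53 record changes and take its events from the Docker daemon, but the bookkeeping is the same: one SRV value per running container, removed when that container stops.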

Clients can access other applications via environment variables defined in the container configuration. Each service that you want available for service discovery requires an environment variable in the task definition. The name of the variable should be SERVICE_<port>_NAME, where <port> is the port on which your service listens inside the container, and the value is the name of the microservice.
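To make the naming convention concrete, here is a minimal Go sketch that extracts service names and ports from a list of "KEY=VALUE" environment strings. The ServiceEntry type and the parseServiceEnv function are hypothetical names introduced for this example; the agent’s own parsing may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ServiceEntry pairs a service name with the container port it listens on.
type ServiceEntry struct {
	Name string
	Port int
}

// parseServiceEnv scans "KEY=VALUE" strings and returns one entry for every
// variable whose key has the form SERVICE_<port>_NAME with a valid port.
func parseServiceEnv(env []string) []ServiceEntry {
	var entries []ServiceEntry
	for _, kv := range env {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || value == "" {
			continue
		}
		if !strings.HasPrefix(key, "SERVICE_") || !strings.HasSuffix(key, "_NAME") {
			continue
		}
		portText := strings.TrimSuffix(strings.TrimPrefix(key, "SERVICE_"), "_NAME")
		port, err := strconv.Atoi(portText)
		if err != nil || port < 1 || port > 65535 {
			continue // not a usable port number
		}
		entries = append(entries, ServiceEntry{Name: value, Port: port})
	}
	return entries
}

func main() {
	env := []string{"CALC_USERNAME=admin", "SERVICE_8081_NAME=calc"}
	for _, e := range parseServiceEnv(env) {
		fmt.Printf("%s listens on container port %d\n", e.Name, e.Port)
	}
}
```

Variables that do not follow the pattern, or whose port is not a number between 1 and 65535, are ignored in this sketch rather than reported as errors.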

Below is the task definition of the Calc service:

"CalcDefinition": {
            "Type" : "AWS::ECS::TaskDefinition",
            "DependsOn": "DockerBuildWaitCondition",
            "Properties" : {
                "ContainerDefinitions": [
                    {
                        "Name": "calc-service",
                        "Image": { "Fn::Join" : ["", [
                            { "Ref" : "AWS::AccountId" }, ".dkr.ecr.", { "Ref": "AWS::Region" }, ".amazonaws.com/calc-demo-service:latest"
                        ]]},
                        "Cpu": "100",
                        "Memory": "100",
                        "PortMappings": [
                            {
                                "ContainerPort": 8081
                            }
                        ],
                        "Essential": true,
                        "Environment": [
                            {
                                "Name": "CALC_USERNAME",
                                "Value": "admin"
                            },
                            {
                                "Name": "CALC_PASSWORD",
                                "Value": "password"
                            },
                            {
                                "Name": "SERVICE_8081_NAME",
                                "Value": "calc"
                            }
                        ]
                    }
                ]
            }
        },

Specifying SERVICE_8081_NAME under Environment registers the service as a Route 53 SRV record under the name given in the variable’s value (calc in this example). If you run a DNS query with that service name, Route 53 responds with the IP address and port number associated with the SRV record. If a record has more than one value, Route 53 varies the response using its built-in round-robin algorithm.

You can specify the port in the portMappings property of your task definition. In this property, you can specify the port for the container (the port where the application is listening) and the port for the host. We recommend leaving the host port to be assigned dynamically so that you can launch more than one task of the same type per server.

To respond to container instance failures and unhealthy containers, customers can use the Lambda function (lambda_health_check.py) to remove records from Route 53. You can schedule the function to run every five minutes.

If there is an EC2 instance failure, the Lambda function recognizes the failure the next time it runs and performs a cleanup of the associated Route 53 records. For health checks, the function reads all the SRV records in Route 53 and performs an HTTP GET against those containers (http://serverip:port/health). If the response code is different from 200, then it stops the ECS task and removes the associated Route 53 record. This is just an example of how to perform health checks; customers can extend this capability to perform custom health checks as needed.

This is a simple, seamless implementation of service discovery for Docker containers in AWS. Customers can run any number of services without managing the complexity of running a service discovery component.

Template implementation

You can use the CloudFormation template to build the infrastructure and see the solution in action. Here are the details of the CloudFormation template.

Service_Discovery_Using_DNS_Base.template

  • Creates a VPC with two subnets, route tables, Internet gateway, and security groups
  • Creates IAM roles
  • Creates an ECS cluster
  • Creates an Auto Scaling group and launch configuration for the ECS cluster
  • Creates an Amazon Route 53 private hosted zone
  • Installs ecssd_agent on the container instances
  • Creates ECR repositories for the microservices applications (Portal, Time, Calc)
  • Builds the microservices applications (Portal, Time, Calc) and pushes the images to ECR
  • Registers task definitions and runs the microservices applications

CloudFormation builds the microservices application containers using an EC2 instance with a ‘docker builder’ tag. This EC2 instance downloads the microservices application code from the GitHub repo, builds the applications, and pushes the images to their corresponding ECR repositories. The permission required to push ECR images is granted by the AmazonEC2ContainerRegistryPowerUser IAM policy. For updating Route 53 records, all the EC2 instances need to have the AmazonRoute53FullAccess IAM policy.

The microservices application is made of three containers: Portal, Time, and Calculator apps. After CloudFormation completes the deployment, you can choose Outputs, PortalURL to see some examples of service discovery:

  • Portal: A front-end web service to the other two microservices applications. Review the source code for portal (portal.go) to see how it references the other two microservice endpoints and uses DNS for communication.
  • Time: A simple time service that returns the current date and time in the required format. Specify the format using Go’s reference-time layout convention. For example, you can write “2 Jan”, “15:04”, or “Jan -> 15”. You can use any combination of time in a string format. Enter “15:04 Jan 2” and choose Add to receive an appropriate response.
  • Calc: A simple calculator that offers addition, subtraction, and multiplication. Enter (2+6)*3 and receive a response with calculated results.

To take this further, stop either a Time or Calc container. You will see that the Route 53 record associated with the container gets deleted as soon as the container stops. Similarly, when the ECS service starts a new container, a new Route 53 record is created automatically.

Cleanup

To clean up, delete the ECR repositories and Route 53 private hosted zone first. After this, deleting the CloudFormation stack deletes all the components involved in the implementation.

To keep the infrastructure in place for further testing, you can just delete the EC2 builder (with the ‘docker builder’ tag) as it is only responsible for creating and pushing Docker images to ECR.

Conclusion

Service discovery is a key component of a microservice architecture. By installing a simple agent on EC2 container instances, customers can take advantage of running service discovery via Route 53 DNS with less hassle and a worry-free implementation. You don’t need to maintain additional infrastructure or worry about added costs for running a service discovery solution.

If you have questions or suggestions, please comment below.