Deployment patterns for the AWS Distro for OpenTelemetry Collector with Amazon Elastic Container Service
The AWS Distro for OpenTelemetry (ADOT) is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. Cloud-native, distributed technology stacks are now the norm, but these architectures introduce operational challenges, which have led to the rise of observability. Several different patterns can be used for deploying ADOT for observability, and this blog post will describe the major patterns along with the pros and cons of those approaches.
The ADOT website provides excellent documentation on how to set up the AWS Distro for OpenTelemetry Collector in Amazon Elastic Container Service (Amazon ECS). The OpenTelemetry Collector includes components for exporting telemetry data to a variety of backends, including Prometheus, AWS X-Ray, and Amazon CloudWatch, to name a few. The AWS Distro for OpenTelemetry Collector (AWS OTel Collector) is an AWS-supported distribution of the upstream OpenTelemetry Collector. This component enables you to send telemetry data to Amazon CloudWatch and other supported backends, including partner ISV solutions.
Several other blog posts on ADOT are available, which cover topics such as sending metrics and traces to partner applications, migrating X-Ray tracing to AWS Distro for OpenTelemetry, and options for managing ADOT with AWS Systems Manager Distributor.
Regardless of how you’re using ADOT, when you run container-based workloads, you must decide how to deploy it. Those architectural decisions can have implications for your workload, especially as it begins to scale.
The sidecar pattern
The sidecar deployment pattern has been embraced by engineers to reduce a microservice’s scope of responsibility. In a sidecar pattern, a companion service runs next to your primary microservice. The primary microservice runs in the application container and contains the core logic for the microservice. The sidecar container augments the primary application container—in many cases without the primary container’s knowledge.
A common practice in the observability world is to use sidecars to provide container instrumentation. In the Amazon ECS world, sidecars are used to scrape Prometheus metrics into CloudWatch, instrument applications to use AWS X-Ray, and send metrics to Amazon Managed Service for Prometheus (AMP). The ADOT website shows how to configure the AWS OTel Collector to scrape metrics on an Amazon Elastic Container Service cluster and send those metrics to AMP using the sidecar pattern.
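As a sketch of what the sidecar pattern looks like in a task definition, the fragment below pairs a hypothetical application container with the AWS OTel Collector sidecar. The family name, application image, and resource sizes are illustrative assumptions; the collector image does ship with a default Amazon ECS configuration at /etc/ecs/ecs-default-config.yaml.

```json
{
  "family": "my-app-with-adot",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "my-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 8080 }]
    },
    {
      "name": "aws-otel-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
      "essential": true,
      "command": ["--config=/etc/ecs/ecs-default-config.yaml"]
    }
  ]
}
```

Because both containers live in the same task definition, they share the task's network namespace, which is why no service discovery is needed in this pattern.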
A main advantage of the sidecar pattern is that it is simple to configure and troubleshoot. Because each Amazon ECS task contains both the application and sidecar container, no service discovery is needed. If a task instance is not working correctly, the entire task can be shut down and recreated.
One thing that often gets overlooked when using the sidecar pattern is that the number of containers you must manage is at least twice the number of application tasks, because the sidecar pattern requires a sidecar container for each Amazon ECS task that you have running. For example, five running tasks that each require a sidecar, for a total of 10 running containers, doesn't seem like a big deal. As workloads begin to scale, however, 1,000 application tasks that each require a sidecar means that engineers are managing at least 2,000 containers.
The sidecar pattern introduces an additional challenge in that the application container and the sidecar are coupled. If an update must be applied to a sidecar container, engineers must redeploy both the application and sidecar containers. Although teams can easily redeploy Amazon ECS tasks using blue/green deployment patterns, this architecture does create additional complexity that engineering teams must work around to avoid outages.
Furthermore, the sidecar pattern may incur additional costs. On the Amazon ECS EC2 launch type, the sidecars consume additional CPU and memory, so you may need to provision additional EC2 instances to support your workload. On AWS Fargate for Amazon ECS, pricing is based on the vCPU and memory requirements of your task, so the larger task size needed to accommodate a sidecar increases the cost of every task.
That said, the sidecar pattern does give you the most visibility into the state of your workload, and the AWS OTel Collector provides the most functionality when deployed via this pattern. For example, you can collect telemetry data that is specific to a particular application container.
Amazon ECS service pattern
The Amazon ECS service deployment pattern is similar to the DaemonSet pattern in Kubernetes. An Amazon ECS service allows you to run and maintain a specified number of instances of a task definition simultaneously in an Amazon ECS cluster. If any of the tasks fail, the Amazon ECS service scheduler launches another task instance to replace the failed task. In the ECS service pattern, each application container runs by itself in a task—without a sidecar container. Meanwhile, a separate Amazon ECS service within the Amazon ECS cluster runs the instrumentation container discussed in the previous section.
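To illustrate the decoupling, here is a hypothetical collector-only task definition; a separate Amazon ECS service would then keep a desired count of these tasks running independently of the application tasks. The family name and resource sizes are illustrative assumptions; port 4317 is the standard OTLP gRPC port.

```json
{
  "family": "adot-collector",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "aws-otel-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 4317, "protocol": "tcp" }],
      "command": ["--config=/etc/ecs/ecs-default-config.yaml"]
    }
  ]
}
```

Note that updating the collector now means redeploying only this task definition, leaving the application services untouched.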
The Amazon ECS service pattern offers advantages over the sidecar pattern in that the application and the instrumentation services are no longer tightly coupled. Thus, if there is an update to the instrumentation container, the upgrade of that container can be done independently of any updates to the application container. Furthermore, compute costs are reduced because the number of instrumentation containers no longer has a 1:1 relationship with the application containers.
This pattern, however, is not without its challenges. Specifically, you must use Amazon ECS Service Discovery so that the containers within each of the services know about each other. AWS provides several examples of how service discovery works in Amazon ECS, including a blog post on how to scrape metrics to send off to Amazon Managed Service for Prometheus.
When you create a new Amazon ECS service, you have the option of enabling Amazon ECS Service Discovery for your service. Behind the scenes, when an Amazon ECS task spins up or down, it automatically registers with AWS Cloud Map. AWS Cloud Map in turn allows you to create namespaces for services, allowing you to group services logically together. Using service names in configuration, your services and containers can automatically route to the correct endpoint.
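As a sketch, assuming a Cloud Map namespace (for example, observability.local) and a Cloud Map service already exist, an Amazon ECS service registers its tasks by referencing the Cloud Map service in its serviceRegistries; application containers can then export telemetry to a stable name such as adot-collector.observability.local:4317. All names, ARNs, subnet, and security group IDs below are illustrative placeholders. This input could be passed to aws ecs create-service with the --cli-input-json option.

```json
{
  "cluster": "my-cluster",
  "serviceName": "adot-collector",
  "taskDefinition": "adot-collector:1",
  "desiredCount": 2,
  "launchType": "FARGATE",
  "serviceRegistries": [
    {
      "registryArn": "arn:aws:servicediscovery:us-east-1:123456789012:service/srv-example"
    }
  ],
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-0example"],
      "securityGroups": ["sg-0example"]
    }
  }
}
```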
If you need more granular routing, then AWS App Mesh is fully compatible with Amazon ECS. AWS App Mesh allows more control over routing than you get with typical service discovery. For example, suppose you have version1 of a service and you want to deploy version2 to your cluster, but you only want to send 25 percent of all traffic to the new service until you are confident it is working properly. This type of routing configuration can be completed via AWS App Mesh without needing to change any application code or registered service names.
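For example, the 75/25 split described above could be expressed as a weighted App Mesh route spec like the following sketch, where the virtual node names are hypothetical and each virtual node fronts one version of the service:

```json
{
  "httpRoute": {
    "match": { "prefix": "/" },
    "action": {
      "weightedTargets": [
        { "virtualNode": "my-service-v1", "weight": 75 },
        { "virtualNode": "my-service-v2", "weight": 25 }
      ]
    }
  }
}
```

Shifting more traffic to version2 is then just a matter of updating the weights, with no change to application code or registered service names.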
Although service discovery is definitely an additional challenge in configuring the service pattern, it’s worth noting that when you run the OTel Collector independently from the application, the Collector will not have visibility into application-specific state. Because of this, the OTel team generally recommends deploying the Collector as a sidecar, so you can benefit from all the functionality the Collector provides.
Conclusion
The AWS Distro for OpenTelemetry provides developers with a powerful mechanism to instrument applications. Traces and metrics can be sent to AWS services such as AMP, Amazon Managed Service for Grafana (AMG), or Amazon CloudWatch. Because of the open nature of ADOT and the OpenTelemetry project, you can also send telemetry data to other third-party ISVs. This flexibility gives developers many choices in building an observability stack.
As described in this blog post, there are two main patterns for deploying ADOT and the AWS OTel Collector. The pattern you select has implications for how you deploy and manage your workload, as both the sidecar and ECS service patterns have advantages and disadvantages.