Distributed tracing with OpenTelemetry
These days, more and more systems are deployed as a set of services running in containers. You may already be using services like Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS) to get started quickly with container workloads. Splitting a system into services provides a separation of concerns that lets teams operate independently and with greater velocity. While getting containers up and running quickly is possible, observability suffers as the number of services grows, especially when many of them are involved in handling an individual request.
How can we keep an eye on all the services that a request went through? This is where distributed tracing comes in. Tracing links together the processing done by each service that handles a request, even as the request crosses network boundaries between containers. To dive deep into distributed tracing, read the Dapper paper, which introduces the concept of tracing and forms the basis for current systems.
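To make the idea of linking work across services more concrete, here is a minimal sketch in plain Java of how a trace identifier can travel with a request. It assumes the W3C Trace Context `traceparent` header layout (version, trace ID, span ID, flags). The class and method names are hypothetical, and this is not the OpenTelemetry API.

```java
import java.security.SecureRandom;

// Hypothetical illustration of trace context propagation; not the OpenTelemetry API.
final class TraceContext {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;   // 32 hex characters, shared by every span in one trace
    final String spanId;    // 16 hex characters, unique to one unit of work
    final boolean sampled;  // whether this trace should be recorded

    TraceContext(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    // Starts a brand-new trace, as the first service to see a request would.
    static TraceContext newRoot() {
        return new TraceContext(randomHex(16), randomHex(8), true);
    }

    // Creates a span in the same trace: same trace ID, fresh span ID.
    TraceContext newChild() {
        return new TraceContext(traceId, randomHex(8), sampled);
    }

    // Serializes the context in the assumed layout: version-traceId-spanId-flags.
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    // Parses a header value received from an upstream service.
    static TraceContext fromHeader(String header) {
        String[] parts = header.trim().split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 1) == 1;
        return new TraceContext(parts[1], parts[2], sampled);
    }

    private static String randomHex(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TraceContext frontend = TraceContext.newRoot();
        String header = frontend.toHeader();  // sent along with the outgoing request
        TraceContext backend = TraceContext.fromHeader(header).newChild();
        System.out.println("frontend: " + header);
        System.out.println("backend:  " + backend.toHeader());
    }
}
```

Because both services report spans carrying the same trace ID, a tracing backend can stitch them into a single trace even though they ran in different containers.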
Many solutions for tracing are available on the market. You may have seen AWS X-Ray, our fully managed service for tracing. Alternatively, perhaps you already have in-house tracing infrastructure set up with Zipkin. Maybe you are running with a full APM vendor such as Datadog. There is no one-size-fits-all solution to tracing—one of these solutions may fit your current workload better than others. Our goal is that AWS customers can get a great experience from any tracing solution. We want to support you in facilitating a reliable service for your own users. To that end, we are working with OpenTelemetry, a popular open source observability project, to bring you a one-stop shop for your AWS tracing needs.
How OpenTelemetry works with tracing
Instrumentation is the process of collecting telemetry information from an application. For example, when Spring is instrumented, timings and HTTP request details for each server request can be sent to tracing backends. Similarly, when Java JDBC is instrumented, timings and the queries sent to an SQL database can be captured. Instrumentation is a key component of observability. Without it, monitoring systems have no data to work with to help debug issues.
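As a rough illustration of what instrumentation does, the sketch below wraps a unit of work, times it, and records a few descriptive attributes. Every name here is hypothetical, the attribute keys and route are made up for the example, and the in-memory list merely stands in for an exporter. Real instrumentation libraries do considerably more.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of what instrumentation records; not the OpenTelemetry API.
final class SimpleInstrumentation {

    // One finished unit of work: what happened, how long it took, and extra details.
    static final class SpanRecord {
        final String name;
        final long durationNanos;
        final Map<String, String> attributes;

        SpanRecord(String name, long durationNanos, Map<String, String> attributes) {
            this.name = name;
            this.durationNanos = durationNanos;
            this.attributes = attributes;
        }
    }

    // Stands in for an exporter that would ship records to a tracing backend.
    private final List<SpanRecord> finished = new ArrayList<>();

    // Runs the work, timing it and recording a span even if the work throws.
    <T> T trace(String name, Map<String, String> attributes, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            finished.add(new SpanRecord(name, elapsed, new LinkedHashMap<>(attributes)));
        }
    }

    List<SpanRecord> finishedSpans() {
        return finished;
    }

    public static void main(String[] args) {
        SimpleInstrumentation instrumentation = new SimpleInstrumentation();
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("http.method", "GET");         // illustrative attribute key
        attributes.put("http.route", "/owners/find"); // illustrative route
        String body = instrumentation.trace("GET /owners/find", attributes, () -> "owner list");
        System.out.println("handler returned: " + body);
        for (SpanRecord span : instrumentation.finishedSpans()) {
            System.out.println(span.name + " took " + span.durationNanos + " ns " + span.attributes);
        }
    }
}
```

The application code inside the lambda stays unaware of tracing. The wrapper gathers the telemetry, which is the role an instrumented framework plays on behalf of your handlers.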
Until now, it was common for each tracing solution to provide its own way of instrumenting an application. This resulted in uneven instrumentation coverage across solutions. X-Ray users know that the coverage of our SDKs has not been as high as it should be. Additionally, switching solutions to compare them is difficult, even though using multiple solutions simultaneously often makes sense.
Enter OpenTelemetry. OpenTelemetry is a collaborative effort by tracing solution providers to offer a common ground for instrumentation. Instead of each provider having its own solution for instrumentation, they support the common format defined by OpenTelemetry. At its core, OpenTelemetry is a specification describing how instrumentation should behave, so that behavior is consistent across implementations. There are also language-specific OpenTelemetry projects, such as those for Java and Python. These projects create an instrumentation framework (the OpenTelemetry SDK) along with integrations with popular libraries, such as Spring, gRPC, or Flask. Apply instrumentation once, and you have access to whichever tracing providers fit your needs.
We want to make sure that customers have a great experience when using OpenTelemetry. We are committed to working upstream with OpenTelemetry to make sure that users get a full experience on AWS. This includes support for detecting AWS metadata, such as an EC2 instance ID or an Amazon ECS cluster name. OpenTelemetry fully supports the AWS tracing header, which propagates information about a request between AWS managed services. This enables us to see a complete trace, even when the request passes through managed services such as Amazon API Gateway and AWS Lambda. We’ve packaged together AWS Distro for OpenTelemetry, which is open source under the Apache 2.0 License and comes preconfigured to enable these components and export data to AWS X-Ray and Amazon CloudWatch.
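To give a feel for what the AWS tracing header carries, here is a small sketch that parses a value in the documented X-Ray format, where the `X-Amzn-Trace-Id` header holds semicolon-separated `Root`, `Parent`, and `Sampled` fields. The class name is hypothetical and the sample value is illustrative. With the agent or the SDK in place, this parsing is done for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading an X-Amzn-Trace-Id value; not part of any AWS or OpenTelemetry library.
final class AwsTraceHeader {

    // Splits "Root=...;Parent=...;Sampled=..." into its named fields.
    static Map<String, String> parse(String headerValue) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : headerValue.split(";")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue;  // skip empty or malformed segments
            }
            fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
        }
        return fields;
    }

    // A trace is recorded only when the upstream service marked it as sampled.
    static boolean isSampled(Map<String, String> fields) {
        return "1".equals(fields.get("Sampled"));
    }

    public static void main(String[] args) {
        // Illustrative value in the X-Ray header format.
        String value = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1";
        Map<String, String> fields = parse(value);
        System.out.println("trace:   " + fields.get("Root"));
        System.out.println("parent:  " + fields.get("Parent"));
        System.out.println("sampled: " + isSampled(fields));
    }
}
```

The `Root` field identifies the whole trace and `Parent` identifies the calling segment, which is how a request that passes through a managed service can still be joined into one trace.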
Getting started with OpenTelemetry and Java
The easiest way to get started with OpenTelemetry and a Java application is to use the OpenTelemetry Java Agent. The Java Agent is an independent Java binary that automatically finds usages of supported libraries in an application and instruments them with OpenTelemetry. No code changes are needed. To register it, pass the -javaagent flag to the JVM when starting the application.
If you have a JAR application already, use that. If not, let’s use the Spring PetClinic application as an example.
First, build the application:
Then, download the latest OpenTelemetry Java Agent:
We use Zipkin as the backend for this demo. Start it up in a separate terminal with Docker:
Now, let’s start the application. Note that we must set the OTEL_RESOURCE_ATTRIBUTES variable to give a name to our service (the node within the larger set of services handling a request). We’ve set the exporter to Zipkin for this demo:
Next, open http://localhost:8080/ in a browser. Try navigating to pages, for example Find Owners.
After playing with the UI, open up the Zipkin UI at http://localhost:9411/. Pressing the Run Query button shows many traces for this application. Open up a trace to see its details. The Java agent instruments all this automatically so we don’t have to.
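For reference, the OTEL_RESOURCE_ATTRIBUTES variable used when starting the application holds a comma-separated list of key=value pairs, with the service name given under the service.name key. The sketch below shows how such a value could be read. The class name, the fallback value, and the sample attributes are all hypothetical, and the agent performs this step itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for an OTEL_RESOURCE_ATTRIBUTES-style value; the agent does this internally.
final class ResourceAttributes {

    // Turns "key1=value1,key2=value2" into an ordered map.
    static Map<String, String> parse(String raw) {
        Map<String, String> attributes = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return attributes;
        }
        for (String pair : raw.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;  // ignore entries without a key
            }
            attributes.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return attributes;
    }

    // Returns the service name, or the supplied fallback when none was configured.
    static String serviceName(Map<String, String> attributes, String fallback) {
        return attributes.getOrDefault("service.name", fallback);
    }

    public static void main(String[] args) {
        // Reads the real variable if present; otherwise uses an illustrative value.
        String raw = System.getenv("OTEL_RESOURCE_ATTRIBUTES");
        if (raw == null) {
            raw = "service.name=petclinic,deployment.environment=demo";
        }
        Map<String, String> attributes = parse(raw);
        System.out.println("service: " + serviceName(attributes, "unnamed-service"));
        System.out.println("all attributes: " + attributes);
    }
}
```

Every span exported by the application carries these attributes, which is what lets the Zipkin UI group traces under the service name.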
If we’re interested in seeing things in action for a complex setup with many backends, we have a playground application. If we send it a request and open up X-Ray, we’ll see a trace like this:
Spring, gRPC, MySQL, Redis, the AWS SDK, and others show up, all automatically instrumented by the agent. A list of supported instrumentation is available in the opentelemetry-java-instrumentation repository. If your favorite framework is missing, let us know or open a pull request.
AWS Distro for OpenTelemetry would not be possible without help from our friends in the OpenTelemetry community. We consider the AWS OpenTelemetry project to be an extension of the OpenTelemetry project itself. Our goal is to share as much as possible, so if something seems hidden away, let us know. Our entire project is on GitHub, and we use it for everything, including:
- Code development
- Design docs
- Issue tracking
- Integration testing
We encourage you to join the community.
We hope this can help provide insight into your distributed systems. Should you be interested in contributing to an open source project, find the OpenTelemetry repository for your favorite language and review issues marked “help wanted.”