AWS Open Source Blog

Migrating X-Ray tracing to AWS Distro for OpenTelemetry

With containerized microservices, we face the challenge of telling where along the request path things happen and of drilling into signals efficiently. As a developer, you don’t want to fly blind, and one popular way to provide these insights is distributed tracing. In this post, we walk through migrating an AWS X-Ray distributed tracing setup to AWS Distro for OpenTelemetry on Amazon Elastic Kubernetes Service (Amazon EKS).

AWS Distro for OpenTelemetry (ADOT) is the AWS distribution of the Cloud Native Computing Foundation (CNCF) OpenTelemetry project. ADOT enables you to use a standardized set of open source APIs, SDKs, and agents to instrument applications once and collect signals for multiple analytics solutions.

In this post, we focus on the telemetry aspect of distributed traces and their consumption in AWS X-Ray. We assume that you’re already using X-Ray and want to migrate to ADOT. Because we use Amazon EKS, you should be familiar with Kubernetes. Although we demonstrate the setup in a Kubernetes environment, you can achieve the same in Amazon Elastic Container Service (Amazon ECS).

The target setup, with ADOT-enabled tracing for X-Ray, looks as follows:

ADOT tracing setup

The preceding diagram shows the ADOT Collector (see the following Background section for details) running as a sidecar of an application and sending traces to X-Ray. To allow the ADOT Collector to write to X-Ray with least privileges, we use an Amazon EKS feature called IAM roles for service accounts (IRSA).
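With IRSA, the EKS pod identity webhook injects a role ARN and the path of a projected web identity token into pods that run under an IAM-annotated service account, and the AWS SDK credential chain exchanges that token for temporary credentials. As a quick sanity check you might run inside the collector's pod, the following Python sketch looks for the two environment variables IRSA sets, `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. The helper name `missing_irsa_vars` is hypothetical; if both variables are missing, the pod is most likely not using the IRSA-enabled service account.

```python
import os
from typing import List, Mapping

# Environment variables the EKS pod identity webhook injects into pods whose
# service account is annotated with an IAM role (IRSA).
IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the IRSA variables that are absent or empty in `env`."""
    return [name for name in IRSA_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_irsa_vars()
    if missing:
        print("IRSA not configured, missing:", ", ".join(missing))
    else:
        print("IRSA configured with role", os.environ["AWS_ROLE_ARN"])
```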

Before we get into the details, let’s step back a bit and make sure we’re clear on the terms used in OpenTelemetry.

Background

A collector is a set of components that collect and process telemetry, such as traces, emitted by instrumented applications. The collector can aggregate data, perform smart sampling, and export traces to one or more tracing backends. It also allows further processing of the collected telemetry, such as adding attributes or scrubbing personal information.

Within a collector, you can define one or more pipelines. Each pipeline defines the path the data follows through one or more of the following components:

  • A receiver is how data gets into the OpenTelemetry collector. Generally, a receiver accepts data in a specified format, translates it into the internal format and passes it to one or more processors.
  • A processor can transform the data before forwarding it; for example, it can drop data or add to it, and then forward it to an exporter.
  • An exporter typically forwards the data it gets to a destination, such as over the network to a backend like X-Ray or to a local file.
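To make the receiver, processor, and exporter flow concrete, here is a toy Python model of a single traces pipeline. It is not the collector's code or configuration format (the real collector is written in Go and configured in YAML); `Span`, `scrub`, `drop_named`, `run_pipeline`, and the attribute names are hypothetical. It shows the two kinds of processing described above: one processor drops data, the other scrubs personal information before the span reaches the exporter.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Span:
    """Toy span with only the fields this sketch needs (not the OTel data model)."""
    name: str
    attributes: dict = field(default_factory=dict)


# A processor returns the (possibly modified) span, or None to drop it.
Processor = Callable[[Span], Optional[Span]]


def scrub(keys: Iterable[str]) -> Processor:
    """Build a processor that removes attributes holding personal information."""
    blocked = set(keys)

    def process(span: Span) -> Span:
        span.attributes = {k: v for k, v in span.attributes.items() if k not in blocked}
        return span

    return process


def drop_named(name: str) -> Processor:
    """Build a processor that drops spans with the given name, e.g. health checks."""
    return lambda span: None if span.name == name else span


def run_pipeline(received: Iterable[Span], processors: List[Processor],
                 export: Callable[[Span], None]) -> None:
    """Send each received span through the processors in order, then export it."""
    for span in received:
        for process in processors:
            span = process(span)
            if span is None:
                break  # a processor dropped the span, so skip the exporter
        else:
            export(span)


if __name__ == "__main__":
    exported = []
    spans = [
        Span("GET /orders", {"user.email": "jane@example.com", "http.status_code": 200}),
        Span("GET /health"),
    ]
    run_pipeline(spans, [drop_named("GET /health"), scrub(["user.email"])], exported.append)
    print(exported)  # [Span(name='GET /orders', attributes={'http.status_code': 200})]
```

The health-check span never reaches the exporter, and the exported span no longer carries the email attribute. Ordering matters: processors run in the sequence the pipeline lists them.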

In the context of distributed tracing, we also use the following terms:

A trace in OpenTelemetry can be thought of as a directed acyclic graph (DAG) of spans, where the edges between spans represent parent/child relationships. A span encapsulates a single operation: it has start and finish timestamps, attributes (key-value pairs), and zero or more events.
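The parent/child edges are what let a backend such as X-Ray reassemble a request path from individual spans. The Python sketch below uses a hypothetical, simplified `Span` record (IDs, a name, and times in milliseconds, not the real OpenTelemetry data model), indexes spans by their parent ID, and walks the edges depth-first from the root. It assumes the trace has exactly one root span.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """Toy span: IDs, a name, and start/end times in milliseconds."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int


def build_tree(spans: List[Span]) -> Tuple[Span, Dict[str, List[Span]]]:
    """Return the root span and an index from parent span ID to its children."""
    children: Dict[str, List[Span]] = defaultdict(list)
    roots = []
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0], dict(children)


def render(span: Span, children: Dict[str, List[Span]], depth: int = 0) -> Iterator[str]:
    """Yield one indented line per span, walking parent/child edges depth-first."""
    yield f"{'  ' * depth}{span.name} ({span.end_ms - span.start_ms} ms)"
    for child in sorted(children.get(span.span_id, []), key=lambda s: s.start_ms):
        yield from render(child, children, depth + 1)


if __name__ == "__main__":
    trace = [
        Span("a1", None, "GET /checkout", 0, 120),
        Span("b2", "a1", "SELECT cart", 10, 60),
        Span("c3", "a1", "call payment service", 70, 110),
        Span("d4", "c3", "POST /charge", 75, 105),
    ]
    root, children = build_tree(trace)
    print("\n".join(render(root, children)))
```

Running it prints the root `GET /checkout (120 ms)` followed by its children `SELECT cart (50 ms)` and `call payment service (40 ms)`, with `POST /charge (30 ms)` nested one level further under the payment call.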

As part of ADOT, we maintain the AWS OTel Collector, whose default configuration lets you send traces to X-Ray and metrics to Amazon CloudWatch:

AWS OTEL collector

With terminology and the relevant ADOT components out of the way, let’s see it in action.

Preparation

We want to use eksctl to set up an EKS cluster that can send traces to X-Ray using ADOT. For this, we first define a cluster configuration (see the eksctl configuration file docs for more on this) in a file called cluster-config.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: adotxray
  region: eu-west-1
  version: '1.18'
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: xray
      namespace: adot
      labels: {aws-usage: "application"}
    attachPolicy:
      Version: "2012-10-17"
      Statement:
      - Effect: Allow
        Action:
        - "logs:PutLogEvents"
        - "logs:CreateLogGroup"
        - "logs:CreateLogStream"
        - "logs:DescribeLogStreams"
        - "logs:DescribeLogGroups"
        - "xray:PutTraceSegments"
        - "xray:PutTelemetryRecords"
        - "xray:GetSamplingRules"
        - "xray:GetSamplingTargets"
        - "xray:GetSamplingStatisticSummaries"
        - "ssm:GetParameters"
        Resource: '*'
managedNodeGroups:
- name: default-ng
  minSize: 1
  maxSize: 3
  desiredCapacity: 2
  ssh:
    allow: true
    publicKeyPath: ~/.ssh/work-default.pub
  labels: {role: mngworker}
  iam:
    withAddonPolicies:
      imageBuilder: true
      autoScaler: true
      externalDNS: true
      certManager: true
      ebs: true
      albIngress: true
      cloudWatch: true
cloudWatch:
  clusterLogging:
    enableTypes: ["*"]

With this file in place, we have everything we need to create the EKS cluster using the following command:

$ eksctl create cluster -f cluster-config.yaml 
[ℹ]  eksctl version 0.33.0
[ℹ]  using region eu-west-1
[ℹ]  setting availability zones to [eu-west-1a eu-west-1c eu-west-1b]
...
[ℹ]  building iamserviceaccount stack "eksctl-adotxray-addon-iamserviceaccount-kube-system-aws-node"
[ℹ]  building iamserviceaccount stack "eksctl-adotxray-addon-iamserviceaccount-adot-xray"
[ℹ]  deploying stack "eksctl-adotxray-addon-iamserviceaccount-kube-system-aws-node"
[ℹ]  deploying stack "eksctl-adotxray-addon-iamserviceaccount-adot-xray"
...
[✔]  EKS cluster "adotxray" in "eu-west-1" region is ready
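The policy attached to the xray service account mixes CloudWatch Logs, X-Ray, and SSM actions. If you want to double-check the X-Ray part, the following Python sketch compares a policy document (for example, the attachPolicy block of cluster-config.yaml loaded into a dict) against the actions granted by the AWS managed policy AWSXRayDaemonWriteAccess. The function name `missing_xray_actions` is hypothetical, and the check is deliberately simple: it lowercases names because IAM action names are case-insensitive, recognizes only the `xray:*` and `*` wildcards, and ignores Deny statements, conditions, and resources.

```python
from typing import Any, Dict, Set

# Actions granted by the AWS managed policy AWSXRayDaemonWriteAccess.
XRAY_WRITE_ACTIONS = {
    "xray:PutTraceSegments",
    "xray:PutTelemetryRecords",
    "xray:GetSamplingRules",
    "xray:GetSamplingTargets",
    "xray:GetSamplingStatisticSummaries",
}


def missing_xray_actions(policy: Dict[str, Any]) -> Set[str]:
    """Return the X-Ray write actions that no Allow statement in `policy` grants."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.update(action.lower() for action in actions)
    if allowed & {"xray:*", "*"}:
        return set()
    return {action for action in XRAY_WRITE_ACTIONS if action.lower() not in allowed}
```

An empty result means all five actions are granted under their exact IAM names; a misspelled action name shows up as missing.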

With the infrastructure set up, let’s move on to the app level.

Sending traces to X-Ray

As mentioned at the beginning, we want to run the ADOT Collector as a sidecar to our app. We use the public collector image served via Amazon ECR Public, together with a sample app that is instrumented with ADOT to emit traces. Store the following in a file called app.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: adot-trace
  namespace: adot
spec:
  selector:
    matchLabels:
      app: sample
  replicas: 1
  template:
    metadata:
      labels:
        app: sample
    spec:
      serviceAccountName: xray  # IRSA-enabled service account created by eksctl
      containers:
        - name: trace-emitter
          image: public.ecr.aws/g9c4k4i4/trace-emitter:1
          env:
          - name: OTEL_OTLP_ENDPOINT
            value: "localhost:55680"
          - name: OTEL_RESOURCE_ATTRIBUTES
            value: "service.namespace=AWSObservability,service.name=ADOTEmitService"
          - name: S3_REGION
            value: "eu-west-1"
          imagePullPolicy: Always
        - name: adot-collector
          image: public.ecr.aws/aws-observability/aws-otel-collector:latest
          env:
            - name: AWS_REGION
              value: "eu-west-1"
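The trace-emitter container is configured entirely through environment variables. OTEL_RESOURCE_ATTRIBUTES holds resource attributes as comma-separated key=value pairs, which is how service.namespace and service.name end up attached to the telemetry the app exports. To illustrate that format (this is not code from the sample app), here is a small Python parser; `parse_resource_attributes` is a hypothetical name, and it assumes values may be percent-encoded and silently skips malformed entries.

```python
from typing import Dict
from urllib.parse import unquote


def parse_resource_attributes(value: str) -> Dict[str, str]:
    """Parse an OTEL_RESOURCE_ATTRIBUTES string such as 'k1=v1,k2=v2' into a dict."""
    attributes = {}
    for pair in value.split(","):
        key, sep, val = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # skip entries without '=' or with an empty key
        attributes[key] = unquote(val.strip())  # decode percent-encoded values
    return attributes


if __name__ == "__main__":
    print(parse_resource_attributes(
        "service.namespace=AWSObservability,service.name=ADOTEmitService"))
```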

Now launch the app using kubectl apply -f app.yaml, then check the logs of the ADOT Collector container. You should see something like the following:

$ kubectl -n adot logs pod/adot-trace-b45bdbdd9-2zqwl adot-collector
AWS OTel Collector version: v0.4.0
2020-12-11T11:40:09.521Z        INFO    service/service.go:397  Starting AWS OTel Collector...  {"Version": "v0.4.0", "GitHash": "be64e63fbd972170e024cbf10d41b7fad0e94394", "NumCPU": 2}
2020-12-11T11:40:09.522Z        INFO    service/service.go:241  Setting up own telemetry...
2020-12-11T11:40:09.523Z        INFO    service/telemetry.go:101        Serving Prometheus metrics      {"address": "localhost:8888", "level": 0, "service.instance.id": "31343a96-ae2c-4333-9f30-d33f2e5a5f62"}
2020-12-11T11:40:09.523Z        INFO    service/service.go:278  Loading configuration...
2020-12-11T11:40:09.525Z        INFO    service/service.go:289  Applying configuration...
2020-12-11T11:40:09.525Z        INFO    service/service.go:310  Starting extensions...
2020-12-11T11:40:09.525Z        INFO    builder/extensions_builder.go:53        Extension is starting...        {"component_kind": "extension", "component_type": "health_check", "component_name": "health_check"}
2020-12-11T11:40:09.525Z        INFO    healthcheckextension/healthcheckextension.go:40 Starting health_check extension {"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "config": {"TypeVal":"health_check","NameVal":"health_check","Port":13133}}
2020-12-11T11:40:09.525Z        INFO    builder/extensions_builder.go:59        Extension started.      {"component_kind": "extension", "component_type": "health_check", "component_name": "health_check"}
2020-12-11T11:40:09.525Z        INFO    builder/exporters_builder.go:306        Exporter is enabled.    {"component_kind": "exporter", "exporter": "awsxray"}
2020-12-11T11:40:09.526Z        INFO    builder/exporters_builder.go:306        Exporter is enabled.    {"component_kind": "exporter", "exporter": "awsemf"}
2020-12-11T11:40:09.526Z        INFO    service/service.go:325  Starting exporters...
2020-12-11T11:40:09.526Z        INFO    builder/exporters_builder.go:92 Exporter is starting... {"component_kind": "exporter", "component_type": "awsxray", "component_name": "awsxray"}
2020-12-11T11:40:09.526Z        INFO    builder/exporters_builder.go:97 Exporter started.       {"component_kind": "exporter", "component_type": "awsxray", "component_name": "awsxray"}
2020-12-11T11:40:09.526Z        INFO    builder/exporters_builder.go:92 Exporter is starting... {"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf"}
2020-12-11T11:40:09.526Z        INFO    builder/exporters_builder.go:97 Exporter started.       {"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf"}
2020-12-11T11:40:09.526Z        INFO    builder/pipelines_builder.go:207        Pipeline is enabled.    {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2020-12-11T11:40:09.526Z        INFO    builder/pipelines_builder.go:207        Pipeline is enabled.    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2020-12-11T11:40:09.526Z        INFO    service/service.go:338  Starting processors...
2020-12-11T11:40:09.526Z        INFO    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2020-12-11T11:40:09.526Z        INFO    builder/pipelines_builder.go:61 Pipeline is started.    {"pipeline_name": "traces", "pipeline_datatype": "traces"}
2020-12-11T11:40:09.526Z        INFO    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2020-12-11T11:40:09.526Z        INFO    builder/pipelines_builder.go:61 Pipeline is started.    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2020-12-11T11:40:09.526Z        INFO    builder/receivers_builder.go:235        Receiver is enabled.    {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp", "datatype": "traces"}
2020-12-11T11:40:09.526Z        INFO    builder/receivers_builder.go:235        Receiver is enabled.    {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp", "datatype": "metrics"}
2020-12-11T11:40:09.526Z        INFO    awsxrayreceiver@v0.14.1-0.20201111210848-994cabe5d596/receiver.go:61    Going to listen on endpoint for X-Ray segments  {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray", "udp": "0.0.0.0:2000"}
2020-12-11T11:40:09.526Z        INFO    udppoller/poller.go:105 Listening on endpoint for X-Ray segments        {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray", "udp": "0.0.0.0:2000"}
2020-12-11T11:40:09.526Z        INFO    awsxrayreceiver@v0.14.1-0.20201111210848-994cabe5d596/receiver.go:73    Listening on endpoint for X-Ray segments        {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray", "udp": "0.0.0.0:2000"}
2020-12-11T11:40:09.526Z        INFO    builder/receivers_builder.go:235        Receiver is enabled.    {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray", "datatype": "traces"}
2020-12-11T11:40:09.526Z        INFO    service/service.go:350  Starting receivers...
2020-12-11T11:40:09.526Z        INFO    builder/receivers_builder.go:70 Receiver is starting... {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-11T11:40:09.536Z        INFO    builder/receivers_builder.go:75 Receiver started.       {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-11T11:40:09.536Z        INFO    builder/receivers_builder.go:70 Receiver is starting... {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray"}
2020-12-11T11:40:09.536Z        INFO    awsxrayreceiver@v0.14.1-0.20201111210848-994cabe5d596/receiver.go:98    X-Ray TCP proxy server started  {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray"}
2020-12-11T11:40:09.537Z        INFO    builder/receivers_builder.go:75 Receiver started.       {"component_kind": "receiver", "component_type": "awsxray", "component_name": "awsxray"}
2020-12-11T11:40:09.537Z        INFO    healthcheck/handler.go:128      Health Check state change       {"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "status": "ready"}
2020-12-11T11:40:09.537Z        INFO    service/service.go:253  Everything is ready. Begin running and processing data.
2020-12-11T11:41:09.526Z        WARN    awsemfexporter@v0.14.1-0.20201117192543-4a81c809e720/metric_translator.go:241   Unhandled metric data type.     {"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "DataType": "None", "Name": "processedSpans", "Unit": "1"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter.getCWMetrics
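If you'd rather check this from a script than by eye, for example in a smoke test after kubectl apply, the following Python sketch extracts which components reported "started." from log lines in the format shown above. It relies on the message text and the trailing JSON fields of this particular collector version (v0.4.0), which may change between releases; `started_components` and `tracing_ready` are hypothetical helpers.

```python
import json
import re
from typing import Iterable, Set, Tuple

# Messages the collector build shown above (v0.4.0) logs once a component is up.
STARTED = re.compile(r"\b(?:Exporter|Receiver|Extension) started\.")


def started_components(log_lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (component_kind, component_name) pairs for every 'started.' entry."""
    started = set()
    for line in log_lines:
        if not STARTED.search(line) or "{" not in line:
            continue
        # The structured fields are the JSON object at the end of the line.
        fields = json.loads(line[line.index("{"):])
        started.add((fields["component_kind"], fields["component_name"]))
    return started


def tracing_ready(log_lines: Iterable[str]) -> bool:
    """True if both the OTLP receiver and the X-Ray exporter reported as started."""
    required = {("receiver", "otlp"), ("exporter", "awsxray")}
    return required <= started_components(log_lines)
```

A script could pipe the kubectl logs output into `tracing_ready(sys.stdin)` and exit non-zero when it returns False.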

That’s all it takes to send traces using ADOT. Now head over to the X-Ray console and check out the service map:

X-Ray service map

You can also explore the traces using trace analytics:

X-Ray trace analytics

Now that you’ve seen the setup in practice, you may be wondering what’s next. Let’s have a look at a few resources to get you started.

As a developer, you’re first and foremost interested in instrumenting your services. Depending on the programming language you use, SDKs are available at various stages of maturity: you can already use the Java and Go SDKs in production environments. If you’re interested in Java in particular, we recommend reading the blog post Distributed Tracing using AWS Distro for OpenTelemetry. The JavaScript and Python SDKs are a work in progress, and we’re working on making them production ready in the coming months.
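One detail worth knowing when instrumenting services for X-Ray: an X-Ray trace ID has the form 1-<8 hex digits>-<24 hex digits>, where the 8 hex digits encode the Unix epoch time, in seconds, of the original request, while an OpenTelemetry trace ID is simply 32 hex digits (128 bits). The ADOT SDKs offer an X-Ray ID generator that puts such a timestamp in the first 32 bits, so the two formats map onto each other. The Python sketch below only illustrates that mapping; the function names are hypothetical and the code is not taken from any SDK.

```python
import re
import secrets
import time

# X-Ray format: version "1", 8 hex digits of epoch seconds, 24 hex digits of randomness.
XRAY_TRACE_ID = re.compile(r"1-([0-9a-f]{8})-([0-9a-f]{24})")
OTEL_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def otel_to_xray(trace_id: str) -> str:
    """Format a 32-hex-digit OpenTelemetry trace ID as an X-Ray trace ID."""
    if not OTEL_TRACE_ID.fullmatch(trace_id):
        raise ValueError(f"not a 32-digit lowercase hex trace ID: {trace_id!r}")
    return f"1-{trace_id[:8]}-{trace_id[8:]}"


def xray_to_otel(xray_id: str) -> str:
    """Strip the version prefix and hyphens to get the 32-hex-digit form back."""
    match = XRAY_TRACE_ID.fullmatch(xray_id)
    if match is None:
        raise ValueError(f"not an X-Ray trace ID: {xray_id!r}")
    return match.group(1) + match.group(2)


def new_xray_compatible_trace_id() -> str:
    """Epoch seconds in the first 32 bits, followed by 96 random bits."""
    return f"{int(time.time()):08x}{secrets.token_hex(12)}"
```

For example, `otel_to_xray("58406520a006649127e371903a2de979")` returns `"1-58406520-a006649127e371903a2de979"`, and `xray_to_otel` reverses it.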

In this post, we demonstrated how to send traces from an ADOT-enabled app to X-Ray. We hope this helps you start migrating your traces to ADOT and, going forward, your metrics and logs too. This is a fast-moving space, so keep an eye out for more posts in this direction. Learn more about AWS Distro for OpenTelemetry and how to get started with tracing on different compute services in our developer portal. You can connect with us and provide feedback in our forum.

Michael Hausenblas

Michael is a Principal Product Developer Advocate in the AWS container service team. He covers observability, Kubernetes, service meshes, as well as container security and policies. Before Amazon, Michael worked at Red Hat, Mesosphere (now D2iQ), MapR (now part of HPE), and in two applied research institutions in Ireland and Austria.