AWS adds Prometheus Remote Write Exporter to OpenTelemetry Collector

In this post, AWS intern Yang Hu describes how he made his first engineering contributions to the popular open source observability project OpenTelemetry. His contributions included adding a Prometheus Remote Write Exporter to the OpenTelemetry Collector. This exporter enables you to send system metrics generated from the OpenTelemetry API, Prometheus instrumented libraries, or other sources to a variety of Prometheus remote write integrated backends, including Cortex, Thanos, and InfluxDB. Users can visualize or configure alarms for the exported metrics to monitor the health of their services, improve performance, and detect anomalies. This post details the data path in OpenTelemetry, how this new exporter works, how to use it, and the lessons learned along the way.

The OpenTelemetry project’s vision has two parts. The first is to create an industry-wide, adoptable open standard for telemetry data. The second is to provide tools that support metrics, tracing, and logs. The OpenTelemetry APIs and SDKs, implemented in different languages, provide solutions for telemetry generation and export. You can instrument your application with the APIs and send metrics data to Prometheus, Jaeger, or the OpenTelemetry Collector using exporters attached to the SDKs.

The OpenTelemetry Collector complements the APIs and the SDKs. The Collector is an executable that receives telemetry data, optionally transforms it, and sends the data further. It supports several popular open source protocols for sending and receiving telemetry data, and offers a pluggable architecture for adding more protocols. This pluggable architecture enables the Collector to decouple metric generation from exporting to a backend. It can support a new telemetry-generating service or telemetry-receiving backend without requiring you to re-implement a language-specific exporter or redeploy application binaries. The Prometheus Remote Write Exporter that I added to the Collector enables users to export metrics from existing applications to Cortex without changing application code or redeploying. In theory, it could also export to Thanos, InfluxDB, M3DB, or any other Prometheus remote write integrated backend.

In the Collector, pipelines handle data receiving, transformation, and sending. Users can configure the Collector to have one or more pipelines. Each pipeline includes:

A set of Receivers that receive the data.
A series of optional Processors that transform data from receivers.
A set of Exporters that send the data from the Processors to destinations outside of the Collector.

The same Receiver can feed data to multiple pipelines and multiple pipelines can feed data into the same Exporter. The following diagram demonstrates the data flow through the Collector pipelines.

Diagram illustrating the data flow through the OpenTelemetry Collector.
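
The pipeline model described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Collector itself is written in Go, and the names Pipeline, Receiver, drop_debug, and add_env are hypothetical, not Collector APIs. The sketch shows one receiver feeding two pipelines, and both pipelines feeding the same exporter.

```python
from typing import Callable, List

Metric = dict  # hypothetical stand-in for one telemetry data point
Processor = Callable[[List[Metric]], List[Metric]]
Exporter = Callable[[List[Metric]], None]


class Pipeline:
    """Applies processors in order, then fans out to every exporter."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]):
        self.processors = processors
        self.exporters = exporters

    def consume(self, batch: List[Metric]) -> None:
        for process in self.processors:
            batch = process(batch)
        for export in self.exporters:
            export(batch)


class Receiver:
    """Hands every received batch to each pipeline it is attached to."""

    def __init__(self, pipelines: List[Pipeline]):
        self.pipelines = pipelines

    def receive(self, batch: List[Metric]) -> None:
        for pipeline in self.pipelines:
            pipeline.consume(list(batch))  # copy so pipelines stay independent


def drop_debug(batch: List[Metric]) -> List[Metric]:
    """Example processor: filter out metrics whose name starts with debug_."""
    return [m for m in batch if not m["name"].startswith("debug_")]


def add_env(batch: List[Metric]) -> List[Metric]:
    """Example processor: attach an extra attribute to every metric."""
    return [{**m, "env": "test"} for m in batch]


shared_sink: List[Metric] = []  # plays the role of a single shared exporter
pipeline_a = Pipeline([drop_debug], [shared_sink.extend])
pipeline_b = Pipeline([add_env], [shared_sink.extend])
receiver = Receiver([pipeline_a, pipeline_b])

receiver.receive([
    {"name": "requests_total", "value": 3},
    {"name": "debug_heap", "value": 1},
])
```

After the call, shared_sink holds one metric from the first pipeline (the debug metric was dropped) and two metrics from the second (both tagged with env).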

The OpenTelemetry Collector uses the factory pattern to build data pipelines. Each component has a factory that creates the component, and a configuration that defines the parameters of the component. The Collector application reads a user-provided configuration file and, for each configured component, invokes the matching registered factory to create it.
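
As a rough illustration of the factory pattern, the following Python sketch keeps a registry of factories keyed by component type and builds components from a parsed configuration. The names RemoteWriteConfig, RemoteWriteExporter, register, and build_exporters are hypothetical and exist only for this example; the real Collector code is written in Go and is structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class RemoteWriteConfig:
    """Hypothetical configuration for one exporter type."""
    endpoint: str
    namespace: str = ""


class RemoteWriteExporter:
    """Hypothetical component created from its configuration."""

    def __init__(self, config: RemoteWriteConfig):
        self.config = config


# Registry mapping a component type name to (config class, factory function).
FACTORIES: Dict[str, Tuple[type, Callable]] = {}


def register(type_name: str, config_cls: type, create: Callable) -> None:
    FACTORIES[type_name] = (config_cls, create)


def build_exporters(user_config: dict) -> dict:
    """Look up the factory for each configured exporter and invoke it."""
    built = {}
    for type_name, settings in user_config.get("exporters", {}).items():
        if type_name not in FACTORIES:
            raise KeyError(f"no factory registered for exporter type {type_name!r}")
        config_cls, create = FACTORIES[type_name]
        built[type_name] = create(config_cls(**settings))
    return built


register("prometheusremotewrite", RemoteWriteConfig, RemoteWriteExporter)

exporters = build_exporters({
    "exporters": {
        "prometheusremotewrite": {
            "endpoint": "localhost:8888",
            "namespace": "test-space",
        }
    }
})
```

The point of the design is that the application code never names a concrete exporter class; it only needs the type name that appears in the configuration file.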

Factory and configuration

To integrate the Prometheus Remote Write Exporter into Collector pipelines, I implemented a factory and a configuration struct, in addition to an exporter that transforms and sends metrics. The factory and configuration struct are invoked when the Collector application initializes. The application package has a dedicated component, the exporters builder, that invokes the various factories. During initialization, the Collector asks the exporters builder for an implementation of the MetricExporter interface. The exporters builder then invokes the factory code inside the Prometheus Remote Write Exporter package to create an instance of the exporter. When invoked, the factory uses the exporter helper package to wrap the Exporter implementation inside the MetricExporter interface, and returns the instance to the Collector application. Finally, the Collector application assembles the pipeline with the Exporter and starts all components in the pipeline. The following diagram demonstrates this process.

Diagram illustrating the flow of Prometheus Write Exporter creation.

Exporter

The actual Exporter receives data from a processor or receiver component in the pipeline. It transforms incoming metrics into a Prometheus Remote Write API compatible format. The Exporter exports transformed metrics via an HTTP request. Finally, it reports the number of successfully exported metrics to the Collector pipeline.

The evolving OpenTelemetry data definition made it a challenge to implement metric translation correctly. Internally, the Collector uses the OpenTelemetry Protocol (OTLP) metric definition as its data format; however, when development of the Prometheus Remote Write Exporter began, that definition was still under development and constantly evolving. To provide stable conversion, existing metric components asked the Collector to convert OTLP metrics into OpenCensus metrics (a legacy format) and performed their transformation from there. This was a short-term solution: the translation to OpenCensus metrics would be deleted once OTLP stabilized, and the existing metric components in the Collector would then have to be refactored. The community agreed that the next version and the stable version should not be significantly different from the existing OTLP definition; however, they could not agree on what the next version of the OTLP definition should look like. The community couldn’t decide between more complete semantics and better performance.

In this context, I implemented the Prometheus Remote Write Exporter to convert OTLP metrics directly. At the time, the OTLP definition in the Collector was equivalent to, but not the same as, OTLP Proto v0.4.0. During development, the OTLP metric definition was updated, so I refactored the Prometheus Remote Write Exporter code to support the newer definition.

I also discovered that the OpenTelemetry project lacked support for calculating cumulative metric values, and therefore did not support different export strategies for different backends.

Generally, there are two types of export strategies:

The client resets the value of each metric at the beginning of every collection interval. Then, it exports the value from that interval (delta values) to a backend, and lets the backend calculate the current value (cumulative value) of the metric.
The client maintains the state of each metric across collection intervals, and exports the cumulative value to the backend.

The following diagram illustrates the difference between the two export strategies.

Diagram illustrating the difference between exporting delta and exporting cumulative.
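
A small worked example makes the two strategies concrete. The per-interval increments below are made-up numbers for a hypothetical request counter.

```python
from itertools import accumulate

# Hypothetical increments of a request counter in four collection intervals.
increments_per_interval = [5, 3, 0, 7]

# Strategy 1: reset at the start of each interval and export the delta.
delta_exports = list(increments_per_interval)

# Strategy 2: keep state across intervals and export the running total.
cumulative_exports = list(accumulate(increments_per_interval))

# A backend that receives deltas recovers the current value by summing them.
backend_total_from_deltas = sum(delta_exports)

print(delta_exports)       # [5, 3, 0, 7]
print(cumulative_exports)  # [5, 8, 8, 15]
```

Both strategies describe the same counter: the sum of the deltas equals the last cumulative value. What differs is which side, client or backend, has to keep the running total.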

The OpenTelemetry SDK does not maintain the state of metrics when exporting data to the Collector. This choice allows the SDKs to use less memory. As a result, however, the data path through the Collector only works with receiving backends that support export strategy one, such as StatsD. The same data path does not work with backends that expect the second strategy, such as Prometheus and Cortex. This leads to a bug that Prometheus users discovered: delta values of counter metrics are exported directly to the Prometheus server as instantaneous gauge values. Because Prometheus expects incoming values to be cumulative and does not do aggregation, users see delta values when they expect cumulative values. The following diagram illustrates this bug.

Diagram illustrating the bug when exporting to Prometheus from the Collector

To overcome this gap in the data path, I proposed three different solutions:

Let the SDKs stay the same and add a metric aggregation processor to the Collector pipeline. The processor would aggregate delta values into cumulative values for cumulative backends. For the aggregation to be correct, all delta values of the same metric must be sent to the same Collector instance. This may not happen in the common use case of multiple Collector instances sitting behind a load balancer. To guarantee this condition, an extra requirement would be to set up a Collector agent next to each SDK.
Make the SDKs’ OTLP exporter support a configurable export strategy, with exporting cumulative values as the default. This way, cumulative values are exported from the source, meaning no aggregation of metric values is required in the Collector pipeline. Users who want to export to StatsD can configure the SDK to export delta values instead.
Implement both solution one and solution two. This supports all use cases, even if users are trying to export to Prometheus, and there is a requirement on memory usage of the SDKs. You could configure the SDK exporters to export delta values, and attach a Collector that contains the metric aggregation processor mentioned in solution one to each SDK.
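
The aggregation idea in solution one, and the reason it needs all deltas of a metric to reach the same Collector, can be sketched as follows. DeltaToCumulative is a hypothetical class written for this illustration, not the processor design that was filed, and the delta values are made up.

```python
from collections import defaultdict
from typing import Dict, Tuple

# A metric identity: its name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class DeltaToCumulative:
    """Hypothetical processor state: one running total per metric identity."""

    def __init__(self) -> None:
        self.totals: Dict[SeriesKey, int] = defaultdict(int)

    def process(self, name: str, labels: Dict[str, str], delta: int) -> int:
        key = (name, tuple(sorted(labels.items())))
        self.totals[key] += delta
        return self.totals[key]


deltas = [5, 3, 0, 7]
labels = {"service": "checkout"}

# One Collector (for example, an agent next to the SDK) sees every delta.
agent = DeltaToCumulative()
single = [agent.process("requests", labels, d) for d in deltas]

# Two Collectors behind a round-robin load balancer each see only half.
instances = [DeltaToCumulative(), DeltaToCumulative()]
balanced = [
    instances[i % 2].process("requests", labels, d)
    for i, d in enumerate(deltas)
]

print(single)    # [5, 8, 8, 15]
print(balanced)  # [5, 3, 5, 10]
```

With a single aggregating instance the exported values are the true cumulative totals. Behind the load balancer, each instance only accumulates the deltas it happened to receive, so the exported series is neither monotonic nor correct.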

I presented my solutions for discussion and review to the community, and filed a design for the metric aggregation processor. I wrote an OpenTelemetry Enhancement Proposal (OTEP) to explain solutions two and three. The maintainers of the OpenTelemetry Collector accepted the solutions. I documented the corresponding specification change, and implementation is forthcoming.

The following diagram illustrates different use cases that solution three addresses.

Diagram illustrating different use cases that the third solution addresses.

Once I addressed the metric transformation challenges, I moved on to implementing the transformation and exporting. With each export call, the exporter converts each OTLP metric to Prometheus TimeSeries based on the metric type, and stores the resulting time series in a map. Once all OTLP metrics from the current batch have been transformed, it sends an HTTP request to the backend. The following sequence diagram demonstrates this process.

Sequence diagram illustrating the implementation of metrics transformation and export.
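
The following Python sketch shows the general shape of this step: data points from one batch are grouped into time series in a map keyed by metric name and label set, and the map is then turned into a single request body. It is a simplified illustration, not the exporter's Go implementation. The input dictionaries and the function names are assumptions made for the example, per-type handling (for instance, expanding histograms into several series) is omitted, and the request is left as a plain dictionary rather than the encoded payload the Remote Write API expects.

```python
from typing import Dict, List, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


def series_key(name: str, labels: Dict[str, str]) -> LabelSet:
    """Identity of a time series: its labels plus the __name__ label, sorted."""
    return tuple(sorted({**labels, "__name__": name}.items()))


def to_time_series(batch: List[dict]) -> Dict[LabelSet, dict]:
    """Group every data point in the batch under its time series."""
    series: Dict[LabelSet, dict] = {}
    for metric in batch:
        for point in metric["points"]:
            key = series_key(metric["name"], point["labels"])
            entry = series.setdefault(key, {"labels": dict(key), "samples": []})
            entry["samples"].append((point["timestamp_ms"], float(point["value"])))
    return series


def build_write_request(series: Dict[LabelSet, dict]) -> dict:
    """Collect all series from the batch into one request body."""
    return {"timeseries": list(series.values())}


# Hypothetical batch: one counter with three data points over two label sets.
batch = [
    {
        "name": "requests_total",
        "points": [
            {"labels": {"method": "GET"}, "timestamp_ms": 1000, "value": 5},
            {"labels": {"method": "POST"}, "timestamp_ms": 1000, "value": 2},
            {"labels": {"method": "GET"}, "timestamp_ms": 2000, "value": 9},
        ],
    }
]

request = build_write_request(to_time_series(batch))
```

Keying the map by the full label set means that data points sharing a name and labels end up as samples of the same series, so the batch above yields two series, one of which carries two samples.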

You can enable the Prometheus Remote Write Exporter in the configuration of the OpenTelemetry Collector. The following is a sample configuration for a Collector instance that receives, batches, and exports metrics from OTLP sources using the Prometheus Remote Write Exporter.

# config.yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
    send_batch_size: 10000
    timeout: 10s

exporters:
  prometheusremotewrite:
    namespace: "test-space"
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
    endpoint: "localhost:8888"
    ca_file: "/var/lib/mycert.pem"
    write_buffer_size: 524288
    headers:
      Prometheus-Remote-Write-Version: "0.1.0"
      # Quoted so the header value is read as a string, not an integer.
      X-Scope-OrgID: "234"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
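
To get a feel for how the three retry_on_failure durations interact, the sketch below computes a possible sequence of waits between retries for the sample values (10s, 60s, 10m). It assumes a plain doubling backoff with no random jitter, which is an assumption made for illustration and not necessarily how the Collector schedules retries; backoff_schedule is a hypothetical helper.

```python
from typing import List


def backoff_schedule(
    initial_s: float,
    max_interval_s: float,
    max_elapsed_s: float,
    multiplier: float = 2.0,  # assumed growth factor, for illustration only
) -> List[float]:
    """Return the waits between retries until the time budget is used up."""
    waits: List[float] = []
    elapsed = 0.0
    interval = float(initial_s)
    while elapsed + interval <= max_elapsed_s:
        waits.append(interval)
        elapsed += interval
        # Grow the wait, but never beyond the configured maximum interval.
        interval = min(interval * multiplier, float(max_interval_s))
    return waits


# initial_interval: 10s, max_interval: 60s, max_elapsed_time: 10m
schedule = backoff_schedule(10, 60, 600)
```

Under these assumptions the waits grow from 10 seconds up to the 60-second cap and then stay there until the 10-minute budget is exhausted, after which the batch would be given up on.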

Testing

To test the Prometheus Remote Write Exporter, I initialized an OpenTelemetry Collector with the Exporter. I created an OTLP load generator and a Prometheus load generator to send metrics in different formats to the Collector. I also created an Amazon Elastic Compute Cloud (Amazon EC2) instance with Cortex running on it, and configured the Collector to export metrics data to the EC2 instance. On the querying side, I set up a Grafana instance to visualize the exported metrics. I implemented a querier that programmatically gets data from Cortex’s query API and writes the query results to a text file. This text file can then be compared against the inputs from the load generators. The following diagram illustrates my testing setup.

Diagram illustrating the author's testing setup.

Interestingly, as I was testing, I identified a bug in the Prometheus receiver of the OpenTelemetry Collector. I outlined how to fix it on GitHub.

Learning

Throughout this project, I learned a great deal about working with a large community of open source contributors and developing software. Coding is only 15% to 20% of the work. The bulk of the work is communicating changing requirements, validating assumptions, soliciting feedback, and proposing solutions to challenges. Coming from school, I was used to writing code to solve defined problems, but there is no predefined problem when working on a large-scale, evolving project. I found that critically analyzing the situation, and defining and scoping the problem, is as important as writing working code. Additionally, I embraced the tenets of open source design, which center on transparency and communication. Everything is reviewed and discussed without exception. Overall, working on OpenTelemetry was a great first step into the open source community. The experience motivates me to work on more open source projects.

References

Learn more about the OpenTelemetry Collector on GitHub.
Documented outline of backends that support the Prometheus Remote Write API.
Design for the Prometheus Remote Write Exporter.
Updates on OTLP definition.
Discussion around support for different exporting strategies on GitHub.
Discussion and support for various specification issues.
Discussion and support for common collector issues.

Yang Hu

Yang Hu is a senior at the University of Washington, Bothell, currently interning as a software developer at Amazon Web Services. He is interested in observability and cloud computing.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.