AWS Open Source Blog

AWS adds a C++ Prometheus Exporter to OpenTelemetry

In this post, two AWS interns—Cunjun Wang and Eric Hsueh—describe their first engineering contributions to the popular open source observability project OpenTelemetry. OpenTelemetry aims to develop an open standard and provides implementations in multiple programming languages for collecting telemetry data, including metrics, traces, and logs. The interns contributed the C++ Prometheus Exporter to the OpenTelemetry project. This exporter takes metrics collected by OpenTelemetry and exports them to Prometheus, a popular open source monitoring and alerting application. This post explains how the Prometheus Exporter works with the previously introduced metrics pipeline, potential uses of the code, and lessons learned along the way.

We decided to build the Prometheus Exporter in C++ because it was a programming language we all knew. This exporter plugs into the C++ SDK to collect metrics from customer applications. Customers may use Prometheus as their metrics backend to monitor projects and systems. Our project fills the gap between the metrics collected by the Metrics SDK and a Prometheus backend for our customers.

Because we were implementing the exporter from scratch, we began by reading the OpenTelemetry specification for the exporter interface and writing a requirements document. We reviewed existing Prometheus Exporter implementations in other languages to see how they translated these requirements into a working component. We then composed a set of design documents for community review. Following test-driven development (TDD) principles, we wrote a test design document before the implementation.

Project details

Prometheus is an open source system monitoring and alerting toolkit that collects metrics data via a pull model over HTTP. The Prometheus server makes pull requests to the HTTP server exposed by the exporter and scrapes metrics data at a regular interval. The Prometheus Exporter maintains this HTTP server and serves the metrics data collected by the OpenTelemetry Controller to Prometheus.

Screenshot of the Prometheus Exporter Component.

The Prometheus Exporter has three major components. The PrometheusExporter class implements the MetricExporter interface and provides an Export() function that the Metrics SDK controller calls to export metric data. It also provides a Shutdown() function and maintains a shutdown status that controls whether new export requests are accepted.
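
A simplified skeleton of this class is sketched below; it assumes the SDK's Record, ExportResult, and PrometheusCollector types, and the member names are illustrative rather than the exact SDK definition.

// Simplified sketch of the exporter, not the exact SDK definition.
class PrometheusExporter : public MetricExporter {
public:
  // Called by the Metrics SDK controller with a batch of checkpointed records.
  ExportResult Export(const std::vector<Record> &records) noexcept override {
    if (is_shutdown_) {
      return ExportResult::kFailure;      // reject batches after Shutdown()
    }
    collector_->AddMetricsData(records);  // hand the batch to the collector
    return ExportResult::kSuccess;
  }

  // Stop accepting new export requests.
  void Shutdown() noexcept { is_shutdown_ = true; }
  bool IsShutdown() const noexcept { return is_shutdown_; }

private:
  std::shared_ptr<PrometheusCollector> collector_;
  bool is_shutdown_ = false;
};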

The PrometheusCollector class maintains an intermediate in-memory collection. It saves exported data to this collection and fetches all data from it when a Prometheus pull request arrives. It also implements the Collectable interface from Prometheus and provides a Collect() function that Prometheus calls to scrape data from it.
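
A sketch of the collector follows; the locking, container, and helper names here are illustrative, and the exact SDK implementation differs.

// Simplified sketch of the collector, not the exact SDK definition.
class PrometheusCollector : public prometheus::Collectable {
public:
  // Producer side: append an exported batch to the intermediate collection.
  void AddMetricsData(const std::vector<Record> &records) {
    std::lock_guard<std::mutex> guard(collection_mutex_);
    collection_.insert(collection_.end(), records.begin(), records.end());
  }

  // Consumer side: called by the Prometheus HTTP server on every scrape.
  std::vector<prometheus::MetricFamily> Collect() const override {
    std::lock_guard<std::mutex> guard(collection_mutex_);
    auto families = PrometheusExporterUtils::TranslateToPrometheus(collection_);
    collection_.clear();  // each data point is served once
    return families;
  }

private:
  mutable std::mutex collection_mutex_;
  mutable std::vector<Record> collection_;
};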

PrometheusExporterUtils contains the helper functions that translate OpenTelemetry metrics into Prometheus metric data structures so that Prometheus can accept the data.
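
For example, translating a single record into the MetricFamily struct used by the prometheus-cpp library looks roughly like this; the helper and accessor names are illustrative, and real bucket and label handling is more involved.

// Illustrative translation of one record into a Prometheus metric family.
prometheus::MetricFamily ToMetricFamily(const Record &record) {
  prometheus::MetricFamily family;
  family.name = record.GetName();          // e.g. "histogram"
  family.help = record.GetDescription();
  family.type = prometheus::MetricType::Histogram;
  // ...populate family.metric with bucket boundaries, counts, sum, and labels
  //    taken from the record's checkpointed aggregator...
  return family;
}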

Diagram illustrating the Prometheus Exporter data path.

The entire data pipeline splits into a producer side and a consumer side. The producer side begins with the Controller in the Metrics SDK. The Metrics SDK collects and pre-processes metric data, then calls the Export() function in the PrometheusExporter class to send data to exporters. You can read about this process in detail in the previous blog post.

Then the PrometheusExporter passes the same batch of data to PrometheusCollector by calling the AddMetricsData function. PrometheusCollector receives the batch and temporarily stores it in an in-memory collection. PrometheusExporter also exposes an HTTP endpoint and waits for Prometheus pull requests to scrape data from the collection.

The consumer side begins with a Prometheus pull request. First, the Prometheus server sends a pull request to the exposed HTTP endpoint. The HTTP server defines a Collectable interface and has a registry inside. Every class that implements the Collectable interface can register itself to the HTTP server. The HTTP server then scans all registered components to scrape data from them.

Because our PrometheusCollector class implements the Collectable interface, the Prometheus server finds our collector in the registry and calls its Collect() function. PrometheusCollector fetches all data from the intermediate collection and calls helper functions in the PrometheusExporterUtils class. These functions parse the data into structures that Prometheus accepts. Finally, PrometheusCollector serves the parsed result.
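
In the exporter this wiring happens internally; the snippet below only illustrates the mechanism described above, assuming the prometheus-cpp Exposer as the HTTP server.

#include <memory>

#include "prometheus/exposer.h"  // prometheus-cpp HTTP server

int main() {
  // The collector implements prometheus::Collectable.
  auto collector = std::make_shared<PrometheusCollector>();

  // Start an HTTP server at localhost:8080 that serves /metrics.
  prometheus::Exposer exposer{"localhost:8080"};

  // Register the collector; every scrape now calls collector->Collect().
  exposer.RegisterCollectable(collector);

  // ...the exporter keeps feeding data via collector->AddMetricsData(batch),
  //    and the process stays alive so Prometheus can scrape it...
}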

Testing

We exhaustively tested our Prometheus Exporter to ensure the quality and validity of our code. We followed a TDD methodology to unit test the functionality of individual components of our exporter. We also performed integration testing, functionality testing of the exporter with the C++ Metrics SDK, and validation testing of the data passed to the exporter.
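
For example, one behavior we unit test is that the exporter rejects export requests after it has been shut down. A minimal GoogleTest-style case for that behavior looks roughly like this (simplified; the real tests live in the opentelemetry-cpp repository):

#include <gtest/gtest.h>

// Simplified sketch of one unit test; the full suite also covers the
// collector, the translation utils, and error cases.
TEST(PrometheusExporterTest, RejectsExportAfterShutdown) {
  PrometheusExporter exporter{"localhost:8080"};
  exporter.Shutdown();

  std::vector<Record> batch;  // even an empty batch must be rejected
  EXPECT_EQ(ExportResult::kFailure, exporter.Export(batch));
}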

A more in-depth discussion of our testing methods can be found in the previous blog post.

Example

We demonstrate how users can export data manually in the following example pseudo-code snippet. This sample program starts a Prometheus Exporter instance that exposes an HTTP endpoint at localhost:8080. It records four histogram data points every second and exports them to Prometheus every 15 seconds (60 data points per batch):

// Standard headers; the OpenTelemetry Metrics SDK and Prometheus exporter
// headers are also required (exact include paths depend on the SDK version).
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <thread>
#include <vector>

int main() {
    // declare the HTTP endpoint to expose, and start a Prometheus exporter
    std::string address = "localhost:8080";
    PrometheusExporter exporter{address};

    std::cout << "PrometheusExporter example program running on " << address
              << "..." << std::endl;

    // histogram aggregator with five bucket boundaries
    std::vector<double> boundaries{10, 20, 30, 40, 50};
    auto aggregator = std::shared_ptr<Aggregator<int>>(
        new HistogramAggregator<int>(InstrumentKind::Counter, boundaries));

    // collection of records to export; get_record() is a helper (not shown)
    // that builds a Record from a name, version, labels, and an aggregator
    std::vector<Record> collection;
    auto record = get_record("histogram", 1, "{label-1:v1,label2:v2,}", aggregator);

    // update every 250 milliseconds (4 data points every second)
    int counter = 0;
    while (true) {
        // a random value in [1, 100]
        int val = (std::rand() % 100) + 1;
        aggregator->update(val);

        counter++;
        std::this_thread::sleep_for(std::chrono::milliseconds(250));

        // export every 60 data points (every 15 seconds)
        if (counter % 60 == 0) {
            aggregator->checkpoint();
            collection.emplace_back(record);
            exporter.Export(collection);
            collection.clear();  // start the next batch empty
            counter = 0;
        }
    }
}

When the Prometheus service is configured properly and started, it can scrape data from localhost:8080/metrics. We add the following scrape job to the configuration in prometheus.yml:

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  # omit other monitor jobs here...
  - job_name: 'test_prom_exporter'
    static_configs:
      - targets: ['docker.for.mac.localhost:8080']

Export result illustrated in Prometheus—Histogram buckets:

Export result illustrated in Prometheus—Histogram data count, 60 metrics every batch:

Export result illustrated in Prometheus—Histogram metrics sum:

Lessons learned

Throughout this project, we learned a lot about designing and developing large-scale open source systems. We spent four weeks of our internship focusing on validating requirements, refining the system design and test strategy, and performing several design reviews.

We considered and discussed almost every part of the project thoroughly, including function definitions, error handling, concurrency, scalability, and future enhancements. As a result, the implementation was much easier and less error-prone.

Additionally, we joined an open source project, which was both instructive and productive. We encourage anyone to file issues and post comments for discussion. We document and discuss everything in detail and with transparency. Overall, interning at AWS and working on an open source project was a great experience.

About the Authors

Cunjun Wang

Cunjun Wang is a Master’s degree candidate at Columbia University majoring in Software Systems, currently interning as a software developer at Amazon Web Services. He loves Java back-end development, microservice development with the Spring framework, and concurrent programming.

Eric Hsueh

Eric Hsueh is a senior pursuing a Bachelor’s degree in Computer Science at the University of California, Irvine. He is currently interning as a software engineer at Amazon Web Services, and is interested in machine learning and observability.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

Alolita Sharma

Alolita is a senior manager at AWS, where she leads open source observability engineering and collaboration for OpenTelemetry, Prometheus, Cortex, and Grafana. Alolita is co-chair of the CNCF Technical Advisory Group for Observability, a member of the OpenTelemetry Governance Committee, and a board director of the Unicode Consortium. She contributes to open standards at OpenTelemetry, Unicode, and W3C. She has served on the boards of the OSI and SFLC.in. Alolita has led engineering teams at Wikipedia, Twitter, PayPal, and IBM. Two decades of doing open source continue to inspire her. You can find her on Twitter @alolita.