AWS Open Source Blog

AWS One Observability Demo Workshop: What’s new with Prometheus, Grafana, and OpenTelemetry 

Amazon Web Services (AWS) offers a variety of observability services and tools to gain visibility and insights about your workload’s health and performance. For example, Amazon CloudWatch and AWS X-Ray offer a variety of features to collect, ingest, and perform operations on traces, metrics, and log data generated from workloads. Purpose-built solutions, such as CloudWatch Container Insights, Lambda Insights, ServiceLens, X-Ray Insights, CloudWatch Synthetics, and others offer ways for you to more easily set up observability in a microservice environment.

During AWS re:Invent 2020, AWS launched Amazon Managed Service for Prometheus (AMP) and Amazon Managed Grafana, two new open source-based managed services that give customers additional options to choose from. AWS also launched AWS Distro for OpenTelemetry (ADOT), a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. OpenTelemetry, which is part of the Cloud Native Computing Foundation, provides open source APIs, libraries, and agents to collect distributed traces and metrics for application monitoring. With AWS Distro for OpenTelemetry, you can instrument applications just once to send correlated metrics and traces to multiple AWS and partner monitoring solutions.

Given the pace at which new services and features are being launched in this space, we identified an opportunity to create a platform for customers and partners to get hands-on experience with AWS instrumentation options and the latest capabilities of AWS observability services in a self-paced, guided, sandbox environment.

We launched the One Observability Demo Workshop in August 2020 to do exactly that. The workshop is available in English, Spanish, Japanese, and Korean to help customers around the world get hands-on experience with the latest AWS monitoring and observability capabilities. As workshop owners, we continuously update the content, adding new modules to keep up with newly launched features and services.

During the first week of March 2021, we launched 12 new features to the workshop, including three new modules covering AMP, Amazon Managed Grafana, and ADOT.

Workshop application architecture

The workshop uses an application called PetAdoptions that is available on GitHub. The application is built using a microservice architecture and different components of the application are deployed on a variety of services, such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and AWS Step Functions. The architecture of the application is shown in the following diagram.

PetAdoptions application architecture diagram.

As illustrated in the diagram, the application is not only deployed on a variety of different services but also written using different programming languages, such as Java, C#, Go, Python, and Node.js.

The different services that are part of the application architecture collect traces, metrics, and logs, which are then sent to CloudWatch and X-Ray. The following techniques are used to collect data from different services.

Trace collection

  • AWS X-Ray daemon is deployed as a sidecar in the PetSite front-end service hosted on Amazon EKS, and the application is instrumented using the X-Ray SDK.
  • ADOT Collector is deployed as a sidecar in the PetSearchAPI ECS task, and the application is instrumented using the OTEL SDK for Java for trace collection.
  • ADOT Collector is deployed as a sidecar in the PetListAdoptions ECS task, and the application is instrumented using the OTEL SDK for Go for trace collection.
  • AWS X-Ray daemon is deployed as a sidecar in the PayForAdoptions ECS task, and the application is instrumented using the AWS X-Ray Go SDK for trace collection.
  • Traces from the Python-based Lambda functions are collected using AWS Distro for OpenTelemetry Lambda support for Python.
  • Traces from the Node.js-based Lambda function are collected using the native built-in trace collection mechanism in AWS Lambda.
  • API Gateway and Step Functions are set up to collect traces using the built-in trace collection setup in the services.

Metric collection

  • CloudWatch Container Insights is set up to collect metrics on all Amazon EKS and Amazon ECS clusters.
  • The PetSite front-end service hosted on EKS exposes application metrics in Prometheus format. These metrics are scraped by the CloudWatch Prometheus agent and sent to CloudWatch Metrics. In the AMP module, the ADOT collector configuration is used to demonstrate scraping metrics from the same endpoint.
  • A Prometheus server deployment and configuration is set up to collect infrastructure metrics from the EKS cluster.
  • Lambda Insights is used to collect additional performance metrics from all Lambda functions.
  • Metric collection is turned on in the API Gateway and other services as well.

Log collection

  • Log collection from all Amazon ECS clusters and the Amazon EKS cluster is set up using the Fluent Bit agent as part of CloudWatch Container Insights.
  • Logs from API Gateway and Lambda functions are collected using the default built-in log collection mechanism.

New features

Collect Prometheus metrics from Amazon EKS and ingest into AMP workspace

Using the AMP module, you can get hands-on experience with creating an AMP workspace, deploying a Prometheus server on EKS, configuring it to scrape Prometheus metrics from the EKS environment, and ingesting the metrics into the AMP workspace.

This AMP module also provides hands-on guidance to deploy the ADOT collector and to scrape custom application metrics (in Prometheus format) from the PetSite front-end service.

Visualize metrics, traces, and log data in Grafana

Using the Amazon Managed Grafana module, you can learn about creating an Amazon Managed Grafana workspace and setting up AMP, CloudWatch, and X-Ray data sources. You can also learn how to create custom dashboards in Grafana by querying Prometheus metrics from AMP using PromQL, how to query X-Ray trace data using X-Ray filter expressions, and how to import out-of-the-box dashboards that query CloudWatch Metrics.

Screenshot of an AMG workspace.

ADOT developer walkthrough

The ADOT module includes the following:

  • Go tracing walkthrough: This section shows how the OpenTelemetry SDK is used to instrument traces in a service built using the Go language, and it explains how to instrument requests and add metadata to the traces. For example:
    tracer := otel.GetTracerProvider().Tracer("petlistadoptions")
    _, span := tracer.Start(ctx, "MSSQL Query", trace.WithSpanKind(trace.SpanKindClient))
    
    sql := `SELECT TOP 25 PetId, Transaction_Id, Adoption_Date FROM dbo.transactions`
    
    // injecting custom attributes
    span.SetAttributes(
      label.String("sql", sql),
      label.String("url", r.safeConnStr),
    )
    rows, err := r.db.Query(sql)
    if err != nil {
      handleErr(err)
    }
    
    span.End()
  • Java tracing walkthrough: Learn how to configure auto-instrumentation for a Java Spring Boot application. The auto-instrumentation feature automatically captures traces from the application without code changes. This section explains how this is configured in the Dockerfile for the application. Additionally, the application has manual instrumentation implemented, which is also demonstrated.
  • Tracing Lambda functions: ADOT Lambda Support for Python collects traces from Python-based Lambda functions through Lambda layers. This section provides hands-on experience with setting this up.

CloudWatch ServiceLens

In this workshop, we use CloudWatch ServiceLens to explore the interconnections within a polyglot containerized application spanning different clusters, different container orchestration tools, and serverless functions. The relationships between resources are easy to visualize in the ServiceMap console.

Screenshot of a visualization of the relation between different resources in the ServiceMap console.

As part of the “Correlating logs, metrics & traces” exercise, experience how ServiceLens lets you correlate logs, metrics, and traces and helps connect the data for troubleshooting issues.

CloudWatch Container Insights

CloudWatch Container Insights recently added support for Fluent Bit to collect logs from container workloads hosted on Amazon ECS, EKS, and Kubernetes on EC2. In this section, learn how Container Insights is set up on both ECS and EKS clusters using the Fluent Bit agent.

Walk through the module to understand how to navigate and make use of various features in the Container Insights console, and learn about its rich integration and correlation capabilities with metrics, logs, and traces in CloudWatch.

Screenshot of Container Insights console.

Screenshot of the Container Insights console showing performance monitoring.

The Fluent Bit agent’s performance can also be monitored using the metrics exposed by the agent. In this section, you can view the CloudWatch Dashboard created from the metrics generated by the Fluent Bit agent.

Screenshot of the EKS_FluentBit_Dashboard.

Lambda Insights

The Lambda Insights module provides a look into the Lambda Insights feature in CloudWatch. View the metrics collected from Lambda functions and navigate through the dashboards automatically created in Lambda Insights.

This section provides hands-on experience in addressing a memory utilization problem in a Lambda function observed through Lambda Insights. You can also observe how the metrics change after the memory utilization problem is fixed, which offers a preview of how this feature might help in real-world situations.

Screenshot of Performance Monitoring within Lambda Insights.

Complete April 2021 feature update list

  1. New module: Amazon Managed Service for Prometheus (AMP) is a Prometheus-compatible monitoring service that makes it easy to monitor containerized applications at scale. A new module was added to the workshop that will guide you through the steps needed to:
    • Create an AMP workspace.
    • Set up ingestion using Prometheus server from an EKS cluster.
    • Deploy AWS Distro for OpenTelemetry (ADOT) collector to ingest metrics on EKS. ADOT is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project.
    • Connect AMP and visualize metrics using self-hosted Grafana.
  2. New module: Amazon Managed Grafana is a fully managed service that is developed together with Grafana Labs and based on open source Grafana. A new module was added to the workshop that will guide you through the steps needed to:
    • Create an Amazon Managed Grafana workspace.
    • Set up data sources for AMP, Amazon CloudWatch, and AWS X-Ray.
    • Query metrics from AMP (Kubernetes metrics and application workload metrics from ADOT collector).
    • Query metrics from CloudWatch and demonstrate CloudWatch integration.
    • Query trace data from AWS X-Ray and demonstrate X-Ray integration.
  3. New module: AWS Distro for OpenTelemetry (ADOT) for Go developers.
    • Showcase how Go developers can make use of the ADOT SDK and walk through a practical example of how this has been implemented in the workshop app.
  4. New module: Real-world troubleshooting workflow using ServiceLens, Container Insights, Logs Insights, and CloudWatch Metrics.
  5. New module: Metrics Explorer. CloudWatch Metrics Explorer is a tag-based dashboard tool that enables customers to filter, aggregate, and visualize operational health and performance metrics by tags.
  6. New feature: PetListAdoptions Golang service uses ADOT OTEL SDK for Go to collect traces.
  7. New feature: Python-based Lambda functions use ADOT to collect traces using Lambda layers.
  8. New feature: PetSite front-end service now hosted on EKS.
  9. New feature: Use ADOT Collector to collect Prometheus application metrics from PetSite EKS cluster.
  10. New feature: Use Fluent Bit to collect metrics and logs from EKS clusters using CloudWatch Container Insights.
  11. New feature: Use AWS FireLens to collect metrics and logs from ECS clusters using CloudWatch Container Insights.
  12. New feature: PetSearch service was ported from C# to Java to showcase ADOT auto-instrumentation agent capabilities.
  13. New feature: Amazon CloudWatch automatic dashboards and dashboard sharing.

Workshop stats since launch in August 2020

  • 2,950 downloads
  • 150+ workshops conducted
  • 150,000+ page views
  • Available in four languages (English, Spanish, Japanese, Korean)

Any feedback?

We continue to expand the workshop with new features on a regular basis. If you have feedback on the existing content or would like to see a specific new feature added, please let us know by raising an issue on our GitHub repo.

Imaya Kumar Jagannathan


Imaya Kumar Jagannathan is a Principal Solution Architect focused on AWS Observability services including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C# and working with containers and serverless technologies. LinkedIn: /imaya.

Rodrigue Koffi


Rodrigue is a Specialist Solutions Architect at Amazon Web Services for Observability. He is passionate about observability, distributed systems, and machine learning. He has a strong DevOps and software development background and loves programming with Go. Outside work, Rodrigue enjoys swimming and spending quality time with his family. Find him on LinkedIn at /grkoffi.

Rafael Pereyra


Rafael Pereyra is a Principal Security Architect at AWS Professional Services, where he helps customers securely deploy, monitor, and operate solutions in the cloud. Rafael’s interests include containerized applications, improving the observability, monitoring, and logging of solutions, IaC, and automation in general. In Rafael’s spare time, he enjoys cooking with family and friends.