Efficiently find and fix problems, improve application health, and deliver better customer experiences

Three foundational observability signals are metrics, logs (semi-structured data), and traces (flows of requests from beginning to end across all dependencies). These signals are the output of monitored environments, such as containers, microservices, and applications. The goal is to give DevOps engineers and Site Reliability Engineers an integrated experience in which they can identify critical events and use all three signals to trace issues to the containerized applications and microservices responsible, wherever they run. Amazon OpenSearch Service combines log and trace data analytics into a single solution.

Introduction to observability with Amazon OpenSearch Service on AWS On Air (21:19)

Observability operations

Amazon OpenSearch Service provides new capabilities to help solve your observability problems. Use open interfaces to collect, route, and transform telemetry data, including OpenTelemetry, Fluentd, Fluent Bit, Logstash, Data Prepper, and more. You can search and analyze large amounts of semi-structured data with native capabilities. You can visualize, monitor, and alert on that data using the anomaly detection and observability features of OpenSearch Dashboards, and analyze it interactively with Piped Processing Language (PPL), a query interface.
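As a rough illustration of what a PPL query looks like, the sketch below builds a request for the PPL endpoint of the OpenSearch SQL plugin (`POST /_plugins/_ppl`). The index name `web-logs` and the fields `status` and `host` are hypothetical, and the helper `build_ppl_request` is an illustrative name, not part of any SDK. The code only constructs the request; sending it to a domain (with authentication) is left out.

```python
import json


def build_ppl_request(index: str, min_status: int = 500) -> dict:
    """Build the path and JSON body for a PPL query that counts error responses per host.

    The index and field names are assumptions made for this example.
    """
    # PPL pipes the output of each command into the next one.
    query = (
        f"source={index} "
        f"| where status >= {min_status} "
        f"| stats count() as errors by host"
    )
    return {"path": "/_plugins/_ppl", "body": json.dumps({"query": query})}


request = build_ppl_request("web-logs")
print(request["path"])
print(request["body"])
```

The pipe-separated form is what makes PPL suited to interactive exploration: each stage can be added or removed without restructuring the rest of the query.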

Amazon OpenSearch Service approaches the observability, trace analytics, log analytics, and application performance monitoring (APM) problem across four dimensions:

Collect: First, you need to collect data for analysis. Collection includes gathering, enriching, filtering, transforming, and normalizing data from multiple sources.

Detect: Customers often don’t detect issues as soon as they begin; there is usually a lag between when an issue starts and when you are notified. You want to reduce this lag as much as possible. Detection should be proactive and multi-faceted (for example, alarms on telemetry). Anomaly detection is a key tool, as is the ability to link related alarms together to reduce alarm fatigue. Visualization and monitoring are also core to detection, and Amazon OpenSearch Service provides them through a component called OpenSearch Dashboards. You can also analyze the data interactively with tools like PPL.

Investigate: Investigation is where people spend the most time during an operational event, and it usually takes multiple people. This is the largest contributor to Mean Time to Identify (MTTI) and Mean Time to Recovery (MTTR). Cutting through the chaos and understanding what to focus on remains a difficult task. Use logs, metrics, and traces to conduct root cause analysis quickly, correlating across all three signals. Collaborate on investigations and document your analysis with OpenSearch Dashboards notebooks.

Remediate: After you identify the cause of a failure, you need to remediate it. There is nothing worse than trying to fix something and making the situation worse. Don’t forget to do a post-event analysis to determine how you could have prevented the failure in the first place. Document proposed changes so you can prevent the issue from recurring. Your goal should be to ensure the same issue never happens again, but if it does, that you can identify and remediate it automatically.
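The first three dimensions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Amazon OpenSearch Service implements them: the field names (`svc`, `traceId`, `duration`, and so on) are hypothetical, and the detection step uses a simple z-score on latency as a stand-in for the Random Cut Forest algorithm that OpenSearch anomaly detection actually uses.

```python
from statistics import mean, stdev


def normalize(record: dict) -> dict:
    """Collect: map source-specific field names onto one common schema."""
    return {
        "service": record.get("service") or record.get("svc", "unknown"),
        "trace_id": record.get("trace_id") or record.get("traceId"),
        "latency_ms": float(record.get("latency_ms", record.get("duration", 0))),
        "message": record.get("message", record.get("msg", "")),
    }


def detect_anomalies(events: list, threshold: float = 2.0) -> list:
    """Detect: flag events whose latency is more than `threshold`
    standard deviations above the mean (a z-score stand-in, not RCF)."""
    latencies = [e["latency_ms"] for e in events]
    if len(latencies) < 2:
        return []
    mu, sigma = mean(latencies), stdev(latencies)
    if sigma == 0:
        return []
    return [e for e in events if (e["latency_ms"] - mu) / sigma > threshold]


def related_events(events: list, anomalies: list) -> list:
    """Investigate: gather every event that shares a trace ID with an anomaly."""
    trace_ids = {a["trace_id"] for a in anomalies if a["trace_id"]}
    return [e for e in events if e["trace_id"] in trace_ids]


# Hypothetical telemetry from two sources that use different field names.
raw = [{"svc": "checkout", "traceId": f"t{i}", "duration": 100, "msg": "ok"}
       for i in range(9)]
raw.append({"service": "checkout", "trace_id": "t9", "latency_ms": 2000,
            "message": "request timed out"})
raw.append({"service": "payments", "trace_id": "t9", "latency_ms": 110,
            "message": "connection pool exhausted"})

events = [normalize(r) for r in raw]
anomalies = detect_anomalies(events)
context = related_events(events, anomalies)
for event in context:
    print(event["service"], event["message"])
```

In this sample, the slow checkout request is the only anomaly, and following its trace ID surfaces the payments event that took part in the same request. That is the kind of correlation across signals the Investigate step relies on.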

How it works: OpenSearch Service

For log analytics, building an ingestion pipeline involves several stages. Producers are back-end servers, AWS services, web servers, and more, including OpenTelemetry, AWS Distro for OpenTelemetry, Jaeger, and Zipkin. Collectors move the data from the source, possibly transforming the data locally. For AWS native services, you can use Amazon Kinesis Agent or Amazon CloudWatch Agent. For open source, common collectors are Elastic Beats, Fluentd, Fluent Bit, or OpenTelemetry Collector. Aggregators buffer information from the collectors, which reduces the number of connections to Amazon OpenSearch Service. Amazon OpenSearch Service then indexes and analyzes the output of the aggregators. To visualize and monitor the results, you can use OpenSearch Dashboards or Kibana.
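To make the aggregator's buffering role concrete, here is a minimal sketch that groups records into batches and renders each batch as a newline-delimited body for the `_bulk` API, so many records travel in one request. The index name `app-logs`, the batch size, and the helper names `batch` and `to_bulk_body` are assumptions for illustration; real aggregators also handle retries, back-pressure, and time-based flushing, which are omitted here.

```python
import json
from typing import Iterable, Iterator


def batch(records: Iterable[dict], size: int) -> Iterator[list]:
    """Buffer incoming records and yield them in chunks of at most `size`."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # flush whatever is left in the buffer
        yield chunk


def to_bulk_body(index: str, docs: list) -> str:
    """Render documents as a _bulk request body: one action line followed
    by one document line per record, terminated by a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Five hypothetical records become three requests instead of five.
records = [{"message": f"event {i}"} for i in range(5)]
bodies = [to_bulk_body("app-logs", chunk) for chunk in batch(records, size=2)]
print(len(bodies))
```

Larger batches mean fewer connections and requests to the domain, at the cost of slightly higher delay before the data becomes searchable.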

Application Performance Monitoring

Sometimes Application Performance Monitoring (APM) is the first maturity level of observability. But APM alone is not enough. Is your application actually performing as expected, even if your application monitoring dashboard is all green? Are your customers getting the user experience they need? What’s the usage of your application? Which parts of your application are hitting scale limits? From which geographic region are you seeing the biggest growth? Which trends can you visualize and plan for? If you gather metrics, you can deploy new code or change your infrastructure with confidence, because you can see the impact of those changes. Observability advances APM to answer these additional questions.

Observability resources

Blogs and documentation

AWS What's New Post


New observability interface and log analytics

Amazon OpenSearch Service now includes an observability interface and log monitoring features, which provide developers and DevOps engineers with the insights they need to diagnose performance issues faster and reduce application downtime.

AWS Big Data Blog


Getting started with trace analytics

Developers and IT Ops teams can use this feature to troubleshoot performance and availability issues in their distributed applications.

AWS Documentation


Trace Analytics for Amazon OpenSearch Service

Learn how to use Trace Analytics, which is part of the OpenSearch Observability plugin, to analyze trace data from distributed applications.

Observability Use Case


What is observability?


Workshops

The observability functionality of Amazon OpenSearch Service allows you to go beyond simple monitoring to understand not just what events are happening, but why they are happening. In this workshop, learn how to instrument, collect, and analyze metrics, traces, and log data all the way from user front ends to service backends and everything in between, using Amazon OpenSearch Service, AWS Distro for OpenTelemetry, Fluent Bit, and Data Prepper.

Videos

Ensure reliability and uptime with observability solutions
