AWS Cloud Operations & Migrations Blog

Identify operational issues quickly by using Grafana and Amazon CloudWatch Metrics Insights (Preview)

Amazon CloudWatch recently launched Metrics Insights (Preview), a fast, flexible, SQL-based query engine that lets you identify trends and patterns across millions of operational metrics in real time. With Metrics Insights, you can query and analyze your metrics to gain better visibility into the health and performance of your infrastructure and large-scale applications. This capability is now also available in open source Grafana and in Amazon Managed Grafana.

For the last two years, Grafana users have been analyzing CloudWatch metrics using SEARCH expressions. With the Metrics Insights integration in Grafana 8.3, built in partnership between AWS and Grafana Labs, Grafana extends its metric analytics capabilities with flexible querying and on-the-fly metric aggregations, so that you can respond promptly to operational problems. Metrics Insights also works with Grafana Alerting (introduced in Grafana 8.0), so you can create alerts powered by CloudWatch Metrics Insights that target infrastructure problems as they happen.

With Metrics Insights, you can slice and dice your operational metrics by dimension to dive deep and identify issues at a fine level of detail. For example, you can analyze thousands of Amazon EC2 instances by CPU utilization to troubleshoot an underperforming application. You can group your metrics by InstanceId to narrow down your analysis and identify failing instances rapidly. Once the problematic instances are isolated, you can recover the application by rebooting them. Moreover, you can use your queries to create visualizations that stay up to date as resources are deployed or shut down, helping you proactively monitor and identify issues quickly.

Metrics Insights uses a SQL-based query language, giving you the flexibility to group your metrics in real time based on your use cases. You can use a WHERE clause to narrow down your analysis, or ORDER BY to sort your results. For example, ORDER BY MAX() DESC sorts the time series in descending order, so that the time series with the highest data point is returned first. Likewise, the LIMIT clause caps the number of results returned, which lets you run “TOP N” type queries.
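To make the ordering and limiting behavior concrete, here is a minimal local simulation in Python. It is only an illustration of the idea behind a query such as ORDER BY MAX() DESC LIMIT 2, not how CloudWatch implements it; the instance IDs, data points, and the `top_n` helper are all hypothetical.

```python
from statistics import mean

# Hypothetical sample data: one list of CPU data points per instance.
series = {
    "i-aaa": [10.0, 20.0, 95.0],
    "i-bbb": [50.0, 55.0, 60.0],
    "i-ccc": [5.0, 7.0, 6.0],
}

def top_n(series, order_fn=max, n=2, descending=True):
    """Rank each time series by order_fn over its data points and keep the first n."""
    ranked = sorted(series, key=lambda key: order_fn(series[key]), reverse=descending)
    return ranked[:n]

print(top_n(series, max, 2))   # ['i-aaa', 'i-bbb']  (like ORDER BY MAX() DESC LIMIT 2)
print(top_n(series, mean, 2))  # ['i-bbb', 'i-aaa']  (like ORDER BY AVG() DESC LIMIT 2)
```

Note how the choice of ordering function changes which series comes first: i-aaa has the single highest data point, while i-bbb has the higher average.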

Amazon CloudWatch plugin to query CloudWatch Metrics Insights

CloudWatch Metrics Insights is available through the GetMetricData API, so if you already have a CloudWatch data source plugin set up in Grafana, no additional configuration is needed. To use Metrics Insights in the Amazon CloudWatch plugin in Grafana, select the Metric Query type.
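For readers curious what travels through GetMetricData, the sketch below builds the request parameters for a single Metrics Insights query in Python. The Grafana plugin does this for you; the `build_request` helper, the `q1` identifier, and the default period and time window are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

def build_request(query, period=300, minutes=60, now=None):
    """Build keyword arguments for a GetMetricData call carrying one Metrics Insights query."""
    end = now or datetime.now(timezone.utc)
    return {
        "MetricDataQueries": [
            # The SQL text goes in Expression; Id must start with a lowercase letter.
            {"Id": "q1", "Expression": query, "Period": period},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
    }

request = build_request(
    'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) GROUP BY InstanceId'
)
print(request["MetricDataQueries"][0]["Expression"])

# With boto3 installed and AWS credentials configured (not exercised here):
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_data(**request)
```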

The Metric Query type has two modes: a Builder mode and a Code editor mode.

[Figure: Metric Query mode in the CloudWatch plugin]

CloudWatch Metrics Insights uses a dialect of SQL. The syntax is as follows:

SELECT FUNCTION(MetricName)
    FROM Namespace | SCHEMA(...)
    [ WHERE labelKey OPERATOR labelValue [AND|...]]
    [ GROUP BY labelKey [, ...]]
    [ ORDER BY FUNCTION() [DESC | ASC] ]
    [ LIMIT number]
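As a rough illustration of how the clauses fit together, the following Python sketch assembles a query string in the order given by the syntax above. The `build_query` function and its parameters are hypothetical helpers, not part of CloudWatch or Grafana, and the sketch only covers equality filters joined with AND.

```python
def build_query(function, metric, namespace, schema_keys=(), where=None,
                group_by=(), order_by=None, descending=True, limit=None):
    """Assemble a Metrics Insights query string clause by clause."""
    # Namespaces such as AWS/EC2 contain a slash, so they are double-quoted.
    source = f'"{namespace}"'
    if schema_keys:
        source = f'SCHEMA("{namespace}", {", ".join(schema_keys)})'
    parts = [f"SELECT {function}({metric})", f"FROM {source}"]
    if where:
        # Label values are single-quoted string literals.
        conditions = [f"{key} = '{value}'" for key, value in where.items()]
        parts.append("WHERE " + " AND ".join(conditions))
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if order_by:
        parts.append(f"ORDER BY {order_by}() {'DESC' if descending else 'ASC'}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(parts)

# Hypothetical instance ID, used only to show a WHERE filter.
print(build_query("SUM", "NetworkIn", "AWS/EC2",
                  schema_keys=["InstanceId"],
                  where={"InstanceId": "i-0123456789abcdef0"}))
# SELECT SUM(NetworkIn) FROM SCHEMA("AWS/EC2", InstanceId) WHERE InstanceId = 'i-0123456789abcdef0'
```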

Metrics Insights makes it easy to get started. Even if you don’t know SQL, you can use the visual query builder to browse your existing metrics and dimensions, selecting the metrics of interest, resource names, and filters to narrow down your analysis. The builder then constructs the SQL query for you based on your selections.

[Figure: Builder mode in the CloudWatch plugin, where you construct SQL queries by selecting options from drop-down fields]

You can use the code editor in the Code mode of the CloudWatch plugin to type raw SQL queries directly whenever you need finer control over your analysis. The code editor comes with built-in autocompletion support that provides suggestions for keywords, functions, namespaces, metrics, dimension keys, and dimension values.

The following example demonstrates how to view the average CPU utilization of the top five instances, ranked by their maximum value in descending order. The autocompleter provides suggestions throughout the composition of the query.

[Figure: Code mode in the CloudWatch plugin, where you type SQL queries manually and the code editor's autocompleter provides typing suggestions]
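The query described above can also be written out as text. The version below is a reconstruction from the prose (average CPU utilization per instance, ordered by maximum, descending, limited to five), so the exact wording in the original screenshot may differ. It is held in a Python string here so that it can be pasted into the Code editor or reused in a script.

```python
# Reconstructed from the description above; treat the exact text as an assumption.
TOP_FIVE_CPU = (
    "SELECT AVG(CPUUtilization) "
    'FROM SCHEMA("AWS/EC2", InstanceId) '
    "GROUP BY InstanceId "
    "ORDER BY MAX() DESC "
    "LIMIT 5"
)

print(TOP_FIVE_CPU)
```

GROUP BY InstanceId returns one time series per instance, ORDER BY MAX() DESC ranks those series by their highest data point, and LIMIT 5 keeps the first five.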

Grafana template variables allow you to create more interactive and dynamic dashboards. Instead of hardcoding values such as the namespace, metric, and aggregation function in your queries, you can use variables in their place. Variables are displayed as dropdown lists at the top of the dashboard, which makes it easy to change the data displayed in your dashboard. When using the Metric Query type in the CloudWatch plugin, you can use template variables in both Builder and Code modes.

[Figure: Template variables in the Grafana plugin, used to make your queries dynamic]
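The effect of template variables on a Code mode query can be sketched in Python. Grafana performs the substitution itself when the dashboard loads; here the standard library's `string.Template`, which happens to share the `$name` placeholder style, stands in for that step. The variable names `function`, `metric`, and `namespace` are hypothetical and would need to match variables defined on your dashboard.

```python
from string import Template

# A query written with $-style placeholders instead of hardcoded values.
query_template = Template(
    'SELECT $function($metric) FROM SCHEMA("$namespace", InstanceId) '
    "GROUP BY InstanceId ORDER BY $function() DESC LIMIT 5"
)

# Stand-in for the values currently selected in the dashboard dropdowns.
selected = {"function": "AVG", "metric": "CPUUtilization", "namespace": "AWS/EC2"}

print(query_template.substitute(selected))
# SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) GROUP BY InstanceId ORDER BY AVG() DESC LIMIT 5
```

Changing a dropdown selection corresponds to changing one entry in `selected`; the query text itself stays the same.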

In this post, we walked through the integration between open source Grafana and CloudWatch Metrics Insights, and showed how you can use the Metrics Insights SQL-based query engine in Grafana to flexibly analyze and group your metrics at scale in real time, so that you can quickly identify operational issues and reduce mean time to resolution.

To learn more about CloudWatch Metrics Insights or to get started, check out the CloudWatch Metrics Insights documentation from AWS and from Grafana.

About the authors

Erik Sundell

Erik is a senior software engineer at Grafana Labs, the company behind Grafana. He
mainly focuses on integration between Grafana and cloud services such as Amazon CloudWatch,
Amazon Redshift, and Amazon Athena. Outside of work, you can find him hiking in the mountains,
playing music or hanging out with his family.

Omur Kirikci

Omur Kirikci is a Senior Product Manager for Amazon CloudWatch based in Dublin,
Ireland. He is passionate about creating new products and strives for success by ensuring
customer satisfaction. Before he joined AWS, Omur spent more than 15 years in product
management, program management, go-to-market strategy, and product development. Outside of
work, he enjoys being outdoors and hiking, spending time with his family, tasting different
cuisines, and watching soccer with friends.