Key Metrics for Amazon Aurora

This is a guest post by John Matson of Datadog. An expanded version of this post is available on the Datadog blog. Datadog is an Advanced APN Technology Partner, and is a Certified AWS MSP Technology Partner.

Amazon Aurora is a MySQL-compatible database offered on Amazon RDS (Relational Database Service). In addition to a number of performance benefits, Aurora provides valuable metrics that are not available for other RDS database engines.

In this article we’ll highlight a few key metrics that can give you a detailed view of your database’s performance.

There are three ways to access metrics from Aurora: you can collect standard RDS metrics through Amazon CloudWatch, detailed system-level metrics via enhanced RDS monitoring, and numerous MySQL-specific metrics from the database engine. Standard RDS metrics are reported at one-minute intervals; the other metrics can be collected at higher time resolution. The nuts-and-bolts section of this post discusses how to collect all these metrics.

Selected query metrics

Metric description	CloudWatch name	MySQL name
Queries	Queries (per second)	Queries (count)
Reads	SelectThroughput (per second)	Com_select + Qcache_hits (count)
Writes	DMLThroughput (per second)	Com_insert + Com_update + Com_delete (count)
Read query latency, in milliseconds	SelectLatency	–
Write query latency, in milliseconds	DMLLatency	–

The first priority in monitoring is making sure that work is being done as expected. In the case of a database, that means monitoring how queries are being executed.

You can monitor total query throughput as well as the read/write breakdown by collecting metrics directly from CloudWatch or by summing native MySQL metrics from the database engine. In MySQL, reads increment one of two status variables (Com_select or Qcache_hits), depending on whether or not the read is served from the query cache. A write increments one of three status variables depending on whether it is an INSERT, UPDATE, or DELETE.

In CloudWatch, all reads and writes are rolled into SelectThroughput and DMLThroughput, respectively, and their latencies are reported in the valuable Aurora-only metrics SelectLatency and DMLLatency.

For a deeper look into query performance, the performance schema stores lower-level statistics from the database server. More about the performance schema below.

Selected resource metrics

Metric description	CloudWatch name	Enhanced monitoring name	MySQL name
Read I/O operations per second	ReadIOPS	diskIO.readIOsPS	–
Write I/O operations per second	WriteIOPS	diskIO.writeIOsPS	–
Percent CPU utilized	CPUUtilization	cpuUtilization.total	–
Available RAM in gigabytes	FreeableMemory	memory.free	–
Network traffic to Aurora instance	NetworkReceiveThroughput (MB/s)	network.rx (packets)	–
Network traffic from Aurora instance	NetworkTransmitThroughput (MB/s)	network.tx (packets)	–
Open database connections	DatabaseConnections	–	Threads_connected
Failed connection attempts	LoginFailures (per second)	–	Aborted_connects (count)

As Baron Schwartz, co-author of High Performance MySQL, notes, a database needs four fundamental resources: CPU, memory, disk, and network. Metrics on all four fundamental resources are available via CloudWatch.

RDS now also offers enhanced monitoring that exposes detailed system-level metrics. With additional configuration, users can monitor load, disk I/O, processes, and more with very high time resolution.

Disk I/O metrics

The CloudWatch metrics ReadIOPS and WriteIOPS track how much your database is interacting with backing storage. If your storage volumes cannot keep pace with the volume of requests, you will start to see I/O operations queuing up, as reflected in the DiskQueueDepth metric.

CPU metrics

High CPU utilization is not necessarily a bad sign. But if your database is performing poorly while metrics for IOPS and network are in normal ranges, and while the instance appears to have sufficient memory, the CPUs of your chosen instance type may be the bottleneck.

Memory metrics

Databases perform best when most of the working set of data can be held in memory. For this reason, you should monitor FreeableMemory to ensure that your database instance is not memory-constrained.

Network metrics

For Aurora, the NetworkReceiveThroughput and NetworkTransmitThroughput metrics track only network traffic to and from clients, not traffic between the database instances and storage volumes.

Connection metrics

Aurora has a configurable connection limit, which can be checked or modified by navigating to the RDS console and selecting the parameter group that your RDS instance belongs to.

If your server reaches its connection limit and starts to refuse connections, it will increment the CloudWatch metric LoginFailures, as well as the similar MySQL metric Aborted_connects and the more specific MySQL Connection_errors_max_connections counter.

Collecting Aurora metrics

As mentioned at the outset, Aurora users can access metrics from Amazon CloudWatch and many more from the MySQL-compatible database engine. Below we’ll show you how to collect both CloudWatch and engine metrics for a comprehensive view. To collect and correlate all your metrics, you can use a monitoring tool that integrates both with CloudWatch and with the database instance itself. The final part of this post details how to monitor Aurora with Datadog, which will also allow you to monitor the new suite of RDS enhanced metrics. To monitor enhanced metrics on another platform, consult the AWS documentation.

Collecting CloudWatch metrics

Below we’ll walk through two ways of retrieving metrics from CloudWatch:

Using the AWS Management Console
Using the command line interface

Using the AWS Console

The AWS Console allows you view recent metrics and set up simple alerts on metric thresholds. In the CloudWatch console, select RDS from the list of services and click on “Per-Database Metrics” to see your available metrics:

Just select the checkbox next to the metrics you want to visualize, and they will appear in the graph at the bottom of the console.

Using the command line interface

To query RDS metrics from the command line, you need to install the CloudWatch command line tool. You can then view your metrics with simple queries like this one to check the SelectLatency metric across a one-hour window:

mon-get-stats SelectLatency
--namespace="AWS/RDS"
--dimensions="DBInstanceIdentifier=instance-name"
--statistics Maximum
--start-time 2016-02-18T17:00:00
--end-time 2016-02-18T18:00:00

Full documentation for the mon-get-stats command is available here.

Collecting database engine metrics

To get a deeper look at Aurora performance you will often need metrics from the database instance itself. Here we cover three methods of metric collection:

Querying server status variables
Querying the performance schema and sys schema
Using the MySQL Workbench GUI

Connecting to your RDS instance

The design of RDS means that you cannot directly access the machines running your database, as you could if you manually installed MySQL or MariaDB on a standalone server. That said, you can connect to the database using standard tools, provided that the security group for your Aurora instance allows it.

If Aurora accepts traffic only from inside its security group, you can launch an EC2 instance in that security group, and then apply a second security group rule to the EC2 instance to accept inbound SSH traffic. By SSHing to the EC2 instance, you can then connect to Aurora using the mysql command line tool:

mysql -h instance-name.xxxxxx.us-east-1.rds.amazonaws.com -P 3306 -u yourusername -p

Your instance’s endpoint (ending in rds.amazonaws.com) can be found in the RDS console.

Querying server status variables

Once you connect to your database instance, you can query any of the hundreds of metrics available, known as server status variables. To check metrics on connection errors, for instance:

mysql> SHOW GLOBAL STATUS LIKE '%Connection_errors%';

Querying the performance schema and sys schema

Server status variables largely capture high-level server activity. To collect metrics at the query level—for instance, to link latency or error metrics to individual queries—you can use the performance schema, which captures detailed statistics on server events.

Enabling the performance schema

Set the performance_schema parameter to 1 in the Aurora instance’s parameter group using the AWS console. This change requires an instance reboot.

Once the performance schema is enabled, server metrics will be stored in tables in the performance_schema database, which can be queried with ordinary SELECT statements.

Using the sys schema

Though you can query the performance schema directly, it is usually easier to extract meaningful metrics from the tables in the sys schema.

To install the sys schema, first clone the GitHub repo to a machine that can connect to your Aurora instance and position yourself within the newly created directory:

$ git clone https://github.com/mysql/mysql-sys
$ cd mysql-sys

Then, create an Aurora-compatible file for the sys schema:

$ ./generate_sql_file.sh -v 56 -b -u CURRENT_USER

Finally, load the file into Aurora, using the filename returned in the step above:

$ mysql -h instance-name.xxxxxx.us-east-1.rds.amazonaws.com -P 3306 -u yourusername -p < gen/sys_1.5.1_56_inline.sql

If this command returns an “Access denied” error pertaining to a specific line in the sql file, remove the offending line and re-run the command. For example:

ERROR 1227 (42000) at line 10486: Access denied; you need (at least one of) the SUPER privilege(s) for this operation 
$ sed -i '10486d' gen/sys_1.5.1_56_inline.sql # delete line 10486 
$ mysql -h instance-name.xxxxxx.us-east-1.rds.amazonaws.com -P 3306 -u yourusername -p < gen/sys_1.5.1_56_inline.sql # retry import

Now you can connect to Aurora using the mysql command line tool to access the sys schema’s many tables and functions. For instance, to summarize all the statements executed, along with their associated latencies, you would run:

mysql> select * from sys.user_summary_by_statement_type;

Using the MySQL Workbench GUI

MySQL Workbench is a free application for managing and monitoring MySQL databases. It provides a high-level performance dashboard, as well as a simple interface for browsing performance metrics (using the views provided by the sys schema).

If you have configured an EC2 instance to communicate with Aurora, you can connect MySQL Workbench to your Aurora instance via SSH tunneling:

You can then view recent metrics on the performance dashboard or click through the statistics available from the sys schema:

Monitor Aurora Using Datadog

You’ve now seen that you can easily collect metrics from CloudWatch and from the database engine itself for ad hoc performance checks. For a more comprehensive view of your database’s health and performance, however, you need a monitoring system that can correlate CloudWatch metrics with database engine metrics, that lets you see historical trends with full granularity, and that provides flexible visualization and alerting functionality. This post will show you how to connect Aurora to Datadog in two steps:

Connect Datadog to CloudWatch
Integrate Datadog with Aurora’s database engine

You can also use Datadog to collect, graph, and alert on the new enhanced monitoring metrics that are available for RDS. Full instructions are available in this post.

Connect Datadog to CloudWatch

To start monitoring metrics from RDS, you just need to configure our CloudWatch integration. Create a new user via the IAM console in AWS and grant that user read-only permissions to these three services, at a minimum:

EC2
CloudWatch
RDS

You can attach managed policies for each service by clicking on the name of your user in the IAM console and selecting “Permissions”.

Once these settings are configured within AWS, create access keys for your read-only user and enter those credentials in the Datadog app.

Integrate Datadog with Aurora’s database engine

To access all the metrics available for Aurora, you can monitor the database instance itself in addition to collecting standard metrics from CloudWatch.

Installing the Datadog Agent on EC2

Datadog’s Agent integrates seamlessly with MySQL and compatible databases to gather and report key performance metrics. Where the same metrics are available through the Datadog Agent and through standard CloudWatch metrics, the higher-resolution Agent metrics should be preferred. Installing the Agent usually requires just a single command.

Because you cannot install anything on an RDS database instance, you must run the Agent on another machine, such as an EC2 instance in the same security group.

Configuring the Agent for RDS

Complete instructions for capturing Aurora metrics with the Agent are available here. Experienced Datadog users will note that monitoring Aurora is just like monitoring MySQL locally, with two small configuration exceptions:

Provide the Aurora instance endpoint as the server name (e.g., instance_name.xxxxxxx.us-east-1.rds.amazonaws.com) instead of localhost
Tag your Aurora metrics with the DB instance identifier (dbinstanceidentifier:instance_name) to separate database metrics from the host-level metrics of your EC2 instance

Unifying your metrics

Once you set up the Agent, all the metrics from your database instance will be uniformly tagged with dbinstanceidentifier:instance_name for easy retrieval, whether those metrics come from CloudWatch or from the database engine itself.

View your Aurora dashboard

Once you have integrated Datadog with RDS, a comprehensive dashboard called “Amazon – RDS (Aurora)” will appear in your list of integration dashboards. The dashboard gathers the key metrics highlighted at the start of this post and more.

You can filter your RDS metrics by selecting a particular dbinstanceidentifier in the upper left of the dashboard.

Enhanced monitoring dashboard

If you have set up enhanced monitoring for Aurora, you can also access a specialized RDS Enhanced Metrics dashboard in Datadog.

Monitor all the things

Monitoring Amazon Aurora gives you critical visibility into your database’s health and performance. Plus, Datadog integrates with 100+ infrastructure technologies, so you can correlate Aurora performance with metrics and events from the rest of your stack. If you don’t yet have a Datadog account, you can sign up for a free trial here.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.

AWS Partner Network (APN) Blog