AWS Database Blog

Creating an Amazon CloudWatch dashboard to monitor Amazon RDS and Amazon Aurora MySQL

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. A highly performant database is key to delivering latency SLAs, so monitoring is critical. Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. Amazon RDS sends metrics to CloudWatch for each active database instance every minute, and this monitoring is enabled by default. For example, Amazon RDS sends CPU utilization, the number of database connections in use, freeable memory, network throughput, and read and write IOPS, among other metrics.

As part of Amazon RDS for MySQL database performance monitoring, it’s important to keep an eye on slow query logs and error logs in addition to the default metrics. Slow query logs help you find slow-performing queries in the database so you can investigate the reasons behind the slowness and tune them if needed. Error logs help you find query errors, which in turn can point you to application changes caused by those errors. However, monitoring these logs manually through log files (on the Amazon RDS console or by downloading them locally) is time consuming.

This post shows how to monitor the slow query log and error log by creating a CloudWatch dashboard with Amazon CloudWatch Logs Insights, which enables you to interactively search and analyze your log data in Amazon CloudWatch Logs. The dashboard also includes application host metrics, so you can monitor the application tier alongside the database, and database metrics that help you troubleshoot performance issues, which we discuss in this post.

Prerequisites

Before you get started, you must complete the following prerequisites:

  1. Create an RDS for MySQL instance or Aurora MySQL cluster, and make sure you select the option to publish error and slow query logs to CloudWatch.
  2. Confirm that the CloudWatch log groups are created and logs are being published by checking the log groups on the CloudWatch console.
  3. Optionally, encrypt log data using AWS Key Management Service (AWS KMS). For instructions, see Encrypt Log Data in CloudWatch Logs Using AWS Key Management Service.

Solution overview

In this post, we cover the following high-level steps:

  • Creating the CloudWatch dashboard manually
  • Creating the CloudWatch dashboard using a template
  • Viewing the CloudWatch dashboard

Creating the CloudWatch dashboard manually

This section discusses creating the CloudWatch dashboard with database metrics like the following:

  • DatabaseConnections
  • Deadlocks
  • DeleteLatency
  • UpdateLatency
  • InsertLatency
  • SelectLatency
  • CommitLatency
  • CommitThroughput
  • DMLThroughput
  • DeleteThroughput
  • InsertThroughput
  • SelectThroughput
  • UpdateThroughput

The available metrics depend on the database engine. For more information about supported metrics, see Overview of monitoring Amazon RDS and Monitoring Amazon Aurora DB cluster metrics.

The following steps create a metric dashboard that applies to the entire cluster (numbers aggregated across all cluster instances). If you want to create instance-specific widgets, look for instances and use DB instance identifiers instead of clusters and DB cluster identifiers.

  1. On the Amazon CloudWatch console, under Dashboards, choose Create dashboard.
  2. For Name, enter a name, for example, demo-mysql-dev.
  3. Choose Create dashboard.
  4. Choose Logs table as your widget type.
  5. Choose Next.
  6. Find your log group.

Log groups for Aurora MySQL and Amazon RDS for MySQL are different, as follows. Log groups for Aurora MySQL include cluster in the path, whereas log groups for Amazon RDS for MySQL include instance. Choose the right log group for your slow query log.

  • Slow query log: /aws/rds/cluster/<your DB identifier>/slowquery (Aurora MySQL) or /aws/rds/instance/<your DB identifier>/slowquery (Amazon RDS for MySQL)
  • Error log: /aws/rds/cluster/<your DB identifier>/error (Aurora MySQL) or /aws/rds/instance/<your DB identifier>/error (Amazon RDS for MySQL)
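This naming convention can be captured in a small helper; a minimal sketch, with the function name and the aurora flag being illustrative choices, not AWS APIs:

```python
# Build the CloudWatch log-group name for an RDS/Aurora MySQL log.
# Aurora MySQL publishes logs per cluster; Amazon RDS for MySQL per instance.
def rds_log_group(db_id, log_type, aurora):
    scope = "cluster" if aurora else "instance"
    return f"/aws/rds/{scope}/{db_id}/{log_type}"

print(rds_log_group("demo-mysql-dev", "slowquery", aurora=True))
# /aws/rds/cluster/demo-mysql-dev/slowquery
print(rds_log_group("demo-mysql-dev", "error", aurora=False))
# /aws/rds/instance/demo-mysql-dev/error
```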

 

  7. In the text field, enter the following Logs Insights query:
    parse @message "# Time: * User@Host: * Id: * Query_time: * Lock_time: * Rows_sent: * Rows_examined: * timestamp=*;*"
    as Time, User, Id, Query_time,Lock_time,Rows_sent,Rows_examined,timestamp,query | sort Time asc
  8. Choose Create widget.

  9. Rename the title to Slow queries sorted by time.
  10. Choose Save dashboard.

  11. Choose Add widget.
  12. Repeat the preceding steps to add a widget for the slow query log, named Slow queries sorted by query time, with the following query:
    parse @message "# Time: * User@Host: * Id: * Query_time: * Lock_time: * Rows_sent: * Rows_examined: * timestamp=*;*" as Time, User, Id, Query_time,Lock_time,Rows_sent,Rows_examined,timestamp,Query
    | display Time, Query_time, Query
    | sort Query_time desc
  13. Choose Add widget.
  14. Repeat the same steps to add a widget for the error log, named Error log file, with the following query:
    filter @message not like /Note/
  15. Choose Add widget to add a widget for the database CPU utilization.
  16. For the widget type, choose Line.
  17. Choose Next.
  18. Select Metrics.
  19. Choose Configure.
  20. Choose RDS.
  21. For DBClusterIdentifier, choose your DB identifier.
  22. For Metric name, choose CPUUtilization.
  23. Choose Create widget.
  24. Choose Add widget to add a widget for the database workload.
  25. For the widget type, choose Line.
  26. Choose Next.
  27. Select Metrics.
  28. Choose Configure.
  29. Choose RDS.
  30. For DBClusterIdentifier, choose your DB identifier, then select the workload metrics.

The metric list may include CommitThroughput, DMLThroughput, DeleteThroughput, InsertThroughput, SelectThroughput, and UpdateThroughput.

  31. Choose Create widget.

You can also create latency metrics to match the throughput metrics.

  32. Choose Add widget to add a widget for the application CPU utilization.
  33. For the widget type, choose Line.
  34. Choose Next.
  35. Select Metrics.
  36. Choose Configure.
  37. Choose EC2.
  38. For Per-Instance Metrics, choose your application metrics, such as CPU utilization, network in and out, or disk reads and writes.
  39. Choose Create widget.
  40. Repeat these steps to select any metrics you want, including your custom metrics, and add them to the dashboard.

For example, you can add DatabaseConnections, Deadlocks, DeleteLatency, UpdateLatency, InsertLatency, SelectLatency, and CommitLatency.
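In a dashboard's JSON source, each metric widget lists its series as rows of [namespace, metric, dimension, value], where "." repeats the entry from the row above. A minimal sketch of generating those rows programmatically (the helper name and metric selection are illustrative):

```python
import json

def metric_rows(namespace, dimension, dim_value, metrics):
    """Build CloudWatch dashboard metric rows; '.' repeats the previous row's value."""
    rows = []
    for i, name in enumerate(metrics):
        if i == 0:
            rows.append([namespace, name, dimension, dim_value])
        else:
            rows.append([".", name, ".", "."])  # shorthand for repeated entries
    return rows

rows = metric_rows("AWS/RDS", "DBClusterIdentifier", "demo-mysql-dev",
                   ["DatabaseConnections", "Deadlocks", "SelectLatency"])
print(json.dumps(rows, indent=2))
```

The first row spells out the namespace and dimension in full; later rows use the "." shorthand, matching the style CloudWatch itself produces when you view a dashboard's source.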

  41. Choose Save dashboard.
  42. To back up the dashboard, on the Actions drop-down menu, choose View/edit source.
  43. Choose Copy source.
  44. Paste the code into your preferred text editor and save it.

If you want to create a dashboard for another Aurora MySQL or RDS for MySQL database, you can use the backup JSON file as a template and follow the steps in the next section.
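The Logs Insights parse statements in the slow query widgets extract fields with glob wildcards (*). You can prototype the same extraction locally with an equivalent regex, where each wildcard becomes a non-greedy capture group; the sample log entry below is illustrative, not from a real instance:

```python
import re

# Regex equivalent of the Logs Insights parse pattern used in the widgets.
PATTERN = re.compile(
    r"# Time: (?P<Time>.*?) User@Host: (?P<User>.*?) Id: (?P<Id>.*?)"
    r" Query_time: (?P<Query_time>.*?) Lock_time: (?P<Lock_time>.*?)"
    r" Rows_sent: (?P<Rows_sent>.*?) Rows_examined: (?P<Rows_examined>.*?)"
    r" timestamp=(?P<timestamp>.*?);(?P<query>.*)",
    re.DOTALL,  # the entry spans multiple lines, so '.' must match newlines
)

# Illustrative slow query log entry in the MySQL slow log format.
sample = (
    "# Time: 2024-01-15T10:30:00.123456Z\n"
    "# User@Host: admin[admin] @  [10.0.0.5]  Id:    41\n"
    "# Query_time: 2.504817  Lock_time: 0.000120 Rows_sent: 1  Rows_examined: 500000\n"
    "SET timestamp=1705314600;\n"
    "SELECT COUNT(*) FROM orders WHERE status = 'open';"
)

m = PATTERN.search(sample)
print("Query_time:", float(m.group("Query_time")))  # 2.504817
print("query:", m.group("query").strip())
```

This makes it easy to check, before building the widget, which fields the pattern actually captures from your log format.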

Creating the CloudWatch dashboard using a template

This section focuses on creating the CloudWatch dashboard using a template that creates the same widgets from the previous section.

  1. On the Amazon RDS console, under Databases, find the DB identifier name and Region of your database.

For example, the following screenshot shows the DB identifier is demo-mysql-dev, and the Region is us-east-1. The log group /aws/rds/cluster is for creating a dashboard for Aurora MySQL (/aws/rds/instance for Amazon RDS for MySQL).

  2. Copy the following template code to any text editor. Replace <your db identifier> with your DB identifier name. If your Region isn’t us-east-1, replace <your region> with your current Region.
    {
        "widgets": [
            {
                "type": "log",
                "x": 0,"y": 0,"width": 24,"height": 6,
                "properties": {
                    "query": "SOURCE '/aws/rds/cluster/<your db identifier>/slowquery' | #fields  @message\n parse  @message \"# Time: * User@Host: * Id: * Query_time: * Lock_time: * Rows_sent: * Rows_examined: * timestamp=*;*\" \n as Time, User, Id, Query_time,Lock_time,Rows_sent,Rows_examined,timestamp,query\n | sort Time asc\n \n",
                    "region": "<your region>",
                    "title": "Slow queries with detailed info",
                    "view": "table"
                }},
            {
                "type": "log","x": 0,"y": 6,"width": 24,"height": 6,
                "properties": {
                    "query": "SOURCE '/aws/rds/cluster/<your db identifier>/slowquery' | parse  @message \"# Time: * User@Host: * Id: * Query_time: * Lock_time: * Rows_sent: * Rows_examined: * timestamp=*;*\" \nas Time, User, Id, Query_time,Lock_time,Rows_sent,Rows_examined,timestamp,Query \n| display Time, Query_time, Query\n| sort Query_time desc\n \n\n",
                    "region": "<your region>",
                    "title": "Top Slow Queries sorted by Query Time",
                    "view": "table"
                }},
            {
                "type": "metric","x": 0,"y": 18,"width": 24,"height": 6,
                "properties": {
                    "metrics": [
                        [ "AWS/RDS", "ConnectionAttempts", "DBClusterIdentifier", "<your db identifier>", { "visible": false } ],
                        [ ".", "RowLockTime", ".", "." ],
                        [ ".", "DMLLatency", ".", "." ],
                        [ ".", "InsertLatency", ".", "." ],
                        [ ".", "InsertThroughput", ".", ".", { "visible": false } ],
                        [ ".", "Deadlocks", ".", "." ]],
                    "view": "timeSeries",
                    "stacked": true,
                    "region": "<your region>",
                    "period": 60,
                    "stat": "Average"
                }},
            {
                "type": "metric","x": 0,"y": 30,"width": 24,"height": 9,
                "properties": {
                    "metrics": [
                        [ "AWS/RDS", "ConnectionAttempts", "DBClusterIdentifier", "<your db identifier>", { "visible": false } ],
                        [ ".", "DatabaseConnections", ".", ".", { "stat": "Maximum" } ],
                        [ ".", "AbortedClients", ".", ".", { "visible": false } ]],
                    "view": "timeSeries",
                    "stacked": true,
                    "region": "<your region>",
                    "period": 60,
                    "stat": "Average"
                }},
            {
                "type": "metric","x": 0,"y": 39,"width": 24,"height": 6,
                "properties": {
                    "view": "timeSeries",
                    "stacked": true,
                    "metrics": [
                        [ "AWS/RDS", "CPUUtilization", "DBClusterIdentifier", "<your db identifier>" ]],
                    "region": "<your region>",
                    "title": "DB CPUUtilization",
                    "period": 60
                }},
            {
                "type": "metric","x": 0,"y": 45,"width": 24,"height": 6,
                "properties": {
                    "view": "timeSeries",
                    "stacked": true,
                    "metrics": [
                        [ "AWS/RDS", "DMLThroughput", "DBClusterIdentifier", "<your db identifier>" ],
                        [ ".", "DeleteThroughput", ".", "." ],
                        [ ".", "InsertThroughput", ".", "." ],
                        [ ".", "UpdateThroughput", ".", "." ],
                        [ ".", "SelectThroughput", ".", "." ],
                        [ ".", "CommitThroughput", ".", "." ]],
                    "region": "<your region>",
                    "title": "DB Workload - CommitThroughput, DMLThroughput, DeleteThroughput, InsertThroughput, SelectThroughput, UpdateThroughput"
                }},
            {
                "type": "log","x": 0,"y": 12,"width": 24,"height": 6,
                "properties": {
                    "query": "SOURCE '/aws/rds/cluster/<your db identifier>/error' | fields @message\n| sort @timestamp desc\n| limit 200",
                    "region": "<your region>",
                    "title": "Top 200 lines of Error Log",
                    "view": "table"
                }
            }
        ]
    }
  3. On the CloudWatch console, choose Dashboards.
  4. Choose Create dashboard.

  5. Enter a name for this dashboard (for example, demo-mysql-dev).
  6. Choose Create dashboard.
  7. Choose Cancel to skip adding widgets manually.

  8. On the Actions drop-down menu, choose View/edit source.

The dashboard source page opens with the default code.

  9. Replace the default code with the preceding template code.
  10. Choose Update.
  11. Choose Save dashboard.

The following screenshots show the metrics on your new dashboard.
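If you keep the template in a file, a short script can fill in the placeholders and confirm the result is valid JSON before you paste it into the dashboard source editor. The snippet below uses a trimmed-down template with the same placeholders; the identifier and Region values are examples:

```python
import json

# Trimmed-down template with the same placeholders as the full dashboard body;
# a real run would read the saved template file instead.
template = """
{
    "widgets": [
        {
            "type": "log", "x": 0, "y": 0, "width": 24, "height": 6,
            "properties": {
                "query": "SOURCE '/aws/rds/cluster/<your db identifier>/slowquery' | fields @message",
                "region": "<your region>",
                "view": "table"
            }
        }
    ]
}
"""

body = (template
        .replace("<your db identifier>", "demo-mysql-dev")
        .replace("<your region>", "us-east-1"))

dashboard = json.loads(body)  # raises an error if the JSON is malformed
print(dashboard["widgets"][0]["properties"]["region"])  # us-east-1
```

Catching a malformed body locally is quicker than debugging a validation error in the console's source editor.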

To create the dashboard using the AWS Command Line Interface (AWS CLI), use the following command. Because the dashboard body is wrapped in single quotes, any single quotes inside it (such as those around the SOURCE log group) must be escaped as '\'':

aws cloudwatch put-dashboard \
--dashboard-name DB-MySQL-1 \
--dashboard-body '{
    "widgets": [
        {
            "type": "log",
            "x": 0,"y": 0,"width": 24,"height": 6,
            "properties": {
                "query": "SOURCE '\''/aws/rds/cluster/demo-mysql-dev/slowquery'\'' | #fields  @message\n parse  @message \"# Time: * User@Host: * Id: * Query_time: * Lock_time: * Rows_sent: * Rows_examined: * timestamp=*;*\" \n as Time, User, Id, Query_time,Lock_time,Rows_sent,Rows_examined,timestamp,query\n | sort Time asc\n \n",
                "region": "us-east-1",
                "title": "Slow queries with detailed info",
                "view": "table"
            }},
        {
            "type": "log","x": 0,"y": 6,"width": 24,"height": 6,
            "properties": {
                "query": "SOURCE '\''/aws/rds/cluster/demo-mysql-dev/slowquery'\'' | parse  @message \"# Time: * User@Host: * Id: * Query_time: * Lock_time: * Rows_sent: * Rows_examined: * timestamp=*;*\" \nas Time, User, Id, Query_time,Lock_time,Rows_sent,Rows_examined,timestamp,Query \n| display Time, Query_time, Query\n| sort Query_time desc\n \n\n",
                "region": "us-east-1",
                "title": "Top Slow Queries sorted by Query Time",
                "view": "table"
            }},
        {
            "type": "metric","x": 0,"y": 18,"width": 24,"height": 6,
            "properties": {
                "metrics": [
                    [ "AWS/RDS", "ConnectionAttempts", "DBClusterIdentifier", "demo-mysql-dev", { "visible": false } ],
                    [ ".", "RowLockTime", ".", "." ],
                    [ ".", "DMLLatency", ".", "." ],
                    [ ".", "InsertLatency", ".", "." ],
                    [ ".", "InsertThroughput", ".", ".", { "visible": false } ],
                    [ ".", "Deadlocks", ".", "." ]],
                "view": "timeSeries",
                "stacked": true,
                "region": "us-east-1",
                "period": 60,
                "stat": "Average"
            }},
        {
            "type": "metric","x": 0,"y": 30,"width": 24,"height": 9,
            "properties": {
                "metrics": [
                    [ "AWS/RDS", "ConnectionAttempts", "DBClusterIdentifier", "demo-mysql-dev", { "visible": false } ],
                    [ ".", "DatabaseConnections", ".", ".", { "stat": "Maximum" } ],
                    [ ".", "AbortedClients", ".", ".", { "visible": false } ]],
                "view": "timeSeries",
                "stacked": true,
                "region": "us-east-1",
                "period": 60,
                "stat": "Average"
            }},
        {
            "type": "metric","x": 0,"y": 39,"width": 24,"height": 6,
            "properties": {
                "view": "timeSeries",
                "stacked": true,
                "metrics": [
                    [ "AWS/RDS", "CPUUtilization", "DBClusterIdentifier", "demo-mysql-dev" ]],
                "region": "us-east-1",
                "title": "DB CPUUtilization",
                "period": 60
            }},
        {
            "type": "metric","x": 0,"y": 45,"width": 24,"height": 6,
            "properties": {
                "view": "timeSeries",
                "stacked": true,
                "metrics": [
                    [ "AWS/RDS", "DMLThroughput", "DBClusterIdentifier", "demo-mysql-dev" ],
                    [ ".", "DeleteThroughput", ".", "." ],
                    [ ".", "InsertThroughput", ".", "." ],
                    [ ".", "UpdateThroughput", ".", "." ],
                    [ ".", "SelectThroughput", ".", "." ],
                    [ ".", "CommitThroughput", ".", "." ]],
                "region": "us-east-1",
                "title": "DB Workload - CommitThroughput, DMLThroughput, DeleteThroughput, InsertThroughput, SelectThroughput, UpdateThroughput"
            }},
        {
            "type": "log","x": 0,"y": 12,"width": 24,"height": 6,
            "properties": {
                "query": "SOURCE '\''/aws/rds/cluster/demo-mysql-dev/error'\'' | fields @message\n| sort @timestamp desc\n| limit 200",
                "region": "us-east-1",
                "title": "Top 200 lines of Error Log",
                "view": "table"
            }
        }
    ]
}'

Viewing the CloudWatch dashboard

To view your dashboard, complete the following steps:

  1. On the CloudWatch console, make sure you’re in the right Region.
  2. On the navigation pane, choose Dashboards.
  3. Choose your dashboard (for this post, demo-mysql-dev).

You can see multiple sections on this page, as in the following screenshot. The top section shows information about your slow queries; the second shows the top slow queries sorted by query time.

You can view the slow queries over any time range (times are shown in UTC).

  1. To see the query in the default time ranges, choose your preferred time on the navigation bar.
  2. To create a new time range, choose Custom.
  3. Choose your preferred time range.

Summary

Manually reviewing Amazon RDS for MySQL or Aurora MySQL slow query or error logs for specific errors on a daily basis is time consuming. Creating a dashboard that surfaces these logs makes monitoring errors and slow queries easy. This post also showed how to complement log file monitoring with CloudWatch metrics on a single CloudWatch dashboard for the database, and optionally add application server metrics.


About the Authors

Shunan Xiang is a Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, providing technical guidance and helping AWS customers migrate their on-premises databases to AWS.


Baji Shaik is a Consultant with AWS ProServe, GCC India. His background spans a wide depth and breadth of expertise and experience in SQL/NoSQL database technologies. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on-premises to Amazon RDS and Aurora PostgreSQL. He is an eminent author, having written several books on PostgreSQL. A few of his recent works include “PostgreSQL Configuration“, “Beginning PostgreSQL on the Cloud”, and “PostgreSQL Development Essentials“. Furthermore, he has delivered several conference and workshop sessions.