AWS Database Blog

Importing Amazon RDS Performance Insights counter metrics to Amazon CloudWatch

Amazon RDS Performance Insights is a feature that monitors Amazon RDS database instances so that you can analyze and troubleshoot database performance. You can view Performance Insights data in the AWS Management Console. Alternatively, Performance Insights provides a public API so that you can query your own data. You can use this API to ingest data, to add Performance Insights data to existing monitoring dashboards, or to build your own monitoring tools. In this post, we use this API to send data to Amazon CloudWatch.

Overview

Performance Insights collects a variety of operating system and database counter metrics to help monitor system performance. Each Amazon RDS database engine has its own particular set of counter metrics, such as the Aurora metrics. Although you should begin troubleshooting database performance by determining where your database time is going, counter metrics are an excellent secondary source of monitoring data. They can be useful when integrated into an existing system dashboard.

This post is intended for anyone interested in database counter metrics. I introduce one of the two Performance Insights API methods, GetResourceMetrics, to demonstrate how to create an AWS Lambda function that extracts that data from Performance Insights and integrates it into another monitoring system. This post uses Boto3, Python 3, the AWS CLI, and the Performance Insights API. To start, either create an Amazon RDS instance with Performance Insights enabled or enable Performance Insights on an existing instance.

AWS provides service functionality through public APIs as often as possible to empower customers to build their own solutions. Performance Insights is no different. Although you can interact with the service through the provided console, the API is designed to give you access to your own data.

The Performance Insights dashboard (shown in the following screenshot) is divided into three parts:

  • Counter metrics chart: Shows data for specific performance counter metrics.
  • Average active sessions chart: Shows how the database load compares to the database instance capacity, as represented by the Max CPU line.
  • Top load items table: Shows the top items that are contributing to database load.

The Performance Insights API has two actions:

  • GetResourceMetrics: Retrieves a time series of metric data. This method builds the Counter Metrics chart and the Average Active Sessions chart.
  • DescribeDimensionKeys: Retrieves the top metric keys over a period of time. This method builds the Top load items table.
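Although this post focuses on GetResourceMetrics, DescribeDimensionKeys follows the same request shape. As a rough sketch (the helper name and example resource ID are mine, not from the post), a query for the top SQL statements by database load over the last few minutes might be assembled like this:

```python
import time


def top_sql_query(resource_id, minutes=5):
    """Build the kwargs for a DescribeDimensionKeys call that asks for the
    top SQL statements by database load over the last few minutes."""
    end = int(time.time())
    return {
        'ServiceType': 'RDS',
        'Identifier': resource_id,
        'StartTime': end - minutes * 60,
        'EndTime': end,
        'Metric': 'db.load.avg',         # database load in average active sessions
        'GroupBy': {'Group': 'db.sql'},  # group the load by SQL statement
    }


# With AWS credentials configured, this would return the top-load SQL:
# import boto3
# pi_client = boto3.client('pi')
# response = pi_client.describe_dimension_keys(**top_sql_query('db-EXAMPLE'))
```

The same windowing and identifier parameters appear again in the GetResourceMetrics examples below.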

GetResourceMetrics

This post focuses on using the GetResourceMetrics method to retrieve counter metrics. Using the AWS CLI, you can see more information about this API method by running the help command.

% aws pi get-resource-metrics help
...
get-resource-metrics
--service-type <value>
--identifier <value>
--metric-queries <value>
--start-time <value>
--end-time <value>
[--period-in-seconds <value>]
  • start-time: The inclusive datetime value specifying the start of the time-series query range. By default, Performance Insights has seven days of retention, but this period can be extended to two years.
  • end-time: The exclusive datetime value specifying the end of the time-series query range.
  • period-in-seconds: The granularity of each returned data point. Currently, this value must be one of the following: 1, 60, 300, 3600, or 86400.
  • service-type: Helps identify the resource to query. Currently, the only valid value is RDS.
  • identifier: Identifies the resource. This is shown in the Amazon RDS console as the ResourceID when you look at the details of your instance. It’s returned as DbiResourceId when calling the RDS DescribeDbInstances API method.
  • metric-queries: Specifies one or more queries for which to get results. Each query consists of a mandatory Metric parameter and optional GroupBy and Filter parameters.
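The parameters above map directly onto the kwargs that Boto3's `get_resource_metrics` expects. As a minimal sketch (the helper name is mine), assembling a five-minute query window looks like this:

```python
import time


def counter_metric_query(resource_id, metrics, minutes=5, period=60):
    """Assemble the parameters described above into the kwargs that
    Boto3's pi.get_resource_metrics() expects."""
    end = int(time.time())
    return {
        'ServiceType': 'RDS',
        'Identifier': resource_id,
        'StartTime': end - minutes * 60,  # inclusive start of the range
        'EndTime': end,                   # exclusive end of the range
        'PeriodInSeconds': period,        # one of 1, 60, 300, 3600, 86400
        'MetricQueries': [{'Metric': m} for m in metrics],
    }
```

The Lambda function in Step 2 issues exactly this kind of query.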

In this post, we’ll be making simple metric queries for the last five minutes of data:

% aws pi get-resource-metrics \
--service-type RDS \
--identifier <instance id> \
--start-time <cur time - 5 minutes> \
--end-time <cur time> \
--period-in-seconds 60 \
--metric-queries '[{"Metric": "metric1"}, {"Metric": "metric2"}]'

For a concrete example, see the following code. Change the identifier to your own resource ID; otherwise, this code should work as is.

aws pi get-resource-metrics \
--service-type RDS \
--identifier <db-> \
--start-time `expr \`date +%s\` - 300` \
--end-time `date +%s` \
--period-in-seconds 60 \
--metric-queries \
'[{"Metric": "db.Transactions.xact_commit.avg"},
{"Metric": "os.general.numVCPUs.avg"}]'

Step 1: Create a basic Lambda function

In the AWS Lambda console, choose Create Function, Author from Scratch.

Name: PerformanceInsightsCounterMetrics

Runtime: Python 3.7

Roles: Create a new role from one or more templates.

Role name: PerformanceInsightsCounterMetricsRole

 

Leave Policy templates blank.

Finally, to avoid any timeout issues, in Basic Settings, set the Timeout to 1 minute.

Schedule the function

After creating the Lambda function, you see the configuration. Lambda functions can be triggered by all kinds of events. In this case, you should run the Lambda function every five minutes.

CloudWatch Events has a scheduling pattern that you can use to accomplish this goal.

In the Lambda console, under Designer, choose CloudWatch Events as a trigger and then choose Create a New Rule.

Rule Name: PICounterMetrics5M

Rule Type: Schedule Expression

Schedule Expression: rate(5 minutes)

Test the function

Choose Test to create a new test event.

Event Template: Amazon CloudWatch

Name: ScheduledTest

 

Accept the defaults and save.

After you’ve saved these settings, you should be able to choose Test and verify that the logs printed “Hello from Lambda.”

Step 2: Update the Lambda function to call the Performance Insights API

You’ve now created a basic Lambda function that doesn’t do anything. Your goal in this step is to call the Performance Insights API and print some counter metrics.

Find the database resource ID

Find your database resource ID in the RDS console by selecting your instance and viewing it in the Configuration tab. The name should start with “db-”. In this case, my resource ID is db-YTDU5J5V66X7CXSCVDFD2V3SZM.

Find the metrics to query

On the Performance Insights Counter Metrics page, you can configure the displayed metrics by selecting the configure widget, as shown in the following screenshot. To start, query for an OS metric, os.general.numVCPUs, because that value should be relatively constant. When you issue a query, combine a metric name with a statistic. In this case, you are querying for the average, so use os.general.numVCPUs.avg.
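The metric-plus-statistic convention is simple string composition. As a tiny illustration (the helper name is mine):

```python
def with_stat(metric, stat='avg'):
    """Combine a counter metric name with a statistic suffix, as the
    Performance Insights API expects."""
    return '{}.{}'.format(metric, stat)


print(with_stat('os.general.numVCPUs'))
# → os.general.numVCPUs.avg
```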

Update the Lambda function IAM role

In IAM, find the PerformanceInsightsCounterMetricsRole that you created. Add permissions to call the Performance Insights API by creating and adding a PerformanceInsightsFullAccess policy with the following permissions policy:

{
"Version": "2012-10-17",
"Statement": [
{
"Action": "pi:*",
"Effect": "Allow",
"Resource": "arn:aws:pi:*:*:metrics/rds/*"
}
]
}

Update the Lambda function with the following code:

import time
import boto3
import logging


logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# TODO – Change this to your instance ID from Step 2.1
instance_id = 'db-YTDU5J5V66X7CXSCVDFD2V3SZM'


def lambda_handler(event, context):
    get_resource_metrics(instance_id)

    return {
        'statusCode': 200,
        'body': 'ok'
    }


def get_resource_metrics(resource_id):
    pi_client = boto3.client('pi')
    response = pi_client.get_resource_metrics(
        ServiceType='RDS',
        Identifier=resource_id,
        StartTime=time.time() - 300,
        EndTime=time.time(),
        PeriodInSeconds=60,
        MetricQueries=[{'Metric': 'os.general.numVCPUs.avg'}]
    )

    logger.debug("response={}".format(response))
    return response

Running the Lambda function again should produce a successful response, and the logs should contain a response body, as shown in the following screenshot.

The response contains the following fields:

{
  "AlignedStartTime": 1.55296386E9,
  "AlignedEndTime": 1.55296416E9,
  "Identifier": "db-YTDU5J5V66X7CXSCVDFD2V3SZM",
  "MetricList": [
    {
      "DataPoints": [
        {"Timestamp": 1.55296392E9, "Value": 2.0},
        {"Timestamp": 1.55296398E9, "Value": 2.0},
        {"Timestamp": 1.55296404E9, "Value": 2.0},
        {"Timestamp": 1.5529641E9, "Value": 2.0},
        {"Timestamp": 1.55296416E9, "Value": 2.0}
      ],
      "Key": {"Metric": "os.general.numVCPUs.avg"}
    }
  ]
}
  • AlignedStartTime/AlignedEndTime: The queried StartTime/EndTime, transformed to align to a granularity. Performance Insights pre-computes summaries of database activity over particular periods (which is why two-year retention is an option), but that means that you can’t query for arbitrary spans of time.
  • Identifier: The DbiResourceId passed in as a query parameter.
  • MetricList: One entry for every passed-in MetricQuery (in this example, I only had one), with every response containing a Key and a list of DataPoints.
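To make the response shape concrete, here is a small sketch (using sample values, not live data) that flattens such a response into (metric, timestamp, value) rows:

```python
def flatten_response(response):
    """Turn a GetResourceMetrics response into (metric, timestamp, value) rows."""
    rows = []
    for entry in response['MetricList']:
        metric = entry['Key']['Metric']
        for point in entry['DataPoints']:
            # A data point may arrive without a Value; skip those.
            if 'Value' in point:
                rows.append((metric, point['Timestamp'], point['Value']))
    return rows


sample = {
    'MetricList': [{
        'Key': {'Metric': 'os.general.numVCPUs.avg'},
        'DataPoints': [
            {'Timestamp': 1552963920.0, 'Value': 2.0},
            {'Timestamp': 1552963980.0, 'Value': 2.0},
        ],
    }]
}

print(flatten_response(sample))
# → [('os.general.numVCPUs.avg', 1552963920.0, 2.0), ('os.general.numVCPUs.avg', 1552963980.0, 2.0)]
```

Step 4 performs essentially this traversal when converting the response into CloudWatch metric data.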

Step 3: Update the Lambda function to integrate with the RDS API and send multiple counter metrics

In this step, you will update the Lambda function to query multiple counter metrics and to send data from all Performance Insights-enabled Amazon RDS instances.

Update the Lambda function to add permission to call the RDS API

From the IAM console, select the PerformanceInsightsCounterMetricsRole and attach the AmazonRDSReadOnlyAccess policy.

Update the Lambda function with the following code:

import time
import boto3
import logging


logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# Configure different metrics for different engines 
engine_metrics = {
    'aurora': ['db.SQL.Innodb_rows_read.avg', ],
    'aurora-postgresql': ['db.transactions.xact_commit.avg'],
    'postgres': ['db.transactions.xact_commit.avg']
}


def lambda_handler(event, context):
    pi_instances = get_pi_instances()

    for instance in pi_instances:
        get_resource_metrics(instance)

    return {
        'statusCode': 200,
        'body': 'ok'
    }


def get_pi_instances():
    rds_client = boto3.client('rds')
    response = rds_client.describe_db_instances()

    return filter(
        lambda instance: instance.get('PerformanceInsightsEnabled', False),
        response['DBInstances']
    )


def get_resource_metrics(instance):
    pi_client = boto3.client('pi')

    metric_queries = []
    if engine_metrics.get(instance['Engine'], False):
        for metric in engine_metrics[instance['Engine']]:
            metric_queries.append({'Metric': metric})

    if not metric_queries:
        return

    pi_client.get_resource_metrics(
        ServiceType='RDS',
        Identifier=instance['DbiResourceId'],
        StartTime=time.time() - 300,
        EndTime=time.time(),
        PeriodInSeconds=60,
        MetricQueries=metric_queries
    )

The top of the code block defines an engine_metrics dictionary that maps an engine name to a list of metrics for that engine. You can customize the map to control exactly which counter metrics to deliver.
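The lookup can be restated as a standalone function, which also shows how engines without an entry are skipped (the helper name is mine; the sample `mariadb` input is only to illustrate a miss):

```python
engine_metrics = {
    'aurora': ['db.SQL.Innodb_rows_read.avg'],
    'aurora-postgresql': ['db.transactions.xact_commit.avg'],
    'postgres': ['db.transactions.xact_commit.avg'],
}


def queries_for(instance):
    """Look up the instance engine and build its MetricQueries list.
    Engines without an entry produce no queries, so they are skipped."""
    return [{'Metric': m} for m in engine_metrics.get(instance['Engine'], [])]


print(queries_for({'Engine': 'postgres'}))
# → [{'Metric': 'db.transactions.xact_commit.avg'}]
print(queries_for({'Engine': 'mariadb'}))
# → []
```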

Step 4: Update the Lambda function to send the Performance Insights data to CloudWatch

The final step is to update the Lambda function to send the Performance Insights data to CloudWatch.

From the IAM console, select the PerformanceInsightsCounterMetricsRole and attach CloudWatchFullAccess permissions.

The Lambda function will call PutMetricData with the following key parameters:

    • Namespace: Use PerformanceInsights.
    • Dimensions: Add the InstanceId as a dimension.
    • MetricName/Value/Timestamp: Use the values that I get back from the Performance Insights API.
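One practical note: PutMetricData limits how many metric datums a single request can carry, so if you query many metrics and data points at once, batch the list before sending. A minimal sketch (the cap of 20 is an assumption; check the current CloudWatch quota for your account):

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Usage, assuming a per-request cap of 20 datums:
# for batch in chunked(metric_data, 20):
#     cw_client.put_metric_data(Namespace='PerformanceInsights', MetricData=batch)
```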

Updated Lambda code:

import time
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

pi_client = boto3.client('pi')
rds_client = boto3.client('rds')
cw_client = boto3.client('cloudwatch')

engine_metrics = {
    'aurora': ['db.SQL.Innodb_rows_read.avg', ],
    'aurora-postgresql': ['db.transactions.xact_commit.avg'],
    'postgres': ['db.transactions.xact_commit.avg']
}


def lambda_handler(event, context):
    pi_instances = get_pi_instances()

    for instance in pi_instances:
        pi_response = get_resource_metrics(instance)
        if pi_response:
            send_cloudwatch_data(pi_response)

    return {
        'statusCode': 200,
        'body': 'ok'
    }


def get_pi_instances():
    response = rds_client.describe_db_instances()

    return filter(
        lambda instance: instance.get('PerformanceInsightsEnabled', False),
        response['DBInstances']
    )


def get_resource_metrics(instance):
    metric_queries = []
    if engine_metrics.get(instance['Engine'], False):
        for metric in engine_metrics[instance['Engine']]:
            metric_queries.append({'Metric': metric})

    if not metric_queries:
        return

    return pi_client.get_resource_metrics(
        ServiceType='RDS',
        Identifier=instance['DbiResourceId'],
        StartTime=time.time() - 300,
        EndTime=time.time(),
        PeriodInSeconds=60,
        MetricQueries=metric_queries
    )

def send_cloudwatch_data(pi_response):
    metric_data = []

    for metric_response in pi_response['MetricList']:
        cur_key = metric_response['Key']['Metric']

        for datapoint in metric_response['DataPoints']:
            logger.debug('adding datapoint={}'.format(datapoint))

            # A data point may arrive without a Value; keep 0.0 values
            # but skip missing ones.
            value = datapoint.get('Value')

            if value is not None:
                metric_data.append({
                    'MetricName': cur_key,
                    'Dimensions': [
                        {
                            'Name': 'DbiResourceId',
                            'Value': pi_response['Identifier']
                        }
                    ],
                    'Timestamp': datapoint['Timestamp'],
                    'Value': value
                })

    if metric_data:
        cw_client.put_metric_data(
            Namespace='PerformanceInsights',
            MetricData=metric_data
        )

After updating the code, test it again with the ScheduledTest event to make sure it runs properly.

View data in CloudWatch

And you’re done! Back in CloudWatch, you should see a new custom namespace called PerformanceInsights, as shown in the following screenshot, along with the new metrics that you added.

Conclusion

In this post, I showed you how to connect the Performance Insights API to a Lambda function to insert Performance Insights counter metrics into CloudWatch. I also gave a brief introduction to the Performance Insights API, especially GetResourceMetrics.

In my next blog post I plan to further discuss the Performance Insights API.

 


About the Author

Andrew McNair is a senior software engineer with the RDS team at Amazon Web Services. His team created Performance Insights to help customers better understand their database performance.