Build proactive database monitoring for Amazon RDS with Amazon CloudWatch Logs, AWS Lambda, and Amazon SNS

Customers running Amazon Relational Database Service (Amazon RDS) want to shorten the process of accessing database logs and to receive proactive notifications of database alerts. Generally, database administrators have host access to the database servers, which gives them access to the database logs on the host file system, which are used for monitoring and validating the errors in any database. Because Amazon RDS doesn’t provide host file system access, direct access to the database logs isn’t available. However, Amazon RDS provides a feature to export database logs to Amazon CloudWatch Logs, which you can access for monitoring and alerting.

You can export database logs such as alert logs and audit logs to CloudWatch Logs when creating or modifying your Amazon RDS database server. For more information, see Accessing Amazon RDS database log files.

In this post, we use CloudWatch Logs, AWS Lambda, and Amazon Simple Notification Service (Amazon SNS) to monitor Oracle, PostgreSQL, Amazon Aurora, MySQL, MariaDB, and SQL Server databases, and provide email notification of errors occurring in the exported database logs.

Solution overview

Database administrators generally monitor for keywords like ORA- errors in Oracle databases and ERROR in PostgreSQL, Aurora, MySQL, SQL Server, and MariaDB databases. When the database error occurs, DBAs need to be notified of the alert to acknowledge the seriousness of the error and take appropriate action.

The general process to monitor errors in an RDS database is to look for the errors in database error logs on the AWS Management Console. However, this process doesn’t send timely alerts when the error occurs on the databases, nor does it have a default process to read the logs from the time it was last read.

The solution in this post addresses two issues:

Monitoring the RDS database for database errors that appear in the logs
Streaming the database logs for RDS databases without having to read the whole file every time to monitor the database errors

This post details the steps to implement proactive monitoring of alerts for an RDS database by streaming the database logs based on the keywords of ORA- errors in an Oracle database or ERROR in PostgreSQL, Aurora, MySQL, MariaDB, or SQL Server databases and send a notification by email to the database administrator to take necessary action to fix the issue. The following diagram illustrates our solution architecture.

Walkthrough

We provide two methods to configure this solution: set up the resources manually through various AWS services, or deploy the resources with an AWS CloudFormation template. Both methods complete the following high-level steps:

Create an SNS topic.
Create AWS Identity and Access Management (IAM) roles and policies needed for the solution.
Create a Lambda function with the provided code and assign the appropriate IAM roles.

We then walk you through identifying the log group to monitor, creating a trigger in the Lambda function, and testing the error notification system.

Prerequisites

For this walkthrough, the following prerequisites are necessary:

An AWS account with RDS instances running.
An RDS instance with logs exported to Amazon CloudWatch. To verify this, on the Amazon RDS console, navigate to your instance and choose Configuration. Under the published logs, you see PostgreSQL (when using PostgreSQL or Aurora), alert (when using Oracle), or error (when using MariaDB, MySQL, or SQL Server).

Set up proactive monitoring and alerting in Amazon RDS manually

In this section, we walk through the process to set up the resources for active monitoring and alerting using various AWS services. To deploy these resources via AWS CloudFormation, you can skip to the next section.

Create an SNS topic

We first create a standard SNS topic and subscribe to it in order to receive email notifications. For instructions, see To create a topic using the AWS Management Console and Email notifications, respectively.

If you already have an existing SNS topic that you want to use, you can skip to the next step.

Set up an IAM role and policies

This step creates a role to create the Lambda function and grant it appropriate permissions. We start by creating our policy.

On the IAM console, under Access management, choose Policies.
Choose Create policy.
Choose JSON.

Enter the following JSON code, providing your Region, account ID, SNS topic name, and the name of the function you’re creating:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": [
				"logs:CreateLogGroup",
				"sns:Publish"
			],
			"Resource": [
				"arn:aws:logs:<region>:<account number>:*",
				"arn:aws:sns:<region>:<account number>:<sns topic name>",
			"arn:aws:lambda:<region>:<account_number>:function:<lambda function name>"

			]
		},
		{
			"Effect": "Allow",
			"Action": [
				"logs:CreateLogStream",
				"logs:PutLogEvents"
			],
			"Resource": [
				"arn:aws:logs:<region>:<account number>:log-group:/aws/lambda/<lambda function name>:*"
			]
		}
	]
}

Choose Review policy.
Enter a policy name, such as AWSLambdaExecutionRole-ErrorNotification.
Choose Create policy.You now create a role and attach your policy.
In the navigation pane, under Access management, choose Roles.
Choose Create role.
Choose Lambda.
Choose Next: Permissions.
Find and choose the policy you just created.
Choose Next: Tags.
Choose Next: Review.
Enter a name for the role, such as LambdaRoleErrorNotification.
Choose Create role.

Create a Lambda function

This steps illustrates how to create the Lambda function that is used to process the CloudWatch logs and send notifications using the Amazon SNS ARN of the topic you created.

On the Lambda function, choose Create function.You need to create the function in the same Region as that of the RDS database server you want to monitor.
Select Author from scratch.
For Function name, enter a name, such as RDSErrorsNotification.
For Runtime, choose Python 3.8.
For Execution role¸ select Use an existing role.
For Existing role, choose the role you created.

Enter the following code:

import sys  
import re  
import boto3  
import math, time  
import datetime  
import base64  
import json  
import gzip  
import os  
  
def lambda_handler(event, context):  
  
# Reading the cloudwatch log data  
    cloud_log_data= event['awslogs']['data']  
    message=""  
    compressed_data= base64.b64decode(cloud_log_data)  
    uncompressed_data = gzip.decompress(compressed_data)  
    logdataload = json.loads(uncompressed_data)  
  
# Getting the log group name that needs processing  
    LogGroupName = logdataload['logGroup']  
  
# Debug output of logEvents  
    print(logdataload["logEvents"])  
  
# Get the environment variables in Lambda  
    Region=os.environ.get('Region')  
    SNSArn=os.environ.get('SNSArn')  
    SNSRegion=os.environ.get('SNSRegion')  
    ExcludeFilterPattern=os.environ.get('ExcludeFilterPattern')  
    if os.environ.get('ExcludeFilterPattern') is None:  
        ExcludeFilterPattern="password"  
    else:  
        ExcludeFilterPattern=ExcludeFilterPattern+",password"  
          
    ExcludeFilterPattern=ExcludeFilterPattern.split(",")  
    IncludeFilterPattern=os.environ.get('IncludeFilterPattern')  
  
# The script works for Oracle/PostgreSQL/Aurora, the condition checks which database and assigns a pattern  
    if os.environ.get('IncludeFilterPattern') is None:  
        if "oracle" in  LogGroupName.lower():  
            IncludeFilterPattern="ORA-"  
              
        if "postgres" in LogGroupName.lower():  
            IncludeFilterPattern="ERROR"  
              
        if "aurora" in LogGroupName.lower():  
            IncludeFilterPattern="ERROR"  
              
        if "maria" in LogGroupName.lower():  
            IncludeFilterPattern="ERROR"  
              
        if "sqlserver" in LogGroupName.lower():  
            IncludeFilterPattern="ERROR"  
              
        if "mysql" in LogGroupName.lower():  
            IncludeFilterPattern="ERROR"  
  
  
    IncludeFilterPattern=IncludeFilterPattern.split(",")  
  
# Checking if errors exist which match the pattern  
    errors_exist = len(logdataload["logEvents"])  
   
    if errors_exist == 0:  
        print("No errors exist")  
    else:  
        for record in logdataload["logEvents"]:  
# checks if the error is in exclude list or need to be filtered  
            if re.compile('|'.join(ExcludeFilterPattern),re.IGNORECASE).search(record["message"]):  
                print('Error is ignored for {0}'.format(record["message"]) )  
            else:  
                if re.compile('|'.join(IncludeFilterPattern),re.IGNORECASE).search(record["message"]):  
                    message="Errors in logfile are:n" + record["message"] + "\n"  
                else:  
                    print("No errors match IncludeFilterPattern list")  
  
# Sends an SNS notification with the error information  
    if len(message) > 0:  
        SNSClient = boto3.client('sns', region_name=SNSRegion)  
        response = SNSClient.publish(  
            TopicArn=SNSArn,  
            Message=str(message),  
            Subject='Errors exists in RDS database log group'+LogGroupName  
        )  
        print('\t Response:{xyz} \n'.format(xyz=response['ResponseMetadata']['HTTPStatusCode']))

Choose Deploy.
On your Lambda function page, choose Edit environment variables and input the following keys with corresponding values.

Key	Value	Additional Information
SNSArn	The ARN of the SNS topic you created.	This variable is mandatory.
SNSRegion	The Region of SNS topic which you created.	This variable is mandatory.
Region	The Region of Lambda and the RDS database CloudWatch logs.	This variable is mandatory.
IncludeFilterPattern	Comma-separated patterns for errors that you want to be alterted of. For example, if the RDS database is Oracle, you could only be notified of errors like ORA-00600,ORA-07445.	You can use this parameter to filter any pattern (not just errors) that need to be monitored in the database. This variable is optional.
ExcludeFilterPattern	Comma-separated patterns for errors that you don’t want to be alterted of. For example, if the RDS database is Oracle, the value can be ORA-12560,ORA-12152.	This variable is optional.

By default, if no filter patterns are mentioned, all ORA- errors in the Oracle RDS alert.log and ERROR messages in the PostgreSQL or Aurora postgresql.log are alerted.

Choose Save.

Set up proactive monitoring and alerting in Amazon RDS using AWS CloudFormation

This section demonstrates the process of setting up your resources from the previous section using AWS CloudFormation.

CloudFormation template

The following YAML template automatically creates the SNS topic, Lambda function, and IAM roles and policies (you can modify the SNS topic name, IAM policy name, and Lambda function name as needed):

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  SubscriptionEndPoint:
    NoEcho: 'false'
    Type: String
    Description: The endpoint that receives notifications from the Amazon SNS topic. The endpoint value depends on the protocol that you specify. This could be a URL or ARN
    MinLength: '1'
  SubscriptionProtocol:
    NoEcho: 'false'
    Type: String
    Description: The subscriptions protocol
    AllowedValues:
    - http
    - https
    - email
    - email-json
    - sms
    - sqs
    - lambda
    - application
    Default: 'email'
Resources:
  SNSRDSTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref 'SubscriptionEndPoint'
          Protocol: !Ref 'SubscriptionProtocol'
      TopicName: "RDSAlertCFN"
  CFNRoleTest:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: RDSAlertLambdaFunctionPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: '*'
  CFNRDSPolicy:
    Type: 'AWS::IAM::ManagedPolicy'
    Properties:
      Description: Policy for creating a RDS policy
      Path: /
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
        - Effect: Allow
          Action: ['logs:CreateLogGroup', 'sns:Publish*']
          Resource: !Ref SNSRDSTopic
      Roles:
      - !Ref CFNRoleTest
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.8
      Timeout: 5
      Handler: index.handler
      Role: !GetAtt CFNRoleTest.Arn
      FunctionName: RDSAlertCFNLambda
      Environment:
        Variables:
          SNSArn: !Ref SNSRDSTopic
          SNSRegion: !Ref "AWS::Region"
          Region: !Ref "AWS::Region"
      Code:
        ZipFile: 
          !Sub
            - |-
              #!/usr/bin/env python3
              import sys
              import boto3  
              import math, time  
              import datetime 
              import base64 
              import json
              import gzip
              import os
              import re
              def handler(event, context):
                cloud_log_data= event['awslogs']['data'] 
                message=""  
                compressed_data= base64.b64decode(cloud_log_data)
                uncompressed_data = gzip.decompress(compressed_data)
                logdataload = json.loads(uncompressed_data)
                LogGroupName = logdataload['logGroup'] 
                print(logdataload["logEvents"])
                Region=os.environ.get('Region')
                SNSArn=os.environ.get('SNSArn')
                SNSRegion=os.environ.get('SNSRegion')
                ExcludeFilterPattern=os.environ.get('ExcludeFilterPattern')
                if os.environ.get('ExcludeFilterPattern') is None:
                  ExcludeFilterPattern="password" 
                else:  
                  ExcludeFilterPattern=ExcludeFilterPattern+",password"  
                ExcludeFilterPattern=ExcludeFilterPattern.split(",")  
                IncludeFilterPattern=os.environ.get('IncludeFilterPattern')
                
                if os.environ.get('IncludeFilterPattern') is None: 
                  if "oracle" in  LogGroupName.lower():
                    IncludeFilterPattern="ORA-"
                  if "postgre" in LogGroupName.lower():
                    IncludeFilterPattern="ERROR"
                  if "aurora" in LogGroupName.lower():
                    IncludeFilterPattern="ERROR"
                  if "maria" in LogGroupName.lower():
                    IncludeFilterPattern="ERROR"
                  if "sqlserver" in LogGroupName.lower():  
                    IncludeFilterPattern="ERROR"
                  if "mysql" in LogGroupName.lower():
                    IncludeFilterPattern="ERROR"
                IncludeFilterPattern=IncludeFilterPattern.split(",")  
                errors_exist = len(logdataload["logEvents"])
                if errors_exist == 0:
                  print("No errors exist")
                else:
                  for record in logdataload["logEvents"]:
                    if re.compile('|'.join(ExcludeFilterPattern),re.IGNORECASE).search(record["message"]):
                      print('Error is ignored for {0}'.format(record["message"]) )
                    else:
                      if re.compile('|'.join(IncludeFilterPattern),re.IGNORECASE).search(record["message"]):
                        message="Errors in logfile are:n" + record["message"] + "\n"
                      else:
                        print("No errors match IncludeFilterPattern list")
                        
                if len(message) > 0:
                  SNSClient = boto3.client('sns', region_name=SNSRegion)
                  response = SNSClient.publish(
                    TopicArn=SNSArn,
                    Message=str(message),
                    Subject='Errors exists in RDS database log group'+LogGroupName
                    )  
                  print('\t Response:{xyz} \n'.format(xyz=response['ResponseMetadata']['HTTPStatusCode']))  
            -
              lambda_function_role_arn: !Ref CFNRoleTest

Create the CloudFormation stack

You can deploy the preceding YAML file by creating a CloudFormation stack.

The stack creates the necessary resources needed to set up monitoring on Amazon RDS. You can now complete the configuration.

Create a CloudWatch trigger in the Lambda function

The database logs stored in CloudWatch need to be streamed to the Lambda function for it to process in order to send notifications. You can stream CloudWatch logs by creating a CloudWatch trigger in the function.

On the Lambda console, choose the function you created (or the CloudFormation stack created for you).
In the Designer section, choose Add trigger.
On the drop-down menu, choose CloudWatch Logs.
For Log group, choose the log group corresponding to the RDS database you want to monitor.You can add additional log groups by creating additional triggers.
For Filter name, enter a name.
For Filter pattern, enter a pattern (ORA- for Oracle or ERROR for PostgreSQL, Aurora, MariaDB, MySQL, or SQL Server).If you don’t provide a filter pattern, the function is triggered for every log data written to the database logs, which we don’t want because it increases costs. Because we just want to be notified for alerts, we need to stream just the ERROR or ORA- keywords to the Lambda function.
Choose Add.

To add more triggers, repeat these steps for other database logs.

Test the error notification

Now we’re ready to test the notification when an error occurs in the database. You can generate a fake alert in the RDS database.

For this post, we create an alert for an RDS for PostgreSQL database.

You should receive an email to the email subscribed to the SNS topic.

The SNS topic in this example has an email subscription, but you can set the subscription to different protocols as needed.

Summary

Monitoring and getting notified for any database errors is very crucial to customers. A proactive monitoring and alerting mechanism using CloudWatch Logs and Lambda is a simple way to achieve this. You can use this solution to set up custom alerts for any specific database patterns you need to be notified of.

Use this link to find more posts about monitoring databases with Amazon CloudWatch.

About the author

Saumya Mula is a Database Consultant with the Professional Services team at Amazon Web Services. She works as a database migration consultant to provide Amazon customers with technical guidance to migrate their on-premises databases to AWS.

AWS Database Blog