AWS Database Blog

Analyze PostgreSQL logs with Amazon Elasticsearch Service

Amazon RDS supports publishing PostgreSQL logs to Amazon CloudWatch for versions 9.6.6 and above. Aurora PostgreSQL supports publishing logs to CloudWatch Logs for versions 9.6.12 and above, and versions 10.7 and above. By live streaming this data from CloudWatch to Amazon Elasticsearch Service (Amazon ES), you maintain continuous visibility of RDS PostgreSQL DB logs. You can visualize, analyze, and discover data in real time using Kibana with straightforward search syntax. You can also configure monitoring on the PostgreSQL logs and set alarms in Kibana to enable timely detection of logged errors or long-running queries.

Using CloudWatch, you can query the logs and build visualizations, but this becomes challenging when you have multiple databases across multiple AWS accounts. The solution in this post is ideal for that situation: it streams the logs to a central location, so you can build dashboards for multiple databases without logging in to each account's console separately.

Amazon ES is a fully managed service that makes it easy to deploy, secure, and operate Elasticsearch at scale with zero downtime. You can analyze unstructured and semi-structured logs generated by websites, mobile devices, servers, and sensors. This enables operational intelligence, application monitoring, root cause analysis, and more. The service offers open-source APIs, managed Kibana, and integrations with Logstash and other AWS services to securely ingest data from any source. With Amazon ES, you can search, analyze, and visualize your data in real time.

This post demonstrates how to publish logs of RDS PostgreSQL databases to CloudWatch and live stream the data into Amazon ES. You also learn how to remove cold data that has aged beyond your data retention threshold.

The following are the high-level steps:

  1. Create an Amazon ES domain.
  2. Configure the RDS PostgreSQL instance.
  3. Configure data streaming to Amazon ES.
  4. Archive and clean out cold data for storage and performance gains.

Prerequisites

  • A valid AWS account with access to the appropriate AWS services.
  • An Aurora/RDS PostgreSQL database. For more information, see Amazon RDS.

Creating an Amazon ES domain

Amazon ES domains are clusters with the settings, instance types, instance counts, and storage resources that you specify. To create a new domain, see Create an Amazon ES Domain.

After you create the domain, find the endpoint, ARN, and Kibana URL in the Amazon ES Domain Overview page, as shown in the screenshot preview below.

Screenshot of the Amazon ES Domain Overview page showing where you can locate the endpoint, ARN and Kibana URL.

Configuring the RDS PostgreSQL instance

To configure the RDS PostgreSQL database log files to publish logs to CloudWatch, complete the following steps:

  1. Log in to the AWS Management Console and, under the Database category, choose RDS.
  2. Choose your desired database instance.
  3. Choose Modify.
  4. For Log exports, choose PostgreSQL log and Upgrade log as shown in the screenshot preview below.
    Screenshot of Log exports page showing PostgreSQL log and Upgrade log boxes selected.
  5. Choose Continue.
  6. Choose Apply immediately.

To verify that you created the new log group, go to the CloudWatch console. Under Management and Governance, choose Logs. The new log group should appear as: /aws/rds/instance/<dbname>/postgresql.

Configuring data streaming to Amazon ES

To configure streaming from CloudWatch to Amazon ES, see Streaming CloudWatch Logs Data to Amazon Elasticsearch Service. To configure your log format and filters, complete the following steps:

  1. On the Configure Log Format and Filters page, under Log Format, choose Common Log Format as shown in the screenshot preview below.
    Screenshot preview of the Configure Log Format and Filters screen and the selection of Common Log Format
  2. For Subscription Filter Pattern, enter [date, time, misc, message].
  3. Choose Test Pattern. The screenshot below is a preview of the Configure Log Format and Filters page demonstrating these three steps.
    Screenshot of the Configure Log Format and Filters page, showing Common Log Format selected and entering [date, time, misc, message] under Subscription Filter Pattern.
    This step tests the pattern to view how your data is organized. You can use filter patterns to organize the data and segregate each log statement based on its structure. This segregation is useful when creating the visualizations in Kibana.
    The following code is an example log statement:

    2019-08-08 02:47:24 UTC:72.21.217.47(7556):uadmin_TyOXrDmzkyCdRXcKdMHCnqqVCgZpDBHDLThdSNunLrzHNqFHNxjqxdwj@<dbname>:[11381]:LOG: connection authorized: user=uadmin_TyOXrDmzkyCdRXcKdMHCnqqVCgZpDBHDLThdSNunLrzHNqFHNxjqxdwj database=<dbname> SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off)

    After you apply the filter, the preceding statement is stored as the following:

    date=August 7th 2019, 
    time=02:47:24, 
    misc=UTC:72.21.217.47(7556):uadmin_TyOXrDmzkyCdRXcKdMHCnqqVCgZpDBHDLThdSNunLrzHNqFHNxjqxdwj@<dbname>:[11381]:LOG:, 
    message=connection authorized: user=uadmin_TyOXrDmzkyCdRXcKdMHCnqqVCgZpDBHDLThdSNunLrzHNqFHNxjqxdwj database=<dbname> SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off)
  4. Choose Next.
  5. Choose Review.
  6. Choose Start streaming.
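To see how the [date, time, misc, message] pattern maps onto a log line, the following is a minimal Node.js sketch. It is not the CloudWatch implementation, only an illustration of the behavior: a space-delimited filter assigns one whitespace-separated token to each field, with the last field capturing the remainder of the line.

```javascript
// Sketch of how a space-delimited subscription filter like
// [date, time, misc, message] splits a PostgreSQL log line.
// Illustration only; CloudWatch applies the real filter server-side.
function applyFilterPattern(line) {
  const tokens = line.trim().split(/\s+/);
  return {
    date: tokens[0],                    // first token, e.g. 2019-08-08
    time: tokens[1],                    // second token, e.g. 02:47:24
    misc: tokens[2],                    // third token: timezone, client, pid, level
    message: tokens.slice(3).join(' ')  // last field captures the remainder
  };
}

const sample =
  '2019-08-08 02:47:24 UTC:72.21.217.47(7556):uadmin@mydb:[11381]:LOG: ' +
  'connection authorized: user=uadmin database=mydb SSL enabled';
console.log(applyFilterPattern(sample));
```

Running this on the sample line yields the same four-field breakdown shown in the filtered output below.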

Modifying the index name pattern in AWS Lambda

To modify your index name pattern in AWS Lambda, complete the following steps:

  1. Open the Lambda console.
  2. Choose the new function that was automatically created by the previous step (for example, LogsToElasticsearchEx_my-elk-stack_<account-number>).
  3. Replace the index prefix in the Lambda function (index.js). See the following code:
            // index name format: cwl-<databaseName>-YYYY.MM.DD
            var idxName = payload.logGroup.toLowerCase().split('/');
            var indexName = [
                'cwl-' + idxName[4] + '-' + timestamp.getUTCFullYear(), // log group + year
                ('0' + (timestamp.getUTCMonth() + 1)).slice(-2), // month
                ('0' + timestamp.getUTCDate()).slice(-2) // day
            ].join('.');

A best practice is to use one index per database, which helps with performance and scaling.
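Wrapped in a small runnable function, the prefix logic shown earlier produces one daily index per database. The log group name and date here are examples:

```javascript
// Sketch of the index-name logic from the Lambda function,
// applied to an example RDS log group name.
function buildIndexName(logGroup, timestamp) {
  // index name format: cwl-<databaseName>-YYYY.MM.DD
  var idxName = logGroup.toLowerCase().split('/');
  return [
    'cwl-' + idxName[4] + '-' + timestamp.getUTCFullYear(), // log group + year
    ('0' + (timestamp.getUTCMonth() + 1)).slice(-2),        // zero-padded month
    ('0' + timestamp.getUTCDate()).slice(-2)                // zero-padded day
  ].join('.');
}

console.log(buildIndexName('/aws/rds/instance/mydb/postgresql',
                           new Date(Date.UTC(2019, 7, 8))));
// cwl-mydb-2019.08.08
```

Because the log group name starts with a slash, `split('/')` yields an empty first element, so the database name lands at position 4.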

At this point, the Lambda function transforms the log to JSON format and exports the log to Amazon ES. However, because you have not yet configured permissions that allow Lambda to export the logs, the export fails with HTTP status 403.

To view the Lambda log, go to CloudWatch and view the logs under /aws/Lambda/LogsToElasticsearchEx_my-elk-stack_<accountid>. The following code shows an example error message:

{ "errorMessage": "{\"statusCode\":403,\"responseBody\":{\"Message\":\"User: arn:aws:sts::<account_number>:assumed-role/my-elk-stack/LogsToElasticsearchEx_my-elk-stack_<account_number> is not authorized to perform: es:ESHttpPost\"}}" }

From this error message, you can copy the ARN to include in the Amazon ES access policy.
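If you process many of these errors, you can also pull the ARN out programmatically. The following sketch extracts the assumed-role ARN from an error message shaped like the example above; the message format is taken from that example, not guaranteed by the API, and the account number here is a placeholder:

```javascript
// Extract the assumed-role ARN from a 403 error message like the
// example above, so it can be pasted into the access policy.
function extractRoleArn(errorMessage) {
  const match = errorMessage.match(/arn:aws:sts::[^\s"\\]+/);
  return match ? match[0] : null;
}

const err = 'User: arn:aws:sts::123456789012:assumed-role/my-elk-stack/' +
            'LogsToElasticsearchEx_my-elk-stack_123456789012 ' +
            'is not authorized to perform: es:ESHttpPost';
console.log(extractRoleArn(err));
```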

Modifying the access policy in the Amazon ES domain

To modify your access policy, complete the following steps:

  1. Open the Amazon ES console.
  2. Choose the ES domain that was created in the earlier step.
  3. Choose Modify access policy and define a policy that fulfills the following requirements:
    • Allows data inflow from CloudWatch and includes access permissions for users to view the Kibana dashboard.
    • Includes a corresponding ARN entry in the Amazon ES access policy for each account, if you have database instances in multiple accounts with log exports configured.
    • Has a policy for anonymous access to Amazon ES and Kibana endpoints.
    • Grants access to a single user or multiple users using CIDR bits.

The following screenshot shows a preview of the ES domain page and the Modify access policy button to select.

Screenshot preview of the ES domain page and the Modify access policy button to select.

See the following example policy for multiple accounts:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:sts::<db_account_number1>:assumed-role/my_elk_stack/LogsToElasticsearchEx_my-elk-stack_<es_account_number>",
          "arn:aws:sts::<db_account_number2>:assumed-role/my_elk_stack/LogsToElasticsearchEx_my-elk-stack_<es_account_number>",
          "arn:aws:sts::<db_account_number3>:assumed-role/my_elk_stack/LogsToElasticsearchEx_my-elk-stack_<es_account_number>",
          "arn:aws:sts::<db_account_number4>:assumed-role/my_elk_stack/LogsToElasticsearchEx_my-elk-stack_<es_account_number>",
          "arn:aws:sts::<db_account_number5>:assumed-role/my_elk_stack/LogsToElasticsearchEx_my-elk-stack_<es_account_number>"
        ]
      },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpHead",
        "es:ESHttpPost",
        "es:ESHttpPut"
      ],
      "Resource": "arn:aws:es:us-east-1:<es_account_number>:domain/my-elk-stack/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:ESHttpGet",
      "Resource": "arn:aws:es:us-east-1:<es_account_number>:domain/my-elk-stack/*",
      "Condition": {
        "IpAddress": {
           "aws:SourceIp": [ "<first two blocks of your ip for multiple users>.0.0/24", "<full ip address for single user>/32" ]
        }
      }
    }
  ]
}

Archive and clean out cold data

The Lambda function creates one index per day per database. To manage storage and avoid overcrowding the cluster with indexes, it is a best practice to progressively remove older indexes by date. In doing so, you reduce unwanted load on Amazon ES and improve query performance.

To achieve this, you must create a Lambda function (refer to sample – Node.js Lambda function) that iterates through the indexes and deletes them based on the configured archival period. You may archive as much data as is required.

Create the Node.js Lambda delete function to handle multiple database indexes, with a data archival period of 15 days. Change the number of days to suit your requirements.
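The core of such a delete function is deciding which daily indexes fall outside the retention window. The following is a minimal sketch of that decision, assuming the cwl-<databaseName>-YYYY.MM.DD naming from earlier; the actual function would pass the resulting names to the Elasticsearch delete-index API:

```javascript
// Decide which daily indexes (named cwl-<databaseName>-YYYY.MM.DD)
// are older than the retention period and should be deleted.
function indexesToDelete(indexNames, retentionDays, now) {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  return indexNames.filter(function (name) {
    const m = name.match(/^cwl-.+-(\d{4})\.(\d{2})\.(\d{2})$/);
    if (!m) return false; // skip indexes that don't match the naming pattern
    const indexDate = new Date(Date.UTC(+m[1], +m[2] - 1, +m[3]));
    return indexDate < cutoff;
  });
}

const names = ['cwl-mydb-2019.08.08', 'cwl-mydb-2019.07.01', '.kibana'];
console.log(indexesToDelete(names, 15, new Date(Date.UTC(2019, 7, 8))));
// [ 'cwl-mydb-2019.07.01' ]
```

Skipping names that don't match the pattern protects system indexes such as .kibana from deletion.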

After creating the Lambda function, create a CloudWatch rule that helps invoke the Lambda delete indexes function on a regular schedule. To set up the rule, complete the following steps:

  1. Log in to the AWS Management Console account containing the Amazon ES domain.
  2. On the CloudWatch console, under Events, choose Rules.
  3. Choose Create rule.
  4. For Events Source, choose Schedule.
  5. For Cron expression, enter 0 0/2 * * ? *.
    This expression invokes the function every two hours. Repeated runs matter because if an automatic snapshot of the indexes is in progress during one invocation, some indexes might remain until the next run.
  6. Choose Add target.
  7. From the dropdown menu, choose Lambda function and choose Configure details.
  8. In the next screen, for Function, enter the name and description for the CloudWatch rule as shown in the screenshot preview below.

Screenshot preview showcasing steps 1-8 above.

Also add permission to allow the Lambda function to delete the old indexes:

  1. Open the Amazon ES console.
  2. Choose the ES domain that was created in the earlier step.
  3. Choose Modify access policy.
  4. Append the following statement to the existing policy:
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<accountNumber>:role/service-role/<curatorFunctionName>"
      },
      "Action": "es:ESHttpDelete",
      "Resource": "arn:aws:es:us-east-1:<accountNumber>:domain/my-elk-stack/*"
    }

Summary

This post demonstrated how to stream PostgreSQL logs to Amazon ES and configure a lifecycle policy to help manage storage and performance. You now have a powerful and easy query analytics engine with which you can visualize and analyze logs intelligently.

This blog post is part of a series. To learn how to configure Kibana for the visualization and analysis of data stored in Amazon ES, read Configuring and authoring Kibana dashboards.

About the Author


Marcel George is a Consultant with Amazon Web Services. He works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. His focus area is homogeneous and heterogeneous migrations of on-premises databases to Amazon RDS and Aurora PostgreSQL.