AWS Database Blog

Working with RDS and Aurora PostgreSQL logs: Part 2

July 2023: This post was reviewed for accuracy.

The first post in this series, Working with RDS and Aurora PostgreSQL Logs: Part 1, discussed the importance of PostgreSQL logs and how to tune various parameters to capture more database activity details. PostgreSQL logs provide useful information when troubleshooting database issues. This post focuses on different methods to access PostgreSQL logs.

The PostgreSQL logs are generated and stored on each instance in the cluster. There are multiple ways to access these log files; the following sections describe some of them.

Accessing via the AWS Management Console

The most direct method to access the PostgreSQL log files is through the AWS Management Console. Complete the following steps:

  1. Open the Amazon RDS console.
  2. Choose your RDS/Aurora instance.
  3. Choose Logs and events.
  4. In Logs, choose the required log file.
  5. Choose either View, Watch, or Download.

The following screenshot is a preview of the Logs section.

This is the most basic method to view or download the log files.

Publishing log files to CloudWatch Logs

RDS and Aurora PostgreSQL support publishing the logs to Amazon CloudWatch Logs. For more information, see Publishing PostgreSQL logs to CloudWatch Logs in the RDS User Guide.
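
You can also enable log publishing from the AWS CLI. The following is a minimal sketch, assuming the instance name wxyzpgprod that is used in the examples later in this post; for Aurora PostgreSQL, the equivalent setting is applied at the cluster level with modify-db-cluster.

# Enable publishing of the postgresql log type to CloudWatch Logs.
# The instance identifier is an example; replace it with your own.
$ aws rds modify-db-instance \
    --db-instance-identifier wxyzpgprod \
    --cloudwatch-logs-export-configuration '{"EnableLogTypes":["postgresql"]}' \
    --apply-immediately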

When the logs are in CloudWatch Logs, you can create alarms and perform real-time analysis. For example, if you set log_statement to ddl, you can set up an alarm to alert you whenever a DDL statement is executed.
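
One way to build such an alarm from the AWS CLI is to create a metric filter on the log group and an alarm on the resulting metric. The following is a minimal sketch; the log group name (for an RDS instance it follows the pattern /aws/rds/instance/<instance-name>/postgresql, and for Aurora /aws/rds/cluster/<cluster-name>/postgresql), the filter pattern, the metric namespace, and the SNS topic ARN are assumptions that you would adjust for your environment.

# Count log events that contain "statement:", which is how statements appear
# in the log when log_statement is set to ddl.
$ aws logs put-metric-filter \
    --log-group-name /aws/rds/instance/wxyzpgprod/postgresql \
    --filter-name ddl-statements \
    --filter-pattern '"statement:"' \
    --metric-transformations metricName=DDLStatements,metricNamespace=PostgreSQLLogs,metricValue=1

# Alarm whenever at least one matching event is seen in a 5-minute period.
# The SNS topic ARN is a placeholder.
$ aws cloudwatch put-metric-alarm \
    --alarm-name wxyzpgprod-ddl-statements \
    --namespace PostgreSQLLogs \
    --metric-name DDLStatements \
    --statistic Sum \
    --period 300 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 1 \
    --treat-missing-data notBreaching \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:dba-alerts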

CloudWatch Logs provide searching and filtering options that can help with the log analysis. For more information, see Filter and Pattern Syntax.

For example, the following screenshot shows all Aurora PostgreSQL log messages generated in the last 5 minutes that contain the words FATAL or ERROR.
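
You can run the same kind of search from the AWS CLI with filter-log-events. The following sketch assumes the same log group name as above; the start time is computed as "5 minutes ago" in milliseconds since the epoch.

# Fetch log events from the last 5 minutes that contain FATAL or ERROR.
$ aws logs filter-log-events \
    --log-group-name /aws/rds/instance/wxyzpgprod/postgresql \
    --filter-pattern '?FATAL ?ERROR' \
    --start-time $(($(date +%s) * 1000 - 300000)) \
    --output text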

Downloading log files using AWS CLI

You can download RDS and Aurora PostgreSQL log files using the AWS CLI. Before running any AWS CLI command, either configure your IAM user credentials or use an Amazon EC2 instance with the appropriate IAM role assigned. For more information, see Installing the AWS CLI and Configuring the AWS CLI.

The IAM user or role used to download the log files must have permissions for the rds:DescribeDBLogFiles and rds:DownloadDBLogFilePortion operations. For more information, see Manage IAM Permissions.
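
The following is a minimal sketch of such a policy attached inline to an IAM user; the user name, policy name, account ID, and Region are placeholders for this example, and you can scope the Resource more broadly or narrowly as needed.

# Grant only the two log-related RDS operations on a single instance.
$ aws iam put-user-policy \
    --user-name log-downloader \
    --policy-name rds-log-download \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["rds:DescribeDBLogFiles", "rds:DownloadDBLogFilePortion"],
        "Resource": "arn:aws:rds:us-east-1:123456789012:db:wxyzpgprod"
      }]
    }'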

The following AWS CLI command lists the available log files for the given instance:

$ aws rds describe-db-log-files --db-instance-identifier wxyzpgprod --output text
DESCRIBEDBLOGFILES      1552013774000   error/postgresql.log.2019-03-08-02      3912
DESCRIBEDBLOGFILES      1552017375000   error/postgresql.log.2019-03-08-03      3912
DESCRIBEDBLOGFILES      1552020975000   error/postgresql.log.2019-03-08-04      3912
…

You can choose a file name from this list and download it using the following command:

$ aws rds download-db-log-file-portion --db-instance-identifier wxyzpgprod --output text --starting-token 0 --log-file-name error/postgresql.log.2019-03-08-02 > logfile.txt

You can automate the steps to download the log files generated during a given time window using a script. For a sample script, see get_rds_log_files.sh on the GitHub repo. This script assumes that you already ran aws configure to set up the AWS CLI credentials, or that you are running from an EC2 instance with a role assigned that allows downloading PostgreSQL log files. The script expects the RDS or Aurora PostgreSQL instance name and a date range as arguments, and it downloads the log files generated for that instance during the given time window.

By default, this script downloads the standard format (stderr) files. If you are using CSV format log files, you can download them by passing the -c 1 argument.

If you don’t provide a date range, this script downloads all logs generated on the current day.

The following is the help output of the script:

$ ./get_rds_log_files.sh -h
USAGE: ./get_rds_log_files.sh [OPTIONS]
Options:
    -r  : AWS Region. Default 'us-east-1'
    -d  : RDS PostgreSQL Instance Name. Default 'wxyzpgprod'
    -c  : Use CSV log files. Allowed values 0 or 1. Default '0'
    -s  : Start date of log file [YYYY-MM-DD or YYYY-MM-DD-HH]. Defaults to current date (in UTC) '2019-02-06'
    -e  : End date of log file [YYYY-MM-DD or YYYY-MM-DD-HH]. Defaults to next day (in UTC) '2019-02-07'

The following is a sample output of the script from an EC2 jump server:

$ ./get_rds_log_files.sh -d wxyzpgprod -s '2019-02-06-16' -e '2019-02-06-18'

20190206-142134: Fetching logs generated between dates [ 2019-02-06-16 ] and [ 2019-02-06-18 ] (UTC).
20190206-142135: Downloading Logfile = postgresql.log.2019-02-06-16
20190206-142136: Downloading Logfile = postgresql.log.2019-02-06-17
20190206-142138: Downloading Logfile = postgresql.log.2019-02-06-18
20190206-142140: RDS Logs download completed.

$ ls postgresql.log.*
postgresql.log.2019-02-06-16  postgresql.log.2019-02-06-17  postgresql.log.2019-02-06-18

Downloading the log files in this way is necessary when you want to analyze them with custom scripts or with PostgreSQL log analysis tools such as pgBadger.
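
For example, a pgBadger run over the files downloaded above might look like the following sketch. The --prefix value assumes the default log_line_prefix on RDS and Aurora PostgreSQL; adjust it if your prefix differs.

# Generate an HTML report from the downloaded stderr-format log files.
$ pgbadger --prefix '%t:%r:%u@%d:[%p]:' postgresql.log.2019-02-06-* -o pgbadger_report.html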

Using log_fdw to view log data via a foreign table

To access PostgreSQL logs using the SQL interface, use the log_fdw extension. It creates a foreign table on top of the log file and makes it available for SQL queries. The foreign table provides a mechanism to access data that resides outside this PostgreSQL server. With a few steps, you can link to the required log file and start a troubleshooting or monitoring activity. For more information, see Foreign Data in the PostgreSQL documentation.

The CSV log format is particularly useful with the log_fdw extension. A foreign table created on a stderr log file contains a single column with the log event data, whereas a foreign table created on a CSV log file has multiple columns, which makes analysis easier.
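
If your instance is not generating CSV log files yet, you can enable them by setting the log_destination parameter to csvlog in the custom parameter group attached to the instance. The following AWS CLI sketch assumes a custom DB parameter group named wxyzpgprod-pg; depending on your configuration, particularly on Aurora, the parameter may instead be managed in a DB cluster parameter group (modify-db-cluster-parameter-group).

# Generate .csv log files in addition to the standard stderr files.
# The parameter group name is an example.
$ aws rds modify-db-parameter-group \
    --db-parameter-group-name wxyzpgprod-pg \
    --parameters "ParameterName=log_destination,ParameterValue=csvlog,ApplyMethod=immediate"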

The following sections provide details about how to use this extension, and a custom function to load available log files with a single function call.

Basic usage steps

To configure and use log_fdw, complete the following steps:

  1. Connect to the PostgreSQL database using any client, such as psql or pgAdmin.
  2. Create the log_fdw extension using the following SQL statement:
    CREATE EXTENSION log_fdw;
  3. Create the foreign server using log_fdw with the following SQL statement:
    CREATE SERVER log_server FOREIGN DATA WRAPPER log_fdw;
  4. Get the list of the most recent log files with the following SQL statement:
    SELECT * FROM list_postgres_log_files() ORDER BY 1 DESC;

    The following sample output is limited to five rows:

    log_test=> SELECT * FROM list_postgres_log_files() ORDER BY 1 DESC LIMIT 5;
                file_name             | file_size_bytes
    ----------------------------------+-----------------
     postgresql.log.2019-03-07-21     |          235909
     postgresql.log.2019-03-07-20     |         1008923
     postgresql.log.2019-03-07-19     |         1098235
     postgresql.log.2019-03-07-18     |         1019780
     postgresql.log.2019-03-07-17     |          970201
    (5 rows)

    If you configured the CSV log format, CSV output files are also generated. To list the CSV log files, use the following SQL code:

    SELECT * FROM list_postgres_log_files() WHERE file_name LIKE 'postgresql.log.%.csv' ORDER BY 1 DESC;

    The following sample output is limited to the five most recent log files:

    log_test=> SELECT * FROM list_postgres_log_files() WHERE file_name LIKE 'postgresql.log.%.csv' ORDER BY 1 DESC LIMIT 5;
                file_name             | file_size_bytes
    ----------------------------------+-----------------
     postgresql.log.2019-03-07-21.csv |          407902
     postgresql.log.2019-03-07-20.csv |         1267089
     postgresql.log.2019-03-07-19.csv |         1397837
     postgresql.log.2019-03-07-18.csv |         1281672
     postgresql.log.2019-03-07-17.csv |         1212340
    (5 rows)

    The timestamps in the log file names are in UTC. For example, the file postgresql.log.2019-03-07-21.csv contains the logs generated on March 7 from 9:00 PM (inclusive) to 10:00 PM (exclusive) UTC.

  5. Load any of these files as a foreign table using the following SQL statement:
    SELECT create_foreign_table_for_log_file('pglog_20190307_21', 'log_server', 'postgresql.log.2019-03-07-21.csv');

    The first argument to this function is the name you want to give the foreign table. The second argument is the name of the foreign server you created. The third argument is the name of the log file you want to make available via log_fdw.

  6. After the preceding statement is successful, the foreign table is ready for your analysis. Use the following code:
    SELECT * FROM pglog_20190307_21 ORDER BY 1;

    The following sample queries fetch the count of log rows. Because this is the currently active log file, new log rows continue to arrive in real time, so the count increases between runs. A richer sample query against the CSV log columns follows these steps.

    log_test=> SELECT count(1) FROM pglog_20190307_21;
     count
    -------
      2311
    (1 row)
    log_test=> select count(1) from pglog_20190307_21;
     count
    -------
      2776
    (1 row)
  7. When you are done using the foreign table, drop it using the following SQL statement:
    DROP FOREIGN TABLE pglog_20190307_21;

    By default, the table is created in your default schema. If you want to create the table in a specific schema, add your schema at a higher precedence in the search_path. See the following code:

    CREATE SCHEMA dba;
    SET search_path = dba, public;
    

    After you change the search_path, the foreign table created by the create_foreign_table_for_log_file function is placed in the dba schema.
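
As mentioned in step 6, you can run richer queries against a foreign table that was built on a CSV log file. Assuming it exposes the standard csvlog columns (such as error_severity and message), the following sample query summarizes the log events in the pglog_20190307_21 table from step 5 by severity, provided you have not dropped that table yet:

-- Summarize the log events in the loaded file by severity.
SELECT error_severity, count(*) AS events
FROM pglog_20190307_21
GROUP BY error_severity
ORDER BY events DESC;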

Load log files as a single table

The preceding steps are useful if you are interested in a small number of log files; you can create foreign tables for your analysis easily. However, if you need to look at all the available log files, loading each file individually can be time-consuming.

To address this problem, use the custom function load_postgres_log_files() available in the GitHub repository. This function simplifies the process for loading the available log files as a single table. Complete the following steps:

  1. Download the SQL file for function load_postgres_log_files from the GitHub repository.
  2. Create the function load_postgres_log_files.
  3. Run the following statement to load the log files:
    SELECT public.load_postgres_log_files();
  4. Start looking at the logs with the following code:
    SELECT * FROM logs.postgres_logs;
  5. If necessary, repeat Step 3 to pick up new log files that were generated since the last load.
  6. When you are done with the log analysis, drop the tables with the following code:
    DROP TABLE logs.postgres_logs CASCADE;

The following are the key features of this function:

  • By default, a table named postgres_logs is created in the schema logs.
  • You can change the schema name and table name via arguments.
  • An existing table with the same name is dropped before the load.
  • If the schema logs does not exist, it is created.
  • Each log file is loaded as a foreign table and made the child of table logs.postgres_logs.
  • By default, the function looks for CSV file format logs. If it can’t find CSV files, it falls back to the standard error format log file. To change this behavior, set the argument v_prefer_csv to false.
  • The following daily-, hourly-, and minute-based log file name formats are supported for CSV and non-CSV output files:
    • postgresql.log.YYYY-MM-DD
    • postgresql.log.YYYY-MM-DD-HH
    • postgresql.log.YYYY-MM-DD-HHMI
  • Using CSV format adds a check constraint to the child table created for each log file.

To use your own schema and table names, run the command in Step 3 as follows:

SELECT public.load_postgres_log_files('myschema', 'mytable');

If you need the latest logs to always be available in this table without manual reload, you can write a cron job or scheduled Lambda function to call this function every hour. For more information, see Schedule Expressions Using Rate or Cron.
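
For example, a crontab entry on an EC2 jump server along the following lines would refresh the table at the top of every hour. The endpoint, database name, and user are placeholders for this sketch, and the password is assumed to come from a ~/.pgpass file.

# Refresh logs.postgres_logs every hour (minute 0).
0 * * * * psql "host=wxyzpgprod.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com dbname=log_test user=dbadmin" -c "SELECT public.load_postgres_log_files();"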

Summary

RDS and Aurora PostgreSQL logs provide useful information about database activity, which can help in performance tuning and troubleshooting. This post described several methods to access these logs so that you can choose the ones that best suit your needs.

If you have any questions or comments, please post your thoughts in the comments section.

 


About the Author

Yaser Raja is a Senior Consultant with the Professional Services team at Amazon Web Services. He works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. His focus area is homogeneous and heterogeneous migrations of on-premises databases to Amazon RDS and Aurora PostgreSQL.