Common administrator responsibilities on Amazon RDS and Amazon Aurora for PostgreSQL databases

Amazon Web Services (AWS) offers Amazon Relational Database Service (RDS) and Amazon Aurora as fully managed relational database services. With a few commands, you can have your production database instance up and running on AWS.

A managed database frees the database administrator (DBA) from many maintenance and management tasks. However, there are a few significant responsibilities to be aware of. This post discusses the DBA tasks to perform on Amazon RDS for PostgreSQL and Aurora PostgreSQL-compatible databases.

As a DBA, you face daily pressure to deliver value to your business across many fronts. Maintaining the right platform for running mission-critical databases is becoming increasingly difficult. Maintenance is also a challenging job.

The launch of Amazon RDS and Aurora has vastly reduced the time you spend on tasks like installation, configuration, monitoring, and security. Nevertheless, you must still carry out several critical tasks: some daily, a few weekly, and some only once, when you create the Amazon RDS or Aurora instance.

Some of the administrative tasks that you must carry out include:

Configuring the parameter group
Managing IP traffic using a security group
Auditing the database log files
Maintenance and management activities
Planning backup and recovery strategies
User management
Monitoring the database

Configuring the parameter group

The data directory of an on-premises PostgreSQL cluster contains the configuration file postgresql.conf. You can manage the parameters through this configuration file. Similarly, for Amazon RDS and Aurora PostgreSQL instances, you manage the parameters through a parameter group.

Before you create a new Amazon RDS or Aurora instance, customize your DB parameter group. For more information about creating a new parameter group, modifying the parameters, and attaching it to the instance, see Working with DB Parameter Groups.

If you did not attach a customized parameter group at the time of creation, modify the instance to replace the default DB parameter group with the custom parameter group, and then restart the instance so that the customized parameters take effect.

The following overview lists the parameters you should consider setting for logging, autovacuum behavior, and query planning:

Enter the following logging parameters:

log_autovacuum_min_duration = 0
log_checkpoints = 1
log_connections = 1
log_disconnections = 1
log_min_duration_statement = <threshold in milliseconds>
log_temp_files = 1
log_statement = 'ddl'
rds.force_autovacuum_logging_level = 'log'

Enter the following autovacuum parameters:

autovacuum_max_workers
autovacuum_vacuum_cost_limit
autovacuum_vacuum_cost_delay

Enter the following as other parameters:

random_page_cost
default_statistics_target
shared_preload_libraries='pg_hint_plan, pg_stat_statements'
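As a rough illustration of how parameters like the ones above might be applied from the command line, the following sketch wraps the AWS CLI commands aws rds create-db-parameter-group and aws rds modify-db-parameter-group. The function names, the group name my-pg-params, the family postgres11, and the chosen values are assumptions for illustration only. Dynamic parameters can use ApplyMethod=immediate, while static parameters such as shared_preload_libraries need pending-reboot and an instance restart.

```shell
#!/bin/bash
# Sketch only: group name, family, and values in the example call are hypothetical.

# Format one entry for the --parameters option of modify-db-parameter-group.
# Args: parameter name, value, apply method (immediate | pending-reboot).
param_spec() {
  printf 'ParameterName=%s,ParameterValue=%s,ApplyMethod=%s' "$1" "$2" "$3"
}

# Create a custom parameter group and set a few logging parameters on it.
# Args: group name, parameter group family (for example postgres11).
apply_logging_params() {
  local group="$1" family="$2"
  aws rds create-db-parameter-group \
    --db-parameter-group-name "$group" \
    --db-parameter-group-family "$family" \
    --description "Custom PostgreSQL parameters" &&
  aws rds modify-db-parameter-group \
    --db-parameter-group-name "$group" \
    --parameters \
      "$(param_spec log_connections 1 immediate)" \
      "$(param_spec log_disconnections 1 immediate)" \
      "$(param_spec log_statement ddl immediate)" \
      "$(param_spec log_autovacuum_min_duration 0 immediate)"
}

# Example call (requires configured AWS credentials):
# apply_logging_params my-pg-params postgres11
```

Keeping the formatting in a small helper makes it easy to review each parameter and its apply method before anything is sent to AWS.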

Managing IP traffic using a security group

In Amazon RDS and Aurora, the security group controls traffic in and out of the instance. You control both inbound and outbound traffic by adding appropriate rules to the security group.

For example, you can add an inbound rule that allows PostgreSQL traffic from your applications to the database on port 5432.

Do not open your database to the world by using 0.0.0.0/0.
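The rule described above could also be added with the AWS CLI command aws ec2 authorize-security-group-ingress. The following is a minimal sketch; the function names, the security group ID, and the CIDR range are hypothetical, and the guard simply refuses a range that is open to the world.

```shell
#!/bin/bash
# Sketch only: the security group ID and CIDR in the example call are hypothetical.

# Return success only when the CIDR is not open to the whole internet.
is_restricted_cidr() {
  case "$1" in
    0.0.0.0/0|::/0) return 1 ;;
    *) return 0 ;;
  esac
}

# Allow inbound PostgreSQL traffic (TCP 5432) from one IPv4 CIDR range.
# Args: security group ID, source CIDR.
allow_postgres_from() {
  local group_id="$1" cidr="$2"
  if ! is_restricted_cidr "$cidr"; then
    echo "Refusing to open port 5432 to $cidr" >&2
    return 1
  fi
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port 5432 --cidr "$cidr"
}

# Example call (requires configured AWS credentials):
# allow_postgres_from sg-0123456789abcdef0 10.0.1.0/24
```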

Auditing the database log files

The busier your database is, the higher the number of transactions. The more transactions, the more logs it generates. The more log files, the more complicated it becomes to extract specific information from those log files.

Most DBAs review their log files as a last resort, but you should review them frequently for the ERROR, FATAL, WARNING, and HINT messages they contain. It is vital to check and audit the log files regularly.
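For a quick first pass over a downloaded log file, a few lines of shell can count the messages by severity. This sketch assumes the log line prefix %t:%r:%u@%d:[%p]: so that the severity appears as :ERROR:, :FATAL:, and so on; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: assumes the log_line_prefix '%t:%r:%u@%d:[%p]:', so the
# severity appears as ':ERROR:', ':FATAL:', and so on in each log line.

# Print how many lines of each severity a PostgreSQL log file contains.
# Args: path to a downloaded log file.
count_severities() {
  local file="$1" level
  for level in ERROR FATAL WARNING; do
    # grep -c prints 0 when nothing matches, so every level is reported
    printf '%s=%s\n' "$level" "$(grep -c ":$level:" "$file")"
  done
}

# Example call:
# count_severities /home/postgres/pgBadger/logs/postgresql.log.2019-01-23-06
```

A sudden jump in any of these counts from one day to the next is a reasonable trigger for a closer look at the log itself.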

When it becomes difficult to analyze the log files every day due to size, you can use pgBadger, which is available on GitHub. pgBadger is an open-source PostgreSQL log analyzing tool that generates HTML reports from your PostgreSQL log file.

By default, RDS and Aurora instances retain logs for 3 days; you can raise this to a maximum of 7 days with the rds.log_retention_period parameter. To keep log files for a longer period, run custom bash scripts to download them locally, to an Amazon EC2 instance, or to an Amazon S3 bucket.

To install and generate pgBadger reports, complete the following steps:

Sign in to the AWS Management Console and create one EC2 RHEL or CentOS instance.
Download the pgdg repo on Amazon EC2.

To install, enter the following code:

sudo yum install ftp://ftp.pbone.net/mirror/apt.sw.be/redhat/7.3/en/i386/rpmforge/RPMS/perl-Text-CSV_XS-0.65-1.rh7.rf.i386.rpm perl perl-devel
 
sudo yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
 
sudo yum install pgbadger -y

This post tested the preceding steps on RHEL 7 with pgdg-10 (PostgreSQL repo).

To generate the report, complete the following steps:

Enable the logging parameters in your DB parameter group.
Download the PostgreSQL log files from Amazon RDS or Aurora to Amazon EC2 and run pgBadger.
Schedule a cron job to download the log files to an appropriate location on Amazon EC2 and generate the pgBadger report.

Download and convert your log files with the following code:

#!/bin/bash
# This script downloads the PostgreSQL log files from Amazon RDS or Aurora and stores them on Amazon EC2.
## 1. Delete the logs and pgBadger reports older than 3 days.
## 2. Download the latest PostgreSQL logs from the Amazon RDS instance: <instance_identifier>.
## 3. Generate the pgBadger report for the newly downloaded log files.
# Before the first run, create the directories used below:
#   mkdir -p /home/postgres/pgBadger/logs /var/www/pgbadger
# You must install pgbadger, and it must be in the PATH.
# Here is the link for pgbadger installation: https://github.com/darold/pgbadger
# Install the AWS CLI on the EC2 instance and configure it
# (https://docs.aws.amazon.com/cli/latest/topic/config-vars.html) to download the log files.

home_dir="/home/postgres/pgBadger"
logDir="$home_dir/logs"
rptDir="/var/www/pgbadger"
identifier='<instance_identifier>'

# Yesterday's date; the hourly log files are named error/postgresql.log.YYYY-MM-DD-HH
date=$(date -d "-1 days" +%Y-%m-%d)

# Step 1: remove logs and reports older than 3 days
sudo find "$logDir" -name '*.log.*' -type f -mtime +3 -exec rm {} \;
sudo find "$rptDir" -name 'postgresql*.html' -type f -mtime +3 -exec rm {} \;

sudo mkdir -p "$logDir/$date"
sudo chown -R postgres:postgres "$logDir/$date"

# Step 2: download the 24 hourly log files and count the successful downloads
downloaded=0
for i in $(seq -w 00 23)
do
  logFile="$logDir/$date/postgresql.log.$date-$i"
  if sudo aws rds download-db-log-file-portion --db-instance-identifier "$identifier" \
       --log-file-name "error/postgresql.log.$date-$i" --starting-token 0 --output text \
       > "$logFile"
  then
    downloaded=$((downloaded + 1))
  else
    rm -f "$logFile"   # discard the empty file left by a failed download
  fi
done

# Step 3: generate the report only if at least one log file was downloaded
if [ "$downloaded" -gt 0 ] ; then
  if sudo pgbadger --prefix '%t:%r:%u@%d:[%p]:' -f stderr \
       -o "$rptDir/postgresql.$date.html" "$logDir/$date"/*.log.*
  then
    # World-readable so that the web server can serve the report
    sudo chmod 644 "$rptDir/postgresql.$date.html"
    #mailx -s "Successfully generated the pgbadger report for Date: $date"
    echo "Successfully generated the pgbadger report for Date: $date"
  else
    #mailx -s "UNSUCCESSFUL GENERATION of pgbadger report for Date: $date"
    echo "UNSUCCESSFUL GENERATION of pgbadger report for Date: $date"
  fi
  gzip -r9 "$logDir/$date"
fi

This script generates the pgBadger report that you can use to analyze the activities performed on the database. For a sample pgBadger report, see postgres_sample.

Maintenance and management activities

A managed database still requires maintenance. The following sections discuss autovacuuming, the VACUUM ANALYZE command, and long-running queries and sessions.

Autovacuuming

Query slowness due to table or index bloat is one of the most common scenarios in PostgreSQL. Amazon RDS and Aurora enable autovacuuming by default to reduce this bloat. As you troubleshoot slowdowns, keep the following in mind:

Autovacuum takes a low-priority lock on the table. It might cancel its own job when another operation needs a conflicting lock on the table.
The same table can become a candidate for repeated autovacuums, which causes other tables to remain bloated.

Because these are common scenarios in PostgreSQL, you should tune your autovacuum parameters properly. If tuning does not work, you must schedule a manual vacuum/analyze script. Based on the frequency of the bloat, you can decide whether to perform VACUUM ANALYZE, VACUUM FULL, or PG_REPACK.
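To make the tuning discussion more concrete: autovacuum considers a table for vacuuming once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × (number of tuples in the table). The following sketch computes that trigger point. The function name is hypothetical, and the defaults of 50 and 0.2 are the stock PostgreSQL values, which may differ from the values in your parameter group.

```shell
#!/bin/bash
# Sketch only: defaults below are the stock PostgreSQL values (50 and 0.2);
# check the values in your own parameter group.

# Print the number of dead tuples at which autovacuum considers a table.
# Args: estimated tuples in the table, [autovacuum_vacuum_threshold],
#       [autovacuum_vacuum_scale_factor].
autovacuum_trigger_point() {
  local reltuples="$1" base="${2:-50}" scale="${3:-0.2}"
  awk -v n="$reltuples" -v b="$base" -v s="$scale" \
    'BEGIN { printf "%.0f\n", b + s * n }'
}

autovacuum_trigger_point 1000000           # prints 200050
autovacuum_trigger_point 1000000 50 0.02   # prints 20050
```

With the stock settings, a table of one million rows accumulates about 200,000 dead tuples before autovacuum starts, which is why large, frequently updated tables often benefit from a lower scale factor.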

Scheduling VACUUM ANALYZE

To keep the stats updated, remove bloat so that the space can be reused, and avoid transaction wraparound, schedule VACUUM ANALYZE on your database. VACUUM removes the bloat and avoids transaction wraparound. ANALYZE updates the database stats, which helps the planner generate good plans for queries.

Before you proceed, you should understand the differences between VACUUM ANALYZE, VACUUM FULL, and PG_REPACK.

VACUUM ANALYZE – Removes the bloat from the tables and indexes and updates the tables’ statistics. It does not block reads or writes, and you can run it at a table level or database level. It marks the space held by dead rows as reusable but generally does not return that space to the OS.
VACUUM FULL – Writes the entire content of the table into a new disk file and releases the wasted space back to the OS. This takes an exclusive table-level lock, which blocks all other access to the table while it runs, and it is slow. Avoid using VACUUM FULL on a high-load system.
PG_REPACK – Writes the entire content of the table into a new disk file and releases the wasted space back to the OS. It works online and holds an exclusive lock on the table only briefly at the start and end of the operation. It is faster than VACUUM FULL. Instead of re-indexing or performing a VACUUM FULL, you should use PG_REPACK to remove bloat. PG_REPACK is available as an extension in Amazon Aurora for PostgreSQL and Amazon RDS PostgreSQL.
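As a rough sketch of the PG_REPACK option, the following function creates the extension and then runs the pg_repack client against one table. The function name, host, user, database, and table are hypothetical. It assumes that the pg_repack client installed on your machine matches the extension version on the instance, and that the table has a primary key or a not-null unique index. The -k option skips the client's superuser check, which is needed because Amazon RDS does not grant a true superuser.

```shell
#!/bin/bash
# Sketch only: host, user, database, and table names in the example are hypothetical.

# Repack a single table online.
# Args: host, user, database, table (optionally schema-qualified).
repack_table() {
  local host="$1" user="$2" db="$3" table="$4"
  # The extension must exist in the target database before the client runs
  psql -h "$host" -U "$user" -d "$db" \
    -c 'CREATE EXTENSION IF NOT EXISTS pg_repack;' &&
  # -k: skip the superuser check; -t: repack only the named table
  pg_repack -h "$host" -U "$user" -d "$db" -k -t "$table"
}

# Example call:
# repack_table mydb.example.us-east-1.rds.amazonaws.com dba appdb public.sample_table
```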

The following code calculates the bloat and extra space that bloated pages occupy. It references the pg_class.relhasoids column, which exists only in PostgreSQL 11 and earlier:

SELECT current_database(), schemaname, tblname, bs*tblpages AS real_size,
(tblpages-est_tblpages)*bs AS extra_size,
CASE WHEN tblpages - est_tblpages > 0
THEN 100 * (tblpages - est_tblpages)/tblpages::float
ELSE 0
END AS extra_ratio, fillfactor,
CASE WHEN tblpages - est_tblpages_ff > 0
THEN (tblpages-est_tblpages_ff)*bs
ELSE 0
END AS bloat_size,
CASE WHEN tblpages - est_tblpages_ff > 0
THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float
ELSE 0
END AS bloat_ratio, is_na
-- , (pst).free_percent + (pst).dead_tuple_percent AS real_frag
FROM (
SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages,
ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
tblpages, fillfactor, bs, tblid, schemaname, tblname, heappages, toastpages, is_na
-- , stattuple.pgstattuple(tblid) AS pst
FROM (
SELECT
( 4 + tpl_hdr_size + tpl_data_size + (2*ma)
- CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END
- CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END
) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + toastpages) AS tblpages, heappages,
toastpages, reltuples, toasttuples, bs, page_hdr, tblid, schemaname, tblname, fillfactor, is_na
FROM (
SELECT
tbl.oid AS tblid, ns.nspname AS schemaname, tbl.relname AS tblname, tbl.reltuples,
tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
coalesce(toast.reltuples, 0) AS toasttuples,
coalesce(substring(
array_to_string(tbl.reloptions, ' ')
FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
current_setting('block_size')::numeric AS bs,
CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
24 AS page_hdr,
23 + CASE WHEN MAX(coalesce(null_frac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END
+ CASE WHEN tbl.relhasoids THEN 4 ELSE 0 END AS tpl_hdr_size,
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024) ) AS tpl_data_size,
bool_or(att.atttypid = 'pg_catalog.name'::regtype)
OR count(att.attname) <> count(s.attname) AS is_na
FROM pg_attribute AS att
JOIN pg_class AS tbl ON att.attrelid = tbl.oid
JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
WHERE att.attnum > 0 AND NOT att.attisdropped
AND tbl.relkind = 'r'
GROUP BY 1,2,3,4,5,6,7,8,9,10, tbl.relhasoids
ORDER BY 2,3
) AS s
) AS s2
) AS s3;

The query returns output similar to the following:

 current_database |     schemaname     |         tblname         | real_size  | extra_size |   extra_ratio    | fillfactor | bloat_size |   bloat_ratio    | is_na
------------------+--------------------+-------------------------+------------+------------+------------------+------------+------------+------------------+-------
 postgres         | public             | sample_table             | 1565351936 |  239951872 | 15.3289408267611 |        100 |  239951872 | 15.3289408267611 | f

To reclaim the space, run VACUUM FULL or PG_REPACK:

Postgres#  vacuum full analyze sample_table;

After you run VACUUM FULL, the query returns something similar to the following output:

current_database |     schemaname     |         tblname         | real_size | extra_size |    extra_ratio     | fillfactor | bloat_size |    bloat_ratio     | is_na
------------------+--------------------+-------------------------+-----------+------------+--------------------+------------+------------+--------------------+-------
postgres         | public             | sample_table             |  41746432 |      24576 | 0.0588697017268446 |        100 |      24576 | 0.0588697017268446 | f

VACUUM FULL and re-indexing are locking operations that block other sessions, but PG_REPACK is an online method to reorganize the tables and indexes. You can query the pg_stat_all_tables and pg_stat_user_tables to check the last autovacuum or manual vacuum execution.

For example, see the following code:

SELECT schemaname,relname as table_name, last_vacuum, last_analyze, last_autovacuum, last_autoanalyze, n_live_tup,n_dead_tup from pg_stat_user_tables;

The query returns output similar to the following:

schemaname  | table_name  | last_vacuum | last_analyze | last_autovacuum |       last_autoanalyze        | n_live_tup | n_dead_tup
------------+-------------+-------------+--------------+-----------------+-------------------------------+------------+------------
 public     | vacuum_test |             |              |                 | 2019-01-23 06:44:56.257586+00 |   13671089 |          0

You can also use this code:

SELECT schemaname, relname as table_name, last_vacuum, last_analyze, last_autovacuum, last_autoanalyze, n_live_tup, n_dead_tup from pg_stat_all_tables;

The query returns output similar to the following:

     schemaname     |   table_name   |          last_vacuum          |         last_analyze          |  last_autovacuum   | last_autoanalyze | n_live_tup | n_dead_tup
--------------------+----------------+-------------------------------+-------------------------------+--------------------+------------------+------------+------------
 information_schema | sql_sizing     | 2019-01-23 07:05:06.524004+00 | 2019-01-23 07:05:06.52429+00  |                    |                  |         23 |          0

To run VACUUM ANALYZE on a table, enter the following code:

Vacuum analyze <table_name>;

To run VACUUM ANALYZE on the database, enter the following code:

Vacuum analyze verbose;

Only the superuser or database owner can run a vacuum on system tables. If substantial bloat in system tables causes performance degradation, or when you must free up bloated space to the disk, you must run VACUUM FULL. Only run this command outside of business hours, because it locks the tables on which it runs.

To check the transactional age of the database, enter the following code:

SELECT datname, age(datfrozenxid) from pg_database order by age(datfrozenxid) desc limit 20;
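To put the age returned by this query in context, the following sketch expresses it as a share of the roughly 2^31 (2,147,483,648) transactions available before wraparound. The function name is hypothetical; 200,000,000 is the stock default of autovacuum_freeze_max_age, the age at which autovacuum forces a freeze.

```shell
#!/bin/bash
# Sketch only: expresses age(datfrozenxid) as a share of the roughly
# 2^31 (2,147,483,648) transactions available before wraparound.

# Args: transaction age as returned by age(datfrozenxid).
xid_age_percent() {
  awk -v age="$1" 'BEGIN { printf "%.1f\n", 100 * age / 2147483648 }'
}

xid_age_percent 200000000    # prints 9.3 (default autovacuum_freeze_max_age)
xid_age_percent 1073741824   # prints 50.0 (halfway to wraparound)
```

A database that sits well above the first figure for long periods suggests that autovacuum is not keeping up with freezing.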

To prevent transaction wraparound issues in the database, enter the following code:

Vacuum freeze;

The autovacuum process can also perform these activities, and it is highly recommended that you keep it enabled. Amazon RDS for PostgreSQL has autovacuuming enabled by default.

Make sure that you tune the autovacuum parameters to best suit your requirements. In Amazon RDS, the parameter rds.adaptive_autovacuum helps automatically tune the autovacuum parameters whenever the database exceeds the transaction ID thresholds.

Enter the following code to check if autovacuum is running in PostgreSQL version 9.6 and above:

SELECT datname, usename, pid, wait_event_type, wait_event, current_timestamp - xact_start AS xact_runtime, query
FROM pg_stat_activity
WHERE upper(query) like '%VACUUM%'
ORDER BY xact_start;

Long-running queries and sessions

To terminate queries that have run for a long time or are blocking another session, find the PID of the query in the pg_stat_activity view, and then run one of the following commands.

To cancel the query without disconnecting the connection, enter the following code:

SELECT pg_cancel_backend(pid);

To terminate the connection, which also cancels any query running in it, enter the following code:

SELECT pg_terminate_backend(pid);

To cancel running queries, try PG_CANCEL_BACKEND first. If the query is stuck and locking other processes, you can use PG_TERMINATE_BACKEND. After termination, the client must reconnect to establish a new session.
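Finding the PIDs to pass to these functions usually starts with a query against pg_stat_activity. The following sketch prints such a query for a given threshold so that you can pipe it to psql. The function name and the threshold are assumptions for illustration, and the column list is only one reasonable choice.

```shell
#!/bin/bash
# Sketch only: the threshold and the psql connection settings are assumptions.

# Print a query that lists active statements running longer than N minutes.
# Args: threshold in minutes (must be a whole number).
long_running_sql() {
  local minutes="$1"
  case "$minutes" in
    ''|*[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
  esac
  cat <<SQL
SELECT pid, usename, datname, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND now() - query_start > interval '$minutes minutes'
ORDER BY runtime DESC;
SQL
}

# Example call (psql reads PGHOST, PGUSER, and PGDATABASE from the environment):
# long_running_sql 5 | psql
```

Validating the argument before it is placed in the SQL text keeps the helper from producing a malformed or unintended statement.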

Planning backup and recovery strategies

Unlike on-premises databases, which require manual backup and recovery, Aurora for PostgreSQL and RDS PostgreSQL instances have built-in features to automate backups using snapshots. You set the backup retention period when you create the Amazon RDS or Aurora instance, and you can change it later. Amazon RDS creates a storage volume snapshot to back up the entire database instance.

When you create a DB snapshot, you must identify which DB instance you want to back up, and then give your DB snapshot a name so you can restore from it later. The amount of time it takes to create a snapshot varies with the size of your databases. For more information, see Restoring from a DB Snapshot.
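A manual snapshot can also be taken from the command line with aws rds create-db-snapshot. The following is a minimal sketch; the function names, the naming scheme, and the instance identifier in the example are hypothetical.

```shell
#!/bin/bash
# Sketch only: the instance identifier in the example call is hypothetical.

# Build a snapshot name such as mydb-manual-2019-01-23 from an instance name.
# Args: instance identifier, [date in YYYY-MM-DD, defaults to today].
snapshot_name() {
  local instance="$1" day="${2:-$(date +%Y-%m-%d)}"
  printf '%s-manual-%s\n' "$instance" "$day"
}

# Take a manual snapshot of an Amazon RDS instance and wait until it is available.
# Args: instance identifier.
take_snapshot() {
  local instance="$1" name
  name="$(snapshot_name "$instance")"
  aws rds create-db-snapshot \
    --db-instance-identifier "$instance" \
    --db-snapshot-identifier "$name" &&
  aws rds wait db-snapshot-available --db-snapshot-identifier "$name"
}

# Example call (requires configured AWS credentials):
# take_snapshot mydb
```

For an Aurora cluster, the corresponding command is aws rds create-db-cluster-snapshot, which takes a cluster identifier instead of an instance identifier.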

User management

User management is one of the most critical admin tasks, and you must perform it with utmost care. When you create a new Amazon RDS PostgreSQL or Aurora for PostgreSQL instance, the master user is granted the rds_superuser role. This role is similar to the postgres superuser of a typical PostgreSQL instance, but with a few limitations.

You can manage users that connect to the database by setting appropriate permission levels. In a default PostgreSQL environment, you can manage user connection through the pg_hba.conf file, but in Amazon RDS for PostgreSQL, you must use GRANT/REVOKE. You can also assign access and privileges to users at a schema level or table level. You can decide on what kind of privileges you want to provide to the users. For more information, see Managing PostgreSQL users and roles.
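As one example of schema-level privileges, the following sketch prints the GRANT and REVOKE statements for a read-only role. The function name and the role, database, and schema names are hypothetical, and the identifiers are assumed to be simple lowercase names that need no quoting.

```shell
#!/bin/bash
# Sketch only: role, database, and schema names in the example are hypothetical.

# Print SQL that creates a read-only role limited to one schema.
# Args: role name, database name, schema name.
readonly_role_sql() {
  local role="$1" db="$2" schema="$3"
  cat <<SQL
CREATE ROLE $role NOLOGIN;
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
GRANT CONNECT ON DATABASE $db TO $role;
GRANT USAGE ON SCHEMA $schema TO $role;
GRANT SELECT ON ALL TABLES IN SCHEMA $schema TO $role;
-- Applies to tables that the role running this statement creates later
ALTER DEFAULT PRIVILEGES IN SCHEMA $schema GRANT SELECT ON TABLES TO $role;
SQL
}

# Example call (review the output, then pipe it to psql):
# readonly_role_sql reporting_ro appdb app
```

You can then grant the role to individual login users, which keeps the privileges in one place instead of repeating them per user.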

Monitoring the database

Monitoring is an integral part of maintaining the reliability, availability, and performance of Amazon RDS and your AWS solutions. Collect monitoring data from all the parts of your AWS solution so that you can debug a multi-point failure if one occurs. One of the major tasks is to set up a detailed level of monitoring for your Amazon RDS and Aurora instances.

Amazon Aurora and Amazon RDS offer two types of monitoring by default: Amazon CloudWatch and Amazon RDS Performance Insights.

Monitoring with CloudWatch

CloudWatch provides the following metrics for Amazon RDS and Aurora PostgreSQL:

High CPU or RAM consumption
Disk space consumption
Network traffic
Database connections
IOPS metrics
Maximum Used Transaction IDs
Queue Depth

For more information, see Monitoring Amazon Aurora DB Cluster Metrics.

CloudWatch has many metrics available to monitor the health of the Amazon RDS and Aurora instances at the hardware level. However, to receive alerts, you must configure a CloudWatch alarm with an Amazon SNS notification on each metric.
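An alarm of this kind can be created with aws cloudwatch put-metric-alarm. The following sketch sets one up for the CPUUtilization metric; the function name, alarm name, threshold, evaluation settings, and SNS topic ARN are assumptions for illustration.

```shell
#!/bin/bash
# Sketch only: alarm name, threshold, and the SNS topic ARN are hypothetical.

# Create a CloudWatch alarm that notifies an SNS topic when average CPU
# stays above a threshold for three consecutive 5-minute periods.
# Args: DB instance identifier, threshold percent, SNS topic ARN.
create_cpu_alarm() {
  local instance="$1" threshold="$2" topic_arn="$3"
  aws cloudwatch put-metric-alarm \
    --alarm-name "rds-$instance-high-cpu" \
    --namespace AWS/RDS \
    --metric-name CPUUtilization \
    --dimensions "Name=DBInstanceIdentifier,Value=$instance" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$threshold" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn"
}

# Example call (requires configured AWS credentials):
# create_cpu_alarm mydb 80 arn:aws:sns:us-east-1:123456789012:dba-alerts
```

The same pattern applies to the other metrics in the list by changing the metric name, threshold, and comparison operator.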

Monitoring with Performance Insights

Amazon RDS Performance Insights uses lightweight data collection methods that do not impact the performance of your applications, and it helps you tune the database for performance.

Performance Insights offers the following metrics:

OS metrics:

CPU Utilization – Wait, Idle, Steal, Nice
Disk I/O – Read KbPS, Write IOsPS
Load Average
Swap – Cached, Free, Total

Database metrics:

Cache – blocks hit, buffers allocated
Checkpoint – Checkpoint timed, buffers checkpoint, checkpoint write latency

For more information, see Performance Insights for Amazon RDS for PostgreSQL.

Summary

This post shared a few common administrator responsibilities on Amazon RDS and Aurora for PostgreSQL databases. This provides a basic framework that you can implement on your test and production workloads. The post also highlights logging and log auditing for better management of the instances.

If you have questions or comments about this post, post your thoughts in the comments.

About the Author

John Solomon is a Consultant with AWS Global Competency Center India, working closely with customers who are migrating from on-premises to the AWS Cloud. He is an AWS certified speaker and speaks at various meetups, breakout sessions, webinars, etc. He is an ardent member of the PostgreSQL community and works as a database administrator for PostgreSQL databases.

AWS Database Blog