Amazon Aurora PostgreSQL parameters, Part 2: Replication, security, and logging

Organizations today have a strategy to migrate from traditional databases and as they plan their migration, they don’t want to compromise on performance, availability, and security features. Amazon Aurora is a cloud native relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The PostgreSQL-compatible edition of Aurora delivers up to 3X the throughput of standard PostgreSQL running on the same hardware, enabling existing PostgreSQL applications and tools to run without requiring modification. The combination of PostgreSQL compatibility with Aurora enterprise database capabilities provides an ideal target for commercial database migrations.

Amazon Aurora PostgreSQL-Compatible Edition has enhancements at the engine level, which improves the performance for high concurrent OLTP workload, and also helps bridge the feature gap between commercial engines and open-source engines. While the default parameter settings for Aurora PostgreSQL are good for most of the workloads, customers who migrate their workloads from commercial engines may need to tune some of the parameters according to performance and other non-functional requirements. Even workloads which are migrated from PostgreSQL to Aurora PostgreSQL may need to relook at some of the parameter settings because of architectural differences and engine level optimizations.

In this four part series, we explain parameters specific to Aurora PostgreSQL. We also delve into certain PostgreSQL database parameters that apply to Aurora PostgreSQL, how they behave differently, and how to set these parameters to leverage or control additional features in Aurora PostgreSQL.

In part one of this series, we discussed the instance memory-related parameters and Query Plan Management parameters that can help you tune Amazon Aurora PostgreSQL. In this post, we discuss parameters related to replication, security, and logging. We will cover Aurora PostgreSQL optimizer parameters in part three which can improve performance of queries. In part four, we will cover parameters which can align Aurora PostgreSQL closer to American National Standards Institute (ANSI) standards and reduce the migration effort when migrating from commercial engines.

Durability and high availability

Aurora PostgreSQL uses a unique mechanism for replication in comparison to the community PostgreSQL to optimize over and above the decoupled storage layer.

The Aurora storage engine writes data to six copies in parallel spread across three Availability Zones. The storage layer in Aurora is not just a block device but a cluster of machines functioning as storage nodes capable of decrypting database transaction logs. Aurora has a quorum algorithm for writes—as soon as four of the six writes are durable, the system successfully returns a committed transaction to the database. From a durability point of view, the Aurora storage engine can handle an Availability Zone plus one failure, sustaining continued write despite loss of an Availability Zone. The Aurora storage engine can continue to serve reads despite an Availability Zone plus a failure of an additional copy. Failure of a storage copy is mitigated by repairing it from the surviving three copies.

Amazon Aurora conceptual architecture diagram

Although all the nodes of an Aurora PostgreSQL cluster share the same storage, only one node can send write requests to the log-based storage engine and process acknowledgements for the same. For replication, the changes on writer nodes are shipped to the replica in a stream and applied to pages in the buffer cache. There is no need for the replica to write changes out to storage or to read pages to apply changes if they’re not in buffer cache. For more information about the architecture and fault tolerance capabilities of the Aurora storage engine, see Introducing the Aurora Storage Engine.

In this section, we cover the behavior of key replication-related parameters that you might be familiar with in PostgreSQL.

rds.logical_replication

Aurora PostgreSQL 2.2 (compatible with PostgreSQL v10.6) and later allows you to enable logical decoding that can be used by AWS Database Migration Service (AWS DMS) or to set up native logical replication (with Aurora PostgreSQL as publisher). To enable logical replication, you need to set rds.logical_replication to 1. You can set this parameter in the DB cluster parameter group and it needs a restart to take effect. After you enable rds.logical_replication, whether you create a logical replication slot or not, Aurora PostgreSQL produces additional Write ahead logs (WAL) for the purpose of logical decoding. This can cause an increase in the billed metric for the Aurora cluster, VolumeWriteIOPs. After you set up a replication slot, Aurora PostgreSQL starts retaining WAL files that haven’t been consumed by the slot. This can cause an increase in the billed metric VolumeBytesUsed. You can use TransactionLogsDiskUsage to keep track of storage utilized by WAL files. Logical replication also incurs additional read input/output operations (IOPs) to Aurora PostgreSQL’s distributed storage engine, which causes an increase in the billed metric VolumeReadIOPs.

synchronous_commit

PostgreSQL ensures durability of a change before acknowledging a commit request from the client. In case of implicit transactions, every DML statement’s durability is ensured before acknowledging back to the client. In Amazon RDS for PostgreSQL, as with PostgreSQL running on Amazon Elastic Compute Cloud (Amazon EC2), durability is ensured by making sure that the change is written to the WAL file upon every commit (whether explicitly requested or implicit).

You can control this behavior by setting synchronous_commit. If you disable synchronous_commit, PostgreSQL doesn’t try to ensure durability of every transaction. Instead, PostgreSQL writes committed transactions to the WAL file in groups. This can offer performance benefits at the risk of losing transactions. You should be very careful when changing this parameter from its default value. Disabling synchronous_commit compromises durability of your DB instance and can lead to data loss.

Aurora PostgreSQL uses a log-based storage engine to persist all modifications. Every commit is sent to six copies of data; after it’s confirmed by a quorum of four, Aurora PostgreSQL can acknowledge the commit back to client. If you disable synchronous_commit, every commit requested by client doesn’t wait for the four out of six quorum. Ideally, you shouldn’t disable this parameter because that compromises the durability benefits offered by Aurora PostgreSQL. You can disable the parameter in the DB cluster parameter group. It can also be set for individual transactions in a session before starting a new transaction.

You can get similar performance by grouping transactions and batching commits at the application layer. This approach allows for implementing proper exception handling for any transaction failure during a restart of a writer instance.

max_standby_streaming_delay

Aurora PostgreSQL allows you to add up to 15 Aurora replica instances in a cluster, which can be used for read queries. The Aurora replica nodes use the same storage as the writer instance. Any modifications on the writer is replicated to the replica. The replica invalidates the modified pages from its own buffer cache. The storage architecture and in-memory replication of Aurora helps reduce replication lag.

It’s possible that while a change is being replicated from writer to a replica, another active query on the replica is trying to read the same page in buffer, which needs to be invalidated. This results in an access conflict, and the replication waits for the select query to finish. This can result in increased replication lag on the replica. If you’re experiencing frequent spikes in replication lag because of a long-running query, you can set max_standby_streaming_delay to a lower value. This cancels conflicting queries on the replica. If a query is canceled on the replica instance, you receive an error similar to the following, along with additional detail indicating the reason for the conflict:

ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

You can set this parameter in the DB cluster parameter group or for individual replicas in their respective DB parameter groups, and the change is applied without a restart. The valid value is between 1,000–30,000 milliseconds (1–30 seconds). A lower value can result in frequent cancellation of queries. A higher value may result in higher replication lag on the replica, which means other application sessions may fetch stale data until the conflicting query is completed or cancelled.

Security

In this section, we cover some parameters that control key security features in Aurora PostgreSQL. We also see how these features behave differently than PostgreSQL.

rds.restrict_password_command

PostgreSQL allows you to change your own password, and any user with the CREATEROLE privilege can change a password and password validity for any user. For example, as a normal user, I can’t alter my password’s validity, but I can change the password, as in the following code:

postgres=> \du
                                                               List of roles
    Role name    |                         Attributes                         |                          Member of
-----------------+------------------------------------------------------------+-------------------------------------------------------------
 pgadmin      | Create role, Create DB                                    +| {rds_superuser,pgtraining}
                 | Password valid until infinity                              |
 monitoring_user |                                                            | {rds_superuser}
 pgadmin         |                                                            | {rds_superuser}
 pgtraining      |                                                            | {}
 rds_ad          | Cannot login                                               | {}
 rds_iam         | Cannot login                                               | {}
 rds_password    | Cannot login                                               | {}
 rds_replication | Cannot login                                               | {}
 rds_superuser   | Cannot login                                               | {pg_monitor,pg_signal_backend,rds_replication,rds_password}
 rdsadmin        | Superuser, Create role, Create DB, Replication, Bypass RLS+| {}
                 | Password valid until infinity                              |
 rdsproxyadmin   | Password valid until infinity                              | {}
 sameer          | Password valid until 2020-12-31 00:00:00+00                | {}

postgres=> select current_user;
 current_user
--------------
 sameer
(1 row)

postgres=> alter role sameer  valid until '2022-12-31';
ERROR:  permission denied
postgres=> alter role sameer password 'simple_password';
ALTER ROLE
postgres=>

On Aurora and Amazon RDS for PostgreSQL, you can set rds.restrict_password_command to restrict password management commands for members of the group rds_postgres. Setting up restrictions on which user can do password management can be useful when you want users to comply with specific password requirements, such as complexity. You can implement a separate client application as an interface that allows users to change their password.

The following are some examples of SQL commands that are restricted when restricted password management is enabled:

postgres=> CREATE ROLE myrole WITH PASSWORD 'mypassword';
postgres=> CREATE ROLE myrole WITH PASSWORD 'mypassword' VALID UNTIL '2020-01-01';
postgres=> ALTER ROLE myrole WITH PASSWORD 'mypassword' VALID UNTIL '2020-01-01';
postgres=> ALTER ROLE myrole WITH PASSWORD 'mypassword';
postgres=> ALTER ROLE myrole VALID UNTIL '2020-01-01';
postgres=> ALTER ROLE myrole RENAME TO myrole2;

You can give other roles membership for the rds_password role by using the GRANT SQL command. Ideally, membership to rds_password should be restricted to only a few roles that you use solely for password management. These roles also require the CREATEROLE authority to modify other roles. The rds_superuser role has membership for the rds_password role by default.

This parameter doesn’t impose a check on password validity, nor does it impose a password complexity requirement. You have to implement these checks at the client application that is used for changing the password.

This parameter can only be set in parameter groups and can’t be modified at session level. Changing this parameter requires a restart.

rds.force_ssl

By default, Aurora PostgreSQL has the SSL parameter enabled. Also, by default, PostgreSQL clients and drivers try to make an SSL connection using sslmodel prefer. That means if you use default settings for Aurora PostgreSQL and your PostgreSQL client, you always have a secure TCP connection. However, it’s possible for clients to explicitly request a connection without SSL. You can check whether a connection is using SSL by using pg_stat_ssl:

$ psql "sslmode=disable user=pgadmin"
psql (11.8, server 11.7)
Type "help" for help.

postgres=> select * from pg_stat_ssl where pid=pg_backend_pid();
  pid  | ssl | version | cipher | bits | compression | clientdn
-------+-----+---------+--------+------+-------------+----------
 11765 | f   |         |        |      |             |
(1 row)

You can make sure that the Aurora PostgreSQL instance only allows connections that request SSL connection by setting rds.force_ssl in the DB cluster parameter group:

$ psql "sslmode=disable user=pgadmin"
psql: FATAL: no pg_hba.conf entry for host "10.1.0.135", user "pgadmin", database "postgres", SSL off

$ psql "sslmode=prefer user=pgadmin"
psql (11.8, server 11.7)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> select * from pg_stat_ssl where pid=pg_backend_pid();
 pid  | ssl | version |           cipher            | bits | compression | clientdn
------+-----+---------+-----------------------------+------+-------------+----------
 1192 | t   | TLSv1.2 | ECDHE-RSA-AES256-GCM-SHA384 |  256 | f           |
(1 row)

This parameter can be dynamically changed in the DB cluster parameter group and doesn’t require a restart to make the change.

This parameter is also applicable to Amazon RDS for PostgreSQL.

ssl_min_protocol_version and ssl_max_protocol_version

Aurora PostgreSQL v3.3.0 (compatible with PostgreSQL v11.8) introduces the ability to set a version requirement on clients making TLS connections to the database server. You can set ssl_min_protocol_version to set the minimum SSL/TLS protocol version to use and set ssl_max_protocol_version to set the maximum SSL/TLS protocol version to use.

For example, to limit client connections to the Aurora PostgreSQL server to at least TLS 1.2 protocol version, you can set the ssl_min_protocol_version to TLSv1.2.

Logging

In this section, we cover parameters that you can use to increase the amount of information written to log files in Aurora PostgreSQL. Aurora PostgreSQL stores the logs in the temporary EBS volume attached to the Aurora DB instance. Setting these parameters may result in more logs to be generated, resulting in space constraints on the temporary volume. You can configure your Aurora instance to have a relatively shorter log retention period and publish logs to Amazon CloudWatch Logs for long-term retention. For more information, see Publishing Aurora PostgreSQL logs to Amazon CloudWatch Logs.

rds.force_autovacuum_logging

When you enable autovacuum logging by setting log_autovacuum_min_duration to 0 or higher, PostgreSQL logs operations performed by autovacuum workers. If you set it to 200, all autovacuum operations—autovacuum and autoanalyze—taking more than 200 milliseconds are logged.

On Aurora PostgreSQL (and also Amazon RDS for PostgreSQL), setting log_autovacuum_min_duration alone doesn’t enable autovacuum logging. To enable autovacuum logging, you also have to set rds.force_autovacuum_logging_level. The allowed values for this parameter are disabled, debug5, debug4, debug3, debug2, debug1, info, notice, warning, error, log, fatal, and panic. When you set this parameter to a value other than disabled, such as info, the PostgreSQL engine starts logging the autovacuum and autoanalyze activity as per the threshold that the parameter log_autovacuum_min_duration set. Setting it to debug1 – debug5 produces very detailed log entries and is required only when you’re debugging a specific issue and logs generated by less verbose values (like info) aren’t enough.

You can set this parameter in the DB cluster parameter group attached to the Aurora PostgreSQL cluster. It’s dynamic in nature and doesn’t need a restart after it has been set. After you enable autovacuum logging by setting log_autovacuum_min_duration and rds.force_autovacuum_logging, autovacuum events are Aurora PostgreSQL logs. For example, see the following event from log:

2020-09-14 07:39:32 UTC::@:[13513]:LOG: automatic analyze of table "rdsadmin.public.rds_heartbeat2" system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2020-09-14 07:40:04 UTC::@:[8261]:LOG: automatic vacuum of table "pgtraining.public.item_inventory": index scans: 1
pages: 0 removed, 37024269 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1427 removed, 2147396171 remain, 0 are dead but not yet removable, oldest xmin: 10877634
buffer usage: 5909857 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 116.42 s, system: 1.33 s, elapsed: 149.76 s

rds.force_admin_logging_level

Amazon RDS for PostgreSQL and Aurora PostgreSQL have an internal superuser named rdsadmin. This user is only used for database management activities. For example, you might forget the password for the primary user and want to reset it. The rdsadmin user performs the password reset when you change your password on the AWS Management Console. This user’s activities aren’t captured even after setting the logging parameters for PostgreSQL. If there is a requirement to capture the activities from this user, enable this by setting rds.force_admin_logging_level to log. When you set this parameter, the Amazon RDS for PostgreSQL engine starts capturing all queries run by this admin user in any of the user databases.

Conclusion

This series of blogs discusses Aurora PostgreSQL specific parameters and how they can be tuned to control database behavior. In this post, we covered how the difference in replication architecture affects the parameter behavior in Aurora PostgreSQL. We also covered security and logging parameters that can be useful for DBAs to implement additional scrutiny for Aurora PostgreSQL clusters.

In part three we discuss query optimizer parameters which improves query performance by using Aurora specific query optimization steps. In part four we will dive deep into Aurora PostgreSQL parameters that adds ANSI compatibility options.

About the authors

Sameer Kumar is a Database Specialist Technical Account Manager at Amazon Web Services. He focuses on Amazon RDS, Amazon Aurora and Amazon DocumentDB. He works with enterprise customers providing technical assistance on database operational performance and sharing database best practices.

Gopalakrishnan Subramanian is a Database Specialist solutions architect at Amazon Web Services. He works with our customers to provide guidance and technical assistance on database projects, helping them improving the value of their solutions when using AWS