AWS Database Blog

PostgreSQL architecture considerations for application developers: Part 1

Although the application layer is the part of the stack the world accesses in many cloud architectures, we rarely consider how to optimize our application for the database we’re using. When using any relational database engine, it’s important to consider not just schema design, but also how the database reads and writes data to its storage system, to ensure the application is manageable, scalable, and performant. In this post, which is Part 1 of a series, we discuss key PostgreSQL terms, then dive a bit deeper into autocommit, autovacuum, and idle in transaction when using Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

Making PostgreSQL parameter changes: Where, when, and why

PostgreSQL uses parameters to define the database’s fundamental properties. PostgreSQL sets default parameter values when new databases are created and, for many systems, the defaults provide good performance and no tuning is needed. As systems grow, scale, and come under more stress, some parameters may need to be tuned for optimal performance.

Depending on whether you’re using a self-managed or an AWS managed database, parameter values are changed in different places. For self-managed databases, parameter changes are made in the postgresql.conf file. In AWS managed databases, access to the postgresql.conf file is restricted, so changes can only be made to your underlying database or cluster parameter groups via the AWS Management Console, the AWS Command Line Interface (AWS CLI), the AWS SDKs, or AWS CloudFormation.

postgresql.conf (self-managed)

For self-managed databases, changes are made to this file (within your PostgreSQL data directory) when parameters are intended to be set globally across your PostgreSQL cluster. For more information, refer to Setting Parameters in the PostgreSQL community documentation.
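
For example, a postgresql.conf entry looks like the following (the values shown are illustrative, not recommendations); most changes take effect after a configuration reload, although some parameters require a server restart:

# postgresql.conf -- illustrative values only
shared_buffers = 2GB                  # shared buffer cache (restart required)
work_mem = 64MB                       # per-operation sort/hash memory
log_min_duration_statement = 5000     # log statements running longer than 5 seconds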

RDS DB parameter group (AWS managed)

Amazon RDS cluster-level and database-level parameter groups have default settings depending on instance class and size. If different values are needed for better performance, you can create a new parameter group via the console, AWS CLI, SDK, or AWS CloudFormation. For more information, refer to Creating a DB parameter group.
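
For example, the following AWS CLI sketch creates a custom parameter group and modifies a single parameter (the group name and parameter value are hypothetical placeholders):

aws rds create-db-parameter-group \
    --db-parameter-group-name my-pg14-params \
    --db-parameter-group-family postgres14 \
    --description "Custom parameters for my application"

aws rds modify-db-parameter-group \
    --db-parameter-group-name my-pg14-params \
    --parameters "ParameterName=log_min_duration_statement,ParameterValue=5000,ApplyMethod=immediate"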

Session level (self-managed or AWS managed)

Many PostgreSQL parameters can be changed at the session level (which consists of one or more transactions) because these parameters are only desirable for specific queries within your workload, and not across the board. These parameters can be changed using the SET command. For more information, refer to SET in the PostgreSQL community documentation.
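
For example, you might raise work_mem for a single session running a large reporting query, or scope the change to one transaction with SET LOCAL (the value is illustrative):

SET work_mem = '256MB';        -- applies for the rest of this session

BEGIN;
SET LOCAL work_mem = '256MB';  -- applies only until this transaction ends
-- run the large query here
COMMIT;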

PostgreSQL concepts

In this section, we discuss the core PostgreSQL concepts that are vital for the operation of PostgreSQL databases.

Transactions

In PostgreSQL, a transaction is a sequence of SQL statements run as a single operation. This transactional model ensures that either all statements in a transaction are successfully committed to the database, or all are rolled back if a statement fails or an error occurs. PostgreSQL is ACID compliant; ACID is a set of database characteristics ensuring that atomicity, consistency, isolation, and durability are always maintained in database operations:

  • Atomicity – Database atomicity ensures open transactions are invisible to other transactions until the transaction is complete, then all changes become visible simultaneously as a single unit.
  • Consistency – Consistency ensures that if there is a committed change to data, a new transaction will see the change whether it was committed days, hours, or seconds before; and after a server crash, data can be recovered without error.
  • Isolation – PostgreSQL provides various isolation levels to control the visibility of data changes to other concurrent transactions. By default, an isolation level of read committed is used, allowing transactions to only see changes after they’re committed by other transactions.
  • Durability – Durability is the guarantee that the database will keep track of all committed changes, so if there’s an abnormal cancellation, the database can either roll back to its original state or continue where it left off by replaying transaction logs.

For more information about transactions, refer to Transactions. For more information about ACID compliance, see the glossary in the PostgreSQL community documentation.
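
As a minimal illustration, the following transaction applies both updates or neither (the accounts table is hypothetical):

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- or ROLLBACK; to undo both updates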

Locking

PostgreSQL uses locking mechanisms to manage concurrent access to data in order to prevent conflicts across multiple transactions. PostgreSQL provides two types of locks:

  • Shared locks – These allow multiple transactions to read a particular data object simultaneously
  • Exclusive locks – These prevent other transactions from accessing data objects until the lock is released

PostgreSQL utilizes a locking protocol, where locks are acquired and released in a specific order. This helps protect the database against deadlocks, a situation where two or more transactions are blocked while waiting for each other to release locks on resources. PostgreSQL also supports row-level locking, which allows locks on individual rows instead of entire tables, providing fine-grained control over concurrent access. Notably, PostgreSQL does not escalate many row-level locks into a single table-level lock; row-level lock information is stored on the row versions themselves, so acquiring many row locks doesn’t exhaust the server’s lock memory.

For more information about locking, refer to Explicit Locking.
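
For example, a transaction can take an explicit row-level lock with SELECT ... FOR UPDATE (the accounts table is hypothetical):

BEGIN;
-- Locks only the matching row; other rows remain available to concurrent writers
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;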

VACUUM

PostgreSQL stores rows of data in a structure known as a tuple. When tuples are logically updated or deleted, an invisible version remains in the database. These old tuples are retained to ensure transactions running concurrently with the DELETE or UPDATE command can finish with a snapshot of the database from when the transaction started. PostgreSQL uses a VACUUM process to free space used by old invisible tuple versions and to reclaim storage. Updated and deleted rows are marked as dead tuples to be cleaned up at a later point by the VACUUM process. They aren’t cleaned up immediately because different transactions can work with the same tuple concurrently, with accuracy ensured through Multi-Version Concurrency Control (MVCC). For example, if the original version were removed immediately, concurrently running transactions couldn’t be rolled back accurately. Although simply running VACUUM removes dead tuples, there are other variations of the VACUUM command that you should understand, which we discuss in this section.
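
One way to observe dead tuples accumulating before they’re vacuumed is to query the pg_stat_user_tables statistics view:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;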

VACUUM ANALYZE

This command removes dead rows and collects statistics about the contents of the table (storing those statistics in the pg_statistic system catalog). The PostgreSQL query optimizer then uses this data to help determine the most efficient execution plans at query runtime.

VACUUM FREEZE

In addition to cleaning up dead rows, this option aggressively freezes tuples, effectively setting to zero the cutoff age (in transaction IDs, or XIDs) that VACUUM uses to decide whether to freeze old rows. Frozen rows remain visible to all current and future transactions, which avoids losing older active tuples during the VACUUM process and protects against transaction wraparound corruption. Transactions are tracked by their unique XID, and because XIDs are drawn from a fixed-size counter, they can be exhausted. If not monitored properly, a production database’s XID values can reach the upper limit that can be registered, and wraparound occurs, where XIDs from the past appear to be in the future and can be reused. This wraparound corruption can cause the database to shut down to protect data integrity.

VACUUM FULL

This option removes dead tuples from a table (or database) by completely rewriting its contents. It reclaims disk space from deleted and updated tuples by performing a more thorough cleanup, physically rearranging data, resulting in more compact storage and improved query performance. This option should not be used in ordinary production circumstances, because it takes an ACCESS EXCLUSIVE lock on the table, preventing all other access until the operation is complete.
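
The three variations look like the following in practice (mytable is a hypothetical table name); run VACUUM FULL only during a maintenance window because of the lock it takes:

VACUUM ANALYZE mytable;   -- remove dead tuples and refresh planner statistics
VACUUM FREEZE mytable;    -- remove dead tuples and aggressively freeze old rows
VACUUM FULL mytable;      -- rewrite the table; takes an ACCESS EXCLUSIVE lock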

PostgreSQL timeout-related parameters

Timeout settings are a necessity because although we can estimate how workloads may behave during benchmarking, production workloads sometimes behave unexpectedly. Properly configured timeout settings act as a safeguard to protect your cluster from abnormalities in your workload. These settings can and should be adjusted throughout your database lifecycle. A good practice is to have settings for both connection and request timeouts. RDS for PostgreSQL databases have useful default settings; however, if you determine different settings would enhance performance, change and test settings one at a time before applying them to production.

PostgreSQL timeout settings can be applied at the statement, user, or database level. It’s useful for the application developer to understand these timeout parameters and how they work to prevent timeout errors. The following are some timeout parameters you can tune so that application performance and user experience aren’t negatively affected by timeout errors (a session-level sketch follows the list):

  • statement_timeout – Aborts any statement that runs for longer than the specified amount of time (in milliseconds, if no unit is given). The default value is 0 (no timeout).
  • idle_in_transaction_session_timeout – This parameter closes any session with an open transaction that has been idle for longer than the specified duration. This releases any locks held by that session so the connection slot can be reused; it also allows tuples visible only to this transaction to be vacuumed to decrease bloat. The default value is 0 (disabled).
  • idle_session_timeout – In PostgreSQL versions 14 and later, you can use the idle_session_timeout parameter. Any session that’s idle, but outside an open transaction, for more than the specified time is closed. The default value is 0 (disabled). In versions 13 and earlier, only idle_in_transaction_session_timeout was available, and it applies only to sessions idle within an open transaction.
  • client_connection_check_interval – In PostgreSQL versions 14 and later, you can use the client_connection_check_interval parameter. With this parameter, you can set the time interval between optional checks for client connections while running queries. This check allows long-running queries to end sooner if the kernel reports that the connection is closed. The default value is 0 (disabled). In versions 13 and earlier, the server didn’t detect dead connections until query completion, so it couldn’t send results back to the client if the connection closed unexpectedly.
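
For example, the following session-level settings act as safeguards (the values are illustrative, not recommendations):

SET statement_timeout = '30s';                     -- cancel statements running longer than 30 seconds
SET idle_in_transaction_session_timeout = '5min';  -- end sessions idle inside an open transaction
SET idle_session_timeout = '10min';                -- PostgreSQL 14+: end fully idle sessions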

PostgreSQL features around database behavior and their best practices

PostgreSQL is a powerful, object-relational database system with proven data integrity, reliability, and extensibility. It has a strong architecture that can deliver performant and innovative database solutions. PostgreSQL has many features to help the application developer build operational, fault-tolerant applications in a free, open source, extensible environment. Developers can build custom functions and use code from different programming languages without having to recompile the database. In this section, we discuss a few features around database behavior and some best practices.

AUTOCOMMIT

PostgreSQL requires transactions to be explicitly committed because of its ACID compliance. One feature to help with this is AUTOCOMMIT, which automatically saves changes to the database: each statement runs in its own transaction, and each statement is automatically committed. Its default value is ON, meaning you don’t have to explicitly issue a BEGIN or COMMIT command. Transactions start with a BEGIN command and end with a COMMIT command, and the commit saves user changes. When AUTOCOMMIT is set to OFF, a BEGIN command is not required, but the COMMIT command must be issued explicitly at the end of the statement for changes to take effect in the database.

The AUTOCOMMIT default setting is useful in most environments and doesn’t need to be modified. For example, AUTOCOMMIT doesn’t need to be disabled when using the \COPY command to bulk load rows. You can get better performance with a single AUTOCOMMIT bulk insert of 100 rows (for example, INSERT ... VALUES (...), (...), (...)) compared to 100 separately committed INSERT statements, because individual BEGIN and COMMIT commands use significant disk activity and CPU. However, under certain circumstances, turning the setting off can provide a better working experience. With a single transaction, if one insert fails, all rows are rolled back, avoiding unwanted partial data loads, which could be problematic depending on business needs.
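
A minimal sketch of the two approaches (the users table is hypothetical):

-- One statement, one commit: fast
INSERT INTO users (username) VALUES ('amy'), ('bob'), ('carol');

-- Separately committed statements: each commit costs disk I/O and CPU
INSERT INTO users (username) VALUES ('amy');
INSERT INTO users (username) VALUES ('bob');
INSERT INTO users (username) VALUES ('carol');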

This feature can be beneficial to application developers because it allows quick recovery from mistakes, such as accidentally running a DELETE statement without a WHERE clause. If leaving AUTOCOMMIT off is desired, it’s best to modify this setting at the session level for specific aspects of your workload. If a statement is issued with AUTOCOMMIT off and a COMMIT command is not given, the session continues to hold its locks, and statements in other sessions can block on them, because PostgreSQL holds locks until a COMMIT is given, as described earlier in this post.

The following are best practices for using AUTOCOMMIT:

  • Leave AUTOCOMMIT turned on globally, and only disable if you have business reasons for doing so. Note the following:
    • With AUTOCOMMIT on, queries are not grouped together.
    • Because a COMMIT is implicitly issued after each statement, there isn’t any uncertainty about which query is committed or rolled back.
    • AUTOCOMMIT in PostgreSQL wraps each statement in an implicit BEGIN and COMMIT, oftentimes called a transaction block, so an explicit COMMIT command is unnecessary.
    • Having AUTOCOMMIT on guarantees every SQL statement is automatically committed and a rollback isn’t possible unless AUTOCOMMIT is off.
  • When disabling AUTOCOMMIT, do so only at the session level (see the psql sketch after this list). Note the following:
    • When disabled, the session is always in transaction mode, and each transaction must explicitly end with a COMMIT or ROLLBACK command.
    • When AUTOCOMMIT is off, if a mistake is made, it’s easy to issue a rollback and everything is undone. This allows a quick and straightforward recovery from mistakes because changes have not been persisted to the database.
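
In psql, for example, you can disable autocommit for the current session only and undo a mistake with ROLLBACK (the users table is hypothetical):

\set AUTOCOMMIT off
DELETE FROM users;   -- mistake: no WHERE clause
ROLLBACK;            -- nothing was persisted; the rows are safe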

AUTOVACUUM

VACUUM is a manual process that cleans up dead tuples, whereas AUTOVACUUM is a periodic background utility daemon that automates the removal of deleted and older updated tuples. AUTOVACUUM is enabled by default, and its parameters need to be tested and adjusted if many UPDATE and DELETE commands are issued to the database. PostgreSQL uses the MVCC model and holds old row versions to allow concurrent read requests to complete. During these statements, rows won’t be deleted, and the old version is retained until the transaction finishes. If a COMMIT command is not given, AUTOVACUUM can’t delete these rows because the transaction is technically still running, and the rows may still be needed. Example problems caused by disabling autocommit and leaving transactions open are locks in the database and obstruction of AUTOVACUUM maintenance.

A consequence of disabling AUTOVACUUM is that dead tuples are not removed, leading to table bloat. This is undesirable because bloat increases the overall disk usage of your tables and indexes and causes query runtimes to increase, because both dead tuples and active, visible rows must be read from your tables or indexes to run queries. Instead of AUTOVACUUM automatically removing dead tuples, you may have to remove them physically by explicitly calling the VACUUM FULL command.

The following are best practices for AUTOVACUUM:

  • Ensure AUTOVACUUM is running by regularly performing bloat estimate reports using the pgstattuple extension for tuple-level statistics (see the sketch after this list). Note the following:
    • Bloat results in increased disk consumption and performance loss; queries of related data can take twice as long (or longer) if bloat isn’t managed and reduced regularly. AUTOVACUUM, when properly configured, can help minimize database bloat over time.
    • Tune the autovacuum_naptime parameter appropriately to ensure that AUTOVACUUM runs frequently enough to prevent the proliferation of database bloat.
    • Avoid long-running queries adjacent to write-heavy workloads. Make sure you fully understand the workloads running on the server: AUTOVACUUM takes only weak locks on the table, so normal database operations such as INSERT, UPDATE, and DELETE can proceed, but it can conflict with operations on indexes and with truncation of the table.
  • Tune AUTOVACUUM settings based on table usage and access patterns. Note the following:
    • Test and set per-table AUTOVACUUM thresholds, using the parameters mentioned previously, on tables with heavy DELETE and INSERT statements.
    • Plan and test strategies for how and when to run AUTOVACUUM during non-busy times so it has little impact on the database. Also, determine how to run it during busy times without disrupting production workloads with locks.
    • Be careful not to run AUTOVACUUM too infrequently; you run the risk of systems getting too busy between vacuum runs and having to catch up, or of having to use VACUUM FULL.
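
The following sketch shows per-table tuning and a pgstattuple bloat check (the orders table and the threshold values are illustrative, not recommendations):

-- Vacuum this hot table more aggressively than the global defaults
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 1000
);

-- Estimate bloat with tuple-level statistics
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT table_len, dead_tuple_count, dead_tuple_percent
FROM pgstattuple('orders');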

Idle in transaction

Monitoring PostgreSQL connections is an important task because changes in connection state can be a starting point for troubleshooting an error. PostgreSQL reports four main states a connection can be in:

  • active – A connection that is open and running queries
  • idle – A connection that is open but not running queries; it still consumes server resources such as memory and CPU, and is a common cause of poor performance
  • idle_in_transaction – A connection whose backend is in a transaction, but is idle and currently waiting for input
  • idle_in_transaction (aborted) – Similar to idle_in_transaction, but a statement in the transaction has caused an error

Open transactions can hold locks, block other queries, and prevent AUTOVACUUM and VACUUM from doing their work, leading to bloated tables. An important state that is often hard to identify, and that causes performance problems, is idle_in_transaction. A session is idle_in_transaction when the database has run a BEGIN command and taken locks on one or more tables, and is waiting for input but hasn’t issued a COMMIT or ROLLBACK command for whatever reason. Connections that are idle in transaction are hung and can remain in this state indefinitely, because PostgreSQL doesn’t know why the session is waiting and won’t stop the transaction automatically, all while consuming a connection slot. This state will not resolve automatically because there could be good business reasons why the process is waiting; for example, it takes time for a document to be read, or the process is waiting some number of business days to send or receive an email. To eliminate idle_in_transaction sessions, you must understand which database locks are held and why, and the application needs to know how to handle the state when it’s encountered.

idle_in_transaction is easy to reproduce. First, create a table and add data. Then start a transaction by entering BEGIN. After the transaction has started, alter the table by adding another column, without ending with a COMMIT or ROLLBACK. This leaves the session idle in transaction while it holds a lock on the table, because the transaction never ended.

To replicate idle_in_transaction in a psql session, run the following code:

CREATE TABLE mydbtable (
id int GENERATED BY DEFAULT AS IDENTITY,
username varchar (50),
password varchar (50)
);
BEGIN;
ALTER TABLE mydbtable ADD COLUMN last_update timestamp;
-- No COMMIT or ROLLBACK: this session is now idle in transaction

Open another psql tab and run the following code:

SELECT * FROM mydbtable;

The query blocks, because the ALTER TABLE statement in the first session holds an ACCESS EXCLUSIVE lock on the table. To release the lock, go back to the first session and run a commit or rollback:

COMMIT;

After the command is run, the second session finishes immediately. Locks are always kept until there is a commit or rollback.

The following are best practices for managing idle_in_transaction sessions:

  • Query the pg_stat_activity view to locate sessions that are currently idle_in_transaction (see the sketch after this list). For more information on this view and its usage, refer to pg_stat_activity.
  • Break transactions into smaller and more manageable pieces in order to avoid idle_in_transaction. Note the following:
    • Prepare queries so they don’t run longer than a specified time on a per-session or per-database setting.
    • Periodically check timeout logs to detect long-running transactions.
    • Consider setting the idle_in_transaction_session_timeout parameter so long-running transactions can be cancelled. The default is 0, meaning no timeout.
    • Use pg_stat_activity to check for long-running queries and how long it has been in that state.
  • Move long-running stored procedures or functions out of the database layer and into the application layer. Note the following:
    • Errors can be handled by error handling logic coded in the application.
    • Code applications to close transactions before processing query results.
    • Make sure AUTOCOMMIT is on in the application and database layer to avoid unnecessary errors.
    • Make sure idle_in_transaction sessions are not preventing VACUUM and other queries from accessing tables by monitoring them in the pg_stat_activity view for idle_in_transaction transactions. This lists all open transactions and their state.
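
For example, the following query against pg_stat_activity lists sessions that are idle in a transaction and how long each transaction has been open:

SELECT pid, usename, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
ORDER BY xact_age DESC;

-- If a session must be ended, you can terminate it by pid:
-- SELECT pg_terminate_backend(<pid>);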

Conclusion

In this post, we defined key PostgreSQL functionality and detailed specific features of the PostgreSQL engine, along with best practices to help guide your application architecture. Considering database design and parameter settings during application design for Amazon RDS for PostgreSQL and Aurora PostgreSQL-Compatible will help you avoid costly and inconvenient interruptions in database performance while lessening downstream load interruptions. Although this post is not an exhaustive resource on these topics, it is meant to be a primer for follow-up posts discussing additional PostgreSQL architecture and tuning considerations for the application developer.

We welcome your comments and feedback in the comments section.



About the Authors

Peter Celentano is a Specialist Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He works with AWS customers to design scalable, secure, performant, and robust database architectures on the cloud.

Tracy Jenkins is a Database Specialist Solutions Architect with Amazon Web Services. She enjoys working with databases and helping customers architect performant, highly available, and scalable solutions while making recommendations on reliability, cost, and security.