AWS Database Blog

Understanding autovacuum in Amazon RDS for PostgreSQL environments

January 2024: This post was reviewed and updated for accuracy.

PostgreSQL has become the preferred open-source relational database for many enterprise developers and startups, and powers leading business and mobile applications. Amazon Web Services (AWS) offers Amazon Relational Database Service (Amazon RDS) and Amazon Aurora as fully managed relational database services. Amazon RDS for PostgreSQL makes it easy to set up, operate, and scale PostgreSQL deployments in the cloud. With a few commands, you can have your production database instance up and running on AWS. A managed database service frees the database administrator from many maintenance and management tasks. However, a few maintenance tasks, like VACUUM, need close monitoring and tuning as database usage changes. Autovacuum is the process that automates vacuum activity in PostgreSQL.

In this post, we talk about the autovacuum process and its importance. We also cover tuning autovacuum settings for better performance and the disadvantages of turning it off.

MVCC in PostgreSQL

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to maintain multiple versions of a row when performing data modifications. During UPDATE and DELETE operations on a table, the database keeps the old versions of the rows for other running transactions that may need a consistent view of the data. In PostgreSQL, every transaction that modifies the database is assigned a transaction ID, referred to as the xid. The status of a row in a table is tracked using xids in two hidden columns, xmin and xmax.

Consider a table mvcc_test with one column. To insert a row and inspect the hidden columns, see the following code:

postgres=# CREATE TABLE mvcc_test(id int);
CREATE TABLE
postgres=# INSERT INTO mvcc_test VALUES(1);
INSERT 0 1
postgres=# SELECT  xmin, xmax, id FROM mvcc_test;
 xmin | xmax | id
------+------+----
 100 |    0 |  1
(1 row)

xmin represents the xid of the transaction that inserted the row, and xmax is usually 0 for visible rows. If xmax is more than 0, it usually represents an expired row that isn't visible. However, in some cases a row with xmax greater than 0 is still visible: this occurs when the transaction that updated or deleted the row hasn't committed yet or was rolled back.

If you delete a row, a version of that row is still kept to maintain MVCC. In this case, for the deleted row, xmin is the xid of the transaction that inserted the row, and xmax becomes the xid of the transaction that deleted it.

In PostgreSQL, UPDATE is implemented as DELETE plus INSERT: the engine implicitly deletes the old row and inserts a new one. Both the old and new rows are kept to honor MVCC. For the old row, xmin is the xid of the transaction that inserted it, and xmax is the xid of the transaction that updated it. For the new row, xmin is the xid of the updating transaction, and xmax is 0 because the row is newly created.

The following diagram summarizes MVCC behavior.

For more information, see Concurrency Control on the PostgreSQL website.

The expired rows that UPDATE and DELETE leave behind are called dead rows or dead tuples, and a table that accumulates them is said to be bloated. The space these dead rows occupy can't be reclaimed or reused until they're cleaned up, which is what the VACUUM command does.

VACUUM scans every page in the table (also known as the heap) that might contain dead row versions, removes those dead row versions, and makes the space available for reuse. A data structure called the visibility map tracks which pages contain only tuples that are visible to all transactions, which helps prevent vacuum from scanning pages that contain no dead tuples.
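
For example, continuing with the mvcc_test table from earlier, you can create a dead tuple and then reclaim its space manually (the VERBOSE option reports what vacuum did):

DELETE FROM mvcc_test WHERE id = 1;  -- the old row version becomes a dead tuple
VACUUM (VERBOSE) mvcc_test;          -- removes the dead tuple and reports the work done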

Introduction to autovacuum

The autovacuum daemon is designed with two different kinds of processes: autovacuum launcher and autovacuum worker.

The autovacuum launcher is a background process that the postmaster starts when the autovacuum parameter is set to on. The postmaster acts as the dispatcher for requests to the PostgreSQL system: every client connection sends a startup message to the postmaster, which uses the information in that message to start the backend process that serves the client's requests from then on. The autovacuum launcher process decides the appropriate time to start autovacuum worker processes for running vacuum and analyze operations on the tables.

The autovacuum worker is the actual worker process that executes the vacuum operations on the tables. It connects to the database as scheduled by the launcher, reads the catalog tables, and selects a table for running the vacuum or analyze operation.

The autovacuum launcher process keeps monitoring the tables in the database and chooses the right table for a vacuum job after a table reaches the autovacuum threshold. This threshold is based on parameters like autovacuum_vacuum_threshold, autovacuum_analyze_threshold, autovacuum_vacuum_scale_factor, and autovacuum_analyze_scale_factor.

autovacuum_vacuum_threshold and autovacuum_analyze_threshold

These parameters set the minimum number of updated or deleted rows needed before a table is scheduled for autovacuum and autoanalyze, respectively. The default for both is 50.

autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor

These parameters determine the fraction of a table's rows that must change before the table is scheduled for autovacuum and autoanalyze, respectively. The default for autovacuum_vacuum_scale_factor is 0.2 (20%), and the default for autovacuum_analyze_scale_factor is 0.1 (10%).

These defaults are fine for tables with a modest number of rows. For tables with a large number of rows, however, they represent a very large number of row changes, which means a lot of vacuuming work when the vacuum finally kicks in. You can tune these parameters globally, but if large tables are a minority in the database, it's better to set these parameters at the table level rather than in the server configuration file.

For more information about these parameters, see Automatic Vacuuming on the PostgreSQL website.

To calculate the threshold, use the following formula:

vacuum threshold = vacuum base threshold + vacuum scale factor * number of live tuples

Use the following values:

  • Vacuum base threshold – autovacuum_vacuum_threshold
  • Vacuum scale factor – autovacuum_vacuum_scale_factor
  • Number of live tuples – the value of n_live_tup from the pg_stat_all_tables view
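
The following query puts the formula into practice. It's a sketch that assumes the instance-wide settings apply; table-level storage parameters, if set, override them:

SELECT relname,
       n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::numeric * n_live_tup AS vacuum_threshold
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

A table becomes a candidate for autovacuum when n_dead_tup exceeds vacuum_threshold.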

The autovacuum launcher can’t start the autovacuum worker process on its own; this is done by the postmaster process. The launcher stores the information about the databases in the autovacuum shared memory area, sets a flag in the shared memory, and sends a signal to the postmaster. The postmaster starts the autovacuum worker process. This new worker process reads the information from the shared memory, connects to the required database, and completes the vacuum job.

If the postmaster fails to start a worker process, it sets a flag in the shared memory and sends a signal to the launcher process. Reading the postmaster’s signal, the launcher tries again to start the worker process by sending the signal to the postmaster (the postmaster’s failure to start a worker process in some cases can be due to high load and memory pressure or too many processes already running).

When the autovacuum worker process finishes its vacuum operations, it sends a signal to the launcher. The launcher then wakes up and tries to start another worker if the shared memory still lists tables to be vacuumed. Workers also reload the pgstats data just before vacuuming each table, to avoid vacuuming a table that another worker has just finished and that therefore no longer needs work.

A common misconception in PostgreSQL is that the autovacuum process increases I/O, so many people turn it off completely. This might look like an effective solution in the initial stages of an environment, but as the database grows, dead tuples occupy more and more space, which bloats the database and rapidly slows it down. Disabling the autovacuum daemon is strongly discouraged.

Advantages of autovacuum

This section talks about the important things that autovacuum takes care of when it’s on. It also explains issues you face when it’s off.

Updating stats

The ANALYZE operation collects statistics about the contents of tables and stores the results in the system catalogs. The query planner uses these statistics to choose an efficient execution plan for each query.

Statistics for a table are recalculated during autovacuum when the number of changed rows exceeds a threshold, calculated with the following formula:

analyze threshold = analyze base threshold + analyze scale factor * number of live tuples

  • Analyze base threshold – autovacuum_analyze_threshold
  • Analyze scale factor – autovacuum_analyze_scale_factor
  • Number of live tuples – the value of n_live_tup from pg_stat_all_tables view
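
You can estimate the analyze threshold the same way. The following sketch again assumes the instance-wide settings apply and compares the threshold against n_mod_since_analyze, the number of rows changed since the table was last analyzed:

SELECT relname,
       n_mod_since_analyze,
       current_setting('autovacuum_analyze_threshold')::integer
         + current_setting('autovacuum_analyze_scale_factor')::numeric * n_live_tup AS analyze_threshold
FROM pg_stat_user_tables;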

If autovacuum is turned off, these statistics aren't collected and recalculated automatically. The query planner then lacks accurate information about the table, which leads to bad, cost-ineffective query plans.

Massive data ingestion

Starting with PostgreSQL 13, the autovacuum process can also be triggered by inserts, through new parameters that influence its behavior in the case of massive data ingestion. A new formula has been introduced:

insert threshold = insert base threshold + insert scale factor * number of live tuples

  • Insert base threshold – autovacuum_vacuum_insert_threshold
  • Insert scale factor – autovacuum_vacuum_insert_scale_factor
  • Number of live tuples – the value of n_live_tup from pg_stat_all_tables view

Such vacuums may allow portions of the table to be marked as all visible, which greatly benefits index-only scans.
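
For append-mostly tables, you can make insert-triggered vacuums more aggressive at the table level. The following example is illustrative only; the table name and values are placeholders, not recommendations:

ALTER TABLE my_append_only_table SET (
    autovacuum_vacuum_insert_threshold = 10000,
    autovacuum_vacuum_insert_scale_factor = 0.05
);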

Preventing transaction wraparound

As explained earlier, PostgreSQL assigns each transaction a number as its transaction ID. Because the transaction ID is stored in a fixed-size integer, it has maximum and minimum values; you can't generate infinitely many IDs.

PostgreSQL uses a 4-byte integer for the transaction ID. The maximum number of transaction IDs you can represent with 4 bytes is 2^32 ~ 4294967296, which is about 4 billion, of which only about 2^31 ~ 2147483648 can be in use at any time. This is because the engine compares transaction IDs using modulo-2^32 arithmetic to decide whether a row is visible to a given transaction: for every transaction ID, there are about two billion IDs that are "older" and two billion that are "newer." Another way to say it is that the transaction ID space is circular, with no endpoint. Transaction ID wraparound happens when the counter reaches its maximum of 2^32 and wraps around to zero (actually, the first usable transaction ID is not zero; it's 3). Suddenly, transactions that were in the past appear to be in the future, which means their output becomes invisible. In short, this is catastrophic data loss: the data is actually still there, but you can't "view" it. For more details, see Preventing Transaction ID Wraparound Failures in the official PostgreSQL documentation.

To avoid this, every table in every database must be vacuumed at least once every two billion transactions to perform a freeze operation. Freezing marks rows as frozen, indicating that they were inserted by a transaction that committed sufficiently far in the past for its effects to be visible to all current and future transactions. Freezing a transaction ID means converting it to the frozen transaction ID, a value of 2, which is always older than the minimum normal value of 3.

Autovacuum visits each row on each page and freezes old transaction IDs. It uses this freeze operation to avoid the wraparound problem when a database's transaction ID age reaches autovacuum_freeze_max_age. The age of a transaction ID is simply how many transactions have run since the last FREEZE operation (or, if no FREEZE has happened, in total). Whenever the database transaction age reaches autovacuum_freeze_max_age, PostgreSQL launches the autovacuum process immediately to perform the freeze operation on the whole database; this happens even if autovacuum is intentionally disabled.
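
You can check how close each database and table is to autovacuum_freeze_max_age with the age() function:

-- Transaction ID age per database
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Oldest tables in the current database
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;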

Monitoring your autovacuum

To make sure your autovacuum is working effectively, you must regularly monitor dead rows, disk usage, and the last time autovacuum or autoanalyze ran.

Dead tuples

PostgreSQL offers a pg_stat_user_tables view, which provides information on each table (relname) and how many dead rows (n_dead_tup) are in the table.

Monitoring the number of dead rows in each table, especially the frequently updated tables, helps you determine if the autovacuum processes are periodically removing them so their disk space can be reused for better performance. You can use the following query to check the number of dead tuples and when the last autovacuum ran on the tables:

SELECT
relname AS TableName
,n_live_tup AS LiveTuples
,n_dead_tup AS DeadTuples
,last_autovacuum AS Autovacuum
,last_autoanalyze AS Autoanalyze
FROM pg_stat_user_tables;

Table disk usage

Tracking the amount of disk space each table uses is important because it allows you to analyze changes in query performance over time. It can also help you detect issues related to vacuuming. For example, if you recently added a lot of new data to a table and see an unexpected increase in the table's disk usage, that table may have a vacuuming problem.

Vacuuming helps mark outdated rows as available for reuse, so if VACUUM doesn’t run regularly, newly added data uses additional disk space instead of reusing the disk space taken up by dead rows.
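
The following query lists the largest tables; the reported size includes indexes and TOAST data:

SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;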

Last autovacuum and autoanalyze

The pg_stat_user_tables view gives you information about the last time the autovacuum daemon ran on each table. By tracking the last_autovacuum and last_autoanalyze columns, you can check whether the daemon is working efficiently. The following query gives you those details for your tables:

SELECT relname, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables;

Enabling log_autovacuum_min_duration

The log_autovacuum_min_duration parameter logs the actions of the autovacuum process. An action is logged when the autovacuum runs for at least the specified number of milliseconds, or exceeds the table-level threshold if the parameter is set as a table storage parameter. For example, if you set this parameter to 150 milliseconds, every autovacuum process that runs for 150 milliseconds or longer is logged. In addition, when this parameter is set to any value other than -1, a message is logged if an autovacuum action is skipped because of a conflicting lock. The log output can also help explain why autovacuum processes are running slowly.
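
On Amazon RDS, you set the instance-wide value in the DB parameter group. You can also override it for a single busy table as a storage parameter; my_table here is a placeholder:

ALTER TABLE my_table SET (log_autovacuum_min_duration = 150);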

Enabling an Amazon CloudWatch alarm

You can set an Amazon CloudWatch alarm for transaction wraparound. For more information, see Implement an Early Warning System for Transaction ID Wraparound in Amazon RDS for PostgreSQL.

You can also use CloudWatch metrics to monitor overall system resource usage and make sure that it’s within an acceptable range when autovacuum sessions run concurrently.

Common autovacuum-related issues

If the preceding stats indicate that the autovacuum daemon isn't working efficiently on your database, it may be for a few different reasons, which the following sections discuss.

Autovacuum parameter tuning

If your autovacuum isn’t triggering the vacuum process on your tables regularly, or isn’t performing efficiently, consider tuning the autovacuum parameters. The autovacuum process depends on several configuration settings to decide when it should automatically run VACUUM and ANALYZE commands on your tables.

The following code gives you the list of autovacuum parameters that you should consider tuning:

select category, name, setting, unit, source, min_val, max_val, boot_val from pg_settings where category = 'Autovacuum' ;
 
  category  |                name                 |  setting  | unit |       source       | min_val |  max_val   | boot_val
------------+-------------------------------------+-----------+------+--------------------+---------+------------+-----------
 Autovacuum | autovacuum                          | on        |      | default            |         |            | on
 Autovacuum | autovacuum_analyze_scale_factor     | 0.05      |      | configuration file | 0       | 100        | 0.1
 Autovacuum | autovacuum_analyze_threshold        | 50        |      | default            | 0       | 2147483647 | 50
 Autovacuum | autovacuum_freeze_max_age           | 200000000 |      | default            | 100000  | 2000000000 | 200000000
 Autovacuum | autovacuum_max_workers              | 3         |      | default            | 1       | 262143     | 3
 Autovacuum | autovacuum_multixact_freeze_max_age | 400000000 |      | default            | 10000   | 2000000000 | 400000000
 Autovacuum | autovacuum_naptime                  | 30        | s    | configuration file | 1       | 2147483    | 60
 Autovacuum | autovacuum_vacuum_cost_delay        | 20        | ms   | default            | -1      | 100        | 20
 Autovacuum | autovacuum_vacuum_cost_limit        | -1        |      | default            | -1      | 10000      | -1
 Autovacuum | autovacuum_vacuum_scale_factor      | 0.1       |      | configuration file | 0       | 100        | 0.2
 Autovacuum | autovacuum_vacuum_threshold         | 50        |      | default            | 0       | 2147483647 | 50

In the preceding output, the setting column shows the currently configured values, and the boot_val column shows PostgreSQL's built-in defaults, which apply when you haven't changed the parameter. Tuning these parameters helps make sure the autovacuum process runs frequently and efficiently on your tables.

For more information about autovacuum tuning, see A Case Study of Tuning Autovacuum in Amazon RDS for PostgreSQL.

Autovacuum skipped due to lock conflicts

If you already optimized your autovacuum settings and still see the autovacuum daemon skipping the vacuum process on a few tables, it could be due to conflicting locks on those tables.

To run a vacuum on a table, the autovacuum process has to acquire a SHARE UPDATE EXCLUSIVE lock, which conflicts with itself: two transactions can't hold a SHARE UPDATE EXCLUSIVE lock on the same table at the same time. The same is true for the other lock modes it conflicts with: SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.

The SHARE UPDATE EXCLUSIVE lock doesn’t block SELECT, UPDATE, INSERT, or DELETE. It only blocks transactions with the following locks:

  • SHARE UPDATE EXCLUSIVE – Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, REINDEX CONCURRENTLY, CREATE STATISTICS, and certain ALTER INDEX and ALTER TABLE variants.
  • SHARE – Acquired by CREATE INDEX (without CONCURRENTLY).
  • SHARE ROW EXCLUSIVE – Acquired by CREATE TRIGGER and some forms of ALTER TABLE.
  • EXCLUSIVE – Acquired by REFRESH MATERIALIZED VIEW CONCURRENTLY.
  • ACCESS EXCLUSIVE – Acquired by DROP TABLE, TRUNCATE, REINDEX, CLUSTER, VACUUM FULL, and REFRESH MATERIALIZED VIEW (without CONCURRENTLY) commands. Many forms of ALTER INDEX and ALTER TABLE also acquire a lock at this level.

Therefore, if a transaction requests one of these locks on a table, and the autovacuum daemon is already running a vacuum job on that table, the daemon immediately cancels its vacuum job so the other transaction can take the lock. Similarly, if a transaction already holds an ACCESS EXCLUSIVE lock on a table, autovacuum skips that table. Skipped tables are picked up again for vacuuming on the next iteration.
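
To see which sessions currently hold or are waiting for locks on a given table (my_table is a placeholder), you can join pg_locks with pg_stat_activity:

SELECT a.pid, a.usename, l.mode, l.granted, left(a.query, 60) AS query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'my_table'::regclass;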

Autovacuum action skipped due to long-running transactions

Because PostgreSQL is based on the MVCC concept, the autovacuum process doesn't clean up dead tuples if one or more transactions are still accessing their outdated versions. Similarly, old tuples aren't frozen if a running transaction still needs them. If a transaction is working on a snapshot of the data taken before the data was deleted or updated, autovacuum skips those dead tuples; they're vacuumed in a later iteration. This usually happens with long-running transactions in your database. To find the long-running transactions in your database, use the following code:

SELECT now()-query_start as Running_Since, pid, datname, usename, application_name, client_addr, left(query,60) FROM pg_stat_activity WHERE state in ('active', 'idle in transaction') AND (now() - query_start) > interval '5 minutes';

It’s a good practice to include your idle in transaction sessions as a part of your monitoring because those might cause autovacuum to skip its actions on the table.

Best practices of autovacuum

This section talks about the best practices for running autovacuum.

Allocating memory for autovacuum

The maintenance_work_mem parameter has an important influence on the performance of autovacuum. It determines how much memory the autovacuum process uses when scanning a table to hold the IDs of the rows that need vacuuming.

Setting the parameter too low makes the vacuum process scan the table multiple times to complete its job, which negatively impacts the database's performance.

If you have many small tables, allocate more autovacuum_max_workers and less maintenance_work_mem. If you have large tables (over 100 GB), allocate more memory and fewer worker processes. Allocate enough memory for the vacuum of your biggest table to succeed. Each of the autovacuum_max_workers processes can use the allocated memory, so make sure the number of worker processes multiplied by the per-worker memory doesn't exceed the total memory you want to dedicate to autovacuum.

For larger instances, setting maintenance_work_mem to at least 1 GB significantly improves the performance of vacuuming tables with a large number of dead tuples. Note, however, that vacuum caps its memory use at 1 GB, which is enough to process about 179 million dead tuples in one pass. Vacuuming a table with more dead tuples than that requires multiple passes through the table's indexes, which can make the vacuum take significantly longer. You can calculate how many dead tuples vacuum can process in a single pass by dividing maintenance_work_mem (in bytes) by 6.
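
As a quick check of that arithmetic, with the 1 GB cap and 6 bytes per dead tuple identifier:

SELECT pg_size_bytes('1 GB') / 6 AS dead_tuples_per_pass;
-- 178956970, roughly the 179 million mentioned above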

Setting the autovacuum_work_mem or maintenance_work_mem parameters sets the maximum memory size that each autovacuum worker process should use. By default, autovacuum_work_mem is set to -1, which indicates that the memory allocation for the autovacuum worker process should use the maintenance_work_mem settings.

Amazon RDS sets the default for maintenance_work_mem to a value in KB calculated as

GREATEST({DBInstanceClassMemory/63963136*1024},65536).
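
As a worked example, for a hypothetical DB instance class with 32 GiB of memory (34359738368 bytes), the formula yields about 537 MB:

SELECT GREATEST(34359738368 / 63963136 * 1024, 65536) AS maintenance_work_mem_kb;
-- 549888 KB, about 537 MB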

For more information, see Allocating Memory for Autovacuum and A Case Study of Tuning Autovacuum in Amazon RDS for PostgreSQL.

Reducing the chances of transaction ID wraparound

In some use cases, even tuned autovacuum settings aren’t aggressive enough to prevent transaction ID wraparound. To address this issue, Amazon RDS has a mechanism that adapts the autovacuum parameter values automatically.

With adaptive autovacuum parameter tuning enabled, Amazon RDS begins adjusting autovacuum parameters when the CloudWatch metric MaximumUsedTransactionIDs reaches 750,000,000 or the parameter autovacuum_freeze_max_age, whichever is greater.

Amazon RDS continues to adjust parameters for autovacuum when a table continues to trend toward transaction ID wraparound. Each adjustment dedicates more resources to autovacuum to avoid wraparound. Amazon RDS updates the following autovacuum-related parameters at table level:

  • autovacuum_vacuum_cost_limit – The accumulated cost that causes the autovacuum process to sleep. The default value is 200.
  • autovacuum_vacuum_cost_delay – The specified amount of time in milliseconds that the autovacuum process sleeps when it exceeds the cost limit. The default value is 20 milliseconds.
  • autovacuum_work_mem – The maximum amount of memory used by each autovacuum worker process. It defaults to -1, indicating that the value of maintenance_work_mem should be used.
  • autovacuum_naptime – Specifies the minimum delay between autovacuum runs on any given database. In each round, the daemon examines the database and issues VACUUM and ANALYZE commands as needed for tables in that database. The delay is measured in seconds, and the default is 1 minute. You can only set this parameter in the postgresql.conf file or on the server command line.

Amazon RDS modifies these parameters only if the existing values aren’t aggressive enough. These parameters are modified in the memory of the DB instance and not changed in the parameter group.

Whenever Amazon RDS modifies any of these autovacuum parameters, it generates an event for the affected DB instance that you can view on the AWS Management Console or retrieve through the Amazon RDS API. After the MaximumUsedTransactionIDs CloudWatch metric falls back below the threshold, Amazon RDS resets the autovacuum-related parameters in memory to the values specified in the parameter group.

Setting autovacuum at table level

In a growing PostgreSQL environment that relies on global autovacuum settings, you will likely see large tables vacuumed too rarely and small tables vacuumed too frequently. To avoid these scenarios, set your autovacuum parameters at the table level with the following steps:

  1. List the large tables in your environment.
  2. List the tables that have a high number of changes.
  3. See which tables have a high number of n_dead_tup.
  4. See when the tables were last autoanalyzed and autovacuumed.
  5. Change the autovacuum and autoanalyze parameters at the table level, for example:
    ALTER TABLE my_table SET (autovacuum_vacuum_scale_factor = 0.02);

    To reset a previously changed parameter, use ALTER TABLE ... RESET (parameter_name).

    Additionally, it's a good practice to partition large tables to improve autovacuum efficiency.
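
To verify which tables have table-level overrides in place, you can inspect their reloptions:

SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;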

Summary

Dead tuples can compromise a PostgreSQL database's performance because they continue to occupy space and can lead to bloat. You can use VACUUM to remove dead tuples and reuse the space for future inserts and updates, and ANALYZE to update the table statistics so the optimizer can choose optimal execution plans for SQL statements. Autovacuum helps you remove bloat, reduce table disk usage, and keep your table stats updated so the query planner runs cost-effectively.

There is a common misconception that autovacuum slows down the database because of the I/O it causes. Autovacuum does generate I/O, but if you tune it properly and follow best practices, it's very beneficial for your Amazon RDS for PostgreSQL environment, because it saves you from generating even more I/O when reading data pages full of dead tuples. Turning autovacuum off causes performance issues and can lead to transaction ID wraparound, which stops your database from working, so you should never turn it off.


About the Authors

Anuraag Deekonda is an Associate Consultant with AWS ProServe India. As a migration specialist, he works closely with customers who are migrating their workloads from on-prem to AWS Cloud. He has enabled customers to rehost their applications and databases in their migration journey. He greases the wheels for customers to build highly scalable, available, and secure solutions. He is passionate about PostgreSQL, MySQL, and other opensource databases.

Baji Shaik is a Consultant with AWS ProServe, GCC India. His background spans a wide depth and breadth of expertise and experience in SQL/NoSQL database technologies. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on-premises to Amazon RDS and Aurora PostgreSQL. He is an eminent author, having written several books on PostgreSQL. A few of his recent works include “PostgreSQL Configuration” and “PostgreSQL Development Essentials”. Furthermore, he has delivered several conference and workshop sessions.

Domenico di Salvia is a Senior Database Specialist Solutions Architect at AWS. In his role, Domenico works with customers in EMEA region to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using or migrating to AWS, designing scalable, secure, performant, sustainable, cost-effective and robust database architectures in the AWS Cloud.


Audit History

Last reviewed and updated in January 2024 by Domenico di Salvia | Sr. RDS PostgreSQL Solutions Architect