Implement an Early Warning System for Transaction ID Wraparound in Amazon RDS for PostgreSQL
Shawn McCoy is a database engineer in Amazon Web Services.
As a PostgreSQL database owner, you know that critical maintenance operations are handled by the autovacuum process. The effectiveness of this background process is constantly being improved with each release. For most PostgreSQL users, the default settings will work just fine. However, some workloads and usage patterns require customized parameters. I’ll cover two areas in this post: First, what you can do to monitor your Amazon RDS for PostgreSQL database’s transaction ID health, and second, common things I’ve helped customers work through once a problem is identified.
What is “transaction ID” health?
A PostgreSQL database can have two billion “in-flight” unvacuumed transactions before PostgreSQL takes dramatic action to avoid data loss. If the number of unvacuumed transactions reaches
(2^31 - 10,000,000), the log starts warning that vacuuming is needed. If the number of unvacuumed transactions reaches
(2^31 - 1,000,000), PostgreSQL sets the database to read-only mode and requires an offline, single-user, standalone vacuum. This vacuum requires multiple hours or days of downtime (depending on database size). A very detailed explanation of transaction ID wraparound is found in the PostgreSQL documentation.
Early warning is the key
We now know that around 2.1 billion unvacuumed transactions is “where the world ends” to quote the source code comment. There are several reasons a transaction can become this old, but let’s first look at how you can be alerted of this pending problem. Although it can be monitored with manual queries, we have introduced an Amazon CloudWatch metric, MaximumUsedTransactionIDs, to make setting this kind of alert easier for you. To generate this metric, the Amazon RDS agent runs this query:
SELECT max(age(datfrozenxid)) FROM pg_database;
From my experience working with customers, I recommend an alarm when this metric reaches a value of 1 billion. For some situations, a low severity warning at 500 million might also be useful. Let’s take a look at configuring a CloudWatch alarm, as described in the CloudWatch documentation.
First, choose CloudWatch in AWS Management Console:
Then choose Alarms on the navigation pane:
Choose Create Alarm to create a new alarm:
MaximumUsedTransactionIDs metric and press Return to bring up a list of your instances. For this example, I’m going to just select one of them. If you have several instances, you can group several together and get a single alarm.
After selecting your target instance, choose Next to set up the threshold:
Then set up the threshold (1000000000) and notification email:
In my example, I’ve already exceeded the threshold, so I get an email with this wording:
This message will also show up in the CloudWatch dashboard:
Setting this threshold to 1 billion should give you plenty of time to investigate the problem. It really depends on the problem’s cause how much time it will take to fix. The default autovacuum_freeze_max_age value is 200 million. If the age of the oldest transaction is 1 billion, autovacuum is having a problem keeping this threshold at the target of 200 million. It’s a luxury to have extra time to fix these issues. There is nothing more frustrating than trying to fix this type of problem while keeping the system under normal workloads.
The monitor has sent an alarm—now what?
Let’s say that the monitor has sent the alarm email and you know that there is a problem. I’ll walk through the steps that I take in identifying solutions to the most common problems I’ve helped customers resolve. One thing I like to do after I have identified the problem is to execute a verbose vacuum command and see how long the action takes. This approach can help me get out of immediate trouble and plan my long-term fix.
The three things to identify are:
- Where autovacuum is currently working.
- Which database is aging.
- Which tables are aging.
With this information, you can start to diagnose what is happening.
What is autovacuum doing?
To find out what autovacuum is doing, you can take the following action. This action requires that you are running RDS Postgres 9.3.12 or later, 9.4.7 or later, or 9.5.2 or later to have full visibility into the running rdsadmin processes. This query will show the information we want:
SELECT datname, usename, pid, waiting, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) like '%VACUUM%' ORDER BY xact_start;
Which databases are aging?
To find out which databases are aging, the following query will tell us where to start work:
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC LIMIT 20;
Which tables are aging?
To find out which tables are aging, connect to the first database indicated in the result from the preceding query. This database will have the table containing the oldest transaction, and this query will show the top 20 oldest tables and their size. Most likely, these tables are the ones you can observe autovacuum working on in the first query.
SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age, pg_size_pretty(pg_table_size(c.oid)) as table_size FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind = 'r' ORDER BY 2 DESC LIMIT 20;
Let’s look at two different scenarios based on the output of these queries.
The case of long-running autovacuum sessions
In our first example, there are sessions that have been running for a long time. This situation usually indicates maintenance_work_mem is set too small for the size of the table.
These sessions will most likely never complete. These are two common causes of this scenario:
- The top problem I have seen is that maintenance_work_mem is too small for the size of the larger tables. This critical parameter determines how much tuple information the autovacuum worker can hold in memory while it’s processing tables. If this parameter is too small for a table, it will require multiple passes on the table to complete. For more information, see RDS recommendations for working with the maintenance_work_mem.
- If you have adjusted your maintenance_work_mem parameter and the workers are still not keeping up, you might need to adjust the individual parameter setting for the larger tables. The benefit of this approach is that you can allow autovacuum to remain less aggressive on your entire system and just target larger tables. With the default parameters, autovacuum will take a nap of autovacuum_vacuum_cost_delay (20 milliseconds) every time the internal counter hits auto_vacuum_cost_limit. For a large table, it can be very beneficial to have autovacuum take no “breaks” at the expense of additional resource consumption. To make sure that’s the case, use this setting:
ALTER TABLE mytable SET (autovacuum_vacuum_cost_delay=0);
For more information, see RDS recommendations for working with the other autovacuum parameters and how you set them.
The case where autovacuum sessions seem to complete, but overall fall behind
The scenario where autovacuum sessions seem to complete but really fall behind can be a little trickier to fix. By default, autovacuum_max_workers is set to 3. This setting is usually fine. However, if you have an extremely high number of tables (30,000 or more), three workers can be overwhelmed. Recall the query above to check what autovacuum is doing. The results here seem to indicate things are okay:
This result actually looks normal. The autovacuum sessions haven’t been running long, and all three are working on tables. But we received an alarm indicating our database age has passed our threshold of 1 billion. So what is the problem?
If we look at the query that checks the age of databases and tables, we see that there is something preventing autovacuum from keeping up:
At this point, I check exactly how many tables I have for autovacuum to work through, because these queries don’t indicate any problem other than that there are still enough tables aging to trigger my alarm.
SELECT count(*) FROM pg_tables;
Schema design questions aside, there must be too many tables for our default three workers to traverse. The typical fix is to increase autovacuum_max_workers. Two things to note are that the cost parameter auto_vacuum_cost_limit is evenly distributed among active workers and that each worker can consume up to maintenance_work_mem. In this case, I recommend the following settings to see if progress is made and the CloudWatch graph shows a downward trend:
I also recommend taking a look at the schema to see if improvements can be made in the design and lessen the overall number of tables.
This post identifies a process to monitor this critical PostgreSQL metric, gives steps to start troubleshooting the problem, and describes a couple of common scenarios I have worked with RDS customers to fix. Other situations can arise, but I hope that this post helps you start the troubleshooting process.
In conjunction with your CloudWatch metric, it’s also useful to enable autovacuum logging so you can see where time is being spent. However, it might be beneficial to execute a manual “vacuum freeze” on a table that is approaching that 2 billion “age” that we talked about at the beginning of this post. It’s useful to see how long the operation takes. When using the
verbose keyword, you get a better sense at what is happening—for example, if you don’t need those indexes that cause this operation to take additional time, it’s a good time to get rid of them! Here the RDS documentation outlines the steps to execute a manual verbose vacuum.
You can check out more information about autovacuum in Working with PostgreSQL Autovacuum on Amazon RDS in the RDS documentation. If you have any questions, please let us know in the comments following.