How do I troubleshoot issues related to major version upgrades in RDS for PostgreSQL?

Last updated: 2022-06-20

My engine version upgrade for Amazon Relational Database Service (Amazon RDS) for PostgreSQL is stuck or has failed.

Short description

When Amazon RDS supports a new version of a database engine, you can upgrade your DB instances to the new version. You can perform a minor version upgrade or a major version upgrade for your DB instances. Minor version upgrades are used to patch security vulnerabilities and fix bugs. These upgrades usually don't add any new functionality. Minor releases don't change the internal storage format and are always compatible with the earlier and later minor releases of the same major version. However, major version upgrades contain database changes that aren't backward-compatible with existing applications. These upgrades might change the internal format of system tables, data files, and data storage. RDS uses the PostgreSQL utility pg_upgrade to perform major version upgrades.

During the major version upgrade process, Amazon RDS for PostgreSQL runs a precheck procedure to identify any issues that might cause the upgrade to fail. The precheck procedure checks for potential incompatible conditions across all databases. If an issue is identified during the precheck process, then a log event is created to indicate that the upgrade precheck failed. You can find information related to the precheck process for all databases in the instance in the upgrade log pg_upgrade_precheck.log. Amazon RDS appends a timestamp to the file name. RDS events might also provide the reasons for upgrade failure. However, for issues that are engine specific, you must check the database log files. For more information, see Viewing and listing database log files.

During a major version upgrade, RDS completes the following steps:

  1. Create a snapshot of the instance before the upgrade. This happens only if you set the backup retention period for your DB instance to a number greater than zero.
  2. Shut down the instance.
  3. Use the pg_upgrade utility to run the upgrade job on the instance.
  4. Create a snapshot of the instance after the upgrade.

Resolution

Although RDS manages these upgrades, you might encounter the following issues during a version upgrade:

  • The upgrade takes a long time.
  • The upgrade fails.

The upgrade takes a long time

Pending maintenance activities: Any pending maintenance activities, such as applying an operating system patch on your RDS instance, are automatically applied along with engine version upgrades. In this case, the operating system patch is applied first, followed by the version upgrade. Therefore, pending operating system maintenance increases the time needed to complete the upgrade.

Also, if your RDS instance is in a Multi-AZ deployment, then operating system maintenance results in a failover. When you set up your instance in Multi-AZ, the backup for the instance is usually created on the secondary instance. In case of a failover, a backup is created on a new secondary instance after the upgrade. This backup on the new secondary instance might not be the latest backup. Therefore, a full backup might be triggered instead of an incremental backup. Creating a full backup can take a long time, especially if the database is very large.

To avoid this issue, be sure to look for pending maintenance activities in the Pending maintenance section under the Maintenance & backups tab in your RDS console. Or, use the AWS Command Line Interface (AWS CLI) command describe-pending-maintenance-actions on your instance.

aws rds describe-pending-maintenance-actions --resource-identifier example-arn

Be sure to complete these maintenance activities before performing the database engine version upgrades.

No snapshot created before the upgrade: It's a best practice to create a snapshot of the RDS instance before performing the upgrade. If you already enabled backups for your instance, then a snapshot is created automatically as part of the upgrade process. Creating a snapshot before the upgrade reduces the time needed for the upgrade process to complete because only an incremental backup is created during the upgrade process in this case.

Read replica upgrades: When you perform a major version upgrade of your primary DB instance, all the read replicas in the same Region are also automatically upgraded. After the upgrade workflow starts, the read replicas wait for pg_upgrade to complete successfully on the primary DB instance. Then, the primary instance upgrade waits for the read replica upgrades to be completed. You experience an outage until all the upgrades are complete. If the downtime window for the upgrade is limited, then you can promote or drop your replica instance and recreate the read replicas after the upgrade is complete.

Long-running transactions or high workload before the upgrade: Long-running transactions or a high workload running on the database just before the upgrade might increase the time taken to shut down the database and add to the upgrade time.

Run the following query to identify long-running transactions:

SELECT pid, datname, application_name, state,
       age(query_start, clock_timestamp()), usename, query
FROM pg_stat_activity
WHERE query NOT ILIKE '%pg_stat_activity%'
  AND usename != 'rdsadmin'
ORDER BY query_start DESC;
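
If the query returns sessions that are safe to end before the upgrade, you can cancel or terminate them. The following is a minimal sketch; the pid value 12345 is a placeholder for a pid returned by the previous query:

-- Cancel only the currently running query for the session
SELECT pg_cancel_backend(12345);

-- Terminate the session entirely if cancelling the query isn't enough
SELECT pg_terminate_backend(12345);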

Insufficient compute capacity: The pg_upgrade utility can be compute-intensive. Therefore, it's a best practice to perform a dry-run upgrade before upgrading your production databases. You can restore a snapshot of the production instance and perform a dry run with the same instance class as that of the production database.

The upgrade fails

Unsupported DB instance classes: The upgrade might fail if the instance class of your DB instance isn't compatible with the PostgreSQL version that you're upgrading to. Be sure to check the compatibility of the instance class with the engine version. For more information, see Supported DB engines for DB instance classes.

Open prepared transactions: Prepared transactions that are open on the database might lead to upgrade failure. Be sure to commit or roll back all open prepared transactions before starting an upgrade.

Use the following query to check if there are open prepared transactions on your instance:

SELECT count(*) FROM pg_catalog.pg_prepared_xacts;

In this case, the error in the pg_upgrade.log file looks similar to the following:

------------------------------------------------------------------------
Upgrade could not be run on Wed Apr 4 18:30:52 2018
-------------------------------------------------------------------------
The instance could not be upgraded from 9.6.11 to 10.6 for the following reasons.
Please take appropriate action on databases that have usage incompatible with 
the requested major engine version upgrade and try the upgrade again.

There are uncommitted prepared transactions. Please commit or rollback
all prepared transactions.
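
If the prepared transactions are no longer needed by your application, you can list them and then resolve each one by its global identifier (GID). The following is a minimal sketch; the GID shown is a placeholder for a value from the gid column of the first query:

SELECT gid, prepared, owner, database FROM pg_catalog.pg_prepared_xacts;

-- Commit the prepared transaction if its changes must be kept
COMMIT PREPARED 'example_gid';

-- Or roll it back if the changes can be discarded
ROLLBACK PREPARED 'example_gid';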

Unsupported data types: The upgrade fails with an error if you attempt to upgrade the database with unsupported data types, such as the following:

  • regcollation
  • regconfig
  • regdictionary
  • regnamespace
  • regoper
  • regoperator
  • regproc
  • regprocedure

Note: The data types regclass, regrole, and regtype are supported.

The PostgreSQL upgrade utility pg_upgrade doesn't support upgrading databases that include table columns using the reg* OID-referencing system data types. Be sure to remove all uses of reg* data types, except for regclass, regrole, and regtype, before attempting an upgrade.

Run the following query to verify the usage of unsupported reg* data types:

SELECT count(*) FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n, pg_catalog.pg_attribute a
  WHERE c.oid = a.attrelid
      AND NOT a.attisdropped
      AND a.atttypid IN ('pg_catalog.regproc'::pg_catalog.regtype,
                         'pg_catalog.regprocedure'::pg_catalog.regtype,
                         'pg_catalog.regoper'::pg_catalog.regtype,
                         'pg_catalog.regoperator'::pg_catalog.regtype,
                         'pg_catalog.regconfig'::pg_catalog.regtype,
                         'pg_catalog.regdictionary'::pg_catalog.regtype)
      AND c.relnamespace = n.oid
      AND n.nspname NOT IN ('pg_catalog', 'information_schema');
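
The previous query returns only a count. To see exactly which columns you need to change, you can adapt it to list the schema, table, column, and data type, for example:

SELECT n.nspname AS schema_name, c.relname AS table_name, a.attname AS column_name,
       a.atttypid::pg_catalog.regtype AS data_type
FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n, pg_catalog.pg_attribute a
  WHERE c.oid = a.attrelid
      AND NOT a.attisdropped
      AND a.atttypid IN ('pg_catalog.regproc'::pg_catalog.regtype,
                         'pg_catalog.regprocedure'::pg_catalog.regtype,
                         'pg_catalog.regoper'::pg_catalog.regtype,
                         'pg_catalog.regoperator'::pg_catalog.regtype,
                         'pg_catalog.regconfig'::pg_catalog.regtype,
                         'pg_catalog.regdictionary'::pg_catalog.regtype)
      AND c.relnamespace = n.oid
      AND n.nspname NOT IN ('pg_catalog', 'information_schema');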

Logical replication slots: An upgrade can't occur if your instance has any logical replication slots. Logical replication slots are typically used for AWS Database Migration Service (AWS DMS) migrations and for replicating tables from databases to data lakes, business intelligence tools, and other targets. Before upgrading, be sure that you know the purpose of the logical replication slots that are in use and confirm that they can be deleted.

If the logical replication slots are still in use, don't delete them. In this case, you can't proceed with the upgrade.

The related error in the pg_upgrade log file looks similar to the following:

"The instance could not be upgraded because one or more databases have logical replication slots. Please drop all logical replication slots and try again."

If the logical replication slots aren't needed, run the following queries to delete them:

SELECT * FROM pg_replication_slots;
SELECT pg_drop_replication_slot('slot_name');

Storage issues: While the pg_upgrade script runs, the instance might run out of space. This causes the script to fail, and you see an error message similar to the following:

pg_restore: [archiver (db)] Error while PROCESSING TOC: 
pg_restore: [archiver (db)] could not execute query: ERROR: could not create file "base/12345/12345678": No space left on device

To resolve this issue, be sure that the instance has sufficient free storage depending on the number of databases and data files before starting the upgrade.
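
To estimate how much storage the databases on the instance currently use, and to compare that with the free storage space reported in the RDS console or Amazon CloudWatch, you can run a query similar to the following:

SELECT datname, pg_size_pretty(pg_database_size(datname)) AS database_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;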

Unknown data types: PostgreSQL versions 10 and later don't support unknown data types. If a PostgreSQL version 9.6 database uses the unknown data type, an upgrade to version 10 shows an error message similar to the following:

"The instance could not be upgraded because the 'unknown' data type is used in user tables. Please remove all usages of the 'unknown' data type and try again."

This is a PostgreSQL limitation, and RDS automation doesn't modify columns using the unknown data type. You might need to modify these columns manually before the upgrade.

Run the following query to find the columns in your database that use the unknown data type:

SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE data_type ILIKE 'unknown'
  AND table_schema NOT IN ('pg_catalog', 'information_schema');

After identifying the columns, you can remove these columns or modify them to a supported data type.
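
For example, assuming the query returns a column named legacy_note in the table public.orders (both names are hypothetical), you could convert the column to a supported type such as text:

ALTER TABLE public.orders ALTER COLUMN legacy_note TYPE text USING legacy_note::text;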

Read replica upgrade failure: If your RDS instance has read replicas, then a failed read replica upgrade might leave the upgrade of your primary instance stuck or cause the primary instance upgrade to fail. A failed read replica is placed in the incompatible-restore state, and replication is terminated on the DB instance.

A read replica upgrade might fail due to the following reasons:

  • The read replica was unable to catch up with the primary DB instance even after the wait time.
  • The read replica was in a terminal or incompatible lifecycle state, such as storage-full or incompatible-restore.
  • When the primary DB instance upgrade started, a separate minor version upgrade was running on the read replica.
  • The read replica used incompatible parameters.
  • The read replica was unable to communicate with the primary DB instance to synchronize the data folder.

To resolve this issue, delete the read replica. Then, after the primary instance is upgraded, create a new read replica based on the upgraded primary instance.

Incorrect master user name: If the master user name starts with "pg_", then the upgrade fails, and you see the following error message:

PreUpgrade checks failed: The instance could not be upgraded because one or more role names start with 'pg_'. Please rename  all roles with names that start with 'pg_' and try again

To resolve this issue, create another user with the rds_superuser role. You can contact AWS Support to update this user as the new master user.
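
The following is a minimal sketch of creating such a user; the role name new_master and the password are placeholders that you should replace with your own values:

CREATE ROLE new_master WITH LOGIN PASSWORD 'choose_a_strong_password';
GRANT rds_superuser TO new_master;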

Incompatible parameter error: This error occurs if a memory-related parameter, such as shared_buffers or work_mem, is set to a value that is too high and causes the upgrade script to fail. To fix the issue, reduce the values of these parameters, and then try running the upgrade again.
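
You can check the values that the instance currently uses with SHOW. Keep in mind that on Amazon RDS these parameters are changed through the DB parameter group associated with the instance:

SHOW shared_buffers;
SHOW work_mem;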

Extensions not updated before the upgrade: A major version upgrade doesn’t upgrade any PostgreSQL extensions. If you didn't update the extensions before performing a major version upgrade, then you see the following error in the pg_upgrade.log file:

The Logs indicates that the RDS instance ''xxxx'' has older version of PostGIS extension or its dependent extensions (address_standardizer,address_standardizer_data_us, postgis_tiger_geocoder, postgis_topology, postgis_raster) installed as against the current version required for the upgrade.

This error message indicates an issue with the PostGIS extension.

In this case, run the following query to check the default and installed versions for PostGIS and its dependent extensions:

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE installed_version IS NOT NULL
  AND (name LIKE 'postgis%' OR name LIKE 'address%');

If the value for installed_version is lower than the default_version, then you must update the installed version of PostGIS and its dependent extensions to the default version. To do so, run the following query:

ALTER EXTENSION extension_name UPDATE TO 'default_version_number';
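
For example, assuming the previous query shows that postgis and postgis_tiger_geocoder are installed at a version lower than their default_version (the version number below is illustrative; use the default_version value returned by the query):

ALTER EXTENSION postgis UPDATE TO '3.1.4';
ALTER EXTENSION postgis_tiger_geocoder UPDATE TO '3.1.4';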

For more information, see Upgrading PostgreSQL extensions.

Issues in views due to changes in the system catalogs of the target version: Columns in certain system views vary across PostgreSQL versions.

For example, you might see error messages similar to the following:

PreUpgrade checks failed: The instance could not be upgraded because one or more databases have views or materialized views which depend on 'pg_stat_activity'. Please drop them and try again.

This error occurs when you upgrade the database from version 9.5 to 9.6. The error is caused by views that depend on pg_stat_activity, because the waiting column is replaced with the wait_event_type and wait_event columns in version 9.6. Similarly, you might see the following error when upgrading to version 12:

pg_restore: from TOC entry xxx; xxx xxxx VIEW sys_user_constraints art 
pg_restore: error: could not execute query: ERROR: column c.consrc does not exist LINE 18: "c"."consrc" AS "search_condition", ^ HINT: Perhaps you meant to reference the column "c.conkey" or the column "c.conbin".

This error occurs because the structure of the catalog pg_constraint has changed in PostgreSQL version 12.

To resolve these issues, drop the views that are based on system catalogs that changed in the target version, and then recreate them after the upgrade completes.
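
Before dropping an affected view, you can save its definition so that you can recreate it after the upgrade. A minimal sketch, assuming a hypothetical view named public.my_activity_view:

-- Save the view definition so you can recreate the view after the upgrade
SELECT pg_get_viewdef('public.my_activity_view'::regclass, true);

-- Drop the view, and then retry the upgrade
DROP VIEW public.my_activity_view;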

Note: Exercise caution when dropping these views. Be sure to consult with your DBA.

Other considerations

  • Be sure to follow Best practices for upgrading Amazon RDS to major and minor versions of PostgreSQL to mitigate most of the issues that might occur during engine version upgrades for RDS for PostgreSQL instances.
  • The pg_upgrade utility produces two logs: pg_upgrade_internal.log and pg_upgrade_server.log. Amazon RDS appends a timestamp to the file name for these logs. View these logs to get more information about the issues and errors encountered during the upgrade. For more information, see Monitoring Amazon RDS log files.
  • When the upgrade is complete, upgrade the pg_statistic table by running ANALYZE on all user databases, as shown in the example after this list. A major upgrade doesn't move the content of the pg_statistic table to the new version. Skipping this step might result in slow-running queries.
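
For example, connect to each user database, run ANALYZE, and then confirm that the statistics were refreshed:

-- Run in each user database after the upgrade
ANALYZE VERBOSE;

-- Confirm when each table was last analyzed
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_analyze DESC NULLS LAST;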
