Ensuring consistent performance after major version upgrades with Amazon Aurora PostgreSQL Query Plan Management

Keeping your database version up to date provides benefits like improved performance and security, access to new features, fixes to known issues, compliance with regulatory requirements, and more. However, there is also a risk of some queries experiencing performance regression following a database upgrade. This risk is higher with major version upgrades because they can contain changes that may not be backward-compatible with existing application queries.

In this post, I show you how to minimize the risk of query performance regression after a major version upgrade in Amazon Aurora PostgreSQL-Compatible Edition using the Query Plan Management (QPM) feature. I upgrade an Aurora PostgreSQL cluster version 9.6.11 to major version 10.12 and use Aurora QPM to stabilize query plans across the upgrade. The steps described in this post are applicable to all supported major version upgrade paths in Aurora PostgreSQL.

QPM in Aurora PostgreSQL helps control how and when query plans change. It enables you to record, evaluate, and enforce the use of known, good plans by the database optimizer. These capabilities of Aurora QPM make it a good tool for maintaining query performance stability when you upgrade your database version.

Prerequisites

You need to have the following prerequisites to complete this walkthrough:

An Aurora PostgreSQL cluster running version 9.6.11 in a private subnet within an Amazon Virtual Private Cloud (Amazon VPC). For instructions, see Create an Aurora PostgreSQL DB cluster and How to create a VPC for use with Amazon Aurora.
A bastion host with network access to your Aurora PostgreSQL cluster.
A PostgreSQL client tool of your choice. This is required for connecting to Aurora PostgreSQL and running SQL commands. For this post, I use psql as a client, but you can use a tool of your choice.
pgbench installed on your bastion host. You use pgbench to load sample data into the Aurora PostgreSQL cluster.

Following this walkthrough incurs costs in your AWS account. Refer to Amazon Aurora Pricing for current pricing information.

Solution overview

This post walks you through the following high-level steps:

Preparing your Aurora PostgreSQL cluster and loading sample data.
Capturing query plans using Aurora QPM.
Upgrading the Aurora PostgreSQL cluster from version 9.6.11 to version 10.12.
Verifying use of captured query plans on the upgraded cluster.

Preparing your Aurora PostgreSQL cluster

To prepare your cluster, you must configure QPM, create the extension on your database, and then load sample data to test the solution.

Configuring QPM on your Aurora PostgreSQL cluster

Follow these steps to configure and enable QPM on your Aurora PostgreSQL cluster:

Sign in to the Amazon Relational Database Service (Amazon RDS) console.
Create a custom DB parameter group for setting Aurora QPM parameters.
1. Set the value of capture_plan_baselines to automatic and apg_plan_mgmt.use_plan_baselines to true. For more information, see Modifying parameters in a DB parameter group.
Create a custom DB cluster parameter group for setting Aurora QPM parameters.
1. Set the value of rds.enable_plan_management to 1. For more information, see Modifying parameters in a DB cluster parameter group.
Attach the custom DB cluster parameter group to your Aurora cluster. For more information, see Modifying a DB cluster.
Attach the custom DB parameter group to instances in your Aurora cluster. For more information, see Modify a DB instance in a DB cluster.
Restart your DB instance to activate these new settings.

Creating the Aurora QPM extension on your database

In the following sections, I perform the steps as a database user with the rds_superuser role. The rds_superuser role grants a user permission to add extensions to Aurora PostgreSQL. You should always follow the principles of least privileges and segregation of duties as a security best practice. For more information about creating and managing roles, see Creating roles.

From your bastion host, connect to the database and issue the following command to create Aurora QPM extension apg_plan_mgmt:
```
aurorapgdb=> CREATE EXTENSION apg_plan_mgmt;
```

Verify the newly created extension:

aurorapgdb=> SELECT extname, extversion FROM pg_extension WHERE extname = 'apg_plan_mgmt';
    extname    | extversion
---------------+------------
 apg_plan_mgmt | 1.0.1

You now verify that all QPM-related parameters that were set via the custom parameter groups are properly configured.

Query to check if apg_plan_mgmt extension is enabled:

aurorapgdb=> SHOW rds.enable_plan_management;
 rds.enable_plan_management
----------------------------
 1

Query to check if automatic capture of plan baseline is enabled:

aurorapgdb=> SHOW apg_plan_mgmt.capture_plan_baselines;
 apg_plan_mgmt.capture_plan_baselines
--------------------------------------
 automatic

Automatic capture enables Aurora QPM to capture plans for statements that run at least twice.

Query to check if the use of captured plans for managed statements is enabled:

aurorapgdb=> SHOW apg_plan_mgmt.use_plan_baselines;

apg_plan_mgmt.use_plan_baselines

----------------------------------

on

Managed statements are SQL statements captured by the optimizer under QPM.

For more information about Aurora QPM parameters, see Parameter reference for query plan management.

Loading sample data into your database

In this step, you load some data into your Aurora PostgreSQL database using pgbench.

Run the following pgbench command to create tables and load data into your Aurora PostgreSQL database cluster. Remember to substitute appropriate values for your Aurora cluster endpoint, username, port number, and database name in the command. Supply a password for the database user when prompted.

$  pgbench --initialize --scale=100 --host=<Aurora cluster endpoint> --username=<username> --port=<port number> <database name>

The operating system prompt returns after the command is successful.

Capturing the query plan on an Aurora PostgreSQL version 9.6.11-compatible cluster

In this step, I demonstrate how to tune a sample query using pg_hint_plan to provide hints to the query planner. pg_hint_plan is an extension in PostgreSQL that you use to direct the query planner to use a desired join order, join method, or access paths for a SQL statement. After tuning the query, use Aurora QPM to capture and approve the plan, which subsequent query runs use.

This tuning approach is particularly useful if you can’t modify SQL in a third-party application code. You can simply extract SQL from the application code, tune it using pg_hint_plan, and configure Aurora QPM to capture and approve the tuned plan. Subsequent runs of the exact query by your application then use the approved tuned plan.

Configuring the pg_hint_plan extension in your database

You now configure the extension in your database.

From the bastion host, connect to your Aurora PostgreSQL cluster as in the following code. Remember to substitute appropriate values for your Aurora cluster endpoint, username, port number, and database name. Supply a password for the database user when prompted.
```
$  psql --username=<username> --host=<Aurora cluster endpoint> --port=<port number> --database=<database name>
```

Enter the following code to install and configure pg_hint_plan in the database and validate it:
```
aurorapgdb=> CREATE EXTENSION pg_hint_plan;

CREATE EXTENSION
```

Enable the hint table and verify it’s properly enabled:

aurorapgdb=> SET pg_hint_plan.enable_hint = true;
SET

aurorapgdb => SHOW pg_hint_plan.enable_hint;
pg_hint_plan.enable_hint
--------------------------
 on

Capturing the workload with Aurora QPM

To capture the workload, complete the following steps:

Tune the following example SELECT statement by adding hints to direct the query planner to use specific join and access methods:
```
SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
```

The tuning done here is simplified for illustration purposes and is not presented as most optimal for this type of query.

Run the query twice to allow Aurora QPM to capture it as a managed statement.

For clarity, I run the query with EXPLAIN to display the query plan created by the query planner. The (hashes true) option displays details of the SQL and plan hash used by the query planner.

aurorapgdb => EXPLAIN (hashes true) SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..2123638.44 rows=10000000 width=465)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.43..423626.43 rows=10000000 width=97)
   ->  Index Scan using pgbench_branches_pkey on pgbench_branches b  (cost=0.14..0.16 rows=1 width=364)
         Index Cond: (bid = a.bid)
 SQL Hash: -1649916351, Plan Hash: 1425849592

The EXPLAIN plan shows that the query planner used a nested loop join method and index scans.

Record the SQL hash returned.

Run the following SQL to verify successful capture of the query plan by Aurora QPM:

aurorapgdb=> SELECT sql_hash, plan_hash, status, enabled, sql_text::varchar(10) FROM apg_plan_mgmt.dba_plans where sql_hash=-1649916351;
  sql_hash   | plan_hash  |  status  | enabled |  sql_text  
-------------+------------+----------+---------+------------
 -1649916351 | 1425849592 | Approved | t       | SELECT * F

The output shows that the query with SQL hash -1649916351 and plan hash 1425849592 was automatically captured and approved by Aurora QPM.

Add hints to the query directing the planner to use hash join and sequential scan, then capture the new plan as in the following code.

The choice of hints selected in this example is strictly for illustration purposes. Index scans may be more beneficial for some queries than sequential scans depending on factors like table size and cardinality.

aurorapgdb=> SET apg_plan_mgmt.capture_plan_baselines = manual; -- turn on manual plan capture
SET
aurorapgdb=> /*+
aurorapgdb*> HashJoin(a b)
aurorapgdb*> SeqScan(a)
aurorapgdb*> */
aurorapgdb-> EXPLAIN (hashes true) SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..2123638.44 rows=10000000 width=465)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.43..423626.43 rows=10000000 width=97)
   ->  Index Scan using pgbench_branches_pkey on pgbench_branches b  (cost=0.14..0.16 rows=1 width=364)
         Index Cond: (bid = a.bid)
 Note: An Approved plan was used instead of the minimum cost plan.
 SQL Hash: -1649916351, Plan Hash: 1425849592, Minimum Cost Plan Hash: -58699781
(6 rows)

The planner used the previously approved plan hash 1425849592. But it also generated a new but unapproved plan with hash -58699781. The following query displays the two captured plans:

aurorapgdb=> SELECT sql_hash, plan_hash, status, enabled, sql_text::varchar(10) FROM apg_plan_mgmt.dba_plans WHERE sql_hash=-1649916351;
  sql_hash   | plan_hash  |  status  | enabled |  sql_text  
-------------+------------+----------+---------+------------
 -1649916351 |  -58699781 | Rejected | t       | SELECT * F
 -1649916351 | 1425849592 | Approved | t       | SELECT * F

The optimizer continues to use an approved plan until you reject and replace it with another plan, even when cheaper plans exist. For the query planner to use the new plan, reject the original plan (1425849592) and approve the tuned plan (-58699781).

Run the following SQL to reject all existing plans for the SQL:

aurorapgdb=> SELECT apg_plan_mgmt.set_plan_status (sql_hash, plan_hash, 'Rejected') from apg_plan_mgmt.dba_plans where sql_hash = -1649916351;
 set_plan_status
-----------------
               0
               0

Verify the status of all existing plans for the SQL statement with SQL hash –1649916351.

aurorapgdb=> SELECT sql_hash, plan_hash, status, enabled, sql_text::varchar(10) FROM apg_plan_mgmt.dba_plans where sql_hash = -1649916351;
  sql_hash   | plan_hash  |  status  | enabled |  sql_text  
-------------+------------+----------+---------+------------
 -1649916351 |  -58699781 | Rejected | t       | SELECT * F
 -1649916351 | 1425849592 | Rejected | t       | SELECT * F

The plans are in the rejected status at this point.

Run the following SQL to approve the tuned plan (-58699781):

aurorapgdb=> SELECT apg_plan_mgmt.set_plan_status (-1649916351, -58699781, 'Approved');
 set_plan_status
-----------------
               0

Verify the status of the new plan using the following query:

aurorapgdb=> SELECT sql_hash, plan_hash, status, enabled, sql_text::varchar(10) FROM apg_plan_mgmt.dba_plans where sql_hash = -1649916351;
  sql_hash   | plan_hash  |  status  | enabled |  sql_text  
-------------+------------+----------+---------+------------
 -1649916351 |  -58699781 | Approved | t       | SELECT * F
 -1649916351 | 1425849592 | Rejected | t       | SELECT * F

The tuned plan is now in approved status.

From this point on, you don’t need to add hints to the query because Aurora QPM enforces the use of the approved plan as long as the SQL hash matches.

Run the query again, this time without hints and observe the output:

aurorapgdb=> EXPLAIN (hashes true) SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Sort  (cost=3864340.76..3872947.35 rows=3442635 width=720)
   Sort Key: a.aid
   ->  Hash Join  (cost=14.50..207604.26 rows=3442635 width=720)
         Hash Cond: (a.bid = b.bid)
         ->  Seq Scan on pgbench_accounts a  (cost=0.00..198361.35 rows=3442635 width=352)
         ->  Hash  (cost=12.00..12.00 rows=200 width=364)
               ->  Seq Scan on pgbench_branches b  (cost=0.00..12.00 rows=200 width=364)
 Note: An Approved plan was used instead of the minimum cost plan.
 SQL Hash: -1649916351, Plan Hash: -58699781, Minimum Cost Plan Hash: 1425849592

The planner uses the newly approved plan.

Now you have a query tuned to use a specific plan on your Aurora PostgreSQL 9.6.11 cluster. For this post, assume that the plan with hash –58699781 is a good plan you intend to carry on to the upgraded cluster. You’re ready for the next step, which is to upgrade the cluster to a new major version.

Upgrading the Aurora PostgreSQL database cluster

To upgrade your cluster, complete the following steps:

Prepare for the major version upgrade by creating a custom DB instance and DB cluster parameter groups that are compatible with your target version.

For this post, the target is Aurora PostgreSQL major version 10.

Configure Aurora QPM parameters in the DB instance and DB cluster parameter groups the same way you did earlier.
Upgrade your Aurora PostgreSQL cluster from major version 9.6.11 to 10.12. For instructions, see Manually upgrading the Aurora PostgreSQL engine.

After the upgrade, connect to the cluster and run the following SQL to see the new version of Aurora PostgreSQL:

aurorapgdb=> SELECT aurora_version(), version();
 aurora_version |                                   version                                    
----------------+------------------------------------------------------------------------------
 2.5.4          | PostgreSQL 10.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit

Validate Aurora QPM configuration by running the following SQL commands:

aurorapgdb=> SHOW rds.enable_plan_management; -- check that plan management is enabled
 rds.enable_plan_management
----------------------------
 1

aurorapgdb=> SHOW apg_plan_mgmt.use_plan_baselines; -- check that the optimizer is configured to use managed plans
 apg_plan_mgmt.use_plan_baselines
----------------------------------
 on

aurorapgdb=> SELECT extname, extversion FROM pg_extension WHERE extname = 'apg_plan_mgmt'; -- check version of the Aurora QPM extension
    extname    | extversion
---------------+------------
 apg_plan_mgmt | 1.0.1

Validating use of captured query plans after the upgrade

When you run the following query on apg_plan_mgmt.dba_plans, you can see that the managed plans and their respective status are retained from the Aurora PostgreSQL 9.6.11 cluster:

aurorapgdb=> SELECT sql_hash, plan_hash, status, enabled, sql_text::varchar(10) FROM apg_plan_mgmt.dba_plans;
  sql_hash   | plan_hash  |  status  | enabled |  sql_text  
-------------+------------+----------+---------+------------
 -1649916351 |  -58699781 | Approved | t       | SELECT * F
 -1649916351 | 1425849592 | Rejected | t       | SELECT * F

Run the SELECT statement again with EXPLAIN, this time without hints. See the following code:

aurorapgdb=> EXPLAIN (hashes true) SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Sort  (cost=3864340.76..3872947.35 rows=3442635 width=720)
   Sort Key: a.aid
   ->  Hash Join  (cost=14.50..207604.26 rows=3442635 width=720)
         Hash Cond: (a.bid = b.bid)
         ->  Seq Scan on pgbench_accounts a  (cost=0.00..198361.35 rows=3442635 width=352)
         ->  Hash  (cost=12.00..12.00 rows=200 width=364)
               ->  Seq Scan on pgbench_branches b  (cost=0.00..12.00 rows=200 width=364)
 Note: An Approved plan was used instead of the minimum cost plan.
 SQL Hash: -1649916351, Plan Hash: -58699781, Minimum Cost Plan Hash: 1425849592

The planner uses the known good plan that you approved before the upgrade.

You have successfully maintained performance stability of a sample query through an upgrade from Aurora PostgreSQL 9.6.11 to 10.12. You did this using Aurora QPM to capture, carry over, and enforce a known good query plan.

Next steps

The query planner on the upgraded Aurora PostgreSQL cluster continues to use the plan that you approved when the cluster was on the previous major version until you reject it. After the upgrade, you should carefully evaluate new plans that evolve, test them, and approve the ones that meet your performance goals. For more information, see Maintaining execution plans.

Summary

In this post, I presented you with a solution to a common concern around database version upgrades—the risk of query performance regression. Upgrading a database to a new version enables you to enjoy latest features, bug fixes, and improved security and performance. However, following an upgrade, there is a risk that some queries may experience performance regression due to changes in optimizer behavior or other factors. Although the number of affected queries may be small, the impact could be huge if those queries support critical business functions. This post presented a solution that allows you to manage query plans effectively across upgrades while you meet your upgrade goals.

With an upgrade use case, I described how to use Aurora QPM to stabilize query performance through an upgrade. I walked you through steps that you can apply in your environment to capture known good query plans on an Aurora PostgreSQL cluster before an upgrade. Then I showed how to force the upgraded cluster to use the known good query plans, allowing you to maintain stable performance.

Aurora QPM has a number of other important use cases you can take advantage of. For more information, refer to Use cases for query plan management in Amazon Aurora PostgreSQL.

About the Author

Lanre Showunmi is a database specialist with AWS Professional Services. He is based in London, United Kingdom where he works with customers and partners to build scalable, highly available and secure solutions in the AWS cloud.