AWS Database Blog

How PostgreSQL processes queries and how to analyze them

With the application layer being the visible portion of a website, optimizing the database that powers the application can be critical to its performance, manageability, and scalability. Understanding how the database processes a query to return requested rows can help the application developer pinpoint bottlenecks to enhance troubleshooting efforts.

A query in PostgreSQL is declarative, meaning the SQL statement tells the database what data to return, but not how to retrieve it. PostgreSQL uses a client-server protocol to send query text from client to server, and it returns the full result set no matter how many rows it contains. PostgreSQL also processes every query through a defined sequence of stages. This ordering is nuanced, because PostgreSQL supports running queries concurrently and relies on multi-version concurrency control (MVCC) to keep concurrent queries from interfering with one another. Understanding this order is important for application developers and database administrators because it gives insight into how PostgreSQL chooses a query path, and helps pinpoint where prospective database complications may originate.

In Part 1 of this series, we discussed core PostgreSQL concepts, when and how to change parameters, database behavior, and their best practices. In this post, we discuss more PostgreSQL key concepts, including simple query protocol, explain plans, how to read explain plans, and tools to visualize these plans in Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition.

Statements, commands, and clauses

An executable line of query code in PostgreSQL is called a SQL statement. Statements provide the overall structure and organization for performing actions on a database. You may also refer to a statement as a command; some common SQL commands are SELECT, UPDATE, and DELETE. Clauses are used to filter, define, or modify conditions and can’t be run on their own, but when combined, they form runnable statements. By using these components together, application developers and database administrators can query databases to retrieve or change desired information.

Simple query protocol

Simple query protocol, often called the query run stages, is the method of collecting query text from PostgreSQL’s frontend and processing it on the backend. The stages of simple query protocol outline the order in which the clauses of a query are evaluated, and almost all SQL commands are considered simple queries. The order of the query stages is as follows:

  • Parsing and analyzing
  • Transformation
  • Planning
  • Execution

Knowing what happens during these stages can help diagnose why a query won’t run or runs slowly, and can help you optimize queries to run faster. The psql command line utility uses this protocol exclusively to process queries.

Query processing

In PostgreSQL, running a query involves several stages orchestrated to efficiently retrieve and process data. The parser first checks the syntax and converts the SQL query into an internal tree structure. The parsed and transformed query is then turned into a query plan by the optimizer, which evaluates various strategies to determine the most efficient approach based on database statistics, indexes, and constraints. This plan is a detailed blueprint outlining how PostgreSQL will access and join tables, filter rows, and perform any required sorts or aggregations. During execution, the executor fetches and processes data according to the chosen query plan, using buffer caches and performing all the required steps (scanning tables, applying filters, and aggregating results). Finally, the results are assembled and returned to the client. Throughout these stages, the PostgreSQL planner and executor work in tandem to achieve optimal performance by utilizing indexes, minimizing disk access, and efficiently using database resources.

EXPLAIN

EXPLAIN is a command that displays the plan PostgreSQL’s query planner/optimizer selected, and it can be used to evaluate query performance. To produce an explain plan, PostgreSQL inspects the query and returns the plan chosen by the optimizer without running the query. EXPLAIN can only be used for SELECT, INSERT, UPDATE, DELETE, CREATE TABLE AS, and DECLARE (cursor) statements.

Explain plan nodes are returned as an inverted tree. The first line of the output is the last unit of work performed, and the first step run is displayed at the bottom. Moving up the plan, each node feeds its intermediate result set to the parent node above it. The top of the query plan is the final step, which returns results to the client.

Each child node is preceded by ->. Nodes with the same indentation are on the same level and may or may not be run in parallel (for example, the two sides of a join). Investigating nodes and their costs can help you identify inefficient operations in the query run and focus performance-tuning efforts.
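
The following hypothetical plan for a two-table join illustrates this layout; the table names, costs, and row counts are illustrative only:

Hash Join  (cost=4.25..22.25 rows=100 width=24)
  Hash Cond: (a.id = b.id)
  ->  Seq Scan on a  (cost=0.00..15.00 rows=1000 width=12)
  ->  Hash  (cost=3.00..3.00 rows=100 width=12)
        ->  Seq Scan on b  (cost=0.00..3.00 rows=100 width=12)

Reading from the bottom up, each Seq Scan feeds rows to the node above it, the two -> nodes at the same indentation are the two inputs of the join, and the Hash Join at the top returns the final result.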

To run, pass the entire SQL statement after EXPLAIN. For each step, EXPLAIN prints the following information:

  • Type of operation required
  • Estimated cost of the query run
  • Estimated rows resulting from the query
  • Estimated width size of each row
  • Filter conditions used in the query run

EXPLAIN has many parameters, including:

  • ANALYZE – This parameter runs the query and displays the plan, cost estimates, actual startup and total runtimes, and actual row count for each step. Despite its name, this option is not the same as the standalone ANALYZE command and does not update the table’s statistics.
  • BUFFERS – This can only be used with ANALYZE. It shows how many blocks (which, by default, are 8 KiB in size) each step reads, writes, and dirties.
  • VERBOSE – This displays all output expressions for each step in a query run plan. It is useful if the executor spends time in frequently run, expensive functions.
  • SETTINGS – This parameter displays all performance-relevant parameters that differ from their default values.
  • WAL – Starting with PostgreSQL v13, this parameter displays the WAL usage incurred by data-modifying statements. It can only be used with ANALYZE.
  • FORMAT – This specifies the output format. The default is TEXT and is used to analyze query performance. Other options are XML, JSON, and YAML, and are used for automated processing.
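
These parameters can be combined in a single parenthesized list after the EXPLAIN keyword. The following is a minimal sketch of the syntax; my_table and its id column are placeholders, not objects created in this post:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT * FROM my_table WHERE id < 100;

Because ANALYZE is included, the statement actually runs, so use the transaction-wrapping approach described later in this post for data-modifying statements.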

The following are best practices for EXPLAIN:

  • Before deploying a new query in an application, run EXPLAIN to get an estimate of the performance impact the query will have on your system
  • Treat explain plan results as estimates, and check performance and results before and after making any changes
  • Use plan results in addition to other metrics such as runtime and trace files
  • Review and update plans regularly because environments may change over time

The following are factors that EXPLAIN does not detail:

  • Why an index was not used by the planner
  • How to rewrite a query
  • External factors that can make the query or database slow, such as network lag
  • How much time it takes to return a query to the application or actions happening outside of the database

PostgreSQL v17 offers the following updates:

  • EXPLAIN can report on optimizer memory usage. The option is called MEMORY.
  • The EXPLAIN option SERIALIZE can report the cost of converting data for network transmission.
  • I/O block read/write timing statistics have been added to the EXPLAIN BUFFERS output.
  • v17 offers an improved EXPLAIN display of subplan nodes and output parameters.

ANALYZE

ANALYZE is an EXPLAIN parameter that is used to locate the nodes that consume the most cost. Plain EXPLAIN only shows estimates; with EXPLAIN ANALYZE, the query actually runs and the plan also reports actual startup and total time in milliseconds for each node. Startup and total time for upper nodes are at least as high as the startup and total times of their lower nodes, so to find the net time spent in a node, subtract the time spent in its lower nodes. Times from parallel workers need to be added together to get totals. When a node is run more than once, the reported time and row count are averages per loop, so multiply them by the number of loops to get that node’s totals.
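
For example, the following hypothetical fragment of EXPLAIN ANALYZE output shows an inner index scan that was run 100 times; the node, table, and timings are illustrative only:

->  Index Scan using items_pkey on items  (cost=0.29..0.35 rows=1 width=12) (actual time=0.005..0.007 rows=1 loops=100)

The total time spent in this node is approximately 0.007 ms × 100 loops = 0.7 ms, and the total rows it produced are 1 × 100 = 100.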

To run, pass the entire SQL statement after EXPLAIN ANALYZE. For each step, EXPLAIN ANALYZE prints everything from an explain plan, with the following additional information:

  • Actual query run startup and total times in milliseconds
  • Row count
  • How many loops or runs of the node are completed for each step

The following are EXPLAIN and EXPLAIN ANALYZE best practices:

  • Use it to determine nodes where the most runtime was spent.
  • Find a node in the plan where the estimated row count is significantly different from the actual row count. This node is often the cause of bad performance, and long runtimes elsewhere can be a consequence.
  • Find long-running sequential scans and add a filter condition to reduce the number of rows they return. These scans are also good candidates for an index.

Consider the following when using EXPLAIN ANALYZE:

  • It can add overhead, causing longer runtimes for some statements. This is a consequence of the database actually running the query.
  • Data may not be in cache during the first query run. It is valuable to repeat EXPLAIN ANALYZE a couple of times to observe result changes.
  • Do not use this method on a query that never finishes, because EXPLAIN ANALYZE will hang along with it (for example, when the query is blocked by a session that is idle in transaction).
  • When checking whether an index will be used, plain EXPLAIN is sufficient; the query does not need to run (see the sketch after this list).
  • When checking whether partitions are pruned, plain EXPLAIN is likewise sufficient.
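
To illustrate the last two points, plain EXPLAIN shows whether the planner would choose an index (or prune partitions) without running the statement. The following is a minimal sketch; the orders table and its index on customer_id are hypothetical:

-- Hypothetical table and index, shown only to illustrate checking index usage with plain EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

An Index Scan or Bitmap Index Scan node in the resulting plan indicates the index would be used; a Seq Scan indicates the planner chose not to use it.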

Solution overview

The following sections detail the steps to create a table and run EXPLAIN and EXPLAIN ANALYZE on a simple PostgreSQL query. We also discuss how to read and calculate plan costs, and tools to help visualize explain plans in PostgreSQL.

Prerequisites

To complete the examples in this post, you must complete the following prerequisites:

  1. Create and activate an AWS account.
  2. Create a PostgreSQL database on Amazon Elastic Compute Cloud (Amazon EC2), an Amazon RDS for PostgreSQL instance, or an Amazon Aurora PostgreSQL database.

Create a table

For the examples in this post, we create a table in a PostgreSQL database using random values generated by the generate_series() function:

-- Create a small test table
CREATE TABLE yes_no (id serial, yes_no_id int, repeat int);
-- Insert 100 rows: yes_no_id alternates between 1 and 2, and repeat holds a random integer from 0 to 25
INSERT INTO yes_no (yes_no_id, repeat)
SELECT x % 2 + 1, (random() * 25)::integer
FROM generate_series(1, 100) AS f(x);
-- Display the plan the optimizer chose for a simple scan of the table
EXPLAIN SELECT * from yes_no;
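
The EXPLAIN statement returns output similar to the following. Exact numbers can differ slightly in your environment, and if the table has not been analyzed yet the estimates may not match; running ANALYZE yes_no; first refreshes the statistics:

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on yes_no  (cost=0.00..2.00 rows=100 width=12)
(1 row)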

Read the EXPLAIN plan

As detailed earlier, an explain plan without parameters gives the estimated cost, estimated number of rows, and estimated average width of the rows returned. In the preceding output, a sequential scan was used on the yes_no table. The information in parentheses contains the estimated startup cost (the cost to return the first row) and, after the two dots (..), the estimated total cost (the cost to return all rows). The startup cost of a sequential scan is normally 0.00 because the database can return the first row without any preparatory work, and here the total cost is 2.00. The estimated number of rows returned is 100, and the estimated average row width is 12 bytes.

The cost calculation uses the planner’s default cost parameters: seq_page_cost = 1.0 for each page read in a sequential full-table scan, random_page_cost = 4.0 for each randomly accessed page, and cpu_tuple_cost = 0.01 for each row processed.

To understand how the optimizer determined the cost of the preceding sequential scan, the number of table data pages and number of tuples are also needed. To find them, use the following code:

SELECT relpages, reltuples from pg_class where relname='yes_no';
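
For this small table, the query typically returns relpages = 1 and reltuples = 100 once statistics have been collected (if reltuples still shows -1 or 0, run ANALYZE yes_no; first). Plugging these values into the planner’s sequential scan formula reproduces the total cost reported by EXPLAIN; the following is a sketch of the arithmetic:

-- Sequential scan cost = (relpages * seq_page_cost) + (reltuples * cpu_tuple_cost)
--                      = (1 * 1.0) + (100 * 0.01) = 2.00
SELECT (1 * 1.0) + (100 * 0.01) AS estimated_seq_scan_cost;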

You can further break down and calculate the cost of a given explain plan based on the seq_page_cost, random_page_cost, and cpu_tuple_cost settings in your particular cluster. However, the specific cost doesn’t matter as much because its value is relative to the other plans the optimizer produces. The lower the cost, the better the expected performance of a given query run plan. If you find plan costs that are wildly higher than those of other explain plans in your cluster, this can also be an indication that there is a problem in that particular query or its run plan, which should be investigated.

To run an EXPLAIN ANALYZE command, pass the SQL statement to it:

EXPLAIN ANALYZE SELECT * from yes_no;
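
This returns output similar to the following; the actual times vary from run to run and between systems:

                        QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on yes_no  (cost=0.00..2.00 rows=100 width=12) (actual time=0.012..0.031 rows=100 loops=1)
 Planning Time: 0.055 ms
 Execution Time: 0.068 ms
(3 rows)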

Read EXPLAIN ANALYZE

Adding the ANALYZE parameter actually runs the query, so use caution with INSERT, UPDATE, DELETE, or CREATE TABLE AS statements: wrap them in a transaction block with BEGIN so the changes can be discarded with ROLLBACK afterward. Adding this parameter displays the actual planning time of the query along with the total runtime. This parameter is useful when comparing time spent in a node and estimated vs. actual rows returned for operations.
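
The following is a minimal sketch of this pattern using the yes_no table created earlier; the UPDATE actually runs, so its timings are real, but ROLLBACK discards the modified rows:

BEGIN;
EXPLAIN ANALYZE UPDATE yes_no SET repeat = repeat + 1;
ROLLBACK;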

Explain plan visualizers

So far, we looked at the PostgreSQL explain plan in the terminal. In this section, we discuss an online tool to help visualize explain plans created in the terminal. Although the tool described here is popular with the customers we’ve worked with, this is by no means the only web GUI-based explain plan visualizer available.

explain.depesz.com is an online tool that allows you to paste your EXPLAIN result text into a text area. The tool parses the text to find common patterns and presents the plan in a table. It displays the original EXPLAIN text in columns while highlighting problematic nodes, including nodes that took longer to run. This is done by calculating cost using exclusive vs. inclusive times: inclusive time is the time spent in a node and all of its child nodes, and exclusive time is the time spent in a single node. These times are displayed with a color that represents how far off the planner’s estimate was. If the planner misestimated the number of rows by a factor of 10, the row is yellow; by a factor of 100, orange; and by a factor of 1,000, red.

explain.depesz.com is on the public web. If you have data transfer restrictions, you can run the explain.depesz.com source code internally, or you can choose the options to obscure column names and not save the plan on the website.

Additional features include:

  • A rows x column that shows how much the planner overestimated or underestimated the row count. Bad estimates are highlighted with a color.
  • Choosing a node hides all nodes underneath. This allows you to focus on single nodes in long query run plans.
  • When hovering over a node, immediate children are highlighted with a star, permitting easy location in long query run plans.

To use this tool, enter the output from the preceding EXPLAIN ANALYZE query into the text box on the explain.depesz.com website.

Conclusion

Simple query protocol and explain plans are critical concepts to understand when developing an application backed by PostgreSQL. Poorly written queries can make a well-designed data model perform poorly, and understanding how queries are run in PostgreSQL, and how to work with explain plans, is essential knowledge for database developers. Even a well-written query can end up underperforming if the model, total data volume, or an access pattern changes, so understanding overall database behavior is imperative. In this post, we covered the stages of a query in PostgreSQL, and how to use explain plans to describe query runs so that inefficiencies can be quickly found and corrected.

We welcome your comments and feedback in the comments section.


About the Authors

Peter Celentano is a Specialist Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He works with AWS customers to design scalable, secure, performant, and robust database architectures on the cloud.

Tracy Jenkins is a Database Specialist Solutions Architect with Amazon Web Services. She enjoys working with databases and helping customers architect performant, highly available and scalable solutions while making recommendations on reliability, cost and security.