Performing Parallel Queries and Phrase Searching with Amazon RDS for PostgreSQL 9.6.1
Jeff Levine is a solutions architect for Amazon Web Services.
As a solutions architect for Amazon Web Services, one of my responsibilities is to help our customers to learn about new features and capabilities of our services. Amazon RDS makes it possible to set up, operate, and scale relational databases in the cloud without having to focus on time-consuming administrative tasks. Amazon RDS recently announced the support of PostgreSQL 9.6.1. In this blog post, I will discuss two of the features included in this release: the ability to perform parallel queries and phrase searches.
Parallel queries make it possible to use multiple CPU cores to speed up the execution of queries by creating multiple worker processes. Each worker process performs the query on a subset of data. The results are then combined to produce the final query result. The number of worker processes is determined by the value of the parameter
max_parallel_workers_per_gather. The default value of this parameter is
0, which disables parallelism.
In the examples that follow, I’ve created a VPC with a PostgreSQL RDS database on a db.m4.large instance class with 2 vCPUs and 8 GB RAM. I’m going to use pgcli, a command line tool for PostgreSQL, to interact with the database named pg961db to create a table.
create table statement uses the
generate_series function to create a set of integers from 1 to 100 million that serve as identifiers for the rows of the table. I am also using the
random() and floor() functions to generate a column with a random number from 1 to 9. I then use the
select statement to show the first three rows of the table. Now I will run a standard query:
In the preceding screenshot,
max_parallel_workers_per_gather is set to
0 to disable parallel queries and then perform a select. Notice that the query plan calls for a sequential scan (as denoted by
Seq Scan). The total query time is approximately 44 seconds. Now let’s see what happens when parallel queries are enabled:
In this example,
max_parallel_workers_per_gather is set to 2 to enable parallel queries. The output from
explain shows that two workers have been launched and that the total processing time was reduced to approximately 29 seconds, a 34% reduction in the time required to perform the same query with only one worker.
This example shows how parallel queries can reduce the processing time of queries using multiple workers in a divide and conquer manner. This is especially useful when processing aggregate functions such as
avg() in the preceding example. The resulting increase in efficiency will depend on a number of factors, including the data itself, the specific query, and the database instance class.
PostgreSQL 9.6.1 also introduced the ability to perform phrase searches, in which a search query consists of an ordered set of lexemes rather than, as with standard searches, an unordered set. Consider these standard search examples:
In the first example, the search for “Amazon Web Services” in “Amazon Web Services has over 90 services” yields the expected result of
True. However, the search for “Amazon Web Services” in “The Web Travel Services Agency offers tours of the Amazon” also yields a result of
True even though it is likely not what we wish to see. To understand why this happens, see how the
plainto_tsquery() function parses query strings:
The parser first applies the English language processing rules to the string “Amazon Web Services” by normalizing the case of each word and extracting the root of each token before converting it to a lexeme (hence, the change of “services” to “servic”). The resulting query string consists of the lexemes separated by the intersection operator “&.” A match occurs if all of the lexemes are found, regardless of order. PostgreSQL 9.6.1 includes support for phrase search query strings using the
phraseto_tsquery() function, as shown here:
The query string contains the same three lexemes, but with a new ordering operator
<->, which means “is followed by.” For a match to occur, the three lexemes must appear in the correct order. This capability makes it much easier to refine searches and produce the most desirable results.
Amazon RDS for PostgreSQL version 9.6.1 brings the latest capabilities of PostgreSQL to the cloud as a managed service. The parallel query feature can increase the performance of searches by using additional CPU capacity to run multiple worker processes. The phrase search capability provides for a more tailored search experience where the specific order of terms is required. It’s easy to get started. Use our Getting Started Guide to create your PostgreSQL database in just a few clicks.