AWS Database Blog

Accelerate HNSW indexing and searching with pgvector on Amazon Aurora PostgreSQL-compatible edition and Amazon RDS for PostgreSQL

It’s important to understand strategies for using pgvector with your generative AI applications so you can find the optimal configuration. This can include the selection of your database instance family, how you ingest your data, and even the version of pgvector you’re using.

We ran a series of tests to see how pgvector 0.5.1 performed compared to the previous version (0.5.0), and how these workloads behave on Graviton2 and Graviton3 instances with Amazon Aurora PostgreSQL-compatible edition and Amazon RDS for PostgreSQL. We used the HNSW (hierarchical navigable small worlds) index method in pgvector for these tests. For the workloads we tested, we observed that pgvector 0.5.1 can ingest vectors up to 225% faster than pgvector 0.5.0, and that when searching over high-dimensionality vectors (1,536 dimensions) at higher concurrency (64 clients), Graviton3 instances can outperform Graviton2 instances by 60%.

In this post, we explore several strategies to help maximize performance when using HNSW indexing with pgvector on Amazon Aurora PostgreSQL-compatible edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

Background: Using pgvector for Retrieval Augmented Generation (RAG)

Generative AI applications can use a technique called Retrieval Augmented Generation (RAG) that provides additional context to foundational models so they can return domain-specific results to users. A key component of RAG is a vector data store, which provides an efficient storage and retrieval mechanism for recalling information that a foundational model can use to provide an enhanced answer. To prepare a vector data store for RAG, you must first convert raw data, such as text documents, by passing it through an embedding model, such as Amazon Titan, which turns the data into a set of corresponding vectors that provide a mathematical representation of it. From there, users of a generative AI application can perform actions such as asking questions, and the application can perform a vector similarity search to find the best matches to augment the foundational model. The following graphic demonstrates the ingestion and retrieval workflow used in RAG.

In the ingestion workflow, raw data from a source such as a data lake (A) is converted by batch processes into vectors of tokens (B) and stored in a vector database (C). During transactional user interactions, the user issues a question (1), which is converted into vectors of tokens (2) and presented to the vector database for a similarity search (3). The result of the database query is merged with the tokenized user question to generate a richer prompt for better answers from the generative AI model (4).

pgvector is an open-source extension available for Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL that adds vector database capabilities, including a vector data type, distance operations, and several indexing methods. One aspect of successfully using pgvector in your generative AI application is choosing and configuring the appropriate index method.
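The following minimal sketch illustrates these capabilities; the table, data, and three-dimensional vectors are hypothetical and kept deliberately small for readability:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id int GENERATED BY DEFAULT AS IDENTITY,
    embedding vector(3)  -- 3 dimensions for illustration; embedding models use far more
);

INSERT INTO items (embedding) VALUES ('[1,0,0]'), ('[0,1,0]'), ('[0.9,0.1,0]');

-- <-> is the Euclidean (L2) distance operator; pgvector also provides
-- <#> (negative inner product) and <=> (cosine distance)
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

SELECT id FROM items ORDER BY embedding <-> '[1,0,0]' LIMIT 2;

The next section provides an overview of how indexing works with pgvector.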

Understanding approximate nearest neighbor indexing

Finding similar vectors, also known as nearest neighbor or K-NN queries, can be a time-consuming process. This is because a query must compute the distance from the query vector to every single vector in a dataset to find the top candidates. Although searching over every vector in a dataset provides the best quality results, a generative AI application may be able to trade off result quality for speed.

Approximate nearest neighbor (ANN) algorithms can accelerate these searches by examining only a subset of vectors in the dataset, though this can impact the quality of results. The metric used to measure result quality is recall: the percentage of expected results that a given query actually returns. ANN algorithms provide tuning parameters that let you choose the performance/recall trade-off that’s appropriate for your application.

The pgvector extension provides two ANN index methods: IVFFlat and, as of 0.5.0, HNSW (hierarchical navigable small worlds). HNSW is a graph-based indexing method that groups similar vectors into increasingly dense layers, which allows for efficient, high-quality searches over a smaller set of vectors. Both Amazon Aurora and Amazon RDS support HNSW indexing with pgvector. Amazon RDS for PostgreSQL recently added pgvector 0.5.1 to version 15.4-R3, which contains performance improvements for high-concurrency workloads that use HNSW indexing.
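As a brief, illustrative sketch of the query-time trade-off in pgvector’s HNSW implementation, reusing the hypothetical items table from the earlier example, the hnsw.ef_search parameter (which we revisit throughout this post) controls how many candidates HNSW examines during a search:

-- Higher values of hnsw.ef_search improve recall at the cost of query
-- speed; the pgvector default is 40
SET hnsw.ef_search = 100;

SELECT id FROM items ORDER BY embedding <-> '[1,0,0]' LIMIT 2;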

Let’s explore how pgvector works so we can build strategies to use it to effectively index and search data in our application.

Prerequisites

For the experiments, we used RDS for PostgreSQL instances, specifically version 15.4-R3 with pgvector 0.5.1. These experiments can also be run on Amazon Aurora PostgreSQL-compatible edition. The experiments compared pgvector 0.5.1 with 0.5.0. For benchmarking, we used a db.m7g.16xlarge instance and a db.m6g.16xlarge instance, both with default configurations and gp3 storage with 20,000 provisioned IOPS. For the client, we used an Amazon Elastic Compute Cloud (Amazon EC2) c6in.8xlarge instance positioned in the same Availability Zone as the databases (for example, us-east-2a).

About the testing methodology

When reviewing the performance of a vector data store, particularly when using an ANN search, it’s important to evaluate four characteristics:

  • Performance – How fast does a query return a result?
  • Recall – What is the quality of the results?
  • Build time – How long does it take to build an index?
  • Size – How much storage does the data require?

These characteristics are used both by vector data store designers and users to understand how well a data store works for vector workloads. This post focuses primarily on query performance and index build time. We don’t focus on size because the vector storage size is either fixed or not important to the outcome of the tests demonstrated in this post.

Although it’s essential to consider recall when building an application that uses an ANN method such as HNSW, we address recall indirectly in this post when we discuss query performance with the hnsw.ef_search parameter. Higher values of hnsw.ef_search typically provide better recall. Additionally, you can often improve recall by increasing the index construction parameters m and ef_construction, but the strategies outlined in this post are applicable regardless of these values. This post uses the default values for m and ef_construction provided in pgvector 0.5.1, which are m=16 and ef_construction=64.
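For reference, the following sketch shows what those defaults look like when set explicitly at index creation time, using the vecs table that we create later in this post:

-- m is the maximum number of connections per layer; ef_construction is
-- the size of the candidate list used while building the graph
CREATE INDEX ON vecs USING hnsw (embedding vector_l2_ops)
    WITH (m = 16, ef_construction = 64);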

Data ingestion with pgvector and HNSW on Amazon RDS for PostgreSQL

The first step to setting up an RDS for PostgreSQL database for RAG workflows is to prepare it to store vectors. In this section, we show how concurrent inserts (inserting vectors from multiple clients at the same time) can help accelerate building an HNSW index in pgvector. As of version 0.5.1, pgvector doesn’t support parallel index builds for HNSW, but concurrent inserts offer a way to speed up index build times. HNSW builds its graph iteratively, which means you can continue to add vectors to the index over time; this is what makes the concurrent insertion method possible.

To run this example, you must first ensure pgvector is installed in your PostgreSQL database. You can install pgvector in your database by running the following command:

CREATE EXTENSION IF NOT EXISTS vector;

You will see the following output:

CREATE EXTENSION

For our ingestion test, we want to generate random vectors. The following PL/pgSQL function takes an argument, dim, and generates a normalized vector with dim dimensions:

CREATE OR REPLACE FUNCTION generate_random_normalized_vector(dim integer)
RETURNS vector
LANGUAGE plpgsql
AS $$
DECLARE
    a real[];
    mag real;
    i int;
BEGIN
    -- Build an array of dim random values in [0, 1)
    SELECT array_agg(random()::real) INTO a FROM generate_series(1, dim);

    -- Compute the magnitude (Euclidean norm) of the vector
    mag := public.vector_norm(a::vector);

    -- Divide each element by the magnitude to normalize the vector
    FOR i IN 1..dim
    LOOP
        a[i] := a[i] / mag;
    END LOOP;

    RETURN a::vector;
END;
$$;

You will see the following output:

CREATE FUNCTION
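As an optional sanity check, the norm of any vector returned by this function should be approximately 1:

-- vector_norm of a normalized vector should return a value close to 1
SELECT public.vector_norm(public.generate_random_normalized_vector(3));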

For testing ingestion performance, we use the pgbench PostgreSQL utility. pgbench is a benchmarking tool that comes with PostgreSQL, and you can provide your own custom benchmarking scripts for it. For this data ingestion test, we created a script that inserts a randomly generated vector into a table called vecs. Create a file called pgvector-hnsw-concurrent-insert.sql and add the following code:

\setshell vector_dim echo ${VECTOR_DIM:-128}

INSERT INTO vecs (embedding)
SELECT public.generate_random_normalized_vector(':vector_dim');

This script uses a pgbench command, \setshell, to pass in an arbitrary value for the vector dimension, defaulting to 128 dimensions. Note that generating the random vector can take 0.5 to 2 milliseconds, which has a marginal impact on ingestion performance, but we opted for this method for convenience.

We now have enough to set up the ingestion test. For this test, we see how many vectors we can insert into each environment over a period of 15 minutes (900 seconds). Running the test for a prolonged period of time helps normalize the effects of other operations on the server, client, and network. For each iteration of the test, we increase the number of clients, and use a ratio of one client per job, where a job is equivalent to one thread. Also, between each test run, we must drop any previously created index, remove all the data from the table, and create an empty index.

We use the following commands to run the ingestion test:

export PGHOST="<database host>"
export PGUSER="<database user>"
export PGPASSWORD="<database password>"
export PGDATABASE="<database name>"
export VECTOR_DIM=128
TEST_TIME=900
VECTOR_OPS="vector_l2_ops"
for clients in 1 2 4 8 16 32 64; do
    psql -c "DROP INDEX IF EXISTS vecs_embedding_idx;"
    psql -c "TRUNCATE TABLE vecs;"
    psql -c "CREATE INDEX ON vecs USING hnsw(embedding ${VECTOR_OPS});"
    pgbench -n -T "${TEST_TIME}" \
      --file=pgvector-hnsw-concurrent-insert.sql \
      -c "${clients}" -j "${clients}"
done

As the test runs, you’ll see output similar to the following:

pgbench (16.0 (Ubuntu 16.0-1.pgdg22.04+1), server 15.4)
transaction type: pgvector-hnsw-concurrent-insert.sql
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
maximum number of tries: 1
duration: 900 s
number of transactions actually processed: 8127088
number of failed transactions: 0 (0.000%)
latency average = 7.081 ms
initial connection time = 870.289 ms
tps = 9038.789025 (without initial connection time)

For the first run, we inserted 128-dimensional vectors into a table. The table was created using the following SQL:

CREATE TABLE vecs (
    id int GENERATED BY DEFAULT AS IDENTITY,
    embedding vector(128)
);

You will see the following output:

CREATE TABLE

The following table shows the results of the different runs. As mentioned earlier, these tests used gp3 storage with 20,000 provisioned IOPS, used the Euclidean (L2) distance operation, and measured throughput in transactions per second (TPS):

                     TPS by number of clients
Instance   Version      1      2      4      8     16     32     64
m7g.16xl   0.5.0      310    515    800   1178   1717   2325   2758
m7g.16xl   0.5.1      309    541    950   1683   3051   5489   8960
m6g.16xl   0.5.1      238    420    786   1420   2636   4736   7318

Let’s look more closely at these results. In these experiments, we can observe two factors that impacted ingestion times:

  • Using pgvector 0.5.0 and 0.5.1
  • The instance type used

The first test kept the instance type fixed (db.m7g.16xlarge) but used two different versions of pgvector (0.5.0 and 0.5.1). As the number of clients increases, we can see that pgvector 0.5.1 indexes vectors more rapidly than 0.5.0. With 64 clients concurrently inserting vectors, pgvector 0.5.1 showed a 225% (3.25 times) speedup over pgvector 0.5.0.

The following table shows the results presented in the above chart:

Clients   0.5.0 (TPS)   0.5.1 (TPS)   Speedup (%)
1         310           309           -1%
2         515           541           5%
4         800           950           19%
8         1178          1683          43%
16        1717          3051          78%
32        2325          5489          136%
64        2758          8960          225%

The bulk of this speedup comes from two pull requests (refer to Speed up HNSW index build and Improve HNSW index build performance more) in pgvector that reduce the number of distance calculations performed when inserting elements.

The second test kept the version of pgvector fixed (0.5.1) but used two different instance types (db.m7g.16xlarge, db.m6g.16xlarge). In this test run, we observe a general performance increase with the m7g family of instances.

The following table shows the results presented in the above chart:

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         238                     309                     29.70%
2         420                     541                     28.80%
4         786                     950                     20.90%
8         1420                    1683                    18.50%
16        2636                    3051                    15.80%
32        4736                    5489                    15.90%
64        7318                    8960                    22.40%

The m7g instance family uses the Graviton3 processor, which contains general performance improvements and specific improvements targeted for floating point calculations. Vector workloads using pgvector can benefit from the newer Graviton processors to help accelerate workloads, as these tests show.

The preceding tests use a 128-dimensional vector, but many embedding models use larger vectors. For example, the Amazon Titan embeddings model generates 1,536-dimensional vectors. We can repeat the earlier test to determine how pgvector and Amazon RDS for PostgreSQL perform with a 1,536-dimensional vector. To set up the test, first recreate the vecs table to store a 1,536-dimensional vector:

DROP TABLE IF EXISTS vecs;
CREATE TABLE vecs (
    id int GENERATED BY DEFAULT AS IDENTITY,
    embedding vector(1536)
);

You can now run the tests for vectors with 1,536 dimensions. You can use the same script as before, but change VECTOR_DIM to be 1536:

export PGHOST="<database host>"
export PGUSER="<database user>"
export PGPASSWORD="<database password>"
export PGDATABASE="<database name>"
export VECTOR_DIM=1536
TEST_TIME=900
VECTOR_OPS="vector_l2_ops"
for clients in 1 2 4 8 16 32 64; do
    psql -c "DROP INDEX IF EXISTS vecs_embedding_idx;"
    psql -c "TRUNCATE TABLE vecs;"
    psql -c "CREATE INDEX ON vecs USING hnsw(embedding ${VECTOR_OPS});"
    pgbench -n -T "${TEST_TIME}" \
      --file=pgvector-hnsw-concurrent-insert.sql \
      -c "${clients}" -j "${clients}"
done

The following table shows the results of the different runs with the 1,536-dimensional vector dataset. As mentioned earlier, these tests used gp3 storage with 20,000 provisioned IOPS, used the Euclidean (L2) distance operation, and measured throughput in transactions per second (TPS).

                     TPS by number of clients
Instance   Version      1      2      4      8     16     32     64
m7g.16xl   0.5.0       71    121    191    287    404    474    487
m7g.16xl   0.5.1       71    130    238    442    772  1,039  1,031
m6g.16xl   0.5.1       57    102    189    344    570    656    664

For the first test, where we varied the pgvector version (0.5.0, 0.5.1), we observed a speedup, though not as significant as on the 128-dimensional vector dataset. You can see the results in the following figure:

The following table shows the results presented in the above chart:

Clients   0.5.0 (TPS)   0.5.1 (TPS)   Speedup (%)
1         71            71            0%
2         121           130           7%
4         191           238           25%
8         287           442           54%
16        404           772           91%
32        474           1,039         119%
64        487           1,031         112%

The speedup in the test with the 1,536-dimensional vectors is not as large as with the 128-dimensional vectors, primarily due to how the vectors are stored. The fundamental storage unit in PostgreSQL is called a page, which can store up to 8 KB of data. A 128-dimensional vector requires about 0.5 KB of storage, whereas a 1,536-dimensional vector requires about 6 KB. Therefore, a 1,536-dimensional vector will take up an entire page, whereas multiple 128-dimensional vectors can fit on the same page. When building an HNSW index, pgvector needs to read more pages to index a 1,536-dimensional vector than a 128-dimensional one, which can take additional time if the pages aren’t in active memory. The improvements in pgvector 0.5.1 target reducing the number of distance calculations, so although the 1,536-dimensional vectors do benefit from them, you may observe less benefit than with smaller vectors due to the amount of space the larger vectors require.
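You can inspect these storage footprints on your own instance. The following is a sketch, assuming the vecs table from earlier and its default index name (vecs_embedding_idx, the name used in the test scripts above):

-- Report the on-disk size of the table and its HNSW index
SELECT pg_size_pretty(pg_table_size('vecs')) AS table_size,
       pg_size_pretty(pg_relation_size('vecs_embedding_idx')) AS index_size;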

For the second test, where we varied the instance class (db.m6g.16xlarge, db.m7g.16xlarge), we observed a larger speedup particularly as the number of clients increased. You can see the results in the following figure:

The following table shows the results presented in the above chart:

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         57                      71                      25%
2         102                     130                     27%
4         189                     238                     26%
8         344                     442                     28%
16        570                     772                     35%
32        656                     1,039                   58%
64        664                     1,031                   55%

As mentioned previously, Graviton3 processors contain optimizations that help accelerate floating point calculations. Calculating the distance between 1,536-dimensional vectors takes additional effort due to their size, which means the m7g instance family can accelerate processing of these vectors.

Now that we’ve observed how pgvector 0.5.1 contains performance improvements on index building and how your choice of instance type can affect ingestion performance, let’s review how these decisions impact query performance.

Query performance with pgvector and HNSW on Amazon RDS for PostgreSQL

One reason HNSW is a popular choice for vector queries is due to its general performance while achieving high recall. In the previous section, we observed that instance type could impact HNSW index build times. But what effect does instance selection have on query performance on this dataset?

For this test, we use pgvector 0.5.1 and only vary the instance type (db.m6g.16xlarge, db.m7g.16xlarge). Like the previous test, we use gp3 storage with 20,000 provisioned IOPS. For each run, all of the vectors in the dataset were indexed and the table and vector data were loaded into memory.

This test used a process-based multiprocessing script written in Python to run queries. Each process generated 50,000 random normalized vectors, connected to the database, and ran queries concurrently with the other processes. We used this to calculate the total number of queries processed per second. To test different recall levels, we performed this test with increasing values of hnsw.ef_search. Each test run measured performance as the number of transactions run per second (TPS).
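The benchmarking harness itself isn’t reproduced here, but each client effectively ran queries of the following shape. This is a sketch that reuses the generate_random_normalized_vector helper from earlier and psql’s \gset to bind one random query vector:

-- Choose the recall/speed trade-off for this session
SET hnsw.ef_search = 40;

-- Generate a random 128-dimensional query vector into a psql variable
SELECT public.generate_random_normalized_vector(128) AS qv \gset

-- Nearest neighbor search; passing the vector as a literal lets the
-- planner use the HNSW index
SELECT id
FROM vecs
ORDER BY embedding <-> :'qv'
LIMIT 10;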

The following results used a dataset of 10,000,000 128-dimensional vectors. The table used 5.45 GB of storage, and the HNSW index used 7.75 GB of storage.

ef_search: 20

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         967                     1120                    13.60%
2         1667                    1823                    8.60%
4         3251                    3278                    0.80%
8         6481                    6616                    2%
16        12,923                  13,177                  1.90%
32        24,474                  25,096                  2.50%
64        47,367                  49,731                  4.80%

ef_search: 40

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         601                     671                     10.40%
2         1,037                   1,106                   6.30%
4         1,964                   1,938                   -1.30%
8         3,879                   3,925                   1.20%
16        7,750                   7,814                   0.80%
32        15,073                  15,083                  0.10%
64        29,110                  29,967                  2.90%

ef_search: 80

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         359                     371                     3.20%
2         608                     625                     2.70%
4         1,108                   1,104                   -0.40%
8         2,253                   2,228                   -1.20%
16        4,431                   4,411                   -0.40%
32        8,564                   8,581                   0.20%
64        16,930                  17,358                  2.50%

ef_search: 200

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         161                     173                     6.70%
2         276                     289                     4.50%
4         501                     500                     -0.20%
8         1,015                   1,016                   0.10%
16        1,998                   1,984                   -0.70%
32        3,928                   3,981                   1.30%
64        7,672                   8,472                   9.40%

ef_search: 400

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         85                      88                      3.50%
2         148                     150                     1.90%
4         264                     261                     -0.90%
8         536                     533                     -0.50%
16        1,055                   1,059                   0.30%
32        2,078                   2,121                   2.10%
64        3,982                   4,510                   11.70%

ef_search: 800

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         43                      47                      7.70%
2         76                      77                      1%
4         135                     134                     -0.80%
8         276                     277                     0.40%
16        548                     550                     0.40%
32        1,066                   1,077                   1%
64        1,956                   2,191                   10.70%

Overall, for the 128-dimensional dataset, the query performance results were comparable between the db.m6g.16xlarge and the db.m7g.16xlarge. However, we observed that the query performance on the db.m7g.16xlarge was consistently around 10% faster with 64 clients on higher values of hnsw.ef_search.

We repeated the experiment using 10,000,000 1,536-dimensional vectors, which have the same dimensionality as the vectors generated by the Amazon Titan embeddings model. The table used 39 GB of storage, and the index used 38 GB of storage. We ran a similar experiment, comparing query performance at different values of hnsw.ef_search with up to 64 clients concurrently executing queries. As with the earlier experiment, we compared the db.m6g.16xlarge and db.m7g.16xlarge instances. You can see the results in the following figure:

The following table shows the results presented in the above chart:

ef_search: 20

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         576                     662                     15.1%
2         994                     1,115                   12.2%
4         1,944                   2,213                   13.8%
8         3,453                   3,788                   9.7%
16        7,081                   7,793                   10.1%
32        14,341                  15,889                  10.8%
64        20,800                  28,798                  38.5%

ef_search: 40

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         358                     397                     11.1%
2         626                     693                     10.7%
4         1,198                   1,321                   10.3%
8         2,090                   2,260                   8.2%
16        4,313                   4,690                   8.7%
32        8,977                   9,714                   8.2%
64        13,193                  18,506                  40.3%

ef_search: 80

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         205                     233                     13.7%
2         367                     392                     7.0%
4         696                     774                     11.1%
8         1,416                   1,307                   -7.6%
16        2,492                   2,733                   9.7%
32        5,160                   5,642                   9.3%
64        6,457                   9,568                   48.2%

ef_search: 200

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         95                      107                     12.3%
2         166                     183                     9.8%
4         320                     356                     11.4%
8         560                     609                     8.8%
16        1,157                   1,258                   8.7%
32        2,299                   2,659                   15.7%
64        2,859                   4,256                   48.9%

ef_search: 400

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         52                      58                      12.3%
2         90                      101                     11.6%
4         176                     194                     10.6%
8         305                     335                     9.6%
16        657                     686                     4.4%
32        1,246                   1,413                   13.4%
64        1,481                   2,237                   51.0%

ef_search: 800

Clients   db.m6g.16xlarge (TPS)   db.m7g.16xlarge (TPS)   Speedup (%)
1         26                      30                      13.8%
2         48                      53                      10.3%
4         93                      102                     9.6%
8         163                     180                     10.6%
16        334                     365                     9.4%
32        658                     739                     12.3%
64        782                     1,265                   61.8%

Overall, for the 1,536-dimensional dataset, the db.m7g.16xlarge instance generally outperformed the db.m6g.16xlarge instance by around 10%. However, we saw a notable increase in performance with 64 clients, particularly as we increased hnsw.ef_search, the parameter that helps influence recall. We observed that the db.m7g.16xlarge had a 40–60% speedup over the db.m6g.16xlarge for querying 1,536-dimensional vectors, because the improvements in the m7g instance family help CPU-heavy workloads perform better.

Clean up

If you ran these tests on your own Amazon Aurora or Amazon RDS for PostgreSQL and Amazon EC2 instances, and don’t require the instances for any additional tasks, you can delete them.

Conclusion

In this post, we walked through different methods that you can use to accelerate ingestion and query performance when using the HNSW indexing functionality in pgvector on Amazon Aurora PostgreSQL-compatible edition and Amazon RDS for PostgreSQL. We also observed how performance improvements in pgvector 0.5.1 can speed up concurrent ingestion, and how new instance classes can accelerate queries.

Understanding methods to optimize your usage of a vector index such as HNSW can make a difference in both your ingestion and query performance. These strategies can help you optimize your database configuration and design your applications so that you can take full advantage of the index functionality. You can use the techniques in this post to help you start building your generative AI applications with pgvector on Amazon Aurora PostgreSQL-compatible edition and Amazon RDS for PostgreSQL!

We invite you to leave feedback in the comments.


About the author

Jonathan Katz is a Principal Product Manager – Technical on the Amazon RDS team and is based in New York. He is a Core Team member of the open source PostgreSQL project and an active open source contributor.