Amazon RDS Proxy multiplexing support for PostgreSQL Extended Query Protocol

Today, we are announcing Amazon RDS Proxy multiplexing support for PostgreSQL Extended Query Protocol. In this post, you learn about PostgreSQL’s Extended Query Protocol for communication between client and server, its benefits, and how multiplexing support for Extended Query Protocol can improve the connection pooling capability of RDS Proxy, which in turn reduces the overhead of database connections and provides faster and more graceful failover experience.

Overview of RDS Proxy multiplexing

RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (Amazon RDS) that helps make applications more scalable, resilient to database failures, and secure. RDS Proxy allows you to manage an application’s access to the database and supports connection pooling, multiplexing, and graceful failover. It helps you scale beyond database connection limits and manage bursts of connections.

One of RDS Proxy’s value propositions is connection multiplexing. By default, RDS Proxy can reuse a connection after each transaction in your session. This transaction-level reuse is called multiplexing. When RDS Proxy temporarily removes a connection from the connection pool to reuse it, that operation is called borrowing the connection. When it is deemed safe, RDS Proxy returns that connection to the connection pool. Connection reuse across transactions is more efficient when database requests do not rely on the session state information from previous requests.

Client connections to the proxy can enter a state known as pinning. When a connection is pinned, each later transaction from the same client connection uses the same underlying database connection until the session ends. Other client connections also cannot reuse that database connection until the session ends. The session ends when the client connection is dropped. RDS Proxy automatically pins a client connection to a specific DB connection when it detects a session state change that isn’t appropriate for other sessions. Pinning reduces the effectiveness of connection reuse.

In a typical production scenario, you might have an application with client-side connection pooling using libraries such as DBCP, CP30, HikariCP, psycopg, or Npgsql. As your demand increases, you automatically or manually scale your application nodes, increasing the database connections.

For example, your application might start with 10 nodes, each holding 100 connections in the pool, resulting in 1,000 database connections. As your business grows, you scale your application to meet growing demand, only to realize that after scaling to 50 application nodes, your system is hitting a database bottleneck through latency impact from increased memory needs on the database or hitting max_connections, forcing you to scale up your database compute instance or in some cases rearchitect the application.

To mitigate this issue, you can use a centrally managed connection pooler, which can reduce the total number of database connections even during peak workloads when application nodes scale up. However, with RDS Proxy or other open source connection poolers, these connection poolers have different limitations supporting prepared statements. For example, some proxies don’t correctly manage prepared statement states, allowing prepared statements to leak across connections. With RDS Proxy, for prepared statements, connections get pinned to prevent statement states getting leaked across different client connections, resulting in the same total number of database connections as the client connections, which means the original problem of reducing the total number of database connections is not resolved.

A typical log message you might expect in such a scenario is:

The client session was pinned to the database connection[...] for the remainder of the session. The proxy can't reuse this connection until the session ends. Reason: A parse message was detected.

New release for PostgreSQL Extended Query Protocol

Until today, when client application use PostgreSQL Extended Query Protocol to connect to their Amazon RDS or Amazon Aurora database via RDS Proxy, the database connection gets pinned to the client connection. PostgreSQL Extended Query Protocol is now the default in many commonly used client frameworks. Until now, applications using these frameworks couldn’t benefit from RDS Proxy’s multiplexing. As mentioned earlier, when RDS Proxy detects session state changes, such as prepared statements, it automatically pins the client connection to a specific DB connection.

With this release, applications connecting via Extended Query Protocol can now benefit from RDS Proxy’s multiplexing. We have added the capability to monitor and keep a record of protocol-level prepared statements, which enables RDS Proxy to multiplex the connection irrespective of the state of the session caused by the use of prepared statements and limit the number of database connection requirements.

To understand how RDS Proxy supports multiplexing with Extended Query Protocol, let’s first understand PostgreSQL’s Extended Query Protocol for client-server communication.

Extended Query Protocol

The Extended Query Protocol allows the run of SQL commands to be divided into multiple steps. The execution cycle consists of four stages: parse, bind, execute, and sync. The state retained between steps is represented by two types of objects: prepared statements and portals. A prepared statement represents the result of parsing and semantic analysis of the query string. A prepared statement is not ready to run because it might need more specific parameter values. A portal represents a ready-to-run or already partially run statement bound to particular parameter values. Let’s look at each stage in detail:

Parse – The Parse message contains a query string with additional information about parameters, their data types, and the name of a destination prepared statement object. The query string contained in a Parse message can’t include more than one SQL statement in an Extended Query Protocol. Prepared statements last until the end of the current session.
Bind – The Bind message creates a portal, which represents a ready-to-run prepared statement with parameter values filled in. A named portal lasts until the end of the current transaction.
Execute – The actual execution of a portal containing the query with its bound parameters is triggered via the execute message. In case of a query that returns rows (SELECT, SHOW, and so on), the execute step can be told to fetch only a limited number of rows so that multiple execute steps might be needed to complete the operation.
Sync – To complete a batch of extended query messages, the front end issues a Sync message.

To learn more about Extended Query Protocol, refer to Extended Query.

RDS Proxy multiplexing support for Extended Query Protocol

With this release, as RDS Proxy processes request and response messages, it will monitor and keep record of all prepared statements created for every client connection. All prepared statements that are monitored are isolated within the client connection and will be destroyed along with the client connection. Upon completion of transactions, the database connection will be returned to the pool. If any subsequent request on the same client connection references a tracked prepared statement (such as via BIND, DESCRIBE, or CLOSE messages), RDS Proxy will reintroduce prepared statements as needed by the workload on the new borrowed database connection. This is done without incurring additional round trips to the database by efficiently appending to the Extended Protocol batch of messages received from the client connection for your application’s workload.

Multiplexing support for prepared statements is limited to Extended Query Protocol messages. If you are using query-level PREPARE, DISCARD, DEALLOCATE, or EXECUTE statements or using queries against the pg_prepared_statements view, it will still result in pinning.

Impact of RDS Proxy’s multiplexing when using Extended Query Protocol

To demonstrate the impact multiplexing can have on your workload while using RDS Proxy, we ran a pgbench workload with 100 clients and --protocol=extended. We ran the test with and without multiplexing support and monitored Amazon CloudWatch metrics.

The following figure shows the result of running the workload using RDS Proxy without multiplexing support. The number of database connections (100) is the same as client connections (100), and connections are getting pinned, resulting in no reuse of DB connections.

However, as shown in the following figure, while using RDS Proxy with multiplexing support, the CloudWatch metrics show that prepared statements are multiplexed, resulting in significantly lower database connections (30), although the client connections remain 100. You can notice that no connections are pinned, effectively allowing connection reuse.

How to get started

Multiplexing with Extended Query Protocol is enabled automatically in RDS Proxy; therefore, no action is required. If you’re using Extended Query Protocol clients, you will notice the following when using RDS Proxy:

A drop in total and pinned database connections in RDS Proxy’s CloudWatch metrics (DatabaseConnections, DatabaseConnectionsCurrentlySessionPinned). This means that multiplexing is working as expected, resulting in fewer database connections.
A reduction in the following pinning-related log messages in RDS Proxy log:

The client session was pinned to the database connection[...] for the remainder of the session. The proxy can't reuse this connection until the session ends. Reason: A parse message was detected.

Conclusion

Most PostgreSQL drivers use Extended Query Protocol by default, which until now caused pinning while using the RDS Proxy. Pinning will cause a higher demand for database connections and memory consumption on the PostgreSQL database. With this release, RDS Proxy can multiplex connections for prepared statements from Extended Query Protocol clients, thereby limiting database connections and resource utilization.

We welcome your feedback. If you have any questions or suggestions, leave them in the comments section.

About the Authors

Kiran Singh is a Senior Database Specialist Solutions Architect at AWS. Kiran specializes in Amazon RDS and Amazon Aurora, and is a subject matter expert for PostgreSQL. Kiran works with ISV partners to design and implement database migration and modernization strategies and provides assistance to customers for building scalable, secure, performant, and robust database architectures in the AWS Cloud.

Chirag Dave is a Principal Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He maintains technical relationships with customers, making recommendations on security, cost, performance, reliability, operational efficiency, and best practice architectures.

Yoni Shalom is a Senior Software Engineer with AWS RDS. He is focused on application networking solutions in the database space, and is a lead engineer for the RDS Proxy service.

Vinodkrishna Gopalan is Senior Software Development Manager with AWS RDS. He leads the RDS Application Networking organization.