Determining the optimal value for shared_buffers using the pg_buffercache extension in PostgreSQL
In OLTP databases, the buffer cache is memory allocated for storing cached data and dirty pages. Cached data speeds up the retrieval of frequently accessed information, and reading from these cached pages minimizes disk I/O operations. Setting appropriate values for
shared_buffers is important for optimal PostgreSQL performance and can lead to significant reductions in overall database operational costs. In this post, we discuss the importance of the
shared_buffers configuration and guide you on determining its optimal value using the
pg_buffercache extension. The optimization strategies outlined in this post are applicable to PostgreSQL, regardless of whether it’s self-managed or hosted on Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
Overview of shared_buffers
The global memory area in PostgreSQL mainly consists of
wal_buffers, and CLOG buffers. The
shared_buffers memory area is used to store cached data and dirty pages. A dirty page is a written or modified data page in
shared_buffers as the result of a write operation against index or table data. These dirty pages are flushed to the disk by a background writer process when the number of clean
shared_buffers appears to be insufficient. The objective of this process is to ensure that the data pages are moved to permanent storage and free space is available for use in
shared_buffers. Furthermore, checkpoints are automatically issued in the background based on checkpoint settings writing all dirty pages to disk to create a restore point in the event of a crash. Amazon Aurora PostgreSQL uses log records for higher availability instead of checkpoints for crash recovery. The following diagram illustrates the memory components of PostgreSQL.
The default configuration of
shared_buffers varies depending on the PostgreSQL deployment. In the community version of PostgreSQL, the default value for
shared_buffers is set at 128 MB. However, this value might be lower if your kernel settings are not supportive, as determined during the
initdb process. For Amazon RDS for PostgreSQL, the default is calculated using the formula
DBInstanceClassMemory/32768. Both community PostgreSQL and Amazon RDS for PostgreSQL lean heavily on the operating system for caching, making an allocation of 30–35% of memory to
shared_buffers ideal. Conversely, in Amazon Aurora PostgreSQL, the default value is derived from the formula
SUM(DBInstanceClassMemory/12038, -50003). This difference stems from the fact that Amazon Aurora PostgreSQL does not depend on the operating system for data caching. As a result,
shared_buffers allocation in Amazon Aurora PostgreSQL is notably higher compared to that in Amazon RDS for PostgreSQL.
How to set shared_buffers
In both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, you can establish a custom parameter group to alter any parameter configuration. The
shared_buffers setting is adjusted through this custom parameter group. Notably,
shared_buffers is a static parameter, meaning any modifications require a reboot of the database instance for the changes to take effect. This parameter is defined as a number of 8 kB blocks. To determine the value of
shared_buffers for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL within the RDS parameter group, employ the following formula:
shared_buffers = (Target percentage x Total RAM in GB x 1024 x 1024) /(100 x 8)
For example, if you’re running an RDS instance on the db.m6g.12xlarge instance class (192 GB of RAM), a value of 10,066,329 on the
shared_buffers parameter will set 40% of total RAM:
(40 x 192 x 1024 x 1024) /(100 x 8)
To set the
shared_buffers value in your parameter group, complete the following steps:
- On the Amazon RDS console, choose Databases in the navigation pane.
- Navigate to your database and in the Instance section, choose the link for your associated parameter group.
- In the Modifiable parameters section, search for and modify the
shared_buffersparameter in the custom parameter group using the preceding formula.
Applying the new value of
shared_buffers requires an instance reboot.
shared_buffers and I/O utilization
In both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, for read-intensive workloads, the engine tries to fulfill the query request using pages already in the buffer cache. If the required pages are absent from the cache, the PostgreSQL database engine initiates reads against the storage layer to fetch them. When workloads are served entirely from the cache, there’s no read I/O consumption. However, when they aren’t, I/O charges apply for the pages sourced from storage. For example, if the engine scans 8 data pages to retrieve 10 tuples, it registers 8 read I/O operations.
Amazon CloudWatch has the metric
BufferCacheHitRatio, which displays the percentage of requests catered by the buffer cache. A higher hit ratio implies reduced read I/O. Therefore, to curtail read I/O,
shared_buffers should be expansive enough to house the majority of the working dataset. For a deeper understanding of I/O characteristics in Amazon RDS for PostgreSQL, refer to I/O size and volume throughput limits.
Benefits of analyzing shared_buffers
shared_buffers in PostgreSQL plays a critical role in database performance. By storing data and indexes in memory,
shared_buffers can significantly reduce the number of disk I/O operations. This results in faster query performance and reduces overall database cost.
shared_buffers uses the least recently used (LRU) algorithm for cached pages. The LRU caching algorithm removes the least recently used data pages when the cache is full, and a new page is referenced that is not in the cache.
shared_buffers is too small, it could result in frequent reads from storage, and the pages in the buffer may have to be constantly flushed to and from the disk. This affects performance and lowers query concurrency. However, setting the value of shared buffers too high can result in increased memory pressure on the system, which can lead to swapping and decreased performance. Factors to consider when tuning the
shared_buffers setting include the amount of memory available on the system, the size of the working dataset, and the workloads on the database. Analyzing the
shared_buffers setting and making appropriate adjustments ensures that the PostgreSQL database is performing optimally and providing the best possible result for workloads.
Use AWS monitoring tools to determine shared_buffers modification
In Amazon Aurora PostgreSQL, the
BufferCacheHitRatio CloudWatch metric can help you determine whether to modify
shared_buffers or not. If
BufferCacheHitRatio is below 90%, consider increasing
shared_buffers to serve more data from cache. In Amazon RDS for PostgreSQL, the hit ratio can be identified by querying the
pg_statio_user_indexes system catalogs. The following query identifies the hit ratio on tables:
The following query identifies hit ratio on indexes:
To finalize your
shared_buffers modification, in addition to CloudWatch metrics, you can use Amazon RDS Performance Insights. Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess workloads on your database and determine next steps. Counter metrics such as
tup_returned can be used to better estimate
shared_buffers. Performance Insights offers 7 days of performance data history at no charge. For pricing details, refer to Performance Insights Pricing. The following screenshot shows these counters for an Aurora PostgreSQL instance.
Analyze shared_buffers using the pg_buffercache extension
After analyzing the hit ratio, you can use the
pg_buffercache extension to examine the contents of the shared buffer cache in real time. The
pg_buffercache extension provides useful information about the buffer cache and helps you analyze the performance of a PostgreSQL database.
pg_buffercache functions provide information about the number and size of buffers in the cache, the number of hits and misses, and the state of individual buffers. This information can be used to help diagnose performance problems and optimize database performance.
pg_buffercache result set provides one row for every buffer in the shared cache. If a buffer is unused, all its fields will appear as null, except for the
bufferid. A high number of cache misses can suggest that the
shared_buffers value might be set too low, causing the database to engage in more disk I/O operations than required. If this is observed, adjust the
shared_buffers value upward, which in turn can enlarge the buffer cache and decrease cache misses.
Let’s look at how to use
pg_buffercache to analyze the buffer cache.
Note: All SQL queries are tested on PostgreSQL 15. You may have to modify the queries for older PostgreSQL versions.
- Connect to your Aurora PostgreSQL or RDS for PostgreSQL database using your preferred SQL client.
- Run the following SQL statement to install the
- Run the following query to show the sorted relations based on the buffer cache used:
The following screenshot shows the output.
The preceding query provides the number of buffers used by each relation of the current database. Now, we run a sample workload on a sample database and analyze
- Run the following query to identify the top 10 relations residing in
shared_buffersand percentage utilization:
The output columns are as follows:
- buffer_size – The size of buffer related to the relation
- relation_size – The human-readable relation size
- buffer_percentage – The percentage of
shared_buffersoccupied by the relation
- relation_percentage – The percentage of relations that reside in
- relation_type – The type of relation, such as table, index, or sequence
The following screenshot shows our output.
In this test scenario,
shared_buffers is set at 1.95 GiB. The
pgbench_accounts table is 139 GiB in size. 52.50% of
shared_buffers is occupied by 1 GiB of this table. The rest of
shared_buffers (.95 GiB) is used by other relations. If the working dataset of this table is larger than the cached data, to host more data for caching,
shared_buffers should be modified to a higher value. While modifying the
shared_buffers value, it’s imperative to monitor
FreeableMemory, because other process-based memory allocations such
maintenance_work_mem also share memory from the same memory pool. If there isn’t sufficient space to increase the
shared_buffers value, then it’s advisable to scale up the instance class to secure a larger total RAM allocation.
shared_buffers and pg_prewarm
For business-critical read workloads, having data preloaded in
shared_buffers enhances performance. Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support the
pg_prewarm extension, which provides a convenient way to load relational data into the cache. In the preceding example, if you determine that
pgbench_accounts_1 is a business-critical table and publishing a report is scheduled on the table, you can preload the data by running
pg_prewarm for this table, as shown in the following code. Note that prewarming data pushes out the
shared_buffers content based on an LRU basis, which can impact other read workloads.
In this post, we explored the concept of
shared_buffers and the
pg_buffercache extension for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL. The appropriate value for
shared_buffers directly impacts the performance of PostgreSQL. Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support the
pg_buffercache extension, which provides a convenient way to get information about the buffer cache of a PostgreSQL database. CloudWatch metrics and Performance Insights counters help determine the workloads served by cached data. By understanding the state of the buffer cache, you can identify areas for performance optimization and fine-tune the configuration of PostgreSQL databases.
If you have any questions or suggestions about this post, leave a comment.
About the Authors
Kiran Singh is a Senior Partner Solutions Architect and an Amazon RDS and Amazon Aurora specialist at AWS focusing on relational databases. She helps customers and partners build highly optimized, scalable, and secure solutions; modernize their architectures; and migrate their database workloads to AWS.
Vivek Singh is a Principal Database Specialist Technical Account Manager with AWS focusing on Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL engines. He works with enterprise customers providing technical assistance on PostgreSQL operational performance and sharing database best practices. He has over 17 years of experience in open-source database solutions, and enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.