AWS Database Blog
Analyzing Amazon RDS Database Workloads with Performance Insights
November 2022: This post was reviewed and updated for accuracy.
AWS recently announced general availability of Amazon Aurora with PostgreSQL compatibility. With this release, AWS has also included the first release of a useful feature in Amazon Relational Database Service (Amazon RDS) called Performance Insights. Using a dashboard that visualizes database load—along with which SQL statements are causing the load and why—Performance Insights makes detecting performance problems easy for both experts and non-experts.
Because it collects performance data using lightweight methods, Performance Insights doesn’t impact the performance of your applications. It also requires no configuration or maintenance, and it offers a free tier that includes 7 days of performance data history and 1 million API requests per month. Although you can increase the rolling performance data history up to two years (pricing details). Performance Insights is enabled by default for instances that you create in the AWS Management Console. You can enable it for other instances on the console by choosing Modify on the Instance actions menu. Or, use the EnablePerformanceInsights
parameter of the CreateDbInstance or ModifyDbInstance API methods.
It’s easy to get started. Just sign in to the Amazon RDS console, and go to the Performance Insights dashboard to start monitoring performance with just one click.
Accessing Performance Insights
You can access Performance Insights through the Amazon RDS console in two places:
- Choose Performance Insights in the left navigation pane to open the dashboard and view a list of DB instances that have Performance Insights enabled.
- Choose the bar under Current Activity for a DB instance to open the Performance Insights dashboard for that DB instance. Instances that have Performance Insights enabled show current activity measured in Sessions.
Why does Performance Insights measure database load in sessions?
Sessions, in this case, is shorthand for “average active sessions,” sometimes abbreviated “AAS.” An active session is a database connection that has submitted a request to the database but has not yet received the response. Measuring the average number of active concurrent sessions over time provides a clear picture of the load on the database.
In the list of DB instances, a bar in the Current Activity column shows the database load of each instance that has Performance Insights enabled. An empty rectangle with a blue border indicates an idle instance. The vertical red line indicates the capacity of the host. As the database load increases, the bar fills with blue. When the load exceeds host capacity, it changes to red.Thus, whenever there is red in the bar, it means that the DB instance is saturated, and you should look at the Performance Insights dashboard to understand why. In the following example, an Aurora instance is showing red in the Current Activity bar. This indicates that the DB instance is under heavy load. You can tell that this instance has Performance Insights enabled, because Current Activity is expressed in Sessions. Choosing the bar opens the Performance Insights dashboard.
The Performance Insights dashboard
The dashboard is divided into two parts:
- At the top, a load chart shows the recent history of database load in units of average active sessions (AAS).
- On the bottom, a top activity table shows what is contributing to database load for the time interval on the load chart.
By default, the load chart is color coded by wait type. Breaking down database load by wait types can help you understand what kind of database mechanisms are chiefly contributing to the load. Top activity shows SQL statements by default. Understanding what SQL is chiefly contributing to the load can help you understand what parts of your application are responsible for bottlenecks.
Idle database
A database that is completely idle has no active sessions and thus no load. Although it sounds obvious, knowing that a database is idle can help diagnose problems. For instance, if a database is idle, then any performance problems that might exist are likely not caused by the database. With the Performance Insights dashboard, it is easy to determine whether a performance problem is inside the database or outside the database, perhaps in an application tier.
The following is an example of an idle database.Tuning a database load bottleneck
A session is database terminology for a database connection serving an application. Sessions that are waiting during database requests contribute to database load. A session might wait for a response for many reasons. A common reason is that the request is executing and using CPU to complete. However, the request could also be waiting for I/O to complete, for a lock, for writes to storage, for space in a buffer area, or for any number of other reasons. These various wait states appear in the load chart as stacked color areas. The colors correspond to wait states that can be seen in the legend on the right side. The most prominent colors in the load chart are the greatest contributors to database load.
An important visual cue in the load chart is the max vCPU line. This line represents the number of vCPUs (virtual central processing units) on the host. If more sessions are active in CPU wait than there are vCPUs, it means that the instance is running beyond CPU capacity. Whenever the overall load goes over the max CPU line, there might be a bottleneck. The bottleneck could be because of CPU saturation, or it could be caused by one of the many other ways that sessions wait in a database.
CPU bottleneck
In the preceding examples, there are two vCPU cores, so only two sessions can concurrently run on the CPU without queueing. If three sessions run on the CPU concurrently, then, at any given point in time, at least one of them is waiting in the run queue and therefore not getting work done.
In the following example, database load has spiked well above the max CPU line. The question is: “What is causing this bottleneck?” You can zoom in on the bottleneck by clicking and dragging (see the light blue area in the following chart), to select the area that you are interested in.In the following example, we have zoomed in on the selected area.In this case, the bottleneck is on the CPU, since green (CPU) is the largest color in the active sessions. The demand for CPU is above the available CPU shown by the max CPU line. To address this issue, you have two choices. Either you can scale up to a larger instance type with more CPU, or you can reduce your load on the current instance. Although scaling up to a larger instance type is an option, it generally requires discussion and scheduling, and it might increase costs. Often a more direct solution is to tune the workload on the current system.
To tune the system in the preceding example, the question becomes, “What do we tune?” The top activity table, below the load chart, can help you answer that. With the Top SQL tab selected (default), this table shows which statements are contributing most to load, and therefore are good candidates for tuning. In this case, one SQL statement is the chief contributor to load, and it is also the predominant contributor to CPU wait. It is apparent that the minute_rollups()
function is responsible for CPU load on this database. It might be worth spending effort to reduce the CPU usage of this stored function.
Wait bottleneck tuningIn the preceding example, you can see that a short time later, on the same database, a new bottleneck appears as a light orange spike. Again, you can click and drag to zoom in on the bottleneck.Rather than CPU, this spike consists mainly of the wait event IO:XactSync. To find out more about this wait, pause the cursor on the items in the legend, and information about the wait events appears. From this information, you can learn that IO:XactSync happens when a session is waiting for writes to stable storage. When you look at the Top SQL, you see that by far the greatest contributor to this wait state is an insert.
By default in PostgreSQL, single-row inserts auto-commit for each row. This means that the database must wait for durability (write to storage) for every insert. To improve performance, PostgreSQL supports multi-row inserts and disabling auto-commit.
Using Performance Insights for sizing a database host
Performance Insights can help with instance type sizing for a database. In the following example, database load is well under max CPU. If this is typical, then the database might be running on a larger than necessary instance type and could be scaled down.On the other hand, if database load regularly exceeds max CPU, and the workload cannot be further tuned, then the instance type might be too small. In the following example, database load is continually exceeding max CPU.Exploring other data dimensions
Both the load chart and the top activity portions of the dashboard can show dimensions other than those displayed by default (waits and SQL).
The load chart legend has a selector for choosing which dimension to slice the chart by. For Aurora PostgreSQL, Performance Insights currently supports slicing by waits, SQL, hosts, and users.
Top activity can list any of the dimensions indicated at the top of the list. For Aurora PostgreSQL, Performance Insights currently supports listing top SQL, waits, hosts, and users.
In the preceding image, top activity has been grouped by Hosts, which shows the load coming in from each client machine from which SQL is being executed. In this case, there is only one host generating any CPU load. This could indicate a deployment problem. These hosts are application servers, which should be running the same code and presenting similar load profiles.
To get an idea of how the SQL workload might be different on these hosts, choose Slice by in the load chart legend, and then choose SQL.With host load grouped by SQL instead of waits, you can now see that 172.31.31.154 is the only host running With cte as ( SELECT …. This is the same host that was asymmetrically showing CPU load in the previous example when the hosts load was grouped by waits.
You can also see that this same host, 172.31.18.90, is the only one running INSERT INTO authors … and further, that this appears to be the only SQL it is running. Understanding how workload and database load are asymmetrically distributed across a fleet of application servers is important because it can help you diagnose end-user performance issues.
By default, the dashboard displays load by waits and the top activity by SQL. This is the most useful combination for general use.
Summary
Performance Insights quickly and easily identifies performance bottlenecks on an Amazon RDS DB instance and shows where to look to address those bottlenecks. Performance Insights is enabled by default at database creation on the console or by modifying the database instance. It is fully automated, and the overhead is about 1 percent of a vCPU core. All the data storage, processing, and aggregation are automatically managed and done outside of the database host and have no impact on the database being monitored.
Amazon RDS Performance Insights is available now for Amazon Aurora (PostgreSQL- and MySQL-compatible editions), Amazon RDS for PostgreSQL, MySQL, MariaDB, SQL Server and Oracle.
About the Author
Kyle Hailey is a product manager for Performance Insights at Amazon Web Services.
Ahmed Ewis is a solutions architect at AWS Data Lab. He helps AWS customers design and build scalable data platforms using AWS database and analytics services. Outside of work, Ahmed enjoys playing with his child and cooking.