How do I troubleshoot high CPU utilization on my RDS for SQL Server instance?

6 minute read
0

I'm experiencing high CPU utilization on my Amazon Relational Database Service (Amazon RDS) for Microsoft SQL Server DB instances. How can I troubleshoot and resolve high CPU utilization?

Short description

Common reasons for increases in CPU utilization include the following:

  • User-initiated heavy workloads, multiple concurrent queries, or long-running transactions
  • Using an under-provisioned instance class for the workload
  • Out-of-date statistics and index fragmentation or missing indexes
  • Queries with scope for improvement
  • Blocking and deadlocks
  • Parallelism
  • Frequent compilation and recompilation
  • Parameter sniffing
  • Thread exhaustion

To identify the source of the CPU usage in your Amazon RDS for SQL Server instance, use the following tools:

  • Amazon CloudWatch metrics for Amazon RDS
  • Enhanced Monitoring
  • Performance Insights
  • SQL Server native tools

After you identify the source, you can analyze and optimize your workload to reduce CPU usage.

Resolution

Amazon CloudWatch metrics for Amazon RDS

You can use CloudWatch metrics to identify CPU patterns over extended periods. Compare the graphs for WriteIOPs, ReadIOPs, ReadThroughput, and WriteThroughput, along with the CPU utilization to find the times when the workload caused high CPU. For more information, see Amazon CloudWatch metrics for Amazon RDS.

If you're using an instance from the t2 or t3 instance class, check if your instance is under-provisioned. If you see a consistent decrease in the CPU credit balance and consistently increased CPU credit usage, your CPU cores aren't enough for your workload.

After identifying the time frame, you can review the Enhanced Monitoring data associated with your DB instance. You can set Enhanced Monitoring to collect data at intervals of 1, 5, 10, 15, 30, or 60 seconds. Doing this allows you to collect data at a more granular level than CloudWatch.

Enhanced Monitoring

If you have set up Enhanced Monitoring, you can use it to monitor the OS running on your DB instance. To check CPU utilization using Enhanced Monitoring, do the following:

  1. In the RDS console, select Databases, then choose your database.
  2. On the Monitoring tab, select OS process list from the Monitoring drop down menu. Verify if the high CPU is driven by the OS, an RDS process, a SQL Server process, or a SQL Agent process. You can also check the percentage of CPU and percentage of memory being used by these processes.
  3. Select metrics from the Enhanced monitoring dropdown list to check performance monitor data. These metrics include CPU idle, Kernel, and User. These metrics tell you if the CPU is spending most of the time idle, running the kernel, or running user processes.
  4. View other metrics by selecting Manage Graphs. You can then select the metrics related to I/O and disk throughput. These metrics include Read/s, Write/s, Read Kb/s, and Write Kb/s. You can also view memory-related parameters including Available Memory, SQL Server Memory, and Total Memory. These metrics are helpful because if your CPU is spending a lot of time waiting for resources, you might see high CPU utilization.

For more information, see Viewing OS metrics in the RDS console.

Performance Insights

You can use Amazon RDS Performance Insights to identify queries responsible for the database load. To do this,

  1. Check the SQL tab that corresponds to the time-frame you want to analyze.
  2. Identify the query that is taking the longest.
  3. Check the resource-intensive query and the wait events observed during this period. The following are wait events frequently associated with high CPU utilization:
    SOS_SCHEDULER_YIELD: This wait indicates that a worker yielded for someone else to run. High wait counts with low wait times usually indicate CPU-bound queries. High wait times here might be caused by issues with yielding. If you see a long wait time here, you must further review the workload.
    If SOS_SCHEDULER_YIELD is the prevalent wait type, it might indicate that CPU pressure is the problem. Review the type of workload and perform additional tuning.
    CXPACKET and CXCONSUMER: Parallelism-related wait events usually aren't a concern. But, if wait events are frequent and impacting performance, review the queries and set suitable values for the cost threshold of parallelism. Make sure that SQL Server is choosing the lower cost parallelism parameter in the parameter group.
    You can also increase the MAXDOP (max degree of parallelism) to 1 at the query or instance-level based on your use case.
    ThreadPool: This wait event indicates thread exhaustion. If your instance class is capable of handling it, increase the max worker threads parameter. ThreadPool waits might occur due to excessive threads being used. Excessive threads are used due to blocking, a high workload, or a high number of parallel queries. Or, this wait might be caused by misconfiguration of the max worker threads parameter.
  4. Check the Database Metrics for batch requests, SQL compilations, and SQL recompilation's. Check if queries are being compiled many times, indicating improvised queries. Also check if queries are recompiled frequently for a given batch, indicating the use of with recompile in the query code. Both of these cause high CPU usage.

SQL Server native tools

Specific queries for CPU troubleshooting: For more information, see Troubleshoot high-CPU-usage issues in SQL Server on the Microsoft documentation website. Focus on the CPU intensive queries, update statistics, missing indexes, and parameter sniffing.

Check the execution plan for poor performing queries and perform additional tuning. For more information, see Display an actual execution plan on the Microsoft documentation website.

Troubleshoot excessive locking, blocking, and deadlock-related issues using queries and Extended Events. For more information, see Understand and resolve SQL Server blocking problems on the Microsoft documentation website.

Note: You can't use the normal method for saving XEL files when configuring Extended Events in RDS for SQL Server. For instructions on how to configure extended events, see Set up Extended Events in Amazon RDS for SQL Server.

Use SQL Server reports to get the performance reports available natively in SQL Server. Use these reports to fine tune your workload. For more information, see Performance Dashboard on the Microsoft documentation website.