Why is my MySQL DB instance showing a high number of active sessions waiting on SYNCH wait events in Performance Insights?
Last updated: 2020-12-14
I have an RDS MySQL, RDS MariaDB, or Amazon Aurora MySQL instance, and I enabled Performance Insights. My DB instance is showing a large number of Average Active Sessions (AAS) waiting on synchronization (SYNCH) wait events. Why are SYNCH events slowing my queries, and how can I improve the performance of my DB instance?
If you see MySQL SYNCH wait events in Performance Insights, this means that a large number of sessions in the database are attempting to access the same protected objects or memory structures. Protected objects in MySQL include the following:
- The active binary log file in a binlog source instance - contains a mutex that allows only one session to read or write it at any time.
- The data dictionary - for writes, usually caused by data control language (DCL) or data definition language (DDL) statements.
- The adaptive hash index - contains a mutex that allows only one session to read or write it at any time.
- The open table cache - only one session can add or remove a table from the cache.
- Each single database block inside of the InnoDB Buffer Pool - only one session can modify the content of a block in memory at a time.
Note: There are a number of steps you can take to increase parallelism in your SQL queries. In some cases, you need to look closer at the application's architecture, and how the application uses the database to solve these issues.
Make sure that the DB instance has enough CPU resources to handle the workload
If you have a high number of sessions waiting on SYNCH events, this causes high CPU usage. If the usage hits 100%, this increases the number of sessions waiting. When troubleshooting, increase the size of your DB instance to make sure that there is enough CPU to process the extra workload.
Because these events are usually short-lived, the Amazon CloudWatch CPU utilization metric might not show the peak usage correctly. The best way to check this is to use the one-second CPU counters in RDS Enhanced Monitoring. These counters are more granular and detailed.
Increase MySQL's mutex/lock wait array
MySQL uses an internal data structure to coordinate threads. This array has a size of one, by default. This is suitable for single-CPU machines, but can cause issues on machines with several CPUs. If your workload has a large number of waiting threads, increase the array size. Set the MYSQL parameter innodb_sync_array_size to the amount of CPUs (or higher, up to 1024).
Note: The innodb_sync_array_size parameter applies only at database start up.
In general, parallelism helps to improve throughput. But when a large number of sessions try to do the same or similar activities, the sessions need access to the same protected objects. The higher the number of sessions, the more CPU you use while waiting.
Spread these activities over time, or schedule them in series. You can also bundle several operations into a single statement, such as multi-row inserts.
Examine specific wait events
Use the example solutions below to troubleshoot your specific wait event:
- synch/rwlock/innodb/dict sys RW lock or synch/rwlock/innodb/dict_operation_lock - This indicates a high number of concurrent DCLs of DDLs are triggered at the same time. Reduce the application's dependency on using DDLs during regular application activity.
- synch/cond/sql/MDL_context::COND_wait_status - This indicates a high number of SQLs (including selects) trying to access a table that a DCL or DDL is modifying. Avoid running DDL statements to high-traffic tables during regular application activity.
- synch/cond/sql/MYSQL_BIN_LOG::COND_done / synch/mutex/sql/MYSQL_BIN_LOG::LOCK_commit OR synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log - You have enabled binary logging, and there might be a high commit throughput, large number transactions committing, replicas reading binlogs, or a combination of these. Consider upgrading the database to a major version compatible with 5.7 or higher. Also, use multi-row statements, or bundle several statements into a single transaction. In Aurora, use global databases instead of binary log replication or use the aurora_binlog_* parameters.
- synch/mutex/sql/LOCK_open or synch/mutex/sql/LOCK_table_cache - This indicates that the number of tables that your sessions are opening exceeds the size of the table definition cache or the table open cache. Increase the size of these caches.
- synch/mutex/sql/LOG - Your database might be executing a large number of statements, and the current logging methods can't keep up. If you are using the TABLE output method, try using FILE instead. If you are using general log, use Aurora's advanced auditing instead. If you are using zero or less than one for the long_query_time parameter, try increasing it.
- synch/mutex/innodb/buf_pool_mutex or synch/mutex/innodb/aurora_lock_thread_slot_futex or synch/rwlock/innodb/index_tree_rw_lock - There are a large number of similar DMLs accessing the same database object at the same time. Try using multi-row statements. Also, spread the workload over different database objects. One way to do this is by using partitioning.