Use the TempTable storage engine on Amazon RDS for MySQL and Amazon Aurora MySQL
MySQL 8.0 has introduced TempTable as the new, default internal temporary table storage engine to speed up query processing. The MySQL query optimizer creates temporary tables internally to store intermediate datasets while processing complex queries. In AWS, you can deploy MySQL 8.0 using Amazon Relational Database Service (Amazon RDS) for MySQL, a managed service for MySQL Community Edition (CE) that automates time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. You can also deploy MySQL 8.0 compatible database clusters using Amazon Aurora, a MySQL and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases.
In this post, I share with you best practices of using the TempTable storage engine in an RDS for MySQL DB instance and Aurora MySQL DB cluster. I discuss common issues that affect query performance when internal temporary tables are involved and require memory or disk storage allocation. I also point out specific considerations that you should give to AWS’s managed database services.
MySQL handling of internal temporary tables
Typically, internal temporary tables are held in memory first for best query performance. To avoid excessive memory allocation, MySQL provides parameters that you can use to set a memory limit. Internal temporary tables overflow to disk storage when that limit is reached. MySQL may designate different storage engines for internal temporary tables in memory than for those on disk storage.
Prior to version 8.0, MySQL used the MEMORY storage engine as the default for in-memory internal temporary tables. MySQL 8.0 has replaced it with the TempTable storage engine, but it also provides the option to switch back to the MEMORY storage engine.
A key simplification for manageability for the TempTable storage engine is that it uses a memory pool for all sessions that use it. The memory limit, set by
temptable_max_ram, applies to the sum of memory consumption from concurrent sessions. This is different from the MEMORY storage engine, where the memory limit, set by the smaller
max_heap_table_size, is per table, specific to a session. The TempTable storage engine also has its own disk overflow mechanism. You can configure it to overflow either to memory-mapped temporary files or to InnoDB on-disk internal temporary tables. A memory-mapped file provides the mapping between a file and memory space that speeds up file read and write operations. In addition to the memory limit, these two separate overflow paths can bring out distinctive aspects of internal temporary table storage engines and directly impact query performance.
Configuration parameter differences
The parameters are as follows:
internal_tmp_mem_storage_enginedefines the storage engine for in-memory internal temporary tables with allowed values of TempTable (default) or MEMORY. This parameter was added in MySQL 8.0.2 with the introduction of TempTable storage engine.
temptable_max_ram, also introduced in MySQL 8.0.2, defines the maximum amount of memory that can be used by the TempTable storage engine.
temptable_max_mmapwas introduced in MySQL 8.0.23, and defines the maximum amount of disk storage that the TempTable storage engine is permitted to allocate for memory-mapped temporary files. Setting it to 0 disables the use of memory-mapped temporary files, making the overflow go to the InnoDB on-disk internal temporary table.
temptable_use_mmapdefines whether the TempTable storage engine overflows to memory-mapped temporary files. It was introduced in MySQL 8.0.16, and deprecated in 8.0.26 because its function becomes a duplicate.
If you use Amazon RDS for MySQL, note the change history of these parameters. There have been quite a few revisions over time while the TempTable storage engine continues evolving. Because Amazon RDS for MySQL supports various minor versions of MySQL 8.0 CE, the available parameters in each version may vary. You can check the DB parameter group to verify.
Aurora MySQL 3.0 versions are wire-compatible with MySQL 8.0.23 CE. You have the option to configure all these parameters in either the DB cluster parameter group or DB parameter group, depending on whether you want to apply a uniform setting to all DB instances of an Aurora DB cluster. However, the unique storage architecture of an Aurora DB cluster results in different behavior of internal temporary tables on Aurora replica (reader) DB instances.
An Aurora DB cluster has two types of storage: a cluster volume shared by all DB instances, and local storage attached to each DB instance. On the Aurora primary (writer) DB instance, the TempTable storage engine has its overflow paths defined as follows:
- Memory-mapped temporary files get created on the local storage
- The InnoDB on-disk internal temporary table is stored in the shared cluster volume
Because Aurora replica DB instances have no write access to the shared cluster volume, they can’t use the InnoDB on-disk internal temporary tables. As a result, the following constraints apply to parameters on Aurora replicas:
internal_tmp_mem_storage_engineis set to the TempTable storage engine internally and can’t be modified. If you try to set it to the MEMORY storage engine in the parameter group, the change doesn’t take effect.
temptable_use_mmapis set to 1 internally and can’t be modified. If you apply a different value in the parameter group, the change doesn’t take effect.
temptable_max_mmapcan’t be set to 0. If you try to set it to 0 in the parameter group, it internally resets to the default value of 1 GB.
In summary, although the primary (writer) DB instance of an Aurora DB cluster has the same four configuration parameters available as MySQL 8.0.23 CE, Aurora replica (reader) DB instances only use two of the parameters,
temptable_max_mmap, as illustrated by the following diagram.
Internal temporary tables allocate either memory or storage, or most likely both, to process data. The resource consumption required by an internal temporary table storage engine is workload-dependent. However, an RDS DB instance has system resource capacity based on the DB instance class and storage architecture. A good tuning strategy requires striking a balance between the demand and supply of system resources to allow your queries to scale and meet the expected performance goals.
Processing data in memory is always faster than overflowing to disk. The memory limit of the TempTable storage engine,
temptable_max_ram, defaults to 1 GB in MySQL CE, and is honored by Amazon RDS for MySQL as well as Aurora MySQL in general. This is a conservative setting for most DB instances that use memory optimized DB instance classes, such as db.r5 or db.r6g. You can tune this parameter based on your workload and use a formula to size it based on the memory capacity of the DB instance,
DBInstanceClassMemory. This is a dynamic parameter and you can change it without rebooting the RDS instance.
temptable_max_ram to a large value may increase the risk of running out of memory, especially on some of the db.t2 or db.t3 instance classes that have limited memory capacity. Therefore, Aurora MySQL has adjusted the default value to be 16 MB for db.t3.medium, and db.t3.large instance classes. It’s recommended that you monitor the Amazon CloudWatch metric
FreeableMemory to make sure the memory usage on the DB instance is within the expected range.
Overflowing to disk is an alternative to allow queries to continue to be processed if the data is too large to fit in memory. It usually happens when those queries that use internal temporary tables come from a highly concurrent workload or process a large amount of data. The TempTable storage engine has two overflow paths. By default, it overflows to memory-mapped temporary files first when the memory limit is reached. There is a storage limit for memory-mapped temporary files, set by
temptable_max_mmap. When that limit is also reached, overflow goes down the second path, the InnoDB on-disk internal temporary table, if it’s available. As
temptable_max_mmap is a dynamic parameter, you can change it without an instance reboot.
The default configuration has the potential to make overflow take two hops. With respect to storage allocation and I/O operation, the two-hop path is more expensive compared to the one-hop path that uses either memory-mapped temporary files or the InnoDB on-disk internal temporary table, but not both. The default value of
temptable_max_mmap is 1 GB in MySQL. Leaving it to such a small default value raises the possibility of taking a two-hop path.
If you use Amazon RDS for MySQL, I recommend optimizing for a one-hop path to the extent possible. Both overflow paths lead to the same storage that is attached to the DB instance. When there is a storage shortage for memory-mapped temporary files, the same holds true for the InnoDB on-disk internal temporary table as well. If the available free storage allows, I recommend increasing
temptable_max_mmap to accommodate the largest size of memory-mapped temporary files that your workload can use. If your workload favors the InnoDB on-disk internal temporary table, you can set
temptable_max_mmap to be 0 to block the use of memory-mapped temporary files. You can scale up storage, or adjust your workload to avoid filling up the storage.
On the primary (writer) DB instance of an Aurora DB cluster, two overflow paths are assigned to go to different storage types. Memory-mapped temporary files are created in local storage. You can use the CloudWatch metric
FreeLocalStorage to check the available free local storage on the Aurora DB instance, and that is the maximum value of
temptable_max_mmap you can set. The InnoDB on-disk internal temporary table is stored in the cluster volume that has 128 TiB capacity. The one-hop path is still the best choice, and you can choose the overflow path that suits your workload best. One situation you want to watch out for is a runaway query while it drives an InnoDB on-disk internal temporary table growing on the cluster volume continuously. I recommend configuring
max_execution_time to set a query timeout limit to avoid this scenario.
On the Aurora replica (reader) DB instance, TempTable is the dedicated storage engine for internal temporary tables, and the overflow path goes to memory-mapped temporary files on local storage only. The query fails with a “table is full” error message when the size of memory-mapped temporary files reaches the storage limit set by
temptable_max_mmap. Your workload may require different temporary table configurations on the primary (writer) DB instance from the replica (reader) DB instance, and you have the option to apply separate DB parameter groups to ensure fine-grained tuning at the instance level. However, if the replica (reader) DB instance is your failover target, I recommend configuring both the primary (writer) and replica (reader) DB instance to use the same instance class and DB parameter group, to avoid potential mismatch issues after failover.
The TempTable storage engine retrieves data from a table and operates on it in its own format. The magnitude of data to be processed reflects the data structure of the source table, such as data types and character sets. The following diagram shows an example of the correlation between a table character set and storage allocated for different overflow paths.
The data is collected from running the same workload on two Amazon RDS for MySQL DB instances. Each instance has a set of
sysbench tables with the size ranging from 10–100 GB. The only difference between these two sets of tables is that one set uses
DEFAULT CHARSET and the other set uses
utf8mb4. The workload includes running a single GROUP BY query against the sysbench tables, one by one. In each run, the TempTable storage engine consumes all the available default 1 GB memory and overflows in a one-hop path.
From the testing, you can observe the following:
- Comparing tables using the
utf8mb4character set with those using
latin1, you can see a wide gap in the storage allocated during overflow, regardless of the overflow path taken
- Different overflow paths also contribute to the gap, and tend to make it wider on tables using the
utf8mb4character set as well
- In all overflow scenarios, the table size consistently serves as a good indication of the size of overflow
Note that MySQL 8.0 uses utf8mb4 as the default character set. Be aware of the potential impact of this change and be thoughtful when choosing the right character set for your tables.
RDS storage type
There are also differences in how efficiently the TempTable storage engine utilizes system resources. The following diagram shows an example of the correlation between query response time and RDS storage types under different overflow circumstances. The experiment is done by making two additional changes in the previous test environment. First, the two RDS for MySQL DB instances load the same set of
sysbench tables using
DEFAULT CHARSET. Second, for one of the DB instances, the 1 TB Amazon Elastic Block Store (Amazon EBS) volume is configured as 10k Provisioned IOPS (io1) while the other uses General Purpose SSD (gp2). Note that this 1 TB gp2 volume has the baseline performance at 3k IOPS.
From the above graph, you can observe the following:
- With the use of memory-mapped temporary files, queries run at the same speed on the io1 and gp2 volume at the beginning. As the time goes on, the burst credits of the gp2 volume get exhausted and queries on io1 volume start outperforming.
- When the overflow path changes to use the InnoDB on-disk internal temporary table, different volume types stop showing an effect on query response time regardless of table size.
- For queries running on the gp2 volume, the InnoDB on-disk internal temporary table may show a better query response time than memory-mapped temporary files.
If you use Amazon RDS for MySQL and have options to choose the RDS storage type, you can consider using high throughput disk storage to help your workload to achieve optimal performance. An Aurora MySQL DB instance has a pre-configured local storage capacity and I/O bandwidth based on the instance class. You can scale up to a larger instance class to get more local storage and IOPS/throughput. It’s always beneficial to tune your query to avoid using the TempTable storage engine to process a large dataset.
CloudWatch provides metrics for you to have an overall view of the memory and storage usage at the DB instance level or DB cluster level. The following metrics are useful:
- FreeableMemory for both Amazon RDS for MySQL and Aurora MySQL
- FreeStorageSpace for Amazon RDS for MySQL
- FreeLocalStorage, VolumeBytesUsed, and AuroraVolumeBytesLeftTotal for Aurora MySQL
If you enable Amazon RDS Performance Insights for your DB instances, you can also query MySQL’s performance schema to monitor the memory and storage used by the TempTable storage engine. You can use the following SQL queries to get more insights from the performance schema:
In this post, you learned some of the best practices for configuring the TempTable
storage engine for optimal query performance of an RDS for MySQL DB instance or Aurora MySQL DB cluster. I hope this post gives you a better understanding of this new feature in MySQL 8.0 and makes it easier to use it with AWS’s managed database services.
If you have any follow-up questions, please leave a comment.
About the Authors
Lei Zeng is a Senior Database Engineer at AWS.