Why is it taking a long time to perform a point-in-time recovery of my Amazon RDS for MySQL instance?
Last updated: 2021-10-14
I've initiated a point-in-time recovery (PITR) in Amazon Relational Database Service (Amazon RDS) for MySQL, and it's taking longer than expected. Why is this happening?
Point-in-time recovery (PITR) is the process of restoring a database to the state that it was in at a specified date and time. When you initiate a PITR, the most recent backup (either automated or manual) is restored. The transactional logs are then applied to roll the Amazon RDS database forward to the PITR time.
Best practices for avoiding a long point-in-time recovery
To avoid a long point-in-time recovery, follow these best practices:
- Create a disaster recovery strategy.
- Use smaller transactions and run the COMMIT command more frequently.
- To run a large transaction, take a snapshot before and after the large transactions. However, transactions that are larger than the max_allowed_packet parameter cause PITR to fail.
- Minimize snapshot restore times. Snapshot restores are initiated as part of the point-in-time recovery process. A longer snapshot restore can contribute to a longer point-in-time recovery session. For more information, see Why is it taking so long to restore a snapshot of my Amazon RDS for MySQL DB instance?
- A log apply process can take more time depending on the number of logs to be applied. To reduce the number of logs to be applied, consider taking a manual snapshot in-between the automated backups. As point-in-time recovery automatically selects automatic or manual snapshots created near the PITR time, having intermediate manual snapshots can reduce the number of logs to be applied. If you're dealing with a large volume of changes, take a manual snapshot every 3-4 hours.
- If you replay any large transactions, a low wait_timeout value can break point-in-time restore processes in Amazon RDS for MySQL. For example, breaks occur if you're performing a large row-based bulk update, insert, or delete and the replay takes longer than wait_timeout. To prevent any disruptions in the PITR process, set the wait_timeout value to "600" (10 minutes) or more. For more information, see the wait_timeout section in Best practices for configuring parameters for Amazon RDS for MySQL.
- When row-based binary logging is used, consider setting the binlog_row_image parameter value to "MINIMAL" instead of "FULL". This updated value will reduce the size of the binary logs, minimizing binlog recovery time.
- Unless you need a specific binlog format, consider using the MIXED logging format. With mixed logging, statement-based logging is used by default, but the logging mode switches automatically to row-based as needed. This switching can help reduce binlog sizes. For more information about MIXED logging, see Binary logging formats on the MySQL website.
Point-in-time recovery failures
The following scenarios will cause point-in-time recovery to fail:
- DB instances that are in a "storage full" state.
- Using MyISAM tables in MySQL. MyISAM storage engines do not support reliable crash recovery processes. The MyISAM tables prevent the point-in-time recovery process from working properly.
- Using temporary tables, distributed transactions, and in-memory tables.