Why is my Aurora MySQL-Compatible DB cluster snapshot taking so long to restore?

I want to restore an Amazon Aurora MySQL-Compatible Edition DB cluster snapshot, but it's taking a long time.

Short description

The snapshot restore process for Amazon Aurora MySQL-Compatible Edition DB clusters involves several tasks. For example, during this process, Aurora creates the cluster and its highly available cluster volume. Processes like status checks, storage and hardware allocation, and writing data volumes all contribute to the time that it takes to restore a snapshot.

Snapshot restore time is influenced by a number of factors:

  • For Aurora clusters, a single cluster volume is distributed across three Availability Zones (AZs) to provide high availability. When the Aurora cluster restores from the snapshot, it provisions storage in these three AZs. After the cluster becomes available, it then creates a further six copies within the cluster volume for storing data. The storage volume is striped across hundreds of storage nodes and distributed over three different AZs.
  • After the Aurora cluster is created, the cluster downloads data from Amazon Simple Storage Service (Amazon S3) to the storage nodes. The cluster does this before the data becomes available. Unlike the restore process for Amazon Relational Database Service (Amazon RDS) for MySQL instances, lazy loading doesn't occur after the restore.
  • Aurora restore time doesn't scale linearly with data size. For example, you might restore two separate clusters, one with 1 GB of data and another with 10 GB of data. Instead of taking ten times as long, the larger data set takes only a little longer to restore than the smaller data set.
  • Other processes within the restore include status checks, storage and hardware allocation, and writing data volumes. All of these processes are time-consuming, but they must be performed precisely for best performance.
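To see how long a restore takes in your own environment, you can time it from the API call until the cluster reports the available status. The following is a minimal sketch, not a definitive implementation. It assumes that `rds` is a boto3 RDS client (or any object that exposes the same `restore_db_cluster_from_snapshot` and `describe_db_clusters` methods). The function name, the polling interval, and the timeout are illustrative choices.

```python
import time


def restore_and_time(rds, snapshot_id, cluster_id, poll_seconds=30,
                     timeout_seconds=4 * 3600, sleep=time.sleep,
                     clock=time.monotonic):
    """Start a snapshot restore and return the seconds until the cluster is available.

    `rds` is assumed to behave like boto3.client("rds").
    """
    started = clock()
    rds.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
        Engine="aurora-mysql",
    )
    while True:
        response = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)
        status = response["DBClusters"][0]["Status"]
        elapsed = clock() - started
        if status == "available":
            return elapsed
        if elapsed > timeout_seconds:
            raise TimeoutError(
                f"{cluster_id} is still '{status}' after {elapsed:.0f} seconds"
            )
        sleep(poll_seconds)
```

With a real client, you might call `restore_and_time(boto3.client("rds"), "my-snapshot", "my-restored-cluster")`, where both identifiers are placeholders. When you restore through the API, only the cluster is created, so you still need to add a DB instance to the cluster before you can connect to it.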

Resolution

You can use the Aurora cluster clone feature or the backtrack feature when you make changes to your Aurora databases, depending on your use case.

Aurora cluster clone

Using the Aurora cluster clone feature is the fastest way to create an identical copy of your current cluster. After the cloned cluster is created, you can test your changes against the cloned cluster without affecting the original cluster. If the test passes, you can apply changes to the original cluster.

Note: It's still a best practice to take a snapshot of your cluster before you make any changes in a database.

Here are some common use cases for cloning an existing Aurora cluster:

  • You want to experiment with and assess the impact of changes like schema changes or parameter group changes.
  • You want to perform workload-intensive operations like exporting data or running analytical queries.
  • You want to create a copy of a production DB cluster in a non-production environment for development or testing.
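As an illustration, a clone can be requested through the point-in-time restore API with the copy-on-write restore type. The following is a minimal sketch, assuming that `rds` is a boto3 RDS client. The function name and the cluster identifiers are hypothetical.

```python
def clone_cluster(rds, source_cluster_id, clone_cluster_id):
    """Request a copy-on-write clone of an existing Aurora cluster.

    `rds` is assumed to behave like boto3.client("rds").
    Returns the identifier of the new (cloned) cluster.
    """
    response = rds.restore_db_cluster_to_point_in_time(
        SourceDBClusterIdentifier=source_cluster_id,
        DBClusterIdentifier=clone_cluster_id,
        # "copy-on-write" asks for a clone instead of a full copy of the data.
        RestoreType="copy-on-write",
        UseLatestRestorableTime=True,
    )
    return response["DBCluster"]["DBClusterIdentifier"]
```

As with a snapshot restore through the API, the call creates only the cluster. To run queries against the clone, add a DB instance to it.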

Aurora backtracking feature

You can also use the backtracking functionality for your Aurora clusters. Backtracking lets you rapidly undo errors by doing an in-place rewind of your data. Backtracking a DB cluster doesn't require the creation of a new DB cluster, so it takes only a few minutes to complete.

However, this feature has limitations. First, it's available only on clusters that were created with the feature turned on. So if your cluster doesn't have this feature turned on, then you need to perform a snapshot restore to turn on backtracking. Also, backtracking doesn't support binary log replication, and cross-Region replication must be turned off before you can configure or use backtracking. The limit for a backtrack window is 72 hours.
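The following sketch shows one way to check the configured backtrack window before you request a rewind. It assumes that `rds` is a boto3 RDS client and that the `BacktrackWindow` value returned by `describe_db_clusters` is the configured window in seconds. The function name and the `minutes_ago` parameter are illustrative. The time range that you can actually backtrack to might be shorter than the configured window.

```python
from datetime import datetime, timedelta, timezone

# Upper limit for a backtrack window, as stated in this article.
MAX_BACKTRACK_WINDOW = timedelta(hours=72)


def backtrack_cluster(rds, cluster_id, minutes_ago, now=None):
    """Rewind a cluster in place by `minutes_ago` minutes and return the target time.

    `rds` is assumed to behave like boto3.client("rds").
    """
    now = now or datetime.now(timezone.utc)
    offset = timedelta(minutes=minutes_ago)
    if offset > MAX_BACKTRACK_WINDOW:
        raise ValueError("Target is outside the 72-hour backtrack limit")

    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    window_seconds = cluster.get("BacktrackWindow") or 0
    if window_seconds == 0:
        raise ValueError(f"Backtracking isn't turned on for {cluster_id}")
    if offset > timedelta(seconds=window_seconds):
        raise ValueError("Target is outside the configured backtrack window")

    target = now - offset
    rds.backtrack_db_cluster(DBClusterIdentifier=cluster_id, BacktrackTo=target)
    return target
```

For example, `backtrack_cluster(boto3.client("rds"), "my-cluster", 15)` would request a rewind to 15 minutes ago, where `my-cluster` is a placeholder identifier.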

Considerations

The Aurora cluster clone and backtrack features were introduced to improve Aurora restore time in certain use cases. However, taking regular snapshots is still a best practice, and it's also a best practice to take a snapshot before you make any planned changes to a database.

Also, make sure that no long-running write operations are running on the source database at the time of the snapshot, point-in-time restore, or clone. Any long-running DCL, DDL, or DML statements (open write transactions) can increase the time that it takes for the restored database to become available.
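One way to look for open transactions before you take the snapshot is to query the `information_schema.innodb_trx` table on the source database. The following is a minimal sketch, assuming a Python DB-API cursor that returns rows as tuples and accepts `%s` placeholders (as PyMySQL and similar MySQL drivers do). The function name and the 60-second threshold are illustrative, not a recommended value.

```python
# Lists transactions that have been open for at least the given number of seconds.
LONG_TRX_SQL = (
    "SELECT trx_mysql_thread_id, trx_started, "
    "TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds "
    "FROM information_schema.innodb_trx "
    "WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) >= %s "
    "ORDER BY trx_started"
)


def long_running_transactions(cursor, min_age_seconds=60):
    """Return open InnoDB transactions that are older than `min_age_seconds`.

    `cursor` is assumed to be a DB-API cursor connected to the source database.
    """
    cursor.execute(LONG_TRX_SQL, (min_age_seconds,))
    return [
        {"thread_id": row[0], "started": row[1], "age_seconds": row[2]}
        for row in cursor.fetchall()
    ]
```

If the function returns an empty list, no transaction has been open longer than the threshold. Otherwise, consider waiting for the listed transactions to finish before you take the snapshot or create the clone.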

Related information

Cloning a volume for an Amazon Aurora DB cluster

Backtracking an Aurora DB cluster


AWS OFFICIAL
Updated a year ago