AWS Database Blog
Improving Oracle backup and recovery performance with Amazon EBS multi-volume crash-consistent snapshots
Amazon Elastic Block Store (EBS) snapshots offer the ability to back up the data on your EBS volumes to Amazon S3 by taking point-in-time snapshots. If you run your Oracle Databases on Amazon EC2 and use EBS volumes, you may be using EBS snapshots to meet your backup requirements. Prior to May 2019, you had to put the database in backup mode and individually snapshot each EBS volume that contained database files. This approach had three main disadvantages:
- You had to interact with the database
- You had to take individual snapshots of the volumes by calling multiple APIs
- In backup mode, a database block is completely copied to the log buffer the first time it is modified, which likely generates more redo and impacts database performance for certain workloads
AWS released a new feature in May 2019 that allows you to create crash-consistent snapshots across all the EBS volumes attached to an EC2 instance with a single API call and minimal I/O pause. For more information, see Taking crash-consistent snapshots across multiple Amazon EBS volumes on an Amazon EC2 instance.
You can eliminate the need to put the database in backup mode by combining this feature and the Storage Snapshot Optimization feature introduced in Oracle Database 12c. This post describes and walks you through this process, discusses the benefits of this new feature (from a performance point of view), and also describes some use cases.
The database must be version 12c or later to benefit from the Oracle Database 12c Storage Snapshot Optimization feature, which allows you to use third-party technologies to take a storage snapshot of your database without having to put the database in backup mode. You can use that snapshot to recover all or part of your database. For more information, see Storage Snapshot Optimization on the Oracle website.
Oracle customers with enterprise-grade, on-premises storage arrays and Oracle Database 12c with Storage Snapshot Optimization have enjoyed vastly improved backup and restore operations, such as point-in-time recovery and cloning. With the new EBS multi-volume crash-consistent snapshot capability, you can apply the skills you have used on-premises for years to improve your overall experience in AWS.
The main advantage of not putting the database (or tablespaces) in backup mode is that it generates less redo. Generating less redo reduces checkpoint frequency and redo size, which reduces both log write and database writer I/O during the backup and also reduces database recovery time if you need to apply the logs during a recovery.
Test environment
The test environment contained the following elements:
- An Oracle Database version 19c instance created in an EC2 instance (root volume is 50 GB)
- An ASM instance with two disk groups (DATA and RECO)
  - The disk group DATA consists of nine EBS volumes presented as ASM disks (5 GB each) and contains the datafiles
  - The disk group RECO consists of four EBS volumes presented as ASM disks (30 GB each) and contains the redo log files, archive log files, and control files
- A 19-GB table, in which each block contains only two rows. Create and populate the table with the following code:
- An initial crash-consistent multi-volume snapshot, which you can create with the following code:
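A minimal sketch of a multi-volume snapshot request, assuming the AWS CLI is installed and that the boot volume should be left out (the root volume is not restored anywhere in this post). The helper name `create_snapshots_command`, the instance ID, and the description are hypothetical; the function only assembles the argument list and does not call AWS.

```python
import shlex


def create_snapshots_command(instance_id, description, exclude_boot_volume=True):
    """Build an `aws ec2 create-snapshots` call for all EBS volumes of one instance.

    A single call covers every attached volume, which is what makes the
    resulting snapshot set crash-consistent across volumes.
    """
    exclude = "true" if exclude_boot_volume else "false"
    spec = f"InstanceId={instance_id},ExcludeBootVolume={exclude}"
    return [
        "aws", "ec2", "create-snapshots",
        "--instance-specification", spec,
        "--description", description,
        # Copy volume tags so snapshots can later be matched to a disk group
        # (assumes the volumes were tagged beforehand).
        "--copy-tags-from-source", "volume",
    ]


if __name__ == "__main__":
    # Hypothetical instance ID, for illustration only.
    cmd = create_snapshots_command("i-0123456789abcdef0", "initial snapshot set")
    print(shlex.join(cmd))
```

Returning a list rather than a shell string keeps quoting explicit; the list can be passed to `subprocess.run` once the values have been checked.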
During the tests in this post, you generate update activity on the table and simulate a need to recover the database after the update activity.
Using a crash-consistent snapshot and Oracle Storage Snapshot Optimization
The first test uses the crash-consistent snapshot feature and Oracle Storage Snapshot Optimization. Complete the following steps:
- Update a column on the 19-GB table with the following code:
The amount of redo generated is approximately 2.5 GB. See the following code:
- Take another crash-consistent multi-volume snapshot with the following code:
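The redo figure reported in the steps above can be obtained in several ways. One possible approach, sketched here as an assumption rather than the method used for this test, is to read the cumulative `redo size` statistic from `v$sysstat` before and after the update and take the difference. The helper name `redo_generated_gb` is hypothetical.

```python
# Query to run before and after the update; the statistic is cumulative
# since instance startup, so only the difference is meaningful.
REDO_SIZE_QUERY = "SELECT value FROM v$sysstat WHERE name = 'redo size'"


def redo_generated_gb(before_bytes, after_bytes):
    """Return the redo generated between two readings, in GiB."""
    if after_bytes < before_bytes:
        # The counter resets when the instance restarts.
        raise ValueError("second reading is lower than the first")
    return (after_bytes - before_bytes) / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(redo_generated_gb(1 * 1024 ** 3, 3 * 1024 ** 3))
```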
Simulating a use case
To simulate a need to recover the database, complete the following steps:
- Stop the EC2 instance and detach all the EBS volumes (except the root volume) with the following code:
- Create volumes from the initial crash-consistent multi-volume snapshot (for the DATA disk group only) and attach them to the EC2 instance with the following code:
- Create volumes from the crash-consistent multi-volume snapshot taken after the update (for the RECO disk group only) and attach them to the EC2 instance with the following code:
The RECO disk group contains redo log files, control files, and archived redo log files. It has all the needed information to recover the database.
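The steps above mix two snapshot sets: DATA volumes from the initial set and RECO volumes from the set taken after the update. A minimal sketch of that selection and of the AWS CLI calls involved, assuming each snapshot has already been labeled with its disk group and snapshot set (for example through tags). All helper names, the dictionary keys, and the IDs are hypothetical, and nothing here contacts AWS.

```python
def stop_instance_command(instance_id):
    return ["aws", "ec2", "stop-instances", "--instance-ids", instance_id]


def detach_volume_command(volume_id):
    return ["aws", "ec2", "detach-volume", "--volume-id", volume_id]


def create_volume_command(snapshot_id, availability_zone):
    # The new volume must be in the same Availability Zone as the instance.
    return ["aws", "ec2", "create-volume",
            "--snapshot-id", snapshot_id,
            "--availability-zone", availability_zone]


def attach_volume_command(volume_id, instance_id, device):
    return ["aws", "ec2", "attach-volume",
            "--volume-id", volume_id,
            "--instance-id", instance_id,
            "--device", device]


def pick_snapshots(snapshots, wanted):
    """Select snapshot IDs whose set matches the one wanted for their disk group.

    snapshots: list of dicts with keys 'id', 'disk_group', 'snapshot_set'
    wanted:    dict mapping disk group name to snapshot set name
    """
    return [s["id"] for s in snapshots
            if wanted.get(s["disk_group"]) == s["snapshot_set"]]


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        {"id": "snap-a", "disk_group": "DATA", "snapshot_set": "initial"},
        {"id": "snap-b", "disk_group": "RECO", "snapshot_set": "initial"},
        {"id": "snap-c", "disk_group": "DATA", "snapshot_set": "after_update"},
        {"id": "snap-d", "disk_group": "RECO", "snapshot_set": "after_update"},
    ]
    for snap in pick_snapshots(inventory, {"DATA": "initial", "RECO": "after_update"}):
        print(" ".join(create_volume_command(snap, "us-east-1a")))
```

Restoring older datafiles alongside newer redo, archived logs, and control files is what lets the database roll the datafiles forward during recovery.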
Recovering the database
To recover the database, complete the following steps:
- Start the EC2 instance.
- Start the database. See the following code:
- Get the start time of the initial snapshots (the start time is the same for every snapshot in the set). See the following code:
- Recover the database using Storage Snapshot Optimization, setting the snapshot time to the start time from the previous output plus a few seconds. See the following code:
The time to recover is approximately 6 minutes.
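As a sketch of the "snapshot time plus a few seconds" step, the following Python derives the time literal from a snapshot `StartTime` value as returned by `aws ec2 describe-snapshots` and builds a `RECOVER DATABASE SNAPSHOT TIME` statement. The assumptions are that the database host clock runs in UTC and that the session date format matches the literal produced here; the helper name, the 5-second margin, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def recover_statement(start_time_iso, margin_seconds=5,
                      date_format="%Y-%m-%d %H:%M:%S"):
    """Build a recovery statement whose snapshot time is slightly after StartTime.

    The margin ensures the time given to Oracle is not earlier than the
    moment the snapshots were actually taken.
    """
    # Accept both the '...Z' and '...+00:00' spellings of UTC.
    started = datetime.fromisoformat(start_time_iso.replace("Z", "+00:00"))
    snapshot_time = started + timedelta(seconds=margin_seconds)
    literal = snapshot_time.strftime(date_format)
    return f"RECOVER DATABASE SNAPSHOT TIME '{literal}'"


if __name__ == "__main__":
    # Hypothetical StartTime value, for illustration only.
    print(recover_statement("2019-10-03T07:14:05.000Z"))
```

If the session uses a different date format, pass a matching `date_format` or set the session format before running the statement.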
Restoring the database
To restore the database as it was before the test, complete the following steps:
- Stop the EC2 instance and detach all the EBS volumes except the root one. You can use the same script that you used previously.
- Create volumes from the initial crash-consistent multi-volume snapshot (for the DATA and RECO disk groups) and attach them to the EC2 instance. See the following code:
- Start the EC2 instance.
- Start the database.
The database instance starts and performs only instance recovery, as if the host had simply crashed. This is an excellent example of crash-consistent snapshot usage: restore all the volumes to their state at the time of the snapshot and let the database recover itself on startup.
Using the begin/end backup procedure
The second test uses the begin/end backup procedure. Complete the following steps:
- Put the database in backup mode. See the following code:
- Take a snapshot of each EBS volume that belongs to the DATA disk group, looping over the volumes, with the following code:
- Update a column on the 19-GB table (using the same script as in the first test) with the following code:
The amount of redo generated is approximately 42 GB. See the following code:
- Take the database out of backup mode. See the following code:
- Take a snapshot of each EBS volume that belongs to the RECO disk group, looping over the volumes, with the following code:
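The ordering of the begin/end backup procedure above can be sketched as a plan of steps: one SQL statement to enter backup mode, one `aws ec2 create-snapshot` call per DATA volume, the workload, one SQL statement to leave backup mode, then one call per RECO volume. The helper names and volume IDs are hypothetical, and the plan is only assembled, not executed.

```python
def snapshot_volume_command(volume_id, description):
    """Build the per-volume snapshot call; one API call is needed per volume."""
    return ["aws", "ec2", "create-snapshot",
            "--volume-id", volume_id,
            "--description", description]


def backup_mode_plan(data_volume_ids, reco_volume_ids):
    """Return the ordered steps of the begin/end backup test as (kind, payload)."""
    steps = [("sql", "ALTER DATABASE BEGIN BACKUP;")]
    # Datafile volumes are snapshotted while the database is in backup mode.
    steps += [("cli", snapshot_volume_command(v, "DATA in backup mode"))
              for v in data_volume_ids]
    steps.append(("workload", "update a column on the 19-GB table"))
    steps.append(("sql", "ALTER DATABASE END BACKUP;"))
    # Redo, archived logs, and control files are captured after backup mode ends.
    steps += [("cli", snapshot_volume_command(v, "RECO after end backup"))
              for v in reco_volume_ids]
    return steps


if __name__ == "__main__":
    # Hypothetical volume IDs, for illustration only.
    for kind, payload in backup_mode_plan(["vol-d1", "vol-d2"], ["vol-r1"]):
        print(kind, payload)
```

Compared with the single multi-volume call, this approach needs one API call per volume plus two database interactions, which matches the disadvantages listed at the start of the post.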
Simulating a use case
To simulate a need to recover the database, complete the following steps:
- Stop the EC2 instance and detach all the EBS volumes except the root one. Use the same code that you used previously.
- Create volumes from the snapshots of the EBS volumes that belong to the DATA disk group (taken while the database was in backup mode) and attach them to the EC2 instance. See the following code:
- Create volumes from the snapshots of the EBS volumes that belong to the RECO disk group (taken after the database was out of backup mode) and attach them to the EC2 instance. See the following code:
Recovering the database
To recover the database, complete the following steps:
- Start the EC2 instance.
- Start the database. See the following code:
- Recover the database. See the following code:
The time to recover is approximately 15 minutes.
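Putting the approximate figures from the two tests side by side gives a sense of scale. The short calculation below uses only the numbers reported in this post (2.5 GB versus 42 GB of redo, 6 versus 15 minutes of recovery); the variable and function names are illustrative.

```python
# Approximate measurements reported in the two tests.
CRASH_CONSISTENT = {"redo_gb": 2.5, "recovery_min": 6}
BACKUP_MODE = {"redo_gb": 42, "recovery_min": 15}


def ratio(metric):
    """How many times larger the backup-mode figure is for a given metric."""
    return BACKUP_MODE[metric] / CRASH_CONSISTENT[metric]


if __name__ == "__main__":
    print(f"redo generated: {ratio('redo_gb'):.1f}x")
    print(f"recovery time:  {ratio('recovery_min'):.1f}x")
```

For the same update, backup mode produced roughly 17 times more redo and recovery took about 2.5 times longer.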
Additional use cases
You could use crash-consistent snapshots in the following use cases:
- Point-in-time recovery
- Automated refresh and creation of test environments (without interacting with the source database at all)
- An additional layer of security (if you use another backup strategy, such as RMAN backups to S3)
Note: This post uses the Oracle Database feature Storage Snapshot Optimization, which (at the time of this writing) is specific to Enterprise Edition (EE) and licensed under the Oracle Advanced Compression option.
Conclusion
This post demonstrated that a database can generate far more redo while in backup mode for the same update: approximately 42 GB compared with approximately 2.5 GB. The time to recover from the crash-consistent snapshots (approximately 6 minutes) was far less than the time needed to recover with a standard begin backup/end backup strategy (approximately 15 minutes). Furthermore, you can easily use the crash-consistent snapshots to reconstruct the database (between the two tests) as it was before the first test, with the database performing instance recovery only.
If you run your Oracle Database instances on Amazon EC2 and use Amazon EBS volumes, you might want to consider using the crash-consistent snapshots feature for the advantages described in this post.
About the Author
Bertrand Drouvot is a Sr. Database Engineer with Amazon Web Services.