Improving Oracle backup and recovery performance with Amazon EBS multi-volume crash-consistent snapshots

Amazon Elastic Block Store (EBS) snapshots offer the ability to back up the data on your EBS volumes to Amazon S3 by taking point-in-time snapshots. If you run your Oracle Databases on Amazon EC2 and use EBS volumes, you may be using EBS snapshots to meet your backup requirements. Prior to May 2019, you had to put the database in backup mode and individually snapshot each EBS volume that contained database files. This approach had three main disadvantages:

You had to interact with the database
You had to take individual snapshots of the volumes by calling multiple APIs
In backup mode, a database block is completely copied to the log buffer the first time it is modified, which likely generates more redo and impacts database performance for certain workloads

AWS released a new feature in May 2019 that allows you to create crash-consistent snapshots across all the EBS volumes attached to an EC2 instance with a single API call and minimal I/O pause. For more information, see Taking crash-consistent snapshots across multiple Amazon EBS volumes on an Amazon EC2 instance.

You can eliminate the need to put the database in backup mode by combining this feature and the Storage Snapshot Optimization feature introduced in Oracle Database 12c. This post describes and walks you through this process, discusses the benefits of this new feature (from a performance point of view), and also describes some use cases.

The database has to be 12c or greater to benefit from the Oracle Database 12c Storage Snapshot Optimization feature, which allows you to use third-party technologies to take a storage snapshot of your database without having to put the database in backup mode. You can use that snapshot to recover all or part of your database. For more information, see Storage Snapshot Optimization on the Oracle website.

Oracle customers with enterprise-grade, on-premises storage arrays and Oracle Database 12c with Storage Snapshot Optimization have enjoyed vastly improved backup and restore operations, such as point-in-time recovery and cloning. With the new EBS multi-volume crash-consistent snapshot capability, you can use similar skills you had used on-premises for years to improve your overall experience in AWS.

The main advantage of not putting the database (or tablespaces) in backup mode is that it generates less redo. Generating less redo reduces checkpoint frequency and redo size, which reduces both log write and database writer I/O during the backup and also reduces database recovery time if you need to apply the logs during a recovery.

Test environment

The test environment contained the following elements:

An Oracle Database version 19c instance created in an EC2 instance (root volume is 50 GB)
An ASM instance with two disk groups (DATA and RECO)
- The disk group DATA consists of nine EBS volumes presented as ASM disks (5 GB each) and contains the datafiles
- The disk group RECO consists of four EBS volumes presented as ASM disks (30 GB each) and contains the redo log files, archive log files, and control files

A 19-GB table, in which each block contains only two rows. Create and populate the table with the following code:

$ cat create_table.sql
create table BDT (
id number(15)
,c1 varchar2(10)
)tablespace BDT;

insert /*+ APPEND */ into BDT select rownum
,'xxxxxxxxxx'
from dual connect by level <= 1;

alter table BDT minimize records_per_block;
truncate table BDT;

alter session set WORKAREA_SIZE_POLICY=MANUAL;
alter session set SORT_AREA_SIZE=1073741824;

insert /*+ APPEND */ into BDT select rownum
,'xxxxxxxxxx'
from dual connect by level <= 5000000;

An initial crash-consistent multi-volume snapshot, which you can create with the following code:

$ cat ./cli_create_multivol_table_created.sh
aws ec2 create-snapshots \
--instance-specification InstanceId=<your_instance_id>,ExcludeBootVolume=true \
--description "table created" \
--tag-specifications 'ResourceType=snapshot,Tags=[{Key="Name",Value="table created"}]'

During the tests in this post, you generate update activity on the table and simulate a need to recover the database after the update activity.

Using a crash-consistent snapshot and Oracle Storage Snapshot Optimization

The first test uses the crash-consistent snapshot feature and Oracle Storage Snapshot Optimization. Complete the following steps:

Update a column on the 19-GB table with the following code:

$ cat update_table.sh
sqlplus / as sysdba << EOF
execute dbms_workload_repository.create_snapshot;
update bdt.bdt set c1 = 'BDT';
commit;
execute dbms_workload_repository.create_snapshot;
EOF

The amount of redo generated is approximately 2.5 GB. See the following code:

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
redo size                             2,629,050,100    7,429,507.5 1.8778929E+08

Take another crash-consistent multi-volume snapshot with the following code:

$ cat ./cli_create_multivol_table_updated.sh
aws ec2 create-snapshots \
--instance-specification InstanceId=<your_instance_id>,ExcludeBootVolume=true \
--description "table updated" \
--tag-specifications 'ResourceType=snapshot,Tags=[{Key="Name",Value="table updated"}]

Simulating a use case

To simulate a need to recover the database, complete the following steps:

Stop the EC2 instance and detach all the EBS volumes (except the root volume) with the following code:

$ cat ./cli_detach_non_root.sh
# detach volumes from DATA
for datavol in `aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=attachment.instance-id,Values=<your_instance_id> Name=size,Values=5`
do
aws ec2 detach-volume --instance-id <your_instance_id> --volume-id $datavol
done

# detach volumes from RECO
for datavol in `aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=attachment.instance-id,Values=<your_instance_id> Name=size,Values=30`
do
aws ec2 detach-volume --instance-id <your_instance_id> --volume-id $datavol
done

Create volumes from the initial crash-consistent multi-volume snapshot (for the DATA disk group only) and attach them to the EC2 instance with the following code:

$ cat ./cli_create_data_volumes_test1.sh
# create DATA Volumes
for datasnaps in `aws ec2 describe-snapshots --query 'Snapshots[*].[SnapshotId]' --output text --filters Name=tag:Name,Values="table created" Name=volume-size,Values=5`
do
aws ec2 create-volume --availability-zone eu-west-3a --snapshot-id $datasnaps --tag-specifications 'ResourceType=volume,Tags=[{Key="Name",Value="Volume from table created snapshot"}]'
done

$ cat ./cli_attach_data_volumes_test1.sh
# Attach volumes for DATA
for voldev in 1:/dev/sdf 2:/dev/sdg 3:/dev/sdh 4:/dev/sdi 5:/dev/sdj 6:/dev/sdk 7:/dev/sdl 8:/dev/sdm 9:/dev/sdn
do
themax=`echo $voldev | cut -f1 -d:`
thedev=`echo $voldev | cut -f2 -d:`
volid=`aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=tag:Name,Values="Volume from table created snapshot" --max-items $themax | grep -v None | tail -1`
aws ec2 attach-volume --device $thedev --instance-id <your_instance_id> --volume-id $volid
done

Create volumes from the crash-consistent multi-volume snapshot taken after the update (for the RECO disk group only) and attach them to the EC2 instance with the following code:

$ cat ./cli_create_reco_volumes_test1.sh
# create RECO Volumes
for recosnaps in `aws ec2 describe-snapshots --query 'Snapshots[*].[SnapshotId]' --output text --filters Name=tag:Name,Values="table updated" Name=volume-size,Values=30`
do
aws ec2 create-volume --availability-zone eu-west-3a --snapshot-id $recosnaps --tag-specifications 'ResourceType=volume,Tags=[{Key="Name",Value="Volume from table updated snapshot"}]'
done

$ cat ./cli_attach_reco_volumes_test1.sh
# Attach volumes for RECO
for voldev in 1:/dev/sdo 2:/dev/sdp 3:/dev/sdq 4:/dev/sdr
do
themax=`echo $voldev | cut -f1 -d:`
thedev=`echo $voldev | cut -f2 -d:`
volid=`aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=tag:Name,Values="Volume from table updated snapshot" --max-items $themax | grep -v None | tail -1`
aws ec2 attach-volume --device $thedev --instance-id <your_instance_id> --volume-id $volid
done

The RECO disk group contains redo log files, control files, and archived redo log files. It has all the needed information to recover the database.

Recovering the database

To recover the database, complete the following steps:

Start the EC2 instance.

Start the database. See the following code:

SQL> startup
ORACLE instance started.

Total System Global Area 4999608360 bytes
Fixed Size                  8906792 bytes
Variable Size             822083584 bytes
Database Buffers         4110417920 bytes
Redo Buffers               58200064 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DATA/BDT19/DATAFILE/system.257.1010589259'

Get the time of the initial snapshots (you can see it is unique across the snapshots). See the following code:

$ cat ./cli_describe_snapshots.sh
aws ec2 describe-snapshots --query 'Snapshots[*].[StartTime]' --output text --filters Name=tag:Name,Values="table created" Name=volume-size,Values=5 | uniq

$ sh ./cli_describe_snapshots.sh
2019-11-23T10:43:10.964Z

Recover the database using the Storage Snapshot Optimization (update the snapshot time accordingly based on the previous output and add a few seconds). See the following code:

$ cat ./recover_opti.sh
sqlplus / as sysdba << EOF
alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
RECOVER database SNAPSHOT TIME '2019/11/23 10:43:50';
alter database open;
EOF

$ time sh ./recover_opti.sh

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 23 11:37:12 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL>
Session altered.

SQL> Media recovery complete.
SQL>
Database altered.

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

real    6m31.077s
user    0m0.007s
sys     0m0.021s

The time to recover is approximately 6 minutes.

Restoring the database

To restore the database as it was before the test, complete the following steps:

Stop the EC2 instance and detach all the EBS volumes except the root one. You can use the same script that you used previously.

Create volumes from the initial crash-consistent multi-volume snapshot (for the DATA and RECO disk groups) and attach them to the EC2 instance. See the following code:

$ cat ./cli_create_reset_volumes.sh
# create DATA Volumes
for datasnaps in `aws ec2 describe-snapshots --query 'Snapshots[*].[SnapshotId]' --output text --filters Name=tag:Name,Values="table created" Name=volume-size,Values=5`
do
aws ec2 create-volume --availability-zone eu-west-3a --snapshot-id $datasnaps --tag-specifications 'ResourceType=volume,Tags=[{Key="Name",Value="Volume to reset from table created snapshot"}]'
done
# create RECO Volumes
for recosnaps in `aws ec2 describe-snapshots --query 'Snapshots[*].[SnapshotId]' --output text --filters Name=tag:Name,Values="table created" Name=volume-size,Values=30`
do
aws ec2 create-volume --availability-zone eu-west-3a --snapshot-id $recosnaps --tag-specifications 'ResourceType=volume,Tags=[{Key="Name",Value="Volume to reset from table created snapshot"}]'
done

$ cat ./cli_attach_reset_volumes.sh
for voldev in 1:/dev/sdf 2:/dev/sdg 3:/dev/sdh 4:/dev/sdi 5:/dev/sdj 6:/dev/sdk 7:/dev/sdl 8:/dev/sdm 9:/dev/sdn 10:/dev/sdo 11:/dev/sdp 12:/dev/sdq 13:/dev/sdr
do
themax=`echo $voldev | cut -f1 -d:`
thedev=`echo $voldev | cut -f2 -d:`
volid=`aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=tag:Name,Values="Volume to reset from table created snapshot" --max-items $themax | grep -v None | tail -1`
aws ec2 attach-volume --device $thedev --instance-id <your_instance_id> --volume-id $volid
done

Start the EC2 instance.
Start the database.
The database instance starts and performs only instance recovery, as if the host had simply crashed. This is an excellent example of crash-consistent snapshot usage: simply restore and recover all the volumes as it was during the time of the snapshot.

Using the begin/end backup procedure

The second test uses the begin/end backup procedure. Complete the following steps:

Put the database in backup mode. See the following code:
```
SQL> alter database begin backup;

Database altered.
```

Take a snapshot looping on all the EBS volumes that belong to the DATA disk group with the following code:

$ cat ./cli_create_data_snapshots_test2.sh
# Snap DATA volumes
for datavol in `aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=attachment.instance-id,Values=<your_instance_id> Name=size,Values=5`
do
aws ec2 create-snapshot --description "DATA snap begin backup" --tag-specifications 'ResourceType=snapshot,Tags=[{Key="Name",Value="DATA snap begin backup"}]' --volume-id $datavol
done

Update a column on the 19-GB table (with the same script as in the first test) with the following code:

$ cat ./update_table.sh
sqlplus / as sysdba << EOF
execute dbms_workload_repository.create_snapshot;
update bdt.bdt set c1 = 'BDT';
commit;
execute dbms_workload_repository.create_snapshot;
EOF

The amount of redo generated is approximately 42 GB. See the following code:

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
redo size                            42,000,569,896   40,984,442.6 3.0000407E+09

Put the database out of the backup mode. See the following code:
```
SQL> alter database end backup;

Database altered.
```

Take a snapshot looping on all the EBS volumes that belong to the RECO disk group with the following code:

$ cat ./cli_create_reco_snapshots_test2.sh
# Snap RECO volumes
for datavol in `aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=attachment.instance-id,Values=<your_instance_id> Name=size,Values=30`
do
aws ec2 create-snapshot --description "RECO snap end backup" --tag-specifications 'ResourceType=snapshot,Tags=[{Key="Name",Value="RECO snap end backup"}]' --volume-id $datavol
done

Simulating a use case

To simulate a need to recover the database, complete the following steps:

Stop the EC2 instance and detach all the EBS volumes except the root one. Use the same code that you used previously.

Create volumes from the snapshots of the EBS volumes that belong to the DATA disk group (while the database was in backup mode) and attach them to the EC2 instance. See the following code:

$ cat ./cli_create_data_volumes_test2.sh
# create DATA Volumes
for datasnaps in `aws ec2 describe-snapshots --query 'Snapshots[*].[SnapshotId]' --output text --filters Name=tag:Name,Values="DATA snap begin backup" Name=volume-size,Values=5`
do
aws ec2 create-volume --availability-zone eu-west-3a --snapshot-id $datasnaps --tag-specifications 'ResourceType=volume,Tags=[{Key="Name",Value="Volume from DATA snap begin backup"}]'
done

$ cat ./cli_attach_data_volumes_test2.sh
# Attach volumes for DATA
for voldev in 1:/dev/sdf 2:/dev/sdg 3:/dev/sdh 4:/dev/sdi 5:/dev/sdj 6:/dev/sdk 7:/dev/sdl 8:/dev/sdm 9:/dev/sdn
do
themax=`echo $voldev | cut -f1 -d:`
thedev=`echo $voldev | cut -f2 -d:`
volid=`aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=tag:Name,Values="Volume from DATA snap begin backup" --max-items $themax | grep -v None | tail -1`
aws ec2 attach-volume --device $thedev --instance-id <your_instance_id> --volume-id $volid
done

Create volumes from the snapshots of the EBS volumes that belong to the RECO disk group (while the database was out of backup mode) and attach them to the EC2 instance. See the following code:

$ cat ./cli_create_reco_volumes_test2.sh
# create RECO Volumes
for recosnaps in `aws ec2 describe-snapshots --query 'Snapshots[*].[SnapshotId]' --output text --filters Name=tag:Name,Values="RECO snap end backup" Name=volume-size,Values=30`
do
aws ec2 create-volume --availability-zone eu-west-3a --snapshot-id $recosnaps --tag-specifications 'ResourceType=volume,Tags=[{Key="Name",Value="Volume from RECO snap end backup"}]'
done

$ cat ./cli_attach_reco_volumes_test2.sh
# Attach volumes for RECO
for voldev in 1:/dev/sdo 2:/dev/sdp 3:/dev/sdq 4:/dev/sdr
do
themax=`echo $voldev | cut -f1 -d:`
thedev=`echo $voldev | cut -f2 -d:`
volid=`aws ec2 describe-volumes --query 'Volumes[*].[VolumeId]' --output text --filters Name=tag:Name,Values="Volume from RECO snap end backup" --max-items $themax | grep -v None | tail -1`
aws ec2 attach-volume --device $thedev --instance-id <your_instance_id> --volume-id $volid
done

Recovering the database

To recover the database, complete the following steps:

Start the EC2 instance.

Start the database. See the following code:

SQL> startup
ORACLE instance started.

Total System Global Area 4999608360 bytes
Fixed Size                  8906792 bytes
Variable Size             822083584 bytes
Database Buffers         4110417920 bytes
Redo Buffers               58200064 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DATA/BDT19/DATAFILE/system.257.1010589259'

Recover the database. See the following code:

$ cat recover_non_opti.sh
sqlplus / as sysdba << EOF
RECOVER database;
AUTO
alter database open;
EOF

$ $ time sh ./recover_non_opti.sh

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 23 13:06:46 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> ORA-00279: change 14646020 generated at 11/23/2019 12:15:13 needed for thread 1
ORA-00289: suggestion :
+RECO/BDT19/ARCHIVELOG/2019_11_23/thread_1_seq_507.323.1025094369
ORA-00280: change 14646020 for thread 1 is in sequence #507


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}.

. (truncating output for visibility)
.

Log applied.
Media recovery complete.
SQL> 
Database altered.

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

real    15m8.149s
user    0m0.025s
sys     0m0.025s

The time to recover is approximately 15 minutes.

Additional use cases

You could use crash-consistent snapshots in the following use cases:

Point-in-time recovery
Automation of test environments refresh and creation (without interacting at all with the source database)
Add an additional layer of security (if you use another backup strategy, such as using rman and backing up to S3)

Note: This blog post leverages Oracle database feature Storage Snapshot Optimization which is (at the time of this writing) EE specific and licensed under Oracle Advanced Compression option.

Conclusion

This post demonstrated that a database can generate far more redo logs while in backup mode (for the same update). The time to recover from the crash-consistent snapshots was far less than the time needed to recover from a standard begin backup/end backup strategy. Furthermore, you can easily use the crash-consistent snapshots to reconstruct the database (between the two tests) as it was before the first test (performing instance recovery only).

That said, if you run your Oracle Database instances on Amazon EC2 and use Amazon EBS volumes you might want to consider using the crash-consistent snapshots feature for the advantages mentioned above.

About the Author

Bertrand Drouvot is a Sr. Database Engineer with Amazon Web Services.

AWS Database Blog

Improving Oracle backup and recovery performance with Amazon EBS multi-volume crash-consistent snapshots

Test environment

Using a crash-consistent snapshot and Oracle Storage Snapshot Optimization

Simulating a use case

Recovering the database

Restoring the database

Using the begin/end backup procedure

Simulating a use case

Recovering the database

Additional use cases

Conclusion

About the Author

Resources

Blog Topics

Follow

Learn

Resources

Developers

Help