AWS Database Blog
Archival solutions for Oracle database workloads in AWS: Part 2
This post is a continuation of Archival solutions for Oracle database workloads in AWS: Part 1. Part 1 explains three archival solutions that allow you to archive Oracle data into Amazon Simple Storage Service (Amazon S3). In this post, we explain three archival solutions using native Oracle products and utilities.
As covered in Part 1, every archival solution falls into at least one of the following categories:
- Retire – In this strategy, you sunset or decommission the Oracle database. Data is migrated to lower-cost storage. You can uninstall or remove the Oracle database after migrating the data.
- Reduce – The archive data is migrated to lower-cost storage. The current data remains in the Oracle database. Users are able to aggregate data (current data and archived data) using an AWS service or standard SQL tool.
- Reorganize – Through logical or physical approaches, data continues to be managed in the Oracle database. Users are able to aggregate data (current data and archived data) using a standard SQL tool.
The following table summarizes the solutions we discuss, which part in the series we discuss them in, and which category the solution falls under.
| Solution | Series Part | Retire | Reduce | Reorganize |
| --- | --- | --- | --- | --- |
| Simple Database Archival Solution (SDAS) | One | Yes | No | No |
| AWS Glue (workflows and/or blueprints) | One | Yes | Yes | No |
| AWS Database Migration Service (AWS DMS) | One | Yes | Yes | No |
| Oracle Data Pump Access Driver | Two | No | Yes | No |
| Oracle In-Database Archiving (IDA) | Two | No | No | Yes |
| Oracle Partitioning | Two | No | No | Yes |
Archival solution options
You can implement an Oracle database archival solution in AWS with Amazon Relational Database Service (Amazon RDS) for Oracle, Amazon RDS Custom for Oracle, or Oracle on Amazon Elastic Compute Cloud (Amazon EC2). All solutions in this post were tested using Amazon RDS for Oracle Enterprise Edition version 19c. With the exception of Oracle Partitioning, you can also implement these archival solutions with Oracle Database Standard Edition 2. The archival solutions run on Oracle database versions with available patching and premier support, which includes versions 19c and 21c.
You should select a solution based on your existing skill sets, the number of database objects to archive, the volume of data to archive, the Oracle data types used, network configuration, application or front-end requirements, and the complexity to set up and maintain archiving.
Oracle has multiple solutions to archive Oracle data. These include, but are not restricted to, Oracle In-Database Archiving (IDA), Oracle Data Pump Access Driver, and Oracle Partitioning.
Oracle archiving can also be implemented using Oracle Information Lifecycle Management (Oracle ILM). Oracle ILM allows you to define data classes and place them on different storage tiers. Oracle ILM can also use additional Oracle features, such as data partitioning, advanced row compression, hybrid columnar compression, automatic data optimization, heat map, direct NFS client, Clonedb, SecureFiles, In-Database Archiving, and Database File System (DBFS). You can implement Oracle ILM with Oracle on Amazon EC2. Amazon RDS for Oracle and Amazon RDS Custom for Oracle don’t permit storage tiering; therefore, Oracle ILM cannot be fully implemented with Amazon RDS. This post does not provide details on Oracle ILM.
Oracle data can also be archived to Amazon Redshift or Amazon Aurora. These targets are relational databases. Amazon Redshift offers the best price-performance for cloud data warehousing. Amazon Aurora provides unparalleled high performance and availability at a global scale with full MySQL and PostgreSQL compatibility. This post does not cover Oracle to Amazon Redshift or Oracle to Amazon Aurora migrations.
Each archival solution has an overview section that provides more information. The solution should be determined by your requirements. The following table outlines three archival solutions for Oracle databases. The column definitions are as follows:
- Solution – Name of the solution (AWS service or Oracle product to implement Oracle data archiving).
- Application interface – If an interface (web-based, Java, and more) is provided with the solution, it will be noted in this column.
- Category – Denotes which approach (retire, reduce, or reorganize) is used.
- Archive data location – This column specifically lists where the archive data resides.
- Logical or physical – This column explains if the archive data is logically or physically separated from the source data.
- Record format – This column identifies the archival record format.
- Aggregation tool – This is the tool or service that allows you to join current data with archival data.
- CDC allowed – Identifies whether change data capture (CDC) is permissible, that is, whether data can continuously flow from the Oracle database to the archival location in near-real time.
- Comment – More information on the solution. This column may include limitations or restrictions.
| Solution | Application interface | Category | Archive data location | Logical or physical | Record format | Aggregation tool | CDC allowed | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Oracle Data Pump Access Driver | No | Reduce | Amazon EFS, Amazon EBS, or Amazon S3 | Physical | Oracle proprietary format | Oracle SQL tool or application | No | Must use Oracle external tables to access data within Oracle Data Pump files. |
| Oracle IDA | No | Reorganize | Within the same Oracle table | Logical | Database record | Oracle SQL tool or application | No | Aggregating current and archival data requires setting the ROW ARCHIVAL VISIBILITY session parameter to ALL. SQL must be submitted to move current data to archive data. |
| Oracle Partitioning | No | Reorganize | Different Oracle table | Physical | Database record | Oracle SQL tool or application | No | Data is migrated from a partitioned source table to a partitioned archive table. |
Archival solution sample use cases
In Part 1, we reviewed Simple Database Archival Solution (SDAS), AWS Database Migration Service (AWS DMS), and AWS Glue (workflows and/or blueprints). The following tables provide an archival strategy based on multiple factors for Oracle Data Pump Access Driver, Oracle In-Database Archiving (IDA), and Oracle Partitioning.
| Requirement | Description |
| --- | --- |
| Objective | Archive data outside of an Oracle database running in AWS using proprietary Oracle formatted files. You must be able to access these files from Oracle databases using standard SQL tools (not accessible via Amazon Athena, Amazon Redshift Spectrum, and more). |
| Solution | Oracle Data Pump Access Driver |
| Category | Reduce |
| Source | Amazon RDS for Oracle, Amazon RDS Custom for Oracle, Oracle on Amazon EC2 |
| Target | Proprietary Oracle formatted files residing in Amazon EFS, Amazon EBS, or Amazon S3. The Oracle database must have access to the Oracle Data Pump files located on the AWS storage. |
| Source database future state | Same as source database |
| Reporting / analytics | Must be able to query current data and archived data in the database via reports or dashboards |
| Comments | Each proprietary file represents a single Oracle table. If placed on Amazon EFS, multiple Oracle databases can access the archived data simultaneously. This solution is not advisable for dozens or hundreds of Oracle tables. You would need to set up and automate processes based on business requirements. |
| Requirement | Description |
| --- | --- |
| Objective | Archive data in the Oracle database in AWS. Keep the data in the same table. Have the capability to aggregate or join current and archival data. |
| Solution | Oracle In-Database Archiving (IDA) |
| Category | Reorganize |
| Source | Amazon RDS for Oracle, Amazon RDS Custom for Oracle, Oracle on Amazon EC2 |
| Target | Same as source database (same table) |
| Source database future state | Same as source database |
| Reporting / analytics | Must be able to query current data and archived data in the database via reports or dashboards |
| Comments | Using a specific interval, you need to submit a SQL statement to continuously archive data. When a query is submitted against a table with Oracle IDA, the optimizer uses a filter against a hidden column to exclude archive data. |
| Requirement | Description |
| --- | --- |
| Objective | Archive data in the Oracle database in AWS. Move the source partitioned table to an archived partitioned Oracle table. Have the capability to aggregate current and archived data. |
| Solution | Oracle Partitioning |
| Category | Reorganize |
| Source | Amazon RDS for Oracle, Amazon RDS Custom for Oracle, Oracle on Amazon EC2 |
| Target | Same as source database (different table) |
| Source database future state | Same as source database |
| Reporting / analytics | Must be able to query current data and archived data in the database via reports or dashboards |
| Comments | You must set up a job that runs on a specific interval to continuously archive data. Unlike Oracle IDA, this solution physically segregates current and archived data. |
Remove archival data from the source database
If you read Part 1, you can skip to "Oracle Data Pump Access Driver overview". If the archival data is stored outside of the Oracle database, it is your responsibility to implement a process to remove the data from the database after the archival data has been validated. Although this post does not go in-depth on the ways to remove archival data from the source database or tables, here are a few high-level approaches (sketched below):
- If a schema was retired, you can drop the schema.
- You can drop or truncate tables and partitions.
- You can submit a DELETE statement against an Oracle table or partition.
Your use case should determine which approach to use. Thoroughly test these processes before implementing them in the production environment.
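For illustration only, the following statements show these approaches; the schema, table, partition, and predicate names are hypothetical and not part of this post's sample dataset:

```sql
-- Drop a retired schema and all of its objects (hypothetical name).
DROP USER app_archive_owner CASCADE;

-- Remove archived rows by truncating a table or dropping a partition.
TRUNCATE TABLE orders_history;
ALTER TABLE orders DROP PARTITION orders_2019;

-- Or delete only the validated archive rows.
DELETE FROM orders WHERE order_date < DATE '2020-01-01';
COMMIT;
```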
Sample dataset overview
We used the AWS sample dataset provided with Simple Database Archival Solution (SDAS), which contains approximately 10 GB of data. Table row counts range from a few records to more than 55 million records, and several indexes are larger than 1.5 GB. The full code is available on the GitHub repo.
Oracle Data Pump Access Driver overview
The ORACLE_DATAPUMP access driver allows you to read and write data in a proprietary, binary Oracle format that can only be accessed by the ORACLE_DATAPUMP access driver. This data resides outside of the database; however, you can view it using SQL. You can also join this data with data in an Oracle database. The binary file resides in the predetermined location for a database directory object. You can place these files in an Amazon Elastic Block Store (Amazon EBS) file system connected to the database host, within an Amazon Elastic File System (Amazon EFS) volume, in an Amazon S3 folder, or within a storage device accessible by the database host.
This solution requires scripting to generate the proprietary files, move them to a common location, and create the external table DDL. No interface is provided, and the approach is not recommended for dozens or hundreds of Oracle tables.
After the external file is created, it cannot be modified; therefore, you cannot perform data manipulation language (DML) operations against it. However, you can read the file numerous times and use it as the source file for another external table in the same database or in a different database.
To use the ORACLE_DATAPUMP access driver successfully, you must know the file format and record format (including character sets and field data types) of the data files in your environment. You must also be able to use SQL to create an external table and perform queries against it.
Not only is the data in a proprietary format that is readable only via the ORACLE_DATAPUMP access driver, you can also compress and encrypt the data. Compression (beyond basic compression) and encryption require an Oracle software license. When the external files are created, you can create them in parallel, which generates multiple external Data Pump files per Oracle table.
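As a sketch of a parallel, compressed unload (the table, file names, and directory object are our assumptions; BASIC compression avoids the Advanced Compression license):

```sql
-- Unload a hypothetical orders table into two Data Pump files in
-- parallel, with basic compression enabled via access parameters.
CREATE TABLE orders_unload_p
  ORGANIZATION EXTERNAL (
    TYPE ORACLE_DATAPUMP
    DEFAULT DIRECTORY dpad_dir
    ACCESS PARAMETERS (COMPRESSION ENABLED BASIC)
    LOCATION ('orders_1.dmp', 'orders_2.dmp')
  )
  PARALLEL 2
  AS SELECT * FROM orders;
```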
The following steps provide an example of using the ORACLE_DATAPUMP access driver to unload and view data. The example assumes that the directory object dpad_dir already exists and that the database user has read and write access to it.
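A minimal sketch of this flow follows; the orders table, its order_date column, and the file name are illustrative assumptions, not part of the original example:

```sql
-- Unload archive-eligible rows into a Data Pump file by creating
-- an external table with an AS SELECT clause.
CREATE TABLE orders_archive
  ORGANIZATION EXTERNAL (
    TYPE ORACLE_DATAPUMP
    DEFAULT DIRECTORY dpad_dir
    LOCATION ('orders_archive.dmp')
  )
  AS SELECT * FROM orders
     WHERE order_date < DATE '2020-01-01';

-- View the unloaded data with standard SQL, or join it to current data.
SELECT COUNT(*) FROM orders_archive;
```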
When an UPDATE, INSERT, or DELETE statement is issued against the external table, it fails.
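For example, continuing with the hypothetical table above, a delete attempt returns an error such as the following:

```sql
DELETE FROM orders_archive WHERE order_id = 1;
-- ORA-30657: operation not supported on external organized table
```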
If the ORACLE_DATAPUMP file is shared with another Oracle database running the same version or higher as the Oracle database that created the file, that database can access the data using SQL. You can copy or move the Oracle Data Pump file and use it for another external table in the same database or in a different database. Note that when you create an external table over an existing file, there is no AS SELECT clause in the CREATE TABLE statement. This example assumes that the directory object dpad_dir already exists, that the database user has read and write access to it, and that the ORACLE_DATAPUMP file has been copied to another Amazon RDS for Oracle instance. After creating the external table, you can read the data directly from the ORACLE_DATAPUMP file (outside of the database).
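A sketch of accessing the copied file follows; the column list mirrors our hypothetical orders table and must match the structure stored in the Data Pump file:

```sql
-- Create an external table over the existing file; note there is
-- no AS SELECT clause. dpad_dir must map to the directory holding
-- the copied orders_archive.dmp file.
CREATE TABLE orders_archive_read (
    order_id    NUMBER,
    customer_id NUMBER,
    order_date  DATE
  )
  ORGANIZATION EXTERNAL (
    TYPE ORACLE_DATAPUMP
    DEFAULT DIRECTORY dpad_dir
    LOCATION ('orders_archive.dmp')
  );

-- The data is read directly from the file, outside of the database.
SELECT COUNT(*) FROM orders_archive_read;
```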
This approach allows you to archive data in ORACLE_DATAPUMP files. These files can only be read via SQL tools that are connected to an Oracle database. You can create the Oracle Data Pump file and then remove the archived data from the source database.
There are two considerations with this solution. First, you must connect to an Oracle database and use Oracle external tables to access the data in ORACLE_DATAPUMP files; Amazon Athena cannot access archive data stored in ORACLE_DATAPUMP files. Second, each Data Pump file or set of files (if created in parallel) contains data for only a single table. If you need to archive data from 50 tables, a minimum of 50 Data Pump files and 100 CREATE TABLE statements (one to create each file and one to access it) are required. This approach is not recommended for archiving data from dozens or hundreds of Oracle database tables.
Refer to The ORACLE_DATAPUMP Access Driver for more information.
Oracle In-Database Archiving (IDA) overview
Oracle IDA allows you to archive rows within a table by marking them as inactive; this is considered a soft delete. Archived rows become invisible to SQL queries and DML statements; however, you can reveal them as needed. Data is not physically moved, so this process does not reduce the size of the database, and it does not have an interface (SQL must be submitted).
You can enable Oracle IDA when creating a table or by altering an existing table. If needed, you can make the archived rows visible for compliance purposes by setting a session parameter.
To manage Oracle IDA for a table, you must enable ROW ARCHIVAL for the table and manipulate its hidden ORA_ARCHIVE_STATE column. When ROW ARCHIVAL is enabled, all existing rows remain visible, and newly inserted rows are always visible. To archive rows, issue an UPDATE statement that sets the ORA_ARCHIVE_STATE hidden column to a value other than 0. Optionally, you can set the ROW ARCHIVAL VISIBILITY session parameter to either ACTIVE or ALL. If it is set to ALL, SQL queries and DML statements access both current and archived rows.
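For example, the session-level toggle looks like this:

```sql
-- See both current and archived rows in this session.
ALTER SESSION SET ROW ARCHIVAL VISIBILITY = ALL;

-- Return to the default behavior: active rows only.
ALTER SESSION SET ROW ARCHIVAL VISIBILITY = ACTIVE;
```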
These inactive rows remain in the database, and you can optimize them using compression. To compress the archived data, you can partition the associated table on the ORA_ARCHIVE_STATE column and then compress the partitions containing the archive data. The current data is left uncompressed because it's frequently accessed, so compression doesn't impact performance. Additionally, you can set archived partitions to read only to exclude them from regular Oracle Recovery Manager (RMAN) database backups. Oracle Partitioning is an option for Oracle Database Enterprise Edition (EE) and requires licensing.
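A minimal sketch of this layout follows, assuming both Oracle Partitioning and Advanced Compression licenses; the table and partition names are our own:

```sql
-- Partition on the hidden ORA_ARCHIVE_STATE column so archived
-- rows land in their own compressed partition.
CREATE TABLE orders_ida (
    order_id   NUMBER,
    order_date DATE
  )
  ROW ARCHIVAL
  PARTITION BY LIST (ORA_ARCHIVE_STATE) (
    PARTITION p_active   VALUES ('0'),
    PARTITION p_archived VALUES (DEFAULT) ROW STORE COMPRESS ADVANCED
  );

-- Optionally exclude the archived data from regular backups.
ALTER TABLE orders_ida MODIFY PARTITION p_archived READ ONLY;
```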
With Oracle IDA, you can store more data for a longer period of time within a single database, without compromising application performance. You can compress archived data to help improve backup performance, and you can defer updates to archived data during application upgrades to improve upgrade performance.
The following SQL statements demonstrate how to enable Oracle IDA for a single table:
Create a table for the IDA demo.
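The snippets that follow each step are illustrative sketches; the ida_demo table, its columns, and the seeded rows are our assumptions, not the original post's sample code.

```sql
CREATE TABLE ida_demo (
  id         NUMBER,
  created_at DATE
);

-- Seed ten rows with dates spread over the last ten days.
INSERT INTO ida_demo
  SELECT LEVEL, TRUNC(SYSDATE) - LEVEL
  FROM   dual
  CONNECT BY LEVEL <= 10;
COMMIT;
```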
Verify no existing hidden columns exist.
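For example (USER_TAB_COLS, unlike USER_TAB_COLUMNS, includes hidden columns):

```sql
SELECT column_name, hidden_column
FROM   user_tab_cols
WHERE  table_name = 'IDA_DEMO'
AND    hidden_column = 'YES';
-- Expect no rows before row archival is enabled.
```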
Verify the row count is still the same.
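Continuing with our hypothetical demo data:

```sql
SELECT COUNT(*) FROM ida_demo;
-- Expect 10, matching the seeded rows.
```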
The following query fails because Oracle In-Database Archiving is not enabled.
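A sketch of such a query; referencing ORA_ARCHIVE_STATE before enabling row archival raises an invalid-identifier error:

```sql
SELECT COUNT(*)
FROM   ida_demo
WHERE  created_at < TRUNC(SYSDATE)
AND    ora_archive_state = '0';
-- ORA-00904: "ORA_ARCHIVE_STATE": invalid identifier
```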
Enable In-Database archiving.
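```sql
ALTER TABLE ida_demo ROW ARCHIVAL;
```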
Show all Oracle In-Database archiving tables owned by the current user.
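One way to do this (a sketch) is to look for the hidden column that ROW ARCHIVAL adds:

```sql
-- Tables with row archival enabled expose the hidden
-- ORA_ARCHIVE_STATE column in the data dictionary.
SELECT DISTINCT table_name
FROM   user_tab_cols
WHERE  column_name = 'ORA_ARCHIVE_STATE';
```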
Show that enabling In-Database archiving creates a hidden column.
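```sql
SELECT column_name, data_type, hidden_column
FROM   user_tab_cols
WHERE  table_name = 'IDA_DEMO'
AND    column_name = 'ORA_ARCHIVE_STATE';
-- Returns one row: the hidden VARCHAR2 column added by ROW ARCHIVAL.
```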
Show all unique values of the ora_archive_state column.
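```sql
SELECT ora_archive_state, COUNT(*)
FROM   ida_demo
GROUP  BY ora_archive_state;
-- '0' marks active rows; any other value marks archived rows.
```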
Once again, show the record count of all rows older than today.
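Continuing the sketch with our demo data:

```sql
SELECT COUNT(*)
FROM   ida_demo
WHERE  created_at < TRUNC(SYSDATE);
-- With the default visibility (ACTIVE), this counts only
-- non-archived rows; all 10 demo rows are still active here.
```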
Archive all rows older than today.
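A sketch of the archiving UPDATE against our hypothetical demo table:

```sql
UPDATE ida_demo
SET    ora_archive_state = '1'   -- any value other than '0' archives the row
WHERE  created_at < TRUNC(SYSDATE);
COMMIT;

-- With visibility set to ACTIVE, the archived rows are now hidden.
SELECT COUNT(*) FROM ida_demo;
-- Expect 0 with this demo data.
```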