AWS Database Blog
Use the DBMS_CLOUD package in Amazon RDS Custom for Oracle for direct Amazon S3 integration
Amazon Relational Database Service (Amazon RDS) Custom is a managed database service for legacy, custom, and packaged applications that require access to the underlying operating system (OS) and database environment. Amazon RDS Custom automates database administration and operational tasks while allowing you as administrator to access and customize the underlying OS and database to support third-party applications that require privileged access.
While working with our customers, we realized that at times they need to directly access data stored in Amazon Simple Storage Service (Amazon S3) from Oracle Database using SQL or PL/SQL. Beginning with Oracle Database 19c (19.9 onwards) and Oracle Database 21c (21.3 onwards), you can install the DBMS_CLOUD package, which provides comprehensive support for working with data in object storage.
In this post, we demonstrate how to use the DBMS_CLOUD package to transfer files between S3 buckets and directories in an RDS Custom for Oracle database. We also show how you can access data in Amazon S3 directly using Oracle features such as external tables and hybrid partitioned tables. The features provided by DBMS_CLOUD can vary between Oracle releases, so pay close attention to the steps in this post and make sure you reference DBMS_CLOUD in the Oracle Database 19c documentation. To avoid confusion, the option discussed in this post is for RDS Custom for Oracle, not for RDS for Oracle; Amazon RDS for Oracle offers its own S3 integration feature.
Solution overview
The high-level steps to implement this solution are as follows:
- Install the DBMS_CLOUD package and configure access controls.
- Configure credentials for the DBMS_CLOUD package.
- Create a sample schema for demonstration purposes.
- Upload data to an S3 bucket using dbms_cloud.put_object.
- Download data from the S3 bucket using dbms_cloud.get_object.
- Load data from the S3 bucket directly using dbms_cloud.copy_data.
- Access external data on the S3 bucket using dbms_cloud.create_external_table.
- Create a hybrid partitioned table with data in the S3 bucket.
- Delete data in the S3 bucket using dbms_cloud.delete_object.
- Use DBMS_CLOUD on RDS Custom for Oracle read replicas.
The following diagram illustrates the solution architecture.
Prerequisites
This post assumes you have the following prerequisites in place prior to installing and using the DBMS_CLOUD package:
- Oracle 19c (19.9 onwards) multi-tenant database running on RDS Custom for Oracle
- An S3 bucket
- An AWS Identity and Access Management (IAM) user with access key enabled
The following multitenant RDS Custom for Oracle instance, orcl, was created for this post. As of this writing, DBMS_CLOUD supports only the multitenant container database (CDB) architecture. For more information on RDS Custom for Oracle database creation, refer to Working with RDS Custom for Oracle. The database details are as follows:
- Database version: 19.16 (July 2022 RU)
- CDB name: <your CDB name>
- PDB name: <your PDB name>
- Admin user: admin
- Database endpoint: <your RDS endpoint>
- Region: <your region>
The S3 bucket <your bucket name> has been created for this post with Amazon S3 server-side encryption in the <your region> Region. For more information on bucket creation, refer to Step 1: Create your first S3 bucket.
To access the S3 bucket from the database, you must have the proper IAM credentials and an IAM policy. Create an IAM user with an IAM policy attached to it. For more information, refer to Creating an IAM user in your AWS account and Managing access keys for IAM users. The user and policy details used in this post are as follows:
- IAM user name: <your iam user>
- Access key ID: <your access key id>
- Secret access key: <your secret access key>
- IAM policy: see the example policy following this list
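The exact policy used for the original setup isn't reproduced here; the following is a minimal example policy that grants the bucket-level and object-level S3 permissions the examples in this post require. Scope it to your own bucket before attaching it to the IAM user.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<your bucket name>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<your bucket name>/*"
    }
  ]
}
```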
This solution involves creating and using AWS resources, so it will incur costs on your account, such as the RDS Custom for Oracle instance and database storage costs, Amazon S3 storage and data transfer costs, and more. Refer to AWS Pricing for more information.
Install the DBMS_CLOUD package and configure access controls
As of this writing, the DBMS_CLOUD package must be deployed manually by running scripts provided by Oracle with the sysdba privilege, because it isn't natively integrated by default. We cover the installation and configuration required for the feature demonstrations in this post. Refer to the My Oracle Support (MOS) note How To Setup And Use DBMS_CLOUD Package (Doc ID 2748362.1) for the latest installation and configuration instructions.
Although there is no database or listener restart required for this step, we recommend you pause RDS Custom automation. The pause helps prevent the modifications from interfering with RDS Custom automation.
Prepare the schema creation script
This step creates the user C##CLOUD$SERVICE and grants appropriate privileges. The user is locked by default so that no connections are directly made as this user. Save the following script into a file named dbms_cloud_install.sql and upload it to the Oracle software owner's home directory on your RDS Custom database server. In this post, the directory is /home/rdsdb.
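The script contents are maintained in the MOS note and vary by release, so they aren't reproduced here. The following sketch only illustrates the general shape of the wrapper script (creating the locked common owner and invoking the Oracle-supplied installation scripts); copy the authoritative version from Doc ID 2748362.1.

```sql
-- Sketch only: use the exact script from MOS Doc ID 2748362.1.
@$ORACLE_HOME/rdbms/admin/sqlsessstart.sql
set verify off
define username='C##CLOUD$SERVICE'
create user &username no authentication account lock;
-- ...grants to &username and calls to the Oracle-supplied DBMS_CLOUD
-- installation scripts follow, exactly as listed in the MOS note...
@$ORACLE_HOME/rdbms/admin/sqlsessend.sql
```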
Run the install script
Replace <your_sys_password> in the following command with your actual SYS password, then run it.
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u sys/<your_sys_password> --force_pdb_mode 'READ WRITE' -b dbms_cloud_install -d /home/rdsdb -l /home/rdsdb dbms_cloud_install.sql
The command runs the dbms_cloud_install.sql script in both the ROOT container and the PDBs of your Amazon RDS Custom for Oracle database. You can retrieve your SYS password from AWS Secrets Manager; the secret name for your SYS user starts with do-not-delete-rds-custom-db-ABCDEFGHIJKLMNOPQRS0123456-, in which db-ABCDEFGHIJKLMNOPQRS0123456 is the RDS resource ID.
After the command has completed successfully, check the logs for any errors. Log in to the CDB ROOT container using the SYS user to verify the package status:
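As a quick check (a minimal example query against standard data dictionary views, not the original post's exact query), confirm the package objects are valid in every container:

```sql
-- Run as SYS in the CDB root; the DBMS_CLOUD objects should be VALID.
select con_id, owner, object_name, object_type, status
  from cdb_objects
 where object_name like 'DBMS_CLOUD%'
 order by con_id, object_name;
```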
Create an SSL wallet with certificates
DBMS_CLOUD interacts with Amazon S3 using the HTTPS protocol, so a wallet with these trusted certificates installed is required. Go to Amazon Trust Services to download the Root CAs with CN names Amazon Root CA 1 through Amazon Root CA 4, and save them to the Oracle software owner's home directory ($HOME; the default value is /home/rdsdb) on your RDS Custom database server.
If you have already created an Oracle wallet, skip this step and proceed to the next step to import these Root CAs into the wallet. Otherwise, run the following commands to create the wallet using orapki.
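A minimal example, assuming /home/rdsdb/ssl_wallet as the wallet directory:

```sh
# Create the wallet directory and an auto-login wallet; orapki prompts
# for the wallet password.
mkdir -p /home/rdsdb/ssl_wallet
orapki wallet create -wallet /home/rdsdb/ssl_wallet -auto_login
```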
You will be prompted for a wallet password. Note down the password; you will need it to modify the wallet in subsequent steps.
Run the following commands to import these Amazon Root CAs into the wallet. Enter the same wallet password when prompted.
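A sketch, assuming the downloaded certificates are named AmazonRootCA1.pem through AmazonRootCA4.pem in $HOME and the wallet directory from the previous step:

```sh
cd $HOME
for ca in AmazonRootCA1.pem AmazonRootCA2.pem AmazonRootCA3.pem AmazonRootCA4.pem; do
  # Each invocation prompts for the wallet password.
  orapki wallet add -wallet /home/rdsdb/ssl_wallet -trusted_cert -cert $ca
done
```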
You can verify these imported certificates as follows:
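For example:

```sh
# Lists the trusted certificates now present in the wallet.
orapki wallet display -wallet /home/rdsdb/ssl_wallet
```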
Configure the Oracle database to use the SSL wallet
In order for the Oracle database to use this newly created SSL wallet, you need to add the following entries to your $ORACLE_HOME/network/admin/sqlnet.ora file:
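A minimal example, assuming the wallet directory created earlier:

```
# sqlnet.ora entries; adjust DIRECTORY to your wallet location.
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /home/rdsdb/ssl_wallet)
    )
  )
```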
When an RDS Custom for Oracle database is restored from a snapshot or restored to a point in time, the content of SQLNET.ORA is reset to the default. In these scenarios, follow the instructions in this step to reconfigure SQLNET.ORA.
Configure database ACEs for DBMS_CLOUD
By default, Oracle Database doesn't allow outgoing communication; this is controlled by access control entries (ACEs). Save the following script in a file named setup_aces.sql and upload it to the Oracle software owner's home directory on your RDS Custom database server. Make sure you modify sslwalletdir to match your wallet directory.
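The script from the MOS note isn't reproduced here; the following sketch (the wallet path and port range are assumptions) follows the same pattern, granting the C##CLOUD$SERVICE user outbound HTTP access and wallet usage:

```sql
-- setup_aces.sql (sketch): run as SYS in the CDB root.
define sslwalletdir=/home/rdsdb/ssl_wallet

begin
  -- Allow outbound HTTPS calls by the DBMS_CLOUD schema owner.
  dbms_network_acl_admin.append_host_ace(
    host       => '*',
    lower_port => 443,
    upper_port => 443,
    ace        => xs$ace_type(
                    privilege_list => xs$name_list('http', 'http_proxy'),
                    principal_name => 'C##CLOUD$SERVICE',
                    principal_type => xs_acl.ptype_db));

  -- Allow the schema owner to use the SSL wallet.
  dbms_network_acl_admin.append_wallet_ace(
    wallet_path => 'file:&sslwalletdir',
    ace         => xs$ace_type(
                     privilege_list => xs$name_list('use_client_certificates', 'use_passwords'),
                     principal_name => 'C##CLOUD$SERVICE',
                     principal_type => xs_acl.ptype_db));
end;
/
```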
Run the script in the CDB ROOT container with the SYS user:
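For example, on the database host as the Oracle software owner:

```sh
sqlplus / as sysdba @/home/rdsdb/setup_aces.sql
```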
Configure credentials for the DBMS_CLOUD package
In this section, we create a role (<your db role name>) and a user (<your db user name>) with minimal privileges sufficient to demonstrate DBMS_CLOUD functionality. We also configure the user name and password that DBMS_CLOUD uses to access Amazon S3 with valid IAM credentials.
Create the database role and user
Create the database role and user with the following SQL commands in the pluggable database (PDB) <your PDB name>, using a privileged administrator user such as ADMIN or SYSTEM. Using a role makes privilege and access control management simpler.
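A minimal sketch, using demorole and demouser as stand-ins for <your db role name> and <your db user name>; the exact privilege set is an assumption based on what the later examples need:

```sql
-- Role with the minimum privileges used by the examples in this post.
create role demorole;
grant create session to demorole;
grant create table to demorole;

-- Allow the role to run the DBMS_CLOUD package owned by C##CLOUD$SERVICE.
grant execute on c##cloud$service.dbms_cloud to demorole;

-- Demo user; replace the password with your own.
create user demouser identified by "<your password>" quota unlimited on users;
grant demorole to demouser;

-- Needed by the file transfer examples that use the DATA_PUMP_DIR directory.
grant read, write on directory data_pump_dir to demouser;
```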
Configure ACEs for the demorole in the PDB
DBMS_CLOUD is an invoker's rights package. You need to enable ACEs for the role as you did previously for the user C##CLOUD$SERVICE. Run the following commands with a privileged administrator user (SYSTEM or SYS) in the <your PDB name> pluggable database:
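A sketch following the same ACE pattern as before; the role name (DEMOROLE) and wallet path are assumptions:

```sql
define sslwalletdir=/home/rdsdb/ssl_wallet

begin
  -- Allow outbound HTTPS calls to Amazon S3 endpoints for the role.
  dbms_network_acl_admin.append_host_ace(
    host       => '*.amazonaws.com',
    lower_port => 443,
    upper_port => 443,
    ace        => xs$ace_type(
                    privilege_list => xs$name_list('http'),
                    principal_name => 'DEMOROLE',
                    principal_type => xs_acl.ptype_db));

  -- Allow the role to use the SSL wallet.
  dbms_network_acl_admin.append_wallet_ace(
    wallet_path => 'file:&sslwalletdir',
    ace         => xs$ace_type(
                     privilege_list => xs$name_list('use_client_certificates', 'use_passwords'),
                     principal_name => 'DEMOROLE',
                     principal_type => xs_acl.ptype_db));
end;
/
```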
Create DBMS_CLOUD credentials
To access data in the S3 bucket, which is not publicly accessible, you need a credential to authenticate with Amazon S3. Log in to <your PDB name> with the database user <your db user name> and run the following script to create a credential based on the access key ID and secret access key of the IAM user <your iam user>:
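A minimal example; the credential name S3_CRED is an assumption that the later snippets reuse:

```sql
begin
  dbms_cloud.create_credential(
    credential_name => 'S3_CRED',
    username        => '<your access key id>',
    password        => '<your secret access key>');
end;
/
```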
After the credential is created, you can use the following query to list all credentials:
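For example:

```sql
select credential_name, username, enabled
  from user_credentials;
```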
To verify the setup you have done so far, run the following query. Here, s3.ap-southeast-1.amazonaws.com is the endpoint for the ap-southeast-1 Region; find the S3 endpoint for your Region and replace it accordingly.
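A minimal example using the documented dbms_cloud.list_objects function:

```sql
select object_name, bytes
  from dbms_cloud.list_objects(
         'S3_CRED',
         'https://s3.ap-southeast-1.amazonaws.com/<your bucket name>/');
```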
Because our S3 bucket is empty, it returns “no rows selected” without other error messages.
Create a sample schema
Complete the following steps to create a sample schema for demonstration purposes:
- Connect with the database user <your db user name> to the <your PDB name> database:
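For example, with SQL*Plus (the connection details are placeholders):

```sh
sqlplus demouser/"<your password>"@<your RDS endpoint>:1521/<your PDB name>
```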
- Run the following script to create the emp table with sample data. We keep the dataset small to produce brief query output in our examples.
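A minimal sketch; the exact table shape is an assumption modeled on the classic EMP table:

```sql
create table emp (
  empno  number(4) primary key,
  ename  varchar2(10),
  job    varchar2(9),
  sal    number(7,2),
  deptno number(2)
);
```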
- Use the following sample dataset:
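Illustrative rows (the original post's dataset isn't reproduced here):

```sql
insert into emp values (7369, 'SMITH', 'CLERK',     800, 20);
insert into emp values (7499, 'ALLEN', 'SALESMAN', 1600, 30);
insert into emp values (7521, 'WARD',  'SALESMAN', 1250, 30);
insert into emp values (7566, 'JONES', 'MANAGER',  2975, 20);
commit;
```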
- Write the data into a CSV file named emp.csv in the DATA_PUMP_DIR directory:
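A minimal UTL_FILE sketch (assumes the read/write grant on DATA_PUMP_DIR from earlier):

```sql
declare
  f utl_file.file_type;
begin
  f := utl_file.fopen('DATA_PUMP_DIR', 'emp.csv', 'w');
  for r in (select empno, ename, job, sal, deptno from emp order by empno) loop
    utl_file.put_line(f, r.empno || ',' || r.ename || ',' || r.job || ',' ||
                         r.sal || ',' || r.deptno);
  end loop;
  utl_file.fclose(f);
end;
/
```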
Upload data to the S3 bucket using dbms_cloud.put_object
You can use the dbms_cloud.put_object procedure to upload data to your S3 bucket.
- Run the following PL/SQL statements to upload the file emp.csv from the RDS Custom database server to the S3 bucket at the path s3://<your bucket name>/orcl/emp.csv:
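A minimal example; the path-style endpoint URI is an assumption, so adjust it to your Region and bucket:

```sql
begin
  dbms_cloud.put_object(
    credential_name => 'S3_CRED',
    object_uri      => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/emp.csv',
    directory_name  => 'DATA_PUMP_DIR',
    file_name       => 'emp.csv');
end;
/
```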
- List the objects uploaded to the S3 bucket with the following SQL statements:
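For example:

```sql
select object_name, bytes, last_modified
  from dbms_cloud.list_objects(
         'S3_CRED',
         'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/');
```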
- You can also write BLOB data directly to the S3 bucket after processing it inside the database, as demonstrated in the following example:
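A sketch; the object name mylob.dat and its contents are illustrative:

```sql
declare
  l_data blob;
begin
  -- Build a small BLOB in the database, then write it straight to Amazon S3.
  l_data := to_blob(utl_raw.cast_to_raw('Hello from DBMS_CLOUD'));
  dbms_cloud.put_object(
    credential_name => 'S3_CRED',
    object_uri      => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/mylob.dat',
    contents        => l_data);
end;
/
```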
Download data from the S3 bucket using dbms_cloud.get_object
You can use the dbms_cloud.get_object procedure to download data from the S3 bucket to a directory on the RDS Custom for Oracle database server, or to read the data directly from Amazon S3 and return a BLOB to the database. The procedure has other advanced parameters, including offset and compression options; refer to GET_OBJECT Procedure and Function for more details.
- Run the following PL/SQL statements to download the file and save it as emp2.csv on the database server:
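A minimal example:

```sql
begin
  dbms_cloud.get_object(
    credential_name => 'S3_CRED',
    object_uri      => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/emp.csv',
    directory_name  => 'DATA_PUMP_DIR',
    file_name       => 'emp2.csv');
end;
/
```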
- Run the following SQL statement to read the mylob.dat file that we created in the S3 bucket earlier:
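A sketch that reads the object as a BLOB and renders the first bytes as text:

```sql
select utl_raw.cast_to_varchar2(
         dbms_lob.substr(
           dbms_cloud.get_object(
             'S3_CRED',
             'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/mylob.dat'),
           100, 1)) as contents
  from dual;
```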
Load data from the S3 bucket directly using dbms_cloud.copy_data
You can use the dbms_cloud.copy_data procedure to load data from the S3 bucket into existing Oracle tables. In this section, we use the previous uploaded file emp.csv
as a source, which is in CSV format. For Avro and Parquet file formats, refer to COPY_DATA Procedure for Avro or Parquet Files.
- Create an empty table called emp_copy prior to data loading:
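Matching the assumed emp table shape:

```sql
create table emp_copy (
  empno  number(4),
  ename  varchar2(10),
  job    varchar2(9),
  sal    number(7,2),
  deptno number(2)
);
```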
- Run the following PL/SQL statements to load data from the S3 bucket into the table emp_copy:
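A minimal example using the CSV format option:

```sql
begin
  dbms_cloud.copy_data(
    table_name      => 'EMP_COPY',
    credential_name => 'S3_CRED',
    file_uri_list   => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/emp.csv',
    format          => json_object('type' value 'csv'));
end;
/
```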
Multiple format options are available to meet various file specifications and data processing needs. Refer to DBMS_CLOUD Package Format Options for more information.
During the data load, logs and bad data are written to the COPY$N_LOG and COPY$N_BAD tables, in which N is the unique number for this data load.
- Query the table to verify the data loaded:
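For example:

```sql
select * from emp_copy order by empno;
```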
Access external data on the S3 bucket using dbms_cloud.create_external_table
You can use the procedure dbms_cloud.create_external_table to create an external table in the Oracle database that reads data directly from the S3 bucket. We continue using emp.csv as the example; for Avro and Parquet file formats, refer to CREATE_EXTERNAL_TABLE Procedure for Avro or Parquet Files. You can create external partitioned tables to use the Oracle partitioning capability if you have multiple data files. Take note that table partitioning is part of the Oracle Partitioning option, a separately licensed option of Oracle Enterprise Edition.
- Run the following PL/SQL statements to create the external table emp_ext. You can specify multiple source files as a comma-delimited list in file_uri_list. You can also use wildcards in the file names in your URIs: the character * matches multiple characters, and the character ? matches a single character.
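A minimal example with the assumed column list:

```sql
begin
  dbms_cloud.create_external_table(
    table_name      => 'EMP_EXT',
    credential_name => 'S3_CRED',
    file_uri_list   => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/emp.csv',
    column_list     => 'empno number(4), ename varchar2(10), job varchar2(9), sal number(7,2), deptno number(2)',
    format          => json_object('type' value 'csv'));
end;
/
```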
- Query the external table to verify the data:
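For example:

```sql
select * from emp_ext order by empno;
```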
Create a hybrid partitioned table with data in the S3 bucket
With the hybrid partitioned table feature, you can combine historical data stored in Amazon S3 for cost-efficient long-term retention with current data stored in the Oracle database, all in a single partitioned table. Take note that hybrid partitioned tables are part of the Oracle Partitioning option.
- Run the following PL/SQL statements to create the hybrid partitioned table emp_hpt:
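A sketch using dbms_cloud.create_hybrid_part_table; the partition boundaries are assumptions chosen so that the sample rows in Amazon S3 land in the external partition:

```sql
begin
  dbms_cloud.create_hybrid_part_table(
    table_name      => 'EMP_HPT',
    credential_name => 'S3_CRED',
    format          => json_object('type' value 'csv'),
    column_list     => 'empno number(4), ename varchar2(10), job varchar2(9), sal number(7,2), deptno number(2)',
    partitioning_clause =>
      'partition by range (empno)
         (partition p_s3 values less than (8000)
            external location (''https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/emp.csv''),
          partition p_db values less than (maxvalue))');
end;
/
```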
- Query the table to verify the record:
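For example:

```sql
select empno, ename, job, sal, deptno from emp_hpt order by empno;
```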
- Insert a new record into the partition whose data is stored inside the database:
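An illustrative row; empno 9999 falls in the database-resident partition of the sketch above:

```sql
insert into emp_hpt values (9999, 'KING', 'PRESIDENT', 5000, 10);
commit;
```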
- Query the table again to confirm it contains both records from the S3 bucket and database:
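The same query should now return the rows from the S3-backed partition plus the newly inserted row:

```sql
select empno, ename, job, sal, deptno from emp_hpt order by empno;
```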
Delete data in the S3 bucket using dbms_cloud.delete_object
To delete files from the S3 bucket, you can use the dbms_cloud.delete_object procedure to delete specific objects from the object storage.
- Drop the tables that use emp.csv as their external data source:
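For example:

```sql
drop table emp_ext;
drop table emp_hpt;
```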
- Delete emp.csv and mylob.dat from the S3 bucket:
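A minimal example:

```sql
begin
  dbms_cloud.delete_object(
    credential_name => 'S3_CRED',
    object_uri      => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/emp.csv');
  dbms_cloud.delete_object(
    credential_name => 'S3_CRED',
    object_uri      => 'https://s3.<your region>.amazonaws.com/<your bucket name>/orcl/mylob.dat');
end;
/
```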
Use DBMS_CLOUD on RDS Custom for Oracle read replicas
With Oracle Active Data Guard, you can open RDS Custom for Oracle replica databases in read-only mode and use them to run reporting applications, ad hoc queries, and data extraction jobs to offload your primary database. Oracle Active Data Guard is a separately licensed option on Oracle Enterprise Edition.
In the previous steps for DBMS_CLOUD package installation, two steps made changes outside the database. These changes are not replicated to replicas by Data Guard redo transport services:
- Create an SSL wallet with certificates
- Configure the Oracle database to use the SSL wallet
For your existing read replicas, you must follow the same steps to configure the SSL wallet and the Oracle database to use the wallet. You can either create a new SSL wallet and import the Root CA certificates, or copy the SSL wallet folder from the primary database.
For read replicas created after the DBMS_CLOUD package is installed on the primary database, you only need to modify the SQLNET.ORA file on the replicas to configure the Oracle database to use the wallet.
You won’t be able to create an external table or hybrid partition table on the read replicas, because databases are in read-only mode. You can create these tables in the primary database and query them from read replicas to access data.
Clean up
To avoid ongoing costs, delete the resources that you created as part of this post.
Conclusion
In this post, we showed you how to install the DBMS_CLOUD package and configure it so that a least-privileged database user can access data in an S3 bucket. With this package, you can perform common operations on the S3 bucket, such as listing, uploading, downloading, and deleting files, directly from your Oracle database. We also showed how to copy data from the S3 bucket into an existing table, and how to access data in the S3 bucket through an external table or hybrid partitioned table.
If you have any comments or questions, leave them in the comments section.
About the Author
Donghua Luo is a Senior RDS Specialist Solutions Architect. He works with AWS customers to migrate and achieve higher flexibility, scale, and resiliency with database services in the AWS Cloud.