AWS Database Blog

Integrating Amazon RDS for Oracle with Amazon S3 using the S3_INTEGRATION option

Amazon RDS for Oracle gives you the full benefits of a managed service solution. You can use the lift-and-shift approach to migrate your legacy Oracle database to Amazon RDS for Oracle and, as a result, reduce the need to refactor and change existing application components.

Data warehouse (DW) extraction is an integral part of most database environments. For on-premises databases, it is common practice to keep the extracted files in a shared file system, such as a Network File System (NFS) mount, between the database host and the DW server.

When you migrate an on-premises Oracle database that uses shared file systems for DW loads to Amazon RDS for Oracle, the existing shared transfer mechanisms require extra attention for a smooth transition. With that in mind, AWS enhanced Amazon RDS for Oracle in February 2019 with Amazon S3 integration. This option enables seamless integration of an Amazon RDS for Oracle database with Amazon Simple Storage Service (Amazon S3).

This post discusses Amazon S3 integration as an alternative to a shared file system between an application host and the Amazon RDS for Oracle host, and how you can use external tables to access data stored in Amazon S3 as an alternate solution. This post uses an Oracle database running on an Amazon EC2 instance in place of an on-premises source Oracle database.

Prerequisites

Make sure that your Amazon RDS for Oracle instance has sufficient space to accommodate your database as well as the additional files used to demonstrate uploading and downloading files from Amazon S3.

This post uses an Amazon S3 bucket that the RDS instance can access through an IAM role.
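
If you still need to create this role, the following AWS CLI sketch shows one way to set it up. The bucket name rdsarchivedata matches the one used by the scripts later in this post, and the role name rds-s3-integration-role matches the role attached to the instance later; the policy name and file paths are illustrative.

$ cat > /tmp/rds-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rds.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

$ aws iam create-role \
    --role-name rds-s3-integration-role \
    --assume-role-policy-document file:///tmp/rds-trust.json

# Grant the role access to the bucket used in this post
$ cat > /tmp/rds-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::rdsarchivedata",
        "arn:aws:s3:::rdsarchivedata/*"
      ]
    }
  ]
}
EOF

$ aws iam put-role-policy \
    --role-name rds-s3-integration-role \
    --policy-name rds-s3-integration-policy \
    --policy-document file:///tmp/rds-s3-policy.json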

Amazon S3 integration as an alternative to shared file systems

Many Oracle database applications use flat files to load data into the database. These files are generally stored in a shared file system volume that is mounted on both the application and Oracle database server.

Amazon S3 integration provides a secure way to use an S3 bucket, which you can access from both Amazon RDS for Oracle and other application hosts, to share files as an alternative to a shared file system. Amazon S3 is a durable and reliable object storage service, and its lifecycle rules help you save costs by automating object transitions from one storage class to another.
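
For example, the following sketch applies a minimal lifecycle rule to the bucket used later in this post; the rule ID and the 30-day transition to the S3 Standard-IA storage class are illustrative values.

$ cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-extracted-files",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" }
      ]
    }
  ]
}
EOF

$ aws s3api put-bucket-lifecycle-configuration \
    --bucket rdsarchivedata \
    --lifecycle-configuration file:///tmp/lifecycle.json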

Solution overview

This post demonstrates loading a file from an application server into database landing tables. The landing tables are later transformed, and the data is saved as an output flat file on the application server's file system via an S3 bucket.

The solution contains the following steps:

  1. Configure Amazon S3 integration on the Amazon RDS for Oracle database.
  2. Configure an IAM role in the Amazon RDS for Oracle database.
  3. Load data into a table using SQL*Loader (sqlldr).
  4. Modify the data in the Oracle table.
  5. Create a text file on local Amazon RDS for Oracle storage.
  6. Verify that the text file has the correct data.
  7. Execute a script to upload the text file from local storage to an S3 bucket and download the file from the S3 bucket to the EC2 instance file system.
  8. Verify that the file downloaded correctly.

The following diagrams compare a shared file system configuration with Amazon S3 integration. The first diagram shows an on-premises solution design using a shared file system: data is received as an input file from end users and saved in the shared file system. The data is then loaded into the database, transformed, saved as an output file in the shared file system, and finally sent back to the end users.

The second diagram shows an Amazon RDS for Oracle solution design using the Amazon S3 integration option: data is received as an input file from the end users and saved in the application host's file system. The data is then loaded into the RDS for Oracle database, transformed, and saved as an output file on RDS for Oracle local storage. The output file is then copied from RDS local storage, via the S3 bucket, to the application host's file system, and the data is sent back to the end users.

Configuring Amazon S3 integration on the Amazon RDS for Oracle database

To configure your Amazon S3 integration, complete the following steps (an AWS CLI alternative follows the list):

  1. Open the Amazon RDS console.
  2. Choose Option groups.
  3. Choose Create group.
  4. For Name, enter a name for your group.
    This post enters the name S3-Integration-group.
  5. For Description, enter a brief description of the group.
    This post enters the description Integrate RDS-Oracle with S3 to download and upload files from S3 bucket.
  6. For Engine, choose your Oracle engine edition from the drop-down menu.
  7. For Major engine version, choose your Oracle database version from the drop-down menu.
  8. Choose Create.
  9. On the RDS console, choose Option groups.
  10. Choose the group you made.
  11. Choose Add option.
  12. For Option, choose S3_INTEGRATION.
  13. For Version, choose 0.
  14. For Apply Immediately, select Yes.
  15. Choose Add Option.
  16. Verify that the option group has the correct information: confirm that the correct Oracle version is listed under Option Group Properties and that the S3_INTEGRATION option is added under Options.
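
If you prefer the AWS CLI, the following sketch creates an equivalent option group and adds the S3_INTEGRATION option; the oracle-ee engine name and 12.1 major engine version are assumptions based on the 12.1.0.2 database used in this post.

$ aws rds create-option-group \
    --option-group-name s3-integration-group \
    --engine-name oracle-ee \
    --major-engine-version 12.1 \
    --option-group-description "Integrate RDS-Oracle with S3"

$ aws rds modify-option-group \
    --option-group-name s3-integration-group \
    --options-to-include OptionName=S3_INTEGRATION \
    --apply-immediately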

Adding the option group to the database

  1. On the RDS dashboard, choose your database by selecting its DB identifier.
  2. Choose Modify.
  3. Scroll down to the Database options section and, for Option group, select the S3-Integration-group option group.
  4. Scroll to the bottom of the page and choose Continue.
  5. Choose Modify DB Instance.
  6. After the instance modification completes successfully and the database is available, choose the Configuration tab to confirm that the option group is listed in the instance configuration.
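
You can also apply this modification from the AWS CLI; in the following sketch, the DB instance identifier orclauth is an assumption, so substitute your own.

$ aws rds modify-db-instance \
    --db-instance-identifier orclauth \
    --option-group-name s3-integration-group \
    --apply-immediately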

Configuring an IAM role in the Amazon RDS for Oracle database

To configure your IAM role, complete the following steps (a CLI equivalent follows the list):

  1. From the RDS console, choose your DB instance.
  2. Choose Manage IAM roles.
  3. For Add IAM roles to this instance, choose rds-s3-integration-role.
  4. For Feature, choose S3_INTEGRATION.

Loading the input data into a table using SQL*Loader

Log in to your EC2 instance as a user with access to the Oracle client binaries, such as sqlplus and sqlldr. The following is a sample input file to load into the database.

$ cat tabledataforuploadtoRDS.txt
SYS|ICOL$|TABLE|17-JAN-19|VALID|
SYS|I_USER1|INDEX|17-JAN-19|VALID|
SYS|CON$|TABLE|17-JAN-19|VALID|
SYS|UNDO$|TABLE|17-JAN-19|VALID|
SYS|C_COBJ#|CLUSTER|17-JAN-19|VALID|
SYS|I_OBJ#|INDEX|17-JAN-19|VALID|
SYS|PROXY_ROLE_DATA$|TABLE|17-JAN-19|VALID|
SYS|I_IND1|INDEX|17-JAN-19|VALID|
SYS|I_CDEF2|INDEX|17-JAN-19|VALID|
SYS|I_OBJ5|INDEX|17-JAN-19|VALID|

Log in to SQL*Plus and create the sample data load table with the following commands:

$ sqlplus oraadmin@orclauth
SQL> create table BIGTAB_FROMTEXTFILE as select owner,object_name,object_type,created,status from dba_objects where 1=2;

Table created.

SQL> desc BIGTAB_FROMTEXTFILE
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 OBJECT_NAME                               NOT NULL VARCHAR2(128)
 OBJECT_TYPE                                        VARCHAR2(23)
 CREATED                                   NOT NULL DATE
 STATUS                                             VARCHAR2(7)

Create the SQL*Loader control file (loaddata.dat) to load the data from the input file:

$ cat loaddata.dat
load data
INFILE 'tabledataforuploadtoRDS.txt'
INTO TABLE BIGTAB_FROMTEXTFILE
APPEND
FIELDS TERMINATED BY '|'
(
 OWNER,
 OBJECT_NAME,
 OBJECT_TYPE,
 CREATED,
 STATUS)

Use the following sqlldr command to load the data into the database:

$ sqlldr oraadmin@orclauth control=loaddata.dat log=loaddata.log
Password:

SQL*Loader: Release 12.1.0.2.0 - Production on Mon Jun 17 04:07:24 2019

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Path used:      Conventional
Commit point reached - logical record count 10
Commit point reached - logical record count 10

Table BIGTAB_FROMTEXTFILE:
  10 Rows successfully loaded.

Check the log file:
  loaddata.log
for more information about the load.

Log in to SQL*Plus and confirm that the data loaded successfully into the table with the following SQL command:

SQL> select * from BIGTAB_FROMTEXTFILE;

OWNER                OBJECT_NAME                    OBJECT_TYPE     CREATED   STATUS
-------------------- ------------------------------ --------------- --------- -------
SYS                  ICOL$                          TABLE           17-JAN-19 VALID
SYS                  I_USER1                        INDEX           17-JAN-19 VALID
SYS                  CON$                           TABLE           17-JAN-19 VALID
SYS                  UNDO$                          TABLE           17-JAN-19 VALID
SYS                  C_COBJ#                        CLUSTER         17-JAN-19 VALID
SYS                  I_OBJ#                         INDEX           17-JAN-19 VALID
SYS                  PROXY_ROLE_DATA$               TABLE           17-JAN-19 VALID
SYS                  I_IND1                         INDEX           17-JAN-19 VALID
SYS                  I_CDEF2                        INDEX           17-JAN-19 VALID
SYS                  I_OBJ5                         INDEX           17-JAN-19 VALID

Modifying the data in the Oracle table

Update the owner to SYSTEM and the status to INVALID with the following SQL*Plus command:

SQL> update BIGTAB_FROMTEXTFILE set owner='SYSTEM',status='INVALID';

10 rows updated.

SQL> select * from BIGTAB_FROMTEXTFILE;

OWNER                OBJECT_NAME                    OBJECT_TYPE     CREATED   STATUS
-------------------- ------------------------------ --------------- --------- -------
SYSTEM               ICOL$                          TABLE           17-JAN-19 INVALID
SYSTEM               I_USER1                        INDEX           17-JAN-19 INVALID
SYSTEM               CON$                           TABLE           17-JAN-19 INVALID
SYSTEM               UNDO$                          TABLE           17-JAN-19 INVALID
SYSTEM               C_COBJ#                        CLUSTER         17-JAN-19 INVALID
SYSTEM               I_OBJ#                         INDEX           17-JAN-19 INVALID
SYSTEM               PROXY_ROLE_DATA$               TABLE           17-JAN-19 INVALID
SYSTEM               I_IND1                         INDEX           17-JAN-19 INVALID
SYSTEM               I_CDEF2                        INDEX           17-JAN-19 INVALID
SYSTEM               I_OBJ5                         INDEX           17-JAN-19 INVALID

SQL> commit;

Commit complete.

Creating a text file on local Amazon RDS for Oracle storage

Use the following SQL*Plus command to create a local directory on the RDS for Oracle instance:

SQL> exec rdsadmin.rdsadmin_util.create_directory(p_directory_name => 'ORCLAUTHUTLFILES');

PL/SQL procedure successfully completed.

Use the following SQL*Plus code to create the output text file on RDS for Oracle local storage using UTL_FILE:

SQL> declare
  ltype utl_file.file_type;
  ldir  varchar2(100) := 'ORCLAUTHUTLFILES';
  lfile varchar2(100) := 'BIGTAB_FROMTEXTFILE_FromRDSOracle.txt';
  s     varchar2(10)  := '|';
begin
  -- Open the output file in the RDS directory for writing
  ltype := utl_file.fopen(ldir, lfile, 'w');
  -- Write each table row as a pipe-delimited line
  for i in (select OWNER,OBJECT_NAME,OBJECT_TYPE,CREATED,STATUS from BIGTAB_FROMTEXTFILE)
  loop
    utl_file.putf(ltype, i.OWNER||s||i.OBJECT_NAME||s||i.OBJECT_TYPE||s||i.CREATED||s||i.STATUS);
    utl_file.new_line(ltype);
  end loop;
  utl_file.fclose(ltype);
end;
/

PL/SQL procedure successfully completed.

Verifying the text file is correct

Verify that the text file contains the correct data with the following code:

SQL> col filename format a40
SQL> set lines 1000 pages 100
SQL> select * from table(rdsadmin.rds_file_util.listdir(p_directory => 'ORCLAUTHUTLFILES')) where filename='BIGTAB_FROMTEXTFILE_FromRDSOracle.txt';

FILENAME                                 TYPE         FILESIZE MTIME
---------------------------------------- ---------- ---------- ---------
BIGTAB_FROMTEXTFILE_FromRDSOracle.txt    file             4020 17-JUN-19

The following code shows the content of the output text file on RDS for Oracle local storage:

SQL> col text format a80
SQL> select text from table(rdsadmin.rds_file_util.read_text_file('ORCLAUTHUTLFILES','BIGTAB_FROMTEXTFILE_FromRDSOracle.txt'));

 TEXT
--------------------------------------------------------------------------------
SYSTEM|ICOL$|TABLE|17-JAN-19|INVALID
SYSTEM|I_USER1|INDEX|17-JAN-19|INVALID
SYSTEM|CON$|TABLE|17-JAN-19|INVALID
SYSTEM|UNDO$|TABLE|17-JAN-19|INVALID
SYSTEM|C_COBJ#|CLUSTER|17-JAN-19|INVALID
SYSTEM|I_OBJ#|INDEX|17-JAN-19|INVALID
SYSTEM|PROXY_ROLE_DATA$|TABLE|17-JAN-19|INVALID
SYSTEM|I_IND1|INDEX|17-JAN-19|INVALID
SYSTEM|I_CDEF2|INDEX|17-JAN-19|INVALID
SYSTEM|I_OBJ5|INDEX|17-JAN-19|INVALID

Uploading the text file to an S3 bucket and downloading it to an EC2 instance file system

Execute the following script, which uploads the output text file from RDS for Oracle local storage to the S3 bucket and then downloads that file from the bucket to the EC2 instance file system.

Use the following code example to configure the parameter values:

# As the oracle OS user, create the required directories and save the script files.
$ mkdir -p scriptAWS/log
$ mkdir /u01/filestr

# Copy the below file as the base script into the /home/oracle/scriptAWS folder

$ cat bscr_rdss3ft.sh
#!/usr/bin/bash
#Script that calls the RDS transfer main script and maintains the variable details
export SCRIPTPATH=/home/oracle/scriptAWS
export SCRIPTLOG=/home/oracle/scriptAWS/log
export DBUSERNAME=oraadmin
export DBPASSWD=oraadmin123
export DBTNSSTRG=orclauth
export RDSDIRNAME=ORCLAUTHUTLFILES
export RDSFILNAME=BIGTAB_FROMTEXTFILE_FromRDSOracle.txt
export S3BUCKETNAME=rdsarchivedata
export FILELFSPATH=/u01/filestr

source $SCRIPTPATH/rdss3integration.sh

echo "Script Retrun"
echo "Script is again back to base script"
echo "move log files"
mv $SCRIPTLOG/downloadtolfs-$UPLOADTASK.log $SCRIPTLOG/downloadtolfs-$UPLOADTASK-`date +%m%d%Y%H%M%S`.log
mv $SCRIPTLOG/uploadstatus-$UPLOADTASK.log $SCRIPTLOG/uploadstatus-$UPLOADTASK-`date +%m%d%Y%H%M%S`.log
mv $SCRIPTLOG/filetext-$UPLOADTASK.log $SCRIPTLOG/filetext-$UPLOADTASK-`date +%m%d%Y%H%M%S`.log
mv $SCRIPTLOG/file_taskid_$UUID.log $SCRIPTLOG/file_taskid_$UUID-$UPLOADTASK-`date +%m%d%Y%H%M%S`.log
mv $SCRIPTLOG/rdslfsdelete-${UPLOADTASK}-${RDSFILNAME}.log $SCRIPTLOG/rdslfsdelete-${UPLOADTASK}-${RDSFILNAME}-`date +%m%d%Y%H%M%S`.log
mv $SCRIPTLOG/rdss3integration_$UUID.log $SCRIPTLOG/rdss3integration_$UUID-${UPLOADTASK}-`date +%m%d%Y%H%M%S`.log

The following code is the actual base script that uploads to Amazon S3 and downloads to the file system:

#Copy the below file into the /home/oracle/scriptAWS folder
$ cat rdss3integration.sh
#!/usr/bin/bash

UUID=$(uuidgen)

getTaskId()
{
sqlplus -s $DBUSERNAME/$DBPASSWD@$DBTNSSTRG <<EOF > ${SCRIPTLOG}/rdss3integration_$UUID.log  2>&1
set heading off
set pages 0
set feedback off
set echo off
set lines 100
set trimspool on
set trimout on
spool ${SCRIPTLOG}/file_taskid_$UUID.log
SELECT rdsadmin.rdsadmin_s3_tasks.upload_to_s3(
        p_bucket_name    =>  '$S3BUCKETNAME',
        p_prefix         =>  '$RDSFILNAME',
        p_s3_prefix      => '',
        p_directory_name =>  '$RDSDIRNAME')
   AS TASK_ID FROM DUAL;
spool off
EOF
}

ERRCOUNT=0

runTask()
{
while :
do

UPLOADTASK=`tail -1 ${SCRIPTLOG}/file_taskid_$UUID.log`
sqlplus -s $DBUSERNAME/$DBPASSWD@$DBTNSSTRG <<EOF >> ${SCRIPTLOG}/rdss3integration_$UUID.log  2>&1
set heading off
set pages 0
set feedback off
set echo off
set lines 500
col text format a500
set trimspool on
set trimout on
spool ${SCRIPTLOG}/filetext-${UPLOADTASK}.log REPLACE
SELECT text FROM table(rdsadmin.rds_file_util.read_text_file('BDUMP','dbtask-$UPLOADTASK.log'));
spool off
EOF

FILESTATUS=`tail -1 ${SCRIPTLOG}/filetext-${UPLOADTASK}.log|awk '{ print $8}'`
echo `date`':'$FILESTATUS >> ${SCRIPTLOG}/uploadstatus-${UPLOADTASK}.log
if [[ $FILESTATUS = "finished" ]]
        then
        echo "file upload to s3 is finished "
        echo
        date  >> ${SCRIPTLOG}/downloadtolfs-${UPLOADTASK}.log
        aws s3 cp s3://$S3BUCKETNAME/$RDSFILNAME $FILELFSPATH >> ${SCRIPTLOG}/downloadtolfs-${UPLOADTASK}.log

###############################################################################################################
#Uncomment the below code if you want to delete the file from RDS local storage automatically after a successful upload
#sqlplus -s $DBUSERNAME/$DBPASSWD@$DBTNSSTRG <<EOF >> ${SCRIPTLOG}/rdss3integration_$UUID.log  2>&1
#set heading on
#set pages 0
#set feedback on
#set echo on
#set lines 300
#set trimspool on
#set trimout on
#set serveroutput on
#spool $SCRIPTLOG/rdslfsdelete-${UPLOADTASK}-${RDSFILNAME}.log
#exec utl_file.fremove('$RDSDIRNAME','$RDSFILNAME');
#spool off
#EOF
##############################################################################################################
        return
elif [[ $FILESTATUS = "failed." ]]
        then
        echo "file upload to s3 is failed "
        exit 1
elif [[ $FILESTATUS = "Uploading" ]]
        then
        echo "file upload to s3 is in progress, please wait while the upload complete successfully"
else
if [[ $ERRCOUNT -le 180 ]]
        then
        echo "This is incrementing error $ERRCOUNT as task status was $FILESTATUS  " >> ${SCRIPTLOG}/errorcount-${UPLOADTASK}.log
        ERRCOUNT=`expr $ERRCOUNT + 1`
else
        echo "The S3 integration process encounter error from parent script which is causing this script to fail"
        echo "Please review your source script to ensure everything is working as expected"
        exit 1
fi
fi
        sleep 5
        done
}

getTaskId
runTask

Grant execute permission to the scripts you created, then execute the base script with the following commands:

$ chmod 755 *.sh
$ sh bscr_rdss3ft.sh

Uploading to Amazon S3 using Amazon S3 integration is a serialized process; you can upload only one file at a time. If a concurrent process attempts to upload the files, one process waits until the other upload is complete.
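
Amazon S3 integration also works in the reverse direction, which is useful when input files arrive in the bucket rather than on the application host. The following sketch (the incoming/ prefix is an assumption, and <password> is a placeholder) downloads objects from the bucket into the RDS directory created earlier; like an upload, the call returns a task ID, and you can monitor progress by reading dbtask-<task_id>.log from the BDUMP directory.

$ sqlplus -s oraadmin/<password>@orclauth <<'EOF'
-- Download every object under the given prefix into the RDS directory
SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3(
        p_bucket_name    => 'rdsarchivedata',
        p_s3_prefix      => 'incoming/',
        p_directory_name => 'ORCLAUTHUTLFILES')
   AS TASK_ID FROM DUAL;
EOF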

Verifying that the file is uploaded to Amazon S3 and downloaded correctly to the file system

You can confirm that the file uploaded to Amazon S3 through the S3 console.
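
Alternatively, you can list the object from the command line on any host with access to the bucket:

$ aws s3 ls s3://rdsarchivedata/BIGTAB_FROMTEXTFILE_FromRDSOracle.txt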

To confirm that the file downloaded to your local EC2 file system, enter the following code:

$ cd /u01/filestr/
$ ls -lt BIGTAB_FROMTEXTFILE_FromRDSOracle.txt
-rw-r--r--. 1 oracle oinstall 4020 Jun 17 05:04 BIGTAB_FROMTEXTFILE_FromRDSOracle.txt
$

Conclusion

This post described how to use Amazon S3 integration as a secure alternative to shared file systems. The script shared here is not restricted to any specific Oracle directory; you can reuse it by changing the directory and file names. For example, you can use it to move Amazon RDS for Oracle Data Pump export dumps to an S3 bucket, and vice versa. As always, AWS welcomes your feedback, so please leave any comments below.

About the Authors

Bhavesh Rathod is an Oracle Database Cloud Architect with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping internal Amazon customers move their on-premises database environments to AWS Cloud database solutions.

Samujjwal Roy is a Database Specialty Architect with the Professional Services team at Amazon Web Services. He has been with Amazon for 15+ years and has led migration projects for internal and external Amazon customers, moving their on-premises database environments to AWS Cloud database solutions.

Amishi Shah is a DevOps Consultant with the Professional Services team at Amazon Web Services. She works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. Her focus is leading enterprise customers through large technical, organizational, and SDLC transformations.