AWS Database Blog

Automating File Transfers to Amazon RDS for Oracle databases

Many integrated Oracle applications use external files as input. Oracle databases access such files through a logical object called a database directory. Apart from accessing application files, Oracle databases also use database directories to access Data Pump dump files, external tables, log files, and more. In the traditional on-premises client-server architecture, the database administrator has to transfer the files to be processed from one server to another, log in to the database server to create an Oracle database directory object, and then run the relevant tools. With Amazon Relational Database Service (Amazon RDS) for Oracle, some of those tasks are abstracted away, as we show throughout this post.

Amazon RDS for Oracle gives you the benefits of a managed service that makes it easy to set up, operate, and scale Oracle deployments in the AWS Cloud. Amazon RDS for Oracle allows you to access files via database directory objects and native tools in the same ways you can with your on-premises Oracle databases. The main difference is that, because Amazon RDS for Oracle is a fully managed service, access to the underlying host is restricted.

Because you can’t access the underlying operating system of your database in Amazon RDS for Oracle, automating large numbers of file uploads requires a solution built with Amazon Simple Storage Service (Amazon S3) and AWS Lambda to load files into Amazon RDS for Oracle storage. If the files to be transferred to your Amazon RDS for Oracle database are few, small, or infrequent, you can manually move them to Amazon S3, download them from Amazon S3 to the Amazon RDS for Oracle database, and finally load or process them in the database. However, when your business logic requires continual loading and processing of many files, automating this process frees IT organizations to spend their time on tasks that bring more value to the company.

The purpose of this post is to demonstrate how you can use Amazon S3 and Lambda to automate file transfers from a host (on premises or cloud-based) to a database directory object in Amazon RDS for Oracle local storage.

Solution overview

This solution demonstrates the automation of file transfers from on premises to Amazon RDS for Oracle databases by using Amazon S3, Lambda, and AWS Secrets Manager. After the files have been uploaded to S3 buckets, an S3 event triggers a Lambda function responsible for retrieving the Amazon RDS for Oracle database credentials from Secrets Manager and copying the files to the Amazon RDS for Oracle database local storage. The following diagram shows this workflow.

The implementation of this solution consists of the following tasks:

  1. Create an S3 bucket for file uploads to Amazon RDS for Oracle database local storage.
  2. Create a Secrets Manager secret for retrieving credentials required to connect to the Amazon RDS for Oracle database.
  3. Create AWS Identity and Access Management (IAM) policies and roles required by the solution to interact with Amazon RDS for Oracle, Secrets Manager, Lambda, and Amazon S3.
  4. Create a Lambda function for the automation of the file transfers from Amazon S3 to Amazon RDS for Oracle local storage.
  5. Configure S3 events to invoke the function on new file uploads.
  6. Validate the solution.

Prerequisites

This post assumes that you can load files directly into Amazon S3 from the host where the files are stored, and that you have provisioned an Amazon RDS for Oracle database with Amazon S3 integration. For detailed steps on how to perform this task, see Integrating Amazon RDS for Oracle with Amazon S3.

This process also assumes the following AWS resources have already been provisioned in your AWS account:

  • A Linux-based workstation to perform deployments, cloud or on-premises
  • Python 3.6 or 3.7 installed on the workstation used to create the AWS services
  • The AWS Command Line Interface (AWS CLI) installed and configured on the workstation used to create the AWS services
  • Private subnets in which to create the Lambda function
  • Connectivity from the private subnets to Secrets Manager via a NAT gateway or a VPC endpoint, so the Lambda function can retrieve secrets
  • An RDS for Oracle database user with the following privileges (a sample creation script follows this list):
    • CREATE SESSION
    • SELECT_CATALOG_ROLE
    • EXECUTE on rdsadmin.rdsadmin_s3_tasks
    • EXECUTE on rdsadmin.rds_file_util
    • EXECUTE on rdsadmin.rdsadmin_util
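
If this user doesn’t exist yet, the following is a minimal sketch of how you might create it from a SQL client connected as a user with sufficient privileges (such as the master user). The name s3trfadmin matches the examples used later in this post; the password is a placeholder you must replace:

    CREATE USER s3trfadmin IDENTIFIED BY "ReplaceWithAStrongPassword1";
    GRANT CREATE SESSION TO s3trfadmin;
    GRANT SELECT_CATALOG_ROLE TO s3trfadmin;
    GRANT EXECUTE ON rdsadmin.rdsadmin_s3_tasks TO s3trfadmin;
    GRANT EXECUTE ON rdsadmin.rds_file_util TO s3trfadmin;
    GRANT EXECUTE ON rdsadmin.rdsadmin_util TO s3trfadmin;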

Creating the S3 bucket

We need to create an S3 bucket or repurpose an existing one for file uploads to Amazon RDS. To create a new bucket, follow these steps:

  1. Log in to the Linux workstation where Python and the AWS CLI are installed, using the appropriate credentials via SSH or the terminal emulator of your choice. For example:
    ssh -i my-creds.pem ec2-user@myLinuxWorkstation
  2. Use a unique bucket name such as s3-int-bucket-yyyymmdd-hhmiss in your chosen Region:
    export myAWSRegion=us-east-1
    export myS3Bucket=s3-int-bucket-20201119-184334
    aws s3 mb s3://$myS3Bucket --region $myAWSRegion
  3. Create a folder under the newly created bucket called incoming-files:
    aws s3api put-object --bucket $myS3Bucket --key incoming-files/ \
    --region $myAWSRegion
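
You can verify that the bucket and folder were created:

    aws s3 ls s3://${myS3Bucket}/ --region $myAWSRegion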

Creating Secrets Manager secrets

The Lambda function needs a Secrets Manager secret in order to retrieve the database credentials it uses to access the Oracle database securely. The following steps show how to create a new secret for your database:

  1. Create a JSON document containing the information to be stored in the secret, using the following template:
    db-secrets.json:
    {
      "username": "RDS_User_For_S3_Transfer",
      "password": "XXXX",
      "engine": "oracle",
      "host": "FQDN_Of_RDS_EndPoint",
      "port": 1521,
      "dbname": "Name_Of_RDS_Database",
      "dbInstanceIdentifier": "Name_Of_RDS_Instance",
      "dbtype": "RDS"
    }
  2. Obtain the values for each key using the following command:
    aws rds describe-db-instances --db-instance-identifier oracle19 \
    --query "DBInstances[0].[MasterUsername,Engine,Endpoint,DBName,DBInstanceIdentifier]" \
    --region $myAWSRegion 
    [
        "oracle",
        "oracle-ee",
        {
            "Address": "oracle19.aaaaaaaaaaaa.us-east-1.rds.amazonaws.com",
            "Port": 1521,
            "HostedZoneId": "Z2R2ITUGPM61AM"
        },
        "ORACLE19",
        "oracle19"
    ]
    
  3. With the information retrieved from the AWS CLI, we can populate the template:
    db-secrets.json:
    {
      "username": "s3trfadmin",
      "password": "MyPasswordGoesHere1234!",
      "engine": "oracle",
      "host": " oracle19.aaaaaaaaaaaa.us-east-1.rds.amazonaws.com",
      "port": 1521,
      "dbname": "ORACLE19",
      "dbInstanceIdentifier": "oracle19",
      "dbtype": "RDS"
    }
    

Note that for the engine key, we use oracle instead of oracle-ee.

  4. Use the JSON document to create the Secrets Manager secret. For simplicity, we match the name of the secret to our database’s name (oracle19). See the following code:
    myRDSDbName=oracle19
    myAWSSecret=oracle19
    aws secretsmanager create-secret --name ${myAWSSecret} \
    --secret-string file://db-secrets.json \
    --region $myAWSRegion
    
  5. Retrieve the Amazon Resource Name (ARN) of the Secrets Manager secret to use in later steps:
    aws secretsmanager describe-secret --secret-id $myAWSSecret \
    --query "ARN" --output text \
    --region $myAWSRegion
    
    arn:aws:secretsmanager:us-east-1:123123123123:secret:oracle19-NW8BK1
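
Optionally, confirm that the secret stores the values you expect; the output should match db-secrets.json:

    aws secretsmanager get-secret-value --secret-id $myAWSSecret \
    --query "SecretString" --output text \
    --region $myAWSRegion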

Creating IAM policies

For this post, we create the IAM policy SecretsManagerReadOnly for the Lambda function to use.

  1. Use the ARN for the Secrets Manager secret to create a file containing the policy granting permissions on Secrets Manager:
    secrets-manager-readonly-policy.json:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetRandomPassword",
                    "secretsmanager:GetResourcePolicy",
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:ListSecretVersionIds",
                    "secretsmanager:ListSecrets"
                ],
                "Resource": "arn:aws:secretsmanager:us-east-1:123123123123:secret:oratrg19-NW8BK1"
            }
        ]
    }
  2. Create a policy using the policy document:
    aws iam create-policy --policy-name SecretsManagerReadOnly \
    --policy-document file://secrets-manager-readonly-policy.json \
    --region $myAWSRegion
  3. Verify that the policy was created using the following command:
    aws iam list-policies | grep '"SecretsManagerReadOnly"' 
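
If you'd rather capture the new policy's ARN directly instead of constructing it later from the account number, a JMESPath query such as the following should work:

    myPolicyArn=$(aws iam list-policies \
    --query 'Policies[?PolicyName==`SecretsManagerReadOnly`].Arn' --output text)
    echo $myPolicyArn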

Creating IAM roles

Our Lambda function uses the role PythonRDSForLambda. To create the role, follow these steps:

  1. Create a file containing the appropriate service trust policy, which associates the new role with a specific AWS service:
    lambda-trust.json:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

  2. Create a role using the trust policy document:
    aws iam create-role --role-name PythonRDSForLambda \
    --assume-role-policy-document file://lambda-trust.json
  3. Verify the role was created:
    aws iam list-roles | grep '"PythonRDSForLambda"'
  4. Obtain the AWS account number to use in the next steps:
    myAWSAcct=$(aws iam list-roles --query 'Roles[*].[Arn]' --output text |\
    grep 'PythonRDSForLambda$' | cut -d\: -f5)

  5. Attach the following policies to the role:
    aws iam attach-role-policy --role-name PythonRDSForLambda \
    --policy-arn "arn:aws:iam::${myAWSAcct}:policy/SecretsManagerReadOnly"
    aws iam attach-role-policy --role-name PythonRDSForLambda \
    --policy-arn "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
    aws iam attach-role-policy --role-name PythonRDSForLambda \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
    aws iam attach-role-policy --role-name PythonRDSForLambda \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AWSLambdaENIManagementAccess"
    

Replace the string ${myAWSAcct} with your AWS account number if you’re not running the commands from a UNIX or Linux shell.

The preceding code attaches the following policies:

  • SecretsManagerReadOnly
  • AmazonS3ReadOnlyAccess
  • AWSLambdaVPCAccessExecutionRole
  • AWSLambdaENIManagementAccess
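
You can optionally confirm that all four policies are attached to the role:

    aws iam list-attached-role-policies --role-name PythonRDSForLambda \
    --query "AttachedPolicies[*].PolicyName" --output text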

Creating the Lambda function

Lambda is a serverless compute service that allows you to run code without having to provision servers, implement complex workload-aware cluster scaling logic, maintain integration with other services, or manage runtimes. Lambda natively supports many popular programming languages, such as Java, Python, Node.js, Go, PowerShell, C#, and Ruby. It also supports other programming languages via a Runtime API. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration. Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute power and runs your code based on the incoming request or event for any scale of traffic.

For this post, our function is responsible for automatically transferring files from the Amazon S3 bucket to the RDS for Oracle instance.

To create the function, we need to package custom Python libraries and Oracle client libraries alongside the Python code implemented for this solution. Complete the following steps:

  1. Make a note of the Python version installed on the machine where you create the package to deploy the Lambda function:
    pythonVersion="python$(python3 --version|cut -c8-10)"
  2. Create a custom directory where your code and libraries reside. For this post, the new directory is created under the user’s home directory:
    cd /home/ec2-user/
    mkdir -p s3ToOracleDir
  3. Log in to Oracle OTN using your Oracle credentials and download the Oracle Instant Client Basic and SDK packages (version 19.6 in this post) into the work directory /home/ec2-user/s3ToOracleDir.
  4. Uncompress the downloaded files and delete the ZIP files from the current directory:
    cd /home/ec2-user/s3ToOracleDir
    unzip -oqq "oraInstantClient*.zip" -d .
    rm -f oraInstantClientBasic19_6.zip oraInstantClientSDK19_6.zip
  5. Delete the Oracle Instant Client libraries not required by the Lambda function to reduce the size of the deployment package:
    cd /home/ec2-user/s3ToOracleDir
    rm instantclient_19_6/lib*.so.1[!9]*
  6. Move the remaining files from the instantclient_19_6 directory to the current directory, then delete the now-empty directory:
    cd /home/ec2-user/s3ToOracleDir
    mv instantclient_19_6/* .
    rmdir instantclient_19_6
    
  7. Install the cx-Oracle and lxml Python modules required by the Lambda function to interact with the RDS for Oracle DB instance:
    cd /home/ec2-user/s3ToOracleDir
    pip3 install -t . cx-Oracle
    pip3 install -t . lxml
  8. Install the libaio library and copy it to the current directory:
    cd /home/ec2-user/s3ToOracleDir
    sudo yum install libaio -y
    cp /usr/lib64/libaio.so.1 .
  9. Create a Python script called s3ToOracleDir.py under the /tmp directory using the following code. The sample code is for demonstration purposes only and, for simplicity, doesn't provide data encryption in transit. We recommend that your final implementation incorporate your organization’s security policies and AWS security best practices.
    #Lambda function to transfer files uploaded into S3 bucket using RDS for Oracle S3 Integration
    import cx_Oracle
    import boto3
    import sys
    import os
    import base64  # needed to decode the secret if it's stored as binary
    from urllib.parse import unquote_plus
    import json
    
    # Variable definitions
    jSecret = {}
    flag = False
    s3Bucket = os.environ['S3BUCKET']
    rdsDirectory = os.environ['RDSDIRECTORY']
    regName = os.environ['AWSREGION']
    secretName = os.environ['SECRETNAME']
    print(f'Environment Variables\n\n')
    print(f'AWS Region: {regName}')
    print(f'AWS Secret Alias: {secretName}')
    print(f'Amazon S3 Bucket: {s3Bucket}')
    print(f'AWS RDS Database Directory: {rdsDirectory}')
    
    # Initializing AWS resources
    print('Initializing AWS S3 - Resource')
    s3 = boto3.resource('s3')
    print('Initializing AWS S3 - Session')
    session = boto3.session.Session()
    print('Initializing AWS Secrets Manager - Client')
    client = session.client(
            service_name='secretsmanager',
            region_name=regName
        )
    print(f'Retrieving secret ({secretName}) from AWS Secrets Manager')
    secValResp = client.get_secret_value(SecretId=secretName)
    if None != secValResp.get('SecretString'):
        jSecret = json.loads(secValResp['SecretString'])
    else:
        decoded_binary_secret = base64.b64decode(secValResp['SecretBinary'])
        jSecret = json.loads(decoded_binary_secret)
    dsnTnsRds = cx_Oracle.makedsn(jSecret['host'],jSecret['port'],service_name=jSecret['dbname'])
    print(f'Database Connection String: {dsnTnsRds}')
    connRds = cx_Oracle.connect(user=jSecret['username'], password=jSecret['password'], dsn=dsnTnsRds)
    print(f'Target Database Version: {connRds.version}')
            
    # When creating the Lambda function, ensure the following setting for LD_LIBRARY_PATH
    # LD_LIBRARY_PATH = /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib:/opt/python
    def lambda_handler(event, context):
        try:
            c = connRds.cursor()
            c.arraysize = 500
            commands = []
            if 0 < len(event.get('Records','')):
                print("Beginning file transfers from AWS S3 to AWS RDS for Oracle")
                # Process each file loaded in S3 bucket
                for record in event['Records']:
                    bucket = record['s3']['bucket']['name']
                    fileName = unquote_plus(record['s3']['object']['key'])
                    print(f"Transferring file s3://{bucket}/{fileName} to Oracle directory {rdsDirectory}")
                    sql =  "SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3("
                    sql += "p_bucket_name => '" + s3Bucket + "'," 
                    sql += "p_s3_prefix => '" + fileName + "',"
                    sql += "p_directory_name => '" + rdsDirectory + "')" 
                    sql += " AS TASK_ID FROM DUAL"
                    print(f"Running: {sql}")
                    c.execute(sql)
                    while True:
                        rows = c.fetchmany(100)
                        if not rows:
                            break
                        for row in rows:
                            print(f"Output: {row[0]}")
                flag = True
                e = "Success"
            else:
                e = "No S3 events detected"
                print(f'There are no new files to process on {s3Bucket}')
        except:
            statusCode = 500
            e = "Error Message: " + str(sys.exc_info()[1])
            print(f'{e}')
        else:
            statusCode = 200
        return {
            'statusCode': statusCode,
            'body': e
        }
    
  10. Copy the s3ToOracleDir.py Python script to the directory containing all the libraries and modules:
    cd /home/ec2-user/s3ToOracleDir
    cp /tmp/s3ToOracleDir.py .
  11. Package all the files, making sure to do so from within your working directory; otherwise, the function can't find the programs and libraries it needs at runtime. It's important to use relative paths in this step to prevent runtime issues with the Lambda function. See the following code:
    cd /home/ec2-user/s3ToOracleDir
    zip -q9r ../s3ToOracleDir.zip .
  12. Because the size of the resulting ZIP file is larger than 50 MB, you need to upload the file to Amazon S3 and deploy the function from there:
    aws s3api put-object --bucket $myS3Bucket --key lambda/ \
    --region $myAWSRegion
    aws s3 cp ../s3ToOracleDir.zip s3://${myS3Bucket}/lambda/
  13. After the package is uploaded, you can use it to create the function as follows:
    aws lambda create-function --function-name s3ToOracleDir \
    --code S3Bucket=${myS3Bucket},S3Key=lambda/s3ToOracleDir.zip \
    --handler s3ToOracleDir.lambda_handler \
    --runtime ${pythonVersion} \
    --role arn:aws:iam::${myAWSAcct}:role/PythonRDSForLambda \
    --output text \
    --region $myAWSRegion
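
Function creation completes asynchronously. Before continuing, you can optionally confirm that the function is ready; this check assumes your AWS CLI version returns the State and LastUpdateStatus fields:

    aws lambda get-function-configuration --function-name s3ToOracleDir \
    --query "[State,LastUpdateStatus]" --output text \
    --region $myAWSRegion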
    

The next step sets the environment variables the Lambda function needs to run properly. This way, you can alter the behavior of the function without having to change its code. The Lambda environment variable RDSDIRECTORY must match the name of the Oracle database directory that you use for file storage later on.

  14. Set the environment variables with the following code:
    myRDSDirectory=UPLOAD_DIR
    aws lambda update-function-configuration --function-name s3ToOracleDir \
    --environment "Variables={LD_LIBRARY_PATH=/var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib:/opt/python,S3BUCKET=$myS3Bucket,RDSDIRECTORY=$myRDSDirectory,AWSREGION=$myAWSRegion,SECRETNAME=$myAWSSecret}" \
    --output text \
    --region $myAWSRegion
    
  15. Obtain the subnet and security group IDs from the AWS Management Console or by using the following AWS CLI commands:
    myRDSSubnets=$(aws rds describe-db-instances --db-instance-identifier ${myRDSDbName} --query "DBInstances[0].DBSubnetGroup.Subnets[*].SubnetIdentifier" --output text --region $myAWSRegion|sed "s/\t/,/g")
    
    myRDSSecGrps=$(aws rds describe-db-instances --db-instance-identifier ${myRDSDbName} --query "DBInstances[0].VpcSecurityGroups[*].VpcSecurityGroupId" --output text --region $myAWSRegion|sed "s/\t/,/g")
    

Now we configure the networking and security attributes the Lambda function needs to interact with the Amazon RDS for Oracle database. The function and database must be created in private subnets. Because the function interacts with Secrets Manager, you must provide outbound connectivity via a NAT gateway or create a VPC endpoint for Secrets Manager.

  16. Configure the attributes with the following code:
    aws lambda update-function-configuration --function-name s3ToOracleDir \
    --vpc-config SubnetIds=${myRDSSubnets},SecurityGroupIds=${myRDSSecGrps} \
    --output text \
    --region $myAWSRegion
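
If your private subnets have no route to the internet, an interface VPC endpoint for Secrets Manager provides the required connectivity. The following sketch uses hypothetical VPC, subnet, and security group IDs; substitute your own values:

    # Hypothetical IDs shown; replace them with your own VPC, subnet, and security group IDs
    aws ec2 create-vpc-endpoint --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.${myAWSRegion}.secretsmanager \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --region $myAWSRegion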

Creating an S3 event notification

The final step is to associate the s3ToOracleDir Lambda function with the S3 bucket we created earlier. You can do this on the Amazon S3 console as follows; a CLI alternative appears after these steps.

  1. On the Amazon S3 console, choose the bucket you created (for this post, s3-int-bucket-20201119-184334).
  2. Choose the Properties tab.

  3. Scroll down to the Event notifications section and choose Create event notification.

  4. For Event name, enter a name (for this post, s3ToOracleDir).
  5. For Prefix, enter incoming-files/, which is the name of the folder we created in the S3 bucket previously.

Make sure the prefix ends with the forward slash (/).

  6. For Suffix, enter a suffix associated with the file extension that triggers the Lambda function (for this post, .txt).

  7. In the Event types section, select All object create events.

This selects the Put, Post, Copy, and Multipart upload completed events.

  8. For Destination, leave the default Lambda function.
  9. For Specify Lambda function, leave the default Choose from your Lambda functions.
  10. For Lambda function, choose the function we created (s3ToOracleDir).

  11. Choose Save changes.
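
If you prefer to script this association instead of using the console, the following AWS CLI sketch accomplishes the same result; Amazon S3 must first be granted permission to invoke the function:

    # Allow Amazon S3 to invoke the Lambda function
    aws lambda add-permission --function-name s3ToOracleDir \
    --statement-id s3invoke --action "lambda:InvokeFunction" \
    --principal s3.amazonaws.com \
    --source-arn arn:aws:s3:::${myS3Bucket} \
    --region $myAWSRegion

    # Register the notification for .txt files under incoming-files/
    myLambdaArn=$(aws lambda get-function --function-name s3ToOracleDir \
    --query "Configuration.FunctionArn" --output text --region $myAWSRegion)
    aws s3api put-bucket-notification-configuration --bucket $myS3Bucket \
    --notification-configuration '{"LambdaFunctionConfigurations":[{"Id":"s3ToOracleDir","LambdaFunctionArn":"'$myLambdaArn'","Events":["s3:ObjectCreated:*"],"Filter":{"Key":{"FilterRules":[{"Name":"prefix","Value":"incoming-files/"},{"Name":"suffix","Value":".txt"}]}}}]}' \
    --region $myAWSRegion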

Creating an Oracle directory

Next, we need to create an Oracle directory on the RDS for Oracle instance (if not already created) to be used for storing the files transferred to the database. The directory name must match the value set for the Lambda RDSDIRECTORY environment variable earlier in the process.

  1. To simplify operations, create two small SQL scripts containing the statements associated with creating the database directory and listing its contents, respectively:
    cd ~
    #Script for creating the database directory
    echo "exec rdsadmin.rdsadmin_util.create_directory(p_directory_name => '$myRDSDirectory');" > createDirectory.sql 
    
    #Script for listing the contents of the database directory
    echo "SELECT * FROM TABLE(rdsadmin.rds_file_util.listdir(p_directory => '$myRDSDirectory'));" > listDirectory.sql
  2. Connect to the database using any SQL client as a user who has execute privileges on the rdsadmin packages:
    sqlplus s3trfadmin@${myRDSDbName}
  3. Create the database directory using the createDirectory.sql script:
    SQL> @createDirectory.sql
  4. List the contents of the newly created directory using the listDirectory.sql script:
    SQL> @listDirectory.sql

Validating the setup

The final step is to test that the solution works properly.

  1. Create a sample text file or use an existing file:
    ls -l /etc > test_upload.txt
  2. Transfer the file to the S3 bucket under the incoming-files folder:
    aws s3 cp test_upload.txt s3://${myS3Bucket}/incoming-files/ 
  3. Wait a few seconds, then list the contents of the Amazon RDS for Oracle directory:
    sqlplus s3trfadmin@${myRDSDbName}
    SQL> @listDirectory.sql

You should be able to see your test file listed in the directory.

Review the Amazon CloudWatch logs associated with the Lambda function to troubleshoot any issues encountered during implementation. The most common issues are an incorrect setup of the Amazon S3 integration for Amazon RDS, and communication problems with Secrets Manager or the RDS for Oracle instance due to incorrectly configured routes or security groups. For more information, see Troubleshooting issues in Lambda.
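
For example, assuming you have AWS CLI version 2 installed, you can tail the function's log group while you upload a test file:

    aws logs tail "/aws/lambda/s3ToOracleDir" --since 15m --follow \
    --region $myAWSRegion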

Conclusion

This post describes how to integrate Amazon RDS for Oracle, Amazon S3, Secrets Manager, and Lambda to create a solution for automating file transfers from Amazon S3 to Amazon RDS for Oracle local storage. You can further enhance this solution to call other Oracle PL/SQL or Lambda functions to perform additional processing of the files.

As always, AWS welcomes your feedback, so please leave any comments below.


About the Authors

Israel Oros is a Database Migration Consultant at AWS. He works with customers in their journey to the cloud with a focus on complex database migration programs. In his spare time, Israel enjoys traveling to new places with his wife and riding his bicycle whenever weather permits.

Bhavesh Rathod is a Senior Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping Amazon customers migrate their on-premises database environments to AWS Cloud database solutions.