AWS Database Blog
Automating File Transfers to Amazon RDS for Oracle databases
Many integrated Oracle applications use external files as input. Oracle databases access such files through a logical object called a database directory. Apart from accessing application files, Oracle databases also use database directories for Data Pump backups, external tables, reading log files, and more. In the traditional on-premises client-server architecture, the database administrator has to transfer the files to be processed from one server to another, log in to the database server to create an Oracle database directory object, and then run the appropriate tools. With Amazon Relational Database Service (Amazon RDS) for Oracle, some of those tasks are abstracted away, as we show throughout this post.
Amazon RDS for Oracle gives you the benefits of a managed service solution that makes it easy to set up, operate, and scale Oracle deployments in the AWS Cloud. Amazon RDS for Oracle allows you to access files via database directory objects and native tools in the same ways you can with your on-premises Oracle databases. The main difference between Amazon RDS for Oracle and on-premises Oracle deployments is that Amazon RDS for Oracle is a managed service; to deliver a fully managed experience, access to the underlying host is restricted.
Because you can’t access the underlying operating system of your database in Amazon RDS for Oracle, automating large numbers of file uploads requires a solution built with Amazon Simple Storage Service (Amazon S3) and AWS Lambda to load files into Amazon RDS for Oracle storage. If the files to be transferred to your Amazon RDS for Oracle database are few, small, or infrequent, you can manually move the files to Amazon S3, download them from Amazon S3 to the Amazon RDS for Oracle database, and finally load or process them in the database. However, when your business logic requires continual loading and processing of many files, automating this process allows IT organizations to spend their time on other tasks that bring more value to the company.
The purpose of this post is to demonstrate how you can use Amazon S3 and Lambda to automate file transfers from a host (on-premises or cloud-based) to a database directory object backed by Amazon RDS for Oracle local storage.
Solution overview
This solution demonstrates the automation of file transfers from on premises to Amazon RDS for Oracle databases by using Amazon S3, Lambda, and AWS Secrets Manager. After the files have been uploaded to S3 buckets, an S3 event triggers a Lambda function responsible for retrieving the Amazon RDS for Oracle database credentials from Secrets Manager and copying the files to the Amazon RDS for Oracle database local storage. The following diagram shows this workflow.
The implementation of this solution consists of the following tasks:
- Create an S3 bucket for file uploads to Amazon RDS for Oracle database local storage.
- Create a Secrets Manager secret for retrieving credentials required to connect to the Amazon RDS for Oracle database.
- Create AWS Identity and Access Management (IAM) policies and roles required by the solution to interact with Amazon RDS for Oracle, Secrets Manager, Lambda, and Amazon S3.
- Create a Lambda function for the automation of the file transfers from Amazon S3 to Amazon RDS for Oracle local storage.
- Configure S3 events to invoke the function on new file uploads.
- Validate the solution.
Prerequisites
This post assumes that you can load files directly into Amazon S3 from the host where files are stored, and that you have provisioned an Amazon RDS for Oracle database with Amazon S3 integration. For detailed steps on how to perform this task, see Integrating Amazon RDS for Oracle with Amazon S3 using S3_integration.
This process also assumes the following AWS resources have already been provisioned inside your AWS account:
- A Linux-based workstation to perform deployments, cloud or on-premises
- Python 3.6 or 3.7 installed on the workstation used to create the AWS services
- The AWS Command Line Interface (AWS CLI) installed and configured on the workstation used to create the AWS services
- The Lambda function must be created in private subnets
- Connectivity from private subnets to Secrets Manager using NAT Gateway or a VPC endpoint for the Lambda function to retrieve secrets
- An RDS for Oracle database user with the following privileges:
  - CREATE SESSION
  - SELECT_CATALOG_ROLE
  - EXECUTE on rdsadmin.rdsadmin_s3_tasks
  - EXECUTE on rdsadmin.rds_file_util
  - EXECUTE on rdsadmin.rdsadmin_util
Creating the S3 bucket
We need to create an S3 bucket or repurpose an existing bucket for file uploads to Amazon RDS. If you want to create a new bucket, use the following instructions:
- Log in to the Linux workstation where Python and the AWS CLI are installed, using the appropriate credentials via SSH or the terminal emulator of your choice. For example:
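The exact command depends on your environment; a typical SSH login might look like the following (the key file and hostname are placeholders):
```
ssh -i ~/.ssh/my-key.pem ec2-user@<workstation-hostname>
```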
- Use a unique bucket name such as `s3-int-bucket-yyyymmdd-hhmiss` in your chosen Region:
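A sketch of the command, assuming the bucket name used later in this post and a Region of us-east-1:
```
aws s3 mb s3://s3-int-bucket-20201119-184334 --region us-east-1
```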
- Create a folder under the newly created bucket called `incoming-files`:
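One way to create the folder is to put a zero-byte object whose key ends in a slash:
```
aws s3api put-object --bucket s3-int-bucket-20201119-184334 --key incoming-files/
```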
Creating Secrets Manager secrets
The Lambda function needs a Secrets Manager secret in order to retrieve database credentials to access the Oracle databases securely. The following steps show how to create a new secret for your databases:
- Create a JSON document containing the information to be stored in the secret, using the following template:
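A template along these lines covers the values the Lambda function needs; the key names follow the common Secrets Manager convention for RDS credentials and are an assumption of this sketch:
```json
{
  "username": "<database user>",
  "password": "<database user password>",
  "engine": "oracle",
  "host": "<RDS for Oracle endpoint>",
  "port": "<listener port>",
  "dbname": "<database name>"
}
```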
- Obtain the values for each key pair using the following command:
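For example, the following query returns the engine, endpoint, port, database name, and master user name for the oracle19 instance:
```
aws rds describe-db-instances --db-instance-identifier oracle19 \
  --query 'DBInstances[0].[Engine,Endpoint.Address,Endpoint.Port,DBName,MasterUsername]'
```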
- With the information retrieved from the AWS CLI, we can populate the template:
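A populated document might look like the following (the endpoint, user, and password are placeholders):
```json
{
  "username": "admin",
  "password": "<password>",
  "engine": "oracle",
  "host": "oracle19.abcdefghijkl.us-east-1.rds.amazonaws.com",
  "port": "1521",
  "dbname": "ORACLE19"
}
```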
Note that for the engine key-value pair, we used `oracle` instead of `oracle-ee`.
- We use the JSON document to create the Secrets Manager secret. For simplicity, we match the name of the secret to our database’s name (`oracle19`). See the following code:
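Assuming the JSON document was saved as oracle19.json (the file name is our choice):
```
aws secretsmanager create-secret \
  --name oracle19 \
  --description "Credentials for the oracle19 database" \
  --secret-string file://oracle19.json
```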
- Retrieve the Amazon Resource Name (ARN) for the Secrets Manager secret to use in later steps:
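For example:
```
aws secretsmanager describe-secret --secret-id oracle19 --query ARN --output text
```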
Creating IAM policies
For this post, we create the IAM policy `SecretsManagerReadOnly` for the Lambda function to use.
- Use the ARN for the Secrets Manager secret to create a file containing the policy granting permissions on Secrets Manager:
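A minimal read-only policy scoped to the secret might look like the following, saved as secretsManagerReadOnly.json (the file name is our choice; replace the Resource ARN with the value retrieved earlier):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:oracle19-AbCdEf"
    }
  ]
}
```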
- Create a policy using the policy document:
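Using the policy document from the previous step:
```
aws iam create-policy \
  --policy-name SecretsManagerReadOnly \
  --policy-document file://secretsManagerReadOnly.json
```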
- Verify if the policy was created correctly using the following command:
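For example, the following lists customer-managed policies with a matching name:
```
aws iam list-policies --scope Local \
  --query "Policies[?PolicyName=='SecretsManagerReadOnly']"
```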
Creating IAM roles
Our Lambda function uses the role `PythonRDSForLambdaRole`. To create the role, follow these steps:
- Create a file containing the appropriate service trust policy, which associates the new role with a specific AWS service:
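For Lambda, the trust policy allows the lambda.amazonaws.com service principal to assume the role; here it is saved as trustPolicy.json (the file name is our choice):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```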
- Create a role using the trust policy document:
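Using the file from the previous step:
```
aws iam create-role \
  --role-name PythonRDSForLambdaRole \
  --assume-role-policy-document file://trustPolicy.json
```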
- Verify the role was created:
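For example:
```
aws iam get-role --role-name PythonRDSForLambdaRole
```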
- Obtain the AWS account number to use in the next steps.
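The following stores the account number in a shell variable used by the commands that follow:
```
myAWSAcct=$(aws sts get-caller-identity --query Account --output text)
```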
- Attach policies to the role:
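A sketch of the four attachments; the managed policy paths shown (service-role/ for the two Lambda policies) reflect the AWS managed policies as of this writing:
```
aws iam attach-role-policy --role-name PythonRDSForLambdaRole \
  --policy-arn arn:aws:iam::${myAWSAcct}:policy/SecretsManagerReadOnly
aws iam attach-role-policy --role-name PythonRDSForLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam attach-role-policy --role-name PythonRDSForLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
aws iam attach-role-policy --role-name PythonRDSForLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaENIManagementAccess
```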
Replace the string ${myAWSAcct} with your AWS account number if you’re not running the commands from a UNIX or Linux shell.
The preceding code attaches the following policies:
- `SecretsManagerReadOnly`
- `AmazonS3ReadOnlyAccess`
- `AWSLambdaVPCAccessExecutionRole`
- `AWSLambdaENIManagementAccess`
Creating the Lambda function
Lambda is a serverless compute service that allows you to run code without having to provision servers, implement complex workload-aware cluster scaling logic, maintain integration with other services, or manage runtimes. Lambda natively supports many popular programming languages, such as Java, Python, Node.js, Go, PowerShell, C#, and Ruby. It also supports other programming languages via a Runtime API. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration. Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute power and runs your code based on the incoming request or event for any scale of traffic.
For this post, our function is responsible for automatically transferring files from the Amazon S3 bucket to the RDS for Oracle instance.
To create the function, we require custom Python libraries and Oracle libraries that are packaged alongside the Python code to be implemented for this solution. Complete the following steps:
- Make a note of the Python version installed on the machine where you create the package to deploy the Lambda function:
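For example:
```
python3 --version
```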
- Create a custom directory where your code and libraries reside. For this post, the new directory is created under the user’s home directory:
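For example:
```
mkdir /home/ec2-user/s3ToOracleDir
cd /home/ec2-user/s3ToOracleDir
```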
- Log in to Oracle OTN using your Oracle credentials and download the latest Oracle Instant Client libraries into the work directory `/home/ec2-user/s3ToOracleDir`.
- Uncompress the downloaded files and delete them from the current directory:
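The exact ZIP file names depend on the Instant Client version you downloaded; quoting the wildcard lets unzip expand it itself:
```
unzip '*.zip'
rm -f *.zip
```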
- Delete Oracle Instant Client libraries not required by the Lambda function to reduce the size of the deployment package:
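Which files can be removed safely depends on the Instant Client release; as one example, the JDBC, OCCI, and standalone utility files aren't needed by cx_Oracle:
```
cd instantclient_19_6
rm -f *jdbc* *occi* *mysql* *jar uidrvci genezi adrci
cd ..
```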
- Move the remaining files from the `instantclient_19_6` directory to the current directory, and delete the `instantclient_19_6` directory and the ZIP files downloaded for the installation:
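```
mv instantclient_19_6/* .
# -f/-r make this safe even if the ZIP files were already removed
rm -rf instantclient_19_6 *.zip
```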
- Install the `cx_Oracle` and `lxml` Python modules required by the Lambda function to interact with the RDS for Oracle DB instance:
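Installing with -t . places the modules in the current directory so they ship inside the deployment package:
```
pip3 install cx_Oracle lxml -t .
```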
- Install the `libaio` library and copy it to the current directory:
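On an Amazon Linux workstation, one way is to install the package and copy the shared library into the package directory (the library path may vary by distribution):
```
sudo yum install -y libaio
cp /usr/lib64/libaio.so.1 .
```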
- Create a Python script called `s3ToOracleDir.py` under the `/tmp` directory using the following code. The sample code is for demonstration purposes only and for simplicity does not provide data encryption in transit. We recommend that your final implementation incorporate your organization’s security policies and AWS Security Best Practices.
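The original script isn't reproduced here, so the following is a minimal sketch of the flow it implements: read the bucket and object key from the S3 event, fetch the database credentials from Secrets Manager, connect with cx_Oracle, and call rdsadmin.rdsadmin_s3_tasks.download_from_s3 to copy the object into the database directory. The SECRET_NAME environment variable name and the error handling are assumptions of this sketch; the lxml module installed earlier is presumably used for parsing task output in the full version and is omitted here.
```python
import json
import os
import urllib.parse

import boto3
import cx_Oracle

RDS_DIRECTORY = os.environ['RDSDIRECTORY']  # Oracle directory object name
SECRET_NAME = os.environ['SECRET_NAME']     # Secrets Manager secret name (assumed variable)


def get_db_credentials(secret_name):
    """Retrieve the database credentials stored in Secrets Manager."""
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])


def handler(event, context):
    creds = get_db_credentials(SECRET_NAME)
    # On RDS for Oracle, the database name typically doubles as the service name.
    dsn = cx_Oracle.makedsn(creds['host'], creds['port'], service_name=creds['dbname'])
    connection = cx_Oracle.connect(creds['username'], creds['password'], dsn)
    try:
        cursor = connection.cursor()
        # One invocation can carry several S3 records; copy each object
        # into the database directory through the Amazon S3 integration.
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = urllib.parse.unquote_plus(record['s3']['object']['key'])
            cursor.execute(
                """SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3(
                       p_bucket_name    => :bucket,
                       p_s3_prefix      => :key,
                       p_directory_name => :directory)
                   FROM dual""",
                bucket=bucket, key=key, directory=RDS_DIRECTORY)
            task_id = cursor.fetchone()[0]
            print(f'Started download task {task_id} for s3://{bucket}/{key}')
    finally:
        connection.close()
```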
- Copy the `s3ToOracleDir.py` Python script to the directory containing all the libraries and modules:
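For example:
```
cp /tmp/s3ToOracleDir.py /home/ec2-user/s3ToOracleDir/
```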
- Package all the files, making sure it’s done from within your working directory. (Otherwise, the function can’t find the programs and libraries required for it to run. It’s important to use relative paths for this step to prevent runtime issues with the Lambda function.) See the following code:
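Writing the archive outside the working directory keeps it out of the package itself:
```
cd /home/ec2-user/s3ToOracleDir
zip -r9 /tmp/s3ToOracleDir.zip .
```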
- Because the size of the resulting ZIP file is larger than 50 MB, you need to upload the file to Amazon S3, and from there, it can be deployed:
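Any bucket you can read from works; here we reuse the bucket created earlier:
```
aws s3 cp /tmp/s3ToOracleDir.zip s3://s3-int-bucket-20201119-184334/
```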
- After the package is uploaded, you can use it to create the function as follows:
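A sketch of the call; the handler name matches the script above, and the timeout and memory values are assumptions to adjust for your workload:
```
aws lambda create-function \
  --function-name s3ToOracleDir \
  --runtime python3.7 \
  --role arn:aws:iam::${myAWSAcct}:role/PythonRDSForLambdaRole \
  --handler s3ToOracleDir.handler \
  --timeout 60 \
  --memory-size 256 \
  --code S3Bucket=s3-int-bucket-20201119-184334,S3Key=s3ToOracleDir.zip
```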
This next step sets the environment variables for the Lambda function to work properly. In this way, you can alter the behavior of the function without having to change the code. The Lambda environment variable named `RDSDIRECTORY` should match the name of the Oracle database directory that you use for file storage later on.
- Set the environment variables with the following code:
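The directory name S3_TRANSFER is an example we reuse when creating the Oracle directory later; SECRET_NAME matches the sketch above:
```
aws lambda update-function-configuration \
  --function-name s3ToOracleDir \
  --environment "Variables={RDSDIRECTORY=S3_TRANSFER,SECRET_NAME=oracle19}"
```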
- Obtain the subnet and security group IDs from the AWS Management Console or using the following AWS CLI commands:
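For example, filtering by the VPC that hosts your database (the VPC ID is a placeholder):
```
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=<your VPC ID>" \
  --query 'Subnets[].[SubnetId,CidrBlock]'
aws ec2 describe-security-groups \
  --filters "Name=vpc-id,Values=<your VPC ID>" \
  --query 'SecurityGroups[].[GroupId,GroupName]'
```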
Now we configure the networking and security attributes that the Lambda function needs to interact correctly with the Amazon RDS for Oracle database. The function and database must be created in private subnets. Because the function interacts with Secrets Manager, you must enable outbound access via a NAT gateway or by creating a VPC endpoint for Secrets Manager.
- Configure the attributes with the following code:
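The subnet and security group IDs below are placeholders for the values you retrieved:
```
aws lambda update-function-configuration \
  --function-name s3ToOracleDir \
  --vpc-config SubnetIds=subnet-0123456789abcdef0,subnet-0fedcba9876543210,SecurityGroupIds=sg-0123456789abcdef0
```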
Creating an S3 event notification
The final step is to associate the `s3ToOracleDir` Lambda function with the S3 bucket we created in the earlier steps.
- On the Amazon S3 console, choose the bucket you created (for this post, `s3-int-bucket-20201119-184334`).
- Choose the Properties tab.
- Scroll down to the Event notifications section and choose Create event notification.
- For Event name, enter a name (for this post, `s3ToOracleDir`).
- For Prefix, enter `incoming-files/`, which is the name of the folder we created in the S3 bucket previously.
Make sure the prefix ends with the forward slash (`/`).
- For Suffix, enter a suffix associated with the file extension that triggers the Lambda function (for this post, `.txt`).
- In the Event types section, select All object create events.
This selects the Put, Post, Copy, and Multipart upload completed events.
- For Destination, leave at the default Lambda function.
- For Specify Lambda function, leave at the default Choose from your Lambda functions.
- For Lambda function, choose the function we created (`s3ToOracleDir`).
- Choose Save changes.
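If you prefer to script this step instead of using the console, a rough AWS CLI equivalent first grants Amazon S3 permission to invoke the function and then attaches the notification (the account ID and Region in the ARNs are placeholders):
```
aws lambda add-permission \
  --function-name s3ToOracleDir \
  --statement-id s3invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::s3-int-bucket-20201119-184334

aws s3api put-bucket-notification-configuration \
  --bucket s3-int-bucket-20201119-184334 \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "Id": "s3ToOracleDir",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:s3ToOracleDir",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [
        {"Name": "prefix", "Value": "incoming-files/"},
        {"Name": "suffix", "Value": ".txt"}
      ]}}
    }]
  }'
```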
Creating an Oracle directory
Next, we need to create an Oracle directory on the RDS for Oracle instance (if not already created) to be used for storing the files transferred to the database. The directory name must match the value set for the Lambda RDSDIRECTORY environment variable earlier in the process.
- To simplify operations, create two small SQL scripts containing the statements associated with creating the database directory and listing its contents, respectively:
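Sketches of the two scripts using the rdsadmin packages; the directory name S3_TRANSFER matches the RDSDIRECTORY environment variable set earlier:
```sql
-- createDirectory.sql: create the Oracle directory on the RDS instance
EXEC rdsadmin.rdsadmin_util.create_directory(p_directory_name => 'S3_TRANSFER');
```
```sql
-- listDirectory.sql: list the contents of the Oracle directory
SELECT * FROM TABLE(rdsadmin.rds_file_util.listdir(p_directory => 'S3_TRANSFER'))
ORDER BY mtime;
```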
- Connect to the database using any SQL client as a user who has execute privileges on the `rdsadmin` packages:
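For example, with SQL*Plus (the endpoint is a placeholder for your instance's endpoint):
```
sqlplus dbuser@//oracle19.abcdefghijkl.us-east-1.rds.amazonaws.com:1521/ORACLE19
```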
- Create the database directory using the `createDirectory.sql` script:
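From the SQL*Plus prompt:
```
@createDirectory.sql
```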
- List the contents of the newly created directory using the `listDirectory.sql` script:
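From the SQL*Plus prompt:
```
@listDirectory.sql
```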
Validating the setup
The final step is to test that the solution is working properly.
- Create a sample text file or use an existing file:
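For example:
```
echo "This is a test file for the s3ToOracleDir solution" > testfile.txt
```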
- Transfer the file to the S3 bucket under the `incoming-files` folder:
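For example (bucket name as created earlier):
```
aws s3 cp testfile.txt s3://s3-int-bucket-20201119-184334/incoming-files/
```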
- Wait a few seconds and list the contents of the Amazon RDS for Oracle directory:
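From your SQL client, rerun the listing script:
```
@listDirectory.sql
```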
You should be able to see your test file listed in the directory.
Review the Amazon CloudWatch logs associated with the Lambda function to troubleshoot any issues encountered during implementation. Some of the most common issues are associated with an incorrect setup of the Amazon S3 integration for Amazon RDS, or communication problems with Secrets Manager or the RDS for Oracle instance due to incorrectly configured routes or security groups. For more information, see Troubleshooting issues in Lambda.
Conclusion
This post describes how to integrate Amazon RDS for Oracle, Amazon S3, Secrets Manager, and Lambda to create a solution for automating file transfers from Amazon S3 to Amazon RDS for Oracle local storage. You can further enhance this solution to call other Oracle PL/SQL or Lambda functions to perform additional processing of the files.
As always, AWS welcomes your feedback, so please leave any comments below.
About the Authors
Israel Oros is a Database Migration Consultant at AWS. He works with customers in their journey to the cloud with a focus on complex database migration programs. In his spare time, Israel enjoys traveling to new places with his wife and riding his bicycle whenever weather permits.
Bhavesh Rathod is a Senior Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping Amazon customers migrate their on-premises database environments to AWS Cloud database solutions.