AWS Big Data Blog
Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler
Modern business applications rely on timely and accurate data with increasing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. Data at times is stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn from the datasets. This is where replication tools help move the data from its source to the target systems in real time and transform it as necessary to help businesses with consolidation.
In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics using the Oracle GoldenGate S3 handler.
Oracle GoldenGate for Oracle Database and Big Data adapters
Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing modifications during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.
Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements to support both batch-oriented and real-time streaming data integration patterns.
GoldenGate provides special tools called S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read from and write data to S3 buckets. This option allows you to use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.
Solution overview
The following diagram illustrates our solution architecture.
In this post, we walk you through the following high-level steps:
- Install GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
- Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
- Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.
Prerequisites
You must have the following prerequisites in place:
- An Oracle Database 19c or later, either on Amazon Relational Database Service (Amazon RDS) for Oracle or Amazon EC2.
- Make sure you have completed the steps in Preparing the Database for Oracle GoldenGate.
- Make sure that you have installed a Java development kit and configured ORACLE_HOME.
- An existing or new S3 bucket. To create a new S3 bucket, see Creating a bucket.
- An AWS Identity and Access Management (IAM) user. You can use temporary credentials; for more details, refer to Using temporary credentials with AWS resources.
- Make sure you have the right display settings and xclock is available. For more details, refer to How to enable X11 forwarding from Red Hat Enterprise Linux (RHEL), Amazon Linux, SUSE Linux, Ubuntu server to support GUI-based installations from Amazon EC2.
Install GoldenGate software on Amazon EC2
You need to run GoldenGate on EC2 instances. The instances must have adequate CPU, memory, and storage to handle the anticipated replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.
Use the following formula to estimate the required trail space:
trail disk space = transaction log volume in 1 hour x number of hours down x .4
When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:
- GoldenGate 21.3.0.0
- GoldenGate for Big Data 21c
Use the following steps to upload and install the file from your local machine to the EC2 instance. Make sure that your IP address is allowed in the inbound rules of the security group of your EC2 instance before starting a session. For this use case, we install GoldenGate for Classic Architecture and Big Data. See the following code:
Install GoldenGate 21.3.0.0
Complete the following steps to install GoldenGate 21.3 on an EC2 instance:
- Create a home directory to install the GoldenGate software and run the installer:
- Run runInstaller:
A GUI window will pop up to install the software.
- Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.
After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure OGG EXTRACT.
- Create an extract parameter file for the source Oracle database. The following code is the sample file content:
- Add the EXTRACT on the GoldenGate prompt by running the following command:
- After you add the EXTRACT, check the status of the running programs with the info all
You will see the EXTRACT status is in the STOPPED state, as shown in the following screenshot; this is expected.
- Start the EXTRACT process as shown in the following figure.
The status changes to RUNNING. The following are the different statuses:
- STARTING – The process is starting.
- RUNNING – The process has started and is running normally.
- STOPPED – The process has stopped either normally (controlled manner) or due to an error.
- ABENDED – The process has been stopped in an uncontrolled manner. An abnormal end is known as ABEND.
This will start the extract process and a trail file will be created in the location mentioned in the extract parameter file.
- You can verify this by using the command
stats <<group_name>>
, as shown in the following screenshot.
Install GoldenGate for Big Data 21c
In this step, we install GoldenGate for Big Data in the same EC2 instance where we installed the GoldenGate Classic Architecture.
- Create a directory to install the GoldenGate for Big Data software. To copy the .zip file, follow these steps:
This will start the MANAGER program. Now you can install the dependencies required for the REPLICAT to run.
- Go to
/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader
and run the sh file with the latest version of aws-java-sdk. This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.
Configure the S3 handler
To configure an GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and properties file that defines how data is handled and sent to Amazon S3.
AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
are the access key and secret access key of your IAM user, respectively. Do not hardcode credentials or security keys in the parameter and properties file. There are several methods available to achieve this, such as the following:
You can set these environment variables in your shell configuration file (e.g., .bashrc, .bash_profile, .zshrc) or use a secure method to set them temporarily:
Configure the properties file
Create a properties file for the S3 handler. This file defines how GoldenGate will interact with your S3 bucket. Make sure that you have added the correct parameters as shown in the properties file.
The following code is an example of an S3 handler properties file (dirprm/reps3.properties
):
Configure GoldenGate REPLICAT
Create the parameter file in /dirprm
in the GoldenGate for Big Data home:
Now you have successfully started the Replicat. You can verify this by running info and stats commands followed by the Replicat name, as shown in the following screenshot.
To confirm that the file has been replicated to an S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.
Best practices
Make sure that you are following the best practices on performance, compression, and security.
Consider the following best practices for performance:
- Optimize trail file storage by using high-performance storage systems for improved read/write operations. Refer to Amazon EBS-optimized instance types for more information.
- Monitor and tune GoldenGate parameters related to trail file management, such as trail file size, number of trail files, and trail file rollover settings, based on your workload characteristics.
- Monitor GoldenGate processing to identify and address performance bottlenecks. You can also monitor GoldenGate logs by using Amazon CloudWatch.
- Consider using GoldenGate’s different types of Replicats based on the requirements. For example, use Parallel Replicat for parallel processing to improve performance of heavy workloads.
The following are best practices for compression:
- Enable compression for trail files to reduce storage requirements and improve network transfer performance.
- Use GoldenGate’s built-in compression capabilities or use file system-level compression tools.
- Strike a balance between compression level and CPU overhead, because higher compression levels may impact performance.
Lastly, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it’s crucial to address various security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and security options available for GoldenGate Classic Architecture.
Clean up
To avoid ongoing charges, delete the resources that you created as part of this post:
- Remove the S3 bucket and trail files if no longer needed and stop the GoldenGate processes on Amazon EC2.
- Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
- To delete the entire setup, stop your EC2 instance.
Conclusion
In this post, we provided a step-by-step guide for installing and configuring GoldenGate for Oracle Classic Architecture and Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can successfully set up an environment and take advantage of the real-time analytics using a GoldenGate handler for Amazon S3, which we will explore further in an upcoming post.
If you have any comments or questions, leave them in the comments section.
About the Authors
Prasad Matkar is Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.
Arun Sankaranarayanan is a Database Specialist Solution Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.
Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.