Right-size Amazon RDS instances at scale based on Oracle performance metrics

Enterprise customers who are currently running on-premises, mission-critical applications on commercial databases are looking for cost-efficient, managed database service offerings. Amazon RDS is one of the target platforms for the relational database workload migration. RDS provides resizable capacity and manages time-consuming and undifferentiated heavy lifting administration tasks. For large-scale database migrations, there is a need for a scalable and effective solution to right-size a target RDS DB instance for a large number of databases.

In this post, we discuss the process for right-sizing DB instances at scale based on on-premises Oracle performance metrics. We show how to use Python and SQL scripts to collect the Oracle performance metrics from on-premises databases and how to use AWS Glue and Amazon Athena for data analysis and recommendations regarding DB instance size. This solution works for the DB instance sizing, with the scope ranging from one database to many databases.

Overview

To discover the on-premises Oracle workload, a SQL script is developed to query the Oracle Automatic Workload Repository (AWR) for hourly I/O, CPU, and memory usage statistics. A Python script reads from an input file that contains a list of databases to be discovered and loops through each database to run the SQL script. A .csv output file is generated for each database.

The target is to collect at least one month of performance metrics to obtain a better estimation of DB instance size. If AWR retention is set to less than one month, the scripts can be run multiple times. All the scripts are installed on an Amazon EC2 Linux instance and are executed from this central location. The Python script starts a SQL*Plus session and connects to Oracle databases remotely from the EC2 instance.

Each row in the database workload output file represents the performance metrics from one Oracle AWR snapshot. The key performance metrics collected in the output are the following:

Hourly maximum HOST CPU percentage usage
Hourly Oracle SGA and PGA usage
Hourly average physical reads and writes in megabits per second

The workload output files are uploaded to an Amazon S3 bucket. Athena, an easy-to-use, serverless interactive query service, is used to analyze the workload data in the S3 bucket. Large-scale datasets in S3 can be quickly analyzed in Athena using standard SQL after the schema is defined.

Athena is integrated by default with AWS Glue, a fully managed extract, transform, and load (ETL) service used to crawl workload data in S3 and discover the schema. The crawlers are created in AWS Glue to access the data in the S3 bucket, extract metadata, and create table definitions in the AWS Glue Data Catalog. To compare the on-premises database workload with the DB instance capacity, upload a .csv file to the same S3 bucket for lookup. The .csv file contains the target DB instance type, vCPU, memory, and maximum bandwidth configuration. For more information, see Amazon RDS hardware specifications.

The logic used for right-sizing DB instances based on Oracle performance metrics is coded in an Athena view:

CPU utilization—Use the hourly maximum HOST CPU percentage usage at the 90th percentile as the representative CPU workload. Using this value, multiply the number of CPUs configured on premises to determine the estimated CPU utilization. An extra 20% is added to this number as the headroom. If there are multiple database instances running on the same host, assume that the CPU usage is evenly distributed among the database instances. For Oracle RAC, use the sum of the estimated CPU utilization on all RAC nodes. The headroom percentage can be adjusted to account for the CPU clock speed difference. For more information, see EC2 instance CPU specifications).
Memory utilization—Use the maximum sum of Oracle SGA and PGA usage as the estimated memory utilization for Oracle. Add an extra 20% for OS to determine a target memory size requirement. For Oracle RAC, use the sum of the estimated memory utilization on all RAC nodes.
Bandwidth utilization—Use the sum of the hourly average physical reads and writes in megabits per second at the 90th percentile as the representative bandwidth utilization. An extra 20% is added to this number as the headroom. For Oracle RAC, use the sum of the estimated bandwidth utilization on all RAC nodes.
Closest match—Compare the preceding three estimated numbers (CPU utilization, memory utilization, and bandwidth utilization) with the DB instance CPU, memory, and bandwidth capacity. Find the closest match in the order of CPU, memory, and bandwidth.
Representative data point—The logic of choosing a data point as the representative CPU and throughput workload can be adjusted easily by modifying the view definition. Use Top-N or a higher percentile based on your own workload distribution.

Create the instance sizing report by querying the Athena view. The report can be saved as a .csv file, and shows the recommended DB instance type for each database. It also includes database size, on-premises CPU, memory, and bandwidth utilization, and target DB instance CPU, memory, and bandwidth capacity.

The following diagram illustrates the process of the solution covered in this blog:

Here’s how the process works:

Collect the database performance workload from large numbers of on-premises Oracle databases using Python and SQL scripts running from an EC2 instance.
Upload the workload output .csv files and the Amazon RDS instance configuration lookup .csv file from the EC2 instance to an S3 bucket.
Crawl and analyze the workload performance data and the Amazon RDS instance lookup data in S3 and create table metadata using AWS Glue.
Create a view in Athena to determine the appropriate target DB instance size and run the view to create the instance sizing report.
Download the report.

Solution

The code and the AWS CloudFormation templates to implement this solution can be found in the amazon-rds-sizing-sample GitHub repo.

The AWS CloudFormation templates provision the following resources:

instancesizing_CFN1.yaml
- An EC2 Linux instance using the Amazon Linux AMI with Python 2.7 and AWS CLI pre-installed to run the Oracle database performance metrics collection scripts
- An S3 bucket to store the output files of the Oracle database workload
- An IAM role attached to the instance with full access to the S3 bucket createdThe template automatically deploys the following tasks on the provisioned instance:
- Installs the Python extension module cx_Oracle.
- Installs Oracle Instant Client 18.3 Basis and SQL*Plus.
- Configures the Oracle environment variables in /home/ec2-user/.bash_profile.
- Creates tnsnames.ora with a sample database TNS entry.

instancesizing_CFN2.yaml
- An IAM role for AWS Glue
- An AWS Glue crawler to crawl performance metrics data uploaded to the S3 bucket
- An AWS Glue crawler to crawl the Amazon RDS instance type lookup .csv file
instancesizing_CFN3.yaml
- An Athena named query for the view definition to create the instance sizing report

Prerequisites

To follow along with this solution, you need the following resources:

An AWS account with permissions to access resources in Amazon EC2, Athena, AWS Glue, and Amazon S3.
An EC2 Linux instance in a virtual private cloud (VPC). This instance can connect to on-premises Oracle databases.
An Oracle Diagnostics Pack license. The Oracle database performance metrics collection script provided in this post queries dba_hist* tables in the Oracle catalog, for which you must have catalog permissions.
Oracle 11g Enterprise Edition or 12c Enterprise Edition (not including the Multitenant in Oracle Database), for working with the Oracle database data collection scripts.
The following AWS CloudFormation templates and Oracle performance metrics collection scripts downloaded from amazon-rds-sizing-sample
- instancesizing_CFN1.yaml
- instancesizing_CFN2.yaml
- instancesizing_CFN3.yaml
- instancesizingtools.zip

Step 1: Create AWS CloudFormation stack to provision an EC2 instance and a S3 bucket

The first AWS CloudFormation template creates an EC2 instance and an S3 bucket.

On the AWS CloudFormation console, choose Create stack. In the Select Template section, under Specify template, select Upload a template file. Select Choose File to upload the instancesizing_CFN1.yaml file, and then choose Next. For more information, see Step 3: Create the Stack.

On the Specify Details screen, provide values for the following fields:

Stack name—Enter a name for the stack.
InstanceType—Enter the instance type. The default is m4.large.
KeyName—Choose your key pair.
LinuxAMIidName—Enter the AMI ID of Amazon Linux AMI (HVM), SSD Volume Type. The default image includes the Python SDK and AWS CLI, which are required.
Name—Enter the value of the EC2 instance name tag; the default is WL.
S3Bucketname—Enter the S3 bucket name to keep the database workload data.
SecurityGroups—Choose the pre-created security group.
Subnet—Enter the subnet ID where the EC2 instance is to be created.
VPC—Choose the VPC where the EC2 instance is to be created.

On the Create page, acknowledge the creation of IAM resources and then choose Create. To monitor the stack creation progress, check the Events tab for the CloudFormation stack. After the CloudFormation stack is complete, verify that the resources have been created.

Step 2: Collect on-premises Oracle performance metrics

In the Resources tab for your stack, find Type AWS:EC2:Instance, and choose the Physical ID.

On the EC2 console, choose Connect. Follow the instructions to SSH to this EC2 instance as ec2-user.

Locate the zip file instancesizingtools.zip downloaded from the amazon-rds-sizing-sample GitHub repo and upload it to the directory /home/ec2-user on the EC2 instance. To unzip the file, run the following command:

unzip instancesizingtools.zip

The /instancesizingtools directory is created on the EC2 instance, which contains the performance metrics collection scripts. The structure of this directory is as follows:

instancesizingtools: Root directory
- output: Directory containing the script outputs.
- scripts: Directory containing the performance metrics collection scripts.
  - logfile: Directory containing the script log files.
  - dblist.txt: File containing the Oracle databases to be discovered.
  - run_aws_tool_awr.py: Python script to execute the SQL defined in awrsqllist.txt in the databases listed in dblist.txt.
  - oracle_performance.sql: SQL script to collect the database performance metrics
  - awrsqllist.txt: File containing the list of SQL files to be executed by the Python script. The content of this file is oracle_performance.sql.
- instancelookup:
  - rds_instance_type_csv: File containing the target DB instance type, vCPU, memory, and maximum bandwidth configuration. The M4 and R4 instance families are included in this sample file. More family types can be added according to your needs.

On the EC2 instance, use the following commands to verify:

The installation of AWS CLI and Python 2.7
The configuration of Oracle environment variables
The structure of the directory instancesizingtools

Create an Oracle user account with the privileges “create session” and “select_catalog_role” in the on-premises Oracle databases.

On the EC2 instance, create entries in $ORACLE_HOME/network/admin/tnsnames.ora with the connection information for the on-premises Oracle databases. A sample TNS entry is populated in tnsnames.ora for reference.

Update the file /home/ec2-user/instancesizingtools/scripts/dblist.txt with the Oracle database username, password, and tnsnames entry using the following format for all the databases to be analyzed. Each line represents one database. Oracle Instant Client 18.3 is installed using the AWS CloudFormation template. To connect to Oracle Database 10g, install Oracle instant client 12.1.0.2 manually.

#username, password, tns_entry

awsread,read123,orcl11

To collect the database performance metrics, run the following Python script on the EC2 instance:

cd /home/ec2-user/instancesizingtools/scripts

python run_aws_tool_awr.py dblist.txt > logfile/run_aws_tool_awr.log 2>&1

Verify the log file created in directory logfile and the outputs created in directory output. Correct any errors and rerun the script, if necessary.

Step 3: Upload performance metrics output and Amazon RDS instance type configuration .csv file to the S3 bucket

To recursively copy new and updated performance metrics output files from the output directory on the EC2 instance to the S3 bucket, run the following s3 sync command on the EC2 instance. Replace <S3bucketname> with the bucket name provisioned in Step 1.

cd /home/ec2-user/instancesizingtools/output

aws s3 sync . s3://<S3bucketname>/performancedata/ --exclude="*" --include="*.csv" --sse

S3 cp command is used to copy the local file rds_instance_type.csv to the S3 bucket. Replace <S3bucketname> with the bucket name provisioned in Step 1:

cd /home/ec2-user/instancesizingtools/instancelookup

aws s3 cp rds_instance_type.csv s3://<S3bucketname>/instancelookup/ --sse

The previous commands create two folders in the S3 bucket, respectively:

instancelookup
performancedata

A sample of the contents of the S3 bucket folders s3://<S3bucketname>/performancedata/ and s3://<S3bucketname>/instancelookup/ after the database workload output files are uploaded is shown in the following screenshots. The data in these two folders is crawled by AWS Glue crawlers.

Step 4: Create AWS CloudFormation Stack to provision two AWS Glue crawlers and then run the crawlers

The second AWS CloudFormation template creates two AWS Glue crawlers.

Repeat the stack creation process describled in Step 1 for the instancesizing_CFN2.yaml file. On the Specify Details page, provide the following values:

Stack name
S3Targetbucketname (enter the S3 bucket name provisioned in Step 1)

A database “dbinstancesize” is created in the AWS Glue Data Catalog. Two AWS Glue crawlers “dbinstancesizecrawler1” and “dbinstancesizecrawler2” are created. The crawlers are created to run on demand. You can define a time-based schedule for the crawlers based on your needs.

On the AWS Glue console, choose Crawlers, select the crawler dbinstancesizecrawler1, and choose Run crawler. Do the same to start the crawler dbinstancesizecrawler2.

These two crawlers crawl data in the Amazon S3 folders /performancedata and /instancelookup respectively. They each create one AWS Glue metadata table in the AWS Glue database dbinstancesize. The table name is the same as the Amazon S3 folder name.

When the crawler run is complete, check the table definition in the Athena console. The table schema definition is shown in the following screenshot.

Step 5: Create AWS CloudFormation Stack to provision an Amazon Athena named query and then create an Athena view

The third AWS CloudFormation template creates an Amazon Athena named query.

Repeat the stack creation process described in Step 1 for the instancesizing_CFN3.yaml file.

In the Athena console, on the Saved Queries tab, choose v_instance_sizing.

The named query is ported to a Query Editor window. To create the view v_instance_sizing, choose Run query.

Step 6: Create and download the instance sizing report

In the Athena console, on the Query Editor tab, open a new query tab and query the view v_intance_sizing using the following command:

select * from v_instance_sizing

For each source Oracle database, the result shows the following:

The recommended target instance type
The current database size
The source on-premises CPU, memory, and bandwidth utilization
The target DB instance CPU, memory, and bandwidth capacity settings

Choose Download. The result is in CSV format, as shown in the following screenshot.

Summary

In this post, we showed you a complete solution for right-sizing RDS DB instances based on Oracle performance metrics. This solution can be used to analyze a large number of databases and establish a baseline of infrastructure in AWS for your database migration workload.

About the Authors

Michelle Deng is a Sr. Cloud Database Architect at Amazon Web Services. She works with AWS customers to provide guidance and technical assistance about database migrations and Big data projects.

Udayasimha Theepireddy (Uday) has worked as a Sr. Cloud Database Architect at Amazon Web Services since November 2017. He works with internal Amazon customers to move several services from on-prem Oracle to Aurora, RDS PostgreSQL, RDS MySQL and Redshift databases.