Visualizing usage of Provisioned IOPS volumes on Amazon EBS for analysis

Organizations are always looking to right-size cloud infrastructure and optimize for cost. Historically, one of the areas where it has been difficult to right-size at scale is Provisioned IOPS volumes on Amazon EBS, as optimization usually required third-party tools. The recently announced AWS Compute Optimizer assists in solving that problem, as it helps customers optimize compute resources at scale through a detailed look at IOPS metrics.

Right-sizing Amazon EBS volumes is critical: under-utilized volumes mean wasted resources, and over-utilized volumes don’t leave enough room for spiky workloads. Typically, developers and DevOps teams over-provision IOPS to avoid any performance and throughput concerns. Amazon EBS bills io1 and io2 volumes based on storage size and Provisioned IOPS, so being able to correctly estimate the Provisioned IOPS is critical for cost optimization.

However, some customers require a greater level of granularity to properly optimize their IOPS rightsizing, and organizations sometimes need additional flexibility to:

Look at a different set of metrics than what the optimizer provides (for example, Amazon CloudWatch metrics for provisioned volumes listed here).
Perform custom calculations (for example, a weighted metric combining throughput along with IOPS utilization).
Create custom utilization reports and visualizations.

These custom metrics and visualizations can help with rightsizing by providing a more granular and holistic look into additional factors that can affect IOPS utilization.

In this blog post, we cover a scalable method for capturing Amazon CloudWatch metrics for all Provisioned IOPS volumes, and creating custom calculations (for example, IOPS utilization) using those metrics. We also cover creating custom visualizations using services like AWS CloudFormation, AWS Lambda, AWS Glue, Amazon Athena, and Amazon QuickSight. Capturing and visualizing IOPS helps you identify under-utilized Amazon EBS volumes, which helps with cost optimization. Moreover, leveraging CloudWatch raw data gives you the flexibility to customize or extend this solution to fit your needs.

Solution overview: capturing and visualizing IOPS utilization metrics

Part 1: In this solution, an AWS CloudFormation template deploys an AWS Lambda function. Once invoked, the function captures CloudWatch metrics, which we then enhance and transform for IOPS calculations. The function generates a unique CSV file in Amazon S3 that contains metrics for the last 7 days for each of the io1 and io2 volumes. With minimal development effort, you can change the script to cover a longer timespan or capture other metrics.

Part 2: Once processed, you can use the generated CSV files with Athena or QuickSight for analytics and visualization (with some help from AWS Glue).

Prerequisites

For this tutorial, you should have the following prerequisites:

An AWS account
AWS resources: Provisioned IOPS volumes
AWS services: Amazon S3, AWS CloudFormation, Amazon CloudWatch, AWS Identity and Access Management (IAM), and AWS Lambda

Architecture

The CloudFormation template deploys a Lambda function that calculates the utilization using the VolumeReadOps and VolumeWriteOps data metrics from CloudWatch. The function iterates through all the AWS Regions and calculates IOPS utilization for io1 and io2 EBS volumes while skipping all other volume types.
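As a rough illustration of the filtering step, the following Python sketch keeps only io1 and io2 volumes from per-Region volume listings. The function name, the sample volume IDs, and the `volumes_by_region` structure are hypothetical; the dictionaries are assumed to be shaped like the entries returned by the EC2 DescribeVolumes API, and the packaged Lambda function may be organized differently.

```python
from typing import Dict, Iterable, List

# Volume types billed on Provisioned IOPS, as described in this post.
PROVISIONED_IOPS_TYPES = {"io1", "io2"}


def select_provisioned_iops_volumes(volumes: Iterable[Dict]) -> List[Dict]:
    """Keep only io1/io2 volumes and return their ID and provisioned IOPS."""
    selected = []
    for volume in volumes:
        if volume.get("VolumeType") not in PROVISIONED_IOPS_TYPES:
            continue  # skip every other volume type
        selected.append({"VolumeId": volume["VolumeId"], "Iops": volume["Iops"]})
    return selected


# Hypothetical listings, one list of volumes per Region (assumed sample data).
volumes_by_region = {
    "us-east-1": [
        {"VolumeId": "vol-aaa", "VolumeType": "io1", "Iops": 3000},
        {"VolumeId": "vol-bbb", "VolumeType": "gp3", "Iops": 3000},
    ],
    "eu-west-1": [
        {"VolumeId": "vol-ccc", "VolumeType": "io2", "Iops": 1000},
    ],
}

for region, volumes in volumes_by_region.items():
    for volume in select_provisioned_iops_volumes(volumes):
        print(region, volume["VolumeId"], volume["Iops"])
```

In a real run, the listings would come from an EC2 client created for each Region rather than from a hard-coded dictionary.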

Note: Currently, this function does not support cross-account access. Organizations with multiple accounts can manually run the CloudFormation template in each account.

The following examples provide sample dashboards created with QuickSight that help identify utilization trends by visualizing statistics like minimum, median, average, and maximum IOPS.

Example 1: Sample IOPS utilization by volume – Last 7 days – Bar chart

This QuickSight report captures the minimum, median, and average IOPS utilization across four EBS volumes for the last 7 days. The chart shows that the average and median utilization are below 2% for all of those volumes. The median metric helps discover any skewed distribution in utilization. Combined with the average metric, the median metric can help detect under-utilized volumes.

Example 2: Sample IOPS utilization by volume and Region – Last 7 days – Tabular View

Another way to look at the generated data is to create a drill-down view showing the daily IOPS utilization across volumes and Regions. This view captures the minimum, median, average, maximum, and standard deviation on a daily basis to get more granularity into this particular IOPS utilization pattern.

In this example, volume vol-0153980ef410786cb has a max utilization of 104% on day 24, which is substantially different from the max utilization for the preceding 7 days. In this case, you may need further workload analysis to find the root cause. Alternatively, you can look for a higher standard deviation, which also indicates a significant difference in workload.
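To see why the median and the standard deviation are useful next to the average, here is a small Python sketch that computes the same five statistics over two sets of utilization samples. The sample values and the function name are made up for illustration, and the sample standard deviation is an assumption, since the dashboard does not state which variant it uses.

```python
import statistics


def summarize_utilization(samples):
    """Return min, median, average, max, and standard deviation for the samples."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "avg": statistics.mean(samples),
        "max": max(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation (assumed)
    }


# Hypothetical utilization percentages for one volume on two different days.
quiet_day = [1.0, 1.5, 2.0, 1.5, 1.0]
spiky_day = [1.0, 1.5, 2.0, 1.5, 104.0]

for label, samples in (("quiet", quiet_day), ("spiky", spiky_day)):
    print(label, summarize_utilization(samples))
```

A single spike pulls the average and the standard deviation up sharply while the median barely moves, which is the pattern that flags a day for closer workload analysis.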

Metrics source

You can query CloudWatch metrics in two different ways: GetMetricData and GetMetricStatistics. This solution uses the GetMetricData API, as you can retrieve data faster, and it supports metric math along with ordering and pagination of the returned data. You can read more about choosing the right approach for your use case here.
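To make the request shape more concrete, the following Python sketch builds the kind of query list that GetMetricData accepts for one volume, plus a 7-day time window. The helper names and query IDs are hypothetical, and the metric math expression is only one possible way to derive operations per second; the function shipped with this solution may build its request differently.

```python
from datetime import datetime, timedelta, timezone


def build_metric_queries(volume_id, period_seconds=60):
    """Build GetMetricData queries for read ops, write ops, and their per-second total."""

    def stat_query(query_id, metric_name):
        return {
            "Id": query_id,  # IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EBS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": True,
        }

    return [
        stat_query("read_ops", "VolumeReadOps"),
        stat_query("write_ops", "VolumeWriteOps"),
        {
            # Metric math: total operations divided by the period length in seconds.
            "Id": "total_ops_per_second",
            "Expression": "(read_ops + write_ops) / PERIOD(read_ops)",
            "Label": "TotalOpsPerSecond",
            "ReturnData": True,
        },
    ]


def last_seven_days(now=None):
    """Return (start, end) datetimes covering the 7 days before `now` (UTC)."""
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=7), end


queries = build_metric_queries("vol-0123456789abcdef0")  # placeholder volume ID
start_time, end_time = last_seven_days()
print(len(queries), start_time.isoformat(), end_time.isoformat())
```

With boto3, these values would typically be passed to the CloudWatch client's `get_metric_data` call as `MetricDataQueries`, `StartTime`, and `EndTime`, following `NextToken` until all pages are read.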

Calculations

The following are the calculations made and visualized in this solution:

Total operations per second: CloudWatch captures VolumeReadOps and VolumeWriteOps at one-minute intervals. Total operations are calculated as the sum of read and write operations per minute, then divided by 60 to get the operations per second.
- VolumeWriteOps: Total number of write operations
- VolumeReadOps: Total number of read operations
- TotalOpsPerSecond = (VolumeWriteOps + VolumeReadOps) / 60
IOPS utilization: The percentage of used IOPS out of the available IOPS.
- Provisioned IOPS: The number of IOPS provisioned for the volume
- IOPS utilization (%) = (TotalOpsPerSecond / Provisioned IOPS) * 100
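The two formulas above can be sketched in a few lines of Python. The function names and the sample numbers are illustrative only; the one-minute period matches the interval described in this section.

```python
def total_ops_per_second(read_ops, write_ops, period_seconds=60):
    """Sum read and write operations for one period and convert to ops per second."""
    return (read_ops + write_ops) / period_seconds


def iops_utilization_percent(read_ops, write_ops, provisioned_iops, period_seconds=60):
    """Percentage of the provisioned IOPS that was actually used during the period."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    used = total_ops_per_second(read_ops, write_ops, period_seconds)
    return used / provisioned_iops * 100


# Hypothetical one-minute sample: 54,000 reads and 36,000 writes on a 3,000 IOPS volume.
print(total_ops_per_second(54000, 36000))            # 1500.0
print(iops_utilization_percent(54000, 36000, 3000))  # 50.0
```

Note that utilization can exceed 100% for a given minute, as in the 104% value shown in Example 2.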

Authorization

The CloudFormation template deploys an IAM role (“EBSlambdaRole”) that enables the AWS Lambda function to generate the CSV files, with relevant metrics, and to save them in Amazon S3.

Output

The Lambda function generates CSV files containing metrics for the last 7 days in the Amazon S3 results bucket.

The following screenshot shows a sample CSV file for an io1 volume:

Key fields captured in the CSV are:

Region: The Region the volume belongs to
Read Sum: Read operations
Write Sum: Write operations
Total: Sum of read and write operations
Total per second: Total operations per second
Provisioned IOPS: IOPS provisioned for the volume
Utilization: IOPS utilization
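For orientation, the following Python sketch writes one row with these fields using the standard csv module. The exact header names, their order, the Volume Id and Write Date columns, and the sample values are assumptions inferred from the queries later in this post, not a copy of the file the Lambda function produces.

```python
import csv
import io

# Assumed header layout; the real file may name or order columns differently.
FIELDNAMES = [
    "Region", "Volume Id", "Read Sum", "Write Sum", "Write Date",
    "Total", "Total per second", "Provisioned IOPS", "Utilization",
]


def render_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical one-minute data point for a 3,000 IOPS volume.
sample_row = {
    "Region": "us-east-1",
    "Volume Id": "vol-0123456789abcdef0",
    "Read Sum": 54000,
    "Write Sum": 36000,
    "Write Date": "01/24/2021 10:15:00",
    "Total": 90000,
    "Total per second": 1500.0,
    "Provisioned IOPS": 3000,
    "Utilization": 50.0,
}

csv_text = render_csv([sample_row])
print(csv_text)
```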

Analytics and visualization

You can analyze files stored in Amazon S3 to gain insights into the overall utilization trends. Questions such as “which volumes have a utilization peak higher than x?” or “which volume utilization over x days is y?” can be answered using tools such as Athena and QuickSight.
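Before setting up Athena, you can answer the first of these questions directly against a CSV file with a short script. The sketch below assumes columns named Volume Id and Utilization and uses made-up sample data; adjust the names to match your generated files.

```python
import csv
import io


def peak_utilization_by_volume(csv_text):
    """Return the highest utilization value seen for each volume."""
    peaks = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        volume_id = row["Volume Id"]
        utilization = float(row["Utilization"])
        peaks[volume_id] = max(peaks.get(volume_id, utilization), utilization)
    return peaks


def volumes_with_peak_above(csv_text, threshold_percent):
    """List the volumes whose peak utilization exceeds the threshold."""
    peaks = peak_utilization_by_volume(csv_text)
    return sorted(v for v, peak in peaks.items() if peak > threshold_percent)


# Hypothetical data with an assumed two-column layout.
SAMPLE_CSV = """Volume Id,Utilization
vol-aaa,1.2
vol-aaa,104.0
vol-bbb,0.8
vol-bbb,1.9
"""

print(volumes_with_peak_above(SAMPLE_CSV, 100))
```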

Note: QuickSight analysis is not included in the CloudFormation template, but instructions on how to visualize using Athena and QuickSight are included in the blog.

Part 1: Generating CSV files with AWS CloudFormation template and AWS Lambda function

Note that the script provided is not production ready and may need changes and enhancements based on your needs – we did not configure this script for a cross-account solution. You can also enhance the solution to include other CloudWatch metrics or custom metrics, change it to incorporate other types of EBS volumes, and extend the period.

Step 1: Download the AWS CloudFormation template to your local computer.

Step 2: Log in to your AWS account and create an S3 bucket with a name of your choice. Copy the file “index.zip” (the AWS Lambda function) into it. The screenshot shows an S3 bucket with the name “ebs-analyzer” containing the file “index.zip.”

Step 3: Create a new CloudFormation stack using the AWS Management Console. Choose Upload a template file as the Template source and upload the file “EBSAnalyzer.yaml” (CloudFormation template). Choose Next.

Step 4: Provide a Stack name of your choice along with four input Parameters:

CodeBucketName: The S3 bucket name created in step 2.
CodeFileName: “index.zip” (the file copied into the bucket in step 2).
FunctionName: Name of the function used to create the Lambda function.
ResultsBucketName: Name of your choice – CloudFormation creates this.

Choose Next.

Step 5: Optionally, on the Configure stack options screen, enter a tag name for future reference, and then choose Next.

On the review step, check I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create stack.

Step 6: The CloudFormation template may take a few minutes to finish executing and deploying the resources. Wait for CREATE_COMPLETE to show on the left panel.

Step 7: Once deployed, navigate to AWS Lambda using the console, create a test event (empty JSON), and click Test.

Run output

Once the Lambda function finishes execution, it generates and stores metrics in the output bucket in Amazon S3.

The following sample screenshot shows an Amazon S3 output bucket containing four CSV files created, one for each Provisioned IOPS volume.

Part 1 summary:

We deployed the solution using CloudFormation and ran the Lambda function to generate CSV files. Next, we can start looking at how to use the data for visualization and analysis.

Part 2: Visualization and optimization

Identifying under-utilized EBS volumes can help with cost optimization. Once identified, rightsizing is critical, as under-utilized volumes mean wasted resources, while over-utilized volumes don’t leave enough room for spiky workloads.

Once you have identified these under-utilized volumes, you may decide to reconfigure them to reduce the number of Provisioned IOPS. You could also consolidate and decommission other volumes, or even switch some of them to different volume types (general purpose or standard) that may be more appropriate and cost-effective for your usage. All of these strategies can help with cost optimization. The same analysis can also identify over-utilized volumes, which could be affecting application performance. In these cases, you could improve performance by upgrading to a different volume type or provisioning more IOPS.
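As a starting point for this kind of triage, the following Python sketch labels a volume from its peak utilization. The function name, the labels, and both thresholds (20% and 90%) are purely illustrative assumptions, not AWS guidance; pick values that reflect your own workloads and headroom requirements.

```python
def classify_volume(peak_utilization, low_percent=20.0, high_percent=90.0):
    """Return a coarse right-sizing label based on peak IOPS utilization (%).

    low_percent and high_percent are hypothetical thresholds for illustration.
    """
    if peak_utilization >= high_percent:
        return "over-utilized"   # candidate for more IOPS or another volume type
    if peak_utilization < low_percent:
        return "under-utilized"  # candidate for fewer Provisioned IOPS
    return "within range"


# Hypothetical peak utilization values per volume.
peaks = {"vol-aaa": 1.8, "vol-bbb": 55.0, "vol-ccc": 104.0}
for volume_id, peak in peaks.items():
    print(volume_id, classify_volume(peak))
```

Using the peak rather than the average is a deliberately conservative choice here: a volume is only flagged as under-utilized if it never came close to its provisioned level during the period.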

Analytics: Using Athena and AWS Glue

AWS Glue discovers your data and stores the associated metadata in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available within Amazon Athena.

Creating a database in the AWS Glue Data Catalog

Step 1: Open AWS Glue using the AWS Management Console, choose Crawlers from the menu, and click Add Crawler. Provide a Crawler name of your choice and click Next.

Step 2: Select the Crawler source type as Data stores and click Next.

Step 3: Choose data store as S3 and specify the path to the S3 bucket where you stored the CSV files generated by the Lambda function, and then click Next. On the next screen, choose No to Add another data store, and then Next.

Step 4: For an IAM role, you can let AWS Glue Create an IAM role with the permissions needed. Provide a name for the IAM role and click Next.

Step 5: For frequency, select Run on demand or use a scheduled frequency (for example, daily), based on your use case.

Step 6: Provide a Database name and click Next. Finally, review all steps on the summary page; afterward, click Finish.

Step 7: To run the crawler, select Crawlers from the left side menu, select the crawler you wish to run and click the Run Crawler button. The crawler will execute, populate the metadata, and create the database.

Once the crawler completes, you can analyze the results using Athena, as detailed in the next section.

Amazon Athena setup instructions

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

Step 1: Open Athena using the AWS Management Console. After you complete the preceding section, the database should be visible in the query editor in the Database dropdown.

Step 2: Run the following query on the table to view utilization numbers per volume per Region.

SELECT * FROM "iops-utilization-analyzer-db"."iops_utilization_analyzer_results";

Replace iops-utilization-analyzer-db with your database name.
Replace iops_utilization_analyzer_results with your table name.

The following is the output, with metrics like provisioned IOPS and write sum:

Step 3: Create a view split by day, week, month, and year to help perform aggregations. The following is a sample view statement:

CREATE OR REPLACE VIEW iops_utilization_results_v AS
SELECT
"region"
, "volume id"
, "write sum"
, "total"
, "round"("total per second", 2) "total_per_second"
, "provisioned iops"
, "round"("utilization", 3) "utilization"
, "date_parse"("write date", '%m/%d/%Y %H:%i:%s') "date_full"
, "year"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_year"
, "month"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_month"
, "week"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_week"
, "day"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_day"
FROM
"iops-utilization-analyzer-db".iops_utilization_analyzer_results

The sample output displayed in the following screenshot includes date_year, date_month, date_week, and date_day.
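If you want to sanity-check the date columns outside of Athena, the following Python sketch derives the same parts from a timestamp string. It assumes the write date column uses the month/day/year layout from the view definition; the function name and sample timestamp are hypothetical. Python's `%M` plays the role of `%i` (minutes) in the Athena format string, and the ISO week number is used to mirror Athena's `week` function.

```python
from datetime import datetime

# Python equivalent of the Athena format string '%m/%d/%Y %H:%i:%s'.
WRITE_DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def date_parts(write_date):
    """Split a write date string into the date columns used by the view."""
    parsed = datetime.strptime(write_date, WRITE_DATE_FORMAT)
    return {
        "date_full": parsed,
        "date_year": parsed.year,
        "date_month": parsed.month,
        "date_week": parsed.isocalendar()[1],  # ISO week of the year
        "date_day": parsed.day,
    }


parts = date_parts("01/24/2021 10:15:00")  # hypothetical sample timestamp
print(parts["date_year"], parts["date_month"], parts["date_week"], parts["date_day"])
```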

Step 4: In addition, you can create a view to retrieve the minimum, maximum, and average utilization.

CREATE OR REPLACE VIEW iops_utilization_yr_month_week_v AS 
SELECT
  "region"
, "volume id"
, "year"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_year"
, "month"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_month"
, "week"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_week"
, "round"("min"("utilization"), 2) "MIN(Utilization)"
, "round"("max"("utilization"), 2) "MAX(Utilization)"
, "round"("avg"("utilization"), 2) "AVG(Utilization)"
FROM
  "iops-utilization-analyzer-db".iops_utilization_analyzer_results
GROUP BY "region", "volume id", "year"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')), "month"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')), "week"("date_parse"("write date", '%m/%d/%Y %H:%i:%s'))

Run a select query on the view to retrieve results:

SELECT * FROM "iops-utilization-analyzer-db"."iops_utilization_yr_month_week_v"
ORDER BY region , "volume id"

The sample output displayed in the following screenshot includes date aggregations and the utilization metrics.
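To cross-check the view on a small extract, the same grouping can be reproduced in plain Python. This sketch assumes rows keyed by the lowercase column names used in the view and a write date in month/day/year format; the function name and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_by_week(rows):
    """Group rows by region, volume, year, month, and ISO week, then aggregate."""
    groups = defaultdict(list)
    for row in rows:
        written = datetime.strptime(row["write date"], "%m/%d/%Y %H:%M:%S")
        key = (row["region"], row["volume id"],
               written.year, written.month, written.isocalendar()[1])
        groups[key].append(float(row["utilization"]))
    return {
        key: {
            "min": round(min(values), 2),
            "max": round(max(values), 2),
            "avg": round(mean(values), 2),
        }
        for key, values in groups.items()
    }


# Hypothetical rows: two data points in the same week, one in the following week.
rows = [
    {"region": "us-east-1", "volume id": "vol-aaa", "write date": "01/19/2021 10:00:00", "utilization": "1.0"},
    {"region": "us-east-1", "volume id": "vol-aaa", "write date": "01/20/2021 10:00:00", "utilization": "3.0"},
    {"region": "us-east-1", "volume id": "vol-aaa", "write date": "01/26/2021 10:00:00", "utilization": "5.0"},
]

for key, stats in sorted(aggregate_by_week(rows).items()):
    print(key, stats)
```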

Analysis: Using Amazon QuickSight

You can visualize Athena tables or views using Amazon QuickSight.

Note: The following instructions assume a QuickSight account has been set up. If not, then find account creation instructions here.

Step 1: Open QuickSight and click on the Datasets menu item. Select New dataset.

Step 2: Select Athena as the data source.

Step 3: Provide a Data source name of your choice and click Create data source.

Step 4: Choose the desired database and table query view created in Athena and click Select.

Step 5: Import to SPICE (in-memory engine) or perform a direct query over the data. Click Visualize.

Step 6: In the visualize tab, you can create a custom report. The following is a bar chart created with the x-axis set to Volume Id. You can create the value field by applying the minimum, median, and average QuickSight functions to the Utilization field.

Part 2 Summary

AWS Glue, Athena, and QuickSight combined provide a powerful mechanism to perform visualization and analysis.

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics.
Athena is a powerful data querying solution, useful in many different ways to handle complex queries. You can use it to answer additional questions such as “which volume has utilization peak higher than x?”
QuickSight provides the power to create custom BI dashboards to meet your needs.

Cleaning up

To avoid incurring any ongoing charges, delete the resources created manually or using AWS CloudFormation.

If your Amazon S3 bucket contains files, you have to manually delete them in the AWS Management Console before deleting other resources using CloudFormation.
Open CloudFormation using the AWS Management Console, select the stack you created, and click Delete.

Conclusion

In this blog post, we covered a scalable method for capturing Amazon CloudWatch metrics for all Provisioned IOPS volumes. After capturing the metrics, we used them to create custom visualizations using services like AWS Glue, Amazon Athena, and Amazon QuickSight.

Capturing and visualizing IOPS enables you to identify under- or over-utilized volumes, which can help with cost optimization. Leveraging CloudWatch raw data gives you the flexibility to customize or extend this solution to fit your needs. Provisioning resources should not be a headache, and the analysis and visualization techniques described in this post enable you to clearly look at your volume usage. Ultimately, this should put you in a better position to be more cost-effective and efficient, enabling you to shift focus from management of volumes to core competencies.

Thanks for reading this blog post about capturing IOPS utilization and creating custom analysis and visualizations. If you have any comments or questions, please don’t hesitate to leave them in the comments section.

AWS Storage Blog
