Visualizing usage of Provisioned IOPS volumes on Amazon EBS for analysis
Organizations are always looking to right-size cloud infrastructure and optimize for cost. Historically, Provisioned IOPS volumes on Amazon EBS have been one of the harder areas to right-size at scale, because optimization usually required third-party tools. The recently announced AWS Compute Optimizer helps solve that problem by letting customers optimize compute resources at scale through a detailed look at IOPS metrics.
Right-sizing Amazon EBS volumes is critical: under-utilized volumes mean wasted resources, and over-utilized volumes don’t leave enough room for spiky workloads. Developers and DevOps teams typically over-provision IOPS to avoid performance and throughput concerns. Amazon EBS bills io1 and io2 volumes based on storage size and Provisioned IOPS, so being able to correctly estimate the Provisioned IOPS is critical for cost optimization.
However, some customers need more granularity to right-size IOPS properly, and organizations sometimes need additional flexibility to:
- Look at a different set of metrics than what Compute Optimizer provides (for example, Amazon CloudWatch metrics for provisioned volumes listed here).
- Perform custom calculations (for example, a weighted metric combining throughput along with IOPS utilization).
- Create custom utilization reports and visualizations.
These custom metrics and visualizations can help with rightsizing by providing a more granular and holistic look into additional factors that can affect IOPS utilization.
In this blog post, we cover a scalable method for capturing Amazon CloudWatch metrics for all Provisioned IOPS volumes, and creating custom calculations (for example IOPS utilization) using those metrics. We also cover creating custom visualizations using services like AWS CloudFormation, AWS Lambda, AWS Glue, Amazon Athena, and Amazon QuickSight. Capturing and visualizing IOPS helps you identify under-utilized Amazon EBS volumes, which helps with cost optimization. Moreover, leveraging CloudWatch raw data gives you the flexibility to customize or extend this solution to fit your needs.
Solution overview: capturing and visualizing IOPS utilization metrics
Part 1: In this solution, an AWS CloudFormation template deploys an AWS Lambda function. Once invoked, the function captures CloudWatch metrics, which we then enhance and transform for IOPS calculations. For each io1 and io2 volume, the function generates a separate CSV file in Amazon S3 that contains that volume's metrics for the last 7 days. With minimal development effort, you can change the script to cover a longer timespan or capture other metrics.
Part 2: Once processed, you can use the generated CSV files with Athena or QuickSight for analytics and visualization (with some help from AWS Glue).
For this tutorial, you should have the following prerequisites:
- An AWS account
- AWS resources: Provisioned IOPS volumes
- AWS services: Amazon S3, AWS CloudFormation, Amazon CloudWatch, AWS Identity and Access Management (IAM), and AWS Lambda
The CloudFormation template deploys a Lambda function that calculates the utilization using the VolumeReadOps and VolumeWriteOps data metrics from CloudWatch. The function iterates through all the AWS Regions and calculates IOPS utilization for io1 and io2 EBS volumes while skipping all other volume types.
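The Region-by-Region scan described above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in the Lambda function: the helper names (`list_provisioned_iops_volumes`, `scan_regions`, `scan_account_with_boto3`) and the starting Region `us-east-1` are assumptions. It relies on the EC2 `describe_volumes` paginator with a `volume-type` filter so that only io1 and io2 volumes are returned.

```python
from typing import Callable, Dict, Iterator, List

# Volume types the solution analyzes; all other types are skipped.
PROVISIONED_IOPS_TYPES = ["io1", "io2"]


def list_provisioned_iops_volumes(ec2_client) -> Iterator[Dict]:
    """Yield the io1/io2 volumes visible to one regional EC2 client."""
    paginator = ec2_client.get_paginator("describe_volumes")
    pages = paginator.paginate(
        Filters=[{"Name": "volume-type", "Values": PROVISIONED_IOPS_TYPES}]
    )
    for page in pages:
        for volume in page["Volumes"]:
            yield {
                "VolumeId": volume["VolumeId"],
                "VolumeType": volume["VolumeType"],
                "ProvisionedIops": volume["Iops"],
            }


def scan_regions(client_for_region: Callable[[str], object],
                 regions: List[str]) -> Dict[str, List[Dict]]:
    """Collect Provisioned IOPS volumes per Region (hypothetical helper)."""
    return {
        region: list(list_provisioned_iops_volumes(client_for_region(region)))
        for region in regions
    }


def scan_account_with_boto3() -> Dict[str, List[Dict]]:
    """Wire the helpers to boto3; requires AWS credentials."""
    import boto3  # imported lazily so the helpers above have no dependency

    # Assumption: us-east-1 is used only to look up the list of Regions.
    bootstrap = boto3.client("ec2", region_name="us-east-1")
    regions = [r["RegionName"] for r in bootstrap.describe_regions()["Regions"]]
    return scan_regions(
        lambda name: boto3.client("ec2", region_name=name), regions
    )
```

Passing the client factory in as a parameter keeps the scan logic independent of credentials, which makes it easier to exercise with a stub before running it against a real account.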
Note: Currently, this function does not support cross-account access. Organizations with multiple accounts can manually run the CloudFormation template in each account.
The following examples provide sample dashboards created with QuickSight that help identify utilization trends by visualizing statistics like minimum, median, average, and maximum IOPS.
Example 1: Sample IOPS utilization by volume – Last 7 days – Bar chart
This QuickSight report captures the min, median and average IOPS utilization across four EBS volumes for the last 7 days. The chart shows the average and median utilization being less than 2% for all those volumes. The median metric helps discover any skewed distribution in utilization. Combined with the average metric, the median metric can help detect under-utilized volumes.
Example 2: Sample IOPS utilization by volume and Region – Last 7 days – Tabular View
Another way to look at the generated data is to create a drill-down view showing the daily IOPS utilization across volumes and Regions. This view captures the minimum, median, average, maximum, and standard deviation on a daily basis to get more granularity into this particular IOPS utilization pattern.
In this example, volume vol-0153980ef410786cb has a max utilization of 104% on day 24, which is substantially different from the max utilization for the preceding 7 days. In this case, you may need further workload analysis to find the root cause. Alternatively, you can look for a higher standard deviation, which also indicates a significant difference in workload.
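As a rough illustration of how these statistics can surface an unusual day, the sketch below computes the same summary values with Python's `statistics` module and flags days whose maximum sits well above the rest. The function names, the threshold of 2 standard deviations, and the sample values in the usage note are assumptions for illustration only; QuickSight computes its own aggregates.

```python
from statistics import mean, median, pstdev
from typing import Dict, List


def summarize_utilization(samples: List[float]) -> Dict[str, float]:
    """Return the summary statistics shown in the sample dashboards."""
    return {
        "min": min(samples),
        "median": median(samples),
        "average": mean(samples),
        "max": max(samples),
        "stdev": pstdev(samples),  # population standard deviation
    }


def unusual_days(daily_max: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Flag days whose max utilization exceeds the mean of all daily
    maxima by more than `threshold` standard deviations."""
    values = list(daily_max.values())
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # every day looks the same, nothing to flag
    return [
        day for day, value in daily_max.items()
        if (value - center) / spread > threshold
    ]
```

For example, with seven hypothetical days at 2% and one day at 104%, `unusual_days` returns only the 104% day. With very few data points a z-score can never get large, so a lower threshold may be needed for short windows.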
You can query CloudWatch metrics through two different APIs: GetMetricData and GetMetricStatistics. This solution uses the GetMetricData API because it retrieves data faster and supports metric math, ordering, and pagination. You can read more about choosing the right approach for your use case here.
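To make the GetMetricData features concrete, here is a minimal sketch of a request for one volume. It is not the code from the Lambda function: the helper names and query IDs (`read_ops`, `write_ops`, `total_ops_per_second`) are assumptions, and using a metric math expression for the per-second value is one possible design rather than what the solution necessarily does.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def build_iops_queries(volume_id: str, period_seconds: int = 60) -> List[Dict]:
    """Build MetricDataQueries for read ops, write ops, and their per-second sum."""

    def stat_query(query_id: str, metric_name: str) -> Dict:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EBS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": True,
        }

    return [
        stat_query("read_ops", "VolumeReadOps"),
        stat_query("write_ops", "VolumeWriteOps"),
        {
            # Metric math: combine the two series on the CloudWatch side.
            "Id": "total_ops_per_second",
            "Expression": f"(read_ops + write_ops) / {period_seconds}",
            "Label": "TotalOpsPerSecond",
            "ReturnData": True,
        },
    ]


def fetch_last_7_days(cloudwatch_client,
                      volume_id: str) -> Dict[str, List[Tuple[datetime, float]]]:
    """Page through GetMetricData and group (timestamp, value) pairs by query Id."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=7)
    paginator = cloudwatch_client.get_paginator("get_metric_data")
    series: Dict[str, List[Tuple[datetime, float]]] = {}
    for page in paginator.paginate(
        MetricDataQueries=build_iops_queries(volume_id),
        StartTime=start,
        EndTime=end,
        ScanBy="TimestampAscending",  # ordering support mentioned above
    ):
        for result in page["MetricDataResults"]:
            pairs = zip(result["Timestamps"], result["Values"])
            series.setdefault(result["Id"], []).extend(pairs)
    return series
```

With boto3, `cloudwatch_client` would be created with `boto3.client("cloudwatch", region_name=...)` for the Region that hosts the volume.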
The following are the calculations made and visualized in this solution:
- Total operations per second: CloudWatch captures the VolumeReadOps and VolumeWriteOps metrics at one-minute intervals. Total operations are calculated as the sum of read and write operations per minute, then divided by 60 to get the operations per second.
  - VolumeWriteOps: Total number of write operations
  - VolumeReadOps: Total number of read operations
  - TotalOpsPerSecond = (VolumeWriteOps + VolumeReadOps) / 60
- IOPS utilization: The percentage of used IOPS out of the available IOPS
  - Provisioned IOPS: The number of IOPS provisioned for the volume
  - IOPS utilization (%) = (TotalOpsPerSecond / Provisioned IOPS) * 100
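The two formulas above translate directly into code. The following is a small sketch with hypothetical function names; the input numbers in the usage note are made up for illustration.

```python
def total_ops_per_second(read_ops: float, write_ops: float,
                         period_seconds: int = 60) -> float:
    """TotalOpsPerSecond = (VolumeWriteOps + VolumeReadOps) / 60 for one-minute data."""
    return (read_ops + write_ops) / period_seconds


def iops_utilization_pct(read_ops: float, write_ops: float,
                         provisioned_iops: float,
                         period_seconds: int = 60) -> float:
    """IOPS utilization (%) = (TotalOpsPerSecond / Provisioned IOPS) * 100."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    per_second = total_ops_per_second(read_ops, write_ops, period_seconds)
    return per_second / provisioned_iops * 100
```

For example, a volume with 4,000 Provisioned IOPS that records 90,000 reads and 30,000 writes in one minute averages 2,000 operations per second, so `iops_utilization_pct(90_000, 30_000, 4_000)` gives 50.0.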
The CloudFormation template deploys an IAM role (“EBSlambdaRole”) that enables the AWS Lambda function to generate the CSV files, with relevant metrics, and to save them in Amazon S3.
The Lambda function generates CSV files containing metrics for the last 7 days and stores them in the Amazon S3 results bucket.
The following screenshot shows a sample CSV file for an io1 volume:
Key fields captured in the CSV are:
- Region: The Region the volume belongs to
- Read Sum: Read operations
- Write Sum: Write operations
- Total: Sum of read and write operations
- Total per second: Total operations per second
- Provisioned IOPS: IOPS provisioned for the volume
- Utilization: IOPS utilization
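A row with these fields could be assembled along the following lines. This is a sketch under assumptions: the exact header spelling and column order are guesses, and the `Volume Id` and `Write Date` columns are inferred from the column names used in the Athena queries later in this post. The timestamp format mirrors the `'%m/%d/%Y %H:%i:%s'` pattern those queries parse.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable

# Assumed header names and order; adjust to match the generated files.
FIELDS = [
    "Region", "Volume Id", "Write Date", "Read Sum", "Write Sum",
    "Total", "Total per second", "Provisioned IOPS", "Utilization",
]


def build_row(region: str, volume_id: str, timestamp: datetime,
              read_sum: float, write_sum: float,
              provisioned_iops: float) -> Dict[str, object]:
    """Derive the calculated fields for one one-minute data point."""
    total = read_sum + write_sum
    total_per_second = total / 60
    return {
        "Region": region,
        "Volume Id": volume_id,
        "Write Date": timestamp.strftime("%m/%d/%Y %H:%M:%S"),
        "Read Sum": read_sum,
        "Write Sum": write_sum,
        "Total": total,
        "Total per second": total_per_second,
        "Provisioned IOPS": provisioned_iops,
        "Utilization": total_per_second / provisioned_iops * 100,
    }


def rows_to_csv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize rows to CSV text, ready to upload to the results bucket."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The resulting string could then be uploaded with the S3 `put_object` call, using one object key per volume.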
Analytics and visualization
You can analyze files stored in Amazon S3 to gain insights into the overall utilization trends. Questions such as “which volumes have a utilization peak higher than x?” or “which volume utilization over x days is y?” can be answered using tools such as Athena and QuickSight.
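Before setting up Athena, a question like “which volumes have a utilization peak higher than x?” can also be checked quickly against a downloaded CSV file. The sketch below assumes the file has `Volume Id` and `Utilization` columns, which are assumed header names, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def peak_utilization_by_volume(csv_text: str) -> Dict[str, float]:
    """Return the highest utilization value seen for each volume."""
    peaks: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        volume_id = row["Volume Id"]
        value = float(row["Utilization"])
        peaks[volume_id] = max(peaks.get(volume_id, value), value)
    return peaks


def volumes_with_peak_above(csv_text: str, threshold_pct: float) -> List[str]:
    """Answer: which volumes have a utilization peak higher than x?"""
    peaks = peak_utilization_by_volume(csv_text)
    return sorted(v for v, peak in peaks.items() if peak > threshold_pct)
```

This approach works for spot checks on a handful of files. For many volumes across Regions, the Athena views described later scale better.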
Note: QuickSight analysis is not included in the CloudFormation template, but instructions on how to visualize using Athena and QuickSight are included in the blog.
Part 1: Generating CSV files with AWS CloudFormation template and AWS Lambda function
Note that the script provided is not production-ready and may need changes and enhancements based on your needs. For example, we did not configure this script for a cross-account solution. You can also enhance the solution to include other CloudWatch metrics or custom metrics, change it to incorporate other types of EBS volumes, and extend the period.
Step 1: Download the AWS CloudFormation template to your local computer.
Step 2: Log in to your AWS account and create an S3 bucket with a name of your choice. Copy the file “index.zip” into it (the AWS Lambda function). The screenshot shows an S3 bucket with the name “ebs-analyzer” containing the file “index.zip.”
Step 3: Create a new CloudFormation stack using the AWS Management Console. Choose Upload a template file as the Template source and upload the file “EBSAnalyzer.yaml” (CloudFormation template). Choose Next.
Step 4: Provide a Stack name of your choice along with four input Parameters:
- CodeBucketName: The S3 bucket name created in step 2.
- CodeFileName: “index.zip” (the file copied into the bucket in step 2).
- FunctionName: Name of the function used to create the Lambda function.
- ResultsBucketName: Name of your choice – CloudFormation creates this.
Step 5: Optionally, on the Configure stack options screen, enter a tag name for future reference, and then choose Next.
On the review step, check I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create stack.
Step 6: The CloudFormation stack may take a few minutes to finish deploying the resources. Wait for CREATE_COMPLETE to show on the left panel.
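If you prefer to script steps 3 through 6 instead of using the console, a rough boto3 equivalent might look like the following. The parameter keys come from step 4. The helper names are hypothetical, and the sketch assumes the template body has already been read from “EBSAnalyzer.yaml”. The `CAPABILITY_NAMED_IAM` capability corresponds to the acknowledgement checkbox about IAM resources with custom names.

```python
from typing import Dict, List


def build_stack_parameters(code_bucket: str, code_file: str,
                           function_name: str,
                           results_bucket: str) -> List[Dict[str, str]]:
    """Map the four template inputs from step 4 to CloudFormation parameters."""
    values = {
        "CodeBucketName": code_bucket,
        "CodeFileName": code_file,
        "FunctionName": function_name,
        "ResultsBucketName": results_bucket,
    }
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def deploy_stack(cfn_client, stack_name: str, template_body: str,
                 parameters: List[Dict[str, str]]) -> None:
    """Create the stack and block until it reaches CREATE_COMPLETE."""
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=parameters,
        # Required because the template creates a named IAM role.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

Here `cfn_client` would be `boto3.client("cloudformation")`. The waiter raises an error if stack creation fails or rolls back.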
Step 7: Once deployed, navigate to AWS Lambda using the console, create a test event (empty JSON), and click Test.
Once the Lambda function finishes execution, it generates and stores metrics in the output bucket in Amazon S3.
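The same test can be triggered programmatically. The sketch below invokes the function with an empty JSON payload and then lists the CSV objects in the results bucket. The helper names are hypothetical, and the `.csv` suffix filter is an assumption about how the output objects are named.

```python
import json
from typing import List


def run_analyzer(lambda_client, function_name: str) -> int:
    """Invoke the function synchronously with an empty JSON event."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({}).encode("utf-8"),
    )
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda reported an error: {response['FunctionError']}")
    return response["StatusCode"]


def list_csv_keys(s3_client, bucket: str) -> List[str]:
    """List the CSV objects the function wrote to the results bucket."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".csv"):
                keys.append(obj["Key"])
    return keys
```

A synchronous invocation waits for the function to finish, so for accounts with many volumes the client's read timeout may need to be raised, or `InvocationType="Event"` used for an asynchronous run.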
The following sample screenshot shows an Amazon S3 output bucket containing four CSV files created, one for each Provisioned IOPS volume.
Part 1 summary:
We deployed the solution using CloudFormation and ran the Lambda function to generate CSV files. Next, we can start looking at how to use the data for visualization and analysis.
Part 2: Visualization and optimization
Identifying under-utilized EBS volumes can help with cost optimization. Once identified, rightsizing is critical, as under-utilized volumes mean wasted resources, while over-utilized volumes don’t leave enough room for spiky workloads.
Once you have identified these under-utilized volumes, you may decide to reconfigure them to reduce the number of Provisioned IOPS. You could also consolidate and decommission other volumes, or even switch some of them to different volume types (general purpose or standard) that may be more appropriate and cost-effective for your usage. All of these strategies can help with cost optimization. The same analysis can also reveal over-utilized volumes, which could be affecting application performance. In these cases, you could improve performance by upgrading to a different volume type or provisioning more IOPS.
Analytics: Using Athena and AWS Glue
AWS Glue discovers your data and stores the associated metadata in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available within Amazon Athena.
Creating a database in the AWS Glue Data Catalog
Step 1: Open AWS Glue using the AWS Management Console, choose Crawlers from the menu, and click Add Crawler. Provide a Crawler name of your choice and click Next.
Step 2: Select the Crawler source type as Data stores and click Next.
Step 3: Choose data store as S3 and specify the path to the S3 bucket where you stored the CSV files generated by the Lambda function, and then click Next. On the next screen, choose No to Add another data store, and then Next.
Step 4: For an IAM role, you can let AWS Glue Create an IAM role with the permissions needed. Provide a name for the IAM role and click Next.
Step 5: For frequency, select Run on demand or use a scheduled frequency (for example, daily), based on your use case.
Step 6: Provide a Database name and click Next. Finally, review all steps on the summary page; afterward, click Finish.
Step 7: To run the crawler, select Crawlers from the left side menu, select the crawler you wish to run and click the Run Crawler button. The crawler will execute, populate the metadata, and create the database.
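For reference, the console steps above map onto the Glue `create_crawler` and `start_crawler` API calls roughly as follows. This is a sketch with hypothetical helper names. Unlike the console flow in step 4, the API does not create the IAM role for you, so the role passed in is assumed to exist already.

```python
from typing import Dict, Optional


def build_crawler_config(name: str, role: str, database: str,
                         s3_path: str,
                         schedule: Optional[str] = None) -> Dict:
    """Assemble create_crawler arguments for an S3 data store."""
    config: Dict = {
        "Name": name,
        "Role": role,  # existing IAM role name or ARN
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        # Example of a daily schedule: "cron(0 12 * * ? *)".
        # Omit the schedule to run the crawler on demand.
        config["Schedule"] = schedule
    return config


def create_and_run_crawler(glue_client, config: Dict) -> None:
    """Create the crawler, then start a first run."""
    glue_client.create_crawler(**config)
    glue_client.start_crawler(Name=config["Name"])
```

Here `glue_client` would be `boto3.client("glue")`, and `s3_path` would point at the results bucket, for example `s3://<your-results-bucket>/`.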
Once the crawler completes, you can analyze the results using Athena, as detailed in the next section.
Amazon Athena setup instructions
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Step 1: Open Athena using the AWS Management Console. If you completed the preceding section successfully, the database should be visible in the Database dropdown of the query editor.
Step 2: Run the following query on the table to view utilization numbers per volume per Region.
SELECT * FROM "iops-utilization-analyzer-db"."iops_utilization_analyzer_results";
In this query, replace:
- iops-utilization-analyzer-db with your database name.
- iops_utilization_analyzer_results with your table name.
The following is the output, with metrics like provisioned IOPS and write sum:
Step 3: Create a view split by day, week, month, and year to help perform aggregations. The following is a sample view statement:
CREATE OR REPLACE VIEW iops_utilization_results_v AS
SELECT
  "region",
  "volume id",
  "write sum",
  "total",
  "round"("total per second", 2) "total_per_second",
  "provisioned iops",
  "round"("utilization", 3) "utilization",
  "date_parse"("write date", '%m/%d/%Y %H:%i:%s') "date_full",
  "year"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_year",
  "month"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_month",
  "week"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_week",
  "day"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_day"
FROM "iops-utilization-analyzer-db".iops_utilization_analyzer_results
The sample output displayed in the following screenshot includes date_year, date_month, date_week, and date_day.
Step 4: In addition, you can create a view to retrieve the minimum, maximum, and average utilization.
CREATE OR REPLACE VIEW iops_utilization_yr_month_week_v AS
SELECT
  "region",
  "volume id",
  "year"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_year",
  "month"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_month",
  "week"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')) "date_week",
  "round"("min"("utilization"), 2) "MIN(Utilization)",
  "round"("max"("utilization"), 2) "MAX(Utilization)",
  "round"("avg"("utilization"), 2) "AVG(Utilization)"
FROM "iops-utilization-analyzer-db".iops_utilization_analyzer_results
GROUP BY
  "region",
  "volume id",
  "year"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')),
  "month"("date_parse"("write date", '%m/%d/%Y %H:%i:%s')),
  "week"("date_parse"("write date", '%m/%d/%Y %H:%i:%s'))
Run a select query on the view to retrieve results:
SELECT * FROM "iops-utilization-analyzer-db"."iops_utilization_yr_month_week_v" ORDER BY region , "volume id"
The sample output displayed in the following screenshot includes date aggregations and the utilization metrics.
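The queries in this section can also be run outside the console. The following sketch submits a query through the Athena API, polls until it finishes, and returns the first page of results. The helper names are hypothetical, and `output_location` is an assumed S3 path that you own for Athena query results.

```python
import time
from typing import List, Optional


def build_select_all(database: str, table: str) -> str:
    """Build the step 2 query with your own database and table names."""
    return f'SELECT * FROM "{database}"."{table}"'


def run_query(athena_client, sql: str, database: str,
              output_location: str,
              poll_seconds: float = 1.0) -> List[List[Optional[str]]]:
    """Run a query and return the first page of rows (header row first)."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    results = athena_client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

Only the first page of results is returned here. For larger result sets, the `get_query_results` paginator or the CSV that Athena writes to the output location is a better fit.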
Analysis: Using Amazon QuickSight
You can visualize Athena tables or views using Amazon QuickSight.
Note: The following instructions assume a QuickSight account has been set up. If not, then find account creation instructions here.
Step 1: Open QuickSight and click on the Datasets menu item. Select New dataset.
Step 2: Select Athena as the data source.
Step 3: Provide a Data source name of your choice and click Create data source.
Step 4: Choose the desired database and the table or view created in Athena, and click Select.
Step 5: Import to SPICE (in-memory engine) or perform a direct query over the data. Click Visualize.
Step 6: In the visualize tab, you can create a custom report. The following is a bar chart created with the x-axis set to Volume Id. You can create the value field by applying the minimum, median, and average QuickSight functions to the Utilization field.
Part 2 Summary
AWS Glue, Athena, and QuickSight combined provide a powerful mechanism to perform visualization and analysis.
- Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics.
- Athena is a powerful data querying solution, useful in many different ways to handle complex queries. You can use it to answer additional questions such as “which volume has utilization peak higher than x?”
- QuickSight provides the power to create custom BI dashboards to meet your needs.
To avoid incurring any ongoing charges, delete the resources you created, whether manually or using AWS CloudFormation.
- If your Amazon S3 bucket contains files, you have to manually delete them in the AWS Management Console before deleting other resources using CloudFormation.
- Open CloudFormation using the AWS Management Console, select the stack you created, and click Delete.
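The two cleanup steps can be scripted as well. The sketch below empties a bucket and then deletes the stack, using hypothetical helper names. It assumes the bucket is not versioned: a versioned bucket would also need its object versions and delete markers removed.

```python
def empty_bucket(s3_client, bucket: str) -> int:
    """Delete every object in the bucket and return how many were removed."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each page holds at most 1,000 keys, the delete_objects limit.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


def delete_stack(cfn_client, stack_name: str) -> None:
    """Delete the stack and wait until CloudFormation confirms removal."""
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

Resources created outside the stack, such as the Glue crawler and database or the QuickSight dataset, are not removed by deleting the stack and need to be deleted separately.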
In this blog post, we covered a scalable method for capturing Amazon CloudWatch metrics for all Provisioned IOPS volumes. After capturing the metrics, we used them to create custom visualizations using services like AWS Glue, Amazon Athena, and Amazon QuickSight.
Capturing and visualizing IOPS enables you to identify under-utilized or over-utilized volumes, which can help with cost optimization. Leveraging CloudWatch raw data gives you the flexibility to customize or extend this solution to fit your needs. Provisioning resources should not be a headache, and the analysis and visualization techniques described in this post enable you to look clearly at your volume usage. Ultimately, this should put you in a better position to be more cost-effective and efficient, enabling you to shift focus from management of volumes to core competencies.
Thanks for reading this blog post about capturing IOPS utilization and creating custom analysis and visualizations. If you have any comments or questions, please don’t hesitate to leave them in the comments section.