AWS Storage Blog
Prevent IOPS over-provisioning by monitoring striped Amazon EBS volumes within EC2 instance limits
Enterprises are always looking for ways to optimize storage performance to support their performance-intensive applications and workloads. One technique is data striping, which involves segmenting data across multiple storage volumes to aggregate maximum logical capacity and increase performance. The technique is useful for workloads that require high levels of input/output operations per second (IOPS) and/or large amounts of block storage, such as enterprise-grade data warehouses, HPC middleware applications, or in-memory databases like SAP HANA. However, when managing these workloads, companies face a common challenge of provisioning the appropriate IOPS amount in storage volumes. Over-provisioning can result in unnecessary costs due to unused capacity, while under-provisioning can lead to degraded application performance. It is important for enterprises to carefully assess their IOPS needs and strike the right balance to ensure optimal performance and cost efficiency.
Customers use Amazon Elastic Block Store (EBS) to attach scalable and high-performance block storage volumes to their compute instances running on Amazon Elastic Compute Cloud (EC2). Enterprise-grade workloads often require logical volumes larger than the Amazon EBS size limit (16 TiB for most volume types; 64 TiB for io2 Block Express), along with higher disk IOPS and throughput. To combine multiple EBS volumes and aggregate their logical volume size and IOPS performance, customers can configure RAID 0 arrays (as long as they are supported by the underlying operating system) for data striping across multiple Amazon EBS volumes. However, the maximum aggregate bandwidth, throughput, and IOPS supported by an Amazon EBS-optimized instance vary with the instance type and size. These limits are listed in the Amazon EBS documentation and can be queried programmatically with the command-line interface (CLI) command describe-instance-types or the Amazon EC2 application programming interface (API) action DescribeInstanceTypes.
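As a minimal sketch of the programmatic lookup mentioned above, the helper below reads the EBS-optimized maximum IOPS from a DescribeInstanceTypes response. The function names max_ebs_iops and example are illustrative and not part of the solution; the client is passed in so the helper can be exercised without AWS credentials.

```python
def max_ebs_iops(ec2_client, instance_type):
    """Return the EBS-optimized maximum IOPS for an instance type, or None
    when the response carries no EbsOptimizedInfo block."""
    response = ec2_client.describe_instance_types(InstanceTypes=[instance_type])
    ebs_info = response["InstanceTypes"][0]["EbsInfo"]
    optimized = ebs_info.get("EbsOptimizedInfo")
    return optimized["MaximumIops"] if optimized else None


def example():
    # Not called automatically: needs boto3, AWS credentials, and a default Region.
    import boto3
    print(max_ebs_iops(boto3.client("ec2"), "r6i.24xlarge"))
```

Calling example() from an environment with credentials prints the limit that the instance type reports; the value can then be compared with the sum of the provisioned IOPS of the attached volumes.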
In this post, we demonstrate a solution using AWS Config Custom Rules associated with an AWS Lambda function to automate the invocation, detection, and reporting of the aggregated IOPS of your striped Amazon EBS volumes to stay within each of your Amazon EC2 instance’s maximum IOPS limit. This solution will report the over-provisioned IOPS in AWS Config. With this solution, you can set up notifications in the event of over-provisioning, and thus mitigate any resultant extra cost.
The challenge of over-provisioning
To illustrate how over-provisioning impacts costs, we use a real-life example that we’ve seen with our customers. A customer running a highly I/O-intensive database on an r6i.24xlarge instance needed 160,000 provisioned IOPS in storage volumes to support the demanding application workloads. Assuming that four io2 volumes, each with 40,000 provisioned IOPS, would achieve a theoretical maximum of 160,000 provisioned IOPS, the customer set up the four volumes as one logical volume by configuring RAID 0 through the OS. However, this assumption overlooked the IOPS limit of the instance type, a classic cause of over-provisioned storage volumes. A reference to the documentation (see Exhibit-A) or a CLI/API query revealed that the r6i.24xlarge instance has a maximum of only 120,000 IOPS (at a 16 KiB I/O block size), meaning that 40,000 of the provisioned IOPS exceeded what the EC2 instance can sustain. The expected performance was never realized and the resources were not cost-optimized: the customer not only experienced I/O performance far below what was provisioned but was also billed for the 40,000 unusable IOPS.
In this example, the customer can mitigate the additional cost by reducing the IOPS for each of the four volumes to 30,000 to match the maximum of 120,000 IOPS allowed by r6i.24xlarge. Alternatively, the customer can leave the Amazon EBS volumes intact and resize the underlying Amazon EC2 instance to r6i.32xlarge (see Exhibit-A) to realize the maximum of 160,000 IOPS. Unfortunately, these manual operations are not sustainable as the customer’s environment and the number of Amazon EC2 instances and Amazon EBS volumes grow. At scale, it becomes nearly impossible to manually detect, manage, and adjust volumes or instances to achieve the desired IOPS, which puts cost optimization goals at risk.
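The arithmetic behind this example can be sketched in a few lines. The function name provisioning_gap is illustrative; the figures are the ones quoted above (four volumes at 40,000 IOPS against a 120,000 IOPS instance limit).

```python
def provisioning_gap(volume_iops, instance_max_iops):
    """Return (total provisioned IOPS, IOPS above the instance limit,
    per-volume IOPS that would fit the limit when split evenly)."""
    total = sum(volume_iops)
    excess = max(0, total - instance_max_iops)
    per_volume_target = instance_max_iops // len(volume_iops)
    return total, excess, per_volume_target


total, excess, target = provisioning_gap([40_000] * 4, 120_000)
print(total, excess, target)  # 160000 40000 30000
```

The even split is only one option; any per-volume allocation whose sum stays at or below the instance limit avoids paying for IOPS the instance cannot deliver.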
Exhibit-A: Maximum bandwidth, throughput, and IOPS for several instance types
Solution overview
The solution uses AWS Config and AWS Lambda. The following image maps out a high-level diagram of the various components and workflows deployed by the solution:
- AWS Config Custom Rules monitor for changes on Amazon EC2 instances and EBS volumes
- Upon the creation, removal, or changes of monitored Amazon EC2 instances and EBS volumes, the Custom Rule is triggered for re-evaluation
- Custom rule evaluation logic is stored in an AWS Lambda Python function that compares the attached EBS volumes’ total aggregate IOPS against the Amazon EC2 instance’s IOPS limit.
- Lambda function logic returns compliance status to AWS Config
- Evaluation results and compliance status are displayed on the AWS Config dashboard. To ensure the EBS volume resources stay within the IOPS limit inherent to the instance type, AWS Config continuously evaluates them, records results, and flags them as non-compliant when the aggregated IOPS surpasses the maximum threshold.
Walkthrough
In order to deploy the components and workflows in the above diagram, you will need to complete the following steps:
- Create a new AWS Lambda function
- Set up AWS Config Custom Rules
- Review the rules for resources compliance status
Prerequisites
This solution requires an AWS account, basic knowledge of Python, and existing EC2 instances and EBS volumes deployed in the AWS account as resources for evaluation. Note that the AWS Lambda function and the AWS Config Custom Rule set up in this solution are bound to the resources within the selected account and a specific Region. You must repeat this process for each AWS Region you operate in and for each individual AWS account. For more information on how you can automate the deployment of AWS Config Custom Rules and their function logic programmatically at scale, read more here.
Step 1: Getting started with the AWS Lambda function
1. First, we create a new AWS Lambda function that contains the compliance-checking logic that AWS Config invokes. On the AWS Management Console, select AWS Lambda and select Create function to create a new function using the Author from scratch option. We’ll give the function the name OverProvisionedIopsConfigRule-<region>. For this function, we use the Python 3.9 runtime. Optionally, we can cost-optimize by choosing the arm64 architecture to run on AWS Graviton processors (if available in the AWS Region). Next, expand Change default execution role, and select Create a new role with basic Lambda permissions so that this AWS Lambda function gets its own execution role, as shown in the following diagrams. Select Create function.
2. Select Author from scratch and provide basic information. Python is the programming language used in this example.
3. Next, paste the following code into the Code source window inside lambda_function.py:
import json
import boto3


def get_instance_type_details(ec2_client, instance_type):
    # Look up the EBS-optimized limits published for this instance type.
    instance_attr = ec2_client.describe_instance_types(
        InstanceTypes=[
            instance_type
        ],
    )
    print("InstanceTypeDetails: {}".format(instance_attr['InstanceTypes'][0]))
    retval = {
        'ebs_optimized_support': instance_attr['InstanceTypes'][0]['EbsInfo']['EbsOptimizedSupport'],
        'max_iops': 0
    }
    if 'EbsOptimizedInfo' in instance_attr['InstanceTypes'][0]['EbsInfo']:
        retval['max_iops'] = instance_attr['InstanceTypes'][0]['EbsInfo']['EbsOptimizedInfo']['MaximumIops']
    return retval


def get_volumes(ec2_client, block_device_mappings):
    # AWS Config items use camelCase keys; describe_instances uses PascalCase keys.
    volumes = []
    for device in block_device_mappings:
        if 'ebs' in device:
            volumes.append(device['ebs']['volumeId'])
        else:
            volumes.append(device['Ebs']['VolumeId'])
    return volumes


def get_volume_info(ec2_client, volumes):
    volume_info = {
        'total_iops': 0,
        'volumes': []
    }
    if not volumes:
        # An empty VolumeIds list is not a filter and would return every volume in the Region.
        return volume_info
    volume_details = ec2_client.describe_volumes(
        VolumeIds=volumes
    )
    for vol in volume_details['Volumes']:
        # Some volume types do not report an Iops value; count them as 0.
        iops = vol.get('Iops', 0)
        volume_info['total_iops'] += iops
        volume_info['volumes'].append({'volumeId': vol['VolumeId'], 'iops': iops, 'volumeType': vol['VolumeType'], 'size': vol['Size']})
    return volume_info


def evaluate_compliance(ebs_optimized, vol_info, instance_type_details):
    retval = {}
    if instance_type_details['ebs_optimized_support'] == 'unsupported':
        retval['compliance_value'] = 'NOT_APPLICABLE'
        retval['annotation'] = "Ebs Optimized: Unsupported"
    elif instance_type_details['ebs_optimized_support'] == 'supported' and ebs_optimized is False:
        retval['compliance_value'] = 'NOT_APPLICABLE'
        retval['annotation'] = "Ebs Optimized: Supported but not enabled"
    else:
        percent = int((vol_info['total_iops'] / instance_type_details['max_iops']) * 100)
        retval['annotation'] = "{}% - Total IOPS / Max IOPS: {} / {}".format(percent, vol_info['total_iops'], instance_type_details['max_iops'])
        if vol_info['total_iops'] > instance_type_details['max_iops']:
            retval['compliance_value'] = 'NON_COMPLIANT'
        else:
            retval['compliance_value'] = 'COMPLIANT'
    return retval


def lambda_handler(event, context):
    ec2_client = boto3.client('ec2')
    invoking_event = json.loads(event['invokingEvent'])
    print(json.dumps(event))
    if 'configurationItem' not in invoking_event:
        # Nothing to evaluate; return instead of calling exit() inside Lambda.
        return {
            'statusCode': 200,
            'body': json.dumps({'annotation': 'No configuration item received'})
        }
    config_item = invoking_event['configurationItem']
    # Default to the resource that triggered the evaluation.
    resource_id = config_item['resourceId']
    resource_type = config_item['resourceType']
    if not config_item['configuration']:
        # For example, a deleted resource. PutEvaluations does not accept INSUFFICIENT_DATA.
        evaluation = {
            'compliance_value': 'NOT_APPLICABLE',
            'annotation': 'No configuration data received'
        }
    elif 'instanceType' not in config_item['configuration'] and not config_item.get('relationships'):
        # A volume that is not attached to any instance has no limit to compare against.
        evaluation = {
            'compliance_value': 'NOT_APPLICABLE',
            'annotation': 'Volume is not attached to an instance'
        }
    else:
        if 'instanceType' not in config_item['configuration']:
            # Volume change: evaluate the instance that the volume is attached to.
            resource_id = config_item['relationships'][0]['resourceId']
            resource_type = config_item['relationships'][0]['resourceType']
            describeInstances = ec2_client.describe_instances(
                InstanceIds=[
                    resource_id
                ]
            )
            instance_details = describeInstances['Reservations'][0]['Instances'][0]
            instance_type = instance_details['InstanceType']
            block_device_mappings = instance_details['BlockDeviceMappings']
            ebs_optimized = instance_details['EbsOptimized']
        else:
            # Instance change: the configuration item already carries the details.
            instance_type = config_item['configuration']['instanceType']
            block_device_mappings = config_item['configuration']['blockDeviceMappings']
            ebs_optimized = config_item['configuration']['ebsOptimized']
        instance_type_details = get_instance_type_details(ec2_client, instance_type)
        volumes = get_volumes(ec2_client, block_device_mappings)
        vol_info = get_volume_info(ec2_client, volumes)
        evaluation = evaluate_compliance(ebs_optimized, vol_info, instance_type_details)
    print("Compliance: {}".format(evaluation['compliance_value']))
    print("Annotation: {}".format(evaluation['annotation']))
    # Report the result back to AWS Config.
    config = boto3.client('config')
    response = config.put_evaluations(
        Evaluations=[
            {
                'ComplianceResourceType': resource_type,
                'ComplianceResourceId': resource_id,
                'ComplianceType': evaluation['compliance_value'],
                'OrderingTimestamp': config_item['configurationItemCaptureTime'],
                'Annotation': evaluation['annotation']
            },
        ],
        ResultToken=event['resultToken'])
    return {
        'statusCode': 200,
        'body': json.dumps(evaluation)
    }
4. Within the Code source window, select File and Save.
5. Set up the permissions for the AWS Lambda function using AWS Identity and Access Management (IAM) roles and policies. Select the Configuration tab in the current AWS Lambda function. You should see a role that has been created under Execution role, as shown in the below diagram. Select this role to open the IAM console in a new window.
6. On the IAM role screen, select Add Permissions and Attach policies, as shown in the following diagram.
7. In addition to the default AWSLambdaBasicExecutionRole permissions policy that was generated when the function was created, add the following two permissions policies. You can type in the policy names to filter, select each policy, and then select Attach policies.
- AmazonEC2ReadOnlyAccess
- AWSConfigRulesExecutionRole
8. Your role’s permissions policies should look similar to the roles presented in the below diagram. The first default-generated policy is uniquely named, as shown in the following diagram:
9. If we investigate further into the details of permissions policies, we can validate that these permissions allow the AWS Lambda function to have read only access to the metadata, information, and attributes of both Amazon EC2 instances and EBS volumes. In addition, the permissions policies will also allow the AWS Lambda function to have execution privileges to record configuration evaluations and changes into AWS Config. Select Review Policy and Save Changes. Because we no longer need the IAM console windows, we can safely close the IAM policies and IAM roles windows.
On the AWS Lambda function window, select the Code tab. Select Deploy to allow the changes to be deployed by AWS Lambda.
10. Select Copy ARN to copy the function’s ARN, which will be used in the next step.
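For readers who script the permissions from steps 6 and 7 rather than using the console, the following is a minimal sketch using the IAM AttachRolePolicy action. The helper name attach_rule_policies and the role name in the usage comment are hypothetical; the execution role created by Lambda has a uniquely generated name.

```python
# AWS managed policies named in step 7.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess",
    "arn:aws:iam::aws:policy/service-role/AWSConfigRulesExecutionRole",
]


def attach_rule_policies(iam_client, role_name):
    """Attach the two managed policies to the function's execution role."""
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return list(MANAGED_POLICY_ARNS)


# Usage (requires boto3 and credentials; the role name is a placeholder):
#   import boto3
#   attach_rule_policies(boto3.client("iam"), "OverProvisionedIopsConfigRule-role-example")
```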
Step 2: Set up AWS Config Custom Rule
1. Next up, we proceed to AWS Config on the AWS Management Console to create the Custom Rule. For accounts and Regions that don’t yet have AWS Config enabled, choose either the 1-click setup to proceed or click on Get started to initialize a custom setup. In this example, utilize the 1-click setup and select Confirm. In the main AWS Config Dashboard, we select Rules on the menu on the left and click on Add rule.
2. Select Create custom Lambda rule and click on Next, as shown in the following diagram.
3. On the Configure rule screen, we give the rule a name (“EC2-Overprovisioned-IOPS-Volumes” in our example), a description to identify the Custom Rule, and paste the AWS Lambda function ARN from the previous step here. Next, for Trigger type, select When configuration changes and select Resources for scope of changes. From the drop down for Resource category, select AWS resources. For the Resource type, select AWS EC2 Instance and AWS EC2 Volume. This allows the rule to be triggered and re-evaluated when any of our deployed Amazon EC2 instance and volume (Amazon EBS) resources are launched or changed. Optionally, we can use Tags as the scope of changes to filter and exclude specific Amazon EC2 and Amazon EBS resources based on custom tag key value pairs.
Select Next, confirm the details, and select Add rule as shown below.
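The console steps above can also be approximated programmatically, which helps when the rule must be repeated per Region and per account. The sketch below builds the request for the AWS Config PutConfigRule action; the helper names and the description text are illustrative. It assumes that, outside the console, AWS Config has separately been granted permission to invoke the function (for example with the Lambda AddPermission action for the config.amazonaws.com principal).

```python
def build_config_rule(rule_name, lambda_arn):
    """Build a ConfigRule definition triggered by configuration changes
    to EC2 instances and EBS volumes."""
    return {
        "ConfigRuleName": rule_name,
        "Description": "Flags instances whose attached EBS volumes exceed the instance IOPS limit",
        "Scope": {
            "ComplianceResourceTypes": ["AWS::EC2::Instance", "AWS::EC2::Volume"]
        },
        "Source": {
            "Owner": "CUSTOM_LAMBDA",
            "SourceIdentifier": lambda_arn,
            "SourceDetails": [
                {
                    "EventSource": "aws.config",
                    "MessageType": "ConfigurationItemChangeNotification",
                }
            ],
        },
    }


def create_rule(config_client, rule_name, lambda_arn):
    """Create or update the custom rule with the given AWS Config client."""
    config_client.put_config_rule(ConfigRule=build_config_rule(rule_name, lambda_arn))
```

With boto3 installed and credentials configured, create_rule(boto3.client("config"), "EC2-Overprovisioned-IOPS-Volumes", function_arn) would register the rule in the client's Region, where function_arn is the ARN copied in Step 1.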
Step 3: Review the rules for resources compliance status
1. Now that the AWS Config rule is set up, it takes a few minutes for AWS Config to record and evaluate the configuration for the existing Amazon EC2 instances and Amazon EBS volumes in the operating Region. If you are running on a test account, at this point you should deploy a selection of Amazon EC2 instances with attached Amazon EBS volumes.
2. Once the evaluation is complete, on the main AWS Config menu on the left, select Rules and then the name of the rule that we created. Under Resources in scope, we filter on either All, Compliant, or Noncompliant resources. The Annotation column shows the total IOPS that the AWS Lambda function aggregated for all volumes attached to the Amazon EC2 instance, compared with the Amazon EC2 instance maximum IOPS limit, as shown in the following screenshot.
3. There are several different potential results for the assessment annotations:
Annotation | Compliance | Description |
Ebs Optimized: Unsupported (AWS Config won’t report these under Resources in scope) | Not Applicable | The Amazon EC2 instance does not support Amazon EBS optimization, and the Custom Config Rule cannot be evaluated for aggregated IOPS. This is likely an older generation instance. We recommend upgrading to a modern instance type that supports Amazon EBS optimization. |
Ebs Optimized: Supported but not enabled (AWS Config won’t report these under Resources in scope) | Not Applicable | The Amazon EC2 instance supports Amazon EBS optimization, but it was not enabled, so the Custom Config Rule cannot be evaluated for aggregated IOPS. Enable Amazon EBS optimization for these instances or upgrade to a modern instance type that natively supports and enables Amazon EBS optimization. |
<%>% - Total IOPS / Max IOPS: <#> / <#> | Noncompliant | The aggregate IOPS for all volumes attached to this instance exceeds the limit set for this instance type. If the volumes are striped and configured as RAID 0 logical volumes, you may not be able to reach the theoretical/intended total aggregate IOPS. Reduce your Amazon EBS volumes’ Provisioned IOPS so that their aggregate is at or below the maximum listed, or resize the Amazon EC2 instance to a type whose maximum IOPS is at or above your current aggregate IOPS. |
<%>% - Total IOPS / Max IOPS: <#> / <#> | Compliant | The aggregate IOPS for all volumes attached to this instance is at or below the limit set for this Amazon EC2 instance type. |
4. As you continue to deploy Amazon EC2 instances, attach and detach Amazon EBS volumes, or modify Amazon EBS volume Provisioned IOPS, the custom Config rule is triggered to re-evaluate for compliance. You can use the AWS Config Dashboard to validate and check for over-provisioned IOPS, and even use AWS Config to manage automatic remediation actions. See the AWS Config Developer Guide for more details on how AWS Config rules are triggered.
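The same results shown in the console can be read programmatically, for example to feed a report. The sketch below pages through the AWS Config GetComplianceDetailsByConfigRule action and collects each noncompliant resource with its annotation. The function name list_noncompliant is illustrative, and the rule name in the usage comment is the one chosen in Step 2.

```python
def list_noncompliant(config_client, rule_name):
    """Return (resource_id, annotation) pairs for resources that the
    rule currently reports as NON_COMPLIANT."""
    results = []
    kwargs = {"ConfigRuleName": rule_name, "ComplianceTypes": ["NON_COMPLIANT"]}
    while True:
        page = config_client.get_compliance_details_by_config_rule(**kwargs)
        for item in page.get("EvaluationResults", []):
            qualifier = item["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
            results.append((qualifier["ResourceId"], item.get("Annotation", "")))
        token = page.get("NextToken")
        if not token:
            return results
        # Request the next page of results.
        kwargs["NextToken"] = token


# Usage (requires boto3 and credentials):
#   import boto3
#   for resource_id, note in list_noncompliant(boto3.client("config"),
#                                              "EC2-Overprovisioned-IOPS-Volumes"):
#       print(resource_id, note)
```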
Cleaning up
Delete the AWS Config custom rule, AWS Lambda function, and any Amazon EC2 Instances or Amazon EBS volumes created as part of this example to avoid incurring future costs.
Conclusion
Using RAID 0 is a common pattern for businesses to achieve larger logical volumes and increase overall aggregate IOPS performance for their applications and workloads running on Amazon EC2 instances. Because each Amazon EC2 instance type has a maximum limit for Amazon EBS-optimized throughput and IOPS that depends on its size, deploying this solution helps you ensure that the attached volumes’ aggregated IOPS stays within the instance’s maximum limit.
The solution uses AWS Lambda and AWS Config Custom Rules to automate the invocation, detection, and reporting of compliance posture for Amazon EC2 instances and Amazon EBS volumes. Applying this across all your AWS accounts and selected Regions helps reduce waste, optimize costs, and avoid unintentional over-provisioning of IOPS for your io1 and io2 volumes when aggregating and striping EBS volumes.
If you have feedback, questions, or suggestions, please feel free to leave us comments below.