AWS Storage Blog

Prevent IOPS over-provisioning by monitoring striped Amazon EBS volumes within EC2 instance limits

Enterprises are always looking for ways to optimize storage performance for their performance-intensive applications and workloads. One technique is data striping, which segments data across multiple storage volumes to aggregate their logical capacity and increase performance. Striping is useful for workloads that require high input/output operations per second (IOPS), large amounts of block storage, or both, such as enterprise-grade data warehouses, HPC middleware applications, and in-memory databases like SAP HANA. However, companies managing these workloads face a common challenge: provisioning the right amount of IOPS for their storage volumes. Over-provisioning incurs unnecessary cost for unused capacity, while under-provisioning degrades application performance. Enterprises should carefully assess their IOPS needs to strike the right balance between performance and cost efficiency.

Customers use Amazon Elastic Block Store (EBS) to attach scalable, high-performance block storage volumes to their compute instances running on Amazon Elastic Compute Cloud (EC2). Enterprise-grade workloads often require logical volumes larger than the Amazon EBS volume size limit (16 TiB for most volume types; 64 TiB for io2 Block Express), as well as higher IOPS and throughput than a single volume provides. To aggregate the size and IOPS of multiple EBS volumes into one logical volume, customers can configure a RAID 0 array (where the operating system supports it) to stripe data across the volumes. However, each Amazon EBS-optimized instance supports a maximum aggregate bandwidth, throughput, and IOPS that varies by instance type and size. These limits are listed in the Amazon EBS documentation and can be retrieved programmatically with the describe-instance-types command in the AWS Command Line Interface (CLI) or the DescribeInstanceTypes action in the Amazon EC2 application programming interface (API).
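The same limits can be read from Python with boto3. The following is a minimal sketch, not part of the solution's code: get_ebs_limits is a hypothetical helper name, and ec2_client is assumed to be a boto3 EC2 client. It reads the MaximumIops, MaximumThroughputInMBps, and MaximumBandwidthInMbps fields that DescribeInstanceTypes returns under EbsInfo.EbsOptimizedInfo.

```python
def get_ebs_limits(ec2_client, instance_types):
    """Map each instance type to its EBS-optimized maximums, or None if none are reported.

    ec2_client is assumed to be a boto3 EC2 client (or any object whose
    describe_instance_types call returns the same response shape).
    """
    response = ec2_client.describe_instance_types(InstanceTypes=list(instance_types))
    limits = {}
    for item in response["InstanceTypes"]:
        optimized = item.get("EbsInfo", {}).get("EbsOptimizedInfo")
        if optimized is None:
            # No EBS-optimized maximums are reported for this instance type
            limits[item["InstanceType"]] = None
            continue
        limits[item["InstanceType"]] = {
            "max_iops": optimized["MaximumIops"],
            "max_throughput_mb_per_s": optimized["MaximumThroughputInMBps"],
            "max_bandwidth_mbps": optimized["MaximumBandwidthInMbps"],
        }
    return limits
```

For example, get_ebs_limits(boto3.client("ec2"), ["r6i.24xlarge", "r6i.32xlarge"]) should return maximum IOPS values that match the Amazon EBS documentation for those sizes. Passing the client in, rather than creating it inside the function, keeps the helper testable without AWS credentials.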

In this post, we demonstrate a solution that uses an AWS Config custom rule backed by an AWS Lambda function to automatically detect and report when the aggregate IOPS of the striped Amazon EBS volumes attached to an Amazon EC2 instance exceeds that instance's maximum IOPS limit. The solution reports over-provisioned IOPS in AWS Config, where you can set up notifications when over-provisioning occurs and mitigate the resulting extra cost.

The challenge of over-provisioning

To illustrate how over-provisioning affects cost, consider a real-life example that we've seen with our customers. A customer running a highly I/O-intensive database on an r6i.24xlarge instance needed 160,000 provisioned IOPS to support its demanding workload. Assuming that four io2 volumes, each provisioned with 40,000 IOPS, would deliver a combined 160,000 IOPS, the customer configured the four volumes as one logical volume using RAID 0 in the operating system. However, this assumption led to a classic over-provisioning mistake: it overlooked the IOPS limit of the instance type. The documentation (see Exhibit-A) or a CLI/API query shows that the r6i.24xlarge instance supports a maximum of only 120,000 IOPS (at a 16 KiB I/O size), so 40,000 of the provisioned IOPS could never be used by the instance. The expected performance was never realized, and the resources were not cost-optimized: the customer received far less I/O than provisioned and was still billed for the 40,000 unusable IOPS.

In this example, the customer can eliminate the additional cost by reducing each of the four volumes to 30,000 provisioned IOPS, matching the 120,000 IOPS maximum of the r6i.24xlarge. Alternatively, the customer can leave the Amazon EBS volumes unchanged and resize the Amazon EC2 instance to an r6i.32xlarge (see Exhibit-A), which supports a maximum of 160,000 IOPS. Unfortunately, these manual adjustments don't scale as a customer's environment grows to include more Amazon EC2 instances and Amazon EBS volumes. At scale, it becomes nearly impossible to manually detect mismatches and adjust volumes or instances to achieve the desired IOPS without undermining cost optimization.


Exhibit-A: Maximum bandwidth, throughput, and IOPS for several instance types
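The arithmetic in this example is simple enough to check in a few lines of Python. This is an illustrative sketch with a hypothetical helper name (iops_headroom); the per-volume figure assumes identical stripe members with I/O spread evenly across them, as in the four-volume RAID 0 example.

```python
def iops_headroom(volume_iops, instance_max_iops):
    """Compare the aggregate provisioned IOPS of striped volumes with an instance limit."""
    total = sum(volume_iops)
    # IOPS provisioned beyond what the instance can deliver (0 if within the limit)
    over_provisioned = max(0, total - instance_max_iops)
    # Per-volume IOPS that fits the limit, assuming I/O is spread evenly across stripe members
    per_volume_cap = instance_max_iops // len(volume_iops)
    return {"total": total, "over_provisioned": over_provisioned, "per_volume_cap": per_volume_cap}


# Four io2 volumes at 40,000 IOPS each, on r6i.24xlarge and on r6i.32xlarge
print(iops_headroom([40_000] * 4, 120_000))
# {'total': 160000, 'over_provisioned': 40000, 'per_volume_cap': 30000}
print(iops_headroom([40_000] * 4, 160_000))
# {'total': 160000, 'over_provisioned': 0, 'per_volume_cap': 40000}
```

The first call reproduces both remedies from the example: 40,000 IOPS are unusable on r6i.24xlarge, and 30,000 IOPS per volume fits its limit. The second call shows that the original volumes fit r6i.32xlarge without change.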

Solution overview

The solution uses AWS Config and AWS Lambda. The following diagram shows the high-level components and workflows that the solution deploys:

The solution architecture diagram featuring AWS Config Dashboard, AWS Config Custom Rule, and AWS Lambda.

  1. An AWS Config custom rule monitors Amazon EC2 instances and EBS volumes for changes.
  2. When a monitored Amazon EC2 instance or EBS volume is created, deleted, or changed, the custom rule is triggered for re-evaluation.
  3. The custom rule's evaluation logic runs in an AWS Lambda function written in Python, which calculates the total aggregate IOPS of the attached EBS volumes and compares it with the IOPS limit of the Amazon EC2 instance type.
  4. The Lambda function returns the compliance status to AWS Config.
  5. Evaluation results and compliance status are displayed on the AWS Config dashboard. AWS Config continuously evaluates the resources, records the results, and flags an instance as noncompliant when the aggregate IOPS of its volumes exceeds the instance's maximum, helping ensure that EBS volumes stay within the IOPS limit of the instance type.

Walkthrough

To deploy the components and workflows shown in the preceding diagram, complete the following steps:

  1. Create a new AWS Lambda function
  2. Set up AWS Config Custom Rules
  3. Review resource compliance status

Prerequisites

This solution requires an AWS account, basic knowledge of Python, and existing EC2 instances and EBS volumes in the account to evaluate. The solution also deploys an AWS Lambda function and an AWS Config custom rule that are bound to the resources in a single account and Region, so you must repeat this process for each AWS account and each AWS Region you operate in. For more information on how to automate the deployment of AWS Config custom rules and their function logic programmatically at scale, read more here.

Step 1: Getting started with the AWS Lambda function

1. First, we create a new AWS Lambda function that contains the compliance-check logic for AWS Config to invoke. In the AWS Management Console, open AWS Lambda, select Create function, and choose the Author from scratch option. We'll name the function OverProvisionedIopsConfigRule-<region>. For this function, we use the Python 3.9 runtime. Optionally, select the arm64 architecture to cost-optimize by running on AWS Graviton (Arm-based) processors, where available in the AWS Region. Next, expand Change default execution role and select Create a new role with basic Lambda permissions so that the function gets its own execution role, as shown in the following screenshots. Select Create function.

Creating a new function in AWS Lambda

2. Select Author from scratch and provide basic information. Python is the programming language used in this example.

Entering the details for creating new function in AWS Lambda

3. Paste the following code into lambda_function.py in the Code source window, replacing its default contents:

import json
import boto3

def get_instance_type_details(ec2_client, instance_type):
    # Look up EBS-optimization support and the maximum EBS IOPS for this instance type
    instance_attr = ec2_client.describe_instance_types(
        InstanceTypes=[instance_type]
    )
    instance_info = instance_attr['InstanceTypes'][0]
    print("InstanceTypeDetails: {}".format(instance_info))
    ebs_info = instance_info['EbsInfo']
    retval = {
        'ebs_optimized_support': ebs_info['EbsOptimizedSupport'],
        'max_iops': 0
    }
    # EbsOptimizedInfo (with MaximumIops) is only reported for EBS-optimized instance types
    if 'EbsOptimizedInfo' in ebs_info:
        retval['max_iops'] = ebs_info['EbsOptimizedInfo']['MaximumIops']
    return retval
    
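def get_volumes(ec2_client, block_device_mappings):
    # ec2_client is unused here; the parameter is kept so the call site does not change.
    # AWS Config reports block device mappings with lowercase keys ('ebs', 'volumeId'),
    # while the EC2 DescribeInstances API uses 'Ebs' and 'VolumeId'.
    volumes = []
    for device in block_device_mappings:
        if 'ebs' in device:
            volumes.append(device['ebs']['volumeId'])
        elif 'Ebs' in device:
            volumes.append(device['Ebs']['VolumeId'])
    return volumes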
    
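def get_volume_info(ec2_client, volumes):
    volume_info = {
        'total_iops': 0,
        'volumes': []
    }
    # An empty VolumeIds list does not filter DescribeVolumes, so skip the call entirely
    if not volumes:
        return volume_info
    volume_details = ec2_client.describe_volumes(
        VolumeIds=volumes
    )
    for vol in volume_details['Volumes']:
        # Iops can be absent for some volume types (such as magnetic 'standard' volumes)
        iops = vol.get('Iops', 0)
        volume_info['total_iops'] += iops
        volume_info['volumes'].append({'volumeId': vol['VolumeId'], 'iops': iops, 'volumeType': vol['VolumeType'], 'size': vol['Size']})
    return volume_info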
    
def evaluate_compliance(ebs_optimized, vol_info, instance_type_details):
    retval = {}
    max_iops = instance_type_details['max_iops']
    if instance_type_details['ebs_optimized_support'] == 'unsupported':
        retval['compliance_value'] = 'NOT_APPLICABLE'
        retval['annotation'] = "Ebs Optimized: Unsupported"
    elif instance_type_details['ebs_optimized_support'] == 'supported' and ebs_optimized is False:
        retval['compliance_value'] = 'NOT_APPLICABLE'
        retval['annotation'] = "Ebs Optimized: Supported but not enabled"
    elif max_iops == 0:
        # No published EBS IOPS limit: avoid dividing by zero below
        retval['compliance_value'] = 'NOT_APPLICABLE'
        retval['annotation'] = "Ebs Optimized: Maximum IOPS not available"
    else:
        percent = int((vol_info['total_iops'] / max_iops) * 100)
        retval['annotation'] = "{}% - Total IOPS / Max IOPS: {} / {}".format(percent, vol_info['total_iops'], max_iops)
        # Flag the instance when its volumes provision more IOPS than the instance can deliver
        if vol_info['total_iops'] > max_iops:
            retval['compliance_value'] = 'NON_COMPLIANT'
        else:
            retval['compliance_value'] = 'COMPLIANT'
    return retval
    
def lambda_handler(event, context):
    ec2_client = boto3.client('ec2')
    invoking_event = json.loads(event['invokingEvent'])
    print(json.dumps(event))
    # Scheduled and oversized notifications carry no configurationItem, so there is nothing
    # to evaluate. Return instead of calling exit(), which raises SystemExit in Lambda.
    if 'configurationItem' not in invoking_event:
        return {
            'statusCode': 200,
            'body': json.dumps('No configuration item to evaluate')
        }

    configuration_item = invoking_event['configurationItem']
    configuration = configuration_item.get('configuration')
    # By default, report on the resource that triggered this evaluation
    resource_id = configuration_item['resourceId']
    resource_type = configuration_item['resourceType']
    instance_type = None

    if not configuration:
        # Deleted resources arrive without configuration data. PutEvaluations does not
        # accept INSUFFICIENT_DATA, so report NOT_APPLICABLE instead.
        evaluation = {
            'compliance_value': 'NOT_APPLICABLE',
            'annotation': 'No configuration data received'
        }
    elif 'instanceType' not in configuration:
        # Triggered by an EBS volume: evaluate the instance it is attached to, if any
        attached_instances = [
            rel for rel in configuration_item.get('relationships', [])
            if rel.get('resourceType') == 'AWS::EC2::Instance'
        ]
        if attached_instances:
            resource_id = attached_instances[0]['resourceId']
            resource_type = attached_instances[0]['resourceType']
            describe_instances = ec2_client.describe_instances(
                InstanceIds=[resource_id]
            )
            instance_details = describe_instances['Reservations'][0]['Instances'][0]
            instance_type = instance_details['InstanceType']
            block_device_mappings = instance_details['BlockDeviceMappings']
            ebs_optimized = instance_details['EbsOptimized']
        else:
            evaluation = {
                'compliance_value': 'NOT_APPLICABLE',
                'annotation': 'Volume is not attached to an instance'
            }
    else:
        # Triggered by an EC2 instance: use the configuration recorded by AWS Config
        instance_type = configuration['instanceType']
        block_device_mappings = configuration['blockDeviceMappings']
        ebs_optimized = configuration['ebsOptimized']

    if instance_type is not None:
        instance_type_details = get_instance_type_details(ec2_client, instance_type)
        volumes = get_volumes(ec2_client, block_device_mappings)
        vol_info = get_volume_info(ec2_client, volumes)
        evaluation = evaluate_compliance(ebs_optimized, vol_info, instance_type_details)

    print("Compliance: {}".format(evaluation['compliance_value']))
    print("Annotation: {}".format(evaluation['annotation']))
    # Report the result back to AWS Config for this rule invocation
    config = boto3.client('config')
    config.put_evaluations(
        Evaluations=[
            {
                'ComplianceResourceType': resource_type,
                'ComplianceResourceId': resource_id,
                'ComplianceType': evaluation['compliance_value'],
                'OrderingTimestamp': configuration_item['configurationItemCaptureTime'],
                'Annotation': evaluation['annotation']
            },
        ],
        ResultToken=event['resultToken'])
    return {
        'statusCode': 200,
        'body': json.dumps(evaluation)
    }

4. Within the Code source window, select File and Save.

5. Set up permissions for the AWS Lambda function using AWS Identity and Access Management (IAM) roles and policies. Select the Configuration tab of the current AWS Lambda function. You should see the role that was created under Execution role, as shown in the following screenshot. Select this role to open the IAM console in a new window.

Setting up permissions for the AWS Lambda function using AWS Identity and Access Management

6. On the IAM role screen, select Add Permissions and Attach policies, as shown in the following diagram.

Adding permissions and attaching policies in AWS IAM

7. In addition to the default AWSLambdaBasicExecutionRole permissions policy that was generated by the previous step, add the following two permissions policies. You can type in the policy names to filter, select each policy and then select Attach policies.

AmazonEC2ReadOnlyAccess

AWSConfigRulesExecutionRole

8. Your role’s permissions policies should look similar to those in the following screenshot. Note that the first, default-generated policy has a unique name:

Role's permissions and Policies from AWS IAM

9. Reviewing the details of these permissions policies confirms that they give the AWS Lambda function read-only access to the metadata and attributes of Amazon EC2 instances and EBS volumes, as well as permission to record configuration evaluations in AWS Config. Select Review Policy and Save Changes. Because we no longer need the IAM console, we can safely close the IAM policies and IAM roles windows.

10. Return to the AWS Lambda function window, select the Code tab, and then select Deploy to deploy your changes.

Deploying function in AWS Lambda

11. Select Copy ARN. You will use this ARN in the next step.

Copying the ARN of AWS Lambda function created
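If you prefer to script Step 1 instead of using the console, the following sketch shows one way to do it with boto3. It is an outline under stated assumptions rather than the authors' method: it assumes an execution role that Lambda can assume already exists, the helper names are hypothetical, the 30-second timeout is an illustrative choice, and it defaults to the python3.12 runtime rather than the Python 3.9 runtime used in the walkthrough. The policy ARNs assume the standard aws partition.

```python
import io
import zipfile

# AWS managed policies attached in steps 6 and 7 (ARNs assume the standard "aws" partition)
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess",
    "arn:aws:iam::aws:policy/service-role/AWSConfigRulesExecutionRole",
]


def build_deployment_package(source_code):
    """Zip the handler source in memory as lambda_function.py."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("lambda_function.py", source_code)
    return buffer.getvalue()


def attach_managed_policies(iam_client, role_name):
    """Attach both managed policies to an existing Lambda execution role."""
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)


def create_rule_function(lambda_client, function_name, role_arn, source_code,
                         runtime="python3.12"):
    """Create the rule's Lambda function and return its ARN for use in Step 2."""
    response = lambda_client.create_function(
        FunctionName=function_name,
        Runtime=runtime,
        Role=role_arn,
        Handler="lambda_function.lambda_handler",
        Code={"ZipFile": build_deployment_package(source_code)},
        Architectures=["arm64"],  # AWS Graviton; use "x86_64" where arm64 is unavailable
        Timeout=30,  # assumption: more headroom than the 3-second default
    )
    return response["FunctionArn"]
```

For example, after attach_managed_policies(boto3.client("iam"), role_name), calling create_rule_function(boto3.client("lambda"), "OverProvisionedIopsConfigRule-us-east-1", role_arn, open("lambda_function.py").read()) returns the function ARN that Step 2 needs.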

Step 2: Set up AWS Config Custom Rule

1. Next, open AWS Config in the AWS Management Console to create the custom rule. If AWS Config is not yet enabled in the account and Region, choose either 1-click setup or Get started to initialize a custom setup. In this example, we use 1-click setup and select Confirm. In the main AWS Config dashboard, select Rules in the menu on the left, and then select Add rule.

Set up custom config rule in AWS Config using Add Rule

2. Select Create custom Lambda rule and click on Next, as shown in the following diagram.

Selecting create custom Lambda rule

3. On the Configure rule screen, give the rule a name (“EC2-Overprovisioned-IOPS-Volumes” in our example) and a description that identifies the custom rule, and paste the AWS Lambda function ARN from the previous step. Next, for Trigger type, select When configuration changes, and for Scope of changes, select Resources. From the Resource category drop-down, select AWS resources. For Resource type, select AWS EC2 Instance and AWS EC2 Volume. This triggers the rule to re-evaluate whenever any of your Amazon EC2 instance or Amazon EBS volume resources are created, changed, or deleted. Optionally, you can select Tags as the scope of changes to evaluate only Amazon EC2 and Amazon EBS resources that carry specific tag key-value pairs.

Select Next, confirm the details, and select Add rule as shown below.

Configuring the custom lambda rule in AWS Config.
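The console steps above can also be expressed as API calls. This is a minimal sketch, assuming boto3 clients for AWS Config and AWS Lambda in the Region where the Step 1 function lives and the function ARN from Step 1; the helper name and the StatementId value are hypothetical. Because the rule is created through the API rather than the console, the sketch first grants config.amazonaws.com permission to invoke the function.

```python
RULE_SCOPE = ["AWS::EC2::Instance", "AWS::EC2::Volume"]


def create_config_rule(config_client, lambda_client, function_arn, account_id,
                       rule_name="EC2-Overprovisioned-IOPS-Volumes"):
    """Allow AWS Config to invoke the function, then create the custom rule."""
    # The walkthrough does not add this permission by hand; when creating
    # the rule through the API, grant it explicitly.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowAWSConfigInvoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="config.amazonaws.com",
        SourceAccount=account_id,
    )
    rule = {
        "ConfigRuleName": rule_name,
        "Description": "Flags EC2 instances whose attached EBS volumes exceed the instance IOPS limit",
        # Matches the console's Resources scope with AWS EC2 Instance and AWS EC2 Volume
        "Scope": {"ComplianceResourceTypes": RULE_SCOPE},
        "Source": {
            "Owner": "CUSTOM_LAMBDA",
            "SourceIdentifier": function_arn,
            # Matches the "When configuration changes" trigger type
            "SourceDetails": [
                {"EventSource": "aws.config",
                 "MessageType": "ConfigurationItemChangeNotification"},
            ],
        },
    }
    config_client.put_config_rule(ConfigRule=rule)
    return rule
```

As noted in the prerequisites, the rule applies to a single account and Region, so you would run this once per Region together with that Region's function, for example by creating clients with boto3.client("config", region_name=region) inside a loop over your Regions.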

Step 3: Review resource compliance status

1. Now that the AWS Config rule is set up, AWS Config takes a few minutes to record and evaluate the configuration of the existing Amazon EC2 instances and Amazon EBS volumes in the Region. If you are using a test account, deploy a selection of Amazon EC2 instances with attached Amazon EBS volumes at this point.

2. Once the evaluation is complete, select Rules in the AWS Config menu on the left, and then select the name of the rule that we created. Under Resources in scope, filter on All, Compliant, or Noncompliant resources. The Annotation column shows the total IOPS of all volumes attached to each Amazon EC2 instance, as aggregated by the AWS Lambda function, compared with the instance's maximum IOPS limit, as shown in the following screenshot.

Custom config rule evaluation results in AWS Config

3. The rule can produce the following annotations:

| Annotation | Compliance | Description |
| --- | --- | --- |
| Ebs Optimized: Unsupported (AWS Config won't report these under Resources in scope) | Not Applicable | The Amazon EC2 instance does not support Amazon EBS optimization, so the custom rule cannot evaluate aggregate IOPS. This is likely a previous-generation instance. We recommend upgrading to a current instance type that supports Amazon EBS optimization. |
| Ebs Optimized: Supported but not enabled (AWS Config won't report these under Resources in scope) | Not Applicable | The Amazon EC2 instance supports Amazon EBS optimization, but it is not enabled, so the custom rule cannot evaluate aggregate IOPS. Enable Amazon EBS optimization for these instances, or upgrade to a current instance type that supports and enables Amazon EBS optimization by default. |
| <N>% - Total IOPS / Max IOPS: <total> / <max> | Noncompliant | The aggregate IOPS of all volumes attached to this instance exceeds the limit for this instance type. If the volumes are striped in a RAID 0 array, you will not be able to reach the intended total aggregate IOPS. Reduce the Provisioned IOPS of your Amazon EBS volumes so that their aggregate is at or below the listed maximum, or move to a larger instance type whose maximum IOPS is at least your current aggregate IOPS. |
| <N>% - Total IOPS / Max IOPS: <total> / <max> | Compliant | The aggregate IOPS of all volumes attached to this instance is at or below the limit for this Amazon EC2 instance type. |

4. As you continue to deploy Amazon EC2 instances, attach and detach Amazon EBS volumes, or modify the Provisioned IOPS of Amazon EBS volumes, the custom rule is triggered to re-evaluate compliance. You can use the AWS Config dashboard to check for over-provisioned IOPS, and you can also use AWS Config to manage automatic remediation actions. For more details on how AWS Config rules are triggered, see the AWS Config Developer Guide.
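Besides the console, you can retrieve the evaluation results with the GetComplianceDetailsByConfigRule API, for example to feed a report or a notification. The following sketch assumes a boto3 AWS Config client and the rule name used in this post; the helper name is hypothetical. It follows NextToken until it has collected every noncompliant result.

```python
def list_noncompliant_resources(config_client, rule_name="EC2-Overprovisioned-IOPS-Volumes"):
    """Return (resource_id, annotation) pairs for the rule's NON_COMPLIANT results."""
    findings = []
    request = {"ConfigRuleName": rule_name, "ComplianceTypes": ["NON_COMPLIANT"]}
    while True:
        page = config_client.get_compliance_details_by_config_rule(**request)
        for result in page.get("EvaluationResults", []):
            qualifier = result["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
            # Annotation holds the "<N>% - Total IOPS / Max IOPS" text written by the Lambda function
            findings.append((qualifier["ResourceId"], result.get("Annotation", "")))
        next_token = page.get("NextToken")
        if not next_token:
            return findings
        request["NextToken"] = next_token
```

To re-run the rule on demand rather than waiting for a configuration change, you can call start_config_rules_evaluation(ConfigRuleNames=[rule_name]) on the same client.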

Cleaning up

Delete the AWS Config custom rule, AWS Lambda function, and any Amazon EC2 Instances or Amazon EBS volumes created as part of this example to avoid incurring future costs.

Conclusion

Using RAID 0 is a common pattern for businesses to create larger logical volumes and increase the aggregate IOPS available to applications and workloads running on Amazon EC2 instances. Because each Amazon EC2 instance type has maximum Amazon EBS-optimized throughput and IOPS limits that scale with its size, deploying this solution helps you ensure that the aggregate IOPS of the attached volumes stays within the instance's maximum limit.

This solution, which combines AWS Lambda and AWS Config custom rules, automates the detection and reporting of the compliance posture of Amazon EC2 instances and Amazon EBS volumes. Applying it across your AWS accounts and selected Regions helps reduce waste, optimize costs, maximize performance, and prevent unintentional over-provisioning of IOPS for your io1 and io2 volumes when striping EBS volumes.

If you have feedback, questions, or suggestions, please feel free to leave us comments below.

Ed Chan

Ed Chan is a Principal Solutions Architect at AWS based in Seattle, Washington. With over two decades of information technology experience, he currently advises enterprise customers and solves business and technical challenges by providing architectural guidance and cloud-native best practices. Outside of work, he follows the NBA and is perennially excited and disappointed by his hometown team the New York Knicks.

Ranjith Rayaprolu

Ranjith Rayaprolu is a Senior Solutions Architect at AWS working with Pacific Northwest customers. He helps customers design and operate Well-Architected solutions in AWS that address their business problems and accelerate the adoption of AWS services. He focuses on AWS security and networking technologies to develop solutions in the cloud across different industry verticals. Ranjith lives in the Seattle area and loves outdoor activities.