AWS Database Blog

Automatically scale storage for Amazon RDS Multi-AZ DB clusters using AWS Lambda

When Amazon RDS storage reaches capacity, your database becomes unavailable. This is a critical failure that can disrupt client operations, corrupt in-flight transactions, and trigger unplanned downtime that is difficult to recover from quickly. Preventing this requires proactive storage monitoring and timely scaling. Amazon RDS Multi-AZ deployments handle this automatically through native storage auto-scaling, but Amazon RDS Multi-AZ clusters with two readable standbys do not support this feature. As a result, teams must scale storage manually. Manual scaling is operationally demanding. It requires initiating a modify-instance request, waiting for the change to be applied (which can take minutes to hours depending on cluster size), monitoring the operation to completion, and verifying that the new capacity is reflected correctly. These steps must happen before storage is exhausted, often under time pressure and outside business hours.

In this post, we walk you through building an automated storage scaling solution for Amazon RDS Multi-AZ clusters with two readable standbys. We use AWS Lambda to execute scaling logic, Amazon CloudWatch to detect and alarm on storage thresholds, and Amazon Simple Notification Service (Amazon SNS) to deliver timely notifications. This combination provides event-driven automation, native AWS integration, and operational visibility without requiring third-party tooling.

Solution overview

This solution uses Amazon CloudWatch to monitor the FreeStorageSpace metric of your Amazon RDS Multi-AZ DB cluster. When your free storage drops below a threshold you define, Amazon CloudWatch triggers an alarm that sends a notification through Amazon SNS and invokes a Lambda function to automatically scale up storage. The Lambda function retrieves the current storage allocation, calculates the new capacity based on a configurable percentage increase (default 15%), and applies the modification to your RDS Multi-AZ DB cluster.

The following diagram illustrates the solution architecture.

Architecture Diagram: AWS architecture diagram showing RDS Multi-AZ DB Cluster connected to CloudWatch Alarm, which triggers a Lambda Function to scale storage. The alarm also sends notifications to Amazon SNS. The components are arranged in a workflow with connecting arrows, displayed within a blue-bordered region box.

Amazon RDS enforces a 6-hour waiting period between storage modifications. Choose your scaling percentage based on your data growth rate and operational preferences: use 15-20% for cost optimization with more frequent scaling, or 30-40% for fewer scaling events during high-growth periods.

In the following sections, we walk through the setup process for this automated storage scaling solution.
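To make the trade-off concrete, the following sketch projects how allocated storage compounds across successive scaling events. It uses a simplified integer version of the calculation (the Lambda function itself uses floating-point multiplication truncated with int()); the starting size and event counts are illustrative only.

```python
def project_storage(initial_gb: int, percentage: int, events: int) -> list:
    """Return allocated storage (GB) after each scaling event.

    Integer approximation of the Lambda's calculation:
    new = current * (1 + percentage / 100), truncated to whole GB.
    """
    sizes = [initial_gb]
    for _ in range(events):
        sizes.append(sizes[-1] * (100 + percentage) // 100)
    return sizes

# Starting at 100 GB:
print(project_storage(100, 15, 3))  # [100, 115, 132, 151]
print(project_storage(100, 40, 3))  # [100, 140, 196, 274]
```

Because of the 6-hour cooldown, three 40% events can more than double a 100 GB volume within a day, which is why a larger percentage suits sustained high-growth periods, while 15% keeps allocated (and billed) storage closer to actual usage.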

Prerequisites

For this walkthrough, you must have the following prerequisites:

  • An AWS account with permissions to manage CloudWatch, Amazon RDS, Amazon SNS, and Lambda services
  • An existing Amazon RDS Multi-AZ DB cluster to monitor and scale as needed

Create solution resources with AWS CloudFormation

You can deploy the core infrastructure using AWS CloudFormation, which creates the Lambda function, AWS Identity and Access Management (IAM) execution role, SNS topic, and CloudWatch alarm. To create the CloudFormation stack, complete the following steps:

  1. Download this CloudFormation template.
  2. Open the AWS CloudFormation console and choose your target AWS Region.
  3. Choose Create stack and select With new resources (standard).
    Create Stack Menu: AWS CloudFormation console dropdown menu showing "Create stack" button.
  4. Under Upload a template file, select Choose file.
    Template Upload Interface: CloudFormation "Create stack" page showing template preparation options.
  5. Upload the template file you downloaded and choose Next.
  6. Enter a stack name.
  7. Provide values for the requested parameters:
    1. For DBInstanceIdentifiers, enter a comma-separated list of the RDS DB instance identifiers (not cluster identifiers) to monitor (for example, database-1-instance-1,database-2-instance-1), up to the 4096-byte parameter limit.
    2. For AlarmThresholdGB, enter a comma-separated list of free storage space thresholds in GB that trigger alarms, one per instance (for example, 20,30,15). This list must be the same length as the list entered for DBInstanceIdentifiers.
    3. For EmailAddress, enter an email address to receive storage scaling notifications. Email is the default notification target; if you would like notifications sent elsewhere, modify the SNS topic after deployment.
    4. For ScalingPercentage, enter the percentage to increase storage by when the alarm triggers.
  8. Optionally, choose an IAM role.

If no IAM role is selected, CloudFormation uses the credentials from the current user.
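Before creating the stack, you can sanity-check that the two comma-separated parameter values line up. The following is a small pre-deployment sketch of our own (not part of the template); the function name and strictness are assumptions.

```python
def validate_parameters(db_instance_ids: str, thresholds_gb: str) -> list:
    """Pair each DB instance identifier with its alarm threshold in GB."""
    ids = [i.strip() for i in db_instance_ids.split(",") if i.strip()]
    gbs = [t.strip() for t in thresholds_gb.split(",") if t.strip()]
    if len(ids) != len(gbs):
        # Mismatched lists would leave some instances without an alarm threshold.
        raise ValueError(
            f"DBInstanceIdentifiers has {len(ids)} entries but "
            f"AlarmThresholdGB has {len(gbs)}; the lists must be the same length."
        )
    return [(i, float(g)) for i, g in zip(ids, gbs)]

print(validate_parameters("database-1-instance-1,database-2-instance-1", "20,30"))
# [('database-1-instance-1', 20.0), ('database-2-instance-1', 30.0)]
```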

Create solution resources with the AWS Management Console

You can alternatively deploy the core infrastructure using the AWS Management Console. If you have already deployed the solution with CloudFormation, skip ahead to the Considerations and limitations section of this post. To create the necessary resources in the console, complete the steps in the following sections.

Create Lambda function

Use the following Python script to configure the Lambda function.

Refer to this GitHub link for the entire Lambda code.

In the Lambda function, create an environment variable named SCALING_PERCENTAGE. This variable controls the percentage by which your database's storage is increased. For example, to increase storage by 15%, set this variable to 15. Amazon RDS requires a minimum increase of 10% when additional storage is added and prevents further changes for a minimum of 6 hours. If the Lambda function detects that either of these constraints is violated, it returns an error through the SNS topic. Make sure the Lambda function scales by a large enough percentage to avoid another storage-full scenario shortly after the scaling event. Review the full code and make changes as you see fit.

import boto3
import logging
import json
import os

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def _get_scaling_percentage():
    """Get scaling percentage from environment variable with validation."""
    try:
        scaling_percentage = float(os.environ.get('SCALING_PERCENTAGE', '15'))
        logger.info(f"Using scaling percentage: {scaling_percentage}%")
        return scaling_percentage
    except ValueError as e:
        logger.error(f"Invalid SCALING_PERCENTAGE environment variable: {str(e)}")
        default_percentage = 15
        logger.info(f"Defaulting to {default_percentage}% scaling")
        return default_percentage

def _parse_event_output(event):
    """Parse the CloudWatch alarm event to extract DB instance identifier."""
    try:
        event_string = str(event['alarmData']['configuration']['metrics'])
        start_marker = "{'DBInstanceIdentifier': '"
        end_marker = "'}"
        start_idx = event_string.find(start_marker) + len(start_marker)
        end_idx = event_string.find(end_marker, start_idx)
        db_instance_id = event_string[start_idx:end_idx]
        logger.info(f"Parsed DB Instance ID: {db_instance_id}")
        return db_instance_id
    except (KeyError, IndexError, ValueError) as e:
        logger.error(f"Failed to parse event data: {str(e)}")
        raise ValueError(f"Invalid event structure: {str(e)}")

def _get_db_instance_info(rds_client, db_instance_id):
    """Retrieve DB instance information from RDS with all storage-related parameters."""
    try:
        db_instance = rds_client.describe_db_instances(DBInstanceIdentifier=db_instance_id)
        instance_info = db_instance['DBInstances'][0]
        
        storage_config = {
            'db_instance_identifier': instance_info['DBInstanceIdentifier'],
            'db_cluster_identifier': instance_info.get('DBClusterIdentifier'),
            'allocated_storage': instance_info['AllocatedStorage'],
            'storage_type': instance_info.get('StorageType'),
            'iops': instance_info.get('Iops'),
            'storage_throughput': instance_info.get('StorageThroughput'),
            'max_allocated_storage': instance_info.get('MaxAllocatedStorage'),
            'engine': instance_info.get('Engine'),
            'engine_version': instance_info.get('EngineVersion'),
            'storage_encrypted': instance_info.get('StorageEncrypted', False)
        }
        
        logger.info(f"Current storage configuration: {json.dumps(storage_config, default=str)}")
        return storage_config
        
    except rds_client.exceptions.DBInstanceNotFoundFault as e:
        logger.error(f"DB instance not found: {db_instance_id}")
        raise
    except Exception as e:
        logger.error(f"Failed to describe DB instance: {str(e)}")
        raise

def _calculate_new_storage(current_storage, scaling_percentage):
    """Calculate new storage size based on scaling percentage."""
    scaling_multiplier = 1 + (scaling_percentage / 100)
    new_storage = int(current_storage * scaling_multiplier)
    logger.info(f"Calculated new storage: {new_storage} GB ({scaling_percentage}% increase from {current_storage} GB)")
    return new_storage

def _calculate_iops_for_storage(new_storage, storage_type, current_iops=None):
    """Calculate appropriate IOPS for the new storage size based on storage type."""
    if storage_type == 'gp3':
        # gp3: Keep current IOPS if set, otherwise use 3000 baseline
        return current_iops if current_iops else 3000
    
    elif storage_type in ['io1', 'io2']:
        # For Provisioned IOPS, maintain current IOPS value
        # AWS requires IOPS to be specified when modifying io1/io2 storage
        if current_iops:
            return current_iops
        else:
            # Default minimum for io1/io2
            return 1000
    
    elif storage_type == 'gp2':
        # gp2: IOPS scale automatically (3 IOPS per GB), no need to specify
        return None
    
    else:
        # standard (magnetic) storage doesn't support IOPS
        return None

def _build_modify_params(storage_config, new_storage):
    """Build modification parameters for the DB cluster, preserving storage characteristics."""
    params = {
        'DBClusterIdentifier': storage_config['db_cluster_identifier'],
        'AllocatedStorage': new_storage,
        'ApplyImmediately': True
    }
    
    # Include storage type if present
    if storage_config.get('storage_type'):
        params['StorageType'] = storage_config['storage_type']
        logger.info(f"Preserving storage type: {storage_config['storage_type']}")
    
    # Handle IOPS based on storage type
    storage_type = storage_config.get('storage_type')
    if storage_type in ['io1', 'io2', 'gp3']:
        iops = _calculate_iops_for_storage(
            new_storage, 
            storage_type, 
            storage_config.get('iops')
        )
        if iops:
            params['Iops'] = iops
            logger.info(f"Setting IOPS to: {iops}")
    
    # Handle storage throughput for gp3
    if storage_type == 'gp3' and storage_config.get('storage_throughput'):
        params['StorageThroughput'] = storage_config['storage_throughput']
        logger.info(f"Preserving storage throughput: {storage_config['storage_throughput']} MiB/s")
    
    return params

def _modify_db_cluster(rds_client, storage_config, new_storage):
    """Modify RDS Multi-AZ DB cluster with new storage allocation and proper IOPS/throughput."""
    try:
        cluster_id = storage_config['db_cluster_identifier']
        
        # Build modification parameters with all required storage settings
        modify_params = _build_modify_params(storage_config, new_storage)
        
        logger.info(f"Modifying DB cluster with parameters: {json.dumps(modify_params, default=str)}")
        
        response = rds_client.modify_db_cluster(**modify_params)
        
        logger.info(f"Successfully modified DB cluster {cluster_id} to {new_storage} GB")
        logger.info(f"Modification response: {json.dumps(response, default=str)}")
        return response
        
    except Exception as e:
        logger.error(f"Failed to modify DB cluster: {str(e)}")
        raise

def _create_success_response(storage_config, new_storage, scaling_percentage):
    """Create a successful response payload with detailed storage information."""
    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Storage modification successful',
            'cluster_id': storage_config['db_cluster_identifier'],
            'instance_id': storage_config['db_instance_identifier'],
            'old_storage_gb': storage_config['allocated_storage'],
            'new_storage_gb': new_storage,
            'scaling_percentage': scaling_percentage,
            'storage_type': storage_config.get('storage_type'),
            'iops': storage_config.get('iops'),
            'storage_throughput_mibs': storage_config.get('storage_throughput')
        })
    }

def _create_error_response(error):
    """Create an error response payload."""
    return {
        'statusCode': 500,
        'body': json.dumps({
            'message': 'Storage modification failed',
            'error': str(error)
        })
    }

def lambda_handler(event, context):
    """Main Lambda handler for RDS storage autoscaling."""
    rds = boto3.client('rds')
    
    try:
        logger.info(f"Lambda function invoked with event: {json.dumps(event)}")
        
        # Get configuration
        scaling_percentage = _get_scaling_percentage()
        
        # Parse event to get DB instance ID
        db_instance_id = _parse_event_output(event)
        
        # Get current DB instance information with all storage parameters
        storage_config = _get_db_instance_info(rds, db_instance_id)
        
        # Calculate new storage size
        new_storage = _calculate_new_storage(
            storage_config['allocated_storage'], 
            scaling_percentage
        )
        
        # Modify the DB cluster with proper storage parameters
        _modify_db_cluster(rds, storage_config, new_storage)
        
        # Return success response with detailed information
        return _create_success_response(storage_config, new_storage, scaling_percentage)
            
    except Exception as e:
        logger.error(f"Lambda execution failed: {str(e)}", exc_info=True)
        return _create_error_response(e)
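Before wiring up the alarm, you can exercise the event parsing locally by feeding a synthetic payload to the same string-search logic used in _parse_event_output. The payload below is abridged to the fields that parser touches; a real CloudWatch alarm event carries considerably more data, so treat this shape as an assumption for testing only.

```python
# Abridged synthetic CloudWatch alarm payload (assumed shape, test-only).
sample_event = {
    "alarmData": {
        "configuration": {
            "metrics": [{
                "metricStat": {
                    "metric": {
                        "namespace": "AWS/RDS",
                        "name": "FreeStorageSpace",
                        "dimensions": {"DBInstanceIdentifier": "database-1-instance-1"},
                    }
                }
            }]
        }
    }
}

def parse_db_instance_id(event) -> str:
    """Same string-search approach as _parse_event_output above."""
    s = str(event["alarmData"]["configuration"]["metrics"])
    start = "{'DBInstanceIdentifier': '"
    i = s.find(start) + len(start)
    return s[i:s.find("'}", i)]

print(parse_db_instance_id(sample_event))  # database-1-instance-1
```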

Create SNS topic (optional)

If you want an email notification to be sent when the Lambda function is invoked, create an SNS topic and subscribe your email address to it.
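As a hedged sketch of this step with boto3, the following builds the request parameters for the topic and subscription; the topic name is an assumption, and the commented-out calls require AWS credentials, so only the parameter-building part runs locally.

```python
def build_sns_requests(email: str, topic_name: str = "rds-storage-scaling") -> dict:
    """Return request parameters for the topic and the email subscription."""
    return {
        "create_topic": {"Name": topic_name},
        # TopicArn is added from the create_topic response before subscribing.
        "subscribe": {"Protocol": "email", "Endpoint": email},
    }

params = build_sns_requests("ops-team@example.com")
# sns = boto3.client("sns")
# topic_arn = sns.create_topic(**params["create_topic"])["TopicArn"]
# sns.subscribe(TopicArn=topic_arn, **params["subscribe"])
# The subscription stays "PendingConfirmation" until the recipient confirms
# via the email that SNS sends.
```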

Create CloudWatch alarm

Complete the following steps to configure an Amazon CloudWatch alarm to monitor the FreeStorageSpace CloudWatch metric for the RDS Multi-AZ DB cluster:

  1. On the Amazon CloudWatch console, in the navigation pane, choose Alarms.
  2. Choose Create alarm.
    CloudWatch Alarms Page: Empty CloudWatch Alarms console showing "No alarms to display" message.
  3. On the Specify metric and conditions page, choose Select metric.
    Metric Selection Dialog: CloudWatch metric selection interface titled "Specify metric and conditions" with a "Select metric" button.
  4. Search for your database identifier and select the FreeStorageSpace CloudWatch metric, then choose Select metric
    Metric Browser: CloudWatch metric browser showing RDS metrics filtered for "freestoragespace".
  5. Select the conditions under which the CloudWatch alarm responds, specifically the amount of free storage space remaining that should initiate scaling. We suggest setting the alarm at 15% of the initial total storage space. For example, with a 100 GB database, the alarm should invoke the scaling action when free storage drops below 15 GB (15,000,000,000 bytes; the FreeStorageSpace metric is reported in bytes), as shown in the following screenshot.
  6. Choose Next.
    Alarm Conditions Configuration: CloudWatch alarm conditions page showing threshold type options (Static vs Anomaly detection).
  7. Optionally, select Select an existing SNS topic and enter the name of the topic created in the previous section to receive a notification when the CloudWatch alarm enters the alarm state.
    Notification Configuration: CloudWatch alarm action configuration showing notification settings.
  8. Under Lambda action, choose the Lambda function you created previously.
    Lambda Action Configuration: Additional alarm actions section showing Lambda function configuration.
  9. Enter a CloudWatch alarm name, then choose Create alarm.

Allow CloudWatch alarm to invoke Lambda function

Complete the following steps to add a resource-based policy statement to allow the CloudWatch alarm to invoke your Lambda function.

  1. On the Lambda console choose Functions in the navigation pane.
  2. Select your function.
  3. Choose Configuration and then Permissions.
  4. Under Resource-based policy statements, choose Add permissions.
  5. Select AWS service to grant permissions to an AWS service, in this case CloudWatch alarms.
  6. For Service, choose Other so that you can specify the service principal directly.
  7. For Statement ID, enter a unique statement ID.
  8. For Service Principal, enter lambda.alarms.cloudwatch.amazonaws.com.
  9. For Source ARN, enter the Amazon Resource Name (ARN) of the previously created CloudWatch alarm.
  10. For Action, choose lambda:InvokeFunction.
  11. Choose Save.

 IAM Permissions Editor: Lambda function permissions configuration page showing policy statement editor.
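The same permission can be granted programmatically. The following sketch builds the parameters for the Lambda add_permission API call that mirrors the console steps above; the function name, statement ID, and alarm ARN are placeholders.

```python
def build_permission_params(function_name: str, alarm_arn: str,
                            statement_id: str = "cloudwatch-alarm-invoke") -> dict:
    """Return kwargs for lambda_client.add_permission()."""
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        # Service principal used by CloudWatch alarm actions to invoke Lambda.
        "Principal": "lambda.alarms.cloudwatch.amazonaws.com",
        # Restrict invocation to the specific alarm.
        "SourceArn": alarm_arn,
    }

params = build_permission_params("rds-storage-scaler",
                                 "arn:aws:cloudwatch:region:account:alarm:alarm-name")
# boto3.client("lambda").add_permission(**params)
```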

Considerations and limitations

Although this solution provides automated storage scaling for RDS Multi-AZ DB clusters, there are several important considerations to keep in mind when implementing it at scale.

Scaling to multiple databases

The CloudFormation template supports monitoring multiple DB instances through comma-delimited parameters. Instead of manually configuring alarms through the CloudWatch console for each database, you can specify all your DB instance identifiers and their corresponding thresholds when deploying the stack. For example:

  • DBInstanceIdentifiers: prod-db-1,prod-db-2,dev-db-1,dev-db-2
  • AlarmThresholdGB: 50,50,20,20

This approach makes it possible to deploy monitoring for hundreds of databases in a single CloudFormation stack deployment, making it practical for large-scale environments. However, keep in mind that CloudFormation has a template size limit of 1 MB, which may constrain the number of resources you can define in a single stack. For very large deployments (hundreds of databases), consider creating multiple stacks organized by environment, application, or Region. Alternatively, you could move the database instance and alarm threshold configuration outside of the stack itself and store it in an Amazon DynamoDB table or Parameter Store, a capability of AWS Systems Manager.

Customizing scaling behavior

The solution uses a single SCALING_PERCENTAGE environment variable that applies to all databases monitored by the Lambda function. If you need different scaling factors for different databases (for example, scaling production databases by 40% for safety while scaling development databases by only 10%), you have several options:

  1. Deploy multiple stacks – Create separate CloudFormation stacks for different database tiers (production, development, staging), each with its own scaling percentage configured. This provides clear separation and makes it straightforward to manage different scaling policies.
  2. Modify the Lambda function – Enhance the Lambda code to include a mapping of database identifiers to scaling percentages, either through additional environment variables or by reading from an Amazon DynamoDB table or Parameter Store. This provides fine-grained control within a single deployment.
  3. Use tags – Implement logic in the Lambda function to read RDS instance tags and determine the appropriate scaling percentage based on environment or criticality tags.
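As a sketch of option 3, the following function derives a scaling percentage from an Environment tag, using the list-of-dicts shape returned by rds.list_tags_for_resource()['TagList']. The tag key, tier names, and percentages here are assumptions you would adapt to your own tagging scheme.

```python
DEFAULT_PERCENTAGE = 15.0
# Hypothetical tier-to-percentage mapping; adjust to your environment.
TIER_PERCENTAGES = {"production": 40.0, "staging": 20.0, "development": 10.0}

def scaling_percentage_from_tags(tags: list) -> float:
    """Pick a scaling percentage based on an 'Environment' tag.

    `tags` has the shape of rds.list_tags_for_resource()['TagList'],
    e.g. [{"Key": "Environment", "Value": "Production"}].
    """
    for tag in tags:
        if tag.get("Key") == "Environment":
            return TIER_PERCENTAGES.get(tag.get("Value", "").lower(),
                                        DEFAULT_PERCENTAGE)
    return DEFAULT_PERCENTAGE

print(scaling_percentage_from_tags([{"Key": "Environment", "Value": "Production"}]))  # 40.0
```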

Additional considerations

  • Cost implications – Storage scaling is permanent and can’t be reversed (you can only scale up, not down). Monitor your costs carefully, especially with aggressive scaling percentages.
  • Storage limits – Amazon RDS has maximum storage limits depending on the database engine and instance type. At the time of writing, the solution doesn’t validate against these limits before scaling; however, it notifies you if a scaling request is rejected by the RDS API.
  • Scaling frequency – If your database consistently triggers the alarm, you might want to investigate the root cause rather than continuously scaling storage.
  • Multi-Region deployments – This solution is Region-specific. For multi-Region Amazon RDS deployments, deploy the CloudFormation stack in each Region where you have databases to monitor.
  • Lower environment testing – As with any change to your environment, we recommend testing this solution in a non-production environment before deploying it in production.

Clean up

If you need to decommission this solution so that it no longer scales up storage for your Amazon RDS instances, complete the following steps:

  1. If you deployed the solution with CloudFormation, delete the CloudFormation stack. This removes the Lambda function, IAM execution role, SNS topic, and CloudWatch alarms that the stack created.
  2. If you created the resources manually, delete the CloudWatch alarm, the Lambda function (including its resource-based policy), and the SNS topic and its subscriptions from their respective consoles.

Conclusion

In this post, we showed you how to automate storage scaling for Amazon RDS Multi-AZ DB clusters with two readable standbys using Amazon CloudWatch, AWS Lambda, and Amazon SNS. This solution helps you maintain database availability by preventing storage-full conditions while reducing the operational overhead of manual monitoring. You can customize the scaling percentage to match your specific data growth patterns and cost optimization goals.

If you have questions about the solution in this post, contact your AWS representative or leave a comment.


A special thanks to Vlad Podomatskiy for assisting in the creation of this blog post!

About the authors

Ryan Moore

Ryan is a Technical Account Manager at AWS supporting ISV customers. He enables ISVs to build performant, scalable, and secure architectures in the AWS Cloud. Prior to his TAM role, he was a database engineer specializing in Aurora MySQL and RDS MySQL.

Nirupam Datta

Nirupam is a Sr. Technical Account Manager at AWS. With over 14 years of experience in database engineering and infra-architecture, Nirupam is also a subject matter expert in the Amazon RDS core systems and Amazon RDS for SQL Server. He provides technical assistance to customers, guiding them to migrate, optimize, and navigate their journey in the AWS cloud.

Pat Doherty

Pat is a Cloud Support Engineer at AWS supporting the database team. He currently provides technical support on MySQL, Amazon Aurora MySQL & PostgreSQL, MariaDB, PostgreSQL, and SQL Server databases, as well as assistance with the AWS Database Migration service.