Stop and start Amazon RDS Multi-AZ DB clusters on a schedule

An Amazon RDS Multi-AZ DB cluster deployment is a high availability deployment mode of Amazon Relational Database Service (Amazon RDS) with two readable replica instances that also serve as standby instances. An RDS Multi-AZ DB cluster has a writer instance and two reader instances in three separate Availability Zones in the same AWS Region. RDS Multi-AZ DB clusters provide up to 2x lower write latency, additional capacity for read workloads, and faster failover times compared to RDS Multi-AZ DB instances. In addition, RDS Multi-AZ DB clusters offer minor version upgrades and system maintenance updates with a downtime of one second or less.

Stopping and starting RDS Multi-AZ DB clusters is useful for reducing costs when development or test environments sit idle, for example during vacations, holidays, or weekends. In this post, we show you how to stop and start your RDS Multi-AZ DB clusters on a schedule, giving you more control over your infrastructure resources.

Solution overview

This solution uses AWS Lambda and Amazon EventBridge to provide a self-managed way to simulate a stop/start operation for RDS Multi-AZ DB cluster deployments. Instead of stopping and starting the cluster, we delete it with a final snapshot and later recreate it by restoring from that snapshot.

The solution demonstrated in this post restores the RDS Multi-AZ DB cluster from an RDS snapshot. You can use the restored cluster as soon as its status is available, while it continues to load data in the background. This is known as lazy loading. To help mitigate the effects of lazy loading on tables to which you require quick access, you can perform operations that involve full-table scans, such as SELECT * FROM table_name. This prompts Amazon RDS to download all of the backed-up table data from Amazon Simple Storage Service (Amazon S3).
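The full-table scans described above can be scripted after each restore. The following is a minimal sketch, not part of the original solution: the helper names are hypothetical, the backtick quoting assumes a MySQL-compatible engine, and the cursor is assumed to be any Python DB-API cursor connected to the restored cluster.

```python
import re

# Hypothetical helpers for illustration; backtick quoting assumes MySQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_warmup_statements(tables):
    """Return one full-table-scan statement per table name."""
    statements = []
    for table in tables:
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not _IDENTIFIER.match(table):
            raise ValueError(f"Unexpected table name: {table!r}")
        statements.append(f"SELECT * FROM `{table}`")
    return statements


def warm_tables(cursor, tables, batch_size=1000):
    """Run each scan on a DB-API cursor and discard the rows in batches."""
    for statement in build_warmup_statements(tables):
        cursor.execute(statement)
        # Drain the result set so the whole table is read from storage.
        while cursor.fetchmany(batch_size):
            pass
```

You would call warm_tables with a cursor from your database driver and the list of tables that need quick access. Scanning only those tables keeps the warm-up short.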

The following diagram shows the architecture of our proposed solution:

EventBridge invokes a Lambda function on a user-defined schedule. The function deletes the Multi-AZ DB cluster, and Amazon RDS takes a final snapshot as part of the deletion. This snapshot is the source for restoring the RDS Multi-AZ DB cluster whenever required. The approach not only helps control your overall Amazon RDS costs, but also gives you security and peace of mind because the cluster isn’t available during those times when you don’t want it up and available for new connections.

Prerequisites

To implement this solution, you need to complete the following high-level steps:

Create an RDS Multi-AZ DB cluster.
Create a Lambda execution role and attach an AWS Identity and Access Management (IAM) policy.
Create two Lambda functions: one to take a snapshot and then immediately delete the RDS Multi-AZ DB cluster, and another to create a new RDS Multi-AZ DB cluster from the snapshot.
Create an EventBridge rule to invoke the Lambda functions on a schedule.

Create an IAM policy and role for Lambda

You need to create an IAM policy and role for the Lambda functions to be able to perform the delete and restore operations. Complete the following steps to create the IAM policy:

On the IAM console, in the navigation pane, choose Policies.
Choose Create policy.
On the JSON tab, enter the following policy:

{

  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "rds:AddTagsToResource",
        "rds:CopyDBSnapshot",
        "rds:CopyDBClusterSnapshot",
        "rds:DeleteDBInstance",
        "rds:DeleteDBSnapshot",
        "rds:DeleteDBCluster",
        "rds:RestoreDBClusterFromSnapshot",
        "rds:RestoreDBInstanceFromDBSnapshot",
        "rds:CreateDBInstance",
        "rds:CreateDBCluster",
        "rds:Describe*",
        "rds:ListTagsForResource"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

From a security perspective, we recommend limiting the scope of the policy to only the DB clusters and snapshots that this solution manages, by listing their ARNs in the Resource element instead of using a wildcard (*).

Choose Review policy.
For Policy Name, enter a name (for this post, RDSMAZDBAutomation).

Choose Create policy.
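The recommendation to restrict the Resource element could be sketched as follows. This is an illustrative starting point, not the policy used in this post: the function name, the account ID 111122223333, and the choice of actions per statement are assumptions. Depending on your configuration, the delete and restore calls may need additional actions or resource ARNs (for example, the DB subnet group), so test the policy before relying on it.

```python
import json


def build_scoped_policy(region, account_id, cluster_id):
    """Build a policy document limited to one cluster and its snapshots."""
    cluster_arn = f"arn:aws:rds:{region}:{account_id}:cluster:{cluster_id}"
    # Final snapshots in this post are named "<cluster>-<date>", hence the wildcard.
    snapshot_arn = (
        f"arn:aws:rds:{region}:{account_id}:cluster-snapshot:{cluster_id}-*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only calls are left unscoped in this sketch.
                "Effect": "Allow",
                "Action": ["rds:Describe*", "rds:ListTagsForResource"],
                "Resource": "*",
            },
            {
                # Assumed minimal set of mutating actions for this solution.
                "Effect": "Allow",
                "Action": [
                    "rds:DeleteDBCluster",
                    "rds:RestoreDBClusterFromSnapshot",
                    "rds:AddTagsToResource",
                ],
                "Resource": [cluster_arn, snapshot_arn],
            },
        ],
    }


policy = build_scoped_policy("us-west-2", "111122223333", "macluster1")
print(json.dumps(policy, indent=2))
```

You can paste the printed JSON on the JSON tab in place of the broader policy.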

Now you can create your IAM role.

Choose Roles in the navigation pane.
Choose Create role.
For Select type of trusted entity, choose AWS service.
For Use case, choose Lambda.
Choose Next.
In the policy search box, enter the name of the policy you created (for this post, RDSMAZDBAutomation).
Under Permissions policies, select the policy you created.
Choose Next.

For Role name, enter a name (for this post, RDSMAZDBAutomationRole).

Choose Create role.

Your IAM role is ready to be attached to the Lambda functions.

Create a Lambda function to delete the RDS Multi-AZ DB cluster

Complete the following steps to create a Lambda function for deleting the Multi-AZ DB cluster:

On the Lambda console, choose Functions in the navigation pane.
Choose Create function.
For Function name, enter a name (for this post, DeleteMAZDBCluster).
For Runtime, choose Python 3.12.

For Execution role, select Use an existing role.
For Existing role, choose the role you created.
Choose Create function.

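On the Lambda function details page, go to the function code section and replace the sample code with the following. Set the CLUSTER environment variable (or change the default cluster value in the code) to the identifier of the DB cluster that you want to automate. Lambda sets AWS_REGION automatically to the Region where the function runs, so deploy the function in the Region that hosts your DB cluster; the default region value in the code applies only when you run the code outside Lambda: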

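import boto3
import logging
import os
import time

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Cluster to delete; override the default with the CLUSTER environment variable
cluster = os.getenv('CLUSTER', 'macluster1')
# Lambda sets AWS_REGION automatically; the default applies only outside Lambda
region = os.getenv('AWS_REGION', 'us-west-2')

def lambda_handler(event, _context):
    try:
        # Final snapshot name, for example macluster1-31-12-2024
        date = time.strftime("-%d-%m-%Y")
        snapshot_name = cluster + date
        source = boto3.client('rds', region_name=region)
        # RDS takes the final snapshot, then deletes the cluster asynchronously
        source.delete_db_cluster(
            DBClusterIdentifier=cluster,
            SkipFinalSnapshot=False,
            FinalDBSnapshotIdentifier=snapshot_name)
        logger.info(f'Deletion of {cluster} requested with final snapshot {snapshot_name}')
        try:
            status = source.describe_db_clusters(
                DBClusterIdentifier=cluster)['DBClusters'][0]['Status']
        except source.exceptions.DBClusterNotFoundFault:
            logger.info(f'Cluster {cluster} deleted')
            return True
        # The cluster normally reports the deleting status until the snapshot completes
        logger.info(f'Cluster {cluster} still exists with status {status}')
        return status == 'deleting'
    except Exception as e:
        logger.error(f'Something went wrong: {e}')
        return False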

Choose Save.
Choose Test to test the function.
For Event name, enter a name (for example, DeleteDBCluster).
Choose
Choose Test again to test the function.

Note: When you choose Test the second time, it will delete the DB cluster if successful.

After the test is successful, you will get a response as shown in the following screenshot.
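The delete function names the final snapshot after the cluster and the current date, so a second run on the same day would reuse the same snapshot identifier. If that matters for your schedule, a timestamped name is one option. The following is a sketch under that assumption; the function name and the name format are hypothetical, and the validation reflects the RDS identifier rules (letters, digits, and hyphens, starting with a letter, no trailing or doubled hyphens, at most 255 characters).

```python
import datetime
import re


def final_snapshot_name(cluster_id, now=None):
    """Return '<cluster>-YYYY-MM-DD-HHMMSS' based on UTC time."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    name = f"{cluster_id}-{now.strftime('%Y-%m-%d-%H%M%S')}"
    # Validate against the snapshot identifier rules before calling RDS.
    valid = (
        len(name) <= 255
        and "--" not in name
        and re.fullmatch(r"[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]", name)
    )
    if not valid:
        raise ValueError(f"Invalid snapshot identifier: {name!r}")
    return name


# Example with a fixed time so the result is reproducible.
example_time = datetime.datetime(2024, 1, 2, 21, 0, 5, tzinfo=datetime.timezone.utc)
print(final_snapshot_name("macluster1", example_time))
```

Putting the year first also makes the snapshot names sort chronologically in listings.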

Create a Lambda function to restore the RDS Multi-AZ DB cluster from the latest DB snapshot

In this step, you create a Lambda function to restore the RDS Multi-AZ DB cluster using the latest snapshot. Follow the same steps as in the previous section to create the function. For this post, we named the function RestoreDBCluster and used the following code:

import os
import logging
import boto3
import datetime

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def by_timestamp(snap):
    if 'SnapshotCreateTime' in snap:
        # Convert the SnapshotCreateTime to a UTC datetime object
        snapshot_time = snap['SnapshotCreateTime'].replace(tzinfo=datetime.timezone.utc)
        return snapshot_time
    # Return the current UTC time
    return datetime.datetime.now(tz=datetime.timezone.utc)


def lambda_handler(event, _context):
    # Retrieve values from environment variables
    db_cluster = os.environ.get('DB_CLUSTER', 'macluster1')
    engine = os.environ.get('ENGINE', 'mysql')
    aws_region = os.environ.get('AWS_REGION', 'us-west-2')
    db_instance_class = os.environ.get('DB_INSTANCE_CLASS', 'db.m5d.large')
    subnet_group_name = os.environ.get('SUBNET_GROUP_NAME', 'default')
    storage_type = os.environ.get('STORAGE_TYPE', 'gp3')
    try:
        # Create RDS client
        rds = boto3.client('rds', region_name=aws_region)
        # List the cluster's snapshots and sort them newest first
        source_snaps = rds.describe_db_cluster_snapshots(DBClusterIdentifier=db_cluster)['DBClusterSnapshots']
        sorted_snaps = sorted(source_snaps, key=by_timestamp, reverse=True)
        logger.info(f"DB cluster snapshots: {sorted_snaps}")
        source_snap = sorted_snaps[0]['DBClusterSnapshotIdentifier']
        logger.info(f"Restoring {source_snap} to {db_cluster}")
        response = rds.restore_db_cluster_from_snapshot(
            DBClusterIdentifier=db_cluster,
            SnapshotIdentifier=source_snap,
            Engine=engine,
            DBClusterInstanceClass=db_instance_class,
            StorageType=storage_type,
            DBSubnetGroupName=subnet_group_name,
            PubliclyAccessible=False,
            DeletionProtection=False
        )
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        # Re-raise so the invocation is reported as failed
        raise
    logger.info(f"Restore response: {response}")
    return {
        'statusCode': 200,
        'body': f"Successfully restored {source_snap} to {db_cluster}"
    }
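The restore function sorts snapshots by creation time and treats a snapshot without a creation time as the newest. A possible variation is to consider only snapshots that are ready to restore. The following sketch assumes the list has the shape returned under DBClusterSnapshots by describe_db_cluster_snapshots. The function name is hypothetical, and the sample data is invented for illustration.

```python
import datetime


def latest_available_snapshot(snapshots):
    """Return the identifier of the newest snapshot whose Status is 'available'."""
    candidates = [
        snap for snap in snapshots
        if snap.get("Status") == "available" and "SnapshotCreateTime" in snap
    ]
    if not candidates:
        raise LookupError("No available DB cluster snapshot found")
    newest = max(candidates, key=lambda snap: snap["SnapshotCreateTime"])
    return newest["DBClusterSnapshotIdentifier"]


# Invented sample data in the shape of the DBClusterSnapshots list.
utc = datetime.timezone.utc
sample = [
    {"DBClusterSnapshotIdentifier": "macluster1-01-01-2024", "Status": "available",
     "SnapshotCreateTime": datetime.datetime(2024, 1, 1, 21, 0, tzinfo=utc)},
    {"DBClusterSnapshotIdentifier": "macluster1-02-01-2024", "Status": "available",
     "SnapshotCreateTime": datetime.datetime(2024, 1, 2, 21, 0, tzinfo=utc)},
    {"DBClusterSnapshotIdentifier": "macluster1-03-01-2024", "Status": "creating"},
]
print(latest_available_snapshot(sample))
```

Raising an error when no snapshot qualifies gives a clearer failure than indexing into an empty list.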

If you experience a timeout error during the Lambda function test run, you can do the following:

On the function details page, choose the Configuration tab.
Choose General configuration in the left pane.
Choose Edit.
Increase the timeout value (the default timeout is 3 seconds).
Choose Save.
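The same timeout change can be made programmatically. The following is a sketch, assuming a boto3 Lambda client is passed in; the helper name is hypothetical. It relies on the update_function_configuration call and on the Lambda maximum timeout of 900 seconds.

```python
MAX_LAMBDA_TIMEOUT_SECONDS = 900  # Lambda's upper limit (15 minutes)


def set_function_timeout(lambda_client, function_name, seconds):
    """Set the timeout of a Lambda function after validating the range."""
    if not 1 <= seconds <= MAX_LAMBDA_TIMEOUT_SECONDS:
        raise ValueError(
            f"Timeout must be between 1 and {MAX_LAMBDA_TIMEOUT_SECONDS} seconds"
        )
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   set_function_timeout(boto3.client("lambda"), "RestoreDBCluster", 60)
```

The function's execution role from this post does not grant Lambda permissions, so run this with credentials that are allowed to update the function configuration.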

Create an EventBridge rule

For this post, we want to delete the RDS Multi-AZ DB cluster every day at 21:00 UTC and restore it every day at 09:00 UTC. Complete the following steps to create the delete schedule in EventBridge:

On the EventBridge console, choose Schedules in the navigation pane.
Choose Create schedule.
For Name, enter a name (for this post, DeleteMAZDBCluster).
For Occurrence, select Recurring schedule.
For Time zone, choose UTC.
For Schedule type, select Cron-based schedule.

For Cron expression, enter the following values:
- Minutes: 00
- Hours: 21
- Day of month: *
- Month: *
- Day of the week: ?
- Year: *

Choose Next.
For Target API, select Templated targets.
Select AWS Lambda Invoke.

Choose the DeleteMAZDBCluster Lambda function to be invoked and choose Next.
Choose Next on the Settings page.
Choose Create schedule.
Repeat these steps to create another schedule that invokes the RestoreDBCluster Lambda function every day at 09:00 UTC.
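The cron fields entered in the console steps correspond to the expression cron(0 21 * * ? *). As a sketch of how the two schedules could be defined in code, the following helpers build that expression and the request parameters for the EventBridge Scheduler create_schedule call. The helper names and the ARNs are placeholders. The role ARN refers to a role that EventBridge Scheduler can assume to invoke the function, which is an assumption here because the console steps don't describe it.

```python
def daily_cron(hour, minute=0):
    """Return a cron expression that fires every day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Fields: minutes, hours, day of month, month, day of week, year
    return f"cron({minute} {hour} * * ? *)"


def schedule_request(name, hour, function_arn, role_arn):
    """Build keyword arguments for the scheduler client's create_schedule call."""
    return {
        "Name": name,
        "ScheduleExpression": daily_cron(hour),
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {"Arn": function_arn, "RoleArn": role_arn},
    }


# Example usage (requires boto3 and AWS credentials; ARNs are placeholders):
#   import boto3
#   scheduler = boto3.client("scheduler")
#   scheduler.create_schedule(**schedule_request(
#       "DeleteMAZDBCluster", 21, "<delete-function-arn>", "<scheduler-role-arn>"))
#   scheduler.create_schedule(**schedule_request(
#       "RestoreDBCluster", 9, "<restore-function-arn>", "<scheduler-role-arn>"))
print(daily_cron(21))
print(daily_cron(9))
```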

Clean up

To clean up your resources, complete the following steps:

Delete the RDS Multi-AZ DB cluster.
Delete the Lambda functions.
Delete the EventBridge schedules.

Conclusion

In this post, we shared a solution for stopping (deleting) and starting (recreating) an RDS Multi-AZ DB cluster to save on billing and management overhead. You can adjust the automation process outlined in this post to suit your use case and align with your specific business requirements. Provide your feedback in the comments section.

About the authors

Kaustubh Wani is a Technical Account Manager at AWS and works closely with Education and Education technology customers based out of North America. He has been with AWS for over 5 years and started his journey working as a Cloud Support Engineer in the RDS Databases team. Kaustubh is also a subject matter expert in the Amazon RDS Core systems and works extensively on open-source engines like MySQL and PostgreSQL. He holds AWS Solutions Architect Associate, Database Specialty and Security Specialty Certifications. He works with enterprise customers providing technical assistance on improving database operational performance and sharing database best practices.

Nirupam Datta is a Senior Cloud Support DBE at AWS. He has been with AWS for over 4 years. With over 12 years of experience in database engineering and infra-architecture, Nirupam is also a subject matter expert in the Amazon RDS core systems and Amazon RDS for SQL Server. He provides technical assistance to customers, guiding them to migrate, optimize, and navigate their journey in the AWS Cloud.

AWS Database Blog