AWS Cloud Operations & Migrations Blog

Delete Empty CloudWatch Log Streams

Customers that use Amazon CloudWatch to monitor their applications and resources on AWS can accumulate a large number of log streams that are used only briefly, or are no longer required. While there is no charge for maintaining an empty log stream, having potentially thousands of them can be overwhelming, especially while using the CloudWatch console. Containerized and auto-scaling workloads (those with short-lived resources that create their own log streams) are common examples of this.

In this post, we demonstrate how to automate the cleanup of Amazon CloudWatch log streams whose contents have aged past the retention period. A log stream is a sequence of log events that share the same source; each separate source of logs in CloudWatch Logs makes up a separate log stream. By default, CloudWatch Logs stores log data indefinitely, but you can configure how long data is stored in a log group, and you can change that retention setting at any time. Any data older than the current retention setting is deleted. CloudWatch Logs doesn’t delete log events immediately when they reach their retention setting: it typically takes up to 72 hours, and in rare situations might take longer. The log streams themselves, however, remain in place even after the retention setting has emptied them.
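For illustration, the retention setting described above can also be applied from code rather than the console. The sketch below assumes a boto3 CloudWatch Logs client is passed in; the helper name `set_retention` and the example log group name are hypothetical, while `put_retention_policy` is the boto3 method for the PutRetentionPolicy API.

```python
def set_retention(logs_client, log_group_name, days):
    """Set how long CloudWatch Logs keeps events in one log group."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    # CloudWatch Logs accepts only specific values (for example 1, 7, 30, 90, 365);
    # the service rejects anything else.
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )

# Usage (requires AWS credentials), with a hypothetical log group name:
#   import boto3
#   set_retention(boto3.client("logs"), "/aws/lambda/my-function", 30)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.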

We will set up an AWS Lambda function that runs on a schedule and deletes any empty log streams in CloudWatch log groups.

Solution overview

A simple solution is to run the script provided below (lambda_function.py) periodically in AWS Lambda. The script reads the retention setting of every CloudWatch log group and deletes the log streams whose most recent ingestion is older than that retention period.

The script:

  • Reads the configuration of all log groups
  • Checks the retention setting of each log group and skips log groups that have infinite retention
  • Calls the CloudWatch Logs DescribeLogStreams API to get the last ingestion time of each log stream
  • Deletes a log stream if the time since its last ingestion exceeds the configured retention period
  • Pauses between log stream deletions to avoid exceeding API rate limits
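The selection steps in this list can be sketched as a dry run that only reports which streams would qualify, without deleting anything. This is a simplified illustration rather than the deployed function: it uses boto3 paginators instead of explicit `nextToken` handling, and the names `is_past_retention` and `find_expired_streams` are hypothetical.

```python
import time

MS_PER_DAY = 86_400_000  # CloudWatch Logs timestamps are epoch milliseconds

def is_past_retention(last_ingestion_ms, retention_days, now_ms):
    """True when the newest ingestion is older than the retention period."""
    return (now_ms - last_ingestion_ms) / MS_PER_DAY > retention_days

def find_expired_streams(logs_client, now_ms=None):
    """Yield (log group name, log stream name) pairs that look expired."""
    now_ms = time.time() * 1000 if now_ms is None else now_ms
    group_pages = logs_client.get_paginator("describe_log_groups").paginate()
    for group_page in group_pages:
        for group in group_page["logGroups"]:
            retention = group.get("retentionInDays")
            if retention is None:
                continue  # no retentionInDays: the group never expires
            stream_pages = logs_client.get_paginator(
                "describe_log_streams"
            ).paginate(logGroupName=group["logGroupName"])
            for stream_page in stream_pages:
                for stream in stream_page["logStreams"]:
                    last = stream.get("lastIngestionTime")
                    if last is not None and is_past_retention(last, retention, now_ms):
                        yield group["logGroupName"], stream["logStreamName"]

# Usage (requires AWS credentials):
#   import boto3
#   for group, stream in find_expired_streams(boto3.client("logs")):
#       print(group, stream)
```

Running a report like this first is one way to check what a cleanup would remove before enabling deletion.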

Solution walkthrough

  1. In the AWS Lambda console, choose Create function
  2. Select Author from Scratch
  3. For Name, enter emptyLogStreamDeleter
  4. For Runtime, choose Python 3.9; for Architecture, choose arm64. Leave the other fields at their defaults
  5. Choose Create function

    Figure 1: Creating the Lambda function to operate our cleanup code

  6. For lambda_function.py, use the code below
    import boto3
    from datetime import datetime
    from time import sleep
    
    cloudwatchlogs_client = boto3.client('logs')
    
    def get_log_groups(next_token=None):
        log_group_request = {
            'limit': 50  # Maximum
        }
        if next_token:
            log_group_request['nextToken'] = next_token
        log_groups_response = cloudwatchlogs_client.describe_log_groups(**log_group_request)
        if log_groups_response:
            for log_group in log_groups_response['logGroups']:
                yield log_group
            if 'nextToken' in log_groups_response:
                yield from get_log_groups(log_groups_response['nextToken'])
    
    def get_streams(log_group, next_token=None):
        log_stream_request = {
            'logGroupName': log_group['logGroupName'],
            'limit': 50  # Max
        }
        if next_token:
            log_stream_request['nextToken'] = next_token
    
        response = cloudwatchlogs_client.describe_log_streams(**log_stream_request)
    
        if response:
            for log_stream in response['logStreams']:
                yield log_stream
            if 'nextToken' in response:
                yield from get_streams(log_group, response['nextToken'])
    
    def delete_old_streams(log_group):
        if 'retentionInDays' not in log_group:
            print("log group {} has infinite retention, skipping".format(log_group['logGroupName']))
            return
    
        for log_stream in get_streams(log_group):
            # Safety check: skip streams with no recorded event or ingestion time
            if 'lastEventTimestamp' not in log_stream or 'lastIngestionTime' not in log_stream:
                continue
    
            # Age of the most recent ingestion in days (timestamps are epoch milliseconds)
            diff_millis = datetime.now().timestamp() * 1000 - log_stream['lastIngestionTime']
            diff_days = diff_millis / (1000 * 86400)
    
            if diff_days > log_group['retentionInDays']:
                print("Deleting stream: {} in log group {}".format(log_stream['logStreamName'], log_group['logGroupName']))
                try:
                    cloudwatchlogs_client.delete_log_stream(
                        logGroupName=log_group['logGroupName'],
                        logStreamName=log_stream['logStreamName']
                    )
                    print("Stream deleted")
                    # Pause 200 ms between deletions to avoid "Rate exceeded" errors
                    sleep(0.2)
                except cloudwatchlogs_client.exceptions.ClientError as e:
                    # ClientError exposes the service error details in e.response
                    if e.response['Error']['Message'] == "Rate exceeded":
                        print("We've hit a rate limit error so we are stopping for this log group.")
                    else:
                        print("Error deleting log stream", e.response['Error']['Message'])
                    return
    
            
    def lambda_handler(event, context):
        for log_group in get_log_groups():
            delete_old_streams(log_group)
        print("Done")
    


  7. Select the Configuration tab and choose Permissions

    Figure 3: Select the Configuration tab to find the link to the Lambda function’s IAM role

  8. Under the Execution role section, select the Role name link. This opens the role in the AWS Identity and Access Management (IAM) console
  9. For Permissions policies, select Create inline policy and add the following JSON:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogGroups",
                    "logs:DescribeLogStreams",
                    "logs:DeleteLogStream"
                ],
                "Resource": "*"
            }
        ]
    }

Here is a screenshot of the completed IAM policy. It displays the same policy JSON as the previous step, within the context of the IAM policy visual editor.

Figure 4: A screenshot of the IAM policy for this function

  10. Select Review Policy, provide a name for this inline policy on the next step, and select Create Policy
  11. Back in the Lambda console, the permissions granted through the IAM role are now displayed

Figure 5: The updated resource permissions are now visible

  12. Under the General configuration section, choose Edit, change the Timeout to 5 minutes, leave the other settings at their defaults, and select Save

Figure 6: Changing the Lambda function Timeout to 5 minutes
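Because a single run may find more streams than fit into the configured timeout, one option is to stop cleanly shortly before the deadline instead of being cut off mid-call. The sketch below is an optional addition, not part of the function above; `should_stop` and `process_until_deadline` are hypothetical helper names, and the 10-second safety margin is an assumed value. `get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the `context` object.

```python
def should_stop(context, safety_margin_ms=10_000):
    """True when the invocation is close to its configured timeout."""
    return context.get_remaining_time_in_millis() < safety_margin_ms

def process_until_deadline(items, handle, context, safety_margin_ms=10_000):
    """Call handle(item) for each item, stopping early near the timeout.

    Returns the number of items handled.
    """
    handled = 0
    for item in items:
        if should_stop(context, safety_margin_ms):
            break
        handle(item)
        handled += 1
    return handled

# Usage inside a handler, assuming the functions from lambda_function.py:
#   def lambda_handler(event, context):
#       process_until_deadline(get_log_groups(), delete_old_streams, context)
```

Since the check happens only between items, the margin should be larger than the time one item can take; streams left over are picked up by the next scheduled run.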

To call the Lambda function periodically:

  1. In the CloudWatch console, choose Rules under Events in the left navigation pane
  2. If you are on the older CloudWatch Rules console, select Try the new EventBridge console at the top right

Figure 7: Switching to the new EventBridge console from the CloudWatch Rules console

  3. Select Create rule

Figure 8: Screenshot of EventBridge console

  4. For the Define rule detail step, enter empty-logstream-deleter-scheduling-rule as the name, choose Schedule for Rule type, leave everything else at its default, and select Continue in EventBridge Scheduler

Figure 9: Entering details on Define rule detail section on EventBridge Rules

  5. For the Specify schedule detail step, under Schedule pattern select Recurring schedule and then Rate-based schedule. For Rate expression, enter 15 minutes, and turn off Flexible time window. Leave the defaults in the Timeframe section and select Next

Figure 10: Selecting Schedule for the new EventBridge Rule

  6. For the Select target step, choose AWS Lambda as the Target API

Figure 11: Selecting AWS Lambda function as Target API

  7. In the Invoke section, choose the emptyLogStreamDeleter function that we created earlier and select Next

Figure 12: Selecting AWS Lambda function to Invoke

  8. Leave the defaults in the Settings step and choose Next
  9. Review the steps and select Create schedule

Figure 13: Screenshot of EventBridge Schedule review

And that’s it, you’re done! Your empty log streams will now be deleted once their last ingestion is older than the retention period of their log group.
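The schedule configured in the console steps above could also be created from code. The sketch below only builds the request for the EventBridge Scheduler `create_schedule` call in boto3; the helper name `build_schedule_request` is hypothetical, and both ARNs are assumptions you would supply yourself (the role is the one EventBridge Scheduler assumes to invoke the function).

```python
def build_schedule_request(function_arn, role_arn,
                           name="empty-logstream-deleter-scheduling-rule",
                           minutes=15):
    """Build keyword arguments for EventBridge Scheduler's create_schedule."""
    if minutes < 2:
        # This sketch only covers plural rate expressions such as "15 minutes"
        raise ValueError("this sketch expects a rate of at least 2 minutes")
    return {
        "Name": name,
        "ScheduleExpression": f"rate({minutes} minutes)",
        "FlexibleTimeWindow": {"Mode": "OFF"},  # matches turning off Flexible time window
        "Target": {"Arn": function_arn, "RoleArn": role_arn},
    }

# Usage (requires AWS credentials and real ARNs):
#   import boto3
#   request = build_schedule_request(function_arn, role_arn)
#   boto3.client("scheduler").create_schedule(**request)
```

Keeping the request construction separate from the API call makes it possible to review or log the exact schedule definition before creating it.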

Please note some limitations of the solution:

  • The function timeout is set to 5 minutes. Depending on how many log streams need to be deleted on each run, the function timeout and the EventBridge Scheduler rate may need to be adjusted
  • No action is taken on log groups that have infinite retention
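To judge whether the timeout and schedule need adjusting, a rough upper bound on deletions per run follows from the 200 ms pause in the script and the 5-minute timeout. The helper name `max_deletions_per_run` is hypothetical, and the 100 ms API latency in the second call is an assumed figure for illustration only. The estimate ignores the time spent listing log groups and streams, so real throughput will be lower.

```python
def max_deletions_per_run(timeout_ms=300_000, pause_ms=200, api_latency_ms=0):
    """Upper bound on delete calls that fit into one invocation."""
    return timeout_ms // (pause_ms + api_latency_ms)

print(max_deletions_per_run())                    # 1500
print(max_deletions_per_run(api_latency_ms=100))  # 1000
```

With a 15-minute schedule there are four runs per hour, so the bound is about 6,000 deletions per hour when only the pause is counted.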

AWS Lambda has a free tier of one million requests and 400,000 GB-seconds of compute time per month, and Amazon CloudWatch has a free tier of 5 GB per month. For most customers who stay within these free tier limits, this solution is free. For others, it will cost less than a dollar, depending on the compute seconds Lambda spends deleting log streams.
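As a worked example of the free tier comparison, the sketch below estimates the worst case in which every run uses the full 5-minute timeout. It assumes the function keeps Lambda's default memory of 128 MB, the 15-minute schedule from this post (96 runs per day), and a 30-day month; the helper name `monthly_gb_seconds` is hypothetical.

```python
FREE_GB_SECONDS = 400_000   # monthly free tier stated above
FREE_REQUESTS = 1_000_000   # monthly free tier stated above

def monthly_gb_seconds(memory_mb=128, seconds_per_run=300, runs_per_day=96, days=30):
    """Worst-case compute usage if every run lasts the full timeout."""
    return memory_mb / 1024 * seconds_per_run * runs_per_day * days

usage = monthly_gb_seconds()
requests = 96 * 30
print(usage)     # 108000.0 GB-seconds
print(requests)  # 2880 invocations
print(usage < FREE_GB_SECONDS and requests < FREE_REQUESTS)  # True
```

Under these assumptions the function uses roughly a quarter of the monthly compute allowance, provided no other Lambda workloads in the account draw on the same free tier.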

About the authors:

Vinod Kisanagaram

Vinod is a Solution Architect for AWS from Delaware. He is currently working with Worldwide Public Sector Enterprise customers on crafting highly scalable and resilient cloud architectures. He is passionate about DevOps, Observability and Serverless technologies.

Rich McDonough

Rich McDonough is a Sr. Solutions Architect for AWS based in Toronto. His primary focus is on Cloud Operations, helping customers scale their use of AWS safely and securely, and guiding customers in their adoption of observability practices and services. Before joining AWS in 2018, he specialized in helping migrate customers into the cloud.