AWS Cloud Operations & Migrations Blog
Delete Empty CloudWatch Log Streams
Customers that use Amazon CloudWatch to monitor their applications and resources on AWS can accumulate a large number of log streams that are used only briefly, or are no longer required. While there is no charge for maintaining an empty log stream, having potentially thousands of them can be overwhelming, especially while using the CloudWatch console. Containerized and auto-scaling workloads (those with short-lived resources that create their own log streams) are common examples of this.
In this post, we will demonstrate the process of automating the cleanup of Amazon CloudWatch log streams that have exceeded their retention period. A log stream is a sequence of log events that share the same source. Each separate source of logs in CloudWatch Logs makes up a separate log stream. By default, log data is stored in CloudWatch Logs indefinitely; however, you can configure how long data is stored in a log group. Any data older than the current retention setting is deleted. You can change the log retention for each log group at any time. CloudWatch Logs doesn’t immediately delete log events when they reach their retention setting. It typically takes up to 72 hours before log events are deleted, but in rare situations this might take longer. However, CloudWatch retains the log streams themselves even after the retention setting has emptied them of log events.
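For reference, a log group's retention can also be configured programmatically. Here is a minimal boto3 sketch; the log group name and the 30-day value are illustrative placeholders, not part of the solution below:

import boto3

logs = boto3.client('logs')

# Set (or change) the retention period for a log group.
# Events older than 30 days will then be deleted by CloudWatch Logs,
# but the now-empty log streams themselves are retained.
logs.put_retention_policy(
    logGroupName='/example/my-application',  # placeholder log group name
    retentionInDays=30                       # any supported retention value
)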
We will set up an AWS Lambda function that can run on a schedule to delete any empty log streams inside CloudWatch log groups.
Solution overview
A simple solution is to run the script provided below (lambda_function.py code) periodically in AWS Lambda. The script reads the retention settings for all CloudWatch log groups and deletes the log streams that are past their retention period.
The script:
- Reads the configuration of all log groups
- Checks the retention setting for each log group and processes only those log groups that do not have indefinite retention
- Calls the CloudWatch Logs DescribeLogStreams API and gets the last ingestion time for each log stream
- Deletes a log stream if its last ingestion time is older than the configured retention period (see the age-check sketch after this list)
- Adds a delay between log stream deletions to avoid exceeding API rate limits
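The key check is the age comparison: CloudWatch Logs timestamps are epoch values in milliseconds, so the script converts the difference to days before comparing it against the log group's retention setting. Here is a minimal illustration of that calculation; the timestamp and retention values are made up:

from datetime import datetime

retention_in_days = 30                   # from the log group's retentionInDays
last_ingestion_time = 1_650_000_000_000  # example epoch value in milliseconds

# Difference between "now" and the last ingestion time, converted to days
diff_millis = datetime.now().timestamp() * 1000 - last_ingestion_time
diff_days = diff_millis / (1000 * 86400)

if diff_days > retention_in_days:
    print("stream is past its retention period and can be deleted")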
Solution walkthrough
- In the AWS Lambda console, choose Create function
- Select Author from Scratch
- For Name, enter emptyLogStreamDeleter
- For Runtime choose Python 3.9, for Architecture choose arm64, and leave other fields as default
- Choose Create function
- For lambda_function.py use the code given below
import boto3
from datetime import datetime
from time import sleep

from botocore.exceptions import ClientError

cloudwatchlogs_client = boto3.client('logs')


def get_log_groups(next_token=None):
    # Describe log groups one page at a time, recursing on nextToken
    log_group_request = {
        'limit': 50  # Maximum
    }
    if next_token:
        log_group_request['nextToken'] = next_token
    log_groups_response = cloudwatchlogs_client.describe_log_groups(**log_group_request)
    if log_groups_response:
        for log_group in log_groups_response['logGroups']:
            yield log_group
        if 'nextToken' in log_groups_response:
            yield from get_log_groups(log_groups_response['nextToken'])


def get_streams(log_group, next_token=None):
    # Describe log streams for a log group one page at a time, recursing on nextToken
    log_stream_request = {
        'logGroupName': log_group['logGroupName'],
        'limit': 50  # Max
    }
    if next_token:
        log_stream_request['nextToken'] = next_token
    response = cloudwatchlogs_client.describe_log_streams(**log_stream_request)
    if response:
        for log_stream in response['logStreams']:
            yield log_stream
        if 'nextToken' in response:
            yield from get_streams(log_group, response['nextToken'])


def delete_old_streams(log_group):
    if 'retentionInDays' not in log_group:
        print("log group {} has infinite retention, skipping".format(log_group['logGroupName']))
        return
    for log_stream in get_streams(log_group):
        # Check to prevent accidental deletes of streams that never received events
        if 'lastEventTimestamp' not in log_stream:
            continue
        # Timestamps are epoch milliseconds; convert the age to days
        diff_millis = datetime.now().timestamp() * 1000 - log_stream['lastIngestionTime']
        diff_days = diff_millis / (1000 * 86400)
        if diff_days > log_group['retentionInDays']:
            print("Deleting stream: {} in log group {}".format(
                log_stream['logStreamName'], log_group['logGroupName']))
            try:
                cloudwatchlogs_client.delete_log_stream(
                    logGroupName=log_group['logGroupName'],
                    logStreamName=log_stream['logStreamName']
                )
                print("Stream deleted")
                # Pause 200 ms between deletions to avoid rate exceeded errors
                # caused by too many API calls in a short time
                sleep(0.2)
            except ClientError as e:
                if e.response['Error']['Message'] == "Rate exceeded":
                    print("We've hit a rate limit error so we are stopping for this log group.")
                else:
                    print("Error deleting log stream", e.response['Error']['Message'])
                return


def lambda_handler(event, context):
    for log_group in get_log_groups():
        delete_old_streams(log_group)
    print("Done")
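The function above paginates by recursing on nextToken. If you prefer, the same traversal can be written with boto3's built-in paginators; this is an optional alternative sketch, not part of the solution code:

import boto3

logs = boto3.client('logs')

def get_log_groups():
    # Equivalent log group traversal using a boto3 paginator instead of recursion
    for page in logs.get_paginator('describe_log_groups').paginate():
        yield from page['logGroups']

def get_streams(log_group):
    # Equivalent log stream traversal for a single log group
    paginator = logs.get_paginator('describe_log_streams')
    for page in paginator.paginate(logGroupName=log_group['logGroupName']):
        yield from page['logStreams']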
- Now select the Configuration tab and choose Permissions
- Select the Role name link under the Execution role section; this opens the role in the AWS Identity and Access Management (IAM) console
- Under Permissions policy, select Create inline policy and add the following JSON:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "logs:DescribeLogGroups", "logs:DescribeLogStreams", "logs:DeleteLogStream" ], "Resource": "*" } ] }
Here is a screenshot of the completed IAM policy. It displays the same policy JSON as the previous step, within the context of the IAM policy visual editor.
- Select Review policy, provide a name for this inline policy on the next step, and select Create policy
- Back in the Lambda console, the updated permissions from the IAM role change are now displayed
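If you prefer to attach the inline policy with a script instead of the console, here is a minimal boto3 sketch; the role name and policy name are placeholders and must match your function's execution role:

import json
import boto3

iam = boto3.client('iam')

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:DeleteLogStream"
            ],
            "Resource": "*"
        }
    ]
}

# Attach the inline policy to the Lambda function's execution role
iam.put_role_policy(
    RoleName='emptyLogStreamDeleter-role-example',  # placeholder: your function's execution role
    PolicyName='emptyLogStreamDeleterPolicy',       # placeholder inline policy name
    PolicyDocument=json.dumps(policy_document)
)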
- Under the General configuration section, choose Edit, change the Timeout to 5 minutes, leave the other defaults, and select Save
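The same timeout change, and a quick test run, can also be done with boto3 if you want to verify the setup outside the console. This is an optional sketch; the function name is the one created earlier:

import boto3

lambda_client = boto3.client('lambda')

# Raise the function timeout to 5 minutes (300 seconds)
lambda_client.update_function_configuration(
    FunctionName='emptyLogStreamDeleter',
    Timeout=300
)

# Wait for the configuration update to finish before invoking
lambda_client.get_waiter('function_updated').wait(FunctionName='emptyLogStreamDeleter')

# Optionally run the function once to confirm the permissions and timeout work
response = lambda_client.invoke(
    FunctionName='emptyLogStreamDeleter',
    InvocationType='RequestResponse'
)
print(response['StatusCode'])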
To call the Lambda function periodically:
- In the CloudWatch console, choose Rules under Events in the left navigation pane
- If you are on the older CloudWatch rules console, select Try the new EventBridge console at the top right
- Select Create rule
- For the Define rule detail step, enter empty-logstream-deleter-scheduling-rule as the name, for Rule type choose Schedule, leave everything else as defaults, and select Continue in EventBridge Scheduler
- For the Specify schedule detail step, under the Schedule pattern section, select Recurring schedule and then Rate-based schedule. For Rate expression, enter 15 minutes and turn off Flexible time window. Leave the defaults for the Timeframe section and select Next
- For the Select target step, choose AWS Lambda as the Target API
- In the Invoke section, choose the emptyLogStreamDeleter function that we created earlier and select Next
- Leave the defaults in the Settings step and choose Next
- Review the steps and select Create schedule
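For completeness, an equivalent schedule can be created programmatically with the EventBridge Scheduler API. In this sketch, the Lambda function ARN and the scheduler execution role ARN are placeholders that you would replace with your own:

import boto3

scheduler = boto3.client('scheduler')

# Create a recurring schedule that invokes the Lambda function every 15 minutes
scheduler.create_schedule(
    Name='empty-logstream-deleter-scheduling-rule',
    ScheduleExpression='rate(15 minutes)',
    FlexibleTimeWindow={'Mode': 'OFF'},
    Target={
        # Placeholder ARNs: the Lambda function created above and an IAM role
        # that EventBridge Scheduler can assume to invoke it
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:emptyLogStreamDeleter',
        'RoleArn': 'arn:aws:iam::123456789012:role/scheduler-invoke-role-example'
    }
)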
And that’s it, you’re done! Your empty log streams will now be deleted once the log group’s retention period has passed.
Please note some limitations of the solution:
- The function timeout is set to 5 minutes; depending on how many log streams have to be deleted on each run, the function timeout and the EventBridge Scheduler rate may need to be adjusted
- No action is taken on log groups that have indefinite retention configured
AWS Lambda has a free tier of one million free requests and 400,000 GB-seconds of compute time per month, and Amazon CloudWatch has a 5 GB/month free tier. For most customers who stay within these free tier limits, this solution costs nothing; for others, it should cost less than a dollar, depending on the compute time Lambda spends deleting log streams.