How do I determine throttling in my CloudWatch Logs?
Last updated: 2022-04-06
I received a RequestLimitExceeded or ThrottlingException error when working with Amazon CloudWatch Logs, and my API call was throttled. How can I prevent throttling in CloudWatch Logs?
When working with CloudWatch Logs, you might exceed the API rate limit. When this happens, you receive a RequestLimitExceeded or ThrottlingException error, and your API call is throttled. Identify where and when throttling happens so that you can resolve these errors and make informed requests for rate limit increases.
Intermittent throttling on CloudWatch Logs when accessing logs
You can use the FilterLogEvents or GetLogEvents API calls to list your log events or log streams. These API calls have a hard limit and don't qualify for a limit increase. For example, if you use the FilterLogEvents API to search for log events from a specified log group, the API has a default quota of 5 transactions per second (TPS) per account per Region. If you reach this limit, you receive the RateExceeded error.
Use these best practices to avoid throttling errors in this use case:
- Use a subscription filter to retrieve log data from CloudWatch Logs in real time. For more information, see Using CloudWatch Logs subscription filters and Real-time processing of log data with subscriptions.
- Use CloudWatch Logs Insights to quickly get log data from CloudWatch Logs. You can use queries to filter your logs and to view specific log groups.
- Export log data to Amazon Simple Storage Service (Amazon S3) for batch use cases. This method isn't recommended for real-time analysis and processing because log data can take up to 12 hours to become available for export from CloudWatch Logs.
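As an illustration of the CloudWatch Logs Insights option above, the following is a minimal sketch that starts a query and polls for its results instead of paging through FilterLogEvents. It assumes a boto3-style CloudWatch Logs client is passed in as `logs_client`. The function name `run_insights_query`, the polling interval, and the poll limit are hypothetical choices for this example.

```python
import time


def run_insights_query(logs_client, log_group, query, start_time, end_time,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes.

    logs_client is assumed to behave like boto3.client("logs").
    start_time and end_time are epoch timestamps in seconds.
    """
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )
    query_id = started["queryId"]

    for _ in range(max_polls):
        result = logs_client.get_query_results(queryId=query_id)
        status = result["status"]
        if status == "Complete":
            # Each result row is a list of {"field": ..., "value": ...} cells.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in result["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Query {query_id} ended with status {status}")
        # Wait between polls so the polling itself stays at a low request rate.
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not complete after {max_polls} polls")
```

With boto3 installed, you could pass `boto3.client("logs")` as `logs_client` and a query string such as `fields @timestamp, @message | limit 20`.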
ThrottlingException errors when using an application or script to fetch CloudWatch Logs data
To collect CloudWatch logs, you can develop a collector script. This script makes DescribeLogStreams or GetLogEvents API calls to pull data from different log streams or different time frames in the same log group. API calls like FilterLogEvents, GetLogEvents, and DescribeLogStreams are designed for human interaction and not for automation. When a script calls them at a high rate, you receive an error and the API call is throttled.
Use these best practices to avoid throttling in this use case:
- Use exponential backoff and retries when you make an API call. For more information, see Exponential backoff and jitter and Error retries and exponential backoff in AWS.
- Distribute your API calls over time. Try to schedule actions with some randomization so that they are spread over a period of time.
- Add sleep intervals between consecutive API calls. Add some delay between API calls that are sent from the same script or application. If API calls are all sent in rapid succession, this is more likely to cause rate errors.
- In some cases, you might use a security information and event management (SIEM) solution like Splunk to fetch logs from CloudWatch. SIEM solutions gather data from multiple systems and analyze it to detect unusual behavior. You might experience API throttling when you use the Splunk plugin. To avoid this issue, create a CloudWatch Logs subscription filter with Amazon Kinesis Data Firehose and deliver the log data to Splunk. For more information, see Configure Kinesis inputs for the Splunk Add-on for AWS.
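The first three practices above (backoff, randomization, and delays between calls) can be sketched together as a small retry wrapper. This is an illustrative example, not an official AWS implementation. It assumes the wrapped call raises a botocore-style exception that exposes `response["Error"]["Code"]`. The names `call_with_backoff`, `backoff_delay`, and `is_throttle_error`, and the default delays, are hypothetical.

```python
import random
import time

# Error codes named in this article. Extend the set if your calls return others.
THROTTLE_CODES = {"ThrottlingException", "RequestLimitExceeded", "RateExceeded"}


def is_throttle_error(exc):
    """Return True if exc looks like a botocore-style throttling error."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") in THROTTLE_CODES


def backoff_delay(attempt, base=0.5, cap=20.0):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_backoff(func, *args, max_attempts=6, base=0.5, cap=20.0,
                      sleep=time.sleep, **kwargs):
    """Call func and retry with exponential backoff when it is throttled."""
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise errors that aren't throttling, and give up on the last attempt.
            if not is_throttle_error(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap))
```

A collector script could wrap each call, for example `call_with_backoff(logs_client.get_log_events, logGroupName=..., logStreamName=...)`. The random delay also spreads retries from parallel workers over time, so they don't all retry at the same moment. boto3 also provides configurable built-in retry modes that you can use instead of, or together with, a wrapper like this.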
Throttling errors when integrating PutLogEvents API calls with a Lambda function
The PutLogEvents API call is used to upload logs to a specified log stream in batches of up to 1 MB. This API has two rate limits:
- 5 requests per second per log stream. Additional requests are throttled. This quota can't be changed.
- 800 transactions per second, per account, per Region. This applies except for the following Regions where the quota is 1500 transactions per second per account per Region: US East (N. Virginia), US West (Oregon), and Europe (Ireland). You can request a quota increase.
For more information on this, and to request a quota increase, see CloudWatch Logs quotas.
To write logs to the specified log stream, you must include the sequence token in the request. You get this token from the response to the previous call. In some cases, you might use the DescribeLogStreams API to get the next sequence token for the log stream before calling PutLogEvents. The PutLogEvents API has a much higher limit than DescribeLogStreams, so calling DescribeLogStreams before each PutLogEvents call causes throttling. To mitigate this, use the PutLogEvents API to get the sequence token instead of DescribeLogStreams. You receive a 400 status code when you use the PutLogEvents API without a sequence token, but the error message includes the next sequence token. You can then use this sequence token with the PutLogEvents API.
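The token recovery described above might look like the following sketch. It assumes a boto3-style client whose exceptions carry a `response` dictionary, and it assumes that the rejected call exposes the expected token under the `expectedSequenceToken` key of that dictionary. Verify where your SDK version surfaces this field. The function name `put_with_token_recovery` is hypothetical.

```python
def put_with_token_recovery(logs_client, log_group, log_stream, events, token=None):
    """Send events, recovering the sequence token from a rejected call.

    Avoids calling DescribeLogStreams just to look up the token.
    Returns the next sequence token reported by the successful call.
    """
    kwargs = {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "logEvents": events,
    }
    if token:
        kwargs["sequenceToken"] = token

    try:
        response = logs_client.put_log_events(**kwargs)
    except Exception as exc:
        error_response = getattr(exc, "response", None) or {}
        code = error_response.get("Error", {}).get("Code")
        # Assumption: the expected token is a top-level key of the error response.
        expected = error_response.get("expectedSequenceToken")
        if code != "InvalidSequenceTokenException" or not expected:
            raise
        # Retry once with the token that the service said it expected.
        kwargs["sequenceToken"] = expected
        response = logs_client.put_log_events(**kwargs)

    return response.get("nextSequenceToken")
```

Keep the returned token and pass it as `token` on the next call so that the recovery path runs only when the token is missing or stale.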
Use these tips to avoid throttling in this use case:
- Try to combine multiple log events in the same API call.
- Spread API calls over more log streams.
- Apply the retry logic with exponential backoff and jitter. For more information, see Managing and monitoring API throttling in your workloads.
- Distribute your API calls evenly over time.
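To illustrate the first tip, the following sketch groups log events into batches so that each PutLogEvents call carries as many events as the limits allow. The limits used here (1,048,576 bytes per batch, 26 bytes of overhead per event, and 10,000 events per batch) come from the PutLogEvents API documentation. Confirm them against the current quotas before you rely on them. The function names are hypothetical, and the sketch assumes that the events are already in chronological order.

```python
MAX_BATCH_BYTES = 1_048_576   # maximum batch size in bytes
MAX_BATCH_EVENTS = 10_000     # maximum number of events per batch
EVENT_OVERHEAD_BYTES = 26     # per-event overhead counted toward the batch size


def event_size(event):
    """Size of one event as counted toward the batch limit."""
    return len(event["message"].encode("utf-8")) + EVENT_OVERHEAD_BYTES


def batch_events(events, max_bytes=MAX_BATCH_BYTES, max_events=MAX_BATCH_EVENTS):
    """Yield lists of events that each fit into one PutLogEvents call."""
    batch, batch_bytes = [], 0
    for event in events:
        size = event_size(event)
        if size > max_bytes:
            raise ValueError("A single event exceeds the batch size limit")
        # Close the current batch if adding this event would break a limit.
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_events):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(event)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch can then be sent with one PutLogEvents call, which reduces the number of requests counted against the per-stream and per-account quotas.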
Manage your CloudWatch Logs service quotas
AWS defines quotas for services to protect performance and to help ensure availability. CloudWatch has quotas for metrics, alarms, API requests, and alarm email notifications. Use these steps to view your service quotas and to set alarms that notify you when you reach a threshold:
- Open the Service Quotas console.
- In the navigation pane, choose AWS services.
- From the AWS services list, search for Amazon CloudWatch Logs.
- In the Service quotas list, you can see the service quota name, the applied value (if it's available), the AWS default quota, and whether the quota value is adjustable.
- To view more information about a service quota, like the description, choose the quota name.
- After you choose the quota name, you can see more information about this quota. For example, if you choose GetLogEvents throttle limit in transactions per second, you see:
  - Quota code
  - Quota ARN
  - Utilization: %
  - Applied quota value
  - AWS default quota value
  - Adjustable: Y/N
- In the Amazon CloudWatch alarms section, choose Create alarm, and enter an Alarm threshold and Alarm name.
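The same quota details can also be read programmatically. The following is a minimal sketch that assumes a boto3-style Service Quotas client is passed in as `quotas_client`, and that `logs` is the service code for CloudWatch Logs. Treat the service code as an assumption and confirm it in your account. The function name `list_logs_quotas` is hypothetical.

```python
def list_logs_quotas(quotas_client, service_code="logs"):
    """Return the quota name, code, applied value, and adjustability for a service.

    quotas_client is assumed to behave like boto3.client("service-quotas").
    """
    quotas, next_token = [], None
    while True:
        kwargs = {"ServiceCode": service_code}
        if next_token:
            kwargs["NextToken"] = next_token
        page = quotas_client.list_service_quotas(**kwargs)
        for quota in page.get("Quotas", []):
            quotas.append({
                "name": quota.get("QuotaName"),
                "code": quota.get("QuotaCode"),
                "value": quota.get("Value"),
                "adjustable": quota.get("Adjustable"),
            })
        # Follow the pagination token until no more pages remain.
        next_token = page.get("NextToken")
        if not next_token:
            return quotas
```

Filtering the result on `adjustable` shows which limits can be raised with a quota increase request and which ones you must handle with client-side changes.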