Managing and monitoring API throttling in your workloads

When you’re architecting for the cloud, you need to keep API throttling in mind, particularly the types of calls and the frequency with which they are called. When the allotted rate limit for an API call is exceeded, you’ll receive an error response and the call will be throttled. Excessive API throttling can result in job failure, delays, and operational inefficiencies that ultimately cost your organization time and money.

There’s a difference between API throttling and Service Quotas (formerly referred to as service limits). AWS maintains Service Quotas for each account (mostly on a per-Region basis) to help guarantee the availability of AWS resources, prevent accidental provisioning of more resources than needed, and protect against malicious actions that might increase your bill. Some Service Quotas are raised automatically as you use AWS, but most AWS services require that you request a quota increase.

In this blog post, we’re going to discuss several strategies for managing API throttling in your cloud workloads and provide guidance for creating dashboards for API metrics that are available through Amazon CloudWatch.

Managing API throttling events

API rate limits serve two primary purposes:

To protect the performance and availability of the underlying service while ensuring access for all AWS customers.
To protect the customer from malicious code or misconfigurations that can result in unexpected charges.

Retry logic

If an API rate limit is exceeded, you will often receive a RequestLimitExceeded or ThrottlingException response and the API call will be throttled. To avoid impact to your workloads, you should proactively implement retry techniques. The following techniques increase the reliability of your application and can reduce operational costs for your organization.

Retry: Each AWS SDK implements automatic retry logic. In the AWS SDK for Java, for example, you can configure the retry settings using the ClientConfiguration class. If you’re not using an AWS SDK, you should retry original requests that receive a server error (5xx) or a ThrottlingException. Some services, such as Amazon Elastic Container Service (Amazon ECS), return throttling errors with a 4xx status code. For example: Rate exceeded (Service: AmazonECS; Status Code: 400; Error Code: ThrottlingException; Request ID: a30edf16-2220-4cc8-9e26-7b8b8ea2c52d; Proxy: null). Every API call should implement retry, not just in the CI/CD pipeline, but also in the application code where API calls are made.
Exponential backoff: In addition to simple retries, each AWS SDK implements an exponential backoff algorithm for better flow control. The idea behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. Exponential backoff can lead to very long backoff times, because exponential functions grow quickly. You should implement a maximum delay interval and a maximum number of retries. The maximum delay interval and maximum number of retries are not necessarily fixed values. They should be set based on the operation being performed and other local factors, including network latency.
Jitter: Retries can be ineffective if all clients retry at the same time. To avoid this problem, add jitter: a random delay applied before making or retrying a request, which spreads out the arrival rate and helps prevent large bursts. Most exponential backoff algorithms use jitter to prevent successive collisions.
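
The three techniques above can be combined in a few lines. The following Python sketch is illustrative only: ThrottlingError is a hypothetical stand-in for whatever throttling error your client raises, and the base delay, maximum delay, and retry count are assumed values that you should tune for each operation. It uses the "full jitter" variant, in which each wait is a random value between zero and the capped exponential delay.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling response such as ThrottlingException."""


def backoff_delay(attempt, base=0.5, cap=20.0, rng=random.random):
    """Return a full-jitter delay in seconds for a zero-based retry attempt."""
    # Exponential growth, capped at the maximum delay interval.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter: choose a random point between 0 and the ceiling.
    return rng() * ceiling


def call_with_retries(operation, max_retries=5, base=0.5, cap=20.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying throttled calls with capped backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_retries:
                raise  # Retry budget exhausted; surface the error to the caller.
            sleep(backoff_delay(attempt, base, cap, rng))
```

The sleep and rng parameters exist only so that the behavior can be tested without real waiting. If you use Boto3, you usually don’t need to write this yourself: the retries option of botocore.config.Config (for example, max_attempts and mode) controls the SDK’s built-in retry behavior.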

For more information, including examples, see:

Retries in the Boto3 documentation
Error retries and exponential backoff in AWS in the AWS General Reference Guide
Timeouts, retries, and backoff with jitter in the Amazon Builder’s Library
Exponential backoff and jitter blog post
Request throttling for the Amazon EC2 API in the Amazon EC2 API Reference
Examples of backoff and jitter using Python and Boto3

Multi-account strategy

API rate limits are configured for each AWS account on a per-Region basis. Implementing a multi-account strategy can help to spread workloads across multiple accounts. This provides independent Service Quotas for each account and Region, which means there is less contention for the same set of API rate limits. Another benefit of this approach is the smaller blast radius provided by the account boundary. If a workload in one account is negatively impacted by throttling, there is no negative effect on other workloads that are running in separate accounts. Tracking current API rate limits and quotas in production accounts can be useful when designing and configuring disaster recovery (DR) Regions. This allows you to proactively request limit increases in those DR Regions before deployment or activation.
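
As a rough illustration of tracking quotas for DR planning, the following Python sketch compares quota values from a production Region with those in a DR Region and reports where the DR Region is lower. It assumes you have already collected the values into dictionaries (for example, with the Service Quotas ListServiceQuotas API); the quota names and numbers used with it are hypothetical.

```python
def quotas_needing_increase(primary, dr):
    """Return {quota name: (primary value, DR value)} where the DR value is lower.

    Both arguments map a quota name to its current value. A quota that is
    missing from the DR mapping is treated as 0 so that it is always reported.
    """
    gaps = {}
    for name, primary_value in primary.items():
        dr_value = dr.get(name, 0.0)
        if dr_value < primary_value:
            gaps[name] = (primary_value, dr_value)
    return gaps
```

Each entry in the result is a candidate for a quota increase request in the DR Region before you deploy or activate workloads there.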

For more information, see the Best Practices for Organizational Units with AWS Organizations blog post.

Monitoring, alerting, and troubleshooting

Understanding where and when throttling is occurring and impacting workloads is critical to identifying areas for improved retry logic or making informed rate limit increase requests through Service Quotas or the Support Center. You can use Amazon CloudWatch and AWS CloudTrail to create dashboards and alerts for this purpose.

Usage metrics

CloudWatch continuously monitors AWS control plane activities to generate API usage metrics. You can find API usage metrics organized by AWS service in the CloudWatch console, and search and discover usage metrics from thousands available in the AWS/Usage namespace.
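
To discover the same metrics programmatically, you could page through the AWS/Usage namespace with the CloudWatch ListMetrics API. The following Python sketch assumes a Boto3 CloudWatch client is passed in, and that filtering on the Service dimension matches how the service you care about publishes its usage metrics.

```python
def list_usage_metrics(cloudwatch, service):
    """Return (metric name, dimensions) pairs from AWS/Usage for one service.

    cloudwatch is expected to be a Boto3 CloudWatch client, for example
    boto3.client("cloudwatch"), created with valid credentials and a Region.
    """
    paginator = cloudwatch.get_paginator("list_metrics")
    pages = paginator.paginate(
        Namespace="AWS/Usage",
        Dimensions=[{"Name": "Service", "Value": service}],
    )
    found = []
    for page in pages:
        for metric in page["Metrics"]:
            # Flatten the list of dimension records into a plain dictionary.
            dims = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
            found.append((metric["MetricName"], dims))
    return found
```

For example, list_usage_metrics(boto3.client("cloudwatch"), "EC2") would return the usage metrics currently published for Amazon EC2 in the client’s Region.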

Selecting key API usage metrics for your workloads and creating graphs of the metric data enables trend analysis over time. Visibility into long-term trends and patterns can help you identify areas for further optimization and anticipate rate limit requirements before moving workloads into production. For services that publish their API rate limits, CloudWatch Alarms can be configured to perform one or more actions based on a breached threshold over a number of time periods. The action can be sending a notification to an Amazon SNS topic, performing an Amazon EC2 action or an Auto Scaling action, or creating an OpsItem or incident in Systems Manager.

Defining appropriate alarm thresholds can be a time-consuming process. Set thresholds too high and alerts/responses will be late. Set them too low and you’ll have an increase in the number of false alarms. Amazon CloudWatch Anomaly Detection simplifies this process by analyzing historical values for the chosen usage metric, and looking for predictable patterns that repeat hourly, daily, or weekly. It then creates a best-fit model that will help you to better predict the future, and to more cleanly differentiate normal and anomalous behavior. Using Anomaly Detection, you can more easily alarm on and investigate further when the metric value exceeds the expected values, as shown below.

The metrics console showing anomaly detection enabled

Figure 1: Amazon CloudWatch Anomaly Detection
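
An anomaly detection alarm like the one in Figure 1 can also be created through the API. The following Python sketch builds the parameters for the CloudWatch PutMetricAlarm operation. The alarm name, the band width of 2, the 5-minute period, and the three evaluation periods are assumptions chosen for illustration, as is the dimension set used for the API usage metric.

```python
def anomaly_alarm_params(api_name, service="EC2", band_width=2):
    """Build PutMetricAlarm parameters that alarm above the anomaly band."""
    return {
        "AlarmName": f"{api_name} CallCount Anomaly",  # Hypothetical name.
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "ThresholdMetricId": "ad1",  # Compare m1 against the band below.
        "TreatMissingData": "notBreaching",
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Usage",
                        "MetricName": "CallCount",
                        "Dimensions": [
                            {"Name": "Type", "Value": "API"},
                            {"Name": "Resource", "Value": api_name},
                            {"Name": "Service", "Value": service},
                            {"Name": "Class", "Value": "None"},
                        ],
                    },
                    "Period": 300,
                    "Stat": "Sum",
                },
            },
            {
                "Id": "ad1",
                "Label": "CallCount (expected)",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "ReturnData": True,
            },
        ],
    }
```

With Boto3, the result could be passed as boto3.client("cloudwatch").put_metric_alarm(**anomaly_alarm_params("DescribeInstances")). A wider band produces fewer alarms at the cost of later detection.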

Usage metrics integrated with Service Quotas

As of this writing, Amazon CloudWatch includes usage metrics and Service Quotas integration for several services, with more to follow. With these metrics, you can monitor and alert on the following AWS services:

AWS CloudHSM
Amazon CloudWatch
Amazon DynamoDB
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Container Registry (Amazon ECR)
AWS Fargate
AWS Fault Injection Simulator
AWS Interactive Video Service
AWS Key Management Service (AWS KMS)
Amazon Kinesis Data Firehose
AWS RoboMaker

With this integration you can configure alarms that alert you when your usage approaches a Service Quota. Proactively manage your quotas via the Service Quotas console or an API call. For more information, see Introducing Service Quotas: View and manage your quotas for AWS services from one central location.

The Usage Metrics vs. Quotas page in the CloudWatch console displays usage metrics for dashboard, alarms, and more. CloudWatch collects the metrics that track the usage of some AWS resources. These metrics correspond to AWS service quotas.

Figure 2: Usage Metrics vs. Quotas

Example: To create an API usage dashboard widget

Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
In the navigation pane, choose All metrics.
From the AWS Namespaces section, choose Usage, and then choose By AWS Resource. This section will contain resources in your account that support usage metrics. The list will vary based on what is currently deployed.
Choose the resource or API that you would like to monitor usage for.
On the Graphed metrics tab, from Math expression, choose Start with empty expression. Use the following values:
- Id: m1
  1. Label: Call Count
  2. Period: 1 Minute
- Id: e1
  1. Label: Service Quota
  2. Details: SERVICE_QUOTA(m1)

The table on the Graphed metrics tab displays the math expressions and usage metrics to visualize call count vs. current service quota.

Figure 3: Graphed metrics tab

From Actions, choose Add to dashboard.
Select a dashboard or create one.
For the widget type, use the default option (Line).
Give the widget title a descriptive name (for example, <API_name> Usage vs. Quota).
Choose Add to dashboard.
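
The same widget can be expressed as a dashboard body and created with the CloudWatch PutDashboard API. The following Python sketch mirrors the console steps above; the widget size and position, the default Region, and the example API are assumptions, and SERVICE_QUOTA returns data only for usage metrics that are integrated with Service Quotas.

```python
import json


def usage_vs_quota_dashboard(api_name, service="EC2", region="us-east-1"):
    """Build a dashboard body with one line widget: call count vs. quota."""
    widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": f"{api_name} Usage vs. Quota",
            "view": "timeSeries",  # The default Line widget type.
            "region": region,
            "stat": "Sum",
            "period": 60,  # 1 minute, as in the console steps.
            "metrics": [
                # m1: the API usage metric from the AWS/Usage namespace.
                ["AWS/Usage", "CallCount",
                 "Type", "API", "Resource", api_name,
                 "Service", service, "Class", "None",
                 {"id": "m1", "label": "Call Count"}],
                # e1: the math expression that plots the current quota.
                [{"expression": "SERVICE_QUOTA(m1)",
                  "id": "e1", "label": "Service Quota"}],
            ],
        },
    }
    return json.dumps({"widgets": [widget]})
```

With Boto3, the returned string could be passed as the DashboardBody argument of put_dashboard, together with a DashboardName of your choice.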

Example: To create an alarm based on API usage

Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
In the navigation pane, choose All alarms, and then choose Create alarm.
From Specify metric and conditions choose Select metric.
From Metrics, choose Usage, and then choose By AWS Resource. This section will contain resources in your account that support usage metrics. The list will vary based on what is currently deployed.
Choose the resource or API that you would like to monitor usage for and then View graphed metrics.
On the Graphed metrics tab, from Add math expression, choose Start with an empty expression. Enter m1/SERVICE_QUOTA(m1)*100 as the math expression, and then choose Apply.
Use the following values:
- Id: m1
  1. Label: Call Count
  2. Period: 1 Minute
- Id: e1
  1. Label: Service Quota
  2. Details: m1/SERVICE_QUOTA(m1)*100

Make sure that only the e1 expression is selected for the alarm as shown below and choose Select Metric.

The table on the Graphed metrics tab displays the math expressions and usage metrics that will be used to trigger alarm actions.

Figure 4: Graphed metrics tab

From Conditions, define the threshold value at which you want to receive alerts. In this example, we’re using 80% of the current Service Quota. Because the e1 expression multiplies the ratio by 100, enter the threshold as 80. Choose:
- Threshold type: Static
- Whenever Service Quota Usage is: Greater
- than: 80
Configure the action that you would like to take when the alarm state is triggered (typically a notification via an SNS topic) and choose Next.
Give the alarm an appropriate name and description (for example, <API_name> Usage High).
Preview the configuration and choose Create alarm.
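
For repeatable deployments, the same alarm can be defined through the CloudWatch PutMetricAlarm API. The following Python sketch builds the parameters. Because the expression m1/SERVICE_QUOTA(m1)*100 yields a percentage, the sketch uses a threshold of 80 to represent 80% of the quota. The SNS topic ARN, the single evaluation period, and the example API are assumptions.

```python
def quota_usage_alarm_params(api_name, sns_topic_arn, service="EC2",
                             threshold_percent=80.0):
    """Build PutMetricAlarm parameters for usage as a percentage of quota."""
    return {
        "AlarmName": f"{api_name} Usage High",
        "AlarmDescription": f"{api_name} call count is above "
                            f"{threshold_percent}% of its Service Quota",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 1,
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            {
                "Id": "m1",
                "Label": "Call Count",
                "ReturnData": False,  # Input only; the alarm watches e1.
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Usage",
                        "MetricName": "CallCount",
                        "Dimensions": [
                            {"Name": "Type", "Value": "API"},
                            {"Name": "Resource", "Value": api_name},
                            {"Name": "Service", "Value": service},
                            {"Name": "Class", "Value": "None"},
                        ],
                    },
                    "Period": 60,
                    "Stat": "Sum",
                },
            },
            {
                "Id": "e1",
                "Label": "Service Quota",
                "Expression": "m1/SERVICE_QUOTA(m1)*100",
                "ReturnData": True,
            },
        ],
    }
```

Setting ReturnData to True only on e1 corresponds to selecting only the e1 expression in the console. With Boto3, the dictionary could be passed as keyword arguments to put_metric_alarm.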

For more information, see Monitoring API requests using Amazon CloudWatch and Amazon EC2 actions in the Amazon EC2 API Reference. Stay up to date here on the latest services that integrate their usage metrics with Service Quotas.

EC2 API metrics

You can use CloudWatch to monitor EC2 API requests. To enable this optional feature, contact AWS Support. Amazon EC2 detailed monitoring pricing is based on the number of custom metrics, with no API charge for sending metrics. The number of metrics sent by an EC2 instance as part of EC2 detailed monitoring depends on the instance type. For more information, see Instance metrics in the Amazon EC2 User Guide for Linux Instances.

There is a cost associated with EC2 detailed monitoring with different price tiering options available. As with all custom metrics, EC2 detailed monitoring is prorated by the hour and metered only when the instance sends metrics to CloudWatch. For current costs and practices, see the CloudWatch pricing page.

After they are enabled, the EC2 API metrics are contained in the AWS/EC2/API namespace. The following metrics are provided:

ClientErrors
RequestLimitExceeded
ServerErrors
SuccessfulCalls

Tracking the RequestLimitExceeded and SuccessfulCalls metrics over time can provide valuable insight into your typical usage. You can use them to set alarms for when RequestLimitExceeded counts exceed predetermined thresholds. The EC2 metric data can be viewed across all EC2 API actions or filtered by individual action to provide a more granular view.
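
One simple way to turn these two metrics into a threshold is to look at the share of calls that were throttled in each period. The following Python sketch assumes you have already retrieved per-period sums for both metrics. The 5% threshold is an arbitrary example, and the ratio deliberately ignores ClientErrors and ServerErrors.

```python
def throttle_ratio(request_limit_exceeded, successful_calls):
    """Return the fraction of calls that were throttled in one period."""
    total = request_limit_exceeded + successful_calls
    return request_limit_exceeded / total if total else 0.0


def periods_over_threshold(datapoints, threshold=0.05):
    """Return timestamps whose throttle ratio exceeds the threshold.

    datapoints is an iterable of
    (timestamp, RequestLimitExceeded sum, SuccessfulCalls sum) tuples.
    """
    return [
        timestamp
        for timestamp, throttled, succeeded in datapoints
        if throttle_ratio(throttled, succeeded) > threshold
    ]
```

Periods that appear in the result are good candidates for a closer look at retry behavior or for a rate limit increase request.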

The AWS-EC2-API-metrics dashboard displays All EC2 API Actions widgets and DescribeInstances widgets.

Figure 5: AWS-EC2-API-Metrics

For a sample dashboard, see:

AWS-EC2-API-Metrics.txt

For more information, see Monitoring API requests using Amazon CloudWatch and Amazon EC2 actions in the Amazon EC2 API Reference.

CloudTrail analysis to determine top API calls and throttling

Use CloudTrail analysis for deeper insight into other API activity in your accounts. Configure the trail to send events to a CloudWatch log group and utilize Contributor Insights to identify the source of operational issues due to API throttling.

These sample graphs show the top 10 API calls in an account and the resource ARNs responsible for the highest number of RequestLimitExceeded events. Often, a throttled API call is also one of the top entries in the first graph. The two graphs together provide clear insight into API call activity in your account, including those calls that can negatively impact your applications.

The visualization of TopAPICallsOnly contributor rule displays a graph and a table that shows the sum of data points (in this example, 94) and event name and AWS Region.

Figure 6: TopAPICallsOnly

The visualization of the RequestLimitExceeded contributor rule displays a graph and a table that shows the sum of data points (in this example, 94) and event name and user ARN.

Figure 7: RequestLimitExceeded
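
Contributor Insights rules like the two shown above are defined as JSON documents. The following Python sketch builds comparable rule definitions; it is not the exact rule used to produce the figures. The log group name is hypothetical, and the errorCode value that CloudTrail records for throttling varies by service, so treat the filter values as assumptions to verify against your own events.

```python
import json


def contributor_rule(log_group, keys, error_codes=None):
    """Build a Contributor Insights rule definition for CloudTrail logs.

    keys lists the JSON fields that identify a contributor, for example
    ["$.eventName", "$.awsRegion"]. When error_codes is given, only events
    whose errorCode is in that list are counted.
    """
    contribution = {"Keys": list(keys), "Filters": []}
    if error_codes:
        contribution["Filters"].append(
            {"Match": "$.errorCode", "In": list(error_codes)}
        )
    rule = {
        "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
        "LogGroupNames": [log_group],
        "LogFormat": "JSON",
        "Contribution": contribution,
        "AggregateOn": "Count",
    }
    return json.dumps(rule)


# Top API calls by event name and Region.
top_api_calls = contributor_rule(
    "example-cloudtrail-log-group", ["$.eventName", "$.awsRegion"]
)

# Throttled calls by event name and caller ARN.
throttled_calls = contributor_rule(
    "example-cloudtrail-log-group",
    ["$.eventName", "$.userIdentity.arn"],
    error_codes=["RequestLimitExceeded"],
)
```

With Boto3, each string could be passed as the RuleDefinition argument of the CloudWatch put_insight_rule operation, along with a RuleName.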

For more information, see the Analyzing AWS CloudTrail in Amazon CloudWatch blog post.

Conclusion

In this post, we shared methods for monitoring and managing your API and resource usage in AWS and methods to employ to avoid excessive throttling in your accounts. As more services make their usage metrics available in Service Quotas and CloudWatch, you can use these visualization and alerting practices to identify potential issues and make quota increase requests proactively.

AWS Cloud Operations & Migrations Blog