Why are my Kinesis Data Streams throttling?

I want to know why my Amazon Kinesis Data Streams are throttling.

Short description

Even though your metrics are within Kinesis Data Streams quotas, your stream can throttle for the following reasons:

  • You receive either the ProvisionedThroughputExceededException or LimitExceededException error code. For more information, see API limits.
  • Enhanced Kinesis stream monitoring is turned off for your streams.
  • Hidden micro spikes are in the Amazon CloudWatch metrics.
  • Your CloudWatch metrics report only successful operations and exclude failed operations.

Resolution

You receive either the ProvisionedThroughputExceededException or LimitExceededException error code

When you use the GetRecords or PutRecords API, your stream can produce ProvisionedThroughputExceededException or LimitExceededException throttling errors.

The following scenarios can also cause these errors:

  • The number of records written to the Kinesis stream exceeds the stream quotas.
  • The size of a record, including its partition key, exceeds 1 MB.
  • The total throughput in bytes exceeds the Kinesis stream limits.
  • The producer makes too many rapid requests to write to the stream. This issue is usually indicated with a "Slow down" or "Rate exceeded" error.
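
The following is a minimal sketch, assuming a boto3 producer and a placeholder stream name ("my-stream"), of how a producer can back off and resubmit only the throttled records instead of immediately retrying the whole batch:

    import time
    import boto3

    kinesis = boto3.client("kinesis")

    def put_with_backoff(records, stream_name="my-stream", max_attempts=5):
        """Send a batch, backing off and resubmitting records that were throttled."""
        for attempt in range(max_attempts):
            try:
                response = kinesis.put_records(StreamName=stream_name, Records=records)
            except kinesis.exceptions.ProvisionedThroughputExceededException:
                time.sleep(2 ** attempt)  # whole request throttled: back off, then retry
                continue
            if response["FailedRecordCount"] == 0:
                return
            # Keep only the records that failed (per-record ErrorCode) and retry them.
            records = [record for record, result in zip(records, response["Records"])
                       if "ErrorCode" in result]
            time.sleep(2 ** attempt)
        raise RuntimeError("Records were still throttled after all retries")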

To determine whether throttling occurred within a specific time period, take the value from a one-minute data point and divide it by 60. This gives you an average value per second. If that successful count doesn't exceed the quota, then add the IncomingRecords metric to the WriteProvisionedThroughputExceeded metric, and repeat the calculation. The IncomingRecords metric counts successful or accepted records. The WriteProvisionedThroughputExceeded metric counts the records that were throttled.
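
As a rough illustration of that calculation, the following sketch (the stream name is a placeholder) pulls the one-minute Sum data points for both metrics and divides the combined total by 60:

    from datetime import datetime, timedelta, timezone
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    STREAM_NAME = "my-stream"  # placeholder stream name

    def minute_sums(metric_name, start, end):
        """Return one-minute Sum data points for a stream-level Kinesis metric."""
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/Kinesis",
            MetricName=metric_name,
            Dimensions=[{"Name": "StreamName", "Value": STREAM_NAME}],
            StartTime=start,
            EndTime=end,
            Period=60,
            Statistics=["Sum"],
        )
        return {dp["Timestamp"]: dp["Sum"] for dp in response["Datapoints"]}

    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)
    incoming = minute_sums("IncomingRecords", start, end)
    throttled = minute_sums("WriteProvisionedThroughputExceeded", start, end)

    for timestamp in sorted(incoming):
        total = incoming[timestamp] + throttled.get(timestamp, 0)
        print(timestamp, "average records per second:", total / 60)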

Note: Check the sizes and number of records that are sent from the producer. If the combined total of the incoming and throttled records is greater than the stream quota, then change the size or number of records.

The PutRecord.Success and PutRecords.Success metrics also indicate failed operations. When a success metric dips, review the data producer logs to find the root cause of the failures. If throttling occurs, then establish logging on the data producer side to determine the total number and size of the submitted records. If the total number of records in the PutRecord.Success or PutRecords.Success metrics exceeds the stream quota, then your Kinesis stream throttles.
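
One way to establish that producer-side logging, sketched here with boto3 and the standard logging module (the wrapper function and stream name are illustrative, not part of any library), is to record the count and byte size of every batch before it's submitted:

    import logging
    import boto3

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("kinesis-producer")
    kinesis = boto3.client("kinesis")

    def logged_put_records(records, stream_name="my-stream"):
        """Log the number and size of records in each batch, then submit it."""
        total_bytes = sum(len(r["Data"]) + len(r["PartitionKey"]) for r in records)
        logger.info("Submitting %d records (%d bytes) to %s",
                    len(records), total_bytes, stream_name)
        response = kinesis.put_records(StreamName=stream_name, Records=records)
        logger.info("FailedRecordCount: %d", response["FailedRecordCount"])
        return response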

Enhanced Kinesis stream monitoring is turned off for your streams

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

Some shards in your Kinesis stream might receive more records than others. This uneven distribution can create hot shards and cause throttling errors. Hot shards indicate that the partition key that's used in PUT operations doesn't distribute the records equally across the shards in a Kinesis stream.

If you turn off enhanced monitoring, then hot shards can be hidden in the metrics because stream-level metrics are aggregated across all shards, and you can't examine individual shards. To examine the stream on a per-shard basis, run the enable-enhanced-monitoring AWS CLI command.
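
For reference, a minimal boto3 equivalent of that command looks roughly like the following (the stream name and metric list are placeholders):

    import boto3

    kinesis = boto3.client("kinesis")

    # Turn on shard-level (enhanced) monitoring for selected metrics.
    response = kinesis.enable_enhanced_monitoring(
        StreamName="my-stream",  # placeholder stream name
        ShardLevelMetrics=["IncomingBytes", "IncomingRecords",
                           "WriteProvisionedThroughputExceeded"],
    )
    print(response["DesiredShardLevelMetrics"])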

You can also compare the IncomingBytes average and maximum values to verify whether hot shards are in your stream. When you turn on enhanced monitoring, you can also see the shards that deviate from the average. For more information, see Strategies for resharding.
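
After shard-level metrics are available, a sketch like the following (assuming a placeholder stream name) can total each shard's IncomingBytes so that shards well above the average stand out:

    from datetime import datetime, timedelta, timezone
    import boto3

    kinesis = boto3.client("kinesis")
    cloudwatch = boto3.client("cloudwatch")
    STREAM_NAME = "my-stream"  # placeholder stream name

    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=15)

    for shard in kinesis.list_shards(StreamName=STREAM_NAME)["Shards"]:
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/Kinesis",
            MetricName="IncomingBytes",
            Dimensions=[{"Name": "StreamName", "Value": STREAM_NAME},
                        {"Name": "ShardId", "Value": shard["ShardId"]}],
            StartTime=start,
            EndTime=end,
            Period=60,
            Statistics=["Sum"],
        )
        total = sum(dp["Sum"] for dp in stats["Datapoints"])
        print(shard["ShardId"], "bytes over 15 minutes:", total)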

Use random partition keys

If your Kinesis stream has hot shards, then use a random partition key to distribute your records. If the operations already use a random partition key, then adjust the key to correct the distribution. Then, monitor metrics such as IncomingBytes and IncomingRecords for changes. If the maximum and average patterns are close, then there are no hot shards.
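
A minimal sketch of one common approach, using a random UUID as the partition key so that records spread evenly across shards (the stream name is a placeholder):

    import uuid
    import boto3

    kinesis = boto3.client("kinesis")

    def put_with_random_key(data, stream_name="my-stream"):
        """Use a random partition key so records distribute across all shards."""
        return kinesis.put_record(
            StreamName=stream_name,
            Data=data,
            PartitionKey=str(uuid.uuid4()),
        )

Note that a random key spreads the load evenly, but records that share a logical key no longer land on the same shard, so per-key ordering is lost. Use this approach only when that trade-off is acceptable.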

Hidden micro spikes are in the CloudWatch metrics

To identify micro spikes or metrics that exceed stream quotas, log the full records or use custom code to count the records and check their size. Then, evaluate the number and size of records that are sent to the Kinesis stream to identify any spikes that exceed the data quotas.

Kinesis Data Streams quotas apply per shard and per second, but CloudWatch aggregates metric values over 60 seconds. Because of this aggregation, a micro spike that exceeds the Kinesis Data Streams quota within a single second can be hidden: the overall number of records for the minute can appear low even though the count during one second of that minute was much higher. The traffic might appear to be below the stream quotas, but the shard that throttled within that second still registers as throttling on the stream.
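
One way to surface those per-second spikes on the producer side, sketched here with an illustrative counter and a placeholder stream name, is to bucket submitted record counts by second before CloudWatch aggregates them into one-minute data points:

    import time
    from collections import Counter
    import boto3

    kinesis = boto3.client("kinesis")
    per_second_counts = Counter()

    def tracked_put_record(data, partition_key, stream_name="my-stream"):
        """Submit a record and count it against the current second."""
        per_second_counts[int(time.time())] += 1
        return kinesis.put_record(StreamName=stream_name, Data=data,
                                  PartitionKey=partition_key)

    # After a test run, any second whose count exceeds the per-shard quota
    # (1,000 records per second per shard) is a candidate micro spike.
    for second, count in sorted(per_second_counts.items()):
        if count > 1000:
            print(second, count, "records -- possible micro spike")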

If the CloudWatch metrics don't indicate a quota exception or micro spikes in the data, then check whether failed operations are missing from your metrics, as described in the following section.

Your CloudWatch metrics report only successful operations and exclude failed operations

Kinesis metrics record only successful operations on the stream and might not include throttled operations. As a result, the stream quota can be exceeded even though the metrics don't indicate the exception.

When failed records can't enter the Kinesis stream, the stream throttles. If the producer has retry behavior, then the failed records are sent again, and processing can be delayed.

To check whether too many records are sent to the Kinesis stream, add the total number of incoming records to the number of throttled records.

Related information

Developing custom consumers with dedicated throughput (enhanced fan-out)

Monitoring the Amazon Kinesis Data Streams Service with Amazon CloudWatch

disable-enhanced-monitoring
