Why is my Kinesis data stream throttling?

Last updated: 2020-03-27

Short Description

Even though your metrics appear to be within Kinesis data stream quotas, your stream can still throttle. Common causes and symptoms include:

  • WriteThroughputExceeded and Rate Exceeded errors.
  • Enhanced Kinesis stream monitoring is disabled.
  • Hidden micro spikes in the CloudWatch metrics.
  • CloudWatch metrics report only on successful operations and exclude any failed operations.

Resolution

Your Amazon Kinesis data stream can throttle for the following reasons:

WriteThroughputExceeded and Rate Exceeded errors

Your stream can produce WriteThroughputExceeded and Rate Exceeded throttling errors, which are caused by the following:

  • The number of records written to the Kinesis data stream exceeds the stream quotas.
  • The size of the records (including the partition keys) exceeds 1 MB.
  • The total throughput in bytes exceeds the Kinesis stream limits.
  • The producer is making too many rapid requests to write to the stream, usually indicated by a "Slow down" or "Rate exceeded" error message.
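One way to avoid the first three causes is to validate records against the published quotas before calling PutRecords. The following is a minimal sketch; the function name `validate_batch` and the tuple-based record shape are illustrative, but the limits match the documented Kinesis quotas (1 MB per record including the partition key, 500 records and 5 MB per PutRecords call):

```python
MAX_RECORD_BYTES = 1024 * 1024     # 1 MB per record (data blob + partition key)
MAX_BATCH_RECORDS = 500            # PutRecords accepts at most 500 records per call
MAX_BATCH_BYTES = 5 * 1024 * 1024  # ...and at most 5 MB of data per call

def validate_batch(records):
    """records: list of (data_bytes, partition_key) tuples.

    Raises ValueError if any record or the batch as a whole breaches a
    quota; otherwise returns the total batch size in bytes.
    """
    if len(records) > MAX_BATCH_RECORDS:
        raise ValueError(f"batch has {len(records)} records; the limit is {MAX_BATCH_RECORDS}")
    total = 0
    for data, key in records:
        size = len(data) + len(key.encode("utf-8"))
        if size > MAX_RECORD_BYTES:
            raise ValueError(f"record of {size} bytes exceeds the 1 MB per-record quota")
        total += size
    if total > MAX_BATCH_BYTES:
        raise ValueError(f"batch totals {total} bytes; the per-call limit is {MAX_BATCH_BYTES}")
    return total
```

Running a check like this in the producer turns a throttling error at the service into an early, actionable error in your own code.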

Enhanced Kinesis stream monitoring is disabled

Some shards in your Kinesis data stream might receive more records than others. This can lead to throttling errors in the stream, resulting in overworked shards, also known as hot shards. Hot shards indicate that the partition key being used in PUT operations isn't equally distributing the records across the shards in a Kinesis stream.

Hot shards can be hidden in the metrics when enhanced monitoring is disabled, because stream-level metrics are values aggregated across all shards in the stream. With enhanced monitoring turned off, you can't examine the shards of a stream on an individual basis. To examine a stream on a per-shard basis, use the enable-enhanced-monitoring command.
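Once shard-level metrics such as IncomingRecords are available, spotting a hot shard is a matter of comparing each shard's traffic to the per-shard average. This is a sketch, not an official tool; the function name and the 2x threshold are illustrative choices:

```python
def find_hot_shards(records_per_shard, ratio=2.0):
    """records_per_shard: dict of shard_id -> IncomingRecords sum over
    some window. Flags shards that received more than `ratio` times the
    per-shard average, a simple signal of an uneven partition key.
    """
    if not records_per_shard:
        return []
    average = sum(records_per_shard.values()) / len(records_per_shard)
    return [shard for shard, count in records_per_shard.items()
            if count > ratio * average]
```

If this flags the same shard repeatedly, the fix is usually a more evenly distributed partition key rather than more shards.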

Hidden micro spikes in the CloudWatch metrics

Kinesis stream quotas are enforced per shard, per second, but CloudWatch aggregates metric values over 60-second periods. A micro spike that breaches a per-second quota can therefore be hidden in the aggregated data: the overall record count for the minute can look low, even though the traffic in one particular second was much higher. As a result, traffic that appears to be below the stream quotas can still throttle, because a shard breached its quota within that one second.
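The paragraph above can be made concrete with arithmetic. In the sketch below, a shard receives only 2,400 records in a minute, far below 60 seconds x 1,000 records/sec, yet a single second still breaches the published 1,000 records/sec/shard write quota (the per-second trace here is made-up example data):

```python
PER_SECOND_RECORD_QUOTA = 1000  # published Kinesis write quota per shard per second

def seconds_over_quota(per_second_counts):
    """per_second_counts: list of record counts, one entry per second.
    Returns the indices of the seconds that breached the quota."""
    return [i for i, count in enumerate(per_second_counts)
            if count > PER_SECOND_RECORD_QUOTA]

# A burst of 1,500 records in second 0, then light traffic: the minute's
# total (2,400) looks safe, but second 0 alone would throttle the shard.
minute = [1500] + [0] * 29 + [30] * 30
```

This is why per-minute CloudWatch sums can show a stream comfortably under its quotas while the stream still reports throttling.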

CloudWatch metrics only report on successful operations and exclude failed operations

Kinesis metrics record only successful operations on the stream. Therefore, operations that are throttled might not be ingested into the Kinesis data stream. This can result in a breach of stream limits and throttling without any metric indications.

When failed records aren't able to enter the Kinesis data stream, the stream throttles. If the producer has a retry mechanism, failed records are retried, which can also delay processing.
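A typical retry mechanism uses exponential backoff with jitter so that retried records don't immediately re-trigger throttling. The sketch below is illustrative: `send` stands in for a PutRecord(s) call, and RuntimeError stands in for a throttling error from the service:

```python
import random
import time

def put_with_backoff(send, record, max_attempts=5, base_delay=0.1):
    """Retry `send(record)` with exponential backoff plus jitter.

    `send` is any callable that raises on throttling (here modeled as
    RuntimeError); real producer code would catch the service's
    throughput-exceeded exception instead.
    """
    for attempt in range(max_attempts):
        try:
            return send(record)
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Sleep 0..(base_delay * 2^attempt) seconds before retrying.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

Backoff trades latency for success: retried records eventually get through, but as the text notes, processing of those records is delayed.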

To check whether too many records are being sent to the Kinesis data stream, calculate the total number of records sent by adding the number of successful incoming records to the number of throttled records. This gives you an overview of the true number of inbound records during throttling events.
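As a sketch, this calculation is a single addition over sums of the CloudWatch metrics (IncomingRecords for successes, WriteProvisionedThroughputExceeded for throttled records); the function name is illustrative and the inputs in the test are made-up window sums:

```python
def total_attempted_records(incoming_records, throttled_records):
    """Estimate true inbound load on the stream.

    incoming_records:  sum of the IncomingRecords metric over a window
                       (successful writes only).
    throttled_records: sum of the WriteProvisionedThroughputExceeded
                       metric over the same window (rejected writes).
    """
    return incoming_records + throttled_records
```

If the combined figure exceeds your stream's capacity while IncomingRecords alone does not, the metrics' success-only reporting is masking the overload.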
