How do I change the number of open shards in Kinesis Data Streams?

Last updated: 2020-06-05

I'm trying to change the number of open shards in Amazon Kinesis Data Streams. How can I do that and what do I do after resharding?

Short Description

A shard can be in an OPEN, CLOSED, or EXPIRED state. When a shard is in the OPEN state, you can add and retrieve data records from the shard.

To change the number of open shards in Kinesis Data Streams, do one of the following:

  1. Update the number of total shards. This changes the number of shards in the stream.
  2. Split a single shard.
  3. Merge two shards into one shard.

Note: If you change the number of open shards in your Amazon Kinesis data stream, consider its impact on parent shards and hash key range values.

Resolution

Update the number of total shards

You can update the number of open shards using the Kinesis console or API operations. When you use the AWS Management Console, Data Streams reshards your stream by using the UpdateShardCount API.

  • This API operation updates the shard count of the specified stream to the specified number of shards (by splitting or merging the individual shards in the background). It is a stream-level API call.
  • It only supports uniform scaling, which creates shards of equal percentage for the hash key range.
  • The UpdateShardCount operation readjusts the number of shards to a specific target value that is of equal sizes.

Split a single shard

Consider splitting a shard into two shards if you encounter a "hot" or "cold" shard.

Note: You can only split or merge shards through API operations.

Shards that receive more data than expected are known as “hot” shards. You can then selectively split the hot shards by using the SplitShard API to increase capacity for the hash keys that target those shards. For more information about managing hot shards, see Strategies for resharding.

You can also monitor the Amazon Kinesis Data Streams Service with Amazon CloudWatch. Use CloudWatch metrics to determine "hot" or "cold" shards by enabling shard-level metrics, such as IncomingRecords and IncomingBytes.

Merge two shards into one shard

Consider merging two shards into one if you encounter a "hot" or "cold" shard.

Note: You can only split or merge shards through API operations.

Shards that receive much less data than expected are known as “cold” shards. You can then merge cold shards by using MergeShards API to use their full capacity. This is a shard-level API call. Note that you can only merge two adjacent shards, where the union of their hash key ranges forms a contiguous set without any gaps.

You can also monitor the Amazon Kinesis Data Streams Service with Amazon CloudWatch. Use CloudWatch metrics to determine "hot" or "cold" shards by enabling shard-level metrics, such as IncomingRecords and IncomingBytes.

Additional considerations

The shard or pair of shards that the resharding operation acts on are known as parent shards. The shards that are created after the resharding operation are known as child shards. A parent shard also transitions from an OPEN state to a CLOSED state (and eventually, to an EXPIRED state, after the stream’s retention period). This can result in child shards being assigned an OPEN state. For more information about the parent shard transitions, see Data routing, data persistence, and shard state after a reshard.

After resharding, you should also continue to read data from the CLOSED shards until they are exhausted. This helps to preserve the order of the data read by the consumer applications. After you have exhausted all of the CLOSED shards, you can begin to read data from open child shards. The Amazon Kinesis client library (KCL) is designed to adapt to resharding operations. Any data that existed in the shards before resharding is processed first. For more information about resharding operations, see Resharding, scaling, and parallel processing.

If you change the number of open shards, it can also result in changes to the hash key ranges for some (or all) shards. A hash key range is the range of possible hash key values for a shard. It is a set of ordered contiguous positive integers. The range consists of starting and ending key values. For example, if you create a Kinesis data stream with five open shards, then the hash key range is divided equally by 20%. Therefore, all the shards have 20% as the hash keyspace value.

If you perform resharding operations on the Kinesis data stream, the hash key range is divided among the child shards like this:

  • Using the console or UpdateShardCount API: If you change the number of open shards from 5 to 10, then the resulting child shards now have a hash keyspace of 10%. The hash key range is divided equally into all of the open shards with a value of 10%.
  • Using SplitShard API: Split the last shard (Shard-5) into two shards (Shard-6 and Shard-7). Before splitting a shard, the parent shard (Shard-5) had a hash keyspace of 20%. After splitting, the resultant child shards (Shard-6 and Shard-7) now have a hash keyspace of 10%. The hash key range for the parent shards is divided equally into both of the child shards with a value of 10%. Therefore, the hash keyspace is split 20%-20%-20%-20%-10%-10%. 
  • Using MergeShards API: Merge the last two shards (Shard-4 and Shard-5) into one shard (Shard-6). The parent shards (Shard-4 and Shard-5) have a hash keyspace of 20%. After merging, the resulting child shard (Shard-6) now has a hash keyspace of 40%. The hash key range of the parent shards is added to both of the child shards to equal 40%. Therefore, the hash keyspace is split 20%-20%-20%-40%.

Did this article help you?

Anything we could improve?


Need more help?