How do I change the number of open shards in Kinesis Data Streams?

5 minute read
0

I want to change the number of open shards in Amazon Kinesis Data Streams and know what to do after I reshard.

Short Description

A shard can be in an OPEN, CLOSED, or EXPIRED state. When a shard is in the OPEN state, you can add and retrieve data records from the shard.

To change the number of open shards in Kinesis Data Streams, complete one of the following tasks:

  • Update the number of total shards. This action changes the number of shards in the stream.
  • Split a single shard.
  • Merge two shards into one shard.

Note: If you change the number of open shards in your Kinesis data stream, then that change affects the parent shards and hash key range values.

Resolution

Update the number of total shards

Use the Kinesis console or API operations to update the number of open shards. When you use the AWS Management Console, Data Streams uses the UpdateShardCount API to reshard your streams. Review the following points about the UpdateShardCount API:

  • The API splits or merges individual shards in the background to update the shard count of the specified stream to the specified number of shards. UpdateShardCount is a stream-level API call.
  • UpdateShardCount supports only uniform scaling. Uniform scaling creates shards of equal percentage for the hash key range.
  • The UpdateShardCount operation readjusts the number of shards to a specific target value that's of equal sizes.

Split a single shard

You can split a "hot" or "cold" shard into two shards

Note: You can split or merge shards only through API operations.

Shards that receive more data than expected are called "hot" shards. Use the SplitShard API to selectively split the hot shards to increase capacity for the hash keys that target those shards. For more information about managing hot shards, see Strategies for resharding.

You can also monitor the Amazon Kinesis Data Streams Service with Amazon CloudWatch. To use CloudWatch metrics to determine "hot" or "cold" shards, turn on shard-level metrics, such as IncomingRecords and IncomingBytes.

Merge two shards into one shard

You can split a "hot" or "cold" shard into two shards.

Note: You can split or merge shards only through API operations.

Shards that receive much less data than expected are called "cold" shards. Use the MergeShards API to merge cold shards to use their full capacity. This is a shard-level API call. Note that you can merge only two adjacent shards, where the union of their hash key ranges forms a contiguous set without any gaps.

You can also monitor the Amazon Kinesis Data Streams Service with Amazon CloudWatch. Use CloudWatch metrics to determine "hot" or "cold" shards by turning on shard-level metrics, such as IncomingRecords and IncomingBytes.

Additional considerations

The shard or pair of shards that the reshard operation acts on are called parent shards. The shards that are created after the reshard operation are called child shards. A parent shard also transitions from an OPEN state to a CLOSED state and eventually, to an EXPIRED state, after the stream's retention period. This can result in child shards being assigned an OPEN state. For more information about the parent shard transitions, see Data routing, data persistence, and shard state after a reshard.

After the reshard, continue to read data from the CLOSED shards until the CLOSED shards are exhausted. This helps to preserve the order of the data read by the consumer applications. After you exhausted all the CLOSED shards, read data from open child shards. The Amazon Kinesis client library (KCL) is designed to adapt to resharding operations. Data that existed in the shards before the reshard is processed first. For more information about resharding operations, see Resharding, scaling, and parallel processing.

If you change the number of open shards, then changes can also occur to the hash key ranges for some the shards. A hash key range is the range of possible hash key values for a shard a set of ordered contiguous positive integers. The range consists of starting and ending key values. For example, if you create a Kinesis data stream with five open shards, then the stream is divided into 5 equal parts based on the hash key range. Therefore, all the shards have 20% as the hash keyspace value.

As an example, suppose you have a hash key range of X shards, from Shard-1 to Shard-X. You can modify the ranges by further splitting the shards or merging the shards:

Console or UpdateShardCount API

If you change the number of open shards from 5 to 10, then the resulting keyspace of the child shards is 10%. The hash key range is equally divided into all of the open shards with a value of 10%.

SplitShard API

Split the last shard (Shard-5) into two shards (Shard-6 and Shard-7). Before you split a shard, the keyspace of the parent shard is 20%. After you split a shard, the keyspace of the child shards (Shard-6 and Shard-7) is 10%. The hash key range for the parent shards is equally divided into both child shards with a value of 10%. Therefore, the hash keyspace is split 20%-20%-20%-20%-10%-10%.

MergeShards API

Merge the last two shards (Shard-4 and Shard-5) into one shard (Shard-6). The hash keyspace of the parent shards (Shard-4 and Shard-5) is 20%. After you merge a shard, the keyspace of the child shard (Shard-6) is 40%. The hash key range of the parent shards is added to both of the child shards to equal 40%. Therefore, the hash keyspace is split 20%-20%-20%-40%.

AWS OFFICIAL
AWS OFFICIALUpdated 5 months ago