AWS Big Data Blog
Reduce your compute costs for stream processing applications with Kinesis Client Library 3.0
Amazon Kinesis Data Streams is a serverless data streaming service that makes it straightforward to capture and store streaming data at any scale. Kinesis Data Streams not only offers the flexibility to use many out-of-the-box integrations to process the data published to the streams, but also provides the capability to build custom stream processing applications that can be deployed on your compute fleet.
When building custom stream processing applications, developers typically face challenges with managing distributed computing at scale that is required to process high throughput data in real time. This is where Kinesis Client Library (KCL) comes in. Thousands of AWS customers use KCL to operate custom stream processing applications with Kinesis Data Streams without worrying about the complexities of distributed systems. KCL uses Kinesis Data Streams APIs to read data from the streams and handles the heavy lifting of balancing stream processing across multiple workers, managing failovers, and checkpointing processed records. By abstracting away these concerns, KCL allows developers to focus on what matters most—implementing their core business logic for processing streaming data.
As applications process more and more data over time, customers are looking to reduce the compute costs for their stream processing applications. We are excited to launch Kinesis Client Library 3.0, which enables you to reduce your stream processing cost by up to 33% compared to previous KCL versions. KCL 3.0 achieves this with a new load balancing algorithm that continuously monitors the resource utilization of workers and redistributes the load evenly to all workers. This allows you to process the same data with fewer compute resources.
In this post, we discuss load balancing challenges in stream processing using a sample workload, demonstrating how uneven load distribution across workers increases processing costs. We then show how KCL 3.0 addresses this challenge to reduce compute costs, and walk you through how to effortlessly upgrade from KCL 2.x to 3.0. We also cover other benefits that KCL 3.0 provides, including its move to the AWS SDK for Java 2.x and the removal of the dependency on the AWS SDK for Java 1.x. Lastly, we provide a key checklist as you prepare to upgrade your stream processing application to use KCL 3.0.
Load balancing challenges with operating custom stream processing applications
Customers processing real-time data streams typically use multiple compute hosts such as Amazon Elastic Compute Cloud (Amazon EC2) to handle the high throughput in parallel. In many cases, data streams contain records that must be processed by the same worker. For example, a trucking company might use multiple EC2 instances, each running one worker, to process streaming data with real-time location coordinates published from thousands of vehicles. To accurately keep track of routes of vehicles, each truck’s location needs to be processed by the same worker. For such applications, customers specify the vehicle ID as a partition key for every record published to the data stream. Kinesis Data Streams writes data records belonging to the same partition key to a single shard (the base throughput unit of Kinesis Data Streams) so that they can be processed in order.
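For instance, publishing a truck’s location with the vehicle ID as the partition key might look like the following minimal sketch using the AWS SDK for Java 2.x (the stream name, vehicle ID, and payload are hypothetical):

```java
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class LocationPublisher {
    public static void main(String[] args) {
        KinesisAsyncClient kinesis = KinesisAsyncClient.create();

        // Using the vehicle ID as the partition key routes all of this
        // truck's records to the same shard, preserving per-vehicle order.
        PutRecordRequest request = PutRecordRequest.builder()
                .streamName("vehicle-locations")   // hypothetical stream name
                .partitionKey("truck-4711")        // vehicle ID
                .data(SdkBytes.fromUtf8String("{\"lat\":47.61,\"lon\":-122.33}"))
                .build();

        kinesis.putRecord(request).join();
        kinesis.close();
    }
}
```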
However, data in the stream is often unevenly distributed across shards due to varying traffic associated with partition keys. For instance, some vehicles may send more frequent location updates when operational, whereas others send less frequent updates when idle. With previous KCL versions, each worker in the stream processing application processed an equal number of shards in parallel. As a result, workers processing data-heavy shards might reach their data processing limits, whereas those handling lighter shards remain underutilized. This workload imbalance presents a challenge for customers seeking to optimize their resource utilization and stream processing efficiency.
Let’s look at a sample workload with uneven traffic across shards in the stream to illustrate how this leads to uneven utilization of the compute fleet with KCL 2.6, and why it results in higher costs.
In the sample workload, the producer application publishes 2.5 MBps of data across four shards, but unevenly: two shards receive 1 MBps each and the other two receive 0.25 MBps each, based on the traffic pattern associated with the partition keys. In our trucking company example, you can think of it as two shards storing data from actively operating vehicles and the other two shards storing data from idle vehicles. We used three EC2 instances, each running one worker, to process this data with KCL 2.6 for this sample workload.
Initially, the load was distributed across three workers with CPU utilizations of 50%, 50%, and 25%, averaging 42% (as shown in the following figure in the 12:18–12:29 timeframe). Because the EC2 fleet was underutilized, we removed one EC2 instance (worker) from the fleet to operate with two workers for better cost-efficiency. However, after we removed the worker (red vertical dotted line in the following figure), the CPU utilization of one EC2 instance went up to almost 100%.
This occurs because KCL 2.6 and earlier versions distribute the load to make sure each worker processes the same number of shards, regardless of throughput or CPU utilization of workers. In this scenario, one worker processed two high-throughput shards, reaching 100% CPU utilization, and another worker handled two low-throughput shards, operating at only 25% CPU utilization.
Due to this CPU utilization imbalance, the worker fleet can’t be scaled down: overloading some workers would risk processing delays. Even though the entire fleet is underutilized in aggregate, the uneven distribution of the load prevents us from downsizing it, which increases the compute costs of the stream processing application.
Next, we explore how KCL 3.0 addresses these load balancing challenges.
Load balancing improvements with KCL 3.0
KCL 3.0 introduces a new load balancing algorithm that monitors CPU utilization of KCL workers and rebalances the stream processing load. When it detects a worker approaching data processing limits or high variance in CPU utilization across workers, it redistributes the load from over-utilized to underutilized workers. This balances the stream processing load across all workers. As a result, you can avoid over-provisioning of capacity due to imbalanced CPU utilization among workers and save costs by right-sizing your compute capacity.
The following figure shows the result for KCL 3.0 with the same simulation settings we had with KCL 2.6.
With three workers, KCL 3.0 initially distributed the load similarly to KCL 2.6, resulting in 42% average CPU utilization (20:35–20:55 timeframe). However, when we removed one worker (marked with the red vertical dotted line), KCL 3.0 rebalanced the load to the other two workers based on the throughput variability of the shards, rather than simply assigning each worker an equal number of shards. As a result, both workers ended up running at about 65% CPU utilization, allowing us to safely scale down the compute capacity without any performance risk.
In this scenario, we were able to reduce the compute fleet size from three workers to two, resulting in a 33% reduction in compute costs compared to KCL 2.6. Although this is a sample workload, imagine the potential savings you can achieve when streaming gigabytes of data per second with hundreds of EC2 instances processing them! You can realize the same cost savings for your KCL 3.0 applications deployed in containerized environments such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Fargate, or your own self-managed Kubernetes clusters.
Other benefits in KCL 3.0
In addition to the stream processing cost savings, KCL 3.0 offers several other benefits:
- Amazon DynamoDB read capacity unit (RCU) reduction – KCL 3.0 reduces the Amazon DynamoDB cost associated with KCL by optimizing read operations on the DynamoDB table storing metadata. KCL uses DynamoDB to store metadata such as shard-worker mapping and checkpoints.
- Graceful handoff of shards from one worker to another – KCL 3.0 minimizes reprocessing of data when a shard processed by one worker is handed over to another worker during rebalancing or deployments. It allows the current worker to finish checkpointing the records it has processed, so the worker taking over can pick up from the latest checkpoint.
- Removal of the AWS SDK for Java 1.x dependency – KCL 3.0 has completely removed the dependency on the AWS SDK for Java 1.x, aligning with the AWS recommendation to use the latest SDK versions. This change improves overall performance, security, and maintainability of KCL applications. For details regarding AWS SDK for Java 2.x benefits, refer to Use features of the AWS SDK for Java 2.x.
Migrating to KCL 3.0
You may now be wondering how to migrate to KCL 3.0 and what code changes are needed to take advantage of its benefits. If you’re currently on KCL 2.x, you don’t have to make any changes to your application code! Complete the following steps to migrate to KCL 3.0:
- Update your Maven (or build environment) dependency to KCL 3.0.
- Set the clientVersionConfig to CLIENT_VERSION_CONFIG_COMPATIBLE_WITH_2X.
- Build and deploy your code.
After all KCL workers are updated, KCL 3.0 automatically starts running the new load balancing algorithm to achieve even utilization of the workers. For detailed migration instructions, see Migrating from previous KCL versions.
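As an illustration, the configuration change can look like the following sketch, assuming the typical ConfigsBuilder-based setup from the KCL documentation (the wiring of your stream name, AWS clients, and record processor factory is unchanged and omitted here):

```java
import software.amazon.kinesis.coordinator.CoordinatorConfig;
import software.amazon.kinesis.coordinator.Scheduler;

// configsBuilder is the ConfigsBuilder instance your application already
// uses to configure its stream, application name, and AWS clients.
CoordinatorConfig coordinatorConfig = configsBuilder.coordinatorConfig()
        .clientVersionConfig(
                CoordinatorConfig.ClientVersionConfig.CLIENT_VERSION_CONFIG_COMPATIBLE_WITH_2X);

Scheduler scheduler = new Scheduler(
        configsBuilder.checkpointConfig(),
        coordinatorConfig,                      // 2.x-compatible mode during rollout
        configsBuilder.leaseManagementConfig(),
        configsBuilder.lifecycleConfig(),
        configsBuilder.metricsConfig(),
        configsBuilder.processorConfig(),
        configsBuilder.retrievalConfig());
```

Once every worker runs KCL 3.0, KCL detects this automatically and switches to the 3.0 functionality without a further deployment.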
Key checklist when you choose to use KCL 3.0
We recommend checking the following when you decide to use KCL 3.0 for your stream processing application:
- Make sure you have added the permissions required for KCL 3.0. KCL 3.0 creates and manages two new metadata tables (a worker metrics table and a coordinator state table) and a global secondary index on the lease table in DynamoDB. See IAM permissions required for KCL consumer applications for the detailed permission settings you need to add.
- The new load balancing algorithm introduced in KCL 3.0 aims to achieve even CPU utilization across workers, not an equal number of leases per worker. Setting the maxLeasesForWorker configuration too low may limit KCL’s ability to balance the workload effectively. If you use the maxLeasesForWorker configuration, consider increasing its value to allow for optimal load distribution.
- If you use automatic scaling for your KCL application, it’s important to review your scaling policy after upgrading to KCL 3.0. Specifically, if you’re using average CPU utilization as a scaling threshold, you should reassess this value. If you’ve been conservatively using a lower-than-needed threshold to make sure your stream processing application won’t have some workers running hot due to imbalanced load distribution, you might be able to adjust this now. KCL 3.0 introduces improved load balancing, which results in more evenly distributed workloads across workers. After deploying KCL 3.0, monitor your workers’ CPU utilization and see if you can raise your scaling threshold to optimize your resource usage and costs while maintaining performance. This makes sure you’re taking full advantage of KCL 3.0’s enhanced load balancing capabilities.
- To gracefully hand off leases, make sure you have implemented checkpointing logic inside the shutdownRequested() method in your RecordProcessor class. Refer to Step 4 of Migrating from KCL 2.x to KCL 3.x for details, and see the sketch after this list.
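For reference, the following is a minimal sketch of such a processor with graceful-handoff checkpointing; the class name is illustrative, and error handling is reduced to comments:

```java
import software.amazon.kinesis.exceptions.InvalidStateException;
import software.amazon.kinesis.exceptions.ShutdownException;
import software.amazon.kinesis.lifecycle.events.InitializationInput;
import software.amazon.kinesis.lifecycle.events.LeaseLostInput;
import software.amazon.kinesis.lifecycle.events.ProcessRecordsInput;
import software.amazon.kinesis.lifecycle.events.ShardEndedInput;
import software.amazon.kinesis.lifecycle.events.ShutdownRequestedInput;
import software.amazon.kinesis.processor.ShardRecordProcessor;

public class SampleRecordProcessor implements ShardRecordProcessor {

    @Override
    public void initialize(InitializationInput input) { }

    @Override
    public void processRecords(ProcessRecordsInput input) {
        // Business logic goes here; checkpoint periodically as records are processed.
    }

    @Override
    public void leaseLost(LeaseLostInput input) {
        // The lease was taken by another worker; checkpointing is no longer possible.
    }

    @Override
    public void shardEnded(ShardEndedInput input) {
        try {
            // Checkpointing at shard end is required so child shards can be processed.
            input.checkpointer().checkpoint();
        } catch (ShutdownException | InvalidStateException e) {
            // Log and handle according to your error policy.
        }
    }

    @Override
    public void shutdownRequested(ShutdownRequestedInput input) {
        try {
            // Checkpoint here so the worker taking over this lease resumes from
            // the latest position instead of reprocessing already-handled records.
            input.checkpointer().checkpoint();
        } catch (ShutdownException | InvalidStateException e) {
            // If this fails, the next worker falls back to the previous checkpoint.
        }
    }
}
```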
Conclusion
The release of KCL 3.0 introduces significant enhancements that can help optimize the cost-efficiency and performance of KCL applications. The new load balancing algorithm enables more even CPU utilization across worker instances, potentially allowing for right-sized and more cost-effective stream processing fleets. By following the key checklist, you can take full advantage of KCL 3.0’s features to build efficient, reliable, and cost-optimized stream processing applications with Kinesis Data Streams.
About the Authors
Minu Hong is a Senior Product Manager for Amazon Kinesis Data Streams at AWS. He is passionate about understanding customer challenges around streaming data and developing optimized solutions for them. Outside of work, Minu enjoys traveling, playing tennis, skiing, and cooking.
Pratik Patel is a Senior Technical Account Manager and streaming analytics specialist. He works with AWS customers and provides ongoing support and technical guidance to help plan and build solutions using best practices and proactively helps in keeping customers’ AWS environments operationally healthy.
Priyanka Chaudhary is a Senior Solutions Architect and data analytics specialist. She works with AWS customers as their trusted advisor, providing technical guidance and support in building Well-Architected, innovative industry solutions.