Scale Amazon Kinesis Data Streams with AWS Application Auto Scaling
Recently, AWS launched a new feature of AWS Application Auto Scaling that let you define scaling policies that automatically add and remove shards to an Amazon Kinesis Data Stream. For more detailed information about this feature, see the Application Auto Scaling GitHub repository.
As your streaming information increases, you require a scaling solution to accommodate all requests. If you have a decrease in streaming information, you might use scaling to reduce costs. Currently, you scale an Amazon Kinesis Data Stream shard programmatically. Alternatively, you can use the Amazon Kinesis Scaling Utilities. To do so, you can use each utility manually, or automated with an AWS Elastic Beanstalk environment.
With the new feature of Application Auto Scaling, you can use AWS services to create a scaling solution without manual intervention or complex solutions.
Auto scaling solution overview
This blog post shows you how to deploy an auto scaling solution for your Amazon Kinesis Data Streams based on the default Amazon CloudWatch metrics. It also provides an AWS CloudFormation template to set up the environment automatically and the code related to the lambda function.
How the auto scaling solution works
Begin with a CloudWatch alarm that monitors Kinesis Data Stream shard metrics. When a custom threshold of the alarm is reached, for example because the number of requests has grown, the alarm is fired. This firing sends a notification to an Application Auto Scaling policy that responds based on the stated preference, scale up or down.
When the scaling policy is triggered, Application Auto Scaling calls an API operation. The call passes the new number of Kinesis Data Stream shards for the desired capacity (for more information, see here). The call also passes the name of the resource to scale, provided by Amazon API Gateway. Amazon API Gateway invokes an AWS Lambda function. Based on the information sent by Application Auto Scaling, the Lambda function increases or decreases the number of shards in the Kinesis Data Stream. It does so by using Kinesis Data Stream’s UpdateShardCount API operation. The following diagram illustrates the scenario.
As you can see from the diagram, AWS System Manager Parameter Store is also involved. We use Parameter Store to store the desired capacity value that Application Auto Scaling sends to API Gateway to increase or decrease the capacity. (In this scenario, the capacity is the number of shards.) In fact, Application Auto Scaling often invokes API Gateway to get the status of the custom resource, in this case the Kinesis Data Stream. It does so to see if there are actions to be taken and if previous actions were successful. Because Lambda is stateless, we need somewhere to save the desired capacity value communicated by Application Auto Scaling at any point.
This solution uses the following components:
Application Auto Scaling scalable target – A scalable target is a resource registered with the Application Auto Scaling service. The service can scale any defined and registered resources. A scalable target handles the minimum and maximum value for the scalable dimension. It requires the following parameters:
- ResourceId: The resource that is the scalable target. For custom resources, such as in the following example, specify the OutputValue returned from the AWS CloudFormation template.
- RoleARN: The service-linked role used to grant permission to modify scalable target resources.
- ScalableDimension: The dimension of the scalable target. For custom resources, the value must be custom-resource:ResourceType:Property.
- ServiceNamespace: The namespace of the AWS service. In this case, this value is the custom resource.
Scaling policy – After you register a scalable target, you can apply a scaling policy that describes how the service should scale.
The following policy types are supported:
- TargetTrackingScaling — Only for Amazon DynamoDB
- StepScaling — Supported by Amazon ECS, Amazon EC2 Spot Fleets, and Amazon RDS
- TargetTrackingScaling — Supported by Amazon ECS, EC2 Spot Fleets, and Amazon RDS
- StepScaling — Supported by other services
In our scenario, we use a StepScaling policy, because we are using a custom resource type, as discussed later in Scaling policy and scheduled actions section. However, custom resource type can also support scheduled actions.
API Gateway – In our solution, we use Amazon API Gateway to expose a secure REST endpoint. Application Auto Scaling uses this endpoint to send authenticated calls, using IAM, to get the current capacity of the custom service to scale with HTTP GET. Application Auto Scaling also uses this endpoint to adjust the relative capacity of the custom service (with HTTP PATCH).
CloudWatch metrics and alarms – KPI to monitor and trigger an alarm directed to the Application Auto Scaling endpoint.
Lambda function – In our scenario, the AWS Lambda function mainly does two tasks:
- If the API request is GET, the Lambda function returns JSON that includes the information of the status of the custom resource that Application Auto Scaling controls. In this case, this custom resource is the Kinesis Data Stream.
- If the API request is PATCH, the Lambda function stores the new desired capacity in a DynamoDB table. The Lambda function then calls the UpdateShardCount API operation for the Kinesis Data Stream.
AWS System Manager Parameter Store – KPI to monitor and trigger an alarm directed to the Application Auto Scaling endpoint.
Prerequisites for this solution include the following:
- User credentials with permissions that allow you to configure automatic scaling and create the required service-linked role. For more information, see the Application Auto Scaling User Guide.
- Permissions to create a stack using an AWS CloudFormation template, plus full access permissions to resources within the stack. For more information, see the AWS CloudFormation User Guide.
Scaling policy and scheduled actionsseconds
You can use the same architecture to work in two different situations for your Amazon Kinesis Data Stream:
- The first is predictable traffic, which means the scheduled actions. An example of predictable traffic is when your Kinesis Data Stream endpoint sees growing traffic in specific time window. In this case, you can make sure that an Application Auto Scaling scheduled action increases the number of Kinesis Data Stream shards to meet the demand. For instance, you might increase the number of shards at 12:00 p.m. and decrease them at 8:00 p.m.
- The second is the classic on-demand scenario, which specifies the scaling policy. In this case, you create an Application Auto Scaling scaling policy that increases or decreases the number of Kinesis Data Stream shards to meet the client demand.
In this blog post we are going to focus on the seconds scenario with the scaling policy, as we believe it is more challenging to implement.
Application Auto Scaling can scale up and down continuously to make sure that you can meet your demand. However, Kinesis Data Streams have some limitations to consider when configuring Application Auto Scaling. With Kinesis Data Streams, you can’t do the following:
- Scale more than ten times per rolling 24-hour period per stream
- Scale up to more than double your current shard count for a stream
- Scale down below half your current shard count for a stream
- Scale up to more than 10000 shards in a stream
- Scale a stream with more than 10000 shards down unless the result is less than 10000 shards
- Scale up to more than the shard limit for your account
If you need to scale more than once a day, you can use this AWS Support form to request an increase to this limit.
Choosing the metric
When choosing the metrics to monitor to scale up and down, we can use the stream-level metrics IncomingBytes and IncomingRecords, as described in the Kinesis Data Streams documentation. Kinesis supports streaming 1 MiB of data per second or 1000 records per second. We can use IncomingBytes and IncomingRecords to set an alarm based on a threshold, let’s say 80 percent. We do this to call the Application Auto Scaling service before Amazon Kinesis start throttling our requests. This is the most effective method to proactively scale our resource. However, we need to set up the right cooldown period in Application Auto Scaling to avoid multiple scaling actions triggered by both metrics at the same time.
Alternatively, we can use the WriteProvisionedThroughputExceeded metric to scale when we reach the Amazon Kinesis shard limit, as described in the CloudWatch documentation.
In this example, we use the first approach, using IncomingRecords.
Deploying and testing the solution
To test the solution, we can use the AWS CloudFormation template found here. The AWS CloudFormation template automatically creates for you: the API Gateway, the Lambda function, the Kinesis Data Stream, the DynamoDB table, and the Application Auto Scaling group, and its scaling policy.
Deploying the solution
To let AWS CloudFormation create these resources on your behalf:
- Open the AWS Management Console in the AWS Region you want to deploy the solution to, and on the Services menu, choose CloudFormation.
- Choose Create Stack, choose Upload a template to Amazon S3, and then choose the file custom-application-autoscaling-kinesis.yaml included in the solution.
- Give a friendly name to the stack. Specify the Amazon S3 bucket that contains the compressed version of AWS Lambda function (index.py) included in the solution.
- For Options, you can specify tags for your stack and an optional IAM role to be used by AWS CloudFormation to create resources. If the role isn’t specified, a new role is created. You can also perform additional configuration for rollback settings and notification options.
- The review section shows a recap of the information. Be sure to select the two AWS CloudFormation acknowledgements to allow AWS CloudFormation to create resources with custom names on your behalf. Also, create a change set, because the AWS CloudFormation template includes the AWS::Serverless-2016-10-31
- Choose stream level metrics to create the resources present in the stack.
Testing the solution
Now that the environment is created, test it. To manually fire the Amazon CloudWatch alarm, we must generate traffic to the stream. By taking advantage of the Amazon Kinesis Data Generator, this is an efficient way to do it.
- First, it is necessary to follow this guide to set up your Amazon Kinesis Data Generator https://awslabs.github.io/amazon-kinesis-data-generator/web/help.html
- After the generator is created, it is necessary to select the Region and the newly created Kinesis Data Stream, in our case Kinesis-MyKinesisStream-1MUOGAD9OBCJH
- In Records per second insert a value greater than 1000 if you have one shard. Otherwise, multiply this number time the number of shards (for instance, if you have two shards, 1500 * 2 = 3000).
- In the form, enter test, and then choose Send data.
- Now that the traffic is being generated, open the Amazon CloudWatch console, and in Alarms, choose Alarms.
- In the ALARM list, select IncomingRecords-alarm-outOpen the History tab on the bottom of the page to see that the alarm triggered the Application Auto Scaling.
To verify that the number of open shards has been updated:
- Open the Amazon Kinesis console and select Data Streams, then select your Data Stream, in our case Kinesis-MyKinesisStream-1MUOGAD9OBCJH.
- In Details, it is possible to see that the number of shards increased to three, as shown in the following example:
Cleaning up the environment after testing
To clean up the environment after the testing, the procedure is straight-forward. By removing the AWS CloudFormation stack, everything is removed, as follows:
- Open the AWS Management Console in the AWS Region that you want to deploy the solution to, and select the CloudFormation stack from the list.
- Click on Actions and Delete Stack.
- OPTIONALLY: you can delete the S3 bucket and the Lambda function that you created.
This post described how you use Application Auto Scaling service to automatically scale Amazon Kinesis Data Stream. With the help of Amazon API Gateway, you can allow Application Auto Scaling to securely invoke the AWS Lambda function that interacts with the desired stream.
About the Authors
Giorgio Nobile works as Solutions Architect for Amazon Web Services in Italy. He works with enterprise customers and helps them to embrace the digital transformation. Giorgio’s field of expertise covers Big Data. In his free time, Giorgio loves playing with his two children and is addicted to DIY and snowboarding.
Diego Natali works as Solutions Architect for Amazon Web Services in Italy. With several years engineering background, he helps ISV and Start up customers designing flexible and resilient architectures using AWS services. In his spare time he enjoys watching movies and riding his dirt bike.