AWS Big Data Blog
Run Kinesis Agent on Amazon ECS
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more.
Kinesis Agent is a standalone Java software application that offers a straightforward way to collect and send data to Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose. The agent continuously monitors a set of files and sends new data to the desired destination. The agent handles file rotation, checkpointing, and retry upon failures. It delivers all of your data in a reliable, timely, and simple manner. It also emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process.
This post describes the steps to send data from a containerized application to Kinesis Data Firehose using Kinesis Agent. More specifically, we show how to run Kinesis Agent as a sidecar container for an application running in Amazon Elastic Container Service (Amazon ECS). After the data is in Kinesis Data Firehose, it can be sent to any supported destination, such as Amazon Simple Storage Service (Amazon S3).
In order to present the key points required for this setup, we assume that you are familiar with Amazon ECS and working with containers. We also avoid the implementation details and packaging process of our test data generation application, referred to as the producer.
Solution overview
As depicted in the following figure, we configure a Kinesis Agent container as a sidecar that can read files created by the producer container. In this instance, the producer and Kinesis Agent containers share data via a bind mount in Amazon ECS.
Prerequisites
You should satisfy the following prerequisites for the successful completion of this task:
- Familiarity working with containers and Amazon ECS
- Docker Desktop installed
- The AWS Command Line Interface (AWS CLI) installed
- An ECS cluster
- An Amazon Elastic Container Registry (Amazon ECR) repository to store the Kinesis Agent container image
With these prerequisites in place, you can move on to the next step: packaging Kinesis Agent and your desired agent configuration as a container image on your local development machine.
Create a Kinesis Agent configuration file
We use the Kinesis Agent configuration file to configure the source and destination, among other data transfer settings. The following code uses the minimal configuration required to read the contents of files matching /var/log/producer/*.log and publish them to a Kinesis Data Firehose delivery stream called kinesis-agent-demo:
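As a minimal sketch of such a configuration, the following Python snippet writes an agent.json file containing a single flow. The filePattern and deliveryStream values come from this post; the output location (the current directory) is an assumption. If your delivery stream is not in the agent's default Region, you may also need to set the firehose.endpoint key, which is omitted here.

```python
import json
from pathlib import Path

# Minimal agent configuration: one flow that watches the producer's log files
# and forwards new records to the delivery stream named in this post.
agent_config = {
    "flows": [
        {
            "filePattern": "/var/log/producer/*.log",
            "deliveryStream": "kinesis-agent-demo",
        }
    ]
}

# Assumption: agent.json is written to the current directory, next to the
# Dockerfile that will later copy it into the image.
Path("agent.json").write_text(json.dumps(agent_config, indent=2) + "\n")
print(Path("agent.json").read_text())
```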
Create a container image for Kinesis Agent
To deploy Kinesis Agent as a sidecar in Amazon ECS, you first have to package it as a container image. The container must have Kinesis Agent, the which and find binaries, and the Kinesis Agent configuration file that you prepared earlier. Its entry point must be configured using the start-aws-kinesis-agent script. This command is installed when you run the yum install aws-kinesis-agent step. The resulting Dockerfile should look as follows:
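The following Python sketch writes one possible Dockerfile that satisfies these requirements. The amazonlinux:2 base image is an assumption (the post only states that the agent is installed with yum), so verify that the aws-kinesis-agent package is available for the base image you choose. The findutils package is assumed to provide find, and /etc/aws-kinesis/agent.json is the agent's default configuration path.

```python
from pathlib import Path

# Assumption: amazonlinux:2 as the base image. The post itself only says the
# agent is installed with `yum install aws-kinesis-agent`.
DOCKERFILE = """\
FROM amazonlinux:2

# Install Kinesis Agent together with the which and find binaries.
RUN yum install -y aws-kinesis-agent which findutils

# Copy the agent configuration prepared earlier to the default location.
COPY agent.json /etc/aws-kinesis/agent.json

# Use the start script installed by the package as the entry point.
ENTRYPOINT ["start-aws-kinesis-agent"]
"""

# Write the Dockerfile into the current directory (the build context).
Path("Dockerfile").write_text(DOCKERFILE)
print(DOCKERFILE)
```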
Run the docker build command to build this container:
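A build invocation might look like the one assembled below. The image name kinesis-agent:latest is hypothetical, and the build context is assumed to be the current directory containing the Dockerfile and agent.json. The snippet only prints the command; uncomment the last line to run it on a machine with Docker installed.

```python
import shlex
import subprocess

# Hypothetical local image name; any name and tag will do.
IMAGE = "kinesis-agent:latest"

# Build from the current directory, assumed to hold Dockerfile and agent.json.
build_cmd = ["docker", "build", "-t", IMAGE, "."]
print(shlex.join(build_cmd))

# Uncomment to execute the build (requires Docker):
# subprocess.run(build_cmd, check=True)
```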
After the image is built, it should be pushed to a container registry like Amazon ECR so that you can reference it in the next section.
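The push to Amazon ECR typically involves three commands: authenticate Docker against the registry, tag the local image with the repository URI, and push it. The sketch below assembles those commands as strings for you to review and run. The account ID, Region, repository name, and local image name are all placeholders.

```python
import shlex


def ecr_push_commands(account_id, region, repository,
                      tag="latest", local_image="kinesis-agent:latest"):
    """Return the shell commands that log in to ECR, tag, and push an image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    remote_image = f"{registry}/{repository}:{tag}"
    # Pipe a short-lived ECR password into docker login.
    login = (
        f"aws ecr get-login-password --region {shlex.quote(region)} | "
        f"docker login --username AWS --password-stdin {shlex.quote(registry)}"
    )
    return [
        login,
        shlex.join(["docker", "tag", local_image, remote_image]),
        shlex.join(["docker", "push", remote_image]),
    ]


# Placeholder account ID, Region, and repository name.
for command in ecr_push_commands("111122223333", "us-east-1", "kinesis-agent"):
    print(command)
```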
Create an ECS task definition with Kinesis Agent and the application container
Now that you have Kinesis Agent packaged as a container image, you can use it in your ECS task definitions to run it as a sidecar. To do that, you create an ECS task definition with your application container (called producer) and the Kinesis Agent container. All containers in a task definition are scheduled on the same container host and therefore can share resources such as bind mounts.
In the following sample container definition, we use a bind mount called logs_dir to share a directory between the producer container and the kinesis-agent container.
You can use the following template as a starting point, but be sure to change taskRoleArn and executionRoleArn to valid IAM roles in your AWS account. In this instance, the IAM role used for taskRoleArn must have write permissions to the Kinesis Data Firehose delivery stream that you specified earlier in the agent.json file. Additionally, make sure that the ECR image paths and awslogs-region match your AWS account and Region.
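One way to sketch such a template is to build it in Python and write it to task-definition.json. The container names, the logs_dir volume, and the /var/log/producer path follow this post. Everything else is an assumption to adapt: the account ID, Region, role names, image paths, log group (assumed to exist already), the Fargate launch type, and the CPU and memory sizes.

```python
import json
from pathlib import Path

# Placeholders (assumptions): replace with values from your own account.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
REGISTRY = f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com"


def container(name, image):
    """Container definition that mounts the shared logs_dir volume."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        # Both containers see the same directory through the bind mount.
        "mountPoints": [
            {"sourceVolume": "logs_dir", "containerPath": "/var/log/producer"}
        ],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/kinesis-agent-demo",  # assumed to exist
                "awslogs-region": REGION,
                "awslogs-stream-prefix": name,
            },
        },
    }


task_definition = {
    "family": "kinesis-agent-demo",
    # Hypothetical role names; the task role needs write access to Firehose.
    "taskRoleArn": f"arn:aws:iam::{ACCOUNT_ID}:role/kinesis-agent-demo-task-role",
    "executionRoleArn": f"arn:aws:iam::{ACCOUNT_ID}:role/ecsTaskExecutionRole",
    # Assumption: Fargate launch type with small task-level sizing.
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "256",
    "memory": "512",
    # A volume with only a name acts as a bind mount shared within the task.
    "volumes": [{"name": "logs_dir"}],
    "containerDefinitions": [
        container("producer", f"{REGISTRY}/producer:latest"),
        container("kinesis-agent", f"{REGISTRY}/kinesis-agent:latest"),
    ],
}

Path("task-definition.json").write_text(json.dumps(task_definition, indent=2) + "\n")
```

Because both container definitions reference the same sourceVolume, files the producer writes under /var/log/producer are visible to the agent at the path its filePattern expects.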
Register the task definition with the following command:
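Assuming the task definition has been saved as task-definition.json in the current directory (a hypothetical file name), the registration command can be assembled as follows. The snippet prints the command rather than running it.

```python
import shlex

# Assumption: the task definition JSON lives in ./task-definition.json.
register_cmd = [
    "aws", "ecs", "register-task-definition",
    "--cli-input-json", "file://task-definition.json",
]
print(shlex.join(register_cmd))
```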
Run a new ECS task
Finally, you can run a new ECS task from the task definition you just created by using the aws ecs run-task command. After the task starts, you should see two containers running under that task on the Amazon ECS console.
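A run-task invocation could be put together as in the sketch below. The cluster name and task definition family are placeholders. Depending on your setup, you may also need the --launch-type and --network-configuration options (for example, for tasks that use the awsvpc network mode); they are left out here.

```python
import shlex


def run_task_command(cluster, task_definition, count=1):
    """Build the aws ecs run-task command for a cluster and task definition."""
    return [
        "aws", "ecs", "run-task",
        "--cluster", cluster,
        "--task-definition", task_definition,
        "--count", str(count),
    ]


# Placeholder cluster name and task definition family.
print(shlex.join(run_task_command("my-cluster", "kinesis-agent-demo")))
```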
Conclusion
This post showed how straightforward it is to run Kinesis Agent in a containerized environment. Although we used Amazon ECS as our container orchestration service in this post, you can use a Kinesis Agent container in other environments such as Amazon Elastic Kubernetes Service (Amazon EKS).
To learn more about using Kinesis Agent, refer to Writing to Amazon Kinesis Data Streams Using Kinesis Agent. For more information about Amazon ECS, refer to the Amazon ECS Developer Guide.
About the Author
Buddhike de Silva is a Senior Specialist Solutions Architect at Amazon Web Services. Buddhike helps customers run large scale streaming analytics workloads on AWS and make the best out of their cloud journey.