How can I put data records into a Kinesis data stream using the KPL?
Last updated: 2020-10-16
I want to write and put data records into a Kinesis data stream using the Amazon Kinesis Producer Library (KPL). How can I do this?
To put a record into a Kinesis data stream using the KPL, you must meet the following requirements:
- You have a running Amazon Elastic Compute Cloud (Amazon EC2) Linux instance.
- An AWS Identity and Access Management (IAM) role is attached to your instance.
- The AmazonKinesisFullAccess managed policy is attached to the instance's IAM role.
To put records into a Kinesis data stream using the KPL:
1. Connect to your Amazon EC2 Linux instance.
2. Install the latest version of the OpenJDK 8 developer package:
sudo yum install java-1.8.0-openjdk-devel
3. Confirm that Java is installed:
java -version
The output looks similar to the following. At this point, the default java command might still report an older version, such as 1.7:
java version "1.7.0_181"
OpenJDK Runtime Environment (amzn-220.127.116.11.80.amzn1-x86_64 u181-b00)
OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
4. Run the following commands to set Java 1.8 as the default java and javac providers. When prompted, enter the number of the Java 1.8 entry:
sudo /usr/sbin/alternatives --config java
sudo /usr/sbin/alternatives --config javac
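To double-check which Java version the defaults now point to, you could compile and run a small program like the one below. This is a minimal sketch; it assumes Java 8 is the minimum the KPL sample needs, which is implied by the steps above installing OpenJDK 8 and making it the default. The class name JavaVersionCheck is hypothetical.

```java
// Minimal sketch: report the Java version that the default `java` launcher runs
// and fail if it is older than 8 (assumed minimum for building the KPL sample).
public class JavaVersionCheck {

    // Parses java.specification.version: "1.7" -> 7, "1.8" -> 8, "11" -> 11.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]); // legacy "1.x" scheme, used up to Java 8
        }
        return Integer.parseInt(parts[0]);     // Java 9+ scheme
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Running Java " + spec + " (major version " + major + ")");
        if (major < 8) {
            System.err.println("Default java is older than 1.8; rerun alternatives --config java.");
            System.exit(1);
        }
    }
}
```

Save it as JavaVersionCheck.java, then run javac JavaVersionCheck.java followed by java JavaVersionCheck. If javac and java still point to different versions, the run can fail with UnsupportedClassVersionError, which is itself a sign that the alternatives step needs another look.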
5. Add a repository with an Apache Maven package:
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
6. Set the version number for the Maven packages:
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
7. Use yum to install Maven:
sudo yum install -y apache-maven
8. Confirm that Maven is installed properly:
mvn -version
The output looks similar to the following:
Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
Maven home: /usr/share/apache-maven
Java version: 1.7.0_181, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.7.0-openjdk-18.104.22.168.x86_64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.14.33-51.37.amzn1.x86_64", arch: "amd64", family: "unix"
9. Install git, and then download the KPL from Amazon Web Services - Labs:
sudo yum install git
git clone https://github.com/awslabs/amazon-kinesis-producer
10. Change to the amazon-kinesis-producer/java/amazon-kinesis-producer-sample/ directory, and then list its files:
cd amazon-kinesis-producer/java/amazon-kinesis-producer-sample/
ls
The output looks similar to the following:
default_config.properties pom.xml README.md src target
11. Run a command similar to the following to create a Kinesis data stream:
aws kinesis create-stream --stream-name kinesis-kpl-demo --shard-count 2
For more information about the number of shards needed, see Resharding, scaling, and parallel processing.
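As a rough starting point for choosing --shard-count, you can divide your expected peak write rate by the per-shard write limits. The sketch below assumes limits of 1 MiB per second and 1,000 records per second per shard; the class name ShardEstimate and the example workload are hypothetical. Because the KPL aggregates user records before sending them, counting raw user records here gives a conservative (higher) estimate.

```java
// Back-of-the-envelope shard estimate. Assumption: each shard accepts writes of
// up to 1 MiB per second and 1,000 records per second.
public class ShardEstimate {

    static final long BYTES_PER_SHARD_PER_SEC = 1_048_576L; // 1 MiB (assumed limit)
    static final long RECORDS_PER_SHARD_PER_SEC = 1_000L;   // assumed limit

    // Ceiling division for non-negative values.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Minimum shards for the given peak write rate; a stream always has at least 1.
    static long minShards(long bytesPerSec, long recordsPerSec) {
        long byBytes = ceilDiv(bytesPerSec, BYTES_PER_SHARD_PER_SEC);
        long byRecords = ceilDiv(recordsPerSec, RECORDS_PER_SHARD_PER_SEC);
        return Math.max(1, Math.max(byBytes, byRecords));
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2,000,000 bytes per second across 500 records per second.
        System.out.println(minShards(2_000_000L, 500L)); // prints 2
    }
}
```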
12. Run list-streams to confirm that the stream was created:
aws kinesis list-streams
13. Open the SampleProducerConfig.java file in your local copy of the repository, and then modify the following fields:
- For public static final String STREAM_NAME_DEFAULT, enter the name of the Kinesis data stream that you created in step 11.
- For public static final String REGION_DEFAULT, enter the Region where you created the stream.
For example:
cd src/com/amazonaws/services/kinesis/producer/sample
vi SampleProducerConfig.java
public static final String STREAM_NAME_DEFAULT = "kinesis-kpl-demo";
public static final String REGION_DEFAULT = "us-east-1";
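A typo in STREAM_NAME_DEFAULT is an easy way to end up writing to a stream that doesn't exist. The sketch below only checks that a value is a well-formed stream name under the Kinesis Data Streams naming rules (1 to 128 characters from a-z, A-Z, 0-9, underscore, period, and hyphen). It can't confirm that the stream exists in the Region you set in REGION_DEFAULT; the list-streams output from step 12 is the place to check that. The class name StreamNameCheck is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: sanity-check a candidate value for STREAM_NAME_DEFAULT against the
// Kinesis Data Streams naming rules. Existence of the stream is NOT checked.
public class StreamNameCheck {

    // 1-128 characters, each a letter, digit, '_', '.', or '-'.
    private static final Pattern STREAM_NAME = Pattern.compile("[a-zA-Z0-9_.-]{1,128}");

    static boolean isValidStreamName(String name) {
        return name != null && STREAM_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "kinesis-kpl-demo";
        System.out.println(name + " is well-formed: " + isValidStreamName(name));
    }
}
```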
14. Return to the amazon-kinesis-producer-sample directory, and then run the following command so that Maven downloads the project's dependencies and builds the sample:
mvn clean package
15. In the amazon-kinesis-producer-sample directory, run the following command to start the sample producer and send data to the Kinesis data stream:
mvn exec:java -Dexec.mainClass="com.amazonaws.services.kinesis.producer.sample.SampleProducer"
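While the producer runs, Kinesis Data Streams routes each record to one of the stream's shards. When a record supplies only a partition key, the key is MD5-hashed to a 128-bit integer and the record goes to the shard whose hash key range contains that value; if a record supplies an explicit hash key, that value is used instead. The sketch below models only the partition-key case and assumes that the hash key space is split evenly across shards (as for a new stream created with create-stream, such as the 2-shard stream from step 11) and that keys are hashed as UTF-8 bytes. The class and method names are hypothetical, and this is not the KPL's own code.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Model of partition-key routing. Assumptions: evenly split hash key ranges,
// UTF-8 partition key bytes, no explicit hash key on the record.
public class ShardForKey {

    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128); // 2^128 hash keys

    // MD5 of the partition key, read as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 => treat bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Index of the shard whose evenly sized range contains hashKey.
    static int shardIndex(BigInteger hashKey, int shardCount) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey.divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1); // last shard absorbs any remainder
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> shard " + shardIndex(hashKey(key), 2));
        }
    }
}
```

With two evenly split shards, every hash key below 2^127 lands on the first shard and the rest land on the second, so a good spread of partition keys keeps both shards busy.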
16. Check the Incoming Data (Count) graph on the Monitoring tab of the Kinesis console to verify the number of records sent to the stream.
Note: The record count on the graph might be lower than the number of records that the producer sent. This is expected, because the KPL aggregates multiple user records into a single Kinesis Data Streams record before sending it.
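To see why the counts differ, here is a simplified model of aggregation: user records are packed in order into aggregates that can't exceed a size cap, and each aggregate counts as one record on the Incoming Data (Count) graph. This is a sketch under stated assumptions, not the KPL's algorithm: it ignores the aggregation format's own framing overhead and record-count limits, and the 51,200-byte cap is illustrative (in the KPL, the cap is controlled by the AggregationMaxSize setting). The class name AggregationCount is hypothetical.

```java
import java.util.Arrays;

// Simplified aggregation model. Assumptions: greedy in-order packing, no framing
// overhead, and an illustrative per-aggregate size cap.
public class AggregationCount {

    // Number of Kinesis records needed when user records (sizes in bytes) are packed
    // in order and each aggregate holds at most maxAggregateBytes.
    static int aggregatedRecordCount(int[] userRecordSizes, int maxAggregateBytes) {
        int count = 0;
        int current = 0; // bytes in the aggregate being filled; 0 means none is open
        for (int size : userRecordSizes) {
            if (current > 0 && current + size <= maxAggregateBytes) {
                current += size;   // fits: add to the open aggregate
            } else {
                count++;           // start a new aggregate with this record
                current = size;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sizes = new int[1000];
        Arrays.fill(sizes, 100); // 1,000 user records of 100 bytes each
        // 51,200 bytes is an illustrative cap, not a value read from your config.
        System.out.println(aggregatedRecordCount(sizes, 51_200)); // prints 2
    }
}
```

In this model, 1,000 user records show up as only 2 incoming records, because each aggregate holds 512 of them.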