Video streaming and deep learning: Using Amazon Kinesis Video Streams with Deep Java Library
Amazon Kinesis Video Streams allows you to easily ingest video data from connected devices for processing. One of the most effective ways to process this video data is using the power of deep learning. You can create an efficient service infrastructure to run these computations with a Java server, but Java support for deep learning has traditionally been difficult to come by.
Deep Java Library (DJL) is a new open-source deep learning framework for Java built by AWS. It sits on top of native engines, so you can train entirely in DJL while using different engines on the backend, such as PyTorch and Apache MXNet. It can also import and run models built using TensorFlow, Keras, and PyTorch. DJL can bridge the ease of Kinesis Video Streams with the power of deep learning for your own video analytics application.
In this tutorial, we walk through running an object detection model against a Kinesis video stream. In object detection, the computer finds different types of objects in an image and draws bounding boxes describing their locations within the image. For example, you can use detection to recognize objects like dogs or people to avoid false alarms in a home security camera.
The full project and instructions to run it are available in the DJL demo repository.
To begin, create a new Java project with the following dependencies, shown here in gradle format:
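A dependency block along these lines should work; the artifact versions shown here are assumptions and should be replaced with the latest releases of DJL, the MXNet engine, and the Kinesis Video Streams parser library:

```groovy
dependencies {
    // DJL core API plus the MXNet engine and its model zoo
    implementation "ai.djl:api:0.8.0"
    runtimeOnly "ai.djl.mxnet:mxnet-model-zoo:0.8.0"
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto:1.7.0-backport"
    // Kinesis Video Streams parser library for reading and decoding the stream
    implementation "com.amazonaws:amazon-kinesis-video-streams-parser-library:1.0.15"
}
```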
The DJL ImageVisitor
Because the model works on images, you can create a DJL FrameVisitor that visits and runs your model on each frame in the video. In real applications, it might help to only run your model on a fraction of the frames in the video. See the following code:
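A sketch of such a visitor follows, based on the class names described in this post. The package locations and exact signatures may differ between library versions, so treat this as an outline rather than a definitive implementation:

```java
import java.util.Optional;

import com.amazonaws.kinesisvideo.parser.mkv.Frame;
import com.amazonaws.kinesisvideo.parser.utilities.FragmentMetadata;
import com.amazonaws.kinesisvideo.parser.utilities.H264FrameDecoder;
import com.amazonaws.kinesisvideo.parser.utilities.MkvTrackMetadata;

import ai.djl.inference.Predictor;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.modality.cv.output.DetectedObjects;

public class DjlImageVisitor extends H264FrameDecoder {

    Predictor<Image, DetectedObjects> predictor;
    ImageFactory factory = ImageFactory.getInstance();

    @Override
    public void process(Frame frame, MkvTrackMetadata trackMetadata,
            Optional<FragmentMetadata> fragmentMetadata) {
        // Decode the frame and run the model here (defined later in this post)
    }
}
```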
The DjlImageVisitor class extends the H264FrameDecoder to provide the capability to convert each frame into a standard Java BufferedImage. Because DJL natively supports this class, you can run the model directly on the decoded image.
In DJL, the Predictor is used to run a trained model against live data. This is often referred to as inference or prediction. It fully encapsulates the inference experience: it preprocesses your input into the model's data structure, runs the model itself, and postprocesses the output into an easy-to-use output class. In the following code block, the Predictor converts an Image into a set of outputs, DetectedObjects. An ImageFactory converts a standard Java BufferedImage into the DJL Image class.

DJL also provides a model zoo where you can find many models trained on different tasks, datasets, and engines. For now, create a Predictor using the basic SSD object detection model. You can use the default preprocessing and postprocessing defined within the model zoo to directly create a Predictor. For your own applications, you can define custom processing in a Translator and pass it in when creating a new Predictor.
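A sketch of creating such a Predictor from the model zoo's SSD model using DJL's Criteria API; the artifact ID and loading methods are assumptions that may vary across DJL versions:

```java
// Describe the model we want: takes an Image, returns DetectedObjects,
// and matches the "ssd" artifact in the model zoo
Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optArtifactId("ssd")
        .build();

// Load the model and create a Predictor with the zoo's default
// preprocessing and postprocessing
predictor = ModelZoo.loadModel(criteria).newPredictor();
```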
Then, you just need to define the FrameVisitor's process method, which is called to handle each frame. You convert the Frame into a BufferedImage using the decodeH264Frame method defined within the H264FrameDecoder, wrap it into an Image using the ImageFactory you created earlier, and use your Predictor to run prediction with the SSD model. See the following code:
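The process method might look roughly like the following sketch; exception handling is simplified here, and the exact visitor signature depends on the parser library version:

```java
@Override
public void process(Frame frame, MkvTrackMetadata trackMetadata,
        Optional<FragmentMetadata> fragmentMetadata) {
    try {
        // Convert the raw H264 frame into a standard Java BufferedImage
        BufferedImage bufferedImage = decodeH264Frame(frame, trackMetadata);

        // Wrap the BufferedImage into a DJL Image
        Image image = factory.fromImage(bufferedImage);

        // Run the SSD model on the frame
        DetectedObjects prediction = predictor.predict(image);
    } catch (TranslateException e) {
        throw new RuntimeException("Prediction failed", e);
    }
}
```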
Using the prediction
At this point, you have the detected objects and can use them for whatever your application requires. For a simple application, you could just print out all the class names that you detected to standard out as follows:
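For example, a sketch of printing the detected class names (the "Found objects" label is only illustrative):

```java
// Join the class names of all detected objects into one line
String classStr = prediction.items()
        .stream()
        .map(Classifications.Classification::getClassName)
        .collect(Collectors.joining(", "));
System.out.println("Found objects: " + classStr);
```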
You could also find out if there is a high probability that a person was in the image using the following code:
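A sketch of that check; the 0.5 probability threshold is an assumption you can tune for your application:

```java
// True if any detection is a "person" with probability above the threshold
boolean hasPerson = prediction.items()
        .stream()
        .anyMatch(c -> "person".equals(c.getClassName())
                && c.getProbability() > 0.5);
```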
Another option is to use the image visualization methods in the Image class to draw the bounding boxes on top of the original image. Then, you can get a visual representation of the detected objects. See the following code:
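A sketch of drawing and saving the annotated frame; the output path is only illustrative:

```java
// Draw the detected bounding boxes on top of the image
image.drawBoundingBoxes(prediction);

// Save the annotated frame as a PNG for inspection
Path outputDir = Paths.get("output");
Files.createDirectories(outputDir);
try (OutputStream os = Files.newOutputStream(outputDir.resolve("detected.png"))) {
    image.save(os, "png");
}
```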
Running the stream
You’re now ready to set up your video stream. For instructions, see Create a Kinesis Video Stream. Make sure to record the STREAM_NAME that you used so you can pass it into your application.

Then, create a new thread pool to run your application. You also need to build a GetMediaWorker with all the data for your video stream and run it on the thread pool. For the GetMediaWorker, you pass in the data you pulled from the Kinesis Video Streams console describing your video stream, along with the AWS credentials for accessing the stream. Use the SystemPropertiesCredentialsProvider, which finds the credentials in the JVM system properties. You can find more details about providing these credentials in the demo repository. Lastly, pass in StartSelectorType.NOW to start using the stream immediately. See the following code:
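A sketch of wiring this together; REGION and STREAM_NAME are placeholders for your stream's configuration, and the GetMediaWorker.create signature follows the parser library examples but may vary between versions:

```java
// Thread pool to run the worker on
ExecutorService executorService = Executors.newFixedThreadPool(1);

// Kinesis Video client using credentials from JVM system properties
AmazonKinesisVideo amazonKinesisVideo = AmazonKinesisVideoClientBuilder.standard()
        .withRegion(REGION.getName())
        .withCredentials(new SystemPropertiesCredentialsProvider())
        .build();

// Worker that reads the stream starting now and feeds frames to the visitor
GetMediaWorker getMediaWorker = GetMediaWorker.create(
        REGION,
        new SystemPropertiesCredentialsProvider(),
        STREAM_NAME,
        new StartSelector().withStartSelectorType(StartSelectorType.NOW),
        amazonKinesisVideo,
        FrameVisitor.create(new DjlImageVisitor()));

executorService.submit(getMediaWorker);
```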
That’s it! You’re ready to begin sending data to your stream and detecting the objects in the video. You can find more information about the Kinesis Video Streams API in the Amazon Kinesis Video Streams Producer SDK Java GitHub repo. The full Kinesis Video Streams DJL demo is available with the rest of the DJL demo applications and integrations with many other AWS and Java tools in the demo repository.
Now that you have integrated Kinesis Video Streams and DJL, you can improve your application in many different ways. You can choose additional object detection and image-based models from the more than 70 pre-trained and ready-to-use models in our model zoo from GluonCV, TorchHub, and Keras. You can run these or custom models across any of the engines supported by DJL, including TensorFlow, PyTorch, MXNet, and ONNX Runtime. DJL even has full training support, so you can build your own model to add to your video streaming application instead of relying on a pre-trained one.
About the Authors
Zach Kimberg is a Software Engineer with AWS Deep Learning working mainly on Apache MXNet for Java and Scala. Outside of work he enjoys reading, especially Fantasy.
Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.