AWS Machine Learning Blog

Video streaming and deep learning: Using Amazon Kinesis Video Streams with Deep Java Library

Amazon Kinesis Video Streams allows you to easily ingest video data from connected devices for processing. One of the most effective ways to process this video data is using the power of deep learning. You can create an efficient service infrastructure to run these computations with a Java server, but Java support for deep learning has traditionally been difficult to come by.

Deep Java Library (DJL) is a new open-source deep learning framework for Java built by AWS. It sits on top of native engines, so you can train entirely in DJL while using different engines on the backend, such as PyTorch and Apache MXNet. It can also import and run models built with TensorFlow, Keras, and PyTorch. DJL bridges the ease of Kinesis Video Streams with the power of deep learning for your own video analytics application.

In this tutorial, we walk through running an object detection model against a Kinesis video stream. In object detection, the computer finds the different types of objects in an image and draws a bounding box describing each object's location inside the image. For example, you can use detection to recognize objects like dogs or people and avoid false alarms in a home security camera.

The full project and instructions to run it are available in the DJL demo repository.

Setting up

To begin, create a new Java project with the following dependencies, shown here in Gradle format:

dependencies {
    implementation platform("ai.djl:bom:0.8.0")
    implementation "ai.djl:api"
    
    runtimeOnly "ai.djl.mxnet:mxnet-model-zoo"
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto"
    
    implementation "software.amazon.awssdk:kinesisvideo:2.10.75"
    implementation "software.amazon.kinesis:amazon-kinesis-client:2.2.9"
    implementation "com.amazonaws:amazon-kinesis-video-streams-parser-library:1.0.13"
}

The DJL ImageVisitor

Because the model works on images, you can create a FrameVisitor that visits each frame in the video and runs your model on it. In real applications, it might help to run your model on only a fraction of the frames; a simple way to do that is sketched after the process method later in this section. See the following code:

FrameVisitor frameVisitor = FrameVisitor.create(new DjlImageVisitor());

The DjlImageVisitor class extends H264FrameDecoder, which provides the capability to convert each frame into a standard Java BufferedImage. Because DJL natively supports this class, you can run the model directly on the BufferedImage.

In DJL, the Predictor is used to run a trained model against live data. This is often referred to as inference or prediction. It fully encapsulates the inference experience: it preprocesses your input into the model's data structure, runs the model itself, and postprocesses the result into an easy-to-use output class. In the following code block, the Predictor converts an Image to the set of outputs, DetectedObjects. An ImageFactory converts a standard Java BufferedImage into the DJL Image class:

public class DjlImageVisitor extends H264FrameDecoder {

    Predictor<Image, DetectedObjects> predictor;
    ImageFactory factory = ImageFactory.getInstance();

    ...

}

DJL also provides a model zoo where you can find many models trained on different tasks, datasets, and engines. For now, create a Predictor using the basic SSD object detection model, relying on the default preprocessing and postprocessing defined within the model zoo. For your own applications, you can instead define custom processing in a Translator and pass it in when creating a new Predictor:

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
    .setTypes(Image.class, DetectedObjects.class)
    .optArtifactId("ai.djl.mxnet:ssd")
    .build();
predictor = ModelZoo.loadModel(criteria).newPredictor();
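
The overall shape of a custom Translator looks like the following sketch. The preprocessing and postprocessing here are placeholders, not the model zoo's actual SSD translator; a real implementation would also resize the image and decode the model's class, probability, and bounding box outputs:

Translator<Image, DetectedObjects> translator = new Translator<Image, DetectedObjects>() {

    @Override
    public NDList processInput(TranslatorContext ctx, Image input) {
        // Preprocessing: convert the Image into the tensor layout the model expects
        return new NDList(input.toNDArray(ctx.getNDManager()));
    }

    @Override
    public DetectedObjects processOutput(TranslatorContext ctx, NDList list) {
        // Postprocessing is model-specific; an empty result stands in here as a placeholder
        return new DetectedObjects(
                Collections.emptyList(), Collections.emptyList(), Collections.emptyList());
    }
};

// Attach the custom translator when loading the model; newPredictor() then uses it
Criteria<Image, DetectedObjects> customCriteria = Criteria.builder()
    .setTypes(Image.class, DetectedObjects.class)
    .optTranslator(translator)
    .build();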

Then, you just need to define the FrameVisitor's process method, which is called to handle each frame, as follows. You convert the Frame into a BufferedImage using the decodeH264Frame method defined in H264FrameDecoder, wrap that into an Image using the ImageFactory you created earlier, and then use your Predictor to run prediction with the SSD model. See the following code:

    @Override
    public void process(
            Frame frame,
            MkvTrackMetadata trackMetadata,
            Optional<FragmentMetadata> fragmentMetadata)
            throws FrameProcessException {

        Image image = factory.fromImage(decodeH264Frame(frame, trackMetadata));
        DetectedObjects prediction = predictor.predict(image);
    }
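
As mentioned earlier, you may not want to run the model on every frame. One simple approach is to count frames and return early. The following is a minimal sketch showing only the sampling guard; the FRAME_INTERVAL constant and frameCount field are hypothetical additions to DjlImageVisitor, not part of the demo:

    // Hypothetical additions to DjlImageVisitor: run the model on every Nth frame only
    private static final int FRAME_INTERVAL = 10;
    private int frameCount;

    @Override
    public void process(
            Frame frame,
            MkvTrackMetadata trackMetadata,
            Optional<FragmentMetadata> fragmentMetadata)
            throws FrameProcessException {

        if (frameCount++ % FRAME_INTERVAL != 0) {
            // Skip this frame without decoding it or running inference
            return;
        }

        // Decode the frame and run the predictor exactly as in the process method above
    }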

Using the prediction

At this point, you have the detected objects and can use them for whatever your application requires. For a simple application, you could just print all the detected class names to standard output as follows:

        String classStr =
                prediction
                        .items()
                        .stream()
                        .map(Classification::getClassName)
                        .collect(Collectors.joining(", "));
        System.out.println("Found objects: " + classStr);

You could also find out if there is a high probability that a person was in the image using the following code:

        boolean hasPerson =
                prediction
                        .items()
                        .stream()
                        .anyMatch(
                                c ->
                                        "person".equals(c.getClassName())
                                                && c.getProbability() > 0.5);

Another option is to use the image visualization methods in the Image class to draw the bounding boxes on top of the original image. Then, you can get a visual representation of the detected objects. See the following code:

        image.drawBoundingBoxes(prediction);
        Path outputFile = Paths.get("out/annotatedImage.png");
        try (OutputStream os = Files.newOutputStream(outputFile)) {
            image.save(os, "png");
        }
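
Note that the preceding snippet overwrites the same file for every processed frame. A small variation, using a hypothetical frameCount field that tracks how many frames you have processed, writes each annotated frame to its own file:

        Path outputDir = Paths.get("out");
        Files.createDirectories(outputDir); // make sure the output directory exists
        Path outputFile = outputDir.resolve("annotatedImage-" + frameCount + ".png");
        try (OutputStream os = Files.newOutputStream(outputFile)) {
            image.save(os, "png");
        }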

Running the stream

You’re now ready to set up your video stream. For instructions, see Create a Kinesis Video Stream. Make sure to record the REGION and STREAM_NAME that you used so you can pass them into your application.
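
For example, if you created the stream in us-west-2, the constants might look like the following. The values here are placeholders for whatever Region and stream name you recorded:

// Placeholder values: use the Region and stream name from the Kinesis Video Streams console
private static final Regions REGION = Regions.US_WEST_2;
private static final String STREAM_NAME = "my-video-stream";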

Then, create a new thread pool to run your application. You also need to build a GetMediaWorker with all the data for your video stream and run it on the thread pool. For the GetMediaWorker, you pass in the data you pulled from the Kinesis Video Streams console describing your video stream. You also need to provide the AWS credentials for accessing the stream. Use the SystemPropertiesCredentialsProvider, which finds the credentials in the JVM system properties. You can find more details about providing these credentials in the demo repository. Lastly, pass in StartSelectorType.NOW to start consuming the stream immediately. See the following code:

ExecutorService executorService = Executors.newFixedThreadPool(1);

AmazonKinesisVideoClientBuilder amazonKinesisVideoBuilder =
        AmazonKinesisVideoClientBuilder.standard();
amazonKinesisVideoBuilder.setRegion(REGION.getName());
amazonKinesisVideoBuilder.setCredentials(new SystemPropertiesCredentialsProvider());
AmazonKinesisVideo amazonKinesisVideo = amazonKinesisVideoBuilder.build();

GetMediaWorker getMediaWorker =
        GetMediaWorker.create(
                REGION,
                new SystemPropertiesCredentialsProvider(),
                STREAM_NAME,
                new StartSelector().withStartSelectorType(StartSelectorType.NOW),
                amazonKinesisVideo,
                frameVisitor);
executorService.submit(getMediaWorker);
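
The SystemPropertiesCredentialsProvider from the AWS SDK for Java reads the aws.accessKeyId and aws.secretKey JVM system properties. One way to supply them is shown below with placeholder values; in practice, you would typically pass them to the JVM as -D options rather than hard-coding them:

// Placeholder credentials: set these before building the client and worker
System.setProperty("aws.accessKeyId", "YOUR_ACCESS_KEY_ID");
System.setProperty("aws.secretKey", "YOUR_SECRET_KEY");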

Conclusion

That’s it! You’re ready to begin sending data to your stream and detecting the objects in the video. You can find more information about the Kinesis Video Streams API in the Amazon Kinesis Video Streams Producer SDK Java GitHub repo. The full Kinesis Video Streams DJL demo is available with the rest of the DJL demo applications and integrations with many other AWS and Java tools in the demo repository.

Now that you have integrated Kinesis Video Streams and DJL, you can improve your application in many different ways. You can choose additional object detection and image-based models from the more than 70 pre-trained and ready-to-use models in our model zoo from GluonCV, TorchHub, and Keras. You can run these or custom models across any of the engines supported by DJL, including TensorFlow, PyTorch, MXNet, and ONNX Runtime. DJL even has full training support, so you can build your own model to add to your video streaming application instead of relying on a pre-trained one.

Don’t forget to follow our GitHub repo, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!


About the Authors

Zach Kimberg is a Software Engineer with AWS Deep Learning working mainly on Apache MXNet for Java and Scala. Outside of work, he enjoys reading, especially fantasy.

Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.