AWS Machine Learning Blog
Use AWS DeepLens to give Amazon Alexa the power to detect objects via Alexa skills
April 2023 Update: Starting January 31, 2024, you will no longer be able to access AWS DeepLens through the AWS Management Console, manage DeepLens devices, or access any projects you have created. To learn more, refer to the frequently asked questions about the AWS DeepLens end of life.
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog.
People use Alexa for all kinds of activities in their homes, such as checking their bank balances, ordering pizza, or listening to music from their favorite artists. For the most part, the primary way to interact with an Echo has been your voice. In this blog post, we’ll show you how to build a new Alexa skill that integrates with AWS DeepLens, so that when you ask, “Alexa, what do you see?” Alexa returns the objects detected by the AWS DeepLens device.
Object detection is an important topic in the AI deep learning world. For example, in autonomous driving, the camera on the vehicle needs to be able to detect objects (people, cars, signs, etc.) on the road first before making any decisions to turn, slow down, or stop.
AWS DeepLens was developed to put deep learning in the hands of developers. It ships with a fully programmable video camera, tutorials, code, and pre-trained models. It was designed so that you can have your first Deep Learning model running on the device within about 10 minutes after opening the box. For this blog post we’ll use one of the built-in object detection models included with AWS DeepLens. This enables AWS DeepLens to perform real-time object detection using the built-in camera. After the device detects objects, it sends information about the objects detected to the AWS IoT platform.
We’ll also show you how to push this data into an Amazon Kinesis data stream, and use Amazon Kinesis Data Analytics to aggregate duplicate objects detected in the stream and push them into another Kinesis data stream. Finally, you’ll create a custom Alexa skill with AWS Lambda to retrieve the detected objects from the Kinesis stream and have Alexa verbalize the result back to the user.
Solution overview
The following diagram depicts a high-level overview of this solution.
Amazon Kinesis Data Streams
You can use Amazon Kinesis Data Streams to build your own streaming application. This application can process and analyze real-time, streaming data by continuously capturing and storing terabytes of data per hour from hundreds of thousands of sources.
Amazon Kinesis Data Analytics
Amazon Kinesis Data Analytics provides an easy and familiar standard SQL language to analyze streaming data in real time. One of its most powerful features is that there are no new languages, processing frameworks, or complex machine learning algorithms that you need to learn.
AWS Lambda
AWS Lambda lets you run code without provisioning or managing servers. With AWS Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and AWS Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web app, mobile app, or, in this case, an Alexa skill.
Amazon Alexa
Alexa is the Amazon cloud-based voice service available on tens of millions of devices from Amazon and third-party device manufacturers. With Alexa, you can build natural voice experiences that offer customers a more intuitive way to interact with the technology they use every day. Our collection of tools, APIs, reference solutions, and documentation make it easy for anyone to build with Alexa.
Solution summary
The following is a quick walkthrough of the solution that’s illustrated in the diagram:
- First, set up the AWS DeepLens device and deploy the object detection model to it. The device loads the model, performs local inference, and sends detected objects as MQTT messages to the AWS IoT platform.
- The MQTT messages are then sent to a Kinesis data stream by configuring an IoT rule.
- By using Kinesis Data Analytics on the Kinesis data stream, detected objects are aggregated and put into another Kinesis data stream for the Alexa custom skill to query.
- Upon the user’s request, the custom Alexa skill invokes an AWS Lambda function, which queries the final Kinesis data stream and returns the list of objects detected by the AWS DeepLens device.
Implementation steps
The following sections walk through the implementation steps in detail.
Setting up DeepLens and deploying the built-in object detection model
- Open the AWS DeepLens console.
- Register your AWS DeepLens device if it isn’t registered already. The AWS DeepLens documentation provides a step-by-step guide for registering the device.
- Choose Create new project on the Projects page. On the Choose project type page, choose the Use a project template option, and select Object detection in the Project templates section.
- Choose Next to move to the Review page.
- In the Project detail page, give the project a name and then choose Create.
- Back on the Projects page, select the project that you created earlier, and choose Deploy to device.
- Make sure the AWS DeepLens device is online. Then select the device to deploy, and choose Review. Review the deployment summary and then choose Deploy.
- The console redirects to the device detail page, which displays the deployment status at the top. Deployment takes a few minutes; wait until it succeeds.
- After the deployment is complete, on the same page, scroll to the device details section and copy the AWS IoT MQTT topic that the device is registered to. The device publishes information about every object it detects to this topic.
- In the same section, choose the AWS IoT console link.
- On the AWS IoT MQTT Client test page, paste the MQTT topic copied earlier and choose Subscribe to topic.
- You should now see detected object messages flowing on the screen.
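Each message published to the topic is a small JSON map of detected labels to confidence scores. The exact labels depend on the model, but a message looks roughly like the following (the values here are illustrative, not from the original post):

```json
{
  "person": 0.9217,
  "chair": 0.6874
}
```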
Setting up Kinesis Streams and Kinesis Analytics
- Open the Amazon Kinesis Data Streams console.
- Create a new Kinesis Data Stream. Give it a name that indicates it’s for raw incoming stream data—for example, RawStreamData. For Number of shards, type 1.
- Go back to the AWS IoT console, and in the left navigation pane, choose Act. Then choose Create to set up a rule that pushes MQTT messages from the AWS DeepLens device to the newly created Kinesis data stream.
- On the Create a rule page, give the rule a name. In the Message source section, for Attribute enter *. For Topic filter, enter the DeepLens device MQTT topic. Choose Add action.
- Choose Send a message to an Amazon Kinesis Stream, then choose Configure action. Select the Kinesis data stream created earlier, and for Partition key, type ${newuuid()} in the text box. Then choose Create a new role (or Update role) and choose Add action. Choose Create rule to finish the setup.
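Behind the scenes, the Attribute and Topic filter fields you entered build the rule’s IoT SQL statement. With * as the attribute and the device’s inference topic as the filter, the generated statement is equivalent to something like the following (the topic name here is a placeholder; use the MQTT topic you copied from your device detail page):

```sql
SELECT * FROM '$aws/things/deeplens_YOUR_DEVICE_ID/infer'
```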
- With the rule in place, messages are loaded into the Kinesis data stream. Next, we can use Kinesis Data Analytics to aggregate the data and load the result into a final Kinesis data stream.
- Open the Amazon Kinesis Data Streams console.
- Create another new Kinesis Data Stream (follow instruction steps 1 and 2). Give it a name that indicates that it’s for aggregated incoming stream data—for example, AggregatedDataOutput. For Number of shards, type 1.
- In the Amazon Kinesis Data Analytics console, create a new application.
- Give it a name and choose Create application. In the source section, choose Connect stream data.
- Select Kinesis stream as the source and select the source stream created in step 2. Choose Discover schema to let Kinesis Data Analytics auto-discover the data schema.
- The object detection model can detect up to 20 object classes. However, AWS DeepLens might detect only a few of them at a time, depending on what’s in front of the camera. For example, if the device sees only a chair and a person, schema auto-discovery detects just those two objects as columns. You can add the other 18 objects to the schema manually by choosing Edit schema.
- On the schema page, add the rest of the objects, then choose Save schema and update stream. Wait for the update to complete, then choose Exit (done).
- Scroll down to the bottom of the page then choose Save and continue.
- Back on the application configuration page, choose Go to SQL editor.
- Copy and paste the following SQL statement into the Analytics SQL window, and then choose Save and run SQL. After the SQL finishes saving and running, choose Close at the bottom of the page. The SQL script aggregates the objects detected within a 10-second tumbling window and stores them in the destination stream.
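The original post’s SQL isn’t reproduced here, so the following is a minimal sketch of a 10-second tumbling-window aggregation in Kinesis Data Analytics SQL. It assumes the in-application source stream has the default name SOURCE_SQL_STREAM_001 and shows only two of the object columns; extend both the stream definition and the SELECT list with the rest of your schema’s columns in the same way.

```sql
-- Destination stream: one column per detected object class,
-- holding the object's confidence score (only two shown here).
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "person" DOUBLE,
    "chair"  DOUBLE);

-- Pump: aggregate each object's highest confidence score
-- over a 10-second tumbling window.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM MAX("person"), MAX("chair")
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);
```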
- Back on the application configuration page, choose Connect to a destination. In the Analytics destination section, make sure the Kinesis data stream created in step 8 (for example, AggregatedDataOutput) is selected, enter DESTINATION_SQL_STREAM as the in-application stream name, and select JSON as the output format. Note that you could also have Kinesis Data Analytics send data directly to a Lambda function that writes to another data store, such as Amazon DynamoDB, and then have Alexa read from DynamoDB upon user request.
- Messages are now aggregated and loaded into the final Kinesis data stream (for example, AggregatedDataOutput) that was created in step 8. Next, you will create an Alexa custom skill with AWS Lambda.
Creating Alexa custom skill with AWS Lambda
- Open the AWS Lambda console and create a new function.
- The easiest way to create an Alexa skill is to create the function from an existing blueprint provided by AWS Lambda and then overwrite the code with your own.
- It is a security best practice to enable Alexa skill ID verification on the Lambda function’s Alexa trigger. If you haven’t created the skill yet, you can disable verification for now and re-enable it later by adding another Alexa trigger to the Lambda function.
- Copy the following Python code and replace the sample code provided by the blueprint. This code reads data from the Kinesis data stream and returns the result to Alexa. Note: The code defaults to the Northern Virginia AWS Region (us-east-1); if you are following along in another Region, change it in this line: kinesis = boto3.client('kinesis', region_name='us-east-1')
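The post’s original Lambda code isn’t included above, so here is a minimal, hypothetical sketch of the two pieces it needs: reading recent records from the aggregated Kinesis stream with boto3, and turning the detected objects into an Alexa-style response. The stream name (AggregatedDataOutput), the record format (a JSON map of label to confidence, as produced by the Analytics SQL), and the helper names are assumptions, not the blog’s actual code.

```python
import datetime
import json


def objects_from_records(records):
    """Collect the distinct object labels seen across decoded Kinesis records.

    Each record is assumed to be a JSON map of label -> confidence score,
    for example {"person": 0.92, "chair": 0.68}.
    """
    seen = set()
    for record in records:
        payload = json.loads(record)
        for label, confidence in payload.items():
            if confidence and confidence > 0:
                seen.add(label)
    return sorted(seen)


def build_speech(objects):
    """Turn a list of labels into the sentence Alexa speaks."""
    if not objects:
        return "I don't see anything right now."
    return "I see " + ", ".join(objects) + "."


def alexa_response(speech_text):
    """Wrap speech text in the Alexa skill response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event, context):
    # boto3 is imported here so the pure helpers above stay testable
    # without AWS credentials. Stream name and Region are assumptions.
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")
    stream = "AggregatedDataOutput"
    shard = kinesis.describe_stream(StreamName=stream)[
        "StreamDescription"]["Shards"][0]["ShardId"]
    # Read records from the last 30 seconds so Alexa reports what the
    # camera sees "now" rather than replaying the whole stream.
    iterator = kinesis.get_shard_iterator(
        StreamName=stream,
        ShardId=shard,
        ShardIteratorType="AT_TIMESTAMP",
        Timestamp=datetime.datetime.now(datetime.timezone.utc)
        - datetime.timedelta(seconds=30),
    )["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]
    objects = objects_from_records(r["Data"] for r in records)
    return alexa_response(build_speech(objects))
```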
Setting up an Alexa custom skill with AWS Lambda
- Open the Amazon Alexa Developer Portal, and choose Create a new custom skill.
- You can upload the JSON document in the Alexa skill JSON Editor to automatically configure the intents and sample utterances for the custom skill. Be sure to choose Save Model to apply the changes.
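The post’s JSON interaction model isn’t reproduced here; a minimal model might look like the following. The invocation name matches the wake phrase used below, but the intent name and sample utterances are illustrative, not the original skill’s definitions.

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "deep lens demo",
      "intents": [
        { "name": "AMAZON.HelpIntent", "samples": [] },
        { "name": "AMAZON.StopIntent", "samples": [] },
        { "name": "AMAZON.CancelIntent", "samples": [] },
        {
          "name": "WhatDoYouSeeIntent",
          "slots": [],
          "samples": [
            "what do you see",
            "tell me what you see"
          ]
        }
      ],
      "types": []
    }
  }
}
```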
- Finally, in the Endpoint section, enter the Amazon Resource Name (ARN) of the AWS Lambda function that you created. Your Alexa skill is now ready to tell you the objects detected by the AWS DeepLens device. Wake Alexa by saying “Alexa, open Deep Lens demo,” and then ask questions such as “What do you see?” Alexa should return an answer such as “I see a person” or “I see a chair.”
Conclusion
In this blog post, you learned how to set up the AWS DeepLens device, deploy the built-in object detection model from the DeepLens console, and use AWS DeepLens to perform real-time object detection. Objects detected by the AWS DeepLens device were sent to the AWS IoT platform, which forwarded them to an Amazon Kinesis data stream. You also learned how to use Amazon Kinesis Data Analytics to aggregate duplicate objects in the stream and push the results into another Kinesis data stream. Finally, you created a custom Alexa skill with AWS Lambda to retrieve the detected objects from the Kinesis data stream and have Alexa speak the result back to the user.
About the authors
Tristan Li is a Solutions Architect with Amazon Web Services. He works with enterprise customers in the US, helping them adopt cloud technology to build scalable and secure solutions on AWS.
Grant McCarthy is an Enterprise Solutions Architect with Amazon Web Services, based in Charlotte, NC. His primary role is helping his customers move workloads to the cloud securely and ensuring that the workloads are architected in a way that aligns with AWS best practices.