ASLens uses the AWS DeepLens to translate the American Sign Language alphabet to speech.
What it does
The AWS DeepLens captures video and runs a deep learning model (built with Amazon SageMaker) against each frame. When a letter from the ASL alphabet is recognised, the AWS DeepLens plays audio for that letter (an MP3 file generated with Amazon Polly).
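The capture-classify-speak cycle can be sketched as a simple loop. This is not the project's actual code: get_frame, classify and play_letter_audio are placeholder names, and the confidence threshold is an assumption.

```python
# Hedged sketch of the recognition loop. The three callables are
# placeholders for: grabbing a frame from the DeepLens camera, running
# the frame through the model, and playing the letter's MP3 file.
def run_loop(get_frame, classify, play_letter_audio, threshold=0.5):
    """Grab frames, classify each one, and speak any confident letter."""
    while True:
        frame = get_frame()
        if frame is None:  # end of stream
            break
        letter, confidence = classify(frame)
        if letter is not None and confidence >= threshold:
            play_letter_audio(letter)  # plays the pre-generated MP3
```

Keeping the loop this thin means all the heavy lifting (model inference, audio playback) lives behind the three callables, which makes each stage testable on its own.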
ASLens runs locally on the AWS DeepLens, so an internet connection is not required; this avoids bandwidth constraints and reduces latency by removing hops between networks.
Created By: Chris Coombs
Learn more about Chris and the ASLens project in this AWS Machine Learning blog post.
How I built it
The ASLens deep learning model was created with Amazon SageMaker. Using the image transfer learning example, I was able to go from training data to my first model in under an hour!
The Lambda function first optimizes the Amazon SageMaker model to run on the AWS DeepLens GPU, then crops and scales each frame. Once resized, the frame is passed through the model, and if an ASL letter is detected, the corresponding MP3 file is played.
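The crop-and-scale step above can be sketched as a centre-crop to a square followed by a resize. A minimal numpy version, assuming a 224x224 model input (the actual input size of the project's model is not stated) and nearest-neighbour sampling for simplicity:

```python
# Sketch of the per-frame preprocessing: centre-crop the video frame to
# a square, then scale it to the model's input size. The 224x224 default
# is an assumption, not taken from the project.
import numpy as np

def crop_and_scale(frame, size=224):
    """Centre-crop an HxWxC frame to a square, then resize to size x size."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    square = frame[top:top + side, left:left + side]
    # Nearest-neighbour sampling grid mapping output pixels to source pixels.
    idx = np.arange(size) * side // size
    return square[idx][:, idx]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
small = crop_and_scale(frame)
print(small.shape)  # (224, 224, 3)
```

In practice OpenCV's resize (with interpolation) would give better-looking inputs; the indexing trick here just keeps the sketch dependency-free beyond numpy.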
As signing the letters J and Z involves motion, I excluded them from the training set.
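That leaves 24 static letters, each of which can be paired with a pre-generated Polly clip. A sketch of the pre-generation step, where the voice choice, output directory, and file naming are my assumptions rather than the project's:

```python
# Sketch: pre-generating one MP3 per static ASL letter with Amazon Polly.
# J and Z require motion, so they are excluded from the static-image model.
import string

LETTERS = [c for c in string.ascii_uppercase if c not in ("J", "Z")]

def synthesize_letter(letter, polly_client, out_dir="audio"):
    """Request an MP3 for one letter from Amazon Polly and save it to disk."""
    response = polly_client.synthesize_speech(
        Text=letter, OutputFormat="mp3", VoiceId="Joanna"  # voice is an assumption
    )
    path = f"{out_dir}/{letter}.mp3"
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
    return path
```

Usage would be something like `polly = boto3.client("polly")` followed by `for letter in LETTERS: synthesize_letter(letter, polly)`, run once ahead of deployment so no Polly calls happen on-device.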
I spent a significant amount of time, through trial and error, getting Amazon Polly MP3s to play on the AWS DeepLens. For anyone else struggling with this, in summary: add the ggc_user to the audio group, and add the audio resources to the Greengrass group (and to the Lambda functions therein) - and repeat the latter after every deploy!
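The device-side half of that fix can be done from a terminal on the DeepLens; the Greengrass resource attachments still have to be made in the AWS console. A sketch, assuming shell access to the device:

```shell
# Add the Greengrass user to the audio group so local Lambda functions
# can reach the sound device (takes effect for new Greengrass processes).
sudo usermod -a -G audio ggc_user

# Confirm the membership took.
groups ggc_user
```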
Accomplishments that I'm proud of
I still can’t believe it works! It’s like magic! My wife came up with the idea, and I thought it was too big to work. Whilst I was confident I could master the AWS DeepLens hardware, I was concerned that I lacked the experience to create the appropriate model. Thankfully, Amazon SageMaker takes care of all of the machine learning heavy lifting, which meant I could focus on collating training data (and getting audio to play on the AWS DeepLens device).
What I learned
What's next for DeepLens ASLens