Learn about ReadToMe – The first place winner of the AWS DeepLens Challenge Hackathon
When Alex Schultz first heard about the AWS DeepLens workshop in Dr. Matt Wood’s keynote address at re:Invent 2017, little did he know that a few months later he would be the first place winner of the AWS DeepLens Challenge Hackathon, owe his kids $400, and be the star of a blog post on the AWS Machine Learning Blog!
Before starting this project, Alex had no machine learning experience. However, in just a few weeks he was up and running: building and training models, and hungry to do more. We interviewed Alex about his experience with AWS DeepLens and asked him to do a deep dive into how he created his winning entry.
Getting started with machine learning
Alex is the creator of the project called ReadToMe. ReadToMe combines deep learning and computer vision with AWS services to enable AWS DeepLens to read to you, literally. You can show the DeepLens device a page, say from a storybook, and it reads it out loud. Alex built ReadToMe using a combination of technologies that were all new to him, including not only AWS DeepLens but also Python, Amazon Polly, AWS Greengrass, AWS Lambda, Tesseract OCR, Apache MXNet, and TensorFlow.
Alex is a Senior Software Engineer at Princeton TMX based in Fort Wayne, Indiana. He’s wanted to get started with machine learning for a while, but he never felt it was accessible to “normal developers” until now:
“I’ve always been interested in AI and machine learning, but it’s never really been something that has seemed normal developer friendly. Learning AI seems to be about reading documentation full of calculus and linear algebra, so how do you go from a traditional development background to getting into that? When I saw and heard about AWS DeepLens at the keynote, it was like, wow, that’s pretty cool. I’ve played with OpenCV in the past and done some basic stuff, but nothing to this extent. So I was just really excited to start playing with it.”
Hands-on learning at re:Invent
Alex was one of the re:Invent 2017 attendees to get a coveted place in the AWS DeepLens workshops, where he was able to start his journey with AWS DeepLens and machine learning.
“I was in one of the last workshops. 7pm Thursday night, braindead. But I went, and it was really a cool workshop. The whole hotdog thing and going through the examples was key to being able to get started. Then getting into the repo and being able to play with the sample code. I thought the workshop was great, being able to actually get hands-on before you’re set loose with the device.”
Building with AWS DeepLens
AWS DeepLens allows developers of all skill levels to get started with deep learning by providing sample projects with practical, hands-on examples. Alex used the sample projects as a springboard for his learning:
“Having the example projects is a great starting point. Whenever I try anything new, I try to look for content that I can use to sort of springboard me forward. You can take what already works and then improve on it or tweak it or make it do what it is you want it to do.”
Alex went through two or three different ideas before he landed on the idea for his final project—ReadToMe. Before this, in December he started a project where DeepLens was set to monitor people coming into the office:
“I brought it to work and started recording faces. I was able to get images and send them up to S3 and then use CloudWatch to trigger an event that would send them to Rekognition and start classifying them. So I was actually pretty far along on that project. Then I noticed on the AWS Machine Learning Blog that there was a posting about all the different sample projects, and you guys kept adding stuff to it. Then one showed up for facial recognition, not just facial detection. So I assumed, okay, well, they’re going to release sample code that does facial recognition, my project’s not going to be worth anything, anybody can do it, so I need to shift gears.”
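The blog doesn't include Alex's code, but the pipeline he describes (frames uploaded to S3, an event triggering a Lambda function, and Amazon Rekognition classifying the image) can be sketched roughly as follows. This is a minimal illustration, not his implementation; the handler structure and return value are assumptions.

```python
def parse_s3_event(event):
    """Pull the bucket name and object key out of an S3-triggered Lambda event."""
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

def lambda_handler(event, context):
    """Run Rekognition face detection on the frame that just landed in S3."""
    # boto3 is preinstalled in the AWS Lambda Python runtime.
    import boto3

    bucket, key = parse_s3_event(event)
    rekognition = boto3.client("rekognition")

    # Rekognition can read the image straight from S3; no download needed.
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["DEFAULT"],
    )
    return {"faces_found": len(response["FaceDetails"])}
```

From here, swapping `detect_faces` for a face *recognition* call (matching against a collection with `search_faces_by_image`) is the step the later sample project covered.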
Alex was a couple of weeks in when he decided to pivot, but all was not lost. This first experiment helped him figure out AWS DeepLens, start working with AWS Greengrass, and start sending images to Amazon S3:
“That project took a little bit of my time, but it helped get my feet wet. I had a basis to kick-start the next project.”
ReadToMe – A deep-learning-enabled application that can read books to kids
This father of four got the inspiration for ReadToMe from his kids. Specifically, his three- and four-year-olds. Both kids love books, but they aren’t reading by themselves yet. Alex wanted to create something that they could use to enjoy reading, even when a parent isn’t available to read to them. He started this project in January and put in a lot of late nights working “after-family hours” from 8 pm to midnight. He estimates that it took about 200 hours to complete the project. He describes how he approached the project in four stages:
Stage one – OCR: “I didn’t start with the model. What I actually started with was making sure that I could figure out all the different pieces. So the OCR was the priority – I wanted to make sure I could actually do that before I went down the path of training a model. I wanted to tackle the easy part first.”
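For the OCR stage, the project's stack (Python plus Tesseract OCR) is typically wired up through the `pytesseract` wrapper. The sketch below is an assumption about how that piece might look, not Alex's actual code; the cleanup step matters because raw OCR output from a photographed page is full of stray line breaks.

```python
import re

def clean_ocr_text(raw):
    """Collapse OCR line breaks and stray whitespace into readable sentences."""
    lines = [line.strip() for line in raw.splitlines()]
    text = " ".join(line for line in lines if line)
    return re.sub(r"\s+", " ", text).strip()

def read_page(image_path):
    """Run Tesseract on a captured page image and return cleaned text."""
    # pytesseract wraps the Tesseract OCR engine; both must be installed
    # on the DeepLens device for this to run.
    import pytesseract
    from PIL import Image

    raw = pytesseract.image_to_string(Image.open(image_path))
    return clean_ocr_text(raw)
```

Tesseract works best on a deskewed, high-contrast crop, which is why the later "clean up the image" step Alex mentions is part of the workflow.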
Stage two – Audio: “I wanted to get the audio working on the device itself; this probably took me four or five days. If I couldn’t do that, the project was dead in the water. I thought maybe I could use Alexa and somehow use the speaker on that, but I didn’t want to do that if I could avoid it. So, finally, I reached out to the DeepLens Forum where one of the AWS experts helped me out. After another day of trial and error, I got the audio working through DeepLens, and then I could move on to the next step.”
Stage three – Model training: “The third thing I did was actually look at training the model. I tried to find sample images online of pages of books; I was able to find one or two, but I was not able to find the amount of data that I needed. I knew I had to create my own data. So I just grabbed all my kids’ books and started taking a bunch of pictures. It probably took a good week or so to get the model trained, from reading about how to do it to trying to figure it out in MXNet to then shifting gears to TensorFlow.”
Stage four – Putting it all together and adding Amazon Polly: “So once I had the model trained, then I had to put it all together. It took about a week and a half of putting all of the workflow together, making sure that I grabbed a fifth frame after detecting the text block and figuring out how to clean up the image. Hooking up to Polly was really easy. That was actually a lot easier than I thought it would be.”
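The Polly hookup Alex found "really easy" comes down to one API call: `SynthesizeSpeech` returns an audio stream you can write to a file and play. The chunking helper below is an assumption added for illustration, since Polly enforces a per-request character limit (around 3,000 billed characters) that a full storybook page could exceed.

```python
def chunk_text(text, limit=3000):
    """Split text into pieces under Polly's per-request character limit,
    breaking on word boundaries."""
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def speak(text, out_path="speech.mp3", voice="Joanna"):
    """Send text to Amazon Polly and save the synthesized speech as MP3."""
    import boto3

    polly = boto3.client("polly")
    with open(out_path, "wb") as f:
        for chunk in chunk_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice
            )
            f.write(response["AudioStream"].read())
```

On the device side, the resulting MP3 then needs to be played through the DeepLens audio output, which was the hurdle Alex cleared in stage two.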
The combination of new technology and being new to machine learning meant that Alex faced several unexpected challenges as he tackled this project. However, his passion and determination to learn meant he never gave up and the result is one he is happy with:
“I was very happy with the end results of this project. In spite of having a full-time job and four kids, I was able to create something really cool and get it to actually work reasonably well in just a couple months.”
What’s next for ReadToMe
Alex is passionate about continuing to work on the project. He plans to do more training and tuning of the model and extend the functionality:
“I want to get it trained for MXNet, and I think I have ways to get around the problems I was running into before. Another feature I was talking to my neighbor about, and we thought would be cool, is translation. Amazon just released Amazon Translate, so I’m thinking it would actually not be that hard to add that feature as well.”
Alex is also considering using different voices, improving the image quality of the text blocks, and making other more minor updates.
After taking time away from the family to work on this project, Alex felt like sharing some of his $7,500 winnings would be the right thing to do:
“I actually didn’t expect to win. But I promised my children, sort of half-jokingly, if I win, I’ll give each of you $100. The kids have decided to pool some of the money for us to go to an indoor waterpark as a family.”
The main investment Alex is going to make is in a new computer to further his AI and machine learning journey:
“I’m actually looking at getting a new computer, with a graphics card, something that I can use to keep learning and keep getting into this stuff.”
It’s not just about the material prize though. It’s about the journey Alex took from having no machine learning experience to, in just a matter of weeks, accelerating his knowledge and his skills:
“Before diving into this project, I had no experience with deep learning or AI in general. I have discovered, through this process, that it is possible to create real, useful deep learning projects without a PhD in math, and that with enough effort and patience, anyone with a decent development background can start using it.”
Congratulations to Alex and the entire Schultz family on this well-deserved win!
Hopefully, Alex’s story has inspired you to want to learn more about AWS DeepLens. You can view all of the projects from the AWS DeepLens Challenge on the DeepLens Community Projects webpage. For more general information, take a look at the AWS DeepLens Website or browse AWS DeepLens posts on the AWS Machine Learning blog.
The AWS DeepLens Challenge was a virtual hackathon brought to you by AWS and Intel to encourage developers to get creative with their AWS DeepLens. To learn more about the contest, check out the DeepLens Challenge website. Entries are now closed.
About the Author
Sally Revell is a Principal Product Marketing Manager for AWS DeepLens. She loves to work on innovative products that have the potential to impact people’s lives in a positive way. In her spare time, she loves doing yoga, horseback riding, and being outdoors in the beauty of the Pacific Northwest.