First Place Winner:
Accomplishments that I'm proud of
I was very happy with the end results of this project. In spite of having a full-time job and four kids, I was able to create something really cool and get it to actually work reasonably well in just a couple of months. I have more ideas for future projects with this device and am excited to keep learning.
What I learned
Before diving into this project, I had no experience with Deep Learning or AI in general. I have always been interested in the topic, but it always seemed too unapproachable to "normal developers". I have discovered, through this process, that it is possible to create real, useful deep learning projects without a PhD in math, and that with enough effort and patience, anyone with a decent development background can start using it.
What's next for ReadToMe
I have a few ideas for improving the project. One feature I would like to add is the ability to translate the text that is read. (I signed up for early access to Amazon’s new Translate service but haven’t yet been approved.) I also plan to continue to improve my model to see if I can increase the model accuracy a bit as well as make it work with a broader range of books.
Lastly, the text image cleanup function, which feeds directly into Tesseract, can be improved. Specifically, it would be beneficial to be able to rotate and/or warp the image before sending it to Tesseract. That way, when a child isn’t holding the book correctly, it could still read the text. Motion blur was also a definite issue I had to contend with in image cleanup. If the book isn’t held very still for a few seconds, the image is just too blurry for the OCR to work. I have read about various techniques to solve this problem, like averaging the image over multiple frames or applying different filters to smooth out the pixels. I am sure that it's possible to achieve a better/faster outcome, but it's tricky working on a resource-constrained device.
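To make the multi-frame idea a bit more concrete, here is a minimal sketch, not the code ReadToMe actually runs. It assumes grayscale frames stored as nested lists of pixel values, and the function names `average_frames` and `laplacian_variance` are hypothetical. The sharpness score (variance of a simple Laplacian) is a common focus measure that I am swapping in for illustration. It could be used to skip frames that are too blurry before handing anything to the OCR step.

```python
from statistics import pvariance


def average_frames(frames):
    """Return the pixel-wise mean of equally sized grayscale frames.

    Each frame is a list of rows, and each row is a list of pixel values.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    count = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


def laplacian_variance(frame):
    """Score sharpness as the variance of a 4-neighbour Laplacian.

    Only interior pixels are used, so the frame must be at least 3x3.
    Higher values mean stronger edges; blurry frames score lower.
    """
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


if __name__ == "__main__":
    # A hard vertical edge versus a smooth ramp standing in for a blurred edge.
    sharp = [[0, 0, 255, 255] for _ in range(4)]
    blurry = [[0, 85, 170, 255] for _ in range(4)]
    print("sharp score:", laplacian_variance(sharp))
    print("blurry score:", laplacian_variance(blurry))
    print("averaged:", average_frames([sharp, blurry])[0])
```

Plain lists keep the sketch dependency-free. On the device itself, an array library would be the more practical choice, and the threshold for "too blurry" would have to be tuned by experiment.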
There were many online resources that helped me along the way, but these links proved to be the most helpful.
(In no particular order)
Also, many thanks to all the participants in the forums answering questions, and especially to the AWS DeepLens team for getting me unstuck numerous times! :)