Alex Smola Showcases the Breadth of AWS’s Machine Learning Capabilities at Collision 2018
“Ultimately, my creativity is very limited, but your creativity is large,” said Alex Smola, Amazon Web Services Director of Machine Learning, to the assembled audience at the 2018 Collision Conference in New Orleans. “You can do amazing things with the tools that we’ve built.”
And while Smola’s creativity is anything but limited, his talk drove home the point that an individual’s creativity is small compared to a network’s.
To give the audience members a sense of the ambition of the “Voice and Language in AWS” project, Smola kicked off his presentation by projecting a 16th-century diagram of the human head, with its abilities—speech, hearing, imagination—labeled in antique script.
“In a way, what we’re seeing right now is people designing services that try to make available, in a computational way, higher order functions of humans,” Smola stated. He then outlined exactly which Amazon services replicate these higher functions. “Polly for text-to-speech, Rekognition for computer vision, Comprehend for understanding what we read, Translate for translating it from one language into another, Lex for dialogues, and then Transcribe, of course, because at some point you need to hear things.”
Recreating the abilities of the human brain is a massive project, and the applications of these services are similarly broad. Smola touched on the business-oriented uses, such as finding out what customers want and personalizing content, as well as ways these services can bring people closer together and promote greater equality.
“You can use it to teach the next generation,” Smola added, “and […] it opens a broader range of data to people with limited senses. If I cannot hear, but if I can have something that transcribes voice for me, then all of a sudden I’ve gained another sense.”
Transcribe, as Smola described it, is a “huge enabler for people with disabilities.” More than a simple automated transcription service, Transcribe can detect multiple speakers and accept custom vocabularies, which makes it a powerful aid for the deaf community. Smola went into detail, showing how AWS services can form a “pipeline” that takes audio input through Amazon Transcribe, Amazon Comprehend, Amazon Athena, and Amazon QuickSight to produce accurate transcriptions and sophisticated analytics workflows.
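To make that pipeline concrete, here is a minimal boto3 sketch of its first two stages. The specifics are illustrative rather than from the talk: the bucket names, job name, and custom vocabulary are hypothetical, and the Athena and QuickSight stages, which would query and visualize the transcript data landed in S3, are left out.

```python
import boto3

transcribe = boto3.client("transcribe")
comprehend = boto3.client("comprehend")

# Stage 1: transcribe the audio, asking Transcribe to label speakers and to
# use a custom vocabulary (assumed to have been created beforehand).
transcribe.start_transcription_job(
    TranscriptionJobName="collision-demo-job",              # hypothetical
    Media={"MediaFileUri": "s3://example-audio/talk.mp3"},  # hypothetical bucket
    MediaFormat="mp3",
    LanguageCode="en-US",
    OutputBucketName="example-transcripts",                 # hypothetical bucket
    Settings={
        "ShowSpeakerLabels": True,
        "MaxSpeakerLabels": 2,
        "VocabularyName": "aws-service-names",              # hypothetical vocabulary
    },
)

# Stage 2: once the job finishes (polling omitted here), run the transcript
# text through Comprehend to pull out entities and sentiment.
transcript_text = "Polly turns text into lifelike speech."  # placeholder
entities = comprehend.detect_entities(Text=transcript_text, LanguageCode="en")
sentiment = comprehend.detect_sentiment(Text=transcript_text, LanguageCode="en")
print(entities["Entities"])
print(sentiment["Sentiment"])
```

From there, the JSON that Transcribe and Comprehend write to S3 can be queried with Athena and charted in QuickSight, which is what turns a pile of raw audio into a searchable workflow.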
He also went into the far-reaching applications of Amazon Polly. Not only does Polly turn text into speech, but, paired with Amazon Translate, it can voice translated text in a naturally accented voice, helping bridge international linguistic boundaries. Polly supports twenty-five languages, and it can resolve semantic ambiguity in text automatically: it differentiates, for example, between the nuanced pronunciations of the verb “live” and the adjective “live.”
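That part-of-speech disambiguation is exposed through SSML. Below is a small, hedged sketch using boto3; the voice and output file are arbitrary choices, not details from the talk. The tagged “live” is pinned to its verb reading, while the untagged one is left to Polly’s own contextual analysis.

```python
import boto3

polly = boto3.client("polly")

# SSML's <w> tag lets you pin a word to a part of speech; "amazon:VB" forces
# the verb reading of "live". The second "live" is left untagged, so Polly's
# own text analysis decides its pronunciation from context.
ssml = (
    "<speak>"
    "Where do you <w role='amazon:VB'>live</w>? "
    "The broadcast is live."
    "</speak>"
)

response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",
    OutputFormat="mp3",
    VoiceId="Joanna",  # arbitrary English voice choice
)

with open("live_vs_live.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```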
But even this isn’t enough for Alex Smola. “How do we make the applications conversational?” Smola wondered out loud. His answer: “Enter Amazon Lex.”
With Amazon Lex, developers can personalize a conversation based on a user’s social profile, give a bot a personality, branch the conversational flow based on user input, and use rich message formatting for a more engaging experience. This dynamic conversational ability has use cases across the board, from Internet of Things bots to chatbots for everyday consumer requests.
And Smola outlined the positive potential of this conversational ability. “You could call and find out what the weather’s like,” he said. “[…] If you cannot see very well, this is actually a good thing.”
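A call like that bottoms out in the Lex runtime API, which takes one utterance at a time and returns the bot’s next conversational turn. The sketch below is purely illustrative: it assumes a Lex (V1) bot named WeatherBot, with a prod alias, has already been built and published, which is not something shown in the talk.

```python
import boto3

lex = boto3.client("lex-runtime")  # Lex (V1) runtime client

response = lex.post_text(
    botName="WeatherBot",    # hypothetical bot, built and published beforehand
    botAlias="prod",         # hypothetical alias
    userId="demo-user-001",  # any stable ID; keys per-user session state
    inputText="What's the weather like in New Orleans?",
)

print(response["dialogState"])  # e.g. "ElicitSlot" while Lex gathers details
print(response["message"])      # the bot's reply, ready to be spoken by Polly
```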
We agree. Smola, as one individual, can foresee plenty of good things. But a whole network of users, harnessing their collective creativity, can create things none of us could imagine on our own.