AWS AI Blog

Amazon Lex Now Supports Telephony Audio (8 kHz) for Increased Speech Recognition Accuracy

by Victoria Kouyoumjian

To increase the accuracy of speech recognition for conversations over the phone, Amazon Lex now supports telephony audio (8 kHz). You can now employ the same deep learning technology as Amazon Alexa to converse with your applications and fulfill the most common requests. Amazon Lex maintains context and dynamically manages the dialogue, adjusting responses based on the conversation.

Amazon Lex integrates with Amazon Connect, a cloud-based contact center service that scales to meet your needs, so you can deploy chatbots to handle first-level customer support. Through this integration, you can solve many customer problems without involving a human operator. When necessary, Amazon Lex can transfer a customer support call to an agent, with full context.
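To give a feel for how telephony audio reaches the service, here is a rough sketch using the lex-runtime PostContent operation via boto3. The bot name, alias, user ID, and the exact 8 kHz media-type string are illustrative assumptions, not values from this post:

```python
# Media type for 8 kHz linear PCM telephony audio. The exact string is an
# assumption modeled on the Amazon Lex PostContent content-type format.
TELEPHONY_CONTENT_TYPE = (
    "audio/lpcm; sample-rate=8000; sample-size-bits=16; "
    "channel-count=1; is-big-endian=false"
)

def build_post_content_request(bot_name, bot_alias, user_id, audio_bytes):
    """Assemble keyword arguments for a lex-runtime PostContent call."""
    return {
        "botName": bot_name,
        "botAlias": bot_alias,
        "userId": user_id,
        "contentType": TELEPHONY_CONTENT_TYPE,
        "accept": "text/plain; charset=utf-8",
        "inputStream": audio_bytes,
    }

def send_caller_audio(audio_bytes):
    """Send captured telephony audio to a (hypothetical) Lex bot."""
    import boto3  # deferred so the helper above has no AWS dependency
    lex = boto3.client("lex-runtime")
    return lex.post_content(
        **build_post_content_request("SupportBot", "prod", "caller-123", audio_bytes)
    )
```

Call `send_caller_audio` from a machine with AWS credentials configured; the response includes the recognized intent and the bot's reply.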


Voice-Enabled Mobile Bot Drives Auto Industry Innovation with Real-Time Trade-in Values for Vehicles

by Harshal Pimpalkhute and Dennis Hills

The Kelley Blue Book Bot lets users get the real-time Kelley Blue Book® Trade-In Value for a vehicle using natural language, interacting through either voice or text. A simple question like, “Kelley Blue Book, can you tell me the trade-in value for my 2012 Honda Civic?” is all it takes to get expert car advice from an industry-leading automotive company. The bot is built with Amazon Lex and AWS Mobile Hub. After the conversation starts, Amazon Lex captures user input and manages the dialogue about the vehicle until it has all of the information it needs. Amazon Lex then calls the Kelley Blue Book API to retrieve the current Kelley Blue Book Trade-In Value based on the user’s location. AWS Mobile Hub makes it easy to integrate the bot into your mobile app.

In this post, we explain how we built the Kelley Blue Book Bot and then walk you through building your own bot with Amazon Lex. We also describe how to embed the bot in a fully functional iOS or Android mobile app using AWS Mobile Hub.

The basics

Before you dive into this post, we recommend that you review the basics of building a conversational bot in Amazon Lex: How It Works in the Amazon Lex Developer Guide. It’s worth noting that Amazon Lex uses the same technology that powers Amazon Alexa and can process both speech and text input. The bot that you create understands both input types, and you can support either or both ways of interacting.

The design

The premise of a chatbot is that you interact with a bot naturally, using voice or text. You ask the bot questions (or make demands, if you’d like), get answers, and complete sophisticated tasks. The Kelley Blue Book Bot is no different. The automotive experts at Kelley Blue Book have built an extensive vehicle database along with their own APIs to retrieve information from the database. Amazon Lex and AWS Mobile Hub enable a simplified user experience by making information from this database available through a conversational interface. During the conversation, Amazon Lex maintains context by keeping track of the intent, the questions asked, and the user’s responses. Amazon Lex is a fully managed service, so you don’t have to worry about designing or managing infrastructure. The bot can be made available to web, mobile, and enterprise customers.

In action

The interaction begins when the mobile user asks the Kelley Blue Book Bot for the trade-in value of a vehicle. The client captures this input and sends it to Amazon Lex. Amazon Lex converts the voice input to text, captures slot values, and validates the values using AWS Lambda. Amazon Lex manages the dialogue, dynamically adjusting its responses until it has all of the information it needs, and then sends the result to a Lambda function for fulfillment. The Lambda function queries a Kelley Blue Book API and responds to Amazon Lex with real-time vehicle data. The real magic here is the innovative way that a user interacts with an existing API using just voice.
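A fulfillment code hook for a flow like this might look roughly as follows. The event and response shapes follow the Lex (V1) Lambda formats; the `get_trade_in_value` helper is a hypothetical stand-in for the pricing API call, not the actual Kelley Blue Book integration:

```python
def lambda_handler(event, context):
    """Fulfillment code hook for an Amazon Lex (V1) intent.

    Reads the slot values Lex has collected, looks up a value, and
    returns a 'Close' dialog action that ends the conversation.
    """
    slots = event["currentIntent"]["slots"]
    value = get_trade_in_value(
        year=slots.get("VehicleYear"),
        make=slots.get("VehicleMake"),
        model=slots.get("VehicleModel"),
    )
    return close(
        f"The estimated trade-in value for your {slots.get('VehicleYear')} "
        f"{slots.get('VehicleMake')} {slots.get('VehicleModel')} is ${value:,}."
    )

def close(message_text):
    """Build a Lex V1 'Close' dialog action marking the intent fulfilled."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message_text},
        }
    }

def get_trade_in_value(year, make, model):
    # Placeholder: a real implementation would call the pricing API here.
    return 7500
```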

The details: intents, utterances, and slots, oh my!

The Kelley Blue Book Bot has one intent (VehicleMarketValue), which retrieves the vehicle’s market value. It contains utterances such as “Get Trade-In value for my {VehicleYear}, {VehicleMake}, and {VehicleModel}” to identify the user’s intent. To fulfill the business logic, an intent needs information, or ‘slots’. For example, to retrieve the market value, the VehicleMarketValue intent requires slots such as VehicleYear, VehicleMake, and VehicleModel. If a user replies with a slot value that includes additional words, such as “87,000 miles”, Amazon Lex can still understand the intended slot value (VehicleMileage:87000). The Kelley Blue Book Bot uses built-in slot types to capture certain user information. For example, AMAZON.NUMBER is a built-in slot type that is used for the {VehicleYear} and {VehicleMileage} slots. Amazon Lex provides an easy-to-use console to guide you through creating your own bot. Alternatively, you can build and connect to bots programmatically via the SDKs.
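To make the slot mechanics concrete, here is a sketch of a validation code hook that asks for the first missing slot and otherwise hands control back to Amazon Lex. The response shapes follow the Lex V1 Lambda format; the slot names come from the post, while the prompt wording is our own:

```python
def elicit_slot(intent_name, slots, slot_to_elicit, message_text):
    """Lex V1 response asking the user for a specific missing slot."""
    return {
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent_name,
            "slots": slots,
            "slotToElicit": slot_to_elicit,
            "message": {"contentType": "PlainText", "content": message_text},
        }
    }

def delegate(slots):
    """Hand control back to Amazon Lex to choose the next action."""
    return {"dialogAction": {"type": "Delegate", "slots": slots}}

def validate_vehicle_slots(slots):
    """Elicit the first missing vehicle slot, in a fixed order."""
    prompts = {
        "VehicleYear": "What year is your vehicle?",
        "VehicleMake": "What make is it?",
        "VehicleModel": "And the model?",
    }
    for name, prompt in prompts.items():
        if not slots.get(name):
            return elicit_slot("VehicleMarketValue", slots, name, prompt)
    return delegate(slots)
```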

Analyze and improve

Amazon Lex provides analytics so that you can see how customers are interacting with your bot and make improvements over time. In the console, on the Monitoring tab, you can track the number of utterances (for speech and text), the number of utterances that were not recognized (also known as missed utterances), and the request latency for your bot. The Utterances section provides details on detected and missed utterances. Choose the missed utterances to view the inputs that your bot didn’t recognize, and add them to the appropriate intent.
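The same missed-utterance data is available programmatically through the GetUtterancesView operation in the lex-models API. A sketch, with a hypothetical bot name and a small helper to flatten the nested response:

```python
def extract_missed_utterances(response):
    """Flatten a GetUtterancesView response into a list of utterance strings."""
    missed = []
    for version_data in response.get("utterances", []):
        for utterance in version_data.get("utterances", []):
            missed.append(utterance["utteranceString"])
    return missed

def list_missed_utterances(bot_name):
    """Fetch missed utterances for a bot's $LATEST version."""
    import boto3  # deferred so the helper above has no AWS dependency
    lex_models = boto3.client("lex-models")
    response = lex_models.get_utterances_view(
        botName=bot_name,
        botVersions=["$LATEST"],
        statusType="Missed",
    )
    return extract_missed_utterances(response)
```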

Let’s see what happens under the hood

The following graphic shows the voice interaction between a mobile user and the Kelley Blue Book Bot:


Find Distinct People in a Video with Amazon Rekognition

by Nicolas Malaval

Amazon Rekognition makes it easy to detect, search for, and compare faces in images to find matches. In this post, we show how to use Amazon Rekognition to find distinct people in a video and identify the frames that they appear in. You could use face detection in videos, for example, to identify actors in a movie, find relatives and friends in a personal video library, or track people in video surveillance.

First, we explain how the serverless solution finds distinct people in a video. Then, we explain how to implement the solution in your AWS account with AWS CloudFormation and how to test it with a sample video.
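As a taste of the face comparison this solution builds on, here is a minimal sketch using the Amazon Rekognition CompareFaces API via boto3. The frame file names are hypothetical, and the similarity threshold is an arbitrary choice for illustration:

```python
def is_same_person(compare_faces_response, similarity_threshold=90.0):
    """Decide whether CompareFaces found a match above the given similarity."""
    return any(
        match["Similarity"] >= similarity_threshold
        for match in compare_faces_response.get("FaceMatches", [])
    )

def compare_frames(source_path, target_path):
    """Compare the largest face in two video frames with Amazon Rekognition."""
    import boto3  # deferred so the helper above has no AWS dependency
    rekognition = boto3.client("rekognition")
    with open(source_path, "rb") as source, open(target_path, "rb") as target:
        response = rekognition.compare_faces(
            SourceImage={"Bytes": source.read()},
            TargetImage={"Bytes": target.read()},
            SimilarityThreshold=80,
        )
    return is_same_person(response)
```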

How it works

The following diagram shows how this solution works:


Using Amazon Polly to Deliver Health Care for People with Long-Term Conditions

by Michael Wray

This is a guest post by Michael Wray, senior software architect at Inhealthcare. Founded in 2012, Inhealthcare has created a digital infrastructure that supports remote home monitoring for the entire UK population.



Disclaimer: Amazon Polly is not an AWS HIPAA Eligible Service at the time of this writing. Consistent with the AWS Business Associate Addendum (BAA), Amazon Polly should not be used to create, receive, maintain, or transmit Protected Health Information (PHI) under the U.S. Health Insurance Portability and Accountability Act (HIPAA). It is each customer’s responsibility to determine whether they are subject to HIPAA, and if so, how best to comply with HIPAA and its implementing regulations.  Accounts that create, receive, maintain, or transmit PHI using a HIPAA Eligible Service should encrypt PHI as required under the BAA.  For a current list of HIPAA Eligible Services, and for more information generally, see the AWS HIPAA Compliance page.


With an aging population that continues to grow, healthcare is being changed forever. Are we ready for it? Which cost-effective technologies can we use to meet the ever-increasing demands on healthcare-related services?

With the right technology, many healthcare needs can be met remotely, and the National Health Service (NHS) in the UK is already putting this into practice. Although remote healthcare is far from widespread, innovative organizations are realizing that by tapping into low-cost digital health solutions, they can deliver great efficiencies at scale.

Despite being a dinosaur in the communications space, automated telephony can be a perfect communication channel to deploy services at scale because nearly everyone can use it, even if they do not have access to the internet or own a smartphone. And for many older people, the telephone is a piece of technology they are comfortable with and confident using.

In this post, we highlight how Inhealthcare has enabled NHS healthcare providers to leverage the capabilities of Amazon Polly in connection with remote communications. We show how Amazon Polly can be used at design time with our call script design tools to help design and simulate automated telephone calls. We illustrate how protocols can be built into automated telephone call scripts, how telephone calls are placed, and how synthesized speech is generated by Amazon Polly and streamed down the telephone line.

Inhealthcare provides a digital health platform that specializes in providing care in the UK, outside of hospital walls. The Inhealthcare platform connects to existing established healthcare software systems and enables clinical protocols and pathways to be modeled, created, tested, executed, and monitored. An important factor in delivering services remotely is choosing an appropriate communication method. While apps, wearables, and web access suit certain people, many individuals struggle with these advanced technologies. Simpler alternatives like text messaging or automated telephony provide a better solution. As a platform provider, we support all of these communication channels, but in this post we focus on how we use Amazon Polly with automated telephony.

IVR

IVR (interactive voice response) has been around for ages, and it is for this reason that nearly everybody knows how to use it. Whether you experienced it as a reminder to set your watch with the help of the speaking clock, or as a nuisance call asking you about the recent injury you didn’t have, like most people, you have experienced IVR. This is important when delivering healthcare on a national basis: it must be simple and inclusive. IVR enables two-way communication; the computer communicates with the human using a synthesized voice, and the human communicates with the computer using dual-tone multi-frequency (DTMF) codes. These are the tones you hear when you press the buttons on the keypad.
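Generating the synthesized side of such a call with Amazon Polly might look like the following sketch. Polly's SynthesizeSpeech operation can return raw PCM at an 8000 Hz sample rate, which matches telephony-quality audio; the voice choice and prompt text here are our own examples, not Inhealthcare's production setup:

```python
def build_synthesis_request(text, voice_id="Amy"):
    """Keyword arguments for Polly SynthesizeSpeech producing 8 kHz PCM,
    a raw format that a telephony stack can play down the line."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "pcm",  # signed 16-bit, mono
        "SampleRate": "8000",   # telephony-quality audio
    }

def synthesize_prompt(text, output_path):
    """Render an IVR prompt to a raw PCM file."""
    import boto3  # deferred so the helper above has no AWS dependency
    polly = boto3.client("polly")
    audio = polly.synthesize_speech(**build_synthesis_request(text))
    with open(output_path, "wb") as f:
        f.write(audio["AudioStream"].read())
```

For example, `synthesize_prompt("Please press one to confirm your appointment, or two to rebook.", "prompt.pcm")` would produce a file ready for playback over the call.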

How it works


Build PMML-based Applications and Generate Predictions in AWS

by Gitansh Chadha

If you generate machine learning (ML) models, you know that a key challenge is moving them between frameworks so that model generation and prediction can be handled separately. Many applications use PMML (Predictive Model Markup Language), an XML representation of a data mining model, to move ML models from one framework to another.

In this post, I show how to build a PMML application on AWS. First, you build a PMML model in Apache Spark using Amazon EMR and export it to an Amazon S3 bucket with a Spark application. After exporting the model, you can terminate the Amazon EMR cluster. Then, you use AWS Lambda with JPMML, a PMML producer and consumer library for the Java platform, to import the model and generate predictions. We use the Iris dataset from UC Irvine. It contains three classes, one for each species of iris, with 50 instances of irises per class.

For a list of PMML producer and consumer software, see the PMML Powered section of the Data Mining Group (DMG) website.

PMML application overview

The PMML application uses the MLlib K-means clustering algorithm in Apache Spark. MLlib K-means clustering is an unsupervised learning algorithm that tries to cluster data based on similarity. In K-means clustering, you have to specify the number of clusters that you want to group the data into. The Iris dataset contains three species, so you configure the algorithm to group the data into three clusters. You then train the model.
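To illustrate the two alternating steps at the heart of K-means (assign each point to its nearest centroid, then move each centroid to the mean of its points), here is a minimal plain-Python sketch. This is not the Spark MLlib API the post uses, and the toy 2-D data just stands in for the four Iris features:

```python
import random

def kmeans(points, k, iterations=20, seed=42):
    """A minimal K-means sketch over tuples of floats."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(
                    sum(dim) / len(cluster) for dim in zip(*cluster)
                )
    return centroids

# Toy 2-D data with three obvious groups, standing in for the Iris features.
data = [(0.1, 0.2), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1), (10.0, 0.3)]
centers = kmeans(data, k=3)
```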

Next, you export the model to a PMML document, which is stored in an S3 bucket. Spark has multiple options for exporting PMML. Starting with Spark MLlib 1.4, models that mix in the org.apache.spark.mllib.pmml.PMMLExportable trait can be exported directly. With Spark ML, you can also use the JPMML libraries to export PMML models.
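For orientation, a trimmed sketch of what an exported K-means PMML document might look like follows. The field names, centroid values, and attribute details are illustrative, not the exact output of Spark's exporter:

```xml
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="k-means clustering of the Iris dataset"/>
  <DataDictionary numberOfFields="4">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="sepal_width" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_width" optype="continuous" dataType="double"/>
  </DataDictionary>
  <ClusteringModel functionName="clustering" modelClass="centerBased"
                   numberOfClusters="3">
    <MiningSchema>
      <MiningField name="sepal_length"/>
      <MiningField name="sepal_width"/>
      <MiningField name="petal_length"/>
      <MiningField name="petal_width"/>
    </MiningSchema>
    <ComparisonMeasure kind="distance">
      <squaredEuclidean/>
    </ComparisonMeasure>
    <Cluster name="cluster_0">
      <Array n="4" type="real">5.01 3.42 1.46 0.24</Array>
    </Cluster>
    <!-- cluster_1 and cluster_2 elided -->
  </ClusteringModel>
</PMML>
```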

Note: Spark doesn’t support exporting all models to PMML. For a list of models that can be exported, see PMML model export – RDD-based API.

Finally, you import the generated PMML model into AWS Lambda. Lambda lets you build a cost-effective PMML application without provisioning or managing servers. You can also set up your application so that actions in other AWS services automatically trigger it, or call it directly from any web or mobile app.


AWS Partners with Mapillary to Support the Large-Scale Scene Understanding Challenge at CVPR 2017

by Joseph Spisak and Peter Kontschieder

On July 26, 2017, Mapillary, Princeton University, and others are hosting the Large-Scale Scene Understanding (LSUN) Challenge in conjunction with CVPR, the premier computer vision conference, in Honolulu, Hawaii. The LSUN Challenge and an associated workshop will bring computer vision researchers and practitioners together to solve problems with large-scale scene classification, scene segmentation, saliency prediction, and RGB-D detection.

Street-level image recognition is the foundation of next-generation applications, such as autonomous vehicles, delivery drones, and smart city projects. The unavailability of large-scale datasets with dense annotations is one of the biggest obstacles to making these applications a reality. In the LSUN Challenge, the world’s computer vision experts will leverage the new Mapillary Vistas dataset to help push the state-of-the-art forward.


In the Research Spotlight: Zornitsa Kozareva

by Victoria Kouyoumjian

As AWS continues to support the Artificial Intelligence (AI) community with contributions to Apache MXNet and the release of Amazon Lex, Amazon Polly, and Amazon Rekognition managed services, we are also expanding our team of AI experts, who have one primary mission: To lower the barrier to AI for all AWS developers, making AI more accessible and easy to use. As Swami Sivasubramanian, VP of Machine Learning at AWS, succinctly stated, “We want to democratize AI.”

In our Research Spotlight series, I spend some time with these AI team members for in-depth conversations about their experiences and get a peek into what they’re working on at AWS.


Dr. Zornitsa Kozareva joined AWS in June 2016 as the Manager of Applied Science for Deep Learning, focusing on natural language processing (NLP) and dialog applications. Zornitsa is a recipient of the John Atanasoff Award, which was given to her by the President of the Republic of Bulgaria in 2016 for her contributions and impact in science, education, and industry; the Yahoo! Labs Excellence Award in 2014; and the RANLP Young Researcher Award in 2011. You can read more about Dr. Kozareva on her website, or visit Google Scholar to find her 80 papers and 1,464 citations.

Getting into the field of natural language processing

Zornitsa’s interest in the field of natural language processing dates back to 2003, when she was doing her undergraduate studies in computer science in her native Bulgaria. In her third year of undergrad, she applied to the Leonardo Da Vinci Program, which is funded by the European Commission. She was selected to conduct research on multilingual information retrieval at the New University of Lisbon, Portugal. “This was a really great experience. I learned how to build a search engine; how to innovate, write, and publish scientific papers; and, most importantly, how to share my findings with the rest of the research community. For an undergrad such as myself, this opened my eyes to a brand new horizon.”

From that moment, Zornitsa says that she was “mesmerized by machine learning and its ability to solve natural language problems. I became super passionate about the field and I decided that I wanted to pursue a PhD in NLP.”

In 2004, Zornitsa went to Spain for graduate studies, where she worked on “a wide spectrum of topics, including information extraction, semantics, and question answering. This is how my career in NLP started.”

While working toward her PhD, Zornitsa had the opportunity to do a full-year internship. “I picked the Information Sciences Institute, located in Los Angeles, because I wanted to work with world-renowned leaders in the NLP field, such as Dr. Eduard Hovy. For a year, I worked with Dr. Hovy and Dr. Ellen Riloff conducting research on knowledge extraction. It was a great learning experience, and I also received valuable career advice. Right after I graduated, I decided that I wanted to come back to the US and continue to enhance my scientific career.”


Build a Real-time Object Classification System with Apache MXNet on Raspberry Pi

by Aran Khanna

In the past five years, deep neural networks have solved many computationally difficult problems, particularly in the field of computer vision. Because deep networks require a lot of computational power to train, often using tens of GPUs, many people assume that you can run them only on powerful cloud servers. In fact, after a deep network model has been trained, it needs relatively few computational resources to run predictions. This means that you can deploy a model on lower-powered edge (non-cloud) devices and run it without relying on an internet connection.

Enter Apache MXNet, Amazon’s open source deep learning engine of choice. In addition to effectively handling multi-GPU training and deployment of complex models, MXNet produces very lightweight neural network model representations. You can deploy these representations on devices with limited memory and compute power. This makes MXNet perfect for running deep learning models on devices like the popular $35 Raspberry Pi computer.

In this post, we walk through creating a computer vision system using MXNet for the Raspberry Pi. We also show how to use AWS IoT to connect to the AWS Cloud. This allows you to use the Cloud to manage a lightweight convolutional neural network running real-time object recognition on the Pi.

Prerequisites

To follow this post, you need a Raspberry Pi 3 Model B device running Jessie or a later version of the Raspbian operating system, the Raspberry Pi Camera Module v2, and an AWS account.

Setting up the Raspberry Pi

First, you set up the Pi with the camera module to turn it into a video camera, and then install MXNet. This allows you to start running deep network-based analysis on everything that the Pi “sees.”

Set up your Pi with the Camera Module and connect the device to the Internet, either through the Ethernet port or with WiFi. Then, open the terminal and type the following commands to install the Python dependencies for this post:

sudo apt-get update
sudo apt-get install python-pip python-opencv python-scipy \
python-picamera

Build MXNet for the Pi with the corresponding Python bindings by following the instructions for Devices. For this tutorial, you won’t need to build MXNet with OpenCV.
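Once MXNet is built, a capture-and-classify step might look like the following sketch. The checkpoint name `squeezenet_v1.1` is an example (any pretrained image model whose `.json`/`.params` files you have on the Pi would work), and a zero-filled array stands in for a camera frame; only the preprocessing helper is dependency-free:

```python
from collections import namedtuple

import numpy as np

# MXNet's Module.forward only needs an object with a .data attribute.
Batch = namedtuple("Batch", ["data"])

def preprocess(frame):
    """Convert an HxWx3 uint8 RGB frame into the 1x3xHxW float32 batch
    layout that typical MXNet image models expect."""
    chw = frame.astype(np.float32).transpose(2, 0, 1)  # HWC -> CHW
    return chw[np.newaxis, :]                          # add batch dimension

def run_demo():
    """Classify a single frame with a pretrained checkpoint on the Pi."""
    import mxnet as mx  # the MXNet build from the step above

    sym, arg_params, aux_params = mx.model.load_checkpoint("squeezenet_v1.1", 0)
    mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
    mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
    mod.set_params(arg_params, aux_params, allow_missing=True)

    frame = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a camera frame
    mod.forward(Batch([mx.nd.array(preprocess(frame))]))
    probabilities = mod.get_outputs()[0].asnumpy()[0]
    print("top class index:", int(probabilities.argmax()))
```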


“Greetings, visitor!” — Engage Your Web Users with Amazon Lex

by Niranjan Hira

All was well with the world last night. You went to bed thinking about convincing your manager to add some time in the next sprint for much-needed improvements to the recommendation engine for shoppers on your website. The machine learning models are out of date and people are complaining, but no one is looking past the one-off tickets that stream in every day. You wake up to the usual flurry of email.

But what’s this? You learn that the Chief Marketing Officer is at an industry conference where she’s heard the buzz about conversational experiences. She just tried out some chatbots, and now she wants one for the site. She wants to connect with shoppers one-on-one to offer them a personalized experience. That’s a fun technology problem. As long as the management team hires someone to help with the look and feel, you can focus on the fun part of putting the chatbot together.

In this post, we show how easy it is to create a chatbot and a personalized web experience for your customers using Amazon Lex and other AWS services.

What do you need to prove?

Personalized experience covers a lot of ground, but you have ideas. You could create a virtual shopping assistant that can answer questions about products; check colors, styles, and pricing; offer product recommendations; bring up relevant deals; remember shopping preferences; look up ratings and reviews (the most useful and recent reviews first, of course); or maybe even talk about what the Twitterverse thinks. But first you have to nail the basic stuff, like “Do you have this in red?”, “Where can I get it?”, and “What’s the return policy?”

Basically, you need to prove:

  1. That you can build a bot quickly (check, you have Amazon Lex for that)
  2. That you can integrate your bot with the site (and later on, you might use AWS Lambda to connect to other apps)
  3. That it’s easy to monitor the bot and update it (you’re not really sure about this one)

For starters, you decide to keep it simple: build an example bot using Amazon Lex, wire it up to static HTML, connect it to a stub service, and see what it takes to update the bot. This is going to be fun!

Build an Amazon Lex bot

The specific bot isn’t important. You just want to make sure that you can put together a web experience that integrates with a service on the backend. You can start with the Amazon Lex BookTrip example. It takes a couple of minutes, but when you’re done, you’re ready to test the “Return parameters to client” (no code hooks yet) version of the bot. San Francisco for two nights, anyone?

Next, you follow the instructions to use a blueprint to create a Lambda function (BookTripCodeHook) that will serve as the code hook for initialization, data validation, and fulfillment activities. You use the Test events from the Sample event template list to confirm that the code works as expected and that you don’t have any setup or permissions issues.
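From a backend service, exercising the bot is a single PostText call per user turn. A sketch via boto3; using the `$LATEST` alias for an unpublished bot is an assumption for testing, and the user ID is whatever session identifier your site assigns:

```python
def build_post_text_request(user_id, text,
                            bot_name="BookTrip", bot_alias="$LATEST"):
    """Keyword arguments for a lex-runtime PostText call."""
    return {
        "botName": bot_name,
        "botAlias": bot_alias,
        "userId": user_id,
        "inputText": text,
    }

def send_user_message(user_id, text):
    """Forward one turn of the web conversation to the Lex bot."""
    import boto3  # deferred so the helper above has no AWS dependency
    lex = boto3.client("lex-runtime")
    response = lex.post_text(**build_post_text_request(user_id, text))
    # dialogState tells you whether Lex is still eliciting slots,
    # confirming the intent, or has fulfilled it.
    return response["dialogState"], response.get("message")
```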


In the Research Spotlight: Hassan Sawaf

by Victoria Kouyoumjian

As AWS continues to support the Artificial Intelligence (AI) community with contributions to Apache MXNet and the release of Amazon Lex, Amazon Polly, and Amazon Rekognition managed services, we are also expanding our team of AI experts, who have one primary mission: To lower the barrier to AI for all AWS developers, making AI more accessible and easy to use. As Swami Sivasubramanian, VP of Machine Learning at AWS, succinctly stated, “We want to democratize AI.”

In our Research Spotlight series, I spend some time with these AI team members for in-depth conversations about their experiences and get a peek into what they’re working on at AWS.


Hassan Sawaf has been with Amazon since September 2016. This January, he joined AWS as Director of Applied Science and Artificial Intelligence.

Hassan has worked in automatic speech recognition, computer vision, natural language understanding, and machine translation for more than 20 years. In 1999, he cofounded AIXPLAIN AG, a company focusing on speech recognition and machine translation. His partners included Franz Josef Och, who eventually started the Google Translate team; Stephan Kanthak, now Group Manager with Nuance Communications; and Stefan Ortmanns, today Senior Vice President, Mobile Engineering and Professional Services with Nuance Communications. Hassan also spent time at SAIC as Chief Scientist for Human Language Technology, where he worked on multilingual spoken dialogue systems. Coincidentally, his peer from Raytheon BBN Technologies was Rohit Prasad, who is now VP and Head Scientist for Amazon Alexa.

How did you get started?

“I started working in development on information systems in airports, believe it or not. Between airlines and airports, and from airport to airport, communication used to be via Telex messages, using something similar to “shorthand” information about the plane. These messages included information such as: Who has boarded the plane? What’s the cargo? How is the baggage distributed on the plane? How much fuel does it have? What kinds of passengers (first class, business class) are on board? This kind of information was sent from airline to airport before the plane landed. But by the 1990s, flight travel had grown exponentially, and until then humans had to read this information and translate it into actions in the airport. So we built technology that could do this fully automatically, so that manual human intervention was no longer needed. People no longer needed to sit there reading Telex messages and typing away at the computer. We converted the process so that it was completely done by machine. This was my first project in natural language understanding.
