AWS AI Blog

Activity Tracking with a Voice-Enabled Bot on AWS

by Bob Strahan, Oliver Atoa, and Bob Potterveld

It’s New Year’s Eve. Your friends and loved ones have gone to the party, but you can’t go just yet because you haven’t figured out how to track the key performance indicators for your New Year’s resolution.

You’ve already divided your resolution into categories, and you’ve set personal targets for each category. Now you just need to log your activities and routinely calculate how you’re doing, so you stay on track. But you’ve been down this road before. You know that in a few short days log keeping will become tedious. You’ll start putting it off, and then you’ll forget. Before you know it, your resolution has gone the way of so many resolutions before it.

We’ve all been there. Right?

This year, you want a new way to log your activities. A fun and easy way that will keep you engaged and prevent the procrastination that has so often proved disastrous. The midnight deadline is approaching – you need to implement this thing quickly so you can get to the party and celebrate the arrival of the New Year, secure in the knowledge that this year will be different!

In this post, we provide a solution to this perennial problem: a sample tracking bot application, called TrackingBot, which lets you log your activities by talking to it. The following flowchart shows the steps required to develop and use TrackingBot:


Capturing Voice Input in a Browser and Sending it to Amazon Lex

by Andrew Lafranchise

Ever since we released Amazon Lex, customers have asked us how to embed voice into a web application. In this blog post, we show how to build a simple web application that uses the AWS SDK for JavaScript to do that. The example application, which users can access from a browser, records audio, sends the audio to Amazon Lex, and plays the response. Using browser APIs and JavaScript, we show how to request access to a microphone, record audio, downsample the audio, and PCM-encode it as a WAV file. As a bonus, we show how to implement silence detection and audio visualization, which are essential to building a user-friendly audio control.

Prerequisites

This post assumes you have some familiarity with JavaScript, the AWS SDK for JavaScript, and Amazon Lex.

Don’t want to scroll through the details? You can download the example application here: https://github.com/awslabs/aws-lex-browser-audio-capture

The following sections describe how to accomplish important pieces of the audio capture process. You don’t need to copy/paste them–they are intended as a reference. You can see everything working together in the example application.

Requesting access to a microphone with the MediaDevices API

To capture audio content in a browser, you need to request access to an audio device, in this case, the microphone. To access the microphone, you use the navigator.mediaDevices.getUserMedia method in the MediaDevices API. To process the audio stream, you use the AudioContext interface in the Web Audio API. The code that follows performs these tasks:

  1. Creates an AudioContext
  2. Calls the getUserMedia method and requests access to the microphone. The getUserMedia method is supported in Chrome, Firefox, Edge, and Opera. We tested the example code in Chrome and Firefox.
  3. Creates a media stream source and a Recorder object. More about the Recorder object later.
  // control.js

  /**
   * Audio recorder object. Handles setting up the audio context,
   * accessing the mic, and creating the Recorder object.
   */
  lexaudio.audioRecorder = function() {

    var audio_context, audio_stream;

    /**
     * Creates an audio context and calls getUserMedia to request the mic (audio).
     * If the user denies access to the microphone, the returned Promise is
     * rejected with a PermissionDeniedError.
     * @returns {Promise}
     */
    var requestDevice = function() {

      if (typeof audio_context === 'undefined') {
        window.AudioContext = window.AudioContext || window.webkitAudioContext;
        audio_context = new AudioContext();
      }

      return navigator.mediaDevices.getUserMedia({ audio: true })
        .then(function(stream) {
          audio_stream = stream;
        });
    };

    var createRecorder = function() {
      // worker is the Web Worker that encodes the audio; it is created
      // elsewhere in the example application.
      return recorder(audio_context.createMediaStreamSource(audio_stream), worker);
    };

    return {
      requestDevice: requestDevice,
      createRecorder: createRecorder
    };

  };

The code snippet illustrates the following important points:

  • The user has to grant us access to the microphone. Most browsers request this with a pop-up. If the user denies access to the microphone, the returned Promise is rejected with a PermissionDeniedError.
  • In most cases, you need only one AudioContext instance. Browsers set limits on the number of AudioContext instances you can create and throw exceptions if you exceed them.
  • We use a few elements and APIs (audio element, createObjectURL, and AudioContext) that require thorough feature detection in a production environment.
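The introduction also mentions downsampling, PCM/WAV encoding, and silence detection, which the snippet above doesn't show. The following sketch illustrates one way to implement those pieces; the function names (`downsampleBuffer`, `encodeWav`, `createSilenceDetector`) and thresholds are ours, not taken from the example application:

```javascript
// Average each window of source samples into one output sample.
// Amazon Lex accepts 16 kHz, 16-bit, mono PCM audio.
function downsampleBuffer(buffer, sourceRate, targetRate) {
  if (targetRate >= sourceRate) {
    return buffer; // nothing to do
  }
  var ratio = sourceRate / targetRate;
  var newLength = Math.round(buffer.length / ratio);
  var result = new Float32Array(newLength);
  var offsetResult = 0;
  var offsetBuffer = 0;
  while (offsetResult < newLength) {
    var nextOffsetBuffer = Math.round((offsetResult + 1) * ratio);
    var sum = 0, count = 0;
    for (var i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
      sum += buffer[i];
      count++;
    }
    result[offsetResult] = count > 0 ? sum / count : 0;
    offsetResult++;
    offsetBuffer = nextOffsetBuffer;
  }
  return result;
}

// Write a 44-byte WAV header followed by 16-bit little-endian PCM samples.
function encodeWav(samples, sampleRate) {
  var view = new DataView(new ArrayBuffer(44 + samples.length * 2));
  var writeString = function(offset, str) {
    for (var i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true); // data chunk size
  for (var j = 0; j < samples.length; j++) {
    var s = Math.max(-1, Math.min(1, samples[j])); // clamp to [-1, 1]
    view.setInt16(44 + j * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  return view;
}

// Report silence once the RMS amplitude stays below a threshold for
// maxSilentFrames consecutive audio frames.
function createSilenceDetector(threshold, maxSilentFrames) {
  var silentFrames = 0;
  return function(samples) {
    var sum = 0;
    for (var i = 0; i < samples.length; i++) {
      sum += samples[i] * samples[i];
    }
    var rms = Math.sqrt(sum / samples.length);
    silentFrames = rms < threshold ? silentFrames + 1 : 0;
    return silentFrames >= maxSilentFrames; // true => stop recording
  };
}
```

In the example application, the equivalent encoding work runs inside a Web Worker (the `worker` object referenced earlier) so that it doesn't block the UI thread.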


Updated AWS Deep Learning AMIs with Apache MXNet 0.10 and TensorFlow 1.1 Now Available

by Victoria Kouyoumjian

You can now use Apache MXNet v0.10 and TensorFlow v1.1 with the AWS Deep Learning AMIs for Amazon Linux and Ubuntu. Apache MXNet announced version 0.10, available at http://mxnet.io, with significant improvements to documentation and tutorials, including updated installation guides for running MXNet on various operating systems and environments, such as NVIDIA’s Jetson TX2. In addition, current tutorials have been augmented with definitions for basic concepts around foundational development components. API documentation is now more comprehensive, with accompanying samples. Python pip install packages are now available for the v0.10 release, making it easy to install MXNet on Mac OS X or Linux, in CPU or GPU environments. These packages also include Intel’s Math Kernel Library (MKL) support for accelerating math routines on Intel CPUs.

Visit the AWS Marketplace to get started with the AWS Deep Learning AMI v1.4_Jun2017 for Ubuntu and the AWS Deep Learning AMI v2.2_Jun2017 for Amazon Linux. The AWS Deep Learning AMIs are available in the following public AWS regions: US East (N. Virginia), US West (Oregon), and EU (Ireland).

Tuning Your DBMS Automatically with Machine Learning

by Dana Van Aken, Geoff Gordon, and Andy Pavlo

This is a guest post by Dana Van Aken, Andy Pavlo, and Geoff Gordon of Carnegie Mellon University. This project demonstrates how academic researchers can leverage our AWS Cloud Credits for Research Program to support their scientific breakthroughs.

Database management systems (DBMSs) are the most important component of any data-intensive application. They can handle large amounts of data and complex workloads. But they’re difficult to manage because they have hundreds of configuration “knobs” that control factors such as the amount of memory to use for caches and how often to write data to storage. Organizations often hire experts to help with tuning activities, but experts are prohibitively expensive for many.

OtterTune, a new tool that’s being developed by students and researchers in the Carnegie Mellon Database Group, can automatically find good settings for a DBMS’s configuration knobs. The goal is to make it easier for anyone to deploy a DBMS, even those without any expertise in database administration.

OtterTune differs from other DBMS configuration tools because it leverages knowledge gained from tuning previous DBMS deployments to tune new ones. This significantly reduces the amount of time and resources needed to tune a new DBMS deployment. To do this, OtterTune maintains a repository of tuning data collected from previous tuning sessions. It uses this data to build machine learning (ML) models that capture how the DBMS responds to different configurations. OtterTune uses these models to guide experimentation for new applications, recommending settings that improve a target objective (for example, reducing latency or improving throughput).

In this post, we discuss each of the components in OtterTune’s ML pipeline, and show how they interact with each other to tune a DBMS’s configuration. Then, we evaluate OtterTune’s tuning efficacy on MySQL and Postgres by comparing the performance of its best configuration with configurations selected by database administrators (DBAs) and other automatic tuning tools.

OtterTune is an open source tool that was developed by students and researchers in the Carnegie Mellon Database Research Group. All code is available on GitHub, and is licensed under Apache License 2.0.

How OtterTune works

The following diagram shows the OtterTune components and workflow.

At the start of a new tuning session, the user tells OtterTune which target objective to optimize (for example, latency or throughput). The client-side controller connects to the target DBMS and collects its Amazon EC2 instance type and current configuration.

Then, the controller starts its first observation period, during which it observes the DBMS and records the target objective. When the observation period ends, the controller collects internal metrics from the DBMS, like MySQL’s counters for pages read from disk and pages written to disk. The controller returns both the target objective and the internal metrics to the tuning manager.


In the Research Spotlight: Edo Liberty

by Victoria Kouyoumjian

As AWS continues to support the Artificial Intelligence (AI) community with contributions to Apache MXNet and the release of Amazon Lex, Amazon Polly, and Amazon Rekognition managed services, we are also expanding our team of AI experts, who have one primary mission: To lower the barrier to AI for all AWS developers, making AI more accessible and easy to use. As Swami Sivasubramanian, VP of Machine Learning at AWS, succinctly stated, “We want to democratize AI.”

In our Research Spotlight series, I spend some time with these AI team members for in-depth conversations about their experiences and get a peek into what they’re working on at AWS.


Edo Liberty is a Principal Scientist at Amazon Web Services (AWS) and the manager of the Algorithms Group at Amazon AI. His work has received more than 1000 citations since 2012. You can view his conference and journal publications, patents, and manuscripts in progress on www.edoliberty.com, or access his papers on Google Scholar.

Although fully immersed in AI today, Edo shared with me that he originally wanted to be a physicist when he started college in Tel Aviv. “I knew absolutely nothing about computers and I felt I would be a lousy physicist if I didn’t learn to code, at least a little bit.” So, he minored in computer science, “even though I knew I was going to hate it,” he admitted. “But the more I learned, physics became more numerical, technical, and counterintuitive as relativity and quantum mechanics kicked in. At the same time, computer science became less technical and more abstract and beautiful. Computer science stopped being about software, and started being about algorithms and math and complexity and that’s when it became very interesting. I ended up shifting to a major in computer science, and really fell in love with this whole field.”

In 2004, Edo moved to the United States, and completed his PhD in computer science at Yale University, where he started getting into a lot of different types of machine learning, data science, and data mining. He ended up most interested in math and algorithms – specifically, the theory and the algorithms behind big data. “Back then, we were working on hyperspectral images. Every image was about 1½ GB, but my desktop only had 512 MB of memory. That was big data for me! But I still needed to analyze the image, so I had to really figure out what to do.”

Edo finished his PhD doing theoretical computer science, and then completed a post-doctorate in applied math. He then opened a startup in New York City, building a distributed video search platform “with lots of algorithms and systems and math, and it was very exciting and a lot of fun.”


Personalizing Videos: BeeLiked uses Amazon Polly to Launch the #DanBrownOrigin campaign, the World’s First Virtual Book Signing

by Robin Dautricourt

Just as Dan Brown has captivated millions of readers through countless plot twists and turns, the launch of his new novel, Origin, will lead you along an inspired journey that guarantees to speak to you and draw you in. Literally.

The author of the 2003 best-seller The Da Vinci Code invites you to participate in selecting the book cover design for a limited edition of his novel, to be released on October 3, 2017. In return for casting your vote, you will receive a personalized video in which you will be greeted by name, and witness Dan Brown signing a copy of his new book, just for you.

The magic behind the #DanBrownOrigin experience is produced by BeeLiked, a self-service social pollination platform that is rewriting the script for launching engaging marketing campaigns. The secret behind the voice that greets each fan by name, in their very own personalized video, is Amazon Polly. To get further behind the scenes, and to learn more about how BeeLiked pulled off this amazing opportunity for millions of loyal fans to connect with Dan Brown, read the full blog post about the #DanBrownOrigin campaign.


In the Research Spotlight: Mu Li

by Victoria Kouyoumjian

As AWS continues to support the Artificial Intelligence (AI) community with contributions to Apache MXNet and the release of Amazon Lex, Amazon Polly, and Amazon Rekognition managed services, we are also expanding our team of AI experts, who have one primary mission: To lower the barrier to AI for all AWS developers, making AI more accessible and easy to use. As Swami Sivasubramanian, VP of Machine Learning at AWS, succinctly stated, “We want to democratize AI.”

In our Research Spotlight series, I spend some time with these AI team members for in-depth conversations about their experiences and get a peek into what they’re working on at AWS.


Mu Li is a principal scientist for machine learning at AWS. Before joining AWS, he was the CTO of Marianas Labs, an AI start-up. He also served as a principal research architect at the Institute of Deep Learning at Baidu. He obtained his PhD in computer science from Carnegie Mellon University, where one of his advisors was Alex Smola, now Director of Machine Learning at AWS. Mu’s research has focused on large-scale machine learning. In particular, he is interested in the co-design of distributed systems and machine learning algorithms. He has been the first author of computer science conference and journal papers on subjects that span theory (FOCS), machine learning (NIPS, ICML), applications (CVPR, KDD), and operating systems (OSDI).

At AWS, Mu leads a team that works primarily on the Apache MXNet framework. Their focus is making it easier to use deep learning and to run deep learning applications on AWS. To accomplish this, Mu and his team are charting new territory in deep learning research, investigating and simplifying new algorithms that can run on large-scale datasets in distributed systems. “The speed of machine learning training depends on two things: how fast you can process images and how fast you can process the final model,” says Mu. The framework should support using multiple GPUs and multiple machines. “The latter is related to optimization–what we call the convergence rate. When we move from a single machine to multiple machines, we need to develop new distributed training algorithms. We need to change the algorithm itself–to change the neural network structure–so that it can be easily used to train very large datasets on a large number of machines.”


Join Us as We Go Deep with AI on AWS at These Upcoming Events

by Victoria Kouyoumjian

For our customers in Europe, Julien Simon, Principal Technical Evangelist at Amazon Web Services, is leading workshops and speaking at several events throughout the month. A frequent speaker at workshops and conferences, Julien takes participants on a journey through deep learning with AWS, Amazon Lex, Amazon Polly, Amazon Rekognition, Apache MXNet, and more. If you’re attending any of these events, please join us for a great conversation.

More information:

May 20, 2017 – AI on a Pi – Dev It, Thessaloniki (Greece) 

May 23, 2017 – Amazon AI – AWS Transformation Day, Utrecht (Netherlands)

May 30, 2017 – Amazon AI – Sharks in IT, Sofia (Bulgaria)

May 30, 2017 – Deep Learning with Apache MXNet –  AWS User Group Sofia (Bulgaria) 

Integrate Your Amazon Lex Bot with Any Messaging Service

by Ahmad Khan

Is your Amazon Lex chatbot ready to talk to the world? When it is, chances are that you’ll want it to be able to interact with as many users as possible. Amazon Lex offers built-in integration with Facebook, Slack, and Twilio. But what if you want to connect to a messaging service that isn’t supported? Well, there’s an API for that–the Amazon Lex API. In this post, I show how to integrate an Amazon Lex bot with an external messaging service by using Twilio Programmable SMS as the example service.

You can integrate any messaging service that provides the right APIs with Amazon Lex using the design pattern described in this post. The solution includes a serverless middle tier or a preprocessing layer “in front of” Amazon Lex. This is useful if you want to incorporate Amazon Lex as another building block into your systems. For example, if you’re in an enterprise, you could use this solution to implement custom message routing to specialized bots developed by different business units.

For simpler use cases, the built-in integration for Twilio in the Amazon Lex console might be a better option.

Architecture and message flow

For this integration, I chose a serverless architecture that uses Amazon API Gateway and AWS Lambda to robustly and scalably integrate the Amazon Lex bot with the Twilio messaging service. Going serverless means that you don’t have to worry about managing individual instances, and that you incur costs only for the resources that your application uses. API Gateway provides the secure API endpoint for a Lambda function that implements your business logic.


In the Research Spotlight: Anima Anandkumar

by Victoria Kouyoumjian

As AWS continues to support the Artificial Intelligence (AI) community with contributions to Apache MXNet and the release of Amazon Lex, Amazon Polly, and Amazon Rekognition managed services, we are also expanding our team of AI experts, who have one primary mission: To lower the barrier to AI for all AWS developers, making AI more accessible and easy to use. As Swami Sivasubramanian, VP of Machine Learning at AWS, succinctly stated, “We want to democratize AI.”

In our Research Spotlight series, I spend some time with these AI team members for in-depth conversations about their experiences and get a peek into what they’re working on at AWS.


Anima Anandkumar joined AWS in November 2016, as Principal Scientist on Deep Learning. She is currently on leave from the EECS Department at UC Irvine, where she has been an associate professor since August 2010. Anima has earned several prestigious awards, including the Alfred P. Sloan Research Fellowship, the NSF CAREER award, and Young Investigator Research awards from the Army Research Office and the Air Force Office of Scientific Research. Her research interests include large-scale machine learning, non-convex optimization, and high-dimensional statistics. In particular, she’s been spearheading the development and analysis of tensor algorithms.

“My mission is to make machine learning accessible to everyone on the planet, and AWS is an awesome place to achieve that.” She went on to explain that she wants to remove the guesswork for launching large-scale machine learning jobs, so that you don’t have to be an expert in machine learning, application domains, or programming, especially because it’s humanly impossible for one person to have all these skill sets. As Anima notes, there is a huge gap between formulating theories and going into production with a machine learning workload. Her goal is to shrink the gap from prototyping to deployment.

One of the tools that Anima plans to work with is Apache MXNet. She wants to add a lot more functionality to exploit Apache MXNet’s programmability and ease of use. “Our roadmap includes operations that surpass the existing deep learning framework. We want to develop multi-modal processing algorithms.” Multi-modal processing allows an algorithm to simultaneously process text, images, and other modalities at scale.
