AWS Deep Learning AMI Now Supports PyTorch, Keras 2 and Latest Deep Learning Frameworks

Today, we’re pleased to announce an update to the AWS Deep Learning AMI.

The AWS Deep Learning AMI, which lets you spin up a complete deep learning environment on AWS in a single click, now includes PyTorch, Keras 1.2 and 2.0 support, along with popular machine learning frameworks such as TensorFlow, Caffe2 and Apache MXNet.

Using PyTorch for fast prototyping

The AMI now includes PyTorch 0.2.0, allowing developers to create dynamic neural networks in Python, a good fit for dynamic inputs such as text and time series. Developers can get started quickly with beginner and advanced tutorials, including one on setting up distributed training with PyTorch.

Improved Keras support

The AMI now supports the most recent version of Keras, v2.0.8. By default, your Keras code will run against TensorFlow as a backend; you can also swap to other supported backends such as Theano and CNTK. We’ve also included a modified version of Keras 1.2.2 which runs on the Apache MXNet backend with better training performance.
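Swapping Keras backends is a one-line change: Keras reads the `KERAS_BACKEND` environment variable at import time (it can also be set in the `~/.keras/keras.json` config file). A minimal example:

```python
import os

# Select the Keras backend before the first `import keras`.
# Valid values include "tensorflow", "theano", and "cntk".
os.environ["KERAS_BACKEND"] = "theano"

# import keras  # Keras now initializes against the Theano backend
```

Note that the variable must be set before Keras is imported; changing it afterward has no effect in the same process.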

Pre-installed and configured with the latest frameworks

This release of the AMI includes support for the latest versions of the following frameworks:

  • Apache MXNet 0.11.0
  • TensorFlow 1.3.0
  • Caffe2 0.8.0
  • Caffe 1.0
  • PyTorch 0.2.0
  • Keras 2.0.8
  • Keras 1.2.2 (DMLC fork) for Apache MXNet
  • Theano 0.9.0
  • CNTK 2.0
  • Torch (master branch)

It is also packaged with the following pre-configured libraries for GPU acceleration:

  • CUDA Toolkit 8.0
  • cuDNN 5.1
  • NVIDIA Driver 375.66
  • NCCL 2.0


Your Guide to Machine Learning at re:Invent 2017

re:Invent 2017 is almost here! As you plan your agenda, machine learning is undoubtedly a hot topic on your list. This year we have a lot of great technical content in the Machine Learning track, with over 50 breakout sessions, hands-on workshops, labs, and deep-dive chalk talks. You’ll hear first-hand from customers and partners about their success with machine learning, including Facebook, NVIDIA, TuSimple, Visteon, Matroid, Butterfleye, Open Influence, Whooshkaa, Infor, and Toyota Racing Development.

This year we’re hosting our inaugural Deep Learning Summit, where thought leaders, researchers, and venture capitalists share their perspectives on the direction in which deep learning is headed. In addition, you can take part in our deep-learning-powered Robocar Rally. Join the rally to get first-hand experience building your own autonomous vehicle and competing in an AI-powered race.

Here are a few highlights of this year’s lineup from the re:Invent session catalog to help you plan your event agenda.

  • Robocar Rally 2017
  • Deep Learning Summit
  • Deep Learning on AWS
  • Computer Vision
  • Language & Speech 

Robocar Rally 2017
Get behind the keyboard at Robocar Rally 2017, a hackathon offering hands-on experience with deep learning, autonomous cars, and Amazon AI and IoT services. You’ll learn the latest in autonomous car technology from industry leaders and join a pit crew to customize, train, and race your own 1/16th scale autonomous car. Follow along on the road to the rally with a series of Twitch streams and blog posts that accelerate your learning on how to build, deploy, and train your car. Teams will be provided a car to race in time trials on the re:Invent Circuit, and you can also build and bring your own car for separate exhibition races.

Robocar Rally 2017 is a two-day hackathon at the Aria that starts on Sunday, November 26, from 6pm to 10pm, followed by a full day of hacking on Monday, November 27, from 7am to 10pm, with a final race from 10pm to 12am.

Deep Learning Summit

The Deep Learning Summit is designed for developers interested in learning more about the latest in deep learning applied research and emerging trends. Attendees will hear from industry thought leaders—members of the academic and venture capital communities—who will share their perspectives on deep learning trends and emerging centers of gravity. The Summit will be held on Thursday November 30th at the Aria Hotel.


Amazon Polly Expands to the Asia Pacific (Tokyo) Region and Adds Two New Voices

Amazon Polly is an AWS service that turns text into lifelike speech. Today, we are excited to announce the expansion to the Asia Pacific (Tokyo) Region, as well as the release of two new Text-to-Speech voices. We are pleased to present Takumi, a new Japanese male voice, and Matthew, a new US English male voice.

When we launched Amazon Polly in November 2016, we offered a portfolio of 47 voices spread across 24 languages. Since the day we launched, customers have been requesting additional languages and voices, as well as expansions to new AWS Regions. We’ve been listening.

Amazon Polly has been accessible worldwide from the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Today we add a fifth AWS Region: Asia Pacific (Tokyo). This new option will provide increased stability and reduced latency for those customers in the Asia Pacific Region, and we look forward to continuing our regional expansion to further optimize the Amazon Polly Text-to-Speech service for all customers.

To accompany our regional expansion in Japan, we are adding the voice of Takumi to our Japanese language portfolio. Our voice offerings already include the female voice Mizuki, so we now offer gender parity in Japanese. This will especially benefit those customers whose use cases are enhanced by increased diversity in voice profile.


Introducing Gluon — An Easy-to-Use Programming Interface for Flexible Deep Learning

Today, AWS and Microsoft announced a new specification that focuses on improving the speed, flexibility, and accessibility of machine learning technology for all developers, regardless of their deep learning framework of choice. The first result of this collaboration is the new Gluon interface, an open source library in Apache MXNet that allows developers of all skill levels to prototype, build, and train deep learning models. This interface greatly simplifies the process of creating deep learning models without sacrificing training speed.

Here are Gluon’s four major advantages and code samples that demonstrate them:

(1) Simple, easy-to-understand code

In Gluon, you can define neural networks using simple, clear, and concise code. You get a full set of plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers. These abstract away many of the complicated underlying implementation details. The following example shows how you can define a simple neural network with just a few lines of code:

# First step is to initialize your model
# (assumes: from mxnet import gluon; num_outputs is the number of output classes)
net = gluon.nn.Sequential()
# Then, define your model architecture
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation="relu"))  # 1st layer - 128 nodes
    net.add(gluon.nn.Dense(64, activation="relu"))   # 2nd layer - 64 nodes
    net.add(gluon.nn.Dense(num_outputs))             # Output layer

The following diagram shows you the structure of the neural network:

For more information, see the tutorial on building a simple neural network called a multilayer perceptron (MLP) with the Gluon neural network building blocks. It’s also easy to write parts of the neural network from scratch for more advanced use cases. Gluon allows you to mix and match predefined and custom components in your neural network.


Introducing NNVM Compiler: A New Open End-to-End Compiler for AI Frameworks

You can choose among multiple artificial intelligence (AI) frameworks to develop AI algorithms. You also have a choice of a wide range of hardware to train and deploy AI models. The diversity of frameworks and hardware is crucial to maintaining the health of the AI ecosystem. This diversity, however, also introduces several challenges to AI developers. This post briefly addresses these challenges and introduces a compiler solution that can help solve them.

Let’s review the challenges first, introduce you to the UW and AWS research teams, and then walk you through how the compiler works.

Three challenges

First, it is nontrivial to switch from one AI framework to another because of differences among the frontend interfaces and the backend implementations. In addition, algorithm developers might use more than one framework as part of the development and delivery pipeline. At AWS, we have customers who want to deploy their Caffe models on MXNet to enjoy accelerated performance on Amazon EC2. According to Joaquin Candela’s recent blog post, users might use PyTorch to develop quickly and then deploy on Caffe2.

Second, framework developers need to maintain multiple backends to guarantee performance on hardware ranging from smartphone chips to data center GPUs. Take MXNet as an example. It has a portable C++ implementation built from scratch. It also ships with target-dependent backend support, like cuDNN for NVIDIA GPUs and MKLML for Intel CPUs. Guaranteeing that these different backends deliver consistent numerical results to users is challenging.

Last, chip vendors need to support multiple AI frameworks for every new chip they build. The workloads in each framework are represented and executed in unique ways, so even a single operation such as Convolution might need to be defined in different ways. Supporting multiple frameworks requires enormous engineering efforts.

Introducing the research team from UW and AWS

Diverse AI frameworks and hardware bring huge benefits to users, but it is very challenging for AI developers to deliver consistent results to end users. Luckily, we are not the first to face this kind of problem. Computer science has a long history of running various programming languages on different hardware. One key technology for solving this problem is the compiler. Motivated by compiler technology, a group of researchers including Tianqi Chen, Thierry Moreau, Haichen Shen, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, together with Ziheng Jiang from the AWS AI team, introduced the TVM stack to simplify this problem.

Today, AWS is excited to announce, together with the research team from UW, an end-to-end compiler based on the TVM stack that compiles workloads directly from various deep learning frontends into optimized machine code. Let’s take a look at the architecture.


We observed that a typical AI framework can be roughly partitioned into three parts:

  • The frontend exposes an easy-to-use interface to users.
  • The workloads received from the frontend are often represented as computation graphs, which consist of data variables (a, b, and c) and operators (* and +).
  • The operators, ranging from basic arithmetic operations to neural network layers, are implemented and optimized for multiple hardware targets.
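To make the graph idea concrete, here is a minimal plain-Python sketch (not the actual NNVM API) of representing and evaluating the expression d = a * b + c as a computation graph:

```python
# A computation graph node is either a variable or an operator
# applied to the outputs of its input nodes.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # "var", "*", or "+"
        self.inputs = inputs  # upstream Node objects

a, b, c = Node("var"), Node("var"), Node("var")
mul = Node("*", (a, b))
d = Node("+", (mul, c))      # d = a * b + c

def evaluate(node, values):
    """Recursively evaluate a graph given values for its variables."""
    if node.op == "var":
        return values[node]
    args = [evaluate(i, values) for i in node.inputs]
    return args[0] * args[1] if node.op == "*" else args[0] + args[1]

print(evaluate(d, {a: 2, b: 3, c: 4}))  # 2*3 + 4 = 10
```

A real compiler like NNVM works on exactly this kind of structure, but instead of interpreting the graph it optimizes it and lowers the operators to machine code.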

The new compiler, called the NNVM compiler, is based on two components in the TVM stack: NNVM (Neural Network Virtual Machine) for computation graphs and TVM (Tensor Virtual Machine) for tensor operators.


Capture and Analyze Customer Demographic Data Using Amazon Rekognition & Amazon Athena

Millions of customers shop in brick and mortar stores every day. Currently, most of these retailers have no efficient way to identify these shoppers and understand their purchasing behavior. They rely on third-party market research firms to provide customer demographic and purchase preference information.

This blog post walks you through how you can use AWS services to identify the purchasing behavior of your customers. We show you:

  • How retailers can use captured images in real time.
  • How Amazon Rekognition can be used to retrieve face attributes such as age range, emotions, and gender.
  • How you can use Amazon Athena and Amazon QuickSight to analyze the face attributes.
  • How you can create unique insights and learn about customer emotions and demographics.
  • How to implement a serverless architecture using AWS managed services.

The next section describes the basic AWS architecture.

How it works

The following diagram illustrates the steps in the process.

This is what happens in greater detail:

  1. You place the images in an Amazon Simple Storage Service (Amazon S3) bucket. This triggers an AWS Lambda function.
  2. The Lambda function calls the Rekognition service to extract the image label information.
  3. Image attributes are stored in the .csv format in another S3 bucket.
  4. The Amazon Athena service reads all of the face attributes in the .csv files and loads the data for ad-hoc queries.
  5. We use Amazon QuickSight to build the customer insight dashboards.
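As a sketch of steps 2 and 3 (the helper name and the CSV columns chosen here are assumptions), the Lambda function would obtain the `faces` list from Rekognition's `detect_faces` call with `Attributes=['ALL']` and flatten it to CSV roughly like this:

```python
import csv
import io

def face_attrs_to_csv(faces):
    """Flatten Rekognition FaceDetails records into CSV text.

    `faces` is the FaceDetails list returned by detect_faces with
    Attributes=['ALL']; only a few attributes are kept here.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["age_low", "age_high", "gender", "top_emotion"])
    for face in faces:
        # Keep the emotion Rekognition is most confident about
        top = max(face["Emotions"], key=lambda e: e["Confidence"])
        writer.writerow([face["AgeRange"]["Low"], face["AgeRange"]["High"],
                         face["Gender"]["Value"], top["Type"]])
    return buf.getvalue()
```

The resulting string is what the Lambda function would write to the second S3 bucket for Athena to query.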


Using Amazon Polly to Provide Real-Time Home Monitoring Alerts

This is a guest blog post by Siva K. Syamala, Senior Developer at Y-cam Solutions. In her own words, “Y-cam is a provider of high-quality security video solutions; our vision is to make smart home security easy and accessible to all.”

Home security is a key component of home automation and the Internet of Things. Y-cam Solutions Limited, with AWS as its backbone, has delivered a smart security system that can be monitored and controlled from anywhere in the world with a smartphone. To improve alerts, notifications, and system control, Y-cam uses Amazon Polly to provide a first-class AI service in which the user interacts with the security system through speech.

How our service works

When the alarm is triggered, we notify our customers with a voice call through Twilio. After the call is established, Twilio steps through the TwiML instructions and uses synthesized speech retrieved from Amazon Polly to start streaming to the customer. Call recipients respond by pressing buttons on their mobile phone keypad (DTMF codes). Depending on the DTMF codes, our service takes the specified action and returns the TwiML instructions for synthesized speech retrieval from Amazon Polly. To sound like a realistic conversation, it’s essential that Amazon Polly responds quickly. Delays and waiting can cause frustration and increase the likelihood of the recipient hanging up.
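As a sketch of the DTMF handling described above (the digit-to-action mapping and the audio URL are hypothetical, and the Polly clips are assumed to be synthesized ahead of the <Play> step), the TwiML returned for a keypress might be built like this:

```python
def twiml_for_dtmf(digit, audio_base="https://example.com/polly-audio"):
    """Return TwiML that plays the Polly-synthesized clip for a DTMF digit.

    The digit-to-action mapping below is hypothetical; unknown digits
    fall back to repeating the original alert.
    """
    actions = {"1": "confirm_alarm", "2": "dismiss_alarm"}
    action = actions.get(digit, "repeat_alert")
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<Response>'
            f'<Play>{audio_base}/{action}.mp3</Play>'
            '</Response>')
```

Because the clips are fetched by URL in the <Play> verb, keeping Amazon Polly's synthesis latency low is what makes the exchange feel like a real conversation.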

Below is a sample audio clip of a phone call to a customer when an alarm is triggered.




Calling Amazon Polly

The following Java code shows how to request synthesized speech from Amazon Polly and store it in an S3 bucket.
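As a rough sketch of the same flow in Python with boto3 (the function and parameter names other than the Polly and S3 API calls are assumptions, not the original listing), the request-and-store logic looks like:

```python
def synthesize_to_s3(polly, s3, text, bucket, key, voice="Matthew"):
    """Request MP3 speech from Amazon Polly and store the audio in S3.

    `polly` and `s3` are boto3 clients, e.g.:
        polly = boto3.client("polly")
        s3 = boto3.client("s3")
    """
    resp = polly.synthesize_speech(Text=text, OutputFormat="mp3",
                                   VoiceId=voice)
    audio = resp["AudioStream"].read()  # streaming body of MP3 bytes
    s3.put_object(Bucket=bucket, Key=key, Body=audio,
                  ContentType="audio/mpeg")
    return key
```

The stored object's URL is what the TwiML <Play> verb hands to Twilio.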


Build an Autonomous Vehicle on AWS and Race It at the re:Invent Robocar Rally

Autonomous vehicles are poised to take to our roads in massive numbers in the coming years. This has been made possible due to advances in deep learning and its application to autonomous driving. In this post, we take you through a tutorial that shows you how to build a remote control (RC) vehicle that uses Amazon AI services.

Typically, each autonomous vehicle is stacked with many sensors that provide rich telemetry. This telemetry can be used to improve not only the driving of the individual vehicle but also the user experience. Some examples of those improvements are time saved by smart drive routing, increased vehicle range and efficiency, and increased safety and crash reporting. On AWS, customers like TuSimple have built a sophisticated autonomous platform using Apache MXNet. Recently TuSimple completed a 200-mile driverless ride.

To drive awareness of deep learning, AWS IoT, and the role of artificial intelligence (AI) in autonomous driving, AWS will host a workshop-style hackathon—Robocar Rally at re:Invent 2017. This is the first in a series of blog posts and Twitch videos for developers to learn autonomous AI technologies and to prepare for the hackathon. For more details on the hackathon, see Robocar Rally 2017.

In this tutorial we’ll leverage an open source platform project called Donkey. If you want, you can experiment with your own 1/10th scale electric vehicle. However, we’ll stick to the recommended 1/16th scale RC vehicle used in the Donkey project.

Here are a couple of videos that show two of the cars that we have built at AWS using the tutorial that follows.



Vehicle Build Process

The process for assembling and configuring the autonomous vehicle can be found in this repo. It also includes a full materials list with links on where to purchase the individual components. The main components are the RC car, Raspberry Pi, Pi Cam, and Adafruit Servo HAT, the combined cost of which was less than $250. You can buy additional sensors, such as a stereo camera, a LIDAR sensor, and an accelerometer.

We recommend that you follow the steps in this GitHub repo to ensure a basic level of capability and a path to success that minimizes undifferentiated heavy lifting.


Build a Voice Kit with Amazon Lex and a Raspberry Pi

In this post, we show how you can embed Amazon Lex into custom hardware using widely available components. We demonstrate how you can build a simple voice-based AI kit and connect it to Amazon Lex, using a Raspberry Pi and a few off-the-shelf components totaling less than $60. By the end of this post, you will have a network-connected hardware device integrated with the Amazon Lex PostContent API. We also demo a couple of example bots: a voice-controlled robot and a voice-controlled metronome.
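At the heart of the kit is a call to the PostContent API. As a sketch (the helper name is an assumption, and `lex` would be a boto3 `lex-runtime` client), the audio round trip looks roughly like this:

```python
def send_audio_to_lex(lex, audio_stream, bot_name, bot_alias, user_id):
    """Send raw microphone audio to Amazon Lex and return its response.

    `lex` is a boto3 "lex-runtime" client; the content type below
    describes 16 kHz, 16-bit mono PCM captured from the microphone.
    """
    return lex.post_content(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        contentType="audio/l16; rate=16000; channels=1",
        accept="audio/mpeg",          # ask Lex for MP3 speech back
        inputStream=audio_stream,
    )
```

The response includes the recognized intent plus an audio stream that the device plays back through its speaker.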

Component Overview

You need the following components to build the Amazon Lex hardware kit.

Physical Construction

Raspberry Pi

Figure 1. Raspberry Pi 3 Model B

We use a stock Raspberry Pi 3 Model B for this project. Figure 1 shows the Raspberry Pi mounted in a Clear Case Box kit. The Clear Case Box neatly packages the Pi, Digital Audio Controller (DAC), and speakers, but is not necessary.


Two New Courses are Now Available for Machine Learning and Deep Learning on AWS

AWS Training and Certification helps you advance your knowledge with practical skills so you can get more out of the AWS Cloud. We now have two new courses to help you learn how to leverage artificial intelligence (AI) solutions on AWS: Introduction to Machine Learning web-based training and Deep Learning on AWS instructor-led training. If you are looking to learn more about how you can put AI capabilities to use, we recommend that you start with the web-based training. Developers looking to go deeper should then attend the one-day instructor-led training.

Here’s a bit more about each of these new training courses:

Introduction to Machine Learning is a free 40-minute web-based training intended for developers, solutions architects, and IT decision makers who already know the foundations of working with AWS. This online course gives an overview of machine learning (ML), walks through an example use case, teaches relevant terminology, and describes the process for incorporating ML solutions into a business or product. Specifically, this course teaches you how to do the following:

  • Approach ML as a business problem and work toward a technical solution.
  • Frame your business problem as an ML problem.
  • Use ML terminology and describe techniques in real-world business use cases.
  • Understand the end-to-end process of building ML models correctly, from posing the question/problem, collecting data, and building a model to evaluating model performance and integrating it into your application.

The course also includes knowledge checks to help validate understanding.