AWS AI Blog

Amazon and Facebook Collaborate to Optimize Caffe2 for the AWS Cloud

by Joseph Spisak and Yangqing Jia

From Apache MXNet to Torch, there is no shortage of frameworks for deep learning practitioners to leverage. Each excels at different aspects of the deep learning pipeline, and each meets different developer needs. The research-centric community tends to gravitate toward frameworks such as Theano, Torch, and most recently PyTorch, while many in industry deploy Caffe, TensorFlow, or Apache MXNet at scale for production applications. Given this heterogeneity in usage and users, AWS supports a range of frameworks as part of its developer tool offerings, serving a broad spectrum of users as a result.
AWS provides an open environment for developers to conduct deep learning. As we announced on April 18th, we are excited to further increase developer choice by offering support for Facebook’s newly launched Caffe2 project in the Ubuntu version of the AWS Deep Learning AMI (and coming soon in the Amazon Linux version, too).

What is Caffe2?

Caffe2—architected by Yangqing Jia, the original developer of Caffe—is a lightweight, modular, and scalable deep learning framework. Facebook deployed Caffe2 internally to help researchers train large machine learning models and deliver AI on mobile devices.
Now, all developers have access to many of the same tools for running large-scale distributed training and building machine learning applications for mobile. This allows the machine learning community to rapidly experiment with more complex models and deploy machine learning applications and services for mobile scenarios.

Caffe2 features include:

  • Easy implementation of a variety of models, including CNNs (convolutional neural networks), RNNs (recurrent neural networks), and conventional MLPs (multi-layer perceptrons)
  • Native distributed training interfaces
  • Mixed-precision and reduced-precision computations
  • Graph-based computation patterns that facilitate easy heterogeneous computation across multiple devices
  • Modularity, allowing the addition of custom recipes and hardware without risking codebase collisions
  • Strong support for mobile and embedded platforms in addition to conventional desktops and server environments
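The graph-based computation pattern in the list above can be made concrete with a toy operator graph in plain Python. This is only an illustration of the style (record operators first, then execute the whole graph), not the Caffe2 API:

```python
# Toy illustration of a graph-based computation pattern: operators are
# recorded into a graph first, then executed as a whole against a blob
# workspace. This mirrors the style Caffe2 uses, but is NOT the Caffe2 API.

class Net:
    def __init__(self):
        self.ops = []  # (op_name, inputs, output) in execution order

    def add(self, op, inputs, output):
        self.ops.append((op, inputs, output))
        return output

    def run(self, blobs):
        # Execute the recorded operators; each kernel could just as well be
        # dispatched to a different device, which is what makes the pattern
        # friendly to heterogeneous computation.
        kernels = {
            "Add": lambda a, b: a + b,
            "Mul": lambda a, b: a * b,
            "Relu": lambda a: max(a, 0.0),
        }
        for op, inputs, output in self.ops:
            blobs[output] = kernels[op](*(blobs[i] for i in inputs))
        return blobs

net = Net()
net.add("Mul", ["x", "w"], "xw")
net.add("Add", ["xw", "b"], "y")
net.add("Relu", ["y"], "out")

result = net.run({"x": 2.0, "w": 3.0, "b": -10.0})
print(result["out"])  # relu(2*3 - 10) = relu(-4) = 0.0
```

Because the graph is data, a framework built this way can inspect, optimize, and ship it to other platforms (mobile, embedded) before a single operator runs.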

Why “yet another” deep learning framework?

The original Caffe framework, with unparalleled performance and a well-tested C++ codebase, is useful for large-scale conventional CNN applications. However, as new computation patterns emerged—especially distributed computation, mobile, reduced-precision computation, and more non-vision use cases—Caffe’s design limitations became apparent.

By early 2016, the Facebook team had developed an early version of Caffe2 that improved Caffe by implementing a modern computation graph design, minimalist modularity, and the flexibility to easily port to multiple platforms. In the last year, Facebook has fully embraced Caffe2 as a multipurpose, deep learning framework, and has begun using it in Facebook products.

The Facebook team is very excited about Caffe2’s ability to support a wide range of machine learning use cases, and is equally excited to contribute Caffe2 to the open source community. The team is also looking forward to working with partners like AWS and the open source software community to push the state of the art in machine learning systems.


Running BigDL, Deep Learning for Apache Spark, on AWS

by Joseph Spisak, Jason Dai, and Radhika Rangarajan

In recent years, deep learning has significantly improved several AI applications, such as recommendation engines, voice and speech recognition, and image and video recognition. Many customers process the massive amounts of data that feed these deep neural networks in Apache Spark, only to later feed it into a separate infrastructure to train models using popular frameworks, such as Apache MXNet and TensorFlow. Because of Apache Spark’s popularity and its community of more than a thousand contributors, the developer community has expressed interest in uniting the big data infrastructure and deep learning into a single workflow under Apache Spark.

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley‘s AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which maintains it. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.

BigDL is a distributed deep learning framework for Apache Spark that was developed by Intel and contributed to the open source community for the purposes of uniting big data processing and deep learning. BigDL helps make deep learning more accessible to the big data community by allowing developers to continue using familiar tools and infrastructure to build deep learning applications. BigDL is licensed under the Apache 2.0 license.

As the following diagram shows, BigDL is implemented as a library on top of Spark, so that users can write their deep learning applications as standard Spark programs. As a result, BigDL can be seamlessly integrated with other libraries on top of Spark—Spark SQL and DataFrames, Spark ML pipelines, Spark Streaming, Structured Streaming, etc.—and can run directly on top of existing Spark or Hadoop clusters.
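The distributed training pattern BigDL runs on Spark can be sketched, without Spark itself, as synchronous parameter averaging over data partitions. This is a toy illustration of the pattern only, not the BigDL API:

```python
# Toy illustration of synchronous data-parallel training: each "partition"
# computes a gradient on its shard of the data, the gradients are averaged
# (a map/reduce step, which is what Spark provides), and the shared weight
# is updated. This mirrors the pattern BigDL uses; it is not the BigDL API.

def gradient(w, shard):
    # Gradient of mean squared error for the 1-D model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train(partitions, w=0.0, lr=0.01, steps=100):
    for _ in range(steps):
        grads = [gradient(w, shard) for shard in partitions]  # "map" phase
        w -= lr * sum(grads) / len(grads)                     # "reduce" + update
    return w

# Data generated from y = 3x, split into two partitions (shards).
data = [(x, 3.0 * x) for x in range(1, 9)]
partitions = [data[:4], data[4:]]
w = train(partitions)
print(round(w, 3))  # converges to 3.0
```

In BigDL itself, the shards are Spark RDD partitions and the update runs on the executors, which is why training can sit in the same pipeline as the data processing.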


Deep Learning AMI for Ubuntu v1.3_Apr2017 Now Supports Caffe2

by Joseph Spisak

We are excited to announce that the AWS Deep Learning AMI for Ubuntu now supports the newly launched Caffe2 project led by Facebook. AWS is the best and most open place for developers to run deep learning, and the addition of Caffe2 adds yet another choice. To learn more about Caffe2, check out the Caffe2 developer site or the GitHub repository.


The Deep Learning AMI v1.3_Apr2017 for Ubuntu provides a stable, secure, and high-performance execution environment for deep learning applications running on Amazon EC2. This AMI includes the following framework versions:

  • MXNet v0.9.3
  • Caffe2 v0.6.0 (new)
  • TensorFlow v1.0.1 (updated)
  • Caffe rc5
  • Theano rel-0.8.2
  • Keras 1.2.2
  • CNTK v2.0 RC1 (updated)
  • Torch master branch


AI Tech Talk: An Overview of AI on the AWS Platform

by Victoria Kouyoumjian


AWS offers a family of intelligent services that provide cloud-native machine learning and deep learning technologies to address your different use cases and needs. For developers looking to add managed AI services to their applications, AWS brings natural language understanding (NLU) and automatic speech recognition (ASR) with Amazon Lex, visual search and image recognition with Amazon Rekognition, text-to-speech (TTS) with Amazon Polly, and developer-focused machine learning with Amazon Machine Learning.

For more in-depth deep learning applications, the AWS Deep Learning AMI lets you run deep learning in the cloud, at any scale. Launch instances of the AMI, pre-installed with open source deep learning frameworks (Apache MXNet, TensorFlow, Caffe, Theano, Torch and Keras), to train sophisticated, custom AI models, experiment with new algorithms, and learn new deep learning skills and techniques; all backed by auto-scaling clusters of GPU-based instances.

Whether you’re just getting started with AI or you’re a deep learning expert, this session will provide a meaningful overview of the managed AI services, the AI Platform offerings, and the AI Frameworks you can run on the AWS Cloud.


AWS AI Blog Month in Review: March 2017

by Derek Young

We’ve just finished another month of AI solutions on the AWS AI Blog. Please take a look at our summaries below and learn, comment, and share. Thanks for reading!

NEW POSTS

Deploy Deep Learning Models on Amazon ECS
In this post, learn how to connect the workflow between the data scientists and DevOps. Using a number of AWS services, take the output of a model’s training and deploy it to perform predictions in real time with low latency and high availability. In particular, see the ease of deploying DL predict functions using Apache MXNet (a deep learning library), Amazon ECS, Amazon S3, and Amazon ECR, Amazon developer tools, and AWS CloudFormation.

Amazon at WMT: Improving Machine Translation with User Feedback
Since 2006, the annual Workshop on Machine Translation (WMT) has invited participants to submit machine translation systems for competitive ranking in a number of categories. This year Amazon, in collaboration with Germany’s Heidelberg University, is hosting a new competition for machine translation systems that adapt well to simulated customer feedback; in other words, systems that are able to correct their mistakes by learning from a stream of translation assessments. The results will be presented at this year’s WMT Conference in Copenhagen.

Build Your Own Text-to-Speech Applications with Amazon Polly
In this blog post, create a basic, serverless application that uses Amazon Polly to convert text to speech. The application has a simple user interface that accepts text in many different languages and then converts it to audio files which you can play from a web browser.

AI Tech Talk: How to Get the Most Out of Amazon Polly, a Text-to-Speech Service
Although there are many ways to optimize the speech generated by Amazon Polly‘s text-to-speech voices, new customers may find it challenging to quickly learn how to apply the most effective enhancements in each situation. The objective of this webinar is to educate customers about all of the ways in which they can modify the speech output, and to share some insider tips to help them get the most out of the Polly service. This webinar will provide a comprehensive overview of the tools and techniques available for modifying Polly speech output, including SSML tags, lexicons, and punctuation. The post has been updated with a link to the video archive for this presentation.


Deploy Deep Learning Models on Amazon ECS

by Asif Khan

Artificial intelligence (AI) is the computer science field dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, and pattern recognition. Machine learning (ML) and deep learning (DL) are computer science fields derived from the AI discipline.

Most ML and DL systems have two distinct parts: training (or learning) and prediction (or classifying). DL systems are usually developed by data scientists, who are good at mathematics and computer science. Those skills are essential for learning. But to deploy models to the field, you need the DevOps mindset and tools.

The power of DL stems from the learning system’s ability to identify more relationships than humans can code in software, or relationships that humans might not even be able to perceive consciously. After sufficient training, the network of algorithms can begin to make predictions on, or interpretations of, very complex data.

In this post, I show you how to connect the workflow between the data scientists and DevOps. Using a number of AWS services, I take the output of a model’s training and deploy it to perform predictions in real time with low latency and high availability. In particular, I illustrate the ease of deploying DL predict functions using Apache MXNet (a deep learning library), Amazon ECS, Amazon S3, and Amazon ECR, Amazon developer tools, and AWS CloudFormation.

How DL models are developed

The key stages of ML are training/learning and prediction/classifying:

Training/Learning – In this stage, data scientists develop the model and program it to learn with training data. They optimize the algorithms, define the layout of the model’s network and its hyperparameters, and program the model to learn.

The advent of notebooks, such as Jupyter, makes this much easier. Notebooks allow interactive development and collaboration. To try using a Jupyter notebook, see Run Jupyter Notebook and JupyterHub on Amazon EMR.

Prediction/Classification – After the DevOps team deploys the model on a scalable, highly available, and low-latency infrastructure, the trained model predicts the outcome on new observations.

DevOps deploys the models onto the infrastructure, usually behind an API, so that application developers can leverage AI capabilities in their applications. The data scientists continue to improve their models and release new versions all the time, just like other software. Updated models should integrate with the company’s usual CI/CD practices.
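In miniature, the deployment idea looks like this: the trained parameters are loaded once, and a predict function sits behind a versioned JSON API. The model, function names, and payload shapes below are hypothetical stand-ins, not the post's actual implementation:

```python
import json

# Hypothetical deployed predict function. In practice the parameters come
# from the data scientists' training run (e.g., stored in Amazon S3) and
# DevOps wraps them in a versioned endpoint running on Amazon ECS.
MODEL_PARAMS = {"version": "1.0", "w": 2.0, "b": 0.5}

def predict(features):
    # A trivial linear "model" standing in for a real DL predict function.
    return MODEL_PARAMS["w"] * features["x"] + MODEL_PARAMS["b"]

def handle_request(body):
    # JSON-in/JSON-out handler, as an API container might expose.
    payload = json.loads(body)
    return json.dumps({"model_version": MODEL_PARAMS["version"],
                       "prediction": predict(payload)})

print(handle_request('{"x": 4.0}'))  # {"model_version": "1.0", "prediction": 8.5}
```

Reporting the model version alongside each prediction is one simple way to keep the CI/CD story honest when data scientists ship frequent updates.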

Deploying an ML predict function at scale is an evolving science.


Amazon at WMT: Improving Machine Translation with User Feedback

by Kellen Sunderland

Machine translation is one of the most exciting and important applications of machine learning. It’s a widely researched topic, both in the academic community and within major technology companies, such as Amazon. At Amazon, we use machine translation to do things like document the same products in multiple languages. This helps us to offer additional language options to our customers. For example, customers can now switch their language preference to Spanish on Amazon.com or to English, Dutch, Polish, or Turkish on Amazon.de.

Since 2006, the annual Workshop on Machine Translation (WMT) has invited participants to submit machine translation systems for competitive ranking in a number of categories. This year Amazon, in collaboration with Germany’s Heidelberg University, is hosting a new competition for machine translation systems that adapt well to simulated customer feedback; in other words, systems that are able to correct their mistakes by learning from a stream of translation assessments. The results will be presented at this year’s WMT Conference in Copenhagen.

To support this competition, we’re using many AWS services, including Amazon API Gateway to host our service front end and provide an SDK, AWS Lambda to perform backend computations, and Amazon DynamoDB to store the state of experiments and our training data. Finally, we’re using Amazon CloudWatch to monitor the service and Amazon SNS to notify us when an alarm is triggered.

To learn more about the competition, see the WMT17 Bandit Learning Task page.

If you’re a researcher in the machine translation field and are interested in the competition, there’s still time to sign up.  Contact us at mt-shared-task@amazon.com.

Build Your Own Text-to-Speech Applications with Amazon Polly

by Tomasz Stachlewski

In general, speech synthesis isn’t easy.  You can’t just assume that when an application reads each letter of a sentence the output will make sense. A few common challenges for text-to-speech applications include:

  • Words that are written the same way, but that are pronounced differently: I live in Las Vegas. vs. This presentation broadcasts live from Las Vegas.
  • Text normalization. Disambiguating abbreviations, acronyms, and units: St., which can be expanded as street or saint.
  • Converting text to phonemes in languages with complex mapping, such as, in English, tough, through, though. In this example, similar parts of different words can be pronounced differently depending on the word and context.
  • Foreign words (déjà vu), proper names (François Hollande), slang (ASAP, LOL), etc.
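To make the text normalization challenge above concrete, here is a toy disambiguation rule for "St." in plain Python. Real TTS systems, including Amazon Polly, use far richer context-aware models; this crude rule is only an illustration of why the problem is hard:

```python
# Toy text normalization: expand the ambiguous abbreviation "St." using a
# crude context rule. Only an illustration; real systems do much more.

def expand_st(tokens, i):
    # Heuristic: "St." after a capitalized name reads as "Street";
    # otherwise (e.g., sentence-initial before a name) as "Saint".
    prev = tokens[i - 1] if i > 0 else ""
    if prev and prev[0].isupper():
        return "Street"   # e.g., "Main St."
    return "Saint"        # e.g., "St. Louis"

def normalize(text):
    tokens = text.split()
    out = [expand_st(tokens, i) if t == "St." else t
           for i, t in enumerate(tokens)]
    return " ".join(out)

print(normalize("I walked down Main St."))    # I walked down Main Street
print(normalize("St. Louis is in Missouri"))  # Saint Louis is in Missouri
```

Even this tiny rule already fails on cases like "visited St. Mary," which is exactly why hand-rolled normalization does not scale.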

Amazon Polly provides speech synthesis functionality that overcomes those challenges, allowing you to focus on building applications that use text-to-speech instead of addressing interpretation challenges.

Amazon Polly turns text into lifelike speech. It lets you create applications that talk naturally, enabling you to build entirely new categories of speech-enabled products. Amazon Polly is an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. It currently includes 47 lifelike voices in 24 languages, so you can select the ideal voice and build speech-enabled applications that work in many different countries.

In addition, Amazon Polly delivers the consistently fast response times required to support real-time, interactive dialog. You can cache and save Polly’s audio files for offline replay or redistribution. (In other words, what you convert and save is yours. There are no additional text-to-speech charges for using the speech.) And Polly is easy to use. You simply send the text you want to convert into speech to the Amazon Polly API. Amazon Polly immediately returns the audio stream to your application so that your application can play it directly or store it in a standard audio file format such as an MP3.
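The send-text, get-audio flow described above can be sketched with boto3. The parameter names match Polly's SynthesizeSpeech API; the voice choice and file name are illustrative, and the actual AWS call is left commented out since it requires configured credentials:

```python
# Build the parameters for Amazon Polly's SynthesizeSpeech call.
# The voice and output file below are illustrative choices, not from the post.
def build_synthesize_request(text, voice_id="Joanna", output_format="mp3"):
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}

request = build_synthesize_request("Hello from Amazon Polly!")

# With boto3 installed and AWS credentials configured, the call would be:
#
#   import boto3
#   polly = boto3.client("polly")
#   response = polly.synthesize_speech(**request)
#   with open("speech.mp3", "wb") as f:
#       f.write(response["AudioStream"].read())

print(sorted(request))  # ['OutputFormat', 'Text', 'VoiceId']
```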

In this blog post, we create a basic, serverless application that uses Amazon Polly to convert text to speech. The application has a simple user interface that accepts text in many different languages and then converts it to audio files which you can play from a web browser. We’ll use blog posts, but you can use any type of text. For example, you can use the application to read recipes while you are preparing a meal, or news articles or books while you’re driving or riding a bike.

The application’s architecture

The following diagram shows the application architecture. It uses a serverless approach, which means that we don’t need to work with servers – no provisioning, no patching, no scaling. The cloud takes care of this automatically, allowing us to focus on our application.

The application provides two methods – one for sending information about a new post, which should be converted into an MP3 file, and one for retrieving information about the post (including a link to the MP3 file stored in an S3 bucket). Both methods are exposed as RESTful web services through Amazon API Gateway. Let’s look at how the interaction works in the application.
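The two methods above can be sketched as plain functions. The handler names, payload fields, and in-memory store below are hypothetical; in the application they are exposed as RESTful web services through Amazon API Gateway and backed by AWS services:

```python
import json

# In-memory stand-in for the persistent storage (DynamoDB table / S3
# bucket) the real application uses. Names and shapes are hypothetical.
POSTS = {}

def new_post(body):
    # Method 1: register a post for conversion; returns its id.
    payload = json.loads(body)
    post_id = str(len(POSTS) + 1)
    POSTS[post_id] = {"text": payload["text"],
                      "status": "PROCESSING",
                      "url": None}
    # (An async worker would call Polly here, upload the MP3 to S3, and
    #  then set status and url.)
    return json.dumps({"id": post_id})

def get_post(post_id):
    # Method 2: retrieve post info, including the MP3 link once available.
    return json.dumps(POSTS.get(post_id, {"error": "not found"}))

pid = json.loads(new_post('{"text": "Hello world"}'))["id"]
print(get_post(pid))
```

Splitting the API into a quick "register" call and a later "retrieve" call is what lets the slow speech synthesis step run asynchronously behind the scenes.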


AI Tech Talk: How to Get the Most Out of Amazon Polly, a Text-to-Speech Service

by Victoria Kouyoumjian


Although there are many ways to optimize the speech generated by Amazon Polly‘s text-to-speech voices, new customers may find it challenging to quickly learn how to apply the most effective enhancements in each situation. The objective of this webinar is to educate customers about all of the ways in which they can modify the speech output, and to share some insider tips to help them get the most out of the Polly service. This webinar will provide a comprehensive overview of the tools and techniques available for modifying Polly speech output, including SSML tags, lexicons, and punctuation. Other topics will include recommendations for streamlining the process of applying these techniques, and how to provide feedback that the Polly team can use to continually improve the quality of voices for you.

Learning Objectives

  • Build a simple speech-enabled app with Polly’s text-to-speech voices.
  • Learn about the complete set of available SSML tags, and how you can apply them in order to modify and enhance your speech output.
  • Learn how you can override the default Polly pronunciation for specific words, by creating a lexicon of these words, along with the pronunciation that matches your needs.
  • Learn about how you can use punctuation to modify the way text is spoken by Polly voices.
  • Get insider tips on the best speech optimization techniques to apply to each of the most common speech production concerns.
  • Discover ways to streamline the process of getting the most out of Polly voices through SSML tags and lexicons.
  • Find out the best way to submit your feedback on Polly voices, pronunciation, and the available feature set, so that we can continue to improve this service for you!
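As a taste of the SSML tags covered, here is a short snippet. The tags shown (`break`, `prosody`, `say-as`) are standard SSML supported by Polly; the text itself is illustrative:

```xml
<speak>
  Today's forecast:
  <break time="500ms"/>
  <prosody rate="slow" volume="loud">sunny and warm.</prosody>
  The temperature is <say-as interpret-as="cardinal">25</say-as> degrees.
</speak>
```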

Monday, March 27, 2017 9:00 AM PDT – 10:00 AM PDT

Video Archive

Updated AWS CloudFormation Deep Learning Template Adds New Features and Capabilities

by Naveen Swamy

AWS CloudFormation, which creates and configures Amazon Web Services resources with a template, simplifies the process of setting up a distributed deep learning cluster. The AWS CloudFormation Deep Learning template uses the latest updated Amazon Deep Learning AMI (which provides Apache MXNet, TensorFlow, Caffe, Theano, Torch, and CNTK frameworks) to launch a cluster of EC2 instances and other AWS resources needed to perform distributed deep learning. AWS CloudFormation creates all resources in the customer account.

We’ve updated the AWS CloudFormation Deep Learning template with exciting additional capabilities, including automation that dynamically adjusts the cluster to the maximum number of available worker instances when an instance fails to provision (for example, because an account limit has been reached). The template now lets you choose between GPU and CPU instance types, and adds support for running your cluster under either Ubuntu or Amazon Linux. We’ve also added the ability to provision a new Amazon EFS file system, or attach an existing one, to your cluster so you can easily share code, data, logs, and results.
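To give a sense of how such choices surface to users, a CloudFormation template typically exposes them as parameters. The parameter names below are hypothetical illustrations of the kinds of options described, not copied from the actual template:

```yaml
# Illustrative sketch only: parameter names are hypothetical, not taken
# from the AWS CloudFormation Deep Learning template itself.
Parameters:
  InstanceType:
    Type: String
    Default: p2.xlarge    # a GPU type; a CPU type such as c4.xlarge also works
  WorkerCount:
    Type: Number
    Default: 2            # the stack adjusts down if fewer can be provisioned
  ExistingEFSFileSystemId:
    Type: String
    Default: ""           # leave empty to provision a new EFS file system
```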

To learn more, visit the AWS Labs – Deep Learning GitHub repo and follow the tutorial, where we show how easy it is to run distributed training on AWS using the MXNet and TensorFlow frameworks.