AWS Machine Learning Blog
Your Guide to Machine Learning at re:Invent 2017
re:Invent 2017 is almost here! As you plan your agenda, machine learning is undoubtedly a hot topic on your list. This year we have a lot of great technical content in the Machine Learning track, with over 50 breakout sessions, hands-on workshops, labs, and deep-dive chalk talks. You’ll hear first-hand about success with machine learning from customers and partners such as Facebook, NVIDIA, TuSimple, Visteon, Matroid, Butterfleye, Open Influence, Whooshkaa, Infor, and Toyota Racing Development.
This year we’re hosting our inaugural Deep Learning Summit, where thought leaders, researchers, and venture capitalists share their perspectives on the direction in which deep learning is headed. In addition, you can take part in our deep-learning-powered Robocar Rally. Join the rally to get first-hand experience building your own autonomous vehicle and competing in an AI-powered race.
Here are a few highlights of this year’s lineup from the re:Invent session catalog to help you plan your event agenda.
- Robocar Rally 2017
- Deep Learning Summit
- Deep Learning on AWS
- Computer Vision
- Language & Speech
Robocar Rally 2017
Get behind the keyboard at Robocar Rally 2017, a hackathon offering hands-on experience with deep learning, autonomous cars, and Amazon AI and IoT services. At Robocar Rally 2017, you’ll learn the latest in autonomous car technology from industry leaders and join a pit crew to customize, train, and race your own 1/16th-scale autonomous car. Follow along on the road to Robocar Rally 2017 with a series of Twitch streams and blog posts that will accelerate your learning on how to build, deploy, and train your car. Teams will be provided a car to race in time trials on the re:Invent Circuit, and you can also build and bring your own car for separate exhibition races.
Robocar Rally 2017 is a two-day hackathon at the Aria. It starts on Sunday, November 26th, from 6pm to 10pm, followed by a full day of hacking on Monday, November 27th, from 7am to 10pm, with a final race from 10pm to 12am.
Deep Learning Summit
The Deep Learning Summit is designed for developers interested in learning more about the latest in deep learning applied research and emerging trends. Attendees will hear from industry thought leaders—members of the academic and venture capital communities—who will share their perspectives on deep learning trends and emerging centers of gravity. The Summit will be held on Thursday, November 30th, at the Venetian, Ballroom F.
This year’s lineup includes the following provocative Lightning Talks:
DLS303: The Deep Learning Revolution
Terrence Sejnowski, The Salk Institute for Biological Studies
The recent rise of deep learning has its roots in learning algorithms that go back to the 1950s, which have been scaled up by a factor of a billion with high performance computing and big data. In this talk, Terrence Sejnowski will explore how recent advances in deep learning have impacted previously intractable problems in speech recognition, image understanding and natural language processing, opening up many new commercial applications. But just as we couldn’t predict the impact of the Internet when it was commercialized in the 1990s, we may not be able to imagine the impact of deep learning for the future.
DLS301: Eye, Robot: Computer Vision and Autonomous Robotics
Aaron Ames & Pietro Perona, California Institute of Technology
Like mind and body, AI and robotics are increasingly connected. In this talk, Pietro Perona and Aaron Ames will present their latest work on bipedal locomotion and discuss how deep learning approaches combined with traditional controls and dynamic systems theory are used to design and train walking robots that can tackle any surface: from polished pavements to slippery snow. They will also share deep learning and computer vision approaches to the analysis of behavior toward the design of machines that can interact naturally with people.
DLS305: Exploiting the Power of Language
Alexander Smola, Amazon Web Services
Deep learning is vital for natural language processing—whether it’s for understanding, speech recognition, machine translation, or answering questions. In this talk, Alex Smola shares simple guiding principles for building a large variety of services, ranging from sequence annotation to sequence generation. He will also discuss how these designs can be carried out efficiently using modern deep learning capabilities.
DLS304: Reducing Supervision: Making More with Less
Martial Hebert, Carnegie Mellon University
A key limitation of machine learning, in particular for computer vision tasks, is its reliance on vast amounts of strongly supervised data. This reliance limits scalability, prevents rapid acquisition of new concepts, and restricts adaptability to new tasks or new conditions. To address this limitation, Martial Hebert will explore ideas in learning visual models from limited data. The basic insight behind all of these ideas is that it is possible to learn, from a large corpus of vision tasks, how to learn models for new tasks with limited data, by representing the way visual models vary across tasks, also called model dynamics. The talk will also show examples from common visual classification tasks.
DLS302: Learning Where to Look in Video
Kristen Grauman, University of Texas
The status quo in visual recognition is to learn from batches of photos labeled by human annotators. Yet cognitive science tells us that perception develops in the context of acting and moving in the world—and without intensive supervision. In this talk, Kristen Grauman will share recent work exploring how a vision system can learn how to move and where to look. Her research considers how an embodied vision system can internalize the link between “how I move” and “what I see”, explore policies for learning to look around actively, and learn to mimic human videographer tendencies, automatically deciding where to look in unedited 360 degree video.
17455: Look, Listen, Learn: The Intersection of Vision and Sound
Antonio Torralba, MIT
Neural networks have achieved remarkable levels of performance and constitute the state of the art in recognition. In this talk, Antonio Torralba will discuss the importance of visualization in order to understand how neural networks work—in particular, the procedure to automatically interpret the internal representation learned by a neural network. He will also discuss how a neural network can be trained to predict sounds associated with a video. By studying the internal representation, the network learns to detect visual objects that make specific sounds like cars, sea waves, and other sources.
DLS201: Investing in the Deep Learning Future
Matt Ocko, Data Collective Venture Capital
The rise of deep learning has become a hotbed for enterprises and startups creating new AI applications in everything from customer service bots to autonomous driving. But looking beyond what’s possible today in computer vision and natural language, what comes next? In this talk, Matt Ocko will share ideas about emerging trends and hidden opportunities in deep learning that can make money while solving the world’s hardest and most urgent problems.
Deep Learning on AWS
MCL205: Introduction to Deep Learning (Video)
Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. In this session, we’ll provide an overview of deep learning focusing on relevant application domains. We’ll introduce popular deep learning frameworks such as Apache MXNet and TensorFlow, and explain how to select the right fit for your targeted use cases. We’ll also walk you through other key considerations for optimizing deep learning training and inference, including setting up and scaling your infrastructure on AWS.
MCL303: Deep Learning with Apache MXNet and Gluon (Video)
Developing deep learning applications just got even simpler and faster. In this session, you will learn how to program deep learning models using Gluon, the new intuitive, dynamic programming interface available for the Apache MXNet open-source framework. We’ll also explore neural network architectures such as multi-layer perceptrons, convolutional neural networks (CNNs), and long short-term memory (LSTM) networks.
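If you want a head start before the session, the sketch below shows what defining and training a small network in Gluon looks like. It is a minimal illustration, not material from the session itself: the layer sizes, optimizer settings, and random batch are placeholders.

```python
import mxnet as mx
from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

# A small multi-layer perceptron defined with Gluon's imperative API.
# Layer sizes, the optimizer, and the random batch are illustrative placeholders.
net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(128, activation='relu'))
    net.add(nn.Dense(64, activation='relu'))
    net.add(nn.Dense(10))  # e.g., 10 output classes

net.initialize(mx.init.Xavier())
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

# One illustrative training step on a random batch of 32 examples.
data = nd.random.uniform(shape=(32, 784))
label = nd.array([i % 10 for i in range(32)])
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)
print('mean loss:', loss.mean().asscalar())
```

Because Gluon is imperative, you can inspect intermediate results and change the network on the fly, which is part of what makes it a good fit for rapid experimentation.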
MCL333: Building Deep Learning Applications with TensorFlow on AWS
Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. In this workshop, we will provide an overview of deep learning focusing on getting started with the TensorFlow framework on AWS.
MCL305: Scaling Convolutional Neural Networks with Kubernetes and TensorFlow on AWS (Video)
In this session, Reza Zadeh, CEO of Matroid, presents a Kubernetes deployment on AWS that provides customized computer vision to a large number of users. Reza offers an overview of Matroid’s pipeline and demonstrates how to customize computer vision neural network models in the browser, followed by building, training, and visualizing TensorFlow models, which are provided at scale to monitor video streams.
MCL309: Deep Learning on a Raspberry Pi
In this workshop, we’ll introduce you to the open source deep learning framework Apache MXNet, and show you how to install it on a Raspberry Pi. Then, using a camera and a pre-trained object detection model, we’ll show real-life objects to the Pi and listen to what it thinks the objects are, thanks to the text-to-speech capabilities of Amazon Polly.
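To give you a feel for the flow before the workshop, here is a rough sketch of classifying a captured frame with a pretrained model and speaking the result with Amazon Polly. The workshop itself uses an object detection model; this sketch substitutes a simpler image classification model from the Gluon model zoo, and the file names, labels file, and voice are placeholders.

```python
import boto3
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Classify a frame captured from the Pi camera with a pretrained model,
# then speak the top prediction with Amazon Polly. File names, the labels
# file, and the voice are illustrative placeholders.
net = vision.squeezenet1_1(pretrained=True)

with open('capture.jpg', 'rb') as f:
    img = mx.image.imdecode(f.read())
img = mx.image.imresize(img, 224, 224).astype('float32') / 255
img = mx.nd.transpose(img, axes=(2, 0, 1)).expand_dims(axis=0)  # NCHW batch of one
# (ImageNet mean/std normalization is omitted here for brevity.)

probs = mx.nd.softmax(net(img))
top = int(probs.argmax(axis=1).asscalar())
labels = [line.strip() for line in open('synset.txt')]  # ImageNet class names
prediction = labels[top]

# Turn the prediction into speech with Amazon Polly.
polly = boto3.client('polly')
speech = polly.synthesize_speech(Text='I think I can see a %s' % prediction,
                                 VoiceId='Joanna', OutputFormat='mp3')
with open('prediction.mp3', 'wb') as out:
    out.write(speech['AudioStream'].read())
```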
MCL315: Deep Learning for Autonomous Driving (Video)
Reinforcement learning is emerging as a powerful tool for autonomous driving, enabling complex maneuvers in a wide range of traffic situations. In this session Vijay Nadkarni, Vice President of Artificial Intelligence at Visteon, will demonstrate how to build a reinforcement learning engine for autonomous vehicles on AWS, showing how it receives environmental input from object detection and produces outputs for controlling the vehicle’s steering, acceleration and braking.
MCL316: Deep Learning for Industrial IoT (Video)
Deep learning and IoT are emerging as an innovative pairing as the explosion of data produced by a growing number of devices needs to be analyzed to quickly produce meaningful insights. In this session, Andrew Cresci, General Manager, Artificial Intelligence for the Industrial Sector at NVIDIA, will discuss how deep learning can be applied to real-world IoT use cases with a demo of computer vision and anomaly detection. We’ll also do a step-by-step tutorial on how to develop deep learning models for computer vision at the edge using Raspberry Pi and NVIDIA Jetson.
MCL313: Scaling Vision Models Using Caffe2 on AWS (Video)
Join Pieter Noordhuis and Yangqing Jia from Facebook to learn about Caffe2, a lightweight and scalable framework for deep learning. Learn about its features, the ways Facebook applies it in production, and how to use it to scale up your own deep learning on Amazon EC2 GPU-powered instances. Understand cost tradeoffs and time-to-model measurements, how to quickly spin up a cluster of the latest NVIDIA GPUs in the AWS Cloud, and how to perform large-scale model training.
MCL311: Accelerating Apache MXNet Models on Apple Platforms Using CoreML
Running deep learning models on devices at the edge is one of the hottest trends in AI today. This workshop provides a tutorial on developing and training deep learning models with Apache MXNet and will walk you through how to easily bring them into the Apple ecosystem of products. You’ll learn how to convert MXNet models easily and efficiently to formats that can be integrated into iOS/macOS applications.
MCL402: Building Content Recommendation Systems Using Apache MXNet and Gluon
Recommendations are becoming an integral part of how many businesses serve customers, from targeted shopping to on-demand video. In this session, you’ll learn the key elements of building a recommendation system using Gluon, the new intuitive, dynamic programming interface for Apache MXNet. You’ll use matrix factorization techniques to build a video-on-demand solution using deep learning.
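As a preview of the core idea, here is a minimal matrix-factorization sketch in Gluon: user and item embeddings whose dot product predicts a rating. The embedding sizes and dummy training triples are illustrative placeholders, not the session’s actual solution.

```python
from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

class MatrixFactorization(gluon.Block):
    """Predict a rating as the dot product of user and item embeddings."""
    def __init__(self, num_users, num_items, k, **kwargs):
        super(MatrixFactorization, self).__init__(**kwargs)
        with self.name_scope():
            self.user_emb = nn.Embedding(num_users, k)
            self.item_emb = nn.Embedding(num_items, k)

    def forward(self, users, items):
        p = self.user_emb(users)      # (batch, k)
        q = self.item_emb(items)      # (batch, k)
        return (p * q).sum(axis=1)    # predicted rating for each pair

model = MatrixFactorization(num_users=1000, num_items=500, k=16)  # placeholder sizes
model.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(model.collect_params(), 'adam', {'learning_rate': 0.01})

# One illustrative training step on dummy (user, item, rating) triples.
users = nd.array([1, 2, 3])
items = nd.array([10, 20, 30])
ratings = nd.array([4.0, 3.0, 5.0])
with autograd.record():
    loss = loss_fn(model(users, items), ratings)
loss.backward()
trainer.step(batch_size=3)
```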
Computer Vision
MCL314: Unlocking Media Workflows Using Amazon Rekognition (Video)
Companies can have large amounts of image and video content in storage with little or no insight about what they have—effectively sitting on an untapped licensing and advertising goldmine. Learn how media companies are using Amazon Rekognition APIs for object or scene detection, facial analysis, facial recognition, or celebrity recognition to automatically generate metadata for images and open up new licensing and advertising revenue opportunities. You’ll see how to use Amazon Rekognition APIs to index faces into a collection at high scale, filter frames from a video source for processing, and perform face matches that populate a person index in Elasticsearch. You’ll also see how the Amazon Rekognition celebrity match feature can optimize the process for faster time to market and more accurate results.
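For a sense of the face-indexing flow described above, here is a short Python sketch using the Amazon Rekognition APIs. The collection ID, bucket, and object keys are placeholders, and the step of populating a person index in Elasticsearch is omitted.

```python
import boto3

rekognition = boto3.client('rekognition')

# Create the collection once, index a face from an archived frame, then
# search the collection with a new image. Names and keys are placeholders.
rekognition.create_collection(CollectionId='media-archive-faces')

rekognition.index_faces(
    CollectionId='media-archive-faces',
    Image={'S3Object': {'Bucket': 'my-media-bucket', 'Name': 'frames/frame-0001.jpg'}},
    ExternalImageId='frame-0001',
)

response = rekognition.search_faces_by_image(
    CollectionId='media-archive-faces',
    Image={'S3Object': {'Bucket': 'my-media-bucket', 'Name': 'queries/unknown-face.jpg'}},
    FaceMatchThreshold=90,
    MaxFaces=5,
)
for match in response['FaceMatches']:
    print(match['Face']['ExternalImageId'], match['Similarity'])
```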
MCL306: Making IoT Devices Smarter with Amazon Rekognition (Video)
Motion detection triggers have reduced the amount of video recorded by modern devices. But maybe you want to reduce that further—maybe you only care if a car or a person is on-camera before recording or sending a notification. Security cameras and smart doorbells can use Amazon Rekognition to reduce the number of false alarms. Learn how device makers and home enthusiasts are building their own smart layers of person and car detection to reduce false alarms and limit video volume. You’ll also learn how to use face detection and recognition to get notified when a friend has arrived.
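As a rough illustration of that filtering idea, the sketch below sends a camera frame to Amazon Rekognition and only signals a notification when a person or car is detected. The label set and confidence threshold are illustrative choices.

```python
import boto3

rekognition = boto3.client('rekognition')

def should_notify(image_bytes, min_confidence=80):
    """Return True only if a person or car appears in the frame.

    The labels checked and the confidence threshold are illustrative.
    """
    response = rekognition.detect_labels(
        Image={'Bytes': image_bytes},
        MinConfidence=min_confidence,
    )
    labels = {label['Name'] for label in response['Labels']}
    return bool(labels & {'Person', 'Car'})

# Only record or notify when the frame actually contains something of interest.
with open('doorbell-frame.jpg', 'rb') as f:
    if should_notify(f.read()):
        print('Person or car detected: send the notification')
```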
MCL318: Deep Dive on Amazon Rekognition Architectures for Image Analysis (Video)
Join us for a deep dive on how to use Amazon Rekognition for real-world image analysis! You will learn how to integrate Amazon Rekognition with other AWS services to make your image libraries searchable, verify user identities by comparing their live image with a reference image, and estimate the satisfaction/sentiment of your customers. We will also share best practices for fine-tuning and optimizing your Amazon Rekognition usage, as well as references to CloudFormation templates.
MCL327: Integrating Amazon Rekognition into a Security Camera
In this session, AWS customer Butterfleye does a deep dive into how Amazon Rekognition can be integrated into security camera devices for facial analysis. We examine an architecture to successfully process images, identify faces, and personalize alerts for known and unknown faces. We also discuss how to increase the accuracy of alerts and significantly reduce false alarms.
MCL330: Humans, Plants, and Chairs: Insights from Analyzing over 30 Million Instagram Posts with Amazon Rekognition
At Open Influence, we’ve analyzed over 30 million Instagram posts as a part of our quest to provide the world’s best influencer search engine. The resulting 120 million labels weave an interesting story of what’s being posted on Instagram. Join us as we walk through how Amazon Rekognition helped us deliver deeper insights in the data that our customers cared about, and enhance our recommendation and search algorithms.
Language & Speech
MCL302: Maximizing the Customer Experience with AI on AWS (Video)
We will review the decision points for using democratized AI services such as Amazon Lex and Amazon Polly, and for integrating them with services such as Amazon Connect. We will address optimizing the customer experience with Amazon Lex chatbots, and streamlining the customer experience by predicting responses with Amazon Connect. We will dive deep into the most common of these patterns and cover design and implementation considerations. By the end of the session, you will understand how to use Amazon Lex to optimize the user experience through different user interactions.
MCL403: Building an Intelligent Multi-Modal User Agent with Voice, Natural Language Understanding, and Facial Animation
We are all expected to interface with an exploding number of information sources and tools to perform our daily tasks. This Chalk Talk introduces how to architect an intelligent agent that aims to help augment our ability to complete these tasks more quickly and with less effort, by allowing for delegation to a conversational user agent. To build this intelligent agent, we combine several powerful AI services and other offerings from AWS, like Amazon Polly, Amazon Lex, Amazon Rekognition, and Amazon ElastiCache, and other open source technologies like Blender, Apache MXNet, and CLIPS.
MCL319: Capturing Voice Input in a Browser and Sending It to Amazon Lex
In this chalk talk, we demonstrate how to build a simple web application that uses the AWS SDK for JavaScript. The example application (accessible from a browser) records audio, sends it to Amazon Lex, and plays the response. We show how to use browser APIs and JavaScript to request access to a microphone, record audio, downsample the audio, and PCM encode the audio as a WAV file. We also show how to implement silence detection and audio visualization—essential to building a user-friendly audio control. Please be familiar with the AWS JavaScript SDK, the Amazon Lex PostContent API, the Web Audio API, getUserMedia, and the Amazon Lex Runtime Service.
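The chalk talk itself works in the browser with the AWS SDK for JavaScript, but as a hedged illustration of the same PostContent call, here is the equivalent request made from Python with boto3, using a pre-recorded 16 kHz, 16-bit mono PCM clip. The bot name, alias, user ID, and file names are placeholders.

```python
import boto3

lex = boto3.client('lex-runtime')

# Send a pre-recorded 16 kHz, 16-bit mono PCM clip to Amazon Lex and save the
# spoken response. Bot name, alias, user ID, and file names are placeholders.
with open('utterance.pcm', 'rb') as audio:
    response = lex.post_content(
        botName='OrderFlowers',
        botAlias='prod',
        userId='demo-user-1',
        contentType='audio/l16; rate=16000; channels=1',
        accept='audio/mpeg',
        inputStream=audio,
    )

print(response.get('intentName'), response.get('dialogState'))
with open('lex-reply.mp3', 'wb') as out:
    out.write(response['audioStream'].read())
```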
MCL308: Using a Digital Assistant in the Enterprise for Business Productivity (Video)
Enterprises must transform at the pace of technology. Through chatbots built with Amazon Lex, enterprises are improving business productivity, reducing execution time, and taking advantage of efficiency savings for common operational requests. These include inventory management, human resources requests, self-service analytics, and even the onboarding of new employees. In this session, learn how Infor integrated Amazon Lex into their standard technology stack, with several use cases based on advisory, assistant, and automation roles deeply rooted in their expanding AI strategy. This strategy powers one of the major functionalities of Infor Coleman to enable their users to make business decisions more quickly.
MCL401: What Do Users Want? Using Semantics to Predict User Intent at Scale
Over the years, the search paradigm has shifted from document retrieval to deeper understanding of user intent. Today’s users are no longer satisfied with seeing a list of relevant documents. Instead, they want to complete tasks and take actions on them. This session addresses how to build an automated user intent understanding system, where given a query, the user sees relevant and personalized recommendations. The session introduces the main challenges with semantic understanding, then describes categorization and structured prediction algorithms for entity detection and intent prediction. The talk highlights results and findings for user intent prediction from the domains of shopping, movies, restaurants, and sports.
MCL312: Building Multi-Channel Conversational Interfaces Using Amazon Lex (Video)
In this session, you will discover how to build a multi-channel conversational interface that leverages a pre-processing layer in front of Amazon Lex. This layer enables you to integrate your conversational interface with external services, and to use multiple specialized Amazon Lex chatbots as part of an overall solution. As an example of how to integrate with an external service, you will learn how to integrate with Skype and watch it in action through a chatbot demonstration, with interaction through Skype messaging and voice.
MCL206: Creating Next Generation Speech-enabled Applications with Amazon Polly (Video)
Amazon Polly is a service that turns text into lifelike speech, making it easy to develop applications that use high-quality speech to increase engagement and accessibility. Get a glimpse into successful applications that use the Amazon Polly text-to-speech service to converse with their users. Attendees will benefit from understanding real-world business use cases, and learn how to add feature-rich voice capabilities to their new or existing applications.
MCL307: Amazon Polly Tips and Tricks: How to Bring Your Text-to-Speech Voices to Life (Video)
Although there are many ways to optimize the speech generated by Amazon Polly’s text-to-speech voices, you might find it challenging to apply the most effective enhancements in each situation. Learn how you can control pronunciation, intonation, timing, and emotion for text-to-speech voices. In this session, you get a comprehensive overview of the tools and methods available for modifying Amazon Polly speech output, including SSML tags, lexicons, and punctuation. You also get recommendations for streamlining the application of these techniques. Come away with insider tips on the best speech optimization techniques to provide a more natural voice experience.
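As a small taste of what those tags can do, here is a sketch that uses SSML with Amazon Polly to insert a pause, slow a phrase down, and whisper a line. The text and voice are illustrative.

```python
import boto3

polly = boto3.client('polly')

# Standard SSML tags plus the amazon:effect extension supported by Amazon Polly.
ssml = """
<speak>
    Here is a normal sentence.
    <break time="500ms"/>
    <prosody rate="slow" pitch="-5%">This part is spoken slowly and slightly lower.</prosody>
    <amazon:effect name="whispered">And this part is whispered.</amazon:effect>
</speak>
"""

response = polly.synthesize_speech(Text=ssml, TextType='ssml',
                                   VoiceId='Joanna', OutputFormat='mp3')
with open('ssml-demo.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())
```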
MCL331: Building a Virtual Assistant with Amazon Polly and Amazon Lex
The advancement of technology has enabled people with disabilities to communicate more meaningfully and participate more fully in their daily lives. We will discuss the many challenges facing those with special needs, and how AWS voice technologies can provide hope and promise. In this workshop, participants will also learn how to build Pollexy (“Polly” + “Lex”), a Raspberry Pi and mobile-based special-needs verbal assistant that lets caretakers schedule audio task prompts and messages on a recurring schedule or on demand.
MCL326: Convert Any Digital Text Into Natural Sounding Speech
Text-to-speech can turn any digital text into a multimedia experience, so people can listen to ebooks, blogs, or even a PDF document on the go or while multitasking. In this talk, we will walk through an architecture and workflow for creating audio files of long-form text with Amazon Polly and AWS Batch. Using AWS Batch to process a document in chunks asynchronously and in parallel, you can quickly turn long-form content into audio files with Amazon Polly. We will also discuss how to build a text sentiment analysis engine with AWS Lambda and Amazon DynamoDB that automatically injects SSML into the text to enhance the speech output. The proposed architecture uses NLP to generate sentiment analysis for the text, and uses that analysis to inject SSML tags into the text.
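To make the chunking idea concrete, here is a simplified sketch that splits long-form text into pieces below an illustrative per-request cutoff and synthesizes each piece with Amazon Polly. A production pipeline along the lines described in the session would run these chunks as AWS Batch jobs and also handle sentence boundaries, SSML injection, and stitching the audio back together.

```python
import boto3

polly = boto3.client('polly')

def synthesize_long_text(text, prefix, max_chars=1500):
    """Split long-form text on paragraph boundaries and synthesize each chunk.

    The 1,500-character cutoff is an illustrative value chosen to stay below
    Amazon Polly's per-request text limit; check the current service limits.
    """
    chunks, current = [], ''
    for paragraph in text.split('\n\n'):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current)
            current = ''
        current += paragraph + '\n\n'
    if current.strip():
        chunks.append(current)

    for i, chunk in enumerate(chunks):
        response = polly.synthesize_speech(Text=chunk, VoiceId='Joanna',
                                           OutputFormat='mp3')
        with open('%s-part-%03d.mp3' % (prefix, i), 'wb') as out:
            out.write(response['AudioStream'].read())

synthesize_long_text(open('ebook-chapter.txt').read(), prefix='chapter-01')
```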
MCL325: Creating the Voices You Want with Amazon Polly
Many of today’s text-to-speech systems limit your choice to a few—often just one or two—voices per language. If those voices aren’t quite right for your need, the process of adding more voices is usually a costly and time-consuming one. The ability to modify voices make Amazon Polly a fast, versatile, and convenient solution for speech production. Learn how you can change various voice aspects including the speech rate, pitch intonation, and other elements to make it sound like another speaker and/or give it a different style. In this talk, we’ll dive deep into SSML with Amazon Polly to enable you to produce the voice you desire.