AWS Machine Learning Blog
New Features For Amazon SageMaker: Workflows, Algorithms, and Accreditation
We’ve seen a ton of progress in machine learning during the past 12 months, with customers using Amazon SageMaker – a fully-managed service which has put ML into the hands of tens of thousands of developers and data scientists – to find fraud, predict pitches, and tune engines. We’ve added nearly 100 new features and capabilities since we introduced SageMaker at re:Invent last year, with the vast majority based on customer feedback (keep it coming). We continue that drum beat today, with major new announcements for Amazon SageMaker.
Introducing SageMaker Workflows
Today, we’re announcing new automation, orchestration, and collaboration features for Amazon SageMaker to make it easier to build, manage, and share machine learning workflows.
Machine learning is a highly collaborative process – combining domain experience with technical skills is the bedrock of success, and often requires multiple iterations and experimentation with different datasets and features. Developers often need to share progress and gather feedback from many collaborators. Training a successful model is almost never a hole-in-one, and so it’s important to be able to keep track of the important decisions, replay the successful parts, reuse what worked, and get help on what didn’t. We’re introducing new capabilities to make these iterations easier to manage, repeat, and share.
Experiment Management with SageMaker Search
Developing a successful ML model requires continuous experimentation, trying new algorithms and model hyperparameters, all the while observing the impact of potentially small changes on performance and accuracy. This iterative exercise means it can be hard to track which unique combination of datasets, algorithms, and parameters brewed the “winning” model.
Data scientists and developers can now organize, track, and evaluate their machine learning model training experiments with Amazon SageMaker Search. SageMaker Search lets you quickly find and evaluate the most relevant model training runs from the potentially thousands of Amazon SageMaker model training runs, right from the AWS console.
Collaboration with Version Control
Data scientists, developers, data engineers, analysts, and business leaders often need to share ideas, tasks, and collaborate to make progress with machine learning. The de-facto standard for this type of collaboration with traditional software development has been version control. It plays an important role in ML too, and we’re making it easier by adding Git integration and visualization to Amazon SageMaker.
Customers can now link GitHub, AWS CodeCommit, or self-hosted Git repositories with SageMaker notebooks, clone public and private repositories, and store repository information in Amazon SageMaker securely using IAM, LDAP, and AWS Secrets Manager. You can review your branches, merges, and versions directly in SageMaker, using a new open source notebook app.
Automation with Step Functions & Apache Airflow
ML often requires multiple steps in a complete workflow to be run in a coordinated sequence. For example, you may want to perform a query in Amazon Athena or aggregate and prepare data in AWS Glue, before training a model in SageMaker, and deploying it to production. Automating these steps and orchestrating them across multiple services helps build reusable, reproducible ML workflows which can be shared between engineers and scientists.
You can now use Step Functions to automate and orchestrate SageMaker steps in an end-to-end workflow. You can automate publishing datasets to Amazon S3, training an ML model on your data with SageMaker, and deploying your model for prediction. AWS Step Functions will monitor SageMaker (and Glue) jobs until they succeed or fail, and either transition to the next step of the workflow or retry the job. It includes built-in error handling, parameter passing, state management, and a visual console that lets you monitor your ML workflows as they run.
In addition to Step Functions, many developers currently use Apache Airflow, a popular open source framework to author, schedule, and monitor multi-stage workflows. Amazon SageMaker now also integrates with Airflow, so you can use the same orchestration tool you’re used to to drive SageMaker tasks such as data preparation, training, and tuning. If you’re new to Airflow, you can spin up a new instance and start orchestrating workflows on AWS in just a few clicks, using CloudFormation.
These new features will be available to customers to take for a test drive, starting early next month.
New Algorithms and Frameworks
Not that long ago, part of the ‘cost of doing business’ with machine learning was significant investment in research and development of new algorithms; both in achieving the right levels of accuracy, and in bringing those algorithms out of the lab and into the real world where they could run across large, complex training datasets. Customers can run algorithms for training models in three ways in SageMaker; by bringing their own in a custom container, by using built-in SageMaker Algorithms, or by running fully-managed MXNet, TensorFlow, PyTorch, and Chainer algorithms with just 20 lines of code. We’ve been adding new algorithms through the year too, including BlazingText for text classification, and Object Detection in images.
We’re pleased to announce new built-in algorithms for detecting suspicious IP addresses (IP Insights), low dimensional embeddings for high dimensional objects (Object2Vec), and – an oldie but a goodie – unsupervised grouping (K-means clustering), all designed to support petabyte scale datasets, at 10x better performance than you would expect to see with traditional methods. Without needing an entire R&D department, any developer can access these algorithms as they would any other API in SageMaker, and get the benefit of fast, low cost training, even at scale.
We’ve also been adding new framework support through the year (including PyTorch 1.0 and Chainer) and keeping others up to date (such as the latest MXNet 1.3), and we’re pleased to announce that customers will soon also be able to run fully-managed Horovod jobs for high scale distributed training, and scikit-learn and Spark MLeap for inference.
New Compliance Standards and Accreditation
Security, encryption, compliance, and accreditation are all critical areas for machine learning; ensuring you can meet the regulatory and organizational requirements on your data (and data dependent assets such as models and notebooks) is job zero for everyone using ML.
We’re pleased to add SageMaker to our System and Organizational Controls (SOC) Level 1, Level 2, and Level 3 audits. The SOC reports are available now in the AWS Management Console, and you can download the SOC3 report as a PDF. These controls complement SageMaker’s existing accreditations; the service is in scope for ISO 9001:2015, 27001:2013, 27017:2015, 27018:2014, PCI DSS 3.2 Level 1, and is eligible for HIPAA and BAA coverage on AWS. ITAR workloads can be run on SageMaker in the AWS GovCloud (US) region.
Real World Machine Learning with Amazon SageMaker
These new capabilities, algorithms, and accreditation will help bring more machine learning workloads to more developers. By focusing almost exclusively on what customers are asking for, we’re making real strides in making machine learning useful and usable in the real world through Amazon SageMaker. Accreditation, experimentation, and automation aren’t always the first thing you may think of when it comes to artificial intelligence, but our customers tell us that these features can further shorten the time it takes to build, train, and deploy their models. No R&D department required.
–Dr. Matt Wood, General Manager of Artificial Intelligence, AWS