AWS Machine Learning Blog

Amazon SageMaker support for TensorFlow 1.5, MXNet 1.0, and CUDA 9

Amazon SageMaker pre-built deep learning framework containers now support TensorFlow 1.5 and Apache MXNet 1.0, both of which take advantage of CUDA 9 optimizations for faster performance on SageMaker ml.p3 instances. In addition to the performance benefits, the update provides access to newer features such as eager execution in TensorFlow and advanced indexing for NDArrays in MXNet. More details can be found in the change logs for each framework.

If you’re new to Amazon SageMaker pre-built deep learning containers, see our repository for examples covering their use. The containers let you write idiomatic TensorFlow or MXNet code and then send that code to be processed in Amazon SageMaker’s distributed, managed training clusters or real-time, hosted endpoints. This gives you the flexibility to write and test your deep learning code on a sample of the data on your laptop and then scale to the full dataset in a multi-machine, GPU setting.

Follow these steps to use the updated containers:

  1. Install (or update) the most recent version of the SageMaker Python SDK with pip install -U sagemaker
  2. By default, your new jobs will take advantage of the latest version of each framework. However, if your workload requires you to use the previous version of the framework, you can specify that version as follows:

For MXNet:

from sagemaker.mxnet import MXNet
estimator = MXNet(entry_point='mnist.py',
                  framework_version='0.12',
                  role=role,
                  output_path=model_artifacts_location,
                  code_location=custom_code_upload_location,
                  train_instance_count=1,
                  train_instance_type='ml.m4.xlarge',
                  hyperparameters={'learning_rate': 0.1})

For TensorFlow:

from sagemaker.tensorflow import TensorFlow
estimator = TensorFlow(entry_point='mnist.py',
                       framework_version='1.4',
                       role=role,
                       output_path=model_artifacts_location,
                       code_location=custom_code_upload_location,
                       train_instance_count=1,
                       train_instance_type='ml.m4.xlarge',
                       hyperparameters={'learning_rate': 0.1})
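
The two snippets above differ only in the estimator class and the version string, so the shared arguments can be gathered in one place. The following is a minimal sketch rather than part of the SageMaker SDK: the helper name pinned_estimator_kwargs, the role ARN, and the S3 paths are hypothetical placeholders, while the parameter names are the ones used in the examples above.

```python
# Hypothetical helper (not part of the SageMaker SDK): collects the keyword
# arguments that the MXNet and TensorFlow examples above have in common.
def pinned_estimator_kwargs(framework_version, role, output_path, code_location):
    """Return constructor kwargs for an estimator pinned to one framework version."""
    return {
        "entry_point": "mnist.py",
        # e.g. '0.12' for MXNet or '1.4' for TensorFlow, as in the examples above
        "framework_version": framework_version,
        "role": role,
        "output_path": output_path,
        "code_location": code_location,
        "train_instance_count": 1,
        "train_instance_type": "ml.m4.xlarge",
        "hyperparameters": {"learning_rate": 0.1},
    }


# Placeholder values, assumed for illustration only.
kwargs = pinned_estimator_kwargs(
    framework_version="1.4",
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    output_path="s3://example-bucket/artifacts",
    code_location="s3://example-bucket/code",
)
print(kwargs["framework_version"])
```

With the SDK installed, the resulting dictionary could be unpacked into either constructor, for example TensorFlow(**kwargs), or MXNet(**kwargs) with the version string changed to '0.12'.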

Amazon SageMaker TensorFlow 1.5 and MXNet 1.0 containers are available today in the following AWS Regions: US East (N. Virginia), US East (Ohio), EU (Ireland), and US West (Oregon).

Once you’ve updated the Amazon SageMaker Python SDK, you can immediately take advantage of the enhanced functionality and performance improvements in these newly released containers, using code available on GitHub.
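
To confirm that the SDK update took effect in your environment, you can check the installed package version. This is a small sketch that uses only the Python standard library (Python 3.8 or later); the helper name installed_version is hypothetical, and the post does not state which SDK version number first includes the new defaults, so the sketch only reports the version it finds.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


sdk_version = installed_version("sagemaker")
if sdk_version is None:
    print("sagemaker is not installed; run: pip install -U sagemaker")
else:
    print("sagemaker SDK version:", sdk_version)
```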


David Arpin is AWS’s AI Platforms Selection Leader and has a background in managing Data Science teams and Product Management.