AWS Machine Learning Blog

Exploiting the Unique Features of the Apache MXNet Deep Learning Framework with a Cheat Sheet

Apache MXNet (incubating) is a full-featured, highly scalable deep learning framework that supports creating and training state-of-the-art deep learning models. With it, you can create convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and others. It supports a variety of languages, including, but not limited to, Python, Scala, R, and Julia.

In this post, we showcase some unique features that make MXNet a developer friendly framework in the AWS Cloud. For developers who prefer symbolic expression, we also provide a cheat sheet for coding neural networks with MXNet in Python. The cheat sheet simplifies onboarding to MXNet. It’s also a handy reference for developers who already use the framework.

Multi-GPU support in a single line of code

The ability to run on multiple GPUs is a core part of the MXNet architecture. All you need to do is pass a list of devices that you want to train the model on. By default, MXNet uses data parallelism to partition the workload over multiple GPUs. For example, if you have 3 GPUs, each one receives a copy of the complete model and trains it on one-third of each training data batch.

import mxnet as mx 
# Single GPU
module = mx.module.Module(context=mx.gpu(0))

# Train on multiple GPUs (N = number of available GPUs)
module = mx.module.Module(context=[mx.gpu(i) for i in range(N)], ...)

Training on multiple computers

MXNet is a distributed deep learning framework designed to simplify training on multiple GPUs on a single server or across servers. To train across servers, you need to install MXNet on all computers, ensure that they can communicate with each other over SSH, and then create a file that contains the server IPs.

$ cat hosts
192.30.0.172
192.30.0.171
$ python ../../tools/launch.py -n 2 --launcher ssh -H hosts \
    python train_mnist.py --network lenet --kv-store dist_sync

MXNet uses a key-value store (KVStore) to synchronize gradients and parameters between machines. To perform distributed training, make sure that MXNet is compiled with USE_DIST_KVSTORE=1.

Custom data iterators and iterating over data stored in Amazon S3

In MXNet, data iterators are similar to Python iterator objects, except that they return a batch of data as a DataBatch object that contains “n” training examples along with corresponding labels. MXNet has prebuilt, efficient data iterators for common data types like NDArray and CSV. It also has a binary format for efficient I/O on distributed file systems, like HDFS. You can create custom data iterators by extending the mx.io.DataIter class. For information on how to implement this feature, see this tutorial.

Amazon Simple Storage Service (Amazon S3) is a popular choice for customers who need to store large amounts of data at very low cost. In MXNet, you can create iterators that reference the data stored in Amazon S3 in RecordIO, ImageRecordIO, CSV, or NDArray formats without needing to explicitly download the data to disk.

data_iter = mx.io.ImageRecordIter(
    path_imgrec="s3://bucket-name/training-data/caltech_train.rec",
    data_shape=(3, 227, 227),
    batch_size=4,
    resize=256)

Visualizing neural nets

To enable you to visualize neural network architectures, MXNet is integrated with Graphviz. To generate a network visualization, you pass the symbol that references the last layer of a defined network, along with a dictionary of the network's input shapes. You can adjust the appearance of individual nodes through the node_attrs argument. The following example shows how to visualize the LeNet canonical CNN:

mx.viz.plot_network(symbol=lenet, shape=shape)

For detailed code and instructions on implementation, see this tutorial.

Profiler support

MXNet has a built-in profiler, which you enable by building MXNet with the USE_PROFILER=1 flag. This can help you profile execution times, layer by layer, in the network (at the symbol level). This feature complements general profiling tools, like nvprof and gprof, by summarizing at the operator level, instead of at the function, kernel, or instruction level. You can enable it for the entire Python program using an environment variable. Or, you can enable it for a subset of the program by integrating it into the code, as follows:

# Configure the profiler and start recording
mx.profiler.profiler_set_config(mode='all', filename='output.json')
mx.profiler.profiler_set_state('run')
# Code to be profiled goes here...
mx.profiler.profiler_set_state('stop')
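Alternatively, as mentioned above, you can enable profiling for an entire run without touching the code. The configuration sketch below assumes a profiler-enabled build and uses the environment variable names from MXNet's profiler documentation:

```shell
# Profile the whole program run; MXNET_PROFILER_MODE=1 profiles all
# operators rather than only symbolic ones.
MXNET_PROFILER_AUTOSTART=1 MXNET_PROFILER_MODE=1 python train_mnist.py
```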

You can load the profiler output into a browser, such as Chrome, and view the profile by navigating to the browser's tracing page (chrome://tracing in Chrome), as follows:

This screenshot shows the profile for training the MNIST dataset with the original LeNet architecture implemented in MXNet with profiler instrumentation.

Cheat Sheet

Now that you know about some of the unique features of MXNet, you probably can't wait to get hands-on. This cheat sheet helps you get started building neural networks. It includes some common architectures for CNNs, RNNs/LSTMs, linear regression, and logistic regression. Use it to learn how to create data iterators and Amazon S3 iterators, implement checkpointing, and save model files. It even includes examples of how to build a complete model architecture and how to fine-tune a pretrained model.

Apache MXNet Cheat Sheet


To get started on deep learning with MXNet, see our tutorials.

The MXNet community is working on a dynamic, elegant, easy-to-use imperative interface called Gluon. Gluon will introduce new ways to build neural networks in MXNet. Stay tuned!


Additional Reading

Learn how to build a real-time object classification system with Apache MXNet on Raspberry Pi.


About the Author

Sunil Mallya is a Senior Solutions Architect in the AWS Deep Learning team. He helps our customers build machine learning and deep learning solutions to advance their businesses. In his spare time, he enjoys cooking, sailing, and building autonomous RC cars.