AWS Contributes to Milestone 1.0 Release of Apache MXNet Including the Addition of a New Model Serving Capability

Today AWS announced contributions to the milestone 1.0 release of the Apache MXNet deep learning engine and the introduction of a new model serving capability for MXNet. These new capabilities (1) simplify training and deploying deep learning models, (2) enable implementation of cutting-edge performance enhancements, and (3) provide easy interoperability between deep learning frameworks.

In this blog post we discuss how to get started with each of the major features introduced today:

Simple, easy to use

Model Server for Apache MXNet: The Model Server packages, runs, and serves deep learning models in seconds with just a few lines of code, making them accessible over the internet via an API endpoint. You can then call this endpoint from your applications to make predictions. The Model Server also includes a model zoo with 10 pre-trained models that you can easily deploy without having to train a model yourself. The Model Server simplifies the development of AI capabilities within your web, mobile, and IoT applications.

To get started with the Model Server for Apache MXNet, install the library with the following command:

pip install mxnet-model-server

The Model Server library includes the SqueezeNet v1.1 object classification model. You can start serving the SqueezeNet model using the following command:

mxnet-model-server --models squeezenet=<path_or_url_to_squeezenet_model>
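Once the server is running, your application calls the prediction endpoint over HTTP. The sketch below builds such a request with the Python standard library; the host, port, and endpoint path are assumptions for illustration, not values documented in this post, so check the Model Server documentation for the actual API.

```python
import urllib.request

# Assumed default endpoint for a locally served SqueezeNet model; the
# host, port, and path here are illustrative guesses, not documented values.
url = "http://127.0.0.1:8080/squeezenet/predict"

# Build (but do not send) a POST request carrying image bytes; in a real
# client you would read a JPEG file and call urllib.request.urlopen(req)
# to get the classification result back as JSON.
req = urllib.request.Request(url, data=b"<image bytes>", method="POST")
print(req.get_method(), req.full_url)
```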

Learn more about the Model Server and view the source code, reference examples, and tutorials here.

Advanced Indexing: The 1.0 release includes an advanced indexing capability that enables users to perform tensor operations in a more intuitive manner, leveraging their existing knowledge of the ndarray class in the Python NumPy library. This saves developers time and effort by letting them select elements with concise index expressions. The following are examples of the new advanced indexing capabilities in MXNet:

  • Support for a list of integers as an index:
    import mxnet as mx
    x = mx.nd.array([[1, 2], [3, 4], [5, 6]], dtype='int32')
    print(x[[0, 1, 0]])
    # [[1 2] [3 4] [1 2]] : the row index is [0, 1, 0], so rows 0, 1, and 0 are returned
  • Get the diagonal elements from a square matrix:
    • Without advanced indexing, you have to write three lines of code after defining the matrix:
      a = mx.nd.arange(16).reshape((4, 4))  # [[0 1 2 3] [4 5 6 7] [8 9 10 11] [12 13 14 15]]
      index = mx.nd.array([0, 1, 2, 3], dtype='int32')
      index = mx.nd.stack(index, index)
      diag = mx.nd.gather_nd(a, index)
    • With advanced indexing, a single line of code does the job:
      diag = a[[0, 1, 2, 3], [0, 1, 2, 3]]
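Because MXNet's advanced indexing follows NumPy's semantics, the same expressions can be checked with plain NumPy. This is a sketch of the equivalent NumPy code; swapping `np` for `mx.nd` gives the MXNet version.

```python
import numpy as np

# Indexing with a list of integers selects the corresponding rows.
x = np.array([[1, 2], [3, 4], [5, 6]], dtype='int32')
rows = x[[0, 1, 0]]          # rows 0, 1, then 0 again

# Paired row/column index lists pick out the diagonal in one line.
a = np.arange(16).reshape(4, 4)
diag = a[[0, 1, 2, 3], [0, 1, 2, 3]]

print(rows.tolist())  # [[1, 2], [3, 4], [1, 2]]
print(diag.tolist())  # [0, 5, 10, 15]
```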

You can learn more about advanced indexing here.

High performance

The 1.0 release includes implementation of features that are based on the most recent research advances in the deep learning field for optimizing the performance of model training and inference.

Gradient Compression: In distributed training, each machine must communicate frequently with the others to update model parameters and thereby collectively build a single model. The resulting network traffic often slows training. Gradient compression enables users to train models up to five times faster by compressing the gradient updates communicated by each instance, without loss in convergence rate or accuracy. It works by delaying the synchronization of gradient values whose magnitude falls below a threshold, sending them only once they have accumulated past it. Architectures such as VGGNet and AlexNet, which have a low compute-to-communication ratio, can see significant improvements in training speed.

The following example shows how 2-bit gradient compression quantizes gradients when the threshold is set to 2. Values that meet the threshold are represented by the two bits `11`; negative values whose absolute value meets the threshold are represented by a different pair of bits, `10`; the remaining values, whose absolute values fall below the threshold, are represented by `00`. In this way only two bits are communicated for each gradient value. The difference between the actual gradients and the decompressed gradients is stored as a residual, which is added to the gradients in the next iteration before quantization.
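The scheme above can be sketched in a few lines of NumPy. This is an illustrative model of 2-bit quantization with a residual, not MXNet's actual compression kernel; the function name and the choice to decode `11`/`10` as plus or minus the threshold are assumptions for illustration.

```python
import numpy as np

def two_bit_compress(grad, residual, threshold=2.0):
    """Illustrative 2-bit quantization (not MXNet's kernel): values >= threshold
    decode to +threshold ('11'), values <= -threshold decode to -threshold
    ('10'), everything else to 0 ('00'); the quantization error is carried
    forward as a residual and re-added before the next quantization step."""
    g = grad + residual              # fold in the error from the last step
    q = np.where(g >= threshold, threshold,
                 np.where(g <= -threshold, -threshold, 0.0))
    return q, g - q                  # quantized gradients, new residual

grad = np.array([2.5, 0.3, -2.1, 1.9])
q, r = two_bit_compress(grad, np.zeros_like(grad))
print(q)  # [ 2.  0. -2.  0.]
# r holds what was lost (approximately [0.5, 0.3, -0.1, 1.9]) and is
# re-added to the gradients on the next iteration, so nothing is dropped.
```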

Gradient compression is a runtime configuration parameter, and you can enable it using the Gluon API as follows. You can learn more about gradient compression here.

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .1}, compression_params={'type': '2bit', 'threshold': 0.5})

NVIDIA Collective Communication Library (NCCL): NCCL implements multi-GPU collective communication primitives that are performance-optimized for NVIDIA GPUs, providing communication routines that achieve high bandwidth over the interconnects between multiple GPUs. MXNet supports NCCL for single-node multi-GPU systems, yielding an approximately 20% increase in training speed. To use it, install the NCCL library and pass a runtime configuration parameter for the kvstore type, as shown below.

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .1}, kvstore='nccl')

You can learn more about NCCL here.

Easy interoperability

MXNet now includes a tool for converting neural network code written with the Caffe framework to MXNet code, making it easier for users to take advantage of MXNet's scalability and performance. The Caffe translator takes the training/validation prototxt and solver prototxt as input and produces MXNet Python code as output. The translated Python code uses the MXNet Symbol and Module APIs to build the network, reads data from LMDB files, runs training, and saves the trained model using the MXNet Module API. To get started with the Caffe translator for MXNet, download the runnable JAR file from Apache Maven.

To translate train_val.prototxt and solver.prototxt to MXNet Python code, run the following command:

java -jar caffe-translator-<version>.jar --training-prototxt <train_val.prototxt_path> \
    --solver <solver.prototxt_path> \
    --output-file <output_file_path>

To run the translated code, either Caffe with the MXNet interface or MXNet with the Caffe plugin is required. You can then run the translated Python code like any other Python script:

python <output_file_path>
You can learn more about the Caffe Translator here.

Getting started with MXNet

To get started with Apache MXNet, perform a pip install with the following command:

pip install mxnet==1.0.0

To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models.

About the Author

Sukwon Kim is a Senior Product Manager for AWS Deep Learning. He works on products that make it easier for customers to use deep learning engines with a specific focus on the open source Apache MXNet engine. In his spare time, he enjoys hiking and traveling.