AWS Machine Learning Blog

Apache MXNet (incubating) adds support for Keras 2

The Keras-MXNet deep learning backend is available now, thanks to contributors to the Keras and Apache MXNet (incubating) open source projects. Keras is a high-level neural network API written in Python. It’s popular for its fast and easy prototyping of CNNs and RNNs.

Keras developers can now use the high-performance MXNet deep learning engine for distributed training of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). By updating a few lines of code, Keras developers can speed up training with MXNet’s multi-GPU distributed training capabilities. Another valuable feature of the release is the ability to save an MXNet model. You can design in Keras, train with Keras-MXNet, and run inference in production at scale with MXNet.

Distributed training with Keras 2 and MXNet

This article shows how to install Keras-MXNet and demonstrates how to train a CNN and an RNN. If you have tried distributed training with other deep learning engines, you know that it can be tedious and difficult. Let us show you what it’s like now, with Keras-MXNet.

Installation is only a few steps

  1. Deploy an AWS Deep Learning AMI
  2. Install Keras-MXNet
  3. Validate Keras-MXNet Installation

1. Deploy an AWS Deep Learning AMI

Follow this short tutorial for deploying an AWS Deep Learning AMI (DLAMI). To take advantage of the multi-GPU training examples, launch a p3.8xlarge or similar multi-GPU instance type.

Want to install the dependencies to run CUDA, Keras, MXNet, and other frameworks like TensorFlow yourself? Then follow the Keras-MXNet installation guide.

2. Install Keras-MXNet

Install Keras-MXNet and its dependencies in the MXNet Conda environment on your DLAMI. The environment already has Keras version 1.0, so you need to uninstall it first. Log in to your DLAMI and run the following:

# Activate the MXNet Python 3 environment on the DLAMI
$ source activate mxnet_p36

# Install a dependency needed for Keras datasets
$ pip install h5py

# Uninstall older versions of Keras-MXNet
$ pip uninstall keras-mxnet

# Install Keras-MXNet v2.1.6 
$ pip install keras-mxnet

Keras-MXNet and its dependencies are now installed in the MXNet Conda environment on your DLAMI.

3. Validate Keras-MXNet Installation

Verify that Keras is using the MXNet backend:

$ python
>>> import keras as k
Using MXNet backend
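If the import reports a different backend, check the Keras configuration. Keras 2 takes its backend from the KERAS_BACKEND environment variable when it is set, and otherwise from the "backend" key in ~/.keras/keras.json, so a keras.json left over from another Keras install can override your choice. The following minimal sketch uses a hypothetical helper, write_keras_config (not part of Keras), that sets "backend" to mxnet and "image_data_format" to channels_first, the format used in the benchmarks in this post, while leaving any other keys untouched.

```python
import json
from pathlib import Path


def write_keras_config(config_dir=None, backend="mxnet",
                       image_data_format="channels_first"):
    """Create or update keras.json so Keras picks the given backend.

    Hypothetical helper, not part of Keras. Existing keys such as
    "floatx" and "epsilon" are preserved. Note that the KERAS_BACKEND
    environment variable, if set, still takes precedence over this file.
    """
    # Default to the usual per-user Keras config directory, ~/.keras.
    directory = Path(config_dir) if config_dir else Path.home() / ".keras"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "keras.json"

    # Load the current settings, if any, so only the two keys below change.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["backend"] = backend
    config["image_data_format"] = image_data_format

    path.write_text(json.dumps(config, indent=4))
    return config
```

After running write_keras_config() once, start a new Python session; import keras followed by print(keras.backend.backend()) should then print mxnet.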

CNN support

Now let’s train a ResNet model on the CIFAR-10 dataset to identify 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. You can use a Keras 2 example script from the examples section of the Keras-MXNet repository. Using MXNet as the Keras backend requires only minimal changes to the script.
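A trained CIFAR-10 classifier returns one score per class, in the dataset’s label-index order (0 = airplane through 9 = truck, the same order as the list above). The sketch below assumes the network ends in a 10-way softmax layer, as Keras’s CIFAR-10 ResNet examples do. It shows how to map that output back to a class name; decode_prediction is a hypothetical helper, not part of Keras.

```python
# CIFAR-10 class names in label-index order (0 = airplane ... 9 = truck).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def decode_prediction(probabilities):
    """Return (class_name, score) for the highest-scoring of the 10 classes."""
    if len(probabilities) != len(CIFAR10_CLASSES):
        raise ValueError("expected one score per CIFAR-10 class")
    # Index of the largest score (argmax without requiring NumPy).
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CIFAR10_CLASSES[best], probabilities[best]
```

For example, decode_prediction(model.predict(images[:1])[0]) would label the first image in a batch, where images is a hypothetical array of preprocessed 32×32 inputs.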

First, download the example script from the Keras-MXNet repo folder.

$ wget

The script calls the multi_gpu_model API and passes the number of GPUs to use.
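The Keras documentation describes multi_gpu_model as single-machine data parallelism: the model is replicated on each GPU, every input batch is divided into sub-batches, each replica processes one sub-batch, and the results are concatenated. The sketch below illustrates only the batch-division step. split_batch is a hypothetical helper, and its rule for distributing a remainder is an assumption; the MXNet backend may split batches differently internally.

```python
def split_batch(batch_size, num_gpus):
    """Divide one global batch into per-GPU sub-batch sizes.

    Hypothetical helper: any remainder goes to the first GPUs, so the
    sizes differ by at most one sample.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, remainder = divmod(batch_size, num_gpus)
    return [base + 1 if i < remainder else base for i in range(num_gpus)]
```

With gpus=4, a batch of 128 becomes four sub-batches of 32. Each GPU therefore sees only a quarter of the batch, which helps explain why small inputs can leave GPUs idle while data is being moved.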

Second, run nvidia-smi in your terminal window to determine how many GPUs are available on your DLAMI. If you have four GPUs, you can run the script as-is in the next step. Otherwise, run the following command to open the script for editing.

$ vi

The script has the following line that defines the number of GPUs. Update it if necessary.

model = multi_gpu_model(model, gpus=4)
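Instead of editing the hard-coded value, you could count the GPUs at run time. nvidia-smi -L (short for --list-gpus) prints one line per GPU, such as GPU 0: Tesla V100-SXM2-16GB (UUID: ...). The helpers below (count_gpus and detect_gpus are hypothetical names) parse that output. Because this is a minimal sketch, the multi_gpu_model call itself appears only as a comment, and it assumes you wrap the model only when two or more GPUs are present. The subprocess arguments are chosen to work on the Python 3.6 environment used above.

```python
import subprocess


def count_gpus(listing):
    """Count GPUs in `nvidia-smi -L` output (one 'GPU <n>: ...' line each)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def detect_gpus():
    """Return the number of visible NVIDIA GPUs, or 0 if nvidia-smi fails."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                                universal_newlines=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or returned an error (for example, no driver).
        return 0
    return count_gpus(result.stdout)


# In the training script, the hard-coded line could become:
#
#   num_gpus = detect_gpus()
#   if num_gpus >= 2:
#       model = multi_gpu_model(model, gpus=num_gpus)
```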

Now, run the training.

$ python

(Optional) Check GPU utilization and memory use with the nvidia-smi command while your training is running. Open another terminal session for this.
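If you prefer a scripted check to reading the full nvidia-smi table, nvidia-smi can print selected fields as CSV. Here, --query-gpu=index,utilization.gpu,memory.used,memory.total is combined with --format=csv,noheader,nounits, and memory is reported in MiB. The following minimal sketch takes one snapshot and parses it. The function names are hypothetical, and the parser assumes every field is numeric (some GPUs report [N/A] for certain fields).

```python
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV rows such as '0, 87, 10240, 16160' into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append({"gpu": index, "util_pct": util,
                      "mem_used_mib": used, "mem_total_mib": total})
    return stats


def sample_gpu_stats():
    """Take one snapshot of per-GPU utilization (%) and memory use (MiB)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling sample_gpu_stats() in a loop from a second terminal while training runs shows whether all GPUs are busy.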

RNN support

Keras-MXNet currently has experimental support for RNNs, and there are some limitations when you use an RNN with the MXNet backend. For more information, see the Keras-MXNet documentation. The example here includes the workarounds you need to train an LSTM model on the IMDB dataset. Even with these workarounds, training this RNN on a multi-GPU DLAMI is relatively easy and can be faster than what you may be used to.

Use the imdb_lstm example script. The workarounds are to pass the input length to the Embedding layer and to set unroll=True on the LSTM layer, as follows.

First, at a terminal session on your DLAMI, download the example script from the Keras-MXNet repo folder.

$ wget

Second, open the script and jump to the following lines to review them:

model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128, unroll=True))
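These two settings work together. With unroll=True, Keras expands the recurrence into one explicit step per time step when the model is built, so the number of time steps must be known in advance, which is what input_length=maxlen provides. The example gives every review exactly maxlen tokens by padding or truncating with Keras’s sequence.pad_sequences. The dependency-free sketch below is a hypothetical re-creation of pad_sequences’ default behavior (padding='pre', truncating='pre', value=0), included only to show what the LSTM receives. In the real script, keep using pad_sequences.

```python
def pad_to_length(sequences, maxlen, value=0):
    """Pre-pad or pre-truncate each sequence to exactly maxlen items.

    Hypothetical helper mirroring pad_sequences' defaults: short sequences
    get `value` added at the front; long ones keep only their last maxlen items.
    """
    if maxlen < 0:
        raise ValueError("maxlen must be non-negative")
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # drop the earliest tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded
```

For example, pad_to_length([[1, 2], [3, 4, 5, 6]], 3) returns [[0, 1, 2], [4, 5, 6]], so every row has the fixed length that the unrolled LSTM expects.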

Third, the example script has already been modified to be compatible with the MXNet backend, so you can run it:

$ python

(Optional) Check GPU utilization and memory use with the nvidia-smi command while your training is running. Open another terminal session for this.

Benchmarks

To help you evaluate the performance of the different Keras backends, we have added a benchmark module to Keras-MXNet. Running various models and datasets on CPU, single-GPU, and multi-GPU machines, as described in the table below, shows that Keras-MXNet has faster CNN training speeds and scales efficiently across multiple GPUs. This is shown in the bar chart of training speed. For information about how to run the benchmark scripts and generate detailed benchmark results, see the Keras Benchmarks readme.

Benchmark Configuration

  • Keras Version 2.1.6
  • MXNet Version 1.2.0
  • Image Data Format: channels_first

Training on the CIFAR-10 dataset scaled sublinearly because the dataset’s images are small: its 50,000 training images are only 32×32 pixels each. When going from four to eight GPUs, the communication overhead of distributing these small images outweighs the additional compute, so throughput drops slightly. The ImageNet and synthetic datasets better demonstrate the performance improvements that are possible with Keras-MXNet. You can see this in the graphs below the table.

Instance      GPUs used   CIFAR-10 (images/s)   ImageNet (images/s)   Synthetic data (images/s)
p3.8xlarge    1           831                   135                   194
p3.8xlarge    4           1783                  536                   764
p3.16xlarge   8           1680                  722                   1068

Image Processing Speed Comparison with Keras-MXNet
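A quick way to quantify the scaling discussed above is to compare each run with the single-GPU baseline: speedup = throughput(n) / throughput(1), and scaling efficiency = speedup / n, where 100% means perfectly linear scaling. The sketch below applies those formulas to the numbers from the benchmark table. Keep in mind that the 8-GPU runs used a different instance type (p3.16xlarge) than the 1- and 4-GPU runs (p3.8xlarge).

```python
# Throughput in images/s from the benchmark table, keyed by GPU count.
THROUGHPUT = {
    "CIFAR-10": {1: 831, 4: 1783, 8: 1680},
    "ImageNet": {1: 135, 4: 536, 8: 722},
    "Synthetic": {1: 194, 4: 764, 8: 1068},
}


def scaling(throughput):
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU run."""
    base = throughput[1]
    return {n: (ips / base, ips / (n * base)) for n, ips in throughput.items()}


for dataset, runs in THROUGHPUT.items():
    for n, (speedup, eff) in sorted(scaling(runs).items()):
        print(f"{dataset:9s} {n} GPU(s): {speedup:4.2f}x speedup, {eff:6.1%} efficiency")
```

At four GPUs, ImageNet and synthetic data reach about 99% and 98% efficiency, while CIFAR-10 reaches about 54%, and CIFAR-10 throughput falls from four to eight GPUs.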

What’s Next?

Try out some additional Keras-MXNet tutorials or read the details in the release notes.

Further Reading


  1. CIFAR: Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009.
  2. IMDB: Information courtesy of IMDb. Used with permission.

About the Authors

Sandeep Krishnamurthy is a Software Engineer with AWS Deep Learning. He is on a mission to build software that makes AI technologies accessible to every developer and accelerates the use of AI in day-to-day life. In his spare time, he is busy learning the basics of the good life with his newborn daughter.



Kalyanee Chendke is a Software Engineer for AWS Deep Learning. She is focusing on building tools to allow for increased adoption of Deep Learning technologies. Outside of work, she enjoys playing badminton, painting and spending time with friends and family.




Lai Wei is a Software Engineer with AWS Deep Learning. He is focused on building easy-to-use, high-performance, and scalable deep learning frameworks for data scientists and engineers. Outside of work, he enjoys skiing and scuba diving.




Aaron Markham is a programmer writer for MXNet and AWS Deep Learning AMI. He has a degree in winemaking and a passion for new technology, which he shares by writing and teaching. Aside from talking about deep learning tech, he teaches computer skills to the homeless in Santa Cruz and web programming to prisoners at San Quentin. When not working or teaching, you can find him on the slopes snowboarding or hiking.