AWS Machine Learning Blog

Model Server for Apache MXNet introduces ONNX support and Amazon CloudWatch integration

Today AWS released version 0.2 of Model Server for Apache MXNet (MMS), an open-source library that packages and serves deep learning models for making predictions at scale. Now you can serve models in Open Neural Network Exchange (ONNX) format and publish operational metrics directly to Amazon CloudWatch, where you can create dashboards and alarms.

What is MMS?

MMS is an open-source library that simplifies deploying deep learning models for inference at scale. MMS provides the following:

  • Tooling to package model artifacts into a single model archive. The archive encapsulates all of the artifacts needed to serve the model.
  • The ability to customize every step in the inference execution pipeline using custom code packaged in the model archive.
  • A preconfigured serving stack, including REST API endpoints and an inference engine.
  • Preconfigured Docker images that include MMS, Apache MXNet, and NGINX for scalable model serving.
  • Real-time operational metrics for monitoring MMS and endpoints.

You can install MMS from a PyPI (Python Package Index) package, a preconfigured Docker image, or directly from the Model Server GitHub repository.
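As a minimal sketch of the PyPI route, the commands below install the server and launch it against a packaged model archive. The package name matches the PyPI listing, but the flags shown follow the project README of this release; check the repository for the current syntax, and note that `my-model.model` is a placeholder for an archive you have exported or downloaded.

```shell
# Install MMS from PyPI (pulls in its dependencies).
pip install mxnet-model-server

# Launch the server with a packaged model archive.
# "my-model.model" is a placeholder path; MMS serves the
# model behind a local REST API endpoint.
mxnet-model-server --models my-model=my-model.model
```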

Introducing ONNX model serving

ONNX enables interoperability between deep learning frameworks. With MMS version 0.2, you can use MMS to serve ONNX models created with any framework that supports ONNX. This includes PyTorch, Caffe2, Microsoft Cognitive Toolkit (CNTK), and Chainer.

To get started serving ONNX models, see the MMS ONNX Serving documentation.
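Once an ONNX model is being served, clients query it over the same REST API as native MXNet models. The sketch below is a hypothetical client using only the Python standard library; the default host, port 8080, and the `/<model>/predict` path are assumptions to verify against the MMS documentation (the server also exposes an auto-generated API description with the exact routes).

```python
import urllib.request


def predict_url(model_name, host="127.0.0.1", port=8080):
    """Build the prediction endpoint URL for a served model.

    The /<model>/predict path is an assumption based on the MMS
    REST API; consult the server's API description for exact routes.
    """
    return "http://{}:{}/{}/predict".format(host, port, model_name)


def predict(model_name, image_path):
    """POST an image to the model endpoint and return the raw response.

    The request body and Content-Type here are illustrative; the
    expected payload format depends on the model's input signature.
    """
    with open(image_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        predict_url(model_name),
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Assumes a server is running locally with a model named "squeezenet"
    # and that "kitten.jpg" exists; both names are placeholders.
    print(predict("squeezenet", "kitten.jpg"))
```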

Publishing model serving metrics to CloudWatch

This release includes integration with CloudWatch, a monitoring service for cloud resources and applications. You can use CloudWatch to collect and track metrics, set alarms, and automatically react to changes.

Monitoring operational metrics in near real time is critical to any production service. MMS now integrates directly with the CloudWatch API, making it easy to publish its operational metrics there.

MMS reports model serving metrics, such as request counts, errors, latencies, and host resource utilization for CPU, memory, and disk. With CloudWatch integration, you can take advantage of a web-based dashboard, metrics rendering in real time, and the ability to configure triggers and alerts.
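To illustrate the kind of data involved, the sketch below computes latency statistics from a batch of request samples and shapes them into the `MetricData` structure that CloudWatch's `PutMetricData` API accepts. MMS publishes its metrics for you automatically; this is only a hand-rolled illustration, and the metric names used here are made up rather than the names MMS actually emits.

```python
import statistics


def latency_metric_data(latencies_ms):
    """Shape latency samples into CloudWatch-style MetricData entries.

    The metric names below are illustrative placeholders, not the
    names MMS actually publishes.
    """
    return [
        {
            "MetricName": "PredictionLatencyP50",
            "Value": statistics.median(latencies_ms),
            "Unit": "Milliseconds",
        },
        {
            "MetricName": "PredictionLatencyAvg",
            "Value": statistics.mean(latencies_ms),
            "Unit": "Milliseconds",
        },
        {
            "MetricName": "RequestCount",
            "Value": len(latencies_ms),
            "Unit": "Count",
        },
    ]


# With boto3 installed and AWS credentials configured, entries like
# these could be published manually, e.g.:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="MMS-Example", MetricData=latency_metric_data(samples))
```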

To get started with CloudWatch integration for MMS, see the MMS CloudWatch Metrics documentation.

Learn more and contribute

To learn more about MMS, start with our Single Shot Multi Object Detection (SSD) tutorial, which walks you through exporting and serving an SSD model. You can find more examples and documentation in the repository’s model zoo and documentation folder.

As we continue to develop MMS, we welcome community participation: questions, feature requests, and contributions. Head over to the awslabs/mxnet-model-server repository to get started!

About the authors

Hagay Lupesko is an Engineering Leader for AWS Deep Learning. He focuses on building deep learning systems that enable developers and scientists to build intelligent applications. In his spare time, he enjoys reading, hiking, and spending time with his family.




Jonathan Esterhazy is a Senior Software Engineer with AWS Deep Learning. He builds tools and systems that make it easier to train and deploy deep learning systems at scale.




Ruofei Yu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys spending time with friends and family.