AWS Machine Learning Blog
AWS launches open source Neo-AI project to accelerate ML deployments on edge devices
At re:Invent 2018, we announced Amazon SageMaker Neo, a new machine learning feature that you can use to train a machine learning model once and then run it anywhere in the cloud and at the edge. Today, we are releasing the code as the open source Neo-AI project under the Apache Software License. This release enables processor vendors, device makers, and deep learning developers to rapidly bring new and independent innovations in machine learning to a wide variety of hardware platforms.
Ordinarily, optimizing a machine learning model for multiple hardware platforms is difficult because developers need to tune models manually for each platform’s hardware and software configuration. This is especially challenging for edge devices, which tend to be constrained in compute power and storage. These constraints limit the size and complexity of the models that they can run. Therefore, developers spend weeks or months manually tuning a model to get the best performance. The tuning process requires rare expertise in optimization techniques and deep knowledge of the hardware. Even then, it typically requires considerable trial and error to get good performance because good tools aren’t readily available.
Differences in software further complicate this effort. If the framework version on a device doesn’t match the version the model was built with, the model is incompatible with that device. As a result, developers often limit themselves to only the devices that exactly match their model’s software requirements.
All of this makes it very difficult to quickly build, scale, and maintain machine learning applications.
Neo-AI eliminates the time and effort needed to tune machine learning models for deployment on multiple platforms by automatically optimizing TensorFlow, MXNet, PyTorch, ONNX, and XGBoost models to perform at up to twice the speed of the original model with no loss in accuracy. Additionally, it converts models into an efficient common format to eliminate software compatibility problems. On the target platform, a compact runtime uses a small fraction of the resources that a framework would typically consume. By making optimization easier, Neo-AI allows sophisticated models to run on resource-constrained devices, where they can unlock innovation in areas such as autonomous vehicles, home security, and anomaly detection. Neo-AI currently supports platforms from Intel, NVIDIA, and ARM, with support for Xilinx, Cadence, and Qualcomm coming soon.
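To make the runtime side concrete, here is a minimal sketch of loading and invoking a Neo-compiled model with the open source DLR runtime that Neo-AI provides. The model directory, input name, and tensor shape are illustrative assumptions, not details from this announcement.

```python
# Minimal sketch: running a Neo-compiled model with the DLR runtime.
# The model directory, input name, and shape are illustrative assumptions.
import numpy as np
import dlr

# Directory containing the compiled model artifacts produced by the compiler
model = dlr.DLRModel("./compiled_model", dev_type="cpu")

# Dummy image-shaped input; replace with real preprocessed data
x = np.random.rand(1, 3, 224, 224).astype("float32")

# Run inference; the input name must match the one used at compile time
outputs = model.run({"data": x})
print(outputs[0].shape)
```

Because the runtime only has to execute the compiled artifact, it stays small compared with loading a full deep learning framework on the device.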
At its core, Neo-AI is a machine learning compiler and a runtime built on decades of research on traditional compiler technologies, such as LLVM and Halide. It uses TVM and Treelite, which started as open source research projects at the University of Washington. The Neo-AI project uses TVM to compile deep learning models, Treelite to compile decision tree models, platform-specific optimizations from various contributors, and a common runtime for compiled models. AWS is an active contributor to the open source TVM and Treelite projects, and supports the growing TVM and LLVM communities.
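As an illustration of the compiler path, the sketch below compiles an ONNX model with TVM’s Relay front end, the same underlying technology Neo-AI builds on. The file name, input name, shape, and target are assumptions chosen for the example; Neo automates this step when used as a managed feature.

```python
# Minimal sketch: compiling an ONNX model with TVM (the compiler Neo-AI builds on).
# File name, input name, shape, and target are illustrative assumptions.
import onnx
from tvm import relay

onnx_model = onnx.load("model.onnx")
shape_dict = {"data": (1, 3, 224, 224)}

# Import the model into TVM's Relay intermediate representation
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Host CPU target; an edge target such as "llvm -mtriple=aarch64-linux-gnu" would also work
target = "llvm"

# Compile the model into a graph, an operator library, and parameters
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)

# Export the compiled operator library for the runtime to load on the device
lib.export_library("model.so")
```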
Today’s release of AWS code back to open source through the Neo-AI project allows any developer to innovate on the production-grade Neo compiler and runtime. The Neo-AI project will be steered by the contributions of several organizations, including AWS, ARM, Intel, NVIDIA, Qualcomm, Xilinx, Cadence, and others.
By working with the Neo-AI project, processor vendors can quickly integrate their custom code into the compiler at the point at which it has the greatest effect on improving model performance. The project also enables device makers to customize the Neo-AI runtime for the particular software and hardware configuration of their devices. The Neo-AI runtime is currently deployed on devices from ADLINK, Lenovo, Leopard Imaging, Panasonic, and others. The Neo-AI project will absorb innovations from diverse sources into a common compiler and runtime for machine learning to deliver the best available performance for models.
“Intel’s vision of Artificial Intelligence is motivated by the opportunity for researchers, data scientists, developers, and organizations to obtain real value from advances in deep learning,” said Naveen Rao, General Manager of the Artificial Intelligence Products Group at Intel. “To derive value from AI, we must ensure that deep learning models can be deployed just as easily in the data center and in the cloud as on devices at the edge. By supporting Neo through Intel’s software efforts, including nGraph and OpenVINO, device makers and system vendors can get better performance for models developed in almost any framework on all Intel compute platforms.”
“NVIDIA Jetson with TensorRT is the best performing platform for AI at the edge,” said Ian Buck, Vice President and General Manager, Accelerated Computing, NVIDIA. “Neo simplifies the deployment of deep learning models in production by optimizing them for both NVIDIA Tensor Core GPUs and NVIDIA Jetson GPUs to provide higher throughput and lower latency. Our collaboration with AWS and Neo will bring the full capability of NVIDIA inferencing from the edge to the cloud to a broader set of developers.”
Sudip Nag, Corporate Vice President at Xilinx, said, “Xilinx provides the FPGA hardware and software capabilities that accelerate machine learning inference applications in the cloud and at the edge. We are pleased to support developers using Neo to optimize models for deployment on Xilinx FPGAs. We look forward to enabling Neo-AI to use Xilinx ML Suite to deliver optimal inference performance per watt.”
“ARM’s vision of a trillion connected devices by 2035 is driven by the additional consumer value derived from innovations like machine learning,” said Jem Davies, Fellow, General Manager and Vice President for the Machine Learning Group at ARM. “The combination of Neo and the ARM NN SDK will help developers optimize machine learning models to run efficiently on a wide variety of connected edge devices.”
To learn more, see the Neo-AI repository on GitHub.
About the Authors
Sukwon Kim is a Senior Product Manager for AWS Deep Learning. He works on products that make it easier for customers to use deep learning engines. In his spare time, he enjoys hiking and traveling.
Vin Sharma is an Engineering Leader for AWS Deep Learning. He leads the team building Neo, which helps customers train ML models once and run them anywhere in the cloud and at the edge.