
AWS AI Chips

AWS Neuron

SDK to optimize AI and deep learning on AWS Trainium and AWS Inferentia

What is AWS Neuron?

AWS Neuron is the developer stack for running deep learning and generative AI workloads on AWS Trainium and AWS Inferentia. Built on an open-source foundation, Neuron lets developers build, deploy, and experiment natively with the PyTorch and JAX frameworks and with ML libraries such as HuggingFace, vLLM, PyTorch Lightning, and others without modifying their code. It includes a compiler, a runtime, training and inference libraries, and developer tools for monitoring, profiling, and debugging. Neuron supports the end-to-end machine learning (ML) development lifecycle: building and deploying deep learning and AI models, optimizing them for the highest performance and lowest cost, and gaining deeper insight into model behavior.

Neuron enables rapid experimentation, production-scale training of frontier models, low-level performance optimization through custom kernels with the Neuron Kernel Interface (NKI), cost-optimized inference deployment for agentic AI and reinforcement learning workloads, and comprehensive profiling and debugging with Neuron Explorer.


Built for Researchers

Neuron enables rapid AI research by running native PyTorch code unchanged on Trainium. Researchers can try new ideas and iterate quickly with PyTorch eager-mode support. Scaling is easy with PyTorch distributed libraries such as FSDP, DDP, and DTensor, which shard models across chips or scale to multiple nodes. Neuron supports torch.compile, and libraries like TorchTitan and HuggingFace Transformers now work directly on Trainium without modification. JAX developers can likewise use Neuron to develop, optimize, and deploy their models on Inferentia and Trainium.
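As a rough illustration of that workflow, here is a minimal sketch of one eager-style training step on the XLA device that Neuron exposes for a NeuronCore. It assumes a Trainium instance with the Neuron PyTorch stack (torch-neuronx and torch-xla) installed; the model and data are placeholders.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # shipped with the Neuron PyTorch stack

# The XLA device maps to a NeuronCore on a Trainium instance
# (assumption: torch-neuronx / torch-xla are installed).
device = xm.xla_device()

model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch; a real workload would pull from a DataLoader.
x = torch.randn(64, 512).to(device)
y = torch.randint(0, 10, (64,)).to(device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
xm.mark_step()  # flush the lazily recorded graph for execution
```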


Built for Productivity

Neuron optimizes inference economics for agentic AI and reinforcement learning workloads. Standard vLLM V1 APIs work out of the box on Trainium and Inferentia, with high performance delivered through features like Expert Parallelism, disaggregated inference, and speculative decoding, and through optimized kernels from the Neuron Kernel Library that maximize token economics at scale. ML developers can train with HuggingFace Optimum Neuron, PyTorch Lightning, and TorchTitan, then deploy inference with standard vLLM APIs.
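For a sense of what "standard vLLM APIs" means in practice, below is a minimal offline-inference sketch. The model name and tensor_parallel_size value are illustrative; on a Trainium or Inferentia instance the Neuron vLLM integration handles device placement.

```python
from vllm import LLM, SamplingParams

# Illustrative checkpoint; any vLLM-supported model works the same way.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=8)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Summarize AWS Neuron in one sentence."], params)

for output in outputs:
    print(output.outputs[0].text)
```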


Built for Innovation

Building AI models requires both rapid innovation and performance optimization. While standard frameworks like PyTorch make it easy to scale experimentation, pushing the boundaries of performance requires optimizing the full stack (chip, server, and UltraServer). Neuron gives ML performance engineers unparalleled access to AWS AI chips through the Neuron Kernel Interface (NKI), deeper insights through Neuron Explorer, and an optimized kernel library, the Neuron Kernel Library (NKILib). NKI provides APIs for memory allocation and execution scheduling, plus direct access to the Trainium ISA for instruction-level programming. The NKI Compiler is open source, built on MLIR, and gives developers visibility into the complete compiler pipeline. The open-source Neuron Kernel Library provides optimized kernel implementations with source code, documentation, and benchmarks. Neuron Explorer provides a unified suite of tools that guides developers through performance optimization and debugging: engineers can trace execution from source code down to hardware operations, profile single-node and distributed applications, and receive AI-powered insights and actionable recommendations for kernel optimizations and performance improvements.
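To give a feel for the NKI programming model, here is a minimal element-wise addition kernel in the style of the NKI getting-started examples. It is a sketch, assuming the Neuron compiler package (neuronxcc) is installed and inputs small enough to fit in a single on-chip tile.

```python
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def nki_tensor_add(a_input, b_input):
    """Element-wise addition of two tensors as a custom NKI kernel."""
    # Allocate the output tensor in device HBM.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    # Load both inputs from HBM into on-chip SBUF memory.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    # Compute on-chip, then store the result back to HBM.
    c_tile = a_tile + b_tile
    nl.store(c_output, value=c_tile)
    return c_output
```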


Built for Open Source

AI innovation thrives in open communities where developers can inspect, modify, and contribute. Neuron is committed to the open-source community and to fostering innovation. As we move more of our stack to open source, the NKI Compiler, Neuron Kernel Driver, Neuron Kernel Library, NxD Inference, Neuron Explorer, and the PyTorch, JAX, and vLLM integrations are already fully open source today. Open-source libraries and tools let developers inspect compiler implementations, contribute optimizations, and adapt kernel code without barriers. Come build with us.


Meet Neuron

Neuron provides native integration with PyTorch, enabling researchers and ML developers to run existing code unchanged on Trainium. Standard APIs including FSDP, DDP, and DTensor work seamlessly for distributed training across multi-node setups. Popular ML libraries like TorchTitan, HuggingFace Optimum Neuron, and PyTorch Lightning run directly with minimal modifications. Train models with familiar workflows and tools, from pre-training to post-training with reinforcement learning, while leveraging Trainium's performance and cost advantages for both experimentation and production-scale training.
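Because the distributed APIs are the standard PyTorch ones, sharding a model looks the same as it does anywhere else. The sketch below wraps a toy model in FSDP; it assumes a torchrun launch, and on a Trainium instance the Neuron stack would supply the process-group backend and device placement.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Minimal FSDP sketch using standard PyTorch APIs. Launch with torchrun;
# on Trainium the Neuron stack provides the backend and devices
# (assumption: torch-neuronx installed).
def main():
    dist.init_process_group()  # backend and rendezvous come from the launcher

    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(),
                          nn.Linear(4096, 1024))
    model = FSDP(model)  # parameters are sharded across the process group
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    x = torch.randn(8, 1024)
    loss = model(x).pow(2).mean()  # toy objective for illustration
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```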

Neuron enables production inference deployment with standard frameworks and APIs on Trainium and Inferentia. vLLM integration with standard APIs delivers high-performance serving with optimized kernels from the Neuron Kernel Library. Advanced features including Expert Parallelism, disaggregated inference, and speculative decoding maximize tokens per second while minimizing cost per token. Deploy agentic AI and reinforcement learning workloads at scale with out-of-the-box performance optimizations.
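Serving works through vLLM's standard OpenAI-compatible endpoint. The client sketch below assumes a vLLM server is already running on the instance (for example, started with vllm serve); the host, port, and model name are illustrative.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server
# (assumption: server already running at this address).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
    messages=[{"role": "user", "content": "What is speculative decoding?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```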

For performance engineers seeking maximum hardware efficiency, Neuron provides complete control through the Neuron Kernel Interface (NKI), with direct access to the Neuron instruction set architecture (ISA), memory allocation, and execution scheduling. Developers can create new operations not available in standard frameworks and optimize performance-critical code with custom kernels. The open-source NKI Compiler, built on MLIR, provides transparency into the compilation process. The Neuron Kernel Library offers production-ready, optimized kernels with complete source code, documentation, and benchmarks.
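Custom kernels plug straight back into the framework. As a sketch, the nki_tensor_add kernel from the earlier example can be called on tensors that live on the XLA device, assuming a Trainium instance with torch-neuronx installed.

```python
import torch
import torch_xla.core.xla_model as xm

# Invoke the @nki.jit-decorated nki_tensor_add kernel (defined in the
# earlier sketch) directly on XLA-device tensors.
device = xm.xla_device()
a = torch.rand((128, 512), dtype=torch.bfloat16).to(device=device)
b = torch.rand((128, 512), dtype=torch.bfloat16).to(device=device)

c = nki_tensor_add(a, b)  # compiled and executed on a NeuronCore
print(c.cpu())
```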

Neuron Explorer provides a unified suite of tools that guides developers through their performance optimization and debugging journey. By consolidating profiling, debugging, optimization, and validation into a single environment, Neuron Explorer eliminates time lost switching between fragmented tools. Hierarchical profiling with source-code linking for PyTorch, JAX, and NKI traces execution from source code to hardware operations. AI-powered recommendations analyze profiles to identify bottlenecks and deliver actionable insights for sharding strategies and kernel optimizations. The UI is open source on GitHub.

Neuron provides comprehensive monitoring and observability capabilities that enable ML developers and MLOps teams to maintain operational excellence for production deployments. Native Amazon CloudWatch integration enables centralized monitoring across ML infrastructure, with support for containerized applications on Kubernetes and Amazon EKS. Partner platform integrations with tools like Datadog extend observability with unified monitoring, logging, and alerting. Neuron provides utilities including neuron-top for real-time monitoring, neuron-monitor for metrics collection, neuron-ls for device listing, and Neuron Sysfs for detailed system information.
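As a small illustration, the sketch below tails metrics from the neuron-monitor CLI, which writes periodic JSON reports to stdout. The exact report schema depends on the monitor configuration, so the field handling here is deliberately generic.

```python
import json
import subprocess

# Stream JSON reports from the neuron-monitor CLI (assumption: the
# Neuron tools package is installed and a default configuration is used).
proc = subprocess.Popen(["neuron-monitor"], stdout=subprocess.PIPE, text=True)

for line in proc.stdout:
    report = json.loads(line)
    # Print the top-level sections this report carries; the concrete
    # metric groups depend on the neuron-monitor configuration.
    print(sorted(report.keys()))
```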

Neuron simplifies deployment for ML developers and MLOps teams with pre-configured environments and infrastructure tooling. Neuron Deep Learning AMIs (DLAMIs) and Deep Learning Containers (DLCs) come ready with the Neuron software stack, popular frameworks, and essential libraries. For Kubernetes deployments, the Neuron Device Plugin manages resource allocation, the Neuron Scheduler Extension provides intelligent workload placement, and the Dynamic Resource Allocation (DRA) driver abstracts hardware topology complexity with intuitive size-based resource selection. Helm Charts streamline orchestration for containerized deployments.

Build with Neuron