Posted On: Dec 22, 2023
AWS Neuron is the SDK for Amazon EC2 Inferentia- and Trainium-based instances, which are purpose-built for generative AI. Today, with the Neuron 2.16 release, we are announcing support for Llama-2 70b model inference on Inf2 instances.
Neuron integrates with popular ML frameworks like PyTorch and TensorFlow, so you can get started with minimal code changes and without vendor-specific solutions. Neuron includes a compiler, runtime, tools, and libraries to support high performance training and inference of generative AI models on Trn1 instances and Inf2 instances.
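To illustrate what "minimal code changes" looks like in practice, below is a brief sketch of compiling a standard PyTorch model with torch-neuronx on an Inf2 or Trn1 instance. The ResNet-50 model and input shape are just placeholders for illustration; this assumes torch-neuronx and torchvision are installed in the environment.

```python
import torch
import torch_neuronx
from torchvision import models

# Load any torch.nn.Module in eval mode; ResNet-50 is used here only as a placeholder.
model = models.resnet50(weights="IMAGENET1K_V1").eval()
example = torch.rand(1, 3, 224, 224)

# A single trace call compiles the model for NeuronCores; the rest of the
# inference code remains standard PyTorch.
neuron_model = torch_neuronx.trace(model, example)
output = neuron_model(example)
```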
Neuron 2.16 adds inference support for the Llama-2 70b and Mistral-7b models with Transformers NeuronX. This release also adds support for PyTorch 2.1 (beta) and Amazon Linux 2023. Neuron 2.16 improves the LLM training user experience with PyTorch Lightning Trainer (beta) support. PyTorch inference now lets you dynamically swap different fine-tuned weights into an already loaded model. This release also introduces the Neuron Distributed Event Tracing (NDET) tool to improve debuggability, and adds support for profiling collective communication operators in the Neuron Profiler tool.
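As a rough sketch of Llama-2 inference with Transformers NeuronX, the example below shards the model across NeuronCores with tensor parallelism and generates tokens by sampling. The model path, tp_degree, data type, and sampling parameters are assumptions for illustration; choose a tp_degree that matches the Inf2 instance size (for example, 24 on inf2.48xlarge) and a path where the Llama-2 70b checkpoint is stored in Hugging Face format.

```python
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

# Hypothetical local path to a Llama-2 70b checkpoint in Hugging Face format.
model_path = "Llama-2-70b-hf"

# Shard the model across NeuronCores via tensor parallelism; tp_degree=24 assumes
# an inf2.48xlarge (12 Inferentia2 chips x 2 NeuronCores).
neuron_model = LlamaForSampling.from_pretrained(
    model_path, batch_size=1, tp_degree=24, amp="f16"
)
neuron_model.to_neuron()  # compile the model and load weights onto the NeuronCores

tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode("Hello, my name is", return_tensors="pt")

# Generate tokens by sampling; sequence_length and top_k are illustrative values.
with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=256, top_k=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```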
You can use the AWS Neuron SDK to train and deploy models on Trn1 and Inf2 instances, which are available as On-Demand Instances, Reserved Instances, or Spot Instances, or as part of a Savings Plan, in the following AWS Regions: US East (N. Virginia), US West (Oregon), and US East (Ohio).
For a full list of new features and enhancements in Neuron 2.16, visit the Neuron Release Notes. To get started with Neuron, see: