Posted On: Aug 29, 2023

AWS Neuron is the SDK for Amazon EC2 Inferentia- and Trainium-based instances, purpose-built for generative AI. Today, with the Neuron 2.13 release, we are launching support for Llama 2 model training and inference and GPT-NeoX model training, and adding inference support for Stable Diffusion XL and CLIP models.

Neuron integrates with popular ML frameworks such as PyTorch and TensorFlow, so you can get started with minimal code changes and without vendor-specific solutions. Neuron includes a compiler, runtime, profiling tools, and libraries that support high-performance training of generative AI models on Trn1 instances and inference on Inf2 instances. Neuron 2.13 introduces the AWS Neuron Reference for NeMo Megatron library, which supports distributed training of LLMs such as Llama 2 and GPT-3, and adds support for GPT-NeoX model training with the Neuron Distributed library. This release also adds optimized LLM inference support for Llama 2 with the Transformers Neuron library, and inference support for SDXL, Perceiver, and CLIP models using PyTorch Neuron.

You can use the AWS Neuron SDK to train and deploy models on Trn1 and Inf2 instances, which are available in the US East (N. Virginia), US West (Oregon), and US East (Ohio) AWS Regions as On-Demand Instances, Reserved Instances, and Spot Instances, or as part of a Savings Plan.

For a full list of new features and enhancements in Neuron 2.13, visit the Neuron Release Notes. To get started with Neuron, see the AWS Neuron documentation.
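As a practical starting point, the libraries named above (Transformers Neuron, Neuron Distributed, PyTorch Neuron) are distributed as pip packages from the AWS Neuron pip repository. The following is a minimal install sketch, assuming a Trn1 or Inf2 instance with the Neuron driver already set up; package names reflect the Neuron 2.x PyTorch stack:

```shell
# Install the Neuron compiler and PyTorch-based Neuron libraries from the
# AWS Neuron pip repository (assumes the Neuron driver is already installed
# on a Trn1/Inf2 instance).
python -m pip install --extra-index-url=https://pip.repo.neuron.amazonaws.com \
    neuronx-cc torch-neuronx transformers-neuronx neuronx-distributed
```

Here `neuronx-cc` is the Neuron compiler, `torch-neuronx` is the PyTorch Neuron integration, and `transformers-neuronx` and `neuronx-distributed` provide the LLM inference and distributed training support described in this release.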