Amazon Machine Learning

Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS

LLMs are scaling exponentially. Learn how advanced technologies like Triton, TRT-LLM and EKS enable seamless deployment of models like the 405B parameter Llama 3.1. Let’s go large.

Deploying generative AI applications with NVIDIA NIMs on Amazon EKS

Learn how to deploy AI models at scale on AWS using NVIDIA’s NIM and Amazon EKS! This step-by-step guide shows you how to create a GPU cluster for inference. Don’t miss part 1 of this 2-part blog series!

Gang scheduling pods on Amazon EKS using AWS Batch multi-node processing jobs

AWS Batch multi-node parallel jobs can now run on Amazon EKS, providing gang scheduling of pods across nodes for large-scale distributed computing such as ML model training.

Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances

Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.

Enhancing ML workflows with AWS ParallelCluster and Amazon EC2 Capacity Blocks for ML

No more guessing whether GPU capacity will be available when you launch ML jobs! EC2 Capacity Blocks for ML let you lock in GPU reservations so you can start tasks on time. Learn how to integrate Capacity Blocks into AWS ParallelCluster to optimize your workflow in our latest technical blog post.

How computer vision is enabling a circular economy

In this post, we show how Reezocar uses computer vision to change the way they detect damage and price used vehicles for re-sale in secondary markets. This reduces landfill and helps achieve the goals of the circular economy.

Improving NFL player health using machine learning with AWS Batch

In this post, we’ll show you how the NFL used AWS to scale their ML workloads and produce the first comprehensive dataset of helmet impacts across multiple NFL seasons. They reduced manual labor by 90%, and the results beat human labelers in accuracy by 12%!

Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx

Batch processing is a common need across varied machine learning use cases such as video production, financial modeling, drug discovery, or genomic research. The elasticity of the cloud provides efficient ways to scale and simplify batch processing workloads while cutting costs. In this post, you’ll learn a scalable and cost-effective approach to configure AWS Batch Array jobs to process datasets that are stored on Amazon S3 and presented to compute instances with Amazon FSx for Lustre.

AWS HPC Blog

Category: Amazon Machine Learning
