AWS HPC Blog
Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS
LLMs keep growing in size. Learn how NVIDIA Triton Inference Server, TensorRT-LLM, and Amazon EKS enable multi-node deployment of models like the 405B-parameter Llama 3.1. Let’s go large.
Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances
Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more across nodes. Our guide walks through creating a Slurm-managed cluster to train NeMo Megatron at scale.

