AWS Machine Learning Blog

Category: Containers

Scale LLMs with PyTorch 2.0 FSDP on Amazon EKS – Part 2

This is a guest post co-written with Meta’s PyTorch team and is a continuation of Part 1 of this series, where we demonstrate the performance and ease of running PyTorch 2.0 on AWS. Machine learning (ML) research has proven that large language models (LLMs) trained with significantly large datasets result in better model quality. In […]

Training and serving H2O models using Amazon SageMaker

Model training and serving steps are two essential pieces of a successful end-to-end machine learning (ML) pipeline. These two steps often require different software and hardware setups to provide the best mix for a production environment. Model training is optimized for a low-cost, feasible total run duration, scientific flexibility, and model interpretability objectives, whereas model […]