Artificial Intelligence

Brad Doran

Author: Brad Doran

Introducing Amazon SageMaker HyperPod to train foundation models at scale

Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. Creating a resilient environment that can handle failures and environmental changes without losing days or weeks of model training progress is an operational challenge that requires you to […]