Posted On: Sep 30, 2022

We are excited to announce that Amazon SageMaker Model Training now supports SageMaker Training Managed Warm Pools. Users can now opt in to keep their machine learning (ML) model training instances warm for a specified duration after a job completes. Using this feature, customers can experiment iteratively or run consecutive training jobs at scale on the same warm instances, with up to an 8x reduction in job startup latency.

Amazon SageMaker Model Training is a fully managed capability that spins up instances for every job, trains a model, and then spins down the instances after the job. Customers are billed only for the duration of the job. This fully managed capability gives customers the freedom to focus on their ML algorithm and not worry about infrastructure management while training their models. However, because hardware instances are provisioned for every training job, repetitive training workloads incur startup latency on each run. Given that model training requires substantial iterative experimentation, this per-job startup latency is an additional overhead for customers. Moreover, customers who train high volumes of models at scale often use the same instance configuration for consecutive training jobs, and find the startup latency for every job burdensome.

With SageMaker Training Managed Warm Pools, customers can keep their model training hardware instances warm after every job for a specified period. This allows them to start training on an instance that is already up and running, whether they are experimenting iteratively or training high volumes of models consecutively. With SageMaker Training Managed Warm Pools, customers can reduce the startup latency for a model training job by up to 8x. Customers can enable SageMaker Training Managed Warm Pools by specifying a keep-alive period in the training API. If they opt in to use warm pools, they are billed for the instances and EBS volumes for the duration of the keep-alive period.
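
As a rough illustration of specifying a keep-alive period, the Python sketch below builds a training job request that includes the KeepAlivePeriodInSeconds field under ResourceConfig, which is where the CreateTrainingJob API accepts it. This is a minimal sketch rather than a complete recipe: the helper name build_training_request, the job name, image URI, role ARN, S3 path, and instance settings are all hypothetical placeholders, and the 3600-second upper bound is an assumption that should be confirmed against the current API reference.

```python
from typing import Any, Dict

# Assumed upper bound for the keep-alive period; confirm in the API reference.
MAX_KEEP_ALIVE_SECONDS = 3600


def build_training_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    output_s3_uri: str,
    keep_alive_seconds: int,
    instance_type: str = "ml.m5.xlarge",  # placeholder instance type
) -> Dict[str, Any]:
    """Build a CreateTrainingJob-style request that opts in to a warm pool."""
    if not 0 <= keep_alive_seconds <= MAX_KEEP_ALIVE_SECONDS:
        raise ValueError(
            f"keep_alive_seconds must be between 0 and {MAX_KEEP_ALIVE_SECONDS}"
        )
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
            # Keeps the instances warm after the job completes.
            "KeepAlivePeriodInSeconds": keep_alive_seconds,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are hypothetical placeholders.
request = build_training_request(
    job_name="warm-pool-demo-001",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    output_s3_uri="s3://example-bucket/output/",
    keep_alive_seconds=1800,
)
```

With valid credentials and real resource names, a request of this shape could be submitted with boto3.client("sagemaker").create_training_job(**request). A following job would need a matching instance configuration to reuse the warm instances, and the keep-alive period is billed as described above.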

SageMaker Training Managed Warm Pools are available in all public AWS Regions where Amazon SageMaker Model Training is available. To get started, see Train Using SageMaker Managed Warm Pools in the Amazon SageMaker Developer Guide.