Amazon SageMaker HyperPod now supports Spot Instances
Amazon SageMaker HyperPod now supports Spot Instances, enabling customers to reduce GPU compute costs by up to 90% compared to on-demand instances on HyperPod . As AI workloads scale, optimizing infrastructure costs becomes increasingly critical. SageMaker HyperPod's Spot integration addresses this by allowing customers to automatically leverage spare EC2 capacity at significant discounts, while providing the managed AI experience customers enjoy on HyperPod.
With Spot Instances, organizations can run fault-tolerant workloads cost-effectively at scale. You can combine Spot with on-demand instances to balance cost optimization with guaranteed availability. The feature is available on HyperPod EKS clusters and integrates with Karpenter for intelligent auto-scaling, automatically discovering available Spot capacity and handling instance interruptions.
You can enable Spot Instances when creating instance groups through the CreateCluster API or AWS Console. The feature supports all instance types available on HyperPod, including CPUs and GPUs. Capacity availability depends on supply from EC2 and varies by region and instance type. Spot instance support is available in all regions where SageMaker HyperPod is currently available. To learn more, please refer to the documentation.