Amazon SageMaker HyperPod now supports autoscaling using Karpenter
Amazon SageMaker HyperPod now supports managed node autoscaling using Karpenter, enabling customers to automatically scale their clusters to meet dynamic inference and training demands. Real-time inference workloads require automatic scaling to absorb unpredictable traffic patterns and maintain service-level agreements while optimizing costs. However, organizations often struggle with the operational overhead of installing, configuring, and maintaining complex autoscaling solutions. HyperPod-managed node autoscaling eliminates the undifferentiated heavy lifting of Karpenter setup and maintenance while providing integrated resilience and fault-tolerance capabilities.
Autoscaling on HyperPod with Karpenter enables customers to achieve just-in-time provisioning that rapidly adapts GPU compute for inference traffic spikes. Customers can scale to zero nodes during low-demand periods without maintaining dedicated controller infrastructure and benefit from workload-aware node selection that optimizes instance types and costs. For inference workloads, this provides automatic capacity scaling to handle production traffic bursts, cost reduction through intelligent node consolidation during idle periods, and seamless integration with event-driven pod autoscalers like KEDA. Training workloads also benefit from automatic resource optimization during model development cycles. You can enable autoscaling on HyperPod using the UpdateCluster API with AutoScaling mode set to "Enable" and AutoScalerType set to "Karpenter".
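As a sketch of what that API call might look like, the snippet below builds the UpdateCluster request payload described above and shows how it could be passed to the SageMaker client via boto3. The cluster name and the helper function are illustrative placeholders; the field names follow the AutoScaling mode and AutoScalerType values named in this announcement, so check the UpdateCluster API reference for the authoritative request shape.

```python
def build_autoscaling_update(cluster_name: str) -> dict:
    """Assemble an UpdateCluster request payload that enables
    Karpenter-managed node autoscaling on a HyperPod EKS cluster.

    Field names mirror the announcement: AutoScaling mode "Enable"
    and AutoScalerType "Karpenter". The cluster name is a placeholder.
    """
    return {
        "ClusterName": cluster_name,
        "AutoScaling": {
            "Mode": "Enable",
            "AutoScalerType": "Karpenter",
        },
    }


# The payload would then be sent with boto3, for example:
#   import boto3
#   sagemaker = boto3.client("sagemaker")
#   sagemaker.update_cluster(**build_autoscaling_update("my-hyperpod-cluster"))
```

Keeping the payload construction separate from the API call makes it easy to review or log the exact request before applying a cluster change.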
This feature is available in all AWS Regions where Amazon SageMaker HyperPod EKS clusters are supported. To learn more about autoscaling on SageMaker HyperPod with Karpenter, see the user guide and blog.