Posted On: Dec 27, 2023

We are excited to announce a new, easy, and secure way to connect remotely to the model training environment in Amazon SageMaker for improved observability and faster debugging.

Starting today, you can remotely debug model training code running in SageMaker from your local development environment. You can now diagnose a stuck training job, use command-line tools to monitor the underlying compute resources, debug the training script, and then quickly fix and run it. This capability uses AWS Systems Manager (SSM) to give you shell-level access to the underlying training container. If you use your own Amazon Virtual Private Cloud (VPC) for your model training job, you can also use AWS PrivateLink to set up a VPC endpoint for SSM and connect to the containers privately.
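As a rough illustration of the SSM-based access described above, the following Python sketch builds the AWS CLI command you might run locally to open a shell in a training container. It rests on several assumptions that this announcement does not state: that the training job was created with remote debugging enabled, that the SSM target follows the `sagemaker-training-job:<job-name>_algo-<n>` pattern, and that the AWS CLI and the Session Manager plugin are installed locally. The helper names `ssm_target` and `start_session_command` are hypothetical and are not part of any AWS SDK. Check the SageMaker documentation for the exact target format before relying on it.

```python
import shlex
from typing import Optional


def ssm_target(training_job_name: str, container_index: int = 1) -> str:
    """Build the SSM target string for one training container.

    Assumption: targets look like
    'sagemaker-training-job:<job-name>_algo-<n>', with n starting at 1.
    """
    if not training_job_name:
        raise ValueError("training_job_name must not be empty")
    if container_index < 1:
        raise ValueError("container_index starts at 1")
    return f"sagemaker-training-job:{training_job_name}_algo-{container_index}"


def start_session_command(
    training_job_name: str,
    container_index: int = 1,
    region: Optional[str] = None,
) -> str:
    """Return the 'aws ssm start-session' command line as a single string."""
    args = [
        "aws", "ssm", "start-session",
        "--target", ssm_target(training_job_name, container_index),
    ]
    if region:
        args += ["--region", region]
    # shlex.join quotes any argument that needs quoting for a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    # 'my-training-job' is a placeholder job name, not a real resource.
    print(start_session_command("my-training-job"))
```

The sketch only assembles the command string, so it runs without AWS credentials. Paste the printed command into a terminal to start the session, or pass the equivalent argument list to `subprocess.run` if you prefer to launch it from Python.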

This feature is now available in all AWS Regions where Amazon SageMaker Model Training is available. Learn more by visiting our documentation page.