How do I use Amazon SageMaker Python SDK local mode with SageMaker Studio?

I want to use Amazon SageMaker Python SDK local mode with SageMaker Studio.

Short description

Install the SageMaker Studio Docker CLI and (optional) SageMaker Studio Docker UI extensions to add local mode and Docker functionality to SageMaker Studio.

Resolution

Prerequisites

Before you begin, confirm that the following are true:

  • Your SageMaker Studio domain setup is in VpcOnly mode (note that the PublicInternetOnly mode isn't supported).
  • Your domain is connected to Amazon VPC with DNS hostname and DNS resolution options turned on.
  • Your SageMaker Studio user profile execution role has the following permissions:
      sagemaker:DescribeDomain
      sagemaker:DescribeUserProfile
      sagemaker:ListTags
      elasticfilesystem:DescribeMountTargets
      elasticfilesystem:DescribeMountTargetSecurityGroups
      elasticfilesystem:ModifyMountTargetSecurityGroups
      ec2:RunInstances
      ec2:TerminateInstances
      ec2:DescribeInstances
      ec2:DescribeInstanceTypes
      ec2:DescribeImages
      ec2:DescribeSecurityGroups
      ec2:DescribeNetworkInterfaces
      ec2:DescribeNetworkInterfaceAttribute
      ec2:CreateSecurityGroup
      ec2:AuthorizeSecurityGroupIngress
      ec2:ModifyNetworkInterfaceAttribute
      ec2:CreateTags
  • You installed the Docker CLI extension. (Note that Docker CLI is required for using the UI extension.)
  • You installed Docker Compose.
  • You installed PyYAML version 5.4.1.
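
The permissions listed above can be collected into an IAM policy document. The following is a minimal sketch that only builds and prints the JSON. The function name, the Sid value, and the wildcard Resource are assumptions for illustration and do not come from this article; scope the Resource down to what your account allows before you attach anything like this to an execution role.

```python
import json

# Actions copied from the prerequisites list above.
LOCAL_MODE_ACTIONS = [
    "sagemaker:DescribeDomain",
    "sagemaker:DescribeUserProfile",
    "sagemaker:ListTags",
    "elasticfilesystem:DescribeMountTargets",
    "elasticfilesystem:DescribeMountTargetSecurityGroups",
    "elasticfilesystem:ModifyMountTargetSecurityGroups",
    "ec2:RunInstances",
    "ec2:TerminateInstances",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceTypes",
    "ec2:DescribeImages",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeNetworkInterfaceAttribute",
    "ec2:CreateSecurityGroup",
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:ModifyNetworkInterfaceAttribute",
    "ec2:CreateTags",
]


def build_policy(actions=LOCAL_MODE_ACTIONS, resource="*"):
    """Return an IAM policy document as a dict.

    ASSUMPTION: the Sid and the "*" resource are placeholders, not values
    taken from the article.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SdockerLocalMode",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource,
            }
        ],
    }


policy_json = json.dumps(build_policy(), indent=2)
print(policy_json)
```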

Create SageMaker Studio Lifecycle Configuration scripts

1.    Create a Studio Lifecycle Configuration script for the JupyterServer App that installs the extensions. Choose one of the following two options:

Install both the CLI and UI extensions

#!/bin/bash

set -ex
cd ~
if cd sagemaker-studio-docker-cli-extension
then
  git reset --hard
  git pull
else
  git clone https://github.com/aws-samples/sagemaker-studio-docker-cli-extension.git
  cd sagemaker-studio-docker-cli-extension
fi
nohup ./setup.sh > docker_setup.out 2>&1 &
if cd ~/sagemaker-studio-docker-ui-extension
then
  git reset --hard
  git pull
  cd
else
  cd
  git clone https://github.com/aws-samples/sagemaker-studio-docker-ui-extension.git
fi

nohup ~/sagemaker-studio-docker-ui-extension/setup.sh > docker_setup.out 2>&1 &

Install only the CLI extension

#!/bin/bash

set -ex
cd ~
if cd sagemaker-studio-docker-cli-extension
then
 git reset --hard
 git pull
else
 git clone https://github.com/aws-samples/sagemaker-studio-docker-cli-extension.git
 cd sagemaker-studio-docker-cli-extension
fi
nohup ./setup.sh > docker_setup.out 2>&1 &

2.    Create a SageMaker Studio Lifecycle Configuration script for the KernelGateway App:

#!/bin/bash

set -eux
STATUS=$(python3 -c "import sagemaker_dataprep";echo $?)
if [ "$STATUS" -eq 0 ]
then
 echo 'Instance is of Type Data Wrangler'
else
 echo 'Instance is not of Type Data Wrangler'
 cd ~
 if cd sagemaker-studio-docker-cli-extension
 then
  git reset --hard
  git pull
 else
  git clone https://github.com/aws-samples/sagemaker-studio-docker-cli-extension.git
  cd sagemaker-studio-docker-cli-extension
 fi
 nohup ./setup.sh > docker_setup.out 2>&1 &
fi

3.    From a terminal, encode both script contents using base64 encoding:

$ LCC_JS_CONTENT=`openssl base64 -A -in <LifeCycle script file for JupyterServer>`
$ LCC_KG_CONTENT=`openssl base64 -A -in <LifeCycle script file for KernelGateway>`

4.    Create Studio Lifecycle Configurations from the environment variables LCC_JS_CONTENT and LCC_KG_CONTENT using these AWS Command Line Interface (AWS CLI) commands:

$ aws sagemaker create-studio-lifecycle-config --studio-lifecycle-config-name sdocker-js --studio-lifecycle-config-content $LCC_JS_CONTENT --studio-lifecycle-config-app-type JupyterServer
$ aws sagemaker create-studio-lifecycle-config --studio-lifecycle-config-name sdocker-kg --studio-lifecycle-config-content $LCC_KG_CONTENT --studio-lifecycle-config-app-type KernelGateway

Note: If you get errors running the CLI commands, make sure you are using the most recent version of AWS CLI. See Troubleshooting AWS CLI errors - AWS Command Line Interface.
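
Steps 3 and 4 can also be prepared from Python. The following is a minimal sketch that uses only the standard library. The helper names and the example script body are illustrative assumptions, not part of any AWS tooling. It produces single-line base64 (the same form that openssl base64 -A emits) and assembles the CLI arguments from step 4 as a list, which sidesteps shell quoting.

```python
import base64
import shlex


def encode_lifecycle_script(script_bytes: bytes) -> str:
    """Return the script as single-line base64 text (no line wrapping)."""
    return base64.b64encode(script_bytes).decode("ascii")


def create_lcc_command(name: str, content_b64: str, app_type: str) -> list:
    """Build the argument list for the create-studio-lifecycle-config call
    shown in step 4. This only assembles arguments; it does not run them."""
    return [
        "aws", "sagemaker", "create-studio-lifecycle-config",
        "--studio-lifecycle-config-name", name,
        "--studio-lifecycle-config-content", content_b64,
        "--studio-lifecycle-config-app-type", app_type,
    ]


# Hypothetical stand-in for the real lifecycle script contents.
example_script = b"#!/bin/bash\nset -ex\necho 'placeholder lifecycle script'\n"

encoded = encode_lifecycle_script(example_script)
command = create_lcc_command("sdocker-js", encoded, "JupyterServer")

# Print a shell-safe rendering of the command for review.
print(shlex.join(command))
```

To run the command rather than print it, the list could be passed to subprocess.run(command, check=True) on a machine where the AWS CLI is configured.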

Update the Studio domain (optional)

Update the Studio domain to add the Lifecycle Configurations to the default user settings:

$ aws sagemaker update-domain --domain-id <domain-id> --default-user-settings '{"JupyterServerAppSettings": {"DefaultResourceSpec": {"InstanceType": "system", "LifecycleConfigArn": "arn:aws:sagemaker:<region>:<AWS account ID>:studio-lifecycle-config/sdocker-js"}}, "KernelGatewayAppSettings": {"DefaultResourceSpec": {"InstanceType": "<default instance type>", "LifecycleConfigArn": "arn:aws:sagemaker:<region>:<AWS account ID>:studio-lifecycle-config/sdocker-kg"}}}'

Update the Studio user profile

Update your Studio user profile settings as follows:

$ aws sagemaker update-user-profile --domain-id <domain-id> --user-profile-name <user profile> --user-settings '{"JupyterServerAppSettings": {"DefaultResourceSpec": {"InstanceType": "system", "LifecycleConfigArn": "arn:aws:sagemaker:<region>:<AWS account ID>:studio-lifecycle-config/sdocker-js"}, "LifecycleConfigArns": ["arn:aws:sagemaker:<region>:<AWS account ID>:studio-lifecycle-config/sdocker-js"]}, "KernelGatewayAppSettings": {"DefaultResourceSpec": {"InstanceType": "<default instance type>", "LifecycleConfigArn": "arn:aws:sagemaker:<region>:<AWS account ID>:studio-lifecycle-config/sdocker-kg"}, "LifecycleConfigArns": ["arn:aws:sagemaker:<region>:<AWS account ID>:studio-lifecycle-config/sdocker-kg"]}}'
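
Hand-typed JSON of this length is easy to get subtly wrong, for example through stray whitespace inside a key. As a minimal sketch, the --user-settings value can be generated with Python's json module instead. The function names are hypothetical, and the Region, account ID, and instance type below are placeholder values that you replace with your own.

```python
import json


def lcc_arn(region: str, account_id: str, name: str) -> str:
    """Build a Studio Lifecycle Configuration ARN in the form used above."""
    return f"arn:aws:sagemaker:{region}:{account_id}:studio-lifecycle-config/{name}"


def app_settings(arn: str, instance_type: str) -> dict:
    """Settings for one app type: default resource spec plus attached LCC."""
    return {
        "DefaultResourceSpec": {
            "InstanceType": instance_type,
            "LifecycleConfigArn": arn,
        },
        "LifecycleConfigArns": [arn],
    }


def build_user_settings(region: str, account_id: str, kernel_instance_type: str) -> dict:
    """Mirror the structure passed to --user-settings in the command above."""
    return {
        "JupyterServerAppSettings": app_settings(
            lcc_arn(region, account_id, "sdocker-js"), "system"
        ),
        "KernelGatewayAppSettings": app_settings(
            lcc_arn(region, account_id, "sdocker-kg"), kernel_instance_type
        ),
    }


# ASSUMPTION: placeholder Region, account ID, and instance type.
settings = build_user_settings("us-east-1", "111122223333", "ml.t3.medium")
settings_json = json.dumps(settings)
print(settings_json)
```

The printed string can be pasted between the single quotes after --user-settings.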

Launch the new JupyterServer App

Delete any running instance of the JupyterServer App to complete the configuration. Then, launch the new JupyterServer App. When done, the new app shows an InService status.

If you're using the UI extension, wait for it to install. This takes about 10 minutes after you launch the new JupyterServer App. When done, refresh your browser to see the extension.

(Optional) Some Studio kernels come with PyYAML>=6.0 and don't include the pgrep utility, which is part of the procps system package. Local mode requires PyYAML==5.4.1 because later versions break this functionality. You also need pgrep to delete a local endpoint. If required, run the following commands from your Studio notebook to install these requirements. Restart your kernel after the installation is complete.

!conda update --force -y conda
!conda install -y pyyaml==5.4.1
!apt-get install -y procps
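
To see whether a kernel needs these installs at all, you can check from Python first. The following is a minimal sketch that uses only the standard library; the function name and the shape of the returned dictionary are illustrative choices, not part of the SageMaker SDK.

```python
import shutil
from importlib import metadata

REQUIRED_PYYAML = "5.4.1"  # version that the article says local mode requires


def check_local_mode_requirements() -> dict:
    """Report the installed PyYAML version and whether pgrep is on PATH."""
    try:
        pyyaml_version = metadata.version("PyYAML")
    except metadata.PackageNotFoundError:
        pyyaml_version = None  # PyYAML is not installed in this kernel
    return {
        "pyyaml_version": pyyaml_version,
        "pyyaml_ok": pyyaml_version == REQUIRED_PYYAML,
        "pgrep_found": shutil.which("pgrep") is not None,
    }


report = check_local_mode_requirements()
print(report)
```

If pyyaml_ok or pgrep_found is False, the install commands above apply.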

Create a Docker host

Now, create a Docker host using the CLI extension that you installed earlier. Use any Amazon Elastic Compute Cloud (Amazon EC2) instance type (for example, c5.xlarge) as follows:

!sdocker create-host --instance-type c5.xlarge

The output looks similar to the following:

Successfully launched DockerHost on instance i-xxxxxxxxxxxxxxxxx with private DNS ip-xxx-xxx-xxx-xxx.ec2.internal
Waiting on docker host to be ready
Docker host is ready!
ip-xxx-xxx-xxx-xxx.ec2.internal
Successfully created context "ip-xxx-xxx-xxx-xxx.ec2.internal "
ip-xxx-xxx-xxx-xxx.ec2.internal
Current context is now "ip-xxx-xxx-xxx-xxx.ec2.internal "

If you installed the UI extension, select the instance type from the UI, and then choose the Start Host button. The new host appears in the Docker Hosts list, next to a green circle.

Run in local mode

With the Docker host running, you can use the SageMaker Python SDK in local mode from your Studio notebook.
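
In the SageMaker Python SDK, local mode is selected by passing "local" (or "local_gpu") as the instance type and, typically, a LocalSession as the session. The following is a minimal sketch under those assumptions. The helper names are hypothetical, and the image URI and role ARN are placeholders that you supply. The SDK import is deferred so that the argument builder can be inspected without the SDK installed.

```python
def local_estimator_kwargs(image_uri: str, role_arn: str, use_gpu: bool = False) -> dict:
    """Keyword arguments for a generic Estimator that runs in local mode."""
    return {
        "image_uri": image_uri,  # placeholder: your training image
        "role": role_arn,        # placeholder: your execution role ARN
        "instance_count": 1,
        # "local" / "local_gpu" tell the SDK to run containers through Docker
        # instead of launching SageMaker training instances.
        "instance_type": "local_gpu" if use_gpu else "local",
    }


def build_local_estimator(image_uri: str, role_arn: str, use_gpu: bool = False):
    """Create an Estimator bound to a LocalSession.

    Requires the sagemaker package; imported here so the helper above
    stays usable without it.
    """
    from sagemaker.estimator import Estimator
    from sagemaker.local import LocalSession

    return Estimator(
        sagemaker_session=LocalSession(),
        **local_estimator_kwargs(image_uri, role_arn, use_gpu),
    )


# Inspect the arguments without touching the SDK or Docker.
example_kwargs = local_estimator_kwargs(
    "<your training image URI>", "<your execution role ARN>"
)
print(example_kwargs["instance_type"])
```

Calling fit() on the estimator returned by build_local_estimator would then be expected to start the training container through the Docker context that the CLI extension set up, rather than on a managed training instance.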

Important: To avoid extra charges, close any Docker host that you launched after you're done with local mode and no longer need to use Docker. To close a Docker host using the CLI extension, enter:

!sdocker terminate-current-host

Or, in the UI extension, under Docker Hosts, choose the Power icon next to each Docker host. This action shuts down the Docker host and removes it from the Docker Hosts list.

Note: For more information on how to use the CLI extension, see SageMaker Docker CLI extension - Docker integration for SageMaker Studio on the GitHub website.


AWS OFFICIAL
Updated a year ago