AWS Machine Learning Blog
Bringing your own R environment to Amazon SageMaker Studio
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). With a single click, data scientists and developers can quickly spin up SageMaker Studio notebooks to explore datasets and build models. On October 27, 2020, Amazon released a custom images feature that allows you to launch SageMaker Studio notebooks with your own images.
SageMaker Studio notebooks provide a set of built-in images for popular data science and ML frameworks and compute options to run notebooks. The built-in SageMaker images contain the Amazon SageMaker Python SDK and the latest version of the backend runtime process, also called the kernel. With the custom images feature, you can register custom-built images and kernels, and make them available to all users sharing a SageMaker Studio domain. You can start by cloning and extending one of the example Dockerfiles provided by SageMaker, or build your own images from scratch.
This post focuses on adding a custom R image to SageMaker Studio so you can build and train your R models with SageMaker. After attaching the custom R image, you can select the image in Studio and use R to access the SDKs using the RStudio reticulate package. For more information about R on SageMaker, see Coding with R on Amazon SageMaker notebook instances and R User Guide to Amazon SageMaker.
You can create images and image versions and attach image versions to your domain using the SageMaker Studio Control Panel, the AWS SDK for Python (Boto3), or the AWS Command Line Interface (AWS CLI). For more information about CLI commands, see AWS CLI Command Reference. This post explains both the AWS CLI and the SageMaker console UI methods to attach and detach images to a SageMaker Studio domain.
Prerequisites
Before getting started, you need to meet the following prerequisites:
- Install the AWS CLI on your local machine. This post uses AWS CLI version 2. You should make necessary adjustments if you use a different AWS CLI version.
- Permissions to access the Amazon Elastic Container Registry (Amazon ECR). For more information, see Amazon ECR Managed Policies.
- Install Docker on your local machine. For more information, see Orientation and setup. This is necessary for building Docker images and pushing them to Amazon ECR.
- For instructions on building a container from a Studio development environment, see Using the Amazon SageMaker Studio Image Build CLI to build container images from your Studio notebooks and the SageMaker Docker Build GitHub repo.
- Dockerfile for the R image. A sample Dockerfile is provided in this post that you can customize for your own specific case, but you can also use your own Dockerfile.
- Building the R image from the Dockerfile installs dependencies that may be licensed under copyleft licenses such as GPLv3. You should review the license terms and make sure they are acceptable for your use case before proceeding to build this image.
- An AWS Identity and Access Management (IAM) role that has the AmazonSageMakerFullAccess policy attached. If you have onboarded to SageMaker Studio, you can get the role from the Studio Summary section on the SageMaker Studio Control Panel.
- A SageMaker Studio domain. For instructions, see Onboard to Amazon SageMaker Studio.
Creating your Dockerfile
Before attaching your image to Studio, you need to build a Docker image using a Dockerfile. You can build a customized Dockerfile using base images or other Docker image repositories, such as the Jupyter Docker-stacks repository, and use or revise the ones that fit your specific need.
SageMaker maintains a repository of sample Docker images that you can use for common use cases (including R, Julia, Scala, and TensorFlow). This repository contains examples of Docker images that are valid custom images for Jupyter KernelGateway Apps in SageMaker Studio. These custom images enable you to bring your own packages, files, and kernels for use within SageMaker Studio.
For more information about the specifications that apply to the container image that is represented by a SageMaker image version, see Custom SageMaker image specifications.
For this post, we use the sample R Dockerfile. This Dockerfile takes the base Python 3.6 image and installs R system library prerequisites, conda via Miniconda, and R packages and Python packages that are usable via reticulate. You can create a file named Dockerfile using the following script and copy it to your installation folder. You can customize this Dockerfile for your specific use case and install additional packages.
Setting up your installation folder
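You need to create a folder on your local machine and add the following files in that folder: Dockerfile, app-image-config-input.json, default-user-settings.json, and create-and-attach-image.sh.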
In the following scripts, the Amazon Resource Names (ARNs) should have a format similar to:
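The general ARN layouts documented by AWS are arn:partition:service:region:account-id:resource-id, arn:partition:service:region:account-id:resource-type/resource-id, and arn:partition:service:region:account-id:resource-type:resource-id. As a quick sanity check before pasting an ARN into the scripts, you can split it into its fields in Python. This is a minimal sketch: parse_arn is a hypothetical helper, and the account ID and role name are placeholders, not values from this post.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its colon-separated fields.

    The resource part may itself contain ':' or '/', so only the first
    five colons are treated as separators.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


# Hypothetical execution role ARN (placeholder account ID and role name).
# IAM ARNs leave the region field empty.
role = parse_arn("arn:aws:iam::123456789012:role/MyStudioExecutionRole")
print(role.service, role.account_id, role.resource)
```

A malformed value raises ValueError, which is usually easier to diagnose than a failed CLI call later in the script.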
- Dockerfile is the Dockerfile that you created in the previous step.
- Create a file named app-image-config-input.json with the following content:
- Create a file named default-user-settings.json with the following content. If you’re adding multiple custom images, add to the list of CustomImages.
- Create one last file in your installation folder named create-and-attach-image.sh using the following bash script. The script runs the following in order:
  - Creates a repository named smstudio-custom in Amazon ECR and logs into that repository
  - Builds an image using the Dockerfile and attaches the tag r to the image
  - Pushes the image to Amazon ECR
  - Creates an image for SageMaker Studio and attaches the Amazon ECR image to that image
  - Creates an AppImageConfig for this image using app-image-config-input.json
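To make the shape of the two JSON files and the last two script steps concrete, here is a minimal Python sketch that generates the files and expresses the SageMaker registration calls with Boto3. Several values are assumptions rather than values confirmed by this post: the AppImageConfig name custom-r-image-config, the kernel name ir, and the mount path and POSIX IDs. The Amazon ECR and Docker steps are not covered here.

```python
import json
from pathlib import Path

IMAGE_NAME = "custom-r"  # image name used in this post
CONFIG_NAME = "custom-r-image-config"  # hypothetical AppImageConfig name


def app_image_config() -> dict:
    """Request body for app-image-config-input.json (values are placeholders)."""
    return {
        "AppImageConfigName": CONFIG_NAME,
        "KernelGatewayImageConfig": {
            # Name must match a kernel installed in the image; "ir" is assumed.
            "KernelSpecs": [{"Name": "ir", "DisplayName": "R (Custom R Image)"}],
            # Assumed mount path and POSIX IDs; adjust them to your Dockerfile.
            "FileSystemConfig": {
                "MountPath": "/home/sagemaker-user",
                "DefaultUid": 1000,
                "DefaultGid": 100,
            },
        },
    }


def default_user_settings(*extra_images: dict) -> dict:
    """Request body for default-user-settings.json.

    Additional custom images are appended to the CustomImages list.
    """
    images = [{"ImageName": IMAGE_NAME, "AppImageConfigName": CONFIG_NAME}]
    images.extend(extra_images)
    return {"DefaultUserSettings": {"KernelGatewayAppSettings": {"CustomImages": images}}}


def write_configs(folder: Path) -> None:
    """Write both JSON files into the installation folder."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "app-image-config-input.json").write_text(
        json.dumps(app_image_config(), indent=2)
    )
    (folder / "default-user-settings.json").write_text(
        json.dumps(default_user_settings(), indent=2)
    )


def register_image(sm_client, role_arn: str, ecr_image_uri: str) -> None:
    """Last two script steps as SageMaker API calls on a Boto3 client."""
    sm_client.create_image(ImageName=IMAGE_NAME, RoleArn=role_arn)
    sm_client.create_image_version(ImageName=IMAGE_NAME, BaseImage=ecr_image_uri)
    sm_client.create_app_image_config(**app_image_config())
```

With a real client, sm_client would come from boto3.client("sagemaker", region_name=...), and ecr_image_uri would follow the usual Amazon ECR shape of account-id.dkr.ecr.region.amazonaws.com/smstudio-custom:r. Passing the client in as a parameter keeps the helpers easy to exercise with a stub before touching your account.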
Updating an existing SageMaker Studio domain with a custom image
If you already have a Studio domain, you don’t need to create a new one; you can update your existing domain by attaching the custom image. You can do this using either the AWS CLI for Amazon SageMaker or the SageMaker Studio Control Panel (both are discussed in the following sections). Before going to the next steps, make sure your domain is in Ready status, and get your Studio domain ID from the Studio Control Panel. The domain ID should be in d-xxxxxxxx format.
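If you prefer to check these two preconditions programmatically, the following minimal sketch uses the Boto3 describe_domain call. Two things here are assumptions: the regular expression is a loose check of the d-xxxxxxxx shape rather than the service's official validation rule, and the API status InService is taken to correspond to the Ready label in the Studio Control Panel.

```python
import re

# Loose pattern for the d-xxxxxxxx shape (an assumption, not the
# service's official validation rule).
DOMAIN_ID_PATTERN = re.compile(r"d-[a-z0-9]+")


def looks_like_domain_id(value: str) -> bool:
    """Return True when the value has the d-xxxxxxxx shape."""
    return DOMAIN_ID_PATTERN.fullmatch(value) is not None


def domain_is_ready(sm_client, domain_id: str) -> bool:
    """True when DescribeDomain reports the domain as in service.

    "InService" is assumed to be the API counterpart of the "Ready"
    status shown in the Studio Control Panel.
    """
    if not looks_like_domain_id(domain_id):
        raise ValueError(f"unexpected domain ID format: {domain_id!r}")
    return sm_client.describe_domain(DomainId=domain_id)["Status"] == "InService"
```

Here sm_client is expected to be a Boto3 SageMaker client, for example boto3.client("sagemaker", region_name=...) with the Region of your domain.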
Using the AWS CLI for SageMaker
In the terminal, navigate to your installation folder and make the bash script executable by running chmod +x create-and-attach-image.sh. Then run the script with ./create-and-attach-image.sh.
After you successfully run the bash script, you need to update your existing domain by executing the following command in the terminal. Make sure you provide your domain ID and Region.
After executing this command, your domain status shows as Updating for a few seconds and then shows as Ready again. You can now open Studio.
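The same update can be sketched with Boto3. The hypothetical helper below passes the DefaultUserSettings section of default-user-settings.json to update_domain and then polls describe_domain until the status leaves Updating. The polling interval and retry count are arbitrary choices, and the assumption that the domain reports Updating right after the call may not hold in every case.

```python
import time


def attach_custom_images(sm_client, domain_id: str, default_user_settings: dict,
                         poll_seconds: float = 5.0, max_polls: int = 60) -> str:
    """Apply new default user settings to a domain and wait for it to settle.

    default_user_settings is the value of the "DefaultUserSettings" key
    from default-user-settings.json. Returns the last status reported by
    DescribeDomain (still "Updating" if max_polls is exhausted).
    """
    sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings=default_user_settings,
    )
    status = "Updating"
    for _ in range(max_polls):
        status = sm_client.describe_domain(DomainId=domain_id)["Status"]
        if status != "Updating":
            break
        time.sleep(poll_seconds)
    return status
```

To use it with the file from the installation folder, load the JSON with json.load and pass settings["DefaultUserSettings"] along with a client created by boto3.client("sagemaker", region_name=...).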
When in the Studio environment, you can use the Launcher to launch a new activity, and should see the custom-r (latest) image listed in the dropdown menu under Select a SageMaker image to launch your activity.
Using the SageMaker console
Alternatively, you can update your domain by attaching the image via the SageMaker console. The image that you created is listed on the Images page on the console.
- To attach this image to your domain, on the SageMaker Studio Control Panel, under Custom images attached to domain, choose Attach image.
- For Image source, choose Existing image.
- Choose an existing image from the list.
- Choose a version of the image from the list.
- Choose Next.
- Choose the IAM role. For more information, see Create a custom SageMaker image (Console).
- Choose Next.
- Under Studio configuration, enter or change the following settings. For information about getting the kernel information from the image, see DEVELOPMENT in the SageMaker Studio Custom Image Samples GitHub repo.
- For EFS mount path, enter the path within the image to mount the user’s Amazon Elastic File System (Amazon EFS) home directory.
- For Kernel name, enter the name of an existing kernel in the image.
- (Optional) For Kernel display name, enter the display name for the kernel.
- Choose Add kernel.
- (Optional) For Configuration tags, choose Add new tag and add a configuration tag.
For more information, see the Kernel discovery and User data sections of Custom SageMaker image specifications.
- Choose Submit.
- Wait for the image version to be attached to the domain.
While the image is attaching, your domain status is Updating. When attached, the version is displayed in the Custom images list and briefly highlighted, and your domain status shows as Ready.
The SageMaker image store automatically versions your images. You can select a pre-attached image and choose Detach to detach the image and all versions, or choose Attach image to attach a new version. There is no limit on the number of versions per image, and you can detach images at any time.
Using a custom image to create notebooks
When you’re done updating your Studio domain with the custom image, you can use that image to create new notebooks. To do so, choose your custom image from the list of images in the Launcher. In this example, we use custom-r. This shows the list of kernels that you can use to create notebooks. Create a new notebook with the R kernel.
If this is the first time you’re using this kernel to create a notebook, it may take about a minute to start the kernel, and the Kernel Starting message appears on the lower left corner of your Studio. You can write R scripts while the kernel is starting but can only run your script after your kernel is ready. The notebook is created with a default ml.t3.medium instance attached to it. You can see the R (Custom R Image) kernel and the instance type on the upper right corner of the notebook. You can change ML instances on the fly in SageMaker Studio. You can also right-size your instances for different workloads. For more information, see Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker.
To test the kernel, enter the following sample R script in the first cell and run the script. This script tests multiple aspects, including importing libraries, creating a SageMaker session, getting the IAM role, and importing data from public repositories.
The abalone dataset in this post is from Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (http://archive.ics.uci.edu/ml/datasets/Abalone).
If the image is set up properly and the kernel is running, the output should look like the following screenshot.
Listing, detaching, and deleting custom images
If you want to see the list of custom images attached to your Studio domain, you can either use the AWS CLI or go to the SageMaker console to view the attached images in the Studio Control Panel.
Using the AWS CLI for SageMaker
To view your list of custom images via the AWS CLI, enter the following command in the terminal (provide the Region in which you created your domain):
The response includes the details for the attached custom images:
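With Boto3, the same details can be read from a describe_domain response. This is a minimal sketch that assumes the attached images sit under DefaultUserSettings, KernelGatewayAppSettings, CustomImages, which is where default-user-settings.json places them. The sample response is trimmed and hypothetical, and the AppImageConfig name is a placeholder.

```python
def attached_custom_images(describe_domain_response: dict) -> list:
    """Pull the CustomImages list out of a DescribeDomain response."""
    settings = describe_domain_response.get("DefaultUserSettings", {})
    kernel_gateway = settings.get("KernelGatewayAppSettings", {})
    return kernel_gateway.get("CustomImages", [])


# Trimmed, hypothetical response containing only the fields read above.
sample_response = {
    "DomainId": "d-xxxxxxxx",
    "Status": "InService",
    "DefaultUserSettings": {
        "KernelGatewayAppSettings": {
            "CustomImages": [
                {"ImageName": "custom-r", "AppImageConfigName": "custom-r-image-config"}
            ]
        }
    },
}

for image in attached_custom_images(sample_response):
    print(image["ImageName"], image["AppImageConfigName"])
```

Against a live domain, the input would be the dictionary returned by describe_domain(DomainId=...) on a Boto3 SageMaker client. A domain with no custom images yields an empty list rather than an error.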
If you want to detach or delete an attached image, you can do it on the SageMaker Studio Control Panel (see Detach a custom SageMaker image). Alternatively, use the custom image name from your default-user-settings.json file and rerun the following command to update the domain by detaching the image:
Then, delete the app image config:
Delete the SageMaker image, which also deletes all image versions. The container images in Amazon ECR that are represented by the image versions are not deleted.
After deleting the image, it will not be listed under custom images in SageMaker Studio. For more information, see Clean up resources.
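The detach and delete sequence above can be sketched with Boto3 as follows. The helper name is hypothetical, and the sketch makes two simplifying assumptions: that sending only the CustomImages list under KernelGatewayAppSettings is acceptable for your domain, and that the domain has finished updating before the delete calls run (in practice you would wait for the update to complete first). The container images in Amazon ECR are left untouched.

```python
def detach_and_delete(sm_client, domain_id: str, image_name: str, config_name: str) -> list:
    """Detach one custom image from a domain, then delete its config and image.

    Returns the list of custom images that remain attached.
    """
    current = sm_client.describe_domain(DomainId=domain_id)
    settings = current.get("DefaultUserSettings", {}).get("KernelGatewayAppSettings", {})
    remaining = [
        image for image in settings.get("CustomImages", [])
        if image["ImageName"] != image_name
    ]
    # Step 1: update the domain with the image removed from CustomImages.
    sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={"KernelGatewayAppSettings": {"CustomImages": remaining}},
    )
    # Step 2: delete the app image config.
    sm_client.delete_app_image_config(AppImageConfigName=config_name)
    # Step 3: delete the SageMaker image, which also deletes its versions.
    sm_client.delete_image(ImageName=image_name)
    return remaining
```

Other custom images attached to the domain are preserved because the update sends back every entry except the one being removed.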
Using the SageMaker console
You can also detach (and delete) images from your domain via the Studio Control Panel UI. To do so, under Custom images attached to domain, select the image and choose Detach. You have the option to also delete all versions of the image from your domain. This detaches the image from the domain.
Getting logs in Amazon CloudWatch
You can also get access to SageMaker Studio logs in Amazon CloudWatch, which you can use for troubleshooting your environment. The logs are captured under the /aws/sagemaker/studio log group.
To access the logs, on the CloudWatch console, choose CloudWatch Logs. On the Log groups page, enter the log group name to see logs associated with the Jupyter server and the kernel gateway.
For more information, see Log Amazon SageMaker Events with Amazon CloudWatch.
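For troubleshooting from a script, the same log group can be queried with the Boto3 CloudWatch Logs client. The following is a minimal sketch: the helper name is hypothetical, and filtering by a stream name prefix assumes that Studio log stream names begin with a value you know, such as the domain ID. It reads a single page of results and does not handle pagination or time ranges.

```python
LOG_GROUP = "/aws/sagemaker/studio"


def studio_log_messages(logs_client, stream_prefix: str, limit: int = 50) -> list:
    """Return (stream name, message) pairs from the Studio log group.

    Only the first page of results is read; nextToken is ignored.
    """
    response = logs_client.filter_log_events(
        logGroupName=LOG_GROUP,
        logStreamNamePrefix=stream_prefix,
        limit=limit,
    )
    return [
        (event["logStreamName"], event["message"].rstrip())
        for event in response.get("events", [])
    ]
```

Here logs_client would be created with boto3.client("logs", region_name=...) in the Region of your Studio domain.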
Conclusion
This post outlined the process of attaching a custom Docker image to your Studio domain to extend Studio’s built-in images. We discussed how you can update an existing domain with a custom image using either the AWS CLI for SageMaker or the SageMaker console. We also explained how you can use the custom image to create notebooks with custom kernels.
For more information, see the following resources:
- Bringing your own custom container image to Amazon SageMaker Studio notebooks
- R User Guide to Amazon SageMaker
- Coding with R on Amazon SageMaker notebook instances
- Amazon SageMaker Examples GitHub repo
About the Authors
Nick Minaie is an Artificial Intelligence and Machine Learning (AI/ML) Specialist Solution Architect, helping customers on their journey to well-architected machine learning solutions at scale. In his spare time, Nick enjoys family time, abstract painting, and exploring nature.
Sam Liu is a product manager at Amazon Web Services (AWS). His current focus is the infrastructure and tooling of machine learning and artificial intelligence. Beyond that, he has 10 years of experience building machine learning applications in various industries. In his spare time, he enjoys making short videos for technical education or animal protection.