AWS Machine Learning Blog
Automate Amazon SageMaker Studio setup using AWS CDK
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production all in one place, making you much more productive.
In this post, we show how to use the AWS Cloud Development Kit (AWS CDK) with the new native resources in AWS CloudFormation to set up Studio and configure access to it for the data scientists and developers in your organization. This way you can set up Studio quickly and consistently, apply DevOps best practices, and meet safety, compliance, and configuration standards across all AWS accounts and Regions. For this post, we use Python as the main language, but the code can be easily adapted to the other languages that the AWS CDK supports.
The AWS CDK is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.
With the AWS CloudFormation native resource to create a Studio domain (AWS::SageMaker::Domain) and a user profile within the domain (AWS::SageMaker::UserProfile), you can automate the setup of Studio.
Before the release of this native resource, developers needed to use a custom resource in AWS CloudFormation to create a Studio domain and add a user profile. With the new native resource, you no longer need to use a custom resource to create and manage your domain.
Prerequisites
To get started, make sure you have the following prerequisites:
- The AWS Command Line Interface (AWS CLI) installed
- The AWS CDK installed
- An AWS profile with permissions to create AWS Identity and Access Management (AWS IAM) roles, Studio domains, and Studio user profiles
Clone the GitHub repo
First, let’s clone the demo code from GitHub using your method of choice at: https://github.com/aws-samples/aws-cdk-sagemaker-studio.
As you clone the repo, you can observe that we have a classic AWS CDK project with the following components:
- app.py – The entry point to deploy the AWS CDK stack sagemakerStudioCDK
- sagemakerStudioConstructs – Our AWS CDK constructs using the AWS CloudFormation resources from sagemakerStudioCloudformationStack
- sagemaker-domain-template and sagemaker-user-template – The CloudFormation templates for the native resources to create the Studio domain and user profile
- sagemakerStudioCDK/sagemaker_studio_stack.py – The AWS CDK stack that calls our constructs to first create the Studio domain and then add the user profile
AWS CDK constructs
When we open sagemakerStudioConstructs/__init__.py, we find two AWS CDK constructs:
- SagemakerStudioDomainConstruct – The construct takes as input the mandatory fields required for the native resource AWS::SageMaker::Domain and outputs the Studio domain ID, with the following parameters:
  - sagemaker_domain_name – The name of the newly created Studio domain
  - vpc_id – The ID of the Amazon Virtual Private Cloud (Amazon VPC) that the domain uses for communication
  - subnet_ids – The VPC subnets that the domain uses for communication
  - default_execution_role_user – The IAM execution role for the user by default
- SagemakerStudioUserConstruct – The construct takes as input the mandatory fields required for the native resource AWS::SageMaker::UserProfile, with the following parameters:
  - sagemaker_domain_id – The Studio domain ID
  - user_profile_name – The user profile name
With the cloudformation_include module of aws_cdk (imported as cfn_inc), we can include CloudFormation templates in an AWS CDK project.
The constructs use cfn_inc.CfnInclude to call the native AWS CloudFormation resource with the appropriate parameters.
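As an illustration of that pattern, here is a minimal sketch of how a domain construct might wrap cfn_inc.CfnInclude. The .yaml file extension, the template parameter keys (auth_mode, domain_name, and so on), and the helper names are assumptions made for this example and may differ from the repository. The aws_cdk import is deferred so the parameter mapping can be inspected without the AWS CDK installed.

```python
from pathlib import Path

# Assumed template location; the ".yaml" extension is a guess,
# not something confirmed by the repository.
DOMAIN_TEMPLATE = (
    Path("sagemakerStudioCloudformationStack") / "sagemaker-domain-template.yaml"
)


def domain_include_kwargs(sagemaker_domain_name, vpc_id, subnet_ids,
                          default_execution_role_user, base_dir="."):
    """Build the keyword arguments for cfn_inc.CfnInclude.

    The keys under "parameters" must match the Parameters section of the
    CloudFormation template; the names used here are hypothetical.
    """
    return {
        "template_file": str(Path(base_dir) / DOMAIN_TEMPLATE),
        "parameters": {
            "auth_mode": "IAM",  # AWS::SageMaker::Domain accepts IAM or SSO
            "domain_name": sagemaker_domain_name,
            "vpc_id": vpc_id,
            "subnet_ids": list(subnet_ids),
            "default_execution_role_user": default_execution_role_user,
        },
    }


def include_domain(scope, construct_id, **domain_args):
    """Include the domain template in the given AWS CDK scope."""
    # Imported lazily so this module loads even where aws_cdk is absent.
    from aws_cdk import cloudformation_include as cfn_inc

    return cfn_inc.CfnInclude(
        scope, construct_id, **domain_include_kwargs(**domain_args)
    )
```

Keeping the parameter mapping in a plain function makes it easy to check that every key lines up with a parameter declared in the template before anything is synthesized.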
Setting the preserveLogicalIds parameter (preserve_logical_ids in Python) to False makes sure the logical IDs of the user profile are renamed using the AWS CDK algorithm, which makes sure they’re unique within your application. The parameter defaults to True, so without it, instantiating SagemakerStudioUserConstruct twice in the same stack results in duplicated logical IDs.
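To make the role of that parameter concrete, the following sketch builds the CfnInclude arguments for a user profile with preserve_logical_ids set to False. The .yaml extension, the template parameter keys, and the helper name are assumptions for illustration only.

```python
from pathlib import Path

# Assumed template location; the ".yaml" extension is a guess.
USER_TEMPLATE = (
    Path("sagemakerStudioCloudformationStack") / "sagemaker-user-template.yaml"
)


def user_include_kwargs(sagemaker_domain_id, user_profile_name, base_dir="."):
    """Build the keyword arguments for one user profile include."""
    return {
        "template_file": str(Path(base_dir) / USER_TEMPLATE),
        # False lets the AWS CDK derive a logical ID from the construct path,
        # so several includes of the same template do not collide.
        "preserve_logical_ids": False,
        "parameters": {
            # Hypothetical keys; they must match the template's Parameters.
            "sagemaker_domain_id": sagemaker_domain_id,
            "user_profile_name": user_profile_name,
        },
    }
```

Each call returns an independent argument set, so the same template can be included once per user, each time under a different construct ID.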
For simplicity, we use only the mandatory fields in the constructs, but you can add the fields that the native resource supports to the construct and map them as parameters in your CloudFormation template.
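One way to extend a construct with optional fields is sketched below: optional values are merged into the parameters mapping only when they are set. AppNetworkAccessType and KmsKeyId are properties of AWS::SageMaker::Domain, but the template parameter keys (app_network_access_type, kms_key_id) and the helper name are hypothetical, and the template would need matching parameters that are referenced by the resource.

```python
def merge_optional_parameters(required, **optional):
    """Return a new parameters dict, adding only the optional values that are set."""
    merged = dict(required)
    merged.update(
        {key: value for key, value in optional.items() if value is not None}
    )
    return merged


required = {"auth_mode": "IAM", "domain_name": "demo-domain"}
parameters = merge_optional_parameters(
    required,
    app_network_access_type="VpcOnly",  # would feed AppNetworkAccessType
    kms_key_id=None,                    # unset, so it is not passed at all
)
```

Template parameters that are not passed to CfnInclude are left unchanged in the synthesized template, so leaving a value unset keeps it as a regular CloudFormation parameter.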
Deploy your AWS CDK stack
To deploy your AWS CDK stack, run the following commands in the location where you cloned the repository:
python3 -m venv .cdk-venv
source .cdk-venv/bin/activate
pip install -r requirements.txt
cdk deploy
Review the resources that AWS CDK creates for you in your AWS account and choose yes to deploy the stack.
Wait for your stack to be deployed by checking the status on the AWS CloudFormation console.
When the stack is complete, on the Amazon SageMaker console, choose Amazon SageMaker Studio. You can see a Studio domain created and the user profile added to your Studio Control Panel.
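If you prefer to check the result from a script rather than the console, a sketch along these lines could work. It assumes a boto3 SageMaker client is passed in, and it ignores pagination, so it only covers accounts with a small number of domains and user profiles. The function name is made up for this example.

```python
def list_studio_users(sagemaker_client, domain_name):
    """Return (domain_id, [user profile names]) for the named Studio domain."""
    domains = sagemaker_client.list_domains()["Domains"]
    match = next((d for d in domains if d["DomainName"] == domain_name), None)
    if match is None:
        raise LookupError(f"No Studio domain named {domain_name!r}")

    # Only the first page of results is read in this sketch.
    profiles = sagemaker_client.list_user_profiles(
        DomainIdEquals=match["DomainId"]
    )
    names = [p["UserProfileName"] for p in profiles["UserProfiles"]]
    return match["DomainId"], names
```

With AWS credentials and a default Region configured, you could call it as list_studio_users(boto3.client("sagemaker"), "<your domain name>").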
You can update the domain by deploying the same stack again, because the domain is created with preserve_logical_ids=True (the default), which keeps its logical ID unchanged between deployments.
For instance, if you want to delete or add a user profile to your Studio domain, you can simply change the list team_to_add_in_sagemaker_studio in sagemaker_studio_stack.py and deploy the stack again to see your changes.
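The idea of driving user profiles from a list can be sketched as a small loop. This is not the repository's code: the helper name is invented, and it assumes the user construct accepts sagemaker_domain_id and user_profile_name as keyword arguments after the usual scope and construct ID.

```python
def add_team_profiles(scope, domain_id, team, make_user):
    """Create one user profile construct per name in `team`.

    `make_user` is any callable with the shape
    make_user(scope, construct_id, sagemaker_domain_id=..., user_profile_name=...),
    for example the SagemakerStudioUserConstruct class (assumed signature).
    """
    profiles = []
    for name in team:
        # Construct IDs must be unique within a scope; the profile name is reused.
        profiles.append(
            make_user(
                scope,
                name,
                sagemaker_domain_id=domain_id,
                user_profile_name=name,
            )
        )
    return profiles
```

Because the set of constructs is derived from the list, removing a name removes the matching resource from the synthesized template, and the next deployment deletes that user profile.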
Clean up
To avoid ongoing charges for the resources you created, destroy your AWS CDK stack by running the following command in the location where you cloned the repository:
cdk destroy
When asked to confirm the deletion of the stack, select yes.
Conclusion
This post showed how you can create a Studio domain and add a user profile to the Studio Control Panel using the AWS CDK, without using any custom resources. If you have any questions or comments, please comment on the GitHub repository. If you have any additional examples you want to add, we encourage you to create a pull request with your example!
As a next step, you can also configure Amazon SageMaker Studio for teams and groups with complete resource isolation.
For more information about Amazon SageMaker Studio, see Amazon SageMaker Studio resources.
About the Author
Amine Ait el harraj is a Data Architect with AWS Professional Services. He helps customers and partners embrace the analytics workload in the cloud with flexible and resilient architectures using AWS services. In his free time, Amine enjoys playing competitive video games.