AWS Machine Learning Blog

Provision and manage ML environments with Amazon SageMaker Canvas using AWS CloudFormation, AWS CDK and AWS Service Catalog

June 2024: This blog post has been updated to reflect the updates in the architecture described. Additionally, support for CloudFormation templates has been added.

Machine learning (ML) is being adopted across a wide range of use cases in every industry. However, this adoption is outpacing the growth in the number of ML practitioners, who have traditionally been responsible for implementing these technical solutions to realize business outcomes.

In today’s enterprise, ML needs to be usable by people who are not ML practitioners but are proficient with data, which is the foundation of ML. No-code ML platforms make this possible: they enable different personas, for example business analysts, to use ML without writing a single line of code and to deliver solutions to business problems in a quick, simple, and intuitive manner. Amazon SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate predictions on their own, without requiring any ML experience. SageMaker Canvas has expanded the use of ML in the enterprise with a simple-to-use, intuitive interface that helps businesses implement solutions quickly.

Although SageMaker Canvas has helped democratize ML, the challenge of provisioning and deploying ML environments in a secure manner remains. In most large enterprises, this is the responsibility of central IT teams. In this post, we discuss how IT teams can administer, provision, and manage secure ML environments using Amazon SageMaker Canvas, AWS CloudFormation, AWS Cloud Development Kit (AWS CDK), and AWS Service Catalog. The post presents a step-by-step guide for IT administrators to achieve this quickly and at scale.

Overview of Infrastructure as Code on AWS

Deploying and managing infrastructure as code (IaC) is a best practice for any AWS Cloud Administrator. Writing infrastructure as code allows you to automate the provisioning and management of your AWS resources, ensuring consistency and repeatability. It also enables version control, making it easier to track changes and collaborate on infrastructure configurations.

AWS offers several options for your IaC needs. You can choose among them based on your preferred programming language, the level of verbosity you are comfortable with, and the self-service capability you need.

AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS. CloudFormation allows you to use a simple text file to define all the resources and dependencies in your cloud environment.

The AWS CDK is an open-source software development framework to define your cloud application resources. It uses the familiarity and expressive power of programming languages for modeling your applications, while provisioning resources in a safe and repeatable manner.

AWS Service Catalog lets you centrally manage deployed IT services, applications, resources, and metadata. With AWS Service Catalog, you can create, share, organize and govern cloud resources with infrastructure as code (IaC) templates and enable fast and straightforward provisioning.

Solution overview

In this blog post, we enable provisioning of ML environments using SageMaker Canvas in three steps:

  1. We set up the SageMaker Domain with enhanced security features: limited internet access, VPC restrictions, VPC endpoints for AWS service usage, and limited IAM permissions in the SageMaker execution role. We also deploy a SageMaker user profile for the admin.
  2. We provide an additional template for deploying a user profile that uses the default execution role and can therefore access SageMaker Canvas. This template can be reused as many times as needed to onboard all users.
  3. We set up automation that shuts down idle SageMaker Canvas applications by means of an AWS Lambda function, to keep costs in check.

We provide both the CloudFormation templates, which can be deployed directly in your AWS account through the AWS CLI or the AWS Management Console, and the AWS CDK code that creates the Service Catalog product to be deployed. We cover Service Catalog in more detail later in this post. Instructions are provided for both approaches.

Getting Started

To get started, clone the GitHub repository. You need it for both approaches: the AWS CloudFormation approach and the AWS CDK with AWS Service Catalog approach.

Provision approved ML environments with Amazon SageMaker Canvas using AWS CloudFormation

To deploy the above architecture using AWS CloudFormation, follow these steps:

  1. Clone the repository above
  2. Navigate to the cfn-templates folder
  3. Identify the three required CloudFormation templates:
    • sagemaker-domain-with-vpc.yaml
    • citizen-data-scientist-user-profile.yaml
    • canvas-auto-shutdown.yaml
  4. Deploy the template as follows:
    • Using the AWS Management Console:
      • Navigate to the AWS CloudFormation console, and make sure you’re in the Region where you want to create the domain.
      • From the homepage, choose Create Stack.
      • Select Choose an existing template, and then Upload a template file.
      • Upload the template file sagemaker-domain-with-vpc.yaml
      • Provide the required parameters and choose Next. Confirm that you allow the creation of IAM resources by selecting the checkbox, and then choose Create.
    • Using the AWS CLI:
      aws cloudformation deploy \
          --template-file sagemaker-domain-with-vpc.yaml \
          --stack-name SageMakerDomainStack \
          --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

After successfully deploying the first template, you can deploy the user profile template once for each of your users, as well as the optional but recommended auto-shutdown template for idle SageMaker Canvas apps.
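The per-user deployments described above could be scripted rather than typed by hand. The following is a minimal sketch that only builds and prints the AWS CLI commands, one per user plus one for the auto-shutdown template. The user names, the stack naming scheme, and the UserProfileName parameter are assumptions for illustration; check the Parameters section of each template for the names it actually expects.

```python
import shlex


def build_deploy_command(template_file, stack_name, parameters=None):
    """Build the argument list for one `aws cloudformation deploy` call."""
    command = [
        "aws", "cloudformation", "deploy",
        "--template-file", template_file,
        "--stack-name", stack_name,
        "--capabilities", "CAPABILITY_IAM", "CAPABILITY_NAMED_IAM",
    ]
    if parameters:
        # Template parameters are passed as Key=Value pairs.
        command.append("--parameter-overrides")
        command.extend(f"{key}={value}" for key, value in parameters.items())
    return command


# Hypothetical users to onboard; replace with your own list.
users = ["alice", "bob"]

commands = [
    build_deploy_command(
        "citizen-data-scientist-user-profile.yaml",
        f"CanvasUserProfile-{user}",   # hypothetical stack naming scheme
        {"UserProfileName": user},     # hypothetical parameter name
    )
    for user in users
]
# Hypothetical stack name for the optional auto-shutdown template.
commands.append(
    build_deploy_command("canvas-auto-shutdown.yaml", "CanvasAutoShutdownStack")
)

for command in commands:
    print(shlex.join(command))
```

Printing the commands first lets you review them before running anything. To run them, you could pass each list to subprocess.run from the cfn-templates folder.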

Provision approved ML environments with Amazon SageMaker Canvas using AWS Service Catalog

In regulated industries and most large enterprises, you need to adhere to the requirements mandated by IT teams to provision and manage ML environments. These may include a secure, private network; data encryption; controls, such as AWS Identity and Access Management (IAM), that allow only authorized and authenticated users to access solutions such as SageMaker Canvas; mandatory tagging; and logging and monitoring for audit purposes.

As an IT administrator, you can use AWS Service Catalog to create and organize secure, reproducible ML environments with SageMaker Canvas into a product portfolio. The portfolio is managed using IaC, with controls embedded to meet the requirements mentioned earlier, and its products can be provisioned on demand within minutes. You can also maintain control of who can access this portfolio to launch products.

The following diagram illustrates this architecture.

In this section, you will learn how to:

  1. Manage a portfolio of resources necessary for the approved usage of SageMaker Canvas using AWS Service Catalog.
  2. Deploy an example AWS Service Catalog portfolio for SageMaker Canvas using the AWS CDK.
  3. Provision SageMaker Canvas environments on demand within minutes.

Prerequisites

To provision ML environments with SageMaker Canvas, the AWS CDK, and AWS Service Catalog, you need to do the following:

  1. Have access to the AWS account where the Service Catalog portfolio will be deployed. Make sure you have the credentials and permissions to deploy the AWS CDK stack into your account. The AWS CDK Workshop is a helpful resource you can refer to if you need support.
  2. We recommend following certain best practices that are highlighted through the concepts detailed in the following resources:
  3. Clone the GitHub repository into your environment.

Example flow

In this section, we demonstrate an example of an AWS Service Catalog portfolio with SageMaker Canvas. The portfolio consists of the following components of the SageMaker Canvas environment:

  • Studio domain – SageMaker Canvas is an application that runs within Studio domains. The domain consists of an Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a range of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. An AWS account is linked to one domain per Region.
  • SageMaker Canvas user – You can add a user profile within the Studio domain for each Canvas user, who can then import datasets, build and train ML models without writing code, run predictions on the models, and use generative AI models.
  • Automated shutdown of SageMaker Canvas workspace instance – SageMaker Canvas users can log out from the Canvas interface when they’re done with their tasks. Alternatively, administrators can configure automatic shutdown of SageMaker Canvas sessions. This allows administrators to make sure that instances are shut down when not in use, avoiding unintended charges.
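To make the automated shutdown more concrete, here is a minimal sketch of the kind of logic such a Lambda function could run. It is not the code from the repository. The two-hour threshold, the activity data, and the assumption that the Canvas app is named "default" are illustrative; how the last-activity timestamps are obtained is left out on purpose.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(hours=2)  # assumed threshold, not taken from the repository


def find_idle_profiles(last_active, now, timeout=IDLE_TIMEOUT):
    """Return the user profile names whose last activity is older than `timeout`."""
    return sorted(
        name for name, seen in last_active.items() if now - seen > timeout
    )


def shut_down_canvas_app(domain_id, user_profile_name):
    """Delete the Canvas app of one user profile, which ends its workspace session."""
    import boto3  # imported lazily so the selection logic has no AWS dependency

    boto3.client("sagemaker").delete_app(
        DomainId=domain_id,
        UserProfileName=user_profile_name,
        AppType="Canvas",
        AppName="default",  # assumption: the Canvas app uses the name "default"
    )


# Hypothetical activity data; a real function would read this from monitoring.
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
last_active = {
    "alice": now - timedelta(minutes=30),
    "bob": now - timedelta(hours=4),
}
idle_profiles = find_idle_profiles(last_active, now)
print(idle_profiles)
```

Keeping the idle check separate from the API call makes the threshold logic easy to test without touching any AWS resources.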

This example flow can be found in the GitHub repository for quick reference.

Deploy the flow with the AWS CDK

In this section, we deploy the flow described earlier using the AWS CDK. After it’s deployed, you can also track versions and manage the portfolio.

The portfolio stack can be found in app.py and the product stacks under the products/ folder. You can iterate on the IAM roles, AWS Key Management Service (AWS KMS) keys, and VPC setup in the studio_constructs/ folder. Before deploying the stack into your account, you can edit the following lines in app.py and grant portfolio access to an IAM role of your choice.

You can manage access to the portfolio for the relevant IAM users, groups, and roles. See Granting Access to Users for more details.
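Besides editing app.py, portfolio access can also be granted after deployment through the AWS Service Catalog API. The sketch below builds one AssociatePrincipalWithPortfolio request per IAM principal and keeps the actual call in a separate function. The portfolio ID and role ARNs are placeholders, and this is an alternative to the repository's approach rather than a copy of it.

```python
def build_access_requests(portfolio_id, principal_arns):
    """Build one AssociatePrincipalWithPortfolio request per IAM principal."""
    return [
        {
            "PortfolioId": portfolio_id,
            "PrincipalARN": arn,
            "PrincipalType": "IAM",
        }
        for arn in principal_arns
    ]


def grant_access(access_requests):
    """Send the requests to AWS Service Catalog (requires boto3 and credentials)."""
    import boto3  # imported lazily so the request builder has no AWS dependency

    client = boto3.client("servicecatalog")
    for request in access_requests:
        client.associate_principal_with_portfolio(**request)


# Hypothetical portfolio ID and role ARNs; replace with your own values.
access_requests = build_access_requests(
    "port-examplexample",
    [
        "arn:aws:iam::111122223333:role/CanvasAdmin",
        "arn:aws:iam::111122223333:role/DataAnalyst",
    ],
)
for request in access_requests:
    print(request["PrincipalARN"])
```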

You can now run the following commands to install the AWS CDK and make sure you have the right dependencies to deploy the portfolio:

npm install -g aws-cdk@2.147.0
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Run the following commands to deploy the portfolio into your account:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_REGION=$(aws configure get region)
cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION}
cdk deploy --require-approval never

The first two commands get your account ID and current Region using the AWS Command Line Interface (AWS CLI) on your computer. Then cdk bootstrap prepares the account and Region for AWS CDK deployments, and cdk deploy builds the assets locally and deploys the stack in a few minutes.

The portfolio can now be found in AWS Service Catalog, as shown in the following screenshot.

On-demand provisioning

The products within the portfolio can be launched quickly and easily on demand from the Provisioning menu on the AWS Service Catalog console. A typical flow is to launch the Studio domain and the SageMaker Canvas auto shutdown first because this is usually a one-time action. You can then add SageMaker Canvas users to the domain. The domain ID and user IAM role ARN are saved in AWS Systems Manager and are automatically populated with the user parameters as shown in the following screenshot.

You can also attach cost allocation tags to each user. For example, UserCostCenter is a sample tag whose value you can set to the name of each user.
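For readers who want to see what such a tag looks like at the API level, the following sketch builds the arguments for the SageMaker CreateUserProfile call with a UserCostCenter tag. The Service Catalog product does this for you through its template; the domain ID and user name here are placeholders, and defaulting the tag value to the user name is an assumption based on the description above.

```python
def build_user_profile_request(domain_id, user_name, cost_center=None):
    """Build the arguments for the SageMaker CreateUserProfile API call."""
    return {
        "DomainId": domain_id,
        "UserProfileName": user_name,
        # The tag value defaults to the user name (assumption for illustration).
        "Tags": [{"Key": "UserCostCenter", "Value": cost_center or user_name}],
    }


def create_user_profile(profile_request):
    """Create the user profile (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder has no AWS dependency

    return boto3.client("sagemaker").create_user_profile(**profile_request)


# Hypothetical domain ID and user name.
profile_request = build_user_profile_request("d-exampledomain", "alice")
print(profile_request)
```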

Key considerations for governing ML environments using SageMaker Canvas

Now that you have deployed the IaC for SageMaker Canvas and provisioned your environments, we’d like to highlight a few considerations for governing SageMaker Canvas-based ML environments, focused on the domain and the user profile.

The following are considerations regarding the Studio domain:

  • Networking for SageMaker Canvas is managed at the Studio domain level, where the domain is deployed on a private VPC subnet for secure connectivity. See Securing Amazon SageMaker Studio connectivity using a private VPC to learn more.
  • A default IAM execution role is defined at the domain level. This default role is assigned to all SageMaker Canvas users in the domain.
  • Data at rest is protected by encrypting the domain’s EFS volume with AWS KMS. For additional control, you can specify your own customer managed key (CMK). See Protect Data at Rest Using Encryption to learn more.
  • Uploading files from your local disk is enabled by attaching a cross-origin resource sharing (CORS) policy to the S3 bucket used by SageMaker Canvas. See Give Your Users Permissions to Upload Local Files to learn more.
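As an illustration of the CORS point above, the sketch below builds a CORS configuration in the shape expected by the S3 PutBucketCors API. The rule shown (POST requests with any header from any origin) is an assumption about what local uploads need; confirm the exact policy in the linked guide before applying it, and consider narrowing the allowed origins.

```python
import json


def build_canvas_cors_configuration(allowed_origins=("*",)):
    """Build a CORS configuration that allows browser uploads via POST."""
    return {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["POST"],
                "AllowedOrigins": list(allowed_origins),
            }
        ]
    }


def apply_cors(bucket_name, cors_configuration):
    """Attach the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder has no AWS dependency

    boto3.client("s3").put_bucket_cors(
        Bucket=bucket_name, CORSConfiguration=cors_configuration
    )


cors_configuration = build_canvas_cors_configuration()
print(json.dumps(cors_configuration, indent=2))
```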

The following are considerations regarding the user profile:

  • Authentication in Studio can be done both through single sign-on (SSO) and IAM. If you have an existing identity provider to federate users to access the console, you can assign a Studio user profile to each federated identity using IAM. See the section Assigning the policy to Studio users in Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation to learn more.
  • You can assign IAM execution roles to each user profile. While using Studio, a user assumes the role mapped to their user profile, which overrides the default execution role. You can use this for fine-grained access controls within a team.
  • You can achieve isolation using attribute-based access controls (ABAC) to ensure users can only access the resources for their team. See Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation to learn more.
  • You can perform fine-grained cost tracking by applying cost allocation tags to user profiles. When users access SageMaker Canvas, these tags are automatically applied to the Canvas jobs, allowing you to track Canvas session and training charges for each user profile.
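The ABAC consideration above can be sketched as an IAM policy that only lets a principal open user profiles carrying the same tag value as the principal itself. The tag key "team", the statement ID, and the choice of the sagemaker:CreatePresignedDomainUrl action are assumptions for illustration; the referenced post describes the complete setup.

```python
import json


def build_team_isolation_policy(tag_key="team"):
    """Build an IAM policy that matches a resource tag against the caller's tag."""
    # IAM policy variable that resolves to the caller's own tag value.
    principal_tag = "${aws:PrincipalTag/" + tag_key + "}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOwnTeamProfilesOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "sagemaker:CreatePresignedDomainUrl",
                "Resource": "arn:aws:sagemaker:*:*:user-profile/*",
                "Condition": {
                    "StringEquals": {
                        f"sagemaker:ResourceTag/{tag_key}": principal_tag
                    }
                },
            }
        ],
    }


policy = build_team_isolation_policy()
print(json.dumps(policy, indent=2))
```

With a policy like this, a user tagged team=finance could only launch profiles that are also tagged team=finance.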

Cost management

We can track the SageMaker Canvas session and model building costs using AWS Cost Explorer and AWS Cost and Usage Reports (AWS CUR). To learn more, see Manage billing and cost in SageMaker Canvas. In addition, we can categorize the costs by user with the UserCostCenter custom tag we assigned to each user profile.

First, we need to activate the custom tag in the AWS Billing and Cost Management console. See Activating user-defined cost allocation tags for the exact steps. After the tag is activated, we can use it to group or filter the Canvas charges in AWS Cost Explorer.
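The same grouping can be done programmatically with the Cost Explorer GetCostAndUsage API. The sketch below builds a monthly query grouped by the UserCostCenter tag. The date range is a placeholder, and the "Amazon SageMaker" service name used in the filter is an assumption that you should verify against the service names shown in your own Cost Explorer.

```python
from datetime import date


def build_cost_query(start, end, tag_key="UserCostCenter"):
    """Build GetCostAndUsage arguments that group SageMaker costs by a tag."""
    return {
        # The end date is exclusive in the Cost Explorer API.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}
        },
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


def fetch_costs(cost_query):
    """Run the query (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the query builder has no AWS dependency

    return boto3.client("ce").get_cost_and_usage(**cost_query)


# Hypothetical reporting period: June 2024.
cost_query = build_cost_query(date(2024, 6, 1), date(2024, 7, 1))
print(cost_query)
```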

You can add additional tags or modify the UserCostCenter tag used by the “Canvas User” product in your Service Catalog by editing the following line in the file products/canvas_user_product.py.

tags=[CfnTag(key="cost-center", value=self.user_tag_param.value_as_string)],

Clean up

To clean up the resources created, navigate to the AWS CloudFormation stacks page and delete the SageMaker Canvas stacks. If you used the AWS CDK, you can also run cdk destroy from within the repository folder to do the same.
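If you deployed several stacks with CloudFormation, deleting them in reverse order of creation is a reasonable default, because user profiles generally need to be removed before the domain they belong to. The following sketch shows that ordering. Except for SageMakerDomainStack, which is used earlier in this post, the stack names are hypothetical.

```python
def deletion_order(stacks_in_creation_order):
    """Return the stacks in reverse creation order, so dependents go first."""
    return list(reversed(stacks_in_creation_order))


def delete_stacks(stack_names):
    """Delete stacks one at a time (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the ordering logic has no AWS dependency

    client = boto3.client("cloudformation")
    waiter = client.get_waiter("stack_delete_complete")
    for name in stack_names:
        client.delete_stack(StackName=name)
        # Wait for each deletion to finish before starting the next one.
        waiter.wait(StackName=name)


created = [
    "SageMakerDomainStack",
    "CanvasAutoShutdownStack",   # hypothetical stack name
    "CanvasUserProfile-alice",   # hypothetical stack name
]
to_delete = deletion_order(created)
print(to_delete)
```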

Conclusion

In this post, we shared how you can quickly and easily provision ML environments with SageMaker Canvas using AWS CloudFormation, AWS Service Catalog, and the AWS CDK. We discussed how you can create a portfolio on AWS Service Catalog, provision the portfolio, and deploy it in your account. IT administrators can use this method to deploy and manage users, sessions, and associated costs while provisioning Canvas.

Learn more about SageMaker Canvas on the product page and the Developer Guide. For further reading, you can learn how to enable business analysts to access SageMaker Canvas using AWS SSO without the console. You can also learn how business analysts and data scientists can collaborate faster using SageMaker Canvas and Studio.


About the Authors

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Sofian Hamiti is an AI/ML specialist Solutions Architect at AWS. He helps customers across industries accelerate their AI/ML journey by helping them build and operationalize end-to-end machine learning solutions.

Shyam Srinivasan is a Principal Product Manager on the AWS AI/ML team, leading product management for Amazon SageMaker Canvas. Shyam cares about making the world a better place through technology and is passionate about how AI and ML can be a catalyst in this journey.

Avi Patel works as a software engineer on the Amazon SageMaker Canvas team. His background consists of working full stack with a frontend focus. In his spare time, he likes to contribute to open source projects in the crypto space and learn about new DeFi protocols.

Jared Heywood is a Senior Business Development Manager at AWS. He is a global AI/ML specialist helping customers with no-code machine learning. He has worked in the AutoML space for the past 5 years and launched products at Amazon like Amazon SageMaker JumpStart and Amazon SageMaker Canvas.

Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including Natural Language Processing (NLP), MLOps and Low Code No Code tools.

Bharath Sridharan is a Senior Technical Account Manager at AWS and works with strategic customers of AWS to proactively monitor their environment and assist with optimization. Bharath also specialises in the educational devices and Low-Code, No-Code Machine Learning services of AWS. You might run into him at a DeepRacer track at AWS summits.