AWS Cloud Operations Blog
Enable self-service, secured data science using Amazon SageMaker notebooks and AWS Service Catalog
by Sanjay Garje and Vebhhav (Veb) Singh
Enterprises of all sizes are moving to the AWS Cloud. Leaders of those enterprise teams tell us they want a safe, cost-governed way to give easy access to Amazon SageMaker, so that their teams can experiment with data science, unlock new business opportunities, and disrupt the status quo. In this blog post, Veb Singh and I will show you how to enable self-service, secured data science using Amazon SageMaker, AWS Service Catalog, and AWS Key Management Service (KMS).
This blog post explains how AWS Service Catalog uses a pre-configured AWS KMS key to encrypt data at rest on the machine learning (ML) storage volume that is attached to your notebook instance, without exposing those details to data scientists. ML storage volume encryption is enforced by an AWS Service Catalog product that is pre-configured and approved by centralized security or infrastructure teams. When you create Amazon SageMaker notebook instances, training jobs, or endpoints, you can specify an AWS KMS key ID, and that key will encrypt the attached ML storage volumes. For training jobs, you can also specify an output Amazon S3 bucket that is encrypted with a key managed with AWS KMS, and pass in the AWS KMS key ID used to store the model artifacts in that bucket.
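As a rough illustration of where those key IDs go for a training job, the sketch below builds the encryption-related parts of a `CreateTrainingJob` request as it would be passed to boto3. The helper name, instance type, bucket, and key ID are placeholders chosen for illustration and do not come from this post.

```python
def training_job_encryption_settings(kms_key_id, output_s3_uri,
                                     instance_type="ml.m5.large",
                                     volume_size_gb=10):
    """Return the encryption-related parts of a CreateTrainingJob request.

    Hypothetical helper: merge the result into the full keyword arguments
    for boto3's sagemaker.create_training_job(...).
    """
    return {
        # Model artifacts written to the output S3 location use this key.
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            "KmsKeyId": kms_key_id,
        },
        # The ML storage volume attached to the training instance uses this key.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": volume_size_gb,
            "VolumeKmsKeyId": kms_key_id,
        },
    }


# Placeholder values; replace with your own key ID and bucket.
settings = training_job_encryption_settings(
    "your-kms-key-id", "s3://your-output-bucket/models/"
)
```

Here the same key is used for both the storage volume and the output artifacts. Separate keys could be passed instead if your security team prefers that.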
AWS Service Catalog’s launch constraint feature allows AWS resources to be provisioned while giving developers and data scientists minimal or no IAM permissions to the underlying AWS services. Governed access through AWS Service Catalog enables a better security posture and limits the blast radius. AWS Service Catalog also allows the centralized infrastructure team to enforce configuration standards across AWS services, while granting development teams the flexibility to customize AWS resources by using parameters at launch time.
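A launch constraint can also be attached programmatically. The following is a minimal sketch, assuming boto3 is available and that an IAM role for launching the product already exists. The function names and all IDs and ARNs are hypothetical.

```python
import json


def launch_constraint_request(portfolio_id, product_id, launch_role_arn):
    """Build the arguments for servicecatalog.create_constraint (LAUNCH type)."""
    return {
        "PortfolioId": portfolio_id,
        "ProductId": product_id,
        "Type": "LAUNCH",
        # For a launch constraint, Parameters is a JSON string naming the role
        # that AWS Service Catalog assumes when an end user launches the product.
        "Parameters": json.dumps({"RoleArn": launch_role_arn}),
        "Description": "Launch SageMaker notebooks with a dedicated role",
    }


def create_launch_constraint(request):
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    return boto3.client("servicecatalog").create_constraint(**request)


# Placeholder identifiers for illustration only.
request = launch_constraint_request(
    "port-examplePortfolio",
    "prod-exampleProduct",
    "arn:aws:iam::111122223333:role/ExampleLaunchRole",
)
```

With a constraint like this in place, the end user needs permission to use AWS Service Catalog but not to call Amazon SageMaker or IAM directly.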
The following diagram shows how AWS Service Catalog ensures two separate workflows for cloud system administrators and data scientists or developers who work with Amazon SageMaker.
Depending on your role, you will perform different tasks in the workflow:
- Administrator: Create an AWS CloudFormation template that deploys the Amazon SageMaker notebook instance.
- Administrator: Create a product portfolio and a product (the Amazon SageMaker notebook instance) in AWS Service Catalog.
- Developer / data scientist: Discover and launch the Amazon SageMaker notebook instance.
- (optional) Administrator: Verify, using Amazon CloudWatch and AWS CloudTrail logs, that the notebooks are encrypted.
Step 1. Create an AWS CloudFormation template
Open a text editor or your favorite code editor, copy the following CloudFormation template, paste it into a new file, and save the file as deploy-sagemaker-notebook.template.
Note the AWS KMS key ID that is used to encrypt data at rest on the ML storage volume attached to your notebook instance. Replace this value with your own already-provisioned AWS KMS key for the specific AWS Region.
AWSTemplateFormatVersion: '2010-09-09'
Metadata:
  License: Apache-2.0
Description: '@Author: Sanjay Garje. AWS CloudFormation Sample Template SageMaker NotebookInstance: This template demonstrates
  the creation of a SageMaker NotebookInstance with encryption. You will be billed for the AWS resources used if you create a stack from
  this template.'
Parameters:
  NotebookInstanceName:
    AllowedPattern: '[A-Za-z0-9-]{1,63}'
    ConstraintDescription: Maximum of 63 alphanumeric characters. Can include hyphens
      (-), but not spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker Notebook instance name
    MaxLength: '63'
    MinLength: '1'
    Type: String
    Default: 'myNotebook'
  NotebookInstanceType:
    AllowedValues:
      - ml.t2.medium
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t2.medium
    Description: Select Instance type for the SageMaker Notebook
    Type: String
  KMSKeyId:
    Description: AWS KMS key ID used to encrypt data at rest on the ML storage volume attached to notebook instance.
    Type: String
    Default: 'Replace it with your KMSKeyId'
Resources:
  SageMakerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - "sagemaker.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
        - "arn:aws:iam::aws:policy/AmazonS3FullAccess"
        - "arn:aws:iam::aws:policy/IAMReadOnlyAccess"
  SageMakerNotebookInstance:
    Type: "AWS::SageMaker::NotebookInstance"
    Properties:
      KmsKeyId: !Ref KMSKeyId
      NotebookInstanceName: !Ref NotebookInstanceName
      InstanceType: !Ref NotebookInstanceType
      RoleArn: !GetAtt SageMakerRole.Arn
Outputs:
  SageMakerNoteBookURL:
    Description: "URL for the newly created SageMaker Notebook Instance"
    Value: !Sub 'https://${AWS::Region}.console.aws.amazon.com/sagemaker/home?region=${AWS::Region}#/notebook-instances/openNotebook/${NotebookInstanceName}'
  SageMakerNoteBookTerminalURL:
    Description: "Terminal access URL for the newly created SageMaker Notebook Instance"
    Value: !Sub 'https://${NotebookInstanceName}.notebook.${AWS::Region}.sagemaker.aws/terminals/1'
  SageMakerNotebookInstanceARN:
    Description: "ARN for the newly created SageMaker Notebook Instance"
    Value: !Ref SageMakerNotebookInstance
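The parameter constraints in the template (AllowedPattern and AllowedValues) are what give end users a limited, pre-approved set of choices at launch time. The sketch below mirrors those two constraints in Python so that values can be checked before a launch. It assumes the pattern must match the whole value, and the function name is hypothetical.

```python
import re

# Mirrors the constraints declared in the template's Parameters section.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]{1,63}")
ALLOWED_INSTANCE_TYPES = {"ml.t2.medium"}


def check_parameters(notebook_name, instance_type):
    """Return a list of human-readable problems; empty if the values look valid."""
    problems = []
    # fullmatch: the entire name must satisfy the pattern, not just a prefix.
    if not NAME_PATTERN.fullmatch(notebook_name):
        problems.append(
            "NotebookInstanceName must be 1-63 alphanumeric characters or hyphens"
        )
    if instance_type not in ALLOWED_INSTANCE_TYPES:
        problems.append("NotebookInstanceType is not an allowed value")
    return problems


ok = check_parameters("myNotebook", "ml.t2.medium")
bad = check_parameters("my notebook", "ml.m5.large")
```

An administrator who wants to offer more instance types would extend AllowedValues in the template, and the set above would need to change to match.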
Step 2. Create a product portfolio and a product for instantiating the Amazon SageMaker notebook in AWS Service Catalog
To provide users with products, begin by creating a portfolio for those products. To create a portfolio, follow the detailed instructions in the AWS Service Catalog documentation.
On the AWS Service Catalog console Create portfolio page, use the following values for creating the portfolio:
- Portfolio name – ML Portfolio
- Description – Machine Learning Portfolio
- Owner – IT
Provide TagOptions details to mandate tags. AWS Service Catalog enforces the use of mandated tags when any of the portfolio's products are launched. For further details, see Managing TagOptions in the AWS Service Catalog documentation.
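The same portfolio and TagOption can be created with the AWS SDK instead of the console. This is a sketch only, assuming boto3 and suitable administrator credentials. The function names are hypothetical, and the tag key and value are supplied by the caller.

```python
def portfolio_request():
    """Arguments for servicecatalog.create_portfolio, using the values above.

    The console's "Owner" field corresponds to ProviderName in the API.
    """
    return {
        "DisplayName": "ML Portfolio",
        "Description": "Machine Learning Portfolio",
        "ProviderName": "IT",
    }


def create_portfolio_with_tag_option(tag_key, tag_value):
    """Create the portfolio and attach one TagOption. Requires boto3."""
    import boto3  # imported lazily so portfolio_request() works without it

    sc = boto3.client("servicecatalog")
    portfolio_id = sc.create_portfolio(**portfolio_request())["PortfolioDetail"]["Id"]
    tag_option_id = sc.create_tag_option(Key=tag_key, Value=tag_value)[
        "TagOptionDetail"
    ]["Id"]
    # Associating the TagOption with the portfolio makes the tag apply to
    # products launched from it.
    sc.associate_tag_option_with_resource(
        ResourceId=portfolio_id, TagOptionId=tag_option_id
    )
    return portfolio_id


request = portfolio_request()
```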
Create a new product using detailed instructions in the AWS Service Catalog documentation. On the AWS Service Catalog console Upload new product page, use the following values for creating the product:
- Product name – SageMaker Notebooks
- Description – Notebooks for Data Scientists
- Provided by – IT
- Vendor (optional) – Amazon Web Services
On the Enter support details page, type the following and then choose Next:
- Email contact – a valid email address
- Support link – http://it.org/support
- Support description – Support details for SageMaker Notebooks
On the Version details page, choose Upload a template file, select Choose file, and locate the deploy-sagemaker-notebook.template file you saved when you set up the CloudFormation template. Enter the following values, and then choose Next:
- Version title – 1.0
- Description – This is the initial version of enabling SageMaker Notebooks
On the Review page, choose CREATE.
Let’s add the “SageMaker Notebooks” product to an existing product portfolio. Choose ADD PRODUCT.
Select SageMaker Notebooks and choose ADD PRODUCT.
Add the end user who needs access to this product portfolio by following the steps for granting access to users in the AWS Service Catalog documentation.
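The product creation, portfolio association, and end-user access steps above can be sketched with boto3 as follows. The API loads the template from an Amazon S3 URL rather than a local upload, so the template URL here is an assumption, as are the function names, the support email, and the principal ARN.

```python
def product_request(template_url, support_email):
    """Arguments for servicecatalog.create_product, using the console values above.

    "Provided by" maps to Owner and "Vendor" maps to Distributor in the API.
    """
    return {
        "Name": "SageMaker Notebooks",
        "Description": "Notebooks for Data Scientists",
        "Owner": "IT",
        "Distributor": "Amazon Web Services",
        "SupportEmail": support_email,
        "SupportUrl": "http://it.org/support",
        "SupportDescription": "Support details for SageMaker Notebooks",
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        "ProvisioningArtifactParameters": {
            "Name": "1.0",
            "Description": "This is the initial version of enabling SageMaker Notebooks",
            "Info": {"LoadTemplateFromURL": template_url},
            "Type": "CLOUD_FORMATION_TEMPLATE",
        },
    }


def publish_product(portfolio_id, principal_arn, template_url, support_email):
    """Create the product, add it to the portfolio, and grant a principal access."""
    import boto3  # imported lazily so product_request() works without it

    sc = boto3.client("servicecatalog")
    product = sc.create_product(**product_request(template_url, support_email))
    product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]
    sc.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)
    sc.associate_principal_with_portfolio(
        PortfolioId=portfolio_id, PrincipalARN=principal_arn, PrincipalType="IAM"
    )
    return product_id


# Placeholder URL and email for illustration only.
request = product_request(
    "https://your-bucket.s3.amazonaws.com/deploy-sagemaker-notebook.template",
    "support@example.com",
)
```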
Step 3. Log in as a data scientist or a developer to launch the product
Log in as a data scientist end user and choose Products List.
Choose Launch product.
The mandatory tag Cost Center is automatically populated by AWS Service Catalog.
Tags are a powerful feature that you can also use to optimize your costs. For example, you can write an AWS Lambda function that stops all Amazon SageMaker notebook instances tagged as ‘dev’ at 6 PM and starts them again at 8 AM every day. You can also write a Lambda function that keeps them stopped over the weekend. That’s cost optimization!
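One possible shape for such a Lambda function is sketched below. It is not the sample the authors had in mind. The tag key `Environment` is an assumption (the post only says the instances are tagged as ‘dev’), and the handler would need to be invoked on a schedule, for example hourly. Lambda clocks run in UTC, so the hours may need adjusting for your time zone.

```python
from datetime import datetime


def desired_action(now, start_hour=8, stop_hour=18):
    """Return "start" or "stop" for dev notebooks at the given time."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday: stay stopped
        return "stop"
    return "start" if start_hour <= now.hour < stop_hour else "stop"


def is_tagged_dev(tags, key="Environment", value="dev"):
    """True if the tag list contains the assumed dev tag."""
    return any(t.get("Key") == key and t.get("Value") == value for t in tags)


def lambda_handler(event, context):
    """Start or stop every dev-tagged notebook instance. Requires boto3."""
    import boto3  # available by default in the Lambda Python runtime

    sm = boto3.client("sagemaker")
    action = desired_action(datetime.now())
    for page in sm.get_paginator("list_notebook_instances").paginate():
        for nb in page["NotebookInstances"]:
            tags = sm.list_tags(ResourceArn=nb["NotebookInstanceArn"])["Tags"]
            if not is_tagged_dev(tags):
                continue
            name = nb["NotebookInstanceName"]
            status = nb["NotebookInstanceStatus"]
            # Only act on instances in a stable state.
            if action == "stop" and status == "InService":
                sm.stop_notebook_instance(NotebookInstanceName=name)
            elif action == "start" and status == "Stopped":
                sm.start_notebook_instance(NotebookInstanceName=name)
```

Keeping the schedule decision in a pure function makes it easy to test without touching any AWS resources.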
Review all the parameters and choose Launch.
The product is launched and its status is “In progress”. After the status changes to “Succeeded”, go to the Amazon SageMaker console, where you should see the newly provisioned notebook instance.
Choose the notebook name to see the details.
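For data scientists who prefer the SDK to the console, the launch in this step could look roughly like the sketch below. It assumes boto3 and end-user credentials with access to the portfolio. The function names are hypothetical, and the product and provisioning artifact IDs must be looked up for your own account.

```python
def provisioning_parameters(values):
    """Convert {"Name": "value"} into the Key/Value list the API expects."""
    return [{"Key": key, "Value": value} for key, value in values.items()]


def launch_notebook(product_id, artifact_id, provisioned_name, values):
    """Launch the product and return the record ID. Requires boto3."""
    import boto3  # imported lazily so the helper above works without it

    sc = boto3.client("servicecatalog")
    record = sc.provision_product(
        ProductId=product_id,
        ProvisioningArtifactId=artifact_id,
        ProvisionedProductName=provisioned_name,
        ProvisioningParameters=provisioning_parameters(values),
    )["RecordDetail"]
    # Poll sc.describe_record(Id=record["RecordId"]) until the status
    # reaches SUCCEEDED or FAILED.
    return record["RecordId"]


# Template parameters left out here fall back to their template defaults.
params = provisioning_parameters(
    {"NotebookInstanceName": "myNotebook", "NotebookInstanceType": "ml.t2.medium"}
)
```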
Step 4. Validate using the AWS CloudTrail console to ensure that the AWS KMS key is used during notebook instance creation
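The same check can be scripted. The sketch below looks up recent `CreateNotebookInstance` events in AWS CloudTrail and extracts the KMS key ID from each event's request parameters. The exact casing of the request parameter name is an assumption, so the lookup compares keys case-insensitively. The function names and the sample event are hypothetical.

```python
import json


def kms_key_from_event(cloudtrail_event_json):
    """Return the KMS key ID recorded in a CreateNotebookInstance event, or None."""
    event = json.loads(cloudtrail_event_json)
    if event.get("eventName") != "CreateNotebookInstance":
        return None
    params = event.get("requestParameters") or {}
    for key, value in params.items():
        # Assumed field name; matched case-insensitively to be safe.
        if key.lower() == "kmskeyid":
            return value
    return None


def recent_notebook_keys():
    """List the KMS key IDs used by recent notebook creations. Requires boto3."""
    import boto3  # imported lazily so the parser above works without it

    ct = boto3.client("cloudtrail")
    response = ct.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "CreateNotebookInstance"}
        ],
        MaxResults=50,
    )
    return [kms_key_from_event(e["CloudTrailEvent"]) for e in response["Events"]]


# Hand-written sample event, trimmed to the fields this sketch reads.
sample = json.dumps(
    {
        "eventName": "CreateNotebookInstance",
        "requestParameters": {
            "notebookInstanceName": "myNotebook",
            "kmsKeyId": "your-kms-key-id",
        },
    }
)
key_id = kms_key_from_event(sample)
```

A result of `None` for any event would indicate a notebook instance created without a key, which is worth investigating.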
Conclusion
Customers from enterprises of all sizes have asked for self-service enablement of a machine learning environment for data scientists that comes with the right level of governance. In this blog post, Veb Singh and I showed you how AWS Service Catalog provides an easy way to enforce governance and security for provisioning Amazon SageMaker notebooks. By using AWS Service Catalog, cloud administrators can define the right level of controls and enforce data encryption along with centrally mandated tags for any AWS service used by various groups. At the same time, data scientists can achieve self-service and a better security posture by simply launching an Amazon SageMaker notebook instance through AWS Service Catalog.
About the Authors
Vebhhav (Veb) Singh is a San Francisco-based Sr. Solutions Architect for AWS. Veb works with some of the large strategic AWS customers. He loves to play with technologies and find simple solutions for complex problems. Veb is passionate about hydroponics. Using AWS IoT, serverless, and machine learning, he has built a fully automated greenhouse with 20+ microcontrollers. Leafy greens and strawberries are available in his greenhouse all year round. He loves to travel to a new destination every year with his family.
Sanjay Garje leads US west region’s technical business development for AWS Service Catalog and AWS Control Tower. Sanjay is a passionate technology leader who takes pride in helping customers on their AWS Cloud journeys by showing them how to transform their business and technology outcomes. In his free time, Sanjay enjoys running, learning new things, teaching Cloud & Big Data technologies at SJSU and traveling to new destinations with his family.