Building secure Amazon SageMaker access URLs with AWS Service Catalog
Many customers need a secure way to access Amazon SageMaker notebooks within their private network without signing in to the AWS Management Console or using the AWS CLI or SDKs. They may want this for enhanced security or to give data scientists an easier self-service path.
In this blog post, we show you how to connect to a private Amazon SageMaker notebook or Amazon SageMaker Studio without using the internet. For the purpose of this post, we provide a bastion host that emulates a VPN or similar connection. We provide all supporting code as AWS CloudFormation templates that can be easily deployed in an AWS account.
We make the following assumptions:
- The user has no permissions to provision Amazon SageMaker from the AWS Management Console.
- The user cannot use the Open Jupyter button to open notebooks.
- The user has permissions to use AWS Service Catalog.
- In the context of the enterprise, a private enterprise web portal (like Jenkins) interacts with the architecture through API calls to AWS Service Catalog.
Figure 1 shows the network architecture used in this scenario:
The private subnet hosts the Amazon SageMaker and other required endpoints. The public subnet hosts the internet gateway and the Windows bastion host.
Users must use the bastion host’s browser to open their Amazon SageMaker notebook instance. Otherwise, they receive a 403 error. This is the recommended method to connect to the notebook for enhanced security.
Developers and data scientists use AWS Service Catalog to provision the URL required to connect from the bastion host to either Amazon SageMaker Studio or an Amazon SageMaker notebook. For more information about provisioning, see the AWS Service Catalog Administrator Guide.
The solution includes the following components:
- AWS Service Catalog makes it possible for an enterprise to offer an approved list of offerings that end users can quickly deploy. Provisioned products from AWS Service Catalog are preconfigured and preapproved by the enterprise to follow its constraints and best practices. In this solution, an AWS Service Catalog product is used to provision the URL on behalf of the user.
- Amazon SageMaker is a managed machine learning service that helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models. In this solution, a Lambda function invokes the SageMaker API.
- An AWS CloudFormation template with IAM roles and policies to create:
  - The VPC, which includes all solution components, including endpoints.
  - The SageMaker notebook instance or Amazon SageMaker Studio.
  - The private subnet:
    - NAT gateway
    - SageMaker endpoints
  - The public subnet:
    - Internet gateway
    - Bastion host
  - Security group
  - An IAM role called Developer
- A Windows bastion host that allows you to use RDP to connect to the server and use a browser to connect to the notebook instance privately. A bastion host simulates the on-premises connection, but you can use AWS Direct Connect or VPN instead.
Before you deploy the solution, you also need the following:
- An Amazon Elastic Compute Cloud (Amazon EC2) key pair in the same AWS Region as the deployment. We suggest you use BURNER for the key pair name.
- An RDP client.
- Your IP address, to ensure the security group allows your incoming RDP connection. You can use a web search engine to return your IP address. When you deploy the solution, add the netmask (for example, /32).
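As a minimal sketch, the netmask can be appended with Python's standard ipaddress module, which also rejects a malformed address before it reaches the security group. The function name to_single_host_cidr and the sample address (taken from the 203.0.113.0/24 documentation range) are illustrative and are not part of the solution.

```python
import ipaddress


def to_single_host_cidr(ip: str) -> str:
    """Append the single-host prefix length (/32 for IPv4, /128 for IPv6)."""
    # ip_address() raises ValueError if the input is not a valid address.
    address = ipaddress.ip_address(ip.strip())
    return f"{address}/{address.max_prefixlen}"


if __name__ == "__main__":
    # Sample address only; substitute the address your search engine reports.
    print(to_single_host_cidr("203.0.113.25"))
```

A /32 entry restricts incoming RDP to that one address, which is usually what you want for a short-lived test deployment.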
The AWS Service Catalog product provides a secure method (without CLI or SDK) for a user to retrieve a notebook’s URL from on-premises resources (through AWS Direct Connect or VPN). After the URL has been provisioned, the user’s connection is from the user’s computer to the notebook endpoint. The user does not leave the boundaries of the corporate network. There is no network traffic exposure to the internet aside from AWS Management Console access to AWS Service Catalog.
The product launches a Lambda function that retrieves the URL using the SDK. It is important to enforce restrictions in the Lambda function's IAM role when making the SDK call, because these restrictions also apply to the URL itself. For example, if the IAM role that retrieves the URL has a condition on the source VPC (the aws:SourceVpc condition key) when calling CreatePresignedNotebookInstanceUrl, that same condition is applied to the URL when the browser opens the WebSocket or HTTPS connection.
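The source VPC restriction described above could be expressed in the Lambda function's IAM role roughly as follows. This is a sketch and not the policy shipped in the solution's templates: the helper name build_presigned_url_policy, the VPC ID, and the wildcard resource are placeholders, and aws:SourceVpc is the global condition key assumed here.

```python
import json


def build_presigned_url_policy(vpc_id: str) -> dict:
    """Allow presigned notebook URLs only for requests arriving from one VPC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
                # Placeholder: narrow this to specific notebook instance ARNs.
                "Resource": "*",
                # The request must arrive through a VPC endpoint in this VPC.
                "Condition": {"StringEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical VPC ID, for illustration only.
    print(json.dumps(build_presigned_url_policy("vpc-0123456789abcdef0"), indent=2))
```

Because the condition is evaluated again when the URL is opened, a browser that reaches the notebook over the public internet fails the same check.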
Because you can only do this from an on-premises environment that does not have an internet connection, you must use endpoints. Due to service restrictions, we check the source VPC to ensure that the user connects through the notebook endpoint in the same source VPC, not from the internet.
A notebook’s URL can be resolved from an on-premises environment and from the internet through the sagemaker.aws domain. However, when the URL is requested from an on-premises environment using the API and notebook endpoints, the connection must also be made to those endpoints. If the connection is made over the internet, the response is Error 403: Access Denied.
To mimic an on-premises connectivity scenario, the Windows bastion host is located in the same VPC as the SageMaker notebook instance and its endpoint.
The AWS CloudFormation template creates a SageMaker notebook instance. To gain access to the notebook instance, the developer uses the SageMaker URL product in AWS Service Catalog. The developer gets a presigned URL that enables connections from the bastion host to the Amazon SageMaker notebook instance through the SageMaker notebook endpoint. The developer gets the URL without running a CLI command, which is otherwise the only supported way to obtain it.
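The call that the Lambda function makes on the developer's behalf might look like the sketch below. It uses the boto3 SageMaker client method create_presigned_notebook_instance_url, but the function names, the event shape, and the session duration are assumptions made for illustration. The code deployed by the templates may differ.

```python
def get_notebook_url(sagemaker_client, notebook_name: str,
                     session_seconds: int = 1800) -> str:
    """Request a presigned URL for one notebook instance."""
    response = sagemaker_client.create_presigned_notebook_instance_url(
        NotebookInstanceName=notebook_name,
        SessionExpirationDurationInSeconds=session_seconds,
    )
    return response["AuthorizedUrl"]


def lambda_handler(event, context):
    """Hypothetical handler; the "NotebookName" event key is an assumption."""
    import boto3  # imported here so get_notebook_url can be tested without boto3

    client = boto3.client("sagemaker")
    return {"Url": get_notebook_url(client, event["NotebookName"])}
```

Passing the client in as an argument keeps the URL logic testable with a stub, and the IAM role attached to the function decides which conditions the resulting URL carries.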
Although the CloudFormation template creates the SageMaker notebook and other components, it does not deploy Amazon SageMaker Studio. To deploy Amazon SageMaker Studio, see the Securing Amazon SageMaker Studio connectivity using a private VPC blog post. When you use Amazon SageMaker Studio, this solution supports IAM users only. You can request the SageMaker Studio URL through the AWS Service Catalog product.
Open the Amazon SageMaker console and use the selections shown in Figure 2 to set up SageMaker Studio:
The output of the AWS Service Catalog product has the URL used to connect to the SageMaker notebook. If the user launched the AWS Service Catalog product from their computer, the user must copy the URL and paste it into the bastion host’s browser. If the user launched the AWS Management Console from the bastion host, the user can just click the URL.
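An enterprise portal that launches the product through the AWS Service Catalog API instead of the console could read the same outputs from the provisioning record. The sketch below uses the boto3 Service Catalog calls provision_product and describe_record. The function name, the product and artifact IDs, and the parameter keys are hypothetical values that you would look up in your own portfolio.

```python
import time


def provision_and_get_outputs(sc_client, product_id: str, artifact_id: str,
                              provisioned_name: str, parameters: dict,
                              poll_seconds: float = 5.0, max_polls: int = 60) -> dict:
    """Launch a product and return its outputs as {OutputKey: OutputValue}."""
    record_id = sc_client.provision_product(
        ProductId=product_id,
        ProvisioningArtifactId=artifact_id,
        ProvisionedProductName=provisioned_name,
        ProvisioningParameters=[
            {"Key": key, "Value": value} for key, value in parameters.items()
        ],
    )["RecordDetail"]["RecordId"]

    # Provisioning is asynchronous, so poll the record until it settles.
    for _ in range(max_polls):
        record = sc_client.describe_record(Id=record_id)
        status = record["RecordDetail"]["Status"]
        if status == "SUCCEEDED":
            return {
                output["OutputKey"]: output["OutputValue"]
                for output in record.get("RecordOutputs", [])
            }
        if status == "FAILED":
            raise RuntimeError(f"Provisioning record {record_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Provisioning record {record_id} did not finish in time")
```

The presigned URL is short-lived, so a portal would typically hand it to the user as soon as the record succeeds.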
Deploy the solution
To deploy the solution:
- Download the zip file from https://github.com/aws-samples/aws-service-catalog-reference-architectures/raw/master/blog_content/sagemaker-selfservice/sagemaker-selfservice-url.zip
- Uncompress the zip file on your computer.
- Create an S3 bucket and copy the bucket URL.
- Upload all the files to the bucket.
- Right-click and copy the object URL.
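If you prefer to script the upload, the steps above can be sketched with the boto3 S3 client's upload_file method. The function names, the bucket name, and the Region are placeholders, and the object URL uses the virtual-hosted style format, which is an assumption about how you address the bucket.

```python
from pathlib import Path
from urllib.parse import quote


def upload_directory(s3_client, local_dir: str, bucket: str, prefix: str = "") -> list:
    """Upload every file under local_dir and return the object keys used."""
    root = Path(local_dir)
    keys = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the folder layout of the unzipped archive in the keys.
            key = prefix + path.relative_to(root).as_posix()
            s3_client.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys


def object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted style object URL for a key."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   upload_directory(boto3.client("s3"), "sagemaker-selfservice-url", "my-bucket")
```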
Deploy the CloudFormation template:
- Sign in to your AWS account as an administrator with permission to create resources.
- In the AWS CloudFormation console, choose Create stack, and then choose With new resources (standard).
- Choose Amazon S3 URL, paste the link you copied into Amazon S3 URL, and then choose Next.
- In the Parameters section, enter the parameters shown in Figure 3:
RepoRootUrl is the URL of the S3 bucket that contains start.yaml and all the other files.
EnableSecNetwork enables or disables the security restrictions for where a SageMaker notebook URL can be consumed. If false is selected, then internet access is allowed. If true is selected, as shown here in Figure 3, then only private access is allowed. You can change this parameter by updating the root stack after you deploy the solution through the CloudFormation template.
- Switch to the Developer role in the account. (This role was created by the CloudFormation template.)
The Developer role has the following IAM policies: AWSServiceCatalogEndUserFullAccess and DeveloperPassRoleToSageMakerNotebookManagedPolicy. The first policy allows you to launch an AWS Service Catalog product. The second policy allows the Lambda function to invoke the API on the user’s behalf.
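The RepoRootUrl and EnableSecNetwork parameters described earlier could also be passed to the stack programmatically. The following sketch uses the boto3 CloudFormation create_stack call. The function names and the extra argument are hypothetical, the template's other parameter names are not listed in this post and must be supplied by you, and the capability shown is an assumption based on the template creating a named IAM role.

```python
def build_stack_parameters(repo_root_url: str, enable_sec_network: bool = True,
                           extra: dict = None) -> list:
    """Build the Parameters list expected by CloudFormation create_stack."""
    values = {
        "RepoRootUrl": repo_root_url,
        # "true" allows private access only; "false" allows internet access.
        "EnableSecNetwork": "true" if enable_sec_network else "false",
    }
    # Any other template parameters (key pair, IP range, ...) go in `extra`.
    values.update(extra or {})
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def create_root_stack(cfn_client, stack_name: str, template_url: str,
                      parameters: list) -> str:
    """Create the root stack and return its stack ID."""
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=parameters,
        # Assumed: needed because the template creates the Developer IAM role.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]
```

To change EnableSecNetwork later, the same parameter list could be passed to an update of the root stack, as the post describes.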
Using the solution:
- In the AWS Service Catalog console, go to your products and launch the product “URL for SageMaker Notebook”. Using the notebook name as a parameter, create a presigned URL.
When the AWS Service Catalog product is successfully launched, you should see the following:
The product has three outputs:
- A clickable URL. It can be used only once and must be used within five minutes of its creation.
- Text to copy and paste into the hosts file of any host that cannot reach the VPC for DNS resolution.
- The internal IP address of the notebook endpoint in the VPC. This IP address must be reachable by HTTPS to open the notebook.
- Connect through RDP to the Windows bastion host. For information, see Connecting to your Windows instance in the Amazon EC2 User Guide for Windows Instances.
- Using Chrome on the bastion host, paste the URL you created in step 1.
We use Chrome because it uses Windows DNS and displays an error message. Firefox has its own DNS resolution. Do not use Internet Explorer or Microsoft Edge.
You should see the following:
- Repeat step 1 and paste the URL into your browser.
If you have made your connection from the bastion host and used Chrome, you will see the Jupyter notebook. If you receive a 403 error page or you are redirected to the AWS Management Console sign-in page, then your access has been denied, either because your connection went to the public endpoint or your URL has expired.
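For a host that cannot resolve the notebook's name through the VPC, the hosts file text from the product outputs pairs the endpoint's internal IP address with the notebook's host name. A rough equivalent is sketched below. The function name, the sample URL, and the sample IP address are illustrative, and the exact text produced by the product may be formatted differently.

```python
import ipaddress
from urllib.parse import urlsplit


def hosts_entry(presigned_url: str, endpoint_ip: str) -> str:
    """Return a hosts-file line mapping the notebook host name to the endpoint IP."""
    hostname = urlsplit(presigned_url).hostname
    if not hostname:
        raise ValueError("URL does not contain a host name")
    # Validate the address; raises ValueError if it is malformed.
    address = ipaddress.ip_address(endpoint_ip)
    return f"{address} {hostname}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(hosts_entry(
        "https://my-notebook.notebook.us-east-1.sagemaker.aws?authToken=abc",
        "10.0.1.25",
    ))
```

With such an entry in place, the browser connects to the notebook endpoint inside the VPC instead of the public address, which is what avoids the 403 response.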
Before you attempt to retrieve a URL for Amazon SageMaker Studio, you must create a user for SageMaker Studio. The user name is the only parameter required to retrieve a URL using the AWS Service Catalog product.
In Figure 8, you can see the parameter used to create the user for SageMaker Studio. This is not an IAM user.
Go to the AWS Service Catalog console to request your URL.
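Behind the product, a Studio URL request might resemble the following sketch, which uses the boto3 SageMaker calls list_domains and create_presigned_domain_url. The function name is hypothetical, and looking up the domain automatically assumes there is exactly one Studio domain in the Region. The solution's own Lambda code may work differently.

```python
def get_studio_url(sagemaker_client, user_profile_name: str,
                   domain_id: str = None) -> str:
    """Request a presigned SageMaker Studio URL for one Studio user."""
    if domain_id is None:
        # Assumption: the Region contains exactly one Studio domain.
        domains = sagemaker_client.list_domains()["Domains"]
        if len(domains) != 1:
            raise ValueError(f"Expected exactly one Studio domain, found {len(domains)}")
        domain_id = domains[0]["DomainId"]

    response = sagemaker_client.create_presigned_domain_url(
        DomainId=domain_id,
        UserProfileName=user_profile_name,
    )
    return response["AuthorizedUrl"]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(get_studio_url(boto3.client("sagemaker"), "my-studio-user"))
```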
To avoid ongoing charges, delete the resources used in this blog post. You must first delete Amazon SageMaker Studio. Then terminate or delete all provisioned products in AWS Service Catalog. Sometimes you must manually delete the CloudFormation stack that AWS Service Catalog generated before you can terminate the AWS Service Catalog product. The last step is to delete the root CloudFormation stack.
The AWS Service Catalog URL product we shared in this blog post provides a simple method for a data scientist to access a SageMaker presigned URL. This enables data science teams to be immediately productive in their machine learning journey on AWS, while maintaining an enhanced enterprise security posture.
About the authors
Daniel Castro is an AWS Solutions Architect based in Toronto. He helps customers across Canada transform their businesses and execute successful cloud solutions.
Om Patri, PhD is a Customer Delivery Architect at AWS Professional Services where he builds enterprise data lakes and machine learning infrastructure. He enjoys working closely with customers as they onboard their data teams to leverage AI/ML capabilities in the cloud and maximize business value from data.