AWS Machine Learning Blog

Direct access to Amazon SageMaker notebooks from Amazon VPC by using an AWS PrivateLink endpoint

Amazon SageMaker now supports AWS PrivateLink for notebook instances. In this post, I will show you how to set up AWS PrivateLink to secure your connection to Amazon SageMaker notebooks.

Maintaining compliance with regulations such as HIPAA or PCI may require preventing information from traversing the internet. Additionally, keeping data off the public internet reduces exposure to threat vectors such as brute force and distributed denial-of-service attacks.

AWS PrivateLink simplifies the security of data shared with cloud-based applications by eliminating the exposure of data to the public internet. It enables private connectivity between VPCs, AWS services, and on-premises applications. With AWS PrivateLink, your services function as though they were hosted directly on your private network.

To secure your Amazon SageMaker API and prediction calls using AWS PrivateLink, we previously introduced PrivateLink support for API operations and runtime. Now it’s possible to use AWS PrivateLink to secure your connection to notebook instances as well.

To use Amazon SageMaker notebooks via AWS PrivateLink, you need to set up Amazon Virtual Private Cloud (VPC) endpoints. AWS PrivateLink enables you to privately access all Amazon SageMaker API operations from your VPC in a scalable manner by using interface VPC endpoints. A VPC endpoint is an elastic network interface in your subnet with private IP addresses. It serves as an entry point for all Amazon SageMaker API calls.

To limit access to the VPC endpoints you created, you also need to configure AWS Identity and Access Management (IAM) roles to allow traffic only from your VPC.

Note: Keep in mind that the AWS Management Console is accessed through the public internet, and since your connection will be avoiding the internet with AWS PrivateLink, you’ll need to use Amazon SageMaker only through the CLI and APIs. In other words, you won’t be able to use Amazon SageMaker through the console after you activate AWS PrivateLink with the following configuration.

Creating VPC endpoints

We will go through AWS Management Console steps to create VPC endpoints, but you can do the same operations using AWS Command Line Interface (AWS CLI) commands as well.

Here, we will create two VPC endpoints: one is used to create a notebook instance by using the Amazon SageMaker APIs, and the other is used to access the notebook instance you created (through the URL returned by CreatePresignedNotebookInstanceUrl). To create VPC endpoints from the console, open the Amazon VPC console, open the Endpoints page, and create a new endpoint.

Let’s start by creating the VPC endpoint for our notebook. Here, you’ll need to define three attributes:

  1. The Amazon SageMaker notebook service name. For Service category, select AWS services; and for Service Name, select aws.sagemaker.us-west-2.notebook. (The Region part of the service name, us-west-2 here, may differ depending on the Region you select.)
  2. The VPC and Availability Zones that you want to use.
  3. The security group to be associated with the interface VPC endpoint. If you don’t specify a security group, the default security group for your VPC is associated.

Here, a private hosted zone enables you to access the resources in your VPC using custom DNS domain names, such as example.com, instead of using private IPv4 addresses or private DNS hostnames provided by AWS. The Amazon SageMaker DNS hostname that the AWS CLI and Amazon SageMaker SDKs use by default (https://api.sagemaker.Region.amazonaws.com) resolves to your VPC endpoint.
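One way to confirm that the default hostname really resolves to your VPC endpoint is to look at the addresses it returns from inside the VPC. The following is a minimal sketch, not an official check: the helper names are my own, and it assumes that an endpoint network interface always has private IP addresses while the public service endpoint does not.

```python
import ipaddress
import socket


def resolve_addresses(hostname, port=443):
    """Return the set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def all_private(addresses):
    """True if at least one address is given and every one is in a private range."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )
```

From a machine inside the VPC, calling all_private(resolve_addresses("api.sagemaker.us-west-2.amazonaws.com")) should return True when private DNS is in effect (substitute your own Region for us-west-2).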

Repeat the same steps to create a second VPC endpoint for Amazon SageMaker APIs. This time you’ll select com.amazonaws.us-west-2.sagemaker.api while selecting the service name. You can begin using the VPC endpoint when its status is available.
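If you prefer to script these steps instead of using the console, the two endpoints can be described as parameter sets for the Amazon EC2 CreateVpcEndpoint operation. This is a sketch under assumptions: the function names are hypothetical, the IDs you pass in are your own, and create_endpoints expects a boto3 EC2 client that you create yourself (for example, with boto3.client("ec2")).

```python
def interface_endpoint_request(vpc_id, subnet_ids, security_group_ids,
                               service_name, private_dns=True):
    """Build keyword arguments for the EC2 CreateVpcEndpoint operation."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": private_dns,
    }


def sagemaker_endpoint_requests(region, vpc_id, subnet_ids, security_group_ids):
    """Return one request per service: notebook access, then the SageMaker API."""
    service_names = [
        f"aws.sagemaker.{region}.notebook",
        f"com.amazonaws.{region}.sagemaker.api",
    ]
    return [
        interface_endpoint_request(vpc_id, subnet_ids, security_group_ids, name)
        for name in service_names
    ]


def create_endpoints(ec2_client, requests):
    """Send each request with a boto3 EC2 client and return the new endpoint IDs."""
    endpoint_ids = []
    for request in requests:
        response = ec2_client.create_vpc_endpoint(**request)
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

Keeping the request building separate from the API call lets you review the parameters before anything is created in your account.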

Connecting your private network to your VPC

After you create VPC endpoints, make sure that you are either trying to access your notebook instances from within the same VPC or that you have a configuration in place, such as Amazon Virtual Private Network (VPN) or AWS Direct Connect, to connect to your notebooks. This is not necessary for other Amazon SageMaker API operations, but it’s essential to access your notebooks via a web browser from outside of your VPC since VPN needs to replace the internet gateway while connecting to your VPC. Here is a tutorial that you can refer to while connecting your private network to your VPC by using a VPN: https://aws.amazon.com/premiumsupport/knowledge-center/create-connection-vpc/

Configuring IAM roles

Once you have created VPC endpoints to the API services, you need to update IAM roles with conditional operator policies for all users or groups that will be accessing Amazon SageMaker notebooks. IAM is a web service that helps you securely control access to AWS resources. A policy is an entity that, when attached to an identity or resource, defines their permissions.

To grant or restrict access to Amazon SageMaker notebooks based on the VPC endpoints used, we will employ an aws:sourceVpce condition in the IAM policy. Since IAM denies all access requests by default, attaching an Allow policy with a condition ensures that requests succeed only if they meet that condition. The following example policy allows a user to perform the listed API operations only when the request comes through one of the two specified VPC endpoints (replace the placeholder VPC endpoint IDs with your own endpoint IDs). Don’t forget to include both VPC endpoints you created.

Note: The actions covered in the following policy exemplify notebook access specifically. You need to update the “Action” section for any other Amazon SageMaker APIs you want to cover. Alternatively, you can use “sagemaker:*” to cover all Amazon SageMaker APIs in your policy.

{
    "Id": "notebook-example-with-sourcevpce",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnableNotebookAccess",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePresignedNotebookInstanceUrl",
                "sagemaker:DescribeNotebookInstance"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:sourceVpce": [
                        "vpce-111bbccc",
                        "vpce-111bbddd"
                    ]
                }
            }
        }
    ]
}

This policy works by including an Allow statement with a StringEquals condition. When a user makes a request to Amazon SageMaker through a VPC endpoint, the endpoint’s ID is compared to the aws:sourceVpce values specified in the policy. If the ID matches none of those values, or the request did not come through a VPC endpoint at all, no Allow statement applies and the request is denied.
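To make the matching logic concrete, here is a deliberately simplified model of it in Python. It is not IAM’s actual evaluation engine: it handles only Allow statements with a StringEquals condition and ignores explicit Deny statements, wildcards, and the Resource element. The function names are my own.

```python
def _as_list(value):
    """Policies allow a single string where a list is expected; normalize it."""
    return [value] if isinstance(value, str) else list(value)


def condition_matches(condition, request_context):
    """Check every StringEquals key against the values present in the request."""
    # Condition key names are case-insensitive; the values are compared exactly.
    context = {key.lower(): value for key, value in request_context.items()}
    for key, allowed in condition.get("StringEquals", {}).items():
        if context.get(key.lower()) not in _as_list(allowed):
            return False
    return True


def is_allowed(policy, action, request_context):
    """Return True only if some Allow statement covers the action and condition."""
    for statement in policy["Statement"]:
        if (statement["Effect"] == "Allow"
                and action in _as_list(statement["Action"])
                and condition_matches(statement.get("Condition", {}),
                                      request_context)):
            return True
    return False  # nothing matched, so the default (implicit) deny applies
```

With the policy above loaded as a dictionary, a request whose context carries aws:SourceVpce set to vpce-111bbccc is allowed, while a request with any other endpoint ID, or with no endpoint ID at all, falls through to the default deny.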

Another way to configure the policy is to use the aws:sourceVpc condition instead of an aws:sourceVpce condition. With this condition, the policy matches on the VPC as a whole rather than on a specific endpoint within it. This is useful when you don’t want to limit access by specific endpoints, and it means you won’t need to update IAM roles when you add or replace endpoints within that VPC. Here is an example:

{
    "Id": "notebook-example-with-sourcevpc",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnableNotebookAccess",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePresignedNotebookInstanceUrl",
                "sagemaker:DescribeNotebookInstance"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:SourceVpc": "vpc-111bbaaa"
                }
            }
        }
    ]
}
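Because the two policies differ only in the condition key and its values, you can generate either one from a list of IDs. The sketch below is one possible helper, with a hypothetical function name. It assumes that endpoint IDs start with vpce- and VPC IDs start with vpc-, it writes the Sid without spaces on the assumption that IAM accepts only alphanumeric Sid values, and it leaves out the optional Id element.

```python
import json

NOTEBOOK_ACTIONS = [
    "sagemaker:CreatePresignedNotebookInstanceUrl",
    "sagemaker:DescribeNotebookInstance",
]


def notebook_access_policy(source_ids):
    """Build an Allow policy limited to the given VPC endpoints or VPCs."""
    source_ids = list(source_ids)
    if source_ids and all(s.startswith("vpce-") for s in source_ids):
        condition_key = "aws:SourceVpce"
    elif source_ids and all(s.startswith("vpc-") for s in source_ids):
        condition_key = "aws:SourceVpc"
    else:
        raise ValueError("pass only vpce-... IDs or only vpc-... IDs")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnableNotebookAccess",
                "Effect": "Allow",
                "Action": NOTEBOOK_ACTIONS,
                "Resource": "*",
                "Condition": {"StringEquals": {condition_key: source_ids}},
            }
        ],
    }


def policy_json(source_ids):
    """Serialize the policy so it can be pasted into IAM or passed to an API."""
    return json.dumps(notebook_access_policy(source_ids), indent=4)
```

For example, policy_json(["vpc-111bbaaa"]) produces a document equivalent to the VPC-based policy above, and policy_json(["vpce-111bbccc", "vpce-111bbddd"]) produces the endpoint-based one.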

You can also consider using service control policies (SCPs) to restrict, at the account level, which services and actions the users, groups, and roles in your organization can use.

Using Amazon SageMaker Notebooks via AWS PrivateLink

Use the following example AWS CLI command to list notebook instances from inside your VPC using the configured VPC endpoint.

aws sagemaker list-notebook-instances --endpoint-url https://VPC_Endpoint_ID.api.sagemaker.Region.vpce.amazonaws.com

If you enabled private DNS hostnames for your VPC endpoint, you don’t need to specify the endpoint URL.

If you did not enable a private hosted zone, or if you’re using an SDK released before August 13, 2018, you have to specify the endpoint when using the SDK or AWS CLI. For example:

aws --endpoint-url https://VPC_Endpoint_ID.api.sagemaker.Region.vpce.amazonaws.com sagemaker create-presigned-notebook-instance-url --notebook-instance-name NotebookInstanceName

For the VPC endpoint in the preceding example, this would be:

aws --endpoint-url https://vpce-08e906a63733a8aa1.api.sagemaker.us-west-2.vpce.amazonaws.com sagemaker create-presigned-notebook-instance-url --notebook-instance-name NotebookInstanceName

If you enabled a private hosted zone and you’re using an SDK released on or after August 13, 2018, this would be:

aws sagemaker create-presigned-notebook-instance-url --notebook-instance-name NotebookInstanceName
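The same calls can be made from the AWS SDK for Python. The following sketch uses hypothetical helper names and assumes you create the client yourself, for example with boto3.client("sagemaker", region_name=region, endpoint_url=vpce_endpoint_url(endpoint_id, region)). When private DNS is enabled, you can leave out endpoint_url.

```python
def vpce_endpoint_url(vpc_endpoint_id, region):
    """Build the endpoint-specific URL for the Amazon SageMaker API."""
    return f"https://{vpc_endpoint_id}.api.sagemaker.{region}.vpce.amazonaws.com"


def presigned_notebook_url(sagemaker_client, notebook_instance_name):
    """Request a presigned URL for the notebook instance and return it."""
    response = sagemaker_client.create_presigned_notebook_instance_url(
        NotebookInstanceName=notebook_instance_name
    )
    return response["AuthorizedUrl"]
```

The returned URL is what you open in a browser that can reach the notebook VPC endpoint, either from inside the VPC or over your VPN or AWS Direct Connect connection.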

Conclusion

AWS PrivateLink support is available in all Regions where Amazon SageMaker and AWS PrivateLink are available. To learn more about using security features in Amazon SageMaker, see the Amazon SageMaker Developer Guide.


About the Author

Erkan Tas is a Senior Product Manager for Amazon SageMaker. He is on a mission to make Artificial Intelligence easy, accessible, and scalable through AWS platforms. He is also a sailor, a science and nature admirer, and a Go and Stratocaster player.