AWS Machine Learning Blog
Understanding Amazon SageMaker notebook instance networking configurations and advanced routing options
This post was reviewed June, 2022.
An Amazon SageMaker notebook instance provides a Jupyter notebook app through a fully managed machine learning (ML) Amazon EC2 instance. Amazon SageMaker Jupyter notebooks are used to perform advanced data exploration, create training jobs, deploy models to Amazon SageMaker hosting, and test or validate your models.
The notebook instance has a variety of networking configurations available to it. In this blog post we’ll outline the different options and discuss a common scenario for customers.
Amazon SageMaker notebook instances can be launched with or without your Virtual Private Cloud (VPC) attached. When launched with your VPC attached, the notebook can be configured either with or without direct internet access.
IMPORTANT NOTE: Direct internet access means that the Amazon SageMaker service is providing a network interface that allows for the notebook to talk to the internet through a NAT interface in a VPC managed by the service.
Using the Amazon SageMaker console, these are the three options:
- No customer VPC is attached.
- Customer VPC is attached with direct internet access.
- Customer VPC is attached without direct internet access.
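The same three options can also be expressed programmatically. The following is a minimal sketch, assuming you create notebook instances with the boto3 `create_notebook_instance` call, whose `SubnetId`, `SecurityGroupIds`, and `DirectInternetAccess` parameters control networking. The helper name `notebook_network_kwargs` and the example IDs are hypothetical.

```python
from typing import Optional, Sequence


def notebook_network_kwargs(
    subnet_id: Optional[str] = None,
    security_group_ids: Optional[Sequence[str]] = None,
    direct_internet_access: bool = True,
) -> dict:
    """Build only the networking arguments for create_notebook_instance."""
    if subnet_id is None:
        # Option 1: no customer VPC attached. Direct internet access can
        # only be disabled when a subnet is supplied.
        if not direct_internet_access:
            raise ValueError("Disabling direct internet access requires a VPC subnet")
        return {"DirectInternetAccess": "Enabled"}

    # Options 2 and 3: customer VPC attached, with or without direct internet access.
    kwargs = {
        "SubnetId": subnet_id,
        "DirectInternetAccess": "Enabled" if direct_internet_access else "Disabled",
    }
    if security_group_ids:
        kwargs["SecurityGroupIds"] = list(security_group_ids)
    return kwargs


# Hypothetical IDs, for illustration only.
print(notebook_network_kwargs())
print(notebook_network_kwargs("subnet-0example", ["sg-0example"]))
print(notebook_network_kwargs("subnet-0example", ["sg-0example"], direct_internet_access=False))
```

The resulting dictionary would be merged with the other required arguments (instance name, instance type, and role ARN) before calling the API.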
What does it really mean?
Each of the three options automatically configures the network interfaces on the managed EC2 instance with a set of routing configurations. In certain situations, you might want to modify these settings to route specific IP address ranges to a different network interface. Next, we’ll step through each of these default configurations:
- No attached customer VPC (1 network interface)
In this configuration, all the traffic goes through the single network interface. The notebook instance is running in an Amazon SageMaker managed VPC as shown in the above diagram.
- Customer attached VPC with direct internet access (2 network interfaces)
In this configuration, the notebook instance needs to decide which of the two network interfaces each piece of network traffic should use.
If we look at an example where we launched into a VPC with a 172.31.0.0/16 CIDR range and look at the OS-level route information, we see this.
Looking at this route table, we can see:
- 172.31.0.0/16 traffic uses the eth2 interface.
- Some Docker and metadata routes.
- All other traffic uses the eth0 interface.
For simplicity, we’ll focus on the eth0 and eth2 configurations and not the other Docker/EC2 metadata entries. This shows us the following configuration:
This default setting uses the internet network interface (eth0) for all traffic except traffic destined for the CIDR range of the customer attached VPC, which uses eth2. This setting sometimes needs to be overridden when interacting with either on-premises or peered-VPC resources.
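To make the default behavior concrete, here is a small sketch that models the two routes discussed above with Python's standard `ipaddress` module. It is a simplified illustration of longest-prefix matching, not the notebook's actual route table: the Docker and metadata entries are left out, and the function name `pick_interface` and the sample destination addresses are assumptions made for the example.

```python
import ipaddress

# Simplified model of the default routes (Docker and metadata entries omitted).
# The CIDR range and interface names come from the example in this post.
ROUTES = [
    (ipaddress.ip_network("172.31.0.0/16"), "eth2"),  # customer VPC range
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),      # default route
]


def pick_interface(destination: str, routes=ROUTES) -> str:
    """Return the interface of the most specific route that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, dev) for net, dev in routes if address in net]
    # The longest prefix (most specific network) wins.
    _, device = max(matches, key=lambda item: item[0].prefixlen)
    return device


print(pick_interface("172.31.5.10"))  # eth2
print(pick_interface("10.0.1.20"))    # eth0
```

Any destination outside 172.31.0.0/16, including private ranges that live on premises or in a peered VPC, falls through to the default route on eth0.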
- Customer attached VPC without direct internet access.
IMPORTANT NOTE: In this configuration, the notebook instance can still be configured to access the internet. The network interface that gets launched only has a private IP address. That means it must either be in a private subnet with a NAT or route internet traffic back through a virtual private gateway. If launched into a public subnet, it won’t be able to reach the internet through an internet gateway (IGW), because the interface has no public IP address.
Common customer patterns
Accessing on-premises resources from an Amazon SageMaker notebook instance with direct internet access:
Suppose we have the following configuration:
If we try to access an on-premises resource in the 10.0.0.0/16 CIDR range, the OS routes the traffic through the eth0 internet interface. This interface has no connection back to the on-premises network, so we can’t communicate with on-premises resources.
To route back on premises, we’ll want to update the route table to have the following:
To do this, we can run the following commands from a terminal on the Amazon SageMaker notebook instance:
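The exact commands are not reproduced in this text. As a hedged sketch, assuming the standard Linux `ip route add` command is used with the CIDR range, VPC routing IP address, and interface from this example, the route could be assembled and applied from Python as follows. The helper names `build_route_command` and `apply_route` are hypothetical.

```python
import ipaddress
import subprocess


def build_route_command(destination_cidr: str, gateway: str, device: str) -> list:
    """Return an `ip route add` command as an argument list."""
    # Parsing validates the inputs; a malformed value raises ValueError.
    network = ipaddress.ip_network(destination_cidr)
    gateway_ip = ipaddress.ip_address(gateway)
    return ["sudo", "ip", "route", "add", str(network),
            "via", str(gateway_ip), "dev", device]


def apply_route(command: list) -> None:
    """Run the command; only meaningful on the notebook instance itself."""
    subprocess.run(command, check=True)


# Assumed values: on-premises range, VPC routing IP, and VPC interface
# taken from the example in this post.
command = build_route_command("10.0.0.0/16", "172.31.64.1", "eth2")
print(" ".join(command))  # sudo ip route add 10.0.0.0/16 via 172.31.64.1 dev eth2
```

Calling `apply_route(command)` on the notebook instance would then add the route. It is left out of the example run because it changes the OS route table.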
Now if we look at the route table by entering “route -n” we see the route:
We see a route for 10.0.0.0 with mask 255.255.0.0 (which is the same as 10.0.0.0/16) going through the VPC routing IP address (172.31.64.1).
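If the netmask notation is unfamiliar, the equivalence can be checked quickly with Python's standard `ipaddress` module. This is only a verification aid, and the sample address 10.0.1.20 is an assumed on-premises host.

```python
import ipaddress

# route -n prints destination and netmask separately; ipaddress accepts the
# "address/netmask" form and normalizes it to CIDR notation.
route = ipaddress.ip_network("10.0.0.0/255.255.0.0")
print(route)            # 10.0.0.0/16
print(route.prefixlen)  # 16

# An address in the on-premises range is covered by the new route.
print(ipaddress.ip_address("10.0.1.20") in route)  # True
```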
There is still one issue.
If we restart the notebook, the change won’t persist. Only changes made to the ML storage volume persist across a stop/start, which is why packages and files generally need to be under “/home/ec2-user/SageMaker” to be kept. The route is OS-level state rather than a file on that volume, so in this case we’ll use a different feature, a lifecycle configuration, to add the route any time the notebook starts.
We can create a lifecycle configuration as shown in the following diagram:
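The lifecycle configuration can also be created through the API instead of the console. The sketch below assumes the boto3 `create_notebook_instance_lifecycle_config` call, which takes base64-encoded script content, and an on-start script containing the same assumed `ip route add` command with the values from this example. The script body and helper names are illustrative, not the exact content of the configuration in the diagram.

```python
import base64

# Assumed on-start script: re-adds the on-premises route on every start.
ON_START_SCRIPT = """#!/bin/bash
sudo ip route add 10.0.0.0/16 via 172.31.64.1 dev eth2
"""


def encode_script(script: str) -> str:
    """Base64-encode a script, the format the lifecycle configuration API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def create_lifecycle_config(name: str, script: str = ON_START_SCRIPT) -> None:
    """Create the lifecycle configuration (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("sagemaker").create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=name,
        # OnStart scripts run each time the instance starts, including at creation.
        OnStart=[{"Content": encode_script(script)}],
    )


encoded = encode_script(ON_START_SCRIPT)
# Round-trip check: decoding returns the original script.
print(base64.b64decode(encoded).decode("utf-8") == ON_START_SCRIPT)  # True
```

The configuration name passed to `create_lifecycle_config` is then referenced when creating the notebook instance, through the `LifecycleConfigName` parameter of `create_notebook_instance`.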
Now we can create our notebooks with this lifecycle configuration:
With this setup, when the notebook gets created or when it’s stopped and restarted, we will have the networking configuration we expect:
Amazon SageMaker Jupyter notebooks are used to perform advanced data exploration, create model training jobs, deploy models to Amazon SageMaker hosting, and test or validate your models. This drives the need for notebook instances to have various networking configurations available to them. Knowing how these configurations can be adapted allows you to integrate with existing resources in your organization and enterprise.
About the Author
Ben Snively is an AWS Public Sector Specialist Solutions Architect. He works with government, non-profit, and education customers on big data/analytical and AI/ML projects, helping them build solutions using AWS.