Customize your Amazon SageMaker notebook instances with lifecycle configurations and the option to disable internet access
Amazon SageMaker provides fully managed instances running Jupyter Notebooks for data exploration and preprocessing. Customers really appreciate how easy it is to launch a pre-configured notebook instance with just one click. Today, we are making them more customizable by providing two new options: lifecycle configuration that helps automate the process of customizing your notebook instance, and the ability to disconnect your notebook instances from the public internet so that you can apply controlled security settings in your notebook instance.
Lifecycle configuration of notebook instances
Amazon SageMaker currently provides you the ability to manually install additional libraries on your notebook instances. However, once your notebook instance is terminated, these additional customizations are removed as well, requiring that you manually add them again when you restart your notebook instances. With the new Lifecycle configuration feature in Amazon SageMaker, you can now automate these customizations so that they are applied at different phases of the lifecycle of an instance. For example, you can write a script to install a list of libraries and, using the Lifecycle configuration feature, configure the script to run automatically every time your notebook instance is started. Alternatively, you can choose to run the script only once, when the notebook instance is created.
Intuit, known for providing global products and platforms like TurboTax and QuickBooks, uses lifecycle configuration to customize security settings of notebook instances, such as deploying security scanners and reconfiguring routing rules. Intuit also disables direct internet access on notebook instances and uses lifecycle configuration to bootstrap package installation leveraging a private package index deployed in its VPC.
Option to disable direct internet access for notebook instances
Until now, all Amazon SageMaker notebook instances have had direct internet access by default, and this could not be disabled. Direct internet access allows you to download popular packages, notebooks, and datasets, as well as access other Amazon SageMaker components, through the public internet. However, if you connect a notebook instance to your virtual private cloud (VPC), the notebook instance could provide an additional avenue for data access, as discussed in Notebook Instance Security. Consequently, some customers have asked for the ability to control internet access, especially for notebook instances that are connected to their VPCs. You now have the option to disable the default direct internet access for your Amazon SageMaker notebook instances. This allows you to rely on your VPC configuration to regulate whether or not the notebook instance can access the internet.
To get started with these new features, open the Amazon SageMaker console and create a notebook instance. Navigate to Lifecycle configuration at the bottom of the page. For your first use, since you do not have any lifecycle configuration in your account yet, select Create a lifecycle configuration.
A modal pops up for you to create your first lifecycle configuration. As you create more lifecycle configurations, you can select them here from a drop-down list of your existing configurations.
In this pop-up window, give the lifecycle configuration a name, put your custom script in the text box under Start notebook or Create notebook, depending on your specific need, and then choose Create configuration. In this particular example, every time your notebook instance is started, the yaml package will be automatically installed and ready for use. At this point, your first lifecycle configuration has been created. Then choose Create notebook instance. A notebook instance will be created and started, and your script will be executed as you have configured. That’s it!
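The same configuration can also be created programmatically. The following is a minimal sketch using the AWS SDK for Python (boto3), not a definitive implementation. The configuration name, the helper function names, and the script contents (including the `pyyaml` package name) are assumptions made for illustration. The API expects each script to be passed as a base64-encoded string.

```python
import base64

# Hypothetical start-up script. The package name is an assumption; replace it
# with whatever your notebooks actually need.
ON_START_SCRIPT = """#!/bin/bash
set -e
# Assumes pip is available on the PATH of the user that runs the script.
pip install pyyaml
"""


def encode_script(script: str) -> str:
    """Return the script as a base64 string, the form the API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def build_lifecycle_config_request(name, on_create=None, on_start=None):
    """Build keyword arguments for create_notebook_instance_lifecycle_config.

    on_create runs once when the instance is created; on_start runs every
    time the instance is started (including right after creation).
    """
    request = {"NotebookInstanceLifecycleConfigName": name}
    if on_create is not None:
        request["OnCreate"] = [{"Content": encode_script(on_create)}]
    if on_start is not None:
        request["OnStart"] = [{"Content": encode_script(on_start)}]
    return request


def create_lifecycle_config(request):
    """Send the request to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without the SDK

    client = boto3.client("sagemaker")
    return client.create_notebook_instance_lifecycle_config(**request)


if __name__ == "__main__":
    # "install-yaml" is a hypothetical configuration name.
    request = build_lifecycle_config_request("install-yaml", on_start=ON_START_SCRIPT)
    print(sorted(request))
    # To actually create the configuration, call: create_lifecycle_config(request)
```

Keeping the request construction separate from the API call makes it easy to inspect exactly what will be sent before any resources are created.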
It is very simple to manage your lifecycle configurations as well. On the left navigation pane, under Notebook instances, select Lifecycle configuration.
Here, you’ll see all the lifecycle configurations that you have created. You can create a new lifecycle configuration, or edit/delete an existing one.
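If you prefer to review your lifecycle configurations from code rather than the console, a sketch along the following lines may help. It assumes a boto3 SageMaker client is passed in, and the function name `list_lifecycle_config_names` is hypothetical. It follows the `NextToken` value returned by the API so that accounts with many configurations are fully listed.

```python
def list_lifecycle_config_names(client):
    """Return the names of all notebook instance lifecycle configurations.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    names = []
    kwargs = {}
    while True:
        page = client.list_notebook_instance_lifecycle_configs(**kwargs)
        for config in page.get("NotebookInstanceLifecycleConfigs", []):
            names.append(config["NotebookInstanceLifecycleConfigName"])
        token = page.get("NextToken")
        if not token:
            return names
        # Ask for the next page of results on the following iteration.
        kwargs["NextToken"] = token


def main():
    import boto3  # requires the AWS SDK for Python and configured credentials

    for name in list_lifecycle_config_names(boto3.client("sagemaker")):
        print(name)


if __name__ == "__main__":
    main()
```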
Let’s go through the process of creating a notebook instance with direct internet access disabled through the Amazon SageMaker console. You can achieve the same result using the AWS SDK.
First, from the Amazon SageMaker console, select Notebook instances on the navigation bar, and choose Create notebook instance.
Next, fill in all the required fields in Notebook instance settings, and select the VPC for your notebook instance connection. You’ll notice that a few other fields are now enabled. Select the Subnet and Security group(s) as part of the VPC setting. To disable direct internet access, under Direct Internet access, choose Disable – use VPC only, and then select the Create notebook instance button at the bottom. You are ready to go!
A few minutes later, a notebook instance will be up and running, without direct internet access. Note that in this case, you won’t be able to train or deploy models from notebooks on this notebook instance unless your VPC has a NAT gateway and your security group allows outbound connections. For information about setting up a NAT gateway for your VPC, see Working with NAT Gateways in the Amazon Virtual Private Cloud User Guide. For information about security groups, see Security Groups for Your VPC.
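For readers who would rather use the SDK than the console, here is a rough sketch of the equivalent request with boto3. The instance name, role ARN, subnet ID, and security group ID are placeholders, and the helper function names are hypothetical. The sketch checks for a subnet up front, because direct internet access can be disabled only for a notebook instance that is connected to a VPC.

```python
def build_notebook_instance_request(
    name,
    instance_type,
    role_arn,
    subnet_id=None,
    security_group_ids=None,
    lifecycle_config_name=None,
    direct_internet_access=True,
):
    """Build keyword arguments for the SageMaker create_notebook_instance call."""
    if not direct_internet_access and not subnet_id:
        # Without a subnet the instance has no VPC to route traffic through.
        raise ValueError("Disabling direct internet access requires a VPC subnet")

    request = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "DirectInternetAccess": "Enabled" if direct_internet_access else "Disabled",
    }
    if subnet_id:
        request["SubnetId"] = subnet_id
    if security_group_ids:
        request["SecurityGroupIds"] = list(security_group_ids)
    if lifecycle_config_name:
        request["LifecycleConfigName"] = lifecycle_config_name
    return request


def create_notebook_instance(request):
    """Send the request to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("sagemaker").create_notebook_instance(**request)


if __name__ == "__main__":
    # Every identifier below is a placeholder, not a real resource.
    request = build_notebook_instance_request(
        name="private-notebook",
        instance_type="ml.t2.medium",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        subnet_id="subnet-0123456789abcdef0",
        security_group_ids=["sg-0123456789abcdef0"],
        direct_internet_access=False,
    )
    print(request["DirectInternetAccess"])
    # To actually create the instance, call: create_notebook_instance(request)
```

As with the console flow, whether the resulting instance can reach the internet then depends entirely on your VPC configuration, such as a NAT gateway and outbound security group rules.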
To recap, the lifecycle configuration option and the ability to disable internet access for your Amazon SageMaker notebook instances are available today in the U.S. East (N. Virginia), U.S. East (Ohio), EU (Ireland), and U.S. West (Oregon) AWS Regions. To learn more, visit the Amazon SageMaker notebook instances documentation.
About the Author
Fan Li is a Product Manager in the AWS ML Platforms team, which includes Amazon SageMaker, Amazon Machine Learning, and the AWS Deep Learning AMIs. He used to be a big fan of ballroom dance but now loves whatever his 7-year-old son likes.