AWS Architecture Blog
Field Notes: Bring your C#.NET skills to Amazon SageMaker
Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the undifferentiated heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.
Amazon SageMaker Notebooks are one-click Jupyter Notebooks with elastic compute that can be spun up quickly. Notebooks contain everything necessary to run or recreate a machine learning workflow. Notebooks in SageMaker are pre-loaded with all the common CUDA and cuDNN drivers, Anaconda packages, and framework libraries. However, there is a small amount of work required to support C# .NET code in the Notebooks.
This blog post focuses on customizing Amazon SageMaker Notebook environments to support C# .NET so C# .NET developers can get started with SageMaker Notebooks and machine learning on AWS. To provide support for C# .NET in our SageMaker Jupyter Notebook environment, we use the .NET Interactive tool, which is an evolution of the Try .NET global tool. An installation script is provided for you, which automatically downloads and installs these components. Note that the components are distributed under a proprietary license from Microsoft.
After we set up the environment, we walk through an end-to-end example of building, training, deploying, and invoking a built-in image classification model provided by SageMaker. The example in this blog post is modeled after the End-to-End Multiclass Image Classification Example but written entirely using the C# .NET APIs!
Procedure
The following are the high-level steps to build out and invoke a fully functioning image classification model in SageMaker using C# .NET:
- Customize the SageMaker Jupyter Notebook instances by creating a SageMaker lifecycle configuration
- Launch a Jupyter Notebook using the SageMaker lifecycle configuration
- Create an Amazon S3 bucket for the validation and training datasets, and the output data
- Train a model using the sample dataset and built-in SageMaker image classification algorithm
- Deploy and host the model in SageMaker
- Create an inference endpoint using the trained model
- Invoke the endpoint with the trained model to obtain real time inferences for sample images
- Clean up the resources used in the example to stop incurring charges
Customize the Notebook instances
We use Amazon SageMaker lifecycle configurations to install the required components for C# .NET support.
Use the following steps to create the lifecycle configuration.
- Sign in to the AWS Management Console.
- Navigate to Amazon SageMaker and select Lifecycle configurations from the left menu.
- Select Create configuration and provide a name for the configuration.
- In the Start notebook section paste the following script:
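The script below is a minimal sketch of what such a lifecycle configuration script can look like; the .NET SDK download URL is a placeholder you should replace with the current link from Microsoft's .NET download page.

```bash
#!/bin/bash
set -e

# Run the installation as the ec2-user that owns the Jupyter process.
sudo -u ec2-user -i <<'EOF'

# Download the .NET SDK from Microsoft and extract it. The URL below
# is a placeholder; substitute the current link from the .NET download page.
mkdir -p /home/ec2-user/dotnet
curl -L "<dotnet-sdk-download-url>" -o /tmp/dotnet-sdk.tar.gz
tar -xzf /tmp/dotnet-sdk.tar.gz -C /home/ec2-user/dotnet

# Make dotnet visible to this script and to the Jupyter Notebook instance.
export DOTNET_ROOT=/home/ec2-user/dotnet
export PATH=$PATH:$DOTNET_ROOT:/home/ec2-user/.dotnet/tools
echo "export DOTNET_ROOT=/home/ec2-user/dotnet" >> /home/ec2-user/.bashrc
echo "export PATH=\$PATH:/home/ec2-user/dotnet:/home/ec2-user/.dotnet/tools" >> /home/ec2-user/.bashrc

# Install the .NET Interactive global tool and register its Jupyter kernels.
dotnet tool install --global Microsoft.dotnet-interactive
dotnet interactive jupyter install

EOF
```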
The script above performs the following actions:
- Download of the .NET SDK and runtime from Microsoft
- Extraction of the downloads
- Installation of the .NET interactive global tool
- Setting of required PATH and DOTNET_ROOT variables so they are available to the Jupyter Notebook instance
Note: Creation of the lifecycle configuration automatically downloads and installs third-party software components that are subject to a proprietary license.
Launch a Jupyter Notebook instance
After the lifecycle configuration is created, we are ready to launch a Notebook instance.
Use the following steps to create the Notebook instance:
1. In the AWS Management Console, navigate to the Notebook instances page from the left menu.
2. Select Create notebook instance.
3. Provide a name for your Notebook and select an instance type (a smaller instance type such as ml.t2.medium suffices for the purposes of this example).
4. Set Elastic Inference to none.
5. Expand the Additional configuration menu to expose the Lifecycle configuration drop-down list. Then select the configuration you created. Volume size in GB can be set to the default of 5.
6. In the Permissions and encryption section, select Create a new IAM Role from the IAM role drop-down list. This is the role that is used by your Notebook instance, and you can use the provided default permissions for the role. Select Create role.
7. The Network, Git repositories, and tags sections can be left as is.
8. Select Create notebook instance.
Create an S3 bucket to hold the validation and training datasets
It takes a few minutes for the newly launched Jupyter Notebook instance to be ‘InService’. While it is launching, create an S3 bucket to hold the datasets required for our example. Make sure to create the S3 bucket in the same Region as your Notebook instance.
Open the Jupyter Notebook and begin writing code
After the Notebook instance has launched, the ‘Status’ in the AWS Management Console will report as ‘InService’. Select the Notebook, and choose the Open Jupyter link in the Actions column. To begin authoring from scratch, select New -> .NET (C#) from the drop-down menu on the top right-hand side.
A blank ‘Untitled’ file opens and is ready for you to start writing code. Alternatively, if you’d like to follow along with this post, you can download the full Notebook from GitHub and use the ‘Upload’ button to start stepping through the Notebook.
To run a block of code in the Notebook, click into the block and then select the Run button, or hold down the Shift key and press Enter.
We start by including the relevant NuGet packages for SageMaker, Amazon S3, and a JSON parser:
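In a .NET Interactive notebook, NuGet packages are referenced with the `#r` magic command. A minimal sketch follows; the unversioned references simply pull the latest release of each package.

```csharp
// AWS SDK packages for the SageMaker control plane, the SageMaker
// runtime (endpoint invocation), and Amazon S3, plus Newtonsoft.Json
// for parsing the JSON inference response.
#r "nuget: AWSSDK.SageMaker"
#r "nuget: AWSSDK.SageMakerRuntime"
#r "nuget: AWSSDK.S3"
#r "nuget: Newtonsoft.Json"
```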
Next, we create the service client objects that are used throughout the rest of the code:
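A sketch of the client setup; the variable names (`sageMakerClient`, `sageMakerRuntimeClient`, `s3Client`) are ones we introduce here and reuse in the later snippets.

```csharp
using Amazon.SageMaker;
using Amazon.SageMakerRuntime;
using Amazon.S3;

// The default constructors pick up credentials and the Region from
// the Notebook instance's IAM role and environment. Make sure the
// Region matches the one hosting your S3 bucket.
var sageMakerClient = new AmazonSageMakerClient();
var sageMakerRuntimeClient = new AmazonSageMakerRuntimeClient();
var s3Client = new AmazonS3Client();
```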
Download the required training and validation datasets and store them in Amazon S3
The next step is to download the required training and validation datasets and store them in the S3 bucket that was previously created so they are accessible to our training job. In this demo, we use the Caltech-256 dataset, which contains 30,608 images across 256 object categories. For the sake of brevity, the C# code to download web files and upload them to Amazon S3 is not shown here, but it can be found in full in the GitHub repo.
Train a model using the sample dataset and built-in SageMaker image classification algorithm
After we have the data available in the correct format for training, the next step is to actually train the model using the data. We start by retrieving the IAM role we want SageMaker to use from the currently running Notebook instance dynamically.
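One way to do this, sketched below, is to read the instance's resource metadata file (standard on SageMaker Notebook instances) and look the instance up with DescribeNotebookInstance; the exact approach in the repo's Notebook may differ.

```csharp
using System.IO;
using Amazon.SageMaker.Model;
using Newtonsoft.Json.Linq;

// Every SageMaker Notebook instance exposes its own metadata in this
// file, including the instance name.
var metadata = JObject.Parse(
    File.ReadAllText("/opt/ml/metadata/resource-metadata.json"));
var notebookName = (string)metadata["ResourceName"];

// Look up the instance to recover the IAM role it runs under; we
// reuse the same role for training and hosting.
var describeResponse = await sageMakerClient.DescribeNotebookInstanceAsync(
    new DescribeNotebookInstanceRequest { NotebookInstanceName = notebookName });
var roleArn = describeResponse.RoleArn;
```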
We set all the training parameters and kick off the training job. We use the built-in image classification algorithm for our training job, which we specify in the TrainingImage parameter by providing the URI of the Docker image for this algorithm from the documentation. There are a number of training images available, corresponding to the desired algorithm and the Region we have chosen.
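A sketch of the training job request follows. The `bucketName` and `trainingImage` variables are assumed to hold your bucket name and the Region-specific algorithm image URI; the hyperparameter values are illustrative and mirror the multiclass example referenced earlier.

```csharp
// Name the job with a timestamp so reruns don't collide.
var trainingJobName = "dotnet-image-classification-" +
    DateTime.UtcNow.ToString("yyyy-MM-dd-HH-mm-ss");

// Helper to build an input channel pointing at an S3 prefix.
Channel MakeChannel(string name, string s3Uri) => new Channel
{
    ChannelName = name,
    ContentType = "application/x-recordio",
    DataSource = new DataSource
    {
        S3DataSource = new S3DataSource
        {
            S3DataType = S3DataType.S3Prefix,
            S3Uri = s3Uri,
            S3DataDistributionType = S3DataDistribution.FullyReplicated
        }
    }
};

await sageMakerClient.CreateTrainingJobAsync(new CreateTrainingJobRequest
{
    TrainingJobName = trainingJobName,
    RoleArn = roleArn,
    AlgorithmSpecification = new AlgorithmSpecification
    {
        TrainingImage = trainingImage,   // Region-specific image URI from the documentation
        TrainingInputMode = TrainingInputMode.File
    },
    // Illustrative hyperparameters for Caltech-256 (256 classes + clutter).
    HyperParameters = new Dictionary<string, string>
    {
        ["num_layers"] = "18",
        ["image_shape"] = "3,224,224",
        ["num_classes"] = "257",
        ["num_training_samples"] = "15420",
        ["mini_batch_size"] = "128",
        ["epochs"] = "2",
        ["learning_rate"] = "0.01"
    },
    InputDataConfig = new List<Channel>
    {
        MakeChannel("train", $"s3://{bucketName}/train/"),
        MakeChannel("validation", $"s3://{bucketName}/validation/")
    },
    OutputDataConfig = new OutputDataConfig { S3OutputPath = $"s3://{bucketName}/output" },
    ResourceConfig = new ResourceConfig
    {
        InstanceType = TrainingInstanceType.MlP2Xlarge,  // GPU instance for image training
        InstanceCount = 1,
        VolumeSizeInGB = 50
    },
    StoppingCondition = new StoppingCondition { MaxRuntimeInSeconds = 360000 }
});
```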
Poll the training job status a few times until it reports Completed, then proceed to the next step.
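A simple polling loop along these lines works:

```csharp
// Poll DescribeTrainingJob until the job leaves the InProgress state.
DescribeTrainingJobResponse jobStatus;
do
{
    await Task.Delay(TimeSpan.FromSeconds(60));
    jobStatus = await sageMakerClient.DescribeTrainingJobAsync(
        new DescribeTrainingJobRequest { TrainingJobName = trainingJobName });
    Console.WriteLine(jobStatus.TrainingJobStatus);
}
while (jobStatus.TrainingJobStatus == TrainingJobStatus.InProgress);
```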
Deploy and host the model in SageMaker
After the training job has completed, it is time to build the model. Create the request object with all required parameters and make an API call to generate the model.
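A sketch, continuing with the variable names introduced above:

```csharp
// Look up the completed job to find where SageMaker wrote the model artifacts.
var trainedJob = await sageMakerClient.DescribeTrainingJobAsync(
    new DescribeTrainingJobRequest { TrainingJobName = trainingJobName });

var modelName = trainingJobName + "-model";

await sageMakerClient.CreateModelAsync(new CreateModelRequest
{
    ModelName = modelName,
    ExecutionRoleArn = roleArn,
    // Pair the trained artifacts with the same algorithm image used for training.
    PrimaryContainer = new ContainerDefinition
    {
        Image = trainingImage,
        ModelDataUrl = trainedJob.ModelArtifacts.S3ModelArtifacts
    }
});
```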
Create an inference endpoint using the trained model
After deploying the model, we are ready to create the endpoint that will be invoked to get real-time inferences for images. This is a two-step process: first we create an endpoint configuration, and then we use it to create the endpoint itself.
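A sketch of both steps; the hosting instance type here is illustrative.

```csharp
var endpointConfigName = modelName + "-config";
var endpointName = modelName + "-endpoint";

// Step 1: an endpoint configuration that maps the model to the
// instance fleet that will serve inferences.
await sageMakerClient.CreateEndpointConfigAsync(new CreateEndpointConfigRequest
{
    EndpointConfigName = endpointConfigName,
    ProductionVariants = new List<ProductionVariant>
    {
        new ProductionVariant
        {
            VariantName = "AllTraffic",
            ModelName = modelName,
            InstanceType = ProductionVariantInstanceType.MlM4Xlarge,
            InitialInstanceCount = 1
        }
    }
});

// Step 2: the endpoint itself, provisioned from that configuration.
await sageMakerClient.CreateEndpointAsync(new CreateEndpointRequest
{
    EndpointName = endpointName,
    EndpointConfigName = endpointConfigName
});
```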
Poll the endpoint status a few times until it reports InService, then proceed to the next step.
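The loop mirrors the training job check:

```csharp
// Poll DescribeEndpoint until the endpoint is no longer being created.
DescribeEndpointResponse endpointStatus;
do
{
    await Task.Delay(TimeSpan.FromSeconds(30));
    endpointStatus = await sageMakerClient.DescribeEndpointAsync(
        new DescribeEndpointRequest { EndpointName = endpointName });
    Console.WriteLine(endpointStatus.EndpointStatus);
}
while (endpointStatus.EndpointStatus == EndpointStatus.Creating);
```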
Invoke the endpoint with the trained model to obtain real time inferences for sample images
Load the known list of classes/categories into a List (shortened here for brevity). We compare the inference response to this list.
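A sketch of the shortened list; the Caltech-256 class names appear in their canonical sorted order, so index 7 is "bathtub". The full 257-entry list is in the Notebook in the GitHub repo.

```csharp
// Caltech-256 class names in canonical order (shortened here).
var categories = new List<string>
{
    "ak47", "american-flag", "backpack", "baseball-bat",
    "baseball-glove", "basketball-hoop", "bat", "bathtub"
    // ... remaining classes ...
};
```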
Two images from the Caltech dataset have been chosen at random to be tested against the deployed model. These two images are downloaded locally and loaded into memory so they can be passed as payload to the endpoint. The code below demonstrates one of these in action:
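A sketch of a single invocation follows; the image file name is a placeholder for whichever sample image you downloaded, and `endpointName` and `categories` come from the earlier snippets.

```csharp
using System.IO;
using System.Linq;
using Amazon.SageMakerRuntime.Model;
using Newtonsoft.Json;

// Read one of the downloaded test images into memory.
var imageBytes = File.ReadAllBytes("test-image.jpg");

// Send the raw image bytes to the endpoint for classification.
var invokeResponse = await sageMakerRuntimeClient.InvokeEndpointAsync(
    new InvokeEndpointRequest
    {
        EndpointName = endpointName,
        ContentType = "application/x-image",
        Body = new MemoryStream(imageBytes)
    });

// The response body is a JSON array with one probability per class.
var json = new StreamReader(invokeResponse.Body).ReadToEnd();
var probabilities = JsonConvert.DeserializeObject<List<float>>(json);

// Find the most likely class and map it back to a category name.
var maxProbability = probabilities.Max();
var maxIndex = probabilities.IndexOf(maxProbability);
Console.WriteLine($"Index of Max Probability: {maxIndex}");
Console.WriteLine($"Value of Max Probability: {maxProbability}");
Console.WriteLine($"Category of image : {categories[maxIndex]}");
```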
We now have a response returned from the endpoint, and we must inspect the result to determine whether it is correct. The response takes the form of a list of probabilities: each item in the list represents the probability that the image provided to the endpoint matches the corresponding class/category in our previously loaded list.
The response indicates that, for the given image, the highest probabilistic match (~16.9%) to one of our known classes/categories is at index 7 of the list. We inspect our list of known classes/categories at the same index value to determine the name, which returns “bathtub”. We have a successful match!
Index of Max Probability: 7
Value of Max Probability: 0.16936515271663666
Category of image : bathtub
Clean up the used resources
To avoid continuing charges, delete the deployed endpoint (along with its endpoint configuration and the model) and stop the Jupyter Notebook instance.
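A sketch of the API cleanup, reusing the names from the earlier snippets; the Notebook instance itself is stopped from the SageMaker console.

```csharp
// Remove the hosted endpoint first (it accrues charges while InService),
// then the configuration and model created for it.
await sageMakerClient.DeleteEndpointAsync(
    new DeleteEndpointRequest { EndpointName = endpointName });
await sageMakerClient.DeleteEndpointConfigAsync(
    new DeleteEndpointConfigRequest { EndpointConfigName = endpointConfigName });
await sageMakerClient.DeleteModelAsync(
    new DeleteModelRequest { ModelName = modelName });
```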
Conclusion
If you are a C# .NET developer who was previously overwhelmed by the prospect of getting started with machine learning on AWS, following the guidance in this post will get you up and running quickly. The full Jupyter Notebook for this example can be found in the GitHub repo.