AWS Architecture Blog

Field Notes: Bring your C#.NET skills to Amazon SageMaker

Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the undifferentiated heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.

Amazon SageMaker Notebooks are one-click Jupyter Notebooks with elastic compute that can be spun up quickly. Notebooks contain everything necessary to run or recreate a machine learning workflow. Notebooks in SageMaker are pre-loaded with all the common CUDA and cuDNN drivers, Anaconda packages, and framework libraries. However, there is a small amount of work required to support C# .NET code in the Notebooks.

This blog post focuses on customizing the Amazon SageMaker Notebooks environment to support C# .NET so that C# .NET developers can get started with SageMaker Notebooks and machine learning on AWS. To provide support for C# .NET in our SageMaker Jupyter Notebook environment, we use the .NET interactive tool, which is an evolution of the Try .NET global tool. An installation script is provided for you, which automatically downloads and installs these components. Note that the components are distributed under a proprietary license from Microsoft.

After we set up the environment, we walk through an end-to-end example of building, training, deploying, and invoking a built-in image classification model provided by SageMaker. The example in this blog post is modeled after the End-to-End Multiclass Image Classification Example but written entirely using the C# .NET APIs!

Procedure

The following are the high-level steps to build out and invoke a fully functioning image classification model in SageMaker using C# .NET:

  • Customize the SageMaker Jupyter Notebook instances by creating a SageMaker lifecycle configuration
  • Launch a Jupyter Notebook using the SageMaker lifecycle configuration
  • Create an Amazon S3 bucket for the validation and training datasets, and the output data
  • Train a model using the sample dataset and built-in SageMaker image classification algorithm
  • Deploy and host the model in SageMaker
  • Create an inference endpoint using the trained model
  • Invoke the endpoint with the trained model to obtain real time inferences for sample images
  • Clean up the resources used in the example to stop incurring charges

Customize the Notebook instances

We use Amazon SageMaker lifecycle configurations to install the required components for C# .NET support.

Use the following steps to create the Lifecycle configuration.

  1. Sign in to the AWS Management Console.
  2. Navigate to Amazon SageMaker and select Lifecycle configurations from the left menu.
  3. Select Create configuration and provide a name for the configuration.
  4. In the Start notebook section paste the following script:
#!/bin/bash

set -e
wget https://download.visualstudio.microsoft.com/download/pr/d731f991-8e68-4c7c-8ea0-fad5605b077a/49497b5420eecbd905158d86d738af64/dotnet-sdk-3.1.100-linux-x64.tar.gz
wget https://download.visualstudio.microsoft.com/download/pr/30ab052d-dbb6-4bce-8a44-a831034589ed/7ffaad695afb7ccd778b0d3fc1c89f50/dotnet-runtime-3.0.1-linux-x64.tar.gz
mkdir -p /home/ec2-user/dotnet && tar zxf dotnet-runtime-3.0.1-linux-x64.tar.gz -C /home/ec2-user/dotnet
export DOTNET_ROOT=/home/ec2-user/dotnet
export PATH=$PATH:/home/ec2-user/dotnet
export DOTNET_CLI_HOME=/home/ec2-user/dotnet
export HOME=/home/ec2-user

tar zxf dotnet-sdk-3.1.100-linux-x64.tar.gz -C /home/ec2-user/dotnet
dotnet tool install --global Microsoft.dotnet-interactive
export PATH=$PATH:/home/ec2-user/dotnet/.dotnet/tools
dotnet interactive jupyter install
jupyter kernelspec list

touch /etc/profile.d/jupyter-env.sh
echo "export PATH='$PATH:/home/ec2-user/dotnet/.dotnet/tools:/home/ec2-user/dotnet'" >> /etc/profile.d/jupyter-env.sh

touch /etc/profile.d/dotnet-env.sh
echo "export DOTNET_ROOT='/home/ec2-user/dotnet'" >> /etc/profile.d/dotnet-env.sh
sudo chmod -R 777 /home/ec2-user/.dotnet

initctl restart jupyter-server --no-wait

The script above performs the following actions:

  • Downloads the .NET SDK and runtime from Microsoft
  • Extracts the downloaded archives
  • Installs the .NET interactive global tool
  • Sets the required PATH and DOTNET_ROOT variables so they are available to the Jupyter Notebook instance

Note: Creation of the lifecycle configuration automatically downloads and installs third-party software components that are subject to a proprietary license.

Launch a Jupyter Notebook instance

After the lifecycle configuration is created, we are ready to launch a Notebook instance.

Use the following steps to create the Notebook instance:

1. In the AWS Management Console, navigate to the Notebook instances page from the left menu.

2. Select Create notebook instance.

3. Provide a name for your Notebook and select an instance type (a smaller instance type such as ml.t2.medium suffices for the purposes of this example).

4. Set Elastic Inference to none.

5. Expand the Additional configuration menu to expose the Lifecycle configuration drop down list. Then select the configuration you created. Volume size in GB can be set to the default of 5.

Notebook Instance Settings

6. In the Permissions and encryption section, select Create a new IAM Role from the IAM role dropdown. This is the role that is used by your Notebook instance, and you can use the provided default permissions for the role. Select Create role.

7. The Network, Git repositories, and tags sections can be left as is.

8. Select Create notebook instance.

Create an S3 bucket to hold the validation and training datasets

It takes a few minutes for the newly launched Jupyter Notebook instance to be ‘InService’.  While it is launching, create an S3 bucket to hold the datasets required for our example. Make sure to create the S3 bucket in the same Region as your Notebook instance.
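The bucket can also be created with the AWS SDK for .NET rather than the console. The following is a minimal sketch; the bucket name is a placeholder and must be globally unique:

```csharp
using Amazon.S3;
using Amazon.S3.Model;

AmazonS3Client s3Client = new AmazonS3Client();

// Create the bucket in the client's configured Region (must match the Notebook's Region)
PutBucketResponse putBucketResp = await s3Client.PutBucketAsync(new PutBucketRequest(){
    BucketName = "my-sagemaker-dotnet-demo-bucket" // placeholder; choose a globally unique name
});
Console.WriteLine(putBucketResp.HttpStatusCode);
```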

Open the Jupyter Notebook and begin writing code

After the Notebook instance has launched, the ‘Status’ in the AWS Management Console will report as ‘InService’. Select the Notebook, and choose the Open Jupyter link in the Actions column. To begin authoring from scratch, select New -> .NET (C#) from the drop-down menu on the top right hand side.

A blank ‘Untitled’ file opens and is ready for you to start writing code. Alternatively, if you’d like to follow along with this post, you can download the full Notebook from GitHub and use the ‘Upload’ button to start stepping through the Notebook.

To run a block of code in the Notebook, click into the block and then select the Run button, or hold down your SHIFT Key and press ‘Enter’ on your keyboard.

We start by including the relevant NuGet packages for SageMaker, Amazon S3, and a JSON parser:

#r "nuget:AWSSDK.SageMaker, 3.3.112.3"
#r "nuget:AWSSDK.SageMakerRuntime, 3.3.101.49"
#r "nuget:AWSSDK.S3, 3.3.110.45"
#r "nuget:Newtonsoft.Json, 12.0.3"

Next, we create the service client objects that are used throughout the rest of the code:

static AmazonS3Client s3Client = new AmazonS3Client();
AmazonSageMakerClient smClient = new AmazonSageMakerClient();
AmazonSageMakerRuntimeClient smrClient = new AmazonSageMakerRuntimeClient();

Download the required training and validation datasets and store them in Amazon S3

The next step is to download the required training and validation datasets and store them in the S3 bucket that was previously created so they are accessible to our training job. In this demo, we use the Caltech-256 dataset, which contains 30,608 images of 256 object categories. For the sake of brevity, the C# code to download web files and upload them to Amazon S3 is not shown here, but it can be found in full in the GitHub repo.
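As a rough sketch of what that code does, the pre-packaged RecordIO files are downloaded locally and then uploaded to the bucket under the `train` and `validation` prefixes. The file URLs below are the MXNet-hosted copies of the dataset and should be treated as assumptions that may change; `bucketName` holds the name of the bucket created earlier:

```csharp
using System.Net;
using Amazon.S3.Model;

// Download the pre-packaged RecordIO training and validation files locally
WebClient webClient = new WebClient();
webClient.DownloadFile("http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec", "caltech-256-60-train.rec");
webClient.DownloadFile("http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec", "caltech-256-60-val.rec");

// Upload them to the S3 bucket under the prefixes used by the training job
await s3Client.PutObjectAsync(new PutObjectRequest(){
    BucketName = bucketName,
    Key = "train/caltech-256-60-train.rec",
    FilePath = "./caltech-256-60-train.rec"
});
await s3Client.PutObjectAsync(new PutObjectRequest(){
    BucketName = bucketName,
    Key = "validation/caltech-256-60-val.rec",
    FilePath = "./caltech-256-60-val.rec"
});

// S3 URIs referenced later by the training job's input channels
string s3Train = String.Format("s3://{0}/train/", bucketName);
string s3Validation = String.Format("s3://{0}/validation/", bucketName);
```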

Train a model using the sample dataset and built-in SageMaker image classification algorithm

After we have the data available in the correct format for training, the next step is to actually train the model using the data. We start by retrieving the IAM role we want SageMaker to use from the currently running Notebook instance dynamically.

DescribeNotebookInstanceRequest dniReq = new DescribeNotebookInstanceRequest() {
   NotebookInstanceName = "dotNetV3-1" // replace with the name of your Notebook instance
};
DescribeNotebookInstanceResponse dniResp = await smClient.DescribeNotebookInstanceAsync(dniReq);
Console.WriteLine(dniResp.RoleArn);

We set all the training parameters and kick off the training job. We use the built-in image classification algorithm for our training job. We specify this in the TrainingImage parameter by providing the URI of the Docker image for this algorithm from the documentation; there are a number of training images available that correspond to the desired algorithm and the Region we have chosen.

string jobName = String.Format("DEMO-imageclassification-{0}",DateTime.Now.ToString("yyyy-MM-dd-hh-mmss"));

CreateTrainingJobRequest ctrRequest = new CreateTrainingJobRequest(){
  AlgorithmSpecification = new AlgorithmSpecification(){
    TrainingImage = "433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:1",
    TrainingInputMode = "File"
  },
  RoleArn = dniResp.RoleArn,
  OutputDataConfig = new OutputDataConfig(){
    S3OutputPath = String.Format(@"s3://{0}/{1}/output",bucketName,jobName)
  },
  ResourceConfig = new ResourceConfig(){
    InstanceCount = 1,
    InstanceType = Amazon.SageMaker.TrainingInstanceType.MlP2Xlarge,
    VolumeSizeInGB = 50
  },
  TrainingJobName = jobName,
  HyperParameters = new Dictionary<string,string>() {
    {"image_shape","3,224,224"},
    {"num_layers","18"},
    {"num_training_samples","15420"},
    {"num_classes","257"},
    {"mini_batch_size","64"},
    {"epochs","10"},
    {"learning_rate","0.01"}
  },
  StoppingCondition = new StoppingCondition(){
    MaxRuntimeInSeconds = 360000
  },
  InputDataConfig = new List<Amazon.SageMaker.Model.Channel>(){
    new Amazon.SageMaker.Model.Channel() {
      ChannelName = "train",
      ContentType = "application/x-recordio",
      CompressionType = Amazon.SageMaker.CompressionType.None,
      DataSource = new Amazon.SageMaker.Model.DataSource(){
        S3DataSource = new Amazon.SageMaker.Model.S3DataSource(){
          S3DataType = Amazon.SageMaker.S3DataType.S3Prefix,
          S3Uri = s3Train,
          S3DataDistributionType = Amazon.SageMaker.S3DataDistribution.FullyReplicated
        }
      }
    },
    new Amazon.SageMaker.Model.Channel(){
      ChannelName = "validation",
      ContentType = "application/x-recordio",
      CompressionType = Amazon.SageMaker.CompressionType.None,
      DataSource = new Amazon.SageMaker.Model.DataSource(){
        S3DataSource = new Amazon.SageMaker.Model.S3DataSource(){
          S3DataType = Amazon.SageMaker.S3DataType.S3Prefix,
          S3Uri = s3Validation,
          S3DataDistributionType = Amazon.SageMaker.S3DataDistribution.FullyReplicated
        }
      }
    }
  }
};

CreateTrainingJobResponse ctrResponse = await smClient.CreateTrainingJobAsync(ctrRequest);
Console.WriteLine(ctrResponse.TrainingJobArn);

Poll the job for completion status a few times until the job status reports Completed, then you can proceed to the next step.
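One way to poll is a simple loop around the DescribeTrainingJob API; the sketch below also captures the DescribeTrainingJobResponse (`tjResp`), whose ModelArtifacts property is used in the next step:

```csharp
DescribeTrainingJobRequest tjReq = new DescribeTrainingJobRequest(){
  TrainingJobName = jobName
};
DescribeTrainingJobResponse tjResp = await smClient.DescribeTrainingJobAsync(tjReq);

// Poll until the job leaves the InProgress state
while (tjResp.TrainingJobStatus == Amazon.SageMaker.TrainingJobStatus.InProgress)
{
    await Task.Delay(TimeSpan.FromSeconds(60)); // wait between polls to avoid throttling
    tjResp = await smClient.DescribeTrainingJobAsync(tjReq);
}
Console.WriteLine(tjResp.TrainingJobStatus); // expect Completed before moving on
```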

Deploy and host the model in SageMaker

After the training job has completed, it is time to build the model. Create the request object with all required parameters and make the API call to generate the model.

String modelName = String.Format("DEMO-full-image-classification-model-{0}",DateTime.Now.ToString("yyyy-MM-dd-hh-mmss"));
Console.WriteLine(modelName);

CreateModelRequest modelRequest = new CreateModelRequest(){
  ModelName = modelName,
  ExecutionRoleArn = dniResp.RoleArn,
  PrimaryContainer = new ContainerDefinition(){
     Image = "433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:latest",
     ModelDataUrl = tjResp.ModelArtifacts.S3ModelArtifacts
  }
};

CreateModelResponse modelResponse = await smClient.CreateModelAsync(modelRequest);
Console.WriteLine(modelResponse.ModelArn);

Create an inference endpoint using the trained model

After creating the model, we are ready to create the endpoint that is invoked to get real-time inferences for images. This is a two-step process: first we create an endpoint configuration, and then we use it to create the endpoint itself.

string epConfName = String.Format("{0}-EndPointConfig",jobName);

CreateEndpointConfigRequest epConfReq = new CreateEndpointConfigRequest(){
   EndpointConfigName = epConfName,
   ProductionVariants = new List<ProductionVariant>(){
     new ProductionVariant() {
       InstanceType = Amazon.SageMaker.ProductionVariantInstanceType.MlP28xlarge,
       InitialInstanceCount = 1,
       ModelName = modelName,
       VariantName = "AllTraffic"
     }
   }
};

CreateEndpointConfigResponse epConfResp = await smClient.CreateEndpointConfigAsync(epConfReq);
Console.WriteLine(epConfResp.EndpointConfigArn);

string epName = String.Format("{0}-EndPoint",jobName);
Console.WriteLine(epName);

CreateEndpointRequest epReq = new CreateEndpointRequest(){
  EndpointName = epName,
  EndpointConfigName = epConfName
};

CreateEndpointResponse epResp = await smClient.CreateEndpointAsync(epReq);
Console.WriteLine(epResp.EndpointArn);

Poll the endpoint status a few times until it reports InService, then proceed to the next step.
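A polling loop analogous to the one used for the training job works here as well; this is a minimal sketch using the DescribeEndpoint API:

```csharp
DescribeEndpointRequest epDescReq = new DescribeEndpointRequest(){
  EndpointName = epName
};
DescribeEndpointResponse epDescResp = await smClient.DescribeEndpointAsync(epDescReq);

// Endpoint creation typically takes several minutes
while (epDescResp.EndpointStatus == Amazon.SageMaker.EndpointStatus.Creating)
{
    await Task.Delay(TimeSpan.FromSeconds(60)); // wait between polls
    epDescResp = await smClient.DescribeEndpointAsync(epDescReq);
}
Console.WriteLine(epDescResp.EndpointStatus); // expect InService before invoking
```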

Invoke the endpoint with the trained model to obtain real time inferences for sample images

Load the known list of classes/categories into a List (shortened here for brevity).  We compare the inference response to this list.

String[] categoriesArray = new String[]{"ak47", "american-flag", "backpack", "baseball-bat", "baseball-glove", "basketball-hoop", "bat", "bathtub", "bear", "beer-mug", ..... "clutter"};
List<String> categories = categoriesArray.ToList();

Two images from the Caltech dataset were chosen at random to test against the deployed model. These two images are downloaded locally and loaded into memory so they can be passed as the payload to the endpoint. The code below demonstrates one of them in action:

WebClient webClient = new WebClient();
webClient.DownloadFile("http://www.vision.caltech.edu/Image_Datasets/Caltech256/images/008.bathtub/008_0007.jpg", "008_0007.jpg");

MemoryStream dataStream = new MemoryStream(File.ReadAllBytes(@"./008_0007.jpg"));
InvokeEndpointRequest invReq = new InvokeEndpointRequest(){
  EndpointName = epName,
  ContentType = "application/x-image",
  Body = dataStream
};
InvokeEndpointResponse invResp = await smrClient.InvokeEndpointAsync(invReq);

//Read the response stream back into a string so it can be reviewed
StreamReader sr = new StreamReader(invResp.Body);
String responseBody = sr.ReadToEnd();

We now have a response from the endpoint, and we must inspect the result to determine whether it is correct. The response is a list of probabilities; each item in the list represents the probability that the image provided to the endpoint matches the corresponding class/category in our previously loaded list.

//Load the values into a List so they can be more easily searched
List<Decimal> probabilities = JsonConvert.DeserializeObject<List<Decimal>>(responseBody);

//Determine which category returned the highest probability match and print its value and index
var indexAtMax = probabilities.IndexOf(probabilities.Max());
Console.WriteLine(String.Format("Index of Max Probability: {0}",indexAtMax));
Console.WriteLine(String.Format("Value of Max Probability: {0}",probabilities[indexAtMax]));

//Print which Category name matches with the image
Console.WriteLine(String.Format("Category of image : {0}",categories[indexAtMax]));

The response indicates that, for the given image, the highest probabilistic match (~16.9%) to one of our known classes/categories is at index 7 of the list. We inspect our list of known classes/categories at the same index to determine the name, which returns “bathtub”. We have a successful match!

Index of Max Probability: 7
Value of Max Probability: 0.16936515271663666
Category of image : bathtub

Clean up the used resources

In order to avoid continuing charges, delete the deployed endpoint and stop the Jupyter Notebook.
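The endpoint, endpoint configuration, and model created above can also be removed programmatically with the corresponding Delete APIs; a minimal sketch:

```csharp
// Delete the inference endpoint (this is the resource that accrues hourly charges)
await smClient.DeleteEndpointAsync(new DeleteEndpointRequest(){ EndpointName = epName });

// Delete the endpoint configuration and the model
await smClient.DeleteEndpointConfigAsync(new DeleteEndpointConfigRequest(){ EndpointConfigName = epConfName });
await smClient.DeleteModelAsync(new DeleteModelRequest(){ ModelName = modelName });
```

The Jupyter Notebook instance itself can then be stopped from the Notebook instances page in the console.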

Conclusion

If you are a C# .NET developer who was previously overwhelmed by the prospect of getting started with machine learning on AWS, following the guidance in this post will get you up and running quickly. The full Jupyter Notebook for this example can be found in the GitHub repo.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Haider Abdullah

Haider Abdullah is a Partner Solutions Architect with AWS. He provides architectural guidance to help partners and their customers achieve success in the cloud. In his spare time, he enjoys flying and staying out of the clouds!