Amazon SageMaker Experiments – Organize, Track And Compare Your Machine Learning Trainings

Today, we’re extremely happy to announce Amazon SageMaker Experiments, a new capability of Amazon SageMaker that lets you organize, track, compare and evaluate machine learning (ML) experiments and model versions.

ML is a highly iterative process. During the course of a single project, data scientists and ML engineers routinely train thousands of different models in search of maximum accuracy. Indeed, the number of combinations for algorithms, data sets, and training parameters (aka hyperparameters) is infinite… and therein lies the proverbial challenge of finding a needle in a haystack.

Tools like Automatic Model Tuning and Amazon SageMaker Autopilot help ML practitioners explore a large number of combinations automatically, and quickly zoom in on high-performance models. However, they further add to the explosive growth of training jobs. Over time, this creates a new difficulty for ML teams, as it becomes near-impossible to efficiently deal with hundreds of thousands of jobs: keeping track of metrics, grouping jobs by experiment, comparing jobs in the same experiment or across experiments, querying past jobs, etc.

Of course, this can be solved by building, managing and scaling bespoke tools. However, doing so diverts valuable time and resources away from actual ML work. In the spirit of helping customers focus on ML and nothing else, we couldn’t leave this problem unsolved.

Introducing Amazon SageMaker Experiments
First, let’s define core concepts:

A trial is a collection of training steps involved in a single training job. Training steps typically include preprocessing, training, model evaluation, etc. A trial is also enriched with metadata for inputs (e.g. algorithm, parameters, data sets) and outputs (e.g. models, checkpoints, metrics).
An experiment is simply a collection of trials, i.e. a group of related training jobs.
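
To make the nesting concrete, here is a minimal sketch in plain Python of how these concepts relate to each other. The class names (StepRecord, TrialRecord, ExperimentRecord) and their fields are assumptions made up for illustration. They are not the classes of the SageMaker Experiments SDK, which come later in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepRecord:
    """One step of a trial, e.g. preprocessing or training (hypothetical)."""
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class TrialRecord:
    """A trial: the steps involved in a single training job (hypothetical)."""
    name: str
    steps: List[StepRecord] = field(default_factory=list)


@dataclass
class ExperimentRecord:
    """An experiment: a group of related trials (hypothetical)."""
    name: str
    trials: List[TrialRecord] = field(default_factory=list)


# Build one experiment holding one trial made of two steps.
experiment = ExperimentRecord(name="demo-experiment")
trial = TrialRecord(name="demo-trial")
trial.steps.append(StepRecord(name="Preprocessing", parameters={"normalization_mean": 0.1307}))
trial.steps.append(StepRecord(name="Training", parameters={"hidden_channels": 10}))
experiment.trials.append(trial)

step_names = [step.name for step in experiment.trials[0].steps]
print(step_names)
```

The point of the sketch is only the containment: an experiment holds trials, and each trial holds the steps and metadata of one training job.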

The goal of SageMaker Experiments is to make it as simple as possible to create experiments, populate them with trials, and run analytics across trials and experiments. For this purpose, we introduce a new Python SDK containing logging and analytics APIs.

When you run your training jobs on SageMaker or SageMaker Autopilot, all you have to do is pass an extra parameter to the Estimator, defining the name of the experiment that this trial should be attached to. All inputs and outputs will be logged automatically.

Once you’ve run your training jobs, the SageMaker Experiments SDK lets you load experiment and trial data in the popular pandas dataframe format. Pandas truly is the Swiss army knife of ML practitioners, and you’ll be able to perform any analysis that you may need. Go one step further by building cool visualizations with matplotlib, and you’ll be well on your way to taming that wild horde of training jobs!

As you would expect, SageMaker Experiments is nicely integrated in Amazon SageMaker Studio. You can run complex queries to quickly find the past trial you’re looking for. You can also visualize real-time model leaderboards and metric charts.

How about a quick demo?

Logging Training Information With Amazon SageMaker Experiments
Let’s start from a PyTorch script classifying images from the MNIST data set, using a simple two-layer convolutional neural network (CNN). If I wanted to run a single job on SageMaker, I could use the PyTorch estimator like so:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point='mnist.py',
    role=role,
    sagemaker_session=sess,
    framework_version='1.1.0',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge')

estimator.fit(inputs={'training': inputs})

Instead, let’s say that I want to run multiple versions of the same script, changing only one of the hyperparameters (the number of convolution filters used by the two convolution layers, aka number of hidden channels) to measure its impact on model accuracy. Of course, we could run these jobs, grab the training logs, extract metrics with fancy text filtering, etc. Or we could use SageMaker Experiments!

All I need to do is:

Set up an experiment,
Use a tracker to log experiment metadata,
Create a trial for each training job I want to run,
Run each training job, passing parameters for the experiment name and the trial name.

First things first, let’s take care of the experiment.

from smexperiments.experiment import Experiment
mnist_experiment = Experiment.create(
    experiment_name="mnist-hand-written-digits-classification", 
    description="Classification of mnist hand-written digits", 
    sagemaker_boto_client=sm)

Then, let’s add a few things that we want to keep track of, like the location of the data set and normalization values we applied to it.

from smexperiments.tracker import Tracker
with Tracker.create(display_name="Preprocessing", sagemaker_boto_client=sm) as tracker:
    tracker.log_input(name="mnist-dataset", media_type="s3/uri", value=inputs)
    tracker.log_parameters({
        "normalization_mean": 0.1307,
        "normalization_std": 0.3081,
    })

Now let’s run a few jobs. I simply loop over the different values that I want to try, creating a new trial for each training job and adding the tracker information to it.

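import time
from smexperiments.trial import Trial

for num_hidden_channels in [2, 5, 10, 20, 32]:
    # One trial per training job, with a timestamp to keep names unique
    trial_name = f"cnn-training-job-{num_hidden_channels}-hidden-channels-{int(time.time())}"
    cnn_trial = Trial.create(
        trial_name=trial_name,
        experiment_name=mnist_experiment.experiment_name,
        sagemaker_boto_client=sm,
    )
    # Attach the preprocessing information logged by the tracker
    cnn_trial.add_trial_component(tracker.trial_component)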

Then, I configure the estimator, passing the value for the hyperparameter I’m interested in, and leaving the other ones as is. I’m also passing regular expressions to extract metrics from the training log. All of these will be stored in the trial: in fact, all parameters (passed or default) will be.

    estimator = PyTorch(
        entry_point='mnist.py',
        role=role,
        sagemaker_session=sess,
        framework_version='1.1.0',
        train_instance_count=1,
        train_instance_type='ml.p3.2xlarge',
        hyperparameters={
            'hidden_channels': num_hidden_channels
        },
        metric_definitions=[
            {'Name':'train:loss', 'Regex':'Train Loss: (.*?);'},
            {'Name':'test:loss', 'Regex':'Test Average loss: (.*?),'},
            {'Name':'test:accuracy', 'Regex':'Test Accuracy: (.*?)%;'}
        ]
    )
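
To see what these regular expressions capture, they can be tried locally with Python's re module. This is only a rough sketch: the two log lines are made up for illustration (the real lines depend on what mnist.py prints), and the extract_metrics helper is a hypothetical stand-in, not how SageMaker implements metric extraction.

```python
import re

# Same metric definitions as passed to the estimator
metric_definitions = [
    {'Name': 'train:loss', 'Regex': 'Train Loss: (.*?);'},
    {'Name': 'test:loss', 'Regex': 'Test Average loss: (.*?),'},
    {'Name': 'test:accuracy', 'Regex': 'Test Accuracy: (.*?)%;'},
]

# Hypothetical log lines, invented for this example
sample_log = [
    "Train Loss: 0.2531;",
    "Test Average loss: 0.0512, Test Accuracy: 98.40%;",
]


def extract_metrics(lines, definitions):
    """Return (metric name, value) pairs for every regex that matches a line."""
    found = []
    for line in lines:
        for definition in definitions:
            match = re.search(definition['Regex'], line)
            if match:
                # The first capture group holds the metric value
                found.append((definition['Name'], float(match.group(1))))
    return found


extracted = extract_metrics(sample_log, metric_definitions)
for name, value in extracted:
    print(name, value)
```

Each regex has a single non-greedy capture group that stops at the delimiter following the number (';', ',' or '%;'), so the script's log format has to include those delimiters for the metrics to be picked up.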

Finally, I run the training job, associating it to the experiment and the trial.

    cnn_training_job_name = "cnn-training-job-{}".format(int(time.time()))
    
    estimator.fit(
        inputs={'training': inputs}, 
        job_name=cnn_training_job_name,
        experiment_config={
            "ExperimentName": mnist_experiment.experiment_name, 
            "TrialName": cnn_trial.trial_name,
            "TrialComponentDisplayName": "Training",
        }
    )
# end of loop

Once all jobs are complete, I can run analytics. Let’s find out how we did.

Analytics with Amazon SageMaker Experiments
All information on an experiment can be easily exported to a Pandas DataFrame.

from sagemaker.analytics import ExperimentAnalytics
trial_component_analytics = ExperimentAnalytics(
    sagemaker_session=sess, 
    experiment_name=mnist_experiment.experiment_name
)
analytic_table = trial_component_analytics.dataframe()

If I want to drill down, I can specify additional parameters, e.g.:

trial_component_analytics = ExperimentAnalytics(
    sagemaker_session=sess, 
    experiment_name=mnist_experiment.experiment_name,
    sort_by="metrics.test:accuracy.max",
    sort_order="Descending",
    metric_names=['test:accuracy'],
    parameter_names=['hidden_channels', 'epochs', 'dropout', 'optimizer']
)
analytic_table = trial_component_analytics.dataframe()

This builds a DataFrame where trials are sorted by decreasing test accuracy, and which shows only some of the hyperparameters for each trial.

for col in analytic_table.columns: 
    print(col) 

TrialComponentName
DisplayName
SourceArn
dropout
epochs
hidden_channels
optimizer
test:accuracy - Min
test:accuracy - Max
test:accuracy - Avg
test:accuracy - StdDev
test:accuracy - Last
test:accuracy - Count

From here on, your imagination is the limit. Pandas is the Swiss army knife of data analysis, and you’ll be able to compare trials and experiments in every possible way.
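
As a small illustration of that kind of comparison, the sketch below ranks trial components by their best test accuracy. It assumes a DataFrame with the column names printed above. The rows are placeholder values invented for the example, not results from the jobs in this post. With real data, analytic_table would come from ExperimentAnalytics instead.

```python
import pandas as pd

# Placeholder rows, made up for illustration only (not real results)
analytic_table = pd.DataFrame({
    "TrialComponentName": ["job-a", "job-b", "job-c"],
    "hidden_channels": [2.0, 10.0, 32.0],
    "test:accuracy - Max": [90.0, 95.0, 97.0],
    "test:accuracy - Last": [89.0, 95.0, 96.0],
})

# Keep the columns of interest and rank by best test accuracy
summary = (
    analytic_table[["TrialComponentName", "hidden_channels", "test:accuracy - Max"]]
    .sort_values("test:accuracy - Max", ascending=False)
    .reset_index(drop=True)
)

best = summary.iloc[0]
print(summary)
print("Best trial component:", best["TrialComponentName"])
```

The same pattern extends to grouping by a hyperparameter or plotting accuracy against hidden_channels.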

Last but not least, thanks to the integration with Amazon SageMaker Studio, you’ll be able to visualize all this information in real-time with predefined widgets. To learn more about Amazon SageMaker Studio, visit this blog post.

Now Available!
I just scratched the surface of what you can do with Amazon SageMaker Experiments, and I believe it will help you tame the wild horde of jobs that you have to deal with every day.

The service is available today in all commercial AWS Regions where Amazon SageMaker is available.

Give it a try and please send us feedback, either in the AWS forum for Amazon SageMaker, or through your usual AWS contacts.

- Julien
