AWS Machine Learning Blog

Building, training, and deploying fastai models with Amazon SageMaker

April 2023: Please refer to the fastai course material for updated content

Deep learning is changing the world. However, much of the foundation work, such as building containers, can slow you down. This post describes how you can build, train, and deploy fastai models into Amazon SageMaker training and hosting by using the Amazon SageMaker Python SDK and a PyTorch base image. This helps you avoid the extra steps of building your own container.

Amazon SageMaker is a fully managed machine learning (ML) service. It allows data scientists and developers to quickly build, train, and deploy ML models into production at a low cost. The Amazon SageMaker Python SDK is an open source library for training and hosting ML models. It is easy to use and compatible with popular deep learning frameworks such as TensorFlow, MXNet, PyTorch, and Chainer. AWS recently added the fastai library to the base PyTorch container, so you can take advantage of the fastai deep learning library in Amazon SageMaker without providing your own container.

Using modern best practices, the fastai library helps you create advanced deep learning models with just a few lines of code, in domains such as computer vision, natural language processing, structured data, and collaborative filtering.

The organization fast.ai develops and maintains the fastai library, which builds on the popular open source deep learning package PyTorch. The organization recently placed well in the DAWNBench competition. It also provides popular online courses that train developers to use its models, even developers with no background or experience in ML.

Set up the environment

To set up a new Amazon SageMaker notebook instance with the fastai library installed, choose Launch Stack.

This AWS CloudFormation template provisions all the AWS resources that you need for this walkthrough. To create the stack, select I acknowledge that AWS CloudFormation might create IAM resources and choose Create stack.

When the stack has been created successfully, AWS CloudFormation reports the stack’s CREATE_COMPLETE state.

In the Amazon SageMaker console, choose Notebook instances.

You can see a notebook instance labeled fastai, created from the AWS CloudFormation stack. Choose Open Jupyter to launch a managed Jupyter environment. You can write your own Python code here to build the model.

For this post, we provide the IPython notebook under SageMaker Examples, Advanced Functionality, fastai_lesson1_sagemaker_example.ipynb. Create a copy of this notebook to build, train, and deploy the model into Amazon SageMaker.

For demonstration purposes, we use the Oxford-IIIT Pet dataset to identify dog breeds.
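
In fastai v1, the dataset can be downloaded with the library’s built-in helper. Here is a minimal sketch that mirrors lesson 1 of the fastai course; the variable names are illustrative:

from fastai.vision import untar_data, URLs

# Download and extract the Oxford-IIIT Pet dataset to the local cache
path = untar_data(URLs.PETS)
path_img = path/'images'    # each file name encodes the pet's breed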

Because the fastai library builds on PyTorch, you can use the Amazon SageMaker Python SDK to train your PyTorch model. Training a PyTorch model is a two-step process:

  1. Create a PyTorch training script.
  2. Create a training job in Amazon SageMaker.

Create a PyTorch training script

Define a _train() function where you write your training code. A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to model_dir to host later. For a more detailed example, see the pets.py file in the fastai_oxford_pets GitHub repo.

Here is the basic training code:

print('Creating DataBunch object')
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(),
                                   size=args.image_size,
                                   bs=args.batch_size).normalize(imagenet_stats)

# create the CNN model
print('Create CNN model from model zoo')
print(f'Model architecture is {args.model_arch}')
arch = getattr(models, args.model_arch)
print("Creating pretrained conv net")
learn = create_cnn(data, arch, metrics=error_rate)
print('Fit for 4 cycles')
learn.fit_one_cycle(4)
learn.unfreeze()
print('Unfreeze and fit for another 2 cycles')
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
print('Finished Training')
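
The snippet above omits the save step that the preceding paragraph mentions. Here is a minimal sketch, assuming args.model_dir is parsed from the SM_MODEL_DIR environment variable (see the argparse section below) and that the fastai version in the container supports Learner.export():

from pathlib import Path

print('Exporting the trained learner')
learn.path = Path(args.model_dir)   # export() writes to learn.path
learn.export()                      # creates model_dir/export.pkl for hosting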

In this example, combine the training and inference code in a single script. Make sure to put your training code in a main guard (if __name__ == '__main__':) so that Amazon SageMaker doesn’t inadvertently run it at the wrong point in execution. You can also pass hyperparameters to your training script, which you can retrieve with an argparse.ArgumentParser. The following code snippet shows how:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('--workers', type=int, default=2, metavar='W',
                        help='number of data loading workers (default: 2)')
    parser.add_argument('--epochs', type=int, default=2, metavar='E',
                        help='number of total epochs to run (default: 2)')
    parser.add_argument('--batch_size', type=int, default=64, metavar='BS',
                        help='batch size (default: 64)')
    parser.add_argument('--lr', type=float, default=0.001, metavar='LR',
                        help='initial learning rate (default: 0.001)')

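The SageMaker PyTorch container also exposes useful paths through environment variables. Here is a hedged sketch of reading them as argparse defaults, assuming the training channel is named training (the default when you pass a single S3 location to fit()) and that os is imported at the top of the script:

    parser.add_argument('--model_dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', 'model'),
                        help='directory where the model artifact is saved')
    parser.add_argument('--data_dir', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING', 'data'),
                        help='directory containing the training data channel')

    args = parser.parse_args()
    _train(args)
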
Create a training job in Amazon SageMaker

You can train and host PyTorch models on Amazon SageMaker with the help of the PyTorch base image and PyTorch SageMaker Estimators, which handle end-to-end training and hosting.

The entry_point argument takes the training script that contains the _train() function. The train_instance_type argument takes the instance type on which the model trains. If you set this parameter to 'local', the model trains on the Amazon SageMaker notebook instance in a Docker container.

To use local mode training, install Docker Compose, as well as NVIDIA-Docker if training with a GPU. To train the model on Amazon SageMaker, pass the value of a GPU-optimized instance type, such as ml.p3.2xlarge or ml.p2.xlarge.
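
As a hedged example, the estimator shown below runs in local mode when you swap the instance type (use 'local_gpu' with nvidia-docker on a GPU notebook instance):

local_estimator = PyTorch(entry_point='source/pets.py',
                          role=role,
                          framework_version='1.0.0',
                          train_instance_count=1,
                          train_instance_type='local')  # trains in Docker on the notebook
local_estimator.fit(data_location)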

The training instance runs only as long as your training job. When training completes, Amazon SageMaker uploads the model artifact to Amazon S3 and terminates the instance.

The training script contains the definition of the model and the serving functions required to reconstruct the model at the endpoint. It doesn’t matter when the code was written or when the model was created, as long as the model and the code use matching versions.

pets_estimator = PyTorch(entry_point='source/pets.py',
                         base_job_name='fastai-pets',
                         role=role,
                         framework_version='1.0.0',
                         train_instance_count=1,
                         train_instance_type='ml.p3.2xlarge')

pets_estimator.fit(data_location)
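
The hyperparameters you defined with argparse can be set per training job through the estimator’s hyperparameters argument; a hedged example:

pets_estimator = PyTorch(entry_point='source/pets.py',
                         base_job_name='fastai-pets',
                         role=role,
                         framework_version='1.0.0',
                         train_instance_count=1,
                         train_instance_type='ml.p3.2xlarge',
                         hyperparameters={'epochs': 2,
                                          'batch_size': 64,
                                          'lr': 0.001})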

Deploy the model in Amazon SageMaker

Using the Amazon SageMaker Python SDK, create a PyTorchModel object from the PyTorch estimator. The deploy() method on the model object creates an Amazon SageMaker endpoint, which serves prediction requests in real time.

The Amazon SageMaker endpoint runs an Amazon SageMaker-provided PyTorch model server. It hosts the model that your training script produces after you call fit. This was the model you saved to model_dir.

pets_model = PyTorchModel(model_data=pets_estimator.model_data,
                          name=pets_estimator._current_job_name,
                          role=role,
                          framework_version=pets_estimator.framework_version,
                          entry_point=pets_estimator.entry_point,
                          predictor_cls=ImagePredictor)

pets_predictor = pets_model.deploy(initial_instance_count=1,
                                   instance_type='ml.c5.large')
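
The ImagePredictor class referenced above is defined in the notebook. Here is a minimal sketch of what such a class can look like with the SageMaker Python SDK v1 (the exact implementation in the notebook may differ):

from sagemaker.predictor import RealTimePredictor, json_deserializer

class ImagePredictor(RealTimePredictor):
    def __init__(self, endpoint, sagemaker_session=None):
        super(ImagePredictor, self).__init__(
            endpoint, sagemaker_session=sagemaker_session,
            serializer=None,                 # send raw image bytes unchanged
            deserializer=json_deserializer,  # parse the JSON response body
            content_type='image/jpeg')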

Create the inference script

To run inference, define four functions in your inference script. In this example, we use the same script for training and inference, so you define these functions alongside the _train() training function:

  • model_fn(model_dir)—Loads the model into the Amazon SageMaker PyTorch model server.
  • input_fn(request_body,request_content_type)—Deserializes the data into an object for predictions.
  • predict_fn(input_object,model)—Performs inference on the deserialized data against the loaded model.
  • output_fn(prediction,content_type)—Serializes predictions according to the response content type.

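Here is a minimal, hedged sketch of the four functions, assuming the learner was exported with learn.export() as shown earlier and that requests carry raw JPEG bytes (the content types are illustrative choices, not mandated by SageMaker):

import io
import json

from fastai.vision import load_learner, open_image

def model_fn(model_dir):
    # Restore the exported Learner from model_dir/export.pkl
    return load_learner(model_dir)

def input_fn(request_body, request_content_type):
    if request_content_type == 'image/jpeg':
        # Turn raw JPEG bytes into a fastai Image
        return open_image(io.BytesIO(request_body))
    raise ValueError(f'Unsupported content type: {request_content_type}')

def predict_fn(input_object, model):
    # predict() returns the class, its index, and per-class probabilities
    pred_class, pred_idx, probs = model.predict(input_object)
    return {'class': str(pred_class), 'probabilities': probs.tolist()}

def output_fn(prediction, content_type):
    if content_type == 'application/json':
        return json.dumps(prediction)
    raise ValueError(f'Unsupported content type: {content_type}')
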
For a more detailed example, see the pets.py file in the fastai_oxford_pets GitHub repo.

After the model deploys, run inference using the predict() method. This method returns the result of inference against your model.

response = pets_predictor.predict(img_bytes)
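
Here, img_bytes holds the raw contents of a test image; a hedged way to produce it (the file name is illustrative):

with open('test_dog.jpg', 'rb') as f:
    img_bytes = f.read()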

Clean up

Make sure that you stop the notebook instance and delete the Amazon SageMaker endpoint to prevent any additional charges.
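
You can delete the endpoint directly from the notebook with the SDK’s predictor:

# Delete the hosted endpoint to stop incurring hosting charges
pets_predictor.delete_endpoint()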

Conclusion

In this post, we showed you how to use Amazon SageMaker to train and host fastai models using the PyTorch base image and the Amazon SageMaker Python SDK. This eliminates the need to bring your own container, which helps you build, train, and deploy a fastai model quickly and easily in Amazon SageMaker.

To get started, log in to the AWS Management Console and deploy the AWS CloudFormation template. If you are a new user, create an account first. Also, follow this GitHub link for detailed information on how to build the fastai model with Amazon SageMaker.


About the authors:

Amit Mukherjee is a partner solutions architect with AWS. He provides architectural guidance to help partners achieve success in the cloud. He has interest in the deep learning space. In his spare time, he enjoys spending quality time with his family.

Matt McClean is a partner solutions architect for AWS. He is a specialist in machine learning and works with technology partners in the EMEA Region, providing guidance on developing solutions using AWS technologies. In his spare time, he is a passionate skier and cyclist.