Posted On: Sep 21, 2022
Today, we’re pleased to announce that Amazon SageMaker Autopilot has added a new training mode that supports model ensembling powered by AutoGluon. For moderately large datasets (under 100 MB), ensemble training mode builds highly accurate machine learning (ML) models quickly: up to 8x faster than the current hyperparameter optimization (HPO) training mode with 250 trials. Amazon SageMaker Autopilot automatically builds, trains, and tunes the best ML models based on your data, while allowing you to maintain full control and visibility. The current HPO mode searches over combinations of hyperparameter values to maximize the accuracy of a single model. However, when a single model cannot capture the complex characteristics of the data, combining (or ensembling) the predictions from diverse models can significantly improve overall model accuracy.
Ensemble training mode within Amazon SageMaker Autopilot uses AutoGluon to train several base models and combines their predictions using model stacking. It supports a wide range of algorithms, including LightGBM, CatBoost, XGBoost, Random Forest, Extra Trees, Linear Models, and Neural Networks based on PyTorch and FastAI. To evaluate the performance improvements of ensemble mode over HPO mode, we used multiple OpenML benchmark datasets of up to 100 MB. Based on the results, ensemble training jobs on smaller datasets (less than 1 MB) saw an overall SageMaker Autopilot job runtime improvement of up to 8 times compared to HPO mode with 250 trials (from an average of 120 minutes to 15 minutes) and an improvement of 5.8 times compared to HPO mode with 100 trials. Medium (1-10 MB) and large (10-100 MB) datasets saw runtime improvements of 5 times and 2.5 times, respectively, against HPO mode with 250 trials, with approximately 1.9% higher accuracy.
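To make the idea of model stacking concrete, here is a minimal sketch in plain Python. It is not AutoGluon's implementation, which is considerably more elaborate; it only illustrates the general pattern of training base models and then fitting a second-level combiner on their held-out predictions. The two base models (a mean predictor and a least-squares line), the single blending weight, and all function names are assumptions chosen for illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Base model A (illustrative): always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base model B (illustrative): ordinary least-squares line in one variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_stack(xs, ys, holdout_xs, holdout_ys):
    """Train both base models, then learn a blend weight on held-out data.

    The combiner predicts w * A(x) + (1 - w) * B(x), where w minimizes the
    squared error on the holdout set and is clipped to [0, 1].
    """
    base_a = fit_mean(xs, ys)
    base_b = fit_line(xs, ys)

    # Base-model predictions on the holdout set are the combiner's inputs.
    a = [base_a(x) for x in holdout_xs]
    b = [base_b(x) for x in holdout_xs]

    denom = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if denom == 0:
        w = 0.5  # base models agree everywhere; any weight gives the same blend
    else:
        num = sum((y - bi) * (ai - bi) for y, ai, bi in zip(holdout_ys, a, b))
        w = min(1.0, max(0.0, num / denom))

    return (lambda x: w * base_a(x) + (1 - w) * base_b(x)), w
```

On data that follows a straight line, the combiner should put essentially all of its weight on the linear base model, which is the behavior stacking is meant to provide: the second level learns which base models to trust.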
To get started, create a SageMaker Autopilot experiment in the SageMaker Studio console and select the ‘Ensembling’ training mode, or let SageMaker Autopilot infer the training mode automatically based on the dataset size. Refer to the CreateAutoMLJob API reference guide for updates to the API, and upgrade to the latest version of SageMaker Studio to use the new ensemble training mode. For more information on this feature, see the developer guide. To learn more about SageMaker Autopilot, visit the product page.
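If you prefer the API to the Studio console, the following sketch shows one way a CreateAutoMLJob request that selects ensemble training mode might be assembled with the AWS SDK for Python (boto3). The job name, S3 URIs, target column, and role ARN are placeholders, and the helper function names are hypothetical; check the CreateAutoMLJob API reference for the authoritative list of fields before relying on it.

```python
def build_ensembling_request(job_name, input_s3_uri, target_column,
                             output_s3_uri, role_arn):
    """Assemble keyword arguments for a CreateAutoMLJob call (hypothetical helper).

    Setting AutoMLJobConfig.Mode to "ENSEMBLING" requests ensemble training;
    "AUTO" would let Autopilot choose the mode based on dataset size.
    """
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": input_s3_uri,
                    }
                },
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLJobConfig": {"Mode": "ENSEMBLING"},
        "RoleArn": role_arn,
    }


def submit(request):
    """Send the request to SageMaker; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request can be built without the SDK

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)
```

Building the request separately from submitting it lets you inspect or log the payload before any job is started. A call would look like `submit(build_ensembling_request("my-job", "s3://my-bucket/train/", "label", "s3://my-bucket/output/", "arn:aws:iam::123456789012:role/MyRole"))`, where every argument is a placeholder value.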