Posted On: May 28, 2021

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. Starting today, Autopilot performs cross validation on input datasets under 50,000 rows for all problem types: regression, binary classification, and multiclass classification. Cross validation makes model evaluation more robust to undesired splits between training and validation data, resulting in improved model quality. Depending on the dataset and problem type, you may see model quality improve by up to 35%.

Autopilot automatically splits your input data into training and validation sets. With this release, Autopilot also uses the k-fold cross validation method and performs inference with the ensemble of cross validation models from the trial with the best validation metric. Autopilot ensures that each training and validation fold has equal representation of each class to help improve precision and build the best model with the available data. You can view the final validation metric for each model in the output of your Autopilot experiment before choosing to deploy the model. Additionally, detailed cross validation updates, including the training and validation metrics from each fold, are available in Amazon CloudWatch.
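To illustrate the idea of k-fold cross validation with balanced class representation described above, here is a minimal Python sketch. It is not Autopilot's implementation, which this announcement does not describe. The function name `stratified_k_fold`, the choice of `k=5`, and the toy labels are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical helper for illustration: rows of each class are
    shuffled and dealt round-robin across the folds, so every fold
    keeps roughly the same class proportions as the full dataset.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's rows across the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves as the validation set exactly once.
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, validation


# Toy dataset (assumed): 20 "spam" rows and 80 "ham" rows.
labels = ["spam"] * 20 + ["ham"] * 80
splits = list(stratified_k_fold(labels, k=5))

for fold_number, (train, validation) in enumerate(splits, start=1):
    spam_count = sum(1 for idx in validation if labels[idx] == "spam")
    print(f"fold {fold_number}: {len(train)} train rows, "
          f"{len(validation)} validation rows, {spam_count} spam in validation")
```

In this sketch, a model would be trained once per fold and its validation metric recorded each time. Averaging those metrics gives an estimate that depends less on any single split, which is the robustness benefit the announcement refers to.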

Automatic cross validation is now available in all AWS Regions where SageMaker Autopilot is currently supported. To get started, review our documentation or access Amazon SageMaker Studio to create a new Autopilot experiment.