Posted On: Nov 12, 2021
Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. As part of building models, SageMaker Autopilot automatically cleans, prepares, and preprocesses data to optimize the performance of machine learning models. Starting today, Autopilot generates several additional data insights that can help you improve the quality of your data and thereby build higher-quality models that better meet your business needs.
Some of the most important data insights now generated include prediction power, correlation between features, target column distribution, duplicate rows, anomalous rows, imbalanced class distribution, and target cardinality for multi-class classification. These insights are presented in the Data exploration notebook generated by Autopilot and are available early on, before the training process is underway. Wherever possible, these insights also include recommendations to fix any detected data quality issues before the data is automatically preprocessed and curated.
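
To make a few of these insights concrete, here is a minimal Python sketch of how duplicate rows, target class distribution, and correlation between two features could be computed on a small table. It is an illustration only and not how Autopilot computes them: the function names (`duplicate_row_count`, `class_distribution`, `pearson`) and the toy data are hypothetical, and the choice of Pearson correlation is an assumption, since the announcement does not say which correlation measure is used.

```python
from collections import Counter
from math import sqrt


def duplicate_row_count(rows):
    """Count rows that exactly repeat an earlier row."""
    counts = Counter(tuple(row) for row in rows)
    return sum(n - 1 for n in counts.values())


def class_distribution(labels):
    """Return the fraction of rows that fall in each target class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy data invented for illustration: (feature_a, feature_b, target)
rows = [
    (1.0, 2.0, "no"),
    (2.0, 4.0, "no"),
    (3.0, 6.0, "no"),
    (4.0, 8.0, "yes"),
    (1.0, 2.0, "no"),  # exact duplicate of the first row
]

dup = duplicate_row_count(rows)
dist = class_distribution([row[2] for row in rows])
corr = pearson([row[0] for row in rows], [row[1] for row in rows])

print("duplicate rows:", dup)
print("target distribution:", dist)
print("correlation(feature_a, feature_b):", corr)
```

In this toy table, one row is repeated, the target is skewed toward a single class, and the two features are perfectly linearly related, which are the kinds of conditions the insights listed above are meant to surface before training.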