Posted On: Nov 12, 2021
Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. As a part of building models, SageMaker Autopilot automatically cleans, prepares and preprocesses data to optimize performance of machine learning models. Starting today, Autopilot generates several additional data insights that can help you improve the quality of data and thereby build higher quality models that better meet your business needs.
Some of the most important data insights now generated include prediction power, correlation between features, target column distribution, duplicate rows, anomalous rows, imbalanced class distribution, and cardinality of the target response for multi-class classification. These insights are presented in the Data exploration notebook generated by Autopilot and are available early, before the training process is underway. Wherever possible, these insights also include recommendations to fix any detected data quality issues before attempting to automatically preprocess and curate the data.
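To make a few of these insights concrete, here is a minimal Python sketch of what duplicate rows, target column distribution, target cardinality, and class imbalance measure on a small dataset. It is an illustration only, not Autopilot's implementation: the function name `basic_data_insights`, the sample rows, and the column names are hypothetical, and the imbalance ratio shown (largest class count divided by smallest) is one simple choice of metric assumed for this example.

```python
from collections import Counter


def basic_data_insights(rows, target):
    """Compute a few simple data-quality signals for a list of dict rows.

    Hypothetical helper for illustration; not part of SageMaker Autopilot.
    """
    if not rows:
        raise ValueError("rows must contain at least one record")

    # Duplicate rows: count every repeat of an identical (column, value) set.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicate_rows = sum(count - 1 for count in row_counts.values())

    # Target column distribution and cardinality (number of distinct labels).
    target_counts = Counter(row[target] for row in rows)
    cardinality = len(target_counts)

    # Imbalance ratio (assumed metric): most frequent class over least frequent.
    imbalance_ratio = max(target_counts.values()) / min(target_counts.values())

    return {
        "duplicate_rows": duplicate_rows,
        "target_distribution": dict(target_counts),
        "target_cardinality": cardinality,
        "imbalance_ratio": imbalance_ratio,
    }


# Made-up sample data with one exact duplicate and a skewed target column.
rows = [
    {"age": 34, "plan": "basic", "churned": "no"},
    {"age": 34, "plan": "basic", "churned": "no"},
    {"age": 51, "plan": "pro", "churned": "no"},
    {"age": 29, "plan": "basic", "churned": "yes"},
]
insights = basic_data_insights(rows, target="churned")
print(insights)
```

A high duplicate count or a large imbalance ratio in a check like this is the kind of signal that would prompt cleaning or rebalancing the data before training.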
The data insights and recommendations are now generated in all AWS regions where SageMaker Autopilot is currently supported. To learn more, see data insights. To get started with SageMaker Autopilot, see Getting Started or access Autopilot within SageMaker Studio.