Posted On: Dec 14, 2022
Today, we are excited to announce the release of automatically generated feature-level visualizations in Amazon SageMaker Data Wrangler. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface. Data Wrangler offers a variety of configurable visualization options, from general data visualizations such as histograms, scatter plots, and table summaries to advanced visualizations such as anomaly detection and seasonal-trend decomposition for time series data, and data leakage and feature bias analyses for ML.
Starting today, SageMaker Data Wrangler automatically generates visualizations for each feature in the dataset. You’ll see these visualizations at the top of each column after your dataset is imported. This further cuts the undifferentiated heavy lifting for data scientists by surfacing insights about data distributions and data quality at the feature level.
With the automatically generated visualizations, you can immediately get insights related to data distributions and data types without writing a single line of code. The insights help you detect data quality issues, such as outliers and missing or invalid values, for each column in the dataset. You can also hover over the visualizations to see detailed statistics such as count and percentage.
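For readers who want a sense of what count and percentage statistics look like for a single column, here is a minimal sketch in plain Python. It is not how Data Wrangler computes its visualizations. The `summarize_column` helper, the rule that `None` and empty strings count as missing, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def summarize_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Hypothetical helper: count and percentage of missing values, plus top values."""
    total = len(values)
    # Assumption: None and empty strings are treated as missing.
    present = [v for v in values if v is not None and v != ""]
    missing = total - len(present)
    return {
        "count": total,
        "missing_count": missing,
        "missing_pct": round(100.0 * missing / total, 1) if total else 0.0,
        # Up to three most frequent non-missing values with their counts.
        "top_values": Counter(present).most_common(3),
    }


# Hypothetical sample data, not taken from any real dataset.
dataset = {
    "age": [34, None, 29, 41, None],
    "city": ["Seattle", "Austin", "", "Seattle", "Boston"],
}

summary = {name: summarize_column(col) for name, col in dataset.items()}
for name, stats in summary.items():
    print(name, stats)
```

In Data Wrangler itself, none of this code is needed, since the equivalent statistics appear at the top of each column.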
This feature is generally available and automatically activated in all AWS Regions that Data Wrangler currently supports at no additional charge. To learn more, refer to the AWS News Blog and SageMaker Data Wrangler product documentation.