Amazon SageMaker Unified Studio now supports data quality rule authoring and evaluation
Amazon SageMaker Unified Studio now supports data quality rule authoring and evaluation, powered by AWS Glue Data Quality. Data engineers, analysts, and data scientists can define data quality rules, run ruleset evaluations, and view results directly within SageMaker Unified Studio for both data at rest in catalog tables and data in transit within Visual ETL jobs. This helps you catch data quality issues before bad data enters your data lakes or affects downstream analytics and machine learning workloads.
With this launch, you can author rules in the same Data Quality Definition Language (DQDL) used by AWS Glue Data Quality and run evaluations directly in SageMaker Unified Studio across two workflows. For data at rest, a dedicated Data Quality tab on catalog assets provides rule authoring, on-demand or scheduled evaluations, and detailed per-rule pass/fail results. For data in transit, you can add an Evaluate Data Quality transform to any Visual ETL job and review data quality results as part of the run details. You can create rulesets that check for completeness, uniqueness, freshness, accuracy, and other data quality dimensions.
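As a rough illustration of what a DQDL ruleset covering these dimensions might look like, the sketch below assembles one as a plain string in Python. The helper name build_ruleset, the column names (order_id, order_date, amount), and the thresholds are all hypothetical and chosen only for this example; the rule types shown (IsComplete, IsUnique, DataFreshness, ColumnValues) are DQDL rule types from AWS Glue Data Quality. Nothing here calls an AWS API.

```python
def build_ruleset(rules):
    """Join individual DQDL rule strings into one 'Rules = [...]' ruleset.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    if not rules:
        raise ValueError("a DQDL ruleset needs at least one rule")
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Assumed example columns; replace with columns from your own table.
rules = [
    'IsComplete "order_id"',                   # completeness
    'IsUnique "order_id"',                     # uniqueness
    'DataFreshness "order_date" <= 24 hours',  # freshness
    'ColumnValues "amount" > 0',               # accuracy-style value check
]

ruleset = build_ruleset(rules)
print(ruleset)
```

The resulting text is the kind of ruleset you would paste into a rule editor or attach to an Evaluate Data Quality transform; the exact authoring steps in SageMaker Unified Studio are described in the product documentation.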
This feature is available in all AWS Regions where Amazon SageMaker Unified Studio is available, in both AWS IAM Identity Center-based and IAM-based domains. To learn more, visit the Amazon SageMaker Unified Studio documentation.