AWS Glue Data Quality now supports preprocessing queries

Posted on: Nov 25, 2025

Today, AWS announces the general availability of preprocessing queries for AWS Glue Data Quality, enabling you to transform your data before running data quality checks through AWS Glue Data Catalog APIs. This feature allows you to create derived columns, filter data based on specific conditions, perform calculations, and validate relationships between columns directly within your data quality evaluation process.

Preprocessing queries provide greater flexibility for complex data quality scenarios that require transforming data before validation. You can create derived metrics, such as total fees calculated from tax and shipping columns; limit the number of columns considered for data quality recommendations; or filter datasets to focus quality checks on specific subsets of data. This removes the need for separate preprocessing steps and streamlines your data quality workflows.
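To make the "transform, then validate" idea concrete, here is a minimal local sketch in Python. It is not the AWS Glue implementation: Glue runs preprocessing queries against your Data Catalog table, while this sketch uses the standard-library sqlite3 module as a stand-in SQL engine. The orders table, its columns (status, tax, shipping, fee_total), and the check_fee_consistency rule are all hypothetical. The query derives a total_fee column from tax and shipping and filters to shipped orders, and the check then validates a relationship between the derived column and a recorded one.

```python
import sqlite3

# Hypothetical sample rows: (order_id, status, tax, shipping, fee_total)
ROWS = [
    (1, "SHIPPED", 2.50, 5.00, 7.50),
    (2, "SHIPPED", 1.00, 4.00, 6.00),    # fee_total does not match tax + shipping
    (3, "CANCELLED", 0.00, 0.00, 9.99),  # excluded by the WHERE filter
]

# Preprocessing step: derive a column and keep only the subset to validate.
PREPROCESSING_QUERY = """
SELECT order_id,
       tax + shipping AS total_fee,
       fee_total
FROM orders
WHERE status = 'SHIPPED'
ORDER BY order_id
"""


def preprocess(rows, query):
    """Load rows into an in-memory table and return the query result as dicts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders ("
            "order_id INTEGER, status TEXT, tax REAL, shipping REAL, fee_total REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)
        cur = conn.execute(query)
        columns = [desc[0] for desc in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        conn.close()


def check_fee_consistency(records, tolerance=1e-9):
    """Return order_ids whose recorded fee_total differs from the derived total_fee."""
    return [
        r["order_id"]
        for r in records
        if abs(r["total_fee"] - r["fee_total"]) > tolerance
    ]


prepared = preprocess(ROWS, PREPROCESSING_QUERY)
failures = check_fee_consistency(prepared)
print(f"{len(prepared)} rows evaluated, failing order_ids: {failures}")
```

Only the two shipped orders reach the check, and order 2 fails because its recorded fee_total (6.00) does not equal tax plus shipping (5.00). Keeping the filtering and derivation in the query means the rule itself stays a simple column comparison.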

AWS Glue Data Quality preprocessing queries are available through two AWS Glue Data Catalog APIs, start-data-quality-rule-recommendation-run and start-data-quality-ruleset-evaluation-run, in all commercial AWS Regions where AWS Glue Data Quality is available. To learn more about preprocessing queries, see the AWS Glue Data Quality documentation.
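The sketch below shows how a preprocessing query might be attached to requests for these two operations. It is a hedged sketch, not a verified request format. It only assembles plain dictionaries and does not call AWS. The dictionaries are intended for the boto3 Glue client methods start_data_quality_ruleset_evaluation_run and start_data_quality_rule_recommendation_run, which correspond to the CLI commands above. The nested DataQualityGlueTable / PreProcessingQuery field names are assumptions that should be checked against the current AWS Glue API reference. The way the SQL refers to the table is also an assumption. The database, table, role ARN, and ruleset names are hypothetical.

```python
import json


def build_data_source(database: str, table: str, preprocessing_query: str) -> dict:
    """Build the DataSource portion shared by both requests.

    ASSUMPTION: the query is passed in a DataQualityGlueTable entry with a
    PreProcessingQuery field. Verify against the AWS Glue API reference.
    """
    return {
        "DataQualityGlueTable": {  # assumed structure name
            "DatabaseName": database,
            "TableName": table,
            "PreProcessingQuery": preprocessing_query,  # assumed field name
        }
    }


def build_evaluation_request(database, table, role_arn, ruleset_names, preprocessing_query):
    """Parameters for a ruleset evaluation run (start-data-quality-ruleset-evaluation-run)."""
    return {
        "DataSource": build_data_source(database, table, preprocessing_query),
        "Role": role_arn,
        "RulesetNames": list(ruleset_names),
    }


def build_recommendation_request(database, table, role_arn, preprocessing_query):
    """Parameters for a rule recommendation run (start-data-quality-rule-recommendation-run)."""
    return {
        "DataSource": build_data_source(database, table, preprocessing_query),
        "Role": role_arn,
    }


# Hypothetical names; the query derives a column and filters rows before validation.
QUERY = (
    "SELECT order_id, tax + shipping AS total_fee "
    "FROM orders WHERE status = 'SHIPPED'"
)
ROLE = "arn:aws:iam::123456789012:role/ExampleGlueDataQualityRole"

request = build_evaluation_request("sales_db", "orders", ROLE, ["orders_fee_checks"], QUERY)
recommendation = build_recommendation_request("sales_db", "orders", ROLE, QUERY)
print(json.dumps(request, indent=2))
```

With AWS credentials configured and the field names confirmed, you would pass these dictionaries as keyword arguments, for example glue.start_data_quality_ruleset_evaluation_run(**request), where glue is a boto3 Glue client. Sharing build_data_source keeps the same preprocessing query consistent between recommendation and evaluation runs.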