AWS Big Data Blog

Calvin Wang

Author: Calvin Wang

Calvin is a Data Scientist at AWS AI/ML. He holds a B.S. in Computer Science from UC Santa Barbara and loves using machine learning to build cool stuff.

Let’s look at PyDeequ’s main components and how they relate to Deequ, as shown in the following diagram.

Testing data quality at scale with PyDeequ

April 2024: This post was reviewed for accuracy. Additionally, the output of PyDeequ can be integrated with Amazon DataZone. For more details, read "Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions." March 2023: You can now use AWS Glue Data Quality to measure and manage the quality of your data. […]