I am not using Delpha Data Quality in my current project, but I used it for approximately 1.5 years in a previous project to detect anomalies and address data quality issues.
My main use case for Delpha Data Quality was validating data that we fetched from multiple sources, transformed, and loaded into a Snowflake database. Before loading the data into the final application, we checked it for data quality issues such as null values, uniqueness, completeness, and correctness. We used the tool to detect these anomalies and quality problems in the source data.
In my previous project, we fetched data from multiple sources such as CSV files, Excel files, and SQL Server. First, we loaded the data into Snowflake staging tables, which contained raw data only. Before moving the data from the staging layer to the bronze layer, we used Delpha Data Quality to perform data quality checks. For example, in customer data where the customer ID serves as the primary key, we validated that every record had a valid, non-null customer ID. This is just one of the checks we ran across many columns in customer data and sales data. Delpha Data Quality performs these checks and generates a score, and we defined a threshold where a score greater than 90% indicates good data quality. If the score exceeded 90%, we moved the data from the staging area to the bronze layer. If the score was 90% or lower, we stopped the process and fixed the issues in either the staging or source data before proceeding.
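To illustrate the gate described above, here is a minimal Python sketch. It is not Delpha's API or its scoring formula. I am assuming that the score is simply the fraction of rows whose customer ID is present and not duplicated, and the names `customer_id_score`, `gate`, and `staging` are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.90  # from the review: a score above 90% counts as good quality


def customer_id_score(rows):
    """Fraction of rows whose customer_id is present and not duplicated.

    Assumption: this ratio stands in for the tool's score; the real
    formula is not described in the review.
    """
    if not rows:
        return 0.0
    ids = [row.get("customer_id") for row in rows]
    counts = Counter(i for i in ids if i is not None)
    valid = sum(1 for i in ids if i is not None and counts[i] == 1)
    return valid / len(rows)


def gate(rows, threshold=THRESHOLD):
    """Return ('promote' or 'halt', score) for a batch of staging rows."""
    score = customer_id_score(rows)
    decision = "promote" if score > threshold else "halt"
    return decision, score


# Hypothetical staging batch: 19 valid IDs and one null ID.
staging = [{"customer_id": i} for i in range(1, 20)] + [{"customer_id": None}]
decision, score = gate(staging)
print(decision, f"{score:.0%}")
```

With this batch the score is 19/20, so the data would be promoted to the bronze layer; a batch at or below the threshold returns "halt" so the issues can be fixed first.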
Setting up and interpreting the data scores in Delpha Data Quality is straightforward because of its built-in functionality. When we define a table for evaluation, we specify the tests to perform on each column, such as not-null or uniqueness checks. Delpha then automates the scoring and generates the final score from the results of those tests.
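The idea of declaring tests per column and getting one score back can be sketched in Python as follows. This is only an illustration under my own assumptions: the review does not say how Delpha combines test results, so I average the pass rate of every column and test pair. The `spec` format, the `table_score` function, and the `email` column are hypothetical.

```python
from collections import Counter


def not_null(values):
    """One boolean per value: True when the value is present."""
    return [v is not None for v in values]


def unique(values):
    """One boolean per value: True when a non-null value occurs once.

    Nulls pass here so that missing values are only counted by not_null.
    """
    counts = Counter(v for v in values if v is not None)
    return [v is None or counts[v] == 1 for v in values]


CHECKS = {"not_null": not_null, "unique": unique}


def table_score(rows, spec):
    """Average pass rate over every (column, test) pair named in spec."""
    rates = []
    for column, test_names in spec.items():
        values = [row.get(column) for row in rows]
        for name in test_names:
            results = CHECKS[name](values)
            rates.append(sum(results) / len(results) if results else 0.0)
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical test definition and sample customer rows.
spec = {"customer_id": ["not_null", "unique"], "email": ["not_null"]}
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "c@example.com"},
    {"customer_id": None, "email": "d@example.com"},
]
score = table_score(rows, spec)
print(f"table score: {score:.2f}")
```

Keeping the tests in a plain mapping mirrors the workflow described above: the person defining the table only lists which tests apply to which columns, and the scoring itself needs no further input.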