Sifflet
Automated data monitoring has transformed visibility and now prevents silent failures in our lake
What is our primary use case?
We deployed Sifflet to solve a critical lack of visibility into the data health of a retail client's AWS-based data lake (S3, Glue, and Redshift). The implementation focused on Sifflet's ML-driven anomaly detection to monitor over 1,500 tables and 10 million hourly records. By integrating via AWS Marketplace, we moved from manual SQL validation to automated monitoring of metadata and query logs. This allowed us to detect silent failures, such as partial loads or subtle schema drift, that were previously invisible to the engineering team.
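For context, the manual SQL validation this replaced looked roughly like the sketch below; the table, columns, and thresholds are invented for illustration, not the client's real schema.

```python
# Hypothetical sketch of the manual Redshift checks this replaced; the table,
# columns, and thresholds are invented, not the client's real schema.
import psycopg2  # standard PostgreSQL driver, commonly used against Redshift

FRESHNESS_SQL = """
    SELECT MAX(loaded_at) < DATEADD(hour, -2, GETDATE()) AS is_stale
    FROM analytics.orders_fact;
"""
ROW_COUNT_SQL = """
    SELECT COUNT(*) FROM analytics.orders_fact
    WHERE loaded_at::date = CURRENT_DATE;
"""

conn = psycopg2.connect(host="redshift-host", dbname="analytics",
                        user="checker", password="...")
with conn, conn.cursor() as cur:
    cur.execute(FRESHNESS_SQL)
    (is_stale,) = cur.fetchone()
    cur.execute(ROW_COUNT_SQL)
    (todays_rows,) = cur.fetchone()
    # The thresholds lived in people's heads -- exactly the fragility that
    # automated monitoring removed.
    if is_stale or todays_rows < 100_000:
        print("Manual check failed: investigate the load for analytics.orders_fact")
```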
What is most valuable?
The end-to-end data lineage had the greatest impact for us. It provided an automated map correlating upstream AWS Glue jobs to downstream Redshift tables and Tableau reports. This was vital for instant root cause analysis: we could trace a dashboard error back to its exact point of failure in the pipeline in seconds rather than hours.
The standout feature Sifflet offers is definitely the full-stack data lineage. In a complex AWS environment like ours, it is not enough to know that a table is broken; you need to know where it broke and what it affects. Sifflet automatically maps the data flow from the ingestion layer in S3 and Glue, through the transformations in Redshift, all the way to the final BI dashboards. If a report is wrong, we can trace it back to the exact source or transformation step in seconds. This eliminated the hours previously spent on manual SQL debugging and gives the team full control over the data lifecycle.
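To make the root cause workflow concrete, here is a minimal sketch of an upstream walk over a lineage graph; the edges are invented for illustration, and this is not Sifflet's implementation.

```python
# Minimal illustration of an upstream lineage walk; all asset names are
# invented, and this is not Sifflet's actual implementation.
from collections import deque

# Edges map each asset to the assets it is built from.
UPSTREAM = {
    "tableau:sales_dashboard": ["redshift:analytics.orders_fact"],
    "redshift:analytics.orders_fact": ["glue:job_orders_transform"],
    "glue:job_orders_transform": ["s3://lake/raw/orders/"],
}

def trace_upstream(asset: str) -> list[str]:
    """Breadth-first walk from a failing asset back to its root sources."""
    seen, order, queue = set(), [], deque([asset])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(UPSTREAM.get(node, []))
    return order

# A broken dashboard traces back to its Glue job and S3 source in one call.
print(trace_upstream("tableau:sales_dashboard"))
```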
How has it helped my organization?
Sifflet positively impacted my organization by establishing a certified data standard for business stakeholders, preventing many incidents, and improving data governance. Incident prevention is significant, as 80% of anomalies are now resolved before they impact executive reporting. Additionally, we achieved real-time visibility into data freshness and schema evolution across the entire lake. It is all automated now.
What needs improvement?
Sifflet could be improved in terms of its premium pricing: the high entry cost requires a clear ROI case built on the cost of bad data (a rough framing is sketched below). Additionally, alert tuning is an area for improvement, because the initial ML sensitivity requires expert calibration to prevent alert fatigue.
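To sketch what a "cost of bad data" ROI case can look like: every number below is a placeholder I made up, not actual Sifflet pricing or our client's figures.

```python
# Back-of-the-envelope "cost of bad data" ROI model; every number below is a
# placeholder, not actual Sifflet pricing or real client figures.
incidents_per_year = 60          # bad-data incidents before monitoring
hours_lost_per_incident = 16     # engineering time on manual forensics
hourly_cost = 110.0              # loaded engineer cost, USD
prevented_share = 0.8            # e.g. "80% resolved before impact"
annual_tool_cost = 60_000.0      # placeholder licence figure

savings = incidents_per_year * hours_lost_per_incident * hourly_cost * prevented_share
roi = (savings - annual_tool_cost) / annual_tool_cost
print(f"Estimated annual savings: ${savings:,.0f}; ROI: {roi:.0%}")
# Estimated annual savings: $84,480; ROI: 41%
```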
For how long have I used the solution?
I have been using Sifflet since 2023.
What other advice do I have?
Sifflet transformed our workflow from reactive to proactive. It eliminated the delay between a data failure and its detection, catching schema drift and volume anomalies at the ingestion layer (the sketch below illustrates the schema side). By surfacing these issues before they reached the business dashboards, we effectively eliminated data surprises and reduced manual forensic auditing by 50-60%.
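As a rough illustration of the schema half of that check, here is a standalone sketch against the AWS Glue Data Catalog using boto3; the database, table, and baseline file are invented, and Sifflet's own mechanism may differ.

```python
# Sketch of an ingestion-layer schema drift check against the Glue Data
# Catalog; database, table, and baseline file names are invented.
import json
import boto3

glue = boto3.client("glue", region_name="us-east-1")

def current_schema(database: str, table: str) -> dict:
    """Return {column_name: type} for a Glue catalog table."""
    cols = glue.get_table(DatabaseName=database, Name=table)["Table"]["StorageDescriptor"]["Columns"]
    return {c["Name"]: c["Type"] for c in cols}

def detect_drift(baseline_path: str, database: str, table: str) -> dict:
    """Diff today's schema against a stored baseline snapshot."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    now = current_schema(database, table)
    return {
        "added": sorted(set(now) - set(baseline)),
        "removed": sorted(set(baseline) - set(now)),
        "retyped": sorted(k for k in now.keys() & baseline.keys() if now[k] != baseline[k]),
    }

drift = detect_drift("orders_schema_baseline.json", "lake_raw", "orders")
if any(drift.values()):
    print("Schema drift detected:", drift)
```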
My main recommendation for anyone adopting Sifflet is to treat it as a strategic data trust investment rather than just a technical tool. To succeed, leverage the AWS Marketplace to bypass procurement delays and, most importantly, dedicate the first few weeks to fine-tuning alerts on your most critical datasets; this prevents alert fatigue and lets the machine learning models stabilize before you scale monitoring across your entire enterprise infrastructure. I would rate this product a nine out of ten overall.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
From traditional data quality to agile data observability
Ease of use + ease of integration + ease of monitor implementation.
- Raise alerts when data pipelines fail to execute successfully.
- Track data freshness and implement data quality rules (see the sketch after this list).
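Below is a minimal, tool-agnostic sketch of such a freshness rule, assuming each table carries a loaded_at timestamptz column in UTC; all connection details and names are invented.

```python
# Minimal, tool-agnostic freshness check of the kind Sifflet automates.
# Assumes each table has a loaded_at timestamptz column in UTC, so the driver
# returns timezone-aware datetimes; all names here are invented.
from datetime import datetime, timedelta, timezone

import psycopg2

FRESHNESS_WINDOWS = {                     # expected max staleness per table
    "analytics.orders_fact": timedelta(hours=1),
    "analytics.inventory_snapshot": timedelta(hours=24),
}

def stale_tables(conn) -> list[str]:
    """Return the tables whose latest load is older than their allowed window."""
    stale = []
    with conn.cursor() as cur:
        for table, window in FRESHNESS_WINDOWS.items():
            cur.execute(f"SELECT MAX(loaded_at) FROM {table};")
            (last_load,) = cur.fetchone()
            if last_load is None or datetime.now(timezone.utc) - last_load > window:
                stale.append(table)
    return stale

conn = psycopg2.connect(host="warehouse-host", dbname="analytics",
                        user="observer", password="...")
for table in stale_tables(conn):
    print(f"ALERT: {table} has not received fresh data within its window")
```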
A friendly user interface that could become even friendlier with some improvements
- Capacity to add multiple tag values on any DQ monitor rule, to facilitate filtering based on those tag values, asset, and severity values.
- Capacity to combine search bar criteria (status of the last DQ monitor runs plus predefined attributes such as severity and last run date) with free text for searching monitor names.
- Capacity to pin any DQ monitor or asset to get bookmark access from the Dashboard pane.
- Capacity to get, for each incident, the detailed list of compromised dashboards (Power BI reports in our case).
- Capacity to expand in one click all assets linked to an initially targeted asset, in order to get a full picture of upstream and downstream linked assets.
- Capacity to view, for each existing DQ monitor type (ReferentialIntegrity, DuplicatePercentage, etc.), the corresponding consolidated number of incidents on a targeted asset; ideally, the filter pane would allow refining the incident count per type of monitor run and per severity level.
- On the Incident module, the possibility to group into one incident multiple distinct DQ monitor alerts that concern the same asset on distinct columns but share one common dimension value (country, for instance), in order to mutualize all of these incidents into one unique ticket creation process and one root cause analysis to address to the asset owner.
- Possibility to put on hold, or in snooze mode, a recurring DQ monitor alert on the same asset and same grouping dimension value that repeats daily when the error threshold value is nearly identical from one day to the next.
It also provides a data cataloging module that brings semantic and business context to our existing data assets.
Useful in spotting problems and setting multiple monitors
So far, after a few days of usage, I have spotted a few problems (for instance, an invalid regex) that were previously under the radar.
- Data quality (ensuring data conformity and compliance with business rules)
- Data observability (making sure we process a consistent volume of data daily for our import/export flows)
Benefits (so far):
- Spotting data problems (a high number of null values, a low volume of processed/ingested data); a sketch of both checks follows below.
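To illustrate the two checks named above, here is a toy sketch of a null-rate threshold and a z-score volume test; the data and thresholds are invented, and Sifflet performs these checks automatically.

```python
# Toy versions of a null-rate check and a daily-volume anomaly check; the
# data and thresholds are invented, and the tool runs these automatically.
import statistics

def null_rate_breaches(rows: list[dict], columns: list[str], max_rate: float = 0.05) -> dict:
    """Return the columns whose share of null values exceeds max_rate."""
    total = len(rows)
    rates = {c: sum(r[c] is None for r in rows) / total for c in columns}
    return {c: rate for c, rate in rates.items() if rate > max_rate}

def volume_is_anomalous(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count if it sits far outside the historical spread."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return stdev > 0 and abs(today - mean) / stdev > z_threshold

rows = [{"sku": "A1", "price": 9.9}, {"sku": "A2", "price": None}]
print(null_rate_breaches(rows, ["sku", "price"], max_rate=0.3))   # {'price': 0.5}
print(volume_is_anomalous([1000, 1040, 980, 1010], today=250))    # True
```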
Using Sifflet as an Analytics Engineer after several months
There are good integrations to work with different data stacks.
The tool is responsive.
There is a wide range of configuration options for monitors and for the data quality checks we want to keep track of.
The company also iterates quite rapidly to implement or improve features.
I do like the Slack integration; however, I'm waiting for big improvements in how the alerts are sent (templating and message shape, to get denser, lighter messages) and hoping for auto-resolve. The sketch below shows the kind of compact message I have in mind.
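Purely illustrative: one compact line per incident, posted through a standard Slack incoming webhook. The webhook URL and all field values are placeholders, and this is not Sifflet's alerting code.

```python
# Purely illustrative compact alert via a standard Slack incoming webhook.
# The webhook URL and field values are placeholders; not Sifflet's code.
import json
from urllib.request import Request, urlopen

WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def post_compact_alert(monitor: str, table: str, status: str, detail: str) -> None:
    text = f":rotating_light: *{monitor}* on `{table}` is {status} ({detail})"
    req = Request(
        WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urlopen(req)  # Slack replies with the plain body "ok" on success

# Example (disabled because the webhook URL above is fake):
# post_compact_alert("freshness", "analytics.orders_fact", "failing", "last load 3h ago")
```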
Configuration of these monitors is relatively straightforward, through an interface that non-developers can understand.
The results are interactive and graphically explicit enough to give a better picture of the problem encountered, and can be easily shared with other users.
Intuitive data observability platform suitable across a business
- Very quick to implement new requested features
- Works seamlessly with a variety of different technologies
- Responsive support via Slack, with issues remedied quickly
Very helpful features and proactive team
For now, we are very satisfied with the "monitors" feature of Sifflet: it is easy to use, the integration of all our tables was very fast, and implementing or modifying monitors is quite clear (and it keeps getting clearer with the regular updates of the tool).
We will soon start using the other available features, such as the Data Catalog and Business Glossary, which also seem well thought out and easily actionable.
Allow teams to track their key indicators thanks to Slack alerts (already partly implemented)
Allow everyone in the company to access a clean Data Catalog (future)
Very useful tool for Data Management
From an engineer's perspective, issues are identified sooner; we can proactively catch them and avoid awkward conversations with stakeholders.
Customer support is another massive benefit of working with this company: they offer a great individualized approach and quick responses, and they are always looking to improve their product based on feedback. It's refreshing when so many companies do the exact opposite.