The data hub I've been looking for
Quilt allows my team at DSP Concepts to focus on solving customer problems instead of data versioning problems. It is well established by now that data quality is the foundation of serious, well-performing data teams, and organization is key to building and retaining the value of high-quality data over time. Quilt tackles this problem head-on, giving you a reliable single source of truth for your data along with a suite of features for easily inspecting the data and its documentation. This in turn lets each member of my team find data in a self-serve fashion, without relying on institutional knowledge that only one team member might have, if anyone does at all, depending on how long ago the data was collected, cleaned, processed and labelled.
A necessity for every data-driven company
Quilt is an indispensable tool for anyone who wants to manage their data in AWS properly. A key strength of Quilt is that its programmatic interface is intuitive and flexible, offering multiple ways to integrate it into a data analysis workflow (Python, R, command line). Because a handful of Quilt functions provide most of the core functionality, the learning curve to get started is not overwhelming, and many additional features improve usability (e.g., reading data directly into memory, installing a single file). Beyond the programmatic functionality, the Quilt web interface is extremely useful for browsing files and packages and for switching between versions. I would highly recommend integrating Quilt into your data science workflow.
The missing tool in the data science pipeline
Quilt simplified our data maintenance and versioning workflow. It is now extremely easy to track changes in a dataset and to refer reproducibly to a specific revision without worrying that someone will overwrite the data. We have already integrated it into our workflow, so dataset updates no longer interfere with model building.
The Quilt team provides us with ongoing support. Every piece of software has bugs, but when we found a small one, we received a fix almost immediately, so we could continue our work smoothly.
Some time ago we spotted a few drawbacks in Quilt Teams. Most of them are now resolved, and the remaining "wishes" are on the roadmap. It's really nice that the developers listen to our needs!
What we love most about Quilt is the caching feature. It has reduced our data transfer costs while keeping our scripts simple.
My overall grade is 5/5, since this tool was badly missing from our machine learning workflow. We now also use it to version models (especially since we generate models in several formats each time) and Jupyter notebooks (for which Git isn't the best option).