I have primarily used dbt for data transformation. My team and I work with source systems from various clients: some use Snowflake, others use SQL Server, and some run their own legacy systems. We extract and load all of that data into our data lake on Azure, and from there we use dbt to transform it. We build the bronze, silver, and gold layers of a medallion architecture with dbt.
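To illustrate the layering idea, here is a minimal sketch in Python with SQLite rather than dbt itself. It assumes the common convention that bronze holds raw loaded data, silver holds cleaned data, and gold holds reporting-ready aggregates. The table and column names (`bronze_orders`, `silver_orders`, `gold_revenue_by_client`) are hypothetical and not taken from the reviewer's project.

```python
import sqlite3


def build_layers(conn):
    """Create silver and gold views on top of an existing bronze table."""
    conn.executescript(
        """
        -- Silver: cleaned and typed view over the raw bronze table.
        CREATE VIEW silver_orders AS
        SELECT order_id,
               TRIM(LOWER(client)) AS client,
               CAST(amount AS REAL) AS amount
        FROM bronze_orders
        WHERE amount IS NOT NULL;

        -- Gold: reporting-ready aggregate that reads only from silver.
        CREATE VIEW gold_revenue_by_client AS
        SELECT client, SUM(amount) AS revenue
        FROM silver_orders
        GROUP BY client;
        """
    )
```

Each layer reads only from the one below it, which is roughly how dbt models reference each other in a layered project.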
dbt Platform
dbt Labs
External reviews
External reviews are not included in the AWS star rating for the product.
Building medallion architecture has improved collaboration and streamlined data transformation
What is our primary use case?
What is most valuable?
From a developer point of view, I find the ease of development and the code to be the most useful capabilities of dbt. I use VS Code to run the dbt models, and since the end user is only concerned about their output and reports, the ease of development and the fact that it is free are significant advantages.
I assess the impact of dbt's version control system on team collaboration as great. I have used it extensively, especially when we had situations where the code broke, as we were able to roll back to earlier versions thanks to version control.
I find dbt's documentation site generator to be quite crisp and straightforward. It helps with project transparency and onboarding new team members because the documentation is excellent for addressing issues we face. I learned dbt concepts primarily using their website and their tutorials, which helped me significantly compared to other platforms such as YouTube and Udemy. The course content that dbt provides is free and excellent for anyone starting out.
What needs improvement?
dbt seems quite adequate currently, but if I needed to name a few areas for improvement, I would mention the migration of code to Git and GitHub, which sometimes fails and can be confusing for developers during handover. There are some glitches in the connection, but I am unsure if that is an issue from the dbt side or something else, so I cannot comment definitively.
For how long have I used the solution?
I have been working with dbt for approximately seven to eight months.
What do I think about the stability of the solution?
Regarding stability and reliability, I see the tool as quite good. In terms of use case, market presence, demand, learning, and performance, I believe dbt will continue to be in the market.
I would rate dbt's stability and reliability at a minimum of eight out of ten based on the limited experience I have. Comparing it to tools I have seen in the past, such as Informatica and Alteryx, dbt can easily match up to that rating, specifically for stability.
What do I think about the scalability of the solution?
I am not very certain how scalable dbt is from my experience, as I have had limited scope to work with it. I have not analyzed it deeply. I started as a developer and began with their free plan before moving to a paid plan, which was quite affordable at around one hundred or one hundred fifty dollars per month. We are currently focusing on report development and multi-tenant deployment, so we might consider scaling in the future.
How are customer service and support?
Earlier, we used technical support for dbt, but that was only valid for a month or fifteen days. We later moved to the paid version because I was working on the proof of concept of Qlik Sense and other tools, and we finalized dbt as well. Initially, I explored dbt for free for about ten days without trialing any further support.
So far, we have not interacted much with technical support because we usually get help from the community on their website. If you type your question, you will likely find that someone has already asked it, so we do not need to contact their support directly.
Which solution did I use previously and why did I switch?
I have not yet utilized dbt's testing framework.
How was the initial setup?
My experience with the initial setup and deployment of dbt involved using VS Code. I am not very confident here because I received some help from another data engineer to set it up on my machine. However, I have used VS Code in the past, and with some libraries, it was successfully done, but I am not entirely certain about every detail.
What about the implementation team?
I am both a customer and consultant for dbt because my company has bought the license, and as an experienced person, I work on a product for my company.
What's my experience with pricing, setup cost, and licensing?
The course content that dbt provides is free and excellent for anyone starting out.
Which other solutions did I evaluate?
I have not practically used a different solution for the same use cases, but I have been part of teams that used tools such as Alteryx, Informatica, and Talend, even though I did not work with them hands-on.
What other advice do I have?
I would rate my overall experience with dbt at eight out of ten.
A Developer Friendly Transformation Tool
Reliable transformation practices at scale
Incremental data models have cut full refresh time and support trusted executive reporting
What is our primary use case?
I am currently working with dbt and Snowflake together. We use dbt for data transformation purposes. We obtain the data and store the raw data directly into Snowflake, then perform all transformations using dbt to prepare the data for reporting purposes.
We use dbt's modular SQL models. We do not run a full refresh in our regular loads. Instead, we have incremental strategies in place that transform only new or changed data, following a CDC (change data capture) architecture. This approach is very fast, and we use it on a daily basis. All our dbt models are scheduled with Airflow to run at set times.
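The incremental idea can be sketched outside dbt as a high-water-mark load. This is a minimal illustration in Python with SQLite, not the reviewer's actual models: the tables `raw_orders` and `target_orders` and the `updated_at` column are hypothetical. In dbt itself the equivalent is usually a SQL model materialized as `incremental` and filtered with the `is_incremental()` macro.

```python
import sqlite3


def create_tables(conn):
    """Create a raw source table and a keyed target table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE raw_orders (order_id INTEGER, amount REAL, updated_at TEXT);
        CREATE TABLE target_orders (
            order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT
        );
        """
    )


def incremental_load(conn):
    """Upsert only the source rows newer than what the target already holds."""
    # High-water mark: latest timestamp already in the target ('' if empty).
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM target_orders"
    ).fetchone()
    # Count the rows that will be processed in this run.
    (pending,) = conn.execute(
        "SELECT COUNT(*) FROM raw_orders WHERE updated_at > ?", (watermark,)
    ).fetchone()
    # Insert new keys and replace existing ones, touching only the new slice.
    conn.execute(
        """
        INSERT OR REPLACE INTO target_orders (order_id, amount, updated_at)
        SELECT order_id, amount, updated_at
        FROM raw_orders
        WHERE updated_at > ?
        """,
        (watermark,),
    )
    conn.commit()
    return pending
```

Running `incremental_load` a second time without new source rows processes nothing, which is why daily runs stay fast compared with a full refresh.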
We use dbt's testing framework and its built-in test functionality. For example, we use the built-in not-null test to find columns that contain null values and to see how many such values each column has. Beyond the built-in tests, we have written custom SQL scripts that act as additional test cases on our models.
We ensure that incorrect or incomplete data does not go into the reporting layer because it can impact the business. We always perform dbt tests on our landing or raw data to ensure the correctness and completeness of the data before loading it into the final reporting layer. These reports are used by higher management, so we ensure that incorrect data is not published into the reporting layer for the Power BI reports.
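The gating described above can be sketched as follows. This is an illustration in Python with SQLite, not dbt's own test runner. The tables `raw_sales` and `report_sales`, their columns, and the negative-amount rule are hypothetical. The custom check follows the same convention as a dbt singular test: the query selects the rows that violate a rule, and returning no rows means the test passes.

```python
import sqlite3


def null_count(conn, table, column):
    """Not-null check: count rows where the column is NULL."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def failing_rows(conn, test_sql):
    """Custom check: the query returns violating rows; an empty result passes."""
    return conn.execute(test_sql).fetchall()


def publish_if_clean(conn):
    """Load the reporting table only when every check on the raw data passes."""
    problems = []
    if null_count(conn, "raw_sales", "customer_id") > 0:
        problems.append("customer_id contains nulls")
    if failing_rows(conn, "SELECT * FROM raw_sales WHERE amount < 0"):
        problems.append("negative amounts found")
    if problems:
        return problems  # Nothing is published while any check fails.
    conn.execute("INSERT INTO report_sales SELECT * FROM raw_sales")
    conn.commit()
    return []
```

Returning the list of failed checks instead of raising keeps the sketch easy to call from a scheduler, which can then decide whether to stop the pipeline.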
We use dbt's documentation site generator. In dbt, we have YML file functionality, which can be used for creating documentation for each model. Whenever we make modifications to a model, we always update the YML file so we can track the history of how frequently we change the model. When new team members join, they can refer to this documentation to understand the data lineage and the data transformation strategy of the project.
What is most valuable?
dbt is very fast compared to the traditional tools. Previously, I worked on SSIS, which is provided by Microsoft, and data transformation took a considerable amount of time when dealing with large amounts of data. Since dbt works on the ELT architecture rather than the ETL architecture, it is much faster than traditional data transformation tools.
Previously, we were using SSIS packages, which were very slow. We recently migrated all our SSIS packages to dbt models and moved the data from SQL Server to Snowflake. Before the migration, a full refresh of our data pipeline took around two days. With the dbt model architecture, a full refresh now takes only around four hours. The client is happy because the downtime during a complete refresh has dropped drastically.
What needs improvement?
I am not very familiar with dbt's version control system.
I cannot identify any improvements in dbt because I am still exploring more functionality. I have been working with dbt for only three years, so I am exploring more functionalities and cannot see any limitations or improvement areas at this time.
In the past, I used the seed functionality, which loads raw, individual, or static files into the database. Going forward, it would be beneficial if dbt improved or added features on the seed side, especially for large files that take a long time to load into Snowflake.
For how long have I used the solution?
I have been working with dbt for the last three years.
What do I think about the stability of the solution?
I have not experienced any crashes, performance issues, or anything regarding stability and reliability.
What do I think about the scalability of the solution?
I find dbt very scalable.
How are customer service and support?
The dbt support team is very responsive. Whenever we have any issues on the dbt side, we always reach out to them. We did not face any challenges in the initial setup. I would rate the technical support a nine out of ten.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
The main reason we decided to switch to dbt is performance. As mentioned earlier, we perform a full refresh every quarter, and that refresh took considerable time on SQL Server. Our data is very large and growing daily, so we had to migrate, and we chose dbt together with Snowflake because Snowflake is very fast. In Snowflake, the storage layer and the compute layer are separate, which is not the case in a traditional SQL Server database. That is why we moved from SQL Server to Snowflake and from SSIS to dbt.
How was the initial setup?
We evaluated Databricks as well, but ultimately the client wanted to adopt Snowflake and dbt technologies only.
What about the implementation team?
We took help from Snowflake directly, the Snowflake company, for the Snowflake side. The dbt side is maintained or set up by our infrastructure team.
What was our ROI?
Since we migrated from SSIS to the dbt model architecture, a full refresh takes only around four hours. The client is happy because the downtime during a complete refresh of the data has dropped drastically.
What's my experience with pricing, setup cost, and licensing?
The pricing, setup cost, and licensing cost are managed by our infrastructure teams. As data engineers, we are not familiar with these details.
I need to check with my infrastructure team on whether we purchased dbt through the AWS Marketplace or directly from the local vendor.
Which other solutions did I evaluate?
Since dbt has a license cost, a small company without much budget can explore other tools that provide the same functionality at a lower cost.
What other advice do I have?
I am currently working with Power BI, Tableau, Python, Databricks, Snowflake, and PySpark in the current project. I would rate my overall experience with dbt a nine out of ten.