Databricks Data Intelligence Platform
Databricks, Inc.
External reviews
768 reviews
External reviews are not included in the AWS star rating for the product.
Great Governance and UI—Databricks Fits Our ETL Workflow Perfectly
What do you like best about the product?
I like the overall environment, especially the governance features and the way the UI is handled. I primarily use Databricks as my ETL platform, and it fits well with how I work. The SDP job management, governance, and lineage capabilities are also helpful.
What do you dislike about the product?
Sometimes there are glitches in the UI. For example, if I cancel something, it takes a bit longer for that change to be reflected in the UI.
What problems is the product solving and how is that benefiting you?
It addresses centralized database and lakehouse management through Unity Catalog. It has also helped solve governance needs and improved lineage tracking.
Versatile Platform, But Needs Faster Analysis
What do you like best about the product?
I like Databricks because it allows me to perform multiple tasks on a single platform, something I haven't found on other cloud service platforms. That is particularly useful for managing database tasks efficiently.
What do you dislike about the product?
I have often run into cases where something goes wrong in Databricks and it takes time to analyze the cause. It would be better if the platform gave faster and more accurate answers.
What problems is the product solving and how is that benefiting you?
I use Databricks for migration projects. It allows me to perform multiple tasks on a single platform, which I can't do on other cloud platforms.
Comprehensive Platform with Room for Improvement
What do you like best about the product?
I find Databricks to be a one-stop solution because it incorporates various functionalities such as orchestrating pipelines. It also has an inbuilt AI called Genie, which helps in building jobs and other AI-related tasks. Compared to other providers like AWS and Azure, Databricks offers specific features they lack, letting me use the database simply and access everything in one place. The initial setup was quite easy because I could implement and update tables directly using the data lakehouse from a single place, which is easier compared to others.
What do you dislike about the product?
I think Databricks could improve on the orchestration part. Even though it has orchestration capabilities for pipelines and jobs, it misses the ease of access that something like Airflow provides, which is specifically designed for orchestration. It would be helpful if Databricks adopted a pattern similar to Airflow's for better orchestration and job linking. I also feel the Genie part could be improved. While the Genie works well, the output duration can be lengthy, usually taking more than five to ten minutes to perform specific tasks. So, I would like to see improvements in that area as well.
What problems is the product solving and how is that benefiting you?
I use Databricks as a one-stop solution for various tasks. It orchestrates pipelines and utilizes an inbuilt AI, making it more feature-rich than alternatives like AWS or Azure. This allows me to streamline workflows without relying on multiple providers.
Unified Data Engineering, Analytics, and ML on a Scalable Databricks Platform
What do you like best about the product?
What I like most about Databricks is how it brings data engineering, analytics, and machine learning together in one platform. It streamlines the entire data pipeline—from ingestion and transformation through to serving—so I don’t have to rely on multiple separate tools to get end-to-end workflows done.
Its integration with Spark and Delta Lake is another big plus, making it both scalable and dependable when working with large datasets.
What do you dislike about the product?
One challenge with Databricks is cost management and visibility. Since compute is abstracted through clusters and jobs, it can sometimes be difficult to track and optimize costs without additional monitoring or governance in place.
What problems is the product solving and how is that benefiting you?
Solves the problem of fragmented data ecosystems, where data engineering, analytics, and machine learning are handled in separate tools.
Seamless Integration, Needs Performance Tuning
What do you like best about the product?
I think the most useful part of Databricks is its single architecture where you can have everything, like a database and dashboard, all in one. Compared to other providers like Azure or AWS, where I would need multiple services, Databricks offers everything in a single service. This simplifies my work because I don't have to manage integration or network level details across different services. The convenience of having everything inside Databricks means I can avoid multiple network updates when connecting with tools like Power BI, which makes it a standout feature for me. Additionally, the initial setup after migrating from Snowflake was pretty easy since Databricks allows us to manage access and security within a single service.
What do you dislike about the product?
One thing that needs improvement is Genie code. It is helpful for generating code, but it consumes a lot of memory in the background. For example, if I open Databricks in Chrome, it can take at least one or two GB of memory, and generating a response takes a long time as well. Reducing that would be great. Also, on the pipeline side, Airflow is specifically designed for orchestration: if I have thousands of jobs, I can see each job and what's happening with it. With Databricks, it's harder for me to see successes and failures and to manage the charts. There are multiple monitoring options in Databricks, but it's still difficult compared with Airflow.
What problems is the product solving and how is that benefiting you?
Databricks helps us consolidate data from different locations into a single database, simplifying master data management and making data access easier with integrated dashboards, improving our AI-powered sales and prospect tracking.
Genie Code and Inline Assistant Dramatically Boosted My Debugging Productivity
What do you like best about the product?
Genie code and the inline Assistant were the most helpful tools for me on my project. They helped me debug a 2k-line codebase and clearly explained why I wasn’t getting accurate data. It also provided a query to run in my source system (SQLMI). By running the discrepancy script in parallel on the source and target, I was able to debug the entire code much faster and improve my productivity. Overall, it cut my work time from about 8 hours down to around 1 hour.
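The parallel source-vs-target discrepancy check this reviewer describes can be sketched in plain Python. This is a toy illustration only, not the reviewer's actual script: the row data, the `id` key, and the `find_discrepancies` helper are all assumptions; in practice the rows would come from queries against SQLMI and Databricks.

```python
# Toy sketch of a source-vs-target discrepancy check (hypothetical data;
# in a real migration the rows would come from the source and target systems).

def find_discrepancies(source_rows, target_rows, key="id"):
    """Compare two row sets by key; report mismatched and missing keys."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys()
                             if src[k] != tgt[k]),
    }

source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 240}, {"id": 3, "amount": 5}]
report = find_discrepancies(source, target)
print(report)  # {'missing_in_target': [], 'extra_in_target': [3], 'mismatched': [2]}
```

Running the same comparison on both ends in parallel narrows the debugging surface to exactly the keys that disagree, which is the productivity win described above.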
What do you dislike about the product?
In Delta Sharing, there isn’t a catalog-level SELECT permission, and I sometimes think having that would be helpful. Also, when I use the Genie code inside a VM, it can make the website unresponsive at times. These are areas that could be improved.
What problems is the product solving and how is that benefiting you?
In one of our claims-processing migration projects, the client needed near real-time data availability for downstream applications. Previously, the architecture used Amazon Redshift as the data warehouse, with Jasper and Sisense consuming the data for reporting and analytics. However, that setup didn’t support real-time or near real-time streaming efficiently, which led to delays in data availability for downstream systems.
After migrating the platform to Databricks, we were able to substantially improve the data pipeline architecture. We implemented streaming along with optimized ETL pipelines, reducing the data refresh cycle to about 30 minutes. We also created a dedicated view that retains data from the previous run, so downstream systems always have a consistent dataset available while the next pipeline execution is still in progress.
Before, we struggled with delayed refresh cycles and a limited ability to meet near real-time data needs in our Redshift-based architecture. After moving to Databricks, we enabled faster ETL processing and improved near real-time data availability.
As a result, we reduced ETL refresh time to roughly 30 minutes and enabled near real-time access for downstream tools like Jasper and Sisense. Reliability also improved because the stable view continues to serve the previous run’s data during pipeline updates. Finally, the overall architecture became simpler by consolidating processing and analytics capabilities within Databricks.
Overall, Databricks helped us build a more scalable and efficient near real-time data processing platform, significantly improving the timeliness and reliability of analytics for the claims-processing workflow.
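The "previous-run view" pattern described above can be sketched as a blue/green swap: downstream tools always read one serving view, which is repointed at the last fully completed run's table while the next run loads elsewhere. This is a hypothetical sketch, not the reviewer's implementation: the view and table names are invented, and on Databricks the generated statement would be executed via `spark.sql` or a SQL warehouse.

```python
# Hypothetical blue/green serving-view swap: downstream tools always read
# `claims_serving`, which points at the last fully completed run's table,
# while the next pipeline run loads into the other table.

def swap_serving_view(view_name: str, completed_run_table: str) -> str:
    """Build the SQL that repoints the serving view at the finished run."""
    return (f"CREATE OR REPLACE VIEW {view_name} AS "
            f"SELECT * FROM {completed_run_table}")

def next_target(current: str, a: str = "claims_run_a",
                b: str = "claims_run_b") -> str:
    """The table the *next* pipeline run should load into."""
    return b if current == a else a

sql = swap_serving_view("claims_serving", "claims_run_a")
print(sql)  # CREATE OR REPLACE VIEW claims_serving AS SELECT * FROM claims_run_a
print(next_target("claims_run_a"))  # claims_run_b
```

Because the view swap is a single metadata operation, readers never see a half-loaded table, which matches the consistency guarantee the review describes.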
A Unified Platform for Scalable Data & AI Workloads
What do you like best about the product?
Databricks is great because it brings everything you need for data and AI into one place.
Instead of switching between different tools for data engineering, data cleaning, analytics, and machine learning, you can do it all in a single environment. That makes life a lot easier.
What do you dislike about the product?
Databricks is not beginner-friendly. You often need solid data engineering skills to use it effectively.
Reviews point out that while Databricks is extremely capable, it's "a high-end workshop" that requires expertise and is not easy for less technical teams. Databricks uses cost units (DBUs), which many people find difficult to estimate and manage.
Even expert reviews highlight that its pricing is famously complicated and can hide unexpected costs.
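The DBU model that reviewers find hard to estimate is, at its core, rate x consumption arithmetic; the difficulty is that rates differ by cloud, workload type, and tier. A back-of-envelope estimator illustrates the shape of the calculation. The rates below are placeholders invented for illustration, not actual Databricks prices.

```python
# Back-of-envelope DBU cost estimate. The rates are PLACEHOLDERS for
# illustration only; real DBU rates vary by cloud, workload type, and tier,
# and cloud VM charges are billed separately.

PLACEHOLDER_RATES_USD_PER_DBU = {
    "jobs_compute": 0.15,
    "all_purpose_compute": 0.40,
    "sql_warehouse": 0.22,
}

def estimate_cost(workload: str, dbus_per_hour: float, hours: float) -> float:
    """cost = rate * DBUs/hour * hours (Databricks side only)."""
    rate = PLACEHOLDER_RATES_USD_PER_DBU[workload]
    return round(rate * dbus_per_hour * hours, 2)

# e.g. a jobs cluster emitting 8 DBUs/hour, 5 hours a day for 30 days
monthly = estimate_cost("jobs_compute", dbus_per_hour=8, hours=5 * 30)
print(monthly)  # 180.0
```

Even this toy version shows why bills surprise people: the same cluster on all-purpose compute would cost more than 2.5x as much as on jobs compute under these placeholder rates.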
What problems is the product solving and how is that benefiting you?
Databricks uses the Lakehouse architecture to combine the strengths of data lakes and data warehouses into one unified platform. This means structured and unstructured data live together and are ready for analytics or machine learning.
Databricks Keeps Removing Friction with Strong Governance and Intuitive AI Tools
What do you like best about the product?
What I like most about Databricks is how its features have consistently matched the evolving needs of engineering teams. Over the years, I've seen it grow from a solid data platform into a workspace that genuinely streamlines how we build and manage data and AI solutions.

Unity Catalog has been one of the biggest improvements for us: having a single place to manage permissions and lineage has removed a lot of manual steps we used to handle separately across systems. Genie AI and BI have also become part of my regular workflow; being able to generate SQL or explore datasets through natural conversations helps teams get to answers faster, especially when we're under time pressure.

The Apps capability has added unexpected value by letting us create and share simplified internal tools directly within the platform, eliminating the need to stand up extra infrastructure. And with Lakebase, we've been able to support more transactional-style use cases without losing the flexibility of a lake, which has made certain pipelines far easier to maintain.

Altogether, these improvements have removed a lot of friction from day-to-day work and made the platform something I genuinely enjoy using as it continues to evolve.
What do you dislike about the product?
What I dislike about Databricks is that some of the newer AI experiences, especially Genie for code generation, can feel unstable at times and may lose context during longer development sessions. It disrupts my workflow when the assistant can't retain earlier logic or maintain continuity across multiple iterations.
I’ve also noticed a gap in native connectors for certain enterprise systems like DFS, SMB shares, or Windows-based source systems, and platforms such as DB2 on AS/400, which many customers still rely on. Even though Databricks continues to expand its ecosystem, the lack of direct connectivity in these areas often means we need extra middleware or custom pipelines to bridge the gap.
None of these are deal-breakers, but they’re areas where the platform’s otherwise smooth experience can still feel a bit incomplete.
What problems is the product solving and how is that benefiting you?
Databricks has helped us address several long‑standing challenges in how we manage and deliver data and AI. Before adopting its newer capabilities, we were dealing with fragmented governance, duplicate datasets, and a lot of manual effort to keep permissions and lineage consistent across different systems. Unity Catalog improved this by giving us a single place to manage security and ownership, which reduced confusion across teams and noticeably cut down on rework during audits.
We also used to spend a significant amount of time helping teams explore data or draft queries. With Genie AI and BI, they can now generate SQL, summaries, and visual insights more independently. As a result, the time from a question to a usable answer has shortened, especially when we’re working under tight delivery cycles.
Another pain point was building small internal tools around our data. Setting up separate infrastructure or hosting environments created unnecessary overhead. With Databricks Apps, we can now build and share these tools within the platform itself, which saves setup time and reduces ongoing maintenance.
Finally, we struggled to support workloads that needed both the flexibility of a lake and the reliability of a database. Lakebase helped close that gap by enabling transactional‑style operations directly on our lake data, which simplified several pipelines and reduced the number of systems we have to maintain.
Overall, Databricks has moved us from juggling multiple disconnected tools to working in a more unified and predictable environment. That shift has sped up delivery, lowered operational overhead, and improved the clarity of our workflows.
Unified Lakehouse Architecture for ETL, Analytics, and ML in One Stack
What do you like best about the product?
Unified lakehouse architecture: Databricks lets me treat my data lake more like a “lakehouse,” combining data-lake flexibility with data-warehouse-like features such as ACID transactions, schema enforcement, and time travel on Delta tables. As a result, I can handle ETL, ad hoc analytics, and ML on a single stack, rather than juggling separate warehouses, lakes, and Spark clusters.
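The Delta table features named above (ACID-style commits, schema enforcement, time travel) can be illustrated with a toy in-memory versioned table. This is a conceptual sketch only, and not how Delta Lake is implemented; real Delta tables record commits in a transaction log rather than storing full snapshots.

```python
# Toy versioned table illustrating the *concepts* of schema enforcement
# and time travel; real Delta tables implement these via a transaction log,
# not by storing a full snapshot per version.

class ToyVersionedTable:
    def __init__(self, schema):
        self.schema = set(schema)   # enforced column set
        self.versions = [[]]        # versions[n] = full snapshot at version n

    def append(self, rows):
        """Commit new rows as a new version; reject schema mismatches."""
        for row in rows:
            if set(row) != self.schema:
                raise ValueError(f"schema mismatch: {sorted(row)}")
        self.versions.append(self.versions[-1] + rows)

    def read(self, version_as_of=None):
        """Read the latest snapshot, or an older one ('time travel')."""
        idx = len(self.versions) - 1 if version_as_of is None else version_as_of
        return list(self.versions[idx])

t = ToyVersionedTable(schema={"id", "amount"})
t.append([{"id": 1, "amount": 100}])   # commit -> version 1
t.append([{"id": 2, "amount": 250}])   # commit -> version 2
print(len(t.read()))                   # 2 rows at the latest version
print(len(t.read(version_as_of=1)))    # 1 row when reading version 1
```

On actual Delta tables the equivalent read is `SELECT * FROM tbl VERSION AS OF 1`, and a malformed append fails the commit instead of silently widening the schema.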
What do you dislike about the product?
The platform can feel heavy and is sometimes slow, especially when working with large notebooks or running long jobs. Databricks can also be expensive to operate, particularly if clusters are left idle or aren’t well optimized.
What problems is the product solving and how is that benefiting you?
Faster, collaborative workflows
Databricks simplifies big-data complexity by abstracting much of the Spark and cluster management, so I can focus more on logic and less on infrastructure. The built-in notebooks, jobs, and versioning make it easy to prototype quickly, collaborate with analysts and DS, and move code from experimentation into production with less rework.
Unified platform for data and AI
Databricks reduces the need for separate data-lake, data-warehouse, and ML tools by providing a single lakehouse platform where you can store, transform, and analyze data, and run ML workloads in the same place. This helps cut down on tool sprawl and makes it easier to share data and models across engineering, analytics, and data science teams.
All-in-One Databricks Platform with Strong Governance, Fast Spark Performance, and Genie
What do you like best about the product?
The all-in-one platform eliminates tool sprawl. Unity Catalog gives you governance, lineage, and discoverability without bolting on a separate catalog. The notebook UI is clean and makes iterating on PySpark fast. Genie is the standout AI feature: it turns curated tables into natural-language interfaces for business users, and the SDK lets you configure it programmatically so it stays maintainable. DLT handles pipeline orchestration well. Performance on Spark workloads is solid, especially with Photon. Integrations with Airflow, S3, and the broader ecosystem are straightforward. As for ROI, consolidating what used to require multiple tools into one platform pays for itself in reduced complexity.
What do you dislike about the product?
Pricing can be hard to predict. Compute costs scale quickly if you're not careful with cluster sizing and SKU selection, and it's not always obvious which workload tier you actually need until you see the bill. The notebook IDE, while functional, still lags behind a real editor for refactoring, multi-file navigation, and code review workflows.
What problems is the product solving and how is that benefiting you?
Tool consolidation is the biggest one. Before, you'd need separate systems for ingestion, transformation, warehousing, governance, and serving, each with its own learning curve, maintenance overhead, and integration headaches. Databricks collapses that into a single platform. Unity Catalog solves the data governance problem by giving you lineage, access control, and discoverability in one place instead of managing permissions across disconnected systems.