Databricks Data Intelligence Platform
Databricks, Inc.
External reviews
777 reviews
External reviews are not included in the AWS star rating for the product.
Effortless Data Unification with Databricks and Lakebase
What do you like best about the product?
I like the ability to unify data engineering with Databricks. The lakehouse feature is especially helpful in my solutions, with persistent memory offering the most value. The initial setup was easy.
What do you dislike about the product?
Databricks could improve governance troubleshooting and simplify operational aspects.
What problems is the product solving and how is that benefiting you?
I use Databricks to build scalable data engineering, AI, and data governance solutions. It helps unify data engineering and supports standards validation, permissions management, and persistent memory.
Efficient Data Scaling with Collaborative Notebooks, but Costly
What do you like best about the product?
I use Databricks for ETL workflows and appreciate how it solves the problem of handling massive volumes of data using Apache Spark. Instead of dealing with complex cluster infrastructure manually, Databricks provides a managed environment that helps with scaling. One of the best features is the collaborative notebook environment, which lets different teams collaborate effectively. I switched from Snowflake to Databricks mainly because of the massively parallel processing of Spark. Although the initial setup was tough, learning Databricks is easy.
What do you dislike about the product?
The biggest issue is cost management. The initial setup was also tough.
What problems is the product solving and how is that benefiting you?
Databricks handles massive data volumes with Apache Spark without manually managing complex cluster infrastructure, providing a scalable managed environment.
Consolidated Our Data Stack with Databricks that Boosted Performance and Productivity
What do you like best about the product?
Coming from an Airflow + Snowflake setup, moving to Databricks removed a layer of coordination overhead we had normalized: jobs scheduling jobs, reverse ETL pipelines just to get analytical results back into operational systems, and a separate feature store drifting out of sync with training data. The integrations were a big part of why the transition was smoother than expected: native connectors for cloud storage, Git-based repo sync for version-controlled notebooks, and the Databricks SDK plugging cleanly into our existing CI/CD pipelines meant we weren't rebuilding everything from scratch. Databricks Workflows replaced our Airflow DAGs cleanly, Unity Catalog gave us lineage and access control across our full medallion architecture without a separate tool, and Lakebase let us retire the online feature store entirely since model features now live where the data already is. Performance on large-scale aggregations across our brick-and-mortar store datasets improved noticeably, and the workspace UI makes it easy for the whole team to navigate notebooks, pipelines, and catalog without context-switching. The AI-assisted features in the notebook environment genuinely speed up development. The autocomplete and error suggestions that understand the data context are more useful day-to-day than they sound. Onboarding new engineers was also faster than expected given the depth of the platform, with thorough documentation and a responsive support team during migration. From an ROI standpoint, consolidating tooling meant fewer vendor contracts, less pipeline maintenance, and engineering time redirected toward actual product work.
What do you dislike about the product?
The cost model is the most persistent friction point — compute costs can escalate quickly if cluster lifecycle management isn't tight, and for a team that's still maturing its governance around who spins up what, the billing visibility could be more granular out of the box. The UI, while generally clean, gets harder to navigate at scale; when you have dozens of workflows, notebooks, and catalogs, the workspace organization tools don't quite keep up with the sprawl. On the integrations side, some third-party connectors feel like they were added as an afterthought — the experience isn't always as seamless as the native ones, and occasional version compatibility issues have caused unexpected debugging time. Performance on very large unoptimized queries can still surprise you with cold start latency on serverless compute, which matters when you're iterating quickly during development. The AI assistant features are improving but still inconsistent — context awareness drops off on complex multi-file projects and the suggestions occasionally miss the mark in ways that slow you down rather than help. Support response quality has been good for critical issues, but for nuanced technical questions the first response is sometimes generic, and getting to someone with deep product knowledge takes an extra round of escalation.
What problems is the product solving and how is that benefiting you?
The core problem we were solving was operational sprawl: we had analytical data living in one place and operational data in another, with a fleet of pipelines just to keep them in sync. Working with high-volume brick-and-mortar store data across a medallion architecture, the performance gains on large aggregations alone justified the move; queries that previously required careful warehouse sizing are now handled gracefully on autoscaling compute. Consolidating onto one platform also meant our AI and ML workflows stopped being second-class citizens: feature engineering, model training, and serving now happen in the same environment where the data lives, which removed an entire category of infrastructure we were maintaining. The workspace UI, while not perfect at scale, made it easier to onboard the broader team without everyone needing deep platform expertise to be productive from day one.
Databricks Simplifies Big Data Processing and Team Collaboration
What do you like best about the product?
What I like best about Databricks is how it simplifies large-scale data processing and collaboration in one platform. The integration with Spark and cloud services makes handling big data much more efficient. I also like the notebook environment, which makes it easier for teams to work together on analytics and machine learning tasks.
What do you dislike about the product?
One thing I dislike about Databricks is that the platform can feel complex for new users, especially when managing clusters and configurations. Pricing can also become expensive with larger workloads if resources are not optimized carefully. While integrations and AI features are powerful, the onboarding process and support documentation could be more beginner-friendly.
What problems is the product solving and how is that benefiting you?
Databricks helps solve the challenge of processing and analyzing large amounts of data efficiently in one platform. It combines data engineering, analytics, and AI workflows, which reduces the need for multiple separate tools. This improves collaboration, speeds up data processing, and helps generate insights much faster.
Powerful Lakehouse Platform for Scalable Pipelines and Collaboration
What do you like best about the product?
In my role I focus on designing scalable, future-ready data platforms, and Databricks has become a key part of that architecture. I have used it across multiple projects for building data pipelines, enabling analytics, and supporting data science teams. What stands out is that it brings engineering, analytics, and machine learning into one platform, which simplifies the overall data architecture. The biggest strength is the lakehouse approach: it combines the flexibility of a data lake with the reliability of a data warehouse, which helps avoid maintaining separate systems for storage and analytics. I also like how well it handles large-scale processing using Spark; whether it's batch or streaming data, it performs consistently when configured properly. Collaboration is another strong point: teams can work together in notebooks, share logic, and reuse code easily, which improves productivity across departments. The UI is well designed, notebooks are clean and flexible, and switching between SQL, Python, and Scala is smooth. It integrates well with AWS, Azure, GCP, and Airflow. Performance is strong for large-scale workloads. AI features like Genie are very useful.
What do you dislike about the product?
The biggest concern is cost control. If clusters are not managed properly or are left running longer than needed, costs can increase faster than expected. Autoscaling is helpful, but without monitoring it can still lead to higher usage. Starting a cluster can sometimes take time, especially when you just want to run quick tests or small jobs, which slows down development and reduces productivity during short tasks. When something fails in a pipeline or job, debugging is not always easy: logs can be detailed, but tracing the exact issue in complex workflows can take time.
What problems is the product solving and how is that benefiting you?
It mainly solves the problem of handling large-scale data processing and unifying different data workloads in one platform. Earlier, building and maintaining ETL pipelines required multiple tools and a lot of manual effort. With Databricks I can build, run, and manage pipelines in one place using Spark, which simplifies the overall process. Processing big datasets used to require heavy infrastructure setup; Databricks handles this using distributed computing, so I can process large amounts of data efficiently without worrying about scaling manually. In a traditional setup, we needed separate tools for data engineering, analytics, and machine learning; Databricks brings all of this into one platform. With shared notebooks and a unified workspace, teams can collaborate more easily, share code, and work on the same data.
Perfect for Cross-team Collaboration and Intensive Data Applications
What do you like best about the product?
The UX is one of the strongest parts. The notebook experience is clean and intuitive, collaboration is straightforward, and moving between exploration, experimentation, and production workflows feels seamless. It has enough flexibility for advanced users while still being approachable enough that onboarding new team members is fast. People can usually become productive quickly without spending weeks learning platform-specific quirks.
The integrations are also excellent. It works smoothly with the broader cloud ecosystem and connects well with data sources, orchestration tools, model serving infrastructure, and external systems. That interoperability makes it much easier to move from prototype to deployed pipeline without constantly rebuilding connectors or managing glue code.
Performance has been consistently strong, especially when working with distributed workloads and large-scale feature engineering. Spark optimization, cluster management, and managed infrastructure significantly reduce operational overhead, which lets me focus more on model development and analysis rather than environment tuning. For iterative experimentation, spin-up times and overall responsiveness are noticeably better than many alternative managed platforms.
What do you dislike about the product?
One area where Databricks could improve is pricing. The platform delivers strong capabilities, but costs can escalate quickly for high-frequency or real-time workloads. For use cases involving continuously running low-latency tick pipelines, streaming market data, or iterative model retraining, the pricing can become fairly steep relative to the infrastructure being consumed. It sometimes feels like there’s a meaningful premium for convenience and managed orchestration, which can make cost optimization a constant consideration.
The AI integration is another area that still feels somewhat uneven. While there’s a clear push toward positioning the platform as an end-to-end AI/ML environment, some of the newer AI-focused features feel more like ecosystem additions than deeply integrated workflow improvements. In practice, there are still cases where custom tooling or external frameworks provide more flexibility and transparency, particularly for specialized model development, experimentation, and real-time inference use cases.
There can also be some complexity around tuning clusters and managing costs efficiently at scale. While the abstractions are helpful, getting the best performance-to-cost ratio sometimes requires deeper platform knowledge than the “fully managed” positioning might imply.
Overall, the platform is very strong technically, but pricing for always-on data-intensive workloads and the maturity of some AI-native capabilities are the two biggest areas where I’d like to see improvement.
What problems is the product solving and how is that benefiting you?
Databricks solves one of the biggest challenges in modern data work: bringing together data access, large-scale processing, and collaborative development in a single environment.
For my work, the biggest benefit is real-time collaboration. It allows multiple people to work against the same datasets, notebooks, and pipelines without the usual friction of fragmented tooling or environment inconsistencies. That significantly speeds up experimentation, iteration, and knowledge sharing across projects, especially when moving quickly on model development or analyzing fast-changing data.
It also solves the challenge of scalable data access and processing. Working with high-volume time-series and transactional datasets requires infrastructure that can process large amounts of data efficiently without constant operational overhead. Databricks abstracts much of that complexity, making it possible to focus on analysis, feature engineering, and model development rather than spending time managing infrastructure.
The practical benefit is faster iteration cycles. I can move from raw data exploration to model experimentation and deployment much more quickly, which is especially valuable when working on real-time analytics, forecasting pipelines, and production-facing ML systems where speed of iteration directly impacts outcomes.
Overall, it reduces engineering friction and makes large-scale collaborative data work significantly more efficient, which translates into faster development, better experimentation, and more reliable deployment of data products.
Straightforward SQL, Smooth Workflow Scheduling, and a Handy Notebook Feature
What do you like best about the product?
It’s straightforward to write and run SQL and to schedule workflows, and I especially like the notebook feature. Genie AI is helpful for diagnosing bugs, and it can also answer ad hoc questions whenever I need it.
What do you dislike about the product?
Genie’s AI feature could still use some improvement. It sometimes takes a long time to respond, and with more complex problems it doesn’t always handle them well.
What problems is the product solving and how is that benefiting you?
The workflow is very easy to schedule. It’s also simple to set up alerts, and the visualization makes it easy for me to modify and debug.
Solves Developers’ Problems with Genie, Lakeflow Connect, and DLT
What do you like best about the product?
This platform solves developers’ problems by offering features like Genie, Lakeflow Connect, and DLT.
What do you dislike about the product?
Before using it, I want to understand the compute options and charges, and how to use the platform properly. Basically, I need to learn a lot first.
What problems is the product solving and how is that benefiting you?
It solved our data pipeline and dashboard creation challenges. With SDP and AI/BI Genie, we moved from manually managing the data pipeline to simply declaring it in SQL and having everything handled for us. Instead of spending so much time building dashboards, we can now just ask questions in natural language and get the answers we need without wasting a lot of time.
Intuitive UI and AI-Powered Experience That Keeps Getting Better
What do you like best about the product?
The UI is pretty intuitive, and they are using AI to make the experience even better.
What do you dislike about the product?
For the most part, it’s a great platform, but some of the debugging options could be improved.
What problems is the product solving and how is that benefiting you?
I use it to write queries for extracting data and running experiments, mostly with SQL and Python.
Scalable, All-in-One Environment with Some Learning Curve
What do you like best about the product?
I like Databricks for its scalability and all-in-one environment for data engineering, analytics, and machine learning. It allows me to process large datasets efficiently while keeping workflows organized in one platform. The scalability is very valuable because it lets me handle growing data volumes and complex workloads without performance issues. As projects expand, the platform can scale resources efficiently.
What do you dislike about the product?
Some features can have a learning curve, especially for new users working with advanced configurations or cluster management. The interface could also be more intuitive in certain areas. The setup was relatively smooth for core features, but some advanced settings like cluster optimization, permissions, and integrations required more time and technical knowledge.
What problems is the product solving and how is that benefiting you?
Databricks solves major data management and analytics challenges by efficiently handling large datasets, simplifying ETL processes, and centralizing workflows. Its scalability allows me to manage growing data volumes without performance issues, ensuring resources scale efficiently as projects expand.