Databricks Data Intelligence Platform
Databricks, Inc.
External reviews
773 reviews
External reviews are not included in the AWS star rating for the product.
Streamlines Data Engineering with Ease
What do you like best about the product?
I really appreciate Databricks for its manageability. The cluster management, unified workspace, optimization, and versioning are all aspects I find incredibly valuable. The console has all the tools readily available, which is super convenient for our large scale data engineering projects. Also, the initial setup was super easy, making it a smooth transition into using the platform.
What do you dislike about the product?
Nothing much.
What problems is the product solving and how is that benefiting you?
I use Databricks for large scale data analysis, processing, and machine learning. It makes cluster management, workspace unification, optimization, and versioning easy with all tools handy in the console.
Databricks as a Hands On Data Engineer: Solving Real World ETL, Governance, and Lakehouse Challenges
What do you like best about the product?
I believe the most attractive thing about Databricks is its all-in-one nature, which makes data management easier. Previously, when I used several separate tools for data-related work, the experience was not great; here everything is interconnected and straightforward.
The ability to use notebooks, especially when working with PySpark, is another advantage I really value. The tool lets me execute changes and modifications quickly without excessive preparation. It also improves collaboration: my team can work on projects simultaneously and monitor overall progress. However, version control can sometimes feel a bit unclear.
Performance-wise, Databricks handles big data efficiently and runs smoothly without delays. Cluster scaling happens automatically, which saves my team time at the infrastructure level; no additional planning or adjustments are required.
There are minor issues with the UI, which sometimes feels slow, but overall the other strengths, such as easy implementation and integration, encourage me to use Databricks frequently.
What do you dislike about the product?
One aspect of Databricks that I dislike is its UI. The longer you use the tool, the more annoying moving between notebooks and clusters becomes.
Another problem is cost, which can add up quickly if we are not careful. Unnecessary clusters may keep running longer than required without my team's knowledge, driving up project costs.
Debugging errors can also be complex; it sometimes takes extra effort to find out where things went wrong, especially when dealing with complex pipelines.
At times there are inconsistencies in customer service that lead us somewhere we do not need to be.
What problems is the product solving and how is that benefiting you?
The most important issue Databricks resolves is working with large volumes of data while maintaining consistency. Previously, data engineering, analytics, and machine learning ran as separate processes requiring separate tools, which made them difficult for me to handle; now they are all in one place.

Another critical issue it solves is processing large data volumes. Spark and distributed computing let it perform tasks that were extremely slow on the legacy systems I worked with. This has sped up my pipelines, although delays still occur at times.

Collaboration is another problem Databricks addresses. Multiple users can work on the same notebook or datasets; what used to be confusing is now straightforward, particularly sharing notebooks and other assets.

Scalability is also resolved: there is no need to pay attention to infrastructure management. Clusters scale with user requirements, saving the time we previously spent on infrastructure configuration.
Reliable data platform with powerful pipeline support
What do you like best about the product?
What I like best about Databricks is how it brings data engineering, analytics, and machine learning together in one clean workspace. It saves time, makes collaboration easier, and helps teams move faster with large data.
What do you dislike about the product?
What I dislike about Databricks is that Auto Loader can become frustrating when source data changes frequently, especially if column names or datatypes shift without warning.
For example, a field like customer_id may suddenly come in as cust_id, or a column that was previously a string may start arriving as an integer, which can cause schema drift and break downstream processing.
I also find it inconvenient when schema inference is not fully accurate, such as when nested JSON or semi-structured data is read incorrectly, because it then requires extra manual fixes and maintenance to keep pipelines running smoothly.
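A common mitigation for this kind of drift is to normalize incoming column names and types before downstream processing. The sketch below is purely illustrative, not Auto Loader itself; the rename map and type coercions are hypothetical examples of the `cust_id`/`customer_id` situation described above:

```python
# Sketch: normalize drifting column names and types before downstream use.
# The rename map and coercions are illustrative assumptions, not a real API.

RENAMES = {"cust_id": "customer_id"}   # map known aliases to canonical names
COERCIONS = {"customer_id": str}       # force canonical types

def normalize(record: dict) -> dict:
    """Rename drifted columns and coerce values to the expected types."""
    out = {}
    for key, value in record.items():
        canonical = RENAMES.get(key, key)
        coerce = COERCIONS.get(canonical)
        out[canonical] = coerce(value) if coerce is not None else value
    return out

# A record that drifted: renamed column, and an integer instead of a string.
print(normalize({"cust_id": 42, "amount": 9.5}))
# {'customer_id': '42', 'amount': 9.5}
```

In a real pipeline, Auto Loader's schema evolution options and the rescued-data column serve a similar purpose without hand-written maps.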
What problems is the product solving and how is that benefiting you?
Databricks is solving the problem of building and managing data pipelines at scale without so much manual effort. It helps with reliable ingestion, schema evolution, and orchestration, so teams can process data faster and keep pipelines more stable even when source files change.
For me, that means less time spent fixing broken jobs and more time focusing on transforming and using the data. It also benefits me by making batch and streaming workflows easier to manage in one platform, which is especially useful when data keeps changing.
Databricks: Unified Platform for Data Processing and Analytics
What do you like best about the product?
I like that Databricks brings everything into one place, making it unnecessary to use different tools for data processing, analytics, and pipeline work. It handles large data well, and we don't have to worry about managing clusters manually. Additionally, Databricks handles collaboration and experimentation well, making it easy to try out new things.
What do you dislike about the product?
In my point of view, the one area that can be improved is cost management. If clusters aren't monitored carefully, costs can increase faster than expected. One improvement that would help is better visibility into costs at a more detailed level. More built-in alerts or recommendations when costs start increasing unexpectedly would also be helpful.
What problems is the product solving and how is that benefiting you?
Databricks helps us handle large datasets and build data pipelines. It simplifies data processing, transforming, and analysis using Spark and SQL, all in one place. It solves the problem of slow data processing spread across systems, managing infrastructure automatically and facilitating collaboration and experimentation.
Transforms Table Data into Trustworthy Visuals with Helpful Debugging
What do you like best about the product?
I like the concept of transforming data into visuals for each table. Genie Code also helps with debugging and validating the data, which makes it easier to trust what I’m working with.
What do you dislike about the product?
As a proprietary platform built on open-source foundations, it can still introduce vendor lock-in risks, particularly through components such as Unity Catalog and its custom APIs.
What problems is the product solving and how is that benefiting you?
Databricks primarily solves the longstanding challenges of fragmented data architectures by introducing the Lakehouse paradigm. It combines the low-cost, scalable storage of data lakes with the reliability, ACID transactions, and performance of traditional data warehouses. This eliminates data silos, reduces costly ETL duplication, and provides a single unified platform for structured, semi-structured, and unstructured data.
End-to-End Data Management with Databricks
What do you like best about the product?
I like the fact that Databricks helps me manage data end to end, from ingestion to analytics to reporting and even governance. Within the platform, I'm able to build my pipelines to integrate and adjust data. I can also build dashboards, create reports, share them with my stakeholders, and ensure that the right people have access to the correct datasets and reports. The initial setup was pretty easy, and taking some training on the Databricks Academy was really helpful.
What do you dislike about the product?
The layout of the view of the portal could be nicer if it was a bit more colorful.
What problems is the product solving and how is that benefiting you?
Databricks solves a lot of problems by helping me build data pipelines, create a central source of truth, and maintain data security.
All-in-One Powerhouse with Room for Pricing Clarity
What do you like best about the product?
I like that Databricks is an all-in-one powerhouse where I can do many kinds of work in one place. It's powerful to manage data from multiple sources in a single Unity Catalog and to manage permissions with row-level security. I also appreciate that I can create experiments, run multiple models, and select the best one from the logs, which was difficult on other platforms. Once I learned the setup, it's been easy and comfortable to work with.
What do you dislike about the product?
I find it difficult to use the calculator to determine CPU serving endpoint prices because the documentation doesn't explicitly explain this. It only mentions 1 concurrency equals 1 DBU on the Azure page, which isn't clear. The pricing calculator has a single option for serving endpoints, labeled as medium with four DBU, but lacks separate options for GPU or CPU and their concurrency, making it hard to understand how it works properly. Initially, I also felt it was very tough to learn Databricks and manage deployments of workspaces, although it became easier over time.
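The back-of-the-envelope math the reviewer is after can be sketched as follows. Both numbers here are assumptions for illustration only: the 1-concurrency-equals-1-DBU mapping is taken from the review's reading of the Azure page, and the dollar rate is invented; check the official pricing page for real figures.

```python
# Sketch: estimating model-serving cost from provisioned concurrency.
# ASSUMPTIONS (not from official pricing docs): 1 unit of provisioned
# concurrency consumes 1 DBU/hour, and the DBU rate is $0.07/DBU-hour.

DBU_PER_CONCURRENCY = 1.0   # assumed, per the Azure note in the review
DBU_RATE_USD = 0.07         # hypothetical rate, varies by cloud and tier

def monthly_serving_cost(concurrency: int, hours: float = 730.0) -> float:
    """Rough monthly cost of an always-on CPU serving endpoint."""
    dbus = concurrency * DBU_PER_CONCURRENCY * hours
    return round(dbus * DBU_RATE_USD, 2)

# e.g. the 'medium' 4-DBU endpoint from the pricing calculator:
print(monthly_serving_cost(4))   # roughly $204.40/month under these assumptions
```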
What problems is the product solving and how is that benefiting you?
Databricks consolidates multiple tools into one platform, making it powerful and convenient. I can manage permissions with row-level security and easily run experiments to select the best models, all in one place.
Unified Data Engineering, Science, and Analytics in One Collaborative Platform
What do you like best about the product?
What I appreciate most about Databricks is its ability to unify data engineering, data science, and analytics on a single platform. The collaborative environment—especially the notebooks and integrated workflows—makes it much easier for teams with different skill levels to work together without constant context-switching.
Another highlight is the integration with popular tools and cloud services that are widely used in the market today, which makes it easier to move data between them. The performance monitoring and job scheduling features help maintain visibility over pipelines, and the Delta Lake support for reliable data management has also been very useful.
What do you dislike about the product?
Cost management is one area that could be improved. While Databricks offers autoscaling and flexible cluster options, it’s easy for resource usage to escalate unexpectedly, especially with large datasets and long-running jobs. Keeping costs predictable often requires careful oversight and a solid understanding of the platform’s pricing model.
Additionally, some of the more advanced features—such as fine-grained access controls and more complex job orchestration—can feel less intuitive. The documentation is extensive, but it occasionally leaves gaps that end up requiring trial and error.
What problems is the product solving and how is that benefiting you?
Databricks addresses several key challenges in modern data workflows, particularly around scalability, data reliability, and collaborative analytics. One major problem it solves is managing and processing large-scale datasets efficiently. By leveraging Apache Spark’s distributed computing framework, Databricks enables parallelized ETL pipelines and large-scale data transformations that would be impractical on traditional infrastructure.
Another challenge is ensuring data consistency and reliability across pipelines. With Delta Lake, Databricks provides ACID-compliant storage, versioned tables, and schema enforcement, which reduces data errors and simplifies data governance. This is especially beneficial when multiple teams are working on different stages of data pipelines at the same time.
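The effect of schema enforcement can be sketched in plain Python: writes whose columns or types don't match the declared schema are rejected. The schema and rows below are hypothetical; Delta Lake performs this check natively on write.

```python
# Plain-Python sketch of schema-enforcement behaviour: reject rows whose
# columns or types do not match the declared schema. Illustrative only.

SCHEMA = {"id": int, "name": str}   # hypothetical table schema

def enforce(row: dict) -> dict:
    """Raise if the row's columns or value types violate SCHEMA."""
    if set(row) != set(SCHEMA):
        raise ValueError(f"column mismatch: {sorted(row)} vs {sorted(SCHEMA)}")
    for col, typ in SCHEMA.items():
        if not isinstance(row[col], typ):
            raise TypeError(f"{col}: expected {typ.__name__}")
    return row

enforce({"id": 1, "name": "ok"})               # passes
try:
    enforce({"id": 1, "surname": "drifted"})   # unexpected column -> rejected
except ValueError as err:
    print("rejected:", err)
```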
Databricks also helps solve the problem of fragmented workflows for data scientists and engineers. Its unified environment supports multiple languages (Python, SQL, R, Scala) and includes integrated machine learning with MLflow, making it easier to collaborate and move from data preparation to analytics and ML in one place.
Databricks Streamlined Our ETL Migration with Delta Lake and Unified Analytics
What do you like best about the product?
Databricks transformed my day-to-day workflow, taking me from constant SQL Server/ADF headaches to scalable, unified analytics. Migrating stored procedures into Spark SQL notebooks was surprisingly smooth, and using Delta Lake MERGE instead of complicated UPDATE logic saved me weeks of rewriting.
The most helpful features for me have been Delta Lake’s ACID transactions and schema evolution, which handle my sparse shipment loads really well. Unity Catalog has also been a big win because it eliminates the back-and-forth of RDS access tickets by enabling governed table sharing. On top of that, Genie turns natural-language requests into production-ready Spark SQL almost instantly.
On the upside, autoscaling clusters have cut costs by about 70% compared with ADF’s always-on pipelines. I also like being able to combine PySpark and SQL in a single notebook, which makes complex joins and subqueries much easier to manage. And I don’t miss the old NOLOCK hint debates—built-in optimizations take care of that.
If you’re migrating ETL pipelines, Databricks removes a lot of the SQL-to-cloud friction while still scaling to enterprise volumes without breaking the bank.
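The MERGE pattern praised above is essentially an upsert: update rows whose keys match, insert the rest. Its semantics can be sketched in plain Python (the shipment data is made up for illustration; in Databricks this would be a Delta Lake `MERGE INTO target USING source ON ...` statement):

```python
# Plain-Python sketch of MERGE (upsert) semantics: for each source row,
# update the target row with the same key, otherwise insert it.

def merge(target: dict, source: dict) -> dict:
    """Upsert source rows into target, keyed by the dict key."""
    merged = dict(target)
    merged.update(source)   # matched keys updated, new keys inserted
    return merged

shipments = {"S1": "in transit", "S2": "delivered"}
updates   = {"S2": "returned", "S3": "in transit"}
print(merge(shipments, updates))
# {'S1': 'in transit', 'S2': 'returned', 'S3': 'in transit'}
```

Compared with hand-written UPDATE-then-INSERT logic, a single MERGE handles both branches atomically, which is the rewrite-saving the reviewer describes.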
What do you dislike about the product?
The cluster reconnects fairly often, which can be disruptive during active work sessions. Also, when I run complex or heavy queries, I notice clear lag in response times, and that slowdown can hurt productivity.
What problems is the product solving and how is that benefiting you?
Databricks has helped us centralize our data engineering and analytics workflows into a single, unified platform. It addresses the challenge of managing large-scale data pipelines by enabling our team to process and transform massive datasets efficiently with Spark. The collaborative notebook environment has also boosted productivity, making it easier for data engineers and analysts to work together. Overall, it has significantly reduced the time we spend on data preparation and has allowed us to focus more on deriving insights.
Scalable Power with Manageable Trade-offs
What do you like best about the product?
The collaborative notebooks are hands-down my favorite part of Databricks. I love being able to jump into a notebook with my team, tweak Spark SQL queries live on those massive shipment datasets, and watch everything sync instantly, without any version-control hassle.
It beats emailing notebooks back and forth or wrestling with merge conflicts; it feels like pair programming, but for data pipelines. And when you pair that with Delta Lake’s reliability for keeping my ETL jobs rock-solid on intermodal lane data, it ends up being a huge workflow saver.
Top notebook perks for me are the real-time editing and sharing that keeps everyone aligned during debugging, the built-in version history that lets me roll back mistakes quickly, and the seamless Spark integration so I’m not constantly context-switching when doing big data transforms.
What do you dislike about the product?
One key drawback is the cost management—charges can accumulate rapidly if clusters are left running, requiring careful monitoring of DBU usage and auto-termination settings.
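The monitoring the reviewer describes can be sketched as a simple idle-cluster sweep. The cluster records and the 30-minute threshold below are hypothetical; in practice Databricks enforces this natively via the `autotermination_minutes` setting on the cluster configuration.

```python
# Sketch: flag running clusters that have been idle past an
# auto-termination threshold. Fleet data is hypothetical.

IDLE_LIMIT_MIN = 30   # assumed auto-termination threshold, in minutes

def clusters_to_terminate(clusters: list) -> list:
    """Return names of running clusters idle longer than the threshold."""
    return [
        c["name"]
        for c in clusters
        if c["state"] == "RUNNING" and c["idle_minutes"] > IDLE_LIMIT_MIN
    ]

fleet = [
    {"name": "etl-prod",  "state": "RUNNING",    "idle_minutes": 5},
    {"name": "adhoc-dev", "state": "RUNNING",    "idle_minutes": 95},
    {"name": "ml-train",  "state": "TERMINATED", "idle_minutes": 400},
]
print(clusters_to_terminate(fleet))   # ['adhoc-dev']
```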
Debugging intricate Spark job failures in notebooks often involves sifting through extensive log output, which extends resolution time considerably. Additionally, the UI experiences occasional performance delays under high workloads, impacting efficiency when responsiveness is essential.
What problems is the product solving and how is that benefiting you?
Databricks addresses core challenges in managing large-scale data processing, such as scalability limitations in traditional databases and the complexity of integrating disparate tools for ETL workflows. It enables distributed Spark processing across clusters to handle massive datasets efficiently, while Delta Lake provides ACID-compliant storage to ensure data integrity amid evolving schemas or concurrent updates.
This benefits me by streamlining pipelines that feed BI tools, reducing processing times from days to hours and minimizing manual infrastructure oversight. Collaborative notebooks further enhance team productivity through real-time editing, eliminating version control issues and accelerating development cycles.