Databricks Data Intelligence Platform
The Databricks Data Intelligence Platform unlocks the power of data and AI for your entire organization. Enjoy up to $400 in usage credits during your 14-day free trial. Cancel anytime. After your trial ends, you will automatically be enrolled into a Databricks pay-as-you-go plan.
Reviews (822)
Sachin G.
Eliminates the fragmentation tax for ML teams, but Unity Catalog migration takes patience
Reviewed on Jun 03, 2026
Review provided by G2
What do you like best about the product?
I use Databricks to manage end-to-end machine learning pipelines, specifically training and deploying multi-agent models and recommendation engines.
What I appreciate most about Databricks is how it completely eliminates the coordination overhead (the fragmentation tax) between our data engineering and data science teams. Before Databricks, we were losing hours every day moving data between unmanaged data lakes, proprietary data warehouses, and our isolated machine learning compute clusters. Having MLflow natively managed inside the Databricks workspace is a massive advantage for my day-to-day workflow. I no longer have to worry about setting up tracking servers or maintaining infrastructure just to log my training metrics, because Databricks handles the automatic updates and maintenance seamlessly. Every experiment is automatically tracked, and the model registry handles version control, making the handoff from experimentation to production deployment incredibly smooth. Additionally, the recent updates to MLflow for evaluating GenAI agents, specifically the ability to use trace-derived baselines to generate runnable evaluation scripts, have saved me countless hours of manual assembly.
What do you dislike about the product?
The transition to Unity Catalog has been a significant hurdle for our team. Upgrading our legacy workspace to support Unity Catalog's centralized access control and lineage tracking involved a steep learning curve, especially when dealing with privilege inheritance and ensuring the correct schema privileges were granted across the board. Furthermore, while the platform beautifully abstracts away a lot of DevOps work, it can obscure underlying infrastructure costs. It is far too easy for an engineer to spin up an oversized compute cluster for a simple exploratory data analysis task, leading to sudden and severe spikes in our monthly cloud bill. You have to be extremely disciplined with setting strict auto-termination policies and cluster management rules to keep costs in check. The user interface can also feel a bit tedious at times, requiring you to click through multiple layers in the Catalog Explorer just to view the model details page and trace table-to-model lineage.
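Editor's note: the auto-termination discipline described above is commonly enforced through a cluster policy. The following is a minimal sketch, assuming the JSON form of Databricks cluster policy definitions; the helper name `build_policy` and the specific limits (60 idle minutes, 4 workers) are hypothetical values chosen for illustration, not settings from the review.

```python
import json

# Hypothetical limits, chosen only for illustration.
MAX_IDLE_MINUTES = 60
MAX_WORKERS = 4


def build_policy(max_idle_minutes: int, max_workers: int) -> dict:
    """Build a cluster policy definition that caps idle time and cluster size."""
    return {
        # Require auto-termination between 10 minutes and the cap,
        # so users cannot disable it by setting the value to 0.
        "autotermination_minutes": {
            "type": "range",
            "minValue": 10,
            "maxValue": max_idle_minutes,
            "defaultValue": min(30, max_idle_minutes),
        },
        # Cap how far an autoscaling cluster may grow.
        "autoscale.max_workers": {
            "type": "range",
            "maxValue": max_workers,
            "defaultValue": min(2, max_workers),
        },
    }


policy = build_policy(MAX_IDLE_MINUTES, MAX_WORKERS)
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of definition an administrator would attach to a policy so that exploratory clusters cannot be created without an idle timeout or above a size limit.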
What problems is the product solving and how is that benefiting you?
The primary problem Databricks solved for us was the massive bottleneck in deploying machine learning models to production. We used to struggle with the classic issue where a model worked perfectly in a local notebook but failed in production due to environment mismatches and a lack of proper version control. By standardizing on Databricks and the managed MLflow environment, we established a strict, documented approval chain that satisfies both our engineering standards and our strict compliance requirements. A real-life example of this was when we recently deployed a multi-agent system for customer churn prevention. We were able to run the inference, monitor the agent's safety and relevance metrics using MLflow's built-in judges, and continuously track the outputs all in one unified platform. This consolidated architecture cut our deployment timelines drastically and significantly reduced the time we spent debugging production errors.
Anita P.
Unified Scalable Data Processing and Machine Learning Platform
Reviewed on Jun 03, 2026
Review provided by G2
What do you like best about the product?
As a Data Scientist working for a mid-size company, my main use case for Databricks is as the central engine for all of our data processing and predictive modeling pipelines. I use it every day to pull raw, dirty data from our cloud storage, explore it with complicated SQL queries, and then create and train machine learning models with PySpark and Python. Basically, it gives our data engineering and data science teams a common place to work on the same huge data sets at the same time without having to endlessly exchange files or credentials.
From a day-to-day workflow perspective, I love the fluidity of the collaborative notebook environment. The ability to work with different languages in the same workspace is a great advantage. I can run an optimized SQL query to pull in a hefty data set in one cell, then process it in the next using PySpark, and visualize it with Python libraries straight after. This fully removes the need to constantly bounce between different tools or IDEs. Another big win for my daily work is the out-of-the-box connection with MLflow. It makes it very easy to roll back to a previous version, automatically tracks hyperparameter tuning, compares several model runs, and manages the full lifecycle of a model. I really enjoy how Databricks takes away the effort of managing Spark clusters: you can spin up a distributed cluster with a few clicks and focus on writing algorithms instead of playing DevOps.
What do you dislike about the product?
Despite all its potential, working with Databricks does come with certain daily difficulties. The most important one for a mid-sized company like ours is the aggressive pricing model for compute costs. The monthly bill can get out of control very rapidly if you're not compulsively watching your cluster configurations and auto-termination settings, especially if a high-memory cluster is unintentionally left running over the weekend. Another major pain point is the built-in Git integration. Databricks Repos has been helpful; however, managing complicated merge conflicts or branches still feels unexpectedly clumsy compared to a regular local IDE like VS Code. Lastly, the learning curve is rather steep for new employees. The user interface can be complicated, and debugging distributed computing failures can be a major bottleneck for junior data scientists getting up to speed.
What problems is the product solving and how is that benefiting you?
The most fundamental problem that Databricks tackled for our business was breaking down the silos between our data engineers and data science team. We saw this effect in the real world recently when we were working on a project to build a fraud detection algorithm. With the prior approach, I would have to submit a ticket to data engineering, wait days for them to extract and clean the data, and then try to train the model locally. The data would be out of date by the time I got it, and my machine would crash all the time owing to memory constraints. With Databricks, I could immediately connect to our Delta Lake, use PySpark to process the huge data volume without any memory issues, and train the model on a scalable cluster, all in the same ecosystem. This one-stop shop decreased our model deployment time from about a month to a couple of days, dramatically improving how fast we deliver actionable business value.
Keshav R.
Excellent for big data team but very tricky to manage costs and access
Reviewed on Jun 03, 2026
Review provided by G2
What do you like best about the product?
The main benefit from the IT side is that Databricks removes the infrastructure headache. Earlier, our data engineers were always asking us to set up Spark clusters, manage libraries, and handle VM failures. Databricks does all this automatically. The auto-scaling is quite smooth; it adds nodes when the workload is high and removes them later, so infrastructure utilization is very efficient.
Also, the integration with AWS and Azure IAM roles is very solid. We can easily connect it with our Active Directory for single sign-on (SSO), which makes user onboarding very fast. My teams also like the notebook sharing feature because they can collaborate without sharing code files over email or Slack.
What do you dislike about the product?
The biggest pain point for IT Operations is cost control. Databricks billing uses DBUs (Databricks Units), and it is very difficult to predict the monthly budget. Another issue is the cluster startup time: it takes around 4 to 7 minutes to spin up a new cluster.
What problems is the product solving and how is that benefiting you?
We are using Databricks to centralize our entire data processing and machine learning pipelines. Before this, data was scattered in different silos, and maintaining different environments for data engineers and data scientists was an operational nightmare.
Now, Databricks gives us a single platform. From an operations perspective, it reduces my team's support ticket load by at least 40% because users can self-serve their clusters within the limits we set. It saves a lot of engineering hours that we used to spend on maintaining open-source Apache Spark infrastructure.
Ranjit P.
Managed Spark Clusters and Collaborative Notebooks That Just Work
Reviewed on Jun 03, 2026
Review provided by G2
What do you like best about the product?
The best thing about Databricks is the managed Spark clusters. Earlier, setting up Apache Spark manually on AWS or Azure was a big headache. Now, with Databricks, I can spin up a cluster with just a few clicks. The auto-scaling feature works very well: when processing heavy data workloads, it automatically adds nodes and removes them when done, which saves some cloud costs.
Also, the collaborative notebooks are amazing. My team members and I can work on the same Python or SQL code at the same time, just like Google Docs. The integration with Delta Lake is also a big plus because it gives ACID transactions directly on cloud storage, so data corruption issues are very rare now.
What do you dislike about the product?
The biggest issue is the pricing. Databricks DBUs (Databricks Units) are quite expensive, and if you are not careful with cluster configurations or leave a cluster running by mistake, the cloud bill will climb very quickly. The cost management tools inside the platform could be much better.
What problems is the product solving and how is that benefiting you?
We are solving the big problem of data silos and slow ETL (Extract, Transform, Load) pipelines. Before Databricks, our data science team and data engineering team were working in different environments, and moving data between them was painful.
Now, Databricks acts as a single Unified Analytics Platform. We ingest raw data into Azure/AWS, clean it using Spark SQL, and the machine learning guys use the same platform to train models. It has reduced our data processing time from hours to minutes, which helps us deliver client projects much faster.
ibrahim d.
Databricks: Unified, Efficient at Scale with Seamless Cloud Integration
Reviewed on Jun 03, 2026
Review provided by G2
What do you like best about the product?
Databricks provides a unified platform and is very efficient at working with large-scale, terabyte-level data. I also like the integration with various cloud services, which is seamless and very helpful. The built-in Apache Spark and very efficient AI/ML workflow orchestration also stand out from other platforms. And Databricks support has been outstanding whenever we have had issues.
What do you dislike about the product?
With features comes cost, and using Databricks at the scale we do (terabytes of data, multi-customer, multi-environment) becomes a cost challenge. Also, the learning curve can be a bit steep for beginners.
What problems is the product solving and how is that benefiting you?
Our primary challenge was managing large volumes of data for multiple customers across different regions. Databricks resolved that challenge very efficiently with its unified platform and very good cloud integration. Our data pipelines are much faster and better orchestrated than ever.
Dipika M.
Databricks Makes Big Data Processing Simple and Boosts Productivity
Reviewed on Jun 02, 2026
Review provided by G2
What do you like best about the product?
I like how Databricks helps process and analyze large volumes of data without a lot of complexity, which saves time and improves productivity.
What do you dislike about the product?
The only downside with Databricks is that it gets expensive with time.
What problems is the product solving and how is that benefiting you?
Auto-scaling clusters have saved us a lot of time. We don't have to worry about managing infrastructure while processing large datasets.
Jagdish S.
Phenomenal Spark Performance, Frustrating UX, and Eye-Watering Bills
Reviewed on Jun 02, 2026
Review provided by G2
What do you like best about the product?
I run a data science team at a mid-sized company where we handle everything from messy data pipelines to heavy-duty machine learning. Databricks is the core engine of our stack. We use it to ingest raw customer telemetry, clean it up, and run massive PySpark jobs to train our predictive models. We also rely heavily on its MLflow integration to manage our model registry and handle deployments. Essentially, it's the infrastructure playground where all our heavy data lifting happens.
The sheer raw performance is unmatched. If you are dealing with massive, bloated datasets that choke local machines or standard cloud instances, Databricks handles them like a beast. The managed Spark environment takes away a massive chunk of the infrastructure headaches involved in setting up clusters from scratch. From a pure data science perspective, having collaborative notebooks where my team can jump in, write Python or SQL concurrently, and instantly visualize data without switching tools is a massive plus. The MLflow integration is also fantastic; being able to track hyperparameters, log artifacts, and register models in the exact same workspace where the data actually lives saves us from fragmented tool sprawl and keeps our MLOps pipelines incredibly tight.
What do you dislike about the product?
The user experience can be deeply frustrating, and the platform often feels like a collection of entirely different tools taped together. The UI is clunky, unintuitive, and constantly changing, which means you waste time just trying to navigate the workspace. Debugging a failed Spark job is also an absolute nightmare—you have to dig through endless layers of convoluted driver and executor logs just to find a simple syntax or out-of-memory error. But my absolute biggest issue is the pricing structure. The billing is completely opaque. They charge you Databricks Units (DBUs) on top of your standard cloud provider's compute costs, and if a junior dev accidentally leaves a high-concurrency cluster running over the weekend without auto-termination strictly configured, you will face an eye-watering bill on Monday.
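Editor's note: the billing structure described above (DBU charges on top of the cloud provider's compute charges) can be sketched as simple arithmetic. This is a minimal illustration only; the function name `cluster_cost` and every figure below (20 DBUs per hour, $0.50 per DBU, $12 per hour of cloud compute, a 60-hour weekend) are hypothetical assumptions, not actual Databricks or cloud pricing.

```python
def cluster_cost(dbu_per_hour: float, dbu_price: float,
                 cloud_cost_per_hour: float, hours: float) -> float:
    """Total cost = Databricks DBU charge + cloud provider compute charge."""
    dbu_charge = dbu_per_hour * dbu_price * hours   # billed by Databricks
    cloud_charge = cloud_cost_per_hour * hours      # billed by the cloud provider
    return dbu_charge + cloud_charge


# Hypothetical cluster left running from Friday evening to Monday morning.
weekend = cluster_cost(dbu_per_hour=20, dbu_price=0.5,
                       cloud_cost_per_hour=12, hours=60)

# Same hypothetical cluster stopped by a 30-minute auto-termination setting.
with_auto_stop = cluster_cost(dbu_per_hour=20, dbu_price=0.5,
                              cloud_cost_per_hour=12, hours=0.5)

print(f"Left running all weekend: ${weekend:,.2f}")
print(f"With auto-termination:    ${with_auto_stop:,.2f}")
```

Under these assumed numbers the idle weekend costs $1,320.00 against $11.00 with auto-termination, which is why reviewers keep stressing that setting.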
What problems is the product solving and how is that benefiting you?
Before moving to Databricks, our data engineering and data science teams were completely siloed. Engineers would dump files into cloud storage, and we would struggle to pull that data, map schemas, and train models without running out of memory. Databricks fundamentally solved this fragmentation. For instance, we recently built a real-time recommendation engine where we needed to process millions of daily user events. With Databricks, we built an end-to-end pipeline that handles the data engineering, trains the model, and exposes the model registry to our production environment under one roof. It cut our time-to-production from months to days, which, despite the UX headaches and high costs, makes it a necessary evil for a company handling data at our scale.
Anupama J.
Prominent when scaling LLMs and pipelines, but be mindful of the cloud bill!
Reviewed on Jun 02, 2026
Review provided by G2
What do you like best about the product?
As an AI researcher, I find that infrastructure is the number one problem, especially setting up clusters, installing drivers, and scaling distributed training. Databricks takes care of all that by itself. I can quickly deploy a cluster of GPU nodes with PyTorch and DeepSpeed preconfigured in a few clicks. The built-in MLflow is a lifesaver for keeping track of experiments. All the hyperparameter or architecture changes to an embedding model are automatically tracked every time. Most importantly, with Delta Lake I no longer have to struggle to get clean, versioned datasets from data engineers for training. Navigating feature stores is also very easy with Unity Catalog.
What do you dislike about the product?
First, it's really expensive, brother. On a very large A100 GPU cluster, if you or someone on your team forgets to configure auto-termination, you are going to have a very bleak day with finance tomorrow. Expenses can add up quickly. Additionally, it is not an ideal platform for distributed deep learning, and the debugging workflow can be tedious. With work spread across compute nodes, it is difficult to find the exact PyTorch or CUDA out-of-memory error in the Spark logs. I also feel that the native MLflow UI in Databricks isn't as advanced and specialized as tools like Weights & Biases.
What problems is the product solving and how is that benefiting you?
It fills the yawning void between AI research and data engineering. In the past, preparing terabytes of unstructured text data for pre-training was a multi-step nightmare across different environments. Now I can do heavy data preparation using Spark and immediately switch to Python to train my model in the same ecosystem. It also fosters goodwill across the entire team. Everything is in one workspace, so my path from raw data to experiment tracking to finally registering the model in the registry runs in one unified pipeline.
Khushi S.
Databricks is super fast with big data, yet slow to learn.
Reviewed on Jun 02, 2026
Review provided by G2
What do you like best about the product?
I work as a Data Analyst, and every day I use Databricks to complete my data tasks. The thing I like best is the processing speed. Previously, loading large tables with normal databases took too long. My complex SQL queries are very fast in Databricks thanks to the Apache Spark backend.
In addition, the Notebook feature is quite useful to me. I can write SQL code in one cell and, in the next cell, write Python or Pandas code to do some specific data cleaning. It is also easy to connect Databricks to our Power BI dashboards.
What do you dislike about the product?
There are some things I am facing issues with. The first is the cluster startup time. In the morning, it takes 5-10 minutes to boot once I log in. When management requests an urgent report, I have to sit and wait until the cluster turns green.
What problems is the product solving and how is that benefiting you?
The primary issue it resolves is processing large volumes of company data without the system freezing. Previously, it was a pain to deal with millions of rows. I can now easily query, filter, and aggregate large datasets.
It is helping my team since data engineers and data analysts share the same workspace. When data engineers create a new table, I can see it right away and query it directly in my notebook. Sharing notebooks with other team members for review is similar to using Google Docs, which makes my everyday reporting work incredibly quick.
Dilkash N.
Best tool to work with big dealer data, but requires technical team.
Reviewed on Jun 02, 2026
Review provided by G2
What do you like best about the product?
We have a very big dealer and distributor network throughout India in our sanitaryware business. A large volume of sales data is generated every day. Speed is my favorite thing about Databricks. Earlier, when we used simple Excel or old software, it was always hanging. Now our company data team churns through millions of rows in a short time. As a Senior Sales Specialist, I reliably get my territory dashboard and forecasting reports every morning. It brings all the dispersed data together in a good manner.
What do you dislike about the product?
The worst thing is that it is highly technical software. As a salesperson, I cannot use it to look up data directly. I need to ask the data engineering or IT team to write code or a query every time I want a new custom report. For non-technical users, the interface is very complicated. And my management is continually saying that it costs a great deal.
What problems is the product solving and how is that benefiting you?
We are solving target tracking and inventory matching problems. At Cera, we have a wide range of products, such as tiles, faucets, and washbasins. Databricks helps our company understand which region is selling which product more and what the market trends are. This benefits me because I can advise my local dealers accordingly on next month's orders. It provides highly precise sales forecasts and saves me manual reporting time, and I am closing my sales targets with ease.