Databricks Data Intelligence Platform
Databricks, Inc.
External reviews
External reviews are not included in the AWS star rating for the product.
Integrating engineering and learning, but cost challenges arise with cluster management
What is our primary use case?
I usually handle data ingestion and create warehouses. I also assist other teams, such as analytics, to create reports or perform other tasks.
What is most valuable?
Having one solution for everything, from data engineering to machine learning, is beneficial since everything comes under one hood.
What needs improvement?
We often use a single cluster for ingestion in Databricks, which Databricks doesn't recommend. They suggest using job clusters instead. This can be overwhelming for us because we started small.
We prefer using a small to mid-sized cluster for many jobs to keep costs low, but this sometimes doesn't support our operations properly. We need to stay in sync with the DBR (Databricks Runtime) versions, and migrations can pose challenges. For example, issues arose when we moved a cluster from a previous version to the latest one. We could use their job clusters, but that increases costs, which is challenging for us as a startup. Maintaining this infrastructure can be a headache.
For how long have I used the solution?
I have worked at a couple of companies, not just the current one, and I have about 20 to 25 months of experience with Databricks.
What do I think about the stability of the solution?
They release patches that sometimes break our code. These patches are supposed to fix issues, but sometimes they cause disruptions.
What do I think about the scalability of the solution?
The patches have sometimes caused issues that paused our jobs for about six hours. Fortunately, nothing important is currently running on Databricks. If there were, it would be a significant issue.
How are customer service and support?
They are good. My company has a contract with them that includes good support. Whenever we reach out, they respond promptly.
How would you rate customer service and support?
Neutral
What was our ROI?
With the benefits we receive, the price is reasonable. However, it's important to have good use cases. If it's just for data ingestion, it might not be the best solution price-wise. For a lot of different tasks, including machine learning, it is a nice solution.
What other advice do I have?
I would rate the solution seven out of ten. That rating also depends on the contract we have with Databricks.
It's still a solid and good rating. I work as a data engineer and Databricks engineer.
Powerful and Intuitive
Reduces job time for ETL on data tables.
Provides seamless integration capabilities, but the cluster management features need improvement
What is our primary use case?
We use the product as a data science platform that enables me to handle and analyze large datasets efficiently.
What is most valuable?
Databricks can switch easily between cloud providers, such as Azure and GCP. It allows seamless integration with various data platforms and cloud providers, facilitating better data handling and analysis.
What needs improvement?
Switching to higher-performing virtual machines takes longer than on other platforms like Snowflake, and this delay could be reduced. The ease and speed of managing clusters can also be improved, especially when scaling up resources. They could add more advanced data storage solutions like Iceberg and Delta files.
For how long have I used the solution?
I have been using Databricks for approximately two years.
What do I think about the stability of the solution?
I rate the product stability a seven out of ten.
What do I think about the scalability of the solution?
I rate the product scalability an eight.
How are customer service and support?
The technical support services are good.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup was straightforward. However, configuring policies could have been simpler.
What's my experience with pricing, setup cost, and licensing?
The product pricing is moderate.
Which other solutions did I evaluate?
I evaluated other options, including Snowflake, before choosing Databricks.
What other advice do I have?
Databricks is a robust solution for big data processing, offering flexibility and powerful features. While there are areas for improvement, especially in performance and cluster management, it remains a highly valuable tool in my data science toolkit.
I rate it a seven.
Manager Data Science
Onboarding can be smoother
The onboarding process is not smooth. Once account setup begins, there is no way to move to a new email if the previous one has not yet been activated. There is also no way to know which email was used for the subscription sign-up.
Had a great impressive experience
Best product for both data lake and data warehouse; it reduces cost and delivers data faster.
Cost reduction
Integration with visualization tools is a bit complex
Unified analytics platform
It is a platform for both data engineering and data science.
Flexibility with different types of data.
Scalability and performance.
Integration with cloud services
Collaboration features
Warehousing for real-time or near-real-time data
Maintenance for complex deployments
Easier adoption of advanced analytics tools.
Helps in real-time data processing.
Adopting ACID properties in the lakehouse helps with data quality and readability.
Databricks: a perfect data platform for Python users
Ahead of the competition in building data ecosystems, but needs to improve ease-of-use
What is our primary use case?
I worked with Databricks fairly recently, where it was part of a specific design and architecture process.
We have used the solution for the overall data foundation ecosystem, for processing and storage in Delta format. We have also seen use cases where we were trying to establish advanced analytics models and data sharing, where we leveraged the Delta Sharing capabilities from Databricks.
What is most valuable?
A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem.
What needs improvement?
There are some aspects of Databricks, like generative AI, where they are positioning things like Dolly. They're a little late to the game, but I think there are some things they are working on. They are also catching up in areas like data governance and enterprise features. These are places where Databricks has to be faster, and even though they are fast, I'm not sure how they'll catch up and get adopted because there are strong players in the market.
Databricks is coming up with a few good things in terms of integration, but one point covers multiple aspects: ease of use for the end user operating the tool. For example, a tool like ADS gives you GUI-based development, which is good for the end user who does development or maintenance. Given the complexities of data integration, a GUI might not be easy to build, but Databricks should embrace something on the graphical development front because it is currently notebook-driven.
In terms of data access for the end user, Databricks has an SQL interface, similar to earlier tools like SQL Server Management Studio. Since many people are already comfortable with SSMS, Databricks could build integrations with known tools for data access, which would help alongside what they're already doing. I would like to see improvements in user enablement, which is an important part of enterprise strategy.
I would also like to see integration with a broader ecosystem of products. If you have to do data governance in tools like Microsoft Purview, it's manual and difficult. I'm unsure whether that momentum must come from Databricks or Microsoft, but it would be good if Databricks had some open interfaces to share metadata, which could be viewed in data governance tools like Collibra, Purview, or Informatica. In short, the improvement has to do with user and metadata integration for tools.
For how long have I used the solution?
I've worked with Databricks for over five or six years, but it's been on and off.
What do I think about the scalability of the solution?
The solution is scalable. In this particular ecosystem, there is no one else who can catch up with Databricks for now.
How are customer service and support?
Databricks' customer support is very good. They have a lot of ways in which they interact with vendors and service partners across the globe. They have periodic touch-up sessions with vendors, where their engineers answer your questions.
How was the initial setup?
The implementation is not challenging because the solution integrates well with the platforms on which they are established, whether it's Azure, AWS, or GCP. The solution is not difficult to set up, but you'd probably need a technical user to operate it.
It's the same story with maintenance, where you'd need a technically proficient person with programming knowledge to maintain it.
What other advice do I have?
Databricks integrates with many enterprise processes because data processing and AI/ML are a small part of a larger ecosystem. Databricks has been a part of other platforms, and they are now trying to establish their own platform, which is a good direction.
Most of the capabilities of the underlying platform can be leveraged there. If Databricks lacks some capability, or you're not comfortable with a certain feature, that isn't a major obstacle because it integrates well with the underlying platform. For example, if you are uncomfortable with its workflow management, you can use integrations with EDA or another tool to perform scheduling. Even if what you're trying to do is not easy, it is enabled through integration. Either they build the required feature into their tool later on, like a GUI, or you perform integrations to make the features possible.
We did evaluate licensing costs, but it had more to do with the Azure ecosystem pricing since whatever we are doing has more to do with Azure Databricks. Many optimizations are recommended, but we haven't exercised those for now. But considering that the processing is a bit more efficient, the overall price won't be much different from what it could be for any other similar component or technology. We haven't had specific discussions with Databricks' folks on pricing.
My advice to users who would like to start working with Databricks is that it is a good solution for data integration and machine learning. Databricks is still maturing for other use cases, so there are two points to consider. First, you need to evaluate how those capabilities will mature, which will be on a case-by-case basis. Second, you need to consider how they will align with the overall platform story. There will be many overlapping aspects as Databricks expands its capabilities. If those capabilities overlap, how will the underlying platform vendors handle it? How would that interplay happen if many of Databricks' new capabilities align with Microsoft Fabric? That has to be considered very carefully. Otherwise, if you rely on those new capabilities, there might be a discontinuity where you cannot use Databricks because the platform does not support them.
If I specifically talk about Spark-based processing transformations, the data integration story, and advanced stability, I would rate Databricks around eight out of ten. However, with respect to new capabilities like cataloging, data governance, and security integration, I rate Databricks around five because it has to establish these features. And since Databricks integrates with platforms, we must see the interplay with the platforms' capabilities.
Overall, I rate Databricks a seven out of ten.