I am a Databricks service partner, and my customers use Azure Databricks and Data Factory.
 
                        Databricks Data Intelligence Platform
Databricks, Inc. - External reviews
Databricks - A Breath of Fresh Air in Big Data
The platform has a solution for every data person, including, but not limited to, notebooks that work with Scala, Python, R, and SQL; a traditional SQL editor; downloadable datasets; and in-house visualisations just a click away!
It is very user friendly, and a platform that helps the organization's data value chain deliver value.
A data lake combined with data warehouse benefits
This solution has eliminated dependency on our already saturated data warehouse resources. It has also helped with debugging, as all data is processed and resides in one place. Last but not least, it has reduced our data warehouse costs by 20%.
Databricks: a perfect data platform for Python users
Databricks Unleashed - Unlocking Data Insights and Streamlining Analytics with Databricks
Auto Loader and schema evolution capabilities with CDC usage
Delta Live Tables serverless pipelines
Data quality expectations
Databricks Workflows
Databricks SQL warehouse - Photon SQL endpoints
Unity Catalog for data governance & security
Ease of use with Partner Connect & integrations
Vendor lock-in if we use more Databricks-specific Delta features
Learning curve for PySpark-related work, though not for SQL coding
Building the catalog for centralized governance
Workflow orchestration
Integrations with cloud & data storage layers
Data sharing with external customers through Delta Sharing & the marketplace
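The Auto Loader and data-quality expectations praised above can be sketched together in Delta Live Tables SQL; the table name, constraint, and source path below are hypothetical:

```sql
-- Hypothetical DLT sketch: ingest JSON files incrementally with Auto Loader
-- (read_files over a stream) and drop rows that fail a quality expectation.
CREATE OR REFRESH STREAMING TABLE raw_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT *
   FROM STREAM read_files(
     '/Volumes/main/sales/landing/orders',  -- hypothetical source path
     format => 'json'
   );
```

Rows violating the expectation are dropped rather than failing the pipeline, and the violation counts surface in the pipeline's data quality metrics.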
A complete platform for data science and engineering
Workspace allows you to organise all your notebooks in one place.
Job mode allows you to schedule notebook execution and to plan dev/prod pipelines.
The autocomplete tool is very efficient, especially when dealing with very long code, and installing Python packages or Java libraries is no longer a problem.
Databricks Lakehouse
A Toolbox for the Modern Big Data Data Scientist
Great tool for data exploration and development, not so much for production pipelines
Shareability
Hard to incorporate without being Databricks-aware, which leads to vendor lock-in
Developing spark jobs towards production
A powerful solution that is easily integrated into a variety of platforms
What is our primary use case?
What is most valuable?
Databricks' Apache Spark is very simple to use. It's really good for parallel execution to scale up workloads; in this context, the usage is more about virtual machines.
Using metastores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.
What needs improvement?
I would like more SQL integration for using data across different workspaces. We use the user interface for some functionality, while for other tasks we have to use SQL to create datasets and grant permissions. For example, clusters have to be created through an API or the user interface; being able to create a cluster with given properties through SQL syntax would help. Tighter SQL integration would make Databricks easier to use for people with database experience, letting them work with the lakehouse, the data lake, and BI, and it would give everyone a single point of view through SQL syntax.
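The permission side of this already works the way the reviewer wants: on Unity Catalog securables, grants are plain SQL. A sketch, where the catalog, schema, table, and group names are hypothetical:

```sql
-- Hypothetical Unity Catalog grants: let the `analysts` group read one table.
-- Access requires privileges at every level of the three-part namespace.
GRANT USE CATALOG ON CATALOG main              TO `analysts`;
GRANT USE SCHEMA  ON SCHEMA  main.sales        TO `analysts`;
GRANT SELECT      ON TABLE   main.sales.orders TO `analysts`;
```

Cluster creation, by contrast, still goes through the UI or the Clusters API, which is the gap the review describes.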
Integration with Kubernetes could also be good for minimizing the price because you can use Kubernetes instead of virtual machines. But that won't be easy.
For how long have I used the solution?
I have worked with the solution for four or five years, with some experience since 2016.
What do I think about the stability of the solution?
The solution is stable. The only problem with stability would be that people are not using it efficiently.
What do I think about the scalability of the solution?
The solution is good for scalability.
How was the initial setup?
Deployment is not difficult when you have administration experience. Technically, however, governance is more complex. For example, I have two warehouses on Databricks, which are clusters in this workspace, and we have to switch from workspace to workspace to see all this information. There is a system table that holds it, but I don't know whether everyone can use those tables.
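The system tables mentioned here can be queried with ordinary SQL, assuming the `system.compute` schema has been enabled for the account; a sketch:

```sql
-- Sketch: list clusters across all workspaces in the account from the
-- system tables, instead of switching from workspace to workspace.
SELECT workspace_id, cluster_id, cluster_name
FROM system.compute.clusters
ORDER BY workspace_id;
```

Access to system tables is itself governed by Unity Catalog grants, which is likely why the reviewer is unsure whether everyone can use them.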
What's my experience with pricing, setup cost, and licensing?
Databricks is not costly compared with other solutions' prices.
Which other solutions did I evaluate?
Databricks' functionalities are as good as those of solutions like Snowflake, BigQuery, and Redshift.
What other advice do I have?
People sometimes do not use the solution efficiently. They misunderstand databases, the usage of tables, and performance; many data engineers are very junior and lack skills in that area. Stability is therefore more a customer problem than a problem with the product itself. One possible problem with the product is that there is no way to enforce the use of a metastore. In Synapse, for example, we have to use the metadata server or the data catalog, but in Databricks we can choose whether or not to use a catalog, or Hive, which is always integrated. Many customers access data directly by path on Databricks, which causes performance and governance problems.
I can offer a lot of advice on Databricks, and one point is to use metastores like Unity Catalog or the Hive Metastore. For upcoming use cases, it's better to use Unity Catalog.
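In practice, following this advice mostly means addressing data through Unity Catalog's three-level namespace instead of raw paths; a minimal sketch with hypothetical names:

```sql
-- Hypothetical: create a governed, managed table under Unity Catalog's
-- catalog.schema.table namespace rather than reading files by direct path.
CREATE TABLE main.analytics.daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM main.sales.orders
GROUP BY order_date;
```

Tables registered this way inherit Unity Catalog's access controls and lineage, which avoids the direct-path governance problems described above.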
I rate Databricks a nine out of ten.