AWS Marketplace: Databricks Data Intelligence Platform Reviews

Financial Services

So user-friendly, and a platform that makes the organization's data value chain deliver value

October 10, 2023
Review provided by G2

What do you like best about the product?

A unified platform for both BI and AI workloads.

What do you dislike about the product?

It is too difficult to keep up with the pace at which the platform is evolving.

What problems is the product solving and how is that benefiting you?

It's helping to realise the paradigm of data-centric AI.

Pranshu G.

Data Lake combined with Data Warehouse benefits

October 07, 2023
Review provided by G2

What do you like best about the product?

It offers ACID transactions, which are a massive support for data consistency. Along with this, features such as time travel and schema evolution come in really handy while building a scalable solution. In addition to all of the above, it reduces data storage costs without compromising on powerful distributed programming.

What do you dislike about the product?

With all these features combined, it truly is a powerful tool; however, it can be a real challenge for new users to master. BI users and analysts who aren't skilled in programming may find it difficult to understand the workflow. Moreover, the community for this tool is currently relatively small, which limits community support.

What problems is the product solving and how is that benefiting you?

The business needs to keep updating Power BI dashboard reports, whose number is increasing day by day. As a solution, we are using the lakehouse's ACID guarantees and features such as schema evolution to clean and transform data, and we build BI dashboards from the cleaned data.
This solution has eliminated our dependency on already saturated data warehouse resources. It has also helped with debugging, as all data is processed and resides in one place. Last but not least, it has reduced our data warehouse costs by 20%.
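As a rough illustration of the time travel and schema evolution features this review mentions, here is a toy, in-memory sketch of a versioned table. It is not Delta Lake's API or implementation; the class `ToyVersionedTable` and its methods are hypothetical names made up for this example. The idea it assumes is simply that every commit creates a new numbered version, a read can target any earlier version, and a commit may introduce new columns.

```python
class ToyVersionedTable:
    """Toy append-only table (hypothetical, for illustration only)."""

    def __init__(self):
        # One entry per committed version: (column names, rows added in that commit).
        self._commits = []

    def schema(self, version=None):
        """Column names as of a version (latest by default)."""
        if not self._commits:
            return []
        if version is None:
            version = len(self._commits) - 1
        return list(self._commits[version][0])

    def commit(self, rows):
        """Append a batch of rows all-or-nothing; return the new version number."""
        # Validate the whole batch first so a bad row leaves the table untouched.
        if not all(isinstance(row, dict) for row in rows):
            raise TypeError("every row must be a dict")
        columns = self.schema()  # copy of the latest schema
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)  # schema evolution: accept new columns
        self._commits.append((columns, [dict(row) for row in rows]))
        return len(self._commits) - 1

    def read(self, version=None):
        """Rows visible as of a version; columns a row lacks are filled with None."""
        if version is None:
            version = len(self._commits) - 1
        columns = self.schema(version)
        visible = []
        for _, batch in self._commits[: version + 1]:
            for row in batch:
                visible.append({name: row.get(name) for name in columns})
        return visible


table = ToyVersionedTable()
v0 = table.commit([{"id": 1, "amount": 10.0}])
v1 = table.commit([{"id": 2, "amount": 5.0, "currency": "EUR"}])

print(table.schema(v0))  # schema before the new column was added
print(table.read(v0))    # "time travel": only the first commit is visible
print(table.read())      # latest version, with the evolved schema
```

Reading at `v0` returns the table as it looked before the second commit, which is the property that makes debugging and reproducing old dashboard numbers easier.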

Amulya S.

Databricks: a perfect data platform for python users

October 05, 2023
Review provided by G2

What do you like best about the product?

The UI is built with ML and Python users in mind; it is very intuitive to use.

What do you dislike about the product?

The speed of processing is slow and could be improved

What problems is the product solving and how is that benefiting you?

Easy integration with Python notebooks and big data. The pipeline became much more efficient.

Senthil K.

Databricks Genie Code - Agentic Applied AI for end-to-end SDL lifecycle

October 03, 2023
Review provided by G2

What do you like best about the product?

Genie Code

1) Genie Code automated our ETL processes, reducing manual effort and increasing efficiency. With Agentic’s SDL, we implemented CI/CD pipelines for faster, seamless updates and deployments.

2) Genie Code streamlined complex STTM mappings, improving accuracy and speed. Agentic’s real-time updates ensured mapping adjustments were made dynamically to align with changing transaction data.

3) We defined automated unit tests using SKILL.md, ensuring data transformations are validated before deployment. This reduced errors and ensured data quality, boosting confidence in our analytics.

4) Using Skills.md, we added custom extensions to Genie Code, such as integrating third-party data for enriched reports. This agility allowed us to quickly adapt to business needs and deliver new capabilities.

5) Agentic’s SDL enabled real-time data processing, providing immediate analytics for decision-making. Our marketing and sales teams now act on fresh data instantly, improving response times and overall efficiency.

What do you dislike about the product?

I hope these can be improved in the next update:

Debugging issues in complex workflows can be time-consuming due to limited visibility into intermediate data transformations.

Genie Code lacks advanced error recovery mechanisms, making it difficult to manage failures in large-scale data pipelines.

As data volume increases, Genie Code’s performance can degrade, requiring significant manual adjustments to ensure smooth operation at scale.

What problems is the product solving and how is that benefiting you?

1) Scalable Processing - Built on Databricks' Spark-based architecture, Genie Code efficiently handles and scales processing for massive datasets, ensuring performance even with increasing data volumes.

2) Genie Code automates end-to-end ETL workflows, from data extraction to transformation and loading, streamlining data operations and eliminating manual tasks.

3) Real time collaboration - Genie Code enables real-time collaboration across teams by using shared notebooks, making it easier for data professionals to build and refine workflows collectively.

Filippo C.

A complete platform for data science and engineering

September 19, 2023
Review provided by G2

What do you like best about the product?

Cluster creation is now made easy through a simple configuration page.
Workspace allows you to organise all your notebooks in one place.
Job mode lets you schedule notebook execution and plan dev/prod pipelines.

What do you dislike about the product?

Data visualization in notebook output cells is basic, though it is good enough for simple applications. The dashboard section could be made clearer. These are, however, minor complaints.

What problems is the product solving and how is that benefiting you?

Databricks helps me save time when developing code and running jobs at scheduled times.
The autocomplete tool is very efficient, especially when dealing with very long code, and installing Python packages or Java libraries is no longer a problem.

Santosh M.

Databricks Lakehouse

September 08, 2023
Review provided by G2

What do you like best about the product?

It's an awesome data warehouse platform that helps extract data or metadata from the data lakehouse. Its data tables help us build AI/ML models.

What do you dislike about the product?

All services are good; there is nothing to dislike.

What problems is the product solving and how is that benefiting you?

It solves data science and machine learning problems.

Financial Services

A Toolbox for the Modern Big Data Scientist

September 05, 2023
Review provided by G2

What do you like best about the product?

The ability to scale the storage and retrieval of large quantities of data with its SDK to S3. In addition, great resource allocation support and additional tools such as ClearML.

What do you dislike about the product?

Compatibility with pandas is lacking. I mainly use it with PySpark, which didn't allow optimal use of the various pandas libraries.

What problems is the product solving and how is that benefiting you?

Retrieving and querying a very large data warehouse on S3 (several hundred TB). Performing basic filtering and querying on the data and running ML experiments on huge amounts of data.

Felix V.

Great tool for data exploration and development, not so much for production pipelines

August 23, 2023
Review provided by G2

What do you like best about the product?

Easy to set up processes and iterate.
Shareability

What do you dislike about the product?

Not tailored for production integration
Hard to incorporate without being Databricks-aware, which leads to vendor lock-in

What problems is the product solving and how is that benefiting you?

Gaining data visibility
Developing spark jobs towards production

Nabil Fegaiere

A powerful solution that is easily integrated into a variety of platforms

August 21, 2023
Review provided by PeerSpot

What is our primary use case?

I am a Databricks service partner, and my customers use Azure Databricks and Data Factory.

What is most valuable?

It's very simple to use Databricks Apache Spark. It's really good for parallel execution to scale up the workload. In this context, the usage is more about virtual machines.

Using meta-stores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.

What needs improvement?

I would like more integration with SQL for using data in different workspaces. We use the user interface for some functionalities, while for others, we have to use SQL to create data sets and grant permissions. For example, when creating a cluster, we have to create it with some API or user interface. Creating a cluster with some properties using SQL grants the possibility of using SQL syntax. Integration with SQL will make Databricks easier to use by people who have experience with databases like Lakehouse, and they would be able to use the data lake and BI. More integration will help have one point of view for everyone using SQL syntax.

Integration with Kubernetes could also be good for minimizing the price because you can use Kubernetes instead of virtual machines. But that won't be easy.

For how long have I used the solution?

I have worked with the solution for four or five years, with some experience since 2016.

What do I think about the stability of the solution?

The solution is stable. The only problem with stability would be that people are not using it efficiently.

What do I think about the scalability of the solution?

The solution is good for scalability.

How was the initial setup?

When we have administration experience, the solution is not difficult to deploy. Technically, however, it's difficult because governance is more complex. For example, I have two warehouses on Databricks, which are clusters in this workspace, and we have to switch from workspace to workspace to have all this information. There is a system table that has all this, but I don't know if everyone can use these tables.

What's my experience with pricing, setup cost, and licensing?

Databricks is not costly when compared with other solutions' prices.

Which other solutions did I evaluate?

Databricks's functionalities are as good as solutions like Snowflake, BigQuery, and Redshift.

What other advice do I have?

People sometimes do not use the solution efficiently. They misunderstand databases, the usage of tables, and the performance. Many data engineers are very junior and don't have skills in that. Stability is more a customer problem than a problem with the product itself. One possible problem with the product is that there's no method to pause the usage of something. For example, we have to use the meta server or the data catalog in Synapse. But in Databricks, we have a choice to use a catalog or not, or Hive, which is always integrated, but we have to choose whether to use it or not. Many customers directly use the passes on Databricks, which causes performance and governance problems.

I can offer a lot of advice on Databricks, and one is to use meta stores like Unity Catalog or Hive Metastore. For incoming use cases, it's better to use Unity Catalog.

I rate Databricks a nine out of ten.

Avadhut Sawant

Ahead of the competition in building data ecosystems, but needs to improve ease-of-use

August 16, 2023
Review from a verified AWS customer

What is our primary use case?

I worked with Databricks pretty recently. The particular design processes involved in Databricks were also a part of that specific design/architectural process.

We have used the solution for the overall data foundation ecosystem for processing and storage on a Delta format. We have also seen use cases where we were trying to establish advanced analytics models and data sharing where we leverage the Delta Sharing capabilities from Databricks.

What is most valuable?

A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem.

What needs improvement?

There are some aspects of Databricks, like generative AI, where they are positioning things like Dolly. They're a little bit late to the game, but I think there are some things that they are working on. Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster, and even though they are fast, I'm not sure how they'll catch up and get adopted because there are strong players in the market.

Databricks is coming up with a few good things in terms of integration. But I have to put one point forward that covers multiple aspects, which is the ease of use for the end user while operating this particular tool. For example, a tool like ADS gives you a GUI-based development, which is good for the end user who does development or maintenance. Looking at the complexities of data integration, a GUI might not be easy, but Databricks should embrace something on the graphical user development front because it is currently notebook-driven. Also, in terms of accessing the data for the end user, Databricks has an SQL interface, similar to earlier tools like SQL Management Studio. Since people are mostly comfortable with SSMS already or not, Databricks can build integration to known tools for data access, and that also helps, apart from what they're doing. I would like to see improvements with respect to user enablement, which is a good part of enterprise strategy. I would like to see their integration with a broader ecosystem of products. If you have to do data governance in tools like Microsoft Purview, it's manual and difficult. Now, I'm unsure if that momentum must be from Databricks or Microsoft. But it would be good if Databricks had some open interfaces to share metadata, which could be viewed in tools enabling data governance like Collibra, Purview, or Informatica. The improvement has to do with user and metadata integration for tools.

For how long have I used the solution?

I've worked with Databricks for over five or six years, but it's been on and off.

What do I think about the scalability of the solution?

The solution is scalable. In this particular ecosystem, there is no one else who can catch up with Databricks for now.

How are customer service and support?

Databricks' customer support is very good. They have a lot of ways in which they interact with vendors and service partners across the globe. They have periodic touch-up sessions with vendors, where their engineers answer your questions.

How was the initial setup?

The implementation is not challenging because the solution integrates well with the platforms on which they are established, whether it's Azure, AWS, or GCP. The solution is not difficult to set up, but you'd probably need a technical user to operate it.

It's the same story with maintenance, where you'd need a technically proficient person with programming knowledge to maintain it.

What other advice do I have?

Databricks integrates many enterprise processes because data processing and AI/ML are a small part of a larger ecosystem. Databricks has been a part of other platforms, and they are trying to establish their platform, which is a good direction.

Most of the capabilities of the underlying platform can be leveraged there. But the setup isn't difficult if the database lacks some capability, you can't find it in the database, or you're not comfortable with a certain feature in the database. It integrates well with the underlying platform. For example, with scheduling, let's say you are uncomfortable with workflow management. You can utilize integrations with EDA for any other tool and probably perform scheduling. Even if what you're trying to do is not easy, it is enabled with integration. Either they build a required feature in their tool later on, like a GUI, or you perform integrations to make the features possible.

We did evaluate licensing costs, but it had more to do with the Azure ecosystem pricing since whatever we are doing has more to do with Azure Databricks. Many optimizations are recommended, but we haven't exercised those for now. But considering that the processing is a bit more efficient, the overall price won't be much different from what it could be for any other similar component or technology. We haven't had specific discussions with Databricks' folks on pricing.

My advice to users who would like to start working with Databricks is that it is a good solution to work with for data integration and machine learning. Databricks is maturing for other use cases, so there are two points to be considered. One is that you need to evaluate how they will mature, which will be on a case-to-case basis. Second, how will it align with the overall platform story? There will be many overlapping aspects over there as Databricks expands its capabilities. In that case, it must be considered that if those capabilities overlap, how will the underlying platform vendors handle it? How would that interplay happen if many of Databricks' new capabilities align with Microsoft Fabric? That has to be very carefully considered. Otherwise, if you utilize those new capabilities, there might be a discontinuity where you cannot use Databricks because the platform does not support that.

If I specifically talk about Spark-based processing transformations, the data integration story, and advanced stability, I would rate Databricks around eight out of ten. However, with respect to new capabilities like cataloging, data governance, and security integration, I rate Databricks around five because it has to establish these features. And since Databricks integrates with platforms, we must see the interplay with the platforms' capabilities.

I overall rate Databricks a seven out of ten.
