Reviews from AWS customer

10 AWS reviews

External reviews

792 reviews

External reviews are not included in the AWS star rating for the product.


5-star reviews

    Dharun T.

Streamlined, Collaborative Data Workflows with Powerful Performance

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
What I like most about Databricks is how it streamlines the entire data workflow by bringing processing, analysis, and machine learning into one platform. The collaborative notebook environment makes it easy to share code, context, and reasoning with teammates, which helps everyone stay aligned. It also performs strongly on large datasets while abstracting away most of the cluster management, so I can focus on solving the problem rather than dealing with infrastructure. On top of that, centralized access control and clear visibility into data usage support responsible data governance, offering a solid balance between power and ease of use.
What do you dislike about the product?
Databricks has a few downsides, although many of them feel more like trade-offs than outright negatives. My biggest concern is cost: if clusters aren’t managed carefully, expenses can climb quickly, even though the platform can scale very efficiently when it’s tuned properly. There’s also a real learning curve with Spark and distributed computing concepts, and debugging or performance tuning can be more involved than with simpler tools. Lastly, because it’s a managed service, you give up some low-level control compared with self-hosted systems, but the upside is that it takes a lot of the operational and infrastructure work off your plate.
What problems is the product solving and how is that benefiting you?
Because my client needs secure, reusable code, Databricks helps us write Python efficiently while applying OOP principles and design patterns. It also makes it straightforward to extend functionality over time and build custom code that interacts with APIs and databases.


    Magesh kumar N.

Effortless Setup, Minimal Configuration Required

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
I use Databricks to create pipelines and data models, and I really like its minimal need for configuration. It helps me reduce the time spent on configuring accounts and processes. Databricks manages these tasks well, making my work easier. The initial setup was straightforward too, thanks to the guidance provided through the playground feature.
What do you dislike about the product?
My suggestion is for Genie to be updated to include validations and table mapping.
What problems is the product solving and how is that benefiting you?
I find Databricks makes my work easy by minimizing the need for configuration and automating workflows, saving me time.


    Vijayaramuprawin V.

All-in-One Platform That Helps Us Iterate Fast and Deploy with Confidence

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
We use Databricks daily as our core data platform for building and running pipelines across a medallion architecture, from extracting data out of SAP and Arkieva all the way to reporting-ready datasets. The notebook experience is intuitive, the feature set is massive, and Asset Bundles have made our CI/CD story with Azure DevOps really solid. Integration with cloud services was smooth, and once things are set up they just work. The learning curve can be steep for newer team members, especially around things like Unity Catalog and DABs, and costs can creep up if you're not staying on top of cluster configurations. Support is decent and the docs are strong enough that we rarely need to open a ticket. Overall, it's a powerful platform that does a lot under one roof, and it's hard to imagine our data engineering workflow without it.
What do you dislike about the product?
The cost can creep up fast if you're not careful with cluster sizing and job configurations, so it takes some effort to keep things optimized. Also, the learning curve for newer team members can be steep, especially around things like Asset Bundles, Unity Catalog, and getting the CI/CD pieces wired up properly.
What problems is the product solving and how is that benefiting you?
Databricks is solving the problem of having fragmented data spread across multiple systems like SAP and Arkieva by giving us one unified platform to extract, transform, and serve it all. That means our business teams get clean, reliable, reporting-ready data without us having to juggle a bunch of separate tools, and we can deploy and manage everything consistently across environments with confidence.


    FABIN P.

Databricks: All-in-One Solution for Data and Analytics

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
What I like most about Databricks is that it brings everything into one place, making it easy to work on data, build models, and manage workflows. It helps teams collaborate easily in real time. It also works very fast with large data using Apache Spark, and features like automation and Delta Lake make handling big data much simpler.
What do you dislike about the product?
One thing I dislike about Databricks is that it can be expensive, especially for large workloads. Sometimes the interface and setup can feel complex for beginners. Also, managing clusters and configurations can take some effort if you’re not very familiar with it.
What problems is the product solving and how is that benefiting you?
Databricks solves the problem of handling large amounts of data efficiently.
It brings data engineering, analysis, and machine learning into one platform.
This removes the need to use multiple tools.
It helps in faster data processing using Apache Spark.
It makes collaboration easier for teams.
It simplifies building and managing data pipelines.
It improves data reliability with features like Delta Lake.
It reduces manual work through automation.
It saves time and effort in daily tasks.
Overall, it helps me work faster and more efficiently with data.


    MOHAN R.

Efficient Data Processing with Robust Governance

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
I use Databricks for data engineering tasks like cleaning, analysis, and performing ETL from source environments to the cloud. I find it very good for governance, and it's easy to process the data. The catalog feature is particularly useful for governance, making it easier to manage data efficiently. Databricks offers very fast processing and efficient governance capabilities, which is why my team switched from ADF to Databricks. Additionally, the initial setup was very easy to understand.
What do you dislike about the product?
The reporting features need to improve.
What problems is the product solving and how is that benefiting you?
Databricks makes governance straightforward and simplifies data processing for our projects. Its catalog is essential for data governance. We switched from ADF to Databricks for its fast processing and efficiency in governance.


    Sabareeswar K.

Databricks: Intuitive, Unified Platform with Seamless Integrations and Fast Support

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
As a data engineer, Databricks has become my go-to platform for end-to-end data work. The ease of use is outstanding: notebooks, Delta Live Tables, and Genie all have intuitive interfaces that reduce ramp-up time significantly. Implementation was smooth thanks to excellent documentation and responsive customer support that actually resolves issues fast. I use it daily, and the sheer number of features, from Unity Catalog to AI/BI Genie, keeps growing. Integration with cloud storage, BI tools, and ML frameworks is seamless, making it a true unified platform.
What do you dislike about the product?
One challenge is the lack of cost transparency at a granular job level: it's difficult to pinpoint exactly which pipeline or notebook is driving up DBU consumption without investing in custom monitoring. Auto-scaling clusters, while powerful, can silently balloon costs overnight if not carefully configured with proper limits. Additionally, the SQL warehouse tiers can be confusing to choose from upfront, making budget planning tricky for teams. A built-in cost allocation dashboard per job or user would be a huge improvement for day-to-day cost governance.
What problems is the product solving and how is that benefiting you?
Databricks has eliminated the silos between our data engineering, analytics, and ML teams. Previously, we juggled multiple tools for ingestion, transformation, and reporting. Now everything lives in one lakehouse. Genie specifically has been a game-changer: business stakeholders can ask natural language questions directly against our data without writing SQL, which dramatically reduces ad-hoc request bottlenecks for our engineering team. Decision-making is faster, data is more democratized, and we've cut our reporting pipeline overhead by a significant margin.


    Supriya M.

A Reliable Workhorse for Data Engineering and Analytics

  • April 01, 2026
  • Review provided by G2

What do you like best about the product?
The unified platform approach is what I appreciate most. Having notebooks, data engineering pipelines, ML workflows, and SQL analytics all in one place saves a ton of time instead of juggling multiple tools. The collaborative notebooks make it easy to share work with teammates, and the cluster management has gotten a lot smoother over time. Delta Lake integration is also a huge plus for keeping our data reliable and consistent.
What do you dislike about the product?
The cost can get out of hand pretty quickly if you're not careful with cluster sizing and uptime. It's not always obvious how to optimize spending, and the pricing model feels complex. The learning curve for new team members is also steeper than I'd like, especially for people who aren't already familiar with Spark. Sometimes the UI can feel sluggish when working with larger notebooks, and debugging job failures could be more straightforward.
What problems is the product solving and how is that benefiting you?
Databricks helps me resolve complex ETL pipeline failures and persistent data quality issues in supply chain analytics by unifying batch and streaming processing from SAP systems with Delta Live Tables. It also removes a lot of the infrastructure management headaches thanks to auto-scaling clusters, so I can stay focused on writing code for multi-terabyte workloads instead of constantly worrying about cluster sizing.

For my manufacturing data projects, Databricks accelerates development cycles from weeks to days via collaborative notebooks and DLT pipelines, enabling faster Power BI reporting and stakeholder decisions. Unity Catalog centralizes governance across Azure and SAP sources, preventing schema drift that plagued prior Hive-based lakes.


    Gowtham s.

Feature-Rich Databricks: Genie, Lakehouse Connect, and Streaming Tables Shine

  • March 31, 2026
  • Review provided by G2

What do you like best about the product?
Databricks has many features compared to other platforms. The key ones I’ve noticed are Genie, Lakehouse Connect, and streaming tables.
What do you dislike about the product?
One thing I’ve noticed in Databricks is that we aren’t able to deploy alerts from one environment to another.
What problems is the product solving and how is that benefiting you?
Databricks addresses key data challenges like siloed tools, scalability limits, and complex governance in modern analytics.


    Balakumaran R.

From Hive Chaos to Unity Catalog - Worth Every DBU

  • March 31, 2026
  • Review provided by G2

What do you like best about the product?
Unity Catalog has been the single biggest value-add for our enterprise migration. We moved from a Hive Metastore architecture to Unity Catalog and gained centralized governance, lineage tracking, and fine-grained access control across all our data assets without bolting on third-party tools. For a multi-domain organization (finance, manufacturing, supply chain, procurement), having one catalog that enforces consistent naming and permissions across bronze, silver, gold, and platinum layers saved us weeks of manual policy work.

UI/UX: The notebook experience with inline Spark SQL and PySpark, combined with the workspace file browser, makes it straightforward for our team to develop and test transformations iteratively. The SQL editor for ad-hoc queries against Unity Catalog tables is clean and responsive.

Integrations: Native Delta Lake support means we don't manage format conversions. The Azure Key Vault integration via secret scopes (dbutils.secrets.get) keeps credentials out of code. ADF integration for orchestration in our V1 environment was seamless, and Databricks Asset Bundles (DAB) for V2 deployment give us a clean CI/CD path with databricks.yml configs targeting dev/qa/prod without custom scripting.

Performance: Switching to CTEs over temp views in our Gold notebooks reduced cluster memory pressure noticeably. The ability to right-size clusters per environment (1 worker for dev, 3 for production) with Standard_D4ds_v5 nodes keeps costs predictable while maintaining performance for our batch ETL workloads.

Pricing/ROI: The pay-as-you-go compute model paired with single-user security mode clusters means we're not over-provisioning. Consolidating our ETL, governance, and BI serving layer into one platform eliminated licensing for separate catalog, orchestration, and data quality tools.

AI/Intelligence (Genie): Genie Spaces have been an unexpected win. Our business analysts in finance and supply chain can ask natural language questions against curated Gold/Platinum tables without writing SQL. It reduced the number of ad-hoc report requests coming to the data team by giving domain users a self-service path that still respects Unity Catalog permissions.

Support/Onboarding: The documentation is thorough, and the skills-based approach to learning (bundles, Unity Catalog, jobs, SQL) maps well to how our team actually works. Onboarding new engineers to the V2 architecture took about half the time compared to V1 because the platform conventions (medallion architecture, asset bundles, catalog naming) are well-documented and consistent.
What do you dislike about the product?
UI/UX: The notebook editor still feels behind dedicated IDEs. There is no native multi-file search, refactoring support is limited, and the git integration UI is clunky for teams managing dozens of notebooks across workflow bundles. We ended up doing all real development in VS Code and treating the Databricks workspace as a deployment target, which adds friction. The workspace file browser also doesn't handle folder structures well when you have 50+ notebooks organized by domain: there's no filtering, tagging, or favorites.

Integrations: Databricks Asset Bundles (DAB) are a step forward, but the documentation has gaps for complex multi-bundle deployments. We run a shared Global_Utilities bundle that other workflow bundles depend on, and getting cross-bundle references to work reliably across dev/qa/prod targets required significant trial and error. The ADF-to-Databricks integration works, but debugging failed pipeline runs means jumping between the ADF monitoring UI and Databricks job runs with no unified view. A tighter handshake between orchestration and compute monitoring would save hours of troubleshooting.

Performance: Cluster cold-start times remain a pain point for development workflows. Spinning up a single-node Standard_D4ds_v5 cluster takes 4-7 minutes, which breaks flow when you're iterating on notebook logic. Serverless compute helps but isn't available for all workload types yet, and the cost premium is hard to justify for dev/test environments.

Pricing/ROI: The DBU pricing model is opaque for capacity planning. Estimating monthly costs for a project with 30+ scheduled jobs, interactive development clusters, and SQL warehouse queries requires building custom spreadsheets because the built-in cost management tools don't give you a clear forecast by workflow or domain. We've been surprised by cost spikes from jobs that ran longer than expected with no easy way to set per-job budget alerts.

Support/Onboarding: Enterprise support response times are inconsistent. Critical issues with Unity Catalog permissions during our migration took 3-5 business days for initial triage, which stalled our deployment timeline. The community forums are helpful for common patterns, but for Unity Catalog edge cases (cross-catalog lineage, complex permission inheritance), the knowledge base is thin.

AI/Intelligence: Genie is promising but still rough for production use. It struggles with joins across more than 3-4 tables, sometimes generates incorrect SQL against our Gold layer, and there's no easy way to curate or correct its responses to improve accuracy over time. Our business users got excited, tried it, hit wrong answers on moderately complex questions, and lost trust. A feedback loop where domain experts can flag and correct Genie's outputs would make it genuinely production-ready.
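
The Pricing/ROI point above describes estimating monthly costs by hand in custom spreadsheets. As a rough illustration of that kind of estimate, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `ScheduledJob` fields, the job and domain names, and the per-DBU rate are hypothetical and are not Databricks pricing or a Databricks API. It also ignores the underlying cloud VM charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledJob:
    """One scheduled job. All fields are illustrative assumptions."""
    name: str
    domain: str
    runs_per_month: int
    avg_runtime_hours: float
    dbus_per_hour: float  # DBUs the job's cluster is assumed to consume per hour


def monthly_cost(job: ScheduledJob, usd_per_dbu: float) -> float:
    """Estimated monthly DBU spend for one job (cloud VM charges excluded)."""
    return job.runs_per_month * job.avg_runtime_hours * job.dbus_per_hour * usd_per_dbu


def cost_by_domain(jobs, usd_per_dbu: float) -> dict:
    """Sum the estimated monthly spend per business domain."""
    totals = {}
    for job in jobs:
        totals[job.domain] = totals.get(job.domain, 0.0) + monthly_cost(job, usd_per_dbu)
    return totals


def over_budget(jobs, usd_per_dbu: float, budget_usd: float) -> list:
    """Names of jobs whose estimated monthly spend exceeds a per-job budget."""
    return [job.name for job in jobs if monthly_cost(job, usd_per_dbu) > budget_usd]
```

Feeding it a list of jobs with measured runtimes gives a per-domain forecast and a simple per-job budget check, which is roughly what the spreadsheet approach achieves; the rate has to come from the actual contract or price list.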
What problems is the product solving and how is that benefiting you?
Data Governance Fragmentation → Unified Catalog: We struggled with a Hive Metastore environment where table ownership, access control, and lineage were managed through a patchwork of manual documentation and custom scripts. After implementing Unity Catalog, we now have centralized governance across 4 catalog layers (bronze, silver, gold, platinum) spanning 6 business domains. What used to take a full-time data steward to track manually is now enforced automatically through catalog-level permissions and lineage. This cut our access provisioning time from days to under an hour per request.

Siloed ETL Logic → Standardized Medallion Architecture: Before Databricks, our ETL pipelines were inconsistent: different teams wrote transformations differently, with no shared utilities or patterns. We built a standardized framework (Batch_Utilities.py) with reusable functions for schema validation, merge operations, data quality checks, and audit column management. Every notebook across all domains now follows the same 7-cell structure. This reduced new notebook development time from 2-3 days to roughly 4 hours, and onboarding a new developer to the pattern takes a single afternoon instead of a week.

Costly Report Refresh Failures → Reliable Pipeline Orchestration: We had recurring issues with Power BI reports pulling stale or incomplete data because upstream jobs failed silently. With Databricks Jobs and metadata-driven pipeline tracking (pipeline status, start/end timestamps logged per run), we now catch failures at the transformation layer before they propagate to reports. Report data freshness issues dropped by approximately 80%, and our finance team stopped scheduling "data verification" meetings that used to consume 3-4 hours per week.

Multi-Environment Deployment Chaos → Asset Bundles: Deploying notebooks across dev, QA, and production used to involve manual file copies and environment-specific config edits, which was error-prone and slow. Databricks Asset Bundles gave us declarative databricks.yml configs with variable substitution per target. A deployment that took 45 minutes of manual steps now runs in under 5 minutes via CLI. We deploy with confidence because the same bundle definition is validated before it hits production.

Self-Service Analytics Gap → Genie + Platinum Layer: Business analysts in supply chain and finance were fully dependent on the data team for any ad-hoc analysis. By building denormalized Platinum tables optimized for reporting and exposing them through Genie Spaces, we enabled self-service querying in natural language. Early adoption has reduced ad-hoc report requests to the data team by roughly 30%, freeing up engineering capacity for new feature development.

Cost Visibility → Right-Sized Compute: We were over-provisioning clusters because we had no clear view of actual utilization. By standardizing on Standard_D4ds_v5 nodes with environment-specific worker counts (1 for dev/QA, 3 for production) and single-user security mode, we reduced our monthly compute spend by approximately 25% compared to the shared cluster model we ran in V1.
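
The metadata-driven pipeline tracking mentioned above (pipeline status plus start/end timestamps logged per run) can be sketched roughly as follows. This is not the reviewer's framework: `tracked_run`, `failed_runs`, the record fields, and the pipeline names are all hypothetical. An in-memory list stands in for whatever tracking table a real setup would write to.

```python
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def tracked_run(run_log: list, pipeline_name: str):
    """Append one record per run with its status and start/end timestamps."""
    record = {
        "pipeline": pipeline_name,
        "status": "RUNNING",
        "start_ts": datetime.now(timezone.utc),
        "end_ts": None,
        "error": None,
    }
    run_log.append(record)
    try:
        yield record
        record["status"] = "SUCCEEDED"
    except Exception as exc:
        record["status"] = "FAILED"
        record["error"] = repr(exc)
        raise  # re-raise so a failure is never silent
    finally:
        # Runs on both success and failure, so every record gets an end time.
        record["end_ts"] = datetime.now(timezone.utc)


def failed_runs(run_log: list) -> list:
    """Pipelines whose latest logged attempt in run_log failed or never finished."""
    latest = {}
    for record in run_log:
        latest[record["pipeline"]] = record["status"]
    return [name for name, status in latest.items() if status != "SUCCEEDED"]
```

A downstream step, such as a report refresh, could call `failed_runs` first and skip the refresh when the list is not empty, which is one way to keep stale or incomplete data from reaching reports.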


    Karuppusamy V.

Databricks Makes End-to-End Data Workflows Fast, Collaborative, and Easy

  • March 31, 2026
  • Review provided by G2

What do you like best about the product?
What I like most about Databricks is how it simplifies the entire data workflow. Instead of switching between multiple tools for data processing, analysis, and machine learning, everything is available in one place. The notebook environment makes collaboration really smooth; it feels natural to work with teammates, share code, and explain logic without extra effort.

Another thing I appreciate is the performance. Working with large datasets can usually be painful, but Databricks handles it efficiently in the background. You don’t have to worry much about managing clusters or optimizing everything manually; it just works most of the time, which lets you focus more on solving the actual problem rather than dealing with infrastructure.

What also stands out is the way it handles data governance and organization. With features like centralized access control and better visibility into data usage, it becomes much easier to manage data responsibly, especially in larger projects. Overall, it gives a good balance between power and ease of use, which is why I enjoy working with it.
What do you dislike about the product?
One thing I don’t particularly like about Databricks is that it can get expensive pretty quickly, especially if clusters are not managed properly. If you forget to terminate clusters or run heavy workloads without optimization, costs can spike without much visibility at first. For teams that are still learning or experimenting, this can become a concern.

Another downside is that debugging can sometimes feel a bit tricky, particularly when working with distributed jobs. Errors are not always straightforward, and tracing issues across multiple nodes can take extra time compared to working in a simpler local environment. It requires a certain level of experience to quickly understand and fix issues.

Also, while the platform is powerful, it has a bit of a learning curve for beginners. Concepts like cluster configuration, job scheduling, and data governance are not always very intuitive at the start. It takes some hands-on time before you feel fully comfortable navigating and using everything efficiently.
What problems is the product solving and how is that benefiting you?
What Databricks really solves is the problem of handling large-scale data without making the process overly complex. Earlier, working with big data meant dealing with multiple tools, managing infrastructure, and spending a lot of time just setting things up. Databricks simplifies all of that by bringing data engineering, analytics, and machine learning into one place, so the focus shifts more toward solving actual business problems instead of managing systems.

It also addresses performance and scalability issues. When working with huge volumes of data, traditional systems often struggle or slow down. Databricks handles this efficiently in the background, allowing workloads to scale without much manual effort. For me, this means I can process large datasets faster and run transformations or queries without constantly worrying about performance tuning.

Another big problem it solves is collaboration and data management. In many projects, teams struggle with version control, access management, and keeping data consistent. Databricks makes it easier to collaborate, track changes, and control who can access what. This helps me work more smoothly with others, reduces errors, and ensures that the data I’m using is reliable and well-governed.