Databricks Data Intelligence Platform
Databricks, Inc. · External reviews
768 reviews
External reviews are not included in the AWS star rating for the product.
Fast, Governed Self-Service Data Exploration with Databricks Genie
What do you like best about the product?
As a data engineer, I use Databricks Genie to interact with data in natural language, while still relying on the same governed tables, metrics, and semantic models that my team has built. Instead of jumping straight into SQL notebooks for every exploratory ask, I or business users can phrase questions in plain language and let Genie translate them into structured, catalog‑aware queries. This keeps self‑service fast but also secure and governed.
What do you dislike about the product?
Laptop stability when multitasking
My laptop can hang or become noticeably sluggish when I’m working with multiple Genie tabs and dashboards at the same time, especially during heavier queries or more demanding visualizations. This hurts the overall user experience and can slow down iterative development and analysis.
Latency with complex data models
With very wide schemas or more complex semantic models, Genie sometimes selects suboptimal joins or an overly broad/narrow level of granularity. As a result, I still need to review the generated SQL and optimize it myself. In that sense, it remains a helpful assistant rather than a fully autonomous query engine.
What problems is the product solving and how is that benefiting you?
In a recent project, the business wanted to understand a decline in customer‑lifetime‑value (CLV) in a specific region. A product manager used Genie to explore CLV trends by region and cohort, excluding refunds, directly from an AI/BI dashboard. From that conversation, I captured the core logic, wrapped it into a Delta Live Tables pipeline, and scheduled it as a recurring job. This reduced ad‑hoc requests by roughly 30–40% and enabled ongoing self‑serve access to CLV insights while I focused on tuning performance and data‑quality rules.
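A minimal sketch of what that pipeline step might look like as a Delta Live Tables table definition. The table and column names (orders, region, cohort_month, order_amount) are illustrative assumptions, not the project's actual schema, and `import dlt` only resolves inside a DLT pipeline:

```python
# Hypothetical DLT table capturing the CLV-by-region-and-cohort logic,
# with refunded orders excluded as in the Genie conversation.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="CLV by region and cohort, refunds excluded")
def clv_by_region_cohort():
    orders = spark.read.table("main.sales.orders")
    kept = orders.filter(F.col("order_status") != "refunded")
    return (
        kept.groupBy("region", "cohort_month")
            .agg(F.sum("order_amount").alias("lifetime_value"),
                 F.countDistinct("customer_id").alias("customers"))
    )
```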
Overall, Genie helps me talk with my data in natural language, improves how quickly we uncover insights, and supports better data‑quality practices—though working across many Genie‑backed tabs can strain local hardware and sometimes slow down the workflow.
Databricks: A Unified, Scalable Platform for Faster Collaboration and Innovation
What do you like best about the product?
Databricks stands out because it provides a unified platform that seamlessly combines data engineering, machine learning, and analytics, making collaboration across teams much easier. I especially appreciate how it simplifies working with big data by integrating with popular tools like Apache Spark, offering scalability, and enabling faster experimentation. The collaborative notebooks, strong support for multiple programming languages, and built-in security features make it both powerful and user-friendly. Overall, it helps accelerate innovation by reducing complexity and improving productivity across the entire data lifecycle.
What do you dislike about the product?
One drawback of Databricks is that it can feel overwhelming for new users because of its complexity and steep learning curve. The platform offers a wide range of powerful features, but navigating them effectively often requires significant technical expertise. Additionally, costs can escalate quickly if clusters are not managed carefully, and performance tuning sometimes demands deep knowledge of Spark internals. Integration with certain external tools can also be less seamless compared to other platforms.
What problems is the product solving and how is that benefiting you?
Databricks is solving the challenge of managing and analyzing massive amounts of data by providing a unified platform for data engineering, machine learning, and analytics. It eliminates the need to juggle multiple tools, making workflows more streamlined and collaborative. For me, this means faster access to insights, easier experimentation with models, and reduced complexity in handling big data. The benefit is clear: improved productivity, better collaboration across teams, and quicker decision-making powered by reliable data.
Databricks: Feature-Rich, User-Friendly, and Keeps Everything in One Platform
What do you like best about the product?
Among the various platforms I’ve worked with, Databricks stands out as a genuinely cohesive environment. It feels less like a bundle of disconnected features and more like a unified workspace—one that can evolve alongside the teams using it. The interface is intuitive enough to lower the barrier to entry, while still delivering the depth and power needed for heavy-duty engineering.
One of its biggest strengths is how it consolidates the data lifecycle. By bringing engineering, data science, and SQL analytics under one roof, it helps dissolve the silos that often lead to “data drift” and miscommunication between departments. In practice, it also simplifies the underlying infrastructure, replacing a dozen specialized (and sometimes conflicting) tools with a single, clearer source of truth.
Beyond simply “keeping things clean,” the platform also shines when it comes to collaborative transparency. With notebooks and experiments shared in real time, the gap between an initial data idea and a production-ready model can be dramatically shortened. On top of that, its commitment to open standards like Delta Lake means you’re not boxed into a proprietary black box—you’re building on a foundation that aligns with the broader data community’s direction. Overall, it strikes a rare balance: a polished, user-friendly wrapper around some of the most powerful distributed computing engines available today.
What do you dislike about the product?
The “Big Task” Breakdown
When Genie processes a large volume of data, it often ends up sending a huge amount of JSON back to the browser so it can render those tables and visualizations.
Memory overload: Browsers (and especially Chrome) can be real memory hogs. If a Genie response includes a very large result set or a massive execution plan, RAM usage can spike quickly, which can lead to that familiar “Not Responding” hang.
The “DOM” lag: Every row in a table and every line of code becomes an element the browser has to keep track of. As you scroll or type, the browser has to repaint thousands of these elements. When the task is too large, the browser’s main thread can get tied up rendering, and your typing starts to feel like it’s trailing behind by a few seconds.
What problems is the product solving and how is that benefiting you?
The core reason Databricks is winning over so many data teams is that it reduces the “integration tax.” In most companies, you can easily lose around 30% of your time just moving data between the “storage” tool, the “processing” tool, and the “BI” tool.
The AI/BI Dashboard is a great example of this broader shift—from a “collection of tools” to a more unified platform.
What began as a basic visualization layer has evolved into a “Compound AI” system. Here’s how it has become so useful:
The “Ask Genie” integration: You’re no longer limited to staring at a static chart. As of 2026, every published dashboard includes an “Ask Genie” button by default. If a stakeholder notices a spike in a line chart, they don’t have to call you; they can right-click the chart and ask, “Genie, why did this drop on Tuesday?” and it will use Agent mode to track down the driver.
Direct-to-warehouse speed: Because it lives inside Databricks, there’s no need to “extract” data to a separate BI server. It queries the data where it already lives (Unity Catalog), which means the dashboard stays as fresh as your last ETL run.
AI-assisted authoring: You can build entire widgets just by describing what you want. Instead of dragging fields around, you can type, “Show me a funnel chart of our sales conversion by region,” and it generates the SQL and the visualization for you.
Deep governance: Since it’s built in, your security policies (row-level security, tags) follow the data automatically. You don’t have to recreate permissions in a separate tool like Tableau or Power BI.
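As a concrete illustration of that built-in governance, Unity Catalog lets you attach a row filter to the table itself, so every consumer (Genie, dashboards, external BI) inherits it. A minimal sketch, with illustrative function, table, and group names:

```python
# Define a SQL UDF that decides row visibility, then bind it to the table.
# Admins see everything; the EMEA analyst group sees only EMEA rows.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.gov.region_filter(region STRING)
  RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admins'), TRUE, region = 'EMEA')
""")

spark.sql("""
  ALTER TABLE main.sales.orders
  SET ROW FILTER main.gov.region_filter ON (region)
""")
```

Once set, the filter applies at query time for any tool reading through Unity Catalog, which is why the permissions don't need to be recreated downstream.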
Databricks Genie Nails Unity Catalog Migrations with Context-Aware Guidance
What do you like best about the product?
Databricks Genie's contextual understanding of Unity Catalog is genuinely impressive. While working through a complex UC migration, navigating three-level namespaces, volume paths, security modes, and widget-driven SQL execution, Genie reasoned through the specifics instead of falling back on generic answers. It really speaks the UC migration language, which cuts down on a lot of back-and-forth and makes troubleshooting feel more direct. Overall, the platform is powerful for managing large-scale data engineering work across Python, Scala, and notebook-based pipelines, all in one place.
What do you dislike about the product?
My biggest frustration with Genie is the lack of persistent session memory. On a long-running migration project with 60+ test cases and multiple interconnected components, having to re-establish context every session creates real overhead. Genie also struggles with cross-component reasoning: it handles individual notebooks well, but tracing issues across multiple layers of a framework is still largely a manual effort. Occasionally, the responses feel overly cautious when what’s needed is a more direct, confident answer.
What problems is the product solving and how is that benefiting you?
We’re using Databricks to carry out a full Unity Catalog migration for a large, automated ingestion framework, moving off the legacy Hive Metastore while also upgrading the runtime environment. Databricks provides a unified platform where the migration work, testing, and validation can all happen in one place. During testing, Genie in particular helped speed up root-cause analysis; for example, it pinpointed why a data extraction notebook was failing to resolve UC-managed table references and identified that adding a USE CATALOG statement was the fix. That kind of targeted, context-aware assistance directly reduces investigation time during complex migrations.
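A minimal sketch of that fix, with illustrative catalog and schema names: once the session's default catalog is set, legacy two-level references resolve correctly against Unity Catalog's three-level namespace.

```python
# Set the session defaults so unqualified names resolve under UC.
spark.sql("USE CATALOG prod_catalog")
spark.sql("USE SCHEMA ingestion")

# Resolves to prod_catalog.ingestion.raw_events via the session defaults.
df = spark.read.table("raw_events")

# Fully qualified three-level names work regardless of session defaults.
df2 = spark.read.table("prod_catalog.ingestion.raw_events")
```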
Centralized Governance and Fine-Grained Access Control with Unity Catalog
What do you like best about the product?
What I like best about Unity Catalog in Databricks is its ability to provide centralized data governance and fine-grained access control across all data assets, making it easier to manage and secure data in a collaborative environment.
What do you dislike about the product?
I created a notebook with more than 70 cells that I use to parse XML files. When I try to debug issues using Genie, it doesn’t work properly and ends up hanging.
What problems is the product solving and how is that benefiting you?
I used Spark functions to define the XML structure dynamically, assigning mpid and mpparentid as needed. This approach has been very beneficial for me.
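One way to implement that pattern, as a hedged sketch: walk each XML document, emit one row per element, and assign a synthetic mpid with the parent's id as mpparentid. The traversal, schema, and volume path are illustrative assumptions, not the exact code.

```python
import xml.etree.ElementTree as ET
from pyspark.sql.types import StructType, StructField, LongType, StringType

def flatten(xml_string):
    """Emit (mpid, mpparentid, tag, value) rows for every element."""
    rows, counter = [], [0]
    def walk(node, parent_id):
        counter[0] += 1
        mpid = counter[0]
        rows.append((mpid, parent_id, node.tag, (node.text or "").strip()))
        for child in node:
            walk(child, mpid)
    walk(ET.fromstring(xml_string), None)  # root row has mpparentid = NULL
    return rows

schema = StructType([
    StructField("mpid", LongType()),
    StructField("mpparentid", LongType()),
    StructField("tag", StringType()),
    StructField("value", StringType()),
])

# Read each file as a single string, then flatten in parallel.
raw = spark.read.text("/Volumes/main/raw/xml_files/", wholetext=True)
flat = raw.rdd.flatMap(lambda r: flatten(r.value)).toDF(schema)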
Intuitive, Limitless Analytics for End-to-End Data Pipelines
What do you like best about the product?
It’s very intuitive, and the breadth of data and analytics you can do with it is limitless.
You can create a medallion architecture, create data pipelines, create jobs, dashboards, data governance, etc.
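For example, a minimal bronze-to-silver step of that medallion pattern might look like the sketch below; the paths and table names are hypothetical.

```python
from pyspark.sql import functions as F

# Bronze: land raw files as-is, stamped with ingestion metadata.
bronze = (spark.read.json("/Volumes/main/landing/events/")
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.mode("append").saveAsTable("main.bronze.events")

# Silver: clean and deduplicate for downstream jobs and dashboards.
silver = (spark.read.table("main.bronze.events")
          .dropDuplicates(["event_id"])
          .filter(F.col("event_id").isNotNull()))
silver.write.mode("overwrite").saveAsTable("main.silver.events")
```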
What do you dislike about the product?
I feel like some features in newer releases can be a bit buggy at times, but after a while those things usually get better.
What problems is the product solving and how is that benefiting you?
We have a data and analytics platform and we use Databricks as our key vendor. Our relationship with them has been great and they’ve been super helpful the whole way.
Simplifies Data Engineering, Needs Better Tool Integration
What do you like best about the product?
I like the features of Genie, especially the new Genie Code capability, which makes it possible to get SQL-ready scripts just by chatting in natural language. This is fascinating, especially with Unity Catalog's governance layer on top. It accelerates both analysts' and engineers' jobs by helping build reports and get them ready efficiently, since Genie has access to most of the metadata. The documentation support is also useful, suggesting SQL code that can be provisioned on the fly. Tying Genie to AI functions such as ai_query makes it a superpower.
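For instance, ai_query can be called from governed SQL to run a model over catalog data. A small sketch; the serving endpoint name and table are assumptions for illustration:

```python
# Call a model-serving endpoint per row via the ai_query SQL function.
summaries = spark.sql("""
  SELECT ticket_id,
         ai_query(
           'databricks-meta-llama-3-3-70b-instruct',
           CONCAT('Summarize this support ticket in one sentence: ', body)
         ) AS summary
  FROM main.support.tickets
  LIMIT 10
""")
summaries.show(truncate=False)
```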
What do you dislike about the product?
Honestly, there are a ton of features that could be improved, especially connectivity with other cloud tools, particularly Azure. As a Microsoft employee, I evangelize Databricks, but many of our clients use the Microsoft stack extensively, and sometimes these tools feel isolated from the rest of it. There is still a lot of work to be done to connect models provisioned in Azure with things like Unity Catalog, or to let governance sit outside of both Databricks and Microsoft's stack. This feels like a disconnect, especially in highly regulated environments where on-prem systems need to interact with Databricks capabilities.
What problems is the product solving and how is that benefiting you?
Databricks simplifies provisioning services, streamlines data engineering, and speeds up workflow creation. It combines tools into one governed platform, making handling big data easier and faster. Its AI layer integrates well, reducing the need for multiple tools.
Genie Code Agent Mode Made Our Migration to Databricks Fast and Accurate
What do you like best about the product?
Genie Code (Databricks Assistant Agent) — I’m currently working on migrating existing workloads from ADF and SQLMI to Databricks. As part of that, I need to convert stored procedures and ADF dataflows into Databricks notebooks. Initially, we refactored all the code manually, but once Agent Mode was available in preview, we tried using it to convert the stored procedures and dataflows into Databricks PySpark code. I was impressed by the accuracy: it handled about 90% of the code conversion without errors, aside from some case-handling and similar adjustments.
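An illustrative before/after of that kind of conversion: a T-SQL upsert stored procedure becomes a Delta MERGE in PySpark. The procedure logic and table names are hypothetical, not the actual workload.

```python
# -- Original T-SQL (sketch):
# --   UPDATE t SET name = s.name FROM dim_customer t JOIN staging s ON ...
# --   INSERT INTO dim_customer SELECT ... WHERE NOT EXISTS (...)
from delta.tables import DeltaTable

staging = spark.read.table("main.staging.customers")
target = DeltaTable.forName(spark, "main.gold.dim_customer")

# Equivalent PySpark: a single atomic MERGE on the Delta table.
(target.alias("t")
 .merge(staging.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdate(set={"name": "s.name", "email": "s.email"})
 .whenNotMatchedInsertAll()
 .execute())
```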
Also, Lakeflow Connect helped me connect SharePoint and SFTP data to Databricks more easily.
What do you dislike about the product?
It’s not a major issue, but in my project the client asked us to generate table and column descriptions using AI in Unity Catalog. For each environment, these descriptions vary, and I have around 300 tables just in the Bronze zone. Having to click into each table and generate AI descriptions one by one is very time-consuming, and the results are not consistent across environments.
It would be much more efficient if we had an option to generate descriptions at the schema level, and if there were an information schema or system tables that stored table and column descriptions as metadata. That way, we could easily replicate them across environments. In some cases, clients also have source system documentation we could leverage to generate more accurate table and column descriptions.
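As a partial workaround today, Unity Catalog does surface comments through each catalog's information_schema, so a script along these lines could copy descriptions between environments. Catalog and schema names are illustrative, and the quoting/escaping is simplified for the sketch:

```python
# Read column comments from the source environment's information_schema.
comments = spark.sql("""
  SELECT table_name, column_name, comment
  FROM dev_catalog.information_schema.columns
  WHERE table_schema = 'bronze' AND comment IS NOT NULL
""").collect()

# Replay them as ALTER statements against the target environment.
for row in comments:
    spark.sql(
        f"ALTER TABLE prod_catalog.bronze.{row.table_name} "
        f"ALTER COLUMN {row.column_name} COMMENT '{row.comment}'"
    )
```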
What problems is the product solving and how is that benefiting you?
One of my main scenarios was migrating all the existing stored procedures and ADF dataflows into Databricks notebooks. Doing this manually took more than 6 hours to complete both the development and the validation. Later, we used the Agent Mode preview and converted 80+ medium-to-complex stored procedures and 20+ ADF dataflows into Databricks notebooks. This saved more than 100 hours, and it also generated validation scripts for each table to close out unit testing.
Apart from the Agent Assistant, we also used external volumes. Previously, we relied on the Azure library for file processing in ADLS storage, but we ran into rate-limit issues, couldn’t process in parallel, and sometimes the job would abort. After we created an external volume pointing to the required ADLS container, we achieved parallel processing and faster reads and writes without custom Python code.
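A sketch of that external-volume setup, with placeholder names and an illustrative abfss URL: bind the ADLS container once, then read through the /Volumes path with ordinary Spark parallelism instead of per-file SDK calls.

```python
# Register the ADLS container as a Unity Catalog external volume (one-time).
spark.sql("""
  CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.landing_files
  LOCATION 'abfss://landing@mystorageacct.dfs.core.windows.net/files'
""")

# Spark lists and reads the files in parallel; no rate-limit handling needed.
df = spark.read.option("header", True).csv("/Volumes/main/raw/landing_files/")
```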
Databricks: A True Unified Analytics & AI Platform That Boosts Speed and Reliability
What do you like best about the product?
What I like best about Databricks is how it finally delivered what every data engineer/data professional has been wishing for — a true unified analytics and AI platform.
I remember working across five different tools just to get a single pipeline from ingestion to reporting. Databricks collapsed all of that into one environment, and that changed everything for me.
Delta Lake was the first breakthrough. When it arrived around 2020, ACID transactions and time‑travel immediately eliminated the operational pain we used to consider “normal.” If a job corrupted a table, I could roll back to a previous version in seconds instead of spending hours restoring backups. That reliability alone prevented multiple downstream failures.
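A minimal sketch of that rollback, with an illustrative table name and version number:

```python
# Inspect the table as it was before the bad job run (time travel).
old = spark.read.option("versionAsOf", 41).table("main.gold.orders")

# Roll the table back in place, by version or by timestamp.
spark.sql("RESTORE TABLE main.gold.orders TO VERSION AS OF 41")
spark.sql("RESTORE TABLE main.gold.orders TO TIMESTAMP AS OF '2024-05-01T00:00:00Z'")
```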
Before Delta existed, our pipelines relied heavily on overwrite patterns because there was no reliable way to apply updates or handle late‑arriving data safely. Overwrites were slow, expensive, and risky — especially for large tables. A single failure during overwrite could leave the table in a half-written, inconsistent state. Processing took longer, compute costs shot up, and recovery often meant manually rebuilding partitions from scratch.
The ROI became obvious as soon as we used Databricks end‑to‑end. Because one platform handles ingestion → transformation → ML → BI → governance, we retired entire categories of legacy tools and reduced operational overhead dramatically.
Then Genie arrived — and it genuinely transformed my day‑to‑day work.
I once needed a PySpark module for data quality checks. Genie generated the full logic — null checks, schema validation, aggregations — in seconds. Instead of spending 30 minutes writing boilerplate, I spent 3 minutes refining the logic. It shifted my focus from syntax to decisions.
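A pared-down sketch of that kind of data-quality module; the column names, table, and checks are illustrative assumptions, not the actual generated code:

```python
from pyspark.sql import DataFrame, functions as F

EXPECTED_COLUMNS = {"customer_id", "order_amount", "order_ts"}

def run_dq_checks(df: DataFrame, key_cols=("customer_id",)) -> dict:
    results = {}
    # Schema validation: required columns present.
    results["missing_columns"] = sorted(EXPECTED_COLUMNS - set(df.columns))
    # Null checks on key columns.
    for col in key_cols:
        results[f"nulls_{col}"] = df.filter(F.col(col).isNull()).count()
    # Simple aggregate sanity check.
    results["row_count"] = df.count()
    return results

report = run_dq_checks(spark.read.table("main.silver.orders"))
print(report)
```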
Integrations are another strength. Connecting Databricks to S3, SQL Server, and especially Power BI has been seamless. Publishing Delta tables directly to BI models removed the need for brittle extracts and sped up refreshes. Unity Catalog made everything even cleaner with consistent permissions and lineage.
Performance is consistently strong when it matters — heavy joins, window functions, multi‑stage pipelines, or streaming workloads. Serverless compute starts instantly, and workloads scale predictably even under pressure.
Finally, onboarding surprised me. Features like serverless compute, natural‑language queries, AI‑generated code suggestions, and automatic comments make Databricks intuitive even for engineers new to Spark. It feels like the platform actively helps you learn.
In short: Databricks lets me work faster, recover instantly, integrate seamlessly, and scale confidently — all in one place. It’s the rare platform that improves both speed and reliability at the same time.
What do you dislike about the product?
What I dislike most about Databricks is the cost visibility and predictability.
Even as an experienced engineer, it can be difficult to get a straight, real‑time view of what a workflow will cost before running it. Photon vs. standard runtime, autoscaling behaviour, shuffle-heavy operations, DBUs—these can stack up quickly, and cost surprises happen unless you actively monitor and tune everything. A simple pipeline misconfiguration can quietly double your spend.
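One partial mitigation is querying the billing system table for DBU usage by SKU, assuming system tables are enabled in the account; a sketch using the documented system.billing.usage columns:

```python
# Daily DBU consumption per SKU over the last 30 days.
spend = spark.sql("""
  SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
  FROM system.billing.usage
  WHERE usage_date >= current_date() - INTERVAL 30 DAYS
  GROUP BY usage_date, sku_name
  ORDER BY usage_date DESC
""")
spend.show()
```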
Another challenge is the rapid pace of new features and changes.
Databricks innovates incredibly fast, which is great, but it also means features may land before documentation, best practices, or governance patterns are fully mature. Sometimes functionality behaves differently across runtimes or cloud providers, and staying on top of everything requires continuous learning and refactoring. This can create team friction and technical debt.
In short: Databricks is exceptional, but the cost model isn’t always transparent, and the rapid feature rollout can introduce operational complexity that teams must actively manage.
What problems is the product solving and how is that benefiting you?
Business: Before adopting Databricks, our aerospace analytics environment — particularly around customer engine health monitoring — suffered from the same challenges many traditional engineering organisations face.
We had multiple disconnected systems handling telemetry ingestion, fault-code processing, fleet analytics, and maintenance prediction. Data from engine sensors (FADEC, vibration, thermals, oil systems) arrived in different formats and needed heavy manual work just to normalise. Pipelines relied on full overwrites because our legacy setup didn’t support updates or late-arriving data, which made processing slow and expensive.
We struggled with slow ingestion of engine telemetry, inconsistent datasets across engineering teams, and long turnaround times for anomaly detection models.
Architecture challenge: Before using Databricks, we were operating in a fragmented data landscape.
We had multiple systems, disconnected storage layers, and a heavy reliance on overwrite‑based ETL jobs because our old data platform couldn’t support updates, late‑arriving data, or ACID guarantees. This meant pipelines were slow, error‑prone, and expensive. Rolling back bad data could take hours, and data inconsistencies across teams were common.
We struggled with siloed systems, slow pipelines, unreliable data, and high operational cost.
We struggled with manual overwrites and inconsistent data — but now we can use Delta Lake with ACID and time‑travel,
which has resulted in:
Instant rollback from data corruption scenarios
Reliable incremental processing instead of full overwrites
Consistent data consumed across engineering, BI, and ML teams
This reduced our telemetry pipeline processing window from hours to under 30 minutes for a fleet‑wide daily batch.
We struggled with multiple tools and duplicated architectures — but now we have one unified Lakehouse,
which has resulted in:
A single platform for ingestion → transformation → ML → BI → governance
Removal of 3–5 legacy tools (ETL schedulers, BI extracts, legacy ML infra)
Lower maintenance and licensing overhead
We struggled with slow development cycles — but now we can leverage Genie for AI‑assisted engineering,
which has resulted in:
70–80% faster creation of PySpark modules
Automatic generation of schema checks, null checks, and DQ logic
More time spent on decisions, less on boilerplate code
For example, a data quality module that used to take 30 minutes now takes 2–3 minutes to scaffold.
We struggled with inconsistent governance — but now Unity Catalog gives us end‑to‑end visibility,
which has resulted in:
Faster onboarding (reduced from days to minutes)
Centralised permissions, lineage, and audit trails
Stronger compliance alignment
We struggled to scale pipelines and ML workloads — but now we use distributed compute + Photon,
which has resulted in:
Large joins and window operations executing up to 10× faster
Stable handling of terabyte‑scale datasets
Predictable performance even under heavy workloads
Databricks Notebooks Make Collaboration Seamless Across Python, SQL, and Scala
What do you like best about the product?
Databricks collaborative notebooks are really useful and let me work in whatever language I need to meet my requirements effectively. The ability to mix Python, SQL, and even Scala within a notebook makes collaboration and teamwork much smoother. I also appreciate how easily it integrates with other tools and cloud platforms, so it fits into my existing workflows with very little friction.
What do you dislike about the product?
I like their customer support, and the frequent updates are a big reason this has become my favorite platform for data management. I also appreciate how well it integrates with external tools like Power BI for reporting; it's really good.
What problems is the product solving and how is that benefiting you?
It simplifies cross-team collaboration and helps us work through large datasets without having to worry too much about infrastructure or analytics overhead. Calculations and reporting are fast, which has improved our development cycles and reduced the back-and-forth between the engineering and analytics teams.