Transformation Journey with Databricks Data Intelligence Platform
What do you like best about the product?
As a data engineer who has been working with Databricks for the past two years, I can honestly say the platform has completely transformed the way we approach data engineering projects. Before Databricks, my team and I often faced challenges with managing large datasets and ensuring smooth collaboration between data engineers and data scientists. There were times when workflows felt disjointed, and troubleshooting issues across different tools consumed a lot of our time.
Databricks has changed all of that. The collaborative notebooks feature, in particular, has been a game-changer. I can now work seamlessly with data scientists in real time, troubleshooting issues and iterating on solutions much faster. For example, during a recent project, we were able to refine a machine learning model within days, thanks to the ability to easily share notebooks and quickly run experiments together. Work like this used to take weeks with our previous tools.
The auto-scaling feature has been a lifesaver. I vividly remember struggling with performance issues when processing large datasets on our old infrastructure. Now, Databricks automatically adjusts resources based on workload, so we never have to worry about managing compute power. This has drastically cut down on processing times. For instance, a data transformation job that used to take hours now finishes in a fraction of the time, allowing us to deliver projects faster.
Delta Lake has also been invaluable. Before we started using it, data consistency and quality were constant concerns, especially when dealing with large and varied data sources. Now, with Delta Lake, we can trust that our data is not only high quality but also easily accessible and queryable. One particular example was when we had to rebuild a complex dataset pipeline. Delta Lake allowed us to work with incremental data updates, making the process much more efficient and reliable.
In short, Databricks has greatly reduced development time and improved the overall quality of our deliveries. It’s helped me streamline complex workflows, improve collaboration across teams, and most importantly, deliver data-driven solutions faster and with greater confidence.
What do you dislike about the product?
Cost Optimisation - While I appreciate the granular billing information provided, predicting costs for large projects or shared environments can still feel opaque. Many teams struggle to control runaway costs from idle clusters or suboptimal configurations. Introducing smarter autoscaling and recommendations tailored to our workloads would be invaluable. For instance, alerts for "idle clusters" or "cost hotspots" in our environment could proactively save budgets and improve efficiency.
Simplified Governance and Security - Managing access at fine-grained levels can be cumbersome. For example, controlling who can view versus who can execute a notebook or job often requires workarounds. Audit logs are excellent, but making sense of them for actionable insights sometimes feels like solving a puzzle. Enhanced attribute-based access control (ABAC) and more intuitive UI-based controls for permission management would greatly streamline operations.
User Experience - The collaborative notebook interface is one of Databricks' standout features, yet there are areas where it could be smoother. Collaboration is sometimes hindered when two users edit the same notebook. Version control feels basic compared to Git-based systems. Debugging within notebooks, especially for non-Python workloads, could use significant improvement. Adding inline commenting, conflict resolution tools, and robust debugging features would take the platform to the next level. A workspace-level activity feed to show what’s happening in shared projects would also be immensely helpful.
Workflow Automation - AI-driven insights for optimizing workflows (e.g., spotting bottlenecks or inefficiencies) and easier integration with external workflow automation tools would be welcome additions.
What problems is the product solving and how is that benefiting you?
The Databricks Data Intelligence Platform has revolutionized how I handle data challenges by providing a unified, scalable, and collaborative environment. It simplifies processing large datasets, unifies teams across workflows, and ensures robust security and governance, enabling seamless data integration and real-time insights. With tools like Delta Lake and MLflow, it has streamlined pipeline development and machine learning, significantly improving productivity and reducing time to value. By democratizing analytics for technical and non-technical users alike, Databricks fosters a truly data-driven culture. Its flexibility, performance, and end-to-end capabilities have been instrumental in driving impactful results for my organization.
Databricks Data Intelligence Platform: ETL, Scalability, and Job Scheduling
What do you like best about the product?
The ETL pipeline features automate batch and real-time data integration along with data quality checks. Data is processed in parallel using multithreading, and clusters scale up and down to optimise cost.
What do you dislike about the product?
Some SQL features are not supported, such as DECLARE, stored procedures, and transaction rollback.
What problems is the product solving and how is that benefiting you?
Fast ETL processing, support for Genie, and handling of growing datasets.
Performance of Databricks in ML - Review
What do you like best about the product?
I find that Databricks fits our requirements and budget well, even for a mid-sized company like ours. It supports Python, which is easy to work with, and Databricks provides live data streams into input channels. The lakehouse features are my favourite, and Apache Spark provides distributed processing for massive amounts of data.
What do you dislike about the product?
It suits our company's requirements, but it takes a bit of patience at the beginning to get used to the processes, since it integrates ML, AI, and data processing.
What problems is the product solving and how is that benefiting you?
The most important role of Databricks in our industry is Apache Spark's distributed processing engine, which makes the platform simpler for us to work with. It handles a large pool of data for our Facebook advertisement leads. It unifies different processes, which makes our tasks much easier and has made real-time data processing simpler.
Databricks - Scalability and Performance
What do you like best about the product?
I really like Databricks Genie. It helps me identify errors and suggests how to resolve them.
Also, if I ask it to improve the current code for faster performance, Genie's suggestions are helpful. It helps me implement ETL logic efficiently.
What do you dislike about the product?
Most of the features I use are helpful, but some SQL functionality is not supported, such as updating a table using a join.
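For context on the usual workaround: the Databricks SQL `UPDATE` statement has no `FROM`/`JOIN` clause, so the common alternative is `MERGE INTO target USING source ON target.id = source.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`. The plain-Python sketch below only models those upsert semantics on in-memory rows, so it runs without a cluster; the `merge_into` function and the `id` key column are hypothetical illustrations, not Databricks APIs.

```python
from typing import Dict, List

# Hypothetical row type: each row is a dict of column name -> value.
Row = Dict[str, object]


def merge_into(target: List[Row], source: List[Row], key: str = "id",
               insert_unmatched: bool = True) -> List[Row]:
    """Illustrative model of MERGE INTO semantics (not Databricks code).

    WHEN MATCHED THEN UPDATE SET *: source values overwrite the matching target row.
    WHEN NOT MATCHED THEN INSERT *: unmatched source rows are appended when
    insert_unmatched is True; with False this behaves like an UPDATE ... JOIN.
    """
    # Index source rows by key. For simplicity, duplicate source keys are
    # rejected (Delta's MERGE errors when several source rows match one target row).
    by_key: Dict[object, Row] = {}
    for row in source:
        k = row[key]
        if k in by_key:
            raise ValueError(f"duplicate source key: {k!r}")
        by_key[k] = row

    result: List[Row] = []
    matched = set()
    for row in target:
        k = row[key]
        if k in by_key:
            result.append({**row, **by_key[k]})  # matched: update columns
            matched.add(k)
        else:
            result.append(dict(row))  # no match: keep the row unchanged

    if insert_unmatched:
        # Append source rows whose key never matched a target row.
        result.extend(dict(r) for k, r in by_key.items() if k not in matched)
    return result


target = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
updates = [{"id": 2, "status": "closed"}, {"id": 3, "status": "new"}]
print(merge_into(target, updates))
# → [{'id': 1, 'status': 'open'}, {'id': 2, 'status': 'closed'}, {'id': 3, 'status': 'new'}]
print(merge_into(target, updates, insert_unmatched=False))
# → [{'id': 1, 'status': 'open'}, {'id': 2, 'status': 'closed'}]
```

Setting `insert_unmatched=False` corresponds to the update-with-join the review asks for; in Databricks SQL, a `MERGE` with only a `WHEN MATCHED` clause has the same effect.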
What problems is the product solving and how is that benefiting you?
Switching from on-prem servers to the cloud with Databricks has been beneficial for the following reasons:
1. On-prem, the major challenge was that it was hard to maintain code versions and deployments. With Databricks, it's simpler to maintain code versions and deploy to different environments (as it supports Git).
2. Easy to scale: we can easily scale the cluster configuration up and down, which improves cost efficiency and execution performance.
A versatile data intelligence platform.
What do you like best about the product?
I liked the MLflow integration with Databricks, as it was a crucial part of the churn prediction model our team developed for our subscription-based service. The model analysed customer behaviour data to identify potential risks and suggest strategies to counter them. Also, the job scheduling feature of Databricks has automated our data preprocessing tasks, which saved us a significant amount of time and effort.
What do you dislike about the product?
We had trouble setting up real-time data ingestion pipelines, but the issue was resolved within a day thanks to quick and detailed guidance from the Databricks customer support team.
What problems is the product solving and how is that benefiting you?
Our customer support team needed a dashboard to monitor ticket resolution times and customer satisfaction scores. Using Databricks, we built a pipeline that pulls data from multiple CRM tools. This has improved our productivity, as data collection and report generation are now automated.
Simplify big data challenges for better decision-making
What do you like best about the product?
Our team developed a recommendation engine for an e-commerce platform with the help of Databricks. The project involved analysing customer behaviour to suggest products on the website. For this project, we were required to process bulk data without any performance issues. That was only possible with Databricks, as the platform is scalable. We also integrated Databricks with AWS S3 to access data in the cloud.
What do you dislike about the product?
Initially, we faced some challenges because the platform has a learning curve, but whenever we ran into problems, we contacted their customer support team, and they provided detailed guidance on every issue we had.
What problems is the product solving and how is that benefiting you?
We have multiple data sources, and Databricks has greatly improved our efficiency by combining them all into a single platform. This has eliminated the need to switch between different tools, saving us hours of work each time.
Best platform for data engineering and data science
What do you like best about the product?
We used Databricks for features such as real-time data processing and data exploration tools for visualizing data. AutoML and MLflow are among the best AI integrations on this platform. The software is also cost-efficient.
What do you dislike about the product?
Limited tutorials for new users, and the interface is not beginner-friendly.
What problems is the product solving and how is that benefiting you?
We used this platform to analyze and process big data in various formats. This tool is really great.
Databricks - best integration tool
What do you like best about the product?
The Databricks Data Intelligence Platform integrates data engineering, data science, and machine learning into a single environment, which simplifies workflows. Users can easily share data and models on the same platform.
Databricks is optimized for cloud environments and runs on multiple cloud providers; this flexibility allows organisations to choose their preferred provider.
Databricks has a large and active user community and an ecosystem that includes a wealth of shared knowledge resources and third-party integrations.
What do you dislike about the product?
I have been using this software for a while and haven't found anything to dislike.
What problems is the product solving and how is that benefiting you?
Databricks supports integration with a wide range of data sources, allowing users to easily ingest, process, and analyze data from disparate systems.
Exceptional performance for end to end data management
What do you like best about the product?
I used Databricks to optimise the customer segmentation strategy for a retail campaign. It helped me analyse millions of records, clean the data, and build an ML model based on purchasing behavior. The Delta Lake technology ensured data consistency during the process. Its ability to integrate with our Azure data lake made it easy to access datasets.
What do you dislike about the product?
Integrating Tableau with Databricks was challenging, and I encountered issues while setting up real-time data visualisation. Despite the challenges, the platform enabled me to automate data pipelines, which saved me hours.
What problems is the product solving and how is that benefiting you?
Our operations team used Databricks to monitor and optimise supply chain performance. It has become an essential tool for us to enhance both individual productivity and team collaboration. Its impact can be felt across multiple projects.
Capability to integrate diverse coding languages in a single notebook greatly enhances workflow
What is our primary use case?
I am working as a data engineer at Fractal. On a daily basis, I work on Azure Cloud, and I use Databricks frequently. We have ADF pipelines and utilize Synapse for our daily tasks.
What is most valuable?
Databricks supports various languages that I can use, whether it's PySpark, Scala, or R. I can leverage all these languages in a single notebook, which is beneficial for clients as they can access various tools in one place whenever needed. This is quite significant.
I usually work with PySpark based on client requirements. After coding, I feed the Databricks notebooks into the ADF pipeline for updates. Databricks' capability to process data in parallel enhances data processing speed. Furthermore, I can connect our Databricks notebooks directly to Power BI and other visualization tools like Qlik. Once the code is developed, we can turn raw data into visualizations and analysis diagrams for clients, which is very helpful.
What needs improvement?
As a data engineer, I see cluster failures in our Databricks environment as a major issue. Our flow, which typically involves three to four notebooks, sometimes leads to cluster failure, and I am unsure why. Despite our attempts to identify the problem, the cause sometimes remains unclear. Adjusting settings such as the number of worker nodes and node utilization during cluster creation could mitigate these failures.
For how long have I used the solution?
I have been using the solution for three years now.
What do I think about the stability of the solution?
Cluster failure is one of the biggest weaknesses I notice in our Databricks.
Which solution did I use previously and why did I switch?
Databricks is beneficial for cost savings; the clients I work for transitioned from AWS Cloud to Azure Cloud for this reason.
How was the initial setup?
The initial setup is very straightforward for us.
What's my experience with pricing, setup cost, and licensing?
I am not very familiar with the pricing. We use three to four clusters in our project. Increasing the number or size of clusters, such as adding more workers, would result in higher costs. That's why we limit ourselves to four clusters for our business.
Which other solutions did I evaluate?
In terms of cost efficiency, it's very useful because our clients switched from AWS Cloud to Azure Databricks to save costs.
What other advice do I have?
I would rate the overall product eight out of ten.
As far as I have used it, everything is good, but there's room for improvement in cluster integration. Enhancing cluster capabilities while keeping costs lower would be beneficial.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure