Our use case for Databricks is high-volume big data processing on its clusters.

Databricks by Carahsoft Technology Corp [Private Offer Only]
External reviews
External reviews are not included in the AWS star rating for the product.
Experiencing smooth performance and cost advantages over previous tools
What is most valuable?
It is hard to single out one favorite feature of Databricks because it has so many valuable ones.
For my organization, Databricks is valuable because it is more cost-effective and performs better than the equivalent tools and services AWS offers.
What needs improvement?
I am uncertain about specific improvements for Databricks.
It would be beneficial to make Databricks even more cost-effective.
For how long have I used the solution?
I have been using Databricks for two years.
What do I think about the stability of the solution?
My experience with Databricks has been smooth, and I haven't encountered any issues.
Databricks is definitely a very stable and reliable product.
How are customer service and support?
I have not used Databricks customer service or support.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Before Databricks, I used batch processing, AWS Fargate, and possibly Kubernetes.
I switched from my previous solutions because they were either too expensive or too difficult to configure.
Which other solutions did I evaluate?
I have considered other solutions besides Databricks, such as Snowflake, but we haven't explored it extensively yet.
We are still early in our Snowflake experience, so we don't know the pros and cons compared to Databricks.
What other advice do I have?
I can say little about our deployment model for Databricks because I'm not a heavy user.
I am not the person who purchased Databricks, but it was possibly acquired through the AWS Marketplace.
I may not have utilized Databricks machine learning capabilities.
In my experience, the pricing and licensing model remains relatively expensive. Although Databricks is less expensive than AWS, we still need a more cost-effective solution.
I would rate Databricks overall a nine out of ten.
Unifying data for analytical insights with smooth AI and machine learning integration
What is our primary use case?
A typical use case is building a data lakehouse for a client that has a variety of source systems and wants to unify that data on a lakehouse platform for analytics and insights.
What is most valuable?
The most valuable features of Databricks are Delta Lake and Unity Catalog. Unity Catalog handles data governance, and Delta Lake is used to build the lakehouse. Databricks is also rolling out workflow jobs and other supporting features to provide an end-to-end solution.
What needs improvement?
In my opinion, dashboards are the area of Databricks with the most room for improvement. Until recently, everyone connected third-party tools such as Power BI to Databricks for dashboards and reports, but Databricks is now introducing its own AI/BI dashboards, and I think it is on the right track to improve them further.
For how long have I used the solution?
I have approximately four years of experience working with Databricks.
What do I think about the stability of the solution?
Databricks is highly stable; I would rate its stability around nine out of ten.
What do I think about the scalability of the solution?
I would rate the scalability of this solution as very high, about nine out of ten.
How are customer service and support?
Technical support is fine. Databricks offers several levels of support, and partners in particular get very good support on new features. So far we have had no problems, and I would rate the support quality eight out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of Databricks is reasonably straightforward. Implementation caused no trouble, and I find Databricks fairly easy to set up and work with.
What was our ROI?
I can't say whether we have seen an ROI from the solution because I have no exposure to that area, although I expect the people who decided to implement Databricks did that analysis and ran POCs.
What other advice do I have?
I am not a Databricks partner; I work for a client that uses the Databricks software to implement solutions.
My clients are usually enterprise-level organizations, but the current implementation is medium-scale, although it may expand to enterprise scale in the future.
Regarding the price of Databricks, I don't involve myself in those decisions.
I think Databricks is very good at facilitating AI and machine learning projects; it handles AI and machine learning models very well, and clients can run their models on it. I believe Databricks is better positioned than competitors such as Snowflake, and it is partnering with important companies such as SAP and Palantir.
Based on my experience, I would recommend Databricks to others. Overall, I consider it one of the best solutions and rate it eight out of ten. I may not know all of its pitfalls, and suitability varies from use case to use case, but for us it's working well.
Unified platform simplifies end-to-end processes with intuitive data access solutions
What is our primary use case?
I use Databricks for various purposes, including data engineering, MLOps, machine learning training and deployment, the entire ML cycle, and dashboards. It serves different purposes for different projects.
What is most valuable?
I am currently using Unity Catalog extensively and am migrating many projects to it. MLflow, which I use for model registration and model lineage, is also valuable.
Additionally, Databricks serves as a single platform for the entire end-to-end lifecycle of machine learning models and AI ops. I don't need to switch between tools, which makes it an all-encompassing solution for development and research. I use the lakehouse and its features effectively.
What needs improvement?
There has been a significant evolution in databases. Still, one area for improvement is the Databricks File System (DBFS), where accessing files from the command line can be challenging. Standardizing file paths across the system would help, because engineers sometimes struggle with them.
It would be beneficial to have utilities that make code snippets readily available. Engineers could click a snippet, import it into the notebook, and quickly change the variables or paths used to fetch files, for example when reading data from DBFS files. If I could right-click a file to copy its absolute path or read it directly into a DataFrame, the process would be more standardized and simpler.
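Much of the path confusion comes from DBFS exposing the same file under two spellings: Spark readers and `dbutils.fs` take a `dbfs:/` URI, while local Python file APIs on the driver go through the `/dbfs/` FUSE mount (where that mount is available for the cluster's access mode). A minimal sketch of the kind of path-normalizing utility described above might look like this; the helper names and the `/mnt/raw/...` paths are hypothetical, not part of any Databricks API.

```python
from pathlib import PurePosixPath


def _dbfs_relative(path: str) -> str:
    """Strip any DBFS prefix and normalize the rest to '/a/b/c' form."""
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    elif path == "/dbfs" or path.startswith("/dbfs/"):
        path = path[len("/dbfs"):]
    # PurePosixPath collapses duplicate slashes and '.' segments without
    # touching the filesystem; lstrip avoids a POSIX '//' root.
    return str(PurePosixPath("/" + path.lstrip("/")))


def to_spark_uri(path: str) -> str:
    """Form used by spark.read.* and dbutils.fs.*, e.g. 'dbfs:/mnt/x'."""
    return "dbfs:" + _dbfs_relative(path)


def to_local_path(path: str) -> str:
    """Form used by open(), pandas, etc. through the /dbfs FUSE mount."""
    rel = _dbfs_relative(path)
    return "/dbfs" + ("" if rel == "/" else rel)


if __name__ == "__main__":
    print(to_spark_uri("/dbfs/mnt/raw/sales.csv"))      # dbfs:/mnt/raw/sales.csv
    print(to_local_path("dbfs:/mnt//raw/./sales.csv"))  # /dbfs/mnt/raw/sales.csv
```

In a notebook, the same normalized path could then be passed to `spark.read.csv(to_spark_uri(p))` or to `open(to_local_path(p))`, so engineers no longer have to remember which prefix each API expects.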
For how long have I used the solution?
I have used the solution for more than five years.
What do I think about the stability of the solution?
I would rate stability seven to eight out of ten.
What do I think about the scalability of the solution?
I would rate scalability seven to eight out of ten.
How are customer service and support?
I haven't had any issues that required support, and many resources are available online.
How would you rate customer service and support?
Neutral
How was the initial setup?
I use infrastructure as code on the cloud to deploy the infrastructure. I have all the Git repositories and code repositories for deploying the code and models in the workspace. My setup includes a shared workspace, shared clusters, and integration with Unity Catalog.
What about the implementation team?
I have a team of 100 engineers working with me, and I head the Center of Excellence (COE).
What was our ROI?
I believe it is competitive across clouds. When it comes to big data processing, I prefer Databricks over other solutions. Cost-wise, it is very competitive. The setup process is straightforward, thanks to the use of Spark clusters. This allows for faster turnaround times with Databricks.
What other advice do I have?
The product rating is nine out of ten.
Databricks serves as a single platform that can handle numerous end-to-end machine learning tasks. The configuration is simple, scalability is excellent, and monitoring cluster utilization facilitates informed business decisions.
It's easy to schedule jobs and pipelines and to handle multiple use cases in parallel, which provides countless benefits.
Shared notebooks and scheduling enhance cost efficiency
What is our primary use case?
We work on three platforms. Databricks is hosted on Azure for us, so we also work with Azure Data Factory (ADF), and we use the AWS Cloud for some customers.
What is most valuable?
The notebooks, and the ability to share them with collaborators, are valuable because multiple developers can use a single cluster, which reduces costs. Databricks also manages scheduling itself; for example, an idle cluster turns off automatically. Because Databricks handles all of this, we save costs and do not need to schedule shutdowns separately.
By contrast, on AWS we have to create a Lambda function or use Systems Manager templates to schedule EC2 instances and EMR clusters. In Databricks this is taken care of, which saves significant resources.
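The idle shutdown described above corresponds to a cluster's auto-termination setting, and the worker count can be given as an autoscaling range. A minimal sketch of a request body for the Databricks Clusters API `clusters/create` endpoint, using its `autoscale` and `autotermination_minutes` fields; the helper function is hypothetical, and the Spark runtime version and node type are placeholders to replace with values valid for your workspace.

```python
import json


def cluster_spec(name: str, spark_version: str, node_type_id: str,
                 min_workers: int = 1, max_workers: int = 4,
                 idle_minutes: int = 30) -> dict:
    """Build a Clusters API create payload with autoscaling and auto-termination."""
    # Conservative sanity check for this sketch, not the API's own validation.
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Databricks adds or removes workers within this range based on load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The cluster shuts itself down after this many idle minutes, which
        # replaces the Lambda / Systems Manager schedules used for EC2 and EMR.
        "autotermination_minutes": idle_minutes,
    }


if __name__ == "__main__":
    spec = cluster_spec(
        name="shared-dev",                 # hypothetical cluster name
        spark_version="<runtime-version>",  # placeholder
        node_type_id="<node-type>",         # placeholder
    )
    print(json.dumps(spec, indent=2))
```

Sending the payload (through the Databricks CLI, an SDK, or a plain HTTPS POST) is left out so the sketch stays self-contained.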
Notebooks can also be shared within the development team, which saves effort. Git and Azure DevOps integration on the Databricks side is also very helpful.
What needs improvement?
API deployment and model deployment are not easy in Databricks. We use MLflow to manage MLOps, but further improvement would be beneficial, especially for large language models and related tools. API deployment should also be simplified so that models are easier to deploy and consume.
For how long have I used the solution?
I have been using Databricks for approximately two and a half to three years.
What do I think about the scalability of the solution?
We have not faced any resource shortages so far. Clusters are available on demand, so we have not encountered any scalability issues.
How are customer service and support?
We have needed only limited support from Databricks. Whenever we did need support, the problem was solved within two or three days. I would rate them ten out of ten.
How would you rate customer service and support?
Positive
What about the implementation team?
We bought it as a service, which is why we never implemented it ourselves. We do not have any implementation team.
Which other solutions did I evaluate?
For companies focused solely on data transformation and transferring data between databases, and not tackling machine learning or deep learning problems, I recommend ADF. It would be sufficient and would cost less than a full-fledged solution like Databricks. However, for data analytics and solving ETL problems, one should consider Databricks.
What other advice do I have?
I would rate it nine out of ten.
Capability to integrate diverse coding languages in a single notebook greatly enhances workflow
What is our primary use case?
I work as a data engineer at Fractal. I work on Azure Cloud daily and use Databricks frequently. We have ADF pipelines and use Synapse for our daily tasks.
What is most valuable?
Databricks supports various languages, such as PySpark, Scala, and R, and I can use all of them in a single notebook. This is beneficial for clients because they can access various tools in one place whenever needed, which is quite significant.
I usually work with PySpark, depending on client requirements. After coding, I add the Databricks notebooks to the ADF pipeline so they run as part of the updates. Databricks' ability to process data in parallel speeds up data processing. I can also connect our Databricks notebooks directly to Power BI and other visualization tools such as Qlik. Once the code is developed, we can turn raw data into visualizations and analysis diagrams for clients, which is very helpful.
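Running notebooks from an ADF pipeline is typically done with ADF's Databricks Notebook activity. A minimal sketch that builds such a pipeline definition as JSON, chaining several notebooks so that each one runs only after the previous one succeeds; the pipeline name, linked-service name, notebook paths, and parameter names are hypothetical. Inside a notebook, `baseParameters` values arrive as widget values (read, for example, with `dbutils.widgets.get`).

```python
import json


def notebook_activity(name, linked_service, notebook_path, params=None, after=None):
    """One ADF activity of type DatabricksNotebook."""
    activity = {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            "baseParameters": dict(params or {}),
        },
    }
    if after:
        # Run only when the upstream activity has succeeded.
        activity["dependsOn"] = [
            {"activity": after, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


def notebook_chain(pipeline_name, linked_service, notebook_paths, params=None):
    """Pipeline that runs the given notebooks strictly in sequence."""
    activities, previous = [], None
    for i, path in enumerate(notebook_paths, start=1):
        name = f"notebook_{i}"
        activities.append(
            notebook_activity(name, linked_service, path, params, after=previous)
        )
        previous = name
    return {"name": pipeline_name, "properties": {"activities": activities}}


if __name__ == "__main__":
    pipeline = notebook_chain(
        "daily_refresh",        # hypothetical names throughout
        "AzureDatabricksLS",
        ["/Shared/ingest", "/Shared/transform", "/Shared/publish"],
        params={"env": "dev"},
    )
    print(json.dumps(pipeline, indent=2))
```

A strict sequence stops a failed notebook from triggering the downstream ones; notebooks that are independent of each other could omit `dependsOn` and run in parallel instead.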
What needs improvement?
As a data engineer, I see cluster failures in our Databricks usage as a major issue. I am not sure why, but our flow, which typically involves three to four notebooks, sometimes causes the cluster to fail. Despite our attempts to identify the problem, the reason sometimes remains unclear. Adjusting settings such as the number of worker nodes and node utilization when creating the cluster could mitigate these failures.
For how long have I used the solution?
I have been using the solution for three years now.
What do I think about the stability of the solution?
Cluster failure is one of the biggest weaknesses I notice in our Databricks environment.
Which solution did I use previously and why did I switch?
Databricks helps save costs; the clients I work for moved from AWS Cloud to Azure Cloud for this reason.
How was the initial setup?
The initial setup is very straightforward for us.
What's my experience with pricing, setup cost, and licensing?
I am not very familiar with the pricing. We use three to four clusters in our project. Increasing the number or size of clusters, for example by adding more workers, would raise costs. That's why we limit ourselves to four clusters for our business.
Which other solutions did I evaluate?
In terms of cost efficiency, it's very useful because our clients switched from AWS Cloud to Azure Databricks to save costs.
What other advice do I have?
I would rate the overall product eight out of ten.
As far as I have used it, everything is good, but there's room for improvement in cluster integration. Enhancing cluster capabilities while keeping costs lower would be beneficial.
Transformative data analytics with enhanced AI functionalities and good value for money
What is our primary use case?
We use Databricks for transformations and streaming data processing. We use it primarily for data analytics, including Delta Lake and Delta Live Tables for ETL processes, dashboards for analysis, and Unity Catalog for role management.
How has it helped my organization?
Databricks improves our data analysis tasks with its powerful functionality, offering real-time analytics and machine learning features that help improve model accuracy. It is easy to use, which helps in saving time and, ultimately, costs.
What is most valuable?
The most valuable features of Databricks include Delta Lake, a user-friendly interface, Delta Live Tables for ETL, dashboard features for analysis, and Unity Catalog for role management. It also offers AI functionality that assists with code management and machine learning processes.
What needs improvement?
While Databricks is generally a robust solution, debugging in Delta Live Tables is limited and could be improved. An earlier issue, where Delta Live Tables could not load data into multiple places in a single pipeline, was fixed recently.
For how long have I used the solution?
I have been working with Databricks for four years.
How are customer service and support?
We regularly contact Databricks support and are satisfied with their service. I would rate them eight out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup was straightforward after the first week. Deployment processes became quick and efficient using Git.
What's my experience with pricing, setup cost, and licensing?
In terms of cost-effectiveness, Databricks is worth the money.
What other advice do I have?
I'd rate the solution nine out of ten.
Enhancing data integration and processing across cloud services with seamless transformations
What is our primary use case?
I work on a project where I build data pipelines using Azure Data Factory to ingest data from on-premises systems into Azure Data Lake. I then perform transformations with Databricks notebooks and Spark to build the bronze, silver, and gold layers, and we export reports from the gold layer.
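The bronze/silver/gold (medallion) layering described above can be illustrated without a cluster. The sketch below uses plain Python lists of dicts as a stand-in for what would be Spark DataFrames written to Delta tables in Databricks; the field names (`order_id`, `region`, `amount`, `order_date`) and the cleaning rules are hypothetical.

```python
from collections import defaultdict
from datetime import date


def to_silver(bronze_rows):
    """Bronze -> silver: type the raw string fields, drop bad rows and duplicates."""
    seen, silver = set(), []
    for row in bronze_rows:
        try:
            order_id = row["order_id"].strip()
            region = row["region"].strip().lower()
            amount = float(row["amount"])
            order_date = date.fromisoformat(row["order_date"].strip())
        except (KeyError, AttributeError, ValueError):
            continue  # a real pipeline would usually quarantine these rows
        if not order_id or order_id in seen:
            continue  # keep only the first occurrence of each order
        seen.add(order_id)
        silver.append({"order_id": order_id, "region": region,
                       "amount": amount, "order_date": order_date})
    return silver


def to_gold(silver_rows):
    """Silver -> gold: aggregate into a report-ready table (revenue by region)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))


bronze = [  # raw rows as landed from the source, all strings
    {"order_id": "A1", "region": "EU ", "amount": "10.5", "order_date": "2024-01-02"},
    {"order_id": "A1", "region": "EU", "amount": "10.5", "order_date": "2024-01-02"},
    {"order_id": "A2", "region": "us", "amount": "n/a", "order_date": "2024-01-02"},
    {"order_id": "A3", "region": "US", "amount": "4", "order_date": "2024-01-03"},
    {"order_id": "A4", "region": "eu", "amount": "1.5", "order_date": "2024-01-03"},
]
print(to_gold(to_silver(bronze)))  # {'eu': 12.0, 'us': 4.0}
```

Keeping each layer as its own step mirrors the design choice behind the medallion pattern: the raw bronze data stays untouched, so the silver rules can be corrected and re-run without re-ingesting from the source.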
How has it helped my organization?
Recently, we started using Databricks in our organization. It helps integrate data science and machine learning capabilities.
What is most valuable?
Unity Catalog provides central governance for all data across workspaces, and Databricks integrates well with cloud services such as Azure Event Hubs and Azure Data Factory. It is user-friendly for data processing, and Spark is a strong engine for big data processing.
What needs improvement?
Performance could be improved. Getting good performance currently depends on reviewing the code, configuring Spark correctly, implementing caching, and monitoring performance metrics.
For how long have I used the solution?
I have used Databricks for over two years.
What do I think about the stability of the solution?
I would rate stability as eight out of ten. It is quite stable.
What do I think about the scalability of the solution?
Databricks is perfect for scalability. It is easy to scale clusters.
How are customer service and support?
I haven't faced any issues requiring customer support, so I don't have experience with their customer support.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We used Informatica before, which is perfect for data management solutions. We started using Databricks for its capabilities in data science and machine learning.
How was the initial setup?
I would rate the initial setup as nine out of ten. It is quite easy for someone experienced with Spark.
What's my experience with pricing, setup cost, and licensing?
For my company, it's okay to upgrade to Databricks because it's comparable in price to Informatica. It is not considered expensive for the company.
Which other solutions did I evaluate?
For machine learning, I used Python and its libraries manually. Prior to Databricks, there was no special tool used for these purposes.
What other advice do I have?
If a company focuses on data science and machine learning, I recommend using Databricks. It's a great solution in this field. For data management needs, Informatica is advantageous due to its comprehensive tools.
Provides resources to users quickly without much hassle
What is our primary use case?
I recently started using Databricks and have trained one model on it. I started using Databricks because of its hardware support and the other things it provides, and because it is easy to get into. Previously, when I had to test part of my code, not a full production run but just a quick check that it worked, I had to request a machine and go through the whole process. With Databricks, I can create the resources I need myself, test everything there, and then move ahead. That is the primary reason I use it.
What is most valuable?
The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle.
What needs improvement?
Setting up an account for an individual user and granting access can be difficult to manage and should be made easier.
For how long have I used the solution?
I have experience with Databricks.
What do I think about the stability of the solution?
There is a time limit after which an inactive training session expires, which I think is fair, and that is the only case in which I have seen it stop. I haven't come across many problems with Databricks.
What do I think about the scalability of the solution?
In our organization, the tool is not used as frequently as PyTorch. I'm not sure the comparison with PyTorch is apt, but I think around five people use it.
How are customer service and support?
I have not contacted the solution's technical support team.
Which solution did I use previously and why did I switch?
Before Databricks, I used a cloud support platform.
How was the initial setup?
The solution is deployed on the cloud.
Which other solutions did I evaluate?
I chose Databricks over other products, considering the hardware support it offers.
What other advice do I have?
It takes a little time to get comfortable with Databricks.
I rate the tool an eight out of ten.
Processes large-scale data sets and integrates with Apache Spark in a notebook environment
What is our primary use case?
I primarily use Databricks to process large data sets, such as 600 GB or 800 GB, with Apache Spark.
What is most valuable?
Databricks integrates natively with Apache Spark, which I use as a processing engine for large-scale datasets. This native integration is one of its strengths. Another strength is that the platform makes it very easy to manage resources. For example, setting up a cluster of five or fifteen nodes is straightforward with Databricks. The notebook environment is also excellent, making it easy to perform various tasks.
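A back-of-the-envelope calculation shows why easily resizing a cluster matters at the 600 GB to 800 GB scale mentioned above. Apache Spark splits splittable input files into partitions of at most `spark.sql.files.maxPartitionBytes` (128 MB by default), and each partition becomes one task that occupies one core. The sketch below assumes uncompressed, splittable files, ignores smaller effects such as `spark.sql.files.openCostInBytes`, and uses a hypothetical 8 cores per worker.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def scan_plan(dataset_bytes, workers, cores_per_worker,
              max_partition_bytes=128 * MB):
    """Approximate input tasks, parallel task slots, and waves for one full scan."""
    partitions = math.ceil(dataset_bytes / max_partition_bytes)
    slots = workers * cores_per_worker
    waves = math.ceil(partitions / slots)  # rounds of tasks needed to cover the input
    return partitions, slots, waves


for size_gb in (600, 800):
    for workers in (5, 15):
        parts, slots, waves = scan_plan(size_gb * GB, workers, cores_per_worker=8)
        print(f"{size_gb} GB on {workers} workers: {parts} tasks, "
              f"{slots} slots, {waves} waves")
```

Under these assumptions, going from 5 to 15 workers cuts a full 600 GB scan from 120 waves of tasks to 40 (4,800 tasks over 40 versus 120 slots), so scan time drops roughly in proportion, as long as the job is not bottlenecked elsewhere.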
What needs improvement?
Although Databricks lets you upload your own packages, we ran into some limitations in Apache Spark itself that also affected Databricks, particularly with spatial data. We had to go through many steps to find libraries that could process spatial data in a distributed fashion.
For how long have I used the solution?
I have been using Databricks since 2018.
What do I think about the scalability of the solution?
I might have a project that runs for one or two months and then not use Databricks for six months. Self-service is one of its strengths: I can shut everything down and easily spin up resources when I need them again. We have a dedicated group of fifty people who consistently use Databricks for analytics.
How was the initial setup?
The initial setup was very easy and involved around 10-15 people. We have a data science infrastructure team that helps with this.
What was our ROI?
Databricks stands out among most data platforms mainly because of its ease of use. The learning curve is not as steep, making it accessible for anyone to handle large-scale data processing on Databricks. This ease of use contributes positively to our return on investment. However, in our line of work, converting this efficiency into direct monetary gains can be challenging, given our nonprofit nature.
What's my experience with pricing, setup cost, and licensing?
We purchased high-performance laptops to reduce our reliance on the cloud. The main issue was cost. Internally, whenever I used Databricks, the cost was charged back to my team. At one point my monthly cost was around ten thousand dollars, which was quite high. Because of these costs, several teams, including ours, moved away from Databricks and other cloud providers. It became prohibitive, so we invested in our own high-performance computers instead.
What other advice do I have?
Databricks provides ease of use for me, particularly due to its seamless integration with Apache Spark. This integration simplifies the process of conducting machine learning on large-scale datasets.
I recommend this solution 100%. Overall, I rate the solution an eight out of ten.
Helps users with data processing and analytics
What is our primary use case?
I use Databricks to manage the setup of data lakes for SaaS.
What needs improvement?
The biggest problem with the product is that it is quite pricey, although we cannot currently find a better solution on the market.
For how long have I used the solution?
I have been using Databricks for a year.
What's my experience with pricing, setup cost, and licensing?
It is an expensive tool. The licensing model is a pay-as-you-go one.
What other advice do I have?
The tool helps with processing and analyzing large-scale data, since it is built for managing data at scale.
I am not a technical person, so I cannot explain how the tool helps with data engineering tasks.
Another team in my company uses the machine learning and AI features in Databricks. My team mostly works in operations. The tool is used in a multi-country project.
For example, my company makes some purchasing decisions about solutions based on the products chosen company-wide.
I rate the tool an eight out of ten.