My main use case for Cloudera Data Platform is data analytics and AI workloads.
We have different data sources, and the data arrives in several forms: tabular or CSV files, structured, semi-structured, and unstructured data, and Kafka streaming messages. We use the platform to store the data, then we process and transform it, apply the business logic, and make it ready for consumers.
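To give a rough idea of that flow, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the actual pipeline: the field names `customer` and `amount`, the "keep positive amounts" rule, and all function names are hypothetical, and the streaming input is modeled as newline-delimited JSON strings rather than a real Kafka consumer.

```python
import csv
import io
import json


def read_csv_records(text):
    """Parse CSV text (batch source) into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_messages(lines):
    """Parse JSON messages, one per line (stand-in for streamed messages)."""
    return [json.loads(line) for line in lines if line.strip()]


def apply_business_logic(records):
    """Hypothetical rule: normalize names and keep only positive amounts."""
    cleaned = []
    for rec in records:
        amount = float(rec["amount"])
        if amount <= 0:
            continue  # drop refunds / invalid rows in this toy rule
        cleaned.append(
            {"customer": str(rec["customer"]).strip().lower(), "amount": amount}
        )
    return cleaned


def totals_by_customer(records):
    """Aggregate cleaned records into a consumer-ready summary."""
    totals = {}
    for rec in records:
        totals[rec["customer"]] = totals.get(rec["customer"], 0.0) + rec["amount"]
    return totals


if __name__ == "__main__":
    csv_text = "customer,amount\nAcme,10.5\nBeta,-3\n"
    messages = ['{"customer": " ACME ", "amount": 4.5}']
    raw = read_csv_records(csv_text) + read_json_messages(messages)
    print(totals_by_customer(apply_business_logic(raw)))
```

On the platform itself these stages would typically run on the cluster's own engines rather than in a single script; the sketch only shows the store, transform, and publish order described above.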
Cloudera Data Platform offers an excellent architecture in that it decouples the storage layer from compute. It is flexible, because storage and compute can be scaled independently. Additionally, we have different streaming services as part of the ecosystem, and they have added Ranger for security controls, which is a valuable feature.
Decoupling storage from compute has helped my team significantly. Before using Cloudera Data Platform, we were using Cloudera Distribution for Hadoop (CDH), where we had to provision on-premises virtual machines or Linux boxes and add them to the cluster, which required a lot of effort. We had a defined maximum storage per system; for example, one machine could hold a maximum of 8 TB, and scaling up by adding more compute to the cluster was very challenging. In the current Cloudera Data Platform, the backend storage is a data lake that auto-scales, so we don't have to add more storage ourselves. In terms of security, we used Sentry in traditional CDH, but in Cloudera Data Platform, Ranger provides a more granular level of security, allowing us to manage who can access data at different levels, such as the table level or the column level.
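The idea of table-level versus column-level access can be illustrated with a small toy model in Python. This is not Ranger's API or its policy format; the `Policy` class, the `allowed_columns` function, and the user, table, and column names are all hypothetical, and the model ignores groups, deny rules, and masking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Toy access rule: an empty column set means the whole table is allowed."""
    user: str
    table: str
    columns: frozenset = frozenset()


def allowed_columns(policies, user, table, requested):
    """Return the requested columns this user may read from the table."""
    granted = set()
    for policy in policies:
        if policy.user != user or policy.table != table:
            continue
        if not policy.columns:
            # Table-level grant: every requested column is allowed.
            return list(requested)
        # Column-level grant: collect the explicitly allowed columns.
        granted |= policy.columns
    return [col for col in requested if col in granted]


if __name__ == "__main__":
    policies = [
        Policy("analyst", "sales", frozenset({"region", "amount"})),
        Policy("admin", "sales"),
    ]
    print(allowed_columns(policies, "analyst", "sales", ["region", "ssn", "amount"]))
    print(allowed_columns(policies, "admin", "sales", ["region", "ssn"]))
```

In this sketch the analyst is limited to the listed columns while the admin's table-level grant covers everything, which mirrors the kind of granularity described above.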
Streaming services are provided by NiFi, which is one of the best ecosystems for streaming and ETL support.
Cloudera Data Platform has positively impacted our organization by reducing overall manual intervention: building a big data cluster takes less effort and fewer resources than with traditional methods. It is also cost-effective and more stable than the traditional ways of handling big data workloads.
In terms of resources, we have gone from ten resources to four or five, which is roughly a 50 to 60 percent reduction in manual effort. Regarding cost savings, since we are in the cloud, we are saving significant money compared to maintaining infrastructure on-premises.
Cloudera Data Platform could improve by innovating more on full-fledged support for AI workloads, such as richer machine learning and LLM capabilities, as there haven't been updates in that area over the last one and a half years.
I have been using Cloudera Data Platform for almost four years.
Cloudera Data Platform's scalability is very good.
Customer support is good. However, having a common chat channel between firms and service providers would make communication faster and more efficient.
My advice to others looking into Cloudera Data Platform is that if they want to run big data workloads in the cloud, do analysis, and achieve cost savings and resource reductions, it is definitely a good fit. It can vary based on business needs, but it is a good option for big data workloads.
I rated Cloudera Data Platform a six out of ten because I wish that it would keep up with market trends and release AI technology and AI-enabled workloads. Sometimes we struggle to get support, and having a common chat channel between firms and service providers would make communication and support more effective, especially in production.