Batch ingestion pipelines were implemented using Spark on Cloudera Data Platform data engineering, while near real-time ingestion was handled through streaming components integrated with the platform.
Cloudera on AWS
Cloudera
External reviews
External reviews are not included in the AWS star rating for the product.
Unified data platform has improved governed analytics and supports faster enterprise delivery
What is our primary use case?
What is most valuable?
Another benefit that we achieved with Cloudera Data Platform was true cloud scalability on AWS. The separation of storage and compute on Amazon S3 enabled elastic scaling of workloads based on demand, optimizing infrastructure cost and performance.
Another key benefit that we achieved with Cloudera Data Platform was strong data governance and security. Native integration with Apache Ranger and Apache Atlas provides fine-grained access control, lineage, and metadata management, which are critical in a regulated, multi-team environment.
Cloudera Data Platform had a significant positive impact not only on the client but also on us as an IT consulting partner delivering and operating the solution. Cloudera Data Platform provided a stable and predictable enterprise platform on which we could design and deliver the solution with confidence. The maturity of the platform reduced architectural uncertainty and allowed us to focus on value-driven design rather than low-level infrastructure challenges.
The strong governance and security capabilities built into Cloudera Data Platform had a direct positive impact on our engagement. Instead of implementing custom client-specific governance frameworks, we were able to rely on native components such as Ranger and Atlas. This significantly reduced customization effort, simplified compliance discussions with stakeholders, and increased trust in the platform from both IT and business teams.
The platform supported a long-term partnership approach with the client. By implementing Cloudera Data Platform as a strategic data foundation rather than a point solution, we positioned ourselves as a trusted advisor on data architecture, governance, and advanced analytics, enabling follow-on initiatives and continuous evolution of the platform.
The adoption of Cloudera Data Platform delivered a 30-40% reduction in platform setup and onboarding time and a 25% faster delivery of data pipelines. Related to governance, there was a 30-40% reduction in governance-related custom development.
What needs improvement?
What do I think about the stability of the solution?
What do I think about the scalability of the solution?
What was our ROI?
What other advice do I have?
From my experience, the cost model and licensing of Cloudera Data Platform purchased via AWS Marketplace has been overall positive and well-aligned with enterprise expectations.
Cloudera Data Platform proved to be a highly reliable and powerful solution capable of supporting complex enterprise use cases on AWS with strong governance and scalability. Cloudera Data Platform requires a certain level of organizational and technical maturity to unlock its full value. For organizations that meet this prerequisite, Cloudera Data Platform represents an excellent and future-proof data platform choice. I gave this product a rating of 9 out of 10.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Uses handwritten notes and voice files to perform text analytics and gain real-time insights
What is our primary use case?
My main use case for Cloudera Data Platform is dealing with large volumes of data and primarily handling unstructured data by combining structured and unstructured data on this platform.
I use Cloudera Data Platform for handling unstructured data primarily in a healthcare company where there are many research notes, which are handwritten. In the first use case, we store the notes as PDF files on the platform and then extract the data from them. The second use case mainly deals with call center voice files: we store the voice files, convert voice to text, and then perform text analytics on the result.
How has it helped my organization?
Cloudera Data Platform has impacted my organization positively in many ways. I belong to the service industry, and many of my customers are using this platform. Those customers are predominantly from the banking domain.
It has made things better for those banking customers by providing all of the above.
What is most valuable?
Compared with the earlier version, the latest version of Cloudera Data Platform has changed significantly, and that is where its best features are. It is very end-user friendly, with many added user interfaces. A single pane for administration makes things easy from a data engineering perspective. The UI offers more drag-and-drop features, and good dashboards help you understand the performance of your platform. Ready-made metrics are available, so administration is very easy from a data platform standpoint. Other data principles, such as lineage and data security, also come out of the box with this platform.
The dashboards and drag-and-drop tools have helped my team because the metrics are already available. As an administrator of the platform, certain key metrics are already available as a dropdown. You can select and pick whichever you want, and based on that, you will be able to see memory utilization and disk utilization. Based on that, you can make a decision such as whether you need to do some performance tweaks or add more hardware to your clusters. Those sorts of insights and early alerts help you to do that. That is also another feature available within the platform. From the administration perspective, it is really helpful for the data administrator or a platform administrator.
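The decision described above, tune first or add hardware, can be sketched as a simple threshold check. This is only an illustration under stated assumptions: the `NodeMetrics` type, the host names, and the 75% and 90% thresholds are hypothetical, and the samples are assumed to have been exported from the platform's dashboards rather than read through any Cloudera API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeMetrics:
    host: str          # hypothetical host name
    memory_pct: list   # recent memory utilization samples, 0-100
    disk_pct: list     # recent disk utilization samples, 0-100


def recommend(nodes, tune_at=75.0, expand_at=90.0):
    """Return a coarse recommendation per host from average utilization.

    The thresholds are assumptions for illustration, not platform defaults.
    """
    advice = {}
    for node in nodes:
        # Use whichever resource is under more pressure on average.
        pressure = max(mean(node.memory_pct), mean(node.disk_pct))
        if pressure >= expand_at:
            advice[node.host] = "add capacity"
        elif pressure >= tune_at:
            advice[node.host] = "tune workloads"
        else:
            advice[node.host] = "ok"
    return advice
```

In practice the thresholds would be chosen per cluster, and a single average is a crude signal; sustained peaks or growth trends may matter more.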
What needs improvement?
Cloudera Data Platform can be improved in several areas. I recently attended their roadmap session. The limitations they have identified involve moving data from on-premises to cloud with a single-pane view, and better lineage. They have also made some recent acquisitions to overcome their product limitations. They are on the right track by doing this analysis themselves, identifying what the weaknesses are, and then using mergers or acquisitions to overcome them.
I would like to add that, beyond the platform itself, they should provide more training to systems integrators so that they can have a more ready workforce to use Cloudera Data Platform.
For how long have I used the solution?
I have been using Cloudera Data Platform for almost ten years.
What do I think about the stability of the solution?
Cloudera Data Platform is pretty stable in my experience; I have not seen any downtime or reliability issues.
In large environments or with growing data needs, I have seen hundred-node clusters running fine, dealing with petabytes of data. I have not seen any issues. When we add nodes or rebalance nodes, issues sometimes come up, but they are usually dealt with. It is not a major issue per se; it is more about how you deal with that particular situation.
What do I think about the scalability of the solution?
I manage scalability with Cloudera Data Platform, and the features currently available are better than before. They have a cloud burst feature: if the on-premises capacity is not sufficient at a point in time, you can run that Spark job on the cloud itself. This recently added feature improves scalability and also lets you take advantage of a cloud provider's ecosystem.
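The placement decision behind cloud bursting can be illustrated with a minimal sketch. It is not Cloudera's implementation or API; the function name, the parameters, and the idea of comparing a job's request against free on-premises capacity are assumptions made for illustration only.

```python
def choose_target(required_vcores, required_memory_gb, free_vcores, free_memory_gb):
    """Return "on-premises" when the job fits in free capacity, else "cloud".

    All four inputs are hypothetical numbers a scheduler would supply.
    """
    fits = required_vcores <= free_vcores and required_memory_gb <= free_memory_gb
    return "on-premises" if fits else "cloud"
```

For example, a Spark job asking for 40 vcores and 300 GB of memory would be sent to the cloud if only 256 GB were free on-premises, even with vcores to spare.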
How are customer service and support?
My experience with customer support for Cloudera Data Platform is good. I have not majorly dealt with them, but whatever I have heard from my various team members indicates that customer support is good. They provide good pre-sale support and overall handholding to identify the right use case and technologies. Overall, they provide good support from the company.
Customer support is responsive and knowledgeable, but since I have not dealt with them extensively, I cannot give a rating on a scale of one to ten.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
I did not use a different solution before Cloudera Data Platform; we used to use only structured databases for our data warehousing solution. It is a move from only structured data or on-premises appliance-based solutions to Cloudera Data Platform.
What was our ROI?
I have seen a return on investment. Licensing costs were saved when we decommissioned some of our data platforms and moved to this platform. Time has also been saved by implementing the right data quality solution, because the team used to spend a lot of time correcting data. It also saves the time business analysts used to spend searching in Excel to understand data definitions, which are now easily available as part of the data catalog.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing varies based on your relationship and the size of the cluster. So far, I would say that it is competitive pricing that we have received.
Which other solutions did I evaluate?
Before choosing Cloudera Data Platform, I did evaluate other options. Earlier, it was Cloudera, Hortonworks, and MapR, but nowadays, with Hortonworks and Cloudera merging, it is predominantly Cloudera Data Platform for big data on-premises.
What other advice do I have?
My advice for others looking into using Cloudera Data Platform is to consider the fact that it has been around for more than a decade, making it a very stable solution. If you want to go with the on-premises solution, that is the way you should go. If you are looking for a solution to deal with large volume, variety of data, and velocity of data including real-time data processing, that is something you should select with this platform. Based on the industry, there are various use cases available in their use case manual where particular use cases are more suitable for the customer's industry; they can also help you select the right services or the right product stack from Cloudera. It is all good, and you should leverage their professional services to get a better and more suitable product architecture. I would rate this product an eight out of ten.
Have managed data services efficiently while ensuring fast performance and reliability
What is our primary use case?
I am a certified administrator, and my main use case for Cloudera Data Platform is managing it as a whole in a telco company. I am responsible for keeping its services up and running and for handling the daily administrative tasks.
What is most valuable?
In my experience, the best features Cloudera Data Platform offers are that all the services provided are excellent.
What stands out to me in Cloudera Data Platform is the performance, which is very fast. I also find the features for data security, data reliability, and data lineage very good.
Cloudera Data Platform's Manager UI and other UIs are very useful and helpful for managing operations.
Cloudera Data Platform has positively impacted my organization, as it comes in very handy when working with big data and handling large files.
What needs improvement?
For how long have I used the solution?
I have been using Cloudera Data Platform for approximately five years.
What do I think about the stability of the solution?
Cloudera Data Platform is very stable in my experience.
What do I think about the scalability of the solution?
Cloudera Data Platform scales very well in the public cloud. However, it is not as scalable in an on-premises private cloud, where scaling adds considerable cost.
How are customer service and support?
I have interacted with the customer support team extensively, and they are very useful and helpful in resolving issues. I would rate the customer support of Cloudera Data Platform ten out of ten.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Before choosing Cloudera Data Platform, my organization was using Teradata, and we did not evaluate other options.
Manages large-scale data ingestion and transformation while improving job performance in hybrid environments
What is our primary use case?
My main use case for Cloudera Data Platform is measuring HDFS and the SQL queries in Impala, troubleshooting errors in Spark-based YARN applications, and controlling the reporting data exchanged between Informatica and Cloudera, which transports data from Oracle and MongoDB databases into CDP, in Impala and HDFS.
For measuring HDFS, I use Cloudera Data Platform, specifically Cloudera Manager, to analyze small files in HDFS so that we can reduce their number and shorten the duration of the jobs that read those files and their date partitions.
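The small-file analysis described above can be sketched outside Cloudera Manager as a count of undersized files per directory. This is a minimal sketch under stated assumptions: the input is a list of `(path, size_in_bytes)` pairs, for example parsed from a recursive file listing; the 16 MiB threshold and the sample paths are hypothetical; and this is not how Cloudera Manager computes its own reports.

```python
from collections import Counter
import posixpath

# Assumed threshold below which a file counts as "small".
SMALL_FILE_BYTES = 16 * 1024 * 1024


def small_files_by_dir(files, threshold=SMALL_FILE_BYTES):
    """Count small files per parent directory, largest count first.

    `files` is an iterable of (path, size_in_bytes) pairs.
    """
    counts = Counter()
    for path, size in files:
        if size < threshold:
            counts[posixpath.dirname(path)] += 1
    return counts.most_common()
```

Directories at the top of the result, often individual date partitions, are the natural candidates for compaction, since every small file adds overhead to the jobs that read them.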
I mainly use Cloudera Data Platform as part of a large-scale data processing and analytics pipeline in a hybrid cloud environment, primarily on Azure, which involves managing the YARN cluster, monitoring workloads, troubleshooting performance issues, and integrating data ingestion and transformation processes from various enterprise systems. We leverage CDP for its scalability, security, and strong integration with Looker, Informatica, Hive, and Spark.
How has it helped my organization?
Cloudera Data Platform (CDP) has helped our organization improve data management consistency and scalability across multiple environments. The unified control plane and centralized governance have reduced operational overhead and made it easier to manage workloads between on-premises and cloud environments.
We’ve also seen clear benefits in resource optimization: auto-scaling and workload isolation features have allowed better use of infrastructure, while tools like Cloudera Manager and Workload XM have improved monitoring and troubleshooting efficiency.
That said, there’s still room for improvement in integration speed and UI responsiveness, especially when managing large clusters or hybrid deployments.
What is most valuable?
In my opinion, the best features of Cloudera Data Platform are its strong integration, scalability, and unified management capabilities. What stands out the most in Cloudera Manager is SDX, which provides centralized control for governance, security, and data lineage across multiple sources, simplifying operations significantly. Finally, the YARN and Spark resource management in CDP is robust and efficient, which is essential for handling heavy data transformation workloads at scale.
Cloudera Data Platform has positively impacted my organization by providing a single storage point in HDFS for a lot of data from various databases. With Hive or Impala, it is possible to read and integrate data across all the other platforms, which makes it a great platform for holding our data and building integrations.
What needs improvement?
I don't have any challenges or areas I think could use enhancement.
For how long have I used the solution?
I have been using Cloudera Data Platform for one year, and I have experience with the last version of Cloudera Data Platform for four years.
What was our ROI?
A specific example of the positive impact of Cloudera Data Platform is the clear time savings and improved performance, which are its main results. Costs rise at the start of the project but fall after that initial phase, and the most significant benefit is the availability of data through governance and management.
What other advice do I have?
For the centralized governance of Spark management, we use a dashboard on SAS or Power BI to integrate the data that is stored in HDFS.
My advice to others looking into using Cloudera Data Platform is that it's a great product to save time and reduce costs in the long term.
On a scale of one to ten, I rate Cloudera Data Platform a nine.
Good for secure containerization and governance capabilities
What is our primary use case?
We use it for multiple domains, including oil & gas, finance (Morgan Stanley), and healthcare. We process around 186 TB of data per day for analytics purposes.
Currently, we use it for the healthcare domain.
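To put the 186 TB per day figure above in perspective, it can be converted into an average sustained rate. The sketch assumes decimal terabytes (10^12 bytes) and a perfectly even load across the day, neither of which the review states; real ingestion is usually bursty, so peak rates would be higher.

```python
BYTES_PER_TB = 10**12        # assumption: decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60


def average_gb_per_second(tb_per_day):
    """Average sustained throughput in GB/s for a given daily volume."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / 10**9


rate = average_gb_per_second(186)  # roughly 2.15 GB/s
```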
What is most valuable?
Distributed computing, secure containerization, and governance capabilities are the most valuable features.
What needs improvement?
Since Cloudera acquired HDP, it's been bundled with CDH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer competitive storage solutions like Gen2, Gen1, Bigtable, BigQuery, Lightstore, S3 buckets, etc., which pose significant competition to HDP.
For how long have I used the solution?
I have experience with this product. The short form is HDP 2.7. I have been using it since 2011.
It was on-premises and hybrid for the first three months, then we migrated it to AWS and Azure.
What do I think about the stability of the solution?
In terms of storing data in different formats, it's been somewhat unstable. But when compared to Azure Gen2 and its support and features, it's much more advanced. The suitability depends on specific use cases, but overall, HDP seems more mature than it was in the past.
What do I think about the scalability of the solution?
From my experience with both HDP and CDH, they are both scalable. Currently, most people in my company have shifted to Azure, so they are using Gen2 primarily and discarding Gen1.
How are customer service and support?
I have frequently contacted technical support for both Cloudera and Hortonworks.
We have an IT system to raise issues with their team. Issues usually get attended to by someone at an L1, L2, or L3 support level. They connect with us directly.
Which solution did I use previously and why did I switch?
Previously, we used Cloudera Data Platform (CDP), which turned out to be a cloud-based Azure infrastructure, and implemented metadata solutions like Hive and others.
How was the initial setup?
The setup was very difficult on non-cloud platforms. We had to implement a version-based approach. However, it became simpler with the use of Docker. In the early days, we used HDP sandboxes and VMs and then created clusters. Now, on cloud platforms, it's much easier, just a matter of a few clicks. That's another approach we can take.
What's my experience with pricing, setup cost, and licensing?
I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high.
It was particularly expensive for Cloudera and Hortonworks Data Platform. Both options were quite resource-intensive.
So, seven, or even nine or ten years ago, it was quite expensive.
What other advice do I have?
I recommend a mature decision-making model. Assess your specific needs and use cases. If HDP suits your requirements, use it. Otherwise, there are many advanced options available. Review and choose the best one for your use case.
Overall, I would rate the solution a nine out of ten.
I simply love this technology when it comes to new developments. And I've been working with it for the past twelve to thirteen years. However, with the emergence of new technologies, there might be a chance that I would reduce one point because there's room for improvement.
Helps with data management and has good scalability
What is our primary use case?
We use Hortonworks Data Platform for data management, significant data ingestion, and analytics.
What needs improvement?
Hortonworks Data Platform has a limited user community. I haven't seen much discussion about user experiences. More information could be there to simplify the process of running the product.
For how long have I used the solution?
We have been using Hortonworks Data Platform for a couple of months.
What do I think about the stability of the solution?
I rate the product's stability an eight out of ten.
What do I think about the scalability of the solution?
We have five Hortonworks Data Platform users in our organization. It is a scalable platform.
How was the initial setup?
The initial setup could be more straightforward. It would help if you are technically inclined to follow the necessary steps. There could be easy ways to set it up. It takes 45 minutes to complete and requires a team of five people to execute the process.
What about the implementation team?
We implement the product in-house.
What's my experience with pricing, setup cost, and licensing?
Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results.
What other advice do I have?
I recommend Hortonworks Data Platform to others and rate it an eight out of ten.
Upgrades and patching are addressed by the solution, and they offer a sandbox for testing
What is our primary use case?
There are a lot of use cases for the Hortonworks Data Platform. We use it alongside GPFS, so most of the information we use for operational analytics is primarily on the Hortonworks Data Platform.
What is most valuable?
The upgrades and patches must come from Hortonworks. Therefore, if we encounter any problems, they will be responsible for addressing them. This is one of the instances where we have to rely on them for all the upgrades.
What needs improvement?
The cost of the solution is high and there is room for improvement.
For how long have I used the solution?
I have been using the Hortonworks Data Platform for two years.
What do I think about the scalability of the solution?
Hortonworks Data Platform is scalable, but it lacks the capability for horizontal scaling. Therefore, we need to add more servers to increase its capacity.
How was the initial setup?
I am responsible for setting up the infrastructure, but I don't handle the engineering work.
What other advice do I have?
I would rate Hortonworks Data Platform an eight out of ten. The solution delivers on its promises, and Hortonworks provides a sandbox for testing before making a purchase.
The maintenance requires a lot of people, including the DRE and IRE teams.
It is not practical for most organizations that lack large amounts of resources to maintain their own data platform. The Hortonworks Data Platform makes it easier for such organizations.
Excellent
The most comprehensive big data stack
Cloudera Distribution for Hadoop, the open source Hadoop project.
Commercial support: Cloudera offers commercial support for Cloudera Distribution for Hadoop, which can be helpful for businesses that need help with installation, configuration, and troubleshooting.
Community: Cloudera has a large and active community of users and developers, which can be helpful for businesses that need help with using or extending Cloudera Distribution for Hadoop.
Lock-in: Cloudera Distribution for Hadoop is a proprietary product, so businesses that use it may be locked into using Cloudera for support and maintenance.
Complexity: Cloudera Distribution for Hadoop is a complex product, so businesses that use it may need to have a team of experienced engineers to manage it.