AWS Partner Network (APN) Blog
Hybrid cloud from data gravity to business agility with Cloudera on AWS
By: Attila Kanto, Senior Director, Engineering, R&D – Cloudera
By: Tushar Sharma, Senior Product Manager, R&D – Cloudera
By: Peter Darvasi, Principal Engineer, Public Cloud, R&D – Cloudera
By: Tamara Astakhova, Sr. Partner Solutions Architect – AWS
In financial services, the Digital Operational Resilience Act (DORA) and the EU Data Act have created an innovation gap. Multi-petabyte data lakes remain on premises because of compliance risks, while advanced AI capabilities like Amazon Nova, Amazon Bedrock AgentCore, and P5/P6 GPU instances are available only in the cloud. Financial institutions can’t apply state-of-the-art cloud AI to their data, leaving them unable to stop the newest fraud events in real time.
In this post, we introduce hybrid elasticity, a zero-migration data access model that decouples data residency from compute elasticity, transforming your data center into a dynamic hybrid data hub with on-demand cloud scale.
The zero-migration data access hybrid architecture
Cloudera and Amazon Web Services (AWS) collaborate to offer a zero-migration burst capability using accelerated computing. The concept is straightforward. Your steady-state workloads and data remain securely in your on-premises data center. Peak demand is offloaded to ephemeral cloud compute resources for generative AI inference, fraud detection, or stress testing.
Our unique value proposition is that this model allows compute resources to access datasets in-place within a workload context.
Zero-migration agility means you avoid the cost, compliance risk, and potential data sovereignty breach associated with moving petabytes of data or rewriting applications.
Data residency means your authoritative data never leaves the on-premises environment. Cloud instances access it securely through high-speed AWS Direct Connect links, process it in memory or on temporary NVMe-based storage, and write results back. By using this zero-copy approach, you can minimize your compliance footprint and eliminate duplicate storage costs.
Governance parity keeps your security context and data asset lineage centralized on premises through Cloudera Shared Data Experience (SDX), which extends automatically to the cloud.
The business benefits are immediate, including reduced operational costs, enhanced data security, and improved agility. You achieve this without rewriting applications, managing a full migration, or maintaining data-replication pipelines.
How AWS and Cloudera make this possible
The collaboration between Cloudera and AWS combines Cloudera’s unified data platform with the elastic infrastructure of AWS to deliver cloud elasticity without sacrificing on-premises security and governance.
Cloudera’s unified runtime across on-premises and cloud deployments uses identical service versions on both. This provides workload portability by default, so workloads run in the cloud without costly refactoring.
Cloudera’s centralized security and governance is another significant business enabler. With the Cloudera platform, you don’t duplicate or rebuild your security policies. Shared Data Experience (SDX) extends your existing on-premises security and governance framework directly to your cloud compute jobs.
Consistent authentication and authorization means users accessing data from the cloud have the same permissions as they do when accessing data from on premises, authorized and audited by Apache Ranger.
Consistent lineage means that lineage from hybrid workloads is recorded in your on-premises Apache Atlas, right alongside the lineage of your on-premises jobs.
Enhanced control means that you can apply stricter, context-aware policies, such as geo-location-based rules limiting data access from the cloud network or automatically masking sensitive columns only when a cloud-based application accesses them.
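The context-aware masking idea above can be illustrated with a minimal sketch in plain Python. This is not Apache Ranger's API or policy syntax; the network range, the column names, and the `apply_policy` helper are all hypothetical and exist only to show the decision logic: the same row is returned unmasked to an on-premises caller and masked to a caller in the cloud network.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only; none of these come from the post.
CLOUD_NETWORKS = [ip_network("10.20.0.0/16")]  # assumed CIDR of the cloud network
SENSITIVE_COLUMNS = {"account_number", "ssn"}  # assumed sensitive columns


def is_cloud_origin(client_ip: str) -> bool:
    """Return True when the request comes from an assumed cloud network."""
    addr = ip_address(client_ip)
    return any(addr in net for net in CLOUD_NETWORKS)


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with 'x'."""
    if len(value) <= visible:
        return "x" * len(value)
    return "x" * (len(value) - visible) + value[-visible:]


def apply_policy(row: dict, client_ip: str) -> dict:
    """Mask sensitive columns only for requests that originate in the cloud."""
    if not is_cloud_origin(client_ip):
        return dict(row)
    return {
        col: mask(str(val)) if col in SENSITIVE_COLUMNS else val
        for col, val in row.items()
    }


row = {"customer": "A. Example", "account_number": "1234567890", "amount": 250.0}
on_prem_view = apply_policy(row, "192.168.1.15")  # caller outside the cloud range
cloud_view = apply_policy(row, "10.20.3.7")       # caller inside the cloud range
```

In a real deployment this decision would be expressed as a policy in the governance layer rather than in application code; the sketch only shows why the access context (here, the source network) has to be an input to the policy.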
AWS provides the infrastructure with on-demand, elastic compute and networking capabilities, making this model economically viable.
With elastic compute, you can provision resources on demand (including cost-efficient Amazon EC2 instances powered by AWS Graviton) for compute-heavy workloads such as Apache Spark, Hive, or Impala. These workloads run on Cloudera Data Hubs and process data in memory or on temporary high-speed NVMe volumes on AWS. You decommission the Data Hubs as soon as the job finishes.
Secure connectivity comes through AWS Direct Connect, which provides high-bandwidth, low-latency, and secure network connectivity between your on-premises data center and the AWS Cloud.
The following diagram shows how Cloudera’s cloud compute on AWS accesses on-premises data directly through Direct Connect. Through this architecture, financial institutions can keep sensitive data on premises for compliance while running real-time fraud detection and AI workloads in the cloud. Data teams can launch new analytics projects in hours instead of months without costly migration or replication.
Figure 1 – Cloudera and AWS zero-migration data access architecture
Together, Cloudera provides the unified security and governance layer, and AWS supports the elastic compute and network connectivity. This combination enables job offloading while maintaining consistent security and policy enforcement.
The new Cloudera Hybrid Data Hub provides on-demand cloud compute with secure access to datasets from associated on-premises clusters within a workload context.
Real-world application: Weekend fraud crisis
To understand the strategic impact of the zero-migration data access hybrid architecture, consider the following composite scenario inspired by challenges faced by Tier 1 banks. On Friday afternoon at a global financial services firm, a fraud detection specialist detects a sophisticated new fraud pattern. Someone is generating a huge number of synthetic identities. These fake personas, generated by AI, are highly convincing and bypass the bank’s existing rule-based fraud detection system. To address the threat, the data science team needs to retrain its massive fraud-detection model, combining high-velocity transaction data from the last 5 days with historical data from the last 10 years. However, the bank’s on-premises cluster is already running at 95% capacity processing standard end-of-week regulatory reports. There are no compute resources left on premises to perform the retraining process.
Traditionally, the team has two options. The first is to wait until Monday, when computing resources become available, giving the unauthorized user an additional 60 hours to continue their activity. The second is to attempt to migrate 500 TB of sensitive transaction logs and personally identifiable information (PII) to a public cloud, a process that would take significant time and trigger a compliance audit.
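To give a rough sense of why a bulk migration takes significant time, the following back-of-the-envelope sketch estimates the ideal transfer time for 500 TB. The link speeds (1, 10, and 100 Gbps) are assumptions chosen for illustration; the post does not state the bank's bandwidth. The estimate uses decimal terabytes and ignores protocol overhead, contention, and validation, so real transfers would take longer.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal hours needed to move `size_tb` terabytes over a `link_gbps` link.

    `efficiency` is the assumed fraction of line rate actually achieved (0 < e <= 1).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                        # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


# Assumed example link speeds, not figures from the post.
for gbps in (1, 10, 100):
    hours = transfer_hours(500, gbps)
    print(f"{gbps:>3} Gbps: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions, a 10 Gbps link at full line rate needs about 111 hours (roughly 4.6 days) for the copy alone, which is already longer than the 60-hour weekend window in the scenario.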
Instead, by using Cloudera’s zero-migration data access capabilities on AWS, engineers can transform this bottleneck into business agility.
On-demand elasticity means they can spin up Amazon Elastic Compute Cloud (Amazon EC2) instances immediately to handle the workload without waiting for local hardware availability.
Sovereign access means these instances connect securely to the on-premises data lake to process sensitive data in place without permanently moving regulated information off-site.
Extended on-premises governance uses the same Apache Ranger policies and Apache Atlas lineage from the home data center to maintain full auditability across environments.
Context-aware protection supports applying stricter policies, such as column-level masking, specifically when data is accessed by cloud-based applications.
Zero-waste efficiency means cloud resources are decommissioned the moment a job finishes, so the company pays only for the compute used while avoiding the cost of idle on-premises infrastructure.
The fraud detection model is retrained within 4 hours, and new scenarios are created and deployed to the rule-based fraud detection system to support accurate real-time alerts. This addresses the emerging fraud event. By reducing the complex task of retraining a fraud model from 2.5 days to a few hours, the team has significantly accelerated its time to value, safeguarding both financial assets and brand reputation.
Conclusion
With Cloudera’s hybrid data platform on AWS, you finally have the flexibility to burst to the cloud on your own terms. You gain immediate business advantage from autoscaling cloud compute without rewriting applications, managing full migration costs and risks, or building new data pipelines. It creates a strategic bridge giving your critical workloads the power they need, when they need it, while keeping data and governance under your control.
Discover which of your workloads are ready for hybrid execution by reading our in-depth benchmark analysis on hybrid performance under bandwidth constraints. Learn more about Cloudera Hybrid Data Hubs and contact Cloudera today for a hybrid workload assessment.
Cloudera – AWS Partner Spotlight
Cloudera is an AWS Advanced Partner with AWS Data & Analytics and AI Software Competency that provides a fast, easy, and secure platform to help customers use data to solve demanding business challenges.



