
Scaling Cloudera’s development environment: Leveraging Amazon EKS, Karpenter, Bottlerocket, and Cilium for hybrid cloud

This post is co-written with Shreelola Hegde, Sriharsha Devineni, and Lee Watterworth from Cloudera.

Cloudera is a global leader in enterprise data management, analytics, and AI. The Cloudera platform enables organizations to manage, process, and analyze massive datasets, helping businesses across industries like finance, healthcare, manufacturing, and telecommunications accelerate AI/ML adoption and unlock real-time insights.

A key element driving the success of the platform is the development environment, where developers build and test new features for release. The environment, built on Kubernetes, faced multiple challenges on-premises, especially in scaling and resource utilization.

This post covers how Cloudera modernized its development operations by adopting a hybrid cloud approach built on Amazon Elastic Kubernetes Service (Amazon EKS). This strategy balances the existing capacity of on-premises environments with the elasticity of the cloud while leveraging enhancements such as Karpenter, Bottlerocket, and Cilium to optimize scaling, security, and cost efficiency.

Operational Challenges Faced On-Premises

The development environment faced several challenges running on-premises:

  • The environment was required to scale up and down frequently but was constrained by fixed capacity.
  • Build and test processes for services like Apache Spark, Hive, and HBase required containers as large as 64 GB RAM and 32 vCPUs, which quickly exceeded available resources.
  • Pull request volume surged by as much as 300% during intensive coding sprints, lengthening builds by 45 minutes and creating a bottleneck in the CI/CD pipeline that significantly slowed development velocity. This, in turn, delayed feature delivery and put release schedules at risk while increasing infrastructure costs and developer idle time.
  • The application design also involved retrieving and saving artifacts and datasets from Amazon S3, which introduced latency for on-premises agents, further extending build and test cycles.

These constraints created bottlenecks in the development pipeline and highlighted the need for elasticity beyond on-premises clusters.

Solution Overview

Cloudera addressed these challenges by adopting a hybrid cloud model where predictable workloads remained on-premises and dynamic workloads moved to AWS. The architecture brought together the elasticity of AWS with Cloudera’s established on-premises infrastructure, delivering seamless scaling, reduced latency, and optimized costs.

Comprehensive Amazon EKS architecture showcasing a production-grade Kubernetes deployment. The control plane features the EKS API server, scheduler, and etcd for cluster management. Karpenter handles intelligent node autoscaling using Bottlerocket as the container-optimized OS. Networking is managed by the Cilium CNI, providing advanced network policies and security features. The infrastructure is organized into three strategic instance groups:

  • Flex Instances Group: Using m7i-flex variants for dynamic workloads
  • Graviton Instances Group: Leveraging ARM-based processors for cost-effective performance
  • Core Services Group: Dedicated to system services and essential Kubernetes components

This architecture emphasizes scalability, security, and resource optimization while maintaining operational efficiency.

High-level architecture of the development environment in AWS

The modernized Kubernetes architecture provided the following benefits:

  • Elasticity with Amazon EKS and Karpenter: Using Karpenter, Cloudera scaled its workloads from a handful of nodes to thousands within minutes and contracted again when demand dropped. This ensured efficient scaling during surges while eliminating idle resource waste during freezes. It also enabled multiple pull requests to run in parallel without waiting for capacity, giving developers faster turnaround times and improving release velocity. Intelligent provisioning kept compute instances aligned with workload requirements, which improved utilization rates and reduced costs by up to 40%.
  • Handling large containers with Karpenter: Build and test jobs requiring massive containers were matched instantly with optimized compute through Karpenter’s intelligent node provisioning. This real-time elasticity ensured no delays or resource contention.
  • Strengthening security with Bottlerocket: This Linux-based OS further enhanced the scaling process by delivering a container-optimized environment with minimal operating system overhead. Its immutable filesystem strengthened security by preventing unauthorized changes, while atomic updates simplified system patching and reduced maintenance downtime. Together, these changes reduced the development environment’s attack surface by 60% and improved compute efficiency by 35%.
  • Reducing build delays with Bottlerocket and Amazon Elastic Block Store (Amazon EBS) snapshots: Pod launch times fell from 30 minutes to seconds by using Bottlerocket and Amazon EBS snapshots to pre-cache large images. This improvement gave developers the ability to start new builds almost instantly, transforming productivity. Pull request spikes no longer created bottlenecks.
  • Scaling network with Cilium: Networking was modernized with Cilium, which provided identity-based security, advanced pod-level observability, and eBPF-driven networking. By introducing flexible IP address management, Cilium allowed Cloudera to scale beyond 10,000 workloads without encountering IP exhaustion issues, all while offering clear visibility into pod-level networking.
  • Eliminating idle resource waste with AWS Graviton and Flex instances: Graviton and Flex instances played an important role in cost and performance optimization. Graviton delivered strong price-performance benefits for ARM-based workloads, while Flex instances improved efficiency for x64 compilation tasks. Together, these compute options reduced operational costs by nearly a third and infrastructure costs by up to 40%, ensuring Cloudera balanced performance and cost across diverse workload needs.
  • Resolving S3 latency with cloud-native integration: Running builds in Amazon EKS dropped latency to Amazon S3 to milliseconds, accelerating artifact retrieval. This had the additional benefit of lowering network transfer costs by 30%.
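
To make the idea of workload-aligned provisioning concrete, the following is a minimal sketch of picking the cheapest instance that fits a pending build job, filtered by CPU architecture. It is not Karpenter's actual algorithm, which also weighs factors such as availability and consolidation. The catalog entries, their names, and their prices are hypothetical, and the sketch ignores the capacity that a real node reserves for system components.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    """One entry in a hypothetical instance catalog (names and prices are made up)."""
    name: str
    arch: str          # "amd64" or "arm64"
    vcpus: int
    memory_gib: int
    hourly_usd: float  # illustrative price, not a real AWS rate


# Hypothetical catalog for illustration only.
CATALOG = (
    InstanceType("flex-16", "amd64", 16, 64, 0.80),
    InstanceType("flex-32", "amd64", 32, 128, 1.60),
    InstanceType("arm-32", "arm64", 32, 128, 1.30),
    InstanceType("arm-64", "arm64", 64, 256, 2.60),
)


def cheapest_fit(
    vcpus: int,
    memory_gib: int,
    arch: str,
    catalog: Sequence[InstanceType] = CATALOG,
) -> Optional[InstanceType]:
    """Return the lowest-priced instance that satisfies the request, or None."""
    candidates = [
        it for it in catalog
        if it.arch == arch and it.vcpus >= vcpus and it.memory_gib >= memory_gib
    ]
    # min() with default=None avoids raising when nothing fits.
    return min(candidates, key=lambda it: it.hourly_usd, default=None)


if __name__ == "__main__":
    # A large build container like the 32 vCPU / 64 GB jobs described earlier.
    choice = cheapest_fit(vcpus=32, memory_gib=64, arch="amd64")
    print(choice.name if choice else "no instance fits")
```

The same request with arch="arm64" would select from the ARM-based entries instead, which mirrors how compilation tasks and ARM-compatible workloads can be steered to different instance groups.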

This holistic solution not only addressed each bottleneck but also created a foundation that scales elastically, operates securely, and optimizes costs across hybrid environments.
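
The IP exhaustion point in the Cilium item above can be illustrated with some simple address arithmetic. The sketch below compares the usable addresses in a single VPC subnet with the capacity of a separate pod address pool that is carved into per-node blocks. It assumes an overlay-style pool, which is one of the IP address management modes Cilium supports; the post does not state which mode Cloudera used, and all CIDR ranges here are hypothetical.

```python
import ipaddress

# AWS reserves the first four addresses and the last address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def vpc_subnet_usable(subnet_cidr: str) -> int:
    """Usable addresses in one VPC subnet after AWS reservations."""
    subnet = ipaddress.IPv4Network(subnet_cidr)
    return max(subnet.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def pod_capacity(pod_cidr: str, per_node_prefix: int) -> dict:
    """Estimate capacity of a pod pool split into one block per node.

    Subtracting two addresses per block is a conservative assumption,
    not a statement about what any particular CNI reserves.
    """
    pool = ipaddress.IPv4Network(pod_cidr)
    if not pool.prefixlen <= per_node_prefix <= 30:
        raise ValueError("per-node prefix must be between the pool prefix and /30")
    node_blocks = 2 ** (per_node_prefix - pool.prefixlen)
    ips_per_node = 2 ** (32 - per_node_prefix) - 2
    return {
        "node_blocks": node_blocks,
        "ips_per_node": ips_per_node,
        "pod_ips": node_blocks * ips_per_node,
    }


if __name__ == "__main__":
    # Hypothetical ranges chosen only to show the order of magnitude.
    print(vpc_subnet_usable("10.0.0.0/22"))
    print(pod_capacity("10.128.0.0/14", per_node_prefix=24))
```

With these hypothetical values, the /22 subnet offers 1,019 usable addresses, while the /14 pool provides 1,024 per-node blocks and 260,096 pod addresses, which is why decoupling pod addressing from subnet size leaves room for more than 10,000 workloads.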

Business Outcomes

The adoption of this hybrid Kubernetes environment transformed Cloudera’s development operations. Build and test cycle times improved by 50%, enabling faster delivery of new features and improvements. The ability to scale from 10 to more than 1,000 nodes in minutes gave developers reliable access to the resources they needed, eliminating bottlenecks during high demand. By optimizing data transfers with Amazon S3, network costs were reduced by 30% and latency was cut to milliseconds. Intelligent scaling and workload-aligned compute selection lowered infrastructure costs by 40%. Bottlerocket reduced the attack surface by 60% and improved compute efficiency by 35%.

These advances not only strengthened security but also freed engineers from infrastructure management, allowing them to focus on core development.

Conclusion

Cloudera’s successful implementation showcases the transformative power of AWS’s container stack – Amazon EKS, Karpenter, and Bottlerocket. The modernized Kubernetes environment resulted in seamless scaling, enhanced security, and optimized cost management while delivering peak performance for dynamic workloads. Cloudera’s journey proves how the integration of purpose-built AWS solutions can dramatically improve infrastructure management, reduce operational overhead, and accelerate developer productivity.

Through automated node provisioning, intelligent workload placement, and streamlined operations, Cloudera demonstrates how organizations can achieve efficiency in container environments. Following Cloudera’s proven architecture, enterprises can build a robust, scalable, and cost-effective Kubernetes environment that meets today’s demanding development needs while preparing for future growth. Speak with your AWS account team to take the next steps toward building a modernized Amazon EKS environment.

About the Authors

Shreelola Hegde


Shreelola Hegde is a Principal Engineer I at Cloudera with a decade of experience architecting, building, and securing enterprise products, and creating tools to improve productivity across cloud, on-premises, and Kubernetes environments.

Sriharsha Devineni


Sriharsha Devineni is a Principal Engineer II at Cloudera leading Platform Engineering and Developer Experience. He specializes in Kubernetes, enterprise tooling, and security, delivering scalable CI/CD workflows and secure, enterprise-grade tools that enhance developer productivity and accelerate delivery.

Lee Watterworth


Lee Watterworth is a Principal Architect at Cloudera, leading networking strategy and architecture for complex enterprise data platforms. He drives the design and implementation of large-scale networking solutions spanning data centers, cloud, and hybrid environments, covering VPCs, routing, peering, and secure connectivity at scale.