How a Global Technology Firm Realized Up to 25% Cost Savings on Amazon EMR with Pepperdata
By Ruchi Garg, Sr. Product Manager – Pepperdata
By Heidi Carson, Product Marketing Manager – Pepperdata
By Shashi Raina, Sr. Partner Solution Architect – AWS
Over the past few years, the advantages of migrating big data and other compute-intensive workloads to the cloud have become abundantly clear. Enterprises can enjoy near-infinite scalability on demand, agility in deploying new applications, and enhanced security and analytics—all combined with pay-as-you-go pricing.
Almost all of the factors that make cloud computing so appealing, however, can also lead to cost overruns. A survey by Pepperdata indicates one-third of organizations exceed their cloud budget by up to 40%, making overspending a widespread problem that affects even the most sophisticated IT teams.
One such team works for a subdivision of a multinational technology company currently in the process of migrating its massive Apache Spark-based on-premises data center to Amazon EMR for greater cost savings and capabilities.
Using Pepperdata, an AWS Partner and AWS Marketplace Seller that eliminates waste and delivers cost savings for Amazon EMR and Amazon Elastic Kubernetes Service (Amazon EKS), the multinational organization has achieved almost a 25% reduction in cost on top of its post-migration EMR cost savings. Given the size and scale of this customer’s technical footprint, that translates into significant savings every year.
In this post, you will learn how a global technology firm that migrated its massive Apache Hadoop-based on-premises data center to Amazon EMR reduced cost by running Pepperdata Capacity Optimizer software for real-time, autonomous cost optimization. We’ll detail how Pepperdata works with the EMR scheduler to enable immediate and continuous savings with no application changes, no recommendations, and no manual tuning.
AWS: An Obvious Migration Partner
In late 2022, this subdivision began the process of migrating some of its workloads to Amazon EMR. Its Hadoop footprint had been growing at a rate in excess of 20% year over year for the last few years, and as a result of this size and growth the subdivision was facing scalability and reliability challenges in its data center. For capacity expansion, the team needed another strategy beyond adding more hardware to its on-premises data center.
Amazon Web Services (AWS) was a natural and obvious choice as the migration partner for this customer. The company was confident it would derive enormous value from the benefits of AWS managed services and from capabilities of Amazon EMR, which can run Apache Hadoop along with other frameworks.
One additional factor was the extreme volatility of the subdivision's workloads in terms of daily cluster usage. The daily memory requirements of applications ranged from ~10TB to ~100TB, a nearly tenfold swing. With this level of volatility, the inherent elasticity and pay-as-you-go pricing model of AWS were ideally suited to this customer's requirements.
Despite this significant scale, fast-paced growth, and high volatility, Amazon EMR provided the flexibility and scalability required to maintain and even improve the team's service-level agreements (SLAs).
Need for Additional Optimization
As the subdivision migrated its Apache Hadoop workloads from its on-premises data center to the new EMR environment within AWS, the team was looking for additional opportunities to optimize cloud costs given the continually expanding nature of their workloads. They also had a priority to increase operational efficiency throughout the company wherever possible.
At the same time, the operations team needed to maintain a high level of performance and reliability and adhere to expected SLAs. Pepperdata, a cloud cost optimization company, was tapped to deliver this.
Pepperdata had already been working with this customer for many years, helping to optimize their on-premises Hadoop infrastructure for greater efficiency and throughput. The customer had thus developed a strong level of trust in Pepperdata’s ability to optimize the utilization of the company’s valuable resources, whether on-premises or in the cloud.
Just as it does with on-premises clusters, Pepperdata ensures cloud resources are never wasted, which helps improve efficiency and keep cloud costs in check.
Pepperdata: A Cloud Cost Optimization Company
Pepperdata autonomously and continuously optimized the cost of the customer’s cloud deployments in two unique ways:
Increased Utilization and Reduced Waste
Pepperdata Capacity Optimizer rapidly identified the existing nodes where more jobs could be completed, and then enabled the YARN schedulers to launch more jobs on these nodes to put this unused capacity to use.
Pepperdata thus automatically optimized CPU and memory by increasing utilization, enabling more applications to be launched without adding new nodes and without any manual application changes or upfront modeling of usage.
The purpose of autoscaling in the cloud is to enable the automatic addition and removal of instances to match the volatile demands of Spark or big data workloads. However, if not properly configured, cloud autoscalers can add more instances even when existing instances are only partially utilized, because applications typically over-allocate resources.
If applications use just a fraction of their allocations, the instances will be underutilized, but improperly configured autoscalers may not be aware of this and will add more instances to accommodate additional workloads. Pepperdata dynamically tunes autoscalers based on resource utilization instead of resource allocation, thus achieving further optimization.
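The difference between allocation-based and utilization-based scaling decisions can be sketched in a few lines. The function name and threshold below are illustrative assumptions, not Pepperdata's actual implementation:

```python
# Hypothetical sketch: scale-out decisions driven by actual hardware
# utilization rather than scheduler allocations. The threshold value
# is an illustrative assumption, not a Pepperdata default.

def should_add_node(allocated_pct: float, utilized_pct: float,
                    utilization_threshold: float = 0.85) -> bool:
    """Return True only when the hardware is genuinely busy.

    allocated_pct -- fraction of cluster memory the scheduler has allocated
    utilized_pct  -- fraction of cluster memory actually in use
    """
    # An allocation-based autoscaler would scale on allocated_pct alone;
    # a utilization-based one ignores the padding in those allocations.
    return utilized_pct >= utilization_threshold
```

On a cluster that looks "full" by allocation (say 95%) but is only 40% busy, an allocation-based autoscaler would add a node, while the utilization-based check above would not.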
Figure 1 – Pepperdata optimizes clusters in place and requires no application changes.
Using Pepperdata for Capacity Optimization
Pepperdata was installed at this customer site via a simple bootstrap script in under an hour. When Pepperdata Capacity Optimizer was deployed in their new Amazon EMR deployment, the customer immediately began to realize the product’s cost-cutting benefits, including improved node efficiency.
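For readers unfamiliar with EMR bootstrap actions, the general pattern looks like the following boto3 sketch. The bucket, script name, roles, and instance settings are placeholders for illustration, not Pepperdata's actual installer or this customer's configuration:

```python
# Hypothetical sketch: attaching an installer script as an EMR bootstrap
# action, so an agent is installed on every node at cluster launch.
# All names and paths below are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

cluster = emr.run_job_flow(
    Name="spark-cluster-with-optimizer",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
             "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
             "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    # Bootstrap actions run on each node before applications start.
    BootstrapActions=[{
        "Name": "install-optimizer-agent",
        "ScriptBootstrapAction": {
            "Path": "s3://example-bucket/install-optimizer.sh",
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```

Because the agent is delivered as a bootstrap action, no changes to the applications themselves are required.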
Pepperdata Capacity Optimizer improved node efficiency because it works in real time to enable the YARN scheduler to launch tasks based on actual hardware utilization rather than relying on allocations, which by design contain waste in the form of overhead.
When a cloud scheduler examines a busy cloud cluster, it often appears to the scheduler that the instances are mostly or fully utilized. This is because most applications are overprovisioned, and overprovisioning is primarily caused by two factors:
- Developers tend to ask for more resources than their applications really need, just to be safe. If developers are asked to predict how many resources are needed for a workload to perform, those predictions could end up being highly inaccurate. It’s not uncommon for allocations to be set quite high, which can lead to a high degree of wasted resources.
- Cloud autoscalers tend to overprovision if not governed constantly. They can be slow to ramp up or ramp down, which enables resources to be wasted, and thus money wasted. So, what happens when new applications come along in such a busy cluster? Either the autoscaler spins up new instances at extra cost, or the pending jobs are dropped into a queue where they wait idly for resources to free up. Both of these options waste precious resources.
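The waste created by over-allocation is easy to quantify. The figures below are made up for illustration:

```python
# Illustrative only: measuring the gap between what applications
# request (allocations) and what they actually use at peak.

def wasted_memory_gb(allocations_gb, peak_usage_gb):
    """Sum of over-allocated memory across applications, in GB."""
    return sum(max(alloc - used, 0.0)
               for alloc, used in zip(allocations_gb, peak_usage_gb))

allocs = [64, 32, 128]   # memory each app requested (GB)
usage  = [20, 30, 50]    # memory each app actually peaked at (GB)
waste = wasted_memory_gb(allocs, usage)  # 44 + 2 + 78 = 124 GB idle
```

In this toy example more than half the allocated memory sits idle, yet an allocation-based scheduler would report the cluster as nearly full.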
The Scheduler “Whisperer”
When Pepperdata Capacity Optimizer is installed on a cluster, it uses machine learning to intelligently discern the actual utilization levels of each node in each cluster. It communicates directly to the YARN scheduler that more resources are available, enabling additional jobs or workloads to be added to the existing resources without adding new nodes or clusters.
Empowered with this new information from Pepperdata, the scheduler now “understands” that instances are only partially utilized, and is able to add more pending applications to existing instances without having to spin up new ones.
Capacity Optimizer also uses ML to add more workloads and efficiently implement autoscaling only when instances are truly fully utilized. As a result, it intelligently enables customers to do more work with existing resources, without having to spin up new instances or divert incoming applications to a pending queue.
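The effect of giving the scheduler utilization data instead of allocation data can be sketched as a simple packing calculation. The numbers and function are illustrative assumptions, not Pepperdata's algorithm:

```python
# Toy sketch of the idea behind the scheduling hint: if the scheduler
# sees actual usage instead of allocations, more containers fit on the
# same node. All figures are illustrative.

def extra_containers(node_mem_gb: float, allocated_gb: float,
                     used_gb: float, container_gb: float) -> int:
    """How many additional containers fit once unused allocation is
    reclaimed, versus what allocation-based scheduling reports."""
    free_by_allocation = node_mem_gb - allocated_gb
    free_by_usage = node_mem_gb - used_gb
    return (int(free_by_usage // container_gb)
            - int(free_by_allocation // container_gb))

# A 256 GB node with 240 GB allocated but only 120 GB in use can host
# 15 more 8 GB containers than allocation-based scheduling would allow.
```

This is the sense in which existing instances can absorb pending work that would otherwise trigger a scale-out.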
Operating in this fashion, Capacity Optimizer enabled the customer's utilization to automatically track its ever-changing workloads in real time, thereby minimizing waste and reducing overall cost.
Capacity Optimizer allowed YARN to effectively reclaim unused allocations and increase hardware utilization by roughly 40%. Pepperdata did this safely, keeping each node in a sweet spot of optimal utilization, something that is impractical to maintain manually. At peak, this represented 213% of the efficiency of native YARN scheduling based on allocations alone.
In addition, Pepperdata Capacity Optimizer enabled an average memory uplift of 43% during busy times, leading to 46% more running containers than without Pepperdata Capacity Optimizer.
Improved Demand-Based Autoscaling
As described above, Pepperdata Capacity Optimizer's intelligent automation also ensured existing nodes in the cluster were fully utilized before autoscaling added more, by communicating to the YARN scheduler that more resources were actually available than the scheduler was aware of.
For this customer, even when two very large new jobs were added to the cluster, the node count remained flat. Without Capacity Optimizer, the node count would have increased in response to the YARN allocation request despite low utilization at the hardware layer.
Capacity Optimizer’s ability to extract increased performance from existing nodes without requiring the addition of new nodes saved the customer significant resources by avoiding unnecessary scale up operations.
Results: Nearly 25% Cost Savings on Top of Amazon EMR
Pepperdata Capacity Optimizer was activated over a two-week period, with a deliberately conservative rollout cadence chosen so as not to disrupt the customer's production workloads, which run at massive scale.
Along the way, the return on investment (ROI) was calculated in a two-step fashion. The first set of gains was achieved by implementing Capacity Optimizer alone. A second set of gains was achieved by enabling Pepperdata’s Autoscaling Optimization feature. At the end of two weeks, once all of the workloads had been migrated, and once both optimization steps had been fully implemented, the customer achieved close to 25% cost savings from Pepperdata on top of their Amazon EMR cost savings.
Based on the successes achieved in this two-week deployment, the subdivision is currently planning on migrating additional on-premises workloads to Amazon EMR. The team will continue to deploy Pepperdata on their Amazon EMR workloads to achieve the ultimate in cloud cost optimization.
Whether or not your organization is a global enterprise like the customer described in this case study, Pepperdata can help you achieve similar results on your Amazon EMR cluster. Pepperdata works with a variety of customers across the Fortune 500, including those in the financial services, retail, and healthcare industries, as well as mid-size and startup companies. To learn more about Pepperdata's capabilities for Amazon EMR, download the Pepperdata for Amazon EMR datasheet.
Pepperdata empowers Amazon EMR and Amazon EKS customers to autonomously and continuously quantify and eliminate waste in their applications in real time, once and for all. Sign up for a free two-day Customized Waste Assessment that provides a detailed report of total estimated waste in terms of memory hours, core hours, and instance hours, the top 10 most wasteful queues, and estimated savings from running Pepperdata Capacity Optimizer in your environment.
For more information, contact Pepperdata at email@example.com.
Pepperdata – AWS Partner Spotlight
Pepperdata is an AWS Partner that eliminates waste and delivers cost savings for Amazon EMR and Amazon EKS with no code changes or manual tuning.