Reducing Costs by 40% Using Amazon EC2 Spot Instances and Exostellar’s Infrastructure Optimizer with Arm
Learn how semiconductor company Arm optimized costs and efficiency for its electronic-design-automation workloads using Spot Instances.
Key benefits
65% of backend workloads’ job runtime is on Spot Instances
40% lower cost of running backend EDA workloads
Overview
As a pioneer in its industry, semiconductor and software design company Arm wanted to optimize its electronic-design-automation (EDA) workloads that are running on Amazon Web Services (AWS). With support from Exostellar, an AWS Partner, Arm significantly reduced costs, increased efficiency, and improved sustainability for its stateful, long-running EDA workloads.
About Arm
Arm offers a high-performing and power-efficient compute solution for the connected global population. To meet the worldwide demand, Arm delivers advanced solutions for technology companies to unleash the power of artificial intelligence.
Opportunity | Using Amazon EC2 Spot Instances for Long-Running EDA Workloads at a Lower Cost for Arm
Founded in 1990, Arm designs, develops, and licenses IP solutions for CPUs, GPUs, and neural processing units. Arm’s engineers use advanced EDA tools to design, verify, and analyze technology to produce high-performing, low-cost, and energy-efficient products. Arm is continually looking to optimize this process, which requires vast amounts of compute and storage resources.
Arm started running its frontend verification workloads on AWS in 2016 using services such as Amazon Elastic Compute Cloud (Amazon EC2), which provides secure and resizable compute capacity for virtually any workload. It also used AWS Batch, a fully managed batch computing service that plans, schedules, and runs containerized batch machine learning (ML), simulation, and analytics workloads. Although most companies traditionally run EDA workloads on premises or using cluster schedulers in the cloud, Arm scaled to over 500,000 concurrent virtual CPUs for frontend verification workloads using AWS.
Because these jobs are typically short running, Arm ran them entirely on Amazon EC2 Spot Instances, which let organizations take advantage of unused Amazon EC2 capacity on AWS and are available at up to a 90 percent discount compared to On-Demand prices. This approach aligns with the AWS Well-Architected best practices for sustainability.
Because of its innovative use of the cloud for EDA workloads, Arm won the Best Use of High Performance Computing in the Cloud award from HPCwire in 2022. Arm chose AWS because of its comprehensive services and support. “It’s a no-brainer to embrace the breadth and depth of AWS technology for both general use and EDA workloads to achieve the right solution,” says Zhifeng Yun, senior principal engineer at Arm. “AWS has listened to our requirements and pushed features into its services quickly so that we can run large-scale workloads successfully on AWS.”
Arm’s backend EDA workloads presented a different challenge: they were both compute intensive and stateful, and some job runtimes exceeded 1 week. If these jobs were interrupted, Arm had to run them again, which was costly and time consuming and caused unplanned delays. Arm initially relied on Amazon EC2 On-Demand Instances, where organizations pay for compute capacity by the hour or second with no long-term commitments, because the extended runtimes made it impractical to rely on individual Spot Instances, which can be reclaimed on short notice.
However, as compute requirements continued to grow, Arm looked for more cost-efficient solutions. “Our main goal was to drive down the cost of running the backend workloads while still using the cloud to achieve the flexibility we need for the business,” says Yun.
Solution | Reducing the Cost of Running Backend EDA Workloads by up to 40 Percent on AWS
In 2024, Arm decided to implement Exostellar’s Infrastructure Optimizer solution, which can seamlessly migrate stateful workloads between Spot Instances and On-Demand Instances so that the workloads run continually while optimizing cost. “Using Infrastructure Optimizer, the system takes over the authority for running a job and migrates between instances to efficiently use Spot Instances while maintaining integrity,” says Yun.
Within 3 months, Arm was running the solution in production. Using Infrastructure Optimizer, an engineer submits a job to the cluster, and Infrastructure Optimizer starts a controller node, which continually analyzes the Spot Instances market and determines the right instance type to use. “Our engagement with the Exostellar team was excellent,” says Yun. “Exostellar understood the solution and supported our use case quickly.”
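The case study does not describe how the controller node picks an instance type, so the following is only a minimal sketch of the general idea: filter candidate instance types by a job's resource needs, then take the cheapest available option, falling back to On-Demand when no Spot capacity is offered. The `Offer` class, the `choose_offer` function, the instance type names, and all prices are hypothetical and are not Exostellar's implementation or real AWS prices.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Offer:
    """One candidate instance type (all values are illustrative)."""
    instance_type: str
    vcpus: int
    memory_gib: int
    on_demand_price: float        # USD per hour, made-up number
    spot_price: Optional[float]   # None means no Spot capacity right now


def choose_offer(
    offers: Sequence[Offer], min_vcpus: int, min_memory_gib: int
) -> Tuple[str, str, float]:
    """Return (instance_type, market, hourly_price) for the cheapest fit."""
    candidates = []
    for offer in offers:
        # Skip instance types that are too small for the job.
        if offer.vcpus < min_vcpus or offer.memory_gib < min_memory_gib:
            continue
        # On-Demand is always a candidate; Spot only when it is available.
        candidates.append((offer.on_demand_price, offer.instance_type, "on-demand"))
        if offer.spot_price is not None:
            candidates.append((offer.spot_price, offer.instance_type, "spot"))
    if not candidates:
        raise ValueError("no instance type satisfies the job requirements")
    price, instance_type, market = min(candidates)
    return instance_type, market, price


offers = [
    Offer("type-a", 16, 128, 1.00, 0.35),
    Offer("type-b", 32, 256, 2.00, None),
    Offer("type-c", 8, 64, 0.50, 0.10),
]
print(choose_offer(offers, min_vcpus=16, min_memory_gib=128))
```

A real controller would presumably also weigh interruption risk and migration cost, which this sketch ignores.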
Using Infrastructure Optimizer, Arm achieved its goal of reducing costs without compromising quality. Arm was already using Savings Plans, which save organizations up to 72 percent with a flexible pricing model, to keep costs low when using On-Demand Instances. The company could further reduce costs by using Spot Instances for backend EDA workloads for about 65 percent of the job runtime. “We can run the same job that we used to run with On-Demand Instances,” says Yun. “Using Spot Instances through Infrastructure Optimizer, we’ve reduced costs by about 40 percent.”
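The two figures quoted above (about 65 percent of runtime on Spot Instances, about 40 percent lower cost) can be related with simple arithmetic. The sketch below assumes a deliberately simplified model that is not from the source: a single baseline hourly rate for the previous setup, a single average Spot rate, and no overhead for the controller node or for migrations. Under those assumptions, it estimates what average Spot rate, relative to the baseline, would be consistent with the reported savings. The function names are hypothetical.

```python
def blended_cost_ratio(spot_fraction: float, spot_price_ratio: float) -> float:
    """Cost relative to the baseline run when `spot_fraction` of the runtime
    is billed at `spot_price_ratio` times the baseline hourly rate."""
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    return spot_fraction * spot_price_ratio + (1.0 - spot_fraction)


def implied_spot_price_ratio(spot_fraction: float, total_savings: float) -> float:
    """Solve spot_fraction * r + (1 - spot_fraction) = 1 - total_savings for r."""
    if not 0.0 < spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be in (0, 1]")
    return 1.0 - total_savings / spot_fraction


ratio = implied_spot_price_ratio(spot_fraction=0.65, total_savings=0.40)
print(f"implied Spot rate: {ratio:.1%} of the baseline rate")
print(f"blended cost: {blended_cost_ratio(0.65, ratio):.0%} of the baseline cost")
```

Under this simplified model, the reported numbers would correspond to an average Spot rate of roughly 38 percent of the baseline rate. Because the source does not state actual rates or overheads, this is only a consistency check on the quoted percentages.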
Arm could also implement Infrastructure Optimizer with its existing scheduler environments without having to change the workloads or job submission process. Infrastructure Optimizer provisions dedicated worker nodes for each job so that jobs can run more efficiently with no resource contention, which sometimes happens in shared-node deployments.
Outcome | Applying ML and New Technology to Innovatively Use AWS
Arm plans to continue enhancing its solution and staying at the forefront of innovation. The company plans to incorporate ML capabilities to keep jobs focused and reduce the number of redundant jobs. Arm is also working alongside both AWS and Exostellar to improve Infrastructure Optimizer so that it can run using AWS Graviton processors, which are custom-designed server processors developed by AWS to provide excellent price performance for cloud workloads running on Amazon EC2. AWS Graviton–based Amazon EC2 instances use up to 60 percent less energy than comparable EC2 instances for the same performance. With the scale of Arm’s business, these efficiencies add up to a large impact that furthers sustainability.
“Arm’s cloud journey shows that it’s possible to run EDA workloads fully on AWS,” says Yun. “Our end goal is to work alongside AWS to demonstrate a turnkey solution for the vertical market of EDA to run entirely on AWS.”