AWS for Industries
Predict the cost of Electronic Design Automation on AWS using simulation
Introduction
Designing semiconductors requires High Performance Computing (HPC) to run Electronic Design Automation (EDA) tools. These workloads vary over time in both the amount and type of compute resources required, which makes them an ideal fit for the elasticity of the cloud. Customers choose optimal instance types to optimize each tool's runtime, which reduces time-to-results and improves engineering productivity. By also leveraging multiple purchase options, customers reduce their costs. However, since each customer's workload is unique, customers need an HPC-specific tool to better simulate their costs on Amazon Web Services (AWS). In this blog, we present an open-source AWS solution that simulates how past jobs would have run on AWS to help customers estimate the most cost-effective way of running those jobs.
EDA workload challenges
EDA tools vary in their compute and storage requirements. While front-end tools benefit from faster CPUs, back-end workloads benefit from memory-intensive instances, using as many instances as EDA licenses allow. With multiple projects running concurrently at various stages, sizing the workload in the cloud is a common challenge.
Unlike many HPC workloads, EDA is characterized by expensive licensing costs, so companies experience both underutilization and overutilization of their HPC resources. When HPC resources are idle, capital expenditure (CAPEX) that was already invested is wasted. When the HPC cluster is overutilized, jobs start queuing, expensive EDA licenses can sit idle, and engineers can be less productive (see figure 1). All of these contribute to project delays, increased costs, and increased risk of missing critical tape-out dates.
Past workloads as a predictor of future workloads
As EDA workloads grow with increased functionality and tests, a company's past workloads remain the best predictor of its future workloads. The newly announced HPC Cost Simulator uses this property not only to help customers plan and visualize their costs, but also to surface smarter purchasing options like Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances and Savings Plans. These let customers take advantage of unused Amazon EC2 capacity in the AWS Cloud while also saving on costs.
A key component of cost savings comes from shutting down idle resources, something that is not possible when purchasing hardware through CAPEX; customers realize a 20–30% cost reduction on AWS by avoiding this expense.
Schedulers like IBM Spectrum LSF, Altair Accelerator, and Slurm are commonly used in EDA environments, and HPC Cost Simulator integrates with them natively. However, HPC Cost Simulator supports other schedulers as well (see the documentation). These schedulers also offer integration with AWS, which helps customers use the elasticity of Amazon EC2 to grow and shrink their HPC clusters as needed. IBM LSF offers Resource Connector, Altair Accelerator offers RapidScale, and Slurm has an open-source integration with AWS on GitHub.
Data privacy
When designing HPC Cost Simulator, we spoke to customers who were concerned about exposing scheduler logs, because the logs include personal data (user names), internal project names, and indications that the company is scaling its HPC jobs, which may suggest an upcoming tape-out. To help customers analyze logs without having to undergo a complicated security assessment, we designed HPC Cost Simulator to run locally (on premises), where the logs are. The data is processed and anonymized, and only two clear-text files are generated for sharing with AWS, making them easy to review. All sensitive data is removed (including timestamps), and the output only lists a series of hourly consumption estimates, which your AWS account team will then translate into a commercial offer.
Running the analysis
To simulate the cost of EDA workloads, the tool requires two inputs: accounting records from the schedulers and access to up-to-date Amazon EC2 instance prices from the AWS API. If the environment where your scheduler runs has no internet access, you can download an offline copy of the Amazon EC2 prices using the get_ec2_instance_info.py script.
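If you need the offline copy, the workflow is roughly the following; this is only a sketch, and it assumes the script is run from the repository folder on a host with internet access and configured AWS credentials (check the script's --help and the repository documentation for the exact options and output file name):
# On a host with internet access and AWS credentials configured
./get_ec2_instance_info.py
# Copy the generated pricing file into the hpc-cost-simulator folder on the offline host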
Prerequisites (all schedulers)
For all schedulers, you must first clone the Git repository to a local folder using git clone https://github.com/aws-samples/hpc-cost-simulator, and then run source setup.sh to install the required packages and set up the Python virtual environment. If packages need to be installed, you will need sudo access.
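Putting these together, a typical setup sequence looks like the following; the folder name in the cd step assumes the repository's default clone directory:
git clone https://github.com/aws-samples/hpc-cost-simulator
cd hpc-cost-simulator
source setup.sh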
The analysis happens in two stages: Step 1 takes the scheduler logs and converts them to a common format. Step 2 uses the common format and performs the final analysis.
Step 1
IBM LSF
The tool analyzes the lsb.acct files. Copy them to a subfolder under the tool's location (in this example, we used the input folder).
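Collecting the files might look like the following sketch; the source path is only an assumption based on a typical LSB_SHAREDIR layout, so adjust it to your installation:
mkdir -p input
cp /path/to/lsb_sharedir/<cluster_name>/logdir/lsb.acct* input/
Then run the tool: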
./LSFLogParser.py --logfile-dir input --output-csv jobs.csv
Altair Accelerator
The tool analyzes the accounting database. To extract the fields, first source the vovrc.sh script, located in the common/etc folder under your Altair installation directory. For example:
source /tools/altr/2019.01u7/common/etc/vovrc.sh
Then collect the data from the database into a CSV file:
nc cmd vovsql_query -e "select id, submittime, starttime, endtime, exitstatus, maxram, maxvm, cputime, susptime from jobs" > /tmp/input.csv
Now run the tool to analyze the extracted data:
./AcceleratorLogParser.py --sql-input-file /tmp/input.csv --output-csv jobs.csv
Slurm
For Slurm, the data is taken from the accounting database. Though you can provide the tool with the path to your Slurm binaries, we recommend exporting the data to a file in advance. This allows multiple runs of the tool without having to query the Slurm database each time.
Example data extraction from Slurm, for all users and projects, without limiting to a specific time period:
sacct --allusers --starttime 1970-01-01 --parsable2 --noheader --format State,JobID,ReqCPUS,ReqMem,ReqNodes,Constraints,Submit,Eligible,Start,Elapsed,Suspended,End,ExitCode,DerivedExitCode,AllocNodes,NCPUS,MaxDiskRead,MaxDiskWrite,MaxPages,MaxRSS,MaxVMSize,CPUTime,UserCPU,SystemCPU,TotalCPU > Slurm.txt
Now run SlurmLogParser.py with this file as the input:
./SlurmLogParser.py --sacct-input-file Slurm.txt --output-csv jobs.csv
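For example, to limit the export to jobs from the start of 2023 onward, replace --starttime 1970-01-01 with --starttime 2023-01-01; sacct also accepts an --endtime option if you want to cap the window.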
Other schedulers
Though other schedulers—for example, SGE and PBS Pro—do not have direct integration in this release, the tool supports input from a preformatted CSV file to accommodate them too.
First, create a CSV file (for example, MyInput.csv) with the fields described in the documentation.
Then run CSVLogParser.py:
./CSVLogParser.py --input-csv MyInput.csv --output-csv jobs.csv
Step 2
Now that the scheduler-specific format has been converted to a common format, we can run the final step: simulating the cost on AWS. To run the simulation based on the jobs.csv file that you created in step 1, run:
./JobAnalyzer.py csv --input-csv jobs.csv
Note: you can run multiple simulations using the same jobs.csv file. For example, add the --starttime or --endtime options to limit the analysis to a specific time window.
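For example, a run limited to the first half of 2023 might look like the following; the placement of the flags before the csv subcommand and the timestamp format are assumptions, so confirm them with ./JobAnalyzer.py --help:
./JobAnalyzer.py --starttime 2023-01-01T00:00:00 --endtime 2023-06-30T23:59:59 csv --input-csv jobs.csv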
The result
The tool’s output is identical for all schedulers and has three fixed fields and a few variable fields:
- Relative hour (starting from 1): removing the absolute timestamp makes the file simpler to share externally
- Hourly Amazon EC2 On-Demand compute cost: this cost doesn't take Savings Plans into account yet (your account team will add them)
- Hourly Amazon EC2 Spot Instances compute cost
- Variable fields: a breakdown of the hourly On-Demand cost by Amazon EC2 instance family, which helps your account team optimize your costs using EC2 Instance Savings Plans
Spot Instance use is calculated based on each job's duration, and you can configure the maximum job duration eligible for Spot Instances in config.yml.
The raw data is then visualized to show the monthly Amazon EC2 costs for your workload on AWS, broken down by instance family, as seen in figure 2.
What’s next?
Your account team will help you optimize costs using Savings Plans and Amazon EC2 Spot Instances. Note that the optimal Savings Plan commitment is usually slightly higher than your minimum hourly use because Savings Plans offer up to a 66% discount compared to On-Demand pricing.
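As a rough illustration: at a 66% discount, each additional dollar per hour of commitment replaces roughly $2.94 of On-Demand spend when it is used, saving about $1.94 in those hours while costing $1 in hours where the capacity sits idle. The extra commitment therefore pays off whenever it is used in more than roughly a third of hours, which is why the sweet spot usually sits above your hourly minimum.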
To get the optimal price for your workload, share the summary.csv and hourly_stats.csv files, located in the output folder, with your AWS account team.
To learn more about the benefits of running EDA workloads on AWS, see Run semiconductor design workflows on AWS, and for the technical details and AWS services used, see the implementation guide Semiconductor design on AWS.