AWS for Industries

Predict the cost of Electronic Design Automation on AWS using simulation

Introduction

Designing semiconductors requires High Performance Computing (HPC) to run Electronic Design Automation (EDA) tools. These workloads vary over time in both the amount and type of compute resources required, which makes them an ideal fit for the elasticity of the cloud. Customers choose optimal instance types to optimize each tool’s runtime, which reduces time-to-results and improves engineering productivity. By also leveraging multiple purchase options, customers reduce their costs. However, since each customer’s workload is unique, customers need an HPC-specific tool to better simulate their costs on Amazon Web Services (AWS). In this blog, we present an open-source AWS solution that simulates how past jobs would have run on AWS to help customers estimate the most cost-effective way of running those jobs.

EDA workload challenges

EDA tools vary in their compute and storage requirements. While front-end tools benefit from faster CPUs, back-end workloads benefit from memory-intensive instances, scaled out to as many instances as the EDA licenses allow. With multiple projects running concurrently at various stages, sizing the workload in the cloud is a common challenge.

Unlike many HPC workloads, EDA is characterized by expensive licensing costs, so companies experience both underutilization and overutilization of their HPC resources. When HPC resources sit idle, capital expenditure (CAPEX) that was already invested is wasted. When the HPC cluster is overutilized, jobs start queuing, expensive EDA licenses can sit idle, and engineers can be less productive (see figure 1). All of these contribute to project delays, increased costs, and an increased risk of missing critical tape-out dates.

Figure 1. Compute supply vs. demand gaps result in project delays and wasted resources

Past workloads as a predictor of future workloads

As EDA workloads grow with increased functionality and tests, a company’s past workloads remain the best predictor of its future workloads. The newly announced HPC Cost Simulator uses this property not only to help customers plan and visualize their costs, but also to identify smarter purchasing options like Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances and Savings Plans. Spot Instances let customers take advantage of unused Amazon EC2 capacity in the AWS Cloud at a discount, while Savings Plans reduce costs in exchange for a usage commitment.

A key component of the cost savings comes from shutting down idle resources, something not possible when purchasing hardware through CAPEX; customers typically realize a 20–30% cost reduction on AWS by avoiding this expense.

Schedulers like IBM Spectrum LSF, Altair Accelerator, and Slurm are commonly used in EDA environments, and HPC Cost Simulator integrates with them natively; it supports other schedulers as well (see the documentation). These schedulers also offer integration with AWS, which helps customers use the elasticity of Amazon EC2 to grow and shrink their HPC clusters as needed. IBM LSF offers Resource Connector, Altair Accelerator offers RapidScale, and Slurm has an open-source integration with AWS on GitHub.

Data privacy

When designing HPC Cost Simulator, we spoke to customers who were concerned about exposing scheduler logs, because the logs include personal data (user names), internal project names, and indications that the company is scaling its HPC jobs, which may suggest an upcoming tape-out. To help customers analyze logs without undergoing a complicated security assessment, we designed HPC Cost Simulator to run locally (on premises), where the logs are. The data is processed and anonymized, and only two clear-text files are generated for sharing with AWS, making them easy to review. All sensitive data is removed (including timestamps), and the output only lists a series of hourly consumption estimates, which your AWS account team will then translate into a commercial offer.

Running the analysis

To simulate the cost of EDA workloads, the tool requires two inputs: accounting records from the scheduler and access to up-to-date Amazon EC2 instance prices from the AWS API. If internet access is blocked in the environment your scheduler runs in, you can download an offline copy of the Amazon EC2 prices using the get_ec2_instance_info.py script.
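For example, from a machine that does have internet access, download the pricing data and copy the resulting file into the tool’s folder in your scheduler environment (we’re assuming the script’s default behavior here; check ./get_ec2_instance_info.py --help for the options in your version):

./get_ec2_instance_info.py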

Prerequisites (all schedulers)

For all schedulers, you must first clone the Git repo to a local folder using git clone https://github.com/aws-samples/hpc-cost-simulator, and then run source setup.sh to install the required packages and set up the Python virtual environment. If packages need to be installed, you will need sudo access. For example:
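git clone https://github.com/aws-samples/hpc-cost-simulator
cd hpc-cost-simulator
source setup.sh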

The analysis happens in two stages: Step 1 takes the scheduler logs and converts them to a common format. Step 2 uses the common format and performs the final analysis.

Step 1
IBM LSF

The tool analyzes the lsb.acct files. Copy them to a subfolder under the tool’s location (in this example, we used the input folder), and then run the tool:
./LSFLogParser.py --logfile-dir input --output-csv jobs.csv
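If your lsb.acct files are on a different host, staging them into the input folder might look like this (the source path is illustrative; use your cluster’s actual LSF log directory):

mkdir -p input
cp /shared/lsf/work/mycluster/logdir/lsb.acct* input/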

Altair Accelerator

The tool analyzes the accounting database. To extract the fields, first source the vovrc.sh script, located in the common/etc folder under your Altair install directory. For example:
source /tools/altr/2019.01u7/common/etc/vovrc.sh

Then collect the data from the database into a CSV file:
nc cmd vovsql_query -e "select id, submittime, starttime, endtime, exitstatus, maxram, maxvm, cputime, susptime from jobs" > /tmp/input.csv

Now you can run the tool to analyze the data:
./AcceleratorLogParser.py --sql-input-file /tmp/input.csv --output-csv jobs.csv

Slurm

For Slurm, the data is taken from the accounting database. Though you can provide the tool with a path to your Slurm binaries, we recommend exporting the data in advance to a CSV file. This allows multiple runs of the tool without calling the Slurm database each time.

Example data extraction from Slurm, for all users and projects, and without limiting for a specific time period:

sacct --allusers --starttime 1970-01-01 --parsable2 --noheader --format State,JobID,ReqCPUS,ReqMem,ReqNodes,Constraints,Submit,Eligible,Start,Elapsed,Suspended,End,ExitCode,DerivedExitCode,AllocNodes,NCPUS,MaxDiskRead,MaxDiskWrite,MaxPages,MaxRSS,MaxVMSize,CPUTime,UserCPU,SystemCPU,TotalCPU > Slurm.txt

Now run SlurmLogParser.py with this file as the input:
./SlurmLogParser.py --sacct-input-file Slurm.txt --output-csv jobs.csv

Other schedulers

Though other schedulers—for example, SGE and PBS Pro—do not have direct integration in this release, the tool supports input from a preformatted CSV file to accommodate them too.

First, create a CSV file (for example, MyInput.csv) with the fields described in the documentation.

Then run CSVLogParser.py:
./CSVLogParser.py --input-csv MyInput.csv --output-csv jobs.csv

Step 2

Now that the scheduler-specific format has been converted to a common format, we can run the final step: simulating the cost on AWS. To run the simulation based on the jobs.csv file that you created in step 1, run:
./JobAnalyzer.py csv --input-csv jobs.csv

Note: You can run multiple simulations using the same jobs.csv file. For example, add the --starttime or --endtime options to limit the analysis to a subset of time.
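For instance, to limit the simulation to the first half of 2024 (the timestamp format and option placement here are assumptions; check ./JobAnalyzer.py --help for the exact syntax):

./JobAnalyzer.py --starttime 2024-01-01T00:00:00 --endtime 2024-06-30T23:59:59 csv --input-csv jobs.csv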

The result

The tool’s output is identical for all schedulers and has three fixed fields and a few variable fields:

  • Relative hour (starting from 1): removing the absolute timestamp makes the file simpler to share externally
  • Hourly Amazon EC2 On-Demand compute cost: this cost doesn’t take Savings Plans into account yet (your account team will add that)
  • Hourly Spot Instances compute cost: the hourly cost of Spot Instances
  • Variable fields: the breakdown of the hourly On-Demand costs into Amazon EC2 instance families, which helps your account team optimize your costs using EC2 Instance Savings Plans

Whether a job is simulated on Spot Instances is determined by the job’s duration, and you can configure the maximum job duration eligible for Spot Instances in config.yml.
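For illustration, the relevant setting might look like the following in config.yml (the key names are assumptions based on one version of the tool; verify them against your copy):

# Jobs up to this duration are simulated on Spot Instances; longer jobs use On-Demand
consumption_model_mapping:
  maximum_minutes_for_spot: 60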

The raw data is then visualized to show the monthly Amazon EC2 costs for your workload on AWS, broken down by instance family, as seen in figure 2.

Figure 2. The output: hourly compute costs on AWS, broken down by instance family

What’s next?

Your account team will help you optimize costs using Savings Plans and Amazon EC2 Spot Instances. Note that the optimal Savings Plans commitment is usually slightly higher than your minimum hourly use because Savings Plans offer up to a 66% discount compared to On-Demand pricing.
To get the optimal price for your workload, share the summary.csv and hourly_stats.csv files (located in the output folder) with your AWS account team.

To learn more about the benefits of running EDA workloads on AWS, see Run semiconductor design workflows on AWS, and for the technical details and AWS services used, see the implementation guide Semiconductor design on AWS.

Eran Brown

Eran Brown is a Senior Semiconductor Specialist Solutions Architect. He spent 7 years working with semiconductor companies designing HPC storage infrastructure, and after all these years he is still amazed at what a square inch of silicon can do.

Allan Carter

Allan has 34 years of experience in logic design, verification, and infrastructure. He created the EDA infrastructure for Annapurna in Austin, Texas, and since 2020 he has been a specialist solutions architect at Amazon Web Services.