AWS for Industries

Cut time to results without changing your EDA flows

Using advanced node technology to successfully manufacture a chip is getting harder as chip geometries continue to shrink. Electronic Design Automation (EDA) consumes more compute, storage, and time. Giving your engineers more time to iterate and find bugs in the design and verification phases will result in saving millions in re-spins and lost revenue. Further complicating the chip design process, the semiconductor market is experiencing a talent shortage. Increasing productivity for existing engineers alleviates this shortage and improves time to market. In this blog, we explore two environments showing up to 40% performance improvement using flexible compute options. These environments span batch and interactive tools from Cadence and Synopsys, comparing time-to-results and job costs.

Why choice matters

Each EDA tool’s performance is limited by a different factor. Some tools benefit from high CPU clock speeds or specific CPU instruction sets. Others benefit from a larger L3 cache, higher memory bandwidth, or higher network throughput. Running on AWS allows customers to right-size the compute instance to the specific Computer Aided Design (CAD) flow they are trying to optimize. We show 10-40% faster time to results across CAD flows, and the entire process took an hour to complete. This is a low-effort way to improve performance without changing the CAD flow.

Quantifying the impact

To quantify the impact of this optimization, let’s compare two simplified scenarios (optimized/non-optimized):

  • Jobs are split 50/50 between two license types
  • License A runs 15% slower on CPU provider A
  • License B runs 15% slower on CPU provider B
  • 1,000,000 jobs per month, average job time is 1 minute if running on the fastest of the two options.
  • Both compute types cost $0.10 per hour (to keep things simple)
  • The license costs 3x the cost of compute ($0.30 per hour)

If the customer chooses only one CPU provider for their cluster, half of the jobs run 15% slower, and they will pay: 500,000 minutes * 100% * $(0.1+0.3) / 60 + 500,000 minutes * 115% * $(0.1+0.3) / 60 ≈ $7,167 per month.

However, the same customer choosing to run each job on the optimal CPU will only pay: 500,000 minutes * 100% * $(0.1+0.3) / 60 + 500,000 minutes * 100% * $(0.1+0.3) / 60 ≈ $6,667 per month, a direct saving of about $500.
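The arithmetic in this simplified scenario can be reproduced with a short script. This is a minimal sketch that uses only the numbers from the list above; the function and variable names are ours and purely illustrative.

```python
# Worked version of the simplified scenario (all inputs come from the list above).
JOBS_PER_MONTH = 1_000_000
BASE_MINUTES_PER_JOB = 1.0   # runtime when a job runs on its faster CPU
COMPUTE_PER_HOUR = 0.10      # USD, identical for both CPU providers
LICENSE_PER_HOUR = 0.30      # USD, 3x the compute cost
SLOWDOWN = 0.15              # penalty when a job runs on its slower CPU


def monthly_cost(fraction_on_slower_cpu: float) -> float:
    """Return the monthly compute + license cost in USD."""
    hourly_rate = COMPUTE_PER_HOUR + LICENSE_PER_HOUR
    total_minutes = JOBS_PER_MONTH * BASE_MINUTES_PER_JOB
    slow_minutes = total_minutes * fraction_on_slower_cpu * (1 + SLOWDOWN)
    fast_minutes = total_minutes * (1 - fraction_on_slower_cpu)
    return (slow_minutes + fast_minutes) * hourly_rate / 60


single_provider = monthly_cost(0.5)  # half of the jobs land on their slower CPU
per_job_optimal = monthly_cost(0.0)  # every job runs on its faster CPU

print(f"Single CPU provider: ${single_provider:,.2f}")
print(f"Optimal CPU per job: ${per_job_optimal:,.2f}")
print(f"Monthly difference:  ${single_provider - per_job_optimal:,.2f}")
```

Changing `SLOWDOWN` or the license-to-compute ratio shows how quickly the gap grows when licenses dominate the cost of a job.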

This calculation ignores the business benefits of getting the results 15% faster (engineering productivity, time to market), highlighting only the direct savings. It also ignores the fact that some instances cost less, and additional savings can be achieved by choosing them. Before we share the results, let’s describe how we ran the benchmark.

Methodology

I ran this benchmark using AWS infrastructure in the AWS Oregon region. Each run included:

  • Three server architectures: Intel, AMD, and the ARM-based AWS Graviton processor
  • Two generations of compute (current and previous) at the time of testing
  • A total of 13 instance types (5 Intel, 4 AMD, 4 Graviton)

Each instance had the same number of cores, with some instances offering more memory. This allowed me to evaluate the cost/performance of each CPU type for each EDA tool. This also allowed for a comparison of CPU generations to see if this changed over time.

For each tool, we used an IP block from the same EDA vendor (Cadence/Synopsys). We did not run the same use case across multiple tools. We are not comparing Cadence to Synopsys, but rather different compute types per tool. A similar approach has been taken by Siemens, as stated in their announcement of their cloud flight plans, capturing the best known methods to run faster on AWS.

Note: The results are often design-specific. If Intel was faster for Tool X, that wouldn’t necessarily be the case for your design. You will need to repeat this test for your design. This can happen, for example, when a tool is sensitive to L3 cache size, but the test case was too small to stress the L3 cache. Your design may be big enough to experience that difference. In other words, your mileage will vary; test for yourself.

For cost analysis we made three assumptions:

  • We assumed each license costs $2,500. There is no single cost for all licenses, but we needed a “plug number” to use to show the impact of runtime on overall cost. EDA licenses usually cost a few times more than the compute used to run them (4x more in Intel’s case). We’re optimizing for the overall cost, not the cost of compute alone.
  • We are not showing productivity. In our own silicon development, we see that engineering costs are up to 50% of the development cost of a new product. That cost component is not included in the following cost simulation. The impact of longer-running jobs would be more than doubled if we included it.
  • For each job, we used Amazon Elastic Compute Cloud (Amazon EC2) On-Demand pricing in the Oregon region. On-Demand Instances do not receive the discounts offered by Reserved Instances or Savings Plans, so this is a “worst-case scenario” calculation. You can read “Predict the cost of Electronic Design Automation on AWS using simulation” to learn how we help customers implement cost mitigations.

This blog is based on data from tests run in the summer of 2023. Since then, new instances have been announced, like the Intel-based r7iz and Graviton4 instances. However, this blog looks at performance optimization over time. Retesting with the new instance types is a perfect example of how to iterate and constantly improve EDA performance on AWS.

Results: Cadence

Graph 1 shows the results of Cadence Spectre. Each dot on the graph represents a specific compute instance type:

  • X-axis shows the runtime (seconds)
  • Y-axis shows the total cost of a single job: compute + EDA license for the time it ran
  • An ideal server will be lower (cost-effective) and further to the left (faster time-to-results).

Graph 1 – Spectre cost/performance analysis (per job). Graviton instances were over 40% faster than x86 instances and over 40% lower on cost.
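The quantities plotted in these graphs can be sketched in a few lines. This is an illustrative sketch, not the script used for this post: the function names are ours, the hourly prices are placeholders, and we assume that "X% faster" means an X% reduction in runtime relative to the baseline.

```python
def job_cost(runtime_seconds: float,
             instance_price_per_hour: float,
             license_price_per_hour: float) -> float:
    """Total cost of one job: compute plus license for the time it ran (USD)."""
    hours = runtime_seconds / 3600
    return hours * (instance_price_per_hour + license_price_per_hour)


def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Runtime reduction of the candidate relative to the baseline, in percent.

    Assumption: "faster" is measured as reduced runtime, not as a speedup ratio.
    """
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100


# Placeholder inputs for illustration only; these are not measured results.
baseline = job_cost(runtime_seconds=600, instance_price_per_hour=0.20,
                    license_price_per_hour=1.00)
candidate = job_cost(runtime_seconds=360, instance_price_per_hour=0.20,
                     license_price_per_hour=1.00)

print(f"Baseline job cost:  ${baseline:.3f}")
print(f"Candidate job cost: ${candidate:.3f}")
print(f"Runtime reduction:  {percent_faster(600, 360):.1f}%")
```

Because both cost terms scale with runtime, a job that finishes sooner is cheaper even when the hourly instance price is the same.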

In Graph 1, we see that the longer the job runs, the higher the overall cost of the job. This is why the data points are spread diagonally. You can see the c7g/m7g (third generation Graviton processor) were faster by over 40% compared to the current generation Intel instances (c6i/m6i) and AMD (c6a/m6a). Spectre relies on floating-point operations, which the third generation Graviton processor performs faster than x86. Despite Graviton having a lower clock speed, it is faster in floating-point operations. This is not obvious from the specifications, which is why we recommend testing and not relying on instance specs. Before moving on to the next tool, we can compare these instance families across generations to see how time-to-results changes over time:

Graph 2 – Spectre performance between compute generations. Graviton and AMD (M-family) both moved from being slower than Intel previously to being faster in the current generation.

Graph 2 shows runtime improvements across all compute families in the newer generation. But the improvements vary for each family; AMD was fastest in the previous generation but ARM is fastest in the current one. This shows the need to re-evaluate our compute choices with each new generation. As explained before, it took an hour to test this. In this lab, we ran the design on all compute nodes in parallel to evaluate the performance. If you are license-bound, you can:

  • Run these tests serially, or
  • Run your regression tests using one CPU type today and another CPU type tomorrow.
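Whichever option you choose, it helps to record wall-clock runtime the same way on every instance type. The following is a minimal sketch of such a timing wrapper, not the harness used for this post. The function names and CSV columns are hypothetical, and the instance type is passed in by hand rather than discovered automatically.

```python
import csv
import subprocess
import sys
import time
from pathlib import Path


def time_command(command: list[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)  # raises if the job exits with an error
    return time.perf_counter() - start


def record_run(results_file: Path, instance_type: str, job: str, seconds: float) -> None:
    """Append one measurement to a CSV file, writing a header on first use."""
    is_new = not results_file.exists()
    with results_file.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["instance_type", "job", "runtime_seconds"])
        writer.writerow([instance_type, job, f"{seconds:.3f}"])


if __name__ == "__main__":
    # Placeholder job: replace with your own regression command line.
    placeholder_job = [sys.executable, "-c", "print('replace with your EDA job')"]
    print(f"Runtime: {time_command(placeholder_job):.3f} s")
```

Calling `record_run` with the same results file on each instance type gives one table that can be compared across CPU families, whether the runs happen in parallel, serially, or on alternating days.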

Let’s compare the same data for Xcelium:

Graph 3 – Xcelium cost/performance analysis. AMD was 11% faster than Intel and 14% faster than ARM-based instances.

Graph 3 shows AMD-based instances running 11% faster than a similar spec Intel and 14% faster than Graviton-based instances. The change in results compared to Spectre highlights the need for a wide variety of compute types to get faster time to results. Comparing compute generations (Graph 4), AMD moved from being slower than Intel to being faster. With each compute generation AWS customers have the flexibility to test what is optimal for each flow. Then they can use a combination of instance types.

Graph 4 – Xcelium performance between compute generations. AMD (M-family) moved from being slower than Intel previously to being faster in the current generation.
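Once each flow has been measured on several instance families, choosing a combination of instance types is a simple lookup. The sketch below assumes a hypothetical list of measurements; the flow names, family names, and numbers are placeholders for illustration and are not results from this post.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    flow: str
    instance_family: str
    runtime_seconds: float
    job_cost_usd: float


# Hypothetical placeholder data, not measured results.
measurements = [
    Measurement("flow_a", "family_x", 100.0, 0.50),
    Measurement("flow_a", "family_y", 80.0, 0.42),
    Measurement("flow_b", "family_x", 200.0, 0.90),
    Measurement("flow_b", "family_y", 230.0, 1.05),
]


def best_per_flow(rows, metric="runtime_seconds"):
    """Return {flow: Measurement} holding the lowest value of the chosen metric."""
    best = {}
    for row in rows:
        current = best.get(row.flow)
        if current is None or getattr(row, metric) < getattr(current, metric):
            best[row.flow] = row
    return best


for flow, row in best_per_flow(measurements).items():
    print(f"{flow}: run on {row.instance_family} "
          f"({row.runtime_seconds:.0f} s, ${row.job_cost_usd:.2f} per job)")
```

Passing `metric="job_cost_usd"` selects the cheapest option per flow instead of the fastest. Repeating the selection whenever a new instance generation is released keeps the mapping current.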

Results: Synopsys

We took a similar approach when testing Synopsys VCS, but this time we tested the same tool with two different quick-start kits (XBUS and Bitcoin). This highlights the impact of the specific design on the CPU choices we made. Using an XBUS quick-start kit from Synopsys, Intel is 10% faster than AMD and ~14% faster than Graviton (as shown in Graph 5).

Graph 5 – VCS (XBUS) cost/performance analysis. Intel was 11% faster than AMD and 14% faster than Arm-based instances.

Comparing compute generations (Graph 6) we see that Intel is fastest in the current generation. But notice how these change when you compare the previous generations.

Graph 6 – VCS (XBUS) performance between compute generations. Intel was fastest in both generations.

Repeating the same test with the Bitcoin quick-start kit from Synopsys (Graph 7), we see the results shift. Now AMD is 25% faster than Intel and ~20% faster than Graviton. This shows how design-specific the results can be and why you’ll need to test for yourself.

Graph 7 – VCS (Bitcoin) cost/performance analysis. AMD was 30% faster than Intel and 25% faster than Graviton.

Comparing compute generations (Graph 8), we see again how things change over time, with AMD moving from slowest to fastest for this specific design.

Graph 8 – VCS (Bitcoin) performance between compute generations

Conclusion

With each new generation, CPU providers may leapfrog each other in cost/performance for EDA. This opens up new opportunities for improvement. AWS customers can customize the compute instance for the specific EDA flow to get faster results. This is in stark contrast to running on-premises, where your jobs run on whichever node happens to be available. Faster results, in turn, make your existing engineering team more productive, increasing coverage and reducing time to market, all without having to change your existing flows.

Want to run your EDA flow through this performance optimization process? Reach out to your AWS account team or an AWS Representative and ask to speak to a semiconductor specialist. We’ll be happy to support your test.

Further Reading