Scaling Synopsys Proteus optical proximity correction on AWS
Photolithography is a key step in the manufacturing of semiconductor chips. It works by shining light (a laser) through a pattern (the mask) onto a silicon wafer coated with a photosensitive material (the resist). The exposure changes the properties of the coating, allowing manufacturers to chemically remove parts of it depending on whether they were exposed to the laser. This forms a pattern on the wafer that matches the pattern on the mask. The manufacturer uses these patterns to selectively add or remove layers on the wafer to form different devices on silicon.
This process remained largely unchanged until the dimensions of the patterns on the chip started to approach the wavelength of the laser. At that point, basic photolithography techniques could no longer accurately reproduce the patterns on the wafer due to diffraction and other distortions. To compensate, optical proximity correction (OPC) calculates the effects of these distortions on the final image and modifies the patterns on the mask accordingly. To accomplish this, OPC software such as Synopsys Proteus must execute billions of calculations across dozens of mask layers to correct the distortions introduced as semiconductor geometries continue to shrink.
This level of complexity makes OPC one of the most computationally demanding workloads in semiconductor manufacturing, often requiring thousands of compute cores running for hours to process a single semiconductor chip. Due to the massive scale of compute required, semiconductor foundries devote a significant portion of their data centers to this single workload.
Scaling Synopsys Proteus OPC on AWS
Synopsys and AWS recognize that as advances in semiconductor technology continue to push the complexity of each chip, customer data centers struggle to keep up with the growing demand on their resources. OPC is a natural fit for the elastic compute capacity of the AWS Cloud because the computation can be parallelized. Synopsys and AWS launched a joint investigation to determine how Synopsys Proteus scales on AWS.
We decided to target scaling across 24,000 compute cores on a single design with a goal of maintaining at least 95% linearity at scale. Validating how a single design scales provides an accurate picture of the benefits customers can achieve as it takes into account the infrastructure and design workload interdependencies as you add workers to the same job. We already know that loosely coupled workloads scale nearly linearly on AWS.
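The post does not spell out how linearity is computed. A minimal sketch, assuming the common definition of achieved speedup divided by ideal speedup relative to the smallest run, might look like this. The function name and the runtimes are hypothetical and are chosen only to illustrate the arithmetic; they are not measurements from this investigation.

```python
def scaling_linearity(base_cores, base_runtime, cores, runtime):
    """Return achieved speedup divided by ideal speedup (1.0 = perfectly linear).

    Assumed definition: the baseline is the smallest run, and the ideal
    speedup is simply the ratio of core counts.
    """
    ideal_speedup = cores / base_cores
    actual_speedup = base_runtime / runtime
    return actual_speedup / ideal_speedup


# Hypothetical runtimes in hours, for illustration only.
baseline_cores, baseline_hours = 2000, 12.0
scaled_cores, scaled_hours = 24000, 1.02

linearity = scaling_linearity(baseline_cores, baseline_hours, scaled_cores, scaled_hours)
print(f"Scaling linearity: {linearity:.1%}")
print("Meets 95% target:", linearity >= 0.95)
```

Under this definition, a job that runs exactly 12 times faster on 12 times the cores scores 1.0, and any overhead from coordination or data movement pushes the value below 1.0.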
Amazon EC2 Spot Instances are a great way to optimize compute costs for fault-tolerant workloads. EC2 Spot Instances use spare Amazon EC2 capacity, which is available at a discount of up to 90% compared to On-Demand prices. When there is a spike in requests for a particular On-Demand Instance type in a specific Availability Zone (AZ), AWS can reclaim the Spot Instances with a two-minute notification.
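To make the two-minute notification concrete: a Spot Instance can poll the instance metadata path `http://169.254.169.254/latest/meta-data/spot/instance-action`, which returns HTTP 404 until an interruption is scheduled and then returns a small JSON document with the action and its time. The sketch below only parses such a document and computes the time remaining. The function name is hypothetical, the sample body is made up for illustration, and fetching the document (including the IMDSv2 session token) is left out. How Proteus itself reacts to the notice is not described in this post.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body, now):
    """Parse a Spot instance-action document.

    Returns (action, seconds remaining until the action takes effect).
    """
    notice = json.loads(notice_body)
    # The metadata service reports the time in UTC, e.g. 2020-01-17T08:22:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


# Made-up sample body in the documented format.
sample_body = '{"action": "terminate", "time": "2020-01-17T08:22:00Z"}'
sample_now = datetime(2020, 1, 17, 8, 20, 0, tzinfo=timezone.utc)

action, remaining = seconds_until_interruption(sample_body, sample_now)
print(action, remaining)
```

A worker that sees such a notice has a short window to stop accepting new work and hand back or abandon the piece it is processing.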
We used the official AWS Solution Scale-Out Computing on AWS (SOCA) to quickly create a ready-to-use cloud environment that provides scalable compute and storage, budget monitoring, job scheduling, and more. For guidance on how to set up SOCA for an EDA workload, check out the blog Scaling EDA Workloads using Scale-Out Computing on AWS.
Proteus runs using a distributed computing architecture. There is a head node that manages and tracks the workload and data while dispatching the individual compute jobs to worker nodes. Each worker node receives the data for a small part of the mask, processes the workload, and returns the finished data back to the head node.
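The head node and worker node pattern described above can be sketched in a few lines. This is a toy model under stated assumptions, not the Proteus implementation: the tiling scheme, the function names, and the placeholder "correction" are all hypothetical, and a local thread pool stands in for a fleet of worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(width, height, tile_size):
    """Head node: cut the mask area into small rectangles (x, y, w, h)."""
    return [
        (x, y, min(tile_size, width - x), min(tile_size, height - y))
        for y in range(0, height, tile_size)
        for x in range(0, width, tile_size)
    ]


def correct_tile(tile):
    """Worker node: placeholder for the real per-tile correction."""
    x, y, w, h = tile
    return {"origin": (x, y), "area": w * h}


def run_job(width, height, tile_size, workers):
    """Head node: dispatch every tile to a worker and collect the results."""
    tiles = split_into_tiles(width, height, tile_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the tiles.
        return list(pool.map(correct_tile, tiles))


results = run_job(width=100, height=60, tile_size=32, workers=4)
print(len(results), "tiles processed")
print("total area covered:", sum(r["area"] for r in results))
```

Because each tile is processed independently, adding workers shortens the job until the head node or the data movement becomes the bottleneck, which is what the scaling tests in this post measure.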
We divided our investigation into two steps: 1) achieve scalability across 24,000 cores using On-Demand Instances and 2) optimize the architecture for cost using EC2 Spot Instances. We started by running the tests with 2,000 cores, then 4,000, 8,000, 10,000, 16,000, and finally 24,000 cores using On-Demand Instances. For information on the design we used for testing, see the Design Details table below. We successfully scaled Proteus to 24,000 cores while maintaining scaling linearity of over 98%. See the red solid line in Figure 1.
Next, we explored running Proteus on Amazon EC2 Spot Instances in order to scale the workload more cost-effectively. Starting with version vM-2017.03-9-T-20200117, Synopsys Proteus can use EC2 Spot Instances for the worker nodes. We ran the same set of tests using EC2 Spot Instances instead of On-Demand Instances. See the black solid line in Figure 1. The results show that Spot interruptions did reduce overall efficiency somewhat. However, even with Spot interruptions, Proteus still achieved over 97% scaling linearity when scaling to 24,000 cores on a single design. We took advantage of Spot Fleet, which enables diversification across instance types to minimize the impact of Spot interruptions. Figure 2 shows the Spot Instance types used for the duration of the 24,000-core test.
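As an illustration of instance-type diversification, the sketch below builds a request body in the shape accepted by the EC2 `RequestSpotFleet` API (for example through boto3's `request_spot_fleet(SpotFleetRequestConfig=...)`). It only constructs the dictionary and makes no AWS call. The helper name, the launch template ID, the role ARN, and the instance types are placeholders; they are not the configuration or the instance types used in this investigation. Weighting each type by its vCPU count is also an assumption, and vCPUs are not the same as physical cores.

```python
def build_spot_fleet_config(target_capacity, vcpus_by_type, launch_template_id, fleet_role_arn):
    """Build a SpotFleetRequestConfig that spreads capacity over several instance types."""
    overrides = [
        # WeightedCapacity lets the fleet count capacity in vCPUs instead of instances.
        {"InstanceType": instance_type, "WeightedCapacity": float(vcpus)}
        for instance_type, vcpus in vcpus_by_type.items()
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "AllocationStrategy": "diversified",  # spread across the listed pools
        "TargetCapacity": target_capacity,
        "Type": "maintain",  # replace interrupted instances automatically
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": launch_template_id,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
    }


# Placeholder values for illustration only.
config = build_spot_fleet_config(
    target_capacity=24000,
    vcpus_by_type={"c5.24xlarge": 96, "m5.24xlarge": 96, "r5.24xlarge": 96},
    launch_template_id="lt-0123456789abcdef0",
    fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",
)
print(config["AllocationStrategy"], config["TargetCapacity"])
```

Listing several instance types gives the fleet more capacity pools to draw from, so an interruption in one pool affects only part of the worker population.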
Design Details
|Layout|Synopsys N7 IP block|
OPC is one of the more compute-intensive workloads in semiconductor manufacturing, consuming a significant portion of foundry data center capacity. The scale of OPC workloads makes them a natural fit for running on the cloud. The joint Synopsys and AWS team showed that Proteus can successfully scale to the targeted 24,000 cores for a single design while maintaining over 98% scaling linearity. Proteus can also take advantage of the cost savings of Amazon EC2 Spot Instances while achieving over 97% scaling linearity. By running Synopsys Proteus OPC on AWS, foundries and integrated device manufacturers (IDMs) have the flexibility and elasticity to scale their OPC workloads across more cores than they could in their own data centers, thereby reducing the total turnaround time while still realizing a lower total compute cost than running on-premises.
While we achieved the target of this investigation, 24,000 cores on a single design with EC2 Spot Instances, the Synopsys and AWS teams feel that we can go even further with Synopsys Proteus on AWS. Stay tuned for future updates. For further information about how we scaled this and other EDA workloads on AWS, or just how to migrate EDA workloads to AWS, reach out to your Synopsys or AWS account teams.
For more information on Synopsys Proteus, please go to https://www.synopsys.com/silicon/mask-synthesis/proteus.html
For more information on EDA workloads on AWS, please go to https://aws.amazon.com/semiconductor