AWS for Industries

Increasing fab productivity through data classification


Semiconductor manufacturing has become incredibly complex, and getting electronics in front of the end customer at a reasonable price point is very challenging. This, coupled with the chip shortage and the cost and time needed to build fabs to increase capacity, has led companies to focus on improving productivity. A big part of that is better collaboration, and it’s all about the data. As a follow-on to The semantics of data sharing across the semiconductor supply chain, this blog looks at ways of enabling better data sharing between semiconductor foundries (fabs) and equipment providers to increase productivity while protecting the intellectual property (IP) of each company.

While fab production lines will likely remain air-gapped for the foreseeable future for obvious security reasons, we can and should revisit how we treat the data generated during the chip fabrication process. Not all data is born equal. When you look at the famous NASA image of the Earth rising over the Moon, it looks like a perfect blue marble, smooth. Only when you zoom in can you see Mount Everest, cities, and buildings. We too need to zoom in and distinguish different types of data so that we can evaluate them based on (at least) two criteria:

Potential Value: By sharing data with each other, the fab and equipment providers can unlock new capabilities, such as improved predictive maintenance and increased throughput and performance, leading to higher Overall Equipment Effectiveness (OEE), longer part lifetimes, and better service-level agreements (SLAs) for part delivery.

Potential Risk: From the fab’s perspective, there is a risk of exposing the fab’s process. From the equipment provider’s perspective, there is a risk of exposing the intellectual property (IP) that differentiates their tool.

Historically, the risks have stopped us from zooming in and looking at the different categories of data. Instead, we should categorize the data, plot risk against value, and look for low-risk, high-value options. Chart 1 shows the value/risk analysis from the perspective of the fab:

Low risk, medium value

Starting with the lowest-hanging fruit: datasets that only expose machine-operational metrics with no process data. These metrics can help predict failures when used at scale, by aggregating data from multiple fabs and identifying trends. With this dataset, equipment providers can improve throughput by optimizing wafer movement scheduling, extend part lifetimes by optimizing cleaning procedures, and plan parts replacement better. It can help verify that parts are in stock and located close to the customers that are likely to need them, shortening lead times. All of these help lower maintenance costs and increase part lifetimes and uptime, leading to increased OEE.

Medium risk, high value

Here we have data that can help equipment providers improve their product, creating a positive feedback loop that allows incremental improvement in their design. This data is more sensitive from the fab’s perspective, since it can potentially expose their process or IP. It can include sensor data, calibration data, and potentially their correlation with production metrology data. This is not where we should aim to start, but some of these datasets can be shared securely using anonymization, obfuscation, or similar mechanisms. A few years from now, once fabs start seeing benefits from sharing low-risk data, sharing this data will likely become the norm.
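As one illustration of what anonymization or obfuscation could look like in practice, the sketch below replaces identifying fields with keyed hashes and drops process-related fields before a record leaves the fab. The field names (fab_id, tool_id, recipe_name, lot_id) and the function name are hypothetical, not taken from any real equipment schema. This only addresses identifiers: sensor values themselves can still reveal process information and may need additional treatment.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for illustration only.
IDENTIFYING_FIELDS = {"fab_id", "tool_id"}     # replaced with keyed hashes
PROCESS_FIELDS = {"recipe_name", "lot_id"}     # dropped before sharing


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record that is safer to share.

    Identifying fields are replaced with an HMAC-SHA256 digest so the same
    tool maps to the same token without revealing its real name. Process
    fields are removed entirely. All other fields pass through unchanged.
    """
    shared = {}
    for field, value in record.items():
        if field in PROCESS_FIELDS:
            continue
        if field in IDENTIFYING_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            shared[field] = digest.hexdigest()[:16]
        else:
            shared[field] = value
    return shared


# The key stays with the fab, so only the fab can map tokens back to tools.
key = b"example-key-kept-inside-the-fab"
raw = {
    "fab_id": "FAB-12",
    "tool_id": "ETCH-07",
    "recipe_name": "poly-etch-v3",
    "lot_id": "L-000123",
    "chamber_pressure_torr": 0.02,
}
shared = pseudonymize(raw, key)
print(shared)
```

Because the hash is keyed and deterministic, the equipment provider can still follow one tool over time, while the fab keeps the ability to audit exactly which fields were shared.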

High risk, high value

We’ve been focusing on the fab’s IP, but the equipment providers have IP too, and it’s inside their machines. If they can look into the data their machines produce while in production, they can generate insights to develop process-customized maintenance and part-cleaning procedures, improve designs to increase performance, quality, and lifetime, and increase throughput. This is a complicated situation: the fab won’t share data it can’t audit, and the provider needs to protect their IP. Perhaps for the foreseeable future these datasets will only be shared during joint R&D phases, but there is room for optimism that we can solve this too in the long run.
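The three categories above can be written down as a small risk vs. value table and sorted into a sharing roadmap. The sketch below is a minimal illustration: the class and function names are hypothetical, the category labels are shorthand for the descriptions above, and the ordering rule (lowest risk first, higher value breaking ties) is one assumed way to pick a starting point.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class DatasetCategory:
    name: str
    risk: Level
    value: Level


# Risk and value levels follow the three categories described in the text.
CATEGORIES = [
    DatasetCategory("sensor and calibration data", Level.MEDIUM, Level.HIGH),
    DatasetCategory("in-production equipment data", Level.HIGH, Level.HIGH),
    DatasetCategory("machine-operational metrics", Level.LOW, Level.MEDIUM),
]


def sharing_roadmap(categories):
    """Order categories from lowest to highest risk; higher value breaks ties."""
    return sorted(categories, key=lambda c: (c.risk, -c.value))


roadmap = sharing_roadmap(CATEGORIES)
for step, category in enumerate(roadmap, start=1):
    print(f"{step}. {category.name}: risk={category.risk.name}, value={category.value.name}")
```

A real classification would be far more granular, with each fab and equipment provider assigning levels per dataset, but the same sort gives both sides a shared, explicit order in which to open up data.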

Functional requirements

Once fabs and their equipment providers agree on data classification and want to share data, there are a few other prerequisites:

  • The equipment provider and the fab need a virtual “meeting place”, where data will be uploaded and analyzed, and new insights on improving equipment availability will be shared
  • Data from different customers (fabs) will need to remain separate, but analysis should be allowed to use datasets from multiple customers
  • Complex data analytics and ML/AI capabilities are required to find these new insights

These requirements make AWS an ideal meeting place for equipment providers and their fab customers. AWS allows each customer’s data to be completely isolated in a separate child account (as part of AWS Organizations), where it can be anonymized or obfuscated, while a parent account has clearly defined access permissions to analyze the data, or to pull a subset of anonymized data (without the customer identifiers) into a centralized data lake for further analysis. Services like AWS Lake Formation not only make it easy to set up a secure data lake in days instead of months, but also help secure access to your sensitive data using granular controls at the column, row, and cell levels.
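To make the idea of separate customer data feeding a shared analysis concrete, here is a minimal sketch in plain Python. It does not call any AWS API. The dictionary keys stand in for isolated child accounts, and the account names, field names, and numbers are all made up for illustration. Customer identifiers are stripped before records are pooled, and only the pooled view is used for the cross-customer statistic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: each key stands in for one customer's isolated account.
per_account_records = {
    "account-a": [
        {"customer": "fab-a", "part_type": "o-ring", "hours_to_failure": 1200},
        {"customer": "fab-a", "part_type": "pump", "hours_to_failure": 8000},
    ],
    "account-b": [
        {"customer": "fab-b", "part_type": "o-ring", "hours_to_failure": 1000},
    ],
}


def pool_without_identifiers(datasets, identifier_fields=("customer",)):
    """Copy records from every account into one list, dropping identifier fields."""
    pooled = []
    for records in datasets.values():
        for record in records:
            pooled.append(
                {k: v for k, v in record.items() if k not in identifier_fields}
            )
    return pooled


def mean_hours_to_failure(pooled):
    """Average hours to failure per part type across all pooled records."""
    by_part = defaultdict(list)
    for record in pooled:
        by_part[record["part_type"]].append(record["hours_to_failure"])
    return {part: mean(hours) for part, hours in by_part.items()}


pooled = pool_without_identifiers(per_account_records)
fleet_stats = mean_hours_to_failure(pooled)
print(fleet_stats)
```

The per-account data is never modified, so each customer’s records stay separate, while the pooled copy carries no field that says which fab a record came from.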


All data generated by machines in the fab can help our industry as a whole move forward, easing the supply chain challenges. For the fabs, this will mean higher uptime and better yields, while for the equipment providers it will mean better operational efficiency and customer support. It’s not an easy first step considering the IP-related risks, but categorizing the data in terms of risk vs. value helps build a roadmap towards improved data sharing. Using AWS data analytics services to quickly generate insights from the data, and hosting data from multiple customers in separate AWS accounts, allows equipment providers and their fab customers to quickly iterate and experiment with this new data-sharing environment. Alternatively, fabs and equipment providers can use secure collaboration solutions like the recently announced AWS Clean Rooms to collaborate on the insights gained without exposing the underlying data.

To learn more, visit the AWS Semiconductor and Electronics page.

Eran Brown

Eran Brown is a senior semiconductor Specialist Solution Architect. He spent 7 years working with semiconductor companies designing HPC storage infrastructure, and after all these years is still amazed at what a square inch of silicon can do.

Gautham Unni

Gautham Unni is the Head of Business Development for Semiconductors Smart Manufacturing. Prior to joining AWS, he had a career in semiconductor manufacturing at GlobalFoundries, Lam Research and Applied Materials in process engineering, product development and product management/marketing roles. He is passionate about improving collaboration and productivity in all manufacturing.