AWS for Industries

Generative AI for Semiconductor Design and Verification

The emergence of generative AI presents tremendous opportunities for advancing technical and business processes in the high tech and semiconductor industries. From optimizing complex system design processes to accelerating time-to-market for new products, generative AI has broad potential to improve engineering and manufacturing methodologies and processes.

Generative design methodologies powered by AI can automatically produce chip and electronic subsystem designs from the right prompts, parameters, and constraints, reducing the intensive engineering effort involved and freeing up resources. For instance, generative AI engineering assistants can help new engineers become up to 2X more productive by interacting with design tools using natural language. For process improvements that directly impact project timelines and business outcomes, generative AI can facilitate rapid development of product datasheets, technical manuals, and associated documentation customized for target audiences and markets. Further efficiency gains can be realized by using engineering assistants for research and for providing engineers with contextual recommendations, helping human teams address critical research problems more quickly.

In this blog, we will discuss Amazon Web Services (AWS) generative AI services and how some of these services can be leveraged to build a generative Engineering Assistant for semiconductor design.

Why leverage AWS for generative AI?

AWS makes it easy to build and scale generative AI applications for your data, use cases, and customers. With generative AI on AWS, you get enterprise-grade security and privacy, access to industry-leading foundation models (FMs), generative AI-powered applications, and a data-first approach. For more information on the generative AI stack, please read this blog where Swami Sivasubramanian, Vice President of Data and Machine Learning, discusses in detail the bottom, middle, and top layers of the stack.

Generative AI in Semiconductor design and verification

In a recent article, McKinsey & Company highlighted the increasing cost of designing chips at advanced nodes. Designers have seen design costs increase 2-3X compared to previous generations. In the chart below, McKinsey highlights that a 5nm chip costs on average $540M to develop and takes 864 engineer days to complete. At this rate of increase, the cost of designing at advanced nodes is estimated to reach $1B, leaving only a few companies able to continue developing advanced chips.

Increasing Chip Development Costs

By leveraging generative AI in semiconductor design, chip companies can optimize their costs by increasing developer productivity and empowering developers to do more with less. We believe there are several use cases in the semiconductor design lifecycle where generative AI can improve worker productivity, reduce design cycle time, and ultimately improve business outcomes such as time to market, development costs, and product quality. The illustration below shows various use cases across a typical semiconductor design cycle that can leverage generative AI.

Semiconductor design use cases for a typical EDA flow

Over the past decades, Electronic Design Automation (EDA) has significantly boosted chip design productivity – complementing the transistor density increases of Moore’s Law. However, many complex chip design tasks, especially those involving natural and programming languages, remain largely unautomated. Recent advancements in commercial and open-source Large Language Models (LLMs) present opportunities to automate these language-related and code-related tasks in the front-end, back-end, and production test design phases. Using LLMs to automate tasks like code generation, responding to engineering queries in natural language, report generation, and bug triage can greatly improve engineering productivity and optimize development costs.

According to a recent paper from NVIDIA, up to 60% of a chip designer’s time is spent in debug or checklist-related tasks across a range of topics such as tool usage, design specification, testbench creation, and root cause analysis of flows. Furthermore, the technical know-how and experience are often tribal knowledge and are documented in files and slide decks scattered across the firm, sometimes in different locations around the world. For new engineers on the team, the inaccessibility of this knowledge can be frustrating, ultimately increasing the overall design cycle time.

Another common use case in chip design flow is writing automation scripts that stitch together EDA tools, reference methodologies, and proprietary logic. Foundation models like Code Llama have demonstrated their adeptness at generating code and natural language about code. These models already have comprehensive support for Python code generation and can be further fine-tuned to generate code in other commonly used scripting languages like Perl and Tcl. Several EDA tools today use either some variant of Tcl or their own proprietary language to interact with the design when loaded in GUI mode. Another application of the code generation use case is creating an Engineering Assistant that a design engineer can use directly from the terminal to interact with these EDA tools in GUI mode through natural language. Synopsys, one of the leading EDA providers, is already building these types of generative AI capabilities into its tools, but companies could build their own independent Engineering Assistants with proprietary information to augment the EDA tools they utilize. Imagine a customized Engineering Assistant that can work across tools from different EDA vendors.

As a former Engineering Manager at Intel, I led a team responsible for delivering EDA tools and scripts to design engineers. A large amount of my team’s time was spent on tracking, triaging, and resolving support requests. Generative AI can dramatically simplify this process and give time back to the team by looking at all the supporting information, summarizing the ticket, and suggesting a course of action based on prior tickets and the team’s operating procedures.

Choosing the right generative AI approach for semiconductor design and verification

Choosing the right approach for building generative AI applications

The accuracy of large language models is fundamentally tied to the quality and breadth of the data they are trained on. These models have been trained on limited semiconductor domain data, so they cannot be used effectively out of the box in a production environment. For the semiconductor domain, there are two approaches to customizing foundation models:

  • Retrieval Augmented Generation (RAG)
  • Fine-Tuning

The Retrieval Augmented Generation (RAG) approach uses external data, such as document repositories, databases, or APIs, to augment your prompt. With RAG, the first step is to convert your documents and any user queries into a compatible format so that a relevancy search can be performed. To make the formats compatible, a document collection (or knowledge library) and user-submitted queries are converted to numerical representations using embedding language models. Embedding is the process by which text is given a numerical representation in a vector space. RAG architectures compare the embeddings of user queries against the embeddings in the knowledge library’s vector store. The original user prompt is then appended with relevant context from similar documents within the knowledge library, and this augmented prompt is sent to the foundation model. You can update knowledge libraries and their relevant embeddings asynchronously.
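As a minimal sketch of the relevancy search step, the snippet below embeds a user query and two candidate document chunks with a Titan embeddings model on Amazon Bedrock and ranks the chunks by cosine similarity. The model choice, example text, and scoring code are illustrative assumptions, not the managed workflow described later in this post.

Python

import json
import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text):
    """Return the embedding vector for a piece of text using a Titan embeddings model."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

# Embed a user query and two candidate document chunks (illustrative text)
query = embed("How do I report setup violations in Design Compiler?")
chunks = [
    embed("The report_timing command reports timing paths, including setup violations."),
    embed("Use the Library Manager to create and manage design libraries."),
]

# Cosine similarity ranks the chunks; the top-scoring chunk would be appended
# to the original prompt as context before it is sent to the foundation model
scores = [float(query @ c / (np.linalg.norm(query) * np.linalg.norm(c))) for c in chunks]
print(scores)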

Fine-tuning a pre-trained foundation model is an affordable way to take advantage of its broad capabilities while customizing it on your own small corpus. Unlike RAG, fine-tuning is a customization method that involves further training and does change the weights of your model. Domain adaptation methods such as Parameter-Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA) allow you to take pre-trained foundation models and adapt them to specific tasks using limited domain-specific data. In this blog, we will discuss how you can use a RAG-based architecture pattern with Amazon Bedrock to design your own Engineering Assistant.
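Before moving on to the RAG approach, here is a minimal sketch of what LoRA-style domain adaptation could look like using the open-source Hugging Face transformers and peft libraries with Code Llama. The model checkpoint, target modules, and hyperparameters below are illustrative assumptions rather than a prescribed recipe.

Python

# Hypothetical open-source route (outside Bedrock): LoRA adapters on Code Llama
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = "codellama/CodeLlama-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_model)

# Train only small low-rank adapter matrices on the attention projections,
# leaving the original weights frozen -- practical with limited Tcl/Perl data
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights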

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs along with a broad set of capabilities that you need to build generative AI applications, simplifying development with security, privacy, and responsible AI. With Amazon Bedrock, you get an easy-to-use developer experience to work with a broad range of high-performing FMs from Amazon and leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral, and Stability AI. You can quickly experiment with a variety of FMs in the playground and use a single API for inference regardless of the models you choose, giving you the flexibility to use FMs from different providers and keep up to date with the latest model versions with minimal code changes. Bedrock also makes model customization easy: you can privately customize FMs with your own data through a visual interface without writing any code. Another feature of Bedrock is its native support for RAG through Knowledge Bases, which come with APIs for querying the knowledge base easily. We will discuss in detail how you can use these Bedrock features to build your EDA Engineering Assistant.
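As a quick illustration of that single inference API, the sketch below lists the foundation models available in your account and invokes one of them with the AWS SDK for Python (Boto3); the example prompt and parameter values are assumptions.

Python

import json
import boto3

bedrock = boto3.client("bedrock")                   # control plane: list and manage models
bedrock_runtime = boto3.client("bedrock-runtime")   # data plane: inference

# List the foundation models available in this account and Region
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])

# The same InvokeModel API is used regardless of provider; only the modelId
# and the provider-specific request body change
body = json.dumps({
    "prompt": "\n\nHuman: In one sentence, what does a scan chain do in DFT?\n\nAssistant:",
    "max_tokens_to_sample": 200,
})
response = bedrock_runtime.invoke_model(modelId="anthropic.claude-instant-v1", body=body)
print(json.loads(response["body"].read())["completion"])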

Knowledge Bases for Amazon Bedrock

With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company’s data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant, context-specific, and accurate responses without continuously retraining the FM. All information retrieved from knowledge bases comes with source attribution to improve transparency and minimize hallucinations.

Knowledge Bases for Amazon Bedrock manage the end-to-end RAG workflow for you. Simply point to the location of your data in Amazon S3, and Knowledge Bases for Amazon Bedrock automatically fetches the documents, divides them into blocks of text, converts the text into embeddings, and stores the embeddings in your vector database. You can choose from a variety of embeddings models to convert your data into embeddings for the knowledge base. If you do not have an existing vector database, Amazon Bedrock creates an Amazon OpenSearch Serverless vector store for you. Alternatively, you can specify an existing vector store in one of the supported databases, including Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud, with support for Amazon Aurora and MongoDB coming soon. Knowledge bases are simple to create; for instructions, please visit this link.
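Once the knowledge base and its S3 data source exist, keeping it current can be scripted. The following sketch uploads a new document and starts an ingestion job so it gets fetched, chunked, embedded, and indexed; the bucket name, file names, and data source ID are placeholder assumptions.

Python

import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")   # control-plane client for knowledge bases

# Upload a new EDA reference manual to the S3 bucket backing the knowledge base
s3.upload_file("dc_user_guide.pdf", "my-eda-docs-bucket", "manuals/dc_user_guide.pdf")

# Start a sync so the new document is fetched, chunked, embedded, and indexed
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="AES9P3MT9T",      # knowledge base ID shown in the console
    dataSourceId="DATASOURCE123",      # placeholder data source attached to the bucket
)
print(job["ingestionJob"]["status"])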

Amazon Bedrock API

Amazon Bedrock supports a rich set of APIs with SDK support for C++, Python, Go, and more. It also supports APIs for RAG that handle the embedding and querying, and provide the source attribution and short-term memory needed for production RAG applications. With the new RetrieveAndGenerate API, you can directly retrieve relevant information from your knowledge bases and have Amazon Bedrock generate a response from the results by specifying an FM in your API call. Let me show you how this works.

Here’s a quick setup of how to use the APIs with the AWS SDK for Python (Boto3).

Python

import boto3

# Bedrock Agent Runtime client used to query Knowledge Bases for Amazon Bedrock
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')


def retrieveAndGenerate(input_text, kbId):
    # Retrieve relevant chunks from the knowledge base and generate a response
    # with the specified foundation model in a single API call
    return bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': input_text
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1'
            }
        }
    )


response = retrieveAndGenerate("Can you tell me the syntax of the check_timing command for synopsys design compiler?", "AES9P3MT9T")["output"]["text"]

The code above shows a basic procedure built around the retrieve_and_generate runtime API, where you pass the input text prompt (in this case, “Can you tell me the syntax of the check_timing command for synopsys design compiler”) and the unique ID assigned to your knowledge base. The output of the RetrieveAndGenerate API includes the generated response, the source attribution, and the retrieved text chunks. The model used here is Anthropic’s Claude Instant v1 model. For the examples below, the Amazon Bedrock API was integrated into a simple Python script for command line-based prompting. The response is as follows:

> ./eda_assistant.py --prompt “Can you tell me the syntax of the check_timing command for Synopsys Design Compiler?”
 
-I- ModelID selected:  anthropic.claude-instant-v1
-I- User Prompt:  Can you tell me the syntax of the check_timing command for Synopsys Design Compiler?
-I- No. of input prompt tokens: 19
-I- Generating Output:
 
The syntax of the check_timing command for Synopsys Design Compiler is:
check_timing
[-overlap_tolerance minimum_distance]
[-override_defaults check_list]
[-include check_list]
[-exclude check_list]
[-multiple_clock]
[-retain]
 
-I- No. of output tokens:  161

Here is another example, where we ask a simple query about a different EDA software tool.

> ./eda_assistant.py --prompt “Can you tell how I can create a design library in Cadence Virtuoso?”
 
-I- ModelID selected:  anthropic.claude-instant-v1
-I- User Prompt:  Can you tell how I can create a design library in Cadence Virtuoso?
-I- No. of input prompt tokens: 17
-I- Generating Output:
 

To create a design library in Cadence Virtuoso, you first need to start Cadence and open the Library Manager. From the Library Manager window, go to File > New > Library and provide a name for the new library. You will also need to attach the library to an existing technology library such as NCSU_CellLib_FreePDK45.
-I- No. of output tokens:  160

In both cases, we used publicly available documentation on tool usage to create our knowledge base. It is important to note that the quality of the response is correlated with the quality of the context you provide the model. This use case illustrates a simple RAG-based architecture with a limited set of high-quality data.

Another EDA-specific use case for a RAG-based architecture is Verilog code generation or Tcl/Perl scripting. There is a limited dataset available on the public internet for Verilog; however, the example below demonstrates how engineering productivity can be improved using LLM-based assistants.

> ./eda_assistant.py --prompt "Write a simple Verilog module for a 16-to-1 multiplexer (MUX) that takes a 16-bit input vector and a 4-bit select signal to produce a single-bit output"

-I- ModelID selected:  anthropic.claude-instant-v1
-I- User Prompt:  Write a simple Verilog module for a 16-to-1 multiplexer (MUX) that takes a 16-bit input vector and a 4-bit select signal to produce a single-bit output
-I- No. of input prompt tokens: 41
-I- Generating Output:
 
Here is a simple Verilog module for a 16-to-1 multiplexer that takes a 16-bit input vector and a 4-bit select signal to produce a single-bit output:
 
module mux16to1(input [15:0] in, input [3:0] sel, output reg out);
  always @(*)
    case(sel)
      4'b0000: out = in[0];
      4'b0001: out = in[1];
      4'b0010: out = in[2];
      4'b0011: out = in[3];
      4'b0100: out = in[4];
      4'b0101: out = in[5];
      4'b0110: out = in[6];
      4'b0111: out = in[7];
      4'b1000: out = in[8];
      4'b1001: out = in[9];
      4'b1010: out = in[10];
      4'b1011: out = in[11];
      4'b1100: out = in[12];
      4'b1101: out = in[13];
      4'b1110: out = in[14];
      4'b1111: out = in[15];
    endcase
endmodule
 
-I- No. of output tokens:  385
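For reference, the eda_assistant.py wrapper used in these examples can be a thin command-line layer around the same retrieve_and_generate call shown earlier. The sketch below is a minimal, assumed implementation; the token counts printed by the actual script are computed separately and omitted here.

Python

#!/usr/bin/env python3
import argparse
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

KB_ID = "AES9P3MT9T"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1"


def main():
    parser = argparse.ArgumentParser(description="EDA Engineering Assistant")
    parser.add_argument("--prompt", required=True, help="Natural language question")
    args = parser.parse_args()

    print("-I- ModelID selected:  anthropic.claude-instant-v1")
    print(f"-I- User Prompt:  {args.prompt}")
    print("-I- Generating Output:")

    # Query the knowledge base and generate a grounded response in one call
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": args.prompt},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )
    print(response["output"]["text"])


if __name__ == "__main__":
    main()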

RAG-based Architecture for an EDA Engineering Assistant

Now let’s review what a generative AI-based EDA assistant could look like. The architecture below follows a RAG pattern: it performs a semantic search, adds context from the retrieved data chunks to the prompt, and leverages the LLM to generate a response for the user. For building a semiconductor EDA-focused knowledge base, consider using a diverse range of data such as reference manuals for tools, design methodology documents, PowerPoint presentations, markdown files, annotated and unannotated code bases, and internal tool/design support databases. Using this Engineering Assistant, design engineers will be able to interact with design tools, methodology documents, and internal ticketing systems using natural language, and they will get contextual answers that help boost their productivity. If each engineer saves even a few hours over the course of the semiconductor design cycle, those hours can be redirected to innovating on the next chip. Referring back to the McKinsey & Company cost and labor analysis, another way to think about the return on investment is the number of person-days saved at an average annual engineer cost of $125,000.

Below is an outline of what the architecture will do (a minimal code sketch of these steps follows the list):

  1. The user provides a question via the Python boto3 client-based script or a GUI app.
  2. The question is converted into an embedding using Amazon Bedrock via the Titan embeddings v2 model.
  3. The embedding is used to find similar documents in an Amazon OpenSearch Serverless index.
  4. The knowledge base containing all EDA documents is used to hydrate the OpenSearch vector database.
  5. The augmented prompt is provided to Bedrock to generate a response using the Claude v2 model.
  6. The response, along with the retrieved context, is printed on the command line.
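The following is a sketch of how steps 2 through 6 might look in code when you want control over the prompt template: the Retrieve API handles the embedding and similarity search against the knowledge base, and InvokeModel calls Claude v2 with the augmented prompt. The question, prompt template, and numberOfResults value are illustrative assumptions.

Python

import json
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

question = "How do I fix hold violations after clock tree synthesis?"

# Steps 2-4: Bedrock embeds the query and performs the similarity search
# against the OpenSearch Serverless index behind the knowledge base
retrieved = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="AES9P3MT9T",
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)
context = "\n".join(r["content"]["text"] for r in retrieved["retrievalResults"])

# Step 5: augment the prompt with the retrieved chunks and call Claude v2
prompt = (f"\n\nHuman: Use the following documentation to answer the question."
          f"\n<context>\n{context}\n</context>\nQuestion: {question}\n\nAssistant:")
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500}),
)

# Step 6: print the generated answer on the command line
print(json.loads(response["body"].read())["completion"])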

Security

For semiconductor companies, protecting intellectual property (IP) is critical. Loss of IP due to unauthorized access can result in financial loss, reputational damage, or even regulatory consequences. These potential consequences make controlling access to the data and the flow of data critical aspects of a well-architected design. In addition to other security controls available on AWS, Amazon Bedrock offers several capabilities to support security and privacy requirements to secure your generative AI applications.

AWS Key Management Service (AWS KMS)

With Amazon Bedrock, you have full control over the data you use to customize the foundation models for your generative AI applications. Your data is encrypted in transit and at rest. Additionally, you can create, manage, and control encryption keys using the AWS Key Management Service (AWS KMS). Identity-based policies provide further control over your data – helping you manage what actions users and roles can perform, on which resources, and under what conditions.
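As an illustration of identity-based control, the sketch below attaches an inline IAM policy that restricts a design team’s role to invoking a single approved model; the role name, policy name, and model choice shown here are placeholder assumptions.

Python

import json
import boto3

iam = boto3.client("iam")

# Hypothetical policy: the design team role may only invoke Claude Instant
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel"],
        "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
    }],
}

iam.put_role_policy(
    RoleName="eda-assistant-role",              # hypothetical role used by the script
    PolicyName="AllowClaudeInstantInvokeOnly",
    PolicyDocument=json.dumps(policy),
)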

Data Protection, Compliance and Privacy

Amazon Bedrock helps ensure that your data stays under your control. When you tune a foundation model, we base it on a private copy of that model. This means your data is not shared with model providers and is not used to improve the base models. You can use AWS PrivateLink to establish private connectivity from your Amazon Virtual Private Cloud (VPC) to Amazon Bedrock without exposing your VPC to internet traffic. Amazon Bedrock is in scope for common compliance standards, including ISO, SOC, and CSA STAR Level 2; it is HIPAA eligible, and customers can use Bedrock in compliance with the GDPR.
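For completeness, a sketch of setting up that private connectivity with an interface VPC endpoint is shown below; the VPC, subnet, and security group IDs are placeholders, and the service name shown assumes the Bedrock runtime in us-east-1.

Python

import boto3

ec2 = boto3.client("ec2")

# Create an interface VPC endpoint so Bedrock runtime traffic stays on the
# AWS network instead of traversing the public internet
endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                          # placeholder VPC
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",  # Bedrock runtime endpoint service
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])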

Conclusion

In this blog, we highlighted the impact generative AI can have on the semiconductor industry and presented a variety of use cases across semiconductor design and verification. We discussed in detail the EDA Engineering Assistant use case, built on a RAG architecture pattern that leverages the Amazon Bedrock Python API, Knowledge Bases for Amazon Bedrock, and a foundation model available on Bedrock. With generative AI, you can accelerate innovation and time to market for your semiconductor and electronic product development, and you can augment design processes and flows by using Amazon Bedrock along with the broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. You can also develop custom productivity apps for EDA flows, verification, and physical design, reducing development time while delivering powerful capabilities like automated coding, debug automation, design assistants, coverage closure, layout optimization, and more. Start unlocking the benefits of generative AI to boost your productivity.

For information on Bedrock pricing, please visit https://aws.amazon.com/bedrock/pricing/. For more information on how AWS can help you, visit the AWS Solutions for Semiconductor page.

Karan Singh

Karan Singh is a Senior Solutions Architect at AWS, where he helps semiconductor customers leverage the cloud and AI for their workloads. Karan has 10 years of experience in semiconductor design, with a focus on Standard Cell Design, Physical Design (Place and Route), Physical Verification, and Design-for-Manufacturing (DFM). Prior to AWS, Karan worked at Intel and STMicroelectronics. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University, a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA candidate at the Haas School of Business at the University of California, Berkeley.

Umar Shah

Umar Shah is the Head of Solutions at Amazon Web Services focused on semiconductor and hi-tech industry workloads and has worked in Silicon Valley for over 26 years. Prior to joining AWS, he was the ECAD manager at Lab126, where he created and delivered business and engineering best practices for Amazon EE teams. He has extensive experience in electronic sub-systems design, EDA design flow optimization, application engineering, project management, technical sales, technical writing, documentation & multimedia development, business development & negotiations, customer relations, and business execution.