Creating a Generative AI Search Engine for Programmers Using NVIDIA-Powered Amazon EC2 Instances with Phind
Learn how Phind created an intelligent search engine for programmers using NVIDIA GPU–based Amazon EC2 instances.
While they were still in college, Michael Royzen and Justin Wei saw the potential for using large language models (LLMs) together with high-quality context to build an intelligent search engine for developers. Software engineering requires a lot of specific knowledge, and traditional search engines often produce responses to technical questions that are too broad or shallow to help developers solve problems quickly. This led Royzen and Wei to build Phind, a generative artificial intelligence (AI)–based search engine where programmers and developers can get answers to their questions in 15 seconds instead of 15 minutes or longer.
Royzen and Wei built Phind on Amazon Web Services (AWS). To train its machine learning models, Phind uses Amazon Elastic Compute Cloud (Amazon EC2), which provides secure and resizable compute capacity for virtually any workload. Phind has grown consistently at a rate of 5–10 percent each week, and using instances powered by technology from NVIDIA, an AWS Partner, Phind has reduced the time it takes to start generating answers to developer questions by 75 percent and can complete answers eight times faster.
Opportunity | Using NVIDIA-Based Amazon EC2 Instances to Build a Search Engine for Phind
When the cofounders of Phind were seniors at the University of Texas at Austin, they saw an opportunity to use LLMs as a search engine by feeding them high-quality context. LLMs were becoming more accurate, but many still acted as summarizers. Royzen and Wei wanted to add to that functionality and create something that could act as an intelligent extension of the user, almost like a partner for them to work and reason through problems with.
The cofounders of Phind got access to AWS credits and used them to build their idea on AWS. “All of our main production infrastructure has always been on AWS,” says Michael Royzen, cofounder and CEO of Phind. “It works well, and because it works well, we have one less thing to worry about and can focus on the work that matters.” To train their LLMs, the cofounders spun up Amazon EC2 Spot Instances, which run fault-tolerant workloads at savings of up to 90 percent compared with On-Demand prices. Using NVIDIA GPU-based instances on AWS, they built a text index thousands of gigabytes in size on a tight budget.
In January 2022, they launched the first demo of their internet-connected LLM that could answer developer questions. The demo was a success and ran for 10 days, but it was too expensive for two college students to keep running. The cofounders then secured funding through Y Combinator, a startup accelerator, and launched Phind in July 2022.
“All of our main production infrastructure has always been on AWS. It works well, and because it works well, we have one less thing to worry about and can focus on the work that matters.”

Michael Royzen
Cofounder and CEO, Phind
Solution | Generating Search Answers Eight Times Faster Using Amazon EC2 P5 Instances
When a developer uses Phind, they can type a complex coding question into the search bar and receive an AI-generated answer alongside links to online sources. Phind can answer coding and development questions with much more precision than a generic search engine because its models are trained on this kind of context. The company trained its LLMs using NVIDIA A100 Tensor Core GPU-powered Amazon EC2 P4d Instances, which deliver high performance for machine learning training and high performance computing applications in the cloud, and NVIDIA H100 Tensor Core GPU-powered Amazon EC2 P5 Instances, the highest-performance GPU-based instances for deep learning and high performance computing applications. Phind open-sourced one of its models in summer 2023, and it outperformed other open-source coding models and GPT-4 on OpenAI’s HumanEval benchmark, scoring 74.7 percent.
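HumanEval results like the 74.7 percent above are conventionally reported as pass@k scores: the probability that at least one of k sampled completions for a problem passes the unit tests. As a rough illustration of how that metric is computed (this is the standard unbiased estimator from the paper that introduced HumanEval, not Phind's evaluation code, and the sample counts below are made up):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions, drawn from n generated samples of which c are correct,
    passes the problem's unit tests."""
    if n - c < k:
        # Every possible draw of k samples contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 10 generations per problem, 8 correct.
print(round(pass_at_k(n=10, c=8, k=1), 3))  # → 0.8
```

A benchmark score is then the mean of this estimate across all problems in the suite.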
To orchestrate its LLM training, Phind runs Amazon EC2 P4d and Amazon EC2 P5 Instances on connected nodes using AWS ParallelCluster, an open-source cluster management tool used to deploy and manage high performance computing clusters on AWS. The company uses the same nodes for both training and inference. “We love using AWS ParallelCluster because it works quickly and reliably, and that’s all we can ask for,” says Royzen. In terms of inference speed, Phind has compared NVIDIA-based Amazon EC2 instances with other options and found the NVIDIA-powered instances to be 2–4 times faster for its workload.
Speed is important for Phind as a search engine, and the company uses two metrics to track it. One is time to first token: the time between a user clicking the search button and the first word appearing on the screen. This measures how long the model takes to do the groundwork for generating an answer. Phind has reduced its time to first token by 75 percent using NVIDIA-powered Amazon EC2 instances. The second metric is tokens per second, which measures how fast the model completes an answer. Using NVIDIA’s TensorRT-LLM, a software development kit for high-performance deep learning inference, on Amazon EC2 P5 Instances, Phind has increased its tokens-per-second speed eightfold.
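Both metrics can be computed directly from timestamps collected while a model streams its response. A minimal sketch, assuming a generic token iterator (the dummy stream below is a stand-in for a real streaming inference API, which Phind has not published):

```python
import time

def measure_stream(token_iter):
    """Return (time_to_first_token, tokens_per_second) for a
    streaming iterator of generated tokens."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start  # groundwork time before the first word
        count += 1
    elapsed = time.perf_counter() - start
    tps = count / elapsed if elapsed > 0 else 0.0
    return ttft, tps

# Usage with a dummy stream that yields five tokens instantly.
ttft, tps = measure_stream(iter(["def", " foo", "():", " pass", "\n"]))
```

Time to first token is dominated by prompt processing (prefill), while tokens per second reflects per-token decoding throughput, which is why the two are tracked separately.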
Phind has been growing at a pace of 5–10 percent per week in daily active users and daily actions across its products. The company wants to continue using AWS and NVIDIA for the long term to support that growth. “Everything that AWS does comes from a perspective of customer obsession, how they can make things better for end users, and we think about our users the same way,” says Royzen. Using AWS and NVIDIA, Phind can set up its infrastructure to achieve the best possible user experience.
Outcome | Continuing to Optimize for Speed with NVIDIA on AWS
Phind is continuing to make its models faster and will keep using NVIDIA-powered Amazon EC2 P4d and Amazon EC2 P5 Instances to achieve that goal. Using NVIDIA’s new TensorRT library, Phind expects to make its models five times faster than before. While the company is still determining the best split of LLM training and inference across Amazon EC2 instances, it is very pleased with the overall performance.
“Without AWS and NVIDIA, we wouldn’t exist,” says Royzen. “NVIDIA-powered AWS infrastructure is fast, available, scalable, and dynamic, and AWS services are part of the lifeblood of our company.”
Phind is a search industry startup whose intelligent search engine uses large language models and generative AI to answer programmers’ complicated questions, presenting each answer alongside links to its sources.
AWS Services Used
Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 750 instance types and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.
Amazon EC2 P4d Instances
Amazon Elastic Compute Cloud (Amazon EC2) P4d instances deliver high performance for machine learning (ML) training and high performance computing (HPC) applications in the cloud.
Amazon EC2 P5 Instances
Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, deliver the highest performance in Amazon EC2 for deep learning (DL) and high performance computing (HPC) applications.
AWS ParallelCluster

AWS ParallelCluster is an open-source cluster management tool that makes it easy for you to deploy and manage high performance computing (HPC) clusters on AWS.
Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.