Gilead Accelerates Development of Enterprise Search Tool Using Machine Learning on AWS


Biotechnology company Gilead Sciences Inc. (Gilead) wanted to increase staff productivity and streamline internal data management processes within its pharmaceutical development and manufacturing (PDM) business unit so that it could quickly roll out more therapeutic treatments for people with life-threatening diseases. To work toward this goal, the company wanted to build a scalable enterprise search tool that uses artificial intelligence (AI) and machine learning (ML) to provide predictive analytics and find important documents, knowledge, and data in one centralized location. For the tool to consistently produce relevant results with each natural language query, the company needed a set of solutions that would organize both structured and unstructured data from up to nine enterprise systems and documents from knowledge repositories.

To accelerate its project timeline, Gilead’s PDM team chose Amazon Web Services (AWS), adopting Amazon Kendra, a highly accurate intelligent search service powered by ML. While receiving support from AWS, the PDM team built a data lake within 9 months, and afterward, it built a search tool within only 3 months, completing its project well within its estimated timeline of 3 years. Since the enterprise search tool launched, users across PDM have substantially reduced manual data management tasks and cut the time it takes to search for information by approximately 50 percent, fueling research, experimentation, and pharmaceutical breakthroughs.

“Amazon Kendra is a turnkey AI solution that, when configured correctly, is capable of spanning every single domain in the organization while being straightforward to implement.”

Jeremy Zhang
Director of Data Science and Knowledge Management, Gilead Sciences Inc.

Gaining Support from the Amazon Machine Learning Solutions Lab

Headquartered in Foster City, California, Gilead specializes in the research and development of antiviral technology and pharmaceuticals, including potential treatments for HIV and viral hepatitis. In April 2021, the data science team within Gilead’s manufacturing business unit conceptualized Morpheus, an enterprise search tool that would use AI and ML to quickly draw pertinent information and insights from roughly 250,000 documents and 1 TB of unstructured data. A project team of data scientists and engineers was formed within PDM and dedicated to bringing this idea to life so that its researchers and scientists could gain deeper insights from regulatory, compliance, supply chain, and manufacturing data and bring lifesaving drugs to patients faster.

The Morpheus team faced a significant challenge: bringing data from many enterprise systems together to implement a single AI and ML strategy for knowledge finding. “We recognized that we had the opportunity to innovate in the knowledge AI space at Gilead by designing and implementing infrastructure that would put together the data, knowledge, and information required to build AI search at scale,” says Jeremy Zhang, director of data science and knowledge management at Gilead.

To develop an enterprise search tool, the Morpheus task force engaged the Amazon Machine Learning Solutions Lab, which pairs an organization’s teams with ML experts to help identify and build ML solutions to address the organization’s highest return-on-investment ML opportunities. By collaborating with the Amazon ML Solutions Lab team, the task force deepened its understanding of cloud best practices and learned how to design and run proofs of concept. The team also learned about Amazon Kendra. “Amazon Kendra is a turnkey AI solution that, when configured correctly, is capable of spanning every single domain in the organization while being straightforward to implement,” says Zhang. Within 4 weeks, the team decided to move forward with developing the enterprise search tool entirely on AWS.

Building Its Morpheus Application to Catalyze Organizational Change

Gilead’s PDM team kicked off the Morpheus project by building a data lake using Amazon Simple Storage Service (Amazon S3), an object storage service offering industry-leading scalability, data availability, security, and performance. This data lake acts as a centralized repository for storing all of PDM’s unstructured data at virtually any scale. “In order to have an enterprise search tool on AWS, we had to have robust data management around it,” says Zhang. “So we built a data lake on AWS in 9 months—something that many consider should have taken many more years to implement.” The company uses the data lake not only as the basis for its AI and ML but also to run analytics and gain in-depth insights from data across development and manufacturing. Previously, Gilead’s teams had to submit tickets to its information technology team for analytics, and in some cases, it would take up to 1 year to fulfill the requests. Now the company can provide analytics and AI inferences within a few business days.

Next, the PDM team focused on enriching its searches by filling in missing or incomplete metadata for its documents using Amazon SageMaker, which helps users build, train, and deploy ML models for virtually any use case with fully managed infrastructure, tools, and workflows. Using this solution, Gilead has made it easier for its researchers to search for pertinent information with a few keywords. The company also uses Amazon Textract, an ML service that automatically extracts text, handwriting, and data from scanned documents. Gilead uses Amazon Textract to detect relevant information in its documents, and it has reduced associated costs by orders of magnitude per operation compared to its previous optical character recognition solution. “Amazon Textract is really nice, not only because of the real cost savings but also because its technical capability to extract information is extraordinary,” says Zhang.
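
For readers curious what text extraction with Amazon Textract can look like in practice, the following is a minimal Python sketch. It is not Gilead’s implementation, which the case study does not describe. The function names, the 80 percent confidence threshold, and the sample response are assumptions made for illustration; the only AWS call used is `detect_document_text` from the AWS SDK for Python (boto3), which handles single-page documents synchronously.

```python
from typing import Any, Dict, List

# Hand-written stand-in for a Textract response, trimmed to the fields used
# below. It is illustrative only and does not come from a real document.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Batch record 0042", "Confidence": 99.1},
        {"BlockType": "LINE", "Text": "illegible stamp", "Confidence": 41.0},
        {"BlockType": "WORD", "Text": "Batch", "Confidence": 99.5},
    ]
}


def extract_lines(response: Dict[str, Any], min_confidence: float = 80.0) -> List[str]:
    """Return the text of LINE blocks at or above a confidence threshold.

    The threshold is an assumption for this sketch, not a recommended value.
    """
    lines: List[str] = []
    for block in response.get("Blocks", []):
        # Textract also returns PAGE and WORD blocks; keep whole lines only.
        if block.get("BlockType") != "LINE":
            continue
        if block.get("Confidence", 0.0) >= min_confidence:
            lines.append(block.get("Text", ""))
    return lines


def detect_text_in_s3(bucket: str, key: str) -> Dict[str, Any]:
    """Run synchronous text detection on a document stored in Amazon S3.

    Requires AWS credentials and the boto3 package. The bucket and key are
    supplied by the caller; none are assumed here.
    """
    import boto3  # imported lazily so extract_lines works without the SDK

    client = boto3.client("textract")
    return client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )


if __name__ == "__main__":
    # Runs offline against the stand-in response.
    print(extract_lines(SAMPLE_RESPONSE))
```

Separating the parsing helper from the network call keeps the filtering logic testable without AWS access. Extracted lines like these could then feed the metadata enrichment step described above.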

The team also uses Amazon Kendra with its application to search for results from its data lake. In doing so, Gilead has been able to reduce the amount of time that it takes to search for relevant information across systems by roughly 50 percent, increasing staff productivity and streamlining its teams’ workflows. “Using Amazon Kendra is a big efficiency gain. With it, our team has reduced the number of places that people need to go to find the right information,” says Zhang.
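
As an illustration of how an application might send a natural language query to Amazon Kendra and summarize what comes back, here is a minimal Python sketch. It is not the Morpheus code, which the case study does not show. The function names, the result limit, and the sample response are hypothetical; the AWS call is the `query` operation of the boto3 Kendra client, and the index ID must be supplied by the caller.

```python
from typing import Any, Dict, List

# Hand-written stand-in for a Kendra query response, trimmed to the fields
# used below. Titles, excerpts, and URIs are invented for illustration.
SAMPLE_QUERY_RESPONSE: Dict[str, Any] = {
    "ResultItems": [
        {
            "Type": "DOCUMENT",
            "DocumentTitle": {"Text": "Stability study summary"},
            "DocumentExcerpt": {"Text": "Samples were stored at 25 C for 12 months."},
            "DocumentURI": "s3://example-bucket/stability-summary.pdf",
            "ScoreAttributes": {"ScoreConfidence": "HIGH"},
        },
        {
            "Type": "DOCUMENT",
            "DocumentTitle": {"Text": "Supplier audit checklist"},
            "DocumentExcerpt": {"Text": "Audit items for raw material suppliers."},
            "DocumentURI": "s3://example-bucket/audit-checklist.pdf",
        },
    ]
}


def top_results(response: Dict[str, Any], limit: int = 3) -> List[Dict[str, str]]:
    """Flatten the first `limit` result items into simple dictionaries."""
    results: List[Dict[str, str]] = []
    for item in response.get("ResultItems", [])[:limit]:
        results.append(
            {
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
                "uri": item.get("DocumentURI", ""),
                # Fall back to NOT_AVAILABLE when no confidence bucket is present.
                "confidence": item.get("ScoreAttributes", {}).get(
                    "ScoreConfidence", "NOT_AVAILABLE"
                ),
            }
        )
    return results


def search(index_id: str, question: str) -> Dict[str, Any]:
    """Send a natural language query to a Kendra index.

    Requires AWS credentials, the boto3 package, and an existing index.
    """
    import boto3  # imported lazily so top_results works without the SDK

    client = boto3.client("kendra")
    return client.query(IndexId=index_id, QueryText=question)


if __name__ == "__main__":
    # Runs offline against the stand-in response.
    for result in top_results(SAMPLE_QUERY_RESPONSE):
        print(result["confidence"], result["title"])
```

A thin wrapper like `top_results` gives a search front end one stable shape to render, regardless of which source system a document came from.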

In November 2021, the team launched its Morpheus application, completing the first phase of its project with a core team of 5 employees. Since then, the application has been a catalyst for organizational change. Within 3 months of its launch, over 100 employees had adopted the enterprise search tool. “Morpheus got us over the idea that we have to do library science or ontology to organize and find knowledge,” says Zhang. “And it’s become an easy way to demonstrate the value of AI and ML to senior leadership.”

Deriving More Value from AI and ML Technologies

The development and manufacturing team within Gilead is currently working toward improving its data lake to achieve GxP compliance, including compliance with good manufacturing practices, and it projects that it will have finished restructuring the data lake by June 2022. The company also plans to build out more AI and ML technology for providing predictive metadata, personalized AI, and knowledge graphs. “Morpheus gives us awareness of how using a tool of this size and scale benefits the whole organization,” says Zhang. “It’s really helping us to understand how Gilead can use data science to drive the next wave of value that we can derive from AI and ML on AWS.”

About Gilead Sciences

Headquartered in Foster City, California, biotechnology company Gilead specializes in the research and development of antiviral technology and pharmaceuticals, including potential treatments for HIV and viral hepatitis, as well as potential COVID-19 treatments.

Benefits of AWS

  • Built an enterprise search tool that uses AI and ML in less than 1 year
  • Created a data lake that acts as a repository for nine different enterprise systems
  • Reduced manual tasks related to data management
  • Cut search times by roughly 50%
  • Streamlined internal workflows, increasing staff productivity
  • Gained in-depth analytics and insights within a few days
  • Increased cost savings
  • Catalyzed organizational change

AWS Services Used

Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning. Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization.

Amazon SageMaker

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

Amazon Textract

Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents, going beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

Amazon Simple Storage Service (Amazon S3)

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
