AWS for Industries

Driving Innovation in Drug Discovery Using Generative AI with Bayer

At Bayer, the quest to accelerate drug discovery and deliver critical therapies to patients faster has taken an innovative leap forward with the power of generative artificial intelligence (AI). Recognizing the potential to dramatically reduce the cost and time of predicting chemical reaction conditions, Bayer engaged Amazon Web Services (AWS) to explore how generative AI could improve those predictions and provide deeper context around the intricacies of chemical processes, with the aim of accelerating drug discovery.

In a dynamic 6-week engagement, the AWS Prototyping and Customer Engineering (PACE) team rose to Bayer’s challenge, applying its creativity and ingenuity to find potential solutions based on generative AI. The AWS PACE team delivered solutions that will serve as the foundation for further advancements for Bayer. The team developed a chatbot that answers scientists’ queries in natural language, saving the time and effort of scouring databases. One of its prototype models is already predicting chemical reaction conditions accurately, a significant first step in the use of generative AI to enhance drug discovery.

These achievements not only lay the foundation for Bayer’s continued advancements in drug discovery but also showcase the transformative potential of generative AI to narrow infinite possibilities down to a manageable set of promising outcomes.

Using Amazon SageMaker to Train State-of-the-Art Models That Predict Reaction Conditions for Bayer

The study of chemical reactions (in other words, how molecules interact) is the cornerstone of discovering new therapeutics. It demands a meticulous understanding of the precise conditions required for molecular transformations. Scientists must navigate a complex landscape of solvents, reagents, catalysts, and environmental factors such as pressure and temperature to unlock the secrets of these interactions, a costly and labor-intensive process.

Generative AI can analyze vast datasets of chemical reactions to predict optimal conditions for novel compounds, potentially reducing the time and resources needed for experimental trials. By learning patterns from successful reactions, AI models can suggest promising reaction parameters, catalysts, and solvents, helping researchers to focus on the most promising pathways for synthesizing new drug candidates. With this concept in mind, Bayer asked the AWS PACE team to explore ways in which generative AI could revolutionize the process of predicting reaction conditions.

To bring this idea to life, the AWS PACE team started with an intense deep dive into chemistry. In a short time, the team read textbooks and academic papers, engaged with consultants inside and outside of AWS, and spoke daily with Bayer scientists. During its research, the team came across a scientific paper that illustrated how to predict a chemical product using transformer technology, a method that operates on tokens (the smallest units of data that a model processes) and turns an input sequence into an output sequence by generating one token at a time. The team identified that it could use transformer technology, which is at the heart of many current generative AI solutions, as a starting point for reaction condition prediction.
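To make the notion of tokens concrete, here is a minimal sketch of how a reaction might be split into tokens before a transformer processes it. It assumes reactions are written as SMILES strings, a common text notation for molecules; the article does not say which representation the team used. The regular expression follows a pattern widely used in open-source chemistry models, and the name `tokenize_smiles` is illustrative only.

```python
import re

# Assumption: reactions are represented as SMILES text. This pattern splits a
# SMILES string into chemically meaningful tokens: bracketed atoms, two-letter
# halogens (Br, Cl), aromatic atoms, bond symbols, and ring-closure digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES (or reaction SMILES) string into a list of tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Fail loudly if the pattern silently skipped any character.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Illustrative reaction: acetic acid + ethanol >> ethyl acetate.
    print(tokenize_smiles("CC(=O)O.OCC>>CC(=O)OCC"))
```

A sequence-to-sequence model would then map each token to an integer ID and learn to emit the output sequence one token at a time.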

Predicting Chemical Reaction Conditions Using Generative AI

The team built three prediction methods in the final 4 weeks of the engagement. First, it developed a custom transformer encoder-decoder model, an architecture with optimal input representation and bidirectional context. At Bayer’s request, the team used a publicly available centralized repository for organic reaction data as a training dataset. It trained the model using Amazon SageMaker, which lets organizations build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. The resulting prototype uses generative AI to predict chemical reaction conditions with high accuracy. Bayer can likely improve the model’s accuracy substantially by training it on its own proprietary dataset, which is of higher quality.

As a different approach, the AWS PACE team then experimented with a decoder-only model, a more versatile architecture than that of encoder-decoder models. The team trained the decoder-only model from scratch using the architecture of a state-of-the-art and publicly available large language model (LLM). Although the decoder-only model demonstrated just a 60 percent accuracy rate in predicting the right reaction conditions, it provided Bayer chemists and data scientists with a solid starting point for further experimentation with this more versatile model architecture.
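The practical difference between the two architectures shows up in how a training example is presented to the model. The sketch below is illustrative only: the record fields, the `<sep>` and `<eos>` markers, and the sample conditions string are hypothetical, and exact-match scoring is just one simple way to measure accuracy; the article does not state which metric the team used.

```python
from dataclasses import dataclass

# Hypothetical special tokens; real models define their own vocabulary.
SEP = "<sep>"
EOS = "<eos>"


@dataclass
class ReactionRecord:
    reaction: str    # reactants and product as text (assumed format)
    conditions: str  # solvent, catalyst, temperature, and so on, as text


def to_encoder_decoder_example(record: ReactionRecord) -> tuple[str, str]:
    """Encoder-decoder: the encoder reads the source, the decoder emits the target."""
    return record.reaction, record.conditions + EOS


def to_decoder_only_example(record: ReactionRecord) -> str:
    """Decoder-only: prompt and answer form one sequence, continued left to right."""
    return f"{record.reaction}{SEP}{record.conditions}{EOS}"


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match the reference conditions exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Made-up record for illustration; not taken from any real dataset.
    record = ReactionRecord("CC(=O)O.OCC>>CC(=O)OCC", "solvent=toluene;temp_c=110")
    print(to_encoder_decoder_example(record))
    print(to_decoder_only_example(record))
```

Under this scoring scheme, a model that reproduces the reference conditions for 3 out of 5 held-out reactions would score 60 percent.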

After building prototypes based on custom transformer models, the AWS PACE team experimented with fine-tuning a foundation model (a pretrained LLM) on the organic reaction dataset to see if it could improve the accuracy of predictions. The team chose three model sizes ranging from 7 billion to 70 billion parameters, accessing these models through Amazon SageMaker JumpStart, an ML hub with foundation models, built-in algorithms, and prebuilt ML solutions that deploy with just a few clicks. Although the fine-tuned LLM did not reach the performance levels of the custom models, the experiment provided valuable insights into the limitations of using general-purpose LLMs to predict chemical reaction conditions.

Additionally, the AWS PACE team created a chatbot that Bayer chemists can use to query the dataset using natural language. It built the chatbot on Amazon Kendra, an intelligent enterprise search solution. In the future, Bayer can make its own chemical reaction data available to the chatbot, building further value for scientists. “It was eye-opening to see what can be done if you have seasoned professionals working on your problem, even if it may not be their domain,” says Giulio Volpin, Scientist, Process Chemistry at Bayer.
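For readers curious what a natural language lookup against Amazon Kendra can look like, the following is a minimal sketch and not the team’s implementation. It calls the Kendra `query` operation through a client object that is passed in; the function name `search_reactions`, the index ID, and the shape of the returned dictionaries are assumptions made for illustration.

```python
def search_reactions(kendra_client, index_id: str, question: str, top_k: int = 3) -> list[dict]:
    """Send a natural language question to a Kendra index and return brief results."""
    response = kendra_client.query(
        IndexId=index_id,
        QueryText=question,
        PageSize=top_k,
    )
    results = []
    for item in response.get("ResultItems", []):
        results.append(
            {
                "type": item.get("Type"),
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            }
        )
    return results


# Example usage (requires AWS credentials and an existing Kendra index):
#   import boto3
#   client = boto3.client("kendra")
#   hits = search_reactions(client, "YOUR-INDEX-ID", "Which solvents suit esterification?")
```

A chatbot would typically pass the returned excerpts on to a response layer; that part is omitted here because the article does not describe it.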

Paving the Way for Future Innovation in Drug Discovery Using Generative AI

Now that the AWS PACE team has demonstrated the value of generative AI in drug discovery, Bayer can begin to alleviate the burden on lab scientists. In March 2024, AWS passed its code on to Bayer’s Applied Mathematics team, which will transfer the learnings to other projects. Ultimately, Bayer hopes that more accurate predictions of chemical reaction conditions will lead to a more efficient drug discovery process, helping patients to receive therapies faster.

Oiendrilla Das

Oiendrilla Das is Customer Advocacy Lead for Life Sciences and Genomics Marketing for AWS. She comes from a background in life sciences marketing, with a specialty focus on life sciences and cloud computing. Oiendrilla holds an MBA in marketing and completed an engineering degree in biotechnology before her MBA.

Giulio Volpin

Head Of Laboratory, Process Research at Bayer Crop Science

Juan Antonio Dominguez

Juan Antonio Dominguez is a Sr. Customer Solutions Manager at Amazon Web Services with 15 years of experience in Life Sciences. He relentlessly focuses on supporting customers in achieving their business outcomes, solving complex challenges, and developing new business models through technology. He also holds an Executive MBA from IESE Business School.

Marcilio Mendonca

Marcilio Mendonca is a Sr. Solutions Developer in the Prototyping and Customer Engineering (PACE) team at Amazon Web Services. He is passionate about helping customers rethink and reinvent their business through the art of prototyping, primarily in the realm of modern application development, serverless, and AI/ML. Prior to joining AWS, Marcilio was a Software Development Engineer with Amazon. He also holds a PhD in Computer Science.

Oliver Schaudt

Senior Project Manager Applied Mathematics

Otto Kruse

Otto Kruse is a Principal Solutions Developer within AWS Industries – Prototyping and Customer Engineering (PACE), a multi-disciplinary team dedicated to helping large companies utilize the potential of the AWS cloud by exploring and implementing innovative ideas. Otto focuses on application development and security.

Stefan Appel

Stefan Appel is a Senior Solutions Architect at AWS. For more than 10 years, he has helped enterprise customers adopt cloud technologies. Before joining AWS, Stefan held positions in software architecture, product management, and IT operations departments. He began his career in research on event-based systems. In his spare time, he enjoys hiking and has walked the length of New Zealand following Te Araroa.