AWS for Industries
Driving Innovation in Drug Discovery Using Generative AI with Bayer
At Bayer, the quest to accelerate drug discovery and deliver critical therapies to patients faster has taken an innovative leap forward with the power of generative artificial intelligence (AI). Recognizing the potential to dramatically reduce the cost and time of predicting chemical reaction conditions, Bayer engaged Amazon Web Services (AWS) to explore whether generative AI could predict those conditions more accurately and give scientists deeper context on the chemical processes involved, with the goal of accelerating drug discovery.
In a 6-week engagement, the AWS Prototyping and Customer Engineering (PACE) team rose to Bayer’s challenge and explored several potential solutions built on generative AI. The prototypes that the AWS PACE team delivered will serve as the foundation for further advancements at Bayer. The team developed a chatbot that answers scientists’ queries in natural language, saving the time and effort of scouring databases. In addition, one of its prototype models is already predicting chemical reaction conditions accurately, a significant first step in the use of generative AI to enhance drug discovery.
These achievements not only lay the foundation for Bayer’s continued advancements in drug discovery but also showcase the transformative potential of generative AI to narrow infinite possibilities down to a manageable set of promising outcomes.
Using Amazon SageMaker to Train State-of-the-Art Models That Predict Reaction Conditions for Bayer
The study of chemical reactions—in other words, how molecules interact—is the cornerstone of discovery of new therapeutics. It demands a meticulous understanding of the precise conditions required for molecular transformations. Scientists must navigate a complex landscape of solvents, reagents, catalysts, and environmental factors—such as pressure and temperature—to unlock the secrets of these interactions, a costly and labor-intensive process.
Generative AI can analyze vast datasets of chemical reactions to predict optimal conditions for novel compounds, potentially reducing the time and resources needed for experimental trials. By learning patterns from successful reactions, AI models can suggest promising reaction parameters, catalysts, and solvents, helping researchers to focus on the most promising pathways for synthesizing new drug candidates. With this concept in mind, Bayer asked the AWS PACE team to explore ways in which generative AI could revolutionize the process of predicting reaction conditions.
To bring this idea to life, the AWS PACE team started with an intense deep dive into chemistry. In a short time, the team read textbooks and academic papers, engaged with consultants inside and outside of AWS, and spoke daily with Bayer scientists. During its research, the team came across a scientific paper that illustrated how to predict a chemical product using transformer technology, a method that operates on tokens (the smallest units of data that a model processes) and transforms an input sequence into an output sequence by generating one token at a time. The team identified that it could use transformer technology, which is at the heart of many current generative AI solutions, as a starting point for reaction condition prediction.
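To make the token-by-token idea concrete, here is a minimal Python sketch. It assumes reactions are written as SMILES strings, which this article does not state, and it uses a regular expression of the kind commonly used in the literature to split SMILES into tokens. The `toy_next_token` function is a hypothetical stand-in for a trained model: it returns a canned answer and predicts nothing, so the output is illustrative only.

```python
import re
from typing import Callable, List

# Assumption: reactions are encoded as SMILES strings (not stated in the article).
# Bracket atoms, two-letter halogens, ring closures and bonds each become one token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens and check that nothing was dropped."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES string: {smiles!r}")
    return tokens


def greedy_decode(
    source: List[str],
    next_token: Callable[[List[str], List[str]], str],
    max_len: int = 32,
    eos: str = "<eos>",
) -> List[str]:
    """Build the output sequence one token at a time until the model emits `eos`."""
    generated: List[str] = []
    for _ in range(max_len):
        token = next_token(source, generated)
        if token == eos:
            break
        generated.append(token)
    return generated


def toy_next_token(source: List[str], generated: List[str]) -> str:
    """Hypothetical placeholder for a trained transformer: replays a canned sequence."""
    canned = tokenize_smiles("ClCCl") + ["<eos>"]
    return canned[len(generated)]


reaction_tokens = tokenize_smiles("CCO.CC(=O)O>>CC(=O)OCC")
condition_tokens = greedy_decode(reaction_tokens, toy_next_token)
print(reaction_tokens)
print("".join(condition_tokens))
```

In a real system, `next_token` would call the trained model, which scores every token in its vocabulary given the input reaction and the tokens generated so far.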
Predicting Chemical Reaction Conditions Using Generative AI
The team built three prediction methods in the final 4 weeks of the engagement. First, it developed a custom transformer encoder-decoder model, an architecture in which the encoder reads the entire input with bidirectional context before the decoder generates the output. At Bayer’s request, the team used a publicly available centralized repository for organic reaction data as a training dataset. It trained the model using Amazon SageMaker, which lets organizations build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. The resultant prototype uses generative AI to predict chemical reaction conditions with high accuracy. Bayer can likely improve the model’s accuracy substantially by training it on its own proprietary, higher-quality dataset.
As a different approach, the AWS PACE team then experimented with a decoder-only model, a more versatile architecture than that of encoder-decoder models. The team trained the decoder-only model from scratch using the architecture of a state-of-the-art and publicly available large language model (LLM). Although the decoder-only model demonstrated just a 60 percent accuracy rate in predicting the right reaction conditions, it provided Bayer chemists and data scientists with a solid starting point for further experimentation with this more versatile model architecture.
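The article does not say how the accuracy rate was measured. One plausible reading is exact-match accuracy: a prediction counts as correct only when the predicted set of conditions equals the reference set. The Python sketch below illustrates that assumed definition with made-up placeholder labels, not data from the engagement.

```python
from typing import Sequence


def exact_match_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of reactions whose predicted condition set equals the reference set."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one reaction is required")
    # Order within a condition set is ignored, so compare as frozensets.
    hits = sum(
        frozenset(pred) == frozenset(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Placeholder labels for illustration only (hypothetical, not real results).
references = [
    ["solvent_a", "catalyst_x"],
    ["solvent_b"],
    ["solvent_a", "reagent_y"],
    ["solvent_c", "catalyst_x"],
    ["solvent_b", "reagent_z"],
]
predictions = [
    ["catalyst_x", "solvent_a"],  # correct, order differs
    ["solvent_b"],                # correct
    ["solvent_a"],                # incorrect, reagent missing
    ["solvent_c", "catalyst_x"],  # correct
    ["solvent_a", "reagent_z"],   # incorrect, wrong solvent
]
accuracy = exact_match_accuracy(predictions, references)
print(f"{accuracy:.0%}")
```

Looser metrics, such as per-component or top-k accuracy, would give higher numbers for the same predictions, so the metric definition matters when comparing models.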
After building prototypes based on custom transformer models, the AWS PACE team experimented with fine-tuning a foundation model (a pre-trained LLM) on the organic reaction dataset to see if it could improve the accuracy of predictions. The team chose three model sizes ranging from 7 billion to 70 billion parameters, accessing these models through Amazon SageMaker JumpStart, an ML hub with foundation models, built-in algorithms, and prebuilt ML solutions that deploy with just a few clicks. Although the fine-tuned LLM did not reach the performance levels of the custom models, the experiment provided valuable insights into the limitations of using general-purpose LLMs to predict chemical reaction conditions.
Additionally, the AWS PACE team created a chatbot that Bayer chemists can use to query the dataset using natural language. It built the chatbot on Amazon Kendra, an intelligent enterprise search solution. In the future, Bayer can make its own chemical reaction data available to the chatbot, building further value for scientists. “It was eye-opening to see what can be done if you have seasoned professionals working on your problem, even if it may not be their domain,” says Giulio Volpin, Scientist, Process Chemistry at Bayer.
Paving the Way for Future Innovation in Drug Discovery Using Generative AI
Now that the AWS PACE team has demonstrated the value of generative AI in drug discovery, Bayer can begin to alleviate the burden on lab scientists. In March 2024, AWS handed its code over to Bayer’s Applied Mathematics team, which will transfer the learnings to other projects. Ultimately, Bayer hopes that more accurate predictions of chemical reaction conditions will lead to a more efficient drug discovery process and help patients receive therapies faster.