AWS for Industries

[In the News] Machine Learning is Transforming Drug Discovery at AstraZeneca

This article originally appeared on Wired.


Disease understanding and drug discovery are based on scientific rigor. And while it’s work that is potentially life changing, it can also be tedious and time-consuming.

Few know this better than Magnus Söderberg, a pathology researcher with more than 20 years of lab experience and the director and technical lead at UK- and Sweden-based biotechnology company AstraZeneca. When developing a new candidate drug, Söderberg and his team might spend weeks hunched over microscopes, manually cataloging samples. One by one, they peer at tiny slices of kidney tissue taken from lab animals and categorize the condition of each to determine whether the candidate drug was both effective and safe.

Experts who specialize in this, like Söderberg, are at a premium, mostly because this kind of work requires years of experience and specific expertise. On top of that, it’s exhausting and time-consuming. For example, in a preclinical study testing the efficacy of a new candidate drug, it may take up to three weeks to read all the tissue sections. That is time that could be better spent on drug discovery, and it also slows the release of promising new drugs to patients who need them.

AstraZeneca has been experimenting with machine learning across all stages of research and development, most recently in pathology to speed up the review of tissue samples. The machine learning models first learn from a large, representative data set. Labeling that data is another time-consuming step, especially in this case, where it can take many thousands of tissue sample images to train an accurate model. AstraZeneca uses Amazon SageMaker Ground Truth, a machine learning-powered, human-in-the-loop data labeling and annotation service, to automate some of the most tedious portions of this work, reducing the time spent cataloging samples by 50 percent.

The outcome? The acceleration of the drug research process and the introduction of medicines to the market more rapidly.

Unlocking New Pharma Insights

Since 2017, Söderberg has spearheaded a pilot project to verify the promise of machine learning in AstraZeneca’s research—starting with novel candidate drugs that have the potential to reduce kidney damage, a condition that affects many patients with diabetes. Researchers set out to test whether machine learning can help analyze and classify tissue samples as well as (if not better than) a human—and at a faster rate. To get the project running, AstraZeneca partnered with the AWS Machine Learning Solutions Lab to develop the model for object detection and recognition in images.

“The accuracy of our image detection and recognition model is highly dependent on the quality of the training dataset, so spending the time and resources to ensure highly accurate data labeling is essential,” says Christos Matsoukas, an industrial PhD student supported by the Royal Institute of Technology/Swedish WASP initiative and aligned to the project.

This is where Amazon SageMaker Ground Truth comes in—helping annotate, collect and classify training data quickly. The service uses machine learning in parallel with human labelers to learn how to label data for the specific task at hand, ultimately taking over the vast majority of the labeling work so that humans can focus on other more creative tasks. This is a novel approach to drug research since AstraZeneca is using the collective wisdom of experienced pathologists—taught to a machine—to dramatically increase the pace of research.
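The split between machine and human labelers described above can be illustrated with a minimal sketch of confidence-based routing, a common pattern in human-in-the-loop labeling. This is not the Ground Truth implementation or API. The names `Prediction`, `route_for_labeling`, and `toy_predict`, the 0.9 threshold, and the slide names and scores are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prediction:
    label: str
    confidence: float  # model's estimated probability for the label, 0.0 to 1.0


def route_for_labeling(
    items: List[str],
    predict: Callable[[str], Prediction],
    threshold: float = 0.9,  # hypothetical cutoff, not a Ground Truth default
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Accept confident model labels; queue everything else for a human."""
    auto_labeled: List[Tuple[str, str]] = []
    needs_human: List[str] = []
    for item in items:
        pred = predict(item)
        if pred.confidence >= threshold:
            auto_labeled.append((item, pred.label))
        else:
            needs_human.append(item)
    return auto_labeled, needs_human


def toy_predict(item: str) -> Prediction:
    """Hypothetical stand-in for a trained image model (made-up scores)."""
    scores = {
        "slide_a": ("glomerulus", 0.97),
        "slide_b": ("tubule", 0.55),
        "slide_c": ("glomerulus", 0.91),
    }
    label, confidence = scores[item]
    return Prediction(label, confidence)


auto, manual = route_for_labeling(["slide_a", "slide_b", "slide_c"], toy_predict)
print("auto-labeled:", auto)
print("sent to human labelers:", manual)
```

In a setup like this, the labels that humans supply for the uncertain items would typically be fed back into training, so the share of images the model can label on its own grows over time.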

A key goal of the project, beyond saving time, is to give AstraZeneca researchers access to better data and more accurate analysis derived from it. AstraZeneca isn’t new to image analysis tools, which some areas of the company already use, but the pilot program promises to categorize and conceptualize the data the company is collecting as a whole.

“Imaging tools are becoming the backbone of the entire drug discovery and development industry,” says Richard Goodwin, AstraZeneca’s Director for Molecular Imaging. “But they’re generating so much new data that we can no longer rely on a human to be the expert that can interpret all of that data.”

In fact, the amount of information locked up in an image is now so extensive and complex that even highly trained researchers like Söderberg and his team are unable to see the relationships and patterns through reasoning and intuition alone.

“The kidney is especially complex, and doing a manual, quantitative scoring of each kidney just isn’t enough for scientific purposes anymore,” says Söderberg.

The goal of AstraZeneca’s pilot project is to use its AWS-built model across 2,000 tissue samples to identify key kidney features reliably and reproducibly.

Eyes on the Prize

The results have been striking and immediate. Söderberg says that the data labeled by the models has allowed AstraZeneca to develop techniques that can accurately identify the key structures affected by diabetic injury with a 95% success rate and minimal false negatives (failure to spot a key feature)—a higher rate of success than he’d expected. By offloading this tedious work to a machine, the total time required to analyze a set of tissue samples is halved, in the process freeing up scientists’ time so they can do more research.
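The article does not say how the 95% figure was calculated. One plausible reading, assumed here, is the share of true key structures that the model finds (often called recall or sensitivity), with false negatives being the structures it misses. The sketch below works through that arithmetic with made-up counts. The function name `detection_rates` and the numbers 190 and 10 are hypothetical and are not AstraZeneca results.

```python
from typing import Tuple


def detection_rates(true_positives: int, false_negatives: int) -> Tuple[float, float]:
    """Return (detection rate, miss rate) over all structures truly present."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("need at least one structure that is truly present")
    return true_positives / total, false_negatives / total


# Hypothetical counts: 200 structures marked by a pathologist,
# of which the model finds 190 and misses 10.
recall, miss_rate = detection_rates(true_positives=190, false_negatives=10)
print(f"detected: {recall:.0%}, missed: {miss_rate:.0%}")
```

Under this reading, a 95% success rate corresponds to missing about one key structure in twenty.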

Of course, saving time is only a small part of the benefit provided by machine learning.

“We look at these samples not as images but as complex networks of information,” says Goodwin of how the datasets inform the entire process—from discovery to development to patient monitoring. “We have to understand where our candidate drugs are going in the body, and how they’re affecting the tissues and [the relationships within its cell structures]. The ultimate goal is that with machine learning we can understand drug safety and efficacy much earlier, so we can speed up the drug discovery pipeline and bring safer, better medicines to the market more quickly.”

Söderberg expects to bring this technology into regular use across AstraZeneca labs, where it is poised to become an essential tool for drug discovery across the company. His next dataset will grow from 2,000 samples to roughly 100,000 and will likely extend to disease models beyond diabetes, while the system is also made available to outside experts.

“We want to unlock all the hidden potential in the data and research we’re already doing,” says Goodwin. “It’s the start of a broad change in our business—using all of this imaging and molecular data along with AI to develop a holistic approach, one that lets our scientists interact and interpret the data in ways that have never been possible.”


Learn more about AWS in biopharma.


Kelli Jonakin, Ph.D.

Kelli Jonakin is the Worldwide Head of Marketing for Healthcare, Life Sciences, and Genomics Industry verticals at AWS. She comes with a background in pharmaceutical research, with a special focus on development and commercialization of biologics. Kelli received her Ph.D. in Pharmacology and Systems Biology from the University of Colorado, and received an NIH post-doctoral fellowship grant to study Biochemistry at the University of Wisconsin-Madison.