AWS for Industries

2023: a turning point for ML in life sciences

To our many customers and partners who work in Life Sciences, welcome to 2023, a year that I expect to be the turning point in how data science and machine learning (ML) accelerate development of new life-saving therapies. (I’ll share 2023 thoughts on Healthcare in an upcoming post.)

The first reason I’m so optimistic is that ML is moving from proof-of-concept or departmental-level use to an enterprise-wide capability. Prior to COVID, Moderna had built ML on AWS into their Drug Design Studio, which then enabled Moderna to complete the sequence for their mRNA COVID-19 vaccine in just 2 days. During COVID, many of you saw how ML could speed up a specific step in the R&D process for a given vaccine or therapy, such as identifying drug targets; screening compounds, genes, or antibodies; and designing clinical trials. We are now seeing large biopharma customers apply ML in multiple steps of the R&D process, and doing so simultaneously in multiple therapeutic areas. In the ML Keynote at our recent re:Invent conference, Anna Berg Åsberg, AstraZeneca Global VP, R&D IT, shared how AstraZeneca is using Amazon SageMaker to democratize and scale up use of AI/ML across all of R&D. The company can now run hundreds of concurrent data science and ML projects and 110 billion statistical tests in under 30 hours. This type of step-function acceleration requires (1) executive leadership, (2) collaboration across teams, and (3) adoption of standardized tools, including tools that empower people who aren’t data scientists to make predictions. We are seeing all three ingredients now in place at many large biopharma customers.

The second reason I’m so optimistic concerns the “data” part of data science: you can now find, store, access, and use the specific data you need much more quickly and cost-effectively. Sometimes the data you need already exists in-house, but a team member doesn’t know how to find it, or they lack permission to use it. In October, this blog showed how you can use a data mesh architecture to address these challenges. At re:Invent, Gilead shared how and why they implemented a data mesh in their organization to be more efficient. We also previewed the Amazon DataZone service, which enables controlled (governed) data access across organizational boundaries, a big step forward.

But teams often need data that’s not already in-house, such as in collaborations between biopharma companies and university labs, biotechs, or ML-based Life Sciences startups. A Deloitte study found that almost half of forecast revenues from the late-stage pipeline are now generated through collaborations and scientific partnerships. In 2022, Roche and Recursion Pharmaceuticals announced such a collaboration, as Sanofi did with Exscientia and Insilico. However, other collaborations can be inhibited because the data owner (often a university lab) is not comfortable sharing the data for fear that it might be used improperly. That’s why we were excited recently to preview AWS Clean Rooms, which enables collaboration without sharing or revealing the underlying data.

Another major need for data not already in-house is Real World Data (RWD), which the FDA uses to monitor postmarket safety and adverse events, payers use to support coverage decisions, and biopharma companies use to design clinical trials. Biopharma customers often tell us their team members spend 80% of their time finding, procuring, and cleaning such data, and only 20% of their time answering science or business questions. In 2022 we began to see many customers address these challenges by using AWS Data Exchange, as AWS customers Takeda and Moderna talked about at re:Invent.

Whether the data is in-house or not, finding the data is not enough. You also need to store it and analyze it to gain insights. For years, customers have shared that it can be too expensive and difficult to store and analyze certain types of research data like genomic sequences and medical images. In November we announced three new services that directly address these challenges: Amazon Omics, Amazon HealthLake Imaging, and Amazon HealthLake Analytics. In addition to greatly reducing costs for many users, these new services can automatically normalize raw health data from multiple disparate sources into an analytics and interoperability-ready format in a matter of minutes.

With the introduction of these new services and features, as well as all of our comprehensive data services, it’s hard not to be excited for 2023! In fact, we have an entire team at AWS, as well as an extensive network of partners, to help our Life Sciences customers build data foundations that will meet their needs now and well into the future. Best wishes for a happy and healthy new year!

To learn more about how AWS works with life science organizations visit

Dan Sheeran

Dan leads AWS' Healthcare and Life Sciences Industry Business Unit (HCLS IBU), which supports all AWS customers in Life Sciences, Medical Devices, Payors, Data Services and Healthcare ISVs and OEMs. The HCLS IBU helps customers leverage AWS cloud and machine learning services, and solutions from AWS Partners, to discover and develop new therapies, diagnostics and devices, and to deliver healthcare more efficiently with improved patient outcomes. Prior to joining AWS in 2019, Dan founded and led two digital health startups focused on telehealth and machine learning for chronic disease prevention and management. Dan lives in the Seattle area. He has an MBA from Northwestern University and a BS from Georgetown University.