AWS for Industries

Revolutionizing Generative Biology with AWS and EvolutionaryScale

Today, AWS is excited to announce a collaboration with EvolutionaryScale to bring their new frontier language models for biology to scientists and researchers advancing applications from drug discovery to carbon capture, and more.

With this announcement, we are making EvolutionaryScale’s ESM3, a frontier, state-of-the-art language model family, available on AWS. This collaboration brings the best of EvolutionaryScale’s frontier models, including the generative, multimodal ESM3 model family, to AWS’s industry-leading infrastructure, enterprise-grade security and privacy measures, purpose-built services for health and generative AI, and generative AI capabilities such as fine-tuning and guardrails. It puts these models where life science and biotech research is already being done today. That includes the hundreds of thousands of AI/ML customers and 9 of the top 10 pharmaceutical organizations already using AWS for generative AI and ML, helping to further democratize this work.

With foundation models like ESM3, researchers can generate complex multi-domain proteins from scratch, create protein design workflows, and incorporate functional understanding. ESM3’s powerful capabilities enable the creation of entirely new proteins that have never existed in nature, allowing scientists and researchers to take a novel “programmable biology” approach, potentially reducing the time and cost of bringing new therapeutics to market by years and billions of dollars.

Customers can get started with ESM3 today through Amazon SageMaker and orchestrate fully automated, end-to-end drug discovery workflows through AWS HealthOmics. Support for Amazon Bedrock, the easiest way to build and scale generative AI applications with foundation models, is coming later this year.

Strong momentum for generative AI in life sciences

AWS is at the forefront of accelerating generative AI innovation across diverse industries, enabling organizations to harness the power of large language models (LLMs) and foundation models (FMs). With tens of thousands of customers using Amazon Bedrock for easy access to the broadest set of high-performing generative AI models, and hundreds of thousands of customers using Amazon SageMaker, which offers hundreds of pre-trained models, AWS simplifies the process of building and scaling generative AI applications. We are seeing tremendous excitement and momentum around generative AI in the life sciences space, with customers transforming their businesses, from small-scale process automation to fundamentally changing the way they perform research and discovery. For example, AstraZeneca is accelerating the transformation of drug discovery and precision medicine using genomics, so that researchers can turn insights into science faster. Gilead is generating insights from key datasets, accelerating the analysis of large quantities of unstructured information from a variety of sources across the enterprise. Using services like Amazon Bedrock and Amazon SageMaker, Pfizer has deployed AI solutions to create medical and scientific content and patent applications, helping breakthroughs reach patients faster while potentially saving up to $1 billion annually.

That is why today I’m excited to share a new joint initiative aimed at transforming research and development for our life science customers, and to announce our go-to-market collaboration with EvolutionaryScale. EvolutionaryScale is the leading team in training and applying frontier language models for biology. It pioneered the Evolutionary Scale Modeling (ESM) family of models, one of the first applications of large language modeling to biological data. The team has achieved major milestones in applying generative AI to biology, such as developing the first biology-specific transformer language models, scaling laws, and structure prediction methods for biological sequences. Today, EvolutionaryScale announced the launch of ESM3, a first-of-its-kind generative, multimodal language model family that opens up an entirely new frontier in this critical field.

How generative AI stands to change biology

Viewing biological sequences as the “language of life” opens up exciting possibilities for applying generative AI techniques to protein engineering and design. Just as large language models can be trained on vast text datasets to become helpful assistants that exhibit language understanding, generative biology models can learn the “language” of proteins by training on extensive protein sequence data. By learning the patterns and relationships within these sequences, such models can generate novel, functional protein sequences tailored for applications like drug design, enzyme engineering, or synthetic biology. Proteins differ from text because they fold into three-dimensional structures, but the analogy still highlights the potential of generative AI to accelerate discoveries and innovations in fields built on the building blocks of life. This technology is at the heart of a transformation that ties together evolution, molecular biology, artificial intelligence, medicine, and health.
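To make the “proteins as language” analogy concrete, here is a minimal sketch of how a protein sequence can be turned into tokens, the same way text is tokenized before a language model trains on it. It assumes a simple one-token-per-residue vocabulary over the 20 standard amino acids plus a few special tokens. The vocabulary, the `encode`/`decode`/`mask_positions` helpers, and the special-token names are illustrative assumptions, not ESM3’s actual tokenizer. The masking helper shows the idea behind a masked-token training objective: hide some residues and ask the model to predict them from context.

```python
# Illustrative sketch only: a toy residue-level tokenizer, NOT ESM3's real vocabulary.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL_TOKENS = ["<pad>", "<cls>", "<eos>", "<unk>", "<mask>"]

# Special tokens get IDs 0-4; amino acids follow in alphabetical one-letter order.
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(STANDARD_AMINO_ACIDS))}
ID_TO_TOKEN = {i: tok for tok, i in VOCAB.items()}


def encode(sequence: str) -> list[int]:
    """Map a one-letter protein sequence to token IDs, framed by <cls> and <eos>."""
    ids = [VOCAB["<cls>"]]
    for residue in sequence.upper():
        # Non-standard letters (e.g. X, U) fall back to <unk> in this toy vocabulary.
        ids.append(VOCAB.get(residue, VOCAB["<unk>"]))
    ids.append(VOCAB["<eos>"])
    return ids


def decode(ids: list[int]) -> str:
    """Invert encode(), dropping all special tokens (including <unk>)."""
    return "".join(ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] not in SPECIAL_TOKENS)


def mask_positions(ids: list[int], positions: list[int]) -> list[int]:
    """Replace selected token positions with <mask>; a masked language model
    is trained to predict the hidden residues from the surrounding context."""
    masked = list(ids)
    for p in positions:
        masked[p] = VOCAB["<mask>"]
    return masked


if __name__ == "__main__":
    tokens = encode("MKTAYIAK")
    print(tokens)
    print(mask_positions(tokens, [2, 5]))
    print(decode(tokens))
```

A per-residue vocabulary keeps the sketch easy to follow. Production protein language models make their own tokenization and special-token choices, so treat this only as an illustration of the analogy.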

ESM3: A breakthrough in generative biology

EvolutionaryScale’s ESM3 is a groundbreaking frontier generative model for biology that reasons jointly over sequence, structure, and function, a capability unmatched by previous protein language models. Trained on multiple modalities and billions of protein sequences spanning 3.8 billion years of evolution, ESM3 can understand complex biological data from various sources and generate entirely new proteins that have never existed in nature. The ESM3 model family includes three proprietary models (98B-parameter, 7B-parameter, and 1.4B-parameter versions) and one open-source model (1.4B parameters). The open-source version is available to AWS customers today on Amazon SageMaker and AWS HealthOmics and is coming later this year to Amazon Bedrock.

Through the ESM3 models, customers can:

  1. Generate complex, multi-domain proteins from scratch, where ESM3 understands the language of sequences, structures, and functions.
  2. Create protein design workflows, allowing researchers to design individual domains based on different modalities and compose them into novel proteins.
  3. Incorporate antibody understanding: ESM3 excels at understanding antibody sequence and structure, enabling in silico operations like diversification, optimization, and directed evolution.
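The in silico directed evolution mentioned above follows a simple loop: diversify a sequence into a library of variants, score them, keep the best, and repeat. The sketch below is a toy simulation of that mutate-and-select loop under loudly stated assumptions. `mutate`, `toy_fitness`, and `directed_evolution` are hypothetical helpers, not part of ESM3 or any AWS API. `toy_fitness` is a placeholder that simply counts matches to a target sequence. In a real workflow, a model such as ESM3 would propose or score variants, and lab assays would provide the true fitness signal.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random, n_mutations: int = 1) -> str:
    """Hypothetical helper: apply n random point substitutions (positions may repeat)."""
    residues = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(residues))
        # Always substitute a *different* amino acid at the chosen position.
        residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[pos]])
    return "".join(residues)


def toy_fitness(seq: str, target: str) -> int:
    """Placeholder score (an assumption for illustration): positions matching a target."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(
    start: str,
    fitness: Callable[[str], float],
    rounds: int = 200,
    library_size: int = 20,
    seed: int = 0,
) -> tuple[str, float]:
    """Greedy mutate-and-select loop: the parent is replaced only by a strictly better variant."""
    rng = random.Random(seed)  # seeded for reproducibility
    best, best_score = start, fitness(start)
    for _ in range(rounds):
        # Diversification: build a library of single-point mutants of the current best.
        library = [mutate(best, rng) for _ in range(library_size)]
        # Selection: pick the highest-scoring variant in this round.
        candidate = max(library, key=fitness)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    target = "MKTAYIAKQR"
    best, score = directed_evolution("AAAAAAAAAA", lambda s: toy_fitness(s, target))
    print(best, score)
```

Greedy selection keeps the sketch short and guarantees the score never decreases. Real campaigns often keep several diverse candidates per round to avoid getting stuck on a local optimum.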

Fluorescent protein esmGFP, created with ESM3

Safely democratizing biological foundation models (bFMs)

Just as it does for general-purpose LLMs and FMs, AWS’s comprehensive portfolio of purpose-built health and generative AI services (including Amazon SageMaker, AWS HealthOmics, AWS HealthScribe, and Amazon Bedrock) will give researchers easy access to bFMs like ESM3. Starting with Amazon SageMaker and AWS HealthOmics, customers can begin using EvolutionaryScale’s newest open-source model version, with the proprietary ESM3 model family coming soon to these services as well as to Amazon Bedrock. Together, these fully managed AWS services provide the easiest way for researchers to build, customize, and seamlessly integrate powerful bFMs like ESM3 into their drug research workflows.

With this announcement, customers can access ESM3 and fine-tune it on their proprietary datasets, catalyzing innovation and enabling breakthrough discoveries in therapeutic development while keeping their proprietary data private. Customers will also be able to scale these innovations on AWS’s industry-leading generative AI infrastructure, including high-performance GPU instances and purpose-built ML accelerators such as AWS Trainium for training and AWS Inferentia for inference. With this computational power, customers can efficiently train, build, and run ESM3 on AWS. Integrating generative bFMs like ESM3 with AWS services simplifies deployment and supports robust security through encrypted data handling, private networking, and compliance with HIPAA and GDPR. To help guide responsible model behavior, guardrails are built in at multiple levels. EvolutionaryScale’s ESM3 was architected to mitigate safety risks such as the generation of hazardous proteins, and Guardrails on Amazon Bedrock enable customers to implement custom filters on inputs and outputs aligned with their ethical AI policies. This commitment to fairness, safety, and transparency helps ensure that the incredible potential of bFMs is realized responsibly.

The future of generative AI in biology

This milestone marks the beginning of a new era of generative AI-powered models tailored for biological applications. ESM3 is a significant leap forward. By combining AWS’s generative AI capabilities with EvolutionaryScale’s innovative AI models at the frontier of computing power and data scale, it has the potential to accelerate drug discovery timelines. This collaboration enables programmable biology, empowering life sciences companies to push boundaries, streamline processes, and ultimately deliver novel therapeutics to patients faster through responsible generative AI. That means streamlining target identification, driving innovation, reducing development time and costs, and increasing the chances of successful drug development.

Matt Wood

Matt is the Vice President of Artificial Intelligence products at AWS. In this role, he works closely with customers, partners, and internal AWS teams to drive effective use of AI across every industry. Matt has been at Amazon for 14 years and has worked across the cloud business (including helping to launch Lambda, Kinesis, SageMaker, DeepRacer, Athena, and EMR), with a special focus on data, analytics, machine learning, and artificial intelligence. His passion is working with AWS customers (such as the NFL, Cerner, Intuit, Pinterest, GE, FINRA, Celgene, and NASA) to bring their ideas to life. Before joining Amazon, Matt attended medical school in the UK, completed a PhD in machine learning, and was a postdoctoral fellow at Cornell University.