AWS Solutions Library

Guidance for Protein Folding on AWS

Overview

This Guidance helps researchers run a diverse catalog of protein folding and design algorithms on AWS Batch. Knowing the physical structure of proteins is an important part of the drug discovery process. Machine learning (ML) algorithms significantly reduce the cost and time needed to generate usable protein structures.

These ML systems have also inspired the development of artificial intelligence (AI)-driven algorithms for de novo protein design and protein-ligand interaction analysis. This Guidance lets researchers quickly add support for new protein analysis algorithms while optimizing cost and maintaining performance.

How it works

The architecture diagram for this Guidance shows the key components and how they interact, and walks through the architecture's structure and functionality step by step.

Download the architecture diagram

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

Customers deploy the architecture components using AWS CloudFormation. Changes to the solution are tested and deployed through GitLab pipelines. Customers submit jobs and process the results through a Python software development kit (SDK), including from Jupyter notebooks. Jobs write all results and metrics to Amazon Simple Storage Service (Amazon S3).
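As an illustration of submitting a job from Python, the sketch below builds the keyword arguments for the AWS Batch SubmitJob API as exposed by boto3. It is not the Guidance's own SDK, whose interface is not described here. The queue name, job definition name, bucket, script name, and the OUTPUT_S3_URI environment variable are all hypothetical placeholders.

```python
from typing import Any


def build_submit_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    command: list[str],
    output_s3_uri: str,
) -> dict[str, Any]:
    """Build keyword arguments for the AWS Batch SubmitJob API."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("output_s3_uri must start with s3://")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": command,
            # Hypothetical variable name: the container is assumed to read
            # it to decide where results and metrics are written.
            "environment": [{"name": "OUTPUT_S3_URI", "value": output_s3_uri}],
        },
    }


def submit(request: dict[str, Any]) -> str:
    """Submit the job with boto3 and return its job ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    return boto3.client("batch").submit_job(**request)["jobId"]


# All names below are placeholders, not resources created by the Guidance.
request = build_submit_job_request(
    job_name="fold-example",
    job_queue="example-gpu-queue",
    job_definition="example-folding-job",
    command=["python", "run_folding.py", "--fasta", "s3://example-bucket/input/target.fasta"],
    output_s3_uri="s3://example-bucket/output/fold-example/",
)
```

Keeping the request builder separate from the network call means the same dictionary can be inspected or logged in a Jupyter notebook before anything is submitted.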

Read the Operational Excellence whitepaper

Security

All analysis jobs run in private subnets and use minimal AWS Identity and Access Management (IAM) policies to manage access to AWS services. All data is encrypted at rest and in transit. Data moves to and from Amazon S3 through a virtual private cloud (VPC) endpoint.
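To make "minimal IAM policies" concrete, here is a sketch of a policy document that limits a job role to one prefix of one bucket. The bucket name and prefix are hypothetical, and the policies that the Guidance actually deploys may differ.

```python
import json


def s3_prefix_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document scoped to a single S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and write objects only under the job prefix.
                "Sid": "ReadWriteJobObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, so it is restricted
                # with a condition on the requested prefix.
                "Sid": "ListJobPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Placeholder bucket and prefix, chosen only for illustration.
policy = s3_prefix_policy("example-bucket", "folding-jobs/")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON string can be pasted into a CloudFormation template or passed to an IAM API call; neither step is shown here.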

Read the Security whitepaper

Reliability

Analysis algorithms are split into independent containers and Python classes so that each can be run and updated on its own. AWS Batch automatically provides job retry logic. Job inputs and outputs are stored in Amazon S3. The CloudFormation template also provisions a data repository attached to the FSx file system so that reference data can be restored rapidly.
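The retry logic mentioned above is configured through the retryStrategy field of an AWS Batch job definition or job submission. The sketch below builds one such strategy. The choice of three attempts and the decision to retry only host-level failures are assumptions for illustration, not the Guidance's defaults.

```python
def build_retry_strategy(attempts: int = 3) -> dict:
    """Build an AWS Batch retryStrategy that retries host-level failures only."""
    if not 1 <= attempts <= 10:
        raise ValueError("AWS Batch accepts between 1 and 10 attempts")
    return {
        "attempts": attempts,
        "evaluateOnExit": [
            # Retry when the status reason points at the EC2 host, for
            # example when a Spot instance is reclaimed mid-job.
            {"onStatusReason": "Host EC2*", "action": "RETRY"},
            # Stop retrying for any other failure, such as a bad input file.
            {"onReason": "*", "action": "EXIT"},
        ],
    }


retry_strategy = build_retry_strategy()
```

Exiting on application errors avoids spending several hours of compute re-running a job that will fail the same way each time.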

Read the Reliability whitepaper

Performance Efficiency

Protein folding algorithms require large sequence databases for data preparation and can take from several minutes to several hours to finish. AWS Batch supports Amazon FSx for Lustre mounts and extended run times. Both AWS Batch and Amazon FSx for Lustre support high performance computing (HPC) use cases, such as protein folding, that have high input/output (I/O) requirements.
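One way to expose a Lustre file system to a job is to mount it on the EC2 host and bind that path into the container through the job definition. The sketch below builds that fragment together with a long job timeout. It assumes the host already mounts the file system at /fsx; the paths, volume name, and 12-hour timeout are hypothetical.

```python
def lustre_job_definition_fragment(
    host_path: str = "/fsx",
    container_path: str = "/database",
    timeout_hours: float = 12.0,
) -> dict:
    """Build job definition fields that mount reference data and allow long runs."""
    timeout_seconds = int(timeout_hours * 3600)
    if timeout_seconds < 60:
        raise ValueError("AWS Batch requires a timeout of at least 60 seconds")
    volume_name = "reference-data"
    return {
        "containerProperties": {
            # Host path where the file system is assumed to be mounted.
            "volumes": [{"name": volume_name, "host": {"sourcePath": host_path}}],
            # Sequence databases are only read, so mount them read-only.
            "mountPoints": [
                {
                    "sourceVolume": volume_name,
                    "containerPath": container_path,
                    "readOnly": True,
                }
            ],
        },
        "timeout": {"attemptDurationSeconds": timeout_seconds},
    }


fragment = lustre_job_definition_fragment()
```

The fragment would be merged with the remaining job definition fields (image, resource requirements, roles) before registration.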

Read the Performance Efficiency whitepaper

Cost Optimization

AWS Batch automatically deprovisions compute resources when jobs finish. Customers can use Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, which offer up to a 90% discount compared to On-Demand Instances, and AWS Graviton-based instance types for some jobs. AWS Graviton instances are optimized for cloud workloads and can deliver up to 40% better price performance than comparable current-generation x86-based instances.
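The effect of the Spot discount on a single job is simple arithmetic, sketched below. The hourly rate and runtime are made-up numbers, and 90% is the stated upper bound of the discount rather than a typical value.

```python
def estimated_job_cost(
    on_demand_hourly_usd: float,
    runtime_hours: float,
    spot_discount: float = 0.0,
) -> float:
    """Estimate job cost in USD; spot_discount is a fraction between 0 and 0.9."""
    if not 0.0 <= spot_discount <= 0.9:
        raise ValueError("spot_discount must be between 0.0 and 0.9")
    return on_demand_hourly_usd * runtime_hours * (1.0 - spot_discount)


# Hypothetical job: 4 hours on an instance priced at 3.00 USD per hour.
on_demand_cost = estimated_job_cost(3.00, 4.0)
best_case_spot_cost = estimated_job_cost(3.00, 4.0, spot_discount=0.9)
print(f"On-Demand: {on_demand_cost:.2f} USD, best-case Spot: {best_case_spot_cost:.2f} USD")
```

With these inputs the On-Demand estimate is 12.00 USD and the best-case Spot estimate is about 1.20 USD. Actual Spot prices vary by instance type, Region, and time.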

Read the Cost Optimization whitepaper

Sustainability

AWS Batch automatically scales compute resources to handle the jobs in a managed queue. This Guidance includes benchmarking results and default parameters that help minimize the hardware resources used.
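Scaling behavior in AWS Batch is bounded by the computeResources settings of a managed compute environment. The sketch below builds such a block with a minimum of zero vCPUs, so no instances stay running when the queue is empty. The instance type, subnet, security group, and role values are placeholders, and the vCPU ceiling is an arbitrary example.

```python
def build_compute_resources(
    max_vcpus: int,
    instance_types: list[str],
    subnet_ids: list[str],
    security_group_ids: list[str],
    instance_role: str,
    use_spot: bool = True,
) -> dict:
    """Build the computeResources block for a managed AWS Batch compute environment."""
    if max_vcpus < 1:
        raise ValueError("max_vcpus must be at least 1")
    return {
        "type": "SPOT" if use_spot else "EC2",
        "allocationStrategy": (
            "SPOT_CAPACITY_OPTIMIZED" if use_spot else "BEST_FIT_PROGRESSIVE"
        ),
        # Zero lets the environment scale in completely between jobs.
        "minvCpus": 0,
        "maxvCpus": max_vcpus,
        "instanceTypes": instance_types,
        "subnets": subnet_ids,
        "securityGroupIds": security_group_ids,
        "instanceRole": instance_role,
    }


# Placeholder identifiers; replace with values from your own account.
compute_resources = build_compute_resources(
    max_vcpus=256,
    instance_types=["g4dn.xlarge"],
    subnet_ids=["subnet-EXAMPLE"],
    security_group_ids=["sg-EXAMPLE"],
    instance_role="example-batch-instance-profile",
)
```

The block would be passed as the computeResources argument of a CreateComputeEnvironment request, alongside the environment name and type.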

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open sample code on GitHub

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
