This Guidance helps researchers run a diverse catalog of protein folding and design algorithms on AWS Batch. Knowing the physical structure of proteins is an important part of the drug discovery process. Machine learning (ML) algorithms significantly reduce the cost and time needed to generate usable protein structures.
These systems have also inspired development of artificial intelligence (AI)-driven algorithms for de novo protein design and protein-ligand interaction analysis. This Guidance will allow researchers to quickly add support for new protein analysis algorithms while optimizing cost and maintaining performance.
AWS CodeBuild builds the containers necessary to run algorithms, such as AlphaFold and OpenFold.
All of the analysis algorithms are packaged as Docker containers and stored using Amazon Elastic Container Registry (Amazon ECR) in the deployment account. This helps ensure that all usage information remains private.
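To keep images private to the deployment account, each algorithm container is pushed to a private Amazon ECR repository. The sketch below builds the parameters for such a repository; the repository naming scheme and encryption settings are illustrative assumptions, not the Guidance's exact configuration.

```python
def build_ecr_repository_params(algorithm_name: str) -> dict:
    # Parameters for a private Amazon ECR repository that holds one
    # algorithm container (such as AlphaFold or OpenFold).
    # The "protein-analysis/" prefix is a hypothetical naming convention.
    return {
        "repositoryName": f"protein-analysis/{algorithm_name}",
        "imageScanningConfiguration": {"scanOnPush": True},   # scan each pushed image
        "encryptionConfiguration": {"encryptionType": "AES256"},  # encrypt at rest
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("ecr").create_repository(**build_ecr_repository_params("alphafold"))
```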
Researchers define and submit analysis jobs from an Amazon SageMaker notebook instance or another Python environment.
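A job submission from a notebook reduces to an AWS Batch `SubmitJob` call. The sketch below assembles such a request; the queue name, job definition name, script name, and S3 URI are placeholders, not names defined by this Guidance.

```python
def build_submit_job_request(job_name: str, queue: str,
                             job_definition: str, input_s3_uri: str) -> dict:
    # Sketch of an AWS Batch SubmitJob request for one analysis job.
    # The container command is a hypothetical entry point.
    return {
        "jobName": job_name,
        "jobQueue": queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": ["python", "run_analysis.py", "--input", input_s3_uri],
        },
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("batch").submit_job(
#     **build_submit_job_request("fold-1", "protein-queue",
#                                "alphafold-job-def", "s3://my-bucket/input.fasta"))
```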
Jobs run in general or accelerated compute environments based on the vCPU, memory, and GPU requirements.
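AWS Batch expresses a job's vCPU, memory, and GPU needs as a list of resource requirements on the job definition; a GPU entry is what routes a job to an accelerated compute environment. A minimal sketch:

```python
def build_resource_requirements(vcpus: int, memory_mib: int, gpus: int = 0) -> list:
    # AWS Batch resourceRequirements: each entry is a type/value pair,
    # with values passed as strings.
    reqs = [
        {"type": "VCPU", "value": str(vcpus)},
        {"type": "MEMORY", "value": str(memory_mib)},  # memory in MiB
    ]
    if gpus:
        # Requesting a GPU restricts placement to GPU-capable instances.
        reqs.append({"type": "GPU", "value": str(gpus)})
    return reqs
```

A CPU-only data-preparation job would omit the GPU entry, while a folding job might request one GPU alongside its vCPU and memory needs.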
Jobs write outputs and results to an encrypted Amazon Simple Storage Service (Amazon S3) bucket.
Users download job outputs to visualize the results or for downstream analysis.
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Customers deploy architecture components using AWS CloudFormation. Solution changes are tested and deployed using GitLab pipelines. Customers can submit jobs and process the results through a Python software development kit (SDK), including from Jupyter notebooks. Jobs write all results and metrics to Amazon S3.
Analysis algorithms are split into independent containers and Python classes for modular execution and updates. AWS Batch automatically provides job retry logic. Job inputs and outputs are stored in Amazon S3. Additionally, the CloudFormation template provisions an attached data repository for the Amazon FSx for Lustre file system to rapidly restore reference data.
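The retry logic mentioned above is configured on the AWS Batch job definition through its retry strategy. The sketch below is one plausible configuration, not the Guidance's exact settings: retry on infrastructure failures (such as Spot reclamation) but exit on application errors.

```python
def build_retry_strategy(attempts: int = 3) -> dict:
    # AWS Batch retryStrategy with evaluateOnExit rules (rules are
    # illustrative assumptions): retry when the host itself failed,
    # otherwise stop retrying.
    return {
        "attempts": attempts,
        "evaluateOnExit": [
            # Matches status reasons like "Host EC2 instance terminated".
            {"onStatusReason": "Host EC2*", "action": "RETRY"},
            # Any other failure (e.g. an algorithm error) exits immediately.
            {"onReason": "*", "action": "EXIT"},
        ],
    }
```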
Protein folding algorithms require large sequence databases for data preparation and can take minutes to hours to finish. AWS Batch supports FSx for Lustre mounts and extended run times. Both AWS Batch and Amazon FSx for Lustre support high performance computing (HPC) use cases, such as protein folding with high input/output (IO) requirements.
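An FSx for Lustre mount reaches a job container through the job definition's volume and mount-point settings. The sketch below assumes the file system is already mounted on the Batch host (for example, at `/fsx` via the compute environment's launch template); the paths are placeholders.

```python
def build_fsx_mount(host_path: str = "/fsx",
                    container_path: str = "/database") -> dict:
    # Container properties exposing a host-mounted FSx for Lustre file
    # system (holding the reference sequence databases) to the job
    # container as a read-only volume.
    return {
        "volumes": [
            {"name": "fsx", "host": {"sourcePath": host_path}},
        ],
        "mountPoints": [
            {"sourceVolume": "fsx",
             "containerPath": container_path,
             "readOnly": True},  # reference data is never modified by jobs
        ],
    }
```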
AWS Batch automatically de-provisions compute resources when jobs are finished. Customers can use Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances (which offer up to a 90% discount compared to On-Demand Instances) and AWS Graviton-enabled instance types for some jobs. AWS Graviton instances are optimized for cloud workloads and can deliver up to 40% better price performance over comparable current generation x86-based instances.
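A Spot-backed, Graviton-capable compute environment can be sketched as below; the environment name, instance families, vCPU limits, and role ARNs are placeholders, and the allocation strategy is one reasonable choice rather than the Guidance's mandated setting.

```python
def build_spot_compute_environment(name: str, subnets: list, sg_ids: list,
                                   instance_role: str, spot_fleet_role: str) -> dict:
    # Sketch of an AWS Batch managed compute environment using Spot
    # capacity and Graviton (arm64) instance families.
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,      # scale to zero when the queue is empty
            "maxvCpus": 256,    # illustrative ceiling
            "instanceTypes": ["c6g", "m6g", "r6g"],  # Graviton families (assumed)
            "subnets": subnets,
            "securityGroupIds": sg_ids,
            "instanceRole": instance_role,
            "spotIamFleetRole": spot_fleet_role,
        },
    }
```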
AWS Batch automatically scales compute resources to handle jobs in a managed queue. This architecture includes benchmarking results and default parameters chosen to minimize the hardware resources each algorithm requires.
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.