This Guidance shows how to transfer life sciences data files to the cloud and provide access to that data using Amazon Web Services (AWS).

Architecture Diagram

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • As new data sources and partners arise, you can use a variety of data transfer services to adapt to changing access patterns. For multi-site environments, S3 File Gateway can transfer data while you retain an on-site cache for other applications. AWS Transfer Family lets partnering entities such as contract research organizations (CROs) easily upload study results.

    Read the Operational Excellence whitepaper 
  • For data protection purposes, we recommend that you protect AWS account credentials and set up individual user accounts with AWS Identity and Access Management (IAM), so that each user is given only the permissions necessary to fulfill their job duties. We also suggest that you encrypt data at rest. The services in this Guidance encrypt data in transit by default.

    Read the Security whitepaper 
  • DataSync uses one or more VPC endpoints so that if an Availability Zone is unavailable, the agent can reach another endpoint. DataSync is a scalable service that uses sets of agents to move data. You can scale the tasks and agents to match the amount of data that needs to be migrated.

    DataSync logs all events to Amazon CloudWatch. If a job fails, you can use these logs to understand the issue and identify where the task is failing. Once the tasks are complete, you can initiate post-processing jobs to start the next phase of the pipeline.

    Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage.

    Read the Reliability whitepaper 
  • Amazon FSx for Lustre storage provides sub-millisecond latencies, up to hundreds of GB/s of throughput, and millions of IOPS.

    Read the Performance Efficiency whitepaper 
  • By using serverless technologies that scale on demand, you pay only for the resources you use. To further optimize cost, you can stop the notebook environments in Amazon SageMaker when they are not in use. If you don’t intend to use the Amazon QuickSight visualization dashboard, you can choose not to deploy it.

    Data transfer charges have two main components: DataSync, which is charged per GB transferred, and data transferred over AWS Direct Connect or a VPN. Additionally, cross-Availability Zone charges might apply if VPC endpoints are used.

    Read the Cost Optimization whitepaper 
  • CloudWatch metrics allow users to make data-driven decisions based on alerts and trends. By extensively using managed services and dynamic scaling, you minimize the environmental impact of the backend services. Most components are self-sustaining.

    Read the Sustainability whitepaper 
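
The reliability guidance above notes that failed DataSync jobs should be investigated and that post-processing can start once tasks complete. A minimal sketch of that decision follows. It assumes the input dict has the shape returned by the boto3 DataSync call describe_task_execution (a "Status" field and, on failure, a "Result" with "ErrorCode" and "ErrorDetail"). The function name and the action strings are hypothetical and are not part of this Guidance's sample code.

```python
# Sketch only: choose a follow-up action from a DataSync task execution
# description. The dict shape is assumed to match the response of boto3's
# datasync.describe_task_execution; the action names are hypothetical.

# Statuses that mean the execution has not finished yet.
IN_PROGRESS = {
    "QUEUED", "LAUNCHING", "PREPARING",
    "TRANSFERRING", "VERIFYING", "CANCELLING",
}


def next_action(execution: dict) -> str:
    """Map a task execution description to a hypothetical pipeline action."""
    status = execution.get("Status", "")
    if status == "SUCCESS":
        # Transfer finished: the next pipeline phase can begin.
        return "start_post_processing"
    if status == "ERROR":
        # Surface the error fields so an operator knows where to look.
        result = execution.get("Result", {})
        code = result.get("ErrorCode", "unknown")
        detail = result.get("ErrorDetail", "no detail")
        return f"investigate: {code} ({detail})"
    if status in IN_PROGRESS:
        return "wait"
    return "investigate: unrecognized status"
```

In practice, the dict would come from a DataSync client call or from an event delivered by CloudWatch, and the returned action would drive whatever orchestration the pipeline uses.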
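
The security guidance above recommends giving each user only the permissions needed for their job. As an illustration, the sketch below builds an IAM policy document that lets a partner upload to, and list, a single prefix of one Amazon S3 bucket. The bucket name, prefix, statement IDs, and function name are hypothetical. A real deployment would likely need further statements (for example, for encryption keys), so treat this as a starting point under those assumptions.

```python
import json


def partner_upload_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing uploads and listing under one prefix only."""
    prefix = prefix.strip("/")  # normalize so the ARN has exactly one separator
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Objects can be written only beneath the partner's prefix.
                "Sid": "UploadToPartnerPrefix",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, restricted by a prefix condition.
                "Sid": "ListPartnerPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names, for illustration only.
    policy = partner_upload_policy("example-lab-data", "cro-a/study-results")
    print(json.dumps(policy, indent=2))
```

The resulting JSON can be attached to the IAM role or user that the partner's transfer endpoint assumes.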

Sample Code

Start building with this sample code.

AWS for Industries

Building Digitally Connected Labs with AWS

Many Life Sciences organizations struggle with lab digitalization that can scale, automate data tasks, enable AI/ML, and create collaborative environments that support diverse R&D efforts. The AWS Digital Lab Strategy is a set of services, architectures, and partners to help take advantage of cloud scale and agility.

This post discusses the tools, best practices, and partners helping Life Sciences labs take full advantage of the scale and performance of AWS Cloud.

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.