This Guidance helps you connect life sciences instrument data and laboratory system files to the AWS Cloud, either over the internet or through a low-latency direct connection. On AWS, you can reduce storage costs for infrequently accessed data or make it available to high-performance computing for genomics, imaging, and other demanding workloads.

Architecture Diagram

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • As new data sources and partners emerge, a variety of data transfer services can adapt to these changing access patterns. For multi-site environments, Amazon S3 File Gateway can transfer data to Amazon S3 while retaining an on-premises cache for other applications. AWS Transfer Family lets partner organizations, such as contract research organizations (CROs), easily upload study results (see the Transfer Family sketch after this item).

    Read the Operational Excellence whitepaper 
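
A minimal sketch of the partner-upload path described in the item above, assuming boto3 and placeholder values: it creates a service-managed SFTP endpoint with AWS Transfer Family and maps a partner user to a single S3 prefix. The bucket name, IAM role ARN, user name, and public key are illustrative, not resources deployed by this Guidance.

```python
import boto3

transfer = boto3.client("transfer")

# Create a service-managed SFTP endpoint for partner uploads.
server = transfer.create_server(
    Protocols=["SFTP"],
    IdentityProviderType="SERVICE_MANAGED",
    EndpointType="PUBLIC",
)

# Map a partner (for example, a CRO) to a scoped-down S3 prefix.
transfer.create_user(
    ServerId=server["ServerId"],
    UserName="cro-partner",  # hypothetical partner user
    Role="arn:aws:iam::111122223333:role/TransferS3AccessRole",  # hypothetical role
    HomeDirectoryType="LOGICAL",
    HomeDirectoryMappings=[
        {"Entry": "/", "Target": "/example-study-bucket/cro-partner"}
    ],
    SshPublicKeyBody="ssh-rsa AAAA...",  # partner's public key
)
```
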
  • To protect your data, we recommend that you safeguard AWS account credentials and set up individual users with AWS Identity and Access Management (IAM) so that each user is granted only the permissions necessary to fulfill their job duties. We also recommend encrypting data at rest; the services in this Guidance encrypt data in transit by default. The sketch after this item shows one way to apply both recommendations.

    Read the Security whitepaper 
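
The sketch below applies both recommendations from the item above, assuming boto3 and placeholder names: it turns on default at-rest encryption for an S3 bucket and defines a least-privilege policy scoped to a single prefix. The bucket name, KMS key ARN, and prefix are illustrative.

```python
import json

import boto3

s3 = boto3.client("s3")

# Enforce at-rest encryption by default with a customer managed KMS key.
s3.put_bucket_encryption(
    Bucket="example-study-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/example",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)

# Least-privilege policy: read and write only the prefix this user needs.
# Attach it with IAM to the relevant user or role.
least_privilege_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-study-bucket/instrument-a/*",
        }
    ],
})
```
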
  • AWS DataSync can use one or more VPC endpoints so that if an Availability Zone becomes unavailable, the agent can reach another endpoint. DataSync is a scalable service that uses fleets of agents to move data; tasks and agents can be scaled to match the volume of data to be migrated.

    DataSync logs all events to Amazon CloudWatch. If a task fails, the logs help you identify the issue and where the transfer failed. Once tasks complete, post-processing jobs can be initiated to start the next phase of the pipeline (see the DataSync sketch after this item).

    Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage.

    Read the Reliability whitepaper 
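
As a sketch of the flow described above, assuming boto3 and placeholder ARNs, the calls below create a DataSync task with CloudWatch logging, start a run, and check its status. The location and log group ARNs must come from your own deployment, and triggering post-processing on completion is left to your pipeline.

```python
import boto3

datasync = boto3.client("datasync")

# Create a task between existing locations, with CloudWatch logging enabled.
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-source",
    DestinationLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-dest",
    CloudWatchLogGroupArn="arn:aws:logs:us-east-1:111122223333:log-group:/datasync/transfers",
    Name="instrument-to-s3",
)

# Start a run and check its status; ERROR is the cue to inspect the logs.
execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
status = datasync.describe_task_execution(
    TaskExecutionArn=execution["TaskExecutionArn"]
)["Status"]
print(status)  # e.g., LAUNCHING, TRANSFERRING, SUCCESS, or ERROR
```
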
  • Amazon FSx for Lustre storage provides sub-millisecond latencies, up to hundreds of GB/s of throughput, and millions of IOPS (see the provisioning sketch after this item).

    Read the Performance Efficiency whitepaper 
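
A minimal provisioning sketch for the item above, assuming boto3 and placeholder values: it creates a scratch FSx for Lustre file system linked to an S3 bucket so compute jobs can load study data on demand. The capacity, subnet, and bucket are illustrative; size them for your workload.

```python
import boto3

fsx = boto3.client("fsx")

# Scratch Lustre file system linked to an S3 bucket for a transient HPC run.
response = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,            # GiB; Lustre capacities start at 1.2 TiB
    SubnetIds=["subnet-0abc1234"],   # placeholder subnet
    LustreConfiguration={
        "DeploymentType": "SCRATCH_2",
        "ImportPath": "s3://example-study-bucket/genomics",
        "ExportPath": "s3://example-study-bucket/genomics/results",
    },
)
print(response["FileSystem"]["FileSystemId"])
```
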
  • By using serverless technologies that scale on demand, you pay only for the resources you use. To further optimize cost, you can stop the notebook environments in SageMaker when they are not in use (a sketch for stopping idle notebooks follows this item). If you don’t intend to use the Amazon QuickSight visualization dashboard, you can choose not to deploy it to save costs.

    Data transfer charges consist of two main components: DataSync, which is charged per GB transferred, and data transferred over Direct Connect or VPN. Additionally, cross-Availability Zone charges might apply if VPC endpoints are used.

    Read the Cost Optimization whitepaper 
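
As a sketch of the notebook cost control mentioned above, the loop below stops any in-service SageMaker notebook instances. It assumes classic notebook instances rather than SageMaker Studio and is meant to run on a schedule you define.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Stop notebook instances that are still running to avoid idle charges.
running = sagemaker.list_notebook_instances(StatusEquals="InService")
for notebook in running["NotebookInstances"]:
    name = notebook["NotebookInstanceName"]
    sagemaker.stop_notebook_instance(NotebookInstanceName=name)
    print(f"Stopping {name}")
```
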
  • CloudWatch metrics allow users to make data-driven decisions based on alerts and trends (a sample alarm follows this item). By making extensive use of managed services and dynamic scaling, you minimize the environmental impact of the backend services; most components are self-managing.

    Read the Sustainability whitepaper 
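
A sample alarm for the item above: it notifies when a DataSync task transfers no data over an hour. The AWS/DataSync namespace, BytesTransferred metric, task ID, and thresholds are assumptions to verify against the metrics your deployment actually emits.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when a DataSync task moves less than 1 byte in an hour.
cloudwatch.put_metric_alarm(
    AlarmName="datasync-no-bytes-transferred",
    Namespace="AWS/DataSync",       # assumed metric namespace
    MetricName="BytesTransferred",  # assumed metric name
    Dimensions=[{"Name": "TaskId", "Value": "task-0123456789abcdef0"}],
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",
)
```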

Implementation Resources

A detailed guide is provided for you to experiment with and use this Guidance within your AWS account. It walks through each stage of the Guidance, including deployment, usage, and cleanup, to prepare it for use in your environment.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
