AWS Partner Network (APN) Blog
Getting to Know APN Genomics Partners BioTeam, DNAnexus, Illumina, and Seven Bridges
Aaron Friedman is a Healthcare and Life Sciences Partner Solutions Architect at AWS
This past week, my colleague, Angel Pizarro, and I published a four-part blog series on the AWS Compute Blog that describes how you can build batch genomics workflows on AWS. This approach can be generalized to any type of batch workflow, such as post-trade analytics or fraud surveillance in financial services, or rendering and transcoding in media and entertainment. You can read these posts here:
- Building High-Throughput Genomics Batch Workflows on AWS: Introduction (Part 1 of 4)
- Building High-Throughput Genomics Batch Workflows on AWS: Job Layer (Part 2 of 4)
- Building High-Throughput Genomic Batch Workflows on AWS: Batch Layer (Part 3 of 4)
- Building High-Throughput Genomics Batch Workflows on AWS: Workflow Layer (Part 4 of 4)
In healthcare and life sciences, high-throughput workflows such as genome processing end up being only one part of the overall solution. When you build analytics services that affect people’s lives, regulatory frameworks such as HIPAA, CLIA, or GxP often need to be factored into the solution. Also, while many industries have workloads that are amenable to traditional batch architectures, each workload must be modified to answer domain-specific questions.
Today, I’d like to tell you about several of our AWS Partner Network (APN) Partners who have particular expertise in architecting highly performant genomics solutions. These APN Partners can help you accelerate your time-to-insight with either their existing solutions or by working with you to build new ones.
One of the things I admire about BioTeam is their focus on teaching and training others to understand how DevOps principles can accelerate science. With their focus on making scientific computing fast, easy, and effective, BioTeam helps customers adopt best practices for large-scale data transfer, automate security and compliance, and scale their network, storage, and HPC capacity on AWS.
BioTeam is an AWS Life Sciences Consulting Competency Partner focused on delivering technology solutions to health and life science researchers. The company consists of scientists and technologists with real-world experience working in a wide variety of scientific computing environments such as government, pharma, biotech, and academia. The team’s combined scientific and technical background allows them to focus on “science on day one,” quickly dive deep with their customers, understand both the technical and scientific problems to be solved, and then architect solutions to help drive customer success. BioTeam specializes in infrastructure for scientific computing, in particular high-performance computing (HPC), networking, and storage architecture for the life sciences.
BioTeam has been building bioinformatics, drug discovery, and other life sciences solutions on AWS for their customers for almost a decade. “When I joined BioTeam in 2008, we were orchestrating Grid Engine clusters on some of the first Amazon EC2 instances and training researchers how to use AWS for basic science and engineering,” says Adam Kraut, Director of Infrastructure and Cloud Architecture. “Today, we are building out much richer environments that have complex networking, identity, and security needs across many AWS Regions and services.”
BioTeam has a big impact at the intersection of genomics, HPC, and big data. For example, at Biogen, BioTeam helped design and build ResearchCloud, which converges high-speed network connectivity, bioinformatics pipelines, HPC, and security automation. ResearchCloud enables Biogen to store and process thousands of genomics samples safely and to focus on large-scale collaborative science.
DNAnexus, an AWS Life Sciences Technology Competency Partner, is powering a global network for biomedical and genomic research and clinical applications through its API-based platform for secure, cloud-based deployment and sharing of diverse data types and tools. The DNAnexus Platform enables researchers to accelerate medical advances, discover new medicines, improve patient care, and advance R&D in areas such as cancer, heart disease, Alzheimer’s disease, and prenatal testing.
“The management and analysis of biomedical and genomic data at the scale needed to power large-scale studies require computational and storage infrastructure that exceeds the capacity of most institutions,” says Richard Daly, CEO of DNAnexus. “With AWS, DNAnexus helps enable enterprises worldwide to perform genomic analysis and clinical studies in a secure and compliant environment at a scale not previously possible.”
DNAnexus complies with all security and privacy requirements under HIPAA, CLIA, ISO 27001, SOC 1/2/3, FedRAMP, and FISMA, helping customers to pursue clinically compliant projects at any scale. At Rady Children’s Institute for Genomic Medicine, for example, DNAnexus built a platform for whole genome analysis that brings precision medicine to critically ill newborns in a neonatal intensive care setting.
DNAnexus is also enabling collaborative science between biotechnology companies and leading medical institutions. The DiscovEHR collaboration between Regeneron Genetics Center and Geisinger Health System allows researchers from both institutions to marry exome data with longitudinal electronic health records from 250,000 patients to seek relationships between genetic variation and patient health. Leveraging the scalable, secure, and global infrastructure of AWS, it’s possible to upload data to the DNAnexus platform and bring researchers to the data, side-by-side with the bioinformatics tools and the Amazon EC2 compute resources they need to perform their analyses, all while respecting their data sovereignty requirements.
Illumina has a mission to unlock the power of the genome and use that genomic information to transform healthcare. By providing nearly 90 percent of the world’s sequencers, Illumina has a front row seat to many of its customers’ scientific breakthroughs. After working closely with their customers, the team at Illumina learned that many were running similar genomics analyses and decided to build a solution to simplify the orchestration of these often complex tasks and remove common informatics challenges associated with sequence analysis.
This led Illumina, an Advanced APN Technology Partner, to develop the Illumina BaseSpace Suite, a comprehensive platform of informatics tools with deep integration into their upstream sequencing technology. The suite includes everything from wet lab management and tracking to data management and storage, and analysis tools for scientists to interpret sequencing data. It integrates with instruments and eliminates the need for customers to copy or move data, which helps eliminate a common way errors are introduced into a laboratory system.
Genomics is well suited to the cloud because you can scale up or down depending on your current requirements or the number of sequences you need to analyze. This elasticity has helped Illumina customers minimize capital expenditures by not having to provision for peak load in their on-premises environments, and Illumina has noticed that small and large customers alike are now using the Illumina BaseSpace Suite to manage their analysis infrastructure at scale.
Recently, Illumina has improved BaseSpace’s availability to meet the needs of its global customer base by expanding to the AWS Frankfurt Region to support their European customers. “A European instance of BaseSpace Sequence Hub is essential for supporting customers in the region,” says Sanjay Chikarmane, Senior Vice President and General Manager, Enterprise Informatics Business Unit. “Our instruments are used globally, and we are matching this with our informatics deployments.” Along with global expansion comes alignment with global security standards. “Information security is paramount in the genomics space given the sensitive nature of the data. We believe that the most secure data is that which is maintained in ISO 27001-certified data centers, with experienced and dedicated IT and security personnel, and with certified IT practices,” says Chikarmane. “The additional ISO 27001 certification will provide our customers with added confidence in the security of the data they send to BaseSpace Sequence Hub, and on to BaseSpace Variant Interpreter (Beta).”
Open data initiatives fuel entrepreneurship, accelerate scientific discovery, and create efficiencies across life sciences segments. Seven Bridges, an AWS Life Sciences Technology Competency Partner, has taken this to heart. They build software systems that connect biomedical data to accelerate discovery in cancer, drug development, and precision medicine. By building on top of AWS to simplify the connections between massive data sets, analytic methods, and scientific expertise, Seven Bridges is enabling genomics research to improve human health.
As with our other Life Sciences Competency Partners, security is a high priority for Seven Bridges. They offer security and compliance white papers to demonstrate their approach to keeping genomics data secure, and align to US and EU standards. This means that their customers can analyze diverse populations while concurrently respecting local regulatory frameworks. Seven Bridges is currently the only commercial Trusted Partner of the National Institutes of Health (NIH), and can authenticate and authorize access to controlled-access data from The Cancer Genome Atlas (TCGA) stored on AWS.
Launched on AWS in 2012, the Seven Bridges Platform enables pharmaceutical customers to integrate and analyze molecular, genomic and clinical data across their R&D efforts and throughout clinical trials. “The Seven Bridges Platform is the interface that enables pharmaceutical researchers to realize the benefits of cloud computing in their R&D programs,” says Brandi Davis-Dusenbery, CEO. “The combination of the AWS Cloud and Seven Bridges’ expertise in connecting data, teams and advanced genomic analysis methods gives our clients the tools they need to accelerate their drug discovery and development efforts.”
In addition to working with pharma companies, Seven Bridges has embraced the value of public genomic data resources and provides tools to make important biomedical data resources accessible and usable for researchers. The Seven Bridges Cancer Genomics Cloud is a U.S. National Cancer Institute pilot designed to democratize access to massive genomics datasets such as TCGA, which contains petabytes of data previously inaccessible to many cancer researchers. The Cancer Genomics Cloud enables researchers to access and use this complex data through a suite of tools built on AWS, including a Semantic Web-based Data Browser that allows them to query and retrieve files in seconds.
Securing and sharing your genomics analysis at scale
In a series of blog posts this week, we demonstrated how you can build a high-throughput batch solution on AWS to process genomes at scale in a process known as secondary analysis. However, the solution we presented only encompasses a portion of what you need to build in order to securely analyze and share genomes at scale. Our AWS Life Sciences Competency Partners who specialize in genomics, in addition to providing turn-key solutions and expertise for high-throughput secondary analysis, can provide a full-stack genomics solution with security, population-scale analytics, and data distribution built in.
For more information about how AWS can enable your genomics solutions, be sure to check out our Genomics in the Cloud page!
Please leave your questions and comments below. I’d love to hear from you.