The Icahn School of Medicine at Mount Sinai in New York City, N.Y., is an internationally recognized leader in medical and scientific training, biomedical research, and patient care. The institution works to expand biomedical knowledge, provide expert clinical care, and serve the community. Working in close partnership with the Mount Sinai Hospital, the Icahn School of Medicine serves one of the most diverse and complex patient populations in the world.

Researchers and physicians at the Icahn School of Medicine are trying to unlock the genetic secrets of breast and ovarian cancers. Drs. John A. Martignetti and Peter R. Dottino at Mount Sinai and their collaborators at Station X are mining the more than 2,000 breast and ovarian tumor and germline DNA sequences generated by The Cancer Genome Atlas Consortium (TCGA). TCGA is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), two of the 27 Institutes and Centers of the National Institutes of Health, U.S. Department of Health and Human Services.

The analysis is a significant problem that requires considerable computing power: scientists analyze more than 100 TB of data, form new hypotheses, and reanalyze the data. Germline mutations in BRCA1 or BRCA2 account for about half of all women with an inherited genetic risk of developing either cancer. Researchers are trying to find the missing genetic links in those who do not carry a BRCA1/2 mutation.

By collaborating with Station X, Drs. Martignetti and Dottino were able to enlist a solution provider that could supply a robust and secure analytical platform for the work. Station X develops GenePool™, a genomics software platform for scientists and clinicians who work with human genomics data in both early-stage research and clinical settings.

Mining information from terabytes of genomic data—and making sure that information is secure—calls for a flexible, high-performance platform with big-data storage and stringent access control. It was clearly a job for cloud computing.

Amazon Web Services (AWS) is the foundation for Station X’s genomics platform, GenePool, which can dynamically scale to analyze tens of thousands of genomes in minutes. “AWS is a natural place to build software environments,” says Sandeep Sanga, Vice President of Products at Station X. “We built GenePool on AWS to give researchers a place to manage and analyze enormous amounts of data. And we chose AWS because the number of services offered is so competitive.” Using AWS allowed Station X to focus on designing the GenePool platform to help researchers quickly and securely understand their sequenced data.

For Mount Sinai researchers, keeping patient data secure is critical. “Maintaining the confidentiality of our patients is of primary importance to us—particularly with the sheer amount of data being generated,” Martignetti says. “It is not a trivial matter. But by using AWS and GenePool, we met the required standards for confidentiality.” By using AWS, Station X is able to provide preapproved researchers access to The Cancer Genome Atlas’ controlled-access data, which enables authorized users to “compute and make sense of somatic and germline mutations in patients with either breast cancer or ovarian cancer,” Sanga says.

Mount Sinai uses AWS Identity and Access Management (IAM) for user authentication and AWS access control lists (ACLs) for account access control, which together provide secure, centralized management of users and credentials. Amazon Simple Notification Service (Amazon SNS) and Amazon Simple Email Service (Amazon SES) provide outbound messaging services for both administrators and end users who require notifications and alerts.
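As an illustration of the kind of access control described above, the following minimal sketch builds an IAM policy document that grants read-only access to a single Amazon S3 prefix. It is not Mount Sinai's or Station X's actual policy: the function name `read_only_policy`, the bucket name `example-genomics-bucket`, and the prefix `tcga-brca` are all hypothetical, and the sketch only constructs the JSON document without calling AWS.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 prefix.

    The bucket and prefix are placeholders supplied by the caller; nothing
    here reflects the real GenePool configuration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the given prefix.
                "Sid": "ListProjectPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow downloading objects under the prefix; no write or delete.
                "Sid": "ReadProjectObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    policy = read_only_policy("example-genomics-bucket", "tcga-brca")
    print(json.dumps(policy, indent=2))
```

Scoping each researcher's policy to one prefix is one plausible way to limit controlled-access data to preapproved users; the real deployment may be organized differently.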

Elastic Load Balancing gives Station X a scalable web and API architecture that is both resilient and secure within its Amazon VPC environment. “By isolating our data stores and middle tiers from network exposure to the Internet, we keep all of our servers private, ensuring a radically reduced security footprint,” Sanga says.

Mount Sinai researchers use the AWS Cloud to manage and extract meaningful information from mountains of genomic data stored on Amazon Simple Storage Service (Amazon S3), with additional storage on Amazon Glacier.
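One common way to pair Amazon S3 with Amazon Glacier is a bucket lifecycle rule that moves older objects to the Glacier storage class. The sketch below builds such a rule as a plain dictionary. Whether Mount Sinai uses lifecycle rules at all is an assumption, and the function name `glacier_lifecycle`, the prefix `raw-sequences/`, and the 90-day threshold are hypothetical values chosen for illustration.

```python
def glacier_lifecycle(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that archives a prefix to Glacier.

    `prefix` selects the objects the rule applies to; `days` is the object
    age at which they transition. Both are illustrative placeholders.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical rule: archive raw sequence files after 90 days.
    print(glacier_lifecycle("raw-sequences/", 90))
```

A dictionary of this shape could be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`; the sketch stops short of that call so it runs without AWS credentials.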

Station X uses Amazon Elastic Block Store (Amazon EBS) storage for critical, high-value data to allow a flexible and high-performance storage system capable of serving up vast amounts of pre-computed data for real-time genomic analysis.

Amazon Elastic Compute Cloud (Amazon EC2) powers GenePool’s built-in statistical models, visual filtering capabilities, rich integration with genomic and clinical annotation databases, and support for integration via RESTful web services. “The elastic nature of Amazon EC2 allows us to perform significant data processing and analytics in a cost-effective and dynamically scalable way,” Sanga says. Mount Sinai uses dedicated Amazon S3 storage to ensure that its patient-derived genomics data is securely stored and staged for analysis in GenePool. Figure 1 illustrates Mount Sinai’s architecture.


Figure 1. Mount Sinai Research Architecture

To ensure that systems are operating effectively, GenePool uses Amazon CloudWatch for monitoring. Amazon ElastiCache provides a centralized caching mechanism, which allows the analytic results of large datasets to be returned quickly. “Scientists are able to answer critical questions within minutes or seconds, thanks to the genomics software platform we built on AWS,” Sanga says.
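The caching behavior described above can be pictured with a small cache-aside sketch: look up a result by key, and run the expensive analysis only on a miss or after the entry expires. This is not GenePool's implementation. An in-memory dictionary stands in for Amazon ElastiCache, and the class name `TTLCache`, the method `get_or_compute`, and the query key are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside helper with per-entry expiry (in-memory stand-in for a cache service)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, computing and storing it on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: entry has not expired yet
        value = compute()  # cache miss: run the expensive analysis once
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)

    def count_variants() -> int:
        # Placeholder for a slow query over a large dataset.
        return sum(range(1_000_000))

    first = cache.get_or_compute("variant-count", count_variants)   # computed
    second = cache.get_or_compute("variant-count", count_variants)  # served from cache
    print(first == second)
```

The time-to-live bounds how stale a cached result can become, which matters when the underlying data is reanalyzed as new hypotheses arise.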

By using AWS and GenePool, Drs. Martignetti and Dottino can now rapidly mine thousands of patient records from The Cancer Genome Atlas projects and identify genetic aberrations in a number of novel candidate genes fitting their scientific hypothesis. By cross-referencing these candidate genes against other genomics data, Drs. Martignetti and Dottino were able to enrich the candidate gene list for new potential markers for hereditary breast and ovarian cancers.

“Before the AWS Cloud, we didn’t have a way to analyze such a huge data set with our external collaborators,” Martignetti says. “It wouldn’t have been possible to sift through the data in a meaningful way, analyze it, refilter it—all of that is critical to our efforts to find the missing links.”

Building GenePool on AWS gave Station X the ability to store data sets for its translational and clinical genomics customers, Sanga says. “We get a significant competitive edge by using AWS: fast data access, ample storage, and massive computing power,” he adds. “When it comes to research projects like this, we’ll never be done. There will always be more data to analyze. So even when we help researchers come to scientific conclusions, there’s always more to learn. By using AWS, we’re well positioned for the challenge.”

Without the ability to run this analysis in a secure way on the AWS Cloud, the physicians at Mount Sinai would not be able to further their research. “By using AWS, we can store source files securely and cost-effectively with significant durability and accessibility. We wouldn’t be able to conduct our research without it,” Martignetti says. “But by using AWS and GenePool, we hope to discover mutations that prove to be the missing links for why some women are at increased risk for developing these cancers.”
