The High Performance Computing Facility of the New York University (NYU) Center for Health Informatics and Bioinformatics was established in 2009 to deliver leading-edge computing capabilities to researchers at the NYU Langone Medical Center. The facility helps medical informatics and bioinformatics researchers accelerate discovery and innovation by giving them access to computational power, data storage and supercomputing resources, and by letting them share data with collaborators throughout the world.
Dr Stratos Efstathiadis, Technical Director of the High Performance Computing Facility, describes the facility’s primary activities: “Our facility captures massive amounts of data from Next-Generation sequencers, microscopes, slide scanners, mass spectrometers and other research instruments. It must store, curate, support and enable the analysis of this data, and also provide resources scientists can use to run simulations and generate models.”
These analysis and simulation jobs often need to run for days, or even weeks. Thus, although the facility is one of the largest of its kind for a medical center, occasionally there is a need for additional computing resources so researchers can analyze data more quickly.
Even more problematic than the time it takes to analyze data is the time it takes to transfer it. Efstathiadis says, “Transferring data is a large bottleneck; our datasets are extremely large, and it often takes more time to move the data than to generate it. Since our collaborators are all over the world, if we can’t move it they can’t use it.”
The AWS cloud, combined with Globus Online, a free file transfer service hosted and powered by AWS, offers a reliable way to transfer large datasets to Amazon EC2. The solution can move files in parallel at speeds of up to 50 megabytes per second. Efstathiadis notes, “That’s similar to our onsite transfer rates – there’s really no slowdown at all!”
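To put the quoted transfer rate in perspective, the short Python sketch below estimates how long a dataset would take to move at a sustained 50 megabytes per second. It is only a back-of-the-envelope illustration: the 1 TB dataset size, the function names and the assumption of a constant rate with decimal megabytes are hypothetical and do not come from the facility.

```python
def transfer_time_seconds(size_bytes: int, rate_mb_per_s: float = 50.0) -> float:
    """Estimate transfer time at a sustained rate given in megabytes per second.

    Assumes decimal units (1 MB = 1,000,000 bytes) and a constant rate,
    which real transfers will only approximate.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return size_bytes / (rate_mb_per_s * 1_000_000)


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    one_terabyte = 1_000_000_000_000  # hypothetical dataset size, for illustration only
    print(format_duration(transfer_time_seconds(one_terabyte)))  # prints "5h 33m 20s"
```

Under these assumptions, a one-terabyte dataset would take roughly five and a half hours to transfer, which shows why transfer throughput matters as much as compute capacity for datasets of this scale.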
Dr Efstathiadis chose AWS because of Amazon EC2’s instance selection and because Center researchers were already familiar with the service. “Also,” adds Efstathiadis, “because Globus Online makes it easy to transfer data to Amazon EC2, it is an appealing way to get our large data sets into the cloud environment where researchers can access and use it.” According to Efstathiadis, other tools, such as SCP, “take too long or are blocked by institutional firewalls, limiting the type of tools we can use. Globus Online means improved throughput without compromising usability.”
The facility also uses Amazon S3 for data storage. Dr Efstathiadis explains, “Globus Online makes it easy to move data from local storage to Amazon S3 storage by deploying a server image, installing a Globus Online endpoint, and moving the data there.”
By using the cloud, the HPC facility has expanded the set of services it can offer to NYU researchers, who can now access the resources they need, when they need them. The cloud also helps researchers collaborate: by using Amazon S3, they can easily share their findings and datasets with researchers around the world.
Sharing data is an important component of the solution. Efstathiadis says, “Our researchers have many collaborators at other sites; by uploading data onto Amazon S3, researchers in other locations can access it for their own use. So, with this solution NYU researchers expedite their own analysis pipelines and also help other researchers do the same.”
NYU’s Center for Health Informatics and Bioinformatics HPC facility constantly looks for ways to make it easier and less expensive to conduct research. Using the cloud helps the facility get closer to its goal of scalable computing resources beyond the bounds of what any local facility alone could provide.