By spinning up a few hundred nodes on AWS and getting results in less than a day, our scientific researchers have a lot more freedom to ask questions that weren’t even possible before. The speed is important, but equally important is the additional intellectual curiosity this enables for researchers.
Lance Smith, Associate Director of IT, Celgene

Celgene is an American biopharmaceutical firm that manufactures drug therapies for cancer and inflammatory disorders. Headquartered in New Jersey, the company is committed to improving the lives of patients through the delivery of innovative treatments. Major Celgene products include Thalomid, which is used to treat inflammatory disorders, and Revlimid, which is used to treat patients with multiple myeloma.

To better support the needs of its pharmaceutical researchers, the Celgene Research and Early Development (R&ED) team wanted to improve its high-performance computing (HPC) capabilities. In particular, the company’s on-premises HPC system had become a bottleneck for computational researchers. “It would often take researchers weeks or even months to process large jobs on a cluster,” says Lance Smith, associate director of IT at Celgene. “That slowed down their time to results.”

Celgene also needed to enable secure collaboration between its own researchers and academic research labs. The company works with a number of high-profile academic institutions that need access to Celgene’s HPC resources for early-stage drug discovery. “Collaboration had become more and more of a priority for us, but it just wasn’t possible with the existing environment,” Smith says. “There would be enormous security issues and intellectual property concerns if we gave external parties network access to internal systems for data-sharing purposes. We have so much data that we can’t lose. It can take 10 or more years to create a drug, and if data is compromised, intellectual property may be lost and future medicines jeopardized.”

Scalability was another major concern. “Our on-premises clusters only had several hundred cores altogether, which was very limiting for both internal and external researchers,” Smith says. To overcome these challenges, Celgene decided to move its HPC clusters to the cloud.

The Celgene R&D division chose Amazon Web Services (AWS) as its cloud technology provider. “I used AWS at a previous company and I was very impressed with the performance, scalability, and security of the platform,” says Smith.

The company runs many HPC workloads on hundreds of Amazon Elastic Compute Cloud (Amazon EC2) instances and uses Amazon Simple Storage Service (Amazon S3) and Amazon Glacier to store hundreds of terabytes of genomic data. “Some of our genomic files are 200 gigabytes each in size, after compression, so we need the robust storage capabilities of Amazon S3 and Amazon Glacier,” says Smith.

To enable collaboration, Celgene created a cloud environment based on AWS. Instead of waiting for engineers to manually build and deploy individual compute nodes, researchers working on collaborations can now self-provision resources in the AWS environment using pre-approved company images. Celgene IT still manages the tools, security, and other standards, but researchers can create and access resources on demand. The shared collaboration space is built on Amazon Virtual Private Cloud (Amazon VPC), which Celgene uses to provision a logically isolated section of the AWS Cloud where it launches resources in a virtual network. The company also relies on AWS Identity and Access Management (IAM) to securely control access to the environment, and on AWS Direct Connect for dedicated network connections. “The research scientists are separated. We give them access to select AWS services, but they can’t change the security configurations we define and manage,” says Smith. “Another significant benefit is isolation from the enterprise environment, using VPCs and advanced features of Direct Connect, which permits us to allow greater flexibility to the researcher while still protecting the company.”
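The case study does not publish Celgene’s actual IAM configuration. As a rough illustration only, the Python sketch below builds a hypothetical IAM policy document of the kind that could support the pattern described above: researchers may launch EC2 instances only from pre-approved images, while changes to security groups, routes, and VPC settings are explicitly denied. The image ID, account ID, region, statement names, and the list of denied actions are all assumptions chosen for illustration.

```python
import json

# Hypothetical pre-approved image IDs (placeholders, not real Celgene images).
APPROVED_AMIS = ["ami-0123456789abcdef0"]

# Illustrative subset of network-security actions researchers should not perform.
PROTECTED_NETWORK_ACTIONS = [
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:AuthorizeSecurityGroupEgress",
    "ec2:RevokeSecurityGroupIngress",
    "ec2:RevokeSecurityGroupEgress",
    "ec2:CreateRoute",
    "ec2:DeleteRoute",
    "ec2:ModifyVpcAttribute",
    "ec2:CreateVpcPeeringConnection",
]


def build_researcher_policy(region, account_id, approved_amis):
    """Return a hypothetical IAM policy dict for self-service researchers."""
    # Image ARNs omit the account ID field.
    image_arns = [f"arn:aws:ec2:{region}::image/{ami}" for ami in approved_amis]
    # Other resources that RunInstances touches when launching an instance.
    launch_resources = [
        f"arn:aws:ec2:{region}:{account_id}:{kind}/*"
        for kind in (
            "instance",
            "subnet",
            "network-interface",
            "security-group",
            "volume",
            "key-pair",
        )
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Launches are allowed only when the image is on the approved list.
                "Sid": "LaunchFromApprovedImagesOnly",
                "Effect": "Allow",
                "Action": "ec2:RunInstances",
                "Resource": image_arns,
            },
            {
                "Sid": "LaunchSupportingResources",
                "Effect": "Allow",
                "Action": "ec2:RunInstances",
                "Resource": launch_resources,
            },
            {
                # An explicit Deny overrides any Allow in IAM policy evaluation.
                "Sid": "DenyNetworkSecurityChanges",
                "Effect": "Deny",
                "Action": PROTECTED_NETWORK_ACTIONS,
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_researcher_policy("us-east-1", "111122223333", APPROVED_AMIS)
    print(json.dumps(policy, indent=2))
```

Because IAM evaluates an explicit Deny ahead of any Allow, the last statement keeps researchers from loosening network controls even if another policy grants broader EC2 rights. A working policy would likely also need read-only permissions such as `ec2:Describe*` actions.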

Celgene scientists have dramatically reduced the time it takes to complete HPC jobs needed for cancer drug research. “For our informatics researchers, computational jobs on AWS can be reduced to hours, compared to weeks or months on our on-premises HPC cluster,” says Smith. As a result, researchers can run many more queries. “By spinning up a few hundred nodes on AWS and getting results in less than a day, our scientific researchers have a lot more freedom to ask questions that weren’t even possible before,” Smith says. “Speed is important, but equally important is the additional intellectual curiosity this enables for researchers. They can ask scientific questions they were afraid or unable to ask before because of hardware limitations or time constraints.”

The company is also using the AWS Cloud to simplify and improve collaboration. “With our physical environment, research collaboration was not possible,” says Smith. “On-premises isn’t as agile as our researchers want to be. Quotes, standards, finance, capital procurement, delivery, installation, and configuration take months in an enterprise setting. And because research collaborations can start up spontaneously, the result is completely unforecast demand for IT. Historically Celgene has been an industry leader in scientific collaboration; AWS allows us to expand our partnerships into the informatics and computational space. Using AWS, we enable teamwork by providing isolated access for researchers from different organizations. Each collaborator can upload petascale data and work with us together in a common, secure area.” This collaborative capability has helped Celgene expand its research partnerships with the public sector. “We have many collaborative projects with prominent research universities, and we see that growing exponentially,” says Smith.

Researchers at Celgene can also take advantage of the company’s highly scalable cloud HPC environment. “Using AWS, a single scientist can launch hundreds of compute nodes,” says Smith. “That’s a capability we just didn’t have before. We have 350 nodes available for one department, and that will soon go up to 700 and, eventually, several thousand.” In the company’s previous HPC environment, individual clusters contained up to 10 nodes each. “It just wasn’t possible for multiple researchers to do a whole lot simultaneously,” Smith says. “But now we can have a few-hundred-node computational job running while another researcher uses 1,000 nodes on a separate simulation, without interfering with each other. That just wasn’t possible before in a shared environment. The issues with different libraries, hardware sharing, software compatibility, OS selection, network configuration, and cost allocation—all those problems went away for us.”

As Celgene expands its cloud-based HPC infrastructure, it also plans to increase its use of AWS services. “We’re using AWS successfully in our R&D division; 2016 is the year the enterprise started seeing cloud-native solutions outside of R&D, and I fully expect the AWS Cloud will spread throughout our entire company in the next 18 months,” says Smith.
