How Fred Hutch unlocks siloed data with AWS and open-source software
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service.
Fred Hutchinson Cancer Research Center (Fred Hutch) is dedicated to the elimination of cancer and related diseases. From discovering new ways to prevent and detect cancer earlier, to developing effective treatments with fewer side effects, researchers at Fred Hutch work every day to deliver hope to patients all over the world.
Using Amazon Web Services (AWS) and open-source software, Fred Hutch built Motuz, a single, user-friendly, browser-based solution that streamlines and simplifies the upload of large quantities of data. Motuz helped Fred Hutch de-silo its data and make it shareable and accessible.
The journey to de-silo data
In the past, researchers at Fred Hutch faced many challenges stemming from inconsistent data migration processes and multiple storage platforms. Siloed data impedes research progress and limits the collaboration needed to speed the pace of scientific discovery.
At Fred Hutch, groups used different processes and approaches to share data between disparate systems, which left data isolated. Research teams had to run large batch uploads of 10 to 30 terabytes at a time, with individual files as large as 500 gigabytes. Researchers did all of this with command-line tools whose complicated interfaces were best suited to expert users. These unfriendly interfaces cost teams valuable time spent organizing and ingesting data and working out which platform the data should be extracted from. That time came at the expense of effective collaboration and of deriving meaningful insights from the organization’s data.
Dirk Petersen, scientific computing director at Fred Hutch, says the administrative burden of these unfriendly tools hurt efficiency and productivity over the years and detracted from Fred Hutch’s main mission of eliminating disease. “Data moving is cumbersome. We needed better tools to simplify the high-speed movement of data between our labs, teams, and external collaborators, so that we could spend less time worrying about infrastructure and focus more time and energy on discovering new therapies and cures,” said Petersen. The team decided to build a tool to streamline the process, with the aim of increasing researcher collaboration, de-siloing data, and shortening the time to science.
Enhancing operational efficiency with the AWS Cloud
Fred Hutch built a user-friendly UI to speed up uploads of large batch datasets within a hybrid cloud environment. Motuz is an open-source tool that distributes datasets in real time to the hundreds of researchers across Fred Hutch. Petersen built Motuz with mainstream open-source components such as Flask (Python), React, and PostgreSQL, and it has fewer than 10,000 lines of code (LOC). Petersen says, “We hope this will allow community developers to contribute code as they can understand the code base quickly. We kept the base small by not writing any code to connect to cloud storage. Instead, we use Rclone, a popular open-source cloud data moving tool, to actually copy the data.”
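The design Petersen describes, where the web application delegates the actual copying to Rclone rather than talking to cloud storage itself, can be illustrated with a minimal sketch. This is not Motuz’s code: the helper names `build_rclone_copy_command` and `run_copy`, and the remote name `s3remote:` used below, are hypothetical. It assumes the `rclone` binary is installed and a remote has already been configured with `rclone config`; `rclone copy`, `--transfers`, and `--checksum` are standard Rclone options.

```python
import shlex
import subprocess
from typing import List, Optional


def build_rclone_copy_command(source: str, destination: str,
                              transfers: int = 4,
                              extra_flags: Optional[List[str]] = None) -> List[str]:
    """Build an `rclone copy` argument list (hypothetical helper, not Motuz's code)."""
    if transfers < 1:
        raise ValueError("transfers must be at least 1")
    # `--transfers` sets how many files Rclone copies in parallel.
    cmd = ["rclone", "copy", source, destination, "--transfers", str(transfers)]
    if extra_flags:
        cmd.extend(extra_flags)
    return cmd


def run_copy(source: str, destination: str, transfers: int = 4) -> int:
    """Run the copy as a subprocess; assumes `rclone` is on PATH."""
    cmd = build_rclone_copy_command(source, destination, transfers)
    print("running:", shlex.join(cmd))
    # check=True raises CalledProcessError if Rclone exits with a non-zero status.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.returncode


if __name__ == "__main__":
    # Only builds and prints the command; nothing is copied.
    print(shlex.join(build_rclone_copy_command("/data/lab", "s3remote:bucket/lab", transfers=8)))
```

Keeping the storage logic in an external tool is what lets the application code stay small: a web backend like this only needs to assemble arguments, launch the job, and report its status, while Rclone handles the individual storage back ends.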
To build the solution, Fred Hutch uses AWS to store and analyze its data, with services including AWS Lambda, Amazon Relational Database Service (Amazon RDS), Amazon Kinesis, AWS Batch, Amazon DynamoDB, Amazon Elasticsearch Service (Amazon ES, now Amazon OpenSearch Service), Amazon Elastic File System (Amazon EFS), and Amazon Simple Storage Service (Amazon S3).
“We ultimately wanted to create a solution that our researchers would get the best value out of, while introducing an easier way for them to interact with cloud-based tools. At the time, we only had around 10 percent of our workforce interacting with the cloud,” said Petersen.
After implementation, Petersen says, “We have seen as much as a 500 megabytes per second throughput, which is around a 10x shift in copy speed, all while providing a simple UI that allows researchers to access, share, and interpret data in a fraction of the time.”
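To put the reported throughput in context for the 10 to 30 terabyte batches mentioned earlier, here is a small back-of-the-envelope calculation. It assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate with no protocol or per-file overhead, so real transfers will take longer. The 50 MB/s "before" figure is only an assumption inferred from the "around a 10x" comparison; the post does not state the earlier rate.

```python
def transfer_hours(terabytes: float, megabytes_per_second: float) -> float:
    """Hours to move `terabytes` at a sustained rate, using decimal units (1 TB = 10**6 MB)."""
    if megabytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    seconds = terabytes * 1_000_000 / megabytes_per_second
    return seconds / 3600


if __name__ == "__main__":
    for size_tb in (10, 30):
        fast = transfer_hours(size_tb, 500)  # reported peak throughput
        slow = transfer_hours(size_tb, 50)   # assumed baseline, ~10x slower
        print(f"{size_tb} TB: {fast:.1f} h at 500 MB/s vs {slow:.1f} h at 50 MB/s")
```

Under these assumptions, a 30 TB batch drops from roughly 167 hours (about a week) to roughly 17 hours, which is the practical difference between a weekly job and an overnight one.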
Working with AWS, Petersen says, “We now have access to high-performance tools and services. Motuz was the missing link in creating an all-browser, high-scaled, cloud-based experience for our researchers.” Motuz is installed on a high-throughput server and is now the default gateway to the cloud at Fred Hutch.