AWS for Industries

Roche advances personalized healthcare using multi-modal data on AWS

Blog guest-authored by Mustaqhusain Kazi, Global Head of Roche Informatics Strategy and Digital Innovation, and Neeraj Agarwal, Distinguished Enterprise Architect, Foundational Tech, PHC and Science at Roche.

In the 1990s, the “blockbuster model” reigned supreme, with pharmaceutical companies designing therapies for large populations using a one-size-fits-all approach. Today, with increased digitization and an explosion of patient data, supported by advances in cloud computing and big data analytics, that model is being replaced by individualized medicine.

With a better understanding of disease heterogeneity, pharmaceutical companies are shifting their focus to targeted therapeutics designed for smaller patient cohorts. Roche is accelerating this transition by unifying regulatory-grade, multi-modal health datasets to fuel R&D into personalized medicine. The company aims to provide the right patient with the right treatment at the right time, removing the guesswork around whether a drug will work.

However, this is easier said than done. Bringing together disparate and disconnected patient health datasets to extract clinically relevant insights, without compromising the intellectual property of each entity, faces many roadblocks. Solutions designed to address these challenges (such as federated research, federated machine learning, trusted research environments, and neutral zones) are still in their infancy. The result: life sciences organizations still struggle to build a holistic picture of patients and their disease progression.

Roche’s next-generation personalized healthcare platform, Apollo, built on Amazon Web Services (AWS), addresses this fragmented view of the patient. The platform unifies electronic health records (EHR), imaging (CT scans, PET scans, MRIs, and more), diagnostics, wearables, devices, and genomics data to generate a better, high-resolution view of the patient. “By creating a flywheel that starts and ends with the patient, we’re using the data collected from the patient to innovate on their behalf, from smarter, more efficient R&D to better trial designs,” said Mustaqhusain Kazi, Head of Personalized Healthcare, Pharma Informatics at Roche.

The platform has three distinct modules:

  1. Data
  2. Analytics
  3. Collaborations

The first module automates the processing and storage of the datasets, preparing them for downstream analytics. In the second module, researchers and data scientists get access to a range of portable data analytics tools to generate insights from the data. The third module fuels data sharing and collaboration, inside and outside of Roche. The core platform also has robust guardrails in place to safeguard protected health information (PHI) and anonymized datasets, while adhering to local data laws across the 100+ global markets in which Roche operates. As with any true platform, the value comes from the applications and cohorts Roche builds on top of it.

High-level architecture for Roche’s Apollo healthcare platform

Building blocks of Roche’s Apollo Platform

In the first module, automations built into the platform help reduce the time required to ingest data from different sources and prepare it for analysis. The data comes in from a wide variety of sources via batch transfer, APIs, and streaming. Once the data arrives, the platform quarantines it to check for quality issues and security breaches, preventing any contamination of existing data. After that, automated workflows move the data through different pipelines for verification, validation, standardization, tagging, labeling, and classification, before publishing it to a data catalog ready for analysis.
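The quarantine-then-publish flow described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not Roche's implementation: the `Record` type, the stage functions, and the `patient_id` check are all hypothetical names chosen for the example, and a real pipeline would have many more checks at each stage.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical incoming dataset record (not Roche's actual schema)."""
    source: str
    modality: str                      # e.g. "ehr", "imaging", "genomics"
    payload: dict
    tags: set = field(default_factory=set)
    status: str = "quarantined"        # every record starts in quarantine


def validate(record: Record) -> Record:
    # Assumed quality gate: reject records that lack a patient identifier.
    if not record.payload.get("patient_id"):
        raise ValueError("missing patient_id")
    return record


def standardize(record: Record) -> Record:
    # Normalize the modality label so later stages see consistent values.
    record.modality = record.modality.strip().lower()
    return record


def tag(record: Record) -> Record:
    # Attach searchable tags that a data catalog could index.
    record.tags.add(f"modality:{record.modality}")
    record.tags.add(f"source:{record.source}")
    return record


STAGES = [validate, standardize, tag]


def ingest(records):
    """Run each record through every stage; publish only those that pass."""
    catalog, rejected = [], []
    for record in records:
        try:
            for stage in STAGES:
                record = stage(record)
        except ValueError as err:
            # Failed records never reach the catalog, so existing data stays clean.
            record.status = f"rejected: {err}"
            rejected.append(record)
            continue
        record.status = "published"
        catalog.append(record)
    return catalog, rejected
```

The point of the sketch is the ordering: nothing reaches the catalog until it has cleared every stage, which is what keeps a bad delivery from contaminating data that is already published.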

Roche uses Amazon Simple Storage Service (Amazon S3) for data storage because of its industry-leading scalability, data availability, security, and performance. Lifecycle transitions to the Amazon S3 Intelligent-Tiering storage class deliver automatic cost savings: the service monitors access patterns and moves data to the most cost-effective access tier based on usage, without performance impact or operational overhead. Amazon Redshift, a fast, fully managed, petabyte-scale cloud data warehouse, allows scientists to generate insights from both structured and semi-structured data across all data repositories. “With AWS, we have the tools to bring in the data seamlessly from our different data partners, process it automatically, and store it for analysis,” said Kazi.
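As an illustration of the lifecycle transition mentioned above, the following sketch builds an S3 lifecycle rule that moves objects under a prefix into the Intelligent-Tiering storage class. The post does not describe how Roche configures its buckets, so the bucket name, prefix, and rule ID here are hypothetical; the sketch assumes the boto3 SDK and valid AWS credentials for the `apply_rule` step.

```python
def intelligent_tiering_rule(prefix: str, rule_id: str = "to-intelligent-tiering") -> dict:
    """Build a lifecycle rule that transitions objects under `prefix`
    to the S3 Intelligent-Tiering storage class as soon as possible."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
    }


def apply_rule(bucket: str, prefix: str) -> None:
    """Apply the rule to a bucket. Requires boto3 and AWS credentials.

    Note: this call replaces the bucket's entire lifecycle configuration,
    so any existing rules would need to be included in the list as well.
    """
    import boto3  # imported lazily so the rule builder works without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [intelligent_tiering_rule(prefix)]},
    )


# Hypothetical usage (bucket and prefix are placeholders):
# apply_rule("example-imaging-bucket", "raw/imaging/")
```

Once objects are in Intelligent-Tiering, S3 handles movement between access tiers on its own, so no further rules are needed for the tiering itself.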

The second module centralizes a range of data analytics tools to support in-depth exploration using machine learning, artificial intelligence, deep learning, computer vision, and more. Data scientists across the globe can access a range of self-service, portable tools for model development and validation, result validation, and insight generation, whether in the cloud or on premises. The module also offers the compute resources needed to run these algorithms, and complements them with flexible, fit-for-purpose visualization solutions.

Roche uses Amazon Elastic Kubernetes Service (Amazon EKS), a managed container service, to run and scale Kubernetes applications while keeping costs under control as they scale. For deep learning training jobs that need file storage, Roche uses Amazon FSx for Lustre, a high-performance shared file system backed by Amazon S3. It provides sub-millisecond latencies, up to hundreds of GB/s of throughput, and millions of IOPS (input/output operations per second). This helps the team accelerate compute workloads, and access and process the stored data with superior price-performance.

The third module fuels collaborative research for faster insights generation and external knowledge-sharing. “The platform allows us to collaborate more efficiently with different internal departments, and also externally with third-party data scientists, fueling open science,” said Kazi.

With a footprint in 100+ countries, Roche needs a strong security and compliance posture to adhere to local regulations across the globe. The AWS shared responsibility model and 87 Availability Zones spread across 27 Regions help meet these requirements. That, in combination with a range of curated cloud services built specifically for the healthcare and life sciences industry, makes AWS a straightforward choice for Roche. “During the building of the Apollo platform, what stood out was the spirit of partnership, co-creation, and collaboration with AWS,” said Kazi. “We got the ability to work backwards: to bring the right data and analytics to the right question, and address data interoperability on data from different sources,” he added.

The results speak for themselves. The platform has significantly decreased the time required to prepare and process datasets for analysis. Image files that previously took 2-3 days to process are now ready in hours, and modular electronic health record datasets are processed within minutes. The platform stores data from thousands of patients, unifying hundreds of health datasets from internal and external data providers.

Today, Apollo is fueling true scientific collaboration and expanding the use of machine learning and artificial intelligence across Roche’s R&D organization. The result is deeper insights, better go/no-go decisions, more efficient clinical trials, newer diagnostics, and better matching of patients to therapies. “By using AWS to connect the multi-modal datasets that exist in clinical settings, and providing collaboration tools and intuitive workflow products for researchers to analyze these datasets in tandem, Roche is taking steps closer towards its mission to provide every patient with the best treatment possible in the fastest time,” said Kazi.

The future looks exciting too. The platform is expanding into federated learning, adding new machine learning capabilities to improve data quality, and crowdsourcing image data annotation, all tying back to Roche’s relentless pursuit of improving patient outcomes.

For additional information about Roche’s Apollo platform, watch the recording of this re:Invent session. For more information on AWS solutions for healthcare and life sciences, contact an AWS Representative.


Mustaqhusain Kazi is the Global Head of Roche Informatics Strategy and Digital Innovation. He is responsible for leading and driving the strategy and digital innovation to meet the needs of our Pharma, Diagnostics, and Insights businesses. He has over 20 years of experience in pharmaceutical and life sciences, high-tech, petrochemicals, and renewable energy. He has held senior leadership roles building high-performing organizations responsible for driving the end-to-end strategy, architecture, and execution of enterprise information systems that meet business requirements across the product development lifecycle and manufacturing.

Neeraj Agarwal is Distinguished Enterprise Architect for Foundational Technologies, Personalized Healthcare and Science at Roche. Neeraj is a passionate technologist with a proven track record of transforming visions into reality. His expertise spans architecture, engineering, and product management for advanced analytics, data management, cloud platforms, DevOps, and security services, enabling integrated data ingestion, curation, self-service analytics capabilities (statistical, HPC, reproducible research, MLOps), and collaboration for insight generation, visualization, and sharing. In his free time, he likes to hike Bay Area trails with family and friends.

Oiendrilla Das

Oiendrilla Das is Customer Advocacy Lead for Life Sciences and Genomics Marketing at AWS. She comes from a background in life sciences marketing, with a specialty focus on life sciences and cloud computing. Oiendrilla holds an MBA in marketing and completed an engineering degree in biotechnology before her MBA.

Subrat Das

Subrat Das is a Senior Solutions Architect and part of the Global Healthcare and Life Sciences industry division at AWS. He is passionate about modernizing and architecting complex customer workloads. When he’s not working on technology solutions, he enjoys long hikes and traveling around the world.

Sunil Aladhi

Sunil Aladhi is a Senior Technical Account Manager and part of the Global Healthcare and Life Sciences industry division at AWS. He leads a global team that helps life sciences customers operate their workloads optimally on AWS. Sunil has advised AWS customers across a diverse set of industries on designing and operating a broad variety of workloads using AWS services. Outside of work, he loves spending time with his family and traveling.