AWS for Industries

Executive Conversations: Catalyzing the next generation of cancer care with Prabhu Arumugam and Emma McCargow of Genomics England

What does the future of cancer treatment look like for patients, doctors, and researchers, and what is cloud computing’s role in its success? Rowland Illing, Director and CMO, Government Health at Amazon Web Services (AWS), sat down with Prabhu Arumugam, Genomics England’s Director of Clinical Data and Imaging, and Emma McCargow, Program Lead for Cancer at Genomics England. In this interview, Arumugam and McCargow discuss how Genomics England’s plan for building the world’s largest cancer research platform will advance precision medicine in oncology.

Rowland Illing: Thank you so much for joining today. To get us started, can you give us an overview of Genomics England’s mission and an overview of your roles at the organization?

Prabhu Arumugam: At Genomics England, our mission is to realize the enormous potential of genomic information to further precision medicine. We support our mission in two ways. First, by delivering and continually improving genomic testing capabilities to help clinicians better diagnose and treat patients. We partner with the NHS [National Health Service] to provide whole genome sequencing insights to inform personalized treatments for rare diseases and cancer patients. Second, we enable genomic research by sharing pseudo-anonymized patient data with researchers to make new diagnostic and therapeutic discoveries, which can then be used to benefit patients.

We think of this as an infinity loop—the more we do in genomic healthcare, the more powerful the research assets become, and the more we learn in research, the smarter healthcare delivery becomes. This creates more ways for us and the NHS to improve patient outcomes. I am also Genomics England’s Caldicott Guardian and my role centers on governance—ensuring the protection of participant data and overseeing how it’s used safely.

Emma McCargow: I head the Cancer 2.0 program, which is Genomics England’s initiative to build the world’s largest cancer platform to support an earlier, faster diagnosis of cancer. Cancer is an exquisitely complex disease, estimated to affect 1 in 2 people in the UK, per the NHS. It is also extremely personal to me. I lost both my parents to cancer. So, the launch of Cancer 2.0 last year is really close to my heart.

RI: Let’s talk about the Cancer 2.0 program. How is it different from previous initiatives by Genomics England, and how does it further your mission around precision oncology?

EM: With Cancer 2.0, we have introduced two disruptive technologies to better understand cancer. The first is long read sequencing technology, which provides a more granular view of mutation patterns, repetitive regions, and markers for gene silencing or activation. As a result, we can generate new insights about the genome that were previously inaccessible. This helps to more accurately diagnose and treat patients.

The second is multi-modal data analysis. We are combining disparate data sets that were previously siloed, and held in different formats and systems, across different health disciplines. Now we can analyze them in tandem using artificial intelligence (AI) and machine learning (ML). This will help us generate a more holistic picture of cancer and get answers to questions for faster and earlier diagnosis.

We expect the program to run into early next year. Once ready, the platform can become a significant driving force in making genomics part of routine healthcare in the NHS. We also want this to become an enabling platform for the global research community. We want both academia and industry (the biotech and pharma companies) to leverage the data and develop the next generation of cancer diagnostics and precision medicine.

RI: You mentioned multi-modal data. What are the different data sources? And, what is the value in bringing these datasets together?

PA: With this program, we’re bringing together DNA sequencing, imaging, pathology, clinical, and therapeutic data to significantly improve our knowledge of cancer, and understand its evolution at a genetic and protein level. We already have whole genome sequencing data from the broadest range of cancer indications from our 100,000-genome program. Through our collaboration with the NHS, we’re now combining that data with digital pathology and radiology images (like MRIs and CT scans) from more than 15,000 participants across 84 hospitals in England, representing 20 solid cancer subspecialities. This will allow researchers to look at the molecular as well as the spatial context of solid tumors. We’re also adding longitudinal clinical data from patient health records, which includes data on interventions, disease progression, and treatment response through the timeline of a person’s clinical care. So, the scale is not just in terms of the number of patients represented, but also the richness, diversity, and granularity of the data captured.

Combining this data will allow us to build machine learning models that leverage valuable information across modalities. Researchers will get the opportunity to study the genetic sequence and images of a tumor in tandem, which will push forward the development of more effective precision therapies and diagnostics. In the future, it will allow clinicians to apply artificial intelligence to make better clinical decisions, by predicting a patient’s prognosis, disease progression, response to treatment, and survival, thus raising the bar on personalized health.

EM: We’ve never collected a library of this size before. The long-time global standard reference database in cancer research was The Cancer Genome Atlas (TCGA), which covered a few cancer types, with data from a couple thousand genomes. With Cancer 2.0, we are anticipating onboarding 250,000 curated digital pathology and accompanying radiology images in conjunction with the vast genomic data from our 100,000-genome program. Apart from the scale, this data is also diverse and inclusive, which will help us understand how cancer evolves in different ethnicities. It will also help us study toxicity responses and side effects of oncology treatments in different patient types, thus fueling the development of targeted therapies where patients can have a better quality of life and response to treatment.

RI: Given the sensitive nature of this patient data, a critical aspect I know you’ve spent significant time on is security and data privacy. How are you addressing concerns around this?

EM: Data privacy and security is at the forefront of everything we do. Genomics England works hard to engage with the public and build trust and transparency around our programs. We only take data from participants on receiving their consent. We feel honored when participants allow us access to their data, and we do everything to protect it and treat it with the utmost respect and sensitivity. Our TRE (trusted research environment) provides researchers access to this data while enabling data security, compliance, and patient privacy.

PA: Data privacy and security is a fundamental tenet of how we operate, and we hold ourselves to the highest standards. The data that we receive from our sources, like the NHS sites or the national suppliers, goes through an extensive process of de-identification. We remove all patient PID [Personal Identifiable Data] and pseudonymize that data. All data that goes into our research environment is tested for PID before it becomes available. All participants get a unique ID, so that the data cannot be correlated to a specific participant. We also have a data protection team to establish protocols around how we manage this data inside and outside of the organization.

As a program using cloud technology powered by AWS, we have built a network architected to protect our patients’ information, and our applications and devices. This helps us further strengthen our security posture and meet compliance requirements around data locality, protection, and confidentiality, which are critical to the success of this program.
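The de-identification steps Arumugam describes (removing PID, giving each participant a unique ID, and testing data for PID before release) can be illustrated with a minimal Python sketch. This is not Genomics England’s actual pipeline, which the interview does not detail. The field names, the keyed-hash approach to generating IDs, and the 10-digit check are all assumptions made for illustration.

```python
import hashlib
import hmac
import re

# Hypothetical field names; the real schema is not described in the interview.
PID_FIELDS = {"name", "nhs_number", "date_of_birth", "address"}

# Illustrative check only: a standalone run of 10 digits, the length of an NHS number.
NHS_NUMBER_PATTERN = re.compile(r"\b\d{10}\b")


def pseudonym(nhs_number: str, secret_key: bytes) -> str:
    """Derive a stable participant ID from an identifier using a keyed hash."""
    digest = hmac.new(secret_key, nhs_number.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and attach a pseudonymous participant ID."""
    cleaned = {k: v for k, v in record.items() if k not in PID_FIELDS}
    cleaned["participant_id"] = pseudonym(record["nhs_number"], secret_key)
    return cleaned


def contains_pid(record: dict) -> bool:
    """Return True if a PID field or an NHS-number-like value remains."""
    if PID_FIELDS.intersection(record):
        return True
    return any(
        isinstance(value, str) and NHS_NUMBER_PATTERN.search(value)
        for value in record.values()
    )
```

In this sketch, a record would be released to the research environment only if `contains_pid(deidentify(record, key))` is false. The same participant always maps to the same ID, so records from different sources can still be linked, while the key needed to reproduce the mapping stays outside the research environment.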

RI: Speaking about cloud, what has been the role of cloud technology and AWS in bringing this program to life?

PA: From the beginning, we wanted a research environment that was cloud-based. The cloud has transformed the way we, and our researchers, access and analyze the data. Working with AWS has enabled the program to meet the security and compliance requirements, while simplifying logistics. Using AWS, we have streamlined the collaboration with our partners, so that the pathology images and radiology scans can go into the cloud directly from the NHS sites. We don’t have to physically move hard drives around and plug them into our systems; the process today is natural. AWS has also made our program scalable, as we no longer have to worry about the size of the data we’re bringing in, thanks to the unlimited storage. AWS has also provided the unlimited compute power needed to run these massive experiments. Most importantly, the cloud has fueled collaboration at an unprecedented level, irrespective of physical locations, while protecting sensitive data, which has been the fundamental goal of the system. We’ve got researchers working across the world annotating images and looking at genomes together. This wouldn’t have been possible without the cloud.

EM: We’re currently in the process of deploying a picture archiving and communication system (PACS) called Sectra on AWS. This will help us further strengthen our security and compliance posture. The tools that AWS provides have been fundamental to the research at Genomics England, truly increasing the usability of the cloud. AWS has also provided us with a robust set of capabilities for our varied machine learning use cases, like Amazon SageMaker, which helps us explore the data in new ways.

RI: With all the research in this space, how do you see cancer diagnosis and treatment evolving in 5 to 10 years from now?

PA: With the Cancer 2.0 program, we at Genomics England are trying to push the boundaries of what we know and accept. Whether it is bringing down the cost of whole genome sequencing, testing the limits of how fast we can return whole genome results, or transforming diagnostics and population-level health economics, we’re trying to change the way we can apply genomics at the bedside. Imagine a patient coming into a cancer clinic, and getting their tests, scans, biopsy, and results within hours. We’re pushing the boundaries so every cancer patient can benefit from genomic healthcare.

EM: I hope to see a future with better patient outcomes and an improved treatment experience for our patients. If we can be a tiny cog in that wheel by enabling research, making it happen, and having that translate directly to the clinic—I think that would be an amazing achievement.

RI: Absolutely. Oncology genomics is a rapidly evolving field and is a key priority for AWS. I appreciate you taking the time to discuss your breakthrough in building the world’s largest cancer data program, and how AWS could power this innovation.


It is an exciting time in genomics, and we look forward to seeing what the future holds. See how AWS is supporting other life science researchers in their quest to expand biological understanding and improve human health.

Prabhu Arumugam

Dr. Prabhu Arumugam is the Director of Clinical Data & Imaging and the Caldicott Guardian at Genomics England. A pathologist by training, Prabhu focuses on improving cancer clinical data and leads the multimodal diagnostics in cancer project.

Emma McCargow

Emma McCargow is the Program Lead for Cancer at Genomics England. She is responsible for operating the day-to-day delivery of the strategic genomics agenda in Cancer 2.0, which includes maximizing patient benefit and enhancing cancer research.

Rowland Illing

Dr. Rowland Illing is the Chief Medical Officer and Director of International Government Health for Amazon Web Services (AWS). He has responsibility for public sector healthcare strategy and operations for AWS internationally, excluding the US & China. This encompasses healthcare service delivery, research and genomics. He is passionate about the delivery of person-centered care, increasing access and improving outcomes at a lower cost by accelerating the digitization and utilization of healthcare data.