AWS for Industries

Executive Conversations: The era of genomics in the cloud with Peter Goodhand, CEO, Global Alliance for Genomics & Health

Peter Goodhand, CEO of the Global Alliance for Genomics & Health (GA4GH), joins Lisa McFerrin, Worldwide Lead of Genomic Bioinformatics at AWS, to discuss how secure storage and responsible sharing of genomic data in the cloud can benefit human health. GA4GH is a nonprofit alliance dedicated to creating frameworks and standards that facilitate data sharing between researchers and medical professionals across the globe.

This Executive Conversation is one in a series of discussions held with thought leaders in life sciences and genomics, where we seek to learn more about the impact of technological innovation and cloud computing on their industries.

Lisa McFerrin (LM): GA4GH has been around for almost a decade. How has your mission changed with the evolving genomics industry?

Peter Goodhand (PG): The leaders who started GA4GH saw in 2013 that the way we conducted research in the preceding decade was going to change dramatically. We recognized an opportunity not just to do bigger, better research, but to do it in a way that would accelerate learning, accelerate knowledge, and benefit human health. Over the past decade, we’ve witnessed a global movement towards population genomics, the migration of genomics from research into the clinic, and the intersection of genomics with novel technologies such as cloud computing and machine learning.

While the scope of GA4GH has evolved – and will continue to evolve – with industry and scientific advancements, our mission has remained the same: to enable responsible, international sharing of genomic data in order to benefit human health and medicine. GA4GH brings together more than 600 leading organizations across healthcare, research, life science, and technology to create scalable frameworks and standards to enable interoperability across the global genomics community.

The genome itself is just four letters, but without standardized protocols, frameworks, and standards, we may characterize or annotate them differently. It’s up to the leaders of both research and healthcare to decide whether genomics becomes an overwhelming, noisy tsunami of data or a treasure trove of information that can alter the future of healthcare.

LM: The adoption of genomics is accelerating across a number of industries, including infectious disease tracing and pharma R&D. What new trends are you seeing as a result?

PG: People are starting to say we should stop using the word “astronomical” and instead say “genomical” to describe massively large data sets. There’s been an explosion in the scale and variety of genomic data collection, mainly fueled by reduced sequencing costs. Twenty years ago, the price of a single genome was $3 billion. Today, that number has dropped to less than $1000. What took 10 years can now be done in a day. And even those advancements are small compared with the growth we expect in the next decade.

Several key trends have emerged as a result. The first is the rise of population genomics programs across the globe, which seek to drive innovation in healthcare and accelerate discovery by combining clinical information with genomic data at scale. As these programs mature, we’re starting to see organizations like the UK Biobank and the US National Center for Biotechnology Information make their data available to the larger research community through the AWS Cloud. By democratizing population-scale data amongst researchers, biopharma organizations, and the larger genomics community, interoperable standards promise to unleash a new wave of scientific discoveries and healthcare breakthroughs.

Similarly, we’re seeing increasing adoption of concepts like a “pre-competitive space,” in which a group of industry providers can come together to generate shared knowledge before developing proprietary solutions that build on that foundation. Researchers are starting to recognize that providing either reciprocal or public access to their own data can allow them to combine it with other datasets and allow for more statistically significant, more impactful research.

The last trend I’ll touch on is the intersection of genomics and novel technologies such as machine learning (ML). In particular, I see a real role for things like machine learning in the distillation of genomic knowledge into actionable clinical knowledge. There are two main areas where we’ve begun to see ML drive change: accelerating scientific breakthroughs and driving clinical applications. We’re starting to see organizations leverage ML to rapidly extract insights from large cohorts of data and identify new correlations. With ML, biopharma organizations can bring genomics into drug discovery and development to migrate from one-size-fits-all drugs to personalized treatments. And at the bedside, ML can help healthcare providers rapidly find the answers needed to provide an individualized treatment course based on the patient’s genetic profile.

LM: How would you say genomics has evolved as a result of the shift towards cloud-based storage?

PG: Cloud technologies have become a vital part of genomic research and data management. Ten years ago, there were only a handful of examples of genomic data in the cloud, and now it’s becoming the standard. When you compare traditional approaches to modern cloud approaches, you really see the power, scale, security, and flexibility made possible by new cloud capabilities and technologies.

Within GA4GH, we have a dedicated cloud workstream focused on bringing genomic knowledge into a cloud environment and thinking about things such as workflow execution tools, registry data, repository services, and so on. Five or six years ago, the genomics community was not really thinking in these terms; now we’re talking about them on a daily basis. As a result, we’re seeing organizations of all sizes and disciplines accelerate their pace of innovation through access to lower-cost, high-performance compute and storage services.

While speed and scalability are important, responsible genomic data security and sharing remain vital. People are getting used to these new technologies and to the fact that cloud environments are often more stable and more secure than on-premises data management. That can be a very difficult thing for many people to get their heads around.

A real-world example that comes to mind is one project in Australia, which had transitioned from a research-only project to one merging with clinical practice. They were initially planning to put genomic data in the AWS Cloud and keep all of their patient data on-premises, but when they conducted their security audit, they found that the cloud was far more secure than on-premises storage at any hospital.

This level of security and flexibility enables organizations around the globe to collaborate and build off of each other’s work in a secure and cost-effective environment. As a result, we’re starting to see new concepts like data visiting emerge.

LM: Can you speak a bit more about that – the concept of data visiting?

PG: Historically, there were relatively few research centers generating human genomic data, and the datasets were not particularly large. It was therefore completely possible for those centers to handle all the data on-premises, and for others to download copies of the data they wanted (as long as they had the appropriate permissions). But as the amount of data scales exponentially, that is quickly becoming impossible.

As a result, we’re seeing an increasing shift towards “data visiting” in the cloud – a concept conceived by Barend Mons, Ph.D., in which you identify the data you want to analyze, send your analysis to the institution that houses them, compute remotely, and then receive your aggregated results. The data never have to leave their jurisdictional domain. This concept of data visiting offers huge potential to bridge the barrier between health data and genomic data, and in many instances, that is enabled by the use of cloud technologies.
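The data-visiting flow described above (send the analysis to the data, compute where the data live, return only aggregates) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a GA4GH specification or an AWS API: `VariantRecord`, `DataHost`, `carrier_counts`, and `visit_all` are hypothetical names, the minimum-cohort check is an assumed safeguard, and the records are toy values invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class VariantRecord:
    """Hypothetical row-level record that stays inside the host institution."""
    sample_id: str
    gene: str
    has_variant: bool


Analysis = Callable[[List[VariantRecord]], Dict[str, int]]


class DataHost:
    """Hypothetical institution that runs visiting analyses on its own data."""

    def __init__(self, name: str, records: Iterable[VariantRecord], min_cohort: int = 3):
        self.name = name
        self._records = list(records)  # row-level data; never handed to the caller
        self._min_cohort = min_cohort

    def run(self, analysis: Analysis) -> Dict[str, int]:
        # Assumed safeguard: refuse cohorts too small to report in aggregate.
        if len(self._records) < self._min_cohort:
            raise PermissionError(f"{self.name}: cohort too small to report")
        # The analysis executes here, next to the data; only its summary is returned.
        return analysis(self._records)


def carrier_counts(records: List[VariantRecord]) -> Dict[str, int]:
    """Visiting analysis: count BRCA2 records and how many carry a variant."""
    relevant = [r for r in records if r.gene == "BRCA2"]
    return {"n": len(relevant), "carriers": sum(r.has_variant for r in relevant)}


def visit_all(hosts: Iterable[DataHost], analysis: Analysis) -> Dict[str, float]:
    """Send the same analysis to every host and combine the aggregate summaries."""
    summaries = [host.run(analysis) for host in hosts]
    n = sum(s["n"] for s in summaries)
    carriers = sum(s["carriers"] for s in summaries)
    return {"n": n, "carriers": carriers, "frequency": carriers / n if n else 0.0}


# Toy data for illustration only.
host_a = DataHost("Institution A", [
    VariantRecord("a1", "BRCA2", True),
    VariantRecord("a2", "BRCA2", False),
    VariantRecord("a3", "BRCA2", False),
])
host_b = DataHost("Institution B", [
    VariantRecord("b1", "BRCA2", True),
    VariantRecord("b2", "BRCA2", True),
    VariantRecord("b3", "BRCA2", False),
    VariantRecord("b4", "TP53", True),
])

combined = visit_all([host_a, host_b], carrier_counts)
print(combined)
```

In this sketch the researcher only ever sees per-host counts and the combined frequency. A real deployment would also need to vet the submitted analysis and its output, since an arbitrary function could otherwise return row-level data.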

LM: What do you think is necessary to make the transition to clinical application and deal with the inevitable explosion of genomic data we expect to see? How do you foresee machine learning playing a major role in that?

PG: The application of innovative technologies like machine learning can help accelerate scientific breakthroughs and clinical applications. I think for healthcare delivery to really achieve the scale of genomic medicine, the solid knowledge from that genomic research will have to be built into clinical decision support. I think we cannot expect the average clinician, the average oncologist, to spend hours trying to understand genomic variants – just as we wouldn’t expect them to spend hours looking at an MRI or PET scan; the radiologists do that.

A physician – for example, a pediatric oncologist – should be able to receive a report, quickly determine a course of action, and then manage the patient with the best knowledge available. As genomics becomes a standard part of healthcare, the research community needs to ensure that this highly specialized, granular, detailed (and often ambiguous) genomic knowledge is translated into actionable clinical knowledge and decision support whenever possible. But even more interesting is not seeing the transfer of information in a linear way – from research to clinical research to health care – but instead seeing it in a circular way. There’s a cycle of knowledge where the research informs healthcare and the data from healthcare further drive research.

LM: How does cloud technology help the research and healthcare communities tackle the sheer scale of genomic information that exists?

PG: We know far more today than we did seven years ago, but we still only understand a fraction of what the genome can tell us. It’s complex. It’s nuanced. There’s huge variety across the seven billion people on Earth. To unravel the genome’s mysteries, we need to be able to analyze it at scale, and we need to link genomic data to health data, which requires real transformation.

When GA4GH began, many people looked at the scale and complexity of genomic data and assumed that technology would be the issue. We knew we were no longer operating on a human-readable level. We needed to move to machine learning and think about artificial intelligence (AI) approaches and cloud storage. So researchers were very focused on the scale, the complexity, the big data aspects of this field, and I think people worried that technological development couldn’t keep up. But the last seven years have shown us that one way or another, data science and technology are able to keep pace with genomic science. It has become clear that the human aspects will be the limiting factors – diverse ethical and regulatory approaches to data sharing across the globe will be much harder to overcome than technological constraints.

One of the first things our organization developed was a Framework for Responsible Sharing of Genomic and Health-Related Data, and we’ve continued to update this over the years and to develop more specific guidance that builds on it in areas such as privacy, security, and consent. Ultimately, all of the work GA4GH does – from technical specifications to policy resources – builds on this framework. Building on the notion that all humans have a fundamental right to share in the benefits of scientific advancement, the GA4GH Framework grounds everything we do and continually brings us back to the point of all this: improving the human condition for everyone.

Watch Peter Goodhand’s 2021 AWS HCLS Virtual Symposium presentation, “The importance of data standardization and interoperability in the age of cloud genomics”, now available on demand.

Learn more about Healthcare & Life Sciences on AWS, or read more Executive Conversations on the AWS for Industries blog.


Peter Goodhand is a leader in the global health sector, holding senior executive and board member positions in the health research advancement community. He currently serves as the Chief Executive Officer of the Global Alliance for Genomics and Health (GA4GH) and President of GA4GH Inc. GA4GH is a policy-framing and technical standards-setting organization that seeks to enable responsible genomic data sharing within a human rights framework. Mr. Goodhand played a critical role in the inception and development phase of GA4GH as the Executive Director, prior to becoming the CEO. Mr. Goodhand is responsible and accountable for moving forward the GA4GH agenda, identifying and prioritizing objectives and outcomes that align with the overall GA4GH mission. In 2020, GA4GH Inc. was incorporated as a non-profit organization to scale its operations to reliably produce and protect technical standards and policy frameworks, create formal agreements with other international standards organizations, and secure long-term engagement from the international genomics community.

Mr. Goodhand previously served as President of the Ontario Institute for Cancer Research (OICR) and President and CEO of the Canadian Cancer Society. Before joining the charitable sector, he had a 20-year career in the global medical technology industry, including strategic leadership roles with multinational healthcare companies and as the founding Managing Director of the Health Technology Exchange (HTX).

Lisa McFerrin

Lisa McFerrin is the WW Lead for HCLS Strategy & Solutions for Research, Discovery, and Translational Medicine at AWS. Lisa has a background in math and computer science and a PhD in Bioinformatics, with over 15 years of experience in software and methods that bridge biomedical data to advance the understanding of cancer biology and improve patient care. She is dedicated to lowering barriers in data analysis to facilitate collaborative and reproducible research.