AWS for Industries

Executive Conversations: Building the Brain Knowledge Platform, with Shoaib Mufti, Data and Technology Lead at the Allen Institute for Brain Science

Shoaib Mufti, Head of Data and Technology at the Allen Institute for Brain Science, joins Lisa McFerrin, Worldwide Lead for Research, Discovery, and Translational Medicine at Amazon Web Services (AWS), to share how the Allen Institute is using the cloud to build the Brain Knowledge Platform (BKP) for the U.S. National Institutes of Health (NIH) BRAIN Initiative Cell Atlas Network (BICAN). The initiative is a global collaboration akin to the Human Genome Project, to map the approximately 200 billion cells in the human brain by their type and function. By mapping brain cells with unprecedented precision and scale using cloud technology, the Allen Institute is creating a path toward breakthrough treatments of brain diseases.

Lisa McFerrin: Thank you for joining us today. To get us started, can you tell us about the Allen Institute’s mission and your role at the organization?

Shoaib Mufti: The Allen Institute is an independent, non-profit bioscience institute that aims to unlock the unknowns of human biology through foundational science. We’re focused on three main areas—the brain, cells, and immunology—with brain science being the longest-standing of the three.

Our approach is to focus on big problems that can’t be solved by one organization alone, and to support that work with teams of thousands of experts, technology, and infrastructure. Using this industrial-level research function, we aim to accelerate progress, share our findings with the world, and make them openly available to the larger research community.

I lead the data and technology team at the Allen Institute for Brain Science, which is currently developing the Brain Knowledge Platform (BKP). My team supports our internal scientists by developing data pipelines that apply analytics technologies such as machine learning and computer vision, as well as data platforms to store, analyze, and gain insights from data. In addition, we simplify the process for our external customers, the neuroscience community, to use our data, while offering them access to robust visualization and analytics tools.

LM: You mentioned the Brain Knowledge Platform. Can you tell us more about the project and how it fits into the Allen Institute’s overarching mission?

SM: The Brain Knowledge Platform is an effort (and the largest initiative of its kind!) to understand the complexities of the brain by mapping its cells and cell types with precision and at scale. Think of it as the first-ever periodic table of the mammalian brain. We are using single-cell genomics to define and map all the cell types in the brain, both in humans and in the non-human primate species most relevant to biomedical research. This will provide a reference of unprecedented resolution for understanding the functional organization of the brain and the differences that make us unique, both as a species and as individuals. Like the reference human genome, it can provide a basis for understanding the building blocks of brain development, disorders, and disease, which can inform paths for treatment.

The platform also acts as a central data repository, unifying information for neuroscience research and linking it so that researchers can get answers to their questions faster. It provides access to interactive data dashboards as well, enabling users to visualize and interpret complex brain data effectively. Through these dashboards, researchers gain valuable insights into patterns, correlations, and trends within the data, facilitating new discoveries and advancing our knowledge of the brain’s intricacies.

A jointly funded initiative with the NIH’s BRAIN Initiative Cell Atlas Network (BICAN) project, the platform creates a referenceable map of the current state of knowledge about the brain and makes it accessible to researchers across the globe, aligning with the institute’s overarching mission of better understanding the human brain through open science.

LM: Can you share more on why this initiative is important, and how it is accelerating our understanding of the brain? Also, what are the practical applications of the brain map?

SM: First is the impact on neuroscience research. Having a single source of integrated data makes it much easier for researchers to get answers to their questions, accelerating the time to discovery. Rather than going to 20 different publications, running separate searches, and accessing different data sets, researchers get a unified resource operated by a trusted organization like the Allen Institute, where they can understand what is known about the brain and test their hypotheses with confidence and easy access. It is also extensible to other domains and enhanced by artificial intelligence and machine learning, expanding the ways researchers can apply it.

Second is the impact on medicine. The human brain is complex and is the least understood organ. In the treatment of many brain diseases, like Alzheimer’s, there is still much to learn about disease progression and mechanisms to intercept cognitive regression. The platform can help researchers identify the neural circuits that are involved in specific behaviors and cognitive functions, such as learning, memory, emotion, and decision-making. This will accelerate research into treatments for debilitating brain diseases and pave the way for the development of personalized therapies.

And third is the impact on technology. The platform will have important implications for the development of new technologies, such as brain-computer interfaces, which allow us to control devices or communicate through thoughts alone.

LM: This is a massive initiative. With the amount of data that is generated, and the analytics that the platform is powering, how are the cloud and your collaboration with AWS catalyzing the initiative?

SM: With the project generating exabytes of data, we couldn’t have done this without the partnership with AWS. AWS provides the scalability in both storage and compute that we need for things such as large-scale graph analysis, data set training, and machine learning. It is flexible, so it evolves with our needs and allows us to pick the tools we want. With the technology evolving so rapidly, we could not have built what we are building in a data center.

This is a consortium initiative, and we are working with many collaborators across the globe. The cloud helps us unify the distributed data while managing secure yet seamless access. The built-in privacy and security of AWS create a level of trust that is important in these efforts. And many of those collaborators are also on AWS, which makes it even easier.

The cloud also makes it easy to bring communities together. AWS helps us in creating that social environment for global researchers to come together, along with the tooling they need to access the data, which aligns with our mission around open science.

LM: Are you also looking to leverage generative AI in the future?

SM: Absolutely, we are evaluating generative AI services from AWS, including Amazon Bedrock, to integrate foundation models into the platform.

One of the key features of the Brain Knowledge Platform is its ability to run automated machine learning pipelines. Researchers can leverage this capability to automate data processing, analysis, and interpretation, freeing up valuable time and resources. By harnessing the power of automation, the platform empowers researchers to focus on generating novel hypotheses, conducting in-depth analyses, and pushing the boundaries of knowledge in neuroscience.

The platform can seamlessly integrate generative AI models into workflows, unlocking new dimensions of creativity and innovation. By incorporating generative AI, researchers can explore uncharted territories, simulate brain processes, and generate novel insights that may have remained undiscovered otherwise. This integration of AI can foster a dynamic collaboration between human expertise and machine intelligence, leading to breakthroughs that push the boundaries of our understanding.

LM: What are some of the biggest challenges you’re facing in this journey?

SM: Scale and visualization of the data are the biggest. The sheer size and complexity make it incredibly hard. The human brain has about 80 billion neurons. And, if you’re comparing multiple brains, you’re bringing a lot of data together. We are trying to integrate across scales, from cells and genes to macroscale non-invasive neuroimaging data collected in hospitals. Such large datasets, spanning such different scales, are extremely difficult to integrate and visualize. That is where the cloud has played a key part in allowing us to continuously scale.

The other challenge is the ease of use and discoverability. We want to drive widespread community adoption by creating trust, and by removing technical barriers around using and contributing to the platform. We are looking at the cloud to build easy interfaces using natural language processing, so users can ask questions rather than having to write computer programs.

LM: This resource and collaborative ecosystem can have an immediate impact on accelerating discoveries in the scientific community. How do you see this shaping the future?

SM: Having a common spatial reference and understanding different cell types could lead to new discoveries around brain-related disorders, improving diagnostics and enabling more personalized treatment. I think that’s going to change the whole field. We will be able to understand not just the genetic code, but how it is expressed at a cellular level, and how it interacts to affect the function of the brain, the most complex organ. We are also working to make the cellular reference the common standard for the field, just as the human genome is for genomics, with standardization and tools for the community to map against. This will immediately harmonize efforts across the field and bring a new level of resolution to studies of disease pathways, helping to identify new targets for therapeutic intervention.

There is a lot of promise on the clinical side as well. Imagine a world where you go to your doctor’s office, the doctor looks at an MRI and finds something of interest, and then refers to the platform in the cloud for a better diagnosis.

Finally, the infrastructure we are building can also be extended to cell atlases for other organs, like the lungs or the heart. Expanding to other organs could reveal the connections, interactions, and dependencies that determine how the human body functions.

LM: I am inspired by the possibilities this platform will unlock. You’ve started with the hardest organ, the brain, and in developing a reference for the brain, you’re paving the way for other complex networks, from holistic physiological pathways to biochemical and bioelectric responses. We are so glad that AWS can support this hugely important work. Thank you for joining me today, and I look forward to the impact this initiative can bring to the scientific community and future clinical applications in neurology.

Lisa McFerrin

Lisa McFerrin is the WW Lead for HCLS Strategy & Solutions for Research, Discovery, and Translational Medicine at AWS. Lisa has a background in math and computer science and a PhD in Bioinformatics, with over 15 years of experience in software and methods that bridge biomedical data to advance the understanding of cancer biology and improve patient care. She is dedicated to lowering barriers in data analysis to facilitate collaborative and reproducible research.