AWS Public Sector Blog
Brain Workshop Meets Cloud
The Allen Institute for Brain Science in Seattle and the University of Washington recently hosted a two-week, intensive workshop on computational neuroscience. It offered advanced graduate and post-graduate students an introduction to the current state of the neurobiology of sensory processing, including anatomy, physiology, and neural coding. For the first time, attendees had the opportunity to analyze massive datasets in the cloud through a collaboration with AWS.
We reached out to David Feng, Associate Director of Technology at the Allen Institute for Brain Science, to learn more about the workshop’s first experiment with the cloud.
What are the goals of your workshop on the Dynamic Brain?
The workshop, now in its fourth year, prepares late-stage graduate students and post-doctoral fellows interested in neuroscience, physics, applied mathematics, data science, and computer science for research careers in neuroscience. It provides training in computational and analytical methods, collaborative and open-source tools, and direct experience with a breadth of data types ranging from single cell morphology to neural population dynamics.
The main objective is for students to develop and execute projects with their peers, under the advisement of course faculty. While these projects are computational in nature, they use empirical data collected by the Allen Institute for Brain Science. The course takes place at the Friday Harbor Laboratories of the University of Washington on San Juan Island.
What were you able to do at this year’s workshop that you couldn’t do before?
Students accessed an order of magnitude more data than they had in the past. In previous years, we provided students with external hard drives containing data for their course projects. This limited a project’s scope to the data we could fit on a hard drive. This year, we provided students with 35 TB of data through Amazon Simple Storage Service (Amazon S3), including our highest resolution datasets, which do not fit on most external hard drives.
This approach was extraordinarily successful, enabling reliable and high-powered computation and collaborative projects. Students spent more time analyzing data and less time configuring their software toolchains. We deployed a JupyterHub cluster that dynamically provisioned Docker-based instances preconfigured with a host of hard-to-configure dependencies. Rather than spending days setting up development environments, students could click a link and start working immediately.
Additionally, the ease of provisioning large, custom compute configurations enabled new types of projects. Students tend to limit their analyses to what they can easily run on their laptops. This year, the base instance we provided them was more powerful than most of the laptops they brought. One participant wanted to experiment with deep neural networks, so we spawned a GPU instance for them with the necessary dependencies and data volumes preconfigured.
We were pleased to see that a majority of the students (and instructors) opted to analyze data in the cloud, rather than using their laptops and tethered hard drives.
What changes did you have to make now that the data is available in the cloud?
No software changes were necessary. The Allen Institute provides a Python software package, the AllenSDK, for working with our data organized on a standard filesystem. This year we used S3FS to mount our S3 bucket as a volume. From the students’ perspective, it appeared identical to their external hard drives. This meant that all of our pre-existing code worked out of the box. The majority of our technical course preparation went into Docker image and compute instance configuration.
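To illustrate why a mounted bucket needs no code changes, here is a minimal Python sketch. It is not taken from the AllenSDK or from the workshop materials. The function names, the demo directory layout, and the "/data/allen" path mentioned in the comments are all hypothetical. The point is only that code written against ordinary filesystem paths behaves the same whether the path is an external drive or a mount point.

```python
import tempfile
from pathlib import Path


def summarize_data_root(data_root):
    """Return {relative_path: size_in_bytes} for every file under data_root.

    Only ordinary filesystem calls are used, so this function cannot tell
    whether data_root is an external drive or a mounted bucket.
    """
    root = Path(data_root)
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def make_demo_root():
    """Create a small temporary directory tree standing in for a data volume."""
    root = Path(tempfile.mkdtemp())
    (root / "experiment_1").mkdir()
    (root / "experiment_1" / "traces.bin").write_bytes(b"\x00" * 16)
    (root / "manifest.json").write_text("{}")
    return root


if __name__ == "__main__":
    # In a real setting this would be the drive or the mount point,
    # e.g. a hypothetical "/data/allen"; a temporary directory is used here.
    demo_root = make_demo_root()
    for name, size in summarize_data_root(demo_root).items():
        print(f"{size:>6} bytes  {name}")
```

Pointing summarize_data_root at a different root is the only change needed to switch storage backends in this sketch, which mirrors the behavior described above.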
What’s next for workshop participants? Will they continue to work with the data even though the course is over?
We have encouraged students to continue working on their projects in the JupyterHub cluster. Successful course projects tend to take on a life of their own after the course ends, and in previous years, some projects have led to peer-reviewed publications or conference presentations. We will continue to provide students with access to the data and analysis resources they had during the course. This is straightforward now that most of those resources are in Amazon S3 and Amazon Elastic Compute Cloud (Amazon EC2).
Learn more about how this workshop empowers students to advance neuroscience research through data analysis.