What Data Egress Means for Higher Education: A Q&A with Internet2

In March, we announced that AWS is offering a data egress discount to qualified researchers and academic customers, making it easier for researchers to use its cloud storage, computing, and database services by waiving data egress fees. We had the opportunity to sit down with Andrew Keating, Director of NET+ Cloud Services at Internet2, to discuss the impact of the data egress discount on higher education and how the cloud is transforming research in the academic world. Andrew received his Ph.D. from UC Berkeley and, prior to his current role, worked at the university building cyberinfrastructure and programs to support data-intensive research.

Internet2, a member-owned advanced technology community, provides a collaborative environment for U.S. research and education organizations to solve common technology challenges and to develop innovative solutions in support of their educational, research, and community service missions.

What does Internet2 hear from its member institutions about how they are using cloud computing and the benefits it provides them?

Andrew: Over the past few years, universities have shifted their thinking from whether to deploy cloud services to a conversation about how to do so strategically and effectively. Several universities have adopted “cloud-first” strategies to move all or most of their enterprise IT services to the cloud. Even those universities that do not specifically call out “cloud first” are moving significantly in that direction.

Researchers were using cloud services before the term became popular, in the sense that cross-institutional collaboration and sharing of research data sets have been taking place for decades. These days, the significant shift is that researchers are looking to commercial cloud providers as an alternative to building their own “clouds” with on-premises hardware. The efficiencies and time-to-research gains they are making are already substantial, and they are able to use their grant dollars more effectively.

What are researchers using the cloud for?

Andrew: At the high end of data-intensive research, the cloud is enabling more efficient deployment of storage and compute capabilities, and on-demand capabilities are dramatically reducing the time it takes to begin a research project. In some respects, physicists, astronomers, and others with data- or compute-intensive needs have had these capabilities through supercomputing centers for some time. What’s different for them is the economic efficiency of being able to spin up a virtual supercomputer on demand, as well as not having to deal with hardware installation and maintenance or wait for a schedule slot to open up.

In my opinion, the biggest and most immediate impact of the cloud on research is that it makes storage and compute capabilities, which previously would have been available only through a supercomputing center, accessible to researchers broadly. The cloud lets researchers get to work almost immediately.

What can researchers do now that wasn’t possible before the cloud? How does the cloud help them?

Andrew: Cloud services have reduced the administrative and technical barriers to engaging in research activities and scientific discovery. Researchers are able to deploy the resources they need on demand without purchasing, maintaining, or administering hardware. We are also beginning to see the impact of cloud-based machine learning and analytics that will further transform scientific discovery. Even as recently as a few years ago, researchers analyzing a data set would need to have an idea about patterns or trends they wanted to find in the data. Machine learning is increasingly helping detect patterns or correlations in large data sets, and the specialized skills and domain expertise of the researcher can be focused on understanding and interpreting the meaning of those machine-generated observations or patterns.

Why is data egress so important to researchers?

Andrew: Reducing or eliminating data egress charges for researchers removes one more perceived barrier to cloud service adoption. When a researcher receives a grant, does the work, and decides to store the data on AWS, having to come back later and pay to move it out presents some conceptual and practical problems. For one, the researcher may have exhausted the grant funds, so depending on the amount of the charge, there could be a financial burden. More conceptually, there was unease in the research community about the perception that data was “held hostage” or that researchers' work product would not be accessible to them or their colleagues. We are happy that AWS listened to its customers and responded to the needs of researchers who identified data egress charges as a barrier to broader adoption of cloud services.

