Emerging trends in cloud for advanced research computing
Research computing has come a long way from the mainframes of the 1960s. At the recent Practice and Experience in Advanced Research Computing (PEARC) conference, I noted four emerging themes that underscore how the field continues to evolve:
AI and ML continue to expand in application
Researchers are applying artificial intelligence (AI) and machine learning (ML) to a widening range of problems. Beyond basic research in methods, machine learning is being used in applications ranging from digital pathology to exploring collections of research papers using steerable AI.
Cloud technology is a natural fit for AI and ML, both in terms of supporting the computational demands of training, and in collecting, storing, and sharing data at scale. At a booth presentation at PEARC, AWS demonstrated how researchers can use Amazon SageMaker, AWS Lake Formation, and AWS Glue to extract information from data stored in Amazon Simple Storage Service (Amazon S3), build a metadata store, query this data, and then analyze results using advanced AI and ML frameworks in a Jupyter notebook environment. If you missed the demo, you can read about the approach in the AWS Machine Learning Blog.
The cloud offers an opportunity for workforce development
The cloud is an opportunity to grow skills in the research computing workforce. New curricula and internship programs are available to train research computing professionals and address major gaps in existing training (such as reproducibility and project management). As scientific fields evolve to take advantage of modern computational resources, researchers need more support in how to use computers for research. This is an opportunity to teach technical skills to students with an interest in science. As a professor, I made it a point to engage undergraduates in my research, often through National Science Foundation (NSF) supplements to support research experiences for undergraduates.
But research computing is just one facet of the technological skills that organizations around the world need to develop. At AWS, we address this at scale through initiatives such as AWS Educate, which provides students and educators access to self-paced training, collaboration tools, and hands-on learning pathways for careers in areas such as ML, data science, application development, and cloud architecture.
Building tools for research: Gateways, workbenches, and workflows
Research computing experts build the tools that enable research discoveries and outcomes. Many papers at PEARC centered on science gateways, workbenches, and workflow managers. Workflow managers are software products that let researchers express and execute complex parallel pipelines that might take advantage of HPC resources. Workbenches are environments that are simpler to develop in than workflow managers. Science gateways are web-based resources for accessing data, software services, and computation. These represent different levels of, and approaches to, helping researchers access and process data more effectively, which advances and accelerates science.
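To make the idea of a workflow manager concrete, here is a minimal sketch in Python (3.9 or later, standard library only). It is not how Nextflow or any particular product works; the function `run_workflow` and the task names are hypothetical, chosen only to show a pipeline expressed as tasks plus dependencies, with independent tasks running in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_workers=4):
    """Run every task once, after all the tasks it depends on have finished.

    tasks: maps a task name to a function that takes a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    (Both names are illustrative, not part of any real workflow manager.)
    """
    sorter = TopologicalSorter({name: dependencies.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the pipeline has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Tasks returned together do not depend on each other,
            # so they can be submitted to the pool at the same time.
            futures = {
                name: pool.submit(
                    tasks[name],
                    {dep: results[dep] for dep in dependencies.get(name, ())},
                )
                for name in sorter.get_ready()
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)  # unlocks tasks that were waiting on this one
    return results


# A toy pipeline: one split step, two independent counting steps, one report.
tasks = {
    "split": lambda upstream: ["a b", "c d e"],
    "count_words": lambda upstream: sum(len(s.split()) for s in upstream["split"]),
    "count_chars": lambda upstream: sum(len(s) for s in upstream["split"]),
    "report": lambda upstream: {
        "words": upstream["count_words"],
        "chars": upstream["count_chars"],
    },
}
dependencies = {
    "count_words": {"split"},
    "count_chars": {"split"},
    "report": {"count_words", "count_chars"},
}

results = run_workflow(tasks, dependencies)
print(results["report"])
```

Here `count_words` and `count_chars` both wait for `split` and can then run side by side, while `report` runs only after both have finished. Real workflow managers add much more on top of this pattern, such as scheduling onto clusters, retries, and caching of intermediate results.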
AWS presented a tutorial on “Best practices for research HPC in the cloud,” emphasizing how cloud technologies that were developed primarily to support commercial applications have evolved to provide technical capabilities for high performance computing (HPC) while maintaining reliability, availability, and durability. HPC clusters can grow and shrink, and containers and Nextflow make it possible to execute both tightly coupled and loosely coupled workflows efficiently in the cloud. Together, these are the building blocks of research computing gateways on AWS. Check out our tutorial.
Beyond science: HPC for humanities
Many speak of a crisis in the humanities at universities, where students are turning away from majors such as art history or literature toward majors tied to the fastest-growing parts of the job market. But science gateways have moved beyond science, bringing that growing job market to the humanities.
The SnowVision gateway is a pioneering application of HPC to the humanities. Archaeologists recognized long ago that the designs on fragments of Native American pottery from the southeastern United States could be used to track populations and the evolution of artistic designs, but this pattern matching was done by hand and was slow and laborious. SnowVision uses advanced computer vision, accelerated by HPC, to automate this process, allowing researchers to upload a shard of pottery and match its design right away.
We also saw ML take an art history twist: a paper by Paul Rodriguez and colleagues described the potential application of deep learning to group historical paintings, opening up interesting questions about what would be possible if models could be trained on the features that trained art historians see.
Data and analytics have the power to impact every field, and this power can be realized as more digital collections (such as Oxford University’s Global Heritage Collections) are created using AWS to reduce the costs of digitization.