AWS Public Sector Blog
Emory University supports AI.Humanity initiative with high-performance computing on AWS
Emory University is a private research university in Atlanta, Georgia. Founded in 1836, Emory has nine academic divisions and is known for its scholarly engagement across the humanities, social and natural sciences, and medical disciplines, as well as for its strong scientific performance and citation impact. In 2022, Emory launched the AI.Humanity initiative to explore the societal impacts of artificial intelligence (AI) and influence its future development to serve humanity. Over the next few years, the initiative intends to recruit up to 60 new faculty, create a community of scholars, develop AI-focused educational programs, and advance the ethical use of AI. Emory aims to be a leading advocate for the ethical use of AI and a top destination for students and faculty seeking to understand and apply its transformative technologies.
To support the computing needs of AI.Humanity, Emory uses Amazon Web Services (AWS) to provide:
- Elastic computing: Researchers can spin up as many Amazon Elastic Compute Cloud (Amazon EC2) instances as needed to handle parallel processing and simulations, which is far more scalable than traditional on-premises hardware.
- GPU computing: Researchers have access to state-of-the-art graphics processing units (GPUs), such as NVIDIA A100 and H100 Tensor Core GPUs.
- Storage options: Emory uses high-speed file systems, like Amazon FSx for Lustre, to meet the performance needs of their high-performance computing (HPC) workloads, along with virtually unlimited storage capacity with services like Amazon Simple Storage Service (Amazon S3).
- Fast and convenient deployment: Tools like AWS ParallelCluster deploy complete HPC environments, including clusters, networking, and storage, in a matter of minutes, instead of the months needed for traditional on-premises HPC systems. AWS ParallelCluster also offers a familiar HPC user experience that hides much of the AWS Cloud complexity.
- Security and compliance: AWS supports security-sensitive standards like the Health Insurance Portability and Accountability Act (HIPAA), the Federal Risk and Authorization Management Program (FedRAMP), and the National Institute of Standards and Technology (NIST) 800-171 compliance requirements.
- Cost efficiency: Emory pays only for the HPC resources it uses and can adjust resources as needed with little lead time.
AWS makes it easy to create HPC environments tailored to research needs with flexible, scalable, and secure cloud resources, enabling researchers to accelerate discoveries.
This post aims to provide an overview of how AWS supports Emory’s roadmap for AI.Humanity and to provide a model for other research institutions to reference for building out a cloud-based HPC cluster on AWS.
Building an HPC solution for AI research on AWS
Emory considered several options that would provide scalable, cost-effective, and efficient HPC solutions for their current and future faculty. As the first step in their multifaceted infrastructure plan, they decided to move forward with a cloud-based HPC cluster running on AWS, which provided a valuable combination of elastic compute, access to the latest generation of GPUs, layered security and compliance, and cost efficiency. Emory worked on its HPC solution with AWS and SchedMD, the developer of Slurm, an open-source workload manager. Using AWS ParallelCluster, an open-source cluster management tool that helps quickly build HPC compute environments on AWS, Emory built a highly customized Slurm-based HPC cluster. Emory’s cluster uses Amazon S3, Amazon Elastic Block Store (Amazon EBS), and FSx for Lustre to serve highly parallelized, input/output (I/O)-intensive workloads. The cluster provides a familiar environment for researchers, with Slurm commands for job submission, a multi-partition setup, and Slurm accounting for job usage and cost tracking.
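To make the Slurm job-submission workflow concrete, the following is a minimal sketch of how a researcher might generate a batch script for a single-node GPU job. It is an illustration, not Emory's actual configuration: the partition name `gpu-a100`, the job name, and the `train.py` command are hypothetical placeholders. The `#SBATCH` directives used (`--job-name`, `--partition`, `--nodes`, `--gres`, `--time`, `--output`) are standard Slurm options.

```python
from dataclasses import dataclass


@dataclass
class GpuJob:
    """Parameters for a single-node GPU job (all values are illustrative)."""
    name: str
    partition: str  # hypothetical partition name, not Emory's actual setup
    gpus: int       # number of GPUs requested on the node
    hours: int      # wall-clock limit in whole hours
    command: str    # program to launch on the compute node


def render_sbatch(job: GpuJob) -> str:
    """Return the text of a Slurm batch script for the given job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --partition={job.partition}",
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{job.gpus}",
        f"#SBATCH --time={job.hours:02d}:00:00",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
        "",
        f"srun {job.command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 8 GPUs for 4 hours on an assumed "gpu-a100" partition.
    job = GpuJob(name="train-demo", partition="gpu-a100", gpus=8,
                 hours=4, command="python train.py")
    print(render_sbatch(job))
```

Saving the printed text to a file and passing it to `sbatch` would queue the job. Partition names and resource limits depend on how a given cluster is configured.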
Researchers at Emory can log on to the cluster and submit jobs from their university-owned laptops, on-premises workstations or servers, their on-premises HPC cluster, or Amazon EC2 instances running in their individual AWS accounts. The cloud cluster features two P4 instances (each with eight NVIDIA A100 GPUs), both purchased as Reserved Instances, along with Amazon EC2 instances of various sizes from the compute optimized, storage optimized, and accelerated computing families, spread across multiple Availability Zones.
During the proof-of-concept (POC) and early-adopter phases, Emory researchers used the cloud-based HPC cluster to scale complex projects. One researcher was able to complete distributed AI training with eight NVIDIA A100 GPUs on 21,517 single-frame images. This was not possible with the current on-premises cluster because the required GPU resources were not available in a timely or cost-effective manner. Since the production cloud-based HPC cluster was launched, a growing number of Emory researchers have been using the environment.
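As a rough illustration of what distributing 21,517 images across eight GPUs involves, the sketch below splits a dataset into per-GPU shards using strided index assignment. This is a generic data-parallel pattern shown under our own assumptions, not the researcher's actual training code, and the function names are hypothetical.

```python
def shard_sizes(num_items: int, num_workers: int) -> list[int]:
    """Split num_items across num_workers as evenly as possible."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(num_items, num_workers)
    # The first `extra` workers each take one additional item.
    return [base + 1 if rank < extra else base for rank in range(num_workers)]


def shard_indices(num_items: int, num_workers: int, rank: int) -> range:
    """Strided assignment: worker `rank` gets items rank, rank + N, rank + 2N, ..."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    return range(rank, num_items, num_workers)


if __name__ == "__main__":
    sizes = shard_sizes(21_517, 8)
    print(sizes)       # [2690, 2690, 2690, 2690, 2690, 2689, 2689, 2689]
    print(sum(sizes))  # 21517
    # Each GPU's strided shard has the size computed above.
    print(len(shard_indices(21_517, 8, rank=0)))  # 2690
    print(len(shard_indices(21_517, 8, rank=7)))  # 2689
```

With eight workers, each GPU processes roughly one eighth of the images per epoch. Some frameworks instead pad the dataset so that every worker receives the same number of samples.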
Security is a core tenet for Emory. One of the reasons they chose AWS was so they could build their HPC on infrastructure that is secure and meets a variety of compliance standards, such as HIPAA and NIST 800-171.
Emory’s AWS-powered HPC cluster illustrates the university’s commitment to support growing AI scholarship by increasing its scholarly computing and infrastructure services. The enhancements made to Emory’s HPC capabilities are an important component of the AI.Humanity initiative, facilitating Emory’s work to shape the AI revolution to better human health, generate economic value, and promote social justice.
Conclusion
AWS gains valuable insights from working with institutions like Emory University. These insights help inform how AWS services continue to evolve for the research community. Through continued collaboration, our work with Emory has the potential to produce repeatable solutions and best practices that other universities can adopt for HPC research. This highlights the deep commitment of AWS to furthering innovation in higher education and pushing the boundaries of what is possible in cloud computing.
Visit Getting Started with HPC on AWS to learn more. You can use the resources there to start experimenting with AWS via a sample project or tutorial, gain deeper insight through whitepapers and videos, or find an AWS Partner to get hands-on guidance.
You can also visit our AWS HPC Blog to learn of other use cases being built by AWS customers, as well as new features and development updates about AWS ParallelCluster and other HPC architectures.