AWS HPC Blog
Category: Compute
Instance sizes in the Amazon EC2 Hpc7 family – a different experience
Hpc7g is the first Amazon EC2 HPC instance offering with multiple instance sizes, but the experience is quite different from getting smaller instances in non-HPC instance families. Today, we want to take a moment to explore why it is different, and how that helps.
Application deep-dive into the AWS Graviton3E-based Amazon EC2 Hpc7g instance
In this post we’ll show you application performance and scaling results from Hpc7g, a new instance powered by AWS Graviton3E, across a wide range of HPC workloads and disciplines.
How SeatGeek simulates massive load with AWS Batch to prepare for big events
In this post we explore SeatGeek’s load testing system that simulates 50k simultaneous users. Originally built to prep SeatGeek for large-event traffic spikes, it now runs weekly to help them harden their code.
Customize Slurm settings with AWS ParallelCluster 3.6
With AWS ParallelCluster 3.6, you can specify Slurm settings directly in the cluster config file, which improves reproducibility and is another step towards self-documentation for your HPC infrastructure.
Protein Structure Prediction at Scale using AWS Batch
In this post, we discuss how Novo Nordisk approached the deployment of a scale-out HPC platform for running AlphaFold, while meeting their enterprise IT requirements and keeping the user experience simple.
Streamlining distributed ML workflow orchestration using Covalent with AWS Batch
Complicated multi-step workflows can be challenging to deploy, especially when using a variety of high-compute resources. Covalent is an open-source orchestration tool that streamlines the deployment of distributed workloads on AWS resources. In this post, we outline key concepts in Covalent and develop a machine learning workflow for AWS Batch in just a handful of steps.
Introducing GPU health checks in AWS ParallelCluster 3.6
AWS ParallelCluster 3.6.0 can now detect GPU failures in HPC and AI/ML tasks. Health checks run at the start of Slurm jobs, and if a check fails, the job is requeued on another instance. This can increase reliability and prevent wasted spend.
Benchmarking the Oxford Nanopore Technologies basecallers on AWS
Oxford Nanopore sequencers enable direct, real-time analysis of long DNA or RNA fragments. They work by monitoring changes to an electrical current as nucleic acids are passed through a protein nanopore. The resulting signal is decoded into the specific DNA or RNA sequence by compute-intensive algorithms called basecallers. This blog post presents benchmarking results for two of those Oxford Nanopore basecallers, Guppy and Dorado, on AWS. The benchmarking project was a collaboration between G42 Healthcare, Oxford Nanopore Technologies, and AWS.
Run Celery workers for compute-intensive tasks with AWS Batch
Many applications leverage distributed task systems like Celery to handle asynchronous work. In this post, we describe how to handle compute-intensive Celery tasks using AWS Batch to scale the compute resources and run worker agents.
Simulating climate risk scenarios for the Amazon Rainforest
In this post, we discuss the “tipping point” problem, using HPC at large scale to simulate the impact of deforestation on the risk of accelerating damage to the Amazon rainforest.