AWS HPC Blog
Category: AWS ParallelCluster
Instance sizes in the Amazon EC2 Hpc7 family – a different experience
Hpc7g is the first Amazon EC2 HPC instance offering with multiple instance sizes, but this is quite different from the experience of getting smaller instances from other non-HPC instance families. Today, we want to take a moment to explore why this is different, and how it helps.
Customize Slurm settings with AWS ParallelCluster 3.6
With AWS ParallelCluster 3.6, you can specify Slurm settings directly in the cluster config file, which improves reproducibility and takes another step towards self-documenting HPC infrastructure.
Introducing GPU health checks in AWS ParallelCluster 3.6
AWS ParallelCluster 3.6.0 can now detect GPU failures in HPC and AI/ML tasks. Health checks run at the start of Slurm jobs and if they fail, the job is requeued on another instance. This can increase reliability and prevent wasted spend.
Elastic visualization queues with NICE DCV in AWS ParallelCluster
In this blog post we’ll show you how to create an elastic pool of visualization nodes, by combining AWS ParallelCluster with NICE DCV in a novel way.
Checkpointing HPC applications using the Spot Instance two-minute notification from Amazon EC2
In this post we show you how to create an HPC cluster and capture the two-minute warning notifications from Amazon EC2 Spot to execute a checkpoint, reactively.
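As a rough illustration of the idea, here is a minimal Python sketch of one way to react to the notice from inside a Spot Instance: polling the instance metadata service for `spot/instance-action`, which returns HTTP 404 until an interruption is scheduled. This is an assumption about mechanism, not necessarily the approach the post takes. The helper names (`parse_instance_action`, `fetch_instance_action`, `watch`) and the `checkpoint` callback are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Link-local address of the EC2 instance metadata service (IMDS).
IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body, now=None):
    """Return (action, seconds_remaining) from a spot/instance-action JSON body."""
    notice = json.loads(body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout=2.0):
    """Return the notice body, or None when no interruption is scheduled (HTTP 404)."""
    # IMDSv2: obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def watch(checkpoint, poll_seconds=5):
    """Poll until a notice appears, then call the (hypothetical) checkpoint callback."""
    while True:
        body = fetch_instance_action()
        if body is not None:
            action, remaining = parse_instance_action(body)
            checkpoint(action, remaining)
            return
        time.sleep(poll_seconds)
```

Only `watch` and `fetch_instance_action` need to run on an EC2 instance; `parse_instance_action` is a pure function, so the time-remaining logic can be exercised anywhere with a sample notice body.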
Install optimized software with Spack configs for AWS ParallelCluster
Today, we’re announcing the availability of Spack configs for AWS ParallelCluster. You can use these configurations to install optimized HPC applications quickly and easily on your AWS-powered HPC clusters.
Deploying Open OnDemand with AWS ParallelCluster
In this post, we describe an integration of Open OnDemand with AWS ParallelCluster so admins can provide web-based access to HPC resources beyond what they have at their site, by using the AWS cloud to add new capabilities and extend capacity.
Multiple Availability Zones now supported in AWS ParallelCluster 3.4
In AWS ParallelCluster 3.4, you can now build HPC clusters that span multiple Amazon EC2 Availability Zones. In this post, we describe how the new feature works, how to use it, and some implications for cluster design that it raises.
Leveraging Slurm Accounting in AWS ParallelCluster
Slurm accounting adds flexibility, transparency, and control to operating an HPC cluster. AWS ParallelCluster 3.3.0 can now automatically configure Slurm accounting, whether you are using your own database or Amazon Aurora.
DCV in 2022: a year in review
In this post we recap the most significant features released in DCV during 2022 that delighted our customers. Of course, we’re still not done, so expect more in 2023.