AWS HPC Blog
Category: AWS ParallelCluster
Introducing a community recipe library for HPC infrastructure on AWS
Today we’re showing you our community library of HPC Recipes for AWS. It’s a public repository on GitHub that will help you achieve feature-rich, reliable HPC deployments ready to run your workloads, no matter where you’re starting from.
How Maxar builds short duration ‘bursty’ HPC workloads on AWS at scale
In this post, we hear from Maxar’s WeatherDesk team on how they deploy their HPC workloads using a “fail fast” software development technique so they can be sure of meeting customer deadlines for their business.
Bursting your HPC applications to AWS is now easier with Amazon File Cache and AWS ParallelCluster
Today we’re announcing the integration between Amazon File Cache and AWS ParallelCluster – super important for hybrid scenarios. We’ll show you how it works and how to deploy it.
The plumbing: best-practice infrastructure to facilitate HPC on AWS
If you want to build enterprise-grade HPC on AWS, what’s the best path to get started? Should you create a new AWS account and build from scratch? In this post we’ll walk you through the best practices for getting set up cleanly from the start.
Automate your clusters by creating self-documenting HPC with AWS ParallelCluster
Today we’re going to show you how you can automate cluster deployment and create self-documenting infrastructure at the same time, which leads to more repeatable results that are easier to manage (and replicate).
Running protein structure prediction at scale using a web interface for researchers
Today, we’ll show you our open-source sample implementation of a web frontend and cloud HPC backend to support researchers using AI tools like AlphaFold for drug discovery and design.
Instance sizes in the Amazon EC2 Hpc7 family – a different experience
Hpc7g is the first Amazon EC2 HPC instance offering with multiple instance sizes, but this is quite different from the experience of getting smaller instances from other non-HPC instance families. Today, we want to take a moment to explore why this is different, and how it helps.
Customize Slurm settings with AWS ParallelCluster 3.6
With AWS ParallelCluster 3.6, you can directly specify Slurm settings in the cluster config file – improving reproducibility and another step towards self-documentation for your HPC infrastructure.
Introducing GPU health checks in AWS ParallelCluster 3.6
AWS ParallelCluster 3.6.0 can now detect GPU failures in HPC and AI/ML tasks. Health checks run at the start of Slurm jobs and if they fail, the job is requeued on another instance. This can increase reliability and prevent wasted spend.
Elastic visualization queues with NICE DCV in AWS ParallelCluster
In this blog post we’ll show you how to create an elastic pool of visualization nodes, by combining AWS ParallelCluster with NICE DCV in a novel way.








