Category: AWS ParallelCluster
Slurm accounting adds flexibility, transparency, and control to operating an HPC cluster. AWS ParallelCluster 3.3.0 can now automatically configure Slurm accounting, whether you are using your own database or Amazon Aurora.
In this post we recap the most significant features released in DCV during 2022 that delighted our customers. Of course, we’re still not done, so expect more in 2023.
In this post we describe the process to launch large, self-supervised training jobs using AWS ParallelCluster and Facebook’s Vision Self-Supervised Learning (VISSL) library.
AWS ParallelCluster 3.3.0 now lets you define a list of Amazon EC2 instance types for resourcing a compute queue. This gives you more flexibility to optimize the cost and total time to solution of your HPC jobs, especially when capacity is limited or you’re using Spot Instances.
In this post, we highlight a little-known configuration option for Slurm on AWS ParallelCluster that can reduce costs and increase your iteration speed by preventing idle batch instances from launching when EC2 capacity is limited.
This post will help you understand the tools available to ease the stress of migrating your cluster (and your users) from SGE to Slurm, which is necessary since the HPC community is no longer supporting SGE’s open-source codebase.
A key part of the development of quantum hardware and quantum algorithms is simulation using existing classical architectures and HPC techniques. In this blog post, we describe how to perform large-scale quantum circuit simulations using AWS ParallelCluster with QuEST, the Quantum Exact Simulation Toolkit. We demonstrate a simple and rapid deployment of computational resources to simulate random quantum circuits with up to 44 qubits. We were able to allocate as many as 4,096 c5.18xlarge EC2 instances to simulate a non-trivial 44-qubit quantum circuit in less than 3.5 hours.
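To give a sense of why a 44-qubit simulation calls for thousands of instances, here is a rough back-of-envelope sketch. It assumes a full state-vector simulation with double-precision complex amplitudes (16 bytes each) split evenly across nodes; the function names are illustrative and are not part of QuEST or ParallelCluster, and the estimate ignores communication buffers and other overhead.

```python
# Back-of-envelope memory estimate for a full state-vector simulation.
# Assumption: each amplitude is a double-precision complex number (16 bytes).
BYTES_PER_AMPLITUDE = 16


def state_vector_bytes(num_qubits: int) -> int:
    """Total bytes needed to store all 2**num_qubits amplitudes."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


def per_node_gib(num_qubits: int, num_nodes: int) -> float:
    """Share of the state vector held by each node, in GiB (even split assumed)."""
    return state_vector_bytes(num_qubits) / num_nodes / 2**30


if __name__ == "__main__":
    total_tib = state_vector_bytes(44) / 2**40
    share_gib = per_node_gib(44, 4096)
    print(f"44 qubits: {total_tib:.0f} TiB total, {share_gib:.0f} GiB per node on 4096 nodes")
```

Under these assumptions, the state vector alone occupies 256 TiB, or 64 GiB per node when spread over 4,096 nodes. Each added qubit doubles that figure.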
In this blog post, we discuss the AWS solution that Amazon’s construction division used to conduct large-scale CFD fire simulations as part of their Fire Strategy solutions to demonstrate safety and fire mitigation strategies. We outline the five key steps taken that resulted in simulation times that were 15-20x faster than previous on-premises architectures, reducing the time to complete from up to twenty-one days to less than one day.
AWS ParallelCluster version 3.2 introduces support for two new Amazon FSx filesystem types (NetApp ONTAP and OpenZFS). It also lifts the limit on the number of filesystem mounts you can have on your cluster. We’ll show you how, and help you with the details for getting this going right away.
AWS ParallelCluster version 3.2 now supports memory-aware scheduling in Slurm to give you control over the placement of jobs with specific memory requirements. In this blog post, we’ll show you how it works, and explain why this will be really useful to people with memory-hungry workloads.