AWS ParallelCluster | AWS HPC Blog

Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances

Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.

Using machine learning to drive faster automotive design cycles

Aerospace and automotive companies are speeding up their product design using AI. In this post we’ll discuss how they’re using machine learning to shift design cycles from hours to seconds using surrogate models.

Announcing the High Performance Software Foundation (HPSF)

We’re excited to share how we’re involved in launching the High Performance Software Foundation to increase access to and adoption of HPC. By bringing together key players to collaborate, we can lower barriers and accelerate development of portable HPC software stacks.

Best practices for running molecular dynamics simulations on AWS Graviton3E

If you run molecular dynamics simulations, you need to read this. We walk through running benchmarks of popular apps like GROMACS and LAMMPS on new Hpc7g instances and Graviton3E processors. The results show up to 35% better vector performance versus Graviton3. Learn how to optimize your own workflows.

Data, emerging technologies, and the circular economy: how Accenture and AWS are unlocking environmental and business impact

Realizing the $4.5 trillion circular economy opportunity requires accurate data, scalable HPC and agile tools. Read this post to discover how AWS and Accenture partner for real progress.

Optimizing MPI application performance on Hpc7a by effectively using both EFA devices

Get the inside scoop on optimizing your MPI apps and configuration for AWS’s powerful new Hpc7a instances. Dual rail gives these instances huge networking potential at 300 Gb/s, if properly used. This post provides benchmarks, sample configs, and real speedup numbers to help you maximize network performance. Whether you run weather simulations, CFD, or other HPC workloads, you’ll find practical tips for your codes.

Build and deploy a 1 TB/s file system in under an hour

Want to set up a high-speed shared file system for your HPC or AI workloads in under an hour? Learn how with this new blog post.

Renewable energy transition: examining the impacts of wind energy through simulation

As we move towards a greener future, understanding wind energy’s climate impacts is key. Check out this blog post by our friends at Whiffle to learn how large-scale simulations reveal wind power’s effect on our atmosphere.

Choosing the right compute orchestration tool for your research workload

Running big research jobs on AWS but not sure where to start? We break down options like Batch, ECS, EKS, and others to pick the right tool for your needs. Lots of examples for genomics, ML, engineering, and more!

Protein language model training with NVIDIA BioNeMo framework on AWS ParallelCluster

In this new post, we discuss pre-training ESM-1nv for protein language modeling with NVIDIA BioNeMo on AWS. Learn how you can efficiently deploy and customize generative models like ESM-1nv on GPU clusters with ParallelCluster. Whether you’re studying protein sequences, predicting properties, or discovering new therapeutics, this post has tips to accelerate your protein AI workloads on the cloud.
