Today, we’re diving deep into the open-source frameworks that move MPI messages around, and showing you how work we did in the Open MPI and libfabric communities led to an improvement for EFA users – and everyone else, too.
In this post, we’ll show how generative AI, combined with conventional physics-based CFD, can create a rapid design process to explore new design concepts in automotive and aerospace from just a single image.
How Amazon’s Search M5 team optimizes compute resources and cost with fair-share scheduling on AWS Batch
In this post, we share how Amazon Search optimizes their use of accelerated compute resources using AWS Batch fair-share scheduling to schedule distributed deep learning workloads.
In this post, we show how Reezocar uses computer vision to change the way they detect damage and price used vehicles for re-sale in secondary markets. This reduces landfill and helps achieve the goals of the circular economy.
In this post, we’ll show you how the NFL used AWS to scale their ML workloads and produce the first comprehensive dataset of helmet impacts across multiple NFL seasons. They were able to reduce manual labor by 90%, and the results beat human labelers in accuracy by 12%!
In this post, we discuss the benefits of digital technology for the circular economy, and show how businesses can implement these technologies to get the most out of them for the wellbeing of everyone.
Complicated multi-step workflows can be challenging to deploy, especially when using a variety of high-compute resources. Covalent is an open-source orchestration tool that streamlines the deployment of distributed workloads on AWS resources. In this post, we outline key concepts in Covalent and develop a machine learning workflow for AWS Batch in just a handful of steps.
AWS ParallelCluster 3.6.0 can now detect GPU failures in HPC and AI/ML tasks. Health checks run at the start of Slurm jobs, and if a check fails, the job is requeued on another instance. This can increase reliability and prevent wasted spend.
Since launch, EFA has seen continuous improvements in performance. In this post, we talk about our 2nd generation of EFA, which takes another step in improving Machine Learning and High Performance Computing in the Cloud.
In this post we describe the process to launch large, self-supervised training jobs using AWS ParallelCluster and Facebook’s Vision Self-Supervised Learning (VISSL) library.