AWS HPC Blog

Category: Thought Leadership

Rearchitecting AWS Batch managed services to leverage AWS Fargate

AWS service teams continuously improve the underlying infrastructure and operations of managed services, and AWS Batch is no exception. The AWS Batch team recently moved most of their job scheduler fleet to a serverless infrastructure model leveraging AWS Fargate. I had a chance to sit with Devendra Chavan, Senior Software Development Engineer on the AWS Batch team, to discuss the move to AWS Fargate and its impact on the Batch managed scheduler service component.

Accelerating Genomics Pipelines Using Intel’s Open Omics Acceleration Framework on AWS

In this blog post, we showcase the first version of Open Omics and benchmark three applications used in processing NGS data: the sequence alignment tools BWA-MEM and minimap2, and single-cell ATAC-Seq, all running on Xeon-based Amazon Elastic Compute Cloud (Amazon EC2) instances.

Choosing between AWS Batch or AWS ParallelCluster for your HPC Workloads

It’s an understatement to say that AWS has a lot of services (more than 200 at the time of this post!). We’re usually the first to point out that there’s more than one way to solve a problem. HPC is no different in this regard, because we offer a choice: customers can run their HPC workloads using AWS […]

The Convergent Evolution of Grid Computing in Financial Services

The Financial Services industry makes significant use of high performance computing (HPC), but it tends to be in the form of loosely coupled, embarrassingly parallel workloads to support risk modelling. The infrastructure tends to scale out to meet ever-increasing demand as the analyses look at more and finer-grained data. At AWS we’ve helped many customers tackle scaling challenges and are noticing some common themes. In this post we describe how HPC teams are thinking about delivering compute capacity today, and highlight how we see the solutions converging in the future.

Supporting climate model simulations to accelerate climate science

Through the Amazon Sustainability Data Initiative (ASDI), AWS is donating cloud resources, technical support, and access to scalable infrastructure and fast networking, providing high performance computing solutions to support simulations of near-term climate using the National Center for Atmospheric Research (NCAR) Community Earth System Model Version 2 (CESM2) and its Whole Atmosphere Community Climate Model (WACCM). In collaboration with ASDI, AWS, and SilverLining, a nonprofit dedicated to ensuring a safe climate, NCAR will run an ensemble of 30 climate-model simulations on AWS. The climate runs will simulate the Earth system over the years 2022-2070 under a median scenario for warming, and the results will be made available through the AWS Open Data Program. The simulation work will demonstrate the ability to use cloud infrastructure to advance climate models in support of robust scientific studies by researchers around the world, and it aims to accelerate and democratize climate science.

Pushing pixels with NICE DCV

NICE DCV, our high-performance, low-latency remote-display protocol, was originally created for scientists and engineers who ran large workloads on far-away supercomputers, but needed to visualize data without moving it. Pushing pixels over limited bandwidth across the globe has been the goal of the DCV team since 2007. DCV was able to make very frugal use of very scarce bandwidth because it was super lean, used data-compression techniques, and quickly adopted the cutting-edge GPU technologies of the time (this is HPC, after all: we left nothing on the table when it came to exploiting new gadgets). This allowed the team to create a super lightweight visualization package that could stream pixels over almost any network. Fast forward to the 2020s, and a generation of gamers, artists, and film-makers all want to do the same thing as HPC researchers, only this time there are way more pixels, because we now have HD and 4K displays (and some people have multiple), and for most of them, it’s 60 frames per second, or it’s not worth having. Today we have around 12x the number of pixels, and around 3x the frame rate, compared to TV circa 2007. Fortunately, networking improved a lot in that time: a high-end user’s broadband connection grew around 60x in bandwidth, but the 120x growth in computing power really tipped the balance in favor of bringing remote streaming to the masses. Still, physics remains: the latency forced on us by the curvature of the earth and the speed of light is still a challenge. We haven’t beaten physics, but we’re making up for it by building our own global fiber network and adding more machinery (including in Local Zones and Wavelength Zones) to get closer to more customers as soon as we can.

In the search for performance, there’s more than one way to build a network

AWS worked backwards from an essential problem in HPC networking (MPI ranks need to exchange lots of data quickly) and found a different solution for our unique circumstances, without trading off the things customers love the most about cloud: that you can run virtually any application, at scale, and right away. Find out more about how Elastic Fabric Adapter (EFA) can help your HPC workloads scale on AWS.