AWS HPC Blog

Category: High Performance Computing

Pushing pixels with NICE DCV

NICE DCV, our high-performance, low-latency remote-display protocol, was originally created for scientists and engineers who ran large workloads on far-away supercomputers but needed to visualize data without moving it. Pushing pixels over limited bandwidth across the globe has been the goal of the DCV team since 2007. DCV made very frugal use of very scarce bandwidth because it was lean, used data-compression techniques, and quickly adopted the cutting-edge GPU technologies of the day (this is HPC, after all: we left nothing on the table when it came to exploiting new gadgets). This allowed the team to create a super light-weight visualization package that could stream pixels over almost any network.

Fast forward to the 2020s, and a generation of gamers, artists, and film-makers all want to do the same thing as HPC researchers, only this time there are way more pixels, because we now have HD and 4K displays (and some people have several), and for most of them it’s 60 frames per second or it’s not worth having. Today we have around 12x the number of pixels and around 3x the frame rate compared to TV circa 2007. Fortunately, networking improved a lot in that time: a high-end user’s broadband connection grew around 60x in bandwidth, but the 120x growth in computing power really tipped the balance in favor of bringing remote streaming to the masses. Physics, though, remains: the latency imposed by the curvature of the earth and the speed of light is still a challenge. We haven’t beaten physics yet, but we’re making up for it by building our own global fiber network and adding more machinery (including in Local Zones and Wavelength Zones) to get closer to more customers as soon as we can.

Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx

Batch processing is a common need across varied machine learning use cases such as video production, financial modeling, drug discovery, and genomic research. The elasticity of the cloud provides efficient ways to scale and simplify batch processing workloads while cutting costs. In this post, you’ll learn a scalable and cost-effective approach to configuring AWS Batch array jobs to process datasets that are stored on Amazon S3 and presented to compute instances through Amazon FSx for Lustre.
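
To make the pattern concrete (this is an illustrative sketch, not code from the post), an array job can be submitted with boto3, and each child job can use its array index to pick one shard of the dataset exposed through the FSx for Lustre mount. The queue name, job definition, and /fsx path below are placeholders.

# Sketch only: submit an AWS Batch array job whose children each process one
# shard of a dataset. The queue/definition names and /fsx path are placeholders.
import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="ml-batch-inference",
    jobQueue="my-batch-queue",          # placeholder job queue
    jobDefinition="my-job-definition",  # container image with the FSx for Lustre mount
    arrayProperties={"size": 100},      # one child job per dataset shard
)
print("Submitted array job:", response["jobId"])

# Inside the container, each child job reads AWS_BATCH_JOB_ARRAY_INDEX to
# select its shard, for example:
#
#   import os
#   idx = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
#   shard_path = f"/fsx/dataset/shard-{idx:04d}"   # hypothetical FSx mount layout
#   process(shard_path)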

In the search for performance, there’s more than one way to build a network

AWS worked backwards from an essential problem in HPC networking (MPI ranks need to exchange lots of data quickly) and found a different solution for our unique circumstances, without trading off the things customers love most about the cloud: that you can run virtually any application, at scale, and right away. Find out more about how Elastic Fabric Adapter (EFA) can help your HPC workloads scale on AWS.
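
For context on the kind of traffic EFA is built to accelerate, here is a toy mpi4py sketch (not from the post) of neighboring ranks exchanging buffers in a ring; any MPI implementation will run it, with or without EFA underneath.

# Toy illustration of rank-to-rank data exchange, the pattern EFA accelerates.
# Requires mpi4py and numpy. Run with, e.g.: mpirun -n 4 python exchange.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank sends a buffer to its right-hand neighbor and receives from the left.
sendbuf = np.full(1_000_000, rank, dtype=np.float64)
recvbuf = np.empty_like(sendbuf)
right = (rank + 1) % size
left = (rank - 1) % size

comm.Sendrecv(sendbuf, dest=right, recvbuf=recvbuf, source=left)
print(f"rank {rank} received data from rank {left}: {recvbuf[0]}")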

Getting started with containers in HPC at ISC’21

Containers are rapidly maturing within the high performance computing (HPC) community and we’re excited to be part of the movement: listening to what customers have to say and feeding this back to both the community and our own product and service teams. Containerization has the potential to unblock HPC environments, so AWS ParallelCluster and container-native schedulers like AWS Batch are moving quickly to reflect the best practices developed by the community and our customers. This year marks the seventh consecutive year we are hosting the ‘High Performance Container Workshop’ at the ISC High Performance 2021 conference (ISC’21). The workshop will take place on July 2nd at 2PM CEST (7AM CDT). The full program for the workshop is available on the High Performance Container Workshop page at https://hpcw.github.io/

Building highly-available HPC infrastructure on AWS

In this blog post, we will explain how to launch highly available HPC clusters across an AWS Region. The solution is deployed using the AWS Cloud Development Kit (AWS CDK), a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation, hiding the complexity of integration between the components.
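
As a flavor of what defining infrastructure in code looks like (a minimal sketch, not the post’s actual stack), the snippet below uses the CDK v2 Python bindings to define a VPC spread across multiple Availability Zones, the kind of network foundation a highly available cluster sits on. The stack and construct names are placeholders.

# Minimal CDK v2 (Python) sketch: a VPC spanning multiple Availability Zones
# as a foundation for highly available HPC infrastructure. Names are placeholders.
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from constructs import Construct

class HpcNetworkStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Spread subnets across up to three Availability Zones in the Region.
        self.vpc = ec2.Vpc(self, "HpcVpc", max_azs=3)

app = App()
HpcNetworkStack(app, "HpcNetworkStack")
app.synth()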

Accelerating research and development of new medical treatments with HPC on AWS

Today, more than 290,000 researchers in France are working to provide better support and care for patients through modern medical treatment. To fulfill their mission, these researchers must be equipped with powerful tools. At AWS, we believe that technology has a critical role to play in medical research. Why? Because technology can take advantage of the significant amount of data generated in the healthcare system and in the research community to enable more accurate diagnoses and better treatments for many existing and future diseases. To support elite research in France, we are proud to be a sponsor of two French organizations: Gustave Roussy and Sorbonne University. AWS is providing them with the computing power and machine learning technologies needed to accelerate cancer research and develop a treatment for COVID-19.

Training forecasters to warn of severe hazardous weather on AWS

Training users on how to use high performance computing resources, and the data those analyses produce, is an essential function of most research organizations. Having a robust, scalable, and easy-to-use platform for on-site and remote training is becoming a requirement for creating a community around your research mission. A great example of this comes from the NOAA National Weather Service Warning Decision Training Division (WDTD), which develops and delivers training on the integrated elements of the hazardous weather warning process within a National Weather Service (NWS) forecast office. In collaboration with the University of Oklahoma’s Cooperative Institute for Mesoscale Meteorological Studies (OU/CIMMS), WDTD conducts its flagship course, the Radar and Applications Course (RAC), for forecasters issuing warnings for flash floods, severe thunderstorms, and tornadoes. Trainees learn the warning process, the science and application of conceptual models, and the technical aspects of analyzing radar and other weather data in the Advanced Weather Interactive Processing System (AWIPS).

Figure 1. A map of the USA showing the location of RITC participants and instructors for the Spring 2021 RITC Workshops.

AWS joins Arm to support Arm-HPC hackathon this summer

Arm and AWS are calling all grad students and post-docs who want to gain experience advancing the adoption of the Arm architecture in HPC to join a world-wide community effort led by the Arm HPC Users Group (A-HUG).
The event will take the form of a hackathon this summer and is aimed at getting open-source HPC codes to build and run fast on Arm-based processors, specifically AWS Graviton2.
To make it a bit more exciting, A-HUG will be awarding an Apple M1 MacBook to each member of the team (max. 4 people) that contributes the most back to the Arm HPC community.

Numerical weather prediction on AWS Graviton2

The Weather Research and Forecasting (WRF) model is a numerical weather prediction (NWP) system designed to serve both atmospheric research and operational forecasting needs. With the release of Arm-based AWS Graviton2 instances for Amazon Elastic Compute Cloud (Amazon EC2), a common question has been how these instances perform on large-scale NWP workloads. In this blog post, we present results from a standard WRF benchmark simulation and compare performance across three different instance types.
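
Not from the post itself, but for readers reproducing this kind of comparison: WRF writes a per-timestep timing line to its rsl.error.0000 log, and a small script like the hypothetical sketch below can average those timings so runs on different instance types can be compared on an equal footing.

# Hypothetical helper (not from the post): average WRF's per-timestep timings
# from an rsl.error.0000 log so runs on different instance types can be compared.
# Assumes the standard "Timing for main: ... : X.XXXXX elapsed seconds" lines.
import re
import statistics
import sys

TIMING_RE = re.compile(r"Timing for main: .*?:\s*([0-9.]+)\s+elapsed seconds")

def mean_step_time(log_path: str) -> float:
    """Return the mean wall-clock seconds per WRF model timestep."""
    with open(log_path) as log:
        times = [float(m.group(1)) for line in log if (m := TIMING_RE.search(line))]
    if not times:
        raise ValueError(f"no timing lines found in {log_path}")
    return statistics.mean(times)

if __name__ == "__main__":
    # Usage: python wrf_timing.py rsl.error.0000
    print(f"mean seconds per timestep: {mean_step_time(sys.argv[1]):.3f}")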