AWS HPC Blog

Building a Scalable Predictive Modeling Framework in AWS – Part 3

In this final part of this three-part blog series on building predictive models at scale in AWS, we will use the synthetic dataset and the models generated in the previous post to showcase the model updating and sensitivity analysis capabilities of the aws-do-pm framework.

Building a Scalable Predictive Modeling Framework in AWS – Part 2

In the first part of this three-part blog series, we introduced the aws-do-pm framework for building predictive models at scale in AWS. In this post, we showcase a sample application that uses the aws-do-pm framework to predict the life of batteries in a fleet of electric vehicles.

Building a Scalable Predictive Modeling Framework in AWS – Part 1

Predictive models have powered the design and analysis of real-world systems such as jet engines, automobiles, and powerplants for decades. These models are used to provide insights on system performance and to run simulations at a fraction of the cost of experiments with physical hardware. In this first post of three, we describe the motivation and general architecture of the open-source aws-do-pm framework project for building predictive models at scale in AWS.

Running large-scale CFD fire simulations on AWS for Amazon.com

In this blog post, we discuss the AWS solution that Amazon’s construction division used to conduct large-scale CFD fire simulations as part of their Fire Strategy solutions to demonstrate safety and fire mitigation strategies. We outline the five key steps taken that resulted in simulation times that were 15-20x faster than previous on-premises architectures, reducing the time to complete from up to twenty-one days to less than one day.

Call for participation: RADIUSS Tutorial Series

Lawrence Livermore National Laboratory (LLNL) and AWS are joining forces to provide a training opportunity for emerging HPC tools and applications. RADIUSS (Rapid Application Development via an Institutional Universal Software Stack) is a broad suite of open-source software projects originating from LLNL. Together we are hosting a tutorial series to give attendees hands-on experience with these cutting-edge technologies. Find out how to participate in these events in this blog post.

Understanding the AWS Batch termination process

In this blog post, we help you understand the AWS Batch job termination process and how you can gracefully terminate a job by capturing the SIGTERM signal inside the application, which gives you an efficient way to exit your Batch jobs. You will also learn how job timeouts occur and how the retry operation works with both traditional AWS Batch jobs and array jobs.
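As a rough illustration of what capturing SIGTERM inside an application can look like, here is a minimal Python sketch. It is not taken from the post: the `GracefulExit` class and `run_job` function are hypothetical names, and the sketch assumes the job can be split into work items so it can stop cleanly between them.

```python
import signal


class GracefulExit:
    """Hypothetical helper: records that a termination signal arrived."""

    def __init__(self):
        self.stop_requested = False

    def handle(self, signum, frame):
        # Only set a flag here; the main loop decides when to stop.
        self.stop_requested = True


def run_job(work_items, process, state):
    """Process items one by one, stopping early once a stop is requested."""
    completed = []
    for item in work_items:
        if state.stop_requested:
            # Assumption: this is where a real job would flush buffers
            # or write a checkpoint before exiting.
            break
        completed.append(process(item))
    return completed


state = GracefulExit()
# Register the handler so SIGTERM sets the flag instead of killing the process.
signal.signal(signal.SIGTERM, state.handle)
```

The handler only flips a flag, so the item being processed finishes before the loop exits. Whether that is fast enough depends on how long a single item takes relative to the grace period the job is given before it is forcibly stopped.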

Bayesian ML Models at Scale with AWS Batch

Ampersand is a data-driven TV advertising technology company that provides aggregated TV audience impression insights and planning on 42 million households, in every media market, across more than 165 networks and apps and in all dayparts (broadcast day segments). The Ampersand Data Science team estimated that building their statistical models would require up to 600,000 physical CPU hours, which would not be feasible without a massively parallel, large-scale architecture in the cloud. AWS Batch enabled Ampersand to reduce their computation time by more than 500x through massive scaling while optimizing their costs using Amazon EC2 Spot. In this blog post, we will provide an overview of how Ampersand built their TV audience impressions (“impressions”) models at scale on AWS, review the architecture they have been using, and discuss the optimizations they made to run their workload efficiently on AWS Batch.

Introducing the Spack Rolling Binary Cache hosted on AWS

Today we’re excited to announce the availability of a new public Spack Binary Cache. Through a collaboration between AWS, E4S, Kitware, and Lawrence Livermore National Laboratory (LLNL), Spack users now have access to a public build cache hosted on Amazon S3. Using this Binary Cache will result in up to 20x faster install times for common Spack packages.

Benchmarking NVIDIA Clara Parabricks Somatic Variant Calling Pipeline on AWS

Somatic variants are genetic alterations that are not inherited but acquired during one’s lifespan, for example, those present in cancer tumors. In this post, we will demonstrate how to perform somatic variant calling from matched tumor and normal genome sequence data, as well as tumor-only whole genome and whole exome datasets, using an NVIDIA GPU-accelerated Parabricks pipeline, and compare the results with baseline CPU-based workflows.