AWS HPC Blog
Tag: Machine Learning
Announcing expanded support for Custom Slurm Settings in AWS Parallel Computing Service
Today we’re excited to announce expanded support for custom Slurm settings in AWS Parallel Computing Service (PCS). With this launch, PCS now enables you to configure over 65 Slurm parameters. And for the first time, you can also apply custom settings to queue resources, giving you partition-specific control over scheduling behavior. This release responds directly […]
How DTN accelerates operational weather prediction using NVIDIA Earth-2 on AWS
Cyclone chasing just got a whole lot smarter! Check out how DTN’s AI-powered weather model is rewriting the forecast. Brace yourself for the future of weather prediction.
Announcing Capacity Blocks support for AWS Parallel Computing Service
This post was contributed by Kareem Abdol-Hamid and Kyle Bush. Today we’re happy to announce that Amazon EC2 Capacity Blocks for Machine Learning are now supported in AWS Parallel Computing Service (AWS PCS). This allows you to reserve and schedule GPU-accelerated Amazon EC2 instances for future use. That includes the NVIDIA Hopper GPU […]
Scale Reinforcement Learning with AWS Batch Multi-Node Parallel Jobs
Autonomous robots are increasingly used across industries, from warehouses to space exploration. While developing these robots requires complex simulation and reinforcement learning (RL), setting up training environments can be challenging and time-consuming. AWS Batch multi-node parallel (MNP) infrastructure, combined with NVIDIA Isaac Lab, offers a solution by providing scalable, cost-effective robot training capabilities for sophisticated behaviors and complex tasks.
Enhancing Equity Strategy Backtesting with Synthetic Data: An Agent-Based Model Approach – part 2
Developing robust investment strategies requires thorough testing, but relying solely on historical data can introduce biases and limit your insights. Learn how synthetic data from agent-based models can provide an unbiased testbed to systematically evaluate your strategies and prepare for future market scenarios. Part 2 covers implementation details and results.
Enhancing Equity Strategy Backtesting with Synthetic Data: An Agent-Based Model Approach
Developing robust investment strategies requires thorough testing, but relying solely on historical data can introduce biases and limit your insights. Learn how synthetic data from agent-based models can provide an unbiased testbed to systematically evaluate your strategies and prepare for future market scenarios. Part 1 of 2 covers the theoretical foundations of the approach.
Three recipes you don’t want to miss for AWS Parallel Computing Service
AWS Parallel Computing Service now supports AWS CloudFormation, enabling you to deploy and scale HPC workloads as code. Check out our open-source HPC Recipes Library for quick cluster deployments.
Smashing computational barriers: data-driven ball-impact modeling on AWS
Elevate your engineering capabilities with lightning-fast impact prediction. Our new blog post delves into how advanced ML models, like U-Nets and Fourier Neural Operators, are revolutionizing transient response forecasting for critical industries like consumer electronics, automotive, and aerospace. Gain a competitive edge by integrating these cutting-edge techniques.
Integrating Research and Engineering Studio in Trusted Research Environments built on AWS
Explore how Research and Engineering Studio on AWS (RES) enables admins to build Trusted Research Environments (TREs) with built-in security and compliance controls. Learn more in our latest blog post.
Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS
LLMs are scaling exponentially. Learn how advanced technologies like Triton, TRT-LLM and EKS enable seamless deployment of models like the 405B parameter Llama 3.1. Let’s go large.