Author: Randy Seamans

Randy is an industry storage veteran and a Principal Storage Specialist and advocate for AWS, specializing in High Performance Storage, Artificial Intelligence (HPC/AI), Enterprise Storage, and Disaster Recovery. For more Storage Insights and Fun, follow him at https://www.linkedin.com/in/storageperformance.

Accelerate LLM model loading and increase context windows with GPUDirect on Amazon FSx for Lustre and TurboQuant

If you’re iterating on deploying large language models (LLMs) on AWS GPU instances, you’ve probably noticed that the larger the model to be loaded into GPU High Bandwidth Memory (HBM), the longer the painful wait until the GPUs are ready for inference. As models grow to hundreds of billions of parameters and GPU environments grow ever […]

Artificial Intelligence
