High Performance Computing | Artificial Intelligence

Accelerate LLM model loading and increase context windows with GPUDirect on Amazon FSx for Lustre and TurboQuant

If you’re iterating on deploying large language models (LLMs) on AWS GPU instances, you’ve probably noticed the larger the model to be loaded into GPU High Bandwidth Memory (HBM), the longer the painful wait until the GPUs are ready for inference. As models grow to hundreds of billions of parameters and GPU environments grow ever […]

Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch

In this post, we walk through building a scalable, event-driven transcription pipeline that automatically processes audio files uploaded to Amazon Simple Storage Service (Amazon S3), and show you how to use Amazon EC2 Spot Instances and buffered streaming inference to further reduce costs.

Training Llama 3.3 Swallow: A Japanese sovereign LLM on Amazon SageMaker HyperPod

The Institute of Science Tokyo has successfully trained Llama 3.3 Swallow, a 70-billion-parameter large language model (LLM) with enhanced Japanese capabilities, using Amazon SageMaker HyperPod. The model demonstrates superior performance in Japanese language tasks, outperforming GPT-4o-mini and other leading models. This technical report details the training infrastructure, optimizations, and best practices developed during the project.

A review of purpose-built accelerators for financial services

In this post, we aim to provide business leaders with a non-technical overview of purpose-built accelerators (PBAs) and their role within the financial services industry (FSI).

Simple guide to training Llama 2 with AWS Trainium on Amazon SageMaker

Large language models (LLMs) are making a significant impact in the realm of artificial intelligence (AI). Their impressive generative abilities have led to widespread adoption across various sectors and use cases, including content generation, sentiment analysis, chatbot development, and virtual assistant technology. Llama2 by Meta is an example of an LLM offered by AWS. Llama […]

Artificial Intelligence

Category: High Performance Computing

Accelerate LLM model loading and increase context windows with GPUDirect on Amazon FSx for Lustre and TurboQuant

Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch

Training Llama 3.3 Swallow: A Japanese sovereign LLM on Amazon SageMaker HyperPod

A review of purpose-built accelerators for financial services

Simple guide to training Llama 2 with AWS Trainium on Amazon SageMaker

Learn

Resources

Developers

Help