Amazon Web Services

In this video, AWS machine learning specialist Emily Webber explores options for deploying foundation models on AWS, focusing on Amazon SageMaker. She covers online, offline, queued, embedded, and serverless application types and explains the tradeoffs of each. The video demonstrates how to host a model distributed across multiple accelerators and how to improve performance with techniques such as model compression. Emily walks through deploying the 176-billion-parameter BLOOM model with SageMaker's large model inference (LMI) container, discusses key concepts such as tensor parallelism, and offers practical tips for efficient model deployment and serving. The video concludes with a demo of invoking the deployed model for inference.
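To make tensor parallelism concrete, here is a minimal, illustrative sketch in plain Python. It is not how the SageMaker LMI container implements sharding. It only shows the core idea: a layer's weight matrix is split by columns into shards, each "device" (here just a Python list) computes its slice of the output, and the slices are concatenated, which corresponds to an all-gather step on real accelerators. The helper names (`split_columns`, `tensor_parallel_matmul`, `weight_memory_gb`) are hypothetical, and the memory estimate assumes 2 bytes per parameter (fp16/bf16 weights) while ignoring activations and the KV cache.

```python
def matmul(x, w):
    """Multiply a row vector x (length k) by a k x n matrix w (list of rows)."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]


def split_columns(w, num_shards):
    """Split the columns of w into num_shards contiguous, equal-size shards."""
    n = len(w[0])
    assert n % num_shards == 0, "columns must divide evenly across shards"
    size = n // num_shards
    return [[row[s * size:(s + 1) * size] for row in w] for s in range(num_shards)]


def tensor_parallel_matmul(x, w, num_shards):
    """Each shard computes its slice of the output independently; the slices
    are then concatenated (the all-gather step on real hardware)."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    return [value for part in partials for value in part]


def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate weight memory in GB (1 GB = 1e9 bytes); fp16 by default."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    x = [1, 2]
    w = [[1, 2, 3, 4], [5, 6, 7, 8]]
    # The sharded result matches the unsharded one: [11, 14, 17, 20]
    print(tensor_parallel_matmul(x, w, num_shards=2))
    # Weights alone for a model with roughly 175 billion parameters
    total = weight_memory_gb(175e9)
    print(total, total / 8)  # 350.0 GB in total, 43.75 GB per shard across 8
```

The arithmetic explains why a model of this size must be distributed: about 350 GB of fp16 weights cannot fit on a single 80 GB accelerator, but split eight ways each shard holds roughly 43.75 GB of weights, leaving headroom for activations and cache on each device.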

product-information
skills-and-how-to
generative-ai
ai-ml
compute