Amazon Web Services

In this video, AWS machine learning specialist Emily Webber surveys the options for deploying foundation models on AWS, focusing on Amazon SageMaker. She compares online, offline, queued, embedded, and serverless application types and the tradeoffs among them, then shows how to host distributed models across multiple accelerators and improve performance through techniques such as model compression. The hands-on portion walks through deploying the 176-billion-parameter BLOOM model using SageMaker's large model inference (LMI) container, covering key concepts such as tensor parallelism along with practical tips for efficient model deployment and serving. The video concludes with a demo of invoking the deployed endpoint for inference.
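As a rough illustration of the configuration the video describes, the sketch below builds the kind of settings the LMI container reads from a `serving.properties` file. The engine name, model ID, and values are assumptions for illustration, not the exact ones used on screen; tensor parallelism shards each weight matrix across accelerators, so a 176B-parameter model is typically split 8 ways across the 8 GPUs of an ml.p4d.24xlarge instance.

```python
# Sketch only: illustrative serving.properties values for SageMaker's
# large model inference (LMI) container. All values are assumptions.
serving_properties = {
    "engine": "DeepSpeed",                  # inference engine inside the LMI container
    "option.model_id": "bigscience/bloom",  # Hugging Face Hub model to download and shard
    "option.tensor_parallel_degree": 8,     # one model shard per GPU on an 8-GPU instance
    "option.dtype": "fp16",                 # half precision roughly halves memory per shard
}

# The LMI container consumes these as key=value lines in serving.properties:
lines = [f"{k}={v}" for k, v in serving_properties.items()]
print("\n".join(lines))
```

In practice this file is packaged with the model artifacts and passed to a SageMaker endpoint; the tensor parallel degree must divide the number of accelerators on the chosen instance type.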

Topics: product-information, skills-and-how-to, generative-ai, ai-ml, compute

Up Next

- Building Intelligent Chatbots: Integrating Amazon Lex with Bedrock Knowledge Bases for Enhanced Customer Experiences (18:11, Nov 22, 2024)
- The State of Generative AI: Unlocking Trillion-Dollar Business Value Through Responsible Implementation and Workflow Reimagination (21:56, Nov 22, 2024)
- AWS Summit Los Angeles 2024: Unleashing Generative AI's Potential - Insights from Matt Wood and Industry Leaders (1:19:03, Nov 22, 2024)
- AWS Clean Rooms ML and Differential Privacy: Revolutionizing Secure Data Collaboration (58:49, Nov 22, 2024)
- Unlocking Business Value with Generative AI: Key Use Cases and Implementation Strategies (50:05, Nov 22, 2024)