Author: Wangpeng An

ByteDance processes billions of daily videos using their multimodal video understanding models on AWS Inferentia2

At ByteDance, we collaborated with Amazon Web Services (AWS) to deploy multimodal large language models (LLMs) for video understanding using AWS Inferentia2 across multiple AWS Regions around the world. By using sophisticated ML algorithms, the platform efficiently scans billions of videos each day. In this post, we discuss the use of multimodal LLMs for video understanding, the solution architecture, and techniques for performance optimization.
