AWS Architecture Blog

Moises Hernandez

Moises Hernandez is an AI engineering manager specializing in computational optimization for video generation. His expertise spans high-performance computing, deep learning framework development, GPU optimization, financial quantitative analysis, natural language processing, and neuroscience imaging. He holds a Ph.D. in Neuroscience from the University of Oxford.

How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances

This post introduces a video decoding optimization technique, developed in collaboration with the Synthesia Research Engineering team, that we call the Asynchronous Frame Generation Pipeline. The technique overlaps GPU compute, device-to-host (D2H) data transfer, and host-side post-processing. As an example, we apply it to the VAE decoder of a Wan video generation model. In our benchmarks on G7e instances, GPU kernel utilization rose from 82% to 99.9%, which reduced video decoding latency by 8.2% and increased throughput accordingly. We expect the technique to benefit any customer whose chunked video generation pipeline transfers frames to host memory.
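To make the overlap idea concrete, here is a minimal, GPU-free sketch of a three-stage pipeline in plain Python. It is an illustration under stated assumptions, not the implementation described in this post: the functions `decode_chunk`, `copy_to_host`, and `postprocess` are hypothetical stand-ins that simulate GPU decoding, the D2H transfer, and host-side post-processing with `time.sleep`, and the stage durations are invented. In a real pipeline the overlap would typically come from asynchronous GPU copies rather than Python threads.

```python
import queue
import threading
import time


def decode_chunk(i):
    """Hypothetical stand-in for decoding one chunk of frames on the GPU."""
    time.sleep(0.02)  # invented duration
    return f"frames-{i}"


def copy_to_host(frames):
    """Hypothetical stand-in for the device-to-host (D2H) transfer."""
    time.sleep(0.01)  # invented duration
    return frames + "-host"


def postprocess(frames):
    """Hypothetical stand-in for host-side post-processing."""
    time.sleep(0.01)  # invented duration
    return frames + "-done"


def run_sequential(num_chunks):
    """Baseline: each chunk finishes all three stages before the next starts."""
    return [postprocess(copy_to_host(decode_chunk(i))) for i in range(num_chunks)]


def run_pipelined(num_chunks, depth=2):
    """Overlapped: decoding chunk i+1 proceeds while chunk i is copied and post-processed."""
    to_copy = queue.Queue(maxsize=depth)  # bounded, so decode cannot run far ahead
    to_post = queue.Queue(maxsize=depth)
    results = []

    def copier():
        # One thread per stage plus FIFO queues keeps chunks in order.
        while (item := to_copy.get()) is not None:
            to_post.put(copy_to_host(item))
        to_post.put(None)  # forward the end-of-stream marker

    def postprocessor():
        while (item := to_post.get()) is not None:
            results.append(postprocess(item))

    workers = [threading.Thread(target=copier), threading.Thread(target=postprocessor)]
    for worker in workers:
        worker.start()
    for i in range(num_chunks):
        to_copy.put(decode_chunk(i))  # the "GPU" stage runs on the main thread
    to_copy.put(None)  # signal end of stream
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    for name, fn in [("sequential", run_sequential), ("pipelined", run_pipelined)]:
        start = time.perf_counter()
        fn(8)
        print(f"{name}: {time.perf_counter() - start:.3f} s")
```

Running the script prints the wall-clock time of both variants. In this simulation the pipelined run should approach the duration of the slowest stage times the number of chunks, because the copy and post-processing stages are hidden behind decoding. The bounded queues are a deliberate choice: they cap how many decoded chunks can be in flight at once, which in a real system would limit buffer memory.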