AWS for M&E Blog
Snap optimizes Bitmoji rendering with NVIDIA L4 and Amazon G6 Instances
This blog was co-authored by Wenzhou Wang, Tech Lead, Bitmoji at Snap Inc.
Snapchat’s Bitmoji allows users to create personalized avatars that mirror their appearance and style, adding a unique, expressive dimension to their online interactions. With millions of rendering requests generated every minute across Chats, Stories, and AR experiences, Snap needed an infrastructure capable of handling high volumes efficiently and cost-effectively.
We’ll outline how Snap leveraged Amazon EC2 G6 instances with NVIDIA L4 GPUs to optimize their Bitmoji rendering pipeline for improved performance and cost-effectiveness.

Figure 1: 3D avatars generated by Snap’s Bitmoji system, showcasing expressive, personalized characters for Snapchat.
Challenges with previous infrastructure
The Bitmoji pipeline initially relied on Amazon EC2 G5 instances with NVIDIA A10 Tensor Core GPUs. However, Bitmoji rendering’s moderate GPU requirements often left the high-end capabilities of the NVIDIA A10 underutilized, leading to elevated costs. Snap sought a more balanced solution, with GPU performance better aligned with Bitmoji’s needs, that could process eight million requests per minute cost-efficiently. This prompted the inclusion of G6 instances.
Transition to Amazon G6 instances with NVIDIA L4
The G6 instances, equipped with third-generation AMD EPYC processors and NVIDIA L4 GPUs, provided Snap with the ideal balance of CPU and GPU performance. The NVIDIA L4 offers 24 GB of GPU memory, which fits the resource caching needs of Snap’s rendering scenario. The switch boosted throughput from 4,600 requests/min to 6,100 requests/min and reduced P90 latency by 43%, from 162 ms to 91 ms.
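To make the benchmark figures above easier to compare, here is a small arithmetic sketch in Python. It uses only the numbers quoted in this post; the helper name `percent_change` is ours, not part of Snap’s tooling.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in this post.
throughput_change = percent_change(4600, 6100)  # requests/min
latency_change = percent_change(162, 91)        # P90 latency in ms

print(f"Throughput change:  {throughput_change:+.1f}%")  # +32.6%
print(f"P90 latency change: {latency_change:+.1f}%")     # -43.8%
```

The latency result of roughly 43.8% matches the quoted 43% figure when truncated rather than rounded, and the throughput gain works out to about a third.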
As a result, Snap configured their Kubernetes autoscaler to prioritize G6 instances for deploying model assembly and rendering service pods. This reduced costs by 50% while maintaining benchmarked performance.
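This post does not say which autoscaler or configuration Snap uses, so the following is only a minimal Python sketch of the general idea: prefer one instance family and fall back to another when capacity is short. The function name, the preference tuple, and the capacity numbers are all hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical preference order: try G6 first, then fall back to G5.
PREFERRED_FAMILIES = ("g6", "g5")


def choose_family(
    available_nodes: Dict[str, int],
    nodes_needed: int,
    preferences: Sequence[str] = PREFERRED_FAMILIES,
) -> Optional[str]:
    """Return the first preferred family that can supply `nodes_needed` nodes.

    Returns None when no family in `preferences` has enough capacity.
    """
    for family in preferences:
        if available_nodes.get(family, 0) >= nodes_needed:
            return family
    return None


print(choose_family({"g6": 10, "g5": 40}, nodes_needed=4))  # g6
print(choose_family({"g6": 2, "g5": 40}, nodes_needed=4))   # g5
print(choose_family({"g6": 0, "g5": 1}, nodes_needed=4))    # None
```

Keeping G5 in the preference list mirrors the architecture described in this post, where Model Assembly and Rendering run on both G6 and G5 instances.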
Solution architecture
To manage thousands of 3D avatar processing requests per second, Snap’s Bitmoji team developed a robust architecture powered by Amazon Web Services (AWS) infrastructure that provides both scalability and performance for the Bitmoji processing pipeline (Figure 2). The system is composed of three components:
- Request Gateway: Running on Amazon EC2 C6g and C7g instances, this service acts as the pipeline’s entry point. It coordinates resources and manages requests with Snap’s KeyDB clusters for caching, Amazon DynamoDB for metadata, and Amazon Simple Storage Service (Amazon S3) for 3D model storage.
- Model Assembly: Operating on G6 and G5 instances, this service constructs complex 3D models from raw mesh and texture assets stored on Amazon FSx for Lustre, optimizing them into GLB-format models. Frequently accessed models are cached on Amazon S3 to minimize regeneration costs and delays.
- Rendering: Also operating on G6 and G5 instances, this service performs the resource-intensive task of rendering 3D avatars into rasterized images, providing high-quality visuals for Snapchat users.
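The model caching described for the Model Assembly service can be pictured as a cache-aside lookup. The sketch below is an assumption-laden illustration, not Snap’s implementation: `ModelCache` is an in-memory stand-in for an object store such as Amazon S3, `assemble_model` is a placeholder for the expensive assembly step, and the key scheme is hypothetical.

```python
import hashlib
import json


class ModelCache:
    """In-memory stand-in for an object store such as Amazon S3."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects.get(key)

    def put(self, key, data):
        self._objects[key] = data


def cache_key(avatar_spec: dict) -> str:
    """Derive a stable key from the avatar description (hypothetical scheme)."""
    canonical = json.dumps(avatar_spec, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"models/{digest}.glb"


def assemble_model(avatar_spec: dict) -> bytes:
    """Placeholder for the expensive assembly step; not a real GLB file."""
    return b"model:" + json.dumps(avatar_spec, sort_keys=True).encode("utf-8")


def get_or_assemble(cache: ModelCache, avatar_spec: dict, stats: dict) -> bytes:
    """Cache-aside: return the cached model, or assemble and store it."""
    key = cache_key(avatar_spec)
    cached = cache.get(key)
    if cached is not None:
        stats["hits"] += 1
        return cached
    stats["misses"] += 1
    model = assemble_model(avatar_spec)
    cache.put(key, model)
    return model


cache = ModelCache()
stats = {"hits": 0, "misses": 0}
spec = {"hair": "short", "outfit": "hoodie"}
first = get_or_assemble(cache, spec, stats)
# Same avatar with keys in a different order maps to the same cache entry.
second = get_or_assemble(cache, {"outfit": "hoodie", "hair": "short"}, stats)
print(stats)  # {'hits': 1, 'misses': 1}
```

Hashing a canonical form of the avatar description means that two requests for the same avatar share one cached model, which is where the savings on regeneration would come from.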
Optimization strategies
To enhance the performance and cost efficiency of each pipeline stage, the Bitmoji team implemented targeted optimizations:
- nvJPEG: Utilized for ultra-fast GPU-based texture image encoding.
- KTX2 Encoding: Applied offline for runtime-efficient GPU texture compression.
- CUDA Buffer Caching: Employed for frequently accessed textures, reducing retrieval delays.
- Morph Target Baking: Done on the CPU for single-frame avatar poses, balancing CPU and GPU workloads efficiently.
These optimizations balanced CPU and GPU utilization within the 3D Bitmoji cluster, allowing the team to move GPU-intensive services, like Model Assembly and Rendering, to G6 instances with faster CPUs. NVIDIA L4 GPUs provide the same memory capacity as the A10 GPUs in G5 instances with optimized price/performance, resulting in further efficiency gains.
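Morph target baking, listed among the optimizations above, can be illustrated with the standard blend formula: each baked vertex is the base vertex plus the weighted sum of the per-target displacements. For a single-frame pose the weights are fixed, so the blend can be applied once on the CPU and the GPU receives a static mesh. The following pure-Python sketch is our reading of the technique, with made-up mesh data and hypothetical names; a production version would use vectorized math.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bake_morph_targets(
    base: Sequence[Vec3],
    targets: Sequence[Sequence[Vec3]],
    weights: Sequence[float],
) -> List[Vec3]:
    """Compute baked[v] = base[v] + sum_i weights[i] * targets[i][v].

    Each target holds one displacement per base vertex.
    """
    if len(targets) != len(weights):
        raise ValueError("need exactly one weight per morph target")
    baked = [list(vertex) for vertex in base]
    for deltas, weight in zip(targets, weights):
        if len(deltas) != len(base):
            raise ValueError("morph target must match the base vertex count")
        if weight == 0.0:
            continue  # an inactive target contributes nothing
        for v, delta in enumerate(deltas):
            for axis in range(3):
                baked[v][axis] += weight * delta[axis]
    return [tuple(vertex) for vertex in baked]


# Made-up two-vertex mesh with two morph targets.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smile = [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
brow = [(0.0, 0.0, 4.0), (0.0, 0.0, 0.0)]

baked = bake_morph_targets(base, [smile, brow], [0.5, 0.25])
print(baked)  # [(0.0, 0.5, 1.0), (1.0, 1.0, 0.0)]
```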
Key rendering workflow enhancements
Image processing runs on the CPU, benefiting significantly from the faster CPUs in G6 instances, which accelerate this stage by up to 30%. Meanwhile, 3D model generation, running on the GPU, takes advantage of the extensive data caching capabilities of NVIDIA L4 GPUs. Together, these optimizations improve memory transfer efficiency between CPU and GPU, minimize resource waste, and enable Snap to meet its rendering performance goals while reducing operational costs.
Results and business impact
Transitioning to G6 instances delivered significant improvements for Bitmoji rendering services. Snap reported a 40% reduction in rendering latency, resulting in faster response times and an improved user experience. Moreover, server costs decreased by 50%, allowing Snap to reallocate resources strategically while maintaining high-quality Bitmoji experiences across Snapchat.
Conclusion
Migrating to Amazon G6 instances has enabled Snap to optimize the Bitmoji rendering pipeline, benefiting from faster CPUs and more cost-effective GPUs. Aligning the infrastructure with Bitmoji’s specific needs delivers both high efficiency and reduced costs. The move underscores the value of customized infrastructure solutions for supporting scalable, high-quality applications, empowering Snap to continue delivering visually rich, engaging experiences across its platform.
Contact an AWS Representative to learn how we can help accelerate your business.
Further reading
- Learn about tools and best practices to optimize your AWS costs while maximizing performance and scalability: AWS Cost Optimization
- Stay updated on the latest developments, best practices, and customer stories in augmented reality (AR) and virtual reality on AWS: AWS Spatial Computing Blog – AR/VR
- Access a collection of articles, tutorials, and videos to help you build and optimize media workflows using AWS Media Services: AWS Media Resources
- Explore Bitmoji to create your own personalized avatar and see how it enhances user interactions in apps like Snapchat: Bitmoji