2025

Reducing operational complexity using Amazon EKS Hybrid Nodes with Flawless

Learn how Flawless, a film technology company, streamlined operations and reduced overhead using Amazon EKS Hybrid Nodes.

Benefits

5x improvement in rendering times

2+ days to train models instead of weeks

Overview

AI is opening up creative possibilities in the entertainment industry, from enhancing visual effects to breaking down language barriers. Unlike generative approaches, Flawless builds assistive tools for film and television that are rooted in clean data and consent so that technology strengthens human creativity. But as the company scaled its technology to serve major studios worldwide, its operational complexity grew.

Flawless needed powerful GPU computing resources to train its AI models, but its fragmented approach to infrastructure management left teams juggling multiple disconnected systems and manual processes. That’s when the company turned to Amazon Web Services (AWS) for a solution that could transform how it manages hybrid GPU infrastructure. Using Amazon Elastic Kubernetes Service (Amazon EKS) Hybrid Nodes, which brings the power of Amazon EKS to on-premises and edge infrastructure, Flawless streamlined operations and reduced management overhead while lowering costs.

About Flawless

Flawless is an artist-first, assistive AI company developing tools for visual dubbing and localization that protect creativity and expand storytelling worldwide.

Opportunity | Using Amazon EKS Hybrid Nodes to scale AI infrastructure for Flawless

Flawless develops AI tools that assist filmmakers rather than replace them. The company specializes in visual dubbing and long-form localization (that is, adapting films for different languages and markets). This technology preserves actors’ facial expressions and lip movements while changing the spoken language. As Flawless grew, its infrastructure management became increasingly complex and operationally demanding.

AI researchers were manually provisioning dedicated cloud instances and individual GPUs, which required constant oversight during long-running experiments that could span multiple days. Meanwhile, the company’s software-as-a-service (SaaS) solution operated on an entirely separate technology stack, creating operational silos that were difficult to manage efficiently. This fragmented approach created significant operational overhead because teams had to manage multiple systems, manual provisioning processes, and disconnected workflows.

To solve this challenge, Flawless adopted Amazon EKS Hybrid Nodes. Using this feature, the company can integrate third-party GPU resources into its existing AWS infrastructure while maintaining central management and control through Amazon EKS.

“Because of our growth, we needed to significantly scale our operations. Amazon EKS Hybrid Nodes presented us with a cost-efficient scaling solution,” says James Morgan, principal platform engineer at Flawless. “Our analysis showed that continuing with our previous approach would have led to unsustainable costs and reduced our time to market by limiting our ability to provide researchers with the required computing resources. This centralized management of GPUs helped us optimize team integration while meeting our expansion goals. As a result, we can continue to expand to new customers and maintain our internal efficiencies.”

Solution | Reducing AI model training times from weeks to days with a hybrid GPU infrastructure

Flawless adopted Amazon EKS Hybrid Nodes to create a unified environment capable of managing both AWS and third-party GPU resources from a single control plane. The Flawless team established secure network connections between its external GPU environments and the AWS Cloud to meet varying research and platform requirements. Then, the team joined the external GPU nodes to an Amazon EKS cluster, facilitating access to a wide range of Amazon EKS features.

The hybrid connection itself required only a couple of lines of code. AWS Support, a team that provides proactive planning and communications, advisory, automation, and cloud expertise to help customers achieve business outcomes, stepped in to guide the more advanced networking configurations across the hybrid environment. “When we had queries or questions, the AWS team was there to answer them,” says Will Ferguson, director of platform engineering at Flawless. “That support helped speed up the project. An issue that might have taken weeks of debugging could be solved in a few days.”

With the hybrid infrastructure in place, Flawless transformed how its teams work. Instead of manually provisioning and managing individual cloud instances, researchers now submit jobs through advanced schedulers like KAI and Kueue, which automatically determine compute requirements and deploy workloads across the hybrid environment. This unified approach provides greater flexibility in node management and supports increased compute reuse: resources stay warm, containers start faster, and cache hit rates improve. Tools such as Karpenter, KEDA, and the KAI scheduler have also improved provisioning times and system responsiveness, leading to more efficient compute usage. Researchers can access fractional GPU capabilities that weren’t previously available, and Flawless can access hardware to support high-performance workloads while maintaining the flexibility to burst back into AWS when needed.
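As a rough illustration of the job-submission workflow described above, the following Python sketch builds a Kubernetes Job manifest that a queueing scheduler such as Kueue could admit and that targets hybrid nodes. It is not Flawless's actual configuration. The function name, queue name, and container image are hypothetical placeholders. The `kueue.x-k8s.io/queue-name` label and the `nvidia.com/gpu` resource name follow the Kueue and NVIDIA device plugin conventions, and the `eks.amazonaws.com/compute-type: hybrid` node label is assumed from the Amazon EKS Hybrid Nodes documentation, so verify it against your own cluster.

```python
import json


def build_training_job(name, image, gpus, queue="research-queue", hybrid_only=True):
    """Return a Kubernetes Job manifest (as a dict) for a GPU training run.

    All argument values are illustrative; `research-queue` is a hypothetical
    Kueue LocalQueue name, not one taken from the case study.
    """
    pod_spec = {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": "trainer",
                "image": image,
                # GPU request via the NVIDIA device plugin resource name.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }
        ],
    }
    if hybrid_only:
        # Assumed label for EKS Hybrid Nodes; confirm on your own nodes.
        pod_spec["nodeSelector"] = {"eks.amazonaws.com/compute-type": "hybrid"}

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Kueue uses this label to decide which queue admits the job.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        # The job starts suspended so the queueing system can release it
        # once quota is available.
        "spec": {"suspend": True, "template": {"spec": pod_spec}},
    }


if __name__ == "__main__":
    job = build_training_job("train-demo", "example.com/trainer:latest", gpus=2)
    print(json.dumps(job, indent=2))
```

The printed JSON could be applied with `kubectl apply -f -`. Setting `hybrid_only=False` drops the node selector, which leaves the scheduler free to place the workload on any matching node in the cluster, including AWS-hosted ones.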

The results have been transformative for Flawless’s research and development capabilities. “Some AI models have historically taken us multiple weeks to fully train,” says Ferguson. “We can now reduce that training time down to only a couple days. Because our training-experimenting-testing cycle is superfast, we can improve our product quickly—and that’s because we have access to powerful GPUs.” Thanks to the reduction in training time, Flawless can run more experiments, which in turn gets ideas from research into the product faster. Additionally, rendering times have improved by up to five times, while the company has gained the flexibility to scale resources rapidly up and down to match unpredictable workload demands. By minimizing operational overhead, Flawless can operate more efficiently.

Outcome | Powering global expansion through modernized infrastructure

Using Amazon EKS Hybrid Nodes, Flawless is now in a stronger position to compete and scale in the AI-powered entertainment industry. The company has significantly reduced the operational complexity of managing hybrid GPU infrastructure across environments, unlocking possibilities for innovation and growth. Looking ahead, Flawless continues to migrate its SaaS solution to use the hybrid infrastructure and onboard more researchers to the new system.

By combining scalable infrastructure on AWS with its visual-dubbing technology, Flawless helps more stories reach global audiences while staying true to original performances. “Having greater flexibility in where we run things on AWS is one side effect of this modernization,” says Morgan. “Using Amazon EKS Hybrid Nodes means that while we’re working with third-party GPU resources, we can actually expand our use of AWS technology to new regions worldwide.”

