Unitary Scales AI Moderation to 26 Million Videos Daily with Amazon EKS
Discover how Unitary scaled their AI moderation solution, managing over 1,000 nodes while reducing costs by 50–70% with Amazon EKS and Karpenter.
Key Outcomes

50%–70% reduction in overall costs over 18 months
80% reduction in container boot times for fast response
100 million users served effortlessly across digital platforms
1,000+ nodes managed by a 3-person team at peak traffic

Overview
Software startup Unitary exists to make the internet safer, helping improve lives and create a fairer world. Built on Amazon Web Services (AWS), the company uses artificial intelligence (AI), specifically machine learning (ML), to help its customers with online content moderation. With the daily surge of online content requiring it to process millions of images and videos, Unitary needed a scalable solution to manage growing demand.
On AWS, Unitary helps platforms with over 100 million combined users combat toxic content while minimizing the impact on human moderators. To meet this scale, Unitary chose to run its ML inference workloads on Amazon Elastic Kubernetes Service (Amazon EKS), a managed service for running Kubernetes on AWS.

About Unitary
Based in the United Kingdom, software company Unitary aims to make the internet safer. It builds solutions that blend human expertise and artificial intelligence for moderating user-generated content quickly, accurately, and cost-efficiently.

As a startup, we wanted to rapidly scale our product, so we needed to build a fast, flexible, and highly scalable platform. On AWS, we did just that.
Sasha Haco
CEO, Unitary

Architecture Diagram
Unitary's customers submit content for assessment by generating an API request, which is received by a microservice running on Amazon EKS and fronted by a load balancer. The microservice assigns a job ID to the content and confirms to the customer that the request will be processed asynchronously.

The job is then published to an Amazon SNS topic, which sends the message to a subscribed SQS queue. This queue holds details of all the content awaiting classification. This architecture enables error handling, and the queue depth is used as the basis for autoscaling the services running on EKS.

The media processing microservice retrieves a job from this queue and assesses the type of content that has been submitted. If required, it breaks videos into frames and extracts the audio stream. The relevant inference microservices, which run Unitary's machine learning models, are then invoked on the different modalities of the content: the image/video frame (Triton inference server), any text extracted from the visuals (OCR server), and the video audio stream (audio inference server), as appropriate.

The media processing microservice then aggregates the outputs from the inference microservices, classifies the content against customer content moderation policies, and summarizes the results. This summary includes a safety score or risk level for each policy category. The results summary is added to another SQS queue, via an SNS topic, to be sent back to the customer via a webhook. The results are also persisted into a Postgres database by the results writer microservice, acting as a system of record for the content classification outcomes.
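The ingest and autoscaling steps above can be sketched in Python with boto3. This is an illustrative sketch, not Unitary's actual code: the function names, message fields, and topic/queue identifiers are assumptions, while the `publish` and `get_queue_attributes` calls are standard boto3 APIs for SNS and SQS.

```python
import json
import uuid


def build_job_message(content_url: str, content_type: str) -> dict:
    """Assign a job ID and build the message the ingest microservice
    would publish for asynchronous processing (fields are illustrative)."""
    return {
        "job_id": str(uuid.uuid4()),
        "content_url": content_url,
        "content_type": content_type,  # e.g. "image" or "video"
    }


def publish_job(sns_client, topic_arn: str, message: dict) -> str:
    """Publish the job to the SNS topic; the subscribed SQS queue holds
    it until a media processing worker picks it up. Returns the job ID
    so the API can immediately acknowledge the customer's request."""
    sns_client.publish(TopicArn=topic_arn, Message=json.dumps(message))
    return message["job_id"]


def queue_depth(sqs_client, queue_url: str) -> int:
    """Read the approximate number of waiting jobs -- the metric the
    case study describes as the basis for autoscaling on EKS."""
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])
```

In production, `sns_client` and `sqs_client` would be created with `boto3.client("sns")` and `boto3.client("sqs")`, and the queue-depth metric would be fed to an autoscaler rather than polled by hand.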