AWS Machine Learning Blog
Nielsen Sports sees 75% cost reduction in video analysis with Amazon SageMaker multi-model endpoints
This is a guest post co-written with Tamir Rubinsky and Aviad Aranias from Nielsen Sports.
Nielsen Sports shapes the world’s media and content as a global leader in audience insights, data, and analytics. Through our understanding of people and their behaviors across all channels and platforms, we empower our clients with independent and actionable intelligence so they can connect and engage with their audiences—now and into the future.
At Nielsen Sports, our mission is to provide our customers—brands and rights holders—with the ability to measure the return on investment (ROI) and effectiveness of a sport sponsorship advertising campaign across all channels, including TV, online, social media, and even newspapers, and to provide accurate targeting at local, national, and international levels.
In this post, we describe how Nielsen Sports modernized a system running thousands of different machine learning (ML) models in production by using Amazon SageMaker multi-model endpoints (MMEs) and reduced operational and financial cost by 75%.
Challenges with channel video segmentation
Our technology is based on artificial intelligence (AI) and specifically computer vision (CV), which allows us to track brand exposure and identify its location accurately. For example, we identify if the brand is on a banner or a shirt. In addition, we identify the location of the brand on the item, such as the top corner of a sign or the sleeve. The following figure shows an example of our tagging system.
To understand our scaling and cost challenges, let’s look at some representative numbers. Every month, we identify over 120 million brand impressions across different channels, and the system must support the identification of over 100,000 brands and variations of different brands. We have built one of the largest databases of brand impressions in the world with over 6 billion data points.
Our media evaluation process includes several steps, as illustrated in the following figure:
- First, we record thousands of channels around the world using an international recording system.
- We stream the content in combination with the broadcast schedule (Electronic Programming Guide) to the next stage, which is segmentation and separation between the game broadcasts themselves and other content or advertisements.
- We perform media monitoring, where we add additional metadata to each segment, such as league scores, relevant teams, and players.
- We perform an exposure analysis of the brands’ visibility and then combine the audience information to calculate the valuation of the campaign.
- The information is delivered to the customer by a dashboard or analyst reports. The analyst is given direct access to the raw data or through our data warehouse.
Because we operate at a scale of over a thousand channels and tens of thousands of hours of video a year, we must have a scalable automation system for the analysis process. Our solution automatically segments the broadcast and knows how to isolate the relevant video clips from the rest of the content.
We do this using dedicated algorithms and models developed by us for analyzing the specific characteristics of the channels.
In total, we are running thousands of different models in production to support this mission, which is costly, incurs operational overhead, and is error-prone and slow. It took months to get models with new model architecture to production.
This is where we wanted to innovate and rearchitect our system.
Cost-effective scaling for CV models using SageMaker MMEs
Our legacy video segmentation system was difficult to test, change, and maintain. Some of the challenges include working with an old ML framework, inter-dependencies between components, and a hard-to-optimize workflow. This is because we were based on RabbitMQ for the pipeline, which was a stateful solution. To debug one component, such as feature extraction, we had to test all of the pipeline.
The following diagram illustrates the previous architecture.
As part of our analysis, we identified performance bottlenecks such as running a single model on a machine, which showed a low GPU utilization of 30–40%. We also discovered inefficient pipeline runs and scheduling algorithms for the models.
Therefore, we decided to build a new multi-tenant architecture based on SageMaker, which would implement performance optimization improvements, support dynamic batch sizes, and run multiple models simultaneously.
Each run of the workflow targets a group of videos. Each video is between 30–90 minutes long, and each group has more than five models to run.
Let’s examine an example: a video can be 60 minutes long, consisting of 3,600 images, and each image needs to inferred by three different ML models during the first stage. With SageMaker MMEs, we can run batches of 12 images in parallel, and the full batch completes in less than 2 seconds. In a regular day, we have more than 20 groups of videos, and on a packed weekend day, we can have more than 100 groups of videos.
The following diagram shows our new, simplified architecture using a SageMaker MME.
Results
With the new architecture, we achieved many of our desired outcomes and some unseen advantages over the old architecture:
- Better runtime – By increasing batch sizes (12 videos in parallel) and running multiple models concurrently (five models in parallel), we have decreased our overall pipeline runtime by 33%, from 1 hour to 40 minutes.
- Improved infrastructure – With SageMaker, we upgraded our existing infrastructure, and we are now using newer AWS instances with newer GPUs such as g5.xlarge. One of the biggest benefits from the change is the immediate performance improvement from using TorchScript and CUDA optimizations.
- Optimized infrastructure usage – By having a single endpoint that can host multiple models, we can reduce both the number of endpoints and the number of machines we need to maintain, and also increase the utilization of a single machine and its GPU. For a specific task with five videos, we now use only five machines of g5 instances, which gives us 75% cost benefit from the previous solution. For a typical workload during the day, we use a single endpoint with a single machine of g5.xlarge with a GPU utilization of more than 80%. For comparison, the previous solution had less than 40% utilization.
- Increased agility and productivity – Using SageMaker allowed us to spend less time migrating models and more time improving our core algorithms and models. This has increased productivity for our engineering and data science teams. We can now research and deploy a new ML model in under 7 days, instead of over 1 month previously. This is a 75% improvement in velocity and planning.
- Better quality and confidence – With SageMaker A/B testing capabilities, we can deploy our models in a gradual way and be able to safely roll back. The faster lifecycle to production also increased our ML models’ accuracy and results.
The following figure shows our GPU utilization with the previous architecture (30–40% GPU utilization).
The following figure shows our GPU utilization with the new simplified architecture (90% GPU utilization).
Conclusion
In this post, we shared how Nielsen Sports modernized a system running thousands of different models in production by using SageMaker MMEs and reduced their operational and financial cost by 75%.
For further reading, refer to the following:
- Model hosting patterns in Amazon SageMaker, Part 1: Common design patterns for building ML applications on Amazon SageMaker
- Model hosting patterns in Amazon SageMaker, Part 3: Run and optimize multi-model inference with Amazon SageMaker multi-model endpoints
- Load Testing SageMaker Multi-Model Endpoints
About the Authors
Eitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate Generative AI and Machine Learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.
Gal Goldman is a Senior Software Engineer and an Enterprise Senior Solution Architect in AWS with a passion for cutting-edge solutions. He specializes in and has developed many distributed Machine Learning services and solutions. Gal also focuses on helping AWS customers accelerate and overcome their engineering and Generative AI challenges.
Tal Panchek is a Senior Business Development Manager for Artificial Intelligence and Machine Learning with Amazon Web Services. As a BD Specialist, he is responsible for growing adoption, utilization, and revenue for AWS services. He gathers customer and industry needs and partner with AWS product teams to innovate, develop, and deliver AWS solutions.
Tamir Rubinsky leads Global R&D Engineering at Nielsen Sports, bringing vast experience in building innovative products and managing high-performing teams. His work transformed sports sponsorship media evaluation through innovative, AI-powered solutions.
Aviad Aranias is a MLOps Team Leader and Nielsen Sports Analysis Architect who specializes in crafting complex pipelines for analyzing sports event videos across numerous channels. He excels in building and deploying deep learning models to handle large-scale data efficiently. In his spare time, he enjoys baking delicious Neapolitan pizzas.
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.