Ampersand Runs 50,000 Concurrent Machine Learning Models on AWS Batch in Less than 1 Day
Ampersand runs complex machine learning (ML) workloads to provide television advertisers with aggregated viewership insights and predictions for over 40 million households. The advertising technology and sales company needed a way to ingest 200,000 data partitions and provide optimized recommendations to support its customers’ television advertisement campaigns at scale. After testing various open-source tools, Ampersand engaged Amazon Web Services (AWS) for a solution.
Ampersand chose AWS Batch, which lets developers, scientists, and engineers efficiently run hundreds of thousands of batch computing and ML jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (such as CPU- or memory-optimized instances) based on the volume and specific resource requirements of the submitted jobs. Using AWS Batch, Ampersand runs and manages thousands of complex ML workloads concurrently, delivering valuable data-driven insights to its customers.
Running Thousands of ML Jobs Concurrently at Scale
Headquartered in New York City, Ampersand delivers data-driven insights to help television advertisers plan, activate, and measure campaigns across over 165 apps and networks in every US market. Its goal is to use deterministic data to inform viewership insights for advertisers on a granular level. Ampersand must analyze data from 200,000 data partitions and perform 200,000 separate fitting processes to create ML models, which are used to produce aggregated viewership insights. “We needed to run a large number of ML jobs concurrently and at the lowest cost possible,” says Daniel Gerlanc, senior director of data science and ML at Ampersand. “We also wanted to scale the cluster on demand and have access to a wide range of compute instances.”
Seeking a scalable way to handle its large, complex ML workloads, the company first built a solution using an open-source container-orchestration system and an open-source parallel computing library. However, Ampersand found that reaching the performance it needed with these tools took a significant amount of engineering effort.
Ampersand discovered that by using AWS Batch, it could achieve the scalability, parallelism, compute options, and concurrency that it required. “Before engaging AWS, we couldn’t run a quarter of the workloads that we wanted in parallel,” says Jeffrey Enos, senior ML engineer at Ampersand. “Using AWS Batch, we could solve a lot of the issues that we were facing. Running parallel tasks would be much simpler.” In 2021, Ampersand and the AWS team began deploying ML workloads on AWS Batch, completing the solution in only 6 months.
Delivering Television Viewership Insights Using AWS Batch
Using AWS Batch, Ampersand can run thousands of ML workloads concurrently. “We use AWS Batch array jobs to run the same ML models across all our partitions, with different parameters based on region, demographic, and cable network,” says Gerlanc. “Now, we are running about 50,000 concurrent ML workloads on AWS Batch, and we could probably scale to over 100,000.” Using AWS Batch, Ampersand can run its ML models and review output data on the same day, vastly improving its speed of delivery. The company has reduced its turnaround time from 1 week to less than 1 day on AWS, providing valuable viewership predictions to customers at a faster pace.
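Gerlanc describes fanning the same model out over many parameter combinations with array jobs. The following is a minimal sketch of the child side only, assuming a hypothetical grid of regions, demographics, and networks (Ampersand's actual values and code are not published). AWS Batch sets an AWS_BATCH_JOB_ARRAY_INDEX environment variable, from 0 to size - 1, in each child container, and the child uses it to pick its own combination.

```python
import itertools
import os

# Hypothetical parameter values for illustration; not Ampersand's real dimensions.
REGIONS = ["northeast", "midwest", "south", "west"]
DEMOGRAPHICS = ["18-34", "35-54", "55+"]
NETWORKS = ["network_a", "network_b"]

# Every (region, demographic, network) combination becomes one child job.
PARAMETER_GRID = list(itertools.product(REGIONS, DEMOGRAPHICS, NETWORKS))


def parameters_for_child(index: int) -> dict:
    """Map an array-job child index to the parameters that child should fit."""
    region, demographic, network = PARAMETER_GRID[index]
    return {"region": region, "demographic": demographic, "network": network}


def main() -> dict:
    # AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX in each array-job child;
    # default to 0 so the script also runs outside an array job.
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    params = parameters_for_child(index)
    # A hypothetical fit_model(**params) call would go here.
    return params


if __name__ == "__main__":
    print(main())
```

The parent array job would be submitted with an array size equal to len(PARAMETER_GRID), so each index maps to exactly one combination and no coordination between children is needed.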
Ampersand performs all its workloads on a per-region basis: for each geographic area, batches of data are translated into a specific number of AWS Batch jobs. The company then uses AWS Batch to run these data partitions as jobs from Docker images, which package code so that it can be deployed as a container on Docker, an open-source containerization platform. AWS Batch schedules and orchestrates the containers across Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, which can run hyperscale workloads at significant cost savings and accelerate processing by running tasks in parallel. Using Spot Instances, Ampersand reduced its compute costs by 78 percent.
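Processing each region as its own batch could look like the sketch below. It assumes one child job per data partition (the case study does not say exactly how partitions map to jobs) and uses hypothetical queue, job definition, and environment variable names (REGION, PARTITION_OFFSET). Because a single AWS Batch array job can hold between 2 and 10,000 child jobs, a region with more partitions than that is split across several array jobs.

```python
from typing import Dict, List

MAX_ARRAY_SIZE = 10_000  # AWS Batch limit on child jobs per array job


def build_region_requests(
    region: str,
    partition_ids: List[str],
    job_queue: str,
    job_definition: str,
) -> List[Dict]:
    """Split one region's partitions into submit_job request dicts.

    Each request covers a contiguous chunk of partitions. A child job finds
    its partition at PARTITION_OFFSET + AWS_BATCH_JOB_ARRAY_INDEX.
    """
    requests = []
    for start in range(0, len(partition_ids), MAX_ARRAY_SIZE):
        chunk = partition_ids[start:start + MAX_ARRAY_SIZE]
        request = {
            "jobName": f"fit-{region}-{start // MAX_ARRAY_SIZE}",
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                "environment": [
                    {"name": "REGION", "value": region},
                    {"name": "PARTITION_OFFSET", "value": str(start)},
                ]
            },
        }
        # Array jobs need 2 to 10,000 children; a lone leftover partition
        # is submitted as an ordinary (non-array) job instead.
        if len(chunk) > 1:
            request["arrayProperties"] = {"size": len(chunk)}
        requests.append(request)
    return requests
```

Each dict matches the keyword arguments of the boto3 Batch client's submit_job, so submission would be a loop calling client.submit_job(**request). Pointing job_queue at a queue backed by a Spot compute environment is what places the containers on Spot Instances. A child of a non-array job has no array index and would default it to 0.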
On AWS, Ampersand has access to several features that it can use to optimize computing costs and support a high data volume. For example, the company uses automatic scaling features on AWS Batch to quickly scale its clusters up and down based on demand, significantly reducing fixed compute costs associated with fitting ML models. “On AWS Batch, we scaled a cluster to support 50,000 workloads in less than 1 hour,” says Enos. “Previously, we would not have been able to reach that capacity. Without this solution, the process would be very slow.” Using open-source tools, Ampersand would need to create its own scheduler and manage scaling for the solution by itself. Because AWS Batch is a managed service, Ampersand can scale quickly and seamlessly without needing to build new infrastructure, supporting continued growth.
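In a managed AWS Batch compute environment, automatic scaling happens between minvCpus and maxvCpus. With minvCpus set to 0, the environment can scale down to no running instances when the queue is empty, which is what removes the fixed cost of idle capacity. The sketch below builds a create_compute_environment request for Spot capacity. It assumes one vCPU per job (hypothetical) and uses placeholder subnet, security group, and instance role names; Ampersand's actual configuration is not described in this case study.

```python
from typing import Dict, List


def spot_compute_environment(
    name: str,
    concurrent_jobs: int,
    vcpus_per_job: int,
    subnets: List[str],
    security_group_ids: List[str],
    instance_role: str,
) -> Dict:
    """Build a managed Spot compute environment request that scales to zero."""
    if concurrent_jobs < 1 or vcpus_per_job < 1:
        raise ValueError("concurrent_jobs and vcpus_per_job must be positive")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            # Prefer Spot pools with spare capacity to reduce interruptions.
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,  # scale down to nothing when no jobs are queued
            "maxvCpus": concurrent_jobs * vcpus_per_job,  # upper scaling bound
            "instanceTypes": ["optimal"],  # let AWS Batch choose instance types
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
        },
    }
```

The dict would be passed as boto3.client("batch").create_compute_environment(**env). A job queue that references the environment is still needed before any jobs can be submitted to it.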
Optimizing ML Workloads for Cost on AWS
Ampersand is running hundreds of thousands of complex ML tasks at scale using AWS Batch. In the future, the company plans to test AWS Graviton processors, which are designed to deliver the best price performance for cloud workloads running on Amazon EC2. By moving to Graviton-based instances, Ampersand estimates that it could reduce physical CPU costs by a further 50 percent.
On AWS, Ampersand has achieved significant cost and scalability benefits and will continue to optimize its complex ML workloads on AWS. “Without the scalability and performance of AWS Batch and the cost savings of Spot Instances, we would not have been able to achieve these processing volumes,” says Gerlanc.
Ampersand is one of the industry’s largest sources of combined multiscreen TV inventory and set-top box (STB) viewership insights, changing the way TV is bought, activated, and measured.
Benefits of AWS
- Runs 50,000 ML workloads concurrently
- Reduced turnaround time from 1 week to less than 1 day
- Reduced compute costs by 78% using Spot Instances
- Reduced fixed costs using automatic scaling features
- Scaled to support 50,000 workloads in less than 1 hour
- Supports automatic scaling
AWS Services Used
AWS Batch
AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU- or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 500 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload. We are the first major cloud provider that supports Intel, AMD, and Arm processors, the only cloud with on-demand EC2 Mac instances, and the only cloud with 400 Gbps Ethernet networking.
Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.