Bayesian ML Models at Scale with AWS Batch
This post was contributed by Ampersand’s Jeffrey Enos, Senior Machine Learning Engineer, Daniel Gerlanc, Senior Director for Data Science, and Brandon Willard, Data Science Lead.
Ampersand is a data-driven TV advertising technology company that provides aggregated TV audience impression insights and planning on 42 million households, in every media market, across more than 165 networks and apps and in all dayparts (broadcast day segments). With a commitment to privacy, Ampersand enables advertisers to reach their target audience by building viewer profiles to help advertisers understand which networks and at what times their ads are most likely to be seen by their desired audience.
The Ampersand Data Science (ADS) team estimated that building their statistical models would require up to 600,000 physical CPU hours to run, which would not be feasible without using a massively parallel and large-scale architecture in the cloud. After trying several solutions, Ampersand turned to AWS Batch, a highly scalable batch and ML scheduler and orchestrator that gives users access to a large quantity of compute resources and allows them to run their containerized jobs with the best fit of CPU, memory, and GPU resources. The scalability of AWS Batch enabled Ampersand to compress their time of computation over 500x through massive scaling while optimizing their costs using Amazon EC2 Spot.
In this blog post, we will provide an overview of how Ampersand built their TV audience impressions (“impressions”) models at scale on AWS, review the architecture they have been using, and discuss optimizations they conducted to run their workload efficiently on AWS Batch.
Modeling TV impressions at scale
ADS builds statistical models that predict impressions. Using insights from these models, Ampersand constructs successful advertising campaigns for its clients.
The company’s data scientists use Bayesian methods to predict future impressions for different combinations of geographic regions, demographics, and cable television networks (e.g., CNN, ESPN, AMC) over time. ADS refers to a single region, demographic, network combination as a DNR.
A single model is built for each DNR that predicts how many people are watching at any given time. Each model is a fully Bayesian Hidden Markov Model (HMM) that explicitly characterizes the “states” of impressions, with the most basic states being “viewing” and “not-viewing”. Each state has corresponding parameters that are used to help predict transitions from one state to the other at any given time. The models are estimated with Markov Chain Monte Carlo (MCMC) and produce sample results that can be used to directly estimate arbitrary functions of its impressions’ predictions and for uncertainty propagation. More specifically, these sample results are combined to make predictions for higher-level aggregate categories. You can experiment with HMMs using this Python notebook.
ADS writes their model specifications and fits their models using open-source Python packages. Their HMM implementation uses the open source packages Aesara and AePPL, projects for which Brandon Willard is the lead developer. A similar implementation using the standard PyMC3 NUTS sampler alongside a custom forward-filter backward sample (FFBS) step is provided by Ampersand’s own pymc3-hmm package.
Ampersand’s computational ML architecture on AWS
ADS’s data architecture relies on several AWS services to ingest aggregated impressions data, transform it, and then use it to fit their models at scale. We will go over the flow of the architecture then discuss optimizations ADS conducted to run at scale.
First, ADS ingests provider input data into an Amazon Redshift data warehouse for pre-processing. Next, ADS unloads the data in Parquet format to a bucket on S3, partitioning the data such that downstream processes may load the minimum amount of data for a single model fit. When a model update is needed, ADS initiates a refitting process using an AWS Batch compute environment configured to use Amazon EC2 Spot Instances. After model fitting is complete, ADS loads predictive samples into a ClickHouse database cluster deployed on EKS. A FastAPI-based application, also deployed on EKS, then aggregates and serves impressions estimates to Ampersand’s sales team.
As new impressions data becomes available, ADS re-estimates all the models. Each model is estimated to use up to 10,000 hourly data points and each refitting involves the re-computation of 200,000 such models. Due to the size of Ampersand’s HMMs and the number of samples required, a model fit can take between one to three physical CPU hours. This means that a total of 200,000 to 600,000 physical CPU hours are required to recompute the 200,000 models depending on the number of observations.
Assuming that a single c5.24xlarge Amazon EC2 instance is used, it would take between 170 and 520 days to recompute all models using all the physical cores on that instance. By using a compute environment with a 50,000 vCPU limit, Ampersand can fit all models in less than 1 day.
Exploring different computational backends
The ADS team had the following requirements when building out their stack:
- Automatically scale computational resources from and to zero
- Run fully on Spot Instances to reduce cost
- Write minimal custom code to facilitate scheduling and resource management
- Provide isolation between jobs
The team evaluated different frameworks and tools until settling on AWS Batch as it met their requirements in addition to providing other advantages: 1) the simplified compute cluster and job scheduler management due to the managed aspect of the service, 2) the mapping of the service to the “embarrassingly parallel” structure of their problem through the use of array jobs, 3) the native integration with Amazon EC2 Spot using the SPOT_CAPACITY_OPTIMIZED allocation strategy.
Architecture and workflow optimizations
As ADS started to use the service, they looked at optimization opportunities by refining instance selection and optimizing the data pipeline.
Before diving deeper into the optimization, we will review key notions of AWS Batch. The service key components are Job Queues (JQs) and Compute Environments (CEs). Job Queues are where you place a job upon submission and where the job will stay until it is completed. Compute Environments (CE) can be seen as computational clusters that can support one or multiple kinds and sizes of instances depending on their configuration. One job queue can be attached to multiple CEs and periodically, AWS Batch will look at the state of the queue and evaluate if it needs to scale up a given CE to run jobs that are ready to be executed.
To optimize their deployment, ADS used an open-source monitoring solution to monitor their CEs’ scalability and instance selection. They initially noticed that small instance types were being selected by AWS Batch and Spot, decreasing their chances of acquiring spot capacity. To solve this, Ampersand data scientists specifically selected instances from the C/M/R families of the 4th and 5th generation (c5.4xlarge and above). This caused AWS Batch and Spot to focus on larger instances only and unlocked its ability to scale and acquire large amounts of capacity.
Another optimization that ADS implemented was on the data pipeline. As shared earlier, aggregated impressions data is ingested into Amazon Redshift for pre-processing. When experimenting with compute backends, ADS noticed that its Redshift cluster was being hit hard due to the high concurrency of the workload and sudden spikes of usage. To overcome that bottleneck, ADS exported the data from Redshift to Amazon S3 as Parquet and aligned the data partitions to the way their application ingests them.
Results and future plans
The Ampersand Data Science team has built a scalable architecture that is able to run efficiently on AWS Batch. It is able to run its whole workload on Amazon EC2 Spot with considerable cost savings. ADS estimated from past bills that it saved 78% using the combination of AWS Batch (which is free) with EC2 Spot instead of using On-Demand Instances. It intends to increase its savings further with Graviton-based instances which would cut current costs by an additional 50% over the current approach.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.