AWS for Industries

How Databricks on AWS helps optimize real-time bidding using machine learning

Real-time Bidding (RTB) has transformed online advertising by automating ad transactions through auctions, enhancing efficiency and precision. However, RTB faces challenges like transparency issues and ad fraud risks.

Using Databricks on Amazon Web Services (AWS) addresses these challenges with the Databricks Real-time Bidding Accelerator, which leverages machine learning and predictive analytics to optimize RTB strategies, providing advertisers with real-time insights and improved ad targeting.

Integrated with the Databricks Lakehouse platform, the Databricks Real-time Bidding solution streamlines data processing and offers versatile applicability beyond viewability prediction. This makes it a valuable asset for various large-volume use cases in the programmatic advertising industry.

Real-time Bidding

Real-time bidding, or RTB, is a digital advertising strategy where multiple entities bid in a real-time auction to purchase advertising inventory on a website or app. An auction is “real-time” because advertisers submit bids simultaneously when a user loads a website page or app screen with an ad unit.
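To make the auction mechanics concrete, here is a minimal sketch of a single sealed-bid auction for one ad impression. It assumes a first-price rule with a publisher floor price; the article does not say which auction type is used, and the names `Bid`, `run_auction`, and `floor_cpm` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Bid(NamedTuple):
    bidder: str       # hypothetical bidder identifier
    price_cpm: float  # bid price per thousand impressions


def run_auction(bids: List[Bid], floor_cpm: float = 0.0) -> Optional[Bid]:
    """Return the highest bid at or above the floor, or None if none qualifies.

    Assumption: first-price auction, so the winner pays its own bid.
    """
    eligible = [b for b in bids if b.price_cpm >= floor_cpm]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b.price_cpm)


# All bids arrive for the same impression at the same time.
bids = [Bid("dsp_a", 2.10), Bid("dsp_b", 3.25), Bid("dsp_c", 1.80)]
winner = run_auction(bids, floor_cpm=2.00)
```

In this toy run, `dsp_c` is excluded by the floor and `dsp_b` wins the impression.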

Before real-time bidding, advertisers looked for websites that they thought their target audience would visit and pre-purchased ad space directly from the publishers. RTB revolutionized the landscape of online advertising, offering advertisers and publishers a dynamic and data-driven approach to reach their target audiences.

RTB introduced precision and scale in digital advertising by providing real-time data sets that buyers and sellers could use to determine which ad campaign was best suited for the consumer and environment. The RTB data set also includes data that can be used to optimize the performance of campaigns in real-time, improving ROI. One such metric is viewability, which determines whether an ad is truly visible to a user. This is a vital performance indicator that many RTB systems take into account before bidding on an ad opportunity.

Despite its numerous benefits, RTB also comes with drawbacks.

Limited transparency through to the end of an ad’s lifecycle makes it difficult for advertisers to confirm that their ads reach the intended audience, which creates uncertainty about campaign performance and ad placement. Ad fraud risks compromise the integrity of RTB auctions, wasting ad spend and distorting performance metrics. Additional challenges include:

  • Limitations in ad targeting
  • Ad fatigue
  • Pricing volatility
  • Fragmented inventory management
  • Limited creative options contributing to suboptimal targeting
  • Diminishing returns
  • Operational complexities
  • Reduced ad engagement and effectiveness

RTB Optimization platform uses machine learning for predictive analytics

To tackle these challenges, Databricks has developed the Real-time Bidding Solution Accelerator, which facilitates RTB optimization. It is a notebook-based solution that integrates with the Databricks Lakehouse platform and is designed to help AdTech companies optimize their customers’ RTB strategy by predicting the viewability of programmatic advertising.

The Databricks Real-time Bidding Accelerator enables advertisers to:

  • Analyze large volumes of real-time bidding data
  • Identify patterns
  • Apply machine learning (ML) algorithms to optimize bidding strategies
  • Improve ad targeting
  • Increase campaign performance
  • Maximize return on ad spend (ROAS)

Distributed computing and built-in libraries empower advertisers to extract valuable insights from data, make data-driven decisions, and continuously refine their RTB strategies for better results in the dynamic digital advertising landscape.

The Databricks RTB Accelerator has an open architecture that allows RTB firms to quickly develop a viewability prediction solution, providing a clearer understanding of the ad market. This enables them to make well-informed decisions on ad placements, optimize performance, and gain a competitive edge.

With the Databricks RTB Accelerator, publishers can gain improved control over their inventory and the cost per thousand impressions (CPM). In addition, advertisers using the Databricks RTB Accelerator can run more effective campaigns by bidding only on impressions that are more likely to be viewed by a given user, which maximizes the impact of their ads.
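The idea of bidding only on impressions that are likely to be viewed can be sketched as a simple threshold rule. This is an illustration under stated assumptions, not the accelerator's actual logic: the `BidOpportunity` type, the `decide_bid` function, and the 0.6 cutoff are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidOpportunity:
    request_id: str
    predicted_viewability: float  # model output, a probability in [0, 1]
    base_bid_cpm: float           # price the campaign is willing to pay


def decide_bid(opp: BidOpportunity, min_viewability: float = 0.6) -> Optional[float]:
    """Return the CPM to bid, or None to skip the impression.

    Assumption: a fixed viewability cutoff; a real system might instead
    scale the bid by the predicted probability.
    """
    if opp.predicted_viewability < min_viewability:
        return None
    return opp.base_bid_cpm


likely_seen = decide_bid(BidOpportunity("req-1", 0.82, 2.50))
likely_hidden = decide_bid(BidOpportunity("req-2", 0.31, 2.50))
```

Here the first opportunity receives a bid at the base price, while the second is skipped because its predicted viewability falls below the cutoff.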

Versatile data architecture streamlines data ingestion

The Databricks Lakehouse is a versatile platform, utilizing AWS services, combining data lake and data warehousing architecture with governance features. It is built on a secure open-source data lake foundation. It seamlessly integrates with Databricks’ RTB Accelerator.

Databricks Lakehouse provides a solution for demand-side platforms (DSPs) that have massive volumes of streaming data and want to automate data ingestion and simplify nested JSON files for their teams. The streaming data, along with a set of JSON files, is sent to a Databricks data loader. This process uses a Spark Streaming dataset to define Delta Lake tables and stores the data in silver tables. These silver tables are combined to create a gold table that supports change data capture (CDC).
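To show what simplifying nested JSON means in practice, the following dependency-free sketch flattens a nested bid request into a single-level record, the kind of shape that maps onto table columns. On Databricks this step would be done with Spark rather than plain Python; the field names in the sample record are hypothetical and simplified, and `flatten` is an illustrative helper, not part of the accelerator.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dictionaries into one level.

    Nested keys are joined with `sep`, e.g. {"device": {"os": ...}} -> "device_os".
    Lists and scalar values are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical, simplified bid request payload.
raw = (
    '{"id": "req-1",'
    ' "device": {"os": "iOS", "geo": {"country": "US"}},'
    ' "imp": {"banner": {"w": 300, "h": 250}}}'
)
flat_request = flatten(json.loads(raw))
```

The flattened record has the keys `id`, `device_os`, `device_geo_country`, `imp_banner_w`, and `imp_banner_h`.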

Databricks Lakehouse combined with Databricks RTB Accelerator showcases the potential of data lake architecture as it streamlines data ingestion, analysis, and prediction, enabling advertisers to optimize media investments. Following is a description of the solution:

  • Data pipeline: The Databricks Delta Live Tables (DLT) feature simplifies data ingestion and automates scaling and fault tolerance. This ensures a seamless data pipeline that provides advertisers with real-time insights and predictions.
  • Three key steps:

1. It streams real-time bidding data.
2. Then, it parses nested JSON data.
3. Finally, it performs exploratory data analysis using Databricks SQL.

  • ML model creation: The second part of the solution includes the creation of a robust ML model using XGBoost. The model predicts viewability and can be deployed either through batch processing or streaming using standard Databricks workflow jobs. Alternatively, it can be deployed as a real-time REST API.
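The viewability model described above can be illustrated with a small, dependency-free sketch. This is not XGBoost and not the accelerator's code: it is a simplified stand-in for the gradient-boosting idea behind it, using one-split decision stumps fitted to log-loss residuals. The two features (an above-the-fold flag and ad height in pixels), the toy data, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(rows: List[List[float]], residuals: List[float]) -> Optional[Stump]:
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[j] <= t]
            right = [res for r, res in zip(rows, residuals) if r[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((x - lm) ** 2 for x in left) + sum((x - rm) ** 2 for x in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return None if best is None else best[1:]


def train(rows: List[List[float]], labels: List[int], rounds: int = 20, lr: float = 0.5) -> List[Stump]:
    """Boost stumps on the log-loss gradient (label minus predicted probability)."""
    stumps: List[Stump] = []
    scores = [0.0] * len(rows)
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append((j, t, lr * lm, lr * rm))
        scores = [s + (lr * lm if r[j] <= t else lr * rm) for s, r in zip(scores, rows)]
    return stumps


def predict_viewability(stumps: List[Stump], row: List[float]) -> float:
    """Return the predicted probability that the ad is viewed."""
    return sigmoid(sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps))


# Toy data: [above_the_fold (0/1), ad_height_px]; label 1 means the ad was viewed.
rows = [[1, 250], [1, 90], [1, 600], [0, 250], [0, 90], [0, 600]]
labels = [1, 1, 1, 0, 0, 0]
model = train(rows, labels)
```

Calling `predict_viewability(model, [1, 300])` scores a new impression; the same function could be applied over a batch of records or wrapped behind a REST endpoint, matching the deployment options listed above.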

Figure 1 – Databricks Lakehouse for real-time bidding firms, powered by AWS

This video recording provides a comprehensive overview of Databricks Lakehouse architecture for the common workflow used by real-time bidding firms, such as DSPs and agency trading teams. The presentation focuses on key decision points and discusses how to handle them using the AWS Cloud with Databricks. It covers the end-to-end process: data sources and ingestion, storage, transformation, machine learning and artificial intelligence processes, model and data serving, and data publishing.

Applicable to multiple large-volume use cases

It’s important to note that predicting viewability is just one of the many use cases for Databricks Lakehouse with the Databricks RTB Accelerator in the programmatic advertising industry. This solution combination offers significant benefits for other use cases that rely on data pipelines for real-time processing of large amounts of data. These use cases also involve the utilization of ML algorithms to extract insights and predictions from the processed data, including:

  • Campaign performance reporting
  • Anomaly detection
  • Bid price level model
  • Click-through rate detection


Real-time bidding (RTB) has reshaped the digital advertising landscape, offering efficiency and data-driven precision, but also posing challenges such as transparency issues and ad fraud risks. Databricks’ Real-time Bidding Accelerator, coupled with Databricks Lakehouse integration running on AWS, presents a powerful solution to optimize RTB strategies and improve ad targeting in real-time. Beyond predicting viewability, this platform’s versatility lends itself to a wide range of data processing and analytics needs, cementing its position as a valuable tool for the ever-evolving programmatic advertising industry.

Check out more AWS Partners or contact an AWS Representative to ask how we can help accelerate your business.

About Databricks

Databricks is a Data and AI company. With origins in academia and the open-source community, Databricks was founded in 2013 by the original creators of Apache Spark, Delta Lake and MLflow. As the world’s first and only Lakehouse platform in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI.

Hari Radhakrishnan

Hari Radhakrishnan is a Senior Partner Solutions Architect in advertising and marketing technology at Amazon Web Services (AWS). Leveraging his industry experience, he is responsible for solution architecture and technical strategy for advertising and marketing technology partners on AWS. He recruits partners and helps them develop solutions on AWS.

Layla Yang

My name is Layla Yang. I am a Solutions Architect at Databricks. Before Databricks, I started my career in the AdTech industry, focusing on building machine learning models and data products. I spent a few years at AdTech startups designing, building, and deploying automated predictive algorithms into production for real-time bidding (RTB), integrated with major ad exchanges and SSPs. My work also included MMM (media mix modeling), DMP user segmentation, and customer recommendation engines. Currently I work with startups in the NYC and Boston area to scale their existing data engineering and data science efforts using Apache Spark technology. I studied physics at university and I love skiing.

Nicole Lu

Nicole Lu is a Data and ML engineer who works as the technical program lead for Databricks Industry Solutions and maintains its GitHub. Initially working as a management consultant, she grew fascinated by the ‘What’ and ‘How’ of technological innovation beyond the ‘Why’, and dived deep into technical domains. She worked at Accenture and Databricks professional services, solving diverse real-world data analytics use cases. In her current role, she runs an industry solutions publishing house that brings repeatable use case patterns on Databricks to the public to accelerate time-to-value.

Sarah Zuhusky

Sarah Zuhusky is Head of Adtech Solutions for Advertising & Marketing at Amazon Web Services (AWS). She is focused on developing ad intelligence solutions to help AWS customers further innovate and differentiate their addressability offerings. Sarah has 10+ years of diverse experience in the advertising industry. She spearheaded new media technologies for L'Oreal, Delta, Mars, and American Express at Digitas, managed global inventory and data supply for P&G at AudienceScience, and ran the business development and go-to-market efforts for Integral Ad Science's Connected TV product suite.