Modernizing Big Data Processing Using Amazon EMR with Yahoo Advertising
Discover how Yahoo Advertising is using Amazon EMR to process ad analytics at scale, accelerating its modernization journey.
Scales ad analytics pipelines efficiently
Accelerates upgrades and updates, completing them in less than 1 hour
Empowers innovation
Increases team efficiency
Reduces operational burdens
Overview
Imagine a vast library filled with an endless number of books, each page brimming with information. Now, think of the work that it takes to not only find a specific piece of information in this library but also make sense of it and use it wisely. This is what big data processing entails: sifting through large databases of digital information to find and deploy valuable insights.
Yahoo Advertising faced a similar challenge. To perform complex ad analytics, the organization needed to manage, process, and derive insights from data that connects hundreds of millions of global users. To scale its big data processing capabilities, Yahoo Advertising modernized its applications by migrating its analytics pipelines to Amazon Web Services (AWS). This strategic move has opened up new avenues for innovation, efficiency, and enhanced decision-making.

Opportunity | Using Amazon EMR to Scale Big Data Processing Capabilities for Yahoo Advertising
Yahoo Advertising is the advertising division of Yahoo, a trusted guide for hundreds of millions of people globally. Founded in 1994, Yahoo is well known as one of the pioneers of the early internet era and has since grown to offer a range of products and services, including digital advertising solutions. Renowned for its work in big data, the company also created Apache Hadoop, an open-source framework used to process large datasets and perform complex computations.
In 2022, Yahoo Advertising chose AWS as its preferred cloud provider, marking the start of a major digital transformation. Yahoo Advertising plans to migrate all its advertising technology workloads—including its analytics, identity solutions, and media-buying environments—to AWS over the next few years. For big data processing, the company chose to migrate its analytics pipelines from on-premises infrastructure to Amazon EMR, a cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning.
Yahoo Advertising realized that Amazon EMR would let it scale its big data processing capabilities to manage larger amounts of data. Without the limitations of on-premises hardware, adopting the service would also facilitate innovation and the exploration of new use cases.
“Because we have multiple teams running on AWS, we have already used Amazon EMR even before the digital transformation,” says Zilvinas Saltys, senior principal architect at Yahoo. “We knew that this technology was sensible, it worked for us, and it was cost-effective.”

Solution | Modernizing and Optimizing the Contextual Targeting Environment Using AWS Services
The contextual targeting team at Yahoo Advertising was one of the first to complete its migration. This unit is responsible for verifying that advertisers deliver ads to the right audience, at the right time. “Our job is to identify the websites that are suitable for displaying ads and to build data pipelines that gather and compile information about these websites for our advertising network,” says Saltys. “This process involves handling large amounts of data. We receive samples of this data, not the entirety, which can amount to hundreds of gigabytes every 5 minutes. This data is crucial for our contextual targeting pipelines.”
Instead of opting for a lift-and-shift migration, the contextual targeting team completely optimized and modernized its environment on AWS. It uses Amazon EMR to run various types of jobs based on open-source frameworks, such as Apache Flink and Apache Spark. To schedule Amazon EMR jobs and facilitate cluster creation, the team adopted Amazon Managed Workflows for Apache Airflow (Amazon MWAA), a secure and highly available managed workflow orchestration service for Apache Airflow.
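As a rough illustration of the kind of request that creates a short-lived Amazon EMR cluster with Spark and Flink installed and runs one Spark job on it, the Python sketch below builds the keyword arguments accepted by the boto3 EMR client's run_job_flow call. It is not Yahoo Advertising's actual configuration: the cluster name, release label, instance types and counts, default IAM role names, Region, and S3 script path are all assumptions chosen for illustration. In an Amazon MWAA DAG, a dictionary of this shape could be passed to the Amazon provider's EmrCreateJobFlowOperator as job_flow_overrides.

```python
def build_emr_job_flow(name, script_uri, release_label="emr-6.15.0", core_nodes=2):
    """Build keyword arguments for the boto3 EMR client's run_job_flow().

    Every value here is an illustrative assumption, not a setting taken
    from Yahoo Advertising's environment.
    """
    if core_nodes < 1:
        raise ValueError("core_nodes must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Open-source frameworks to install on the cluster.
        "Applications": [{"Name": "Spark"}, {"Name": "Flink"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_nodes,
                },
            ],
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
                },
            }
        ],
        # Default EMR role names; a real account may use different roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(params, region="us-east-1"):
    """Submit the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**params)["JobFlowId"]
```

Keeping the request builder separate from the call that submits it means the configuration can be reviewed or unit tested without touching an AWS account. A call such as launch(build_emr_job_flow("contextual-demo", "s3://example-bucket/jobs/job.py")) would then start the cluster, where the bucket and script names are hypothetical.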
The contextual targeting team stores its data on Amazon Simple Storage Service (Amazon S3), an object storage service built to retrieve virtually any amount of data from anywhere. All inputs and outputs from the pipelines are also directed toward Amazon S3. To query and interface with this data, the team relies on Amazon Athena, a serverless, interactive analytics service, and AWS Glue, a serverless data integration service—effectively replacing its on-premises Apache Hive database.
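To make the querying pattern concrete, the Python sketch below prepares an Amazon Athena query against a table registered in the AWS Glue Data Catalog, with results written back to Amazon S3. It uses the boto3 Athena client's start_query_execution call. The database name, table name, result bucket, and Region are hypothetical and do not come from Yahoo Advertising's environment.

```python
def build_athena_query(database, table, output_uri, limit=10):
    """Build keyword arguments for the boto3 Athena client's
    start_query_execution().

    The database and table are assumed to exist in the AWS Glue Data
    Catalog; all names passed in are illustrative.
    """
    if not output_uri.startswith("s3://"):
        raise ValueError("output_uri must be an s3:// location")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        # Athena SQL quotes identifiers with double quotes.
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_uri},
    }


def run_query(params, region="us-east-1"):
    """Start the query. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    athena = boto3.client("athena", region_name=region)
    return athena.start_query_execution(**params)["QueryExecutionId"]
```

Because Athena is serverless, the caller only supplies the SQL, the catalog database, and an output location. There is no cluster to provision for ad hoc queries, which is one reason this pairing can stand in for a self-managed Hive deployment.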
By migrating to AWS, the contextual targeting team unlocked several benefits. Team members now have access to modern frameworks and recent versions of their software on AWS. Upgrades and updates, which previously might have been lengthy and complex processes, can now be completed in about 1 hour.
The migration has also contributed to a more dynamic work environment. On AWS, team members no longer need to manage or configure hardware infrastructure. This migration empowers them to work more independently, manage their own product development, and decide on their required capacity and software versions—driving innovation and efficiency.
“It’s an absolutely different dynamic of how our data pipelines work,” says Saltys. “On AWS, we’re able to work more independently and effectively without centralized teams, empowering us to upgrade, scale, and resolve issues more quickly.”
Outcome | Continuing to Migrate Data to Benefit Millions of Users
As Yahoo Advertising finalizes the migration of its advertising analytics pipelines to AWS, the impact of this modernization will be significant. More teams will continue to incorporate Amazon EMR and other technologies into their workflows, which will set the tone for how the organization processes, visualizes, and works with big data as a whole.
“Migrating to AWS has introduced new technologies and new ways of working, and it has made existing processes simpler and more visible,” says Adesh D’Silva, senior software engineer at Yahoo. “This migration has been of great benefit to our team.”
About Yahoo
Yahoo serves as a trusted guide for hundreds of millions of people globally, helping them achieve their goals online. For advertisers, Yahoo Advertising offers omnichannel solutions and powerful data to engage with brands and deliver results.
AWS Services Used
Amazon EMR
Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.
Amazon Managed Workflows for Apache Airflow
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) orchestrates your workflows using Directed Acyclic Graphs (DAGs) written in Python.
AWS Glue
AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.
Amazon Athena
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.