AWS for M&E Blog

Contextualized viewer engagement and monetization for live OTT events

Popular live stream events often command higher viewer attention due to the thrill, uncertainty, freshness, and unknowns of the content. On the other hand, in the over-the-top (OTT) world, last-mile network latency and the media player buffer often push a viewer at least a few seconds behind real time. Together, this elevated viewer attention and last-mile latency can be leveraged to deliver contextualized and hyper-targeted engagement, teasers, and surprises, which can lead to augmented monetization opportunities or a more effective viewer experience. Depending on the context of the live content and the viewer profile, different personalized and contextualized engagement instruments may be pushed to the viewer interface in near real time. Such instruments include trivia, coupon offers, polls, contests, or personalized advertisements with a call to action.

The joy and impulse associated with the special moments of a live event are short-lived from a viewer’s perspective. It is important to meet such short moments of truth with the best possible personalized gratification. In the longer term, this also leads to reduced churn, higher brand value, and augmented revenue streams for the content service provider.

This post explores the typical challenges associated with implementing a fitting solution to this problem. It further provides a solution framework that can be adopted for such opportunities.

Barrier to entry

Even when market research predicts such a solution to be successful and executive sponsorship is in place, there are still potential roadblocks for an enterprise. Some of them are:

  • Availability of in-house expertise or infrastructure for delivering machine learning use cases. Beyond the accuracy of machine learning models, scalability, reliability, and low latency are critical architectural concerns for such use cases.
  • Confidence and comfort of the team in delivering scalable, reliable, secure, and highly performant event-driven solutions.
  • Agility to deliver a solution with a short time to market.
  • Cost justification and optimization, given that such business models are often experimental and need several iterations to evolve successfully.

This post explains how a cloud-native architecture can effectively address most of these potential entry barriers.

Determining the engagement mechanism

Once a request is received with the viewer identity and current media player position, it is possible to look up what they are just about to see (given last-mile network latency) or what they have already seen in the last few seconds, within the short time range of the moment of truth. The next piece of the puzzle is which instrument is best delivered to the viewer at this juncture. It can be a hyper-targeted advertisement, an engagement widget, a marketing coupon, or any other innovative method of personal gratification. From the implementation perspective, this needs a rules engine, an effective segmentation algorithm, or a machine learning classifier that can provide such decisions with a business-acceptable level of accuracy and with the needed reliability, scalability, and latency. Subsequent posts in this series will outline how such effective mechanisms can be set up on AWS in an agile, scalable, and cost-effective fashion.
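
As a minimal sketch of the lookup and rules-engine idea described above, the following Python example assumes that content analysis results are already available as time-stamped segments. The names `SegmentContext`, `context_window`, `choose_instrument`, the label strings, the cluster names, and the lookback and lookahead values are all hypothetical and chosen only for illustration; they are not part of any AWS service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentContext:
    """Analysis result for one stream segment (hypothetical shape)."""
    start: float          # segment start, in seconds from stream start
    end: float            # segment end, in seconds from stream start
    labels: frozenset     # labels detected in the segment, e.g. {"goal"}


def context_window(segments, player_position, lookback=6.0, lookahead=4.0):
    """Return (seen, upcoming) label sets around the player position.

    A segment that contains the player position is partially seen and
    partially upcoming, so its labels are reported in both sets.
    """
    seen, upcoming = set(), set()
    for seg in segments:
        # Overlaps the interval [position - lookback, position).
        if seg.start < player_position and seg.end > player_position - lookback:
            seen |= seg.labels
        # Overlaps the interval [position, position + lookahead).
        if seg.start < player_position + lookahead and seg.end > player_position:
            upcoming |= seg.labels
    return seen, upcoming


# Illustrative business rules: (predicate, instrument), checked in order.
RULES = [
    (lambda cluster, labels: cluster == "home_fan" and "goal" in labels, "coupon"),
    (lambda cluster, labels: "sponsor_logo" in labels, "targeted_ad"),
]


def choose_instrument(cluster, labels, default="trivia"):
    """Return the instrument of the first matching rule, else the default."""
    for predicate, instrument in RULES:
        if predicate(cluster, labels):
            return instrument
    return default
```

In a mature implementation, the ordered rule list could be replaced by a classifier with the same inputs (viewer cluster and context labels) and the same output (an instrument name).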

Core solution building blocks

Let us now illustrate the core solution building blocks, which are necessary to deliver this use case of contextualized viewer engagement for live OTT events.

  1. Live stream source: The live stream is ingested by AWS Elemental MediaLive, packaged by AWS Elemental MediaPackage, and delivered through Amazon CloudFront, which provides the endpoint for viewer consumption. The output HLS stream is used both for viewer consumption and for gaining context and insights about the live event.
  2. Live stream analyzer: This is a Python script hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance, which continuously reads the manifest file of the live stream. For live events, transport stream (TS) segments get incrementally added to the manifest as the event progresses. The Python script needs to run in a loop throughout the duration of the event and poll for such incremental TS segments. Once a new TS file is detected, the Python script posts its fully qualified URL to a preconfigured Amazon Simple Notification Service (Amazon SNS) topic for further downstream analysis. It is recommended to launch this instance in the same Region as MediaPackage (if used) or, otherwise, close to the nearest edge cache to minimize latency. Moreover, hosting the EC2 instances in an Auto Scaling group across multiple Availability Zones would deliver the required performance and availability.
  3. Content analyzer: Once a new TS file URL is received from the Amazon SNS topic (referred to in the previous step), AWS Lambda functions are triggered to analyze the content within the TS segment. Content analysis tasks include detecting objects of interest, for example a sports jersey, celebrities, facial expressions, a sponsor logo, text, and the audio transcript. AWS services such as Amazon Rekognition, Amazon Transcribe, or Amazon Comprehend provide a ready interface for such analysis. A fanout pattern with Amazon Simple Queue Service (Amazon SQS) is used to route the request to a separate Lambda function for each such aspect of content analysis, with the results being persisted in Amazon DynamoDB. Amazon DynamoDB Accelerator (DAX) is used to ensure that viewer read requests are served with low latency. In most scenarios, it is advisable to process one frame every two seconds, minimizing the cost and use of computational resources. Moreover, it is a business choice whether to analyze the content holistically or only a subset that is enough for viewer enticement and gratification strategies.
  4. Contextual personalizer: Given a viewer's media player position, it can be determined what they have seen in the last few seconds. It can also be determined what they will see in the next few seconds (assuming last-mile network latency or player buffer). Similarly, the viewer identifier is queried to find the best matched demographic or behavioral cluster. These two inputs are used to determine which contextual personalizer instrument is best delivered to the viewer. A periodically refreshed machine learning clustering model is required to keep the cohorts accurate and relevant. Moreover, with business inputs, there can be a rules engine to choose the personalized instrument. For example, for demographic cluster A and behavior cluster B, with their favorite team winning in a soccer match, what is best presented to the viewer? The choice can be between an advertisement, a coupon, or a trivia question with a reward. In mature implementations, this functionality can be replaced with a machine learning classifier that predicts the most appropriate instrument for delivery to the viewer.
  5. Personalizer instrument filter: It is important to keep the interference and clutter level to a strategic minimum for a viewer engrossed in their favorite live event. So, among multiple choices received from the contextual personalizer module, it is only pragmatic to deliver the best one (or two) so that they blend smoothly into the viewing experience. Amazon Personalize is leveraged to rank such contextual personalizer instruments based on the highest predicted relevance for a given viewer.
  6. Campaign management interface: While not strictly within this solution context, it is imperative to interface with a campaign management tool to manage the overarching campaign cycles in terms of their goals, objectives, flights, spends, engagement tools, target segments, etc.
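
To make the live stream analyzer step more concrete, here is a minimal Python sketch of the manifest polling logic. It is only an illustration under stated assumptions: the function names (`segment_urls`, `new_segments`, `poll_once`) are hypothetical, and the `fetch` and `publish` callables are injected so the sketch stays self-contained. In a real deployment, `fetch` might download the manifest over HTTPS and `publish` might wrap a call to the Amazon SNS `publish` API with the topic ARN and the segment URL as the message.

```python
from urllib.parse import urljoin


def segment_urls(manifest_text, manifest_url):
    """Extract fully qualified segment URLs from an HLS media playlist."""
    urls = []
    for line in manifest_text.splitlines():
        line = line.strip()
        # In an HLS playlist, lines starting with '#' are tags or comments;
        # every other non-blank line is a URI (here, a TS segment).
        if line and not line.startswith("#"):
            # Relative URIs are resolved against the manifest URL.
            urls.append(urljoin(manifest_url, line))
    return urls


def new_segments(manifest_text, manifest_url, seen):
    """Return segment URLs not seen before and record them in `seen`."""
    fresh = []
    for url in segment_urls(manifest_text, manifest_url):
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


def poll_once(fetch, publish, manifest_url, seen):
    """Fetch the manifest once and publish each newly added segment URL.

    `fetch(url)` returns the manifest text and `publish(url)` forwards a
    segment URL downstream; both are assumptions supplied by the caller.
    """
    fresh = new_segments(fetch(manifest_url), manifest_url, seen)
    for url in fresh:
        publish(url)
    return fresh
```

The script would call `poll_once` in a loop for the duration of the event, sleeping between iterations for roughly one segment duration. Note that the `seen` set grows for as long as the event runs; a long-running deployment may want to prune old entries.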

The following diagram illustrates how these building blocks integrate to deliver this solution:

Adopting machine learning for viewer engagement

While it is imperative to incorporate machine learning into this solution landscape, it does come with associated challenges, as outlined below:

  • One-size-fits-all style: Depending on the content, demographics, behavioral diversity, and other influencers, a single machine learning model may not be able to deliver results under all scenarios. For example, the expectations and engagement of young adults may differ across continents while watching the same soccer match.
  • Cost of data: There are significant costs associated with diverse and large-scale data ingestion, analysis, and delivering a full machine learning pipeline. Cost-benefit analysis is essential.
  • Target group volatility: Viewer behaviors are often labile, with changing interests driven by socio-economic factors or social media influences.
  • True viewer intention: The viewer behavior as captured from the application clickstream may not be fully indicative of their true underlying intent.
  • Operational cost: It may not be cost-effective to invoke an ML-based engagement service for every viewer. It may be more efficient to dynamically tag viewers to suitable demographic, psychographic, or behavioral clusters, thereby reducing the operational cost.
  • Window of opportunity: How best can the span of viewer elation and impulse, in response to a live event, be measured?
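
The operational cost point above can be illustrated with a small Python sketch: rather than invoking the engagement service once per viewer, a decision is computed once per (cluster, context) pair and reused for a short time. The class name `ClusterDecisionCache`, the `decide` callable, and the 30-second time-to-live are hypothetical choices made for illustration, not a prescribed design.

```python
import time


class ClusterDecisionCache:
    """Reuse one engagement decision per (cluster, context) for a short TTL."""

    def __init__(self, decide, ttl_seconds=30.0, clock=time.monotonic):
        self._decide = decide        # expensive call, e.g. an ML-based service
        self._ttl = ttl_seconds      # how long a decision stays valid
        self._clock = clock          # injectable clock, useful for testing
        self._entries = {}           # (cluster_id, context_key) -> (time, decision)

    def get(self, cluster_id, context_key):
        key = (cluster_id, context_key)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: no service invocation
        decision = self._decide(cluster_id, context_key)
        self._entries[key] = (now, decision)
        return decision
```

With this approach, the number of service invocations scales with the number of clusters and live-event contexts rather than with the number of concurrent viewers. The time-to-live should be short enough to stay within the window of viewer elation.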

As summarized in the following diagram, here are some tenets to be further considered while designing a solution for such use cases.

  • Today’s media ecosystem is diverse with volatile consumer behavior. In order to capture the right predictor variable sets, ingest data from diverse sources, such as real-time content metadata, event characteristics, clickstream, social media, etc.
  • Evolve to the right ML model mix, such as classifiers, segmentation, or real-time stochastic models such as Markov chains or Bayesian networks. Often, a viewer's behavior may not reflect their history and may be better predicted by state transition probabilities.
  • Provide the viewer with a minimalistic but elegant user experience. A ranking mechanism is necessary to decide which engagement instruments are best presented to the viewer in a given context.
  • Perform diligent data analysis, dimensionality reduction, and domain-driven feature engineering for optimized model training.
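
As an illustration of the state transition idea mentioned above, the following Python sketch estimates first-order Markov chain transition probabilities from observed sequences of viewer states. The state names in the usage example are made up, and the function names `transition_probabilities` and `most_likely_next` are hypothetical; a production model would also need smoothing and far more data.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        probs[state] = {nxt: n / total for nxt, n in nexts.items()}
    return probs


def most_likely_next(probs, state):
    """Return the most probable next state, or None if `state` was never seen."""
    nexts = probs.get(state)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)
```

For example, given the sessions `["watch", "poll", "watch"]`, `["watch", "poll", "coupon"]`, and `["watch", "coupon"]`, the estimated probability of moving from `watch` to `poll` is 2/3, so `most_likely_next(probs, "watch")` returns `"poll"`.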


This post outlined how, in a world of perpetually fragmented viewer attention spans, it is possible to capture viewer attention and interest during live events. The elevated level of attention and interest associated with live events, along with frequent last-mile latency, gives a short but effective window to win over viewers through contextual and personalized engagement. In upcoming posts, we will dive deep into specific implementation details, quick start guides, and mechanisms to effectively select the personalized gratification choice for a given set of contextual inputs.