AWS Partner Network (APN) Blog

PBS Provides Tailored Experiences for Viewers with Amazon Personalize

Public Broadcasting Service (PBS) wanted to build a smart recommendation engine (SRE) capable of making high-quality suggestions to viewers based on a multitude of factors.

To ensure success, PBS decided to partner with a cloud consultancy with artificial intelligence (AI) and machine learning (ML) expertise and deep knowledge of the Amazon Web Services (AWS) platform.

ClearScale, an AWS Premier Tier Services Partner with 11 AWS Competencies including Machine Learning, Nonprofit, and Data and Analytics, was an excellent match for PBS and gave the nonprofit exactly what it needed to enhance viewer experiences significantly in the streaming age.

“We worked with ClearScale to set up and configure our initial solutions and data pipelines,” said Mikey Centrella, Director of Product Management at PBS. “We needed to leverage insights faster and launch something in months rather than years. Their experts set up an AWS Cloud configuration and related services for using Amazon Personalize to save us a tremendous amount of effort and thousands of engineering hours.”

PBS is an Arlington, Virginia-based nonprofit organization founded in 1969 that broadcasts educational, news, and entertainment programs to more than 100 million television viewers across the U.S. and more than 32 million people online. PBS currently has approximately 330 member television stations, which distribute high-quality content to all 50 U.S. states, Puerto Rico, the U.S. Virgin Islands, Guam, and American Samoa.

The Challenge

Like many of today’s leading media and streaming platforms, PBS wanted to take its overall user experience to the next level. The organization hoped to provide audiences with better in-app programming recommendations based on a multitude of factors—deep links between titles, current popularity trends, user behavioral patterns, and more—to improve engagement and long-term loyalty.

On the surface, creating such a recommendation engine seems complex. Yet, the reality is that building these engines doesn’t require data science expertise or AI/ML mastery. Companies only need to find the right combination of cloud-native tools and services, and then feed them with their data. With the right toolkit, these services don’t take years to develop.

Fortunately, AWS offers managed AI/ML solutions that enable engineers to leverage pre-built models and automate much of the hard work of creating, training, and fine-tuning them. The challenge lies in knowing how to maximize what the cloud offers, especially given how quickly things change.

That’s why PBS approached ClearScale, a leader in MLOps, the type of technical expertise PBS needed to build the ideal recommendation system and sustain it over time. Together, PBS and ClearScale decided to move forward with an AWS-powered solution on top of Amazon Personalize.

Figure 1 – Main architecture diagram.

For PBS to build a truly differentiated recommendation system, the company needed the latest and greatest cloud technologies available, on top of expert implementation guidance.

ClearScale came up with a detailed roadmap for tackling PBS’s recommendation system project that covered data operations, machine learning operations, and a demonstration user interface.

Data Operations

First, ClearScale and PBS determined together which data sources would feed into future ML models:

  • PBS’ Media Manager
  • PBS’ User Profiles
  • Google Analytics metadata

PBS Media Manager is a content management system that PBS member stations use to publish and share titles across different platforms. Media Manager also contains rich metadata, such as a product’s release date, tags, and author, and comes with rules that contribute to deciding what gets shown to viewers in search results.

For instance, Media Manager takes a viewer’s age or location into account before making a recommendation. That way, young children don’t accidentally come across titles for older audiences, or viewers in a region aren’t recommended a news series from another location on the other side of the country.

PBS User Profiles contain valuable details about individual viewers, such as their previous interactions with PBS apps, their watchlists, watch times, and viewing history. Therefore, User Profiles contain some of the most obvious evidence of what people enjoy watching.

ClearScale and PBS also decided to incorporate contextual information from Google Analytics to gain a more comprehensive understanding of who watches PBS content and where. Google Analytics has non-sensitive data about people that can be useful in making inferences about their viewing preferences.

The platform can also see what types of devices people use to watch content, which serves as another data point for a recommendation system to consider. For example, a viewer might watch PBS news on their phone during a train commute to work. But, once at home, they may watch shows on TV with their kids.

To consolidate data from the first two sources, ClearScale set up a prototype environment for an Amazon Aurora for PostgreSQL relational database. The database existed in full isolation from PBS production systems to ensure maximum resiliency for extract, transform, load (ETL) processes. Google Analytics data was captured via an ingestion pipeline and stored in Amazon Simple Storage Service (Amazon S3).

ClearScale then implemented a data pipeline starting with AWS Glue, a serverless cloud-native service that crawls, validates, and transforms data from diverse sources. ClearScale also configured AWS Glue to make data consumable by formatting it as Parquet and offloading it into a data lake. These steps are all orchestrated using AWS Step Functions, allowing PBS to benefit from automated state management and exception handling.
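As a rough illustration of that orchestration, the sketch below builds an Amazon States Language definition that runs two Glue jobs in sequence, retries failures, and routes unrecoverable errors to a single failure state. The job names, state names, and retry settings are assumptions made for the example; the source does not describe the actual workflow.

```python
import json
from typing import Optional

# Hypothetical Glue job names; the real ones are not given in the source.
VALIDATE_JOB = "example-validate-sources"
TRANSFORM_JOB = "example-transform-to-parquet"


def glue_task(job_name: str, next_state: Optional[str]) -> dict:
    """Build one task state that starts a Glue job and waits for it to finish."""
    state = {
        "Type": "Task",
        # Step Functions service integration that runs a Glue job synchronously.
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        # Retry failed runs with exponential backoff (values are illustrative).
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 30,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }],
        # Any error that survives the retries goes to one failure state.
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "PipelineFailed"}],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


definition = {
    "Comment": "Illustrative ETL workflow: validate sources, then write Parquet",
    "StartAt": "ValidateSources",
    "States": {
        "ValidateSources": glue_task(VALIDATE_JOB, "TransformToParquet"),
        "TransformToParquet": glue_task(TRANSFORM_JOB, None),
        "PipelineFailed": {
            "Type": "Fail",
            "Error": "EtlFailed",
            "Cause": "A Glue job failed after all retries",
        },
    },
}

# The JSON text is what would be supplied as the state machine definition.
definition_json = json.dumps(definition, indent=2)
```

Keeping retries and error routing in the workflow definition, rather than inside each job, is what lets the orchestrator handle state and exceptions on the pipeline’s behalf.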

AWS Lake Formation and AWS Glue Data Catalog were crucial for securing PBS’s data lake and pointing other cloud services to the right data stores. Data in the lake can be accessed in two ways, both using standard SQL:

  • Serverless analytics with Amazon Athena, which is best for ad-hoc exploration tasks when cost is the most crucial factor.
  • A robust data warehouse on top of Amazon Redshift, which is best for regular, well-defined queries with strict SLA requirements.
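
For the ad-hoc path, a query could be prepared along the following lines. This is only a sketch: the table, column, database, and bucket names are hypothetical, and the function merely builds the keyword arguments that boto3’s Athena `start_query_execution` call accepts, without contacting AWS.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Return keyword arguments shaped for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table and columns: views per device type over the last 7 days.
AD_HOC_SQL = """
SELECT device_category, COUNT(*) AS views
FROM interactions
WHERE event_date >= date_add('day', -7, current_date)
GROUP BY device_category
ORDER BY views DESC
""".strip()

request = athena_query_request(
    AD_HOC_SQL,
    database="example_data_lake",                    # assumed Glue Data Catalog database
    output_location="s3://example-athena-results/",  # assumed results bucket
)

# With AWS credentials configured, the request could be submitted with:
#   boto3.client("athena").start_query_execution(**request)
```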

With the infrastructure for data operations in place, ClearScale was ready to tackle the MLOps side of the project.

Machine Learning Operations

ClearScale helped PBS establish the four primary stages of the ML lifecycle:

  • Model development
  • Training
  • Inference
  • Evaluation

Fortunately, AWS gives companies the ability to harness the power of data science and machine learning across these four stages without having to build models from the ground up.

ClearScale data engineers created the initial version of the smart recommendation engine based on Amazon Personalize, while keeping in mind that PBS engineers would eventually take full ownership. ClearScale used Amazon FSx for Lustre to make data available for the system as it’s loaded. The team also integrated Amazon SageMaker Studio as the development environment ML engineers use to maintain models.

At the center of the model pre-production work are AWS Lambda, Amazon Athena, and AWS Step Functions. ClearScale connected them with Amazon Personalize to fetch data, load changes, and train the model.

With these services in place, ClearScale selected the core recipes (which are Amazon Personalize algorithms fine-tuned for specific use cases) for PBS’s smart recommendation engine and built four models, each addressing a different set of requirements for recommendation inputs and outputs:

  • Popularity Count ML model: Suggests TV shows based on mainstream popularity. This is the simplest model in scope, yet it’s important. Because other models dive deep into past data, they suggest programs relevant to the user yet distributed throughout history.
    In the media and entertainment industry, where the goal is to promote recent titles, this model helps others not go too deep into the weeds. By limiting the range of data taken into account to the previous week, it’s possible to identify recent trends and augment them with predictions from other models. To keep those trends fresh, this model is retrained daily.
  • Items Relationships ML model: Uses collaborative filtering to recommend programs that are most similar to ones the viewer interacted with before. This recipe (SIMS) digs deeper to reveal relationships between shows, including ones that are not evident at first glance to humans or to traditional linear and statistical algorithms.
  • Interactions History ML model: Suggests TV shows based on user behavioral patterns using active learning. With active learning, the model is supplied with user activities in the same session where recommendations are provided. This allows it to discover new rules in seconds without going through complete retraining, which would take hours.
  • Personalized Ranking ML model: Ranks TV shows based on apparent user preferences. Instead of fetching particular items, this algorithm takes ones supplied by PBS (“Best Christmas Shows” digest, for example) and returns them in an order reflecting user preferences.

Machine Learning Models Comparison

Criteria       | Popularity Count | Items Relationships | Interactions History | Personalized Ranking
Patterns       | Popularity       | Similarity          | Behavior             | Behavior
Dimensionality | 1,000s           | 10,000s             | 100,000s             | 10,000s
Performance    | Best             | Better              | Good                 | Better
Coverage       | Low              | Medium              | High                 | Medium
Accuracy       | Good             | Better              | Best                 | Better
Retraining     | Weekly           | Weekly              | Online, Monthly      | Weekly
Recipe         | Popularity-Count | SIMS                | User-Personalization | Personalized-Ranking
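
The one-week window behind the Popularity Count model can be pictured with a few lines of plain Python. This is a conceptual sketch only, not how Amazon Personalize implements its Popularity-Count recipe; the item IDs and timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def top_k_recent(interactions, now, k=10, window=timedelta(days=7)):
    """Rank items by how many interactions they received in the trailing window.

    interactions: iterable of (item_id, timestamp) pairs.
    """
    cutoff = now - window
    counts = Counter(item for item, ts in interactions if cutoff <= ts <= now)
    return [item for item, _ in counts.most_common(k)]


# Made-up interaction log: "show-c" was popular, but only outside the window.
now = datetime(2023, 1, 15, tzinfo=timezone.utc)
events = [
    ("show-a", now - timedelta(days=1)),
    ("show-a", now - timedelta(days=2)),
    ("show-b", now - timedelta(days=3)),
    ("show-c", now - timedelta(days=30)),
    ("show-c", now - timedelta(days=40)),
    ("show-c", now - timedelta(days=50)),
]

recent_top = top_k_recent(events, now, k=2)  # ["show-a", "show-b"]
```

Although "show-c" has the most interactions overall, it drops out because none of them fall inside the last seven days, which is the behavior that keeps recent titles in front.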

ClearScale deployed each of these models behind Amazon Personalize’s unified REST API, backed by Amazon API Gateway, to make findings from PBS’s recommendation engine available to the many platforms that support the company’s streaming application. Access controls are based on Amazon Cognito and AWS Identity and Access Management (IAM) to ensure viewers only have access to their own data.
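
One common way to enforce "own data only" is to compare the identity in the caller’s token with the user ID being requested. The sketch below assumes a Lambda proxy integration behind an API Gateway REST API with a Cognito user pool authorizer, where the token claims arrive under `requestContext.authorizer.claims`; the `userId` path parameter is a hypothetical name, and the source does not say how PBS actually implements this check.

```python
def authorize_viewer(event: dict) -> dict:
    """Allow the request only if the path user ID matches the caller's Cognito subject."""
    claims = (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("claims", {})
    )
    caller_id = claims.get("sub")
    # API Gateway sends pathParameters as null when the route has none.
    requested_id = (event.get("pathParameters") or {}).get("userId")

    if not caller_id or caller_id != requested_id:
        return {"statusCode": 403, "body": "Forbidden"}
    return {"statusCode": 200, "body": "OK"}


# Illustrative events: a viewer asking for their own data, then someone else's.
own_request = {
    "requestContext": {"authorizer": {"claims": {"sub": "viewer-123"}}},
    "pathParameters": {"userId": "viewer-123"},
}
other_request = {
    "requestContext": {"authorizer": {"claims": {"sub": "viewer-123"}}},
    "pathParameters": {"userId": "viewer-456"},
}
```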

Each model’s API consists of four closely connected microservices:

  • Real-Time Recommendations API: Receives user information and, in a few seconds, offers recommendations on which great show will attract and entertain them next.
  • Personalized Notifications API: Does the same as the Real-Time Recommendations API, but is used in conjunction with off-session marketing channels like SMS, email, or push notifications.
  • Feedback Loop API: Processes feedback from viewers in the form of “thumbs up” or “thumbs down” to determine their satisfaction with recommendations and, hence, their correctness.
  • Configuration Management API: Allows PBS administrators to fine-tune the recommendation engine on the fly without redeploying any system parts.
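
To make the Feedback Loop idea concrete, the sketch below shapes a thumbs-up or thumbs-down signal as an event for the Amazon Personalize `PutEvents` operation. The event type names, tracking ID, and other identifiers are assumptions for illustration; the source does not describe the payloads PBS uses.

```python
from datetime import datetime, timezone


def feedback_event(tracking_id, user_id, session_id, item_id, liked, sent_at):
    """Build keyword arguments shaped for the personalize-events put_events call."""
    return {
        "trackingId": tracking_id,
        "userId": user_id,
        "sessionId": session_id,
        "eventList": [{
            # Hypothetical event type names for the two feedback signals.
            "eventType": "thumbs_up" if liked else "thumbs_down",
            "itemId": item_id,
            "sentAt": sent_at,
        }],
    }


payload = feedback_event(
    tracking_id="example-tracking-id",  # assumed event tracker ID
    user_id="viewer-123",
    session_id="session-abc",
    item_id="show-a",
    liked=False,
    sent_at=datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc),
)

# With AWS credentials configured, the event could be recorded with:
#   boto3.client("personalize-events").put_events(**payload)
```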

The world is not static in any sense, and neither is machine learning. As the environment evolves, trained models no longer perform as well as they did when first deployed. In 99% of cases, models degrade over time, decreasing both business value and end-user satisfaction. For example, the item catalog receives new titles that the model has never seen.

In the best scenario, the model would refuse to recommend the title, introducing bias. In the worst scenario, the model would provide incorrect predictions leading to poor decisions. To ensure the model is not frozen in place, it must be continuously retrained on the most up-to-date data and occasionally change its shape to fit new game rules.

A custom Model Monitor was added on top of Amazon CloudWatch to measure a precision metric that characterizes the system’s ability to make good recommendations to viewers. It doesn’t just monitor metrics; it also makes automated decisions based on them. For example, it retrains the model when the metric approaches a set threshold, so the value never drops below that threshold, keeping viewers happy.
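
The retraining rule can be reduced to a small decision function. This is a minimal sketch with made-up threshold and margin values; the source does not state the real thresholds or how the Model Monitor is implemented.

```python
def should_retrain(precision: float, threshold: float, margin: float = 0.005) -> bool:
    """Return True once the metric comes within `margin` of the floor.

    Triggering before the floor is crossed leaves time to retrain while
    recommendation quality is still acceptable.
    """
    return precision <= threshold + margin


# Hypothetical floor of 0.05 for the monitored precision metric.
FLOOR = 0.05

healthy = should_retrain(0.0706, FLOOR)   # False: comfortably above the floor
drifting = should_retrain(0.054, FLOOR)   # True: within the safety margin
```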

ClearScale’s proof of concept (PoC) for PBS yielded a “Precision at 10” metric of 0.0706. This number means that, on average, about 7% of the titles in each list of 10 recommendations are relevant to the user, or roughly 0.7 relevant titles per list. By comparison, many other recommender systems only achieve a result of about 0.03.
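
Precision at K is conventionally the fraction of the top K recommendations that turn out to be relevant, averaged across users. The sketch below shows that calculation on made-up data; it illustrates the standard definition, not ClearScale’s evaluation code.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(cases: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average precision at k over (recommended, relevant) pairs, one per user."""
    scores = [precision_at_k(rec, rel, k) for rec, rel in cases]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up example: two users receive the same 10 recommendations.
titles = [f"title-{i}" for i in range(1, 11)]
cases = [
    (titles, {"title-3", "title-42"}),  # one hit in the top 10 -> 0.1
    (titles, {"title-99"}),             # no hits -> 0.0
]

score = mean_precision_at_k(cases)  # 0.05
```

Under this definition, a score of 0.0706 corresponds to about 0.7 relevant titles per list of 10 on average.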

Demonstrational User Interface

The last phase of the project was to create a user interface (UI) prototype that would allow PBS viewers to personalize their accounts in a simple, visually engaging way. ClearScale created a demo web application that reused existing business logic and capitalized on the new recommendation engine.

The demo app was built with TypeScript, ReactJS, and Sass for the UI, with Effector for client-side data management and React-Query for API integration. While serving its purpose as a functional prototype, it also reflects PBS’s identity, with the organization’s styling and branding guidelines applied. Thanks to the responsiveness it inherits natively from Material-UI, the demo app works equally well on desktops, tablets, and phones.

The demo user interface included the following components:

  • “Web Hosting” delivers the demo app to viewers and makes it accessible regardless of platform.
  • “Unified Auth” allows PBS viewers to log in with existing credentials and automatically makes their watch histories, preferences, and other personalization data available to the SRE.
  • “Title Card” displays details about a show when a person hovers over it in the catalog, as well as a rating indicating whether the title is relevant to the user.
  • “Content Player” enables viewers to view recommendations in the demo app.
  • “Top Picks for {User}” displays a personalized list to viewers based on the real-time recommendations API and its Interactions History ML model.
  • “Feedback Loop” allows viewers to judge the relevance of recommendations provided by the system and see in real time how it affects the offered content.
  • “Top {K} Over Last Week” displays recent, popular titles across PBS’s entire audience based on the Popularity Count ML model.

The Benefits

Today, PBS has an effective MLOps platform and recommendation system that it can build on going forward. The data pipeline that ClearScale set up cleans, validates, and enriches raw data PBS has accumulated over its 50-year history. The data that flows into the organization’s recommendation system is consistent, accurate, and complete, making it a single source of truth for current and future AI-driven endeavors.

The new recommendation engine also gives PBS the ability to deliver more personalized experiences to viewers based on a myriad of factors. The four models that ClearScale built incorporate variables such as mainstream popularity, inter-title relationships, and user behavior to arrive at recommendations that are highly likely to please viewers.

Finally, the demo web application ClearScale developed for PBS showcases the power of the new recommendation engine in a user-friendly interface. It gives people the opportunity to quickly find titles they enjoy and share feedback on specific recommendations, empowering PBS to fine-tune viewers’ experiences.

At a time when major broadcasting companies are competing for viewership on numerous streaming applications, ClearScale helped PBS build its own ML-powered solution that leans on robust cloud-native tools from AWS. PBS now has a scalable MLOps platform it can use to provide better experiences for millions of viewers every day.



ClearScale – AWS Partner Spotlight

ClearScale is an AWS Premier Tier Consulting Partner that helps customers design, build, deploy, and manage complex cloud architectures on time and on budget.

Contact ClearScale | Partner Overview