This is a contextual advertising solution with enhanced machine learning (ML) capabilities, designed to reach target audiences without using third-party cookies. Contextual advertising lets advertisers reach an audience based on the content that users consume. The solution uses an event-driven, serverless architecture designed to be highly scalable and cost-optimized. With this architecture, demand-side platforms (DSPs), advertisement publishers, and supply-side platforms (SSPs) can build a contextual intelligence solution that uses AWS artificial intelligence (AI) and ML services to extract relevant metadata and map it to their own taxonomy or to an industry-standard taxonomy. The resulting mapping informs programmatic bids for advertisement publishers, brand safety for advertisers, and advertisement creative classification for SSPs.

Architecture Diagram

Disclaimer: Not for production use

  1. DSPs, SSPs, or ad publishers invoke an API on Amazon API Gateway to trigger content discovery, which fetches text, images, audio, and video from the provided content.
  2. Customers can use either a crawler or a content management system API to extract different media types and store them in an Amazon S3 bucket.
  3. The serverless crawler is built on AWS Step Functions to orchestrate exploration and download of content. Ephemeral discovery data is stored in Amazon DynamoDB.
  4. Content (text, images, and video) is stored in Amazon S3, and any available metadata is stored in Amazon DynamoDB for analysis.
  5. A content discovery completion event starts the controller orchestration, which is built on AWS Step Functions, AWS Lambda, and Amazon SNS.
  6. Amazon SNS events invoke AWS Lambda functions to start content analysis using Amazon Comprehend, Amazon Rekognition, and Amazon Transcribe.
  7. Amazon DynamoDB stores the topics, sentiment, and object labels produced by the content analysis workflow.
  8. The Contextual Intelligence Taxonomy Mapper (CITM) uses the Bidirectional Encoder Representations from Transformers (BERT) model deployed on Amazon SageMaker. CITM maps metadata in Amazon DynamoDB to an industry-standard taxonomy such as the IAB Content Taxonomy.
  9. An AWS Lambda function retrieves the mapping and stores it in Amazon DynamoDB within the category service.
  10. Bidding servers invoke an API built on Amazon API Gateway to fetch categories from Amazon DynamoDB with low latency, and use them to inform programmatic advertising bids.
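
As a rough illustration of steps 6 and 8 above, the following Python sketch routes an Amazon SNS notification to the analysis service that matches each media type, then picks the closest taxonomy category for a label embedding. It is a minimal sketch under stated assumptions, not the solution's actual code: the message fields `media_type` and `s3_key`, the routing table, and the use of cosine similarity over embeddings are all hypothetical, and the toy vectors stand in for output from the BERT model on Amazon SageMaker.

```python
import json
import math

# Hypothetical routing table: media type -> analysis service named in step 6.
ANALYZERS = {
    "text": "comprehend",
    "image": "rekognition",
    "video": "rekognition",
    "audio": "transcribe",
}


def route_sns_event(event):
    """Return (s3_key, service) pairs for each record in an SNS-triggered Lambda event."""
    routed = []
    for record in event.get("Records", []):
        # SNS delivers the published message as a string under Sns.Message.
        # The JSON fields inside it (media_type, s3_key) are assumptions.
        message = json.loads(record["Sns"]["Message"])
        service = ANALYZERS.get(message["media_type"])
        if service is None:
            continue  # unknown media types are skipped in this sketch
        routed.append((message["s3_key"], service))
    return routed


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_to_taxonomy(label_vector, taxonomy_vectors):
    """Pick the taxonomy category whose embedding is closest to the label embedding.

    Assumption: the source does not say how CITM compares embeddings;
    cosine similarity is used here purely for illustration.
    """
    return max(
        taxonomy_vectors,
        key=lambda name: cosine(label_vector, taxonomy_vectors[name]),
    )


if __name__ == "__main__":
    event = {
        "Records": [
            {"Sns": {"Message": json.dumps(
                {"media_type": "image", "s3_key": "pages/1/hero.jpg"})}}
        ]
    }
    print(route_sns_event(event))
    # Toy two-dimensional vectors; category names are illustrative placeholders.
    print(map_to_taxonomy([0.9, 0.1],
                          {"Automotive": [1.0, 0.0], "Food & Drink": [0.0, 1.0]}))
```

In a deployed workflow, each routed pair would be handed to the corresponding AWS service client, and the vectors would come from the model endpoint rather than being hard-coded.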

Well-Architected Pillars

  • Operational Excellence: All the services used in the design publish Amazon CloudWatch metrics that can be used to monitor the individual components of the design. Amazon API Gateway and AWS Lambda allow new versions to be published through an automated pipeline.

  • Security: Amazon API Gateway provides a protection layer when the category service is invoked through an outbound API. All the proposed services support integration with AWS Identity and Access Management (IAM), which can be used to control access to resources and data.

  • Reliability: AWS Lambda, Amazon DynamoDB, Amazon S3, Amazon Comprehend, and Amazon Rekognition provide high availability within an AWS Region. Customers can deploy Amazon SageMaker endpoints in a highly available manner.

  • Performance Efficiency: The solution requires batch processing for content discovery and content analysis. The performance requirements for batch processing range from minutes to hours, and AWS Lambda, Amazon Comprehend, and Amazon Rekognition are designed to meet them. The category service requires a latency of less than 10 milliseconds (ms), which provisioned concurrency in AWS Lambda and the HTTP API in Amazon API Gateway can support.

  • Cost Optimization: This solution uses AWS Lambda for all compute components of content discovery and content analysis, so compute is billed per millisecond. The data store is built on Amazon DynamoDB and Amazon S3, providing a low total cost of ownership for storing and retrieving data. For content analysis, the solution uses Amazon Comprehend and Amazon Rekognition, so customers pay only when the services process data. The category service uses Amazon API Gateway, which reduces API development time and helps customers make sure they pay only when an API is invoked.

  • Sustainability: The solution uses the scaling behaviors of AWS Lambda and Amazon API Gateway to reduce over-provisioning of resources. It uses AWS managed services to maximize resource utilization and to reduce the amount of energy needed to run a given workload.
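
The category service described above depends on a very cheap read path: one key lookup per request behind an HTTP API. The following Python sketch shows what such a handler might look like. It is an assumption-laden illustration rather than the solution's code: the `url` query parameter, the response shape, and the in-memory `CATEGORY_TABLE` dictionary (a stand-in for the Amazon DynamoDB table) are all hypothetical. The event shape follows the Amazon API Gateway HTTP API payload format version 2.0, in which `queryStringParameters` may be absent.

```python
import json

# Stand-in for the Amazon DynamoDB category table. A real handler would
# issue a single-key read against the table instead of a dict lookup.
# The URL and category value are illustrative placeholders.
CATEGORY_TABLE = {
    "https://example.com/reviews/ev-sedan": ["Automotive"],
}


def handler(event, context=None, table=CATEGORY_TABLE):
    """Lambda-style handler returning the stored categories for a content URL."""
    # HTTP API payload v2.0 omits queryStringParameters when there is no query string.
    params = event.get("queryStringParameters") or {}
    url = params.get("url")  # hypothetical parameter name
    if url is None:
        return {"statusCode": 400, "body": json.dumps({"error": "missing url"})}

    categories = table.get(url)
    if categories is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not categorized"})}

    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        "body": json.dumps({"url": url, "categories": categories}),
    }


if __name__ == "__main__":
    request = {"queryStringParameters": {"url": "https://example.com/reviews/ev-sedan"}}
    print(handler(request))
```

Keeping the handler to a single keyed read, with no per-request analysis, is one plausible way to stay within a tight latency budget; whether a given deployment meets the 10 ms target would need to be measured.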


The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.