AWS for M&E Blog

Monitor HLS and DASH live streams using a canary monitor

Introduction

Monitoring adaptive bitrate (ABR) media workloads requires attention to detail to provide a successful viewing experience. Keeping an eye on the health of the origin servers, confirming video and audio quality, and analyzing data reports from players are common monitoring tactics. However, issues such as manifest non-compliance, staleness, incorrect ad-break decorations, or missing data in an ad-tracking response can be easily overlooked without inspecting the manifests coming out of each individual origin.

This blog post describes a canary monitor solution for gaining visibility into the data plane of an ABR stream, primarily the manifests and ad-tracking data that come out of different origins. Monitoring the data plane of an ABR stream can help prevent and resolve unexpected issues with playback or monetization by proactively alerting operators to potential issues. Using a canary monitor to inspect and archive content during a live stream helps confirm expectations, such as the presence of ad breaks, and provides an archive for post-event troubleshooting.

The canary monitoring tool is available for download at https://github.com/aws-samples/monitor-hls-and-dash-streams-using-canary-monitor

Requirements

The canary monitor is a Python 3 script that requires Python 3.8 or newer. At a minimum, you will need the urllib3 library, which is used for sending all HTTP requests. Depending on your use case and the features you enable, you will also need the following libraries:

  • lxml – for parsing DASH manifest responses
  • boto3 and botocore – for sending metrics to Amazon CloudWatch
  • jinja2 – for creating the Amazon CloudWatch dashboard JSON file

Description

The canary monitor is a tool that, like a player, downloads and inspects HLS or DASH manifests from an origin at regular intervals. It can also download segments and ad-tracking data. After every download, it inspects the data, runs a large set of validation checks, writes logs, and publishes custom metrics to Amazon CloudWatch. It works with various origins but was designed primarily to monitor streams originating from AWS Elemental MediaPackage and AWS Elemental MediaTailor. When pointed at a MediaTailor origin, it can detect ad-break content replaced by the service and validate the ad-tracking data during ad breaks.

The script expects the user to provide one or more HLS or DASH live stream manifest URLs to monitor, which must be accessible from the machine where the script is running. An HLS manifest URL can point to either the top-level (multivariant) manifest or a specific rendition’s manifest. When pointed at the top-level HLS manifest, the script can pick and monitor one specific rendition, multiple renditions, or all renditions. Once started, the script sends an HTTP GET request to each provided manifest URL at regular intervals (default 5 seconds) until you stop the process.
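To illustrate how renditions can be discovered from a top-level HLS manifest, here is a minimal sketch using only the Python standard library. It is not the canary monitor's own code: the function name `list_rendition_urls`, the sample manifest, and the example.com URL are hypothetical, and the sketch only handles `#EXT-X-STREAM-INF` entries (it ignores `#EXT-X-MEDIA` renditions).

```python
from typing import List
from urllib.parse import urljoin


def list_rendition_urls(multivariant_url: str, manifest_text: str) -> List[str]:
    """Return absolute URLs for the variant streams in a multivariant manifest."""
    urls = []
    expect_uri = False
    for raw_line in manifest_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-STREAM-INF"):
            # In HLS, the URI line that follows this tag identifies the rendition.
            expect_uri = True
        elif expect_uri and not line.startswith("#"):
            # Rendition URIs may be relative to the multivariant manifest URL.
            urls.append(urljoin(multivariant_url, line))
            expect_uri = False
    return urls


# Hypothetical sample manifest, for illustration only.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
video_720.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
video_360.m3u8
"""

rendition_urls = list_rendition_urls("https://example.com/live/index.m3u8", SAMPLE)
print(rendition_urls)
```

A monitor could then poll one, several, or all of the returned URLs at the configured interval.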

Each time the script downloads a manifest, it performs checks on the content of the manifest and optionally emits CloudWatch metrics. The checks include monitoring for staleness, discontinuities, spec compliance, ad breaks, and audio/video misalignment. If monitoring of MediaTailor ad-tracking data is enabled, the script looks for avail information during detected ad breaks. The following table highlights important log messages and details the types of checks the script performs.

Example checks

| Name | Description | Log level | Impact category |
|---|---|---|---|
| Possible lip sync issue | Occurs when presentation time of a segment n from an adaptation set is more than 100 ms away from presentation time of segment n from any other adaptation set in a DASH manifest. | WARNING | Playback |
| Discontinuity | Occurs when EXT-X-DISCONTINUITY is found in an HLS manifest. Occurs when “t” value of segment n+1 does not equal “t” + “d” value of segment n and segments are in the same DASH manifest period. | WARNING | Playback |
| Staleness | Occurs when no new segment is found in the manifest response for x seconds, where x is the value defined by the --stale argument. | WARNING | Playback |
| Differences between segments across renditions | Occurs when the same media sequence segments across renditions differ in any of the following associated attributes when monitoring multiple HLS renditions: discontinuity (has EXT-X-DISCONTINUITY tag), discontinuity sequence, EXT-X-PROGRAM-DATE-TIME (only video segments). | WARNING | Playback |
| Manifest value has changed | Occurs when the EXT-X-VERSION or EXT-X-TARGETDURATION value has changed in an HLS manifest. | WARNING | Playback |
| Ad break was longer/shorter than advertised | Occurs when the sum of segment durations between ad break start and end does not match the advertised ad break duration (+/- 1 second). | WARNING | Monetization |
| Nested ad break start | Occurs when an ad break start manifest decoration is found while inside of an ad break. | WARNING | Monetization |
| Playhead is drifted | Occurs when monitoring a MediaTailor endpoint including ad-tracking data and the calculated playhead at ad break start is more than 15 seconds apart from the startTimeInSeconds for the avail id. | WARNING | Monetization |
| Found new period | Occurs when a new period is found in a DASH manifest. The log message includes parsed SCTE-35 information if present. | INFO | |
| Found avail in tracking response | Occurs when monitoring a MediaTailor endpoint and an avail is found in the ad-tracking data during an ad break. An avail is found when calculated playhead during an ad break falls within the startTimeInSeconds to (startTimeInSeconds + durationInSeconds) range for any avail found in the ad-tracking data. The log message includes details about the avail (availId, durationInSeconds, availadscount, creatives). | INFO | |
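Several of the checks above reduce to simple comparisons. The following is a minimal sketch of four of them, written from the descriptions in the table rather than taken from the canary monitor's source. All function names and data shapes are hypothetical: segments are assumed to be (t, d) tuples, avails are assumed to be dictionaries keyed by the field names mentioned above, and treating exactly x seconds without a new segment as stale is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def find_dash_discontinuities(segments: List[Tuple[int, int]]) -> List[int]:
    """Return indexes of segments whose "t" differs from the previous "t" + "d".

    `segments` holds (t, d) pairs for consecutive segments of one period.
    """
    gaps = []
    for i in range(1, len(segments)):
        prev_t, prev_d = segments[i - 1]
        if segments[i][0] != prev_t + prev_d:
            gaps.append(i)
    return gaps


def is_stale(last_new_segment_time: float, now: float, stale_seconds: float) -> bool:
    """True when no new segment has been seen for at least `stale_seconds`."""
    return now - last_new_segment_time >= stale_seconds


def ad_break_duration_mismatch(
    segment_durations: List[float], advertised: float, tolerance: float = 1.0
) -> bool:
    """True when the summed durations differ from the advertised value by more
    than the tolerance (1 second, per the table above)."""
    return abs(sum(segment_durations) - advertised) > tolerance


def find_avail(playhead: float, avails: List[Dict]) -> Optional[Dict]:
    """Return the first avail whose time range contains the playhead, if any."""
    for avail in avails:
        start = avail["startTimeInSeconds"]
        end = start + avail["durationInSeconds"]
        if start <= playhead <= end:
            return avail
    return None


# Hypothetical sample data, for illustration only.
segments = [(0, 180000), (180000, 180000), (365000, 180000)]
avails = [{"availId": "a1", "startTimeInSeconds": 120.0, "durationInSeconds": 30.0}]

print(find_dash_discontinuities(segments))
print(is_stale(last_new_segment_time=100.0, now=119.0, stale_seconds=18))
print(ad_break_duration_mismatch([6.0, 6.0, 6.0], advertised=30.0))
print(find_avail(135.0, avails))
```

Keeping each check a pure function of already-parsed manifest data makes it easy to test separately from the download loop.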

The script sends several metrics to CloudWatch when run with the --cwmetrics argument. Metrics are created under the CanaryMonitor custom namespace with two additional dimensions: Endpoint (endpoint identifier) and Type (hls or dash).
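As an illustration of the namespace and dimensions described above, the sketch below builds a payload in the shape accepted by the boto3 CloudWatch `put_metric_data` call. It is an assumption about how such metrics could be published, not the script's actual implementation: the helper name `build_metric_data`, the endpoint identifier, and the sample values are hypothetical. The publishing call is left commented out because it needs boto3 and AWS credentials.

```python
from typing import Dict, List

NAMESPACE = "CanaryMonitor"  # custom namespace named in the text above


def build_metric_data(
    endpoint: str, stream_type: str, values: Dict[str, float]
) -> List[dict]:
    """Build MetricData entries carrying the Endpoint and Type dimensions."""
    dimensions = [
        {"Name": "Endpoint", "Value": endpoint},
        {"Name": "Type", "Value": stream_type},  # "hls" or "dash"
    ]
    return [
        {"MetricName": name, "Dimensions": dimensions, "Value": float(value)}
        for name, value in values.items()
    ]


# Hypothetical endpoint identifier and sample values, for illustration only.
metric_data = build_metric_data(
    "endpoint1", "hls", {"manifestresponsetime": 42.0, "stale": 1}
)
print(metric_data)

# Publishing (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace=NAMESPACE, MetricData=metric_data
# )
```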

Example metrics

| Metric Name | Description | Value |
|---|---|---|
| discontinuity | Occurs when EXT-X-DISCONTINUITY is found in HLS or when “t” of segment n+1 does not equal “t” + “d” of segment n in DASH. | |
| manifest4xx | Occurs with manifest 4xx HTTP response | |
| manifest5xx | Occurs with manifest 5xx HTTP response | |
| manifesttimeouterror | Occurs with manifest request timeout or connection error | |
| manifestresponsetime | Manifest response time | milliseconds |
| stale | Occurs when no new segment is found in the manifest response for x seconds, where x is the value defined by the --stale argument. | |
| addurationadvertised | Duration as advertised in EXT-X-CUE-OUT tag or EXT-X-DATERANGE + DURATION tag for HLS and duration as advertised in a new period, which contains SpliceInfoSection with BreakDuration or segmentationDuration for DASH. | seconds |
| addurationactual | The sum of segment durations between ad break start and ad break end. Occurs when EXT-X-CUE-IN tag or EXT-X-DATERANGE + SCTE35-IN tag is found in HLS or a new period is found in DASH which signals an ad break end. For AWS MediaTailor endpoints, the sum of replaced segment durations during detected ad break. | seconds |
| addurationdelta | Difference between actual and advertised ad break duration, measured as actual – advertised. | seconds |
| ptsdelta | The maximum difference between (t – pto)/timescale across all adaptation sets for all new segments in DASH. | seconds |
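The two derived metrics, ptsdelta and addurationdelta, can be expressed as short calculations. The sketch below follows the formulas in the table and is not the canary monitor's own code. The function names and the dictionary layout for adaptation sets are hypothetical, and it assumes that segment n sits at the same list index in every adaptation set.

```python
from typing import Dict, List


def presentation_time(t: int, pto: int, timescale: int) -> float:
    """Presentation time in seconds: (t - pto) / timescale."""
    return (t - pto) / timescale


def pts_delta(adaptation_sets: List[Dict]) -> float:
    """Largest spread of presentation times for the same segment index
    across all adaptation sets, in seconds."""
    per_set = [
        [presentation_time(t, s["pto"], s["timescale"]) for t in s["segment_times"]]
        for s in adaptation_sets
    ]
    # zip(*per_set) groups segment n of every adaptation set together.
    deltas = [max(times) - min(times) for times in zip(*per_set)]
    return max(deltas, default=0.0)


def ad_duration_delta(actual: float, advertised: float) -> float:
    """addurationdelta, measured as actual minus advertised, in seconds."""
    return actual - advertised


# Hypothetical sample data: video at 90 kHz and audio at 48 kHz timescales.
video = {"timescale": 90000, "pto": 0, "segment_times": [0, 540000, 1080000]}
audio = {"timescale": 48000, "pto": 0, "segment_times": [0, 288000, 585600]}

delta = pts_delta([video, audio])
print(f"ptsdelta: {delta:.3f} s")
print("above 100 ms:", delta > 0.1)
print("addurationdelta:", ad_duration_delta(actual=28.0, advertised=30.0))
```

Comparing the result against the 100 ms threshold from the lip sync check in the first table is one way to turn this metric into an alert.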

The following screenshot shows an example CloudWatch dashboard created by the script while monitoring multiple endpoints.

An image depicting a CloudWatch dashboard showing metrics published by the canary monitor.

Running the script

To start the canary monitor, provide a single URL through the --url argument or a list of URLs through the endpoints.csv file.

Example of starting the canary monitor (assuming the endpoints.csv file is in the local directory):

$ python3 canarymonitor.py --emt --emtadsegmentstring asset --manifests --tracking --gzip --cwmetrics --dashboards --loglevel INFO --frequency 5 --stale 18 --httptimeout 3 --endpointslistfile endpoints.csv --stdout --label myliveevent_emt 

Once the script is running, it will continuously log, save data and send metrics to CloudWatch.

Summary

This blog post provides a high-level overview of a canary monitoring tool used to enhance the observability of a streaming media workload by gaining better visibility into the data plane—the data coming out of different origin services in an ABR workflow. Integrate this tool into your monitoring strategy by running it on an Amazon EC2 instance or a local machine to monitor one or hundreds of endpoints during your next streaming event.

Tomas Juraska

Tomas Juraska is an Enterprise Account Engineer based in the United States. He has served customers in the media and entertainment industry for more than 8 years.