Monitor HLS and DASH live streams using a canary monitor

Introduction

Monitoring adaptive bitrate (ABR) media workloads requires attention to detail to provide a successful viewing experience. Keeping an eye on the health of the origin servers, confirming video and audio quality, and analyzing data reports from players are common monitoring tactics. However, issues such as manifest non-compliance, staleness, incorrect ad-break decorations, or missing data in an ad-tracking response can be easily overlooked without inspecting the manifests coming out of each individual origin.

This blog post describes a canary monitor solution for gaining visibility into the data plane of an ABR stream, primarily the manifests and ad-tracking data that come out of different origins. Monitoring the data plane can help prevent and resolve unexpected issues with playback or monetization by proactively alerting operators to potential problems. Using a canary monitor to inspect and archive content during a live stream helps confirm expectations, such as the presence of ad breaks, and provides an archive for post-event troubleshooting.

The canary monitoring tool is available for download at https://github.com/aws-samples/monitor-hls-and-dash-streams-using-canary-monitor

Requirements

The canary monitor is a Python 3 script that requires Python 3.8 or newer. At a minimum, you need the urllib3 library, which the script uses to send all HTTP requests. Depending on your use case and the features you enable, you also need the following libraries:

lxml – for parsing DASH manifest responses
boto3 and botocore – for sending metrics to Amazon CloudWatch
jinja2 – for creating the Amazon CloudWatch dashboard JSON file
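
Before starting a long-running monitor, it can be useful to confirm which of these libraries are importable. The following is a minimal sketch, not part of the canary monitor itself; the feature labels and the missing_modules helper are illustrative names chosen for this example.

```python
import importlib.util

# Module names come from the requirements list above.
# The feature labels are illustrative, not names used by the script.
REQUIRED_MODULES = ["urllib3"]
OPTIONAL_MODULES = {
    "DASH manifest parsing": ["lxml"],
    "CloudWatch metrics": ["boto3", "botocore"],
    "CloudWatch dashboard JSON": ["jinja2"],
}


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(REQUIRED_MODULES):
        print(f"required module not installed: {name}")
    for feature, modules in OPTIONAL_MODULES.items():
        for name in missing_modules(modules):
            print(f"{feature} needs module: {name}")
```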

Description

The canary monitor is a tool that, like a player, downloads HLS or DASH manifests from an origin at regular intervals. It can also download segments and ad-tracking data. After every download it inspects the data, runs a large set of validation checks, writes logs, and publishes custom metrics to Amazon CloudWatch. It works with various origins, but has been primarily designed to monitor streams originating from AWS Elemental MediaPackage and AWS Elemental MediaTailor. When pointed at a MediaTailor origin, it can detect ad-break content replaced by the service and validate the ad-tracking data during ad breaks.

The script expects you to provide one or more HLS or DASH live stream manifest URLs to monitor, which must be accessible from the machine where the script is running. An HLS manifest URL can point to either the top-level (multivariant) manifest or a specific rendition’s manifest. When pointed at the top-level HLS manifest, the script can pick and monitor one specific rendition, multiple renditions, or all renditions. Once started, the script sends an HTTP GET request to each provided manifest URL at regular intervals (default 5 seconds) until you stop the process.
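
To illustrate how renditions can be discovered from a top-level manifest, here is a minimal sketch that is not taken from the canary monitor's source. It relies only on the HLS rule that the URI line following each EXT-X-STREAM-INF tag names one rendition manifest. The function name rendition_urls and the example.com URLs are assumptions made for this example, and renditions declared only through EXT-X-MEDIA tags (for example, alternate audio) are not handled.

```python
from urllib.parse import urljoin


def rendition_urls(multivariant_text, manifest_url):
    """Return absolute rendition manifest URLs listed in a multivariant manifest.

    The URI line that follows each #EXT-X-STREAM-INF tag identifies one
    rendition; relative URIs are resolved against the multivariant URL.
    """
    urls = []
    expect_uri = False
    for raw in multivariant_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-STREAM-INF"):
            expect_uri = True
        elif expect_uri and not line.startswith("#"):
            urls.append(urljoin(manifest_url, line))
            expect_uri = False
    return urls


# Hypothetical manifest used only to demonstrate the function.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
index_1.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
index_2.m3u8
"""

if __name__ == "__main__":
    for url in rendition_urls(SAMPLE, "https://example.com/out/v1/index.m3u8"):
        print(url)
```

A monitor could then request each returned URL on every polling interval, or only a chosen subset of them.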

Each time the script downloads a manifest, it performs checks on the content of the manifest and optionally emits CloudWatch metrics. The checks include monitoring for staleness, discontinuities, spec compliance, ad breaks, and audio/video misalignment. If monitoring of MediaTailor ad-tracking data is enabled, the script looks for avail information during detected ad breaks. See the following table, which highlights important log messages and details the type of checks the script performs.

Example checks

Name	Description	Log level	Impact category
Warning
Possible lip sync issue	Occurs when the presentation time of segment n in one adaptation set is more than 100 ms away from the presentation time of segment n in any other adaptation set of a DASH manifest.	WARNING	Playback
Discontinuity	Occurs when EXT-X-DISCONTINUITY is found in an HLS manifest, or when the “t” value of segment n+1 does not equal the “t” + “d” value of segment n and both segments are in the same DASH manifest period.	WARNING	Playback
Staleness	Occurs when no new segment is found in the manifest response for x seconds, where x is the value defined by the --stale argument.	WARNING	Playback
Differences between segments across renditions	Occurs when monitoring multiple HLS renditions and segments with the same media sequence differ across renditions in any of the following attributes: discontinuity (has an EXT-X-DISCONTINUITY tag), discontinuity sequence, EXT-X-PROGRAM-DATE-TIME (video segments only).	WARNING	Playback
Manifest value has changed	Occurs when the EXT-X-VERSION or EXT-X-TARGETDURATION value has changed in an HLS manifest.	WARNING	Playback
Ad break was longer/shorter than advertised	Occurs when the sum of segment durations between ad break start and end does not match the advertised ad break duration (+/- 1 second).	WARNING	Monetization
Nested ad break start	Occurs when an ad break start manifest decoration is found while already inside an ad break.	WARNING	Monetization
Playhead is drifted	Occurs when monitoring a MediaTailor endpoint with ad-tracking data and the calculated playhead at ad break start is more than 15 seconds away from the startTimeInSeconds of the avail ID.	WARNING	Monetization
Info
Found new period	Occurs when a new period is found in a DASH manifest. The log message includes parsed SCTE-35 information if present.	INFO
Found avail in tracking response	Occurs when monitoring a MediaTailor endpoint and an avail is found in the ad-tracking data during an ad break. An avail is found when the calculated playhead during an ad break falls within the range startTimeInSeconds to (startTimeInSeconds + durationInSeconds) of any avail in the ad-tracking data. The log message includes details about the avail (availId, durationInSeconds, availadscount, creatives).	INFO
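
Three of the checks in the table can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the canary monitor's implementation: the names dash_discontinuities, StalenessTracker, and find_avail are invented for this example, segments are assumed to be already parsed into (t, d) pairs, and avails are assumed to be a flat list of dictionaries carrying the startTimeInSeconds and durationInSeconds fields mentioned above.

```python
import time


def dash_discontinuities(segments):
    """Return indexes i where segment i does not start where segment i-1 ended.

    `segments` is a list of (t, d) pairs in timescale units from one period.
    """
    return [
        i
        for i in range(1, len(segments))
        if segments[i][0] != segments[i - 1][0] + segments[i - 1][1]
    ]


class StalenessTracker:
    """Flags a manifest as stale when no new segment appears for `stale_after` seconds."""

    def __init__(self, stale_after, clock=time.monotonic):
        self.stale_after = stale_after
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_segment = None
        self.last_change = clock()

    def update(self, newest_segment):
        """Record the newest segment seen in a manifest; return True if stale."""
        now = self.clock()
        if newest_segment != self.last_segment:
            self.last_segment = newest_segment
            self.last_change = now
            return False
        return now - self.last_change >= self.stale_after


def find_avail(playhead, avails):
    """Return the first avail whose time range contains the playhead, else None."""
    for avail in avails:
        start = avail["startTimeInSeconds"]
        if start <= playhead <= start + avail["durationInSeconds"]:
            return avail
    return None
```

For example, dash_discontinuities([(0, 180000), (180000, 180000), (360090, 180000)]) reports index 2, because the third segment starts 90 ticks later than the end of the second.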

The script sends several metrics to CloudWatch when run with the --cwmetrics argument. Metrics are created under the CanaryMonitor custom namespace with two additional dimensions: Endpoint (endpoint identifier) and Type (hls or dash).
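
As a rough illustration of what publishing such a metric involves, the sketch below builds a payload for the boto3 CloudWatch put_metric_data call using the namespace and dimensions described above. It is an assumption-laden example, not the script's own code: the helper names build_metric_data and publish, the endpoint identifier emp_hls_1, and the choice of a Count unit with value 1 for event-style metrics are all hypothetical.

```python
def build_metric_data(endpoint, stream_type, metrics):
    """Build the MetricData list for CloudWatch put_metric_data.

    `metrics` maps a metric name to a (value, unit) pair, where unit is a
    CloudWatch unit string such as "Milliseconds", "Seconds", or "Count".
    """
    dimensions = [
        {"Name": "Endpoint", "Value": endpoint},
        {"Name": "Type", "Value": stream_type},
    ]
    return [
        {
            "MetricName": name,
            "Dimensions": dimensions,
            "Value": float(value),
            "Unit": unit,
        }
        for name, (value, unit) in metrics.items()
    ]


def publish(metric_data, namespace="CanaryMonitor"):
    """Send the metrics to CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=metric_data)


if __name__ == "__main__":
    # Hypothetical endpoint identifier and values, shown without calling AWS.
    payload = build_metric_data(
        "emp_hls_1",
        "hls",
        {"manifestresponsetime": (42, "Milliseconds"), "stale": (1, "Count")},
    )
    print(payload)
```

Keeping the payload builder separate from the network call makes the metric shape easy to inspect or test without AWS access.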

Example metrics

Metric Name	Description	Value
discontinuity	Occurs when EXT-X-DISCONTINUITY is found in HLS or when “t” of segment n+1 does not equal “t” + “d” of segment n in DASH.
manifest4xx	Occurs with a manifest 4xx HTTP response.
manifest5xx	Occurs with a manifest 5xx HTTP response.
manifesttimeouterror	Occurs with a manifest request timeout or connection error.
manifestresponsetime	Manifest response time.	milliseconds
stale	Occurs when no new segment is found in the manifest response for x seconds, where x is the value defined by the --stale argument.
addurationadvertised	For HLS, the duration advertised in the EXT-X-CUE-OUT tag or the EXT-X-DATERANGE + DURATION tag. For DASH, the duration advertised in a new period that contains a SpliceInfoSection with BreakDuration or segmentationDuration.	seconds
addurationactual	The sum of segment durations between ad break start and ad break end. Occurs when an EXT-X-CUE-IN tag or an EXT-X-DATERANGE + SCTE35-IN tag is found in HLS, or when a new period that signals an ad break end is found in DASH. For MediaTailor endpoints, the sum of replaced segment durations during the detected ad break.	seconds
addurationdelta	Difference between actual and advertised ad break duration, measured as actual – advertised.	seconds
ptsdelta	The maximum difference between (t – pto)/timescale across all adaptation sets for all new segments in DASH.	seconds
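
The two delta metrics reduce to simple arithmetic. The following sketch shows one plausible way to compute them from the definitions in the table; the function names and sample numbers are assumptions for illustration, and the script's actual calculation may differ in detail (for example, in how segments are paired across adaptation sets).

```python
def ad_duration_delta(segment_durations, advertised, tolerance=1.0):
    """Return (actual, delta, mismatch) for one ad break.

    actual   = sum of segment durations between ad break start and end
    delta    = actual - advertised (both in seconds)
    mismatch = True when |delta| exceeds the +/- 1 second tolerance
    """
    actual = sum(segment_durations)
    delta = actual - advertised
    return actual, delta, abs(delta) > tolerance


def pts_delta(segments):
    """Return the spread of presentation times across adaptation sets, in seconds.

    `segments` holds one (t, pto, timescale) tuple per adaptation set for the
    same segment position; presentation time is (t - pto) / timescale.
    """
    times = [(t - pto) / timescale for t, pto, timescale in segments]
    return max(times) - min(times) if times else 0.0


if __name__ == "__main__":
    # A 30-second advertised break filled with only 28 seconds of segments.
    print(ad_duration_delta([6.0, 6.0, 6.0, 6.0, 4.0], 30.0))
    # Video at a 90 kHz timescale versus audio at 48 kHz.
    print(pts_delta([(900000, 0, 90000), (480480, 0, 48000)]))
```

A pts_delta result above 0.1 would correspond to the 100 ms threshold used for the possible lip sync warning described earlier.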

Following is a screenshot example of a CloudWatch dashboard created by the script monitoring multiple endpoints.

Running the script

To start the canary monitor, provide a single URL through the --url argument, or a list of URLs in a file such as endpoints.csv passed through the --endpointslistfile argument.

Example of starting the canary monitor (assuming the endpoints.csv file is in the local directory):

$ python3 canarymonitor.py --emt --emtadsegmentstring asset --manifests --tracking --gzip --cwmetrics --dashboards --loglevel INFO --frequency 5 --stale 18 --httptimeout 3 --endpointslistfile endpoints.csv --stdout --label myliveevent_emt

Once the script is running, it continuously writes logs, saves data, and sends metrics to CloudWatch.

Summary

This blog post provides a high-level overview of a canary monitoring tool that enhances the observability of a streaming media workload by giving better visibility into the data plane, that is, the data coming out of the different origin services in an ABR workflow. Integrate this tool into your monitoring strategy by running it on an Amazon EC2 instance or a local machine to monitor one or hundreds of endpoints during your next streaming event.

AWS for M&E Blog