AWS for M&E Blog
Monitor HLS and DASH live streams using a canary monitor
Introduction
Monitoring adaptive bitrate (ABR) media workloads requires attention to detail to provide a successful viewing experience. Keeping an eye on the health of the origin servers, confirming video and audio quality, and analyzing data reports from players are common monitoring tactics. However, issues such as manifest non-compliance, staleness, incorrect ad-break decorations, or missing data in an ad-tracking response can be easily overlooked without inspecting the manifests coming out of each individual origin.
This blog post describes a canary monitor solution that provides visibility into the data plane of an ABR stream: primarily the manifests and ad-tracking data that come out of different origins. Monitoring the data plane of an ABR stream can help prevent and resolve unexpected playback or monetization issues by proactively alerting operators to potential problems. Using a canary monitor to inspect and archive content during a live stream can help confirm expectations, such as the presence of ad breaks, and provides an archive for post-event troubleshooting.
The canary monitoring tool is available for download at https://github.com/aws-samples/monitor-hls-and-dash-streams-using-canary-monitor
Requirements
The canary monitor is a Python 3 script that requires Python 3.8 or newer. At a minimum, you need the urllib3 library, which the script uses to send all HTTP requests. Depending on your use case and the features you enable, you also need the following libraries:
- lxml – for parsing DASH manifest responses
- boto3 and botocore – for sending metrics to Amazon CloudWatch
- jinja2 – for creating the Amazon CloudWatch dashboard JSON file
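Before the first run, you might want to confirm which of these libraries can be imported. The following is a minimal pre-flight check. It is a hypothetical helper and not part of the canary monitor: the `check_environment` name and its messages are assumptions. It uses only the standard library's `importlib.util.find_spec`, which reports whether a module can be found without actually importing it.

```python
import importlib.util
import sys

# Optional libraries and the feature each one enables, as listed above.
OPTIONAL_LIBS = {
    "lxml": "parsing DASH manifests",
    "boto3": "sending metrics to Amazon CloudWatch",
    "botocore": "sending metrics to Amazon CloudWatch",
    "jinja2": "creating the CloudWatch dashboard JSON file",
}


def check_environment():
    """Return a list of human-readable problems; an empty list means everything is present."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or newer is required")
    # urllib3 is required in every configuration.
    if importlib.util.find_spec("urllib3") is None:
        problems.append("urllib3 is missing (required for all HTTP requests)")
    # The remaining libraries matter only for the features that use them.
    for name, feature in OPTIONAL_LIBS.items():
        if importlib.util.find_spec(name) is None:
            problems.append(f"{name} is missing (needed for {feature})")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

A missing optional library only matters if you plan to use the corresponding feature. For example, boto3 is irrelevant if you never publish metrics.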
Description
The canary monitor is a tool that downloads and inspects HLS or DASH manifests from an origin at regular intervals, much like a player does. It can also download segments and ad-tracking data. After every download, it inspects the data, runs a large set of validation checks, writes logs, and optionally publishes custom metrics to Amazon CloudWatch. It works with various origins but is designed primarily to monitor streams originating from AWS Elemental MediaPackage and AWS Elemental MediaTailor. When pointed at a MediaTailor origin, it can detect ad-break content replaced by the service and validate the ad-tracking data during ad breaks.
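The download-and-inspect cycle described above can be sketched as a simple polling loop. This is a minimal sketch under stated assumptions, not the tool's actual code:

- `poll_manifests`, `make_urllib3_fetch`, and the `inspect` callback signature are hypothetical names.
- The 4-second request timeout is an arbitrary choice.
- The 5-second default interval matches the default described in this post.

The fetch function is injected as a parameter, so the loop can be exercised without network access.

```python
import time


def make_urllib3_fetch(timeout_s=4.0):
    """Return fetch(url) -> (status, body, elapsed_ms) backed by a urllib3 PoolManager."""
    import urllib3  # imported lazily so the loop below can be tested without urllib3

    http = urllib3.PoolManager()

    def fetch(url):
        start = time.monotonic()
        # By default urllib3 preloads the body, so elapsed_ms includes the download.
        resp = http.request("GET", url, timeout=timeout_s)
        elapsed_ms = (time.monotonic() - start) * 1000.0
        return resp.status, resp.data, elapsed_ms

    return fetch


def poll_manifests(urls, fetch, inspect, interval_s=5.0, max_rounds=None, sleep=time.sleep):
    """Request every URL once per interval and pass each result to inspect().

    inspect(url, status, body, elapsed_ms, error) receives error=None on success,
    or the raised exception (timeout, connection error, ...) with the other fields None.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        round_start = time.monotonic()
        for url in urls:
            try:
                status, body, elapsed_ms = fetch(url)
            except Exception as exc:  # a canary should record failures, not crash
                inspect(url, None, None, None, exc)
            else:
                inspect(url, status, body, elapsed_ms, None)
        rounds += 1
        # Sleep only for the remainder of the interval so requests keep a steady cadence.
        sleep(max(0.0, interval_s - (time.monotonic() - round_start)))
```

For a real run, you would pass `make_urllib3_fetch()` as `fetch` and leave `max_rounds` unset so the loop continues until the process is stopped.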
The script expects one or more HLS or DASH live stream manifest URLs to monitor, which must be reachable from the machine running the script. An HLS manifest URL can point to either the top-level (multivariant) manifest or a specific rendition's manifest. When pointed at the top-level HLS manifest, the script can pick and monitor one specific rendition, multiple renditions, or all renditions. Once started, the script sends an HTTP GET request to every provided manifest URL at a regular interval (5 seconds by default) until you stop the process.
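Picking renditions from a top-level manifest comes down to reading each EXT-X-STREAM-INF tag and the URI line that follows it, then resolving that URI against the manifest URL. The following is a minimal sketch of that idea. `parse_variants` and `select_variants` are hypothetical names, and the sketch is not the tool's implementation. It also deliberately ignores EXT-X-MEDIA alternate renditions (for example, separate audio) and I-frame playlists, which a complete implementation would need to handle.

```python
import re
from urllib.parse import urljoin

# Match BANDWIDTH=... at the start of the attribute list or after a comma,
# so that AVERAGE-BANDWIDTH is not mistaken for it.
_BANDWIDTH = re.compile(r"(?:^|,)BANDWIDTH=(\d+)")


def parse_variants(master_url, playlist_text):
    """Return [(bandwidth, absolute_rendition_url), ...] from a multivariant playlist."""
    variants = []
    pending_bandwidth = None
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            match = _BANDWIDTH.search(line.split(":", 1)[1])
            pending_bandwidth = int(match.group(1)) if match else 0
        elif line and not line.startswith("#") and pending_bandwidth is not None:
            # The URI line after EXT-X-STREAM-INF may be relative to the top-level manifest.
            variants.append((pending_bandwidth, urljoin(master_url, line)))
            pending_bandwidth = None
    return variants


def select_variants(variants, indexes=None):
    """Pick one or several renditions by index, or all of them when indexes is None."""
    if indexes is None:
        return list(variants)
    return [variants[i] for i in indexes]
```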
Each time the script downloads a manifest, it checks the manifest's content and optionally emits CloudWatch metrics. The checks cover staleness, discontinuities, spec compliance, ad breaks, and audio/video misalignment. If monitoring of MediaTailor ad-tracking data is enabled, the script also looks for avail information during detected ad breaks. The following table highlights important log messages and the types of checks the script performs.
Example checks
| Name | Description | Log level | Impact category |
|---|---|---|---|
| Warning | | | |
| Possible lip sync issue | Occurs when the presentation time of segment n in one adaptation set is more than 100 ms away from the presentation time of segment n in any other adaptation set in a DASH manifest. | WARNING | Playback |
| Discontinuity | HLS: occurs when EXT-X-DISCONTINUITY is found in the manifest. DASH: occurs when the “t” value of segment n+1 does not equal “t” + “d” of segment n and both segments are in the same period. | WARNING | Playback |
| Staleness | Occurs when no new segment is found in the manifest response for x seconds, where x is the value set by the --stale argument. | WARNING | Playback |
| Differences between segments across renditions | When multiple HLS renditions are monitored, occurs when segments with the same media sequence number differ across renditions in any of the following attributes: discontinuity (presence of the EXT-X-DISCONTINUITY tag), discontinuity sequence, or EXT-X-PROGRAM-DATE-TIME (video segments only). | WARNING | Playback |
| Manifest value has changed | Occurs when the EXT-X-VERSION or EXT-X-TARGETDURATION value changes in an HLS manifest. | WARNING | Playback |
| Ad break was longer/shorter than advertised | Occurs when the sum of segment durations between ad break start and end does not match the advertised ad break duration within +/- 1 second. | WARNING | Monetization |
| Nested ad break start | Occurs when an ad break start manifest decoration is found while already inside an ad break. | WARNING | Monetization |
| Playhead is drifted | When a MediaTailor endpoint is monitored together with its ad-tracking data, occurs when the calculated playhead at ad break start is more than 15 seconds away from the startTimeInSeconds of the avail ID. | WARNING | Monetization |
| Info | | | |
| Found new period | Occurs when a new period is found in a DASH manifest. The log message includes parsed SCTE-35 information, if present. | INFO | |
| Found avail in tracking response | Occurs when a MediaTailor endpoint is monitored and an avail is found in the ad-tracking data during an ad break. An avail is found when the calculated playhead during an ad break falls within the range startTimeInSeconds to (startTimeInSeconds + durationInSeconds) for any avail in the ad-tracking data. The log message includes details about the avail (availId, durationInSeconds, availadscount, creatives). | INFO | |
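Several of the checks in the table reduce to small comparisons. The following sketch shows one possible way to express them. It makes these assumptions:

- The function names are hypothetical.
- Each avail is represented as a dict with the `startTimeInSeconds` and `durationInSeconds` fields named in the table.
- The DASH SegmentTimeline is already expanded into explicit `t`/`d` pairs, meaning any `@r` repeats have been unrolled.

The thresholds (100 ms, +/- 1 second, 15 seconds) come from the table. How the tool derives presentation times and the playhead is not shown here.

```python
from dataclasses import dataclass

LIP_SYNC_TOLERANCE_S = 0.100    # 100 ms between adaptation sets
AD_DURATION_TOLERANCE_S = 1.0   # +/- 1 second versus the advertised duration
PLAYHEAD_DRIFT_LIMIT_S = 15.0   # playhead at ad break start vs. avail startTimeInSeconds


@dataclass
class TimelineSegment:
    t: int  # SegmentTimeline start time, in timescale units
    d: int  # SegmentTimeline duration, in timescale units


def dash_discontinuities(segments):
    """Indexes n+1 where t[n+1] != t[n] + d[n]; segments must come from a single period."""
    return [n + 1 for n, (cur, nxt) in enumerate(zip(segments, segments[1:]))
            if nxt.t != cur.t + cur.d]


def possible_lip_sync_issue(presentation_times_s):
    """Presentation time (seconds) of segment n in each adaptation set."""
    return max(presentation_times_s) - min(presentation_times_s) > LIP_SYNC_TOLERANCE_S


def ad_break_duration_check(advertised_s, segment_durations_s):
    """Return (mismatch, delta), where delta = actual - advertised."""
    delta = sum(segment_durations_s) - advertised_s
    return abs(delta) > AD_DURATION_TOLERANCE_S, delta


def is_stale(last_new_segment_at_s, now_s, stale_after_s):
    """True once no new segment has appeared for stale_after_s seconds."""
    return now_s - last_new_segment_at_s >= stale_after_s


def find_avail(playhead_s, avails):
    """Return the avail whose [start, start + duration] range contains the playhead."""
    for avail in avails:
        start = avail["startTimeInSeconds"]
        if start <= playhead_s <= start + avail["durationInSeconds"]:
            return avail
    return None


def playhead_drifted(playhead_at_break_start_s, avail):
    """True when the playhead is more than 15 s away from the avail's start time."""
    return abs(playhead_at_break_start_s - avail["startTimeInSeconds"]) > PLAYHEAD_DRIFT_LIMIT_S
```

The delta returned by `ad_break_duration_check` follows the same actual - advertised convention as the addurationdelta metric.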
When run with the --cwmetrics argument, the script sends several metrics to CloudWatch. Metrics are created under the CanaryMonitor custom namespace with two additional dimensions: Endpoint (the endpoint identifier) and Type (hls or dash).
Example metrics
| Metric name | Description | Unit |
|---|---|---|
| discontinuity | Occurs when EXT-X-DISCONTINUITY is found in HLS, or when “t” of segment n+1 does not equal “t” + “d” of segment n in DASH. | |
| manifest4xx | Occurs on a manifest 4xx HTTP response. | |
| manifest5xx | Occurs on a manifest 5xx HTTP response. | |
| manifesttimeouterror | Occurs on a manifest request timeout or connection error. | |
| manifestresponsetime | Manifest response time. | milliseconds |
| stale | Occurs when no new segment is found in the manifest response for x seconds, where x is the value set by the --stale argument. | |
| addurationadvertised | HLS: the duration advertised in the EXT-X-CUE-OUT tag, or in an EXT-X-DATERANGE tag with a DURATION attribute. DASH: the duration advertised in a new period that contains a SpliceInfoSection with BreakDuration or segmentationDuration. | seconds |
| addurationactual | The sum of segment durations between ad break start and ad break end. Reported when an EXT-X-CUE-IN tag, or an EXT-X-DATERANGE tag with an SCTE35-IN attribute, is found in HLS, or when a new period that signals an ad break end is found in DASH. For AWS Elemental MediaTailor endpoints, this is the sum of replaced segment durations during the detected ad break. | seconds |
| addurationdelta | Difference between the actual and advertised ad break duration, computed as actual - advertised. | seconds |
| ptsdelta | The maximum difference in (t - pto)/timescale across all adaptation sets, over all new segments in DASH. | seconds |
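To illustrate how these metrics could be shaped for CloudWatch, the following sketch builds PutMetricData entries in the CanaryMonitor namespace with the Endpoint and Type dimensions. It also computes ptsdelta as defined in the table. This is an illustration under stated assumptions, not the tool's code:

- `metric_datum`, `pts_delta`, and `publish` are hypothetical names.
- Using the `Count` unit for event-style metrics is a guess, because the table gives no unit for them.
- The batch size of 20 is a conservative choice rather than a documented requirement of the tool.

```python
NAMESPACE = "CanaryMonitor"

# Units taken from the table above. Any metric not listed here is ASSUMED to be "Count".
UNITS = {
    "manifestresponsetime": "Milliseconds",
    "addurationadvertised": "Seconds",
    "addurationactual": "Seconds",
    "addurationdelta": "Seconds",
    "ptsdelta": "Seconds",
}


def metric_datum(name, value, endpoint, stream_type):
    """Build one MetricData entry carrying the Endpoint and Type dimensions."""
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint},
            {"Name": "Type", "Value": stream_type},  # "hls" or "dash"
        ],
        "Value": float(value),
        "Unit": UNITS.get(name, "Count"),
    }


def pts_delta(new_segments):
    """Compute ptsdelta in seconds.

    new_segments[n] lists (t, pto, timescale) for segment n in every adaptation set.
    The result is the largest spread of (t - pto) / timescale for any single segment.
    """
    delta = 0.0
    for per_adaptation_set in new_segments:
        times = [(t - pto) / timescale for t, pto, timescale in per_adaptation_set]
        delta = max(delta, max(times) - min(times))
    return delta


def publish(datums, client=None, batch_size=20):
    """Send datums with CloudWatch PutMetricData in small batches."""
    if client is None:
        import boto3  # only needed when actually publishing
        client = boto3.client("cloudwatch")
    for i in range(0, len(datums), batch_size):
        client.put_metric_data(Namespace=NAMESPACE, MetricData=datums[i:i + batch_size])
```

Passing in a `client` object keeps the payload logic testable without AWS credentials.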
The following screenshot shows an example CloudWatch dashboard that the script created while monitoring multiple endpoints.
Running the script
To start the canary monitor, provide either a single URL through the --url argument or a list of URLs in the endpoints.csv file.
Example of starting the canary monitor (assuming the endpoints.csv file is in the local directory)
Once running, the script continuously writes logs and saves data. When the --cwmetrics argument is set, it also sends metrics to CloudWatch.
Summary
This blog post provides a high-level overview of a canary monitoring tool used to enhance the observability of a streaming media workload by gaining better visibility into the data plane—the data coming out of different origin services in an ABR workflow. Integrate this tool into your monitoring strategy by running it on an Amazon EC2 instance or a local machine to monitor one or hundreds of endpoints during your next streaming event.