Demystifying Apple low-latency HTTP live streaming
The latest entrant to the world of low-latency over-the-top (OTT) streaming is Apple’s draft specification: Low-Latency HLS. This blog post explores some of the features and nuances of the new HTTP live streaming format and is purely informational in nature.
OTT streaming, that is video streams delivered over the internet, has dramatically improved in quality over the past decade. In fact, OTT is seen as the dominating format for UltraHD experiences compared to traditional broadcast such as satellite, cable and terrestrial distribution. However, one area where traditional broadcast trumps OTT is latency. Latency is the amount of time that passes between some action occurring in front of a camera lens and that action being displayed to a viewer. Traditional broadcast has this number pegged down to between 8 and 10 seconds. Historically, however, OTT streams have been between 40 and 60 seconds behind, sometimes even more.
The vast difference is really down to 3 components: intrinsic latency, network latency and forward buffer latency.
Intrinsic latency comes from the production of the video streams themselves. Things like contribution from location to studio, graphic compositing, commentary and more all add up to delay in the presentation before it’s even distributed to consumers. This is usually in the order of mid single digit seconds and is equal to the latency experienced in legacy broadcast means such as FM radio. Getting to this latency for OTT is seen as the holy grail.
Network latency is equal to the least efficient component in the network stack between the client and the origin. This includes things like the protocols themselves (TCP, HTTP, HLS/DASH etc), the round trip times, the bandwidth available and the CDN caching behaviors.
The forward buffer latency is how much latency your player chooses to start with. This is a good latency from a quality perspective as this latency is a buffer against adverse network conditions. The more content that exists in the forward buffer, the less likely a consumer is to experience a re-buffering event.
The problem with reducing OTT latency is that it is a complex problem with many compromises to make it work at scale. Sacrificing forward buffer means that OTT operators must be willing to also sacrifice at least some quality; improving network latency can compromise scalability and robustness while reducing intrinsic latency means a lot of complexity and cost in the production workflow.
HLS when compared to MPEG-DASH had significant drawbacks when it came to both network latency and forward buffer latency.
Because HLS clients poll a server to discover new segments, an inherent time-cost is unavoidable. If the poll request to the CDN is cached (which it almost universally is) it then translates to a latency equal to the TTL (time to live) value configured for this type of request in the CDN. What’s more is that because of the back and forth of client to server in this polling behavior, a lot of valuable time is consumed simply in the chattiness of the protocol.
On the forward buffer side, clients that are following the HLS RFC should start their playback at a point that is 3 segments behind the last available segment. If your segment duration is 6 seconds, this equates to a very healthy forward buffer of 18 seconds but in turn a massive amount of latency.
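As a quick worked example of the arithmetic above, here is a minimal sketch that assumes the player starts exactly three full segments behind the newest one. The function name and its parameters are illustrative only.

```python
def startup_latency_seconds(segment_duration: float, segments_behind: int = 3) -> float:
    """Forward-buffer latency when playback starts N segments behind the live edge."""
    return segment_duration * segments_behind


# Compare the recommended 6-second segments with shorter 2-second segments.
for duration in (6.0, 2.0):
    latency = startup_latency_seconds(duration)
    print(f"{duration:.0f} s segments -> {latency:.0f} s behind live")
```

Shorter segments shrink this number, but at the cost of more playlist entries and more frequent requests.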
Low Latency to Date
There are already a few solutions available for reducing latency. In fact, one such open source community solution is an adaptation of the existing HLS specification, simply named LHLS.
Another comes from the DASH Industry Forum which leverages the SegmentTemplate mode of MPEG-DASH along with MP4 Fragments with HTTP chunked transfer encoding.
Underpinning both solutions is the ability for the client-side player to request a segment that is still being encoded. In other words, the player is able to decode the top of the segment while the tail of the segment is still being generated. This is especially useful if your segment duration is 6 seconds, as it allows a latency of, say, 2 seconds or less.
Perhaps the most important part of the Apple-defined Low-Latency HLS is the fact that incomplete segments should never be made available to the client. Instead, the parts of the segment should be individually signaled in the playlist; I will detail this later.
Apple defined Low-Latency HLS
Low-Latency HLS, a draft specification, is effectively a suite of changes aimed predominantly at the network latency part of the problem. It makes four broad changes, each covered below: reducing publishing latency, eliminating segment round trips, reducing playlist transfer overhead and switching tiers quickly.
Importantly, the specification is fully backwards compatible with previous versions of HLS. However, clients will need to be running iOS 13 or above to leverage the features.
First, the server signals to the client that it has been configured to support low latency features with EXT-X-SERVER-CONTROL along with some attributes to define the exact functions available.
Next, the client can choose to send any configuration of query string parameters to modify the resulting playlist at the server side. This is a huge departure for HLS and will have long lasting ramifications in the industry if adopted and here’s why:
Historically, HLS origins and CDNs were immune to query string parameters. That is, a client could send anything in the query string of the HTTP request and it would not affect the response from server. This has led to a fairly prolific live streaming architecture whereby the encoder and/or packager will pre-publish and push m3u8 files directly to the CDN which in turn is configured to simply ignore any query strings. This meant that the m3u8 files were the same for all viewers and very easily cacheable. This is no longer the case.
Apple have reserved any query string keys beginning with _HLS. This means that in order to support Apple’s Low Latency HLS your CDN will need to include any query strings that match this pattern in the object’s cache key.
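To make the cache key requirement concrete, here is a minimal sketch of the kind of logic a CDN or caching proxy would need. It is an illustration, not any CDN's actual configuration: the function name and example URLs are hypothetical, and a real cache key would normally also include the host.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

HLS_PREFIX = "_HLS"  # reserved query string prefix described above


def cache_key(url: str) -> str:
    """Build a cache key from the path plus only the reserved _HLS* parameters."""
    parts = urlsplit(url)
    reserved = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.startswith(HLS_PREFIX)
    )
    query = urlencode(reserved)  # sorted so parameter order does not split the cache
    return f"{parts.path}?{query}" if query else parts.path


print(cache_key("https://cdn.example.com/live/playlist.m3u8?token=abc&_HLS_part=4&_HLS_msn=100"))
print(cache_key("https://cdn.example.com/live/playlist.m3u8?token=abc"))
```

Unrelated parameters such as the hypothetical token above are still ignored, so viewers asking for the same playlist state continue to share one cached object.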
You will also need to run server-side software to modify the HLS manifest files according to the query string parameters sent by the client. Apple has made a reference PHP library available for this purpose (note: you need to be a part of the Apple Developer Program to access this library).
Now on to the actual features:
1. Reduce Publishing Latency
If your encoder has the ability to publish sub-segments (that is either CMAF Chunks or partial TS which Apple is calling partial segments) to an origin, HLS now has the ability to discretely address these partial segments. This means that you may keep your full segment duration at the recommended 6 seconds but signal availability of parts that constitute the segment currently being created by the encoder.
This is signaled in the playlist like so:
In the above example, the full segment fileSequence273.ts is not available in its entirety, however, every 333ms a new partial segment of the current bleeding edge full segment is signaled. This would require a packager that is capable of generating m3u8 updates and generating media files at least as frequently as the duration of the partial segments.
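A minimal sketch of how a packager might render such a playlist tail, assuming parts of roughly 333 ms published as discrete files. The filePart naming scheme, the preceding 6-second segment and the function name are assumptions for illustration, not taken from Apple's reference code.

```python
from typing import List


def render_parts(sequence: int, parts_ready: int, part_duration: float = 0.33334) -> List[str]:
    """Return one EXT-X-PART line per partial segment already published."""
    return [
        # Hypothetical naming scheme: filePart<sequence>.<part index>.ts
        f'#EXT-X-PART:DURATION={part_duration:.5f},URI="filePart{sequence}.{index}.ts"'
        for index in range(parts_ready)
    ]


# The last complete segment, followed by the parts of the in-progress segment 273.
playlist_tail = ["#EXTINF:6.00000,", "fileSequence272.ts", *render_parts(273, 3)]
print("\n".join(playlist_tail))
```

Each time the encoder finishes another part, the packager would regenerate the playlist with one more line, which is why playlist updates must keep pace with the part duration.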
One approach might be to simply stream each segment to an origin as it is being encoded, using HTTP/1.1 chunked transfer encoding (or HTTP/2 streaming, which has no chunked encoding of its own), and write some server-side AWS Lambda functions to generate the m3u8s, referencing the #EXT-X-PARTs with byte-range offsets as the locations rather than discrete files as in the Apple example above.
There still exists a very big problem in that the client will still poll for segment availability. If the CDN caches this request for, say, 2 seconds, then a partial segment duration of anything less than 2 seconds is wasted. To this end, the client can now send two query string parameters in order to achieve two important things:
- bust the cache to get a new m3u8 from origin and;
- instruct the origin not to respond with a 404 if the partial segment is unavailable, but instead hold the connection open for up to 3x segment durations until the requested manifest sequence is available.
This is done with a combination of:
- _HLS_msn=<N> which instructs the origin that the client is only interested in a playlist that contains the media sequence number N and;
- _HLS_part=<M> which instructs the origin that the client is only interested in a playlist that contains part M of media sequence N.
For example: …/playlist.m3u8?_HLS_msn=100&_HLS_part=4 instructs the CDN and origin not to respond to this request until a playlist is available that contains part 4 of the media sequence 100. If there are 99 full segments in the playlist and 4 parts of 1 in-progress segment or more, then and only then, should the server respond.
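The decision the origin has to make before answering can be sketched as a small predicate. This is my reading of the behavior described above, not Apple's reference implementation: the function and parameter names are hypothetical, and the part index is compared in whatever numbering the playlist itself uses.

```python
from typing import Optional


def playlist_satisfies(latest_msn: int, latest_part: Optional[int],
                       want_msn: int, want_part: Optional[int] = None) -> bool:
    """Decide whether the current playlist can answer a _HLS_msn/_HLS_part request.

    latest_msn  -- highest media sequence number that appears in the playlist
    latest_part -- index of the newest part of that segment, or None if the
                   segment is already complete
    """
    if latest_msn > want_msn:
        return True   # playlist has already moved past the requested segment
    if latest_msn < want_msn:
        return False  # requested segment has not started yet: keep holding
    if latest_part is None:
        return True   # requested segment is complete, so every part exists
    if want_part is None:
        return False  # assumed: with no _HLS_part, wait for the full segment
    return latest_part >= want_part


print(playlist_satisfies(100, 3, want_msn=100, want_part=4))  # still holding
print(playlist_satisfies(100, 4, want_msn=100, want_part=4))  # can respond
```

A server would re-evaluate this check each time the packager updates the playlist and give up once its hold timeout expires.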
This is a fascinating concept and quite flexible. I can see that encoders, packagers, CDNs and more will be racing to implement these features. Having a connection held open for multiple seconds, however, is something that remains to be proven at large scale. It is unusual behavior for HTTP delivery, and there may be many lower-level networking stacks that will close the connection earlier than intended. In fact, most CDNs will assume that the origin is too slow and give up if the connection is held open with little or no data transfer. This is something that will need to be comprehensively tested.
2. Eliminate Segment Round Trip
Apple have now also introduced HTTP/2 Server Push into the HLS specification. Essentially, the client passes a boolean, again via the query string (_HLS_push=1 or _HLS_push=0), to indicate whether the most recent partial segment at the bottom of the m3u8 should be pushed in parallel with the m3u8 response.
Historically, the HLS client would need to download the full m3u8 response, enumerate all the sequences and then only after that make a separate HTTP request for the segment, often wasting several hundred milliseconds in the process.
CDNs will need to support HTTP/2 push and be able to intrinsically understand which object to push alongside a cached m3u8 response for this feature to be of significant value.
3. Reduce Playlist Transfer Overhead
Another long-standing issue with HLS live streams is that of perpetually growing manifest files. If you are, say, publishing a Test match cricket game that on average lasts 8 hours with 2-second segments, that equates to 14,400 segments (8 × 3,600 / 2), each with 3-4 attributes, by the 8th hour. The resulting multi-hundred-kilobyte file, even with gzip, can take significant time to download, which is compounded by the update frequency of every 2 seconds.
Imagine how much worse this issue becomes with the introduction of partial segments every 300 ms.
That is clearly why you can now signal via the query string _HLS_skip=YES, which instructs the server to only send the delta from the last playlist to now. The resulting manifest will insert #EXT-X-SKIP:SKIPPED-SEGMENTS=3 in lieu of the actual segments; in this case 3. The CAN-SKIP-UNTIL= attribute of #EXT-X-SERVER-CONTROL should also be set to a horizon of no less than 6 segments from the live edge.
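A minimal sketch of how a server might build such a delta, assuming the playlist body is held as a list of (duration, URI) pairs and that the newest six segments are always kept. The function name, the data shape and the segment file names are hypothetical, and the playlist header tags are omitted for brevity.

```python
from typing import List, Tuple


def delta_playlist(segments: List[Tuple[float, str]], keep_last: int) -> List[str]:
    """Replace all but the newest `keep_last` segments with a single EXT-X-SKIP tag."""
    skipped = max(len(segments) - keep_last, 0)
    lines = [f"#EXT-X-SKIP:SKIPPED-SEGMENTS={skipped}"] if skipped else []
    for duration, uri in segments[skipped:]:
        lines.append(f"#EXTINF:{duration:.5f},")
        lines.append(uri)
    return lines


# Nine 2-second segments, of which only the newest six are sent in full.
segments = [(2.0, f"fileSequence{n}.ts") for n in range(100, 109)]
print("\n".join(delta_playlist(segments, keep_last=6)))
```

With nine segments and a six-segment horizon, the first three collapse into the single SKIPPED-SEGMENTS=3 line, so the response stays the same size no matter how long the event has been running.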
4. Switch Tiers Quickly
Lastly, Apple have now also introduced a new method for HLS clients to switch between representations more acutely. The representation switch process previously would require a very healthy forward buffer and the Apple HLS client would, once comfortable, start to seek out higher quality representations. If the forward buffer dropped below a certain number of seconds, the client would fall back to a lower quality representation in the hopes of increasing this forward buffer.
This logic breaks when the forward buffer is only a few hundred milliseconds, as there is no time to run through the above logic before a playback stall (colloquially known as a rebuffer).
To this end, you can now include ?_HLS_report=/other/manifest.m3u8 in the request. This can be used to include the segment availability hints of representations adjacent to the one currently requested. The server should then include these hints inline with the requested playlist via the #EXT-X-RENDITION-REPORT: tag at the bottom of the playlist. The attributes should include at least the currently available segment media sequence ID and the part number.
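A minimal sketch of both sides of that exchange: the client building the request and the server formatting the report line. The URLs and function names are hypothetical, and the LAST-MSN and LAST-PART attribute names reflect my understanding of the draft, so check them against the specification before relying on them.

```python
from urllib.parse import urlencode


def playlist_request(base_url: str, report_path: str) -> str:
    """Append a _HLS_report parameter naming an adjacent rendition's playlist."""
    query = urlencode({"_HLS_report": report_path}, safe="/")  # keep slashes readable
    return f"{base_url}?{query}"


def rendition_report(uri: str, last_msn: int, last_part: int) -> str:
    """Format the tag the server appends for the reported rendition."""
    return f'#EXT-X-RENDITION-REPORT:URI="{uri}",LAST-MSN={last_msn},LAST-PART={last_part}'


print(playlist_request("https://example.com/live/video.m3u8", "/other/manifest.m3u8"))
print(rendition_report("/other/manifest.m3u8", last_msn=273, last_part=3))
```

With this hint the client already knows which segment and part to request when it switches representation, rather than fetching the other playlist first.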
Switching boundaries are defined via the INDEPENDENT=YES attribute.
Apple have provided some client-side APIs. One to control the forward buffer latency and therefore estimated latency and another to control the behavior when recovering from a stall (i.e. re-buffer event). For ultimate low latency applications, you should now instruct your iOS clients to catch up to the live edge and discard the missed frames during the re-buffer event.
The specification is at a preliminary stage, meaning that you will need to test it thoroughly before going to production. During the presentation, Roger Pantos indicated that the specification should be rolled into the wider HLS RFC later this year.
Almost no CDNs that I am aware of will include query strings in the cache key when configured for HLS delivery. This means that, at least for testing, the CDN should be taken out of the architecture and its configuration worked on in parallel to ensure compliance. As the feature is still in Apple’s beta period anyway, your app will not make it through App Store validation.
In Amazon CloudFront, changing the query string caching behavior simply requires two rules to be updated:
Apple have made available a server-side PHP script that encapsulates the server-side features required for Low-Latency HLS. This is available through the Apple Developer Program.