Post AbxIpM4VyWFCDjer32 by djspiewak@fosstodon.org
 (DIR) More posts by djspiewak@fosstodon.org
 (DIR) Post #AbxIorvc33IWjRGK0W by djspiewak@fosstodon.org
       2023-11-18T23:29:53Z
       
       0 likes, 1 repeats
       
       I was curious as to how Apple was handling pre-roll promos and ads. These pre-rolls are pretty clearly dynamic (i.e. not burned into the playlist) since sometimes I don't get them, and can be multi-stage (e.g. Foundation episodes usually have three independent ones). HLS Interstitials are very new and Apple TV's pre-rolls predate its support, so naturally I wanted to know how this was done.
       
 (DIR) Post #AbxIovYMZR4pyUPoK8 by djspiewak@fosstodon.org
       2023-11-18T23:31:50Z
       
       0 likes, 0 repeats
       
       To be clear, I was able to think of a bunch of mostly-equivalent ways in which this could have been achieved, so it's not like it's some unexplainable magic trick, but I was very curious as to exactly which set of tradeoffs Apple had chosen. Today, I finally decided to take the time to read through an m3u8 for an Apple TV first-party episode (from For All Mankind Season 1) to get the scoop. All zero of you who were also curious can thus save yourselves the effort.
       
 (DIR) Post #AbxIoyuS5lWqNrlzfc by djspiewak@fosstodon.org
       2023-11-18T23:33:57Z
       
       0 likes, 0 repeats
       
       Backing up and explaining more for folks who haven't worked with media streaming… (random background: I used to work on all the tech things at Disney+ and Hulu) When you press "play" in a streaming app, it doesn't just start downloading a big ol' mp4 and slurp that sucker directly into the hardware decoder. There are a huge number of reasons why this doesn't work well (MOV atom anyone?), but one of the most significant is bitrate.
       
 (DIR) Post #AbxIp0fdXorlqUrBr6 by djspiewak@fosstodon.org
       2023-11-18T23:35:39Z
       
       0 likes, 0 repeats
       
       We generally have an expectation that our content plays seamlessly at its designated framerate (e.g. 29.97) from start to finish with no visible interruptions. We also have the expectation that it starts quickly (~1 second is industry standard, and some services beat this by a lot!) and that we can pause, seek forward and backward arbitrarily, stop and resume quickly, and other expectations that are insane when you really think about it.
       
 (DIR) Post #AbxIp2g45COg4P4Yng by djspiewak@fosstodon.org
       2023-11-18T23:37:17Z
       
       0 likes, 0 repeats
       
       We also have other expectations that are quite complex, such as subtitles in a panoply of languages, configurable in the client player, different audio dub translations of the source content, and of course… personally targeted advertisements injected along the way.
       
 (DIR) Post #AbxIp69x8X5Sry4yIq by djspiewak@fosstodon.org
       2023-11-18T23:39:04Z
       
       0 likes, 0 repeats
       
       All of this has to work even while our network conditions fluctuate from good to bad and back again, which brings us back to the origin of our problem: how do we *stream* video and associated timeline-aligned data (e.g. subtitles) in a way which is reliable in the face of unreliable network conditions and dynamically makes tradeoffs against quality? The answer is playlists.
       
 (DIR) Post #AbxIp9JHQJKYdlSxl2 by djspiewak@fosstodon.org
       2023-11-18T23:42:01Z
       
       0 likes, 0 repeats
       
       Playlists basically represent a temporally-aware encoding of the *metadata* about the stream. Not just title and such, but everything the player needs to actually produce the experience (mostly… we'll come back to that). There are four major playlist formats: HLS (which is everywhere), DASH (which is great and nowhere), and the proprietary formats used by Netflix and YouTube, respectively. All have their quirks, but the most significant thing they contain is a media segment reference.
       
 (DIR) Post #AbxIpB8icY4SJaXYZc by djspiewak@fosstodon.org
       2023-11-18T23:44:52Z
       
       0 likes, 0 repeats
       
       A media segment is effectively just a truncated slice of the video, usually about 2 seconds in length. A single playlist contains as many segments as needed, in order, to traverse the entire stream. Each segment is generally a separate file on disk (and thus a separate URL on the CDN!) and usually exists in several different variants, covering things like bitrate, HDR/SDR, Dolby Atmos, and more. The client player is in control of which variant(s) it loads and plays for each segment and when.
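       
       To make the "list of segments in order" idea concrete, here's a rough Python sketch that reads an HLS media playlist and lays the segments out on a timeline. The playlist text, file names, and durations are all made up for illustration (they are not from any real service), and `Segment` / `parse_segments` are names I invented. It only looks at `#EXTINF` and the URI lines, so treat it as a toy, not a real parser.

```python
from dataclasses import dataclass

# Made-up media playlist: file names and durations are illustrative only.
SAMPLE_PLAYLIST = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXTINF:2.0,
seg_0001.ts
#EXTINF:2.0,
seg_0002.ts
#EXTINF:1.5,
seg_0003.ts
#EXT-X-ENDLIST
"""


@dataclass
class Segment:
    uri: str         # where the player would fetch this slice from
    duration: float  # seconds, taken from the preceding #EXTINF tag
    start: float     # offset of this slice on the stream timeline


def parse_segments(text):
    """Pair each #EXTINF duration with the URI line that follows it."""
    segments = []
    pending = None   # duration waiting for its URI line
    clock = 0.0      # running position on the timeline
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            pending = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#") and pending is not None:
            segments.append(Segment(line, pending, clock))
            clock += pending
            pending = None
    return segments


segments = parse_segments(SAMPLE_PLAYLIST)
```

       With this sample, `segments` holds three entries, and the third one starts at 4.0 seconds because the two before it are 2.0 seconds each. Seeking is then "find the segment whose range covers the target time and fetch that URL".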
       
 (DIR) Post #AbxIpCvJzKXhqcHsy8 by djspiewak@fosstodon.org
       2023-11-18T23:47:06Z
       
       0 likes, 0 repeats
       
       This is a nice scheme, because it allows the client to instrument its own capabilities on the fly – including network conditions, hardware, localization, etc – and choose the best variant to match. It even allows client-side control over quality/bandwidth tradeoffs (higher quality usually means more bitrate which means more bandwidth), which we see in YouTube's cog menu (among other places).
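       
       Here's a minimal Python sketch of that client-side choice, assuming the simplest possible policy: pick the highest-bitrate variant that fits under a fraction of the measured throughput. The ladder values, the `safety` factor of 0.8, and the names `Variant` / `choose_variant` are all my own assumptions for illustration. Real players use much more involved heuristics (buffer level, screen size, decoder support, and so on).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    bandwidth_bps: int  # advertised peak bitrate for this rendition


# Hypothetical bitrate ladder; the numbers are illustrative, not from any service.
LADDER = [
    Variant("480p", 1_500_000),
    Variant("720p", 3_000_000),
    Variant("1080p", 6_000_000),
    Variant("2160p", 16_000_000),
]


def choose_variant(ladder, measured_bps, safety=0.8):
    """Pick the best variant that fits within a safety margin of throughput."""
    budget = measured_bps * safety
    affordable = [v for v in ladder if v.bandwidth_bps <= budget]
    if not affordable:
        # Nothing fits: fall back to the cheapest rendition rather than stall.
        return min(ladder, key=lambda v: v.bandwidth_bps)
    return max(affordable, key=lambda v: v.bandwidth_bps)
```

       Because the decision is made per segment, the player can call something like `choose_variant(LADDER, measured_bps)` again every couple of seconds and step up or down as the network changes. A manual quality menu is just the user overriding this function.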
       
 (DIR) Post #AbxIpGSOqnmio4mqRc by djspiewak@fosstodon.org
       2023-11-18T23:48:44Z
       
       0 likes, 0 repeats
       
       The catch is that the playlist itself is generally static. It's just a file linking to other files in order. Now, there are some exceptions to this (dynamically-generated manifests with read-through caching are more and more common now), but you have to be careful since your client-observed video start time is limited by 1) the time it takes to fetch and parse the manifest, 2) the time it takes to fetch the DRM keys, and 3) the time it takes to fetch the first segment from said manifest.
       
 (DIR) Post #AbxIpINVhx3ulUVy6K by djspiewak@fosstodon.org
       2023-11-18T23:49:54Z
       
       0 likes, 0 repeats
       
       So then the problem becomes the following: how can we have *subsections* of the video stream which are non-static? Ads are the big gorilla in the room here, since we want to ultra-personalize every ad opportunity at all times, but promos (such as those on Apple TV) are another good one.
       
 (DIR) Post #AbxIpKBsy8x4O15iG8 by djspiewak@fosstodon.org
       2023-11-18T23:52:10Z
       
       0 likes, 0 repeats
       
       Apple was the originator of the HLS specification, so it was always a safe bet that they weren't using DASH (which has nicer solutions to this problem). There have long been a few non-standard approaches to try to address this within HLS players, but none of that seemed Apple's style. Nonetheless, the "proper" solution (HLS Interstitials) was only added to the spec within the last two years, and remained poorly supported even on Apple hardware until much more recently than that.
       
 (DIR) Post #AbxIpM4VyWFCDjer32 by djspiewak@fosstodon.org
       2023-11-18T23:54:12Z
       
       0 likes, 0 repeats
       
       And yet, Apple TV had personalized preroll promos long before this, hence my minor curiosity. Now, to be clear, pre- and post-roll content injection is fundamentally easier than midroll (e.g. most streaming ads) because you don't need to worry about lining up with a segment boundary in the stream: the stream hasn't started yet, so just slap some video in there *before* you start and you're good to go!
       
 (DIR) Post #AbxIpNvN5U7PxxOa4e by djspiewak@fosstodon.org
       2023-11-18T23:55:48Z
       
       0 likes, 0 repeats
       
       And as it turns out, this is exactly what Apple did. Though there are hints that midroll injections (*ahem* ads *ahem*) are coming in the future to an Apple TV near you. Specifically, Apple TV promos are *not* using HLS Interstitials (even though they're now supported on all current Apple hardware). Instead, Apple uses an out-of-band bit of metadata at the front of the manifest to link out to an *additional* URL (probably service-backed) which likely returns an HLS fragment for a sub-player.
       
 (DIR) Post #AbxIpPnI8UqNlTd9l2 by djspiewak@fosstodon.org
       2023-11-18T23:57:55Z
       
       0 likes, 0 repeats
       
       They then have custom logic in their own player to interpret this bit of metadata as a preroll fetch, and then likely fork off a second player, superimposed over the main player, which handles the preroll (not dissimilar to how they implement interstitials, but without having to solve buffer alignment). The cost here is longer video start time (which is easy to see as a user of Apple TV), but the upside is they got to ship a very user-impacting feature (promos!) without waiting for spec.
       
 (DIR) Post #AbxIpTrh04k0OHijAm by djspiewak@fosstodon.org
       2023-11-18T23:58:46Z
       
       0 likes, 0 repeats
       
       Oh and right next to that preroll metadata tag is a second tag which lists the *number* of expected midroll breaks. Currently that number is 0 (as expected), but consider this a strong hint that Apple is at last thinking about adding an ad-supported tier to Apple TV in the future.
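       
       For a rough idea of what "custom player logic reading manifest metadata" could look like, here's a Python sketch. Big caveat: the thread doesn't name the actual tags, so everything specific here is invented. I'm assuming the metadata rides on the standard `#EXT-X-SESSION-DATA` tag, and the `DATA-ID` values (`example.preroll`, `example.midroll-count`), the URL, and the `session_data` helper are all hypothetical placeholders, not Apple's real identifiers.

```python
import re

# Invented multivariant playlist. The DATA-ID values and the URL are
# hypothetical placeholders; they are NOT the tags Apple actually uses.
SAMPLE_MANIFEST = """\
#EXTM3U
#EXT-X-SESSION-DATA:DATA-ID="example.preroll",URI="https://example.invalid/preroll.m3u8"
#EXT-X-SESSION-DATA:DATA-ID="example.midroll-count",VALUE="0"
#EXT-X-STREAM-INF:BANDWIDTH=3000000
video_720p.m3u8
"""

# Matches NAME="quoted value" or NAME=bare-value inside an attribute list.
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def session_data(text):
    """Collect EXT-X-SESSION-DATA attribute lists, keyed by DATA-ID."""
    prefix = "#EXT-X-SESSION-DATA:"
    found = {}
    for line in text.splitlines():
        if line.startswith(prefix):
            attrs = {k: v.strip('"') for k, v in ATTR.findall(line[len(prefix):])}
            if "DATA-ID" in attrs:
                found[attrs["DATA-ID"]] = attrs
    return found


data = session_data(SAMPLE_MANIFEST)
preroll_url = data["example.preroll"]["URI"]                # extra URL to fetch before playback
midrolls = int(data["example.midroll-count"]["VALUE"])      # expected midroll breaks
```

       A player doing it this way would fetch `preroll_url` first, hand whatever comes back to a second player layered over the main one, and only then start the main stream. That extra round trip is where the longer start time would come from.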
       
 (DIR) Post #AbxIpXEqSS2kqxZlB2 by djspiewak@fosstodon.org
       2023-11-18T23:59:34Z
       
       0 likes, 0 repeats
       
       (no insider knowledge in any of this btw; you can figure this all out by either cracking open the browser devtools or doing what I did and downloading some media on the TV app and reading the m3u8 file on local disk)