[HN Gopher] Netflix Cloud Packaging in the Terabyte Era
       ___________________________________________________________________
        
       Netflix Cloud Packaging in the Terabyte Era
        
       Author : mmcclure
       Score  : 105 points
       Date   : 2021-09-28 05:53 UTC (1 days ago)
        
 (HTM) web link (netflixtechblog.com)
 (TXT) w3m dump (netflixtechblog.com)
        
       | MrBuddyCasino wrote:
       | OT: starting with covid, Netflix reduced bitrate to relieve
       | stress on overloaded networks. Did they ever reverse this? Video
       | still looks overly compressed to me, so much so that I find it
       | annoying.
        
         | tinus_hn wrote:
          | They forced SD resolution to appease technophobes who were
          | crying that Netflix was wasting bandwidth on 'unnecessary
          | entertainment' while they were having trouble with their
          | 'necessary video meetings'. That, of course, was a limitation
          | of bandwidth across the wider internet, while Netflix mostly
          | uses local bandwidth: they try to stream from machines on the
          | ISP networks because it saves them a ton of money.
         | 
         | After about a month this was reversed.
         | 
          | In the meantime, it's possible they have switched compression
          | standards or settings, which you may have noticed. But the
          | covid limitations should be long gone.
         | 
         | https://openconnect.netflix.com/en_gb/
        
         | Widdershin wrote:
         | As far as I can tell Covid-quality is still in full effect, to
         | the point where I don't watch movies on Netflix at all anymore.
         | I often start one, remember the quality and then find an
         | alternate source.
         | 
         | I wonder if they've seen that it doesn't impact churn as much
         | as it saves them money, and they're keeping it that way as long
         | as possible.
        
           | MrBuddyCasino wrote:
           | > I wonder if they've seen that it doesn't impact churn as
           | much as it saves them money, and they're keeping it that way
           | as long as possible.
           | 
            | Yes, that's my suspicion too.
        
         | Aissen wrote:
         | It should have come back. If not, it's probably a bug, but good
         | luck trying to report that to Netflix.
        
       | dkarp wrote:
       | It's not entirely clear to me what the job of the Packager is
       | here as opposed to a Virtual Package.
       | 
       | After chunk encoding and Virtual Assembly with an index file, are
       | Netflix actually packaging the encoded
       | video/audio/subtitles/metadata into a single file for each encode
       | that is pushed to their CDN? If so, then why is the Packager even
       | necessary? Why not go one step further and create a Virtual
       | Package as well?
       | 
       | Is it so that they can better distribute the packaged encodes to
       | their CDN?
        
         | jasode wrote:
         | _> It's not entirely clear to me what the job of the Packager
         | is here._
         | 
          | The first paragraph of the linked article points to a previous
          | post with more details on the Packager:
         | https://netflixtechblog.com/packaging-award-winning-shows-wi...
        
           | dkarp wrote:
            | Right, and I actually read that. Maybe I should rephrase my
            | question. I'd like to know why a Packager is necessary
            | instead of virtually packaging and presenting an interface
            | of a packaged file.
            | 
            | Then you'd be able to keep your encoded chunks as chunks
            | instead of having to download them and then upload them
            | again to MezzFS.
        
             | jasode wrote:
             | _> why is a Packager necessary instead of virtually
             | packaging and presenting an interface of a packaged file?_
             | 
             | Maybe I'm parsing your question wrong but perhaps the
             | confusion is cleared up if we remember that the "CDN" in
             | the Netflix architecture diagram _is not physically stored
              | at AWS S3_: https://miro.medium.com/max/350/1*A5PR2QJ7STPb
             | Ud2xTg6z_g.png
             | 
             | See: https://www.google.com/search?q=netflix+cdn+appliance+
             | %22ope...
             | 
             | And a recent 2021 article says the Netflix CDN appliance
             | has ~280 terabytes of local disk storage:
             | https://dev.to/gbengelebs/netflix-system-design-how-
             | netflix-...
             | 
             | Thus, the "output" video files of Packager is eventually
             | physically transferred to geographically distributed ISP
             | datacenters.
             | 
             | So to attempt to reconstruct Netflix's thought process...
             | 
             | Once we work backwards from the need to eventually store
             | copies of video content on distant Netflix CDN appliances,
             | the question becomes which type of video file to store:
             | 
              | (a) the codec-specific format (good for archival storage
              | that can generate new downstream formats) -- but not
              | optimized for client playback features such as fast random
              | seeking, a/v sync across dynamically changing resolutions,
              | etc.
             | 
             | -- or --
             | 
              | (b) the codec-_agnostic_ format -- which is good for
              | client device seeking, etc.
             | 
              | Option (a) wouldn't make sense since you'd still need a
              | cpu-intensive process (the "virtual packager reading
              | chunks", in your words, instead of the Packager) running
              | at the ISP appliance to present client-device-optimized
              | a/v streams. You'd have wasteful cpu cycles across
              | multiple appliances creating the same "virtual package".
             | 
             | So that leaves option (b) ... which means you need a batch
             | process of some type (what Netflix calls Packager) running
             | against AWS S3 storage to create a package for subsequent
             | distribution to all appliances.
             | 
              | Therefore, if your proposal of _"virtually packaging and
              | presenting an interface of a packaged file"_ can be
              | reworded as _"on-the-fly generation of ephemeral a/v
              | frames from the chunks"_, their physical topology and need
              | for efficient use of cpu wouldn't make that an optimal
              | architecture.
        
               | dkarp wrote:
               | Indeed, that's what I meant by:
               | 
               | > Is it so that they can better distribute the packaged
               | encodes to their CDN?
               | 
               | The Packager, as far as I can tell, is not actually a cpu
               | intensive process. That article mentions that the
               | bottleneck was caused by IO which was resolved with the
               | Virtual Assembler. It's not doing any encoding at all.
               | It's stitching the encoded video together and packaging
               | it with the audio/subtitles/metadata.
               | 
               | It seems like this could be done at the edge without ever
                | actually packaging the whole file. My guess is the same
                | as yours, though: the reason they're actually packaging
                | the encodes is to send them to their CDN. But it's still
                | a guess; maybe a Virtual Package just doesn't work for
                | some other reason.
               | 
               | It could just as well be to allow testing the packaged
               | encode like you would any other deployment artifact
               | before distributing.
        
               | jasode wrote:
               | _> The Packager, as far as I can tell, is not actually a
               | cpu intensive process. [...] It's not doing any encoding
                | at all._
               | 
                | A Netflix employee can chime in but I assume it's cpu
                | intensive, since the desired objectives of the Packager
                | can only be achieved by _transcoding_, which implies a
                | cpu intensive process. E.g. Netflix wants to use AV1 as
                | one of the output formats for the Packager. So any
                | process that converts a codec-specific format to other
                | codec-agnostic formats will require cpu (e.g. Apple
                | ProRes on the input files and AV1 on the output files).
                | More evidence that significant cpu is involved is this:
               | 
               |  _> The overall ProRes video processing speed is
               | increased from 50GB/Hour to 300GB/Hour. From a different
               | perspective, the processing time to movie runtime ratio
               | is reduced from 6:1 to about 1:1. [...] All the cloud
               | packager instances now share a single scheduling queue
                | with optimized compute resource utilization._
               | 
                | The 300 GB/hr would be ~83 MB/sec, which is way under the
                | disk throughput S3 can provide. AWS says 100 Gbit/sec is
                | possible, which would be ~12.5 GB/sec:
                | https://aws.amazon.com/premiumsupport/knowledge-
                | center/s3-ma...
               | 
               |  _> It seems like this could be done at the edge without
               | ever actually packaging the whole file._
               | 
                | But an on-the-fly ephemeral "virtual package" instead of
                | a realized output disk file would lead to constant
                | network traffic from the Netflix CDN appliances at the
                | various ISP datacenters back to AWS S3. I guess this is
                | technically possible, but I'm not sure what you gain
                | with this alternative architecture instead of the CDN
                | appliance just downloading one "package" file at
                | off-peak hours (4:00am).
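                | 
                | (A quick sanity check of those unit conversions, as a
                | trivial Python sketch; the inputs are just the numbers
                | quoted above, nothing Netflix-specific:)
                | 
                |     gb_per_hour = 300           # packager throughput
                |     mb_per_sec = gb_per_hour * 1000 / 3600
                |     print(round(mb_per_sec, 1))      # ~83.3 MB/sec
                | 
                |     gbit_per_sec = 100          # figure AWS cites for S3
                |     gbyte_per_sec = gbit_per_sec / 8
                |     print(gbyte_per_sec)             # 12.5 GB/sec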
        
               | dkarp wrote:
               | > A Netflix employee can chime in but I assume cpu
               | intensive since the desired objectives of Packager can
               | only be achieved by transcoding
               | 
               | I think you're maybe confusing the Packager with the
               | Encoder. Transcoding (encoding) happens before packaging
               | and is distributed. That's what the "encoded chunks" I've
                | been talking about are. There is a different package for
                | each encode; this is covered in the third article linked
                | from the OP:
                | https://netflixtechblog.com/high-quality-video-
                | encoding-at-s...
               | 
               | The package part seems similar to what is done by
               | MKVMerge (https://en.wikipedia.org/wiki/MKVToolNix) + the
               | stitching of the chunked video encodes. There's very
               | little processing necessary compared to encoding.
                | MKVMerge will give you an mkv file from an H.264 video,
                | audio files and subtitles in milliseconds, and it doesn't
                | matter how big the files are, as it's just a container.
               | They use a different container, but it's the same idea.
               | 
               | You wouldn't have to read from S3, you could just push
               | your encoded chunks to your CDN instead and use the edge
               | server to virtual package them.
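                | 
                | (To make the "just a container" point concrete, here is
                | a toy remux in Python driving ffmpeg with stream copy.
                | The filenames are hypothetical and this isn't Netflix's
                | actual Packager, just the general shape of a mux:)
                | 
                |     import subprocess
                | 
                |     # "-c copy" copies the streams as-is into the new
                |     # container; no transcoding happens at all
                |     subprocess.run([
                |         "ffmpeg",
                |         "-i", "video.h264",   # pre-encoded video
                |         "-i", "audio.aac",    # pre-encoded audio
                |         "-i", "subs.srt",     # subtitles
                |         "-map", "0", "-map", "1", "-map", "2",
                |         "-c", "copy",
                |         "out.mkv",
                |     ], check=True)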
        
               | jasode wrote:
               | _> I think you're maybe confusing the Packager with the
               | Encoder. Transcoding (encoding) happens before packaging_
               | 
               | The way I used "transcoding" was to refer to Netflix's
               | Packager process of converting (in their words) _" codec-
               | specific" elementary stream_ format to _" codec-agnostic"
               | with extra frame metadata_. I should have used a
               | different word than "transcode" to encompass that
               | (especially if the input and output files are the same
               | "codec" but just different containers) ... but whatever
               | the underlying process is, it implies (some) cpu
               | constraints because they're only processing at 83 MB/sec
               | throughput from SSD disks on S3. My laptop doing a simple
               | "mux" type of operation with ffmpeg or MKVMerge can
               | concatenate streams into another container greater than
               | 400 MB/sec.
               | 
               |  _> You wouldn't have to read from S3, you could just
               | push your encoded chunks to your CDN instead and use the
               | edge server to virtual package them._
               | 
                | The Netflix blog says the Packager is
                | _scanning/analyzing_ the input file for exact frame
                | start and stop times and storing that knowledge as extra
                | metadata to enable future clients to randomly skip
                | around the video. Just focusing on that one algorithm
                | tells us it's not something we want to do repeatedly
                | (and virtually) on all the edge CDN servers.
               | 
                | (I'm reminded of an analogous situation with the mp3 VBR
                | format, which doesn't have exact frame timestamps built
                | in for random seeking. Therefore, skipping to exactly
                | 45m17s of a 60 minute mp3 takes a long time, as the
                | audio player "scans" the mp3 from the beginning to
                | "count up" to 45m17s. One can build an "index" for fast
                | mp3 random seeking, but that requires pre-processing the
                | whole mp3, which is more cpu intensive than a simple mux
                | operation.)
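                | 
                | (Roughly the kind of index I mean, as a toy Python
                | sketch. iter_frames() is a hypothetical helper yielding
                | (byte_offset, duration_sec) pairs for each frame; real
                | containers store something equivalent in their headers:)
                | 
                |     def build_seek_index(frames, step_sec=1.0):
                |         # keep one (timestamp, offset) entry roughly
                |         # every step_sec seconds of media time
                |         index, t, next_mark = [], 0.0, 0.0
                |         for offset, duration in frames:
                |             if t >= next_mark:
                |                 index.append((t, offset))
                |                 next_mark += step_sec
                |             t += duration
                |         return index
                | 
                |     def seek(index, target_sec):
                |         # last entry at or before the target, so the
                |         # player starts reading near it instead of
                |         # scanning from byte 0
                |         best = index[0]
                |         for ts, offset in index:
                |             if ts > target_sec:
                |                 break
                |             best = (ts, offset)
                |         return best
                | 
                |     # usage:
                |     #   idx = build_seek_index(iter_frames("song.mp3"))
                |     #   ts, offset = seek(idx, 45 * 60 + 17)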
        
               | liuliu wrote:
                | Maybe they simply haven't got there yet. Packaging and
                | delivering to the CDN would be technically simpler
                | compared to maintaining the virtual packager + chunks at
                | the CDN level (not to mention that a virtual packager,
                | if it needs to run at the CDN level, requires a smarter
                | CDN such as Cloudflare Workers).
        
               | fragmede wrote:
               | The "edge" is a more recent invention, or rather, the
               | level of abstraction now available to the public for "the
               | edge" lends itself much more readily to such things when
               | designed from scratch. If Netflix were to
               | Greenfield/rebuild it from scratch today, from what I'm
               | reading off Netflix's blog, your proposal seems
               | reasonable. It depends on what their internal abstraction
               | on what their edge actually looks like in practice but
               | I'm guessing it's simultaneously more and less advanced
               | compared to eg Cloudflare's edge workers but institution
               | inertia means "if it ain't broke" is a guiding principal
               | all its own. If you wanted to get a job at Netflix,
               | propose that change, and implement it, I'd bet they'll
               | reward you handsomely for it.
        
       | wongarsu wrote:
        | So basically Netflix made their own network file system with S3
        | as the backing storage? This feels a bit like reinventing the
       | wheel, but I can understand how you get there if you start with
       | S3 and then need more and more functionality of a normal network-
       | mounted file system.
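        | 
        | (The general idea of "file reads over S3", as a rough boto3
        | sketch. This is not how MezzFS is actually implemented; it just
        | shows ranged GETs standing in for file reads, and the bucket and
        | key names are hypothetical:)
        | 
        |     import boto3
        | 
        |     s3 = boto3.client("s3")
        | 
        |     def read_range(bucket, key, offset, length):
        |         # an HTTP Range request fetches only the bytes that a
        |         # "file read" at this offset would need
        |         end = offset + length - 1
        |         resp = s3.get_object(Bucket=bucket, Key=key,
        |                              Range=f"bytes={offset}-{end}")
        |         return resp["Body"].read()
        | 
        |     # e.g. read_range("mezz-bucket", "title/video.ivf", 0, 65536)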
        
         | mkr-hn wrote:
         | It reminds me of how video game companies sometimes create
         | whole virtual file systems in gigabyte+ blobs to deal with the
         | limits of OS file systems. They're probably less useful in 2021
         | with SSDs becoming the norm. With hard drives, the thousands of
         | files a game uses would end up in little fragments all over the
         | platters and lead to horrible latency and head thrashing.
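          | 
          | (A toy version of that idea in Python: one big blob plus an
          | index of name -> (offset, size), so each asset read is a
          | single seek. The entries here are made up:)
          | 
          |     # hypothetical index written when the blob was packed
          |     INDEX = {
          |         "textures/rock.png": (0, 4096),
          |         "audio/theme.ogg":   (4096, 131072),
          |     }
          | 
          |     def read_asset(blob_path, name):
          |         offset, size = INDEX[name]
          |         with open(blob_path, "rb") as f:
          |             f.seek(offset)    # one seek, no directory walk
          |             return f.read(size)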
        
           | WalterBright wrote:
           | > With hard drives, the thousands of files a game uses would
           | end up in little fragments all over the platters and lead to
           | horrible latency and head thrashing.
           | 
           | I remember the PC game "Riven" which came on several CDs. The
           | game would often pause and ask you to insert another CD to
           | continue.
        
             | [deleted]
        
           | dcm360 wrote:
           | At least hard drives still have decent seek times. Load time
           | optimization for consoles with optical drives was another
           | league: a single seek across the disk could lead to waiting a
           | second longer on the loading screen.
        
             | dragontamer wrote:
             | Crash Bandicoot solved this problem by writing the level
             | data in a streaming fashion, so that the CD-ROM would never
             | have to seek while playing through a level.
             | 
              | Of course, the CD-ROM only spins in one direction.
             | Crash Bandicoot programmers solved this issue by making it
             | impossible to run backwards in the game.
             | 
             | The game was basically one dimensional on purpose: you can
             | only run forward, and never backwards, to optimize the CD-
             | ROM bandwidth. Still, the game was groundbreaking, the
              | gameplay and characters fresh. It's an incredible
              | technical achievement, to the point where most people
              | didn't even realize that running backwards was a technical
              | hurdle the developers chose not to solve.
        
               | beefjerkins wrote:
               | > Of course, the CD-ROM only spins in one direction
               | however. Crash Bandicoot programmers solved this issue by
               | making it impossible to run backwards in the game.
               | 
                | I've been playing Crash since I was a kid, and I never
                | thought about _why_ you couldn't run backwards. What a
                | brilliant solution.
        
         | amitport wrote:
         | From the article:
         | 
         | "There are existing distributed file systems for the cloud as
         | well as off-the-shelf FUSE modules for S3. We chose to enhance
         | MezzFS instead of using these other solutions because the cloud
         | storage system where packager stores its output is a custom
         | object store service built on top of S3 with additional
         | security features. Doing so has the added advantage of being
         | able to design and tune the enhancement to suit the
         | requirements of packager and our other encoding applications."
        
           | zekrioca wrote:
           | It says a lot without saying anything meaningful.
        
             | ignoramous wrote:
             | Netflix doesn't use S3 as-is but with a wrapper on top
             | which means they couldn't use off-the-shelf solutions.
             | 
             | aka _...the cloud storage system where packager stores its
             | output is a custom object store service built on top of S3
             | with additional security features._
        
       | xibalba wrote:
       | > As Netflix becomes a producer of award winning content, the
       | studio and content creation needs are also pushing the envelope
       | of technology advancements
       | 
       | This is a truly bizarre motivation. Should not the needs of the
       | customer be the motivator for advancing technology? Also, Netflix
       | has been winning awards for original content since 2013:
       | 
       | https://en.wikipedia.org/wiki/List_of_accolades_received_by_...
        
         | kwertyoowiyop wrote:
         | The rest of that paragraph does provide an example to justify
         | that statement. Beyond that, they probably want to generate
         | some content at very high resolutions, frame rates, and color
         | ranges.
        
       | londons_explore wrote:
        | Packaging a movie is something that only needs to be done once,
        | right? Like filming the shots, hiring the actors, and writing the
       | script?
       | 
       | So why does a few hours matter in a total process which is many
       | months or even years?
        
         | [deleted]
        
         | maverwa wrote:
          | From what I understand, this is used in the production itself
          | as well, potentially for steps in between? Especially "The
          | Conclusion" reads like it's not a one-time thing but happens
          | throughout the whole process.
        
           | wongarsu wrote:
           | Yes, the intro mentions "cloud-based post-production editing
           | and collaboration pipelines", and the conclusion is "This
           | significantly improves the Studio high quality proxy
           | generation efficiency". While the article avoids mentioning
           | why they actually need this, it reads like they have some
           | collaboration platform for post-production of their own
           | productions, and that involves frequent exports into a
           | streamable format.
        
         | MontyCarloHall wrote:
         | I assume that not all encodings of every movie are stored in
         | perpetuity, to save space. Thus, sometimes they need to be
         | regenerated on-demand, which is why efficiency is important.
        
           | endisneigh wrote:
           | Why not though? Isn't storage cheap compared to compute?
           | 
           | Hell, compared to the cost to license the movie storage is
           | pennies.
        
             | MontyCarloHall wrote:
             | I'm sure someone at Netflix did the cost/benefit analysis
             | and found that for extremely infrequently accessed
             | encodings of certain movies (weird bitrate/resolution
             | combinations), it's actually cheaper to generate them
             | dynamically once in a blue moon than to store them in
             | perpetuity.
             | 
             | > Hell, compared to the cost to license the movie storage
             | is pennies.
             | 
             | Yup. I'd be very curious to see the breakdown in how much
             | Netflix spends on compute versus content
             | production/licensing. I'd be willing to bet compute is an
             | order or two of magnitude cheaper than content. I guess all
             | these little optimizations add up? Or maybe their
              | engineering management can't see the forest for the trees.
        
               | twistedpair wrote:
               | I suspect they spend much more on transit than compute.
        
         | amitport wrote:
          | Once or a few times. It still saves money, certainly at
          | Netflix's scale.
          | 
          | You invest some in engineering and you save a constant
          | percentage in $$ at Netflix's scale.
          | 
          | Many smaller distributors don't invest internally in this kind
          | of thing (and don't have the internal engineering to achieve
          | it). They will buy off-the-shelf solutions, though.
        
       ___________________________________________________________________
       (page generated 2021-09-29 23:02 UTC)