[HN Gopher] Netflix Cloud Packaging in the Terabyte Era
___________________________________________________________________
Netflix Cloud Packaging in the Terabyte Era
Author : mmcclure
Score : 105 points
Date : 2021-09-28 05:53 UTC (1 day ago)
(HTM) web link (netflixtechblog.com)
(TXT) w3m dump (netflixtechblog.com)
| MrBuddyCasino wrote:
| OT: starting with covid, Netflix reduced bitrate to relieve
| stress on overloaded networks. Did they ever reverse this? Video
| still looks overly compressed to me, so much so that I find it
| annoying.
| tinus_hn wrote:
| They forced SD resolution to appease technophobes who were
| crying that Netflix was wasting bandwidth on 'unnecessary
| entertainment' while they were having trouble with their
| 'necessary video meetings'. That was of course a question of
| bandwidth across the wider internet, whereas Netflix mostly
| uses local bandwidth: they try to stream from machines inside
| the ISP networks because it saves them a ton of money.
|
| After about a month this was reversed.
|
| In the meantime it's possible they have switched compression
| standards or settings which you may have noticed. But the covid
| limitations should be long gone.
|
| https://openconnect.netflix.com/en_gb/
| Widdershin wrote:
| As far as I can tell Covid-quality is still in full effect, to
| the point where I don't watch movies on Netflix at all anymore.
| I often start one, remember the quality and then find an
| alternate source.
|
| I wonder if they've seen that it doesn't impact churn as much
| as it saves them money, and they're keeping it that way as long
| as possible.
| MrBuddyCasino wrote:
| > I wonder if they've seen that it doesn't impact churn as
| much as it saves them money, and they're keeping it that way
| as long as possible.
|
| Yes, that's my suspicion too.
| Aissen wrote:
| It should have come back. If not, it's probably a bug, but good
| luck trying to report that to Netflix.
| dkarp wrote:
| It's not entirely clear to me what the job of the Packager is
| here as opposed to a Virtual Package.
|
| After chunk encoding and Virtual Assembly with an index file, are
| Netflix actually packaging the encoded
| video/audio/subtitles/metadata into a single file for each encode
| that is pushed to their CDN? If so, then why is the Packager even
| necessary? Why not go one step further and create a Virtual
| Package as well?
|
| Is it so that they can better distribute the packaged encodes to
| their CDN?
| jasode wrote:
| _> It's not entirely clear to me what the job of the Packager
| is here._
|
| The article's first paragraph links to a previous
| article with more details on the Packager:
| https://netflixtechblog.com/packaging-award-winning-shows-wi...
| dkarp wrote:
| Right, and I actually read that. Maybe I should rephrase my
| question. I'd like to know why a Packager is necessary
| instead of virtually packaging and presenting an interface of
| a packaged file?
|
| Then you'd be able to keep your encoded chunks as chunks
| instead of having to download them and then upload them again
| to MezzFS.
| jasode wrote:
| _> why a Packager is necessary instead of virtually
| packaging and presenting an interface of a packaged file?_
|
| Maybe I'm parsing your question wrong but perhaps the
| confusion is cleared up if we remember that the "CDN" in
| the Netflix architecture diagram _is not physically stored
| at AWS S3_ : https://miro.medium.com/max/350/1*A5PR2QJ7STPb
| Ud2xTg6z_g.png
|
| See: https://www.google.com/search?q=netflix+cdn+appliance+
| %22ope...
|
| And a recent 2021 article says the Netflix CDN appliance
| has ~280 terabytes of local disk storage:
| https://dev.to/gbengelebs/netflix-system-design-how-
| netflix-...
|
| Thus, the "output" video files of Packager is eventually
| physically transferred to geographically distributed ISP
| datacenters.
|
| So to attempt to reconstruct Netflix's thought process...
|
| Once we work backwards from the need to eventually store
| copies of video content on distant Netflix CDN appliances,
| the question becomes which type of video file to store:
|
| (a) the codec-specific format -- good for archival storage
| that can generate new downstream formats, but not
| optimized for client playback (fast random seeking,
| a/v sync across dynamically changing resolutions, etc.)
|
| -- or --
|
| (b) the codec- _agnostic_ format -- which is good for
| client device seeking, etc
|
| Option (a) wouldn't make sense since you'd still
| need a cpu-intensive process (the "virtual packager to read
| chunks" in your words instead of the Packager) running at
| the ISP appliance to present client-device-optimized a/v
| streams. You'd have wasteful cpu cycles across multiple
| appliances creating the same "virtual package".
|
| So that leaves option (b) ... which means you need a batch
| process of some type (what Netflix calls Packager) running
| against AWS S3 storage to create a package for subsequent
| distribution to all appliances.
|
| Therefore, if your proposal of _" virtually packaging and
| presenting an interface of a packaged file"_ can be
| reworded as _" on-the-fly generated ephemeral a/v frames
| from the chunks"_ , their physical topology and need for
| efficient use of cpu wouldn't make that an optimal
| architecture.
| dkarp wrote:
| Indeed, that's what I meant by:
|
| > Is it so that they can better distribute the packaged
| encodes to their CDN?
|
| The Packager, as far as I can tell, is not actually a cpu
| intensive process. That article mentions that the
| bottleneck was caused by IO which was resolved with the
| Virtual Assembler. It's not doing any encoding at all.
| It's stitching the encoded video together and packaging
| it with the audio/subtitles/metadata.
|
| It seems like this could be done at the edge without ever
| actually packaging the whole file. My guess is the same
| as yours though: the reason they're actually packaging
| the encodes is to send them to their CDN. But it's still
| a guess; maybe a Virtual Package simply doesn't work for
| some other reason.
|
| It could just as well be to allow testing the packaged
| encode like you would any other deployment artifact
| before distributing.
| jasode wrote:
| _> The Packager, as far as I can tell, is not actually a
| cpu intensive process. [...] It's not doing any encoding
| at all. _
|
| A Netflix employee can chime in, but I assume it's cpu
| intensive since the desired objectives of the Packager can
| only be achieved by _transcoding_, which implies a cpu
| intensive process. E.g. Netflix wants to use AV1 as one of
| the output formats for the Packager. So any process that
| converts a codec-specific format to other codec-agnostic
| formats will require cpu (e.g. Apple ProRes on the input
| files and AV1 on the output files). More evidence that
| significant cpu is involved is this:
|
| _> The overall ProRes video processing speed is
| increased from 50GB/Hour to 300GB/Hour. From a different
| perspective, the processing time to movie runtime ratio
| is reduced from 6:1 to about 1:1. [...] All the cloud
| packager instances now share a single scheduling queue
| with optimized compute resource utilization. _
|
| The 300 GB/hr would be ~83 MB/sec, which is way under the
| throughput S3 can provide. AWS says 100 Gbit/sec is
| possible, which would be ~12 GB/sec:
| https://aws.amazon.com/premiumsupport/knowledge-
| center/s3-ma...
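|
| A quick back-of-the-envelope in Python, using the blog's 300
| GB/hour figure and AWS's advertised per-instance ceiling:
|
|     # rough throughput comparison (figures from the blog / AWS docs)
|     packager_gb_per_hour = 300
|     packager_mb_per_sec = packager_gb_per_hour * 1000 / 3600
|     print(f"packager: ~{packager_mb_per_sec:.0f} MB/s")    # ~83 MB/s
|
|     s3_gbit_per_sec = 100              # advertised max to a single instance
|     s3_mb_per_sec = s3_gbit_per_sec * 1000 / 8
|     print(f"S3 ceiling: ~{s3_mb_per_sec:.0f} MB/s")        # ~12500 MB/s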
|
| _> It seems like this could be done at the edge without
| ever actually packaging the whole file._
|
| But an on-the-fly, ephemeral "virtual package" instead of a
| realized output file on disk would lead to constant network
| traffic from the Netflix CDN appliances at the various
| ISP datacenters back to AWS S3. I guess this is
| technically possible, but I'm not sure what you gain with
| this alternative architecture over the CDN appliances
| just downloading one "package" file at off-peak hours
| (4:00am).
| dkarp wrote:
| > A Netflix employee can chime in, but I assume it's cpu
| intensive since the desired objectives of the Packager can
| only be achieved by transcoding
|
| I think you're maybe confusing the Packager with the
| Encoder. Transcoding (encoding) happens before packaging
| and is distributed. That's what the "encoded chunks" I've
| been talking about are. There is a different package for
| each encode; this is covered in the third article linked from the
| OP: https://netflixtechblog.com/high-quality-video-
| encoding-at-s...
|
| The package part seems similar to what is done by
| MKVMerge (https://en.wikipedia.org/wiki/MKVToolNix) + the
| stitching of the chunked video encodes. There's very
| little processing necessary compared to encoding.
| MKVMerge will give you an mkv file from an H.264 video,
| audio files and subtitles in milliseconds, and it doesn't
| matter how big the files are as it's just a container.
| They use a different container, but it's the same idea.
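|
| To make the "it's just a container" point concrete, here's a rough
| sketch (hypothetical file names) of the equivalent stream-copy
| remux driven from Python with ffmpeg -- the codec bitstreams are
| copied into the new container without re-encoding:
|
|     import subprocess
|
|     # mux pre-encoded video, audio and subtitles into one container;
|     # "-c copy" copies the video/audio bitstreams unchanged
|     subprocess.run([
|         "ffmpeg",
|         "-i", "video_h264.mp4",   # hypothetical stitched video encode
|         "-i", "audio_aac.m4a",    # hypothetical audio track
|         "-i", "subs_en.srt",      # hypothetical subtitle track
|         "-map", "0:v", "-map", "1:a", "-map", "2:s",
|         "-c", "copy",             # no transcode of video/audio
|         "-c:s", "mov_text",       # text subs converted to MP4-native form (cheap)
|         "package.mp4",
|     ], check=True)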
|
| You wouldn't have to read from S3; you could just push
| your encoded chunks to your CDN instead and use the edge
| server to virtually package them.
| jasode wrote:
| _> I think you're maybe confusing the Packager with the
| Encoder. Transcoding (encoding) happens before packaging_
|
| The way I used "transcoding" was to refer to Netflix's
| Packager process of converting (in their words) _" codec-
| specific" elementary stream_ format to _" codec-agnostic"
| with extra frame metadata_. I should have used a
| different word than "transcode" to encompass that
| (especially if the input and output files are the same
| "codec" but just different containers) ... but whatever
| the underlying process is, it implies (some) cpu
| constraints because they're only processing at 83 MB/sec
| throughput from SSD disks on S3. My laptop doing a simple
| "mux" type of operation with ffmpeg or MKVMerge can
| concatenate streams into another container at greater than
| 400 MB/sec.
|
| _> You wouldn't have to read from S3; you could just
| push your encoded chunks to your CDN instead and use the
| edge server to virtually package them._
|
| The Netflix blog says the Packager is _scanning
| /analyzing_ the input file for exact frame start and stop
| times and storing that knowledge as extra metadata to
| enable future clients to randomly skip around the video.
| Just focusing on that one algorithm tells us it's not
| something we want to do repeatedly (and virtually) on all
| the edge CDN servers.
|
| (I'm reminded of an analogous situation with the mp3 VBR
| format, which doesn't have exact frame timestamps built in
| for random seeking. Therefore, skipping to exactly 45m17s
| of a 60 minute mp3 takes a long time as the audio player
| "scans" the mp3 from the beginning to "count up" to
| 45m17s. One can build an "index" for fast mp3 random seek
| but that requires pre-processing the whole mp3 that's
| more cpu intensive than a simple mux operation.)
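|
| As a toy illustration of the kind of one-pass seek index the
| Packager presumably builds, here's a Python sketch over made-up
| frame data (not Netflix's actual metadata format):
|
|     # build a seek index: keyframe timestamp -> byte offset, so a
|     # client can jump straight to 45m17s with one ranged read
|     frames = [
|         # (pts_seconds, byte_offset, is_keyframe) -- hypothetical parsed frames
|         (0.00, 0,         True),
|         (0.04, 18_304,    False),
|         (2.00, 912_776,   True),
|         (4.00, 1_790_112, True),
|     ]
|
|     seek_index = [(pts, off) for pts, off, key in frames if key]
|
|     def byte_offset_for(t):
|         """Offset of the last keyframe at or before time t (seconds)."""
|         candidates = [off for pts, off in seek_index if pts <= t]
|         return candidates[-1] if candidates else 0
|
|     print(byte_offset_for(3.0))   # -> 912776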
| liuliu wrote:
| Maybe they simply haven't got there yet. Packaging and
| delivering to the CDN would be technically simpler compared to
| maintaining the virtual packager + chunks at the CDN level
| (not to mention that a virtual packager, if it needs to run at
| the CDN level, requires a smarter CDN such as Cloudflare
| Workers).
| fragmede wrote:
| The "edge" is a more recent invention, or rather, the
| level of abstraction now available to the public for "the
| edge" lends itself much more readily to such things when
| designed from scratch. If Netflix were to greenfield/rebuild
| it from scratch today, then from what I'm reading off
| Netflix's blog, your proposal seems reasonable. It depends
| on what their internal abstraction of the edge actually
| looks like in practice, but I'm guessing it's simultaneously
| more and less advanced compared to e.g. Cloudflare's edge
| workers, and institutional inertia means "if it ain't broke"
| is a guiding principle all its own. If you got a job at
| Netflix, proposed that change, and implemented it, I'd bet
| they'd reward you handsomely for it.
| wongarsu wrote:
| So basically Netflix made their own network file system with S3
| as backing storage? This feels a bit like reinventing the
| wheel, but I can understand how you get there if you start with
| S3 and then need more and more functionality of a normal network-
| mounted file system.
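|
| The core primitive a file-system layer like that leans on is just
| S3's ranged GET. A minimal sketch with boto3 (bucket/key names are
| placeholders, and this is not Netflix's MezzFS API):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|
|     def read_range(bucket, key, offset, length):
|         """Read `length` bytes at `offset`, like pread() on a local file."""
|         resp = s3.get_object(
|             Bucket=bucket,
|             Key=key,
|             Range=f"bytes={offset}-{offset + length - 1}",  # HTTP Range semantics
|         )
|         return resp["Body"].read()
|
|     # e.g. read 1 MiB starting 5 GiB into a mezzanine file (placeholder names)
|     chunk = read_range("example-bucket", "title/mezz.mov", 5 * 2**30, 2**20)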
| mkr-hn wrote:
| It reminds me of how video game companies sometimes create
| whole virtual file systems in gigabyte+ blobs to deal with the
| limits of OS file systems. They're probably less useful in 2021
| with SSDs becoming the norm. With hard drives, the thousands of
| files a game uses would end up in little fragments all over the
| platters and lead to horrible latency and head thrashing.
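|
| The basic trick is to concatenate the assets into one big file and
| keep a table of name -> (offset, size). A toy Python sketch, not
| any real engine's pack format:
|
|     import json, os
|
|     def pack(paths, blob_path):
|         """Concatenate files into one blob, with a JSON index appended."""
|         index = {}
|         with open(blob_path, "wb") as blob:
|             for p in paths:
|                 with open(p, "rb") as f:
|                     data = f.read()
|                 index[os.path.basename(p)] = [blob.tell(), len(data)]  # offset, size
|                 blob.write(data)
|             idx = json.dumps(index).encode()
|             blob.write(idx)                             # index goes at the tail...
|             blob.write(len(idx).to_bytes(8, "little"))  # ...followed by its size
|
| One sequential read of the blob then replaces thousands of small
| reads scattered across the platter.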
| WalterBright wrote:
| > With hard drives, the thousands of files a game uses would
| end up in little fragments all over the platters and lead to
| horrible latency and head thrashing.
|
| I remember the PC game "Riven" which came on several CDs. The
| game would often pause and ask you to insert another CD to
| continue.
| [deleted]
| dcm360 wrote:
| At least hard drives still have decent seek times. Load time
| optimization for consoles with optical drives was in another
| league: a single seek across the disk could lead to waiting a
| second longer on the loading screen.
| dragontamer wrote:
| Crash Bandicoot solved this problem by writing the level
| data in a streaming fashion, so that the CD-ROM would never
| have to seek while playing through a level.
|
| Of course, the CD-ROM only spins in one direction however.
| Crash Bandicoot programmers solved this issue by making it
| impossible to run backwards in the game.
|
| The game was basically one dimensional on purpose: you can
| only run forward, and never backwards, to optimize the CD-
| ROM bandwidth. Still, the game was groundbreaking, the
| gameplay and characters fresh. It's an incredible technical
| achievement, to the point where most people didn't even
| realize that running backwards was a technical hurdle the
| developers chose not to solve.
| beefjerkins wrote:
| > Of course, the CD-ROM only spins in one direction
| however. Crash Bandicoot programmers solved this issue by
| making it impossible to run backwards in the game.
|
| I've been playing Crash since I was a kid, and I never
| thought about _why_ you couldn't run backwards. What a
| brilliant solution.
| amitport wrote:
| From the article:
|
| "There are existing distributed file systems for the cloud as
| well as off-the-shelf FUSE modules for S3. We chose to enhance
| MezzFS instead of using these other solutions because the cloud
| storage system where packager stores its output is a custom
| object store service built on top of S3 with additional
| security features. Doing so has the added advantage of being
| able to design and tune the enhancement to suit the
| requirements of packager and our other encoding applications."
| zekrioca wrote:
| It says a lot without saying anything meaningful.
| ignoramous wrote:
| Netflix doesn't use S3 as-is but with a wrapper on top
| which means they couldn't use off-the-shelf solutions.
|
| aka _...the cloud storage system where packager stores its
| output is a custom object store service built on top of S3
| with additional security features._
| xibalba wrote:
| > As Netflix becomes a producer of award winning content, the
| studio and content creation needs are also pushing the envelope
| of technology advancements
|
| This is a truly bizarre motivation. Should not the needs of the
| customer be the motivator for advancing technology? Also, Netflix
| has been winning awards for original content since 2013:
|
| https://en.wikipedia.org/wiki/List_of_accolades_received_by_...
| kwertyoowiyop wrote:
| The rest of that paragraph does provide an example to justify
| that statement. Beyond that, they probably want to generate
| some content at very high resolutions, frame rates, and color
| ranges.
| londons_explore wrote:
| Packaging a movie is something that only needs to be done once,
| right? Like filming the shots, hiring the actors, and writing the
| script?
|
| So why do a few hours matter in a total process that takes many
| months or even years?
| [deleted]
| maverwa wrote:
| From what I understand, this is used in the production itself
| as well, potentially for steps in between? Especially "The
| Conclusion" reads like it's not a one-time thing but happens
| throughout the whole process?
| wongarsu wrote:
| Yes, the intro mentions "cloud-based post-production editing
| and collaboration pipelines", and the conclusion is "This
| significantly improves the Studio high quality proxy
| generation efficiency". While the article avoids mentioning
| why they actually need this, it reads like they have some
| collaboration platform for post-production of their own
| productions, and that involves frequent exports into a
| streamable format.
| MontyCarloHall wrote:
| I assume that not all encodings of every movie are stored in
| perpetuity, to save space. Thus, sometimes they need to be
| regenerated on-demand, which is why efficiency is important.
| endisneigh wrote:
| Why not though? Isn't storage cheap compared to compute?
|
| Hell, compared to the cost to license the movie, storage is
| pennies.
| MontyCarloHall wrote:
| I'm sure someone at Netflix did the cost/benefit analysis
| and found that for extremely infrequently accessed
| encodings of certain movies (weird bitrate/resolution
| combinations), it's actually cheaper to generate them
| dynamically once in a blue moon than to store them in
| perpetuity.
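|
| A toy break-even sketch in Python (all prices and sizes are
| made-up placeholders, not Netflix's actual costs):
|
|     # store-forever vs regenerate-on-demand for one rarely used encode
|     size_gb = 50                    # hypothetical size of the encode
|     storage_per_gb_month = 0.02     # assumed object-storage price, $/GB-month
|     regen_cost = 1.50               # assumed one-off packaging/compute cost, $
|     accesses_per_month = 0.01       # roughly once every eight years
|
|     store = size_gb * storage_per_gb_month    # $1.00/month to keep it around
|     regen = regen_cost * accesses_per_month   # $0.015/month to rebuild on demand
|     print(store, regen)                       # regenerating wins for this encode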
|
| > Hell, compared to the cost to license the movie, storage
| is pennies.
|
| Yup. I'd be very curious to see the breakdown in how much
| Netflix spends on compute versus content
| production/licensing. I'd be willing to bet compute is an
| order of magnitude or two cheaper than content. I guess all
| these little optimizations add up? Or maybe their
| engineering management can't see the forest for the trees.
| twistedpair wrote:
| I suspect they spend much more on transit than compute.
| amitport wrote:
| Once or a few times. It still saves money. Certainly at
| Netflix scale.
|
| You invest some in engineering and you save a constant
| percentage of $$ at Netflix scale.
|
| Many smaller distributors don't invest internally in this kind
| of thing (and don't have the internal engineering to achieve
| it). They will buy off-the-shelf solutions though.
___________________________________________________________________
(page generated 2021-09-29 23:02 UTC)