[HN Gopher] FreeBSD optimizations used by Netflix to serve video...
___________________________________________________________________
FreeBSD optimizations used by Netflix to serve video at 800Gb/s
[pdf]
Author : _trackno5
Score : 301 points
Date : 2022-11-03 10:58 UTC (12 hours ago)
(HTM) web link (people.freebsd.org)
(TXT) w3m dump (people.freebsd.org)
| krylon wrote:
| And here I sit like a chump with my home server connected to a
| 100MBit switch. (I paid for that switch, and I'm not replacing it
| until it gives up the ghost.) (And before you ask, the server
| also runs FreeBSD, and I'm very happy with the result.)
| nix23 wrote:
| Bring it to the max with multipath ;) Since you already have
| FreeBSD, no need to throw those beautiful, reliable things
| away; maybe just buy a second...third? Dirt-cheap 100MBit card:
|
| https://en.wikipedia.org/wiki/Multipath_TCP#Implementation
| krylon wrote:
| The server has a second NIC, but the switch has no more free
| ports. I briefly thought of bonding, but stopped when I read
| that the switch would need to support it (which it almost
| certainly does not).
|
| But my point was that for my requirements, 100MBit is
| actually sufficient and FreeBSD is still a good choice for
| me; I was just being snarky about it. (I do find it
| aesthetically displeasing, though, that my wifi is now faster
| than my wired network, but I can live with that.)
| toast0 wrote:
| I understand the motivation, but $20 gets you an 8-port gigE
| switch, so it seems like the wrong hill to die on. :)
| krylon wrote:
| I know, but so far 100MBit is sufficient; I rarely move
| gigabytes of data around. When it becomes annoying, I'll
| get a new switch, but so far the pressure is really low.
| nix23 wrote:
| I really think Netflix could make some good money being a
| multimedia CDN (even for "competitors").
| jedberg wrote:
| I thought the same thing 10 years ago when I worked there. At
| the time, management was not interested in losing focus by
| doing anything other than streaming movies to customers.
|
| But it should be noted that the FreeBSD Open Connect boxes are
| highly optimized for Netflix's use case, which is serving a
| predefined set of content that has been pre-rendered. YouTube
| and its ilk are a completely different use case.
|
| The Netflix cache is so optimized for serving Netflix movies
| that for many years we still used Akamai for all of our other
| CDN needs, but it looks like they may have finally moved that
| to Netflix's own CDN now.
| virtuallynathan wrote:
| It works with such high efficiency because we know how to place
| content in advance, and the catalog is relatively small. Trying
| to serve 800Gbps of YouTube content would be a nightmare.
| ilyt wrote:
| I wonder how much of YT traffic is the "big" channels (say
| >200k viewers in a month) vs the small guys.
|
| But yeah, once your hot data size exceeds the cache, bye-bye
| efficiency.
| adgjlsfhk1 wrote:
| One hard part is that on the YouTube side, most views occur
| within the first 48 hours or so, and a good fraction occur
| within the first 6. Netflix has a catalogue of ~5000 videos
| and gets <200 new ones per month. YouTube has around 30k
| channels with more than 500k subscribers, so that's somewhere
| around 30k videos per week.
| nix23 wrote:
| I'm not talking directly about YouTube, but also about serving
| Disney, Hulu, and ESPECIALLY national/continental portals like
| Arte.tv, Play SRF, ARD-Mediathek, etc.
| drewg123 wrote:
| Indeed. YT has a much different problem, which is to
| determine which video is going to go viral, and then
| transcode it into popular formats when it does.
|
| In comparison, we pre-transcode everything to exacting
| standards, so all our CDN has to do is serve static files.
| coldpie wrote:
| Wow, that is actually a really interesting idea in the context
| of developing a YouTube competitor. Delivery & bandwidth are a
| really high barrier to entry, and piggy-backing off of
| Netflix's existing network could really lower those costs. I
| agree the "providing services to your direct competition" is
| probably a stumbling block, and likely Netflix has other irons
| in the fire. But anyway it's a cool idea to think about.
| ilyt wrote:
| It's nice to see someone actually still does proper engineering
| instead of farting something about cloud and webscale and just
| throwing money at a problem.
| kleiba wrote:
| Yet, in order to _watch_ Netflix on FreeBSD, you have to jump
| through such hoops as "downloading either google chrome,
| vivaldi, or brave, and [using] a small shell script which
| basically creates a small jail for some ubuntu binaries that
| actually install widevine which is essential for viewing some DRM
| content such as Netflix" [1]
|
| [1] https://www.youtube.com/watch?v=mBYor4wL62Q
| __MatrixMan__ wrote:
| BitTorrent is probably easier. I just wish there was a good way
| to send money to the artists without also funding DRM
| enhancements.
| andsoitis wrote:
| So you want to send money to all the people who worked on the
| TV Show or the Movie you just downloaded?
|
| I don't think you realize how impractical that is. Take a
| look at the credits at the end of a movie some time. Or look
| up the list of people who worked on a particular episode of a
| show (yes, it can vary throughout a season).
| andrewxdiamond wrote:
| Certainly impractical for big budget shows, but Patreon has
| proved the model works
| __MatrixMan__ wrote:
| It wouldn't be impractical if the studio planned ahead for
| it.
|
| There could be the address of a smart contract at the end
| of the credits. Every time more than, say, $1000 piles up
| at that address, whatever is there gets dispensed to the
| contributors at the end of that month.
|
| Plex could aggregate those addresses and tell you how to
| allocate your payment based on how you allocated your
| attention. Yes I know that's what Netflix does, but I
| control my Plex server. Nobody is then going to find
| additional ways to monetize that data.
|
| I know it's unconventional, but I really don't think it's
| crazy to want to reward the creators of content that you
| consume while simultaneously not wanting to contribute
| towards the development of ecosystems that prevent people
| from being in control of their tech.
| reaperducer wrote:
| _It wouldn 't be impractical if the studio planned ahead
| for it._
|
| Studios already plan for this.
|
| For a short time in the 80's, one of my mother's job
| responsibilities was making sure every single person
| involved in the production of a movie from the 1940s got
| their revenue check each quarter, whether it was for
| $50.00 or 12¢. Hundreds of people. Hundreds of checks.
| __MatrixMan__ wrote:
| Ok, so I've torrented a movie and I want to send the
| equivalent of your mom a check so that next quarter it's
| $0.13 instead of $0.12, where do I look in the credits to
| get her address?
|
| Perhaps in the 80's it would've been impractical to pay
| her to multiplex hundreds of $1 input checks into the
| appropriate set of $50 or $0.12 output checks, but that's
| now a job that's easily done by a computer.
| [deleted]
| [deleted]
| IntelMiner wrote:
| Devil's advocate: the people who work on the server engineering
| at Netflix don't exactly have much control over copyright
| holders being lawyer-brained man-children.
| jbirer wrote:
| That is the problem with the BSD license: it says "use my work
| and don't give anything back". Of course, the GPL gets violated
| too, but getting away with that would be very difficult for an
| American company like Netflix.
| pjmlp wrote:
| UNIX's strength was never in the desktop experience, but rather
| in the server room.
| akreal wrote:
| Not true for macOS.
| pjmlp wrote:
| You mean NeXTSTEP; all that makes it unique isn't part of
| POSIX, and Steve Jobs had a quite clear position on the value
| of UNIX for desktop computing.
| SpaceInvader wrote:
| FreeBSD is not a "desktop first" system and has strengths
| elsewhere. I have used it constantly for 20+ years. Sadly, my
| experiments with FreeBSD on the desktop ended years ago, as
| there was always something "not working".
| asveikau wrote:
| Typing this on a FreeBSD laptop.
|
| Haven't tried using netflix on it though.
| Ar-Curunir wrote:
| I don't think it needs to be said that while FreeBSD can
| serve as a daily driver for some people, it is insufficient
| for the vast majority of computer users in the world.
| alberth wrote:
| > which is essential for viewing some DRM content such as
| Netflix
|
| Are you complaining that Netflix doesn't want people to pirate
| content, content they might have licensed from 3rd parties
| which contractually bind them to prevent it from being pirated?
|
| Plus, is the development/resource cost of serving so few
| people on FreeBSD even worth it?
|
| Note: I'm a huge FreeBSD fan. But I consider this totally
| understandable on Netflix's part.
| somehnguy wrote:
| But it doesn't prevent it from being pirated at all. You can
| get any Netflix release you want within minutes of release on
| any torrent site. Sometimes _before_ the official release
| even.
|
| It just makes normal people jump through hoops to watch the
| things they are trying to pay for. That's a DRM issue in
| general though, I acknowledge this isn't just a Netflix
| thing.
| seanw444 wrote:
| And I will stick to getting it that way for as long as DRM
| exists on the given platform. I'll still pay for the
| subscription, but I'm handling the data my way.
| KronisLV wrote:
| > I'll still pay for the subscription, but I'm handling
| the data my way.
|
| Huh, that's an interesting take. I feel like something
| similar might end up being what you need to do with
| certain video games as well.
|
| For example, I bought Grand Theft Auto IV as a boxed copy
| back when it came out (though most of my games are
| digital now). The problem is that the game expects Games
| For Windows Live to be present, which is now deprecated
| and some folks out there can't even launch the game
| anymore. It's pretty obvious what one of the solutions
| here is.
| webmobdev wrote:
| Me too. Especially because these same DRM technologies will
| soon be used to uniquely identify and profile you when these
| streamers also become ad platforms.
| judge2020 wrote:
| What does DRM have to do with this? They'll connect what
| you watch on Peacock with what you watch on Netflix on
| your computer? Do you have a reference?
| Thaxll wrote:
| DRM makes zero sense, since you can get any content using
| torrents in 2 minutes. It's not protecting anything; as a
| matter of fact, it's just making people download more, since
| it's a painful experience.
|
| For example, on Windows with Chrome you only get 720p playback
| for Netflix. Complete nonsense.
| mschuster91 wrote:
| Yeah, but tell that to braindead content license owners.
| googlryas wrote:
| Sure, but if there were no DRM, there would probably just be
| a Chrome extension you could install to rip/share content
| more readily than via BitTorrent.
|
| I don't like it, but there is some logic to it. For
| business types, it isn't merely the existence of ripped
| copies, but the ease of creating and spreading them.
| _trackno5 wrote:
| Recording of the presentation can be found here:
| https://www.youtube.com/watch?v=36qZYL5RlgY
|
| Pretty cool stuff
| [deleted]
| eatonphil wrote:
| From what I can see in a quick search (and from this
| presentation), Netflix only uses FreeBSD for serving video and
| they run these servers themselves in their own datacenters I
| guess. In contrast their apps on EC2 use Linux [0]. Sounds like
| the time has not yet come when AWS is paying anyone full time to
| support FreeBSD on EC2.
|
| [0] https://twitter.com/brendangregg/status/1412201241472471048
| erk__ wrote:
| cperciva, whom you link, has worked quite a bit on EC2 support
| for FreeBSD, a lot of it documented on his blog [0] and
| supported by patrons via Patreon [1].
|
| But yeah, it would be nice if there were someone who could work
| on it full time.
|
| [0]: https://www.daemonology.net/blog/2022-03-29-FreeBSD-
| EC2-repo...
|
| [1]: https://www.patreon.com/cperciva
| eatonphil wrote:
| Yep! In the thread he describes how he alone is not enough.
| vbezhenar wrote:
| What does it mean to support FreeBSD on EC2? Surely it's just a
| KVM so you can run whatever you want?
| [deleted]
| sanxiyn wrote:
| It means, for example, writing a FreeBSD kernel driver for
| Elastic Network Adapter (ENA). Both Linux kernel driver and
| FreeBSD kernel driver is available at
| https://github.com/amzn/amzn-drivers
| cotillion wrote:
| Netflix works because they move content close to the users.
| This is done by either having the ISP establish a peering
| connection directly to Netflix hosted servers or by having the
| ISPs host "Open Connect Appliances" which cache the most
| requested content. These appliances are based on FreeBSD.
|
| The AWS egress savings from this setup must be immense.
|
| https://openconnect.netflix.com/
| ilyt wrote:
| Yup, cloud bandwidth is insanely expensive considering what
| you _actually_ pay to get a link to your datacenter.
|
| And you pay either by 95th percentile (basically "peak
| usage") or for the whole link, not per megabyte sent.
| [deleted]
| paravz wrote:
| How does Gb/s per watt of power compare between 2x400Gb/s
| servers and a single 800Gb/s server?
|
| I've been following these reports since 2015, when I compared
| the estimated cost of your 9Gb/s server to an F5 load balancer
| :)
| pyuser583 wrote:
| I think it's weird and cool how Netflix used FreeBSD/Dlang.
|
| Linux is just the automatic go to. It's great the big tech
| companies are rethinking these basics.
| loeg wrote:
| Where are you seeing any mention of Dlang?
| throw0101a wrote:
| And to think not that long ago I remember being excited when the
| V.92 standard was released and I could get 56 kb/s on my dial-up
| connection:
|
| * https://en.wikipedia.org/wiki/V.92
| rwl4 wrote:
| How about the marvel that was Walnut Creek's cdrom.com that
| served 10,000 simultaneous FTP connections back in 1999? [1]
|
| I was always blown away by how much more efficient FreeBSD's
| network stack was compared to Linux at the time. It convinced
| me to go FreeBSD-only for a few years.
|
| [1] http://www.kegel.com/c10k.html
| alberth wrote:
| > compared to Linux at the time
|
| Do you consider that not to still be the case?
| adrian_b wrote:
| Before 2003, FreeBSD was definitely both faster and more
| reliable than Linux, especially for networking or storage
| applications.
|
| After that, Intel and AMD introduced cheap multi-threaded and
| multi-core CPUs. Linux was adapted very quickly to work well
| on such CPUs, but FreeBSD struggled for many years before
| reaching acceptable performance on multi-threaded or
| multi-core CPUs, so for a time it was much slower than Linux.
|
| Since then, the performance gap between Linux and FreeBSD has
| diminished continuously, so now there is no longer any large
| difference between them.
|
| Depending on the hardware and on the application, either
| Linux or FreeBSD can be faster, but in the majority of the
| cases the winner is Linux.
|
| Despite that, for certain applications there may be good
| reasons to choose FreeBSD, even where it happens to be
| slower than Linux.
| lukego wrote:
| FreeBSD was held back by limited TCP options around when
| packet mobile internet (GPRS) came along. That was around
| 2003 too.
|
| I remember noticing Yahoo properties being almost
| unusable over GPRS because they did packet loss detection
| and recovery in such basic ways, e.g. no SACK.
| anthk wrote:
| Any setting for today's connection on capped mobile data?
| 2.7 KB/S max.
| LeonenTheDK wrote:
| > Depending on the hardware and on the application,
| either Linux or FreeBSD can be faster, but in the
| majority of the cases the winner is Linux.
|
| I'm not denying this, but do you have a source? I've been
| trying to find modern "Linux vs FreeBSD" performance
| tests but haven't been super successful. Mostly I find
| things from the early 2000s when FreeBSD had a clear
| lead.
| yakubin wrote:
| https://www.phoronix.com/review/bsd-linux-eo2021
| mrtweetyhack wrote:
| jedberg wrote:
| > Depending on the hardware and on the application,
| either Linux or FreeBSD can be faster, but in the
| majority of the cases the winner is Linux.
|
| Do you have any data to back that up? Everything I've
| seen recently and my own experience tells me this isn't
| the case but I also don't have any data to back up my
| position. Would love to find some good data on this
| either way.
| adrian_b wrote:
| I have been using both FreeBSD and Linux continuously since
| around 1995, starting with FreeBSD 2.0 and some Slackware
| Linux distribution.
|
| In the early years, I ran many benchmarks between them, in
| order to choose the one that was best suited for certain
| applications.
|
| However, during the last decade I have not bothered to
| compare them any more, because the main reasons why I choose
| one or the other no longer include speed.
|
| Even though I have, right next to me, several computers with
| FreeBSD and several with Linux, it would not be easy for me
| to run any benchmark, because they have very different
| hardware, which would influence the results much more than
| the OS.
|
| For all the applications where I use FreeBSD (for various
| networking and storage services), its performance is
| adequate, and I use it instead of Linux for other
| reasons, not depending on whether it might be faster or
| slower.
|
| In the applications where computational performance is
| important, I use Linux, but that is not due to some
| benchmark results, but because some commercial software
| is available only for Linux, e.g. CUDA libraries or FPGA
| design programs.
|
| Many benchmark results comparing FreeBSD and Linux may be
| influenced more by the file systems used than by the OS
| kernel.
|
| I recently saw a benchmark comparing FreeBSD and Linux for a
| database application dominated by SSD I/O, but I cannot
| remember the link to it.
|
| The only file system shared by Linux and FreeBSD is ZFS.
| With ZFS, the benchmark results were similar for Linux
| and FreeBSD. However, FreeBSD was faster when using UFS,
| and Linux was much faster when using either XFS or EXT4
| (BTRFS was much slower than ZFS). Such a benchmark was
| much more influenced by the file system than by the
| operating system.
|
| In conclusion, it is very hard to make a good comparison
| between FreeBSD and Linux, because you need identical
| hardware, which must be restricted to the shorter list
| that is well supported by FreeBSD, and you need to run
| some micro-benchmark testing some kernel system calls.
|
| Otherwise, the result may depend more on the supported
| software, hardware or file systems, than on the OS
| kernel.
| jedberg wrote:
| Right, exactly, which is why it's hard to find data. But
| I'd love to see someone who has tried to limit variables
| to just the network stack to figure out if one network
| stack is better than the other.
|
| But you're right, in the end you just have to set up both
| for your particular use case with the best optimizations
| each has to offer and see which performs better.
| Thaxll wrote:
| The web runs on Linux, like most FAANG servers do, so with
| the $$$ / people / R&D it makes sense that this OS is
| faster. A conservative number would be that 99.9% of the
| web runs on Linux, and it's probably much higher.
|
| At the scale of Google / MS / Amazon / Apple, if servers
| ran faster on BSD* they would use it. We're talking
| about tens of millions of servers here.
|
| https://www.phoronix.com/review/bsd-linux-eo2021/7
|
| It gives you a pretty clear picture.
| jedberg wrote:
| Based on that logic, Windows is the superior operating
| system and always has been, because it's always been used
| by more people on their desktop than anything else.
|
| There are a lot more factors involved in OS choice that
| could drive popularity other than the speed of the
| network stack. And BTW, Hotmail ran on BSD for years. macOS is
| derived from BSD. And Yahoo ran on BSD (and may still).
| drewg123 wrote:
| Author here, happy to answer questions
| waynesonfire wrote:
| How did you generate those flamegraphs and what other tools did
| you use to measure performance?
|
| My motivation for asking comes from these findings in the pdf,
|
| Did the graph show the bottleneck contention on aio queue? Did
| the graph show that "a lot of time was spent accessing memory"?
|
| What made freebsd a better platform compared to Linux to begin
| tackling this problem?
|
| Thanks! Super interesting. I'm both a FreeBSD fan and someone
| with workloads that I'd love to benchmark to squeeze out more
| performance.
| drewg123 wrote:
| > How did you generate those flamegraphs and what other tools
| did you use to measure performance?
|
| We have an internal shell script that takes hwpmc output and
| generates flamegraphs from the stacks. It also works with
| dtrace. I'm a huge fan of dtrace. I also make heavy use of
| lockstat, AMD uProf, and Intel Vtune.
|
| > Did the graph show the bottleneck contention on aio queue?
| Did the graph show that "a lot of time was spent accessing
| memory"?
|
| See the graph on page 32 or so of the presentation. It shows
| huge plateaus in lock_delay called out of the aio code. It's
| also obvious from lockstat stacks (run as lockstat -x
| aggsize=4m -s 10 sleep 10 > results.txt).
|
| See the graph on page 38 or so. The plateaus are mostly
| memory copy functions (memcpy, copyin, copyout).
|
| We already use FreeBSD on our CDN, so it just made sense to
| do the work in FreeBSD.
|
| The talk is on Youtube https://youtu.be/36qZYL5RlgY
| smokel wrote:
| The flame graphs might be generated using Brendan Gregg's
| utility, see https://www.brendangregg.com/flamegraphs.html
| drewg123 wrote:
| They are generated by a local shell script that uses the
| same helpers (stackcollapse*.pl, difffolded.pl). Our
| revision control says the script was committed by somebody
| else though. It existed before I joined Netflix.
| monotux wrote:
| How long will you be able to keep up with this near yearly
| doubling of bandwidth used for serving video? :)
| drewg123 wrote:
| It depends on when we get PCIe Gen5 NICs and servers with
| DDR5 :)
| alberth wrote:
| Any current estimates on timing?
| toast0 wrote:
| Not the OP, but PCIe5 NICs are already available in the
| market; I've seen people requesting help getting them to
| work on desktop platforms which have PCIe5 as of the most
| recent chips. AFAIK, currently, both AMD and Intel
| release desktop before server; I don't think there's a
| public release date for Zen4 server chips, but probably
| this quarter or next? Intel's release process is too hard
| for me to follow, but they've got desktop chips with
| PCIe5, so whenever those get to the server, then that
| might be an option too.
| Rafuino wrote:
| The public release date for Zen4 server has been disclosed:
| November 10, FYI. https://www.servethehome.com/amd-
| epyc-genoa-launches-10-nove....
|
| Looks like Intel's release is coming January 10.
| https://www.tomshardware.com/news/intel-sapphire-rapids-
| laun...
| dist1ll wrote:
| How involved was Netflix in the design of the Mellanox NIC? How
| many stakeholders does this type of networking hardware have,
| relatively speaking?
|
| Also, what percentage of CDN traffic that reaches the user is
| served directly from your co-located appliances?
| _-david-_ wrote:
| There are a lot of slides and I am on my phone, so sorry if it
| was addressed in the slides.
|
| How does Linux compare currently? I know in the past FreeBSD
| was faster, but are there any current comparisons?
| tame3902 wrote:
| 1. I got excited when I saw arm64 mentioned. How competitive is
| it? Do you think it will be a viable alternative for Netflix in
| the future?
|
| 2. On amd, did you play around with BIOS settings? Like turbo,
| sub-numa clustering or cTDP?
| drewg123 wrote:
| Arm64 is very competitive. As you can see from the slides,
| the Ampere Q80-30 is pretty much on-par with our production
| AMD systems.
|
| Yes, I've spent lots of time in the AMD BIOS over the years,
| and lots of time with our AMD FAE (who is _fantastic_ , BTW)
| poking at things.
| crest wrote:
| Which NIC and driver combinations support kTLS offloading to
| the NIC?
|
| How did you deal with the hardware/firmware limitations on the
| number of offloadable TLS sessions?
| drewg123 wrote:
| We use Mellanox ConnectX6-DX NICs, with the Mellanox drivers
| built into FreeBSD 14-current (which are also present in
| FreeBSD 13).
| throw0101a wrote:
| > _We use Mellanox ConnectX6-DX NICs_
|
| Is there a plan to move to the ConnectX-7 eventually?
|
| Depending on the bandwidth available, that'd be either 2x
| to get the same 800Gb/s as here, or perhaps eventually 4x
| to get 1600Gb/s.
| drewg123 wrote:
| Yes, I'm looking forward to CX7. And to other pcie Gen5
| NICs!
| eddyg wrote:
| Wondering if there's a video presentation to go along with the
| slides?
| notaplumber1 wrote:
| This talk was given at this year's EuroBSDcon in Vienna; the
| recording is up on YouTube.
|
| https://2022.eurobsdcon.org/
|
| https://www.youtube.com/watch?v=36qZYL5RlgY
|
| Some really great talks this year from all the *BSDs; highly
| recommend taking a look: https://www.youtube.com/playlist?l
| ist=PLskKNopggjc6_N7kpccFZ...
| coredog64 wrote:
| And is the video presentation on Netflix?
| alberth wrote:
| A. Just curious, are these servers performing any work besides
| purely serving content? E.g. user auth, album art, show
| descriptions, etc.?
|
| B. What's the current biggest bottleneck preventing higher
| throughput?
|
| C. Has everything been upstreamed? Meaning, if I were to
| theoretically purchase the exact same hardware, would I be
| able to achieve similar throughput?
|
| (Amazing work, by the way, on these continued accomplishments.
| These posts over the years are always my favorite HN stories.)
| drewg123 wrote:
| a) These are CDN servers, so they serve CDN stuff. Some do
| serve cover art and those sorts of things.
|
| b) Memory bandwidth and PCIe bandwidth. I'm eagerly awaiting
| Gen5 PCIe NICs and Gen5 PCIe / DDR5 based servers :)
|
| c) Yes, everything in the kernel has been upstreamed. I think
| there may be some patches to nginx that we have not
| upstreamed (SO_REUSEPORT_LB patches, TCP_REUSPORT_LB_NUMA
| patches).
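|
| (Aside: for anyone curious what those option names look like
| in use, here is a minimal, hypothetical sketch of a listener
| that opts into FreeBSD's SO_REUSEPORT_LB load-balanced listen
| groups plus the TCP-level NUMA option named above. This is not
| the actual nginx patches; the function name and the exact
| option semantics are assumptions, so check the FreeBSD man
| pages before relying on it.)
|
|     #include <sys/socket.h>
|     #include <netinet/in.h>
|     #include <netinet/tcp.h>
|     #include <arpa/inet.h>
|     #include <string.h>
|     #include <err.h>
|
|     /* Hypothetical helper: one load-balanced listener with a
|        preferred NUMA domain. */
|     int
|     make_listener(int numa_domain)
|     {
|         int s, one = 1;
|         struct sockaddr_in sin;
|
|         if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
|             err(1, "socket");
|         /* Join a load-balanced listen group: the kernel
|            spreads incoming connections across all sockets
|            bound this way. */
|         if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT_LB, &one,
|             sizeof(one)) < 0)
|             err(1, "SO_REUSEPORT_LB");
|
|         memset(&sin, 0, sizeof(sin));
|         sin.sin_family = AF_INET;
|         sin.sin_port = htons(443);
|         sin.sin_addr.s_addr = htonl(INADDR_ANY);
|         if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
|             err(1, "bind");
|         if (listen(s, 128) < 0)
|             err(1, "listen");
|
|         /* Prefer connections that arrived on the given NUMA
|            domain (option name taken from the comment above;
|            exact values and ordering may differ). */
|         if (setsockopt(s, IPPROTO_TCP, TCP_REUSPORT_LB_NUMA,
|             &numa_domain, sizeof(numa_domain)) < 0)
|             warn("TCP_REUSPORT_LB_NUMA");
|         return (s);
|     }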
| fsckin wrote:
| What tools do you use for load testing / benchmarking?
| drewg123 wrote:
| At a very basic microbenchmark level, I use stream, netperf, a
| few private VM stress tests, etc. But the majority of my
| testing is done using real production traffic.
| alberth wrote:
| If a "typical" NIC was used, what do you think the throughput
| would be?
|
| I have to imagine considerably less (e.g. 100 Gb/s instead of
| 800).
| drewg123 wrote:
| Back of the envelope guess is ~400Gb/s. Each node has enough
| memory BW for about 240Gb/s, then factor in some efficiency
| loss for NUMA.
| toast0 wrote:
| Not the OP, but that's basically in the slides: the numbers
| for kTLS without NIC kTLS offload. Maybe you could optimize
| that a bit more around the edges if NIC kTLS weren't an option.
| amelius wrote:
| At what point will it make more sense to use specialized
| hardware, e.g. network card that can do encryption?
| drewg123 wrote:
| We already do: the Mellanox ConnectX6-DX with crypto
| support. It does inline crypto on TLS records as they are
| transmitted. This saves memory bandwidth, as compared to a
| traditional lookaside card.
| MichaelZuo wrote:
| What's the error rate, or uptime ratio, of those cards?
| drewg123 wrote:
| Were you assuming they were giant FPGA-based NICs? They
| are production server NICs, using ASICs with a reasonable
| power budget. I don't recall any failures.
| MichaelZuo wrote:
| Well I wasn't, though I was expecting some non-zero
| amount of failures.
|
| That's pretty impressive if it's literally zero.
|
| How many machines are deployed with NICs?
| drewg123 wrote:
| I don't have any visibility into how many DOA NICs we
| have, so I can't say whether Mellanox is better or worse on
| that point. But I do see most NIC-related tickets for NIC
| failures once machines are in production. In general,
| we've found Mellanox NICs to be very reliable.
| PYTHONDJANGO wrote:
| * How is the DRM applied? * Is the software that does the DRM
| open source, too?
| alberth wrote:
| How many "U"s of space do ISPs typically give you (e.g. 4U, 8U,
| etc.)?
| nixgeek wrote:
| This is going to be a "How long is a piece of string?". Each
| ASN will be unique, and even within any large ISP, there may
| be many OCA deployment sites (there won't just be one for
| Virgin Media in UK) and each site will likely have subtly
| different traffic patterns and content consumption patterns,
| meaning the OCA deployment may be customized to suit, and the
| content pushed out (particularly to these NVME-based nodes)
| will be tailored accordingly.
|
| Since the alternative for an ISP is to be carrying the bits
| for Netflix further, the likelihood is they'll devote
| whatever space is required because that's much cheaper than
| backhauling the traffic and ingressing over either a
| settlement-free PNI or IXP link to a Netflix-operated cache
| site, or worse, ingressing the traffic over a paid transit
| link.
|
| Meanwhile, on the flipside, since Netflix funds the OCA
| deployments they have a strong interest in not "oversizing"
| the sites. That said I'm sure there is an element of growth
| forecasting involved once a site has been operational for a
| period of time.
| vkaku wrote:
| Read the presentation. Had super noobie level questions.
|
| Is the RAM mostly used by page content read by the NICs due to
| kTLS?
|
| If there was better DMA/Offload could this be done with a
| fraction of the RAM? (NVME->NIC)
|
| If there were no need for TLS, would the RAM usage drop
| dramatically?
| drewg123 wrote:
| These are actually fantastic questions.
|
| Yes, the RAM is mostly used by content sitting in the VM page
| cache.
|
| Yes, you could go NVMe->NIC with P2P DMA. The problem is that
| NICs want to read data one TCP MSS (~1448 bytes) at a time and
| NVMe really wants to speak in 4K-sized chunks. So there need to
| be some buffers somewhere. It might eventually be CXL-based
| memory, but for now it is host memory.
|
| EDIT: missed the last question. No, with NIC kTLS, the host
| RAM usage is about the same as it would be without TLS at
| all. Eg, connection data sitting in the socket buffers refers
| to pages in the host vm page cache which can be shared among
| multiple connections. With software kTLS, data in the socket
| buffers must refer to private, per-connection encrypted data
| which increases RAM requirements.
| hzhou321 wrote:
| What prevents Linux from achieving the same bandwidth?
| _trackno5 wrote:
| Not sure about all other optimisations, but Linux doesn't
| have support for async sendfile.
| [deleted]
| erk__ wrote:
| Do you know if there is any documentation regarding interfacing
| with kTLS, e.g. to implement support for it in a new library?
| sanxiyn wrote:
| For Linux, there is documentation at kernel.org:
| https://docs.kernel.org/networking/tls.html
| drewg123 wrote:
| The ktls(4) man page is a start. The reference implementation
| is OpenSSL right now. I added support for an internal Netflix
| library a while ago; I probably should have documented it at
| the time. For now, feel free to contact me via email with
| questions (the username in the URL, but @netflix.com)
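|
| (For reference, a rough sketch of the consumer side using
| OpenSSL 3.x, which is the reference implementation mentioned
| above: opt in, check whether the kernel took over the TLS
| record layer, then send file data. The function names and the
| fallback path here are assumptions, not the internal Netflix
| library.)
|
|     #include <sys/types.h>
|     #include <unistd.h>
|     #include <openssl/ssl.h>
|     #include <openssl/bio.h>
|
|     /* At context setup time, ask OpenSSL to try kernel TLS. */
|     void
|     enable_ktls(SSL_CTX *ctx)
|     {
|         SSL_CTX_set_options(ctx, SSL_OP_ENABLE_KTLS);
|     }
|
|     /* After the handshake: if the kernel owns the TLS send
|        path, SSL_sendfile() lets sendfile(2) push page-cache
|        pages to the socket with the TLS framing done in the
|        kernel (or on the NIC); otherwise fall back to a
|        read-and-SSL_write() loop. */
|     ssize_t
|     send_chunk(SSL *ssl, int filefd, off_t off, size_t len)
|     {
|         if (BIO_get_ktls_send(SSL_get_wbio(ssl)))
|             return SSL_sendfile(ssl, filefd, off, len, 0);
|
|         char buf[16384];
|         ssize_t n = pread(filefd, buf,
|             len < sizeof(buf) ? len : sizeof(buf), off);
|         if (n <= 0)
|             return n;
|         return SSL_write(ssl, buf, (int)n);
|     }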
| kloch wrote:
| What filesystem(s) are you using for root and content?
|
| And if ZFS, what options are you using?
| drewg123 wrote:
| We use ZFS for root, but not content. For content we use UFS.
| This is because ZFS is not compatible with "zero-copy"
| sendfile, since it uses its own ARC cache rather than the
| kernel page cache, meaning sending data stored on ZFS
| requires an extra data copy out of the ARC. It's also not
| compatible with async sendfile, as it does not have the
| methods required to call the sendfile completion handler
| after data is read from disk into memory.
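|
| (To make the zero-copy path concrete, here is a minimal sketch
| of what the calling side of FreeBSD's sendfile(2) looks like.
| The function name is made up, and nginx's real handling of
| partial sends and socket events is far more involved than
| this.)
|
|     #include <sys/types.h>
|     #include <sys/socket.h>
|     #include <sys/uio.h>
|     #include <errno.h>
|
|     ssize_t
|     serve_range(int filefd, int sock, off_t off, size_t len)
|     {
|         off_t sent = 0;
|
|         /* The kernel maps UFS page-cache pages straight into
|            the socket buffer (no copy through userspace) and
|            does not block waiting on disk: the read completes
|            later and the sendfile completion handler marks the
|            data ready, which is the hook UFS has and the ZFS
|            ARC path lacks. Flags like SF_NOCACHE exist for
|            cache-policy hints. */
|         if (sendfile(filefd, sock, off, len, NULL, &sent,
|             0) == -1 && errno != EAGAIN)
|             return (-1);
|         return ((ssize_t)sent);  /* bytes queued so far */
|     }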
| deltarholamda wrote:
| >For content we use UFS
|
| I found this extremely interesting. ZFS is almost a cure-
| all for what ails you WRT storage, but there is always
| something that even Superman can't do. Sometimes old-school
| is best-school.
|
| Thanks for the presentation and QA!
| [deleted]
| nicholasjarnold wrote:
| I come from the time when the first internet connection my house
| had was a 56k modem...just before cable modems/DOCSIS started
| rolling out in the midwest. These speeds are somewhat mind
| boggling to me. (Yeah, yeah, datacenter vs home, but it's still
| somewhat hard to imagine saturating pipes like those.)
|
| While standing in a state of mild awe at 800Gb/s I read reviews
| and consider upgrading my house to 2.5Gb/s equipment... Should I
| just wait for 10Gbit to get a bit cheaper? Should I ditch copper
| and go fiber like that guy who was on the front page here
| recently (probably not, but that was cool)? Maybe raw single core
| CPU performance is starting to level off a bit, but it seems that
| networking technologies are still advancing at a rapid clip!
| seized wrote:
| 10Gb fiber is very cheap. NICs and SFPs from eBay, fiber from
| FS.com in whatever length you want. I got a plenum-rated 100 ft
| 4-pair cable from FS.com for $100 or so, and it was only that
| expensive because of the plenum rating, as it runs through my
| cold air returns.
| ksec wrote:
| Just some napkin maths. (Correct me if I am wrong.)
|
| Looking at the 800Gbps config: a Dell R7525 with dual 64C/128T
| CPUs and 4x ConnectX-6 DX NICs serving 800Gbps in 2U.
|
| With Zen 4c (128C), PCIe 5.0 and ConnectX-7, two nodes could
| fit into 2U, i.e. doubling to 1.6Tbps per 2U.
|
| That is going from 16Tbps to 32Tbps per rack (using only 40U).
|
| To put things in perspective: if every user were to use a
| 20Mbps stream at the same time (not going to happen, due to
| time zone differences), the 250M Netflix subscribers worldwide
| would need 5,000M Mbps, or 5,000 Tbps. At 32Tbps per rack,
| that is less than 200 racks (5,000 / 32 ~= 156) to serve every
| single one of their customers on planet earth. (Ignoring
| storage.) You could ship a rack to every region, state,
| nation, jurisdiction or local ISP and exchange and be done
| with it.
|
| I hope Lisa Su sends drewg123 and his team at Netflix some Zen
| 4c ASAP to play with, _cough_, I mean to help them test it.
|
| Note: We have PCIe 6.0 (and 7.0) and DDR6 on the roadmap. The
| 200 racks could be down to 50 racks by the end of this decade,
| assuming Netflix is still streaming at the same bitrate.
| loeg wrote:
| Netflix is more likely to use a single box of this kind of
| throughput at any given POP than a rack of them. For bigger
| installations they can use cheaper, less throughput-dense
| hardware (although I don't know if they do).
| carlhjerpe wrote:
| Take a look at the hardware, it isn't particularly expensive
| stuff.
| loeg wrote:
| Aside from GPUs, I'm not sure how you would increase the
| cost density much. Those NICs doing hundreds of Gbps and
| TLS aren't cheap, nor are the fast SSDs needed to sustain
| the load, nor is RAM or top end AMD server CPUs. Of course,
| the cost is absolutely worth it to Netflix!
| carlhjerpe wrote:
| Yes, but it's still just one box; if you're building a
| cluster of cheaper machines you need more of everything.
| Comparing a high-end server vs a cluster of 10 machines, the
| 10 machines wouldn't be cheaper to get to the same
| throughput. It's not alien specialized supertech, it's just
| top-of-the-line commodity hardware. (10 is just an example
| number here.)
| loeg wrote:
| I mean, I guess I disagree with your stipulation that you
| couldn't lower total costs somewhat using slightly more,
| slightly lower-end hardware, if rack space were cheap.
|
| > top of the line commodity hardware
|
| Yeah -- cost in commodity hardware scales super-linearly
| with performance.
| alberth wrote:
| > That is going from 16Tbps to 32Tbps per Rack ... only need
| 200 racks
|
| I doubt ISPs give an entire rack to Netflix. I wouldn't be
| surprised if they only get like 4U total (hence why throughput
| per server is so important to Netflix).
| BonoboIO wrote:
| The minimum requirements
|
| https://openconnect.zendesk.com/hc/en-
| us/articles/3600345383...
|
| I think it depends on the size of the ISP; a rack would
| probably be too much even for the biggest ISPs, but a single
| 4U would be too little.
| TFortunato wrote:
| Looking at the banner pic on their main page, they seem to
| have at least one ISP install of multiple racks in the
| wild. Also, doing a little reading on how "fill" of the
| devices works, they talk about doing peer-to-peer filling
| of appliances located at the same site, which leads me to
| believe that, even if not deploying a full rack, deploying
| multiple appliances to an ISP site is a relatively normal
| occurrence.
|
| https://openconnect.netflix.com/en/peering/
| meltedcapacitor wrote:
| Why not? It's the top bandwidth consumer for a retail ISP, and
| surely any reasonable amount of rack space is worth the
| savings in interconnect bandwidth.
| loeg wrote:
| They are frequently rack-space constrained, hence this
| super-dense hardware.
| jedberg wrote:
| Some ISPs give a full rack, some don't. It depends on how
| much traffic they have and how willing they are.
|
| But a lot of the racks sit at internet exchange points, where
| Netflix rents one or more racks at a time.
| recuter wrote:
| There is the rather intriguing prospect of NVM Express over
| Fabrics (NVMe-oF):
| https://en.wikipedia.org/wiki/NVM_Express#NVMe-oF
|
| Marvell Octeon 10 DPU (with an integrated 1 Terabit switch):
| https://www.marvell.com/content/dam/marvell/en/company/media...
|
| Probably pretty soon you'll be able to chuck a few
| hot-swappable 100 TB Nimbus ExaDrives
| (https://nimbusdata.com/products/exadrive/) in there and call
| it a day. 1T in 1U. :)
| Melatonic wrote:
| Interesting to see that Infiniband is still kicking
| coherentpony wrote:
| Not really. Ethernet and Infiniband are both perfectly
| capable from a bandwidth perspective. Streaming video isn't
| remotely close to latency-bound, which is where Infiniband
| would be better suited.
| Melatonic wrote:
| The people doing this might also be doing infra as code for the
| virtualization layer on the hardware itself - which this might
| not be able to satisfy. At a minimum, they surely have a ton of
| this stuff deployed already, so changing hardware specs in a
| big way might not be worth the cost.
|
| Also, are you taking encryption into account for those specs?
| reaperducer wrote:
| _That is less than 200 Racks to serve every single of their
| customer on planet earth. ( Ignoring Storage. )_
|
| If you're going to ignore storage, Netflix could just ship a
| low-end video server to every one of its customers and be done
| with it.
|
| Every problem is an easy problem if you pretend the hard parts
| don't exist.
| OliverGuy wrote:
| How much storage does Netflix actually need for its whole
| library?
|
| It's got about 17,000 titles globally [1]. If they have
| copies in SD, 720p, HD and 4K, that would be 68,000 versions
| (plus some extra audio tracks for stuff dubbed in multiple
| languages, but I suspect this is fairly minimal in terms of
| storage).
|
| Let's assume those resolutions have bitrates of 5, 10, 15 and
| 20 Mbps.
|
| The average length of a Netflix original movie is ~90mins [2]
|
| So that would require about 575TB in storage if I have done
| my maths correctly.
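| (Rough check of that arithmetic, assuming ~90 minutes per
| title: the four versions together come to 5+10+15+20 = 50
| Mbps, so one title is 50 Mbit/s x 5,400 s = 270 Gbit ~= 33.75
| GB, and 33.75 GB x 17,000 titles ~= 574 TB.)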
|
| You would need about 20x30TB Kioxia CD6 SSDs for all that.
| Very expensive but definitely technically possible.
|
| I could totally see it being possible to fit those drives in
| a single node to push the 800Gbps required, not increasing
| the overall rack requirement at all. (Not sure if the
| bandwidth from that many drives is enough; you might have to
| cache some of the most-watched stuff in RAM.)
|
| Not gonna see any in-home boxes with all the titles
| pre-loaded any time soon though. As a hard drive array that's
| still 30x20TB drives.
|
| [1] https://www.comparitech.com/blog/vpn-privacy/netflix-
| statist...
|
| [2] https://stephenfollows.com/netflix-original-movies-
| shows/#:~...)
| AdrianB1 wrote:
| Do they keep the global library on every server? I guess
| they partition it geographically.
| tecleandor wrote:
| In their OpenConnect network they keep the most demanded
| titles and the latest releases. And IIRC that refreshes
| nightly (with new releases and whatever is hot that day)
|
| https://openconnect.zendesk.com/hc/en-
| us/articles/3600356180...
| virtuallynathan wrote:
| Back of the Napkin Zen4 / Genoa gets you to ~500GB/s PCIe and
| ~500GB/s of DRAM bandwidth -- nearly 4Tbps! Zen3/Rome is
| ~300GB/s PCIe and ~300GB/s DRAM -- about 2.4Tbps. A single 2U
| box with Genoa might scale to 1.25Tbps+ of useful Netflix
| traffic. We'll have to see what magic Drew can pull :)
| aeyes wrote:
| You are probably overestimating Netflix traffic by a lot.
|
| IX.br peak traffic is 20Tb/s, DE-CIX peak traffic is 14Tb/s,
| AMS-IX is around 11Tb/s.
|
| The 800Gbps machine is probably enough for a country.
|
| Netflix traffic stats at PIT Chile, this is their only peering
| connection in Chile: https://www.pitchile.cl/wp/graficos-con-
| indicadores/streamin...
| srmn wrote:
| This assumption misses out on all the private interconnect
| links and deployed OpenConnect appliances within ISP networks
| - a majority of Netflix's traffic today. IXes are only a
| small portion of overall internet traffic.
| lostlogin wrote:
| I notice people streaming in very low resolution without
| realising it, and sometimes intervene when the pain gets too
| great.
|
| I'd be very surprised if the average bitrate was anywhere near
| that approximation.
|
| However, that wasn't the point of the calculation; it was
| looking for a maximum.
| orangepurple wrote:
| Agreed, and 20 Mbps is a reasonable rate for modern codecs
| at resolutions up to 4K for 99% of viewers.
| ocbyc wrote:
| "In networking units"
| BonoboIO wrote:
| It amazes me that Netflix is capable of such top-of-the-line
| engineering (really mind-blowing stuff, one machine that
| streams nearly 1 terabit per second), but is for the love of
| god unable to stream HD content to my iPhone (newest firmware,
| everything up to date). I've tried everything: gigabit wifi,
| cellular, multiple ISPs...
|
| It is better for me to pirate their content, play it with Plex
| and be happy. I pay for Netflix, but still have to download it
| to see it in acceptable quality. Absurd. Support couldn't
| help. It doesn't affect me, because I have my torrent/Plex
| setup, but for 99.9% of people it is a subpar experience.
|
| I think the best years are over for Netflix. The hard
| awakening is here: they have to make content that users
| actually want, and they are a movie/TV content company, not
| primarily a "tech company".
| staringback wrote:
| leetharris wrote:
| You live in a bubble. The vast majority of the world likely
| cannot even tell the difference between HD and 4K. Netflix
| continues to grow its content and retain subscribers.
| BonoboIO wrote:
| Netflix is a media company, as I said.
|
| Well, 4K vs HD, you are right, but 480p on a Retina display
| right in front of me is really obvious.
| selfhoster69 wrote:
| > unable to stream HD Content to my iPhone
|
| Yeah this has been the case since forever. It prioritizes
| instant playback vs forcing 1080p or similar.
|
| Can't speak for the iPhone, but on iPad, I've moved to using
| the website, which goes to 1080p immediately.
|
| > still have to download it, to see it an acceptable quality
|
| Downloaded content has a whole lot more compression than
| streaming at the max phone-supported quality, so just a tiny
| FYI.
| this15testing wrote:
| Related to slide 4...
|
| How much does Netflix donate to the FreeBSD Foundation relative
| to their profits?
| hnarn wrote:
| "Netflix does contribute financially to the FreeBSD Foundation
| and has done so since 2012. Last year they engaged at the
| "platinum" level with contributing more than $50,000+ USD to
| the foundation." (2019)
|
| Took about five seconds to Google, it's the first result for
| "netflix donations to freebsd".
|
| NFLX Q3 2019 revenue was about $5.2B.
|
| So about 0.001%, I guess.
| this15testing wrote:
| haha
___________________________________________________________________
(page generated 2022-11-03 23:01 UTC)