[HN Gopher] Serving Netflix Video Traffic at 800Gb/s and Beyond ...
___________________________________________________________________
Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf]
Author : ksec
Score : 434 points
Date : 2022-08-19 11:53 UTC (11 hours ago)
(HTM) web link (nabstreamingsummit.com)
(TXT) w3m dump (nabstreamingsummit.com)
| walrus01 wrote:
| Everything old is new again: Anyone remember seeing a 32-bit/33
| MHz PCI (not PCI-X, not PCIe) card for SSL acceleration in the
| late 1990s? It was totally a thing at one point in time, when
| your typical 1U rackmount server had a single-core CPU and was
| quite weak in overall math processing power.
|
| OpenBSD had support for them like 22 years ago.
|
| https://www.google.com/search?client=firefox-b-d&q=SSL+accel...
|
| Now we have TLS 1.2/1.3 offload getting built into PCIe 4.0
| 100/200/400GbE (whatever speed) NICs.
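|
| On Linux, for NICs that support it, the kernel-TLS offload can
| be checked and toggled with ethtool (interface name is
| illustrative):
|
|     ethtool -k eth0                       # look for tls-hw-*
|     ethtool -K eth0 tls-hw-tx-offload on  # enable kTLS TX offload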
| lprd wrote:
| I wonder if they are using something like TrueNAS or just
| interfacing directly with OpenZFS (assuming they use ZFS).
| BonoboIO wrote:
| It amazes me that Netflix is capable of such top-of-the-line
| engineering, but is for the love of god unable to stream HD
| content to my iPhone. I've tried everything: gigabit wifi,
| cellular ...
|
| It is better for me to pirate their content, play it with Plex
| and be happy. I pay for Netflix and it is absurd.
|
| I think the best years are over for Netflix. The hard awakening
| is here: they need to make content that users want, and they are
| a movie/TV content company, not primarily a "tech company".
| Infinitesimus wrote:
| Some ISPs throttle Netflix. Not sure of your setup, but it
| might be helpful to have more details about the type of phone
| (I'd expect a difference between a 13 Pro Max and an old 7) and
| ISP, to see if others have similar problems.
| BonoboIO wrote:
| iPhone XS, iOS 15.4, Netflix app updated, A1 Telekom, both
| mobile and wifi. It does not even work at my office with a
| different ISP.
|
| Netflix has all the bandwidth data and metrics, but this hasn't
| worked in ages. Maybe a more basic setup on their end would
| bring better results. Focus more on delivery, not 10 different
| UI versions, A/B tests, batch job workflows and so on. They post
| on their engineering blog how they test multiple TVs and
| multiple encoding profiles, great things, but if the basics
| don't work ... well, what is it good for?
|
| I think they lost their focus.
| ksec wrote:
| And this is still on ConnectX-6 Dx. With PCIe Gen 5 and
| ConnectX-7, Netflix should be able to push for _1.6Tbps_ per box.
| This will hopefully keep drewg123 and his team busy for another
| year :P
| dragontamer wrote:
| At that point, RAM itself would likely be the bottleneck.
|
| But maybe DDR5 will come out by then and get this team busy
| again lol.
| wmf wrote:
| Genoa does indeed have roughly double the memory bandwidth.
| Moral_ wrote:
| A lot of the reason they've had to build most of this stuff
| themselves is that they decided, for some reason, to use
| FreeBSD.
|
| The NUMA work they did: I remember being in a meeting with them
| as a Linux developer at Intel at the time. They bought NVMe
| drives, or were saying they were going to buy NVMe drives, from
| Intel, which got them access to "on the ground" kernel
| developers and CPU people from Intel. Instead of talking about
| NVMe they spent the entire meeting asking us about how the
| Linux kernel handles NUMA and corner cases around memory and
| scheduling. If I recall correctly, I think they asked if we
| could help them upstream BSD code for NVMe and NUMA. I think in
| that meeting there was even some L9 or super-high-up NUMA CPU
| guy from Hillsboro they somehow convinced to join.
|
| The conversation and technical discussion was quite fun, but it
| was sort of funny to us at the time that they were having to do
| all this work on the BSD kernel that was solved years ago for
| Linux.
|
| Technical debt, I guess.
| ksec wrote:
| Is NUMA a solved issue on Linux? Correct me if I am wrong, but I
| was under the impression that it may be better handled under
| certain conditions, but that NUMA, the problem in itself, is
| hardly solved.
| cperciva wrote:
| Netflix tried Linux. FreeBSD worked better.
| dboreham wrote:
| By some definition of better.
| trasz wrote:
| It worked faster. It's a common misconception among newbies
| that "Linux has NUMA" automatically means it will use NUMA
| properly in a given workload. What it actually means is you
| _should_ be able to use existing functionality. Sometimes
| you'll only need to configure it, sometimes you'll need to
| reimplement it from scratch, and doing that in FreeBSD
| is easier because there's less bloat.
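|
| "Configure it" can be as simple as pinning a worker to one
| node's CPUs and memory (an illustrative command, not anything
| from the talk):
|
|     numactl --cpunodebind=0 --membind=0 ./worker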
| throw0101c wrote:
| *At the time when they created the OCA project.
|
| If someone was going to do a similar comparison now the
| results _could_ be different.
| jeffbee wrote:
| I still don't get the NUMA obsession here. It seems like they
| could have saved a lot of effort and a huge number of
| powerpoint slides by building a box with half of these
| resources and no NUMA: one CPU socket with all the memory and
| one PCIe root complex and all the disks and NICs attached
| thereto. It would be half the size, draw half the power, and be
| way easier to program.
| Bluecobra wrote:
| If you are buying servers at scale the costs will certainly
| add up vs. buying two processors. If you buy single proc
| servers, that is double the amount of chassis, rail kits,
| power supplies, power cables, drives, iLO/iDRAC licenses,
| etc.
| dboreham wrote:
| You can build motherboards with two or more completely
| isolated sets of CPU and memory, that are physically
| compatible with standard racks etc.
| Bluecobra wrote:
| Good point, I forgot about those. It would be interesting
| to see if 1x PowerEdge C6525 with four single processor
| nodes is cheaper than 2x Dell R7525 servers. The C6525
| does support dual processor, so it does seem a bit
| wasteful to me.
| muststopmyths wrote:
| Can you buy non-NUMA mainstream CPUs though? Honest question
| because I'd love to be rid of that BS too
| jeffbee wrote:
| NUMA is an outcome of system configuration. You can make a
| non-NUMA platform using any CPU. You just limit yourself to
| 1 CPU socket.
|
| Here's a Facebook engineering blog post about how they left
| NUMA behind. https://engineering.fb.com/2016/03/09/data-
| center-engineerin...
| Dylan16807 wrote:
| > You can make a non-NUMA platform using any CPU. You
| just limit yourself to 1 CPU socket.
|
| Well, not on Epyc generation 1. Those have four NUMA
| segments in each socket.
|
| Also those Xeon Platinum 9200 processors Intel made as an
| attention grab.
| jeffbee wrote:
| EPYC Naples wasn't good for much of anything though, so I
| am trying to forget it.
| drewg123 wrote:
| This is a testbed to see what breaks at higher speed. Our
| normal production platforms are indeed single socket and run
| at 1/2 this speed. I've identified all kinds of unexpected
| bottlenecks on this testbed, so it has been worth it.
|
| We invested in NUMA back when Intel was the only game in
| town, and they refused to give enough IO and memory bandwidth
| per-socket to scale to 200Gb/s. Then AMD EPYC came along. And
| even though our Naples systems were single-socket, you had to
| treat them as NUMA to get performance out of them. With Rome and
| Milan, you can run them in NPS1 mode and still get good
| performance, so NUMA is used mainly for forward-looking
| performance testbeds.
| pclmulqdq wrote:
| This is amazing work from the Netflix team. I'm looking forward
| to 1.6 Tb/s in 4 years.
|
| It is interesting that this work is happening on FreeBSD, and
| potentially with implementations diverging from Linux. Linux
| programs seem to be moving towards userspace getting more power,
| with things like io_uring and increasing use of frameworks like
| DPDK/SPDK. This work is all about getting userspace out of the
| way, with things like async sendfile and kernel TLS. That's
| pretty neat!
| mgerdts wrote:
| PCIe Gen 5 drives look poised for wide availability next year
| and NVIDIA has been demoing CX7 [1] which is also PCIe Gen 5.
| Intel already has some Gen 5 chips and AMD looks like they will
| follow soon [2]. Surely there will be other bumps, but I bet
| they pull it off in way less than 4 years.
|
| 1. https://www.servethehome.com/nvidia-connectx-7-shown-at-
| isc-...
|
| 2. https://wccftech.com/amd-epyc-7004-genoa-32-zen-4-core-
| cpu-s...
| the8472 wrote:
| kTLS has been added to Linux too, including offload. It also has
| p2p-dma, so in principle you can shovel the file directly from
| NVMe to the NIC and have the NIC encrypt it, so it'll never
| touch the CPU or main memory. But that only works on specific
| hardware.
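|
| A minimal sketch of handing an established TLS session to the
| Linux kernel (key material and error handling are illustrative;
| the real bytes would come from the TLS handshake):
|
|     #include <linux/tls.h>
|     #include <netinet/tcp.h>
|     #include <string.h>
|     #include <sys/socket.h>
|
|     /* After this succeeds, plain write()/sendfile() on the
|      * socket is encrypted by the kernel, or by the NIC if it
|      * supports TLS offload. */
|     int enable_ktls_tx(int sock, const unsigned char *key,
|                        const unsigned char *iv,
|                        const unsigned char *salt,
|                        const unsigned char *seq)
|     {
|         struct tls12_crypto_info_aes_gcm_128 ci;
|
|         if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls",
|                        sizeof("tls")) < 0)
|             return -1;
|         memset(&ci, 0, sizeof(ci));
|         ci.info.version = TLS_1_2_VERSION;
|         ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
|         memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
|         memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
|         memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
|         memcpy(ci.rec_seq, seq,
|                TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
|         return setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
|     }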
| [deleted]
| robocat wrote:
| Memory is the cache for popular content. You couldn't serve
| fast enough directly from NVMe.
|
| "~200GB/sec of memory bandwidth is needed to serve 800Gb/s"
| and "16x Intel Gen4 x4 14TB NVME". So each NVMe drive would
| need to serve 12.5GB/s which is more than the 8GB/s limit for
| PCIe 4.0 x4. Also popular content would need to be on every
| drive, drastically lowering the total content stored.
|
| Also see drewg's comment on this for a different reason:
| https://news.ycombinator.com/item?id=32523509
| pclmulqdq wrote:
| With HBM2 sapphire rapids chips, I assume you can actually
| get there. There is probably an insane price premium for
| them, though, so I wouldn't hold my breath.
| mschuster91 wrote:
| They... serve 800 gigabytes a second on _one single content
| server_, do I get that right?
| plucas wrote:
| Gigabits, I presume, so 100 GB/s.
| Aissen wrote:
| Almost, it's 800 gigabits. Still _a lot_.
| la64710 wrote:
| Great engineering, but how does this 800Gb/s of throughput
| translate downstream all the way to the consumers? I suspect
| there may be switches and routers from ISPs and others in
| between that Netflix does not control, which will reduce the
| effective throughput to the end user.
| kkielhofner wrote:
| ISP routers have been more-or-less indistinguishable from
| switches for decades at this point. They're all "line rate"
| which is to say that regardless of features, packet size, etc
| they'll push traffic between interfaces at whatever the
| physical link is capable of without breaking a sweat.
|
| In the case of Netflix it is in the ISPs best interest to let
| them push as much traffic to their customer eyeballs as
| possible. After all, it's much "easier" and cheaper to build
| out your internal fabric and network (which you have to do
| anyway for the traffic) than it is to buy and/or acquire
| transit to "the internet" for this level of traffic.
| vbernat wrote:
| Modern routers are unable to do line-rate regardless of
| packet size. See for example the Q100 ASIC from Cisco. Rated
| for 10.8 Tbps, it is only able to achieve 6 Bpps [1]. So it
| needs 200-byte packets to hit line-rate. However, as for
| Netflix, this is not problem since they only push big
| packets.
|
| [1]: https://xrdocs.io/8000/tutorials/8201-architecture-
| performan...
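|
| (Checking the arithmetic: 10.8 Tbps / 6 Bpps = 1800 bits, i.e.
| packets need to average roughly 225 bytes before bandwidth,
| rather than packet rate, becomes the limit.)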
| kkielhofner wrote:
| Wow, I've been out of this space for a while! Last I was
| paying close attention to any of this 10G ports were new.
| Glad I learned something from my old life today!
|
| I stand corrected on "always line rate all the time in any
| circumstance", but per your math, and to my general point, < 1
| Tbps from one of these appliances across multiple 100G ports
| isn't problematic in the least from a hardware standpoint,
| especially for the Netflix traffic pattern with relatively full
| (if not max-MTU) packets.
| [deleted]
| Cyph0n wrote:
| On the contrary, even older routers can handle this load with
| no sweat. Service provider-grade routers can handle 10 to 200
| Tbps depending on size.
| Aeolun wrote:
| But then it gets to my home and it's trashed down to
| 100Mbit/s
| rayiner wrote:
| Of course--the fat backbone pipe is progressively split
| into smaller pipes as it gets to your house. The internet
| isn't a big truck. It's a series of tubes.
| jeffbee wrote:
| That would be more than enough to watch half a dozen
| Netflix streams at the same time.
| paxys wrote:
| The 800 Gb/s isn't going to a single user. There are switches
| and routers in the middle, sure, but they are all doing their
| job, which is to split up traffic. The end user only needs ~8
| Mb/s for a 4K stream.
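|
| (By that figure, 800 Gb/s works out to roughly 800e9 / 8e6 =
| 100,000 concurrent 4K streams from a single box.)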
| jiripospisil wrote:
| Here's the accompanying video:
| https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa...
| antonio-ramadas wrote:
| I found the same video on the website of the summit:
| https://nabstreamingsummit.com/videos/2022vegas/
|
| I'm on mobile and there does not seem to be a direct link.
| Search for: "Case Study: Serving Netflix Video Traffic at
| 400Gb/s and Beyond"
| Aissen wrote:
| How do you deal with the higher power density of these servers
| that need to be placed at the ISP locations? Don't they have
| some constraints for the Open Connect machines?
| Bluecobra wrote:
| Delivering power is not the problem, cooling is. You can load
| up a cabinet with four 60A PDUs (~50kW) but the challenge is
| to cool all that hardware you packed in the cabinet.
| Aissen wrote:
| Yeah, I was including that in the budget (server fans), but
| technically you're correct, DC cooling is powered separately.
| kkielhofner wrote:
| The Dell R7525 chassis is available with dual 800W power
| supplies. General thinking for power supplies is that each
| power supply is connected to completely separate power
| distribution - independent cabling, battery backup, generators,
| and points of entry to the facility. In many cases it's also
| two different power grids. This is so that if one power source
| fails anywhere the load can move over to the other power supply
| without exceeding the power that can be delivered through a
| single power supply or trip a breaker anywhere. Under normal
| operating conditions each power supply is doing half the load.
|
| Additionally, the National Electric Code in the US specifies
| that continuous load should not exceed 80% of given
| circuit/breaker capacity.
|
| So with dual 800W power supplies at "max" 80% load, that's
| "only" 640 watts for one of these 2U servers. For 208V power
| that's only ~3 amps. High density (for sure) compared to the old
| days
| but not as ridiculous as it may seem.
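|
| (The arithmetic: 800W x 0.8 = 640W continuous, and 640W / 208V
| is roughly 3.1A.)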
| Aissen wrote:
| You're right, it's not that much for 2U. But for this config,
| I think they'd probably go for the 1400W power supplies:
|
| - 16 x 25W SSDs
|
| - 2 x 225W CPUs
|
| - On top of that, add RAM, cooling, etc.
|
| Honestly, it's still manageable. I doubt they'd put 10 of
| those in a single rack (you'd need an ISP that would want to
| serve 2.2M subscribers at peak from a single location, not
| necessarily desirable on their side); but if the site is
| getting full, you'd feel the (power) pressure (slowly)
| mounting.
| kkielhofner wrote:
| I didn't dig into the CPU config, etc but you're right
| they'd probably go for the 1400W power supplies which is
| 5.4 amps max at 208V. It's for an older config (based on
| the other specs) but the current Netflix OpenConnect docs
| call for 750W [0], which is more reasonable even for this
| hardware configuration because no one really wants to
| consistently run their power supplies at 80% (even in
| branch loss) for obvious reasons.
|
| They absolutely wouldn't want to concentrate them. The
| entire purpose is to reduce ISP network load and get as
| close to the customer eyeballs as possible. I don't have
| any experience with these but I imagine ISPs would install
| them at their peering "hubs" in major cities - in my
| experience the usual suspects like Chicago, NYC, Miami,
| etc.
|
| [0] - https://openconnect.zendesk.com/hc/en-
| us/articles/3600345383...
| gmm1990 wrote:
| Why use/make processors with NUMA if you have to go to all this
| trouble not to use it?
| 0x457 wrote:
| Well, the point of NUMA is to allow you to do things like in the
| slides, rather than have everyone suffer equally talking to the
| north bridge. The fabric between NUMA nodes isn't the selling
| point; the fast, direct connection between each CPU and its
| local components is.
|
| Plus, not every workload is: read from disk -> encrypt -> send
| to NIC.
| 0x500x79 wrote:
| Awesome feats of engineering here, taking hardware and software
| into account when designing the system: a holistic approach to
| serving content as quickly as possible!
|
| The slide deck background though: At least half of the products
| in the slide deck template are no longer on Netflix...
| Joel_Mckay wrote:
| How would this compare with 42 server slots running 100Gbps DRBD
| in RAID 0? If I recall, it can pre-shard the data based on a
| round-robin balancer. ;)
| drewg123 wrote:
| We don't consider solutions like DRBD that introduce inter-
| dependencies between servers. Any CDN server has to be able to
| fail and not take down our service.
| ajross wrote:
| So... the driver and device level seems happy here, but is anyone
| else creeped out by "asynchronous sendfile()"? I mean, how do you
| even specify that? You have a giant file you want dumped down the
| pipe, so you call it, and... just walk away? How do you report
| errors? What happens to all the data buffered if the other side
| resets the connection? What happens if the connection just
| stalls?
|
| In synchronous IO paradigms, this is all managed by the
| application with whatever logic the app author wants to
| implement. You can report the errors, ping monitoring, whatever.
|
| But with this async thing, what's the API for that? Do you have
| to write kernel code that lives above the driver to implement
| devops logic? How would one even document that?
|
| +1 for the technical wizardry, but seems like it's going to be a
| long road from here to an "OS" feature that can be documented.
| 0x457 wrote:
| Here is the announcement for the feature:
| https://www.nginx.com/blog/nginx-and-netflix-contribute-new-...
|
| And here are the slides explaining it:
| https://www.slideshare.net/facepalmtarbz2/new-sendfile-in-en...
|
| There are videos of various talks by Gleb Smirnoff explaining
| all this magic on YouTube.
|
| The feature is fully documented in `man 2 sendfile`, it was
| part of the patch that did the work.
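|
| In userland it looks like an ordinary non-blocking sendfile(2)
| call. A rough sketch per the man page (flag choices and error
| handling are illustrative):
|
|     #include <sys/types.h>
|     #include <sys/socket.h>
|     #include <sys/uio.h>
|     #include <errno.h>
|
|     /* Queue up to nbytes of file_fd on non-blocking socket s.
|      * The call returns without waiting for disk: pages still
|      * being read in sit on the socket buffer as "not ready"
|      * mbufs until the I/O completes inside the kernel. */
|     ssize_t queue_chunk(int file_fd, int s, off_t off,
|                         size_t nbytes)
|     {
|         off_t sent = 0;
|
|         if (sendfile(file_fd, s, off, nbytes, NULL, &sent,
|                      SF_NODISKIO | SF_NOCACHE) == -1 &&
|             errno != EAGAIN)
|             return -1;        /* hard error: drop connection */
|         return (ssize_t)sent; /* partial send is normal: wait
|                                  on EVFILT_WRITE, call again */
|     }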
| zackmorris wrote:
| This was my thought too. I've been struggling with the concept
| of "if you don't have anything nice to say, don't say anything
| at all" lately, because I've been programming too long and just
| see poison pills and better alternatives everywhere I look.
|
| But I believe that async is an anti-pattern. From the article:
|
|   * When an nginx worker is blocked, it cannot service other
|     requests
|
|   * Solutions to prevent nginx from blocking like aio or thread
|     pools scale poorly
|
| Nothing against nginx (I use it all the time, it's great) but I
| probably would have used a synchronous blocking approach. The
| bottleneck there would be artificial limits on stuff like I/O
| and the number of available sockets or processes.
|
| So... why isn't anyone addressing these contrived limits of sync
| blocking I/O at a fundamental level? We pretend that context
| switching overhead is real, but it's not. It's an artifact of
| poorly written kernels from 30+ years ago (especially in
| Windows) where too many registers and too much thread state
| must be saved while swapping threads. We're basically all
| working around the fact that the big players have traditionally
| dragged their feet on refactoring that latency.
|
| And that some of the more performant approaches like atomic
| operations using compare and swap (CAS) on thread-safe queues
| beat locks/mutexes/semaphores. And that content-addressable
| memory with multiple busses or even network storage beats
| vertical scaling optimizations.
|
| So I dunno, once again this feels like kind of a drink-the-
| kool-aid article. If we had a better sync blocking foundation,
| then a simple blocking shell script could serve video and this
| whole PDF basically goes away. Rinse, repeat with most web
| programming too, where miles-long async code becomes a single
| deterministic blocking function that anyone can understand.
|
| I'm kind of reaching the point where I expect more from big
| companies to fix the actual root causes that force these async
| workarounds. I kind of gave up on stuff like that over the last
| 10 years, so am behind the times on improvements to sync
| blocking kernel code. I'd love to hear if anyone knows of an OS
| that excels at that.
| 0x457 wrote:
| Slide 25 shows a benchmark between "old" sendfile and "new"
| sendfile:
|
| https://www.slideshare.net/facepalmtarbz2/new-sendfile-in-
| en...
|
| > but I probably would have used a synchronous blocking
| approach.
|
| Well, send a patch, then.
| wmf wrote:
| _I probably would have used a synchronous blocking approach_
|
| Then Varnish is probably more your style. (A discussion
| between phk and drewg would be fascinating to watch.)
|
| _We pretend that context switching overhead is real, but it's
| not._
|
| This sounds crackpot to be honest. Linux has put a lot of
| effort into optimizing context switching (that's why they
| have NPTL instead of M:N) and I assume FreeBSD has as well.
|
| _...this whole PDF basically goes away_
|
| Sync vs. async doesn't solve any of the NUMA or TLS issues
| that this whole PDF is about.
| drewg123 wrote:
| This has been a feature upstream in FreeBSD for roughly 6
| years.
|
| If there is a connection RST, then the data buffered in the
| kernel is released (either freed immediately, or put into the
| page cache, depending on SF_NOCACHE).
|
| sendfile_iodone() is called for completion. If there is no
| error, it marks the mbufs on the socket buffer holding the
| pages that were recently brought in as ready, and pokes the TCP
| stack to send them. If there was an error, it calls TCP's
| pru_abort() function to tear down the connection and release
| what's sitting on the socket buffer. See
| https://github.com/freebsd/freebsd-src/blob/main/sys/kern/ke...
| mrbonner wrote:
| My work requires me to deal with political crap all day long to
| get promoted to a staff role. I miss this kind of work.
| ryanianian wrote:
| > deal with political crap day long to get promoted to a staff
| role
|
| That's largely what many staff+ engineers have to do, even in
| otherwise healthy organizations. "Staff" isn't a glorified,
| autonomous, and stress-free version of senior at most
| companies. There's nothing wrong with staying at the senior
| level indefinitely provided (1) the pay and other factors are
| keeping up with your contributions and (2) the staff+ and
| management folks are being effective umbrellas for the politics
| and messy uninteresting details behind interesting problems
| like this.
| touisteur wrote:
| Will be fun to see what can be done with PCIe 5 stuff and new
| 400G NICs. Really amazed by the recent increase in bandwidth.
| SFP56 is recently becoming 'mainstream' in datacenters, with
| 200G controllers at <1500 each; you can stuff 8 or 10 of those
| in your server. And you get an immediate 2x with the next gen.
| If you can offload some of the heavy work to one (or several)
| GPUs or these FPGA accelerator boards (Alveo, or more niche but
| also crazy, ReflexCES with 800G Ethernet capability) you're
| really starting to get a 'datacenter in a box' system. If
| density is important, the next few years are going to be very
| interesting.
| jwmoz wrote:
| I've wondered how they achieve it and it's so far beyond my
| knowledge and skills, truly astounding. The level of expertise
| and costs must be so high.
| Aeolun wrote:
| Spend a few years just thinking of how to optimize video
| delivery and you'd be a lot closer to understanding :)
| throw0101c wrote:
| Previous discussions on 400Gb/s:
|
| * https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-...
|
| * https://news.ycombinator.com/item?id=28584738
| skynetv2 wrote:
| This is amazing work. I can't help but note that we have been
| doing these things in HPC environments for at least 15 years:
| userspace networking, offloads, NUMA-aware scheduling, jitter
| reduction ... great to see it being put to good use in more
| mainstream workloads. Goes to show: software is eating the
| world.
| drewg123 wrote:
| I worked in HPC as well, and I have to point out emphatically
| that this _IS NOT USERSPACE NETWORKING_. The TCP stack resides
| in the kernel. The vast majority of our CPU time is system +
| interrupt time.
| phantomathkg wrote:
| > Serve only static media files
|
| This part is weird to me. My understanding is DRM locks files at
| a per-user level, so my DRM-encrypted chunk would be different
| from yours. Unless, for all bitrates and all streaming formats,
| Netflix has already pre-computed everything, there must be some
| sort of pre-computation before a file can be served over TLS.
| wmf wrote:
| That's not how DRM works. The content is encrypted once and
| that key is sent to the client. The content key is probably
| wrapped in some per-session key (which may be wrapped in a per-
| user key wrapped in a per-device key or something).
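|
| A toy sketch of that kind of key wrapping (nothing to do with
| Netflix's actual DRM scheme; OpenSSL AES-GCM, all names
| illustrative):
|
|     #include <openssl/evp.h>
|
|     /* Encrypt the 16-byte content key under a per-session key,
|      * so the bulk content itself only ever has to be encrypted
|      * once. Returns 1 on success. */
|     int wrap_content_key(const unsigned char session_key[16],
|                          const unsigned char iv[12],
|                          const unsigned char content_key[16],
|                          unsigned char wrapped[16],
|                          unsigned char tag[16])
|     {
|         EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
|         int len, ok = 0;
|
|         if (ctx &&
|             EVP_EncryptInit_ex(ctx, EVP_aes_128_gcm(), NULL,
|                                session_key, iv) &&
|             EVP_EncryptUpdate(ctx, wrapped, &len,
|                               content_key, 16) &&
|             EVP_EncryptFinal_ex(ctx, wrapped + len, &len) &&
|             EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG,
|                                 16, tag))
|             ok = 1;
|         EVP_CIPHER_CTX_free(ctx);
|         return ok;
|     }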
| paxys wrote:
| I love technical content like this. Not only is it incredibly
| interesting and informative, it also serves as a perfect
| counterpoint to the popular "_why does Netflix need X thousand
| engineers, I could build it in a weekend_" sentiment that is
| frequently brought up on forums like this one.
|
| Building software and productionizing/scaling it are two very
| different problems, and the latter is far more difficult.
| Running a successful company always requires a large number of
| very smart people who are willing to get their hands dirty
| optimizing every aspect of the product and business. Too many
| people today think that programming starts and ends at pulling
| a dozen popular libraries and making some API calls.
| Aaronmacaron wrote:
| I think the problem is that the "easy" parts of Netflix, such as
| the UI or the recommendation engine, seem like they were hacked
| together over a weekend. Of course deploying and maintaining
| something of the scale of Netflix is incredibly hard. But if
| they can afford thousands of engineers who optimize the
| performance, why can't they hire a few UI/UX engineers to fix
| the godawful interface, which is slightly different on every
| device? I think this is where the sentiment stems from.
| stackbutterflow wrote:
| That's what puzzles me about Uber. I believe that behind the
| scenes it does pretty complex things as explained many times
| on HN, but it's the worst app I've ever used. UI and UX wise
| it's so bad that if you told me it was a bootcamp graduation
| project I'd have no problem believing you.
| bradstewart wrote:
| I honestly find Netflix's the easiest to navigate, by far.
|
| Hulu did that big redesign, and it's extremely pretty to look
| at, but even after a few years of trying to use it, I _still_
| struggle to do anything other than "resume episode". Finding
| the previous episode, list episodes, etc is always an
| exercise in randomly clicking, swiping, long pressing,
| waiting for loading bars, etc.
|
| One thing Netflix _really_ got right as well: the "Watch It
| Again" section. So many times I want to rewatch the episode I
| just "finished" (because either my wife finished a show when
| I leave the room, the kids fell off the table, I fell asleep
| or wasn't paying attention, etc), and every other platform
| makes this extremely difficult to find.
|
| Back to Hulu--the only way I know how is the search feature,
| which is a PITA with a remote.
| Shaanie wrote:
| I'm surprised you think Netflix's UI and UX is that poor.
| Which streaming service do you think does a better job?
| hugey010 wrote:
| None of them, since they basically all copied Netflix! The
| grid view limits users to slowly looking over limited
| categories of content. Any list based tree structure would
| be better in my opinion.
| zasdffaa wrote:
| Why do you say "UI and UX"; how are they different in your
| view?
|
| Jargon BS is invading people's heads and it has to stop.
| brabel wrote:
| Not OP, but I think the Swedish TV streaming service has a
| simpler yet nicer UX (hope you can at least see it from your
| country, even if you can't play the content):
| https://www.svtplay.se/
|
| Admittedly, it follows the same pattern as Netflix, but I
| like how it's more responsive and feels way simpler/lighter.
| paxys wrote:
| You just linked to an exact copy of Netflix.
| NavinF wrote:
| > godawful interface which is slightly different on every
| device
|
| Which devices are you referring to? I've only used the PC and
| mobile interfaces both of which are quite pleasant.
| paxys wrote:
| Technically speaking I think Netflix's UX blows every other
| streaming app out of the water. It loads instantly, scrolling
| is smooth, search is instant. Buttons are where you'd expect
| and do what you expect. They have well-performing and up-to-
| date apps for every conceivable device and appliance. They
| support all the latest audio and video codecs.
|
| This is all in stark contrast to services like HBO Max and
| Disney+ which still stutter and crash multiple times a day.
| Amazon for some reason treats every season of a TV show and
| HD/SD versions of movies as independent items in their
| library. I still haven't been able to download an HBO Max
| video for offline viewing on iOS without the app crashing on
| me at 99%.
|
| The problems you mention with Netflix are real, but they have
| more to do with the business side of things. Netflix
| recommendations seem crap because they don't have a lot of
| third party content to recommend in the first place. Their
| front page layout is optimized to maximize repetition and
| make their library seem larger. They steer viewers to their
| own shows because that's what the business team wants. None
| of these are problems you can fix by reassigning engineers.
| P5fRxh5kUvp2th wrote:
| My complaints about Netflix's UI/UX aren't technical in
| nature. I agree with you that their player is the best out
| there, hands down.
|
| The issue is the business policies surrounding it. The UI
| itself is user-hostile.
| rurp wrote:
| > Buttons are where you'd expect and do what you expect.
|
| Wait, what? Netflix is the absolute worst at this. Every
| time I log in the interface is different! Netflix could not
| care less about users having a consistent seamless
| experience.
|
| But as far as performance goes, I totally agree with you.
| The performance is impressively good and noticeably better
| than the other streaming apps I use.
|
| The UX is just so bad in so many ways (UI churn, autoplay,
| useless ratings, useless categories, recaps that can be
| watched exactly once, and so on...) it mostly ruins the app
| for me. The actual video quality is great though.
| 0x457 wrote:
| Interface is the same, order of rows is different. Yes,
| it sucks. However, other streaming apps are much worse:
|
| - by the time HBO Max finishes loading, I've already lost
| interest
|
| - Amazon Prime constantly gives me errors, and it's often
| hard to find what you paid for and what you have to pay
| for
|
| - Paramount+ often restarts episodes from the beginning instead
| of resuming.
|
| - Many leave shit in your queue with a few seconds left
| for you to "Continue Watching". I still have shows in
| Paramount+ that I finished months ago in the queue, and there
| is no way to delete them without watching the end credits.
|
| - HBO Max only allows you to FF in small fixed intervals.
|
| - Plex... used to be okay, now it's pushing its streaming
| services and works very badly offline.
|
| - Apple TV has an awful offline experience compared to
| Netflix in terms of UX.
|
| Nah, I will take Netflix constantly changing rows over the
| shit others do.
| xnx wrote:
| > Building software and productionalizing/scaling it are two
| very different problems, and the latter is far more difficult.
|
| Is this claim based on some example I should know? Countless
| companies never achieve product/market fit, but very few I can
| think of fail because they weren't able to handle all their
| customers.
| ternaryoperator wrote:
| This! I am frustrated at how often devs will not accept that
| simple things become incredibly complicated at scale. That
| favorite coding technique? That container you wrote? Those
| tests you added? All good, but until you've tested them at
| scale, don't assert that everyone should use them. This dynamic
| is true in the other direction too: that techniques often taken
| for granted simply are not feasible in highly resource-
| constrained environments. With rare exception, the best we can
| say with accuracy is that "I find X works well enough in the
| typical situations I code for."
| geodel wrote:
| Seems to be mixing too many things here. Many scaling/hardware
| challenges need a lot of people, but it can still be true that
| Netflix is chock-full of engineers making half-assed turd Java
| frameworks day in and day out. I know this because we are
| forced to use these crappy tools, since they are made by
| Netflix and so supposed to be the best.
|
| It's just that they succeeded in a streaming market with low
| competition, and great success brings in a lot of post facto
| justification of how outrageously great Netflix's tech infra
| is.
|
| I mean, it may be excellent for their purpose, but to think
| their solution can be replicated industry-wide seems not true
| to me.
| paxys wrote:
| So Netflix published a framework which seemingly isn't
| suitable for your use case, your managers forced you to use
| it, and your response is to blame...Netflix?
| tankenmate wrote:
| Que? You don't seem to have much justification for your
| points; it seems more like a rant as you have had a bad
| experience using software provided by Netflix. It would be
| great if you could provide more details about what was wrong
| with it rather than just "we are forced to use these crappy
| tools". I'm genuinely interested.
|
| In my personal experience lots of companies (admittedly all
| large companies, but many of which sell their services /
| software / hardware to smaller companies) have a use for
| serving hundreds of Gbps of static file traffic as cheaply as
| possible. And the slides for this talk seem exactly on the
| money (again from my experience slinging lots of static data
| to lots of users).
| AtNightWeCode wrote:
| Scaling streaming for a company at the size of Netflix is very
| easy. You can use any edge cache solution, even homemade. The
| complexity at N seems to stem from other things.
| yibg wrote:
| This is exactly the type of comment OP is referring to. Have
| you built a streaming service at this scale? Do you actually
| know what's involved? Or are you just looking at the surface
| level, making a bunch of assumptions and reaching a gut-feel
| conclusion?
| n0tth3dro1ds wrote:
| >You can use any edge cache solution
|
| Umm, those solutions exist (from places like AWS and Azure)
| _because_ Netflix was able to do it without them. The cloud
| platforms recognized that others would want to build their
| own streaming services, so they built video streaming
| offerings.
|
| You have the cart in front of the horse. The out-of-the-box
| solutions of today don't exist without Netflix (and YouTube)
| building a planet scale video solution first.
| AtNightWeCode wrote:
| N had problems in US because they served data from CA.
| Today, N uses edge caching and the data for me in Europe is
| sent less than 10km to my home. And it should be cheap. We
| are talking about serving static content here. It is not
| very difficult.
| jedberg wrote:
| Why do you think Netflix served out of California? They
| only did that for the first few months, until they
| adopted Akamai, Limelight, and L3 CDNs. That was long
| before Netflix launched in Europe.
| AtNightWeCode wrote:
| Well, they used to. They tried to bully various ISPs into
| increasing their throughput before they jumped on the edge
| cache wagon, a long time after competitors. Akamai is a
| stellar company. I don't think N uses Akamai's services today.
| At the end of the day, N mostly serves static content to
| users, and I highly doubt that hardware cost is a very
| relevant parameter.
| jedberg wrote:
| With all due respect, you have no idea what you're
| talking about. I worked there during the transition from
| 3rd party CDNs to OpenConnect. We got off 3rd party CDNs
| in 2013/4 and operated solely out of OpenConnect, in
| large part because no 3rd party CDN was capable of
| serving our amount of video at any price, including
| Akamai. We weren't even streaming out of our own
| datacenter anymore by the time I started, and that was
| when streaming was still free with your DVD plan.
|
| And your timeline is all wrong too. Netflix didn't even
| engage with the ISPs about bandwidth until long after
| moving out of our own datacenter. We started the
| OpenConnect program specifically to make it easier for
| ISPs, there was no bullying. The spat you're thinking of
| is that Comcast didn't want to adopt OpenConnect but
| also didn't want to appropriately peer with other
| networks to give their customers the advertised speeds.
|
| And hardware cost is a hugely relevant parameter. Being
| efficient with hardware is the difference between
| profitable streaming at that scale and not profitable.
| AtNightWeCode wrote:
| You mean all the heat maps provided by Comcast and so on
| from 2014(?) are incorrect? That they lied about all the
| traffic from CA caused by N?
| jedberg wrote:
| Please link those heat maps. I think you're reading them
| wrong.
| 0x457 wrote:
| > any edge cache solution
|
| Someone still has to do the R&D for edge cache? These slides
| are about Open Connect - their own edge cache solution that
| gets installed in partners' racks (i.e. ISPs and exchanges).
| Before the things that Netflix and Nginx implemented in FreeBSD,
| hardware compute power was wasted on the various things they
| discuss in the slides.
|
| Yes, you can throw money at the problem and buy more
| hardware.
| AtNightWeCode wrote:
| Fair. Point taken. I answered the comment not the article.
| seydor wrote:
| I don't see the point. A centralized data hose is replacing
| what the internet was designed to be: a decentralized, multi-
| routed network. The problem may be useful to them, but it is
| unlikely to be useful to anyone who doesn't already work there.
| I dunno, if it were possible to monetize decentralized or
| bittorrent video hosting, I think it would solve the problem in
| a more interesting and resilient way. With fewer engineers.
|
| But it's like, every discussion today must end with something
| about the pay and head count of engineers.
| paxys wrote:
| While we are at it let's just put video streaming on the
| blockchain! Who needs all these engineers and servers.
| jedberg wrote:
| But only seven people can stream at once!
| RexM wrote:
| Once you download the chain you can watch anything you
| want! You'll have a local copy of _everything_
| oleganza wrote:
| I understand and even share a little bit of your sentiment,
| but I'm tired of stretched "X is now not what X was supposed
| to be".
|
| Strictly speaking, the Internet was supposed to help some
| servers survive and continue working together despite some
| others being destroyed by a nuke. That is more-or-less the
| case today: we see how people use VPNs to route around
| censorship. Whether you were supposed to stream TikTok videos
| directly from the phones of their authors or through a
| centralized data hose - i'm not sure that was ever the grand
| idea.
|
| Also "decentralized" and "monetize" don't go well together
| because innovation is stimulated by profit margins and rent-
| free decentralized solutions by definition have those margins
| equal to zero (otherwise the solution is not decentralized
| enough).
| jedberg wrote:
| It's funny you mention this. When I worked at Netflix, we
| looked at making streaming peer to peer. There were a lot of
| problems with it though. Privacy issues, most people have
| terrible upload bandwidth from home, people didn't like the
| idea of their hardware serving other customers, home hardware
| is flakey so you'd constantly be doing client selection, and
| other problems.
|
| So it turns out decentralized multi routed is not a good
| solution for video streaming.
| gizajob wrote:
| Works great for storing pirated content though
| jedberg wrote:
| Usually you aren't live streaming your pirated content
| right off other people's boxes. You download it first and
| then view it. So you don't need every chunk available at
| just the right time.
| monocasa wrote:
| Popcorn Time worked pretty well with just that model;
| watching more or less immediately as it's downloaded in
| order from the swarm.
| jedberg wrote:
| Did you actually use Popcorn time? It got stuck all the
| time waiting for a chunk. Also, again, people sharing
| pirated content don't care about privacy and are happy to
| share their home hardware for other people to use. Paying
| customers care about that stuff.
| monocasa wrote:
| I have; it worked flawlessly for content that was
| decently seeded. And that's without the sorts of table
| stakes you'd expect for a streaming platform like the
| same content encoded at different bit rates, but chunked
| on the same boundaries so you can dynamically change
| bitrate as your buffer depletes.
|
| And I'm not sure most people actually care if their home
| hardware is being used for whatever by the service
| they're using, or else there'd be pushback on electron
| apps from more than just HN.
|
| The sense I always got from Netflix's P2P work was that
| it was heavily tied into the political battles wrt the BS
| arguments that Netflix should pay for peering with tier 2
| ISPs. Did this work there continue much after that
| problem went quieter?
| PaywallBuster wrote:
| Used it dozens of times, usually works fine for the
| popular content.
|
| Good quality, barely any buffering.
|
| The niche content may be too difficult for a "live"
| streaming experience.
| naet wrote:
| Somebody I know (cough) starts torrent downloads in
| sequential order after downloading the first and last
| chunk, and then opens the file in VLC while it is
| downloading.
|
| Works amazingly well for watching something front to back
| if your download speed is fast enough; you'd never know
| it wasn't being streamed. The hardest part is finding a
| good torrent for what you want to watch. Ironically the
| Netflix catalog is one of the most easily available to
| pirate, since people rip it directly from the web.
| seydor wrote:
| Recently I see a lot of people with very high upload
| speeds. Nobody is using them though, but nominally they are
| there.
| jedberg wrote:
| Sure, very recently. But all the other issues still
| apply. A real time feed from random people's machines is
| very difficult at best.
| seydor wrote:
| I've watched a lot of HBO (not available here) on Popcorn
| Time.
| chasd00 wrote:
| Wouldn't a peer-to-peer setup be a non-starter legally?
| ...or at least incredibly high-risk. I could see major ISPs
| complaining if Netflix used the upstream side of the ISP's
| customers for profit.
| jedberg wrote:
| Yes. :)
| tinus_hn wrote:
| No, Microsoft is doing the same thing and nobody cares.
| Just mention it in the small print of the agreement and
| offer a way to turn it off.
| onlyrealcuzzo wrote:
| To be pedantic, scaling by itself isn't _that_ difficult.
|
| Scaling cost-effectively is.
| sllabres wrote:
| Tell that to e.g. Tesla.
|
| From what I've read, they burned a lot of money and had large
| problems scaling nevertheless. Which I don't find too
| surprising, not because they are unable, but because it isn't
| easy to scale.
|
| From my experience and from what I read, scaling an
| organisation by roughly a power of ten is a large change and
| therefore likely a challenge. For _any_ technical process the
| boundary might not be strictly a power of ten, but I would say
| that scaling by a power of a hundred is a challenge if that
| scale has not already been reached by some process in your
| organisation.
| onlyrealcuzzo wrote:
| True.
|
| Scaling to - say - Paramount+ size should not be difficult
| if you're willing to pay AWS / Azure / GCP 10-100x what it
| would cost to serve it yourself (which in many cases
| actually makes sense).
|
| It's possible at Netflix's size, they couldn't just run on
| AWS anymore. Though, given enough lead time and a realistic
| growth curve - I'm sure it's feasible.
|
| Obviously scaling manufacturing is not a solved problem
| like (realistically) scaling network and compute usage.
| yibg wrote:
| Serving Netflix streaming traffic from AWS would be...
| unwise. For one, the bandwidth cost would be enormous even if
| they could handle it. And two, I doubt they could handle that
| much traffic.
| eru wrote:
| Yes, and No. At some point, even scaling at all would be
| hard.
|
| (Just like sending a human to Alpha Centauri is hard, even if
| you had unlimited funds.)
| Dylan16807 wrote:
| Like it how? Accomplishing a grand feat is nearly the
| opposite of scaling.
|
| If Netflix built out more slower servers, that would be
| acceptable scaling. I don't see any plausible scenario
| where that becomes too difficult. Even if they had billions
| of subscribers.
| toast0 wrote:
| Eh, sending a human to Alpha Centauri wouldn't be that
| hard... Although it would be difficult to know for sure if
| they arrived, and for ease of transport, you may want to
| send a dead human.
| kaba0 wrote:
| It depends entirely on the problem domain. Sure, it is more
| of a devops problem when the problem is trivially
| parallelizable, but often you have a bottleneck service (e.g.
| the database) that has to run on a single machine. No matter
| how many instances serve the frontend*, every call will
| still have to pass through that single machine.
|
| * after a certain scale
| le-mark wrote:
| > Too many people today think that programming starts and ends
| at pulling a dozen popular libraries and making some API calls.
|
| The needle keeps moving, doesn't it? A tremendous breadth of
| difficult problems can be effectively addressed by pulling
| together libraries and calling APIs today that weren't possible
| before. Today's hard problems are yesterday's impossibilities.
| The challenge for those seeking to make an impact is to dream
| big enough.
| ezconnect wrote:
| The basic problem is the same, pushing the hardware to its
| limits.
| whatshisface wrote:
| The basic problem is delivering value to someone.
| ezconnect wrote:
| Programmers are not passionate about delivering value to
| someone; that's the businessman's problem.
| bcrosby95 wrote:
| Not every programmer is passionate about the same thing.
| I got into this field because I love building things that
| make people's lives easier.
| echelon wrote:
| Sure.
|
| Anecdotal, but most of the people I've worked with as ICs
| couldn't give a damn about that. They want dollarydoos.
|
| One of the 10X-ers I know (they exist and are real) told
| me repeatedly how he'd much rather be doing his own
| thing. He hates the business needs.
| important and that's why he's dedicated to doing it. I'm
| surprised at how focused and good he is given his
| disposition, and I want to hire him when I scale my
| business more. Drive and passion are sometimes just
| spontaneous.
|
| An old CEO of mine even quipped that we were not family
| and that we were there to do a job. All true. Most of the
| people doing that job were only there for the money.
|
| Most jobs that drive sales and revenue simply aren't fun
| or rewarding. There's lots of infrastructural glue and
| scaling. Tiring, boring, monotonous work. 24/7 oncall
| work. The money is good, though.
| [deleted]
| [deleted]
| nwallin wrote:
| > the popular "why does Netflix need X thousand engineers, I
| could build it in a weekend" sentiment that is frequently
| brought up on forums like this one.
|
| I don't think that's a popular sentiment about Netflix.
| Twitter, Reddit, Facebook, yes, but Netflix, YouTube, Zoom, not
| so much.
| mihaic wrote:
| I don't think this actually answers why Netflix needs so many
| engineers. This seems like the sort of thing that one or two
| experienced engineers would spend a year refining, and it would
| turn out like this.
|
| This is the sort of impressive work that I've never seen scale.
| drewg123 wrote:
| Author here... Yes, most of this work was done by me, with
| help from a handful of us on the OCA kernel team at Netflix
| (and external FreeBSD developers), and our vendor partners
| (Mellanox/NVIDIA).
|
| With that said, we are standing on the shoulders of giants.
| There are tons of other optimizations not mentioned in this
| talk where removing any one of them could tank performance.
| I'm giving a talk about that at EuroBSDCon next month.
| tetha wrote:
| The way I've been putting it to people lately is: Never
| underestimate how hard a problem can grow by making it big. And
| also, at times, it is hard to appreciate how difficult
| something becomes if you haven't walked the path at least
| partially.
|
| Like, from work: hosting postgres. At this point, I very much
| understand why a consultant once said "You cannot make
| mistakes in a postgres of 10GB or 100GB with a dozen
| transactions per second". And he's right: give it some
| hardware, don't touch knobs except for 1 or 2, and that's it.
| The average application accessing our postgres clusters is
| just too small to cause problems.
|
| And then we have 2 postgres clusters with a dataset size of
| 1TB or 2TB peaking at like 300 - 400 transactions per second.
| That's not necessarily big or busy for what postgres can do,
| but it becomes noticeable that you have to do some things
| right at this point, and some patterns just stop working,
| hard.
|
| And then there are people dealing with postgres instances 100 -
| 1000x bigger than this. And that's becoming tangibly awesome
| and frightening by now, using awesome in a more oldschool way
| there.
| mlrtime wrote:
| Not only make it big, engineer it in a way that makes it
| profitable for the business.
|
| I'm sure there are many teams that could design such a
| network with nearly unlimited resources, but it is entirely
| different when you have profit margins.
| victor106 wrote:
| As someone once said "Big is different"
| Sytten wrote:
| I think a fair criticism would be how many engineers they have
| compared to their competitors. Disney+ is at a similar scale;
| can they do the same/similar job with fewer people? And
| considering Netflix pays top of market, how much does Disney
| spend on their engineering effort to get their result? Would
| Netflix benefit from just throwing more hardware at the problem
| vs paying more engineers 400-500k/y to optimize?
| paxys wrote:
| Disney (the company) has 20x the number of employees as
| Netflix, and just 2x the market cap (in fact they were
| briefly worth the same last year), ~2x the revenue and 2/5
| the net income. So Netflix is clearly doing something right.
| eru wrote:
| Perhaps they are just running different business models?
|
| Walmart's market cap per employee is probably much, much
| lower than Disney or Netflix, too. That doesn't mean
| Walmart is doing anything wrong.
| ziddoap wrote:
| > _Disney (the company) has 20x the number of employees_
|
| Is that all of Disney or just Disney+?
|
| It doesn't seem like that would be a useful statistic if
| that includes completely unrelated positions (e.g. does
| that 20x statistic include Disney employees working at
| Disney Land/World serving up hotdogs? Because they probably
| don't contribute much to the streaming service)
| briffle wrote:
| Netflix also has production studios they now own making
| content.
| thfuran wrote:
| Content like hotdogs at an amusement park?
| diab0lic wrote:
| https://bridgertonexperience.com/san-francisco/
|
| https://strangerthings-experience.com/
| scrlk wrote:
| Disney Streaming had 850 employees as of 2017 [0] (can't
| find any newer figures); LinkedIn is suggesting 1k-5k.
|
| [0] https://en.wikipedia.org/wiki/Disney_Streaming
| rybosworld wrote:
| That seems like a fair point if you just consider the video
| streaming. I know that Netflix wants to break into gaming.
| I'd imagine the bandwidth required for that is higher than
| streaming videos.
| jon-wood wrote:
| It's really not, especially if you look at their current
| model for doing so. Netflix at the moment are breaking into
| mobile gaming, which means the bandwidth requirements are
| placed on Apple/Google's app store infrastructure. I'd be
| surprised if Netflix don't have any sort of metrics
| gathering infrastructure to judge how much people are
| playing those games, but they're also likely reusing the
| same infrastructure used by Netflix video streaming for
| that, so the incremental increase in load may well be
| negligible.
| rybosworld wrote:
| I was referring to their plans for a game streaming
| service.
| jwmoz wrote:
| I watch Disney content sometimes and it constantly drops or
| freezes, you can see the difference in quality compared to
| Netflix.
| bmurphy1976 wrote:
| Yeah, you can totally see the difference. Netflix encoding
| looks like shit.
|
| I've done a lot of video processing professionally (the
| server side stuff, exactly what Netflix does) and Netflix
| is by far the worst of all the streaming providers. They
| absolutely sacrifice the quality of the video to save
| bandwidth costs in aggregate and it shows (or more
| accurately it doesn't show, all the fidelity is lost).
| mkmk wrote:
| Do you think it's worth it to pay the extra $5-10/month
| for premium quality?
| https://help.netflix.com/en/node/24926
| bradstewart wrote:
| Even the Premium 4k streams have surprisingly low
| bitrates and, occasionally, framerates. I dug out the
| blu-ray player the other day and was absolutely shocked
| how good things looked and, even more so, _sounded_ --the
| audio quality from Netflix (and most streaming services,
| really) is simply atrocious.
| jedberg wrote:
| Are you getting the best Netflix encodings? You might be
| getting worse quality because your ISP throttles Netflix.
| bagels wrote:
| Your isp may be throttling bandwidth for Netflix, leading
| to lower quality encodings being served to you. Comcast
| does this, for instance.
| nicce wrote:
| You can't make such conclusions from your own experience.
| It is one form of bias. There are many variables. For me it
| is the opposite, for example.
| iamricks wrote:
| Standing on the shoulders of giants: Netflix engineers didn't
| have blog posts from other companies on how to handle the
| scale they started facing. Facebook didn't have blog posts to
| reference when they scaled to 1B users. They pay for talent
| that has built systems that had not been built before, and
| they have seen a return on it, so they continue to do it.
| wowokay wrote:
| Hulu was around before Netflix
| gavin_gee wrote:
| Yeah, and have you seen the awful performance of Hulu? It's
| basically unusable. A poster child for underinvesting in
| the streaming platform.
| paxys wrote:
| Huh? Netflix predates Hulu by over a decade.
| msh wrote:
| Hulu was never Netflix scale. YouTube is a better
| example.
| birdyrooster wrote:
| Not even close. YouTube has orders of magnitude more
| content and vastly more users. Google Global Cache was
| the inspiration for Open Connect.
| jedberg wrote:
| Youtube is very different than Netflix from a technical
| problem perspective. They serve free videos to anyone
| around the world that are uploaded by users.
|
| It's closer to a live streaming problem than pre-encoded
| video like Netflix.
|
| Having worked at Netflix I can say that the YouTube
| problem is much more complex.
| why_only_15 wrote:
| I wonder what portion of Youtube's request traffic can be
| served with cache servers at the edge with a few hundred
| terabytes of storage. There's a very long tail, but I
| would guess a significant portion of their traffic is the
| top ~10,000 videos at any given moment.
| spockz wrote:
| There was a Google-organised hackathon on this topic:
| given a set of resources, locations, and (estimated)
| popularity, optimise for video load time by determining
| what should be moved to the cache, when, and where.
| Cerium wrote:
| Sure? "After an early beta test in Oct. of that year,
| Hulu was made available to the public on March 12, 2008--
| a year after Netflix launched its own streaming service."
|
| [1] https://www.foxbusiness.com/technology/5-things-to-
| know-abou...
| esotericimpl wrote:
| pclmulqdq wrote:
| The engineers are definitely cost-effective at this scale.
| They may be the highest-leverage engineers at the company in
| terms of $ earned from their efforts compared to $ spent. The
| improvements that come from performance engineers at large
| companies are frequently worth $10M/year/person or more.
|
| Most companies maintain internal calculations of these sorts
| of things, and make rational decisions.
| gregsadetsky wrote:
| Sorry for the tangent, but really curious to ask:
|
| When you say that companies maintain internal calculations
| of the benefits, would you say that it's (extremely
| roughly) something like: $10M benefit, need 5 core
| engineers + benefits + PM + testing lab etc etc -> we can
| spend up to $500k per eng give or take.
|
| Or is the $10M one number (that would be held somewhat
| secretly internally at the company) and the salaries mostly
| represents where the market is? Does the (salary) market
| take into account the down-the-line $10M value?
|
| Basically, could those engs negotiate to be paid more, or
| are they already sort of paid close to exactly what the
| group they're part of generates in terms of revenue?
|
| Thanks!
|
| --
|
| I see that you said $10M per person, not for the "network
| optimization group". Hmm. So it would be fair to say that
| the engs are definitely not paid according to the value
| they generate..? I wouldn't be surprised by that but just
| to confirm.
| pclmulqdq wrote:
| The simple fact is that you are not paid for the value
| you create. You are paid based on the salary you can
| demand. For performance engineers, $10
| million/year/person opportunities are kind of rare,
| meaning that you can't demand close to that. Your
| alternatives to big tech are things like wall street,
| which pay very well, so you can demand a higher salary
| (and/or higher level) than a normal engineer of your
| skill would get. However, this is nowhere near the value
| of the work.
| Hermitian909 wrote:
| Not OP, but 1 engineer -> 10M of benefit sounds right for
| my company.
|
| In terms of negotiation, it really depends on how
| differentiated your skills are. Short answer is that if
| you can convince management that it would be difficult to
| find other engineers who could deliver the optimizations
| you're delivering, yes, you have leverage.
| pclmulqdq wrote:
| This is exactly right about negotiation and your
| skillset. I have seen performance engineers in the right
| place at the right time get 10-20% of their benefit to
| the company (I have seen both $1 million/year
| compensation for line workers and $10+ million/year for
| very senior folks).
|
| Very highly skilled engineers in specific niches can
| basically price themselves like monopolists, because the
| company can easily figure out how much money they are
| leaving on the table by not hiring them. This is not like
| "feature work" engineers, whose value is very nebulous
| and unknown.
| donavanm wrote:
| If you are an employee there is little to no relationship
| between your output and your compensation. Employer
| employee relationships are based on the _cost to the
| employer_ to secure equivalent or better output.
|
| Secondly, yes $10M per employee of revenue or cash flow
| is pretty reasonable for similar companies. The
| prioritization is NOT "how many employees per $MM." The
| allocation is "what opportunity is the highest $MM return
| per available employee."
| toast0 wrote:
| > Would netflix benefit from just throwing more hardware at
| the problem vs paying more engineers 400-500k/y to optimize?
|
| Where the CDN boxes go, you can't always just throw more
| hardware. There's a limited amount of space, it's not
| controlled by Netflix, and other people want to throw
| hardware into that same space. Pushing 800gbps in the same
| amount of space that others do 80gbps (or less) is a big
| deal.
| [deleted]
| slillibri wrote:
| Disney bought a majority ownership in BAMTECH to build
| Disney+.
| entropie wrote:
  | For like a year I wasn't able to watch Disney+ in 4k via
  | Chromecast - stuttering every 10 seconds or so. I never had
  | problems like this with Netflix.
| criddell wrote:
| I guess you weren't a Comcast customer in 2014 trying to
| watch Netflix and getting low quality, stuttering video. At
| the time lots of people tried to frame it as a net
| neutrality issue but in the end I think it was a peering
| dispute that involved a third party.
|
| https://www.wsj.com/articles/SB1000142405270230483470457940
| 1...
| loopercal wrote:
| I think this just validates their points. Netflix has
| more engineers and 8 years of them building and fixing
| things, so they have fewer issues.
| rakoo wrote:
| Sure, if you set yourself an arbitrarily hard problem, it
| takes a lot to solve it. "How we dug a 100m pit without using
| machines in 2 days" is an incredible feat, but the constraints
| only serve those who set them.
|
| Serving large content has been solved for decades already. It's
| much easier and reliable to serve from multiple sources, each
| at their maximum speed. Want more speed ? Add another source.
| Any client can be a source.
|
| Netflix artificially restrains itself by only serving from
| their machines. It is a very nice engineering feat, but is
| completely artificial. As a user it feels weird to think of
| them highly when they could just have gone the easier road.
| zinclozenge wrote:
| How would you do it if you had much more modest scale
| requirements? Say a few thousand simultaneous viewers. I'm
| kicking around an idea for a niche content video streaming
| service, but I don't know much about the tech stacks for it.
| vagrantJin wrote:
| A few thousand?
|
| Just use Nginx and a backend lang of your choosing.
| zinclozenge wrote:
| Not even bother using a cdn?
| ev1 wrote:
| For low-traffic niche content that might not be a cache
| hit in the first place in every region?
|
| I wouldn't bother. Unless you use storage at the CDN -
| which is probably very not cost effective for you.
| rakoo wrote:
| Use bittorrent. Every viewer is also a source. The more
| people watch, the less your servers are loaded.
|
| Bittorrent is built towards "offline" viewing. Try Peertube
| for a stack that is more built for streaming and has
| bittorrent sharing built-in (actually webtorrent, because
| the browser doesn't speak raw TCP or UDP, but the idea is
| the same)
| jedberg wrote:
| The constraint is profit. Sure, with unlimited money you can
| just keep getting more and more servers. But that costs
| money. It would end up swamping any profit to be made.
|
| By creating this optimized system, it makes serving that much
| video _profitable_.
| rakoo wrote:
    | No, the constraint is _only you serve content_. But once
| the content is distributed, anyone else can also distribute
| it.
| jedberg wrote:
| I'm curious as to who you think would pay for the video
| if anyone could distribute it and watch it.
| yibg wrote:
| And break the profit and also probably legal constraints.
| Good job now you don't have a company anymore.
| dmikalova wrote:
| This just isn't true though. I worked at a relatively minor
| video streaming company and we overloaded and took down AWS
| CloudFront for an entire region. They refused to work with us
| or increase capacity because the datacenter (one of the ones
| in APac) was already full. This was on top of already
| spreading the load across 3 regions. We only had a few
| million viewers.
|
| We ended up switching to Fastly for CDN. There's something
| hidden here though that becomes a problem at Netflix size. We
| were willing to pay the cloud provider tax, and we didn't dig
| down into kernel level or storage optimizations because off
  | the shelf was good enough. At Netflix's scale, that adds up
| to millions of extra server hours you have to pay for if you
| don't do the 5% optimizations outlined in the article.
| rakoo wrote:
| You still have the same constraints: only you can serve
| content.
|
| The solution I'm talking about is bittorrent. The more
| people watch your content, the less your servers bear load.
| That is using the internet to its best potential instead of
| reverting back to the centralized model of the big shopping
| mall and its individual users.
| rvnx wrote:
| I think nobody said Netflix' infrastructure can be built in a
| weekend. However, the scale doesn't matter that much after a
| certain point, once the scaling "wall" has been pierced. If
| you are a biscuit factory that produces 100'000'000 biscuits
| per year or 500'000'000 biscuits per year, then the gap
| between 100M and 500M isn't all that impressive anymore, as
| it's mostly about scaling existing processes. However, if you
| turn a 1'000-biscuit shop into a 1'000'000-biscuit company,
| then it's very impressive.
| bmurphy1976 wrote:
| Nonsense.
|
| It's still impressive. A 5x increase at that scale can be a
| phenomenal challenge. Where do you source the ingredients?
| Where do you build the factories (plural because at that
| scale you almost certainly have multiple locations in
| different geographic locales subject to different regulatory
| structures). Where do you hire the people? How do you manage
| it? What about the storage and shipping and maintenance of
| all the equipment and on and on? How much do you do in house
| how much do you outsource to partners? What happens when a
| partner goes belly up or can't meet your ever increasing
| needs?
|
| Your comment is a great example of what the OP pointed out.
| jon-wood wrote:
    | My favourite example of this sort of extreme scaling issue
    | is the fact that McDonald's apparently declined to sell
    | products with blueberries in them because modelling showed
    | they'd have to buy the world's entire supply of blueberries
    | in order to do so.
| notamy wrote:
| I thought this was hyperbolic, so I looked into it:
|
| > _The menu team comes up with interesting ideas like
| including kale in salads. The procurement team and
| suppliers then try to get the menu team to understand the
| challenges. How do you bring kale to 14,000 restaurants?
| As one example, when they introduced Blueberry Smoothies
| in the U.S., McDonald's ended up consuming one third of
| the blueberry market overnight._
|
| https://www.forbes.com/sites/stevebanker/2015/10/14/mcdon
| ald...
|
| I couldn't find any other source to back it up, but still
| wow! That's an absurd number.
| menzoic wrote:
| McDonald's sells blueberry muffins
| indigodaddy wrote:
| So there is an extreme dearth of blueberries I guess
| compared to other food goods? I mean, McDs isn't taking
| over the entire supply of potatoes or chickens for
| example correct?
| zaroth wrote:
| I think the point is that the supply chains probably need
| upwards of years of time to adapt in some cases, you
| can't just turn on a recipe that needs a full cup of
| blueberries per serving on Monday and expect there to be
| a spare million cups of blueberries to be lying around
| the supply chain on Tuesday.
|
          | In the case of animal products, there are almost certainly
| major operations worldwide that have been built and
| financed purely to serve McDonalds demand. They probably
| have to even build these out well before entering some
| markets.
| UncleEntity wrote:
| They grow a lot of potatoes in the US. Last week I hauled
| a load of tater tots destined for McDonalds. I've hauled
| potato products for McDonalds quite often.
|
| They raise a lot of chickens in the US. I've hauled
| chicken nuggets or chicken breasts for McDonald's in the
| past quite often.
|
| I can't even tell you where they grow blueberries.
| belinder wrote:
| It's a good point, and I think it's an interesting
| comparison. Obviously improving by a factor of 1000 is better
| than improving by a factor of 5. But the absolute improvement
| is still 4 times larger. 400'000'000 extra biscuits is going
| to bring a lot more revenue than 999'000 biscuits
| paxys wrote:
| It's the exact opposite.
|
| Taking the software example, you can easily scale from 1 to
| 100 users on your own machine. You can handle thousands by
| moving to a shared host. Using off-the-shelf web servers and
| load balancers will help you serve a million+. From there on
| you'll have to spend a lot more effort optimizing and fixing
| bottlenecks to get to tens, maybe hundreds of millions. What
| if you want to handle a billion users? Five billion? Ten
| billion? It always gets harder, not easier.
|
| Pushing the established limits of a problem takes
| exponentially more effort than reusing existing solutions,
| even though the marginal improvement may be a lot smaller.
| Getting from 99.9% to 99.99% efficiency takes _more_ effort
| than getting from 90% to 99%, which takes more effort than
| getting from 50% to 90%.
|
| You never pierce the scaling wall. It only keeps getting
| higher.
| xuhu wrote:
| If you can serve 1K users with 10 employees, you can
| probably serve 1M users with 10k employees.
| kaba0 wrote:
      | And you can birth one baby in 3 months with 3 women,
      | right?
      |
      | To add something useful besides the snark: first of all,
      | there are hard physical limits, which are sometimes very
      | relevant in context (you really shouldn't try to
      | outcompete the speed of light, for example, which matters
      | in some high-frequency trading and infrastructure
      | projects). Then, you can increase headcount to any number
      | and still not produce, for example, a better compiler.
      | There are simply jobs that are more "serial" - the only
      | way to win at those is to employ the very best of the
      | field in a small team.
| xuhu wrote:
| No, just 3 babies in 9 months.
| sllabres wrote:
          | That won't help your customer expecting their 'baby'
          | after three months due to the increased mother-
          | workforce ;)
| kaba0 wrote:
          | You could deliver DVDs to Netflix subscribers as
          | well to achieve much bigger throughput, but I doubt
          | they would be as popular as they are right now :D
| beckingz wrote:
| Sneakernet!
|
| "Never underestimate the bandwidth of a station wagon
| full of tapes hurtling down the highway." -Andrew
| Tannenbaum
| bmurphy1976 wrote:
| That's too simplistic. What about the doctors and medical
| facilities and other supporting infrastructure? What
| about the baby food and medicine and clothing and
| supplies and what about the people to take care of the
| children? You think you can just keep throwing more women
| at a hospital having babies to infinity and not have any
| problems?
| Dylan16807 wrote:
| There's a limit on those things, but it might as well be
| infinity when you're trying to have 1 baby or 3 babies or
| 100 babies.
|
| 1M users and 10k employees is not in the range where you
| have crushingly impactful logistics.
| Dylan16807 wrote:
| But the goal wasn't better or faster. It was giving more
| customers the same service. You're talking about a
| completely different problem.
| beckingz wrote:
| Remember that global productivity usually does not scale
| with headcount!
|
| Each employee adds some overhead, which requires more
| employees... which requires more employees.
| Mavvie wrote:
| Sounds like the rocket equation! Perhaps big companies
| are rocket science?
| loopercal wrote:
| If you told McDonald's to double their number of McRibs
| produced next year that would be an incredible challenge to
| meet. They already sell enough that it affects the global
| pork market, it'd be insane for them to double their demand
| for pork. What about other supplies, would this result in a
| reduced burger demand? How can they ensure they can respond
| appropriately either way? They probably run near
| fridge/storage capacity, does increasing this mean they need
| to also increase storage at restaurants?
|
  | That's a 2x increase. Now do it again, and then half again,
  | for a 5x. It's crazy to say there's a "scaling wall" such
  | that once you "pierce" it, it's easy to scale up. It's the
  | opposite: McDonald's already knows how to supply and sell X
  | McRibs a year, but there's no company that's ever sold 5X
  | those McRibs, so they have to figure it out themselves.
| rkagerer wrote:
| There's an old rule of thumb that each order of magnitude
| increase (10x) brings a whole new set of challenges.
|
| Anecdotally I experienced this when scaling my software
| product from 1 --> 10 --> 100's --> 1000's etc. of users.
|
    | That's not to say 2x can't be a substantial challenge, as
| you pointed out. It gets harder (and IMO more fun) when
| you're at the bleeding edge of your industry.
| bombcar wrote:
| Part of it depends on if "build it five more times, again" is
| a viable strategy.
|
| Building five "Netflixes" with identical content is possible;
| the amount of content wouldn't change (it would decrease, the
| cynic says); you just need parallel copies of everything
| (servers, bandwidth, etc).
|
| The fun would come in syncing usernames, etc through the
| system.
|
| It's an entirely different class of problem compared to
| "acquire resource, convert it, sell it".
| zeroxfe wrote:
| > the gap between 100M and 500M isn't that impressive
|
| This is absolutely not true. The closer you are to peak
| performance, the harder it is to scale, and the returns
| diminish heavily. At many major tech companies, there's a
| huge amount of effort into just 1% - 5% optimizations --
| these efforts really require creative thinking and complex
| engineering (not just "scaling existing processes".) At the
| volumes these companies operate, even a 1% optimization is
| quite significant.
| carlhjerpe wrote:
| Aren't you contradicting yourself?
|
  | If you're at 100M users you're probably scaling
  | horizontally, so adding 5x more hardware shouldn't be a
  | problem.
  |
  | But when you're at 500M it suddenly makes sense to optimize
  | further: the capital saved will be the same percentage(ish),
  | but that percentage is now worth people's time.
  |
  | I know that we don't particularly care about power savings
  | in the DCs I've worked in, because they're relatively small,
  | while big tech will do all kinds of shenanigans to save a
  | couple of watts here and there, because it's worth it across
  | hundreds of thousands of servers.
| beoberha wrote:
    | Seeing scale issues as purely hardware-bound is incredibly
    | naive. Even in a case like streaming, if you're pushing
    | more bits through the wire, the increase in usage behind
    | that traffic growth likely also degrades the software
    | systems you have in place to support your service, and you
    | need to rearchitect them. Very few problems at that scale
    | can be solved by throwing more hardware at them.
| Quarrelsome wrote:
| > why does Netflix need X thousand engineers, I could build it
| in a weekend
|
| I would like to hope nobody asks that. Video is one of the,
| if not the, hardest data-plumbing use cases on the internet.
| dragontamer wrote:
| I'd say realtime communications is harder.
|
| A lot of these tricks being discussed here cannot be applied
| to Skype calls.
| OJFord wrote:
| Surely GP would agree, unless you mean even audio-only
| calls? Otherwise it's just an extra requirement(s) on top
| of 'video'.
| dragontamer wrote:
| The amount of transcoding needed to get a conference call
| up hurts my brain. If 20 people are talking on Skype, the
| server needs to receive those 20 streams, decode them,
| mix the audio together, recode the streams, and then
| broadcast it back out to all 20 people.
|
| I'm not a telecommunications guy, but I had some
| professors back in college explain how difficult and
| fundamental the research of "ma-bell" was from the 60s
| through 80s. I'm talking Erlang, C++, CLOS circuits, etc.
| etc. The innovations from Bell Labs are nearly endless.
|
| Telephone communications is one of the biggest sources of
| fundamental comp. sci research over the 1950s through
| 2000s.
| nordsieck wrote:
| > I'm talking Erlang, C++, CLOS circuits, etc. etc. The
| innovations from Bell Labs are nearly endless.
|
| A lot of innovations did come out of Bell Labs. But I'm
| pretty sure Erlang wasn't one of them.
| dragontamer wrote:
          | Oh, you're right. There seems to have been a glitch
          | in my memory somehow. Still, it's Ericsson, which is
          | telecommunications nonetheless.
| eru wrote:
| > Otherwise it's just an extra requirement(s) on top of
| 'video'.
|
| I'm not sure what you mean? Real time communication, both
| video and audio-only, have much lower latency
| requirements. You can't just buffer ahead when you have
| some spare bandwidth, like Netflix or YouTube can.
| OJFord wrote:
| Yeah that's what I'm kind of facetiously calling 'just an
| extra requirement'.
|
        | My point was intended to be that there are the same
        | challenges and more - but it's not something I've
        | thought about in depth (and certainly not had to work
        | with). It maybe wasn't a very good characterisation,
        | because it's not the same on the other side either -
        | there's no large file to serve, because at the start
        | of the call it doesn't exist yet, for example - so
        | perhaps I take it back.
| eru wrote:
| Yeah, jon-wood really did a great write-up of the
| challenges involved.
|
| In any case, it's hard to say what the 'greatest'
| engineering challenge is. You can make almost any kind of
| engineering really challenging, if you (or the market..)
| sets yourself a very low cost-ceiling.
| bombcar wrote:
| Video calling is "easier" in that the p2p option is
| workable for some variant of "Works".
|
    | It is _much much harder_ because you can't do the "cache
    | everything on the edge" solution. If storage was
    | infinitely cheap and small, Netflix could run their
    | entire business by sending a USB stick with _every single
    | movie/tv show they have_ on it encrypted to you, and
| everything would play locally. This is basically what
| they do with their edge servers/CDNs.
|
| You can't do that with video calls, because the
| video/audio didn't exist 1 millisecond ago.
| jon-wood wrote:
| Streaming pre-recorded video and streaming realtime video
| are almost entirely different use cases.
|
| Pre-recorded video streaming is, under the hood, really
| just a high-volume variant of serving up static web
| pages. You have a few gigabytes of file to send from the
| server its stored on to the device that wants to play
| back the video. As this presentation demonstrates that
| isn't trivial at scale, but the core functionality of
| sending files over the internet is what it was designed
| to do from day one. Because you can generally download
| video across the internet faster than it can be played
| back its possible to build up a decent sized buffer which
| allows you to paper over temporary variance in network
| performance without the customer noticing.
|
| Realtime video streaming has two variants. One to many
| Twitch style video streaming is relatively simple, since
| you can encode video into files and upload them to a
| server for people watching to download those files. This
| is how HLS streaming works, and most of the techniques
| Netflix use to optimise video delivery can also be
| applied here at the cost of adding latency between the
| event being streamed and people consuming it. That
| latency will often sit at about 30 seconds, and people
| generally find that acceptable.
|
| Skype style realtime video streaming is much harder.
| You're taking video from one person's camera, and then
| sending it over the internet to one or more people's
| device. You can't do any sort of pre-processing on that,
| or stage the video on servers closer to the consuming
| users, because you have no way of generating that video
| until the point your users decide to start talking to
| each other. Because you can't pre-stage that video you
| need to be able to establish a network route between the
| people on a call, potentially in an environment where
| none of the participants have any open connection from
| the internet directly to the device they're streaming
| from. Slight fluctuations in network performance can
| potentially degrade video delivery to the point of it
| being unusable. The most common route to deal with that
| is systems that attempt to establish a direct connection
| (ideally over a local network) between participants, and
| if that doesn't work going via relay servers operated by
| the software provider. These servers provide a single
| point on the internet all parties can connect to, and
| then allow passing packets as if they were all on the
| same network.
| jdyyc wrote:
| I work on a very technically trivial service at a large
| company.
|
| It's the kind of thing that people run at home on a raspberry
| pi, docker container or linux server and it consumes almost no
| resources.
|
| But at our organization this needs to scale up to millions of
| users in an extremely reliable way. It turns out this is
| incredibly hard and expensive and takes a team of people and a
| bucket of money to pull it off correctly.
|
| When I tell people what I work on they only think about their
| tiny implementation of it, not the difficulty of doing it at an
| extreme scale.
| RektBoy wrote:
| csmpltn wrote:
| At this point, they should've just gone for an in-house bare-
| bones operating system that supports the bare minimum: reading
| chunks from disk, encrypting them, and forwarding them to the
| NIC.
|
| Besides that, it seems like all of the heavy lifting here is
| done by Mellanox hardware...
| wmf wrote:
| FreeBSD is their "in-house" operating system since they modify
| it to do what they want.
| csmpltn wrote:
| But do they really need an entire operating system for what
| amounts to simply copying around chunks of data? I think they
| could've gone for some slim RTOS-ish solution instead: no
| user-mode, no drivers, bare minimum.
| wmf wrote:
| They're using the FreeBSD filesystem and network stack,
| both of which are significant amounts of code. I guess they
| could have tried the rump kernel concept but it sounds like
| a lot of work.
| drewg123 wrote:
| I worked on an OS like that once. The problem is with "all
| the other stuff" that you need to support that's outside
| the core mission of your OS. You wind up bogged down on
| each additional feature that you need to implement from
| scratch (or port from another OS with a compatible
| license). With FreeBSD, all this comes for free.
|
| We chose to use FreeBSD, and have contributed our code back
| to the FreeBSD upstream to make the world a better place
| for everyone.
| drewg123 wrote:
  | It's only doing the crypto. The VM system and the TCP stack are
| doing most of the heavy lifting, and are both stock FreeBSD.
| yrgulation wrote:
| This is innovation and proper engineering. They chose FreeBSD,
| which shows they are not afraid of solving actual hard
| problems that yield impressive results. These are the types of
| engineers I'd hire in a heartbeat - if I ever were to own a
| successful company.
|
| Simply following trends and doing what everyone else does leads
| to mediocre results and the assembly line type of work that most
| software development has become.
| faizshah wrote:
| I have a bit of a naive question. If TLS has this much overhead
| how do HFT and other finance firms secure their connections?
|
| I know they use a number of techniques like kernel bypass to get
| the lowest latency possible, but maybe they have explored some
| solution to this problem as well.
| shiftpgdn wrote:
| Mellanox cards or private links
| wyager wrote:
| TLS doesn't really add latency on top of TCP after you make the
| initial connection - it mostly adds a bit of extra processing
| overhead for encryption. HFT firms aren't usually encryption-
| bandwidth-constrained. I'm not actually sure if most exchange
| FIX connections or whatever actually run over TLS, but that
| would be reasonable.
| drewg123 wrote:
| TLS has the highest overhead when you're serving data at rest,
| like static files that are not already in the CPU cache. For
| serving dynamic data that is in the CPU cache, TLS offload
| matters a lot less. Our workload is basically the showcase for
| TLS offload.
| naikrovek wrote:
| i love (love) how everyone else who answered this question
| alongside you made what appears to be a complete stab in the
| dark guess while only you knew the answer.
|
| never be afraid to admit that you don't know something.
| guessing wrong is a much worse look than not answering at
| all.
| ddmitriev wrote:
| Trading connections that go over private links such as cross-
| connects between the firm's and the exchange's equipment within
| a colocation facility are not encrypted.
| AtlasBarfed wrote:
| HFT needs to be outlawed.
|
  | No exchange should allow trades to complete in less than 15
  | minutes, I would argue, and each trade should have a random
  | 1-15 minute delay pad on top of that.
|
  | HFT access only serves the larger financial firms, and is
  | used to do front-running and other basically-illegal tricks.
  | It provides an anti-competitive advantage to large firms in
  | markets that are supposed to be open-access and fairly
  | traded. And of course it leads to AI autotrading madness.
|
| I get that it keeps a lot of tech people very well compensated,
| but it is either in the service of unregulated fraud at worst
| and unfair advantage at best.
| jonahhorowitz wrote:
| Not really germane to the topic, but a financial transactions
| tax would effectively kill HFT without the complexity that
| you're suggesting.
| [deleted]
| theideaofcoffee wrote:
| This is a remarkable technical achievement that builds on all of
| its past work, as are the other updates from Netflix in the past
| with serving ever more traffic from a single box. That said, I
| still find it terrifying that so many users would be affected by
| a single machine going down, that blast radius is so huge!
|
| Do we know if the rates that these hosts serve actually make it
| into production? Or do they derate the amount they serve from a
| single host and add others?
| lanstin wrote:
| I think they buffer and if the stream has issues the client
| connects to another host. They have been doing chaos monkey for
| a long long time.
| drewg123 wrote:
| As I said in a parallel comment, this is a testbed platform to
| see what problems we'll encounter running at these speeds.
| Production hosts are single socket, and can run at roughly 1/2
| this speed.
|
| I regret that I've crashed boxes doing hundreds of Gb/s.
| Thankfully our stack is resilient enough that customers didn't
| notice.
| qwertox wrote:
| Discussion of the same presentation 11 months ago, when the
| title was 400Gb/s.
|
| https://news.ycombinator.com/item?id=28584738
|
| This was the video which was posted back then alongside the
| slides: https://www.youtube.com/watch?v=_o-HcG8QxPc
| ndom91 wrote:
| Video of this presentation available here:
| https://cdnapisec.kaltura.com/index.php/extwidget/preview/pa...
| forgot_old_user wrote:
| thank you!
| haunter wrote:
| As a total outsider, it looks like FreeBSD is the "silent" OS
| behind a lot of the big-money projects. Not just this; I
| recently learned it's also the base OS of the PlayStation 4
| and 5 systems. Is there a reason why FreeBSD is so popular?
| Just general reliability? And why not the other BSD projects?
| Also, one (like me) would assume Linux is behind all of these,
| but alas not.
| keewee7 wrote:
| The sysadmin experience on FreeBSD used to be more opinionated
| than on Linux. This was before most Linux distros adopted
| systemd.
|
| The reason companies like Sony and Apple pick FreeBSD is
| because they get an open source POSIX-compliant OS they can
| drastically modify down to the kernel level without having to
| open source their modifications.
| Thev00d00 wrote:
| Sony used it because they got an entire OS for free with no
| obligation to release the source.
| naikrovek wrote:
| Blame the GPL for this. The GPL is directly responsible for the
| livelihood of many/all BSD developers, and i could not be
| happier about that. Linux is overrated in a lot of ways.
| robocat wrote:
    | However, the GPL is irrelevant in this case because the
    | Netflix Open Connect appliance is not sold to ISPs. The GPL
| is only relevant if you are distributing GPL software e.g.
| Sony PlayStation.
| trunnell wrote:
| The OpenConnect team at Netflix is truly amazing and lots of fun
| to work with. My team at Netflix partnered closely with them for
| many years.
|
| Incidentally, I saw some of their job posts yesterday. If you
| think this presentation was cool, and you want to work with some
| competent yet humble colleagues, check these out:
|
| CDN Site Reliability Engineer
| https://jobs.netflix.com/jobs/223403454
|
| Senior Software Engineer - Low Latency Transport Design
| https://jobs.netflix.com/jobs/196504134
|
| The client side team is hiring, too! (This is my old team.)
| Again, it's full of amazing people, fascinating problems, and
| huge impact:
|
| Senior Software Engineer, Streaming Algorithms
| https://jobs.netflix.com/jobs/224538050
|
| That last job post has a link to another very deep-dive tech talk
| showing the client side perspective.
| bagels wrote:
| Slide says "we don't transcode on the server"
|
| Surely they transcode on some server? Maybe they just mean they
| don't do it on the same server that is serving bits to customers?
| naikrovek wrote:
| It seemed clear to me: they don't transcode on the server that
| is sending data to the viewer. Transcoding is done once per
| piece of media and target format combination, instead of on the
| fly as it is viewed.
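  | For illustration, encode-once means something like one
  | offline ffmpeg run per title and ladder rung, e.g. (a
  | hypothetical rung, not Netflix's actual pipeline):
  |
  |     ffmpeg -i source.mov \
  |       -vf scale=1920:1080 -c:v libx264 \
  |       -b:v 5M -maxrate 6M -bufsize 12M \
  |       -c:a aac -b:a 128k \
  |       title_1080p_5M.mp4
  |
  | The CDN servers then only ever ship the resulting static
  | files; no per-request encoding happens there.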
| a-dub wrote:
| i haven't looked yet but i'm going to guess: edge caching running
| on custom hardware with smart predictions and congestion control
| algorithms for determining what gets cached where and when.
| paxys wrote:
| Does anyone know where these servers are hosted? Certainly not
| AWS I imagine?
| kkielhofner wrote:
| As close to the eyeballs as possible. With OpenConnect[0] they
| are located in ISP facilities and/or carrier-neutral facilities
  | with access to a peering fabric (kind of the same thing, as
  | the OpenConnect Appliance is "hosted" by the ISP).
|
| It's a win-win. ISPs don't have to use their peering and/or
| transit bandwidth to upstream peers and users get a much better
| experience with lower latency, higher reliability, less
| opportunity for packet loss, etc.
|
| [0] - https://openconnect.netflix.com/en/
| amelius wrote:
| How many customers does that serve?
| Aissen wrote:
| At 15Mb/s for a start-quality 4k stream (5 times higher than
| the average ISP speed measured by Netflix), that serves 53k
| simultaneous customers.
|
| In the US, the fastest ISP for Netflix usage seems to be
| Comcast (https://ispspeedindex.netflix.net/country/us ), with
| an average speed of 3.6Mbps. That would serve an average of
| 222k simultaneous customers on a single server.
| samcrawford wrote:
| That 15Mb/s figure for 4K is out of date by a couple of
| years. They previously targeted a fixed average bitrate of
| 15.6Mb/s. They now target a quality level, as scored by VMAF.
| This makes their average bitrate for 4K variable, but they
| say it has an upper bound of about 8Mb/s. See
| https://netflixtechblog.com/optimized-shot-based-encodes-
| for...
| Aissen wrote:
| Yep, that's correct. It looks like Netflix forgot to update
| their support pages for this:
| https://help.netflix.com/en/node/306 .
| umanwizard wrote:
| What does start-quality mean?
| Aissen wrote:
          | Not much, see the sibling comment. It used to be the
          | minimum quality for enjoyable 4k (4k Blu-ray discs
          | have much higher bitrates with HEVC). But since
          | then, Netflix has heavily optimized their encoding,
          | greatly reducing the bandwidth needs.
| danielheath wrote:
| Video formats require more data for the first frame of each
| scene - subsequent frames can be encoded as transformations
| of the previous frame.
| [deleted]
| ksec wrote:
      | That depends on the content's bitrate. Netflix serves
      | video at bitrates anywhere from 2-18Mbps. If the average
      | were 10Mbps, that is roughly 80K customers per box.
| daper wrote:
| I have some experience serving static content and working with
| CDNs. Here is what I find interesting / unique here:
|
| - They are not using the OS page cache or any memory caching
| for that; every request is served directly from disks. This
| seems possible only when requests are spread between many NVMe
| disks, since a single high-end NVMe like the Micron 9300 PRO
| has a max 3.5GB/s read speed (or 28Gbps) - far less than
| 800Gbps. It looks like this works fine for long-tail content,
| but what about new hot content everybody wants to watch on the
| day of release? Do they spread the same content over multiple
| disks for this purpose?
|
| - Async I/O resolves the issue of an nginx process stalling on
| a disk read, but only after you've already opened the file.
| Depending on the FS, the number of files, other FS activity,
| and the directory structure, opening the file can block for a
| significant time, and there is no async open() AFAIK. How do
| they resolve that? Are we assuming the i-node cache contains
| all i-nodes and open() time is insignificant? Or are they
| configuring nginx with a large open-file cache? (See the
| config sketch after this list.)
|
| - TLS for streamed media became necessary because browsers
| started to complain about non-TLS content. But that makes
| things sooo complicated, as we see in the presentation (kTLS
| is 50% of CPU usage before moving to encryption offloaded by
| the NIC). One has to remember that the content is most
| probably already encrypted (DRM); we just add another layer of
| encryption / authentication. TLS for media segments makes so
| little sense IMO.
|
| - When you rely on encryption or TCP offloading by the NIC,
| you are stuck with what is possible with your NIC. I guess no
| HTTP/3 over UDP or fancy congestion-control optimization in
| TCP until the vendor somehow implements it in the hardware.
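| To make the open-file-cache question concrete, here is a
| sketch of the stock nginx knobs involved (the values are made
| up; I have no idea what Netflix actually sets):
|
|     sendfile       on;
|     tcp_nopush     on;
|     aio            on;  # POSIX AIO on FreeBSD
|     # Keep fds open so the blocking open() is paid rarely:
|     open_file_cache          max=100000 inactive=60s;
|     open_file_cache_valid    120s;
|     open_file_cache_errors   on;
|
| With a large open-file cache, the cost of a blocking open() is
| amortized across many requests for the same file.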
| mgerdts wrote:
| A Micron 9300 Pro is getting rather long in the tooth. They are
| using PCIe gen 4 drives that are twice as fast as the Micron
| 9300.
|
| My own testing on single socket systems that look rather
| similar to the ones they are using suggests it is much easier
| to push many 100 Gbit interfaces to their maximum throughput
| without caching. If your working set fits in cache, that may be
| different. If you have a legit need for sixteen 14 TiB (15.36
| TB) drives, you won't be able to fit that amount of RAM into
| the system. (Edit: I saw a response saying they do use the
| cache for the most popular content. They seem to explicitly
| choose what goes into cache, not allowing a bunch of random
| stuff to keep knocking the most important content out of cache.
  | That makes perfect sense and is not inconsistent with my
  | assertion that hoping a half-TiB cache will do the right
  | thing with 224 TiB of content is unrealistic.)
|
| TLS is probably also to keep the cable company from snooping on
| the Netflix traffic, which would allow the cable company to
| more effectively market rival products and services. If there's
| a vulnerability in the decoders of encrypted media formats,
| putting the content in TLS prevents a MITM from exploiting
| that.
|
| From the slides, you will see that they started working with
| Mellanox on this in 2016 and got the first capable hardware in
| 2020, with iterations since then. Maybe they see value in the
| engineering relationship to get the HW acceleration that they
| value into the hardware components they buy.
|
| Disclaimer: I work for NVIDIA who bought Mellanox a while back.
| I have no inside knowledge of the NVIDIA/Netflix relationship.
| ShroudedNight wrote:
  | Just from reading the specs (i.e. real-world details might
  | derail all of this):
|
| https://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2
|
| Given one can specify arbitrary offsets for sendfile(), it's
| not clear to me that there must be any kind of O(k > 1)
| relationship between open() and sendfile() calls: As long as
| you can map requested content to a sub-interval of a file, you
| can co-mingle the catalogue into an arbitrarily small number of
| files, or potentially even stream directly off raw block
| devices.
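  | A minimal sketch of that pattern on FreeBSD (error handling
  | elided; the function and layout are invented for
  | illustration):
  |
  |     #include <sys/types.h>
  |     #include <sys/socket.h>
  |     #include <sys/uio.h>
  |
  |     /* Send one requested byte range of a (possibly much
  |      * larger) on-disk file to the socket without copying
  |      * it through userspace. */
  |     int serve_range(int file_fd, int sock, off_t off,
  |                     size_t len)
  |     {
  |         off_t sent = 0;
  |         return sendfile(file_fd, sock, off, len,
  |                         NULL,  /* no header/trailer */
  |                         &sent, /* bytes actually sent */
  |                         0);
  |     }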
| drewg123 wrote:
| Responding to a few points. We do indeed use the OS page cache.
| The hottest files remain in cache and are not served from disk.
| We manage what is cached in the page cache and what is directly
| released using the SF_NOCACHE flag.
|
| I believe our TLS initiative was started before browsers
| started to complain, and was done to protect our customer's
| privacy.
|
| We have lots of fancy congestion optimizations in TCP. We
| offload TLS to the NIC, *NOT* TCP.
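  | For the curious, the userspace side of the kTLS handoff on
  | FreeBSD looks roughly like this (a simplified sketch based
  | on FreeBSD 13's sys/ktls.h; in practice OpenSSL does this
  | for nginx, and header/field names can vary by version):
  |
  |     #include <sys/socket.h>
  |     #include <sys/ktls.h>
  |     #include <netinet/tcp.h>
  |     #include <crypto/cryptodev.h>
  |     #include <stdint.h>
  |
  |     /* Hand the session keys from the TLS handshake to
  |      * the kernel. After this, sendfile() output on the
  |      * socket is encrypted by the kernel, or by the NIC
  |      * when it advertises TLS offload. */
  |     int enable_ktls_tx(int sock, const uint8_t *key,
  |                        const uint8_t *salt)
  |     {
  |         struct tls_enable en = {0};
  |         en.cipher_algorithm = CRYPTO_AES_NIST_GCM_16;
  |         en.cipher_key = key;
  |         en.cipher_key_len = 16;  /* AES-128-GCM */
  |         en.iv = salt;
  |         en.iv_len = 4;           /* TLS 1.2 implicit IV */
  |         en.tls_vmajor = TLS_MAJOR_VER_ONE;
  |         en.tls_vminor = TLS_MINOR_VER_TWO;
  |         return setsockopt(sock, IPPROTO_TCP,
  |                           TCP_TXTLS_ENABLE, &en,
  |                           sizeof(en));
  |     }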
| daper wrote:
| Can I ask if your whole content can be stored on a single
| server so content is simply replicated everywhere or there is
| some layer above that that directs requests to the specific
    | group of servers storing the requested content? I assume
    | the described machine is not just part of a tiered cache
    | setup, since I don't think nginx is capable of complex
    | caching scenarios.
| drewg123 wrote:
| No, the entire catalog cannot fit on a single server.
|
| There is a Netflix Tech Blog from a few years ago that
| talks about this better than I could:
| https://netflixtechblog.com/content-popularity-for-open-
| conn...
| eru wrote:
| Does the encryption in DRM protect the metadata?
| daper wrote:
    | AFAIK no. The point of DRM is to prevent recording /
    | playing the media on a device without the decryption key
    | (authorization). So the goal is different from TLS, which
    | is used by the client to ensure the content is authentic,
    | unaltered during transmission, and not readable by a man-
    | in-the-middle.
|
| But do we really need such protection for a TV show?
|
| "Metadata" in HLS / DASH is a separate HTTP request which can
| be served over HTTPS if you wish. Then it can refer to media
| segments served over HTTP (unless your browser / client
| doesn't like "mixed content").
| throw0101c wrote:
| > _But do we really need such protection for a TV show?_
|
| DRM may be mandated by the content owners. TLS gives
| Netflix customers privacy against their ISP snooping what
| they're watching.
| sam0x17 wrote:
| > But do we really need such protection for a TV show?
|
| What you watch can be a very private thing, especially for
| famous people.
| nextgens wrote:
| No, and it doesn't protect the privacy of the viewer either!
| saurik wrote:
| FWIW, neither does the TLS layer: because the video is all
| chunked into fixed-time-length segments, each video causes
| a unique signature of variable-byte-size segments, making
| it possible to determine which Netflix movie someone is
| watching based simply on their (encrypted) traffic pattern.
| Someone built this for YouTube a while back and managed to
| get it up to like 98% accuracy.
|
| https://www.blackhat.com/docs/eu-16/materials/eu-16-Dubin-I
| -...
|
| https://americansforbgu.org/hackers-can-see-what-youtube-
| vid...
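        | A toy sketch of the matching step (hypothetical
        | data; real attacks are fancier about windowing and
        | noise):
        |
        |     #include <stdlib.h>
        |
        |     /* Score how well an observed sequence of
        |      * encrypted segment sizes matches one known
        |      * title's fingerprint. Lower is better; run
        |      * this per title and take the minimum. */
        |     long score(const long *seen, const long *ref,
        |                int n)
        |     {
        |         long s = 0;
        |         for (int i = 0; i < n; i++)
        |             s += labs(seen[i] - ref[i]);
        |         return s;
        |     }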
| nightpool wrote:
| Did TLS 1.3 fix this with content length hiding? Doesn't
| it add support for variable-length padding that could
| prevent the attacker from measuring the plaintext content
| length? Do any major servers support it?
| drewg123 wrote:
| Author here. AMA
| sam0x17 wrote:
| > Sendfile
|
| Ah, so this is why everything stutters / falls apart when you
| switch subtitles on or off -- it has to access a whole
| different file and resume at the same place in that file I
| assume? I would think you would want the (verbal) audio
| separated out in a different file so it can be swapped out on
| the fly without re-initializing the video stream, and same
| thing with subtitle files? I'm just making some assumptions
| based on the behavior I've seen but would be cool to know how
| this works.
| drewg123 wrote:
| No, video and subtitles are separate files.
|
| I've never seen this bad behavior myself. Do you mind sharing
| the client you're using?
| ManWith2Plans wrote:
  | Do you have a link to video or audio for this presentation?
  | I probably don't speak for just myself when I say I would
  | love to see it.
| quux wrote:
| Someone else linked the video here:
| https://news.ycombinator.com/item?id=32520750
| w10-1 wrote:
| Thank you very much "drewg123"!
|
  | Future technology advances increasingly look like this:
  | complex work integrating hardware, OS fixes, and team
  | collaboration. People and teams and companies working
  | together, and contributing to shared resources like FreeBSD.
  | Tolerating mistakes at scale, giving credit where credit is
  | due, and all the other things that make respect real, which
  | creates the space to get things done.
|
| Most of us will never get close to these opportunities or
| contexts, but still it helps us advance our own
| technique/culture to observe and model your story. And perhaps
| you'll help new collaborators find you. All the best.
| Bluecobra wrote:
| What kind of tuning is done in the BIOS? Is that profile
| available to view to everyone? Are you using a custom BIOS from
| Dell?
| drewg123 wrote:
| Not much tuning needed to be done. The little that was is
| mentioned in the talk, and was basically to set NPS=1 and to
| disable DLWM, in order to be able to access the full xgmi
| interconnect bandwidth at all times, even when the CPU is not
| heavily loaded.
| nh2 wrote:
| You mention AIO in nginx.
|
| In 2021 somebody submitted a patch for io_uring support in
| nginx:
|
| https://mailman.nginx.org/pipermail/nginx-devel/2021-Februar...
|
| I'm not sure if there has been further progress on it so far.
| In one comment feedback is "it doesn't seem to make the typical
| nginx use case much faster" [at that time].
|
| But I find this interesting, because io_uring can make almost
| all things async that can't be used async so far in Linux
| (open(), stat(), etc) and thus in nginx.
|
| Would io_uring integration in nginx be relevant for you?
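  | For reference, an async open() with liburing looks roughly
  | like this (a minimal sketch, not the actual nginx patch;
  | io_uring is Linux-only, while Netflix runs FreeBSD):
  |
  |     #include <fcntl.h>
  |     #include <liburing.h>
  |
  |     /* Queue an openat() on the ring instead of blocking
  |      * the worker; the fd arrives as a completion. */
  |     int async_open(struct io_uring *ring, const char *path)
  |     {
  |         struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
  |         struct io_uring_cqe *cqe;
  |         int fd;
  |
  |         io_uring_prep_openat(sqe, AT_FDCWD, path,
  |                              O_RDONLY, 0);
  |         io_uring_submit(ring);
  |         io_uring_wait_cqe(ring, &cqe); /* or reap later */
  |         fd = cqe->res;                 /* fd or -errno */
  |         io_uring_cqe_seen(ring, cqe);
  |         return fd;
  |     }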
| [deleted]
| throw0101c wrote:
| You're using ConnectX-6 Dx here. Any technical reason for that
| particular NIC, or just haven't gotten around to ConnectX-7s
| yet?
|
| Have you examined other NIC vendors? (Chelsio?)
| drewg123 wrote:
| This talk was given roughly 4 months ago. CX7 was not
| available yet. I'm looking forward to testing on them when we
| get some.
|
| We looked at Chelsio (as T6 was available well before
| CX6-DX). However, the CX6-DX offers a killer feature not
| available on T6. The CX6-DX can remember the crypto state of
| any in-order stream, while the T6 cannot. That means that the
| TCP stack can send, say, 4K of a TLS record, wait for acks,
| and come back 40ms later and send the next 4K _and DMA just
| the requested 4K from the host_. The T6 cannot remember the
| state, and would need to DMA the first 4K (which was already
| sent) in order to re-establish the crypto state, and then DMA
| the requested 4K. This could run the PCIe bus out of
| bandwidth. The alternative is to make TCP always chunk sends
| at the TLS record size, but this was horrible for streaming
| quality.
| phantomathkg wrote:
| > Serve only static media files
|
  | This part I don't get. What about DRM? Unless Netflix pre-
  | DRMs all content for all users?
| Bluecobra wrote:
| I would think that media files would be already encrypted and
| gets decrypted by the Netflix client. Otherwise the DRM could
| easily be defeated by using something like Wireshark.
| drewg123 wrote:
| Yes, all our content is also DRMed. Else somebody could
| easily pirate content..
| onedr0p wrote:
| To be fair, it already seems easily pirated. DRM is
| useless, if content is able to be viewed on some personal
| device it can be ripped and shared. I'd be curious how much
| effort/money companies dump into adding DRM measures, it
| seems like a lost cause. Maybe it just makes the execs
| sleep better at night.
| bri3d wrote:
| Encrypting assets on the fly using a per-consumer symmetric
| key would be prohibitively expensive, so I'm sure the media
| is stored pre-encrypted using a shared symmetric key.
|
| It only really matters that this key is unique per package,
| not per user, because once even a single user can compromise
| the trusted execution environment and extract either the key
| or the plain video stream, that piece of content is now
| pirated anyway. So, key reuse against the same content
| probably isn't really a major part of the threat model - this
| attacker could share the key with others, but they might as
| well share the decrypted content instead.
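    | A toy sketch of encrypt-once, serve-many with OpenSSL
    | (AES-CTR, as the MPEG-CENC 'cenc' scheme uses; this is
    | illustrative only, not any real DRM implementation):
    |
    |     #include <openssl/evp.h>
    |
    |     /* Encrypt a media segment once at packaging time;
    |      * every CDN node then serves the same ciphertext. */
    |     int package_segment(const unsigned char key[16],
    |                         const unsigned char iv[16],
    |                         const unsigned char *in, int n,
    |                         unsigned char *out)
    |     {
    |         EVP_CIPHER_CTX *c = EVP_CIPHER_CTX_new();
    |         int len = 0, ok;
    |         ok = EVP_EncryptInit_ex(c, EVP_aes_128_ctr(),
    |                                 NULL, key, iv) == 1
    |           && EVP_EncryptUpdate(c, out, &len, in, n) == 1;
    |         EVP_CIPHER_CTX_free(c);
    |         return ok ? len : -1;
    |     }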
| sophacles wrote:
| Every iteration of this prezzo I've seen over the years has
| made for a fascinating morning read, thanks!
|
| As much as I enjoy the results of the work, I'm always a bit
| curious how the sausage is made. Is pushing the hardware limits
| your primary job or something you do periodically? How do you
| go about selecting the gear you use? How much do you work with
| the vendors? (etc etc) I'd really enjoy a behind the scenes
| blog post or something wrt this serving absurd amounts of
| traffic from a single box.
| drewg123 wrote:
| My role is to make our CDN servers more efficient. One of the
| easiest and most fun ways to do that is to push servers as
| hard as I can and see what breaks and what doesn't scale. I
| also work with our hardware team and their vendors to
| evaluate new hardware and how it can fit into our system.
|
| But I do plenty of other things as well, including fixing
| random kernel bugs. You can read the git log of the FreeBSD
| main branch to see some of the things I've been working on..
| gopaz wrote:
  | What about the storage? Is it using RAID? Does block size
  | matter? What filesystem is used?
| drewg123 wrote:
| Every storage device is independent (no RAID), and runs UFS.
| We use UFS because, unlike ZFS, it integrates directly with
| the kernel page cache.
| pyrolistical wrote:
| When are you going to cut the CPU/main memory out completely?
|
  | The bottleneck is at your NIC anyway, so it seems like there
  | would be a market for a NIC that can read directly from disk
  | into the NIC's working memory.
| drewg123 wrote:
| We've looked at this. The problem is that NICs want to read
| in TCP MSS size chunks (1448 bytes, for example), while
| storage devices are highly optimized for block-aligned (4K)
| chunks. So you need to buffer the storage reads someplace,
| and for now the only practical answer is host memory. There
| are NVME technologies that could help, but they are either
| too small, or come at too large of a price premium. CXL
| memory looks promising, but its not ready yet.
| Matthias247 wrote:
      | Does it? I thought that with segmentation offloads the
      | NIC basically gets TCP stream data in more or less
      | arbitrary sizes, and then segments it into MTU-sized
      | packets on its own?
| drewg123 wrote:
| We do fairly sophisticated TCP pacing, which requires
| sending down some small multiple of MSS to the NIC, so it
| doesn't always have the freedom to pull 4K at a time.
| [deleted]
| bri3d wrote:
| At what point does it make sense to replace the CPU and OS with
| custom hardware and software? At this point the CPU is
| basically doing TCP state maintenance and DMA supervision, but
| not much else, right?
|
| I totally get the cost, convenience, and supply chain risk-
| value in commodity stuff that you can just go out and buy, but
| once you're bound to a single network card, this advantage
| starts to go away, and it seems like you're fighting with the
| entire system topology when it comes to NUMA, no? Why not a
| "TCP file send accelerator" instead of a whole computer?
| wmf wrote:
| I suppose you could attach NVMe drives directly to Bluefield
| and cut out x86.
| jalino23 wrote:
| I was specifically looking for what their tech stack for
| playback is. They pretty much have to use HLS for iOS Safari,
| right? Where do the manifest servers fit in? What about
| non-iOS browser playback?
| wly_cdgr wrote:
| alpb wrote:
| What's the benefit of going from 100Gb/s to 800Gb/s through
| kernel/hardware optimizations as opposed to adding more machines
| to meet the same throughput in this case? I'd be curious at what
| point returns on the engineering effort is diminishing in this
| problem.
| seabrookmx wrote:
| IIRC a lot of these boxes are deployed at actual ISP's so
| they're closer to customers. I'd imagine the rackspace is
| therefore limited and the more you can push from a single
| machine, the better.
| quotehelp1829 wrote:
  | I think it's quite obvious: instead of 8 machines you then
  | only need 1. This results in reduced costs for machinery,
  | storage (as each machine would have its own storage), and
  | probably power consumption too. Also, the same room of
  | servers can push 8 times more content.
|
| Edit: Whoops, apparently this tab has been open for four hours
| and of course someone already had responded to you, lol.
| wistlo wrote:
| This could be an answer as to why Netflix comes up reliably
| while all the other streaming services in my experience (Hulu,
| Disney, HBO Max, Amazon Prime) can take many times longer to
| initialize and deliver a stable stream.
| drewg123 wrote:
| To be honest, this has much more to do with Randall Stewart's
  | RACK TCP, and his team's obsession with improving our members'
| QoE. Ironically, this costs a lot of CPU as compared to normal
| TCP (since it is doing pacing, limiting TSO burst sizes, etc).
| https://github.com/freebsd/freebsd-src/blob/main/sys/netinet...
| OJFord wrote:
  | Of those I only have Prime, and really agree. It was never
  | as good, I don't think, but lately in particular it's been
  | _so_ slow to start (and then it's an advert! It'll do it
  | again for the actual content once I click 'skip'!) and it
  | occasionally pauses to buffer mid-stream.
|
| I don't get that with Netflix, I've occasionally had it crash
| out 'sorry this could not be played right now' (which is a
| weird bug itself - because it always loads fast & fine when I
| immediately press play on it again) but never such slow loading
| or pausing.
| _gabe_ wrote:
| This is incredible. I really like how you're able to trace the
| evolution of the systems as well.
|
| It makes me wonder what the next hardware revolution will be.
| It seems like most resource-intensive applications are
| bottlenecked on moving memory around. UE5's Nanite tech hinges
| on the ability to transfer memory directly from disk to GPU,
| Netflix leans on specialized NIC hardware to avoid copying
| memory between userspace and hardware, and I wonder how much
| other performance we're missing out on because we can't
| transfer memory fast enough.
|
| How much faster could AI training be if we could get memory
| directly from disk to the GPU and avoid the CPU orchestrating it
| all? What about video streaming? I have a feeling these processes
| already use some clever tricks to avoid unnecessary trips through
| the CPU, but it will be interesting to see which direction
| hardware goes with this in mind.
| aftbit wrote:
| This is definitely the direction that things are going. In the
| GPU space, see things like GPUDirect[1]. In networking and
| storage, especially for hyperscale stuff, see the rise of
| DPUs[2] replacing CPUs.
|
| 1: https://developer.nvidia.com/gpudirect
|
| 2: https://www.servethehome.com/what-is-a-dpu-a-data-
| processing...
___________________________________________________________________
(page generated 2022-08-19 23:00 UTC)