[HN Gopher] How Video Streaming Processing Works
       ___________________________________________________________________
        
       How Video Streaming Processing Works
        
       Author : dreampeppers99
       Score  : 258 points
       Date   : 2022-01-04 11:02 UTC (11 hours ago)
        
 (HTM) web link (howvideo.works)
 (TXT) w3m dump (howvideo.works)
        
       | pjc50 wrote:
       | Good top-level summary of an extremely complicated subject.
       | 
        | The containers diagram is surprising since the arrow means
        | "derived from" - the earlier formats are at the top, so I
        | initially thought the arrow was in the wrong direction.
        | Containers are kind
       | of a nightmare since there is a lot of hidden knowledge you need.
       | Many of the tools will let you put any video in any container,
       | but only certain combinations actually work properly across the
       | various devices and operating systems you will actually need to
       | use. It's easiest to stick to "H264 + AAC in an MPEG4 container".
       | 
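        | For illustration, a rough sketch (not from the article; it
        | assumes Node's child_process and an ffmpeg binary on PATH) of
        | normalising an arbitrary input to that combination:
        | 
        |   // Transcode any input to H.264 + AAC in an MP4 container
        |   // by shelling out to ffmpeg (assumed to be installed).
        |   import { execFileSync } from "node:child_process";
        | 
        |   function toCompatibleMp4(input: string, output: string) {
        |     execFileSync("ffmpeg", [
        |       "-i", input,
        |       "-c:v", "libx264",          // H.264 video
        |       "-c:a", "aac",              // AAC audio
        |       "-movflags", "+faststart",  // moov atom up front
        |       output,
        |     ]);
        |   }
        | 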
       | HLS is a horrible hack that also has the side effect of making it
       | harder to download and store videos.
        
         | GeneticGenesis wrote:
         | While HLS isn't the cleanest protocol (Yay for extensions of
         | the M3U playlist format...), it's actually really good at what
         | it's designed to do - provide reliable video streaming while
         | using HTTP/S over variable networks.
         | 
         | Ultimately, HLS isn't designed for downloading and storing
         | videos, it's designed for streaming.
        
           | beardedetim wrote:
            | Came to say this exact thing. HLS and its fancy brother
           | LLHLS aren't storage formats like MP4/FLV are. I think of HLS
           | as a playback format: I'd play a Playlist when watching a
           | VOD/livestream but I'd probably save it as an MP4.
        
         | dylan604 wrote:
         | >making it harder to download and store videos.
         | 
         | And all of the studios cheered! /s
         | 
         | But in reality, it hasn't, right? There are so many tools
         | available that will download a video as a single file for you
          | that it's not even an issue. You should try looking for the
          | Wizard with that strawman.
        
           | ajb257 wrote:
           | There's a difference between not being designed for
           | something, and being designed to prevent something. HLS isn't
           | designed to prevent saving, it's just optimised for
           | streaming.
        
             | danachow wrote:
              | The GP stated that HLS makes it harder to download and
              | store videos, which is mostly a dubious claim, but even if
             | it were true, in most cases where it's used this is (if
             | only hypothetically) a benefit since major stream providers
             | generally don't want their streams stored. The reply sounds
             | appropriate.
        
       | cogman10 wrote:
       | A quibble I have with this is introducing video as capturing RGB.
       | 
       | Digital video very rarely uses that color space. Instead, the YUV
       | family of colorspaces is far more common.
       | 
        | Further, the FPS figure is somewhat wonky. While computer games
        | and some British TV broadcast at 60 fps, most video is instead
        | at 20-30 fps.
        
       | arkj wrote:
       | This is a really good introduction to how video works. The only
        | thing I find missing from a production OTT environment is DRM.
        | Also, I assume when the author says "http" he means "https";
        | there are very few providers (I knew only one, and even they
        | used https termination by Akamai) who use "http" these days for
        | streaming, even with DRM.
       | 
       | If anyone is building an economically viable OTT platform they
       | should consider building their own CDN. Insiders tell me the
       | Singapore based OTT provider HOOQ went bankrupt paying huge bills
       | to AWS and Akamai.
        
         | capableweb wrote:
         | > Also I assume when the author says "http" he means "https"
         | there are very few (I knew only one and even they used https
         | termination by Akamai) providers who use "http" these days for
         | streaming even with DRM.
         | 
         | For what the guide is, I think HTTP is accurate enough, as
         | HTTPS is not its own protocol, it's always HTTP+TLS/SSL, so
          | just saying HTTP is good enough. The ones that need to add the
          | extension will know to do so. For video editors (and the rest
          | of this guide's target audience), it's more than likely
          | superfluous information.
         | 
         | Also, as far as I know, DRM and HTTPS have different use cases.
          | The DRM we see in browsers is about controlling playback on
          | the client device, not encrypting the content between the
          | media server and the client device, while HTTPS is the
          | opposite. So whether they are using DRM or not doesn't matter
          | when we talk about using HTTPS or not.
        
       | choxi wrote:
       | Does all the 1.7GB of the decoded video get copied to the GPU? Or
       | is there some playback controller that knows how to read the
       | "delta format" from the codecs and only copies deltas to the
       | framebuffer?
       | 
       | It still blows my mind that we can stream video at 60FPS. I was
       | making an animation app that did frame-by-frame playback and
       | 16.6ms goes by fast! Just unpacking a frame into memory and
       | copying it to the GPU seemed like it took a while.
        
         | pavlov wrote:
         | You shouldn't copy the frame data to the GPU (assuming that's
         | literally what your code was doing).
         | 
         | Instead create a GPU texture that's backed by a fixed buffer in
         | main memory. Decode into that buffer, unlock it, and draw using
         | the texture. The GPU will do direct memory access over PCIe,
         | avoiding the copy.
         | 
         | The CPU can't be writing into the buffer while the GPU may be
         | reading from it, so you can either use locks or double
         | buffering to synchronize access.
        
           | danachow wrote:
           | > Instead create a GPU texture that's backed by a fixed
           | buffer in main memory. Decode into that buffer, unlock it,
           | and draw using the texture. The GPU will do direct memory
           | access over PCIe, avoiding the copy.
           | 
              | With a dedicated GPU with its own memory, there is still
              | usually a memory-to-memory copy; it just doesn't have to
              | involve the CPU.
        
             | pavlov wrote:
             | Yeah, like so many other things in the GPU world, main RAM
             | texture storage is more of a hint to the graphics card
             | driver -- "this buffer isn't going away and won't change
             | until I explicitly tell you otherwise".
             | 
             | It definitely used to be that GPUs did real DMA texture
             | reads though, at least in the early days of dedicated GPUs
             | with fairly little local RAM. I'm thinking back to when the
             | Mac OS X accelerated window compositor was introduced --
             | the graphics RAM simply wouldn't have been enough to hold
             | more than a handful of window buffers.
        
         | pjc50 wrote:
         | Smart people chuck the _encoded_ video at the GPU and let that
         | deal with it: e.g. https://docs.nvidia.com/video-
         | technologies/video-codec-sdk/n... ; very important on low end
         | systems where the CPU genuinely can't do that at realtime
         | speed. Raspberry Pi and so on.
         | 
         | > 16.6ms
         | 
          | That's sixteen million nanoseconds; you should be able to issue
          | thirty million instructions in that time from an ordinary 2GHz
          | CPU. A GPU will give you several _billion_. Just don't waste
          | them.
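          | 
          | A browser-side analogue of the same idea (this is WebCodecs,
          | not the NVIDIA SDK linked above; the codec string and chunk
          | fields are placeholders): hand the still-encoded samples to
          | the platform decoder and let the hardware do the work.
          | 
          |   const decoder = new VideoDecoder({
          |     output: (frame) => {
          |       // draw the decoded frame (e.g. to a canvas), then
          |       // release it so the decoder can reuse the memory
          |       frame.close();
          |     },
          |     error: (e) => console.error(e),
          |   });
          |   decoder.configure({ codec: "avc1.42E01E" }); // H.264
          |   // for each demuxed H.264 sample:
          |   // decoder.decode(new EncodedVideoChunk({
          |   //   type: isKeyframe ? "key" : "delta",
          |   //   timestamp: ptsMicroseconds,
          |   //   data: sampleBytes,
          |   // }));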
        
           | cogman10 wrote:
           | Agreed. GPUs support decoding a wide range of codecs (even
           | though you are probably using something like H.264). So it
            | doesn't make sense to waste time decoding on the CPU and
            | then piping the raw frames out to the GPU.
        
       | no_time wrote:
       | What I'm curious about is the actual hardware behind video
        | sharing sites. Like how can Streamable, Reddit, and Twitter
        | encode such a massive amount of video at scale? Do they have GPU
        | farms? Dedicated encoding hardware that us mortals can't buy? I
       | left out YT on purpose because they have practically endless
       | money to throw at the problem.
        
         | VWWHFSfQ wrote:
          | The last time I checked the video metadata, Streamable is just
          | using ffmpeg.
        
         | ggregoire wrote:
         | Just for reference, I've got a server with a single RTX 2080
         | that reencodes 32 HD streams in parallel with NVENC and the GPU
         | is used at only 10%.
        
         | GeneticGenesis wrote:
         | Great question! The real answer is it varies, but for H.264,
          | most just encode in software right now, because GPUs are
         | expensive (especially in the cloud), and the failure rates are
          | really high (if you try to build your own). ffmpeg with libx264
          | is really fast on modern hardware with decent x86 extensions.
         | 
          | It's also worth noting that YouTube now builds its own
          | transcoding chips [1], and AWS just launched dedicated
         | transcoding instances based on Xilinx chips:
         | 
         | [1] https://arstechnica.com/gadgets/2021/04/youtube-is-now-
         | build... [2] https://aws.amazon.com/ec2/instance-types/vt1/
        
         | dmw_ng wrote:
         | Software can be done at utterly stupid multiples of realtime at
         | SD resolutions with only a few cores, depending on your quality
          | target. Cores are very cheap.
         | 
         | Fancy GPUs tend to support 8 or more HD streams, even consumer
         | cards using patched drivers.
         | 
         | Then you have dedicated accelerator hardware, these can pack a
         | tremendous amount of transcode into a tiny package. For example
         | on AWS you have vt1 instances which support 8 (or 16?)
         | simultaneous full HD/SD/QHD ladders at 2x realtime for around
         | $200/mo.
         | 
         | In answer to your actual question, at least YouTube selectively
         | transcodes using fancier/more specific methods according to the
         | popularity of the content. They do the cheap thing in bulk and
         | the high quality thing for the 1% of content folk actually
          | watch.
        
         | londons_explore wrote:
         | Nearly all big sites just use ffmpeg and no hardware
         | acceleration.
        
       | dceddia wrote:
       | Good overview of all the parts involved! I was hoping they'd talk
       | a little more about the timing aspects, and keeping audio and
       | video in sync during playback.
       | 
       | What I've learned from working on a video editor is that "keeping
       | a/v in sync" is... sort of a misnomer? Or anyway, it _sounds_
       | very "active", like you'd have to line up all the frames and
       | carefully set timers to play them or something.
       | 
       | But in practice, the audio and video frames are interleaved in
       | the file, and they naturally come out in order (ish - see
        | replies). The audio plays at a known rate (like 44.1 kHz) and
       | every frame of audio and video has a "presentation timestamp",
       | and these timestamps (are supposed to) line up between the
       | streams.
       | 
       | So you've got the audio and video both coming out of the file at
       | way-faster-than-realtime (ideally), and then the syncing ends up
       | being more like: let the audio play, and hold back the next video
       | frame until it's time to show it. The audio updates a "clock" as
       | it plays (with each audio frame's timestamp), and a separate loop
       | watches the clock until the next video frame's time is up.
       | 
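        | In sketch form (the names here are hypothetical stand-ins for a
        | real player's internals, not code from any particular player):
        | 
        |   interface DecodedFrame { pts: number; pixels: Uint8Array }
        |   const videoQueue: DecodedFrame[] = []; // decoded, PTS order
        | 
        |   function startRenderLoop(getAudioClock: () => number,
        |                            present: (f: DecodedFrame) => void) {
        |     const tick = () => {
        |       const now = getAudioClock(); // seconds of audio played
        |       // drop frames that are already late, show the newest
        |       // one whose presentation time has arrived
        |       let due: DecodedFrame | undefined;
        |       while (videoQueue.length && videoQueue[0].pts <= now) {
        |         due = videoQueue.shift();
        |       }
        |       if (due) present(due);
        |       requestAnimationFrame(tick); // check again next vsync
        |     };
        |     tick();
        |   }
        | 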
       | There seems to be surprisingly little material out there on this
       | stuff, but the most helpful I found was the "Build a video editor
       | in 1000 lines" tutorial [0] along with this spinoff [1], in
       | conjunction with a few hours spent poring over the ffplay.c code
       | trying to figure out how it works.
       | 
       | 0: http://dranger.com/ffmpeg/
       | 
       | 1: https://github.com/leandromoreira/ffmpeg-libav-tutorial
        
         | pjc50 wrote:
         | > let the audio play, and hold back the next video frame until
         | it's time to show it. The audio updates a "clock" as it plays
         | (with each audio frame's timestamp), and a separate loop
         | watches the clock until the next video frame's time is up.
         | 
         | Yes .. but. They're interleaved within the container, but the
         | encoder does not guarantee that they will be properly
         | interleaved or even that they will be particularly temporally
         | close to each other. So if you're operating in "pull" mode, as
         | you should, then you may find that in order to find the next
         | video frame you need to de-container (even if you don't fully
         | decode!) a bunch of audio frames that you don't need yet, or
         | vice versa.
         | 
         | The alternative is to operate in "push" mode: decode whatever
         | frames come off the stream, audio or video, and push them into
         | separate ring buffers for output. This is easier to write but
         | tends to err on the side of buffering more than you need.
        
           | dceddia wrote:
           | Interesting, I think I just dealt with this problem! I'd
           | heard of the push/pull distinction but had interpreted it as
           | "pull = drive the video based on the audio" and "push = some
           | other way?". I think I saw "pull mode" referenced in the
           | Chromium source and I had a hard time finding any definitive
           | definition of push/pull.
           | 
           | What I was originally doing was "push", then: pull packets in
           | order, decode them into frames, put them into separate
           | audio/video ring buffers. I thought this was fine and it
           | avoided reading the file twice, which I was happy with.
           | 
           | And then the other day, on some HN thread, I saw an offhand
           | comment about how some files are muxed weird, like <all the
           | audio><all the video> or some other pathological placement
           | that would end up blocking one thread or another.
           | 
           | So I rewrote it so that the audio and video threads are
           | independent, each reading the packets they care about and
           | ignoring the rest. I think that's "pull" mode, then? It seems
           | to be working fine, the code is definitely simpler, and I
           | realized that the OS would probably be doing some intelligent
           | caching on the file anyway.
           | 
           | Your mention of overbuffering reminds me, though - I still
           | have a decent size buffer that's probably overkill now. I'll
           | cut that back.
        
       | mtrovo wrote:
       | That's a really nice introduction. One newbie question: how is
       | this influenced by DRM on the browser? Is it all the same plus
        | some security on top, or do videos with DRM use proprietary
        | codecs and packaging?
        
         | GeneticGenesis wrote:
         | DRM'd video content uses the same video codecs and containers,
         | but introduces segment encryption during the packaging phase.
         | In most cases, this encryption is within the audio and video
         | samples rather than on the entire segment. Most content is
         | encrypted using MPEG Common Encryption (CENC) - though there
         | are a couple of variants.
         | 
         | Decryption keys are then exchanged using one of the common
          | proprietary DRM protocols, usually Widevine (Google), PlayReady
          | (Microsoft), or FairPlay (Apple). The CDM (Content Decryption
          | Module) in the browser is then passed the decryption key, so
          | the browser can decrypt the content for playback.
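          | 
          | A hedged sketch of the browser side of that key exchange
          | (EME); the key system, license URL, and codec string below
          | are placeholders, not anything specific to this article:
          | 
          |   async function setupDrm(video: HTMLVideoElement) {
          |     const access = await navigator.requestMediaKeySystemAccess(
          |       "com.widevine.alpha",
          |       [{
          |         initDataTypes: ["cenc"],
          |         videoCapabilities: [
          |           { contentType: 'video/mp4; codecs="avc1.64001f"' },
          |         ],
          |       }]);
          |     const mediaKeys = await access.createMediaKeys();
          |     await video.setMediaKeys(mediaKeys);
          | 
          |     video.addEventListener("encrypted", async (e) => {
          |       const session = mediaKeys.createSession();
          |       session.addEventListener("message", async (msg) => {
          |         // forward the CDM's request to the license server,
          |         // then hand the license back to the CDM
          |         const res = await fetch("https://license.example.com", {
          |           method: "POST",
          |           body: msg.message,
          |         });
          |         await session.update(await res.arrayBuffer());
          |       });
          |       await session.generateRequest(e.initDataType, e.initData!);
          |     });
          |   }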
        
         | manorwar8 wrote:
          | There is a nice series explaining DRM here:
          | https://sander.saares.eu/post/drm-is-not-a-black-box-01/
          | 
          | I think the DRM part happens during the packaging phase
          | (https://howvideo.works/#packaging), where they need to signal
          | that the media has DRM and also encrypt the media.
        
       | rgovostes wrote:
        | As good a thread as any to ask: I have a lot of footage that is
       | recorded continuously over several hours. Most of the time
       | nothing notable happens in the video, but I have a set of
       | timestamps of interesting events. (Think security camera with
       | motion detection events, though this is not that.)
       | 
       | Currently I extract the frame at each interesting timestamp and
       | serve them as JPEGs. I have a web-based playback UI that includes
       | a scrubber and list of events to jump to a particular timestamp.
       | 
       | I would love to upgrade to a real video player that streams the
       | underlying video, allowing the user to replay footage surrounding
       | a timestamp. I have to be able to reliably seek to an exact frame
       | from the event list, though.
       | 
       | I've been looking for something web-based, self-hostable, that
       | doesn't require I transcode my media up front (e.g., break down
       | into HLS chunks). I have few users accessing it at a time so I
       | can transcode on the fly with some caching (though I think it is
       | already H.262 or H.264). Is there anything suitable out there?
        
         | dylan604 wrote:
         | Just because a file is encoded as H.264 does not mean it is
         | streamable. The encoding needs to be done in a way that makes
         | streaming realistic.
         | 
         | For the rest of it, I would suggest an ffmpeg solution on a
         | server. You can have it re-encode just the requested times
         | while encoding to a streaming friendly format. There are JS
         | libraries available that allow you to use HLS in the native
         | <video> tag.
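          | 
          | For example, with hls.js (the URL and element here are
          | placeholders; this is just the usual pattern, not anything
          | specific to the parent's setup):
          | 
          |   import Hls from "hls.js";
          | 
          |   const video =
          |     document.querySelector<HTMLVideoElement>("#player")!;
          |   const src = "https://example.com/clips/event-1234.m3u8";
          | 
          |   if (Hls.isSupported()) {
          |     const hls = new Hls();
          |     hls.loadSource(src);
          |     hls.attachMedia(video);  // feeds segments via MSE
          |   } else if (
          |     video.canPlayType("application/vnd.apple.mpegurl")) {
          |     video.src = src;         // Safari plays HLS natively
          |   }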
        
         | foo_barrio wrote:
         | I had a similar issue and worked completely around it by using
         | ffmpeg to generate a video file around the point of interest.
         | Instead of serving up the JPEG as in your case, I would serve a
         | small 10s mp4. It was something like (-5s to +5s). Trying to
         | directly stream the file and seek was unreliable for me and I
         | didn't want to get into setting up a full streaming server.
        
         | cogman10 wrote:
         | What is possible is going to depend a lot on the CPU you have
         | and the media you have.
         | 
         | That said, ffmpeg is going to be the best tool (IMO) to handle
         | this. You may also look at a tool like Vapoursynth or AviSynth
         | if you want to do any sort of preprocessing to the images.
         | 
          | If the video is H.262 (or it is H.264 at an insane bitrate like
          | 50 Mbps), I'd encourage transcoding to something not as bitrate
          | heavy. AV1 and HEVC are two of the best-in-class targets (but
          | they require a LOT of computational horsepower... OK, there is
          | also technically VVC, but nothing really supports that).
         | 
         | If time is of the essence, then I'd suggest looking into what
         | sort of codecs are supported by your CPU/GPU. They won't give
         | you great quality but they will give you very fast transcoding.
         | You'll want to target the latest codec possible.
         | 
          | H.264 is pretty old at this point; H.265 (HEVC) or VP9 will do
         | a better job at a lower bitrate if your card supports either.
         | They are also relatively well supported. VP9 is royalty free.
         | 
         | If your GPU or CPU do not support any recent codec, you might
          | look into the SVT encoders for AV1/VP9, and x264/x265 for
          | H.264 or H.265.
         | 
         | All this said, if the codec is fine and at a streamable
         | bitrate, ffmpeg totally supports copying the stream from
         | timeslices. You'll have to play around with buffering some of
         | the stream so you can have ffmpeg do the slicing, but it's not
         | too hard. That's the best option if the stream is streamable
         | (transcoding will always hurt quality).
         | 
         | Oh, and you'll very likely want to compile ffmpeg from source.
         | The version of ffmpeg bundled with your OS is (likely) really
         | old and may not have the encoders you are after. It's a huge
         | PITA, but worth it, IMO. Alternatively you can likely find a
         | build with all the stuff you want... but you'll need a level of
         | trust in the provider of that binary.
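          | 
          | A minimal sketch of that stream-copy slicing (assumes ffmpeg
          | on PATH, driven from Node here; the times are placeholders):
          | 
          |   import { execFileSync } from "node:child_process";
          | 
          |   // Cut [start, start+duration) out of a stream-friendly
          |   // file without re-encoding. Note that with "-c copy" the
          |   // cut lands on the nearest preceding keyframe.
          |   function cutClip(src: string, start: string,
          |                    duration: string, out: string) {
          |     execFileSync("ffmpeg", [
          |       "-ss", start,     // e.g. "00:42:10"
          |       "-i", src,
          |       "-t", duration,   // e.g. "10"
          |       "-c", "copy",     // copy, don't transcode
          |       out,
          |     ]);
          |   }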
        
       | mrtksn wrote:
       | Not to judge the content quality but I'm curious about the
        | motivation to build this website. Why? Because in recent days
        | we were discussing the poor quality of Google search results, and
        | one argument is that the results are poor because the quality of
        | the content has degraded due to the motivations of the content
        | creators, i.e. optimising everything for SEO, views, likes,
        | etc., with the guidance of analytics.
       | 
        | So how does this work? Is the creator of the website doing SEO here?
       | Will this be sold later? Is it a project for a portfolio? Why are
       | we getting this good quality content? Will it be surfaced by
       | Google?
       | 
       | Not that I suspect anything nefarious, just curious about how
        | original content (beyond commentary on social media) is made
        | these days. What made that content possible? Why did the
        | creator(s) spend time creating graphics and text and pay for a
        | domain name and probably hosting?
        
         | erikbye wrote:
         | Well, he does link to his personal blog, which contains a link
         | to his LinkedIn. His motivation might be personal marketing,
         | job hunting.
         | 
         | Or, he just likes to share knowledge for free, some people
         | still find joy in that, fortunately.
         | 
          | Then there's that cliche of how you learn more (or cement
         | your knowledge) by teaching. Sometimes that's all the
         | motivation I need.
        
         | GeneticGenesis wrote:
         | We actually built How Video Works as a side project at Mux [1]
         | (inspired by How DNS Works [2]) - there's a note about it at
         | the top of the page. We have contributions from our own team as
         | well as others in the industry.
         | 
         | Our main motivation is to try to educate on the complexities
          | and intricacies of streaming video. Despite streaming video
         | representing 80+% of the internet, it's all underpinned by a
         | fairly small community of engineers, which we're eager to help
         | grow through tools like this, and the Demuxed community [3].
         | 
         | Edit: I should also mention that Leandro was kind enough to
          | adapt this content from his amazing Digital Video
          | Introduction [4].
         | 
         | [1] https://mux.com/ [2] https://howdns.works/ [3]
         | https://2021.demuxed.com/ [4]
         | https://github.com/leandromoreira/digital_video_introduction
        
           | pdxandi wrote:
           | I've been looking for something educational and introductory
            | like this for a while. Appreciate you folks at Mux
           | putting this together.
        
           | mrtksn wrote:
            | Cool. So how do you measure success for this project? What
           | would your analytics dashboard look like?
        
         | ziggus wrote:
         | Uh, if you read a bit about Leandro, you'll learn that he's a
         | senior engineer at Grupo Globo in Brazil. I'll leave it to you
         | to discover more about Globo.
        
       | bambax wrote:
       | Slightly OT, but the OP links to this video which I found very
       | interesting:
       | 
       | https://www.youtube.com/watch?v=VQOdmckqNro
       | 
       | In it two guys discuss how to reconstruct an audio file from its
       | image representation, and it turns out to be pretty
       | straightforward.
       | 
       | In the end they are discussing the legal implications of being
       | able to reconstruct audio from an image: if you buy the rights to
        | the image, does it give you the right to the audio? (Probably
        | not!)
       | 
        | But it makes me wonder how one could maybe draw sound, or
       | have some kind of generative art program that could be used to
       | first draw a wave and then listen to it. Maybe this has been done
       | already?
        
         | dusted wrote:
          | Not as an art project, but software like MilkyTracker allows
          | you to directly draw (short) waveforms with the mouse and play
          | them back to listen to.
        
       | rlyshw wrote:
       | A quick search for "latency" in here has one little hand-wavey
       | blurb about Mux working to optimize HLS.
       | 
       | >Using various content delivery networks, Mux is driving HTTP
        | Live Streaming (HLS) latency down to the lowest levels possible,
        | and partnering with the best services at every mile of
       | delivery is crucial in supporting this continued goal.
       | 
       | In my experience, HLS and even LLHLS are a nightmare for latency.
       | I jokingly call it "High Latency Streaming", since it seems very
       | hard to (reliably) obtain glass-to-glass latency in the LL range
        | (under 4 seconds). Usually latency with cloud streaming gets to
       | at least 30+s.
       | 
       | I've dabbled with implementing WebRTC solutions to obtain Ultra
       | Low Latency (<1s) delivery but that is even more complicated and
       | fragmented with all of the browsers vying for standardization.
       | The solution I've cooked up in the lab with mediasoup requires an
        | FFMPEG shim to convert from MPEG-TS/H.264 via UDP/SRT to MKV/VP9
       | via RTP, which of course drives up the latency. Mediasoup has a
       | ton of opinionated quirks for RTP ingest too, of course. Still
       | I've been able to prove out 400ms "glass-to-glass" which has been
       | fun.
       | 
       | I wonder if Mux or really anyone has intentions to deliver
       | scalable, on cloud or on prem solutions to fill the web-native
       | LL/Ultra LL void left by the death of flash. I'm aware of some
       | niche solutions like Softvelum's nimble streamer, but I hate
       | their business model and I don't know anything about their
       | scalability.
        
         | GeneticGenesis wrote:
         | Hey, I work in the Product team at Mux, and worked on the LL-
          | HLS spec and our implementation; I own our real-time video
         | strategy too.
         | 
         | We do offer LL-HLS in an open beta today [1], which in the best
         | case will get you around 4-5 seconds of latency on a good
         | player implementation, but this does vary with latency to our
         | service's origin and edge. We have some tuning to do here, but
         | best case, the LL-HLS protocol will get to 2.5-3 seconds.
         | 
         | We're obviously interested in using WebRTC for use cases that
         | require more real-time interactions, but I don't have anything
         | I can publicly share right now. For sub-second streaming using
         | WebRTC, there are a lot of options out there at the moment
         | though, including Millicast [2] and Red5Pro [3] to name a
         | couple.
         | 
          | Two big questions come up when I talk to customers about
         | WebRTC at scale:
         | 
         | The first is how much reliability and perceptual quality people
         | are willing to sacrifice to get to that magic 1 second latency
         | number. WebRTC implementations today are optimised for latency
         | over quality, and have a limited amount of customisability - my
          | personal hope is that the client side of WebRTC will become
          | more tunable for PQ and reliability, allowing target latencies
         | of ~1s rather than <= 200ms.
         | 
         | The second is cost. HLS, LL-HLS etc. can still be served on
         | commodity CDN infrastructure, which can't currently serve
         | WebRTC traffic, making it an order of magnitude cheaper than
         | WebRTC.
         | 
         | [1] https://mux.com/blog/introducing-low-latency-live-
         | streaming/ [2] https://www.millicast.com/ [3]
         | https://www.red5pro.com/
        
         | majormajor wrote:
         | It's usually layers of HLS at that. For live broadcasts,
         | someone has a camera somewhere. Bounce that from the sports
         | stadium to a satellite, and someone else has a satellite
         | pulling that down. So far so good, low latency.
         | 
         | But that place pulling down the feed usually isn't the
         | streaming service you're watching! There are third parties in
         | that space, and third party aggregators of channel feeds, and
         | you may have a few hops before the files land at whichever
         | "streaming cable" service you're watching on. So even if they
         | do everything perfectly on the delivery side, you could already
         | be 30s behind, since those media files and HLS playlist files
          | have already been buffered a couple of times because they can
          | come late or out of order at any of those middleman steps. Going
         | further and cutting all the acquisition latency out? That
         | wasn't something really commonly talked about a few years ago
         | when I was exposed to the industry. It was complained about
         | once a year for the Super Bowl, and then fell down the backlog.
         | You'd likely want to own in-house signal acquisition and build
         | a completely different sort of CDN network.
         | 
         | Last I talked to someone familiar with it, the way stuff that
         | cares about low latency (like streaming video game services)
         | does it is much more like what you talk about with custom
         | protocols.
        
           | thrashh wrote:
           | The funny thing is that the web used to have a well-supported
           | low latency streaming protocol... and it was via Flash. When
           | the world switched away from Flash, we created a bunch of
           | CDN-friendly formats like HLS but by their design, they
           | couldn't be low latency.
           | 
           | And it broke all my stuff because I was relying on low
           | latency. And I remember reading around at the time -- not a
           | single person talked about the loss of a low latency option
           | so I just assumed no one cared for low latency.
        
             | slimscsi wrote:
             | Flash "low latency" was just RTMP. CDNs used to offer RTMP
             | solutions, but they were always priced significantly higher
             | than their corresponding HTTP solutions.
             | 
             | When the iPhone came out, HTTP video was the ONLY way to
             | stream video to it. It was clear Flash would never be
             | supported on the iPhone. Flash was also a security
             | nightmare.
             | 
              | So in that environment, the options were:
             | 
             | 1) Don't support video on iOS
             | 
             | 2) Build a system that can deliver video to iOS, but keep
             | the old RTMP infrastructure running too.
             | 
              | 3) Build a system that can deliver video to iOS, deprecate
              | the old RTMP infrastructure. This option also has the
              | byproduct of reduced bandwidth bills.
             | 
             | For a company, Option 3 is clearly the best choice.
             | 
             | edit: And for the record, latency was discussed a lot
             | during that transition (maybe not very publicly). But
              | between needing iOS support and reducing bandwidth costs,
              | latency was a problem left to be solved later.
        
               | thrashh wrote:
               | I'm familiar with all of what you're saying. I set up
               | RTMP servers.
               | 
                | I'm more speaking from the standpoint of, like, Apple or
                | Google. HLS is by Apple, after all.
        
               | londons_explore wrote:
               | Google puts quite a lot of effort into low latency
               | broadcast for their Youtube Live product. They have
               | noticed that they get substantially more user retention
               | if there are a few seconds of latency vs a minute. When
               | setting up a livestream, there are even choices for the
               | user to trade quality for latency.
               | 
               | That's mostly because streamers want to interact with
               | their audience, and lag there ruins the experience.
        
         | torginus wrote:
         | What's wrong with WebRTC? Other than it not being simple. In my
         | experience it's supported well enough by browsers. On the
         | hosting side, you've got Google's C++ implementation, or you
         | there's a GStreamer backend, so you can hook it up with
         | whatever GStreamer can output. In the stuff I'm doing for work,
         | we can get well below 100ms latency out of it. Since Google
         | uses it for Stadia, i'm pretty sure it can do far better than
         | that? What do you need low latency for, what's your use case?
         | Video conferencing? App/Game streaming?
        
           | slimscsi wrote:
           | Cost and scale. HTTP video is significantly cheaper to
           | deliver because of the robust and competitive CDN market.
           | 
           | You can deliver all your video via WebRTC with lower latency,
           | but your bandwidth bill will be an order of magnitude higher.
        
             | torginus wrote:
             | But if you are using a CDN you are not really streaming,
             | are you?
        
               | slimscsi wrote:
               | "Streaming" in the media industry just means you don't
               | need to download the entire file before playing it back.
               | The majority of streaming services use something like HLS
                | or DASH that breaks up the video into a bunch of little
                | 2-to-10-second files. The player will then download them
               | as needed.
               | 
               | But even then, many CDNs CAN "stream" using chunked
               | transfer encoding.
        
               | dmw_ng wrote:
               | It's just packet switching with much larger packets, the
               | streaming you're thinking of is essentially the same,
               | just with 16-50 ms sample size rather than 2-10 seconds.
        
               | rlyshw wrote:
               | Love this. A great point. HLS via CDN is really just
               | "downloading files but the source is provided kinda fast"
        
           | rlyshw wrote:
           | Yeah as the sibling comment mentions these WebRTC
           | implementations do not scale. While you "can hook it up" for
           | hyper-specific applications and use cases, it does not scale
            | to, say, an enterprise, where a single SA needs to support LL
           | streaming out to tens of thousands of users.
           | 
            | I imagine the (proprietary) Stadia implementation is highly
            | tuned to that specific use case, with tons of control
            | over the video source (cloud GPUs) literally all the way down
            | to the user's browser (modern Chrome implementations). Plus
           | their scale likely isn't in the tens of thousands from a
           | single origin. Even still, I continue to be blown away by the
           | production latency numbers achieved by game streaming
           | services.
           | 
           | And my use-case is no use-case or every use-case. I'm just a
           | lowly engineer that has seen this gap in the industry.
        
             | relueeuler wrote:
             | What makes you write that "these" WebRTC implementations do
             | not scale? Which implementations do you have in mind and
             | why do you think they do not scale? Where do they fall
             | over, and at what point?
        
         | itisit wrote:
         | Live streaming latency does not jibe well with sports. I've
         | since learned to disable any push notifications that reveal
         | what happened 30 seconds prior to my witnessing it. What can be
         | done, at scale, to get us back to the "live" normally
         | experienced with cable or satellite?
        
           | giantrobot wrote:
           | > What can be done, at scale, to get us back to the "live"
           | normally experienced with cable or satellite?
           | 
           | Stick with satellite distribution? You're going to have a
           | devil of a time scaling any sort of real-time streaming over
           | an IP network. Every hop adds some latency and scaling pretty
           | much requires some non-zero amount of buffering.
           | 
           | IP Multicast might help but you have to sacrifice bandwidth
           | for the multicast streams and have support all down the line
           | for QoS. It's a hard problem which is why no one has cracked
           | it yet. You need a setup with real-time capability from
           | network ingest, through peering connections, all the way down
           | to end-user terminals.
        
         | keithwinstein wrote:
         | Hmm, we're getting <200 ms glass-to-glass latency by streaming
         | H.264/MP4 video over a WebSocket/TLS/TCP to MSE in the browser
         | (no WebRTC involved). Of course browser support for this is not
         | universal.
         | 
         | The trick, which maybe you don't want to do in production, is
         | to mux the video on a per-client basis. Every wss-server gets
         | the same H.264 elementary stream with occasional IDRs, the
         | process links with libavformat (or knows how to produce an MP4
         | frame for an H.264 NAL), and each client receives essentially
         | the same sequence of H.264 NALs but in a MP4 container made
         | just for it, with (very occasional) skipped frames so the
         | server can limit the client-side buffer.
         | 
         | When the client joins, the server starts sending the video
         | starting with the next IDR. The client runs a JavaScript
         | function on a timer that occasionally reports its sourceBuffer
         | duration back to the server via the same WebSocket. If the
         | server is unhappy that the client-side buffer remains too long
         | (e.g. minimum sourceBuffer duration remains over 150 ms for an
         | extended period of time, and we haven't skipped any frames in a
         | while), it just doesn't write the last frame before the IDR
         | into the MP4 and, from an MP4 timestamping perspective, it's
         | like that frame never happened and nothing is missing. At 60
         | fps and only doing it occasionally this is not easily
         | noticeable, and each frame skip reduces the buffer by about 17
         | ms. We do the same for the Opus audio (without worrying about
         | IDRs).
         | 
         | In our experience, you can use this to reliably trim the
         | client-side buffer to <70 ms if that's where you want to fall
         | on the latency-vs.-stall tradeoff curve, and the CPU overhead
         | of muxing on a per-client basis is in the noise, but obviously
         | not something today's CDNs will do for you by default. Maybe
         | it's even possible to skip the per-client muxing and just
         | surgically omit the MP4 frame before an IDR (which would lead
         | to a timestamp glitch, but maybe that's ok?), but we haven't
         | tried this. You also want to make sure to go through the
         | (undocumented) hoops to put Chrome's MP4 demuxer in "low delay
         | mode": see
         | https://source.chromium.org/chromium/chromium/src/+/main:med...
         | and
         | https://source.chromium.org/chromium/chromium/src/+/main:med...
         | 
         | We're using the WebSocket technique "in production" at
         | https://puffer.stanford.edu, but without the frame skipping
         | since there we're trying to keep the client's buffer closer to
         | 15 seconds. We've only used the frame-skipping and per-client
         | MP4 muxing in more limited settings
         | (https://taps.stanford.edu/stagecast/,
         | https://stagecast.stanford.edu/) but it worked great when we
         | did. Happy to talk more if anybody is interested.
         | 
         | [If you want lower than 150 ms, I think you're looking at
         | WebRTC/Zoom/FaceTime/other UDP-based techniques (e.g.,
         | https://snr.stanford.edu/salsify/), but realistically you start
         | to bump up against capture and display latencies. From a UVC
         | webcam, I don't think we've been able to get an image to the
         | host faster than ~50 ms from start-of-exposure, even capturing
         | at 120 fps with a short exposure time.]
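          | 
          | For anyone curious, the client side of that looks roughly
          | like the sketch below (the codec string, URL, and the
          | buffer-report format are placeholders, not Puffer's actual
          | code):
          | 
          |   const video = document.querySelector("video")!;
          |   const ms = new MediaSource();
          |   video.src = URL.createObjectURL(ms);
          | 
          |   ms.addEventListener("sourceopen", () => {
          |     const sb = ms.addSourceBuffer(
          |       'video/mp4; codecs="avc1.64001f, mp4a.40.2"');
          |     const ws = new WebSocket("wss://example.com/stream");
          |     ws.binaryType = "arraybuffer";
          | 
          |     // append fragmented-MP4 chunks as they arrive (a real
          |     // player would queue appends while sb.updating is true)
          |     ws.onmessage = (e) =>
          |       sb.appendBuffer(e.data as ArrayBuffer);
          | 
          |     // report the client-side buffer so the server can decide
          |     // when to skip the frame before the next IDR
          |     setInterval(() => {
          |       if (sb.buffered.length) {
          |         const ahead = sb.buffered.end(sb.buffered.length - 1)
          |                       - video.currentTime;
          |         ws.send(JSON.stringify({ buffered: ahead }));
          |       }
          |     }, 250);
          |   });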
        
           | slhck wrote:
           | This is really interesting. Have you published this approach
           | somewhere? It'd be nice to read more about it.
        
             | keithwinstein wrote:
             | Thanks! The basic video-over-WebSocket technique was part
             | of our paper here: https://puffer.stanford.edu/static/puffe
             | r/documents/puffer-p...
             | 
             | Talk here: https://www.youtube.com/watch?v=63aECX2MZvY&feat
             | ure=youtu.be
             | 
             | Code here: https://github.com/StanfordSNR/puffer
             | 
             | The "per-client muxing with frame skipping" code is
             | something we used for a few months for our Stagecast
             | project to a userbase of ~20, but not really "in prod":
             | https://github.com/stanford-
             | stagecast/audio/blob/main/src/fr...
             | 
             | Client-side JS here: https://github.com/stanford-
             | stagecast/audio/blob/main/src/we...
        
               | Scaevolus wrote:
               | Aha, you worked on Salsify too!
               | 
                | Dropping the last frame before an IDR is a very clever
               | hack to sync things up.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-01-04 23:01 UTC)