[HN Gopher] How Video Streaming Processing Works
___________________________________________________________________
How Video Streaming Processing Works
Author : dreampeppers99
Score : 258 points
Date : 2022-01-04 11:02 UTC (11 hours ago)
(HTM) web link (howvideo.works)
(TXT) w3m dump (howvideo.works)
| pjc50 wrote:
| Good top-level summary of an extremely complicated subject.
|
| The containers diagram is surprising since the arrow means
| "derived from" - the earlier formats are at the top, I initially
| thought the arrow was in the wrong direction. Containers are kind
| of a nightmare since there is a lot of hidden knowledge you need.
| Many of the tools will let you put any video in any container,
| but only certain combinations actually work properly across the
| various devices and operating systems you will actually need to
| use. It's easiest to stick to "H264 + AAC in an MPEG4 container".
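|
| For example, a minimal sketch of producing that combination with
| ffmpeg (file names here are just placeholders):
|
|     import subprocess
|
|     # Transcode an arbitrary input into the widely compatible
|     # "H.264 + AAC in an MP4 container" combination.
|     subprocess.run([
|         "ffmpeg", "-i", "input.mkv",
|         "-c:v", "libx264",          # H.264 video
|         "-c:a", "aac",              # AAC audio
|         "-movflags", "+faststart",  # index up front for playback
|         "output.mp4",
|     ], check=True)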
|
| HLS is a horrible hack that also has the side effect of making it
| harder to download and store videos.
| GeneticGenesis wrote:
| While HLS isn't the cleanest protocol (Yay for extensions of
| the M3U playlist format...), it's actually really good at what
| it's designed to do - provide reliable video streaming while
| using HTTP/S over variable networks.
|
| Ultimately, HLS isn't designed for downloading and storing
| videos, it's designed for streaming.
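|
| For a concrete picture, a media playlist really is just M3U plus
| #EXT-X-* tags; a minimal, illustrative example (segment names are
| made up), embedded here as a Python string:
|
|     # Minimal illustrative HLS media playlist (VOD). The segment
|     # file names are placeholders.
|     EXAMPLE_MEDIA_PLAYLIST = """\
|     #EXTM3U
|     #EXT-X-VERSION:3
|     #EXT-X-TARGETDURATION:6
|     #EXT-X-MEDIA-SEQUENCE:0
|     #EXTINF:6.000,
|     segment00000.ts
|     #EXTINF:6.000,
|     segment00001.ts
|     #EXT-X-ENDLIST
|     """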
| beardedetim wrote:
| Came to say this exact thing. HLS and its fancy brother
| LLHLS aren't storage formats like MP4/FLV are. I think of HLS
| as a playback format: I'd play a Playlist when watching a
| VOD/livestream but I'd probably save it as an MP4.
| dylan604 wrote:
| >making it harder to download and store videos.
|
| And all of the studios cheered! /s
|
| But in reality, it hasn't, right? There are so many tools
| available that will download a video as a single file for you
| that it's not even an issue. You should try looking for the
| Wizard with that strawman.
| ajb257 wrote:
| There's a difference between not being designed for
| something, and being designed to prevent something. HLS isn't
| designed to prevent saving, it's just optimised for
| streaming.
| danachow wrote:
| The GP stated that hls makes it harder to download and
| store videos which is mostly a dubious claim, but even if
| it were true, in most cases where it's used this is (if
| only hypothetically) a benefit since major stream providers
| generally don't want their streams stored. The reply sounds
| appropriate.
| cogman10 wrote:
| A quibble I have with this is introducing video as capturing RGB.
|
| Digital video very rarely uses that color space. Instead, the YUV
| family of colorspaces is far more common.
|
| Further, the FPS is somewhat wonky. While computer games and
| some British TV will broadcast at 60fps, most video is instead
| at 20-30fps.
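|
| The storage difference is easy to see with some quick arithmetic
| (a raw 1080p frame at 8 bits per sample):
|
|     width, height = 1920, 1080
|
|     # RGB: three full-resolution planes.
|     rgb_bytes = width * height * 3            # ~6.2 MB per frame
|
|     # YUV 4:2:0: full-res luma plus two quarter-res chroma planes.
|     yuv420_bytes = width * height * 3 // 2    # ~3.1 MB per frame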
| arkj wrote:
| This is a really good introduction to how video works. The only
| thing I find missing from a production OTT environment is DRM.
| Also, I assume when the author says "http" he means "https";
| there are very few providers (I know of only one, and even they
| used https termination by Akamai) who use "http" these days for
| streaming, even with DRM.
|
| If anyone is building an economically viable OTT platform they
| should consider building their own CDN. Insiders tell me the
| Singapore based OTT provider HOOQ went bankrupt paying huge bills
| to AWS and Akamai.
| capableweb wrote:
| > Also I assume when the author says "http" he means "https"
| there are very few (I knew only one and even they used https
| termination by Akamai) providers who use "http" these days for
| streaming even with DRM.
|
| For what the guide is, I think HTTP is accurate enough, as
| HTTPS is not its own protocol, it's always HTTP+TLS/SSL, so
| just saying HTTP is good enough. The ones that need to add the
| extension will know to do so. For video editors (and other parts of
| the target for this guide), it's more than likely superfluous
| information.
|
| Also, as far as I know, DRM and HTTPS have different use cases.
| The DRM we see in browsers is about controlling playback on
| the client device, not encrypting the content between the media
| server and the client device, while HTTPS is the opposite. So
| whether they are using DRM or not doesn't matter when we talk
| about using HTTPS or not.
| choxi wrote:
| Does all the 1.7GB of the decoded video get copied to the GPU? Or
| is there some playback controller that knows how to read the
| "delta format" from the codecs and only copies deltas to the
| framebuffer?
|
| It still blows my mind that we can stream video at 60FPS. I was
| making an animation app that did frame-by-frame playback and
| 16.6ms goes by fast! Just unpacking a frame into memory and
| copying it to the GPU seemed like it took a while.
| pavlov wrote:
| You shouldn't copy the frame data to the GPU (assuming that's
| literally what your code was doing).
|
| Instead create a GPU texture that's backed by a fixed buffer in
| main memory. Decode into that buffer, unlock it, and draw using
| the texture. The GPU will do direct memory access over PCIe,
| avoiding the copy.
|
| The CPU can't be writing into the buffer while the GPU may be
| reading from it, so you can either use locks or double
| buffering to synchronize access.
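|
| A bare-bones sketch of the double-buffering option (the decode
| and draw calls are made-up placeholders, not a real GPU API):
|
|     import threading
|
|     # Two fixed buffers: the decoder fills one while the render
|     # side reads the other, so neither touches a buffer the
|     # other is still using.
|     buffers = [bytearray(1920 * 1080 * 3 // 2) for _ in range(2)]
|     ready = [threading.Event() for _ in range(2)]
|     free = [threading.Event() for _ in range(2)]
|     for f in free:
|         f.set()
|
|     def decoder_loop(decode_frame_into):   # placeholder decoder
|         i = 0
|         while True:
|             free[i].wait()       # render side is done with it
|             free[i].clear()
|             decode_frame_into(buffers[i])
|             ready[i].set()       # hand it to the render side
|             i ^= 1
|
|     def render_loop(draw_texture_from):    # placeholder draw call
|         i = 0
|         while True:
|             ready[i].wait()      # wait for a decoded frame
|             ready[i].clear()
|             draw_texture_from(buffers[i])
|             free[i].set()        # give it back to the decoder
|             i ^= 1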
| danachow wrote:
| > Instead create a GPU texture that's backed by a fixed
| buffer in main memory. Decode into that buffer, unlock it,
| and draw using the texture. The GPU will do direct memory
| access over PCIe, avoiding the copy.
|
| With a dedicated GPU with its own memory there still is
| usually a memory to memory copy, it just doesn't have to
| involve the CPU.
| pavlov wrote:
| Yeah, like so many other things in the GPU world, main RAM
| texture storage is more of a hint to the graphics card
| driver -- "this buffer isn't going away and won't change
| until I explicitly tell you otherwise".
|
| It definitely used to be that GPUs did real DMA texture
| reads though, at least in the early days of dedicated GPUs
| with fairly little local RAM. I'm thinking back to when the
| Mac OS X accelerated window compositor was introduced --
| the graphics RAM simply wouldn't have been enough to hold
| more than a handful of window buffers.
| pjc50 wrote:
| Smart people chuck the _encoded_ video at the GPU and let that
| deal with it: e.g. https://docs.nvidia.com/video-
| technologies/video-codec-sdk/n... ; very important on low end
| systems where the CPU genuinely can't do that at realtime
| speed. Raspberry Pi and so on.
|
| > 16.6ms
|
| That's sixteen million nanoseconds, you should be able to issue
| thirty million instructions in that time from an ordinary 2GHz
| CPU. A GPU will give you several _billion_. Just don't waste
| them.
| cogman10 wrote:
| Agreed. GPUs support decoding a wide range of codecs (even
| though you are probably using something like H.264). So it
| doesn't make sense to waste time decoding the data on the CPU
| and then piping the raw frames out to the GPU.
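|
| With ffmpeg, handing the encoded stream to the GPU decoder looks
| roughly like this (assumes an NVIDIA card and a CUDA-enabled
| ffmpeg build; file names are placeholders):
|
|     import subprocess
|
|     # Decode on the GPU (NVDEC) and discard the output, just to
|     # show where the decode work happens.
|     subprocess.run([
|         "ffmpeg", "-hwaccel", "cuda", "-i", "input.mp4",
|         "-f", "null", "-",
|     ], check=True)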
| no_time wrote:
| What I'm curious about is the actual hardware behind video
| sharing sites. Like how can Streamable, Reddit, Twitter encode
| such a massive amount of videos at scale? Do they have GPU
| farms? Dedicated encoding hardware that us mortals can't buy? I
| left out YT on purpose because they have practically endless
| money to throw at the problem.
| VWWHFSfQ wrote:
| Last I checked the video metadata, Streamable is just using
| ffmpeg.
| ggregoire wrote:
| Just for reference, I've got a server with a single RTX 2080
| that reencodes 32 HD streams in parallel with NVENC and the GPU
| is used at only 10%.
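|
| For reference, that kind of NVENC re-encode is roughly (assuming
| a CUDA-enabled ffmpeg build; names and bitrate are placeholders):
|
|     import subprocess
|
|     # Re-encode with the GPU's hardware encoder instead of
|     # libx264; the CPU mostly just shuffles bits around.
|     subprocess.run([
|         "ffmpeg", "-i", "input.mp4",
|         "-c:v", "h264_nvenc", "-b:v", "5M",
|         "-c:a", "copy",
|         "output.mp4",
|     ], check=True)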
| GeneticGenesis wrote:
| Great question! The real answer is it varies, but for H.264,
| most just encode in software right now, because GPUs are
| expensive (especially in the cloud), and the failure rates are
| really high (if you try to build your own). ffmpeg and libx264
| are really fast on modern hardware with decent x86 extensions.
|
| It's also worth noting that YouTube now builds its own
| transcoding chips [1], and AWS just launched dedicated
| transcoding instances based on Xilinx chips:
|
| [1] https://arstechnica.com/gadgets/2021/04/youtube-is-now-
| build... [2] https://aws.amazon.com/ec2/instance-types/vt1/
| dmw_ng wrote:
| Software can be done at utterly stupid multiples of realtime at
| SD resolutions with only a few cores, depending on your quality
| target. Cores are very cheap.
|
| Fancy GPUs tend to support 8 or more HD streams, even consumer
| cards using patched drivers.
|
| Then you have dedicated accelerator hardware, these can pack a
| tremendous amount of transcode into a tiny package. For example
| on AWS you have vt1 instances which support 8 (or 16?)
| simultaneous full HD/SD/QHD ladders at 2x realtime for around
| $200/mo.
|
| In answer to your actual question, at least YouTube selectively
| transcodes using fancier/more specific methods according to the
| popularity of the content. They do the cheap thing in bulk and
| the high quality thing for the 1% of content folk actually
| watch.
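|
| As a sketch, an ABR "ladder" is really just the same source
| encoded several times at different sizes and bitrates (the rungs
| below are made-up numbers, not a recommendation):
|
|     import subprocess
|
|     RUNGS = [(1080, "5M"), (720, "3M"), (480, "1.2M")]
|
|     for height, bitrate in RUNGS:
|         subprocess.run([
|             "ffmpeg", "-i", "source.mp4",
|             "-vf", f"scale=-2:{height}",  # keep aspect, even width
|             "-c:v", "libx264", "-b:v", bitrate,
|             "-c:a", "aac", "-b:a", "128k",
|             f"out_{height}p.mp4",
|         ], check=True)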
| londons_explore wrote:
| Nearly all big sites just use ffmpeg and no hardware
| acceleration.
| dceddia wrote:
| Good overview of all the parts involved! I was hoping they'd talk
| a little more about the timing aspects, and keeping audio and
| video in sync during playback.
|
| What I've learned from working on a video editor is that "keeping
| a/v in sync" is... sort of a misnomer? Or anyway, it _sounds_
| very "active", like you'd have to line up all the frames and
| carefully set timers to play them or something.
|
| But in practice, the audio and video frames are interleaved in
| the file, and they naturally come out in order (ish - see
| replies). The audio plays at a known rate (like 44.1KHz) and
| every frame of audio and video has a "presentation timestamp",
| and these timestamps (are supposed to) line up between the
| streams.
|
| So you've got the audio and video both coming out of the file at
| way-faster-than-realtime (ideally), and then the syncing ends up
| being more like: let the audio play, and hold back the next video
| frame until it's time to show it. The audio updates a "clock" as
| it plays (with each audio frame's timestamp), and a separate loop
| watches the clock until the next video frame's time is up.
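|
| In heavily simplified Python, that loop is something like this
| (the queue, clock and display hooks are stand-ins for whatever
| the player actually uses):
|
|     import time
|
|     def video_loop(video_queue, get_audio_clock, display):
|         # The audio callback keeps get_audio_clock() up to date
|         # with the timestamp of the last sample it played.
|         while True:
|             pts, frame = video_queue.get()  # next decoded frame
|             # Hold the frame back until the audio catches up.
|             while pts > get_audio_clock():
|                 time.sleep(0.001)
|             display(frame)                  # its time has come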
|
| There seems to be surprisingly little material out there on this
| stuff, but the most helpful I found was the "Build a video editor
| in 1000 lines" tutorial [0] along with this spinoff [1], in
| conjunction with a few hours spent poring over the ffplay.c code
| trying to figure out how it works.
|
| 0: http://dranger.com/ffmpeg/
|
| 1: https://github.com/leandromoreira/ffmpeg-libav-tutorial
| pjc50 wrote:
| > let the audio play, and hold back the next video frame until
| it's time to show it. The audio updates a "clock" as it plays
| (with each audio frame's timestamp), and a separate loop
| watches the clock until the next video frame's time is up.
|
| Yes .. but. They're interleaved within the container, but the
| encoder does not guarantee that they will be properly
| interleaved or even that they will be particularly temporally
| close to each other. So if you're operating in "pull" mode, as
| you should, then you may find that in order to find the next
| video frame you need to de-container (even if you don't fully
| decode!) a bunch of audio frames that you don't need yet, or
| vice versa.
|
| The alternative is to operate in "push" mode: decode whatever
| frames come off the stream, audio or video, and push them into
| separate ring buffers for output. This is easier to write but
| tends to err on the side of buffering more than you need.
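|
| In Python-ish pseudocode (PyAV-style API, from memory), push mode
| is roughly:
|
|     import av                      # PyAV; API details hedged
|     from collections import deque
|
|     container = av.open("movie.mp4")
|     audio_buf = deque(maxlen=256)  # bounded buffers; a real
|     video_buf = deque(maxlen=64)   # player would apply backpressure
|
|     # Take packets in whatever order the demuxer emits them,
|     # decode, and push the frames into per-stream buffers.
|     for packet in container.demux():
|         for frame in packet.decode():
|             if packet.stream.type == "audio":
|                 audio_buf.append(frame)
|             elif packet.stream.type == "video":
|                 video_buf.append(frame)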
| dceddia wrote:
| Interesting, I think I just dealt with this problem! I'd
| heard of the push/pull distinction but had interpreted it as
| "pull = drive the video based on the audio" and "push = some
| other way?". I think I saw "pull mode" referenced in the
| Chromium source and I had a hard time finding any definitive
| definition of push/pull.
|
| What I was originally doing was "push", then: pull packets in
| order, decode them into frames, put them into separate
| audio/video ring buffers. I thought this was fine and it
| avoided reading the file twice, which I was happy with.
|
| And then the other day, on some HN thread, I saw an offhand
| comment about how some files are muxed weird, like <all the
| audio><all the video> or some other pathological placement
| that would end up blocking one thread or another.
|
| So I rewrote it so that the audio and video threads are
| independent, each reading the packets they care about and
| ignoring the rest. I think that's "pull" mode, then? It seems
| to be working fine, the code is definitely simpler, and I
| realized that the OS would probably be doing some intelligent
| caching on the file anyway.
|
| Your mention of overbuffering reminds me, though - I still
| have a decent size buffer that's probably overkill now. I'll
| cut that back.
| mtrovo wrote:
| That's a really nice introduction. One newbie question: how is
| this influenced by DRM on the browser? Is it all the same plus
| some security on top or do videos with DRM use a proprietary
| codecs and packaging?
| GeneticGenesis wrote:
| DRM'd video content uses the same video codecs and containers,
| but introduces segment encryption during the packaging phase.
| In most cases, this encryption is within the audio and video
| samples rather than on the entire segment. Most content is
| encrypted using MPEG Common Encryption (CENC) - though there
| are a couple of variants.
|
| Decryption keys are then exchanged using one of the common
| proprietary DRM protocols, usually Widevine (Google), Playready
| (Microsoft), or Fairplay (Apple). The CDM (Content Decryption
| Module) in the browser is then passed the decryption key, so
| the browser can decrypt the content for playback.
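|
| For a rough idea of what the packaging step looks like, ffmpeg's
| MP4 muxer can apply basic CENC (AES-CTR) sample encryption
| (option names here are from memory, and the key/KID are obvious
| placeholders, not real values):
|
|     import subprocess
|
|     KEY = "00112233445566778899aabbccddeeff"  # placeholder
|     KID = "0123456789abcdef0123456789abcdef"  # placeholder
|
|     # A production setup would get the key/KID from a DRM key
|     # server rather than hard-coding them.
|     subprocess.run([
|         "ffmpeg", "-i", "clear.mp4", "-c", "copy",
|         "-encryption_scheme", "cenc-aes-ctr",
|         "-encryption_key", KEY,
|         "-encryption_kid", KID,
|         "encrypted.mp4",
|     ], check=True)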
| manorwar8 wrote:
| there is a nice series explaining DRM here
| https://sander.saares.eu/post/drm-is-not-a-black-box-01/
|
| I think the DRM part is applied during the packaging phase
| (https://howvideo.works/#packaging), where they need to signal
| that the media has DRM and also encrypt the media.
| rgovostes wrote:
| As good as thread as any to ask: I have a lot of footage that is
| recorded continuously over several hours. Most of the time
| nothing notable happens in the video, but I have a set of
| timestamps of interesting events. (Think security camera with
| motion detection events, though this is not that.)
|
| Currently I extract the frame at each interesting timestamp and
| serve them as JPEGs. I have a web-based playback UI that includes
| a scrubber and list of events to jump to a particular timestamp.
|
| I would love to upgrade to a real video player that streams the
| underlying video, allowing the user to replay footage surrounding
| a timestamp. I have to be able to reliably seek to an exact frame
| from the event list, though.
|
| I've been looking for something web-based, self-hostable, that
| doesn't require I transcode my media up front (e.g., break down
| into HLS chunks). I have few users accessing it at a time so I
| can transcode on the fly with some caching (though I think it is
| already H.262 or H.264). Is there anything suitable out there?
| dylan604 wrote:
| Just because a file is encoded as H.264 does not mean it is
| streamable. The encoding needs to be done in a way that makes
| streaming realistic.
|
| For the rest of it, I would suggest an ffmpeg solution on a
| server. You can have it re-encode just the requested times
| while encoding to a streaming friendly format. There are JS
| libraries available that allow you to use HLS in the native
| <video> tag.
| foo_barrio wrote:
| I had a similar issue and worked completely around it by using
| ffmpeg to generate a video file around the point of interest.
| Instead of serving up the JPEG as in your case, I would serve a
| small 10s mp4. It was something like (-5s to +5s). Trying to
| directly stream the file and seek was unreliable for me and I
| didn't want to get into setting up a full streaming server.
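|
| In case it's useful, the ffmpeg call for that kind of clip is
| short (event time and file names below are placeholders):
|
|     import subprocess
|
|     event_s = 3605                 # placeholder event timestamp
|     start = max(0, event_s - 5)
|
|     # Re-encoding (instead of -c copy) lets the clip start on an
|     # exact frame rather than the nearest keyframe.
|     subprocess.run([
|         "ffmpeg", "-ss", str(start), "-i", "camera.mp4",
|         "-t", "10",
|         "-c:v", "libx264", "-c:a", "aac",
|         "-movflags", "+faststart",
|         "clip.mp4",
|     ], check=True)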
| cogman10 wrote:
| What is possible is going to depend a lot on the CPU you have
| and the media you have.
|
| That said, ffmpeg is going to be the best tool (IMO) to handle
| this. You may also look at a tool like Vapoursynth or AviSynth
| if you want to do any sort of preprocessing to the images.
|
| If the video is H.262 (or it is H.264 at an insane bitrate like
| 50Mbps), I'd encourage transcoding to something not as bitrate
| heavy. AV1 and HEVC are two of the best-in-class targets (but
| require a LOT of computational horsepower... OK there is also
| technically VVC, but nothing really supports that).
|
| If time is of the essence, then I'd suggest looking into what
| sort of codecs are supported by your CPU/GPU. They won't give
| you great quality but they will give you very fast transcoding.
| You'll want to target the latest codec possible.
|
| H.264 is pretty old at this point, H.265 (HEVC) or vp9 will do
| a better job at a lower bitrate if your card supports either.
| They are also relatively well supported. VP9 is royalty free.
|
| If your GPU or CPU do not support any recent codec, you might
| look into the SVT encoders for AV1/VP9, and x264/5 for H.264 or
| H.265.
|
| All this said, if the codec is fine and at a streamable
| bitrate, ffmpeg totally supports copying the stream from
| timeslices. You'll have to play around with buffering some of
| the stream so you can have ffmpeg do the slicing, but it's not
| too hard. That's the best option if the stream is streamable
| (transcoding will always hurt quality).
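|
| The stream-copy slicing mentioned above boils down to something
| like this (times and names are placeholders; the cut points snap
| to keyframes, so expect a little slop at the edges):
|
|     import subprocess
|
|     # Copy a 30-second slice without re-encoding: fast and
|     # lossless, but not frame-accurate.
|     subprocess.run([
|         "ffmpeg", "-ss", "00:10:00", "-i", "camera.mp4",
|         "-t", "30",
|         "-c", "copy",
|         "slice.mp4",
|     ], check=True)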
|
| Oh, and you'll very likely want to compile ffmpeg from source.
| The version of ffmpeg bundled with your OS is (likely) really
| old and may not have the encoders you are after. It's a huge
| PITA, but worth it, IMO. Alternatively you can likely find a
| build with all the stuff you want... but you'll need a level of
| trust in the provider of that binary.
| mrtksn wrote:
| Not to judge the content quality but I'm curious about the
| motivation to build this website. Why? Because in recent days
| we were discussing the poor quality of Google search results and
| one argument is that the results are poor because the quality of
| the content has degraded due to the motivations of the content
| creators, i.e. optimising everything for SEO and views, likes,
| etc. with the guidance of analytics.
|
| So how does this work? Is the creator of the website doing SEO
| here? Will this be sold later? Is it a project for a portfolio?
| Why are we getting this good quality content? Will it be
| surfaced by Google?
|
| Not that I suspect anything nefarious, just curious about how
| original content (beyond commentary on social media) is made
| these days. What made that content possible? Why did the
| creator(s) spend time creating graphics and text and pay for a
| domain name and probably hosting?
| erikbye wrote:
| Well, he does link to his personal blog, which contains a link
| to his LinkedIn. His motivation might be personal marketing,
| job hunting.
|
| Or, he just likes to share knowledge for free, some people
| still find joy in that, fortunately.
|
| Then there's that cliche of how you learn more (or cement
| your knowledge) by teaching. Sometimes that's all the
| motivation I need.
| GeneticGenesis wrote:
| We actually built How Video Works as a side project at Mux [1]
| (inspired by How DNS Works [2]) - there's a note about it at
| the top of the page. We have contributions from our own team as
| well as others in the industry.
|
| Our main motivation is to try to educate on the complexities
| and intricacies of streaming video. Despite streaming video
| representing 80+% of the internet, it's all underpinned by a
| fairly small community of engineers, which we're eager to help
| grow through tools like this, and the Demuxed community [3].
|
| Edit: I should also mention that Leandro was kind enough to
| adapt this content from his amazing Digital Video
| Introduction [4]
|
| [1] https://mux.com/ [2] https://howdns.works/ [3]
| https://2021.demuxed.com/ [4]
| https://github.com/leandromoreira/digital_video_introduction
| pdxandi wrote:
| I've been looking for something educational and introductory
| like this for a while. Appreciate you folks at Mux
| putting this together.
| mrtksn wrote:
| Cool. So how do you measure success for this project? What
| would your analytics dashboard look like?
| ziggus wrote:
| Uh, if you read a bit about Leandro, you'll learn that he's a
| senior engineer at Grupo Globo in Brazil. I'll leave it to you
| to discover more about Globo.
| bambax wrote:
| Slightly OT, but the OP links to this video which I found very
| interesting:
|
| https://www.youtube.com/watch?v=VQOdmckqNro
|
| In it two guys discuss how to reconstruct an audio file from its
| image representation, and it turns out to be pretty
| straightforward.
|
| In the end they are discussing the legal implications of being
| able to reconstruct audio from an image: if you buy the rights to
| the image does it give you the right to the audio (probably not!)
|
| But what it makes me wonder is how one could maybe draw sound, or
| have some kind of generative art program that could be used to
| first draw a wave and then listen to it. Maybe this has been done
| already?
| dusted wrote:
| not as an art project, but software like MilkyTracker allows
| you to directly draw (short) waveforms with the mouse and play
| them back.
| rlyshw wrote:
| A quick search for "latency" in here has one little hand-wavey
| blurb about Mux working to optimize HLS.
|
| >Using various content delivery networks, Mux is driving HTTP
| Live Streaming (HLS) latency down to the lowest levels
| possible, and partnering with the best services at every mile of
| delivery is crucial in supporting this continued goal.
|
| In my experience, HLS and even LLHLS are a nightmare for latency.
| I jokingly call it "High Latency Streaming", since it seems very
| hard to (reliably) obtain glass-to-glass latency in the LL range
| (under 4 seconds). Usually latency with cloud streaming gets to
| at least 30+s.
|
| I've dabbled with implementing WebRTC solutions to obtain Ultra
| Low Latency (<1s) delivery but that is even more complicated and
| fragmented with all of the browsers vying for standardization.
| The solution I've cooked up in the lab with mediasoup requires an
| FFMPEG shim to convert from MPEGTS/h264 via UDP/SRT to MKV/VP9
| via RTP, which of course drives up the latency. Mediasoup has a
| ton of opinionated quirks for RTP ingest too, of course. Still
| I've been able to prove out 400ms "glass-to-glass" which has been
| fun.
|
| I wonder if Mux or really anyone has intentions to deliver
| scalable, on cloud or on prem solutions to fill the web-native
| LL/Ultra LL void left by the death of flash. I'm aware of some
| niche solutions like Softvelum's nimble streamer, but I hate
| their business model and I don't know anything about their
| scalability.
| GeneticGenesis wrote:
| Hey, I work in the Product team at Mux, and worked on the LL-
| HLS spec and our implementation; I own our real-time video
| strategy too.
|
| We do offer LL-HLS in an open beta today [1], which in the best
| case will get you around 4-5 seconds of latency on a good
| player implementation, but this does vary with latency to our
| service's origin and edge. We have some tuning to do here, but
| best case, the LL-HLS protocol will get to 2.5-3 seconds.
|
| We're obviously interested in using WebRTC for use cases that
| require more real-time interactions, but I don't have anything
| I can publicly share right now. For sub-second streaming using
| WebRTC, there are a lot of options out there at the moment
| though, including Millicast [2] and Red5Pro [3] to name a
| couple.
|
| Two big questions come up when I talk to customers about
| WebRTC at scale:
|
| The first is how much reliability and perceptual quality people
| are willing to sacrifice to get to that magic 1 second latency
| number. WebRTC implementations today are optimised for latency
| over quality, and have a limited amount of customisability - my
| personal hope is that the client side of WebRTC will become
| more tunable for PQ and reliability, allowing target latencies
| of ~1s rather than <= 200ms.
|
| The second is cost. HLS, LL-HLS etc. can still be served on
| commodity CDN infrastructure, which can't currently serve
| WebRTC traffic, making it an order of magnitude cheaper than
| WebRTC.
|
| [1] https://mux.com/blog/introducing-low-latency-live-
| streaming/ [2] https://www.millicast.com/ [3]
| https://www.red5pro.com/
| majormajor wrote:
| It's usually layers of HLS at that. For live broadcasts,
| someone has a camera somewhere. Bounce that from the sports
| stadium to a satellite, and someone else has a satellite
| pulling that down. So far so good, low latency.
|
| But that place pulling down the feed usually isn't the
| streaming service you're watching! There are third parties in
| that space, and third party aggregators of channel feeds, and
| you may have a few hops before the files land at whichever
| "streaming cable" service you're watching on. So even if they
| do everything perfectly on the delivery side, you could already
| be 30s behind, since those media files and HLS playlist files
| have already been buffered a couple times since they can come
| late or out of order at any of those middleman steps. Going
| further and cutting all the acquisition latency out? That
| wasn't something really commonly talked about a few years ago
| when I was exposed to the industry. It was complained about
| once a year for the Super Bowl, and then fell down the backlog.
| You'd likely want to own in-house signal acquisition and build
| a completely different sort of CDN network.
|
| Last I talked to someone familiar with it, the way stuff that
| cares about low latency (like streaming video game services)
| does it is much more like what you talk about with custom
| protocols.
| thrashh wrote:
| The funny thing is that the web used to have a well-supported
| low latency streaming protocol... and it was via Flash. When
| the world switched away from Flash, we created a bunch of
| CDN-friendly formats like HLS but by their design, they
| couldn't be low latency.
|
| And it broke all my stuff because I was relying on low
| latency. And I remember reading around at the time -- not a
| single person talked about the loss of a low latency option
| so I just assumed no one cared for low latency.
| slimscsi wrote:
| Flash "low latency" was just RTMP. CDNs used to offer RTMP
| solutions, but they were always priced significantly higher
| than their corresponding HTTP solutions.
|
| When the iPhone came out, HTTP video was the ONLY way to
| stream video to it. It was clear Flash would never be
| supported on the iPhone. Flash was also a security
| nightmare.
|
| So in that environment, the options were:
|
| 1) Don't support video on iOS
|
| 2) Build a system that can deliver video to iOS, but keep
| the old RTMP infrastructure running too.
|
| 3) Build a system that can deliver video to iOS, Deprecate
| the old RTMP infrastructure. This option also has a
| byproduct of reduced bandwidth bills.
|
| For a company, Option 3 is clearly the best choice.
|
| edit: And for the record, latency was discussed a lot
| during that transition (maybe not very publicly). But
| between needing iOS support, and reducing bandwidth costs,
| latency was a problem that was decided to be solved later.
| thrashh wrote:
| I'm familiar with all of what you're saying. I set up
| RTMP servers.
|
| I'm coming at it more from the standpoint of, say, Apple or
| Google. HLS is by Apple after all.
| londons_explore wrote:
| Google puts quite a lot of effort into low latency
| broadcast for their Youtube Live product. They have
| noticed that they get substantially more user retention
| if there are a few seconds of latency vs a minute. When
| setting up a livestream, there are even choices for the
| user to trade quality for latency.
|
| That's mostly because streamers want to interact with
| their audience, and lag there ruins the experience.
| torginus wrote:
| What's wrong with WebRTC? Other than it not being simple. In my
| experience it's supported well enough by browsers. On the
| hosting side, you've got Google's C++ implementation, or you
| there's a GStreamer backend, so you can hook it up with
| whatever GStreamer can output. In the stuff I'm doing for work,
| we can get well below 100ms latency out of it. Since Google
| uses it for Stadia, i'm pretty sure it can do far better than
| that? What do you need low latency for, what's your use case?
| Video conferencing? App/Game streaming?
| slimscsi wrote:
| Cost and scale. HTTP video is significantly cheaper to
| deliver because of the robust and competitive CDN market.
|
| You can deliver all your video via WebRTC with lower latency,
| but your bandwidth bill will be an order of magnitude higher.
| torginus wrote:
| But if you are using a CDN you are not really streaming,
| are you?
| slimscsi wrote:
| "Streaming" in the media industry just means you don't
| need to download the entire file before playing it back.
| The majority of streaming services use something like HLS
| or DASH that breaks up the video into a bunch of little files
| of 2 to 10 seconds each. The player will then download them
| as needed.
|
| But even then, many CDNs CAN "stream" using chunked
| transfer encoding.
| dmw_ng wrote:
| It's just packet switching with much larger packets; the
| streaming you're thinking of is essentially the same, just with
| a 16-50 ms sample size rather than 2-10 seconds.
| rlyshw wrote:
| Love this. A great point. HLS via CDN is really just
| "downloading files but the source is provided kinda fast"
| rlyshw wrote:
| Yeah, as the sibling comment mentions, these WebRTC
| implementations do not scale. While you "can hook it up" for
| hyper-specific applications and use cases, it does not scale
| to, say, an enterprise, where a single SA needs to support LL
| streaming out to tens of thousands of users.
|
| I imagine the (proprietary) Stadia implementation is highly
| tuned to that specific use case, with tons of control over the
| video source (cloud GPUs) literally all the way down to the
| user's browser (modern Chrome implementations). Plus
| their scale likely isn't in the tens of thousands from a
| single origin. Even still, I continue to be blown away by the
| production latency numbers achieved by game streaming
| services.
|
| And my use-case is no use-case or every use-case. I'm just a
| lowly engineer that has seen this gap in the industry.
| relueeuler wrote:
| What makes you write that "these" WebRTC implementations do
| not scale? Which implementations do you have in mind and
| why do you think they do not scale? Where do they fall
| over, and at what point?
| itisit wrote:
| Live streaming latency does not jibe well with sports. I've
| since learned to disable any push notifications that reveal
| what happened 30 seconds prior to my witnessing it. What can be
| done, at scale, to get us back to the "live" normally
| experienced with cable or satellite?
| giantrobot wrote:
| > What can be done, at scale, to get us back to the "live"
| normally experienced with cable or satellite?
|
| Stick with satellite distribution? You're going to have a
| devil of a time scaling any sort of real-time streaming over
| an IP network. Every hop adds some latency and scaling pretty
| much requires some non-zero amount of buffering.
|
| IP Multicast might help but you have to sacrifice bandwidth
| for the multicast streams and have support all down the line
| for QoS. It's a hard problem which is why no one has cracked
| it yet. You need a setup with real-time capability from
| network ingest, through peering connections, all the way down
| to end-user terminals.
| keithwinstein wrote:
| Hmm, we're getting <200 ms glass-to-glass latency by streaming
| H.264/MP4 video over a WebSocket/TLS/TCP to MSE in the browser
| (no WebRTC involved). Of course browser support for this is not
| universal.
|
| The trick, which maybe you don't want to do in production, is
| to mux the video on a per-client basis. Every wss-server gets
| the same H.264 elementary stream with occasional IDRs, the
| process links with libavformat (or knows how to produce an MP4
| frame for an H.264 NAL), and each client receives essentially
| the same sequence of H.264 NALs but in an MP4 container made
| just for it, with (very occasional) skipped frames so the
| server can limit the client-side buffer.
|
| When the client joins, the server starts sending the video
| starting with the next IDR. The client runs a JavaScript
| function on a timer that occasionally reports its sourceBuffer
| duration back to the server via the same WebSocket. If the
| server is unhappy that the client-side buffer remains too long
| (e.g. minimum sourceBuffer duration remains over 150 ms for an
| extended period of time, and we haven't skipped any frames in a
| while), it just doesn't write the last frame before the IDR
| into the MP4 and, from an MP4 timestamping perspective, it's
| like that frame never happened and nothing is missing. At 60
| fps and only doing it occasionally this is not easily
| noticeable, and each frame skip reduces the buffer by about 17
| ms. We do the same for the Opus audio (without worrying about
| IDRs).
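|
| Boiled down, the per-client trimming decision is roughly the
| following sketch (threshold names and values are illustrative,
| simplified from the description above):
|
|     TARGET_BUFFER_S = 0.150  # keep the client buffer near 150 ms
|     MIN_SKIP_GAP_S = 2.0     # don't skip frames too often
|
|     def should_skip_frame_before_idr(recent_buffer_reports_s,
|                                      seconds_since_last_skip):
|         # Skip only if the *minimum* reported buffer has stayed
|         # above target and we haven't skipped recently.
|         if not recent_buffer_reports_s:
|             return False
|         if min(recent_buffer_reports_s) <= TARGET_BUFFER_S:
|             return False
|         return seconds_since_last_skip >= MIN_SKIP_GAP_S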
|
| In our experience, you can use this to reliably trim the
| client-side buffer to <70 ms if that's where you want to fall
| on the latency-vs.-stall tradeoff curve, and the CPU overhead
| of muxing on a per-client basis is in the noise, but obviously
| not something today's CDNs will do for you by default. Maybe
| it's even possible to skip the per-client muxing and just
| surgically omit the MP4 frame before an IDR (which would lead
| to a timestamp glitch, but maybe that's ok?), but we haven't
| tried this. You also want to make sure to go through the
| (undocumented) hoops to put Chrome's MP4 demuxer in "low delay
| mode": see
| https://source.chromium.org/chromium/chromium/src/+/main:med...
| and
| https://source.chromium.org/chromium/chromium/src/+/main:med...
|
| We're using the WebSocket technique "in production" at
| https://puffer.stanford.edu, but without the frame skipping
| since there we're trying to keep the client's buffer closer to
| 15 seconds. We've only used the frame-skipping and per-client
| MP4 muxing in more limited settings
| (https://taps.stanford.edu/stagecast/,
| https://stagecast.stanford.edu/) but it worked great when we
| did. Happy to talk more if anybody is interested.
|
| [If you want lower than 150 ms, I think you're looking at
| WebRTC/Zoom/FaceTime/other UDP-based techniques (e.g.,
| https://snr.stanford.edu/salsify/), but realistically you start
| to bump up against capture and display latencies. From a UVC
| webcam, I don't think we've been able to get an image to the
| host faster than ~50 ms from start-of-exposure, even capturing
| at 120 fps with a short exposure time.]
| slhck wrote:
| This is really interesting. Have you published this approach
| somewhere? It'd be nice to read more about it.
| keithwinstein wrote:
| Thanks! The basic video-over-WebSocket technique was part
| of our paper here: https://puffer.stanford.edu/static/puffe
| r/documents/puffer-p...
|
| Talk here: https://www.youtube.com/watch?v=63aECX2MZvY&feat
| ure=youtu.be
|
| Code here: https://github.com/StanfordSNR/puffer
|
| The "per-client muxing with frame skipping" code is
| something we used for a few months for our Stagecast
| project to a userbase of ~20, but not really "in prod":
| https://github.com/stanford-
| stagecast/audio/blob/main/src/fr...
|
| Client-side JS here: https://github.com/stanford-
| stagecast/audio/blob/main/src/we...
| Scaevolus wrote:
| Aha, you worked on Salsify too!
|
| Dropping the last frame before an IDR is a very clever
| hack to sync things up.
| [deleted]
___________________________________________________________________
(page generated 2022-01-04 23:01 UTC)