[HN Gopher] Show HN: WebRTC Nuts and Bolts, A holistic way of un...
       ___________________________________________________________________
        
       Show HN: WebRTC Nuts and Bolts, A holistic way of understanding how
       WebRTC runs
        
       Hi HN!  I'm so excited to show my first open-source project and
       first post here.  I initially started this project as an
       experiment to learn the Go language. The main goal is to track
       the adventure of a WebRTC stream from start to finish, by
       debugging the project or tracking the output at the console.  By
       trying this project out, you will take a deep dive into the
       steps taken while starting up a WebRTC session, and more.  It
       consists of a web UI (TypeScript) and a back-end server (Go).
       They can run in Docker containers, in development or production
       mode; you can find details in the README file.  After some
       progress on the development, I decided to pivot my experimental
       work into a walkthrough document. Although lots of resources
       already exist on the Internet, they cover small chunks of WebRTC
       concepts or protocols atomically, using the standard inductive
       method that teaches the pieces first and then assembles them.
       My style of learning leans on the deductive method instead:
       going linearly from beginning to end, and learning each atomic
       piece at the moment it is required.  I know it's a very niche
       technical domain, but I hope you will like my project. Please
       check it out and I'd love to read your thoughts!
       https://github.com/adalkiran/webrtc-nuts-and-bolts
        
       Author : adalkiran
       Score  : 107 points
       Date   : 2022-05-29 10:27 UTC (12 hours ago)
        
        
       | jawmes8 wrote:
        | Been learning Go recently too, and WebRTC sounds like a
        | great path. I realize this is an educational resource, but I
        | wish all codebases were laid out like this, with each file
        | and its purpose clearly stated!
        
         | adalkiran wrote:
         | Thanks for your nice comment :)
        
       | sibit wrote:
       | Is anyone on HN using WebRTC in production? I recall watching a
       | conference talk by Martin Kleppmann a few years ago where he was
       | discussing CRDTs and Automerge. He mentioned how they attempted
       | to use WebRTC but it wasn't reliable so they had to use Web
       | Sockets with a custom message relay server instead.
        
         | billylindeman wrote:
         | We use it at https://tandem.chat. I work on the AV stack.
         | WebRTC is pretty awesome.
        
         | monsecchris wrote:
          | My software uses it to send video, audio and metadata from
          | a C++ server to the browser. I found WebRTC to be a
          | nightmare, and this feature took months longer than
          | expected to implement.
        
           | Sean-Der wrote:
           | What library did you use for WebRTC? What were the pain
           | points you hit?
        
             | monsecchris wrote:
              | Originally we used Google's WebRTC, but it didn't offer
              | enough control, so now we use libdatachannel. Getting
              | the peer connections set up, getting what we wanted out
              | of the SDP negotiation, and restructuring the video and
              | audio data into the correct format were some of the
              | issues I remember running into.
        
         | dimgl wrote:
         | Pretty sure Discord uses WebRTC.
        
         | cjsawyer wrote:
         | Not in customer hands yet but yes, webrtc with a WSS to signal.
         | It's been incredibly reliable
         | 
         | Edit: that's Web Socket Server
        
         | kwindla wrote:
         | Google Meet is WebRTC in production. As is Discord video and
         | audio, Facebook messenger video, and Whatsapp calls. There are
         | several WebRTC-as-a-service platforms that are relatively large
         | and are used by diverse applications (meetings, telehealth,
         | teaching/tutoring, events). I co-founded and work at one of
         | them (Daily.co YC W16).
         | 
         | WebRTC is almost the only choice for low-latency video and
         | audio inside a web browser. The open source libwebrtc [1]
         | implementation that's in Chromium and Safari is now mature
         | enough to be used in other native applications if you have a
         | medium-sized engineering team and are comfortable with C++.
          | (Again, WebRTC-as-a-service platforms often provide native
          | libraries that wrap libwebrtc to give you easier-to-use,
          | full-stack iOS, Android, etc. SDKs.)
         | 
         | The three big challenges with WebRTC are that low-latency media
         | is its own domain and the learning curve is steep, that scaling
         | sessions to more than two or three people requires a lot of
         | server-side packet routing code (you can't do pure peer-to-peer
         | with lots of participants), and that there aren't yet the
         | mature "off the shelf" cloud building blocks that exist for
         | HTTP-ish workloads.
         | 
         | WebRTC data channels are the non-video/audio part of the WebRTC
         | spec. My hot take: data channels are rarely the right solution
         | to any problem description that doesn't start with, "well, I
         | already have a WebRTC transport open ..."
         | 
         | [1] https://webrtc.googlesource.com/src
        
           | shaunxcode wrote:
           | I think the trick is whether they are using TURN or STUN wrt
           | reliability.
        
         | wbobeirne wrote:
         | Our team got off the ground really quickly using
         | https://github.com/feross/simple-peer to handle the majority of
         | the WebRTC client implementation. We're sending video and
         | voice, so websockets aren't feasible. I'd say it was a lot
         | easier than I expected coming in cold, and about 95% of
         | connections establish quickly and don't have any problems.
         | 
         | However for that remaining 5%, I have a lot to learn. Using an
         | abstraction is great when it works, but I'm interested in going
         | through OP's project to get a better sense of what's happening
         | when things go wrong.
        
           | feross wrote:
           | Glad that simple-peer was helpful to you :)
        
           | adalkiran wrote:
            | You're so right. In most cases you won't have a problem
            | if you implement it with existing products, but in that
            | 5%, lots of cases can come up :) That's the whole reason
            | behind WebRTC Nuts and Bolts!
        
         | greenpizza13 wrote:
         | We use WebRTC in production as a way to provide remote desktop
         | access to data center computers in a browser. It has its ups
         | and downs.
        
         | other_herbert wrote:
         | I am.. my use case is a computer with two devices in the same
         | building and they share data over webrtc datachannels..
         | 
         | There are some quirks in getting all the signaling working but
         | there's now more of a standard to do that process the right
         | way...
         | 
         | Anyway since the devices are on the same network latency is
         | nearly 0...
         | 
         | I do have an ack and retry process ... I should add logging to
         | see how often that happens though
        
         | defied wrote:
         | We use it at https://testingbot.com to provide a realtime video
         | stream of remote desktops and mobile device screens. Mostly
         | with Pion (Go)
        
         | adalkiran wrote:
          | Hi, we used it with my team to develop a browser-based
          | video conferencing system at the company I used to work
          | for. You're right, WebRTC data channels may be unreliable,
          | but the nature of UDP is prone to packet loss, unordered
          | packets, etc. To solve these problems, the WebRTC standard
          | offers some error detection/correction/prevention
          | techniques. I used it in production for video/audio
          | transfer, but for data streaming (and also signaling) I
          | preferred WebSockets. The WebRTC standard requires some
          | quirks, which is the reason my project (WebRTC Nuts and
          | Bolts) was born.
        
         | Sean-Der wrote:
         | Yes! I have used it in production with these.
         | 
         | * FaceTime @ Apple https://support.apple.com/en-us/HT212619
         | 
         | * KVS and Chime @ AWS https://github.com/awslabs/amazon-
         | kinesis-video-streams-webr.... Lots of security cameras and
         | robots use it, not public though.
         | 
         | * Lightstream https://golightstream.com . Cloud compositing and
         | other magic.
         | 
         | I also have something I am working on now that isn't public yet
         | that is using WebRTC. Really excited to see what people build
         | with it/what it inspires next.
         | 
          | It is kind of amazing where you will find WebRTC: Stadia,
          | Boston Dynamics, Zoom, Meet, security systems, drones,
          | etc... It is probable that you use WebRTC in production
          | every day :)
        
           | thecleaner wrote:
            | I think Zoom doesn't use WebRTC; they have their own
            | decoder, transmission, and retry layer. If you try to
            | transmit raw frames directly, you end up with codec
            | issues, since frames have dependencies among them. E.g.
            | the VPx codecs cannot decode frames without a golden
            | frame or a key frame, so these need to be kept as state
            | for a decoder. Every other inter-frame needs to refer to
            | these reference frames, and there's usually no limit on
            | how many inter-frames one can have between reference
            | frames (a key frame or a golden frame).
        
             | Sean-Der wrote:
             | I don't work at Zoom/only have outsider information. It
             | looks like they are using Media over DataChannels[0]. So
             | they are still using WebRTC!
             | 
             | When Google announced WebCodecs/WebTransport they said Zoom
             | was involved, so maybe they will switch to that eventually?
             | 
             | [0] https://webrtchacks.com/zoom-avoids-using-webrtc/
        
               | thecleaner wrote:
                | Do data channels over WebRTC enforce different
                | semantics than, say, QUIC/HTTP2? I understand that
                | the difference between TCP and WebRTC-based
                | communication would be the application-level
                | transmission guarantee, but does it really differ
                | from UDP-based HTTP implementations?
        
               | kwindla wrote:
               | The biggest problem with sending media over data channels
               | is that there's no good way to do bandwidth estimation.
               | Data channels weren't designed to be used for media, and
               | the current WebRTC spec (and javascript implementation)
               | doesn't expose enough control of either the codec or the
               | network stack to implement real bandwidth estimation and
               | bandwidth control. This is presumably the main reason
               | that Zoom's in-browser implementation is so limited in
               | functionality.
               | 
               | There's a spec for RTP over QUIC [1]. It's really cool!
               | But obviously very early days.
               | 
               | [1] https://datatracker.ietf.org/doc/draft-engelbart-rtp-
               | over-qu...
        
         | pledess wrote:
         | In research by Stanford a few years ago, WebRTC was
         | substantially worse than some other communication systems if
         | the available data rate of the network connection varies in
         | certain ways (e.g., the available data rate becomes lower for
         | about ten seconds):
         | https://www.youtube.com/watch?v=nuI4F5akBIs&t=2571s
        
           | Sean-Der wrote:
            | This conflates implementation and protocol. You can use
            | whatever bandwidth estimator / congestion control
            | algorithm you want.
           | 
           | I think it is worth measuring how Google's implementation
           | works, but it is tuned for a very specific use case by a
           | single company.
        
             | keithwinstein wrote:
             | (Speaker in that video here)
             | 
             | You're right that this is measuring the WebRTC.org codebase
             | (as used in Chrome, Firefox, etc.), not necessarily the
             | WebRTC protocol. Better URL and demo/talk videos are here:
             | https://snr.stanford.edu/salsify
             | 
             | But the issue probably isn't with the bandwidth estimator
             | or congestion-control algorithm -- you probably can't fix
             | this by taking some WebRTC implementation and plugging in
             | better ones. The core issues as we see them are about the
             | architecture of the WebRTC.org codebase, and frankly all
             | WebRTC source/sink implementations we're aware of, in
             | particular:
             | 
             | (a) even with perfect bandwidth estimation, libvpx and
             | libx264 (and, we think, typical hardware encoders) as
             | configured are bad at achieving the requested bitrate over
             | short timescales, meaning that "overlarge" coded frames are
             | regularly being sent, and techniques like reference
             | invalidation or golden/altref encoding are never (?) used
             | to skip sending an overlarge coded frame or to retry
             | encoding the same frame at a lower quality before sending.
             | [At a technical level, the interface between the encoder
             | VBV buffer model and the bandwidth estimator/CC algorithm
             | is very inconvenient -- to have these two control loops
             | running independently, trying to do similar things at
             | similar timescales, isn't great.]
             | 
             | (b) loss recovery does not work well, and again features
             | like reference invalidation to recover quickly do not seem
             | to be well-used in practice (the second half of
             | https://youtu.be/jaDelb4JnP4 makes this pretty clear),
             | 
             | (c) the WebRTC.org codebase is so complex, with so many
             | modes that it can get settled in, that trying to reason
             | about these behaviors or explore them systematically is
             | quite challenging, and
             | 
             | (d) because there are several layers of buffering on the
             | receiver-side, and because the sender-side code will change
             | things like the camera's frame rate, it's hard to measure
             | application-level metrics [e.g. lens-to-display and
             | microphone-to-speaker latency] robustly in a deployment,
             | especially across a diverse hardware or OS base. (And it's
             | easy to get a false sense of security from the network-
             | level WebRTC metrics that are available.)
             | 
             | It's probably possible to produce a WebRTC source/sink
             | implementation that works better over challenged networks
             | and has good monitoring of application-level latency, but
             | it would be a big job afaik. Our work was partly funded by
             | Google, we had a high-up Google sponsor, we gave multiple
             | talks at Google, etc., but it was challenging even to find
             | "the people in charge" of this cross-modular stuff to talk
             | with them, because I think the codebase in some respects
             | mirrors the org chart. E.g. you have video compression
             | people worrying about the encoder (and wanting to be able
             | to plug in libvpx, libx264, and a bunch of hardware
             | encoders to the same interface), and networking people
             | worrying about the bandwidth estimator and CC algorithm,
             | and it's sort of way too late to say that the interfaces or
             | architecture needs to be refactored or that the complexity
             | has gotten out of control. To Google's credit, they have
             | since driven the industry to produce standardized APIs for
             | "functional" codecs, and functional decoder ASICs now exist
             | (not sure about encoders yet), so there is progress being
             | made on that front at least.
        
         | bryans wrote:
         | It may not meet the qualifications of "production," but about a
         | year ago I made an OBS.Ninja clone with some specifics for live
         | streamers, and I was extremely satisfied with the reliability
         | of WebRTC -- multiple hour streams with zero dropouts, and
         | without any fancy code to handle reconnects or adjusting for
         | lower bandwidth. It just kinda magically works. The browser
         | implementations are an absolute disaster, but if you can make
         | those limitations work for the project (or if you don't need to
         | use a browser at all), then I'd feel pretty confident using it
         | in production and at scale.
         | 
         | NewTek actually uses WebRTC for NDI's remote networking, and
         | while the NDI software itself is prone to crashing and probably
         | not usable for production, the connection to the remote system
         | is never an issue.
        
           | hrnn wrote:
           | Did you use homebrew stun/turn servers too?
           | 
            | I don't find the WebRTC signaling and setup particularly
            | noteworthy, but once you try to connect nodes on
            | different networks, you're pretty much dependent on some
            | third party.
        
             | bryans wrote:
             | Luckily there are a lot of cheap STUN/TURN services out
             | there, and if you really need something under your control,
             | there are containerized projects on GH that make it easy to
             | run your own. Though even when I used it as a Zoom
             | replacement for meetings, I never ran into a situation
             | where TURN was necessary, and that includes people behind
             | corporate firewalls. It seems as though corporate netops
             | learned some lessons during the pandemic and loosened
             | restrictions.
        
         | foxbarrington wrote:
         | I use it for Rambly.app. It works pretty well, but there are
         | definitely cases where it fails for some people.
        
         | fancy_pantser wrote:
         | Absolutely! https://www.daily.co/
        
       | sanjayts wrote:
       | Good stuff; approximately how long did this project take to
       | develop?
        
         | adalkiran wrote:
          | Hi, it took me approximately two months, including the
          | documentation, without a strict full-time schedule. But
          | before that, since the start of the pandemic, I had worked
          | on the development of a browser-based video conferencing
          | system (using open-source modules, in languages other than
          | Go), so I already had enough know-how about the domain (of
          | course not as deep as in this project).
        
       ___________________________________________________________________
       (page generated 2022-05-29 23:01 UTC)