[HN Gopher] How to build large-scale end-to-end encrypted group ...
___________________________________________________________________
How to build large-scale end-to-end encrypted group video calls
Author : jiripospisil
Score : 147 points
Date : 2021-12-15 20:06 UTC (2 hours ago)
(HTM) web link (signal.org)
(TXT) w3m dump (signal.org)
| johnisgood wrote:
| Great, now they should just stop using telephone numbers as
| identifiers.
| maxwell wrote:
| What do you suggest?
| sam_lowry_ wrote:
| A login+password, like in IRC.
| tptacek wrote:
| IRC tracks metadata serverside!
| johnisgood wrote:
| I do not think that OP was referring to implementing it
| the same, or even similar way, but to use a
| username/password pair. OP is free to correct me if I am
| wrong though.
| zamadatix wrote:
| Signal has had standard usernames on the roadmap for years.
| johnisgood wrote:
| Usernames work. You could even use UUIDs these days as QR is
| an increasingly common way of sharing data. But yeah,
| usernames would be a great improvement.
| tptacek wrote:
| Usernames _do not just work_. The Signal team is not
| unaware of usernames and Signal is not a weird scheme to
| get all your phone numbers. The difference between Signal
| and systems that use usernames (or email addresses) is that
| Signal deliberately doesn't operate a serverside directory
| or buddy list service. By contrast, other relatively
| popular messengers essentially keep a plaintext database of
| who talks to who on their service.
|
| What phone numbers allow Signal to do is to piggyback off
| the contact lists people already have on their devices.
| kitkat_new wrote:
| > is that Signal deliberately doesn't operate a
| serverside directory or buddy list service.
|
| how do people again discover each other on Signal most of
| the time?
|
| Anyways, nothing prevents Signal from creating its own
| contact list within the app, perhaps bootstrapped from
| the existing one
| tptacek wrote:
| They can do that, but then when you switch devices, you
| lose your contact list. That's not what happens with the
| built-in contact list.
|
| This issue has been rehashed dozens of times on HN before
| (use the search bar below) and has basically nothing to
| do with the article.
| kitkat_new wrote:
| actually, the contact list could include the signal
| identifiers
| stormbrew wrote:
| I mean, phone numbers also don't really "work." Do you
| know how many old phone numbers I have in my phone's
| contact list that aren't actually owned by the person
| they're listed on anymore? Using signal I get "Person You
| Knew 10 Years Ago Is On Signal!" notifications every now
| and then and.. yeah I can assure you that's not them.
|
| For example, I have literally 6 phone numbers in my phone
| for my sister because every time she job hops she ends up
| with a new number. I'm not even sure which one is
| actually her.
|
| Phone numbers are not permanent identities, any more than
| usernames or email addresses are. There's no single
| perfect answer to identity online and if there is, I'm
| sorry, it's not a number that can be changed, stolen,
| lost, etc.
| [deleted]
| remus wrote:
| I don't know what their threat model is but it's interesting that
| they don't seem too bothered about reducing metadata collection
| potential on the server. I bet you could put together some pretty
| interesting graphs of who is talking to who, how much they talk
| and when.
| tptacek wrote:
| Their messaging substrate is Signal itself, for whatever that's
| worth, so at least the signaling component of the system should
| inherit the guarantees Signal already makes. But it's a good
| question.
| Naac wrote:
| >> There is no off the shelf software that would allow us to
| support calls of that size while ensuring that all communication
| is end-to-end encrypted, so we built our own open source Signal
| Calling Service to do the job
|
| But wasn't there Jitsi? [0]
|
| I think it's great we have competition among Free Software
| projects so that both can improve. But sometimes I feel like
| maybe duplicated efforts create two 5/10 solutions. Instead what
| we really want is one 8/10 solution, or better.
|
| [0] https://meet.jit.si/
| estaseuropano wrote:
| While I love Jitsi, I don't think it is E2E?
| dest wrote:
| AFAIK it's E2E for 1:1 video chats, but not when more are
| there.
| bilal4hmed wrote:
| Jitsi does support e2ee for groups as well
| https://jitsi.org/e2ee-in-jitsi/
| [deleted]
| Naac wrote:
| AFAIK this _was_ a work in progress[0]. I am not sure what
| the status of this is now.
|
| [0] https://jitsi.org/blog/e2ee/
| jkepler wrote:
| I think Jitsi group calls can be end to end encrypted,
| provided all participants use Chromium 83, per
| https://jitsi.org/security/.
| Vinnl wrote:
| It's the first of the links where they say "When building
| support for group calls, we evaluated many open source SFUs",
| so I suppose it's either not one of the two with "adequate
| congestion control", or is the one that did not reliably scale
| past 8 participants?
| landstrom wrote:
| Daily.co has a developer friendly offering that accomplishes
| this as well. Many offerings available and many reasons to not
| take on this added complexity.
| jcelerier wrote:
| As much as I like Jitsi conceptually, it has consistently
| performed much more poorly than Zoom starting from 5/6 ppl
| skybrian wrote:
| There is some duplication of effort but sometimes progress
| happens via rewrites and that might actually be a faster way to
| an 8/10 system than direct collaboration?
|
| Also I think it's interesting to see how this builds on
| Google's work (the googcc algorithm). Which of course builds on
| previous open source work. The underlying technical
| collaboration happens even with quite different organizational
| goals and different codebases.
| [deleted]
| johnisgood wrote:
| There is also https://jami.net/. I have no clue how group video
| calls are implemented though. It seems like it is not an easy
| thing to do.
|
| https://wire.com/en/ seems to support it, too, although not
| exactly "large-scale". Audio calls allow for up to 100
| participants, for one.
| 1vuio0pswjnm7 wrote:
| "Full mesh: Each call participant sends its media (audio and
| video) directly to each other call participant. This works for
| very small calls, but does not scale to many participants. Most
| people just don't have an Internet connection fast enough to send
| 40 copies of their video at the same time.
|
| Server mixing: Each call participant sends its media to a server.
| The server "mixes" the media together and sends it to each
| participant. This works with many participants, but is not
| compatible with end-to-end encryption because it requires that
| the server be able to view and alter the media.
|
| Selective Forwarding: Each participant sends its media to a
| server. The server "forwards" the media to other participants
| without viewing or altering it. This works with many
| participants, and is compatible with end-to-end-encryption."
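The trade-off between the three topologies quoted above comes down to how many streams each participant has to send and receive. Below is a rough Python sketch of that arithmetic. It assumes every participant publishes exactly one stream, and that a selective forwarding server relays every other participant's stream (an upper bound; a real server may forward fewer or smaller streams). The names `Load` and `per_participant_load` are illustrative and do not come from Signal's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """Streams one participant uploads and downloads at the same time."""
    up_streams: int
    down_streams: int


def per_participant_load(topology: str, participants: int) -> Load:
    """Estimate per-participant stream counts for a call topology.

    Assumption: one published stream per participant, no simulcast layers.
    """
    others = max(participants - 1, 0)
    one_if_anyone = min(1, others)  # nothing to send in a call of one

    if topology == "full_mesh":
        # One copy of our media to every other participant, and one back.
        return Load(up_streams=others, down_streams=others)
    if topology == "server_mixing":
        # One stream up, one server-mixed stream down.
        return Load(up_streams=one_if_anyone, down_streams=one_if_anyone)
    if topology == "selective_forwarding":
        # One stream up; the server forwards up to one stream per other
        # participant (upper bound).
        return Load(up_streams=one_if_anyone, down_streams=others)
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for name in ("full_mesh", "server_mixing", "selective_forwarding"):
        print(name, per_participant_load(name, 40))
```

Under these assumptions, a 40-person full mesh call needs 39 simultaneous uploads per participant, while selective forwarding needs one upload regardless of call size.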
|
| Imagine an end user who is interested in "very small calls" with
| friends and family. She is not interested in communicating to an
| infinitely large audience ("broadcasting"). She never has group
| calls on Signal with 40 people. We have to use our imagination
| because this user does not actually exist.
|
| The imaginary user reads this blog post and she thinks to herself
| "Full mesh sounds like the best design. There is less/no reliance
| on a third party, traffic does not need to be sent to a third
| party server." With full mesh, there is no need to mention the
| caveat "without viewing or altering it" (or selectively choosing
| not to forward it to certain recipients). Full mesh seems to give
| the user the most control and require the least dependence on
| third party servers (not necessarily none, but the least).
|
| Then she reads this line: "Because Signal must have end-to-end
| encryption _and scale to many participants_, we use selective
| forwarding."
|
| The make-believe user wonders "Why must Signal scale to many
| participants." For this user, "scal[ing] to many participants"
| appears to be an artificial constraint. She has no such need.
| "Perhaps Signal is not designed for users like me. Maybe Signal
| is trying to compete with Facebook, TikTok, Zoom, etc. Signal is
| supposedly non-commercial and should be free from such pressures
| to compete. Does this mean that if I make a call to two people,
| the traffic has to be sent to third party servers so they can
| "forward" the audio/video to the appropriate recipients."
|
| "Why can't I be the one to choose at run-time whether full mesh
| or selective forwarding is used."
|
| Finally she comes to her senses. "This blog post was not written
| for me. It seems to be a form of show and tell by the people
| working at Signal, not a bidirectional dialogue with Signal users."
| prophesi wrote:
| Just an FYI full mesh would still require communicating with a
| third-party server, at the very least for initial networking
| when joining/leaving a group call.
|
| The whole point of E2E encryption is so that passing data
| through a third party shouldn't matter in the first place.
|
| And lastly, even when you have just a 1:1 video chat, sending
| and receiving full resolution/quality multimedia can still be
| way too much for some people's internet connections. UX is
| extremely important for Signal, as unreliable video chat is a
| surefire way for those less caring about privacy to hop back
| over to a privacy-violating alternative.
|
| I feel sorry for those working on bringing security/privacy to
| everyone, as they have to appease power users and privacy
| absolutists, along with one's grandmother and the TikTok
| generation.
| sneak wrote:
| They have the bandwidth for relaying video streams to 40 people
| but won't let me send full res jpegs in 1:1 messages?
|
| And no, I can't just rebuild my client, because I'm on iOS and
| non-official builds won't receive push notifications from the
| official developers.
| Vinnl wrote:
| That's not really related to this article, but I can select
| photo quality if I send a photo on Android. Appears to have
| been added in May.
| sneak wrote:
| The article specifically mentions that they operate the
| infrastructure for relaying encrypted video streams for up to
| 40 participants.
|
| I can also select media quality on iOS. My options are
| "compressed way too much" and "compressed too much". I assume
| you have the same options.
|
| I would like to be able to attach images as files and have
| them come through unmodified. It is a general purpose
| communications tool, it should not be editorializing over my
| attachments.
|
| I use Signal to communicate privately with my attorney. Why
| does anyone think tampering with evidence in transit is okay?
|
| Apple also doesn't support open source in the App Store, so I
| can't fix the problem myself.
| wyager wrote:
| How does signal get money to cover costs of running compute-
| intensive services?
| sandstrom wrote:
| They recently added support for in-app donations:
| https://www.theverge.com/2021/12/2/22814934/signal-launches-...
|
| I hope they'll take it a step further and require payment for
| certain functionality (maybe video calls?, or desktop client
| support?).
| keewee7 wrote:
| One of the WhatsApp founders, Brian Acton, donated $100
| million to them as an unsecured loan due to be repaid in 2068:
|
| https://en.wikipedia.org/wiki/Brian_Acton#Signal
|
| https://en.wikipedia.org/wiki/Signal_(software)#Developers_a...
| sorenjan wrote:
| How long does that last? Telegram uses a few hundred million
| dollars each year, although they are significantly larger.
|
| > As Telegram approaches 500 million active users, many of
| you are asking the question - who is going to pay to support
| this growth? After all, more users mean more expenses for
| traffic and servers. A project of our size needs at least a
| few hundred million dollars per year to keep going.
|
| https://t.me/durov/142
| new_stranger wrote:
| > needs at least a few hundred million dollars per year to
| keep going
|
| I'm pretty sure that is not server cost. This is probably
| the standard approach of companies hiring tons of personnel
| and spending tens of thousands or hundreds of thousands on
| ads every single day.
| benlivengood wrote:
| To scale to thousands (is this even useful?) of e2e users build a
| tree of participants who can remix each other's video.
|
| Pick a handy mixing ratio like 4:1 or 9:1 (a square helps, since
| they compose nicely if downscaled to a grid vs. active talker
| stays fullscreen) and nodes with the highest bandwidth and lowest
| latency take M-1 streams and add it to their own to make an M:1
| mix which can be forwarded to a node closer to the root which
| produces another M:1 stream, and the root sends a single mixed
| stream down the tree until every participant has the mix. Max
| bandwidth at each node is M down and M up. Minimal spanning tree
| with max M edges per node recomputed as participants leave and
| join. Build 3 or 4 distinct trees and leave the connections open
| for more rapid switching if intermediate nodes stop
| participating.
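The tree layout described above can be sketched in a few lines of Python. This is a simplification under stated assumptions: participants arrive already sorted from best to worst connection (a hypothetical ranking), and the tree is a complete tree with M-1 children per node rather than a true latency-aware spanning tree. The function names `build_mix_tree` and `tree_depth` are illustrative.

```python
from typing import Dict, List, Optional


def build_mix_tree(ranked: List[str], m: int) -> Dict[str, Optional[str]]:
    """Assign each participant a parent in an M:1 mixing tree.

    `ranked` is assumed to be sorted best connection first, so the
    strongest nodes end up closest to the root. Each node mixes its own
    stream with up to M-1 child streams. The root's parent is None.
    """
    if m < 2:
        raise ValueError("mixing ratio must be at least 2")
    fan_in = m - 1
    parents: Dict[str, Optional[str]] = {}
    for index, name in enumerate(ranked):
        # Standard array layout of a complete k-ary tree.
        parents[name] = None if index == 0 else ranked[(index - 1) // fan_in]
    return parents


def tree_depth(participants: int, m: int) -> int:
    """Number of mixing hops from the deepest participant to the root."""
    if m < 2:
        raise ValueError("mixing ratio must be at least 2")
    fan_in = m - 1
    depth = 0
    index = participants - 1  # position of the last participant
    while index > 0:
        index = (index - 1) // fan_in
        depth += 1
    return depth


if __name__ == "__main__":
    names = [f"p{i}" for i in range(13)]
    print(build_mix_tree(names, m=4))
    print(tree_depth(1000, m=4))
```

With a 4:1 mix, 1000 participants fit in a tree six hops deep in this layout, so every extra level adds mixing latency. Rebuilding the assignment when participants join or leave amounts to calling `build_mix_tree` again on the updated ranking.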
| JoeAltmaier wrote:
| Oh this all brings back memories, of Sococo in the 2000's. We
| faced all these problems and had similar solutions to them all.
|
| We even had a rapidly adapting network make-and-break recovery
| layer. You unplug your laptop from a wired connection, switch to
| wireless - we recovered in milliseconds. You heard barely a
| click.
|
| The encryption issue is fun - we had a rotate-key message in-
| band. The receiver loaded new keys and tried them in sequence to
| ease the turnover time - out-of-order packets etc could make it
| ambiguous for a short while which key to use. A cache and aging
| keys out made it work pretty well.
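A minimal Python sketch of the try-keys-in-sequence idea described above. It is not Sococo's implementation, which is not shown in this thread. As a stand-in for authenticated decryption it checks an HMAC-SHA256 tag against each cached key, newest first, and ages out the oldest key once the cache is full. The names `KeyRing`, `rotate`, `find_key`, and `seal` are hypothetical.

```python
import hashlib
import hmac
from collections import OrderedDict
from typing import Optional


def seal(key: bytes, payload: bytes) -> bytes:
    """Sender side: compute the tag that authenticates a packet."""
    return hmac.new(key, payload, hashlib.sha256).digest()


class KeyRing:
    """Small cache of recent keys, so late packets can still be verified."""

    def __init__(self, max_keys: int = 3) -> None:
        self._keys: "OrderedDict[int, bytes]" = OrderedDict()  # newest last
        self._max_keys = max_keys

    def rotate(self, key_id: int, key: bytes) -> None:
        """Handle an in-band rotate-key message by caching the new key."""
        self._keys[key_id] = key
        self._keys.move_to_end(key_id)
        while len(self._keys) > self._max_keys:
            self._keys.popitem(last=False)  # age out the oldest key

    def find_key(self, payload: bytes, tag: bytes) -> Optional[int]:
        """Try cached keys newest first; return the matching key id."""
        for key_id in reversed(self._keys):
            expected = seal(self._keys[key_id], payload)
            if hmac.compare_digest(expected, tag):
                return key_id
        return None  # no cached key matches; drop the packet


if __name__ == "__main__":
    ring = KeyRing(max_keys=2)
    ring.rotate(1, b"old-key")
    ring.rotate(2, b"new-key")
    # An out-of-order packet sealed with the previous key still verifies.
    print(ring.find_key(b"frame", seal(b"old-key", b"frame")))
```

Keeping the cache small bounds both the per-packet cost of trying keys and the window in which a retired key is still accepted.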
|
| Remixing on user stations proved to be problematic (mentioned
| elsewhere on this thread). You'd think if 6 people at one site
| were conferencing with a dozen elsewhere, you could elect one at
| each site to mix-and-forward. But corporate networks made it hard
| to determine who was 'adjacent' - they were often layered and
| without UPnP (is that what the router protocol is called?) you
| couldn't tell if somebody at the next desk was even in your
| company.
|
| We had up to 100 people in a conference, and our enter-the-
| conference time was on the order of 100ms. Click into an all-
| hands, and be able to hear everybody before your finger left the
| mouse button. It was wonderful.
|
| Sococo today is a sad shadow of that. They went open-source and
| lost all our IP instantly. Just another WebRTC client last I
| knew.
| narush wrote:
| > They went open-source and lost all our IP instantly.
|
| Can you explain what this means? Like - other people copied
| your work?
|
| Genuinely wondering, OSS noob here...
| JoeAltmaier wrote:
| There was little or nothing in WebRTC to match what we'd
| spent 5 years creating. So they were back to 1-5 people in a
| conference, with 1-3 second connect times, and no resilience
| to network changes.
|
| The excuse they gave was "We can't rely on 6 people in Iowa
| for our core IP". So they switched to some open source mix
| node that was the pet project of 2 guys in Italy. Two
| academics, who gave it hardly any attention. And it had zero
| IP; just a collection of APIs stitched together to give you
| the impression of having a mix node.
|
| We said all that at the time. But such was the power of the
| magic words "Open Source" that it all bounced off their
| mental shields.
| BitPirate wrote:
| Are there any plans to add VP9 support?
| kitkat_new wrote:
| Next step: decentralizing encrypted group calls [0]
|
| [0]: https://2021.commcon.xyz/talks/extending-matrix-s-e2ee-
| calls...
___________________________________________________________________
(page generated 2021-12-15 23:00 UTC)