[HN Gopher] Scuttlebutt Protocol Guide
       ___________________________________________________________________
        
       Scuttlebutt Protocol Guide
        
       Author : tosh
       Score  : 167 points
       Date   : 2021-12-24 10:51 UTC (12 hours ago)
        
 (HTM) web link (ssbc.github.io)
 (TXT) w3m dump (ssbc.github.io)
        
       | formerly_proven wrote:
       | > Before signing a message it must be serialized according to a
       | specific canonical JSON format. This means for any given message
       | there is exactly one way to serialize it as a sequence of bytes,
       | which is necessary for signature verification to work.
       | 
       | Design mistake. Don't sign abstract messages, sign bags of bytes.
       | Doing otherwise means (1) have to parse messages completely, even
       | if their signatures are invalid (2) requires canonical
       | representations, a major PITA and source of bugs (3) is overall
       | uglier to implement.
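       |
       | A minimal sketch of the sign-the-bytes approach, assuming
       | HMAC-SHA256 from the Python standard library as a stand-in for a
       | real public-key signature (SSB itself uses Ed25519). The names
       | KEY, seal and open_sealed are made up for illustration. The tag
       | covers the exact payload bytes, so the receiver can reject a bad
       | message before any parser sees it:

```python
import hashlib
import hmac
import json

# Stand-in for a real signature scheme: HMAC-SHA256 with a shared key.
# This is an assumption for the sketch, chosen because it ships with Python.
KEY = b"demo-key-not-a-real-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: bytes) -> bytes:
    """Prefix the payload with a tag computed over its exact bytes."""
    return hmac.new(KEY, payload, hashlib.sha256).digest() + payload


def open_sealed(envelope: bytes) -> dict:
    """Check the tag first; parse the payload only if the tag is valid."""
    tag, payload = envelope[:TAG_LEN], envelope[TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; payload was never parsed")
    return json.loads(payload)


# Any serialization works, because the bytes are signed exactly as sent.
envelope = seal(b'{ "type": "post",\n  "text": "hello" }')
message = open_sealed(envelope)
```

       | No canonical form is needed here: whatever bytes were signed are
       | the bytes that get verified, whitespace and all.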
        
         | Rygian wrote:
         | On the contrary, lack of canonical format means plenty of room
         | for implementations to fiddle with the format and introduce
         | bugs and incompatibilities. The same bag of bytes ends up
         | parsed as two different things on two different systems, which
         | leads to SSRF-like vulnerabilities.
         | 
         | Throw away anything at the first byte that breaks canonical
         | representation, don't bother verifying its signature.
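         |
         | A rough sketch of strict canonical acceptance, assuming one
         | possible canonical form (sorted keys, no whitespace, UTF-8) that
         | is not the format SSB uses. The helpers canonical and
         | parse_strict are hypothetical. Note that this version parses the
         | whole input and then compares, rather than stopping at the first
         | offending byte:

```python
import json


def canonical(value) -> bytes:
    """One possible canonical form: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def parse_strict(raw: bytes):
    """Parse raw bytes, rejecting anything not already in canonical form."""
    value = json.loads(raw)
    if canonical(value) != raw:
        raise ValueError("not in canonical form")
    return value


accepted = parse_strict(b'{"a":1,"b":[true,null]}')
```

         | Reordered keys, extra whitespace and duplicate keys all fail the
         | round-trip comparison, so two systems cannot disagree about
         | what an accepted message means.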
        
           | yencabulator wrote:
           | If you do that, then there's no point in using JSON -- you
           | won't be able to use preexisting JSON parsers or writers
           | anyway.
           | 
           | Meanwhile, the sign-the-bytes crowd can use any preexisting
           | format.
        
           | mlyle wrote:
           | > Throw away anything at the first byte that breaks canonical
           | representation, don't bother verifying its signature.
           | 
           | IMO, the opposite order is better because it means you verify
           | that the message is sent by someone you minimally trust
           | before exposing parsers, etc, to attack, and someone
           | attempting attacks must sign the messages. Here, getting an
           | identity is so easy that it doesn't make much of a
           | difference.
           | 
           | As far as attack surface, the cryptographic implementation is
           | hopefully smaller than a higher level parser. The performance
           | implications are unclear (both parsing and signature
           | verification can be expensive).
        
         | legutierr wrote:
         | > Don't sign abstract messages, sign bags of bytes.
         | 
         | Isn't the relevant question, "what are the bytes that are being
         | signed?" That's what the specification seems to be answering
         | here.
         | 
         | I believe it's implicit that the signature is actually of a
         | hash of the message itself, which is why they need to serialize
         | the message in a deterministic fashion, because otherwise the
         | hash would be different in different circumstances for the same
         | message. If you want to validate the signature, you need to
         | make sure that the hash being signed is of the same exact
         | string that you are verifying.
         | 
         | > requires canonical representations, a major PITA and source
         | of bugs
         | 
         | Yes, absolutely, especially with regards to JSON, which doesn't
         | have a fully deterministic serialization standard. The
         | scuttlebutt specification seems to assume that JSON.stringify()
         | will produce a consistent output, but that is not really the
         | case, per my understanding, at least not with regards to
         | objects.
         | 
         | From the Scuttlebutt specification:
         | 
         | > The canonical format is defined by the ECMA-262 6th Edition
         | section JSON.stringify. For an example, see how the above
         | message is formatted.
         | 
         | JSON object serialization in Javascript depends on the
         | insertion order of the members of an object, so if you somehow
         | change the ordering in which your keys are updated as the
         | object is built, you will get a different string output, even
         | if the data of the underlying object is the same. And this is
         | just within the JavaScript runtime--if you are implementing
         | this using another programming language, you can't just rely on
         | the ECMA standard to determine the ordering of your object
         | members at the time of serialization.
         | 
         | If I recall properly, JSON serialization is ambiguous in the
         | specification in a variety of ways, which results in different
         | implementations outputting subtly different strings, even under
         | ordinary circumstances.
         | 
         | It seems weird that you would write a formal, generalized
         | protocol specification relying on the idiosyncratic
         | implementation details of JavaScript, for such an important
         | thing as cryptographic signatures, as the Scuttlebutt
         | specification seems to do here.
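         |
         | The insertion-order problem is easy to reproduce outside
         | JavaScript. A small sketch in Python, whose json.dumps also
         | follows dict insertion order (the field names are made up):

```python
import json

first = {"author": "@alice", "sequence": 1}
second = {"sequence": 1, "author": "@alice"}

# The two dicts are equal as data...
same_data = first == second

# ...but they serialize in insertion order, so the bytes differ.
same_bytes = json.dumps(first) == json.dumps(second)

# Sorting keys removes this one source of variation.
same_sorted = (
    json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
)
```

         | Here same_data and same_sorted are True while same_bytes is
         | False. Key order is only one source of ambiguity; number
         | formatting and string escaping need pinning down as well.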
        
           | progval wrote:
           | Matrix does it like this as well. Servers have to serialize
           | an object as JSON (specified as Python code), then add the
           | signature to the object and serialize it again to send it on
           | the network.
           | https://spec.matrix.org/v1.1/appendices/#signing-json
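           |
           | A simplified sketch of that two-pass flow, loosely modeled
           | on the Matrix appendix. Assumptions: HMAC-SHA256 stands in
           | for Ed25519, and the signatures field is flattened to one
           | level (Matrix nests it by server name and key ID). The
           | function names are illustrative:

```python
import base64
import hashlib
import hmac
import json


def canonical_json(value) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8 output."""
    return json.dumps(
        value, ensure_ascii=False, separators=(",", ":"), sort_keys=True
    ).encode("utf-8")


def sign_json(obj: dict, key: bytes, signer: str) -> dict:
    """Return a copy of obj with a tag over its canonical form added."""
    to_sign = {
        k: v for k, v in obj.items() if k not in ("signatures", "unsigned")
    }
    # First serialization: canonical bytes, used only as signing input.
    tag = hmac.new(key, canonical_json(to_sign), hashlib.sha256).digest()
    signed = dict(obj)
    signatures = dict(signed.get("signatures", {}))
    signatures[signer] = base64.b64encode(tag).decode("ascii")
    signed["signatures"] = signatures
    return signed


signed = sign_json({"b": 2, "a": 1}, b"demo-key", "example.org")
# Second serialization: the signed object as sent on the network.
wire = json.dumps(signed)
```

           | The wire encoding does not need to be canonical, because a
           | verifier strips the signatures and re-derives the canonical
           | bytes itself.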
        
           | EGreg wrote:
           | If you need determinism, I recommend avoiding objects and
           | using pairs of arrays, plus they can generalize to matrices.
           | Just use the same index in all.
        
           | sbazerque wrote:
           | Exactly this. You need deterministic serialization, because
           | you need to be sure that when the _same_ object is
           | constructed in different settings, it is going to hash
           | consistently. In Hyper Hyper Space [1], the set of basic
           | types as well as the composition primitives used to construct
           | all data structures have built-in deterministic
           | serialization, just for this reason (e.g. a set will
           | serialize into a deterministically ordered list, etc.)
           | 
           | [1] https://www.hyperhyperspace.org
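           |
           | To illustrate the idea (this is not Hyper Hyper Space's
           | actual format; serialize and content_hash are hypothetical
           | helpers), a set can be serialized as a sorted list so that
           | equal sets always hash the same:

```python
import hashlib
import json


def serialize(value) -> str:
    """Serialize nested dicts, lists and sets deterministically."""
    if isinstance(value, (set, frozenset)):
        # Sets have no inherent order, so sort the serialized elements.
        return "[" + ",".join(sorted(serialize(v) for v in value)) + "]"
    if isinstance(value, dict):
        # Assumes string keys; entries are sorted by their serialized form.
        items = sorted(
            json.dumps(k) + ":" + serialize(v) for k, v in value.items()
        )
        return "{" + ",".join(items) + "}"
    if isinstance(value, (list, tuple)):
        # Lists keep their order, because order is part of their meaning.
        return "[" + ",".join(serialize(v) for v in value) + "]"
    return json.dumps(value)


def content_hash(value) -> str:
    """Hash the deterministic serialization of a value."""
    return hashlib.sha256(serialize(value).encode("utf-8")).hexdigest()
```

           | A real format would also tag each type, so that a set and a
           | list with the same elements cannot collide.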
        
             | formerly_proven wrote:
             | > Exactly this. You need deterministic serialization,
             | because you need to be sure that when the _same_ object is
             | constructed in different settings, it is going to hash
             | consistently.
             | 
             | I can see how this might matter in some specific systems,
             | but when we're talking about signatures only the signer
             | constructs the object. Stuff like the "JWS/CT using JWS and
             | JSON Canonicalization" recommended in a sibling comment is
             | a complete misdesign for virtually all signing use cases.
             | That's why "our signature scheme _requires_ canonical
             | representations" is a red flag.
        
               | sbazerque wrote:
               | But "the signer" here is a cryptographic identity, that
               | may be present in more than one device. So, even when
               | conceptually it is just one entity, in practice it may be
               | several computers doing something independently, and one
               | may need the result to be the same given identical
               | inputs.
        
           | cel wrote:
           | > It seems weird that you would write a formal, generalized
           | protocol specification relying on the idiosyncratic
           | implementation details of JavaScript, for such an important
           | thing as cryptographic signatures, as the Scuttlebutt
           | specification seems to do here.
           | 
           | The Protocol Guide was created after the initial
           | implementation and its protocol were already in wide use, and
           | the quirks were discovered while re-implementing it. More
           | info about implementations here (Node.js, Go, Rust x2, and
           | Python; additionally there are implementations of varying
           | states in Java, C, Haskell, Erlang and probably others):
           | https://dev.scuttlebutt.nz/#/?id=implementations
           | 
           | ---
           | 
           | If making a new protocol using signatures over JSON objects,
           | one might use this: JWS Clear Text JSON Signature Option
           | (JWS/CT)
           | https://datatracker.ietf.org/doc/html/draft-jordan-jws-ct
           | JWS/CT uses JSON Web Signature (JWS) [RFC7515], JSON
           | Canonicalization Scheme (JCS) [RFC8785], and I-JSON [RFC7493]
           | (subset of JSON for interoperability).
           | 
           | Or for signatures over JSON-LD objects / RDF datasets:
           | https://w3c-ccg.github.io/ld-proofs/
           | https://json-ld.github.io/rdf-dataset-canonicalization/spec/
        
         | crypt0x wrote:
         | Glad you picked up on it. It's not on the protocol guide yet
         | but there are two or three new formats in discussion which all
         | just sign opaque bytes. Wider adoption pending but the meta
         | feeds is in JS already.
         | 
         | Here is the most recent one and it links to the other two as
         | well: https://github.com/ssb-ngi-pointer/bendy-butt-spec
        
           | crypt0x wrote:
           | ps: I did implement the v8 pretty-printer in go and it was a
           | nightmare.. I'm sure it still has some corner cases that are
           | not covered....
        
       | joshuakelly wrote:
       | SSB has a special place in my heart. I wrote this a year ago on
       | HN and it seems truer than ever:
       | 
       | > There's no global timeline -- just archipelagos. It assumes
       | that network heterogeny is the default, and is transmission layer
       | agnostic. Breakages will occur. Maybe you're living on a
       | catamaran in the South Pacific and you only have connectivity
       | once a month -- SSB will work even then.
       | 
       | > Your own timeline is a sigchain -- a sequenced list of signed
       | messages. You replicate the content in your network (2 hops
       | away). Bridges between communities can be built or burned. Many
       | islands can exist without needing to erase the others from even
       | existing -- mutual separation is possible. Consensus is not
       | necessary. #againstconsensus
       | 
       | > Is global network culture still possible? If it is, in the
       | midst of the national internets we now live inside of, I suspect
       | it will look something like this. A little different from what we
       | were promised, but maybe a little better too.
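       |
       | A toy version of such a chain, as a sketch only: signatures are
       | left out, SHA-256 over sorted-key JSON stands in for SSB's real
       | message IDs, and the field and function names are assumptions.
       | Each message records its sequence number and the hash of the
       | previous message, with None for the first:

```python
import hashlib
import json


def message_id(msg: dict) -> str:
    """Hash a message; sorted-key JSON is an assumption of this sketch."""
    data = json.dumps(msg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append(feed: list, content: dict) -> None:
    """Add a message that links back to the current end of the feed."""
    previous = message_id(feed[-1]) if feed else None
    feed.append(
        {"previous": previous, "sequence": len(feed) + 1, "content": content}
    )


def is_valid(feed: list) -> bool:
    """Check that sequence numbers and back-links form one chain."""
    previous = None
    for index, msg in enumerate(feed):
        if msg["sequence"] != index + 1 or msg["previous"] != previous:
            return False
        previous = message_id(msg)
    return True


feed = []
for text in ("first", "second", "third"):
    append(feed, {"type": "post", "text": text})
```

       | Editing an earlier message changes its hash and breaks the link
       | from the next one. Detecting a forged final message is the job
       | of the per-message signature, which this sketch omits.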
        
       | southerntofu wrote:
       | Gossip protocols are amazing! I wish there was less of a gap
       | between federated and p2p protocols: SSB has "pubs" which are like
       | "centralized" places of gossip (tongues untie after a beer!) but
       | the other way around is really uncommon (federated protocols
       | supporting gossip when your server is unreachable).
       | 
       | The only problem with the gossip-first architecture is that every
       | message needs to be public which is not a tradeoff everybody's
       | willing to make, if only because you need a lot of storage space
       | for everyone's blabber :)
       | 
       | On a more technical level, SSB protocol is really nice but part
       | of me really hates that signatures are inlined in the JSON
       | message. This means you have to do two levels of
       | (de)serialization every message?! It would be more
       | resource-efficient to use a special prefix/suffix for signatures,
       | like so:
       |
       |     \EOFsigtype:signature\EOF
       |     {
       |        ...
       |     }
        
         | soapdog wrote:
         | Pubs are being phased out in favour of rooms[1]. You can
         | totally have SSB work without pubs and rooms, it just takes
         | longer for you to see some messages depending on how far away
         | you are from the gossip. Rooms are much simpler than pubs, they
         | just provide a tunnel for peers to gossip as if they were on
         | LAN. It is more of a convenience than a requirement.
         | 
         | > every message needs to be public
         | 
         | Not at all. SSB has multiple types of private messages. They
         | can only be decrypted by the intended recipients since the key
         | used to encrypt it is a derivation of a combination of the keys
         | from the sender and the recipients [2]. There is even a new
         | spec for private groups[3].
         | 
         | [1]: https://ssb-ngi-pointer.github.io/rooms2
         | [2]: https://ssbc.github.io/scuttlebutt-protocol-guide/#private-m...
         | [3]: https://github.com/ssbc/ssb-tribes/
        
           | southerntofu wrote:
           | > Not at all. SSB has multiple types of private messages.
           | 
           | I didn't mean that everything has to be plaintext, but has to
           | be public in order for gossip to work. Or did i miss
           | something? It's fine if metadata aggregation is not a concern
           | of yours, and if you're rather certain your encryption
           | algorithms won't be broken in the next decades.
           | 
           | That rooms2 proposal looks very interesting (didn't read it
           | all), but i'm curious if there's a state-of-the-art review
           | you can link me to. I just don't understand the necessity to
           | develop yet another protocol if the goal is just to break
           | through NAT?
        
             | soapdog wrote:
             | I understand your concerns regarding metadata aggregation.
             | SSB was not designed with those considerations. I think
             | that with those concerns as a constraint you kinda need a
             | different protocol.
             | 
             | I think that rooms2 required a new protocol because of the
             | handshake required for two peers to connect. I'm not an
             | expert on protocol stuff, I write one of the clients but
             | I'm usually working on higher levels than protocol work. I
             | suspect that rooms2 is not just NAT breakage but it has
             | some of the handshake and security involved in it.
             | 
             | If you want to dig deeper, I suggest asking for links on
             | their rooms 2.0 repo. I'm not aware of a review like that,
             | but I'm usually working with other stuff. I know there were
             | some papers published recently, maybe those will appeal to
             | you. :-)
        
           | olah_1 wrote:
           | are there any ideas to introduce rotating keys?
           | 
           | i dont like that the feeds are permanent and if you leak a
           | key, people see the whole history.
        
       | tgsovlerkhgsel wrote:
       | Scuttlebutt sounded like exactly something I've always wanted,
       | but I got a vague feeling of something about the project behind
       | it being weird, disorganized, esoteric - basically a gut feeling
       | making me worried about the protocol and whether it's soundly
       | designed - and most of the documentation at that time seemed to
       | focus on philosophy rather than the intricacies of the protocol.
       | 
       | The 35C3 talk about it
       | (https://media.ccc.de/v/35c3-9635-scuttlebutt) turned me off from
       | it completely.
       | 
       | I'm glad to see a concise, clean technical spec. Does the
       | protocol have meaningful adoption anywhere?
        
       | pabs3 wrote:
       | I enjoyed the interview with Joey Hess about Scuttlebutt:
       | 
       | https://librelounge.org/episodes/episode-14-secure-scuttlebu...
       | 
       | Sounds like they made some interesting choices on the protocol.
        
         | tgsovlerkhgsel wrote:
         | Would you be willing to share a quick summary?
        
       | gardnr wrote:
       | I just updated the dependencies on this project. It helps find
       | vanity keys that start with a particular string. E.g.
       | @gardner1/lu4h
       | 
       | https://github.com/gardner/vanityssb
        
       | armchairhacker wrote:
       | Scuttlebutt is a very interesting protocol, but I wonder how much
       | benefit you actually get from decentralization. You could
       | implement something similar with a central server and clients
       | sending their locations. You could even implement E2E encryption
       | for privacy.
       | 
       | Lots of people argue for decentralized protocols (e.g.
       | blockchain), but in practice centralization is usually fine. Just
       | because you have a trusted central server, it doesn't have to be
       | expensive and anti-privacy like Google or Facebook.
        
         | soapdog wrote:
         | > I wonder how much benefit you actually get from
         | decentralization.
         | 
         | I'm active in the SSB community. These are just some anecdotes
         | that might amuse you and maybe help you glimpse how I see the
         | protocol and ecosystem.
         | 
         | I was on a transatlantic flight from Brazil to Paris. I had SSB
         | on my computer and that gave me access to multiple years worth
         | of messages by my friends and their friends. I spent time
         | learning all sorts of cool stuff, and reading amazing convos. I
         | could reply, like, interact with all of it, even without a
         | network connection. Once I landed and my computer found a
         | connection again, it gossiped all my changes.
         | 
         | The Internet was not working well at a conference. All the
         | decentralization workshops and talks were suffering because of
         | it. The local venue firewall was preventing them from reaching
         | their DHTs and known services. We didn't notice any of it as
         | our machines were gossiping locally. Our workshop just moved
         | on.
         | 
         | Wanted to onboard a friend on SSB. We were at a beach house
         | without decent internet. I was using macOS, he was using Linux.
         | Some other people were using SSB as storage for NPM and Git
         | artefacts. I managed to get the source from a client from the
         | SSB feed, copied over to him, and onboarded him by doing git
         | clone, npm install, all from SSB.
         | 
         | A machine of mine had broken down -- my fault really, I learned
         | the hard way that renaming the single admin user on Windows is
         | not an easy task -- and I ended up having to reinstall
         | everything. I reinstalled the SSB client, and copied over my
         | keys. It restored all my data and feed by asking my friends for
         | it.
         | 
         | > in practice centralization is usually fine
         | 
         | For many cases, yes. Centralization doesn't mean expensive or
         | anti-privacy. I agree with you there. I don't think it is a
         | binary situation. In many cases, centralization is the way to
         | go.
         | 
         | I do enjoy decentralization though, especially when it doesn't
         | rely on blockchains and cryptocurrency. I want my
         | decentralization without financial incentives and cheap to
         | compute.
        
           | folex wrote:
           | how to use NPM through SSB? is there a doc or guide?
        
           | honungsburk wrote:
           | Often centralization is chosen specifically because it is
           | easier to monetize. I love FOSS just as much as the next guy
           | but it has its downsides too... Just look at the log4j bug and
           | all the crap those developers get for work they've done in
           | their free time. Sorry, but I get triggered any time anyone
           | says they want software without paying for it.
        
             | mlyle wrote:
             | Some things are products. Some things are bits of
             | infrastructure and agreed standards that products can run
             | on top of, that can't necessarily be directly monetized and
             | may be relatively open. The world is better for having
             | both.
             | 
             | Something like a gossip protocol is definitely more like
             | one of those infrastructure pieces in today's world.
        
           | joek1301 wrote:
           | do you have any advice for "bootstrapping" your SSB
           | experience as a new user? The original post inspired me to go
           | download Patchwork and join a public pub, but I'm having
           | difficulty finding real interesting conversations to join in
           | on.
        
       | zackmorris wrote:
       | Anyone know why they used UTF-16 for the message type, instead of
       | UTF-8?
       | 
       | Also, I'm skeptical that two peers can exchange keys without a
       | third party, without being susceptible to man-in-the-middle
       | attacks. But maybe the paper linked in the article proves that
       | it's possible?
       | 
       | https://dominictarr.github.io/secret-handshake-paper/shs.pdf
       | 
       | When I was playing with p2p 20 years ago, I found that NAT
       | traversal was far harder than the messaging protocol (I just
       | copied a few commands from IRC at the time). So hard in fact,
       | that I failed to solve it after 2 years of effort. I realized
       | later that NAT is part of a cluster of problems with networking,
       | the main one being that TCP should have been a layer above UDP
       | (not beside it) and that connectedness was never really a good
       | concept (it should have used identifiers like this instead of IP
       | addresses so the stream survives changing LAN/Wi-Fi/mobile
       | networks). Unfortunately, the article doesn't talk about UDP much
       | other than for broadcasting peer identifiers on the LAN. So I'm
       | not sure how much use this would have in the real world for stuff
       | like game networking, much less state transfer with something
       | like a software transactional memory (STM). I wonder if anyone's
       | made an STM that runs on a readonly transaction log like this,
       | kind of like CouchDB, RethinkDB or Firebase maybe? But I digress.
       | 
       | And I might have used the empty string "" to represent a null
       | hash.. wait I misread that, they use actual null to represent the
       | null pointer in the linked list of message hash addresses, which
       | is great!
       | 
       | Could this be used to build a web of trust? Or is it meant to be
       | more transient, like maybe people broadcast on throwaway
       | identities? Could we drop PGP into this?
       | 
       | Maybe this is more like an RSS feed than something realtime like
       | WebRTC?
       | 
       | Other than that, this seems like a pretty decent protocol, these
       | are just some thoughts/concerns that stood out for me.
        
         | cel wrote:
         | > Anyone know why they used UTF-16 for the message type,
         | instead of UTF-8?
         | 
         | It is an artefact of the original implementation, which was not
         | discovered until the protocol was already in wide use and being
         | independently implemented [1].
         | 
         | > Also, I'm skeptical that two peers can exchange keys without
         | a third party, without being susceptible to man-in-the-middle
         | attacks. But maybe the paper linked in the article proves that
         | it's possible?
         | 
         | SSB uses the Secret Handshake (SHS) protocol in that paper
         | (with some errata [2]). SHS is an authenticated key exchange
         | (key agreement) protocol [3]. The two peers authenticate
         | each other to their respective public keys, and establish a
         | shared secret that is used to bulk-encrypt the rest of the
         | session/connection. With SHS, the client (the peer that
         | initiates the connection) must know the server's public key
         | ahead of time. Both parties must know and previously agree on
         | an additional network capability key (that is usually hard-
         | coded to a specific value in the SSB implementations).
         | 
         | It should be immune to MitM if the parties' private keys are
         | kept secret. There are ephemeral keypairs involved, so if a
         | later compromise occurs of the long-term (identity) private
         | keys, that should not reveal previous/existing sessions. SHS
         | has been verified using Tamarin [4].
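         |
         | As a very rough illustration of the network capability key
         | only, here is a sketch of an opening message that a peer
         | accepts only if both sides hold the same key. It is loosely
         | modeled on the first step of SHS, not a full or faithful
         | implementation: the ephemeral public key is faked with random
         | bytes, truncated HMAC-SHA512 is an assumption, and make_hello
         | and check_hello are made-up names:

```python
import hashlib
import hmac
import os

TAG_LEN = 32


def make_hello(network_key: bytes, ephemeral_public: bytes) -> bytes:
    """Authenticate an ephemeral public key under the shared network key."""
    tag = hmac.new(network_key, ephemeral_public, hashlib.sha512).digest()
    return tag[:TAG_LEN] + ephemeral_public


def check_hello(network_key: bytes, hello: bytes):
    """Return the ephemeral key if the tag matches, else None."""
    tag, ephemeral_public = hello[:TAG_LEN], hello[TAG_LEN:]
    expected = hmac.new(network_key, ephemeral_public, hashlib.sha512).digest()
    if hmac.compare_digest(tag, expected[:TAG_LEN]):
        return ephemeral_public
    return None


network_key = os.urandom(32)   # agreed on ahead of time by both peers
fake_ephemeral = os.urandom(32)  # placeholder for a real Curve25519 key
hello = make_hello(network_key, fake_ephemeral)
```

         | A peer configured for a different network drops the
         | connection at this point. The key agreement and identity
         | proofs that make the handshake MitM-resistant come in the
         | later steps, which are not shown.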
         | 
         | > I wonder if anyone's made an STM that runs on a readonly
         | transaction log like this, kind of like CouchDB, RethinkDB or
         | Firebase maybe?
         | 
         | I'm not familiar with STM as such, but I am familiar with
         | CouchDB. There are various ways of building mutable data
         | structures on SSB. Typically messages are indexed, in general
         | ways (e.g. message type, author, backlinks) and/or
         | application-specific ways. Applications query the indexes to
         | construct some result.
         | Graph processing is often done to handle concurrent operations
         | by different feeds (i.e. using CRDTs).
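         |
         | A small sketch of the general indexes mentioned above. The
         | message shape (key, author, content.type, content.root) and
         | the build_indexes helper are assumptions for illustration,
         | not the layout of any particular SSB database:

```python
from collections import defaultdict


def build_indexes(messages):
    """Index message keys by type, by author, and by the root they link to."""
    by_type = defaultdict(list)
    by_author = defaultdict(list)
    backlinks = defaultdict(list)
    for msg in messages:
        by_type[msg["content"]["type"]].append(msg["key"])
        by_author[msg["author"]].append(msg["key"])
        root = msg["content"].get("root")
        if root is not None:
            # Record which messages point back at this root message.
            backlinks[root].append(msg["key"])
    return by_type, by_author, backlinks


messages = [
    {"key": "%1", "author": "@alice", "content": {"type": "post"}},
    {"key": "%2", "author": "@bob", "content": {"type": "post", "root": "%1"}},
    {"key": "%3", "author": "@alice", "content": {"type": "vote", "root": "%1"}},
]
by_type, by_author, backlinks = build_indexes(messages)
```

         | An application would then answer "show this thread" by looking
         | up backlinks["%1"] rather than scanning every feed.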
         | 
         | Here is a document describing threads, a common data structure
         | on SSB: https://hackmd.io/GQ8aTw6STpuSFu6oH5Z63w
         | 
         | > Could this be used to build a web of trust? Or is it meant to
         | be more transient, like maybe people broadcast on throwaway
         | identities? Could we drop PGP into this?
         | 
         | Yes, the main SSB network constitutes a web of trust. People do
         | create throwaway identities though, just trying it out and then
         | not returning. But some persist, and people develop and express
         | relationships. Some people share other public keys for PGP,
         | OMEMO, Briar, RetroShare, Dat/Hyper, etc. PGP-signed messages
         | have been published occasionally. Creating a temporary identity
         | is not recommended for broadcasting, because it will not have
         | much reach: message distribution and visibility depends on the
         | social graph.
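         |
         | To make the reach argument concrete, here is a sketch of
         | hop-limited replication over a follow graph, assuming the
         | 2-hop default mentioned elsewhere in this thread. The graph
         | and the within_hops helper are invented for the example:

```python
from collections import deque


def within_hops(follows: dict, start: str, max_hops: int = 2) -> dict:
    """Breadth-first search returning each reachable feed and its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for peer in follows.get(current, ()):
            if peer not in distance:
                distance[peer] = distance[current] + 1
                queue.append(peer)
    return distance


follows = {
    "@me": ["@alice"],
    "@alice": ["@bob"],
    "@bob": ["@carol"],
    "@throwaway": ["@me"],  # follows others, but nobody follows it
}
replicated = within_hops(follows, "@me")
```

         | In this example @carol is three hops away and @throwaway is
         | followed by no one, so neither feed is replicated by @me.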
         | 
         | > Maybe this is more like an RSS feed than something realtime
         | like WebRTC?
         | 
         | Yes. An SSB feed is identified by its public key, and contains
         | an append-only list of messages. Each message is identified by
         | its content hash. However, the RPC protocol used for SSB could
         | be extended to support ephemeral content.
         | 
         | WebRTC DataChannels could be used for gossip connections. But
         | there is still the problem of needing to exchange messages to
         | establish the WebRTC connection. Historically SSB has addressed
         | P2P network architecture using Pubs [5], more recently with
         | Rooms [6].
         | 
         | [1] https://news.ycombinator.com/item?id=29675263
         | 
         | [2] https://github.com/auditdrivencrypto/secret-handshake/issues...
         | 
         | [3] https://en.wikipedia.org/wiki/Authenticated_Key_Exchange
         | 
         | [4] https://github.com/keks/tamarin-shs
         | 
         | [5] https://ssbc.github.io/scuttlebutt-protocol-guide/#pubs
         | 
         | [6] https://ssb-ngi-pointer.github.io/rooms2/
        
       ___________________________________________________________________
       (page generated 2021-12-24 23:01 UTC)