[HN Gopher] Adventures in Fuzzing Matrix's Encryption
       ___________________________________________________________________
        
       Adventures in Fuzzing Matrix's Encryption
        
       Author : Arathorn
       Score  : 64 points
       Date   : 2021-06-14 18:00 UTC (5 hours ago)
        
 (HTM) web link (matrix.org)
 (TXT) w3m dump (matrix.org)
        
       | sohei wrote:
       | Is there a Matrix design document at a similar level of
       | abstraction to [1,2,3]?
       | 
       | I want to use it, but the client (Riot? Element?) scares me with
       | all its features, and I'm not sure I have a working mental model
       | about what information is shared with whom and under what
       | circumstances.
       | 
       | Is there a "blessed" client that doesn't run in the browser?
       | Written in Python or Java?
       | 
       | [1] https://www.chromium.org/chromium-os/chromiumos-design-
       | docs/... [2] https://www.chromium.org/chromium-os/chromiumos-
       | design-docs/... [3] https://www.chromium.org/chromium-
       | os/chromiumos-design-docs/...
        
         | ognarb wrote:
         | There is a spec[1] as well as many third parties clients[2]
         | 
         | [1]: https://spec.matrix.org/unstable/
         | 
         | [2]: https://matrix.org/clients/
        
           | feanaro wrote:
           | For a nicely documented and clean Matrix client
           | implementation in Python, see weechat-matrix[1] and the
           | matrix-nio[2] library it's based on. There's also Mirage[3]
           | which is also based on matrix-nio and is a GUI client.
           | 
           | [1]: https://github.com/poljar/weechat-matrix
           | 
           | [2]: https://github.com/poljar/matrix-nio
           | 
           | [3]: https://github.com/mirukana/mirage
        
       | sharedfrog wrote:
       | > Classic fuzzers famously have a hard time circumventing magic
       | values and checksums, and cryptography is full of these. This is
       | further complicated by the fact that the double ratchet algorithm
       | is very stateful and depends on the two ratchets evolving in
       | lockstep.
       | 
       | Can the first issue be solved by scraping all the magic values
       | from the codebase and putting them in the fuzzer's dictionary
       | file? I wonder if this could even be automated when doing white-
       | box fuzzing -- have the fuzzer scan the code when placing
       | instrumentation and extract every "interesting" constant from
       | e.g. `if` checks.
       | 
       | Regarding the second issue and the double ratchet specifically,
       | are there even ideas on how it could be approached for feasible
       | fuzzing?
        
         | infogulch wrote:
         | Perhaps the solution is to extend the idea of the fuzzer
         | dictionary to include not only byte string constants but also
         | _functions_. Throw in magic values, yes, but also sha-3 and the
         | main primitive for the ratchet algorithm, etc.
         | 
         | That way the fuzzer doesn't have to reinvent hashing and
         | encryption algorithms from scratch, which I expect would be as
         | impossible as generating the preimage of a hash or outright
         | breaking the encryption. Maybe those should be fuzzed too, but
         | let's not swallow the whole cow at once when we just want to
         | fuzz one protocol.
        
         | TonyTrapp wrote:
         | AFL++ already does that (it scans occurrences of strcmp,
         | memcmp, etc. for magic values). The problem here might rather
         | be that those magic constants are not expected to show up in
         | the bit stream being read, but rather being the result of a
         | calculation. If the result of a calculation must be 42 for a
         | function to continue its execution, it's not so useful to have
         | 42 in the fuzzer dictionary. If 42 is expected to be read from
         | the file instead, it's easier and a dictionary will help.
        
         | dkasak wrote:
         | Author here. As one of the other comments mentions, afl++ (and
         | to some extent vanilla afl) already has capability to
         | automatically scrape magic values from arguments to special
         | functions like `strcmp` and the like. The older technique is
         | called libtokencap (https://github.com/AFLplusplus/AFLplusplus/
         | blob/stable/utils...), but afl++ also has a newer feature
         | called AUTODICT (https://github.com/AFLplusplus/AFLplusplus/blo
         | b/stable/instr...).
         | 
         | But this only solves the problem of magic _constants_ expected
         | in the input. If the check depends on dynamic properties of the
         | input or happens deeper in the code after the input 's already
         | been through some transformations, it can't be solved like
         | this. There are other techniques to help with this, though. One
         | of the earlier attempts to solve such types of more complex
         | checks is called laf-intel (https://github.com/AFLplusplus/AFLp
         | lusplus/blob/stable/instr...) and boils down to transforming a
         | more complex check into a nested series of simpler checks. This
         | makes it more probable that the fuzzer's random mutation will
         | be able to solve the outer check and hence hit new coverage,
         | enabling the fuzzer to detect the mutation as productive.
         | 
         | afl++ has a more modern variant of this called CmpLog (https://
         | github.com/AFLplusplus/AFLplusplus/blob/stable/instr...) which
         | is based on the RedQueen technique. The paper for RedQueen is a
         | really interesting read: https://www.syssec.ruhr-uni-
         | bochum.de/media/emma/veroeffentl...
         | 
         | The problem of checksums is at times also solved by simply
         | modifying the binary so that the checksum is neutralized and
         | always succeeds, especially if you have access to source code.
         | 
         | As for the problem of fuzzing stateful things like the double
         | ratchet, one way of tackling the problem is to think of the
         | input to the fuzzer as not only the raw bytes that you'll be
         | passing to the program you're fuzzing, but as a blueprint
         | specifying which high-level operations you'll be performing on
         | the input. Then you teach your fuzzer to be smarter and be able
         | to perform a bunch of those operations.
         | 
         | So, let's say you take 512 bytes as the input to the fuzzer.
         | You treat the first 256 bytes as the message to decode and the
         | latter 256 bytes as the high-level cryptographic operations to
         | perform on this message, each byte specifying one of those
         | operations. So you could say a byte of value 1 represents the
         | operation "ENCRYPT WITH KEY K1", 2 represents "ENCRYPT WITH KEY
         | K2", 3 represents "DECRYPT WITH KEY K1", 4 represents "DECRYPT
         | WITH KEY K2", 5 represents "PERFORM SHA2" and so on. Now you
         | can feasibly end up with a sequence which will take a message
         | encrypted with key K1, decrypt it, modify the message, then re-
         | encrypt with key K2. Or, in the case of the double ratchet
         | algorithm, have it perform multiple successive encryption steps
         | to evolve the state of the ratchet and be able to fuzz more
         | deeply.
         | 
         | Of course, the encoding needs to be rather dense for this to
         | work well so that ideally each low-level bit mutation the
         | fuzzer does on an input still encodes a valid sequence of valid
         | high-level operations.
        
       ___________________________________________________________________
       (page generated 2021-06-14 23:00 UTC)