hngopher.com

       [HN Gopher] The Ken Thompson Hack
       ___________________________________________________________________
        
       The Ken Thompson Hack
        
       Author : stephc_int13
       Score  : 108 points
       Date   : 2024-03-30 12:41 UTC (10 hours ago)
        
 (HTM) web link (wiki.c2.com)
 (TXT) w3m dump (wiki.c2.com)
        
       | legobmw99 wrote:
       | Surprisingly, it took until last year for someone to actually ask
       | Ken for the code: https://research.swtch.com/nih
        
         | rolandog wrote:
         | Thanks for the link. I had saved it for later reading, but
         | didn't get around to doing it until now.
         | 
         | It's such a well-written article.
         | 
         | Here's previous HN discussion about it:
         | 
         | https://news.ycombinator.com/item?id=38020792
        
           | dang wrote:
           | Thanks! Macroexpanded:
           | 
           |  _Running the "Reflections on Trusting Trust" Compiler_ -
           | https://news.ycombinator.com/item?id=38020792 - Oct 2023 (67
           | comments)
           | 
           | (posted by rsc himself!)
        
       | stephc_int13 wrote:
       | There is something nightmarish about this kind of exploit, and
       | that is maybe why we've been collectively in denial for such a
       | long time.
       | 
       | How many supply-chain-generated backdoors are in the wild today?
        
         | kibwen wrote:
         | Supply-chain backdoors are one thing, but the saving grace
         | about the backdoors injected by trusting-trust attacks like in
         | the OP is that they're quite fragile. If the shape of the
         | expected target program changes sufficiently, then the backdoor
         | will fail to propagate. And propagating the backdoor into
         | programs that have yet to be written requires an effectively
         | omniscient adversary. As long as you have multiple
         | implementations of a toolchain, you can repeatedly use them to
         | compile each other in every possible configuration (and then
         | use the outputs of those compilation steps as inputs to the
         | same process, repeatedly) and then compare that the outputs are
         | bit-for-bit identical until you reach a point where you're
         | assured that either none of the compilers are backdoored or all
         | of them would have to be.
         | 
         | So yes, they're scary, but there are countermeasures.
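
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not kibwen's or anyone's actual tooling: it assumes the builds are reproducible (deterministic), and the function names and build labels are hypothetical. It hashes each build output with SHA-256 and groups builds by digest, so that any odd one out stands out.

```python
import hashlib
from collections import defaultdict


def group_by_digest(artifacts):
    """Map SHA-256 hex digest -> sorted list of build names that produced it.

    `artifacts` is a dict of {build name: output bytes}. The names are
    hypothetical labels for "compiler X built by compiler Y" combinations.
    """
    groups = defaultdict(list)
    for name, data in artifacts.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return {digest: sorted(names) for digest, names in groups.items()}


def all_identical(artifacts):
    """True only if there is at least one artifact and all share one digest."""
    return len(group_by_digest(artifacts)) == 1


if __name__ == "__main__":
    # Placeholder bytes standing in for real compiler binaries.
    builds = {
        "cc-built-by-A": b"stage2 binary",
        "cc-built-by-B": b"stage2 binary",
        "cc-built-by-C": b"stage2 binary with extra payload",
    }
    for digest, names in group_by_digest(builds).items():
        print(digest[:12], names)
    print("all identical:", all_identical(builds))
```

A mismatch only says that some toolchain differs; it does not say which one is backdoored, and non-deterministic builds (timestamps, embedded paths) will also produce mismatches.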
        
         | legobmw99 wrote:
         | One of my favorite short stories plays on the natural horror of
         | this realization:
         | https://www.teamten.com/lawrence/writings/coding-machines/
        
         | Veserv wrote:
         | I mean, this class of attack is defeated as a matter of course
         | in high reliability applications like aerospace. You just check
         | the compiler output precisely corresponds to the compiler
         | input. You need to do that anyways to prevent miscompilation
         | errors in general (this class of attack is just intentional
         | miscompilation), so it hardly counts as a nightmarish problem,
         | just an annoying one.
        
           | azornathogron wrote:
           | How is this normally done? Manual review of machine code?
        
             | Veserv wrote:
             | I do not work on verification directly, so I do not know
             | the details. I believe at a high level you generally have an
             | automated first pass which generates a source<->machine
             | trace. You then have multiple independent reviews
             | validating the correctness and exhaustiveness of the trace.
             | 
             | Generally in these environments you build with
             | optimizations off unless needed to ease this validation
             | process as, obviously, you must validate the
             | released/deployed version.
        
           | kwhitefoot wrote:
           | > You just check the compiler output precisely corresponds to
           | the compiler input.
           | 
           |  _just_ is doing a lot of work there.
        
             | Veserv wrote:
             | What do you want me to say? That is the work that you must
             | do and that is routinely done. It is annoying, but it is
             | not hard or special. Hardly a nightmarish problem if it
             | gets solved routinely at scale across an entire industry.
        
       | ludocode wrote:
       | The team at bootstrappable.org have been working very hard at
       | creating compilers that can bootstrap from scratch to prevent
       | this kind of attack (the "trusting trust" attack is another name
       | for it.) They've gotten to the point where they can bootstrap in
       | freestanding so they don't need to trust any OS binaries anymore
       | (see builder-hex0.)
       | 
       | I've spent a lot of my spare time the past year or so working on
       | my own attempt at a portable bootstrappable compiler. It's partly
       | to prevent this attack, and also partly so that future
       | archaeologists can easily bootstrap C even if their computer
       | architectures can't run any binaries from the present day.
       | 
       | https://github.com/ludocode/onramp
       | 
       | It's nowhere near done but I'm starting a new job soon so I felt
       | like I needed to publish what I have. It does at least bootstrap
       | from handwritten x86_64 machine code up to a compiler for most of
       | C89, and I'm working on the final stage that will hopefully be
       | able to compile TinyCC and other similar C compilers soon.
        
         | pfortuny wrote:
         | Impressive work and truly necessary! Thanks for sharing.
        
         | toolslive wrote:
         | really nice, and impressive work. However, I'm left wondering
         | if a route via Forth would not have been a lot shorter.
        
         | necheffa wrote:
         | So bootstrap in freestanding does make this kind of attack much
         | more difficult to pull off, but with contemporary hardware, it
         | does not fully prevent the attack.
         | 
         | What if the trojan is in microcode? No amount of bootstrap in
         | freestanding can protect you here.
        
           | ludocode wrote:
           | It is true that there are many layers of code below the OS
           | level. UEFI for example is probably hundreds of thousands of
           | lines of compiled code. Modern processors have Intel ME and
           | equivalent with their own secret firmware. Almost all modern
           | peripherals will have microcontrollers with their own
           | compiled code.
           | 
           | These are all genuine attack vectors but they are not really
           | solvable from the software side. At least for Onramp I
           | consider these problems to be out of scope. It may be
           | possible to solve these with open hardware but a solution
           | will look very different from the kind of software
           | bootstrapping we're doing.
        
         | hinkley wrote:
         | Correct me if I'm wrong, but isn't this recreating a thing that
         | used to exist? I have memories of being told of a compiler
         | older than GCC that could compile itself using... I want to say
         | a bash script. It took forever to run because you had to run
         | the script which of course was slow, and then it output a
         | completely unoptimized compiler. And if memory serves that
         | output didn't have any of the optimization logic in it. So you
         | had to compile it again to get the optimizer passes to be
         | compiled in, then compile it again to get a fast compiler (self
         | optimization).
        
       | markbnj wrote:
       | Clicked on the first Bell Labs link hoping to read the original
       | and ended up on some VPN service's landing page. Sigh. :(
        
         | cratermoon wrote:
         | I preferred when link rot led to a 404 or site not found. I
         | have recently been working with some stuff I wrote ~2004 and so
         | many links are now dead. I've been giving archive.org a real
         | workout.
        
         | bhelyer wrote:
         | A live link:
         | http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thom...
        
       | cratermoon wrote:
       | It looks like the xz hack riffs on this idea in getting the
       | payload into the final binary, and attempting to hide itself.
        
       | raspyberr wrote:
       | >"infinite spinner"
       | 
       | >"This site uses features not available in older browsers."
       | 
       | >Enable Javascript.
       | 
       | Black plaintext on white background with hyperlinks.
       | Revolutionary.
        
         | anthk wrote:
         | https://kidneybone.com/c2/wiki/TheKenThompsonHack
        
       | anthk wrote:
       | Javascript-less link:
       | 
       | https://kidneybone.com/c2/wiki/TheKenThompsonHack
        
       | dwheeler wrote:
       | For more, including information about the "Diverse Double-
       | Compiling" (DDC) countermeasure, see my page:
       | https://dwheeler.com/trusting-trust/
        
         | legobmw99 wrote:
         | Question from reading the abstract: how does one acquire this
         | "second (trusted) compiler" when you are genuinely suspicious
         | you are facing an adversary using an attack like this one?
        
       | nunez wrote:
       | From the ACM paper [0]:
       | 
       | > Acknowledgment. I first read of the possibility of such a
       | Trojan horse in an Air Force critique [4] of the security of an
       | early implementation of Multics. I cannot find a more specific
       | reference to this document. I would appreciate it if anyone who
       | can supply this reference would let me know.
       | 
       | ofc the US Gov was behind this. Incredible.
       | 
       | [0]: https://dl.acm.org/doi/pdf/10.1145/358198.358210
        
         | jimhefferon wrote:
         | I'm not sure it is fair to claim that the Gov is behind the
         | white paper, in that paying for a study is different than a
         | program to develop malicious code.
        
       | kkfx wrote:
       | A small side note: the website looks like a classic website, BUT
       | if you run a privacy-attentive WebVM [1] you at first get a
       | "javascript required to view this site" message, while perfectly
       | classic-looking and modern-looking websites work very well
       | without JS. If you enable JS but keep third-party content
       | restricted, you get a "page does not exist"...
       | 
       | [1] the monsters normally known as "browsers", which happen to be
       | something like `less`, `more`, `most` and so on, not much like a
       | JVM.
       | 
       | A criticism, just to say: "please, if your audience is a
       | tech-savvy cohort, try to design things for them."
        
       | ellis0n wrote:
       | What's the potential percentage of malware on 100GB
       | (100,000,000,000 bytes) in your system if a backdoor occupies 100
       | bytes?
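
Read literally as a size ratio, the numbers in the question work out as below. This is only a worked example: the variable names are made up, and it says nothing about the probability that a backdoor is present.

```python
from fractions import Fraction

# Figures taken from the question above.
total_bytes = 100_000_000_000   # 100 GB, decimal gigabytes
backdoor_bytes = 100

# Exact share of the system occupied by the backdoor.
share = Fraction(backdoor_bytes, total_bytes)   # 1 / 1,000,000,000
percent = share * 100                           # 1 / 10,000,000 of a percent

print(f"share   = 1 in {share.denominator:,}")
print(f"percent = {float(percent):.7f} %")
```

That is one byte in a billion, or 0.0000001 %.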
        
         | kbenson wrote:
         | 100GB of _what_?
         | 
         | Text documents? Word documents? CSV files? Excel files? Cat
         | videos? Porn videos? Source code from a trusted source? Source
         | code from randos? Games or apps installed from a marketplace?
         | Games or apps installed as downloads directly from random
         | websites?
         | 
         | It's like asking what's the chance you'll die from a fall if
         | you live 80 years. The chance is hard to know, but it's
         | probably more likely if you're a free solo rock climber than if
         | you aren't.
        
           | ellis0n wrote:
           | Sure, modern Mac/Windows/Linux system files with browser
           | cache. Code+data files after build. This is enough to never
           | find out about those 100 bytes.
           | 
           | Not to mention hardware backdoors in chips. Have all
           | trillions of transistors and their schematics in each chip
           | version been checked for possible backdoors in the Verilog
           | compiler? One extra connection can provide root access and
           | cost millions of dollars.
        
         | fsflover wrote:
         | Not all data on your PC are equally dangerous. The key is to
         | keep Trusted Computing Base as small as possible. See: Qubes
         | OS.
        
           | ellis0n wrote:
           | QubesOS is amazing! I've been working on QubesOS for several
           | years. When it was just starting out, it was a fairly simple
           | solution, but now it's a big piece of software, and it has just as
           | many problems as any Linux distribution. Complexity is the
           | only parameter by which you can determine the amount of
           | malware. For 0 bytes of useful code, there will be 0
           | backdoors. I also developed ACPU OS for this reason 12 years
           | ago, but later realized that no one would pay for simplicity.
        
       | anthk wrote:
       | GNU Mes tries to avoid this by bootstrapping _everything_.
        
       | acchow wrote:
       | I actually thought his Turing award was for this attack. Turns
       | out it was "just" his acceptance speech!
        
       ___________________________________________________________________
       (page generated 2024-03-30 23:02 UTC)