[HN Gopher] Best of show - abuse of libc
       ___________________________________________________________________
        
       Best of show - abuse of libc
        
       Author : mooreds
       Score  : 351 points
       Date   : 2021-01-08 20:52 UTC (1 days ago)
        
 (HTM) web link (www.ioccc.org)
 (TXT) w3m dump (www.ioccc.org)
        
       | rramadass wrote:
       | The printf format string is actually a little language. In the
       | book _The Practice of Programming_ , Kernighan and Pike show how
       | you can devise a similar format string to pack/unpack network
       | packets.
        
       | lxe wrote:
       | How did printf end up here in the first place? Decades of feature
       | additions, or were these features a part of an early spec?
        
         | wahern wrote:
         | %n was defined in C89, the first C standard:
         | http://port70.net/~nsz/c/c89/c89-draft.html#4.9.6.1
         | 
         | Looking at old source code, the earliest implementation I found
         | is 4.3BSD Tahoe (1988). See
         | https://www.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Tahoe/usr/...
         | Second oldest I found was Tenth Edition [Research] Unix (1989).
         | See ocvt_n at
         | https://www.tuhs.org/cgi-bin/utree.pl?file=V10/libc/stdio/vf...
         | I couldn't find support in earlier implementations archived on
         | that site.
        
           | Narishma wrote:
           | I'm pretty sure the compilers from Microsoft and Borland
           | supported %n earlier than that. The earliest one I have easy
           | access to that supports it is Microsoft C 4.0 from 1986.
        
             | gambiting wrote:
             | Does anyone know why it was introduced in the first place?
             | I mean.....the return value of printf gives you the exact
             | same information, no? Why give printf the ability to write
             | anything in the first place?
        
               | gambiting wrote:
               | Ah, I found out why - %n stores the number of characters
               | printed up to the point where the %n is. Printf returns
               | the total number of characters printed.
        
           | segfaultbuserr wrote:
           | > _Looking at old source code, the earliest implementation I
           | found is 4.3BSD Tahoe (1988)._
           | 
           | You are the HN historian of the day.
        
             | wahern wrote:
             | Another interesting factoid is that macOS only supports %n
             | if the format string is located in read-only memory. Per
             | printf(3) on macOS:
             | 
             | > For this reason, a format argument containing %n is
             | assumed to be untrustworthy if located in writable memory
             | (i.e. memory with protection PROT_WRITE; see mprotect(2))
             | and any attempt to use such an argument is fatal.
             | Practically, this means that %n is permitted in literal
             | format strings but disallowed in format strings located in
             | normal stack- or heap-allocated memory.
             | 
             | The manual page seems correct:
             | 
             |     % cat test.c
             |     #include <stdio.h>
             |     int main(void) {
             |         printf((char[]){ "%n" }, &(int){ 0 });
             |         return 0;
             |     }
             |     % cc -o test test.c
             |     % ./test
             |     zsh: abort      ./test
        
               | saagarjha wrote:
               | Someone should inform The Open Group about this violation
               | of POSIX ;)
               | 
               | Another fun fact: glibc does this too, if you compile
               | with -D_FORTIFY_SOURCE=2. However, since Linux lacks the
               | nice vm_region APIs the code opens up /proc/self/maps :/
        
               | astrange wrote:
               | dyld on Darwin has an API to ask if any pointer is to a
               | read-only section of a binary. It's useful because you
               | can e.g. skip strcpys and other allocations.
        
               | saagarjha wrote:
               | Hmm, can you tell me more? I can't think of any situation
               | where skipping on a strcpy is legal, since you provide
               | the second buffer and so the copy must occur. And I know
               | that there is heavy uniquing going on for things like
               | selectors and CFStrings at compile time, but where is the
               | dyld API being used at runtime?
        
               | astrange wrote:
               | Oh, I meant strdup. Look for strdupIfMutable() calls in
               | libobjc.
        
               | pjmlp wrote:
               | That is the beauty of POSIX, write once, debug
               | everywhere, fix with plenty of spaghetti #ifdefs.
        
         | segfaultbuserr wrote:
         | Fun fact, on glibc, an extension feature is that you can define
         | your own custom conversion specifiers for printf().
        
       | olliej wrote:
       | %n has frequently been used as an attack vector - generally in
       | the context of the other poor practice of printf(<attacker
       | controlled string>, ...)
       | 
       | It's actually intentionally disallowed in some libc
       | implementations.
        
       | blue-dragonfly wrote:
       | A winner from 1993 is very interesting too:
       | 
       | https://www.ioccc.org/years.html#1993_dgibson
       | 
       | It implements Conway's Game of Life by creating a DSL using the C
       | preprocessor and printf. The output is a program (several initial
       | boards are supplied to bootstrap) which is the input program to
       | be compiled and run to create the next generation. This is the
       | program for a second generation:
       | 
       |     LIFE
       |     L _ _ _ _ _
       |     L _ _ O _ _
       |     L _ _ _ O _
       |     L _ O O O _
       |     L _ _ _ _ _
       |     GEN 2 STAT 328960
       |     END
       | 
       | Each symbol like "LIFE" is a macro, the board is the program.
        
         | [deleted]
        
       | yakubin wrote:
       | I'm writing my first C99 compiler. IOCCC sounds like a great
       | source of test material.
        
       | acekingspade wrote:
       | It's hard to believe that this is the same person with multiple
       | widely-cited ML papers[0]. It's jaw-dropping how talented someone
       | can be.
       | 
       | https://scholar.google.com/citations?user=q4qDvAoAAAAJ&hl=en...
        
         | onurgu wrote:
         | Thank you for the pointer, it is really amazing.
        
       | badsectoracula wrote:
       | Up next: a C compiler that compiles to printf statements :-P
        
         | hahajk wrote:
         | https://github.com/HexHive/printbf
         | 
         | well this is a brainfuck interpreter inside printf. I'm pretty
         | sure there are plenty of c-to-bf transpilers.
        
           | felixr wrote:
           | This is by the same author as the ioccc entry and also one of
           | authors of the paper showing the turing completeness of
           | printf http://nebelwelt.net/publications/#15SEC
        
             | lifthrasiir wrote:
             | Ah, that explains everything. I have already seen this
             | technique before and wondered why this entry _has_ to be
             | the best of show---I don't doubt it is worth the prize,
             | just that it didn't sound very novel. But it all makes
             | sense if the technique is not well known and authors tried
             | to revitalize that.
        
           | klyrs wrote:
           | That's fun, but esoteric languages in general and brainfuck
           | in particular tend to lack things you'd want out of C: file
           | system access, system calls, etc.
        
             | archi42 wrote:
             | Hm, I think you could add numeric syscalls, similar to what
             | happens at the asm level. E.g. put the syscall id and some
             | parameters on the "stack", then let the interpreter run the
             | syscall with a new "instruction" e.g. '!'. This could even
             | substitute '.' (putchar) and ',' (getchar), since these are
             | very much just syscalls. So that would reduce the number of
             | instructions by one (to 7).
             | 
             | Oh, getting to 6 would also be fun: One might replace '['
             | and ']' with a conditional branch '?'. It just needs two
             | parameters: condition and (signed) number of instructions
             | to jump. Adds the bonus (much like normal asm) to write
             | much more ~~horribly abusive~~ flexible control flow than a
             | structured "while(*ptr)".
        
               | enedil wrote:
               | It's already implemented:
               | https://github.com/ajyoon/systemf There is even an HTTP
               | server built with it.
        
               | archi42 wrote:
               | Why am I not even surprised...? I thought about writing a
               | sentence about how (relatively) easy it would be to build
               | a verified compiler (think CompCert-for-brainfuck); I'd
               | guess the outcome is one of (a) "someone already did that
               | as well, here is the link" or (b) "I spent the weekend
               | with that, here is the project on github". The Internet
               | is awesome, as are people :)
        
               | jcande wrote:
               | That was my approach for an analogous program that uses
               | memcpy instead of printf. I didn't go with the jit-style
               | you describe, however. If you're curious, here's how I
               | set up the syscalls:
               | https://github.com/jcande/xenocryst/blob/master/src/gadgets....
               | and here's the main loop:
               | https://github.com/jcande/xenocryst/blob/master/src/exec.c#L...
        
         | lathiat wrote:
         | There is somewhere a compiler that outputs to all sorts of
         | crazy languages including awk, sed, printf, etc.. but I can't
         | find it right now. Hopefully someone knows what I'm talking
         | about.
         | 
         | I feel like it did LLVM IR to a bunch of languages or something
         | like that.. but my memory is faulty.
        
           | lifthrasiir wrote:
           | You are looking for ELVM: https://github.com/shinh/elvm/ (I
           | have seen many others, but in terms of activity it seems the
           | most maintained one.)
        
             | lathiat wrote:
             | Thank you, that is the one! :) I love projects like that.
        
       | dang wrote:
       | General thread here:
       | https://news.ycombinator.com/item?id=25651942
        
         | aftbit wrote:
         | Personally I find it amusing that the 0-signal comment "Thanks
         | for all dang" is upvoted while the opposite 0-signal comment
         | "Thanks for nothing dang" is downvoted. I mean, I think dang is
         | chill, but neither of these really contributes to the
         | discussion any more than the other, so shouldn't they have the
         | same score? Upvotes really are a popularity contest these days.
        
           | dang wrote:
           | This was addressed by pg over a decade ago:
           | https://news.ycombinator.com/newswelcome.html
           | 
           |  _Empty comments can be ok if they're positive. There's
           | nothing wrong with submitting a comment saying just "Thanks."
           | What we especially discourage are comments that are empty and
           | negative--comments that are mere name-calling._
           | 
           | If you think in terms of what's good/bad for community it may
           | make more sense.
           | 
           | (I hope it's clear this applies whether or not the mods were
           | mentioned in either a positive or negative way.)
        
           | michaelcampbell wrote:
           | Always have been.
        
           | thedufer wrote:
           | The negative comment is from a hellbanned account (which is,
           | frankly, unsurprising). It's not even possible to downvote
           | it, as far as I can tell, due to being DOA.
        
           | saagarjha wrote:
           | It's not clear what the votes are. However, the latter was
           | written by someone who has been banned for what appears to be
           | their habit of leaving low-value comments.
        
         | navaati wrote:
         | Thanks for all, dang !
        
         | [deleted]
        
         | SftwreEngnr wrote:
         | Thanks for nothing, dang!
        
       | kderbyma wrote:
       | awesome. I didn't know about that printf hack....time for some
       | fun experiments
        
         | saagarjha wrote:
         | Be careful, though, you don't want anyone to hack you through
         | printf ;)
        
           | segfaultbuserr wrote:
           | In the late 90s, looking for "printf(string)" [0] in the code
           | was a great way to discover remote code execution 0days ;-)
           | 
           | [0] should be "printf("%s", string)".
        
             | stevekemp wrote:
             | Very much so, it took a long time for this to become
             | obvious as a security problem.
             | 
             | From memory, wuftpd was the first big program to suffer from
             | this class of attacks.
        
               | segfaultbuserr wrote:
               | It reminds me of a talk by infosec researcher "The Grugq"
               | about opsec techniques used by blackhat hackers. Its
               | subtitle was _because jail is only for wuftpd_ , I
               | couldn't stop laughing at it.
        
       | Groxx wrote:
       | > _Format specifiers can take extra "arguments". - "%hhn": store
       | the number of bytes written mod 256 to the char pointer ..._
       | 
       | Oh boy. I'll put that down for my "thing I don't think I wanted
       | to know" of the day.
        
         | londons_explore wrote:
         | Are there any scanners out there that will detect user input
         | ending up as the format string of a printf?
         | 
         | Perhaps a scanner that I can run against all of github, and
         | then rank results by the number of times that code is exposed
         | on a high value server connected to the internet...?
        
           | Someone wrote:
           | Any modern C compiler will already warn you if the format
           | string isn't a string literal
           | (https://stackoverflow.com/questions/32362918/error-format-st...)
           | 
           | I don't think it's worth the effort to extend that to look
           | for tainted strings, not because it wouldn't be useful, but
           | because it would be hard to do (as an extreme example: is
           | data read from a file user input? It could be a file
           | containing internationalization info)
           | 
           | The (relatively) few programs that construct format strings
           | on the fly will have to add pragmas to disable these
           | warnings.
        
         | rightbyte wrote:
         | There is some innocent beauty in the twistedness of printf -
         | especially with GNU extensions.
        
           | flatiron wrote:
           | So I can port doom to gnu printf?
        
             | [deleted]
        
             | npongratz wrote:
             | GNU's printf is Turing complete[0]... so "yes."
             | 
             | [0] Mentioned (but not directly linked) by TFA:
             | 
             | https://www.usenix.org/conference/usenixsecurity15/technical...
        
               | StavrosK wrote:
               | I've always wanted a Turing-complete printing function.
        
               | josefx wrote:
               | We already have post script for that.
        
             | anthk wrote:
             | Port a zmachine first ;).
        
             | bluGill wrote:
             | I'd start by creating a llvm backend for printf. Should be
             | fun.
        
           | tomjakubowski wrote:
           | GNU's printf specifier language is Turing complete, I
           | believe.
        
             | FartyMcFarter wrote:
             | Presumably it needs a loop around it, so it's not Turing-
             | complete by itself?
        
               | npongratz wrote:
               | No need to use a loop around it, printf can take care of
               | that pesky detail for you! To quote [0] (my emphasis):
               | 
               | > To achieve full Turing-complete computation, we need a
               | way to loop a format string. This is possible by
               | overwriting the pointer inside printf() that tracks which
               | character in the format string is currently being
               | executed. The attacker is unlucky in that at the time the
               | "%n" format specifier is used, this value is saved in a
               | register on our 64-bit system. However, we identify one
               | point in time in which the attacker can always mount the
               | attack. The printf() function makes calls to puts() for
               | the static components of the string. When this function
               | call is made, all registers are saved to the stack. _It
               | turns out that an attacker can overwrite this pointer
               | from within the puts() function. By doing this, the
               | format string can be looped_.
               | 
               | > An attacker can cause puts() to overwrite the desired
               | pointer. Prior to printf() calling puts(), the attacker
               | uses "%n" format specifiers to overwrite the stdout FILE
               | object so that the temporary buffer is placed directly on
               | top of the stack where the index pointer will be saved.
               | Then, we print the eight bytes corresponding to the new
               | value we want the pointer to have. Finally, we use more
               | "%n" format specifiers to move the buffer back to some
               | other location so that more unintended data will not be
               | overwritten.
               | 
               | [0]
               | https://www.usenix.org/system/files/conference/usenixsecurit...,
               | Appendix B "Printf is Turing-complete".
        
               | saagarjha wrote:
               | Or, you know, you can just use printf to overwrite the
               | return address and ROP your way to a shell.
        
               | xyproto wrote:
               | Herein lies madness
        
               | moonchild wrote:
               | Not portable, but if you can get the address of the stack
               | you can force printf to overwrite the return address,
               | obviating the loop.
        
               | shakna wrote:
               | It does not need the loop. It might be easier to
               | understand by looking at something like this. [0] That
               | printf allows for this kind of Turing complete control
               | flow is well known [1].
               | 
               | [0] https://github.com/HexHive/printbf
               | 
               | [1] http://nebelwelt.net/publications/files/15SEC.pdf
        
               | moonchild wrote:
               | That code is wrapped in a loop:
               | https://github.com/HexHive/printbf/blob/master/src/pbf_pre.c...
        
               | fulafel wrote:
               | The implementation is accidentally Turing-complete
               | because you can exploit it to get arbitrary memory
               | writes. But the language as specified is not
               | Turing-complete.
        
             | alisonkisk wrote:
             | There's a great example of what you can do with this,
             | submitted and discussed on HN here:
             | https://news.ycombinator.com/item?id=25690319
        
               | notretarded wrote:
               | > _Format specifiers can take extra "arguments". -
               | "%hhn": store the number of bytes written mod 256 to the
               | char pointer ..._
               | 
               | Oh boy. I'll put that down for my "thing I don't think I
               | wanted to know" of the day.
        
               | snerp wrote:
               | There is some innocent beauty in the twistedness of
               | printf - especially with GNU extensions.
        
               | SahAssar wrote:
               | GNU's printf specifier language is Turing complete, I
               | believe.
        
               | gu5 wrote:
               | What is up with this thread? These comments are
               | duplicated from the top thread...
        
               | exikyut wrote:
               | (The usernames are different)
        
               | dang wrote:
               | Is it possible that it's a joke based on the material of
               | the OP? Maybe we're trapped in some sort of International
               | Obfuscated Internet Thread Contest...
               | 
               | Edit: ok, I'm guessing it's a joke about Turing
               | completeness. Loops, you know.
        
               | gu5 wrote:
               | Ohhhh... Now I get it.
        
               | saagarjha wrote:
               | It's a joke about the recursion introduced here:
               | https://news.ycombinator.com/item?id=25691615.
        
               | Bootvis wrote:
               | Well executed jokes on HN. 2021 already is a crazy year.
        
               | dfcowell wrote:
               | So I can port doom to gnu printf?
        
               | a1369209993 wrote:
               | GNU's printf is Turing complete[0]... so "yes."
               | 
               | [0] Mentioned (but not directly linked) by TFA:
               | 
               | https://www.usenix.org/conference/usenixsecurity15/technical...
        
               | kortilla wrote:
               | GNU's printf specifier language is Turing complete, I
               | believe.
        
               | [deleted]
        
               | [deleted]
        
       | saagarjha wrote:
       | For those interested in more Turing complete format strings, look
       | no further than the "sprint" challenge from this year's Google
       | CTF Quals: https://ctftime.org/task/12834. It's sprintf in a loop
       | this time and the program simulates a maze:
       | https://github.com/google/google-ctf/tree/master/2020/quals/...
        
         | enedil wrote:
         | The author works at Google, so I suspect he's the same who
         | created this challenge. Really enjoyable, although I didn't
         | manage to solve it during the contest.
        
       ___________________________________________________________________
       (page generated 2021-01-09 23:02 UTC)