[HN Gopher] Best of show - abuse of libc
___________________________________________________________________
Best of show - abuse of libc
Author : mooreds
Score : 351 points
Date : 2021-01-08 20:52 UTC (1 days ago)
(HTM) web link (www.ioccc.org)
(TXT) w3m dump (www.ioccc.org)
| rramadass wrote:
| The printf format string is actually a little language. In the
| book _The Practice of Programming_, Kernighan and Pike show how
| you can devise a similar format string to pack/unpack network
| packets.
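A minimal sketch of that idea, assuming the format letters `c`, `s`, and `l` stand for 1-, 2-, and 4-byte big-endian fields. The function name `pack` and the letter choices are illustrative and not a copy of the book's code; the caller is assumed to supply a large enough buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* pack: encode the variadic arguments into buf as directed by fmt.
 * 'c' = 1 byte, 's' = 2 bytes, 'l' = 4 bytes, all big-endian.
 * Returns the number of bytes written, or -1 on an unknown letter.
 * Assumption: buf is large enough for the whole format. */
int pack(unsigned char *buf, const char *fmt, ...)
{
    va_list args;
    unsigned char *bp = buf;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c': {             /* char arrives promoted to int */
            *bp++ = (unsigned char)va_arg(args, int);
            break;
        }
        case 's': {             /* short arrives promoted to int */
            unsigned int s = (unsigned int)va_arg(args, int);
            *bp++ = (unsigned char)(s >> 8);
            *bp++ = (unsigned char)s;
            break;
        }
        case 'l': {             /* caller must pass an unsigned long */
            unsigned long l = va_arg(args, unsigned long);
            *bp++ = (unsigned char)(l >> 24);
            *bp++ = (unsigned char)(l >> 16);
            *bp++ = (unsigned char)(l >> 8);
            *bp++ = (unsigned char)l;
            break;
        }
        default:                /* unknown format letter */
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return (int)(bp - buf);
}
```

A call such as `pack(buf, "csl", 0x01, 0x0203, 0x04050607UL)` would fill seven bytes of `buf` in network order.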
| lxe wrote:
| How did printf end up here in the first place? Decades of feature
| additions, or were these features a part of an early spec?
| wahern wrote:
| %n was defined in C89, the first C standard:
| http://port70.net/~nsz/c/c89/c89-draft.html#4.9.6.1
|
| Looking at old source code, the earliest implementation I found
| is 4.3BSD Tahoe (1988). See
| https://www.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Tahoe/usr/...
| Second oldest I found was Tenth Edition [Research] Unix (1989).
| See ocvt_n at
| https://www.tuhs.org/cgi-bin/utree.pl?file=V10/libc/stdio/vf...
| I couldn't find support in earlier implementations archived on
| that site.
| Narishma wrote:
| I'm pretty sure the compilers from Microsoft and Borland
| supported %n earlier than that. The earliest one I have easy
| access to that supports it is Microsoft C 4.0 from 1986.
| gambiting wrote:
| Does anyone know why it was introduced in the first place?
| I mean... the return value of printf gives you the exact
| same information, no? Why give printf the ability to write
| anything in the first place?
| gambiting wrote:
| Ah, I found out why - %n stores the number of characters
| written up to the point where the %n appears, whereas
| printf returns the total number of characters printed.
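To make the distinction concrete, here is a small sketch. The helper name `format_pair` and its parameters are made up for illustration. It assumes a libc that honors `%n` in a string-literal format (some, as noted elsewhere in the thread, restrict it).

```c
#include <assert.h>
#include <stdio.h>

/* Formats "<key>=<value>" into out. %n stores, through value_offset,
 * how many characters had been produced when the %n was reached, so
 * the caller learns where the value starts. The return value is the
 * total length of the formatted text. */
int format_pair(char *out, size_t cap, const char *key,
                const char *value, int *value_offset)
{
    assert(out != NULL && value_offset != NULL);
    return snprintf(out, cap, "%s=%n%s", key, value_offset, value);
}
```

With key `"name"` and value `"libc"`, the stored offset should be 5 and the return value 9: the first is a mid-string position, the second the overall length.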
| segfaultbuserr wrote:
| > _Looking at old source code, the earliest implementation I
| found is 4.3BSD Tahoe (1988)._
|
| You are the HN historian of the day.
| wahern wrote:
| Another interesting factoid is that macOS only supports %n
| if the format string is located in read-only memory. Per
| printf(3) on macOS:
|
| > For this reason, a format argument containing %n is
| assumed to be untrustworthy if located in writable memory
| (i.e. memory with protection PROT_WRITE; see mprotect(2))
| and any attempt to use such an argument is fatal.
| Practically, this means that %n is permitted in literal
| format strings but disallowed in format strings located in
| normal stack- or heap-allocated memory.
|
| The manual page seems correct:
|
|     % cat test.c
|     #include <stdio.h>
|     int main(void) {
|         printf((char[]){ "%n" }, &(int){ 0 });
|         return 0;
|     }
|     % cc -o test test.c
|     % ./test
|     zsh: abort ./test
| saagarjha wrote:
| Someone should inform The Open Group about this violation
| of POSIX ;)
|
| Another fun fact: glibc does this too, if you compile
| with -D_FORTIFY_SOURCE=2. However, since Linux lacks the
| nice vm_region APIs the code opens up /proc/self/maps :/
| astrange wrote:
| dyld on Darwin has an API to ask if any pointer is to a
| read-only section of a binary. It's useful because you
| can e.g. skip strcpys and other allocations.
| saagarjha wrote:
| Hmm, can you tell me more? I can't think of any situation
| where skipping on a strcpy is legal, since you provide
| the second buffer and so the copy must occur. And I know
| that there is heavy uniquing going on for things like
| selectors and CFStrings at compile time, but where is the
| dyld API being used at runtime?
| astrange wrote:
| Oh, I meant strdup. Look for strdupIfMutable() calls in
| libobjc.
| pjmlp wrote:
| That is the beauty of POSIX, write once, debug
| everywhere, fix with plenty of spaghetti #ifdefs.
| segfaultbuserr wrote:
| Fun fact, on glibc, an extension feature is that you can define
| your own custom conversion specifiers for printf().
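A rough sketch of that extension, assuming glibc and its `<printf.h>` header. The `%Y` letter and the helper names are invented for this example; `register_printf_specifier` is the glibc registration call. This will not build on other C libraries.

```c
#include <assert.h>
#include <printf.h>   /* glibc-only: register_printf_specifier, PA_INT */
#include <stdio.h>

/* Output handler for the hypothetical %Y conversion: prints an int
 * argument as "yes" (non-zero) or "no" (zero). */
static int print_yesno(FILE *stream, const struct printf_info *info,
                       const void *const *args)
{
    (void)info;
    assert(stream != NULL);
    int value = *(const int *)args[0];
    return fprintf(stream, "%s", value ? "yes" : "no");
}

/* Tells printf that %Y consumes exactly one int argument. */
static int yesno_arginfo(const struct printf_info *info, size_t n,
                         int *argtypes, int *size)
{
    (void)info;
    (void)size;
    if (n > 0)
        argtypes[0] = PA_INT;
    return 1;
}

/* Registers %Y process-wide; returns 0 on success, -1 on failure. */
int register_yesno(void)
{
    return register_printf_specifier('Y', print_yesno, yesno_arginfo);
}
```

After `register_yesno()` succeeds, a format such as `"%Y/%Y"` with arguments `1, 0` should produce `yes/no`. Compilers that check format strings will likely warn about the unknown `%Y`, since they do not know about runtime registrations.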
| olliej wrote:
| %n has frequently been used as an attack vector - generally in
| the context of the other poor practice of printf(<attacker
| controlled string>, ...)
|
| It's actually intentionally disallowed in some libc
| implementations.
| blue-dragonfly wrote:
| A winner from 1993 is very interesting too:
|
| https://www.ioccc.org/years.html#1993_dgibson
|
| It implements Conway's Game of Life by creating a DSL using the C
| preprocessor and printf. The output is a program (several initial
| boards are supplied to bootstrap) which is the input program to
| be compiled and run to create the next generation. This is the
| program for a second generation:
|
|     LIFE
|     L _ _ _ _ _
|     L _ _ O _ _
|     L _ _ _ O _
|     L _ O O O _
|     L _ _ _ _ _
|     GEN 2 STAT 328960
|     END
|
| Each symbol like "LIFE" is a macro, the board is the program.
| [deleted]
| yakubin wrote:
| I'm writing my first C99 compiler. IOCCC sounds like a great
| source of test material.
| acekingspade wrote:
| It's hard to believe that this is the same person with multiple
| widely-cited ML papers[0]. It's jaw-dropping how talented someone
| can be.
|
| https://scholar.google.com/citations?user=q4qDvAoAAAAJ&hl=en...
| onurgu wrote:
| Thank you for the pointer, it is really amazing.
| badsectoracula wrote:
| Up next: a C compiler that compiles to printf statements :-P
| hahajk wrote:
| https://github.com/HexHive/printbf
|
| well this is a brainfuck interpreter inside printf. I'm pretty
| sure there are plenty of c-to-bf transpilers.
| felixr wrote:
| This is by the same author as the IOCCC entry, who is also one
| of the authors of the paper showing the Turing completeness of
| printf: http://nebelwelt.net/publications/#15SEC
| lifthrasiir wrote:
| Ah, that explains everything. I have already seen this
| technique before and wondered why this entry _has_ to be
| the best of show---I don't doubt it is worth the prize,
| just that it didn't sound very novel. But it all makes
| sense if the technique is not well known and authors tried
| to revitalize that.
| klyrs wrote:
| That's fun, but esoteric languages in general and brainfuck
| in particular tend to lack things you'd want out of C: file
| system access, system calls, etc.
| archi42 wrote:
| Hm, I think you could add numeric syscalls, similar to what
| happens at the asm level. E.g. put the syscall id and some
| parameters on the "stack", then let the interpreter run the
| syscall with a new "instruction" e.g. '!'. This could even
| substitute '.' (putchar) and ',' (getchar), since these are
| very much just syscalls. So that would reduce the number of
| instructions by one (to 7).
|
| Oh, getting to 6 would also be fun: One might replace '['
| and ']' with a conditional branch '?'. It just needs two
| parameters: condition and (signed) number of instructions
| to jump. As a bonus (much like normal asm), it lets you write
| much more ~~horribly abusive~~ flexible control flow than a
| structured "while(*ptr)".
| enedil wrote:
| It's already implemented:
| https://github.com/ajyoon/systemf There is even an HTTP
| server built with it.
| archi42 wrote:
| Why am I not even surprised...? I thought about writing a
| sentence about how (relatively) easy it would be to build
| a verified compiler (think CompCert-for-brainfuck); I'd
| guess the outcome is one of (a) "someone already did that
| as well, here is the link" or (b) "I spent the weekend
| with that, here is the project on github". The Internet
| is awesome, as are people :)
| jcande wrote:
| That was my approach for an analogous program that uses
| memcpy instead of printf. I didn't go with the jit-style
| you describe, however. If you're curious, here's how I
| set up the syscalls:
| https://github.com/jcande/xenocryst/blob/master/src/gadgets....
| and here's the main loop:
| https://github.com/jcande/xenocryst/blob/master/src/exec.c#L...
| lathiat wrote:
| There is somewhere a compiler that outputs to all sorts of
| crazy languages including awk, sed, printf, etc.. but I can't
| find it right now. Hopefully someone knows what I'm talking
| about.
|
| I feel like it did LLVM IR to a bunch of languages or something
| like that.. but my memory is faulty.
| lifthrasiir wrote:
| You are looking for ELVM: https://github.com/shinh/elvm/ (I
| have seen many others, but in terms of activity it seems the
| most maintained one.)
| lathiat wrote:
| Thank you, that is the one! :) I love projects like that.
| dang wrote:
| General thread here:
| https://news.ycombinator.com/item?id=25651942
| aftbit wrote:
| Personally I find it amusing that the 0-signal comment "Thanks
| for all dang" is upvoted while the opposite 0-signal comment
| "Thanks for nothing dang" is downvoted. I mean, I think dang is
| chill, but neither of these really contributes to the
| discussion any more than the other, so shouldn't they have the
| same score? Upvotes really are a popularity contest these days.
| dang wrote:
| This was addressed by pg over a decade ago:
| https://news.ycombinator.com/newswelcome.html
|
| _Empty comments can be ok if they're positive. There's
| nothing wrong with submitting a comment saying just "Thanks."
| What we especially discourage are comments that are empty and
| negative--comments that are mere name-calling._
|
| If you think in terms of what's good/bad for community it may
| make more sense.
|
| (I hope it's clear this applies whether or not the mods were
| mentioned in either a positive or negative way.)
| michaelcampbell wrote:
| Always have been.
| thedufer wrote:
| The negative comment is from a hellbanned account (which is,
| frankly, unsurprising). It's not even possible to downvote
| it, as far as I can tell, due to being DOA.
| saagarjha wrote:
| It's not clear what the votes are. However, the latter was
| written by someone who has been banned for what appears to be
| their habit of leaving low-value comments.
| navaati wrote:
| Thanks for all, dang !
| [deleted]
| SftwreEngnr wrote:
| Thanks for nothing, dang!
| kderbyma wrote:
| awesome. I didn't know about that printf hack....time for some
| fun experiments
| saagarjha wrote:
| Be careful, though, you don't want anyone to hack you through
| printf ;)
| segfaultbuserr wrote:
| In the late 90s, looking for "printf(string)" [0] in the code
| was a great way to discover remote code execution 0days ;-)
|
| [0] should be "printf("%s", string)".
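A short sketch of the safe pattern, with a hypothetical helper name `log_untrusted`: the untrusted text is passed as an argument to a constant `"%s"` format, so any `%` it contains is copied rather than interpreted.

```c
#include <assert.h>
#include <stdio.h>

/* Copies untrusted text into out verbatim. Because the format string is
 * the constant "%s", conversion specifiers inside `untrusted` (such as
 * %x or %n) are treated as plain characters.
 * Unsafe variant, shown only for contrast (do not use):
 *     snprintf(out, cap, untrusted);
 */
int log_untrusted(char *out, size_t cap, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL);
    return snprintf(out, cap, "%s", untrusted);
}
```

Passing the attacker-style input `"%x%x%n"` through this helper should simply yield the same six characters in the output buffer.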
| stevekemp wrote:
| Very much so, it took a long time for this to become
| obvious as a security problem.
|
| From memory, wuftpd was the first big program to suffer from
| this class of attacks.
| segfaultbuserr wrote:
| It reminds me of a talk by infosec researcher "The Grugq"
| about opsec techniques used by blackhat hackers. Its
| subtitle was _because jail is only for wuftpd_ , I
| couldn't stop laughing at it.
| Groxx wrote:
| > _Format specifiers can take extra "arguments". - "%hhn": store
| the number of bytes written mod 256 to the char pointer ..._
|
| Oh boy. I'll put that down for my "thing I don't think I wanted
| to know" of the day.
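For the curious, a small sketch of the `%hhn` behavior quoted above. The helper name `count_mod_256` is made up, and the code assumes the usual wrap-around when a count above 127 is stored into a `signed char` (formally implementation-defined).

```c
#include <assert.h>
#include <stdio.h>

/* Produces `width` characters of padding, then lets %hhn store the
 * running character count into a single byte. Only the low 8 bits
 * survive, i.e. the count modulo 256 on typical platforms. */
unsigned char count_mod_256(int width)
{
    signed char low_byte = 0;   /* %hhn expects a signed char pointer */
    char scratch[512];          /* large enough to avoid truncation */

    assert(width >= 0 && width < (int)sizeof scratch);
    snprintf(scratch, sizeof scratch, "%*s%hhn", width, "", &low_byte);
    return (unsigned char)low_byte;
}
```

With a width of 300 the stored byte should be 44 (300 mod 256), which is what makes `%hhn` handy for writing arbitrary byte values one at a time.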
| londons_explore wrote:
| Are there any scanners out there that will detect user input
| ending up as the format string of a printf?
|
| Perhaps a scanner that I can run against all of github, and
| then rank results by the number of times that code is exposed
| on a high value server connected to the internet...?
| Someone wrote:
| Any modern C compiler will already warn you if the format
| string isn't a string literal
| (https://stackoverflow.com/questions/32362918/error-format-st...)
|
| I don't think it's worth the effort to extend that to look
| for tainted strings, not because it wouldn't be useful, but
| because it would be hard to do (as an extreme example: is
| data read from a file user input? It could be a file
| containing internationalization info)
|
| The (relatively) few programs that construct format strings
| on the fly will have to add pragmas to disable these
| warnings.
| rightbyte wrote:
| There is some innocent beauty in the twistedness of printf -
| especially with GNU extensions.
| flatiron wrote:
| So I can port doom to gnu printf?
| [deleted]
| npongratz wrote:
| GNU's printf is Turing complete[0]... so "yes."
|
| [0] Mentioned (but not directly linked) by TFA:
|
| https://www.usenix.org/conference/usenixsecurity15/technical...
| StavrosK wrote:
| I've always wanted a Turing-complete printing function.
| josefx wrote:
| We already have post script for that.
| anthk wrote:
| Port a zmachine first ;).
| bluGill wrote:
| I'd start by creating a llvm backend for printf. Should be
| fun.
| tomjakubowski wrote:
| GNU's printf specifier language is Turing complete, I
| believe.
| FartyMcFarter wrote:
| Presumably it needs a loop around it, so it's not Turing-
| complete by itself?
| npongratz wrote:
| No need to use a loop around it, printf can take care of
| that pesky detail for you! To quote [0] (my emphasis):
|
| > To achieve full Turing-complete computation, we need a
| way to loop a format string. This is possible by
| overwriting the pointer inside printf() that tracks which
| character in the format string is currently being
| executed. The attacker is unlucky in that at the time the
| "%n" format specifier is used, this value is saved in a
| register on our 64-bit system. However, we identify one
| point in time in which the attacker can always mount the
| attack. The printf() function makes calls to puts() for
| the static components of the string. When this function
| call is made, all registers are saved to the stack. _It
| turns out that an attacker can overwrite this pointer
| from within the puts() function. By doing this, the
| format string can be looped_.
|
| > An attacker can cause puts() to overwrite the desired
| pointer. Prior to printf() calling puts(), the attacker
| uses "%n" format specifiers to overwrite the stdout FILE
| object so that the temporary buffer is placed directly on
| top of the stack where the index pointer will be saved.
| Then, we print the eight bytes corresponding to the new
| value we want the pointer to have. Finally, we use more
| "%n" format specifiers to move the buffer back to some
| other location so that more unintended data will not be
| overwritten.
|
| [0] https://www.usenix.org/system/files/conference/usenixsecurit...,
| Appendix B "Printf is Turing-complete".
| saagarjha wrote:
| Or, you know, you can just use printf to overwrite the
| return address and ROP your way to a shell.
| xyproto wrote:
| Herein lies madness
| moonchild wrote:
| Not portable, but if you can get the address of the stack
| you can force printf to overwrite the return address,
| obviating the loop.
| shakna wrote:
| It does not need the loop. It might be easier to
| understand by looking at something like this. [0] That
| printf allows for this kind of Turing complete control
| flow is well known [1].
|
| [0] https://github.com/HexHive/printbf
|
| [1] http://nebelwelt.net/publications/files/15SEC.pdf
| moonchild wrote:
| That code is wrapped in a loop:
| https://github.com/HexHive/printbf/blob/master/src/pbf_pre.c...
| fulafel wrote:
| The implementation is accidentally Turing-complete
| because you can exploit it to get arbitrary memory
| writes. But the language as specified is not
| Turing-complete.
| alisonkisk wrote:
| There's a great example of what you can do with this,
| submitted and discussed on HN here:
| https://news.ycombinator.com/item?id=25690319
| notretarded wrote:
| > _Format specifiers can take extra "arguments". -
| "%hhn": store the number of bytes written mod 256 to the
| char pointer ..._
|
| Oh boy. I'll put that down for my "thing I don't think I
| wanted to know" of the day.
| snerp wrote:
| There is some innocent beauty in the twistedness of
| printf - especially with GNU extensions.
| SahAssar wrote:
| GNU's printf specifier language is Turing complete, I
| believe.
| gu5 wrote:
| What is up with this thread? These comments are
| duplicated from the top thread...
| exikyut wrote:
| (The usernames are different)
| dang wrote:
| Is it possible that it's a joke based on the material of
| the OP? Maybe we're trapped in some sort of International
| Obfuscated Internet Thread Contest...
|
| Edit: ok, I'm guessing it's a joke about Turing
| completeness. Loops, you know.
| gu5 wrote:
| Ohhhh... Now I get it.
| saagarjha wrote:
| It's a joke about the recursion introduced here:
| https://news.ycombinator.com/item?id=25691615.
| Bootvis wrote:
| Well executed jokes on HN. 2021 already is a crazy year.
| dfcowell wrote:
| So I can port doom to gnu printf?
| a1369209993 wrote:
| GNU's printf is Turing complete[0]... so "yes."
|
| [0] Mentioned (but not directly linked) by TFA:
|
| https://www.usenix.org/conference/usenixsecurity15/technical...
| kortilla wrote:
| GNU's printf specifier language is Turing complete, I
| believe.
| [deleted]
| [deleted]
| saagarjha wrote:
| For those interested in more Turing complete format strings, look
| no further than the "sprint" challenge from this year's Google
| CTF Quals: https://ctftime.org/task/12834. It's sprintf in a loop
| this time and the program simulates a maze:
| https://github.com/google/google-ctf/tree/master/2020/quals/...
| enedil wrote:
| The author works at Google, so I suspect he's the same person
| who created this challenge. Really enjoyable, although I didn't
| manage to solve it during the contest.
___________________________________________________________________
(page generated 2021-01-09 23:02 UTC)