[HN Gopher] There is no 'printf'
       ___________________________________________________________________
        
       There is no 'printf'
        
       Author : pr0zac
       Score  : 112 points
       Date   : 2021-10-20 15:01 UTC (1 days ago)
        
 (HTM) web link (www.netmeister.org)
 (TXT) w3m dump (www.netmeister.org)
        
       | monocasa wrote:
       | There is 'printf'. It's just that printf (and the rest of the
       | standard library) is technically as much a part of the C language
       | as the language grammar itself, and C compilers are welcome to
       | use innate knowledge of those functions for optimizations. The
       | other place you typically see this is calls to functions like
       | memcpy/memset being replaced with inline vector ops or CISC copies,
       | or on simpler systems, large manual zeroing and copying loops being
       | turned the other way into a memset or memcpy call.
       | 
       | C compilers will typically have an escape hatch for envs like
       | deeply embedded systems and kernels, such as gcc's -ffreestanding
       | and -fno-builtin, that says "but for real though, don't assume std
       | lib functions exist or that you know what they are based on the
       | function's name".
       | 
       | <rust_task_force> One of my favorite parts of rust as someone who
       | uses it for deeply embedded systems is the separation of core and
       | std (where core is the subset of std that only requires memcpy,
       | memset, and one other I'm forgetting). The rest of the standard
       | library is ultimately an optional part of the language with
       | compiler optimizations focused on general benefits rather than
       | knowing in the compiler how something like printf works. no_std
       | is such a nicer env than the half done ports of newlib or pdclib
       | that everyone uses in C embedded land. </rust_task_force>
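
To make the memset point above concrete, here is a minimal sketch. The function names (zero_manual, zero_memset, zeroing_agrees) are made up for illustration, and whether a given compiler actually rewrites either function depends on the compiler, target, and flags; nothing here is guaranteed.

```c
#include <stddef.h>
#include <string.h>

/* Hand-written zeroing loop. An optimizing compiler may recognize this
   idiom and emit a memset call (or wide stores) in its place. */
void zero_manual(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}

/* Explicit library call. For small, known sizes a compiler may go the
   other way and expand this inline instead of calling memset. */
void zero_memset(unsigned char *buf, size_t n)
{
    memset(buf, 0, n);
}

/* Self-check: both routines must leave identical, all-zero buffers,
   whatever code the compiler chose to generate for them. */
int zeroing_agrees(void)
{
    unsigned char a[32], b[32];
    memset(a, 0xAA, sizeof a);
    memset(b, 0xAA, sizeof b);
    zero_manual(a, sizeof a);
    zero_memset(b, sizeof b);
    return memcmp(a, b, sizeof a) == 0 && a[0] == 0 && a[31] == 0;
}
```

Comparing the generated assembly for the two functions at different optimization levels is one way to see which direction a particular compiler goes.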
        
       | tptacek wrote:
       | Huh, this is pretty great; I've always fussily used fputs() when
       | I'm just printing static strings, and apparently I don't need to
       | bother, since the compiler will just do it for me.
        
       | guerrilla wrote:
       | Moar please. I'm loving these counterintuitive C optimization
       | gotchas lately[1]. They are like little brain teasers.
       | 
       | 1. https://news.ycombinator.com/item?id=28930271
        
         | 0xcde4c3db wrote:
         | About a year ago there was something of a "joke isEven()
         | implementation discourse" on Twitter, which eventually evolved
         | a sort of informal optimizer abuse contest. For example:
         | 
         | https://twitter.com/zeuxcg/status/1291872698453258241
         | 
         | https://twitter.com/jckarter/status/1428071485827022849
        
           | aw1621107 wrote:
           | OK, those are horrifying and fascinating, and they basically
           | break my brain.
           | 
           | Is there an explanation somewhere of why the first one
           | "works"? The second one I think is the compiler assuming the
           | default case will never be hit since it'll result in infinite
           | recursion, which is UB under C++, so it's basically assuming
           | 0<=x<=3 and optimizing from there. Is that correct?
           | 
           | The first one I'm less certain about. The only thing I can
           | think of is that the compiler deduces an upper limit of
           | INT_MAX - 1 to avoid signed overflow, and then somehow
           | figuring out the true/false pattern from there? Still a bit
           | of a gap in my understanding there.
        
             | barsonme wrote:
             | My guess: since overflowing int is UB, and the only value
             | of n that stops the recursion is zero, the compiler assumes
             | that n must be zero and checks accordingly.
             | 
             | That doesn't explain why it uses test dil, 1 instead of
             | test dil, dil or cmp 0 or whatever.
        
             | davemp wrote:
             | Optimizers have to keep the same input/output pairs unless
             | there is undefined behavior. In the second function the
             | truth table looks like:
             | 
             |     in    | out
             |     ------------------
             |     0b000 | 1
             |     0b001 | 0
             |     0b010 | 1
             |     0b011 | 0
             |     0b100 | don't care
             |       .
             |       .
             |       .
             |     MAX   | don't care
             | 
             | The compiler just chooses the most efficient way it knows
             | to get the filled out entries correct which happens to be:
             | 
             |     in    | ~in[0]
             |     ------------------
             |     0b000 | 1
             |     0b001 | 0
             |     0b010 | 1
             |     0b011 | 0
             |     0b100 | 1
             |       .
             |       .
             |       .
             |     MAX   | 1
             | 
             | It would have been just as valid to do:
             | 
             |     in    | in[2] or ~in[0]
             |     ------------------------
             |     0b000 | 1
             |     0b001 | 0
             |     0b010 | 1
             |     0b011 | 0
             |     0b100 | 1
             |     0b101 | 1
             |       .
             |       .
             |       .
             |     MAX   | 1
             | 
             | The first function's table looks like:
             | 
             |     in    | out
             |     ------------------
             |     0b000 | 1
             |     0b001 | don't care
             |     0b010 | don't care
             |       .
             |       .
             |       .
             |     MAX   | don't care
             | 
             | And the compiler still likes the even check in this case,
             | which makes sense.
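
The two candidate columns from the tables above ("~in[0]" and "in[2] or ~in[0]") can be written out as plain C to check that they agree on the defined inputs 0 to 3 and only diverge on "don't care" inputs. This is just a sketch of davemp's tables; the function names are invented here and this is not what any compiler literally emits.

```c
#include <stdbool.h>

/* Column "~in[0]": true when bit 0 of the input is clear. */
bool by_bit0(unsigned in)
{
    return (in & 1u) == 0;
}

/* Column "in[2] or ~in[0]": true when bit 2 is set or bit 0 is clear. */
bool by_bit2_or_bit0(unsigned in)
{
    return ((in >> 2) & 1u) != 0 || (in & 1u) == 0;
}
```

For inputs 0 through 3 both functions return 1, 0, 1, 0. At 0b101 they differ (0 versus 1), which is fine under the "don't care" reading because that input only arises after undefined behavior.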
        
               | notriddle wrote:
               | The first function (the `n == 0 || !isEven(n+1)`
               | recursive function) has defined behavior for negative
               | numbers. That's probably why it compiled to an even
               | number check.
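
A rough reconstruction of the function notriddle describes, based only on the expression quoted in that comment (it is not copied from the tweet, and the names are made up). It shows that for zero and negative inputs the recursion terminates with the right answer, so those input/output pairs are ones the optimizer has to preserve.

```c
#include <stdbool.h>

/* Reconstructed from the quoted expression `n == 0 || !isEven(n+1)`.
   Only call this with n <= 0: for n > 0 the recursion counts upward
   and would eventually overflow int, which is undefined behavior
   (and in an unoptimized build it would exhaust the stack first). */
bool is_even_recursive(int n)
{
    return n == 0 || !is_even_recursive(n + 1);
}

/* The straightforward check, for comparison on the defined inputs. */
bool is_even_direct(int n)
{
    return n % 2 == 0;
}
```

On every n <= 0 the two functions agree, which is consistent with the compiler being able to collapse the recursive version into a single parity test.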
        
         | archi42 wrote:
         | It's all fun and games until you write (or review) C/C++ test
         | cases for a compiler or disassembler ;-) It never ceased to
         | amaze me how good the compiler was at figuring out that I had
         | actually written a very complicated "return 0".
        
       | eikenberry wrote:
       | https://web.archive.org/web/20211019052752/https://www.netme...
        
       | GoblinSlayer wrote:
       | Imagine somebody thought omitting the return statement and doing
       | whatever the compiler likes is a good feature to have.
        
         | dboreham wrote:
         | Like Scala?
        
           | dnautics wrote:
           | pretty sure scala (and most FP) has a well-defined "what to
           | do when you leave off the return statement", not one that "is
           | up to the compiler"
        
       | qwerty456127 wrote:
       | > puts(3) only returns "a nonnegative integer on success and EOF
       | on error"
       | 
       | How does it decide which nonnegative integer to return?
        
         | robotresearcher wrote:
         | It's arbitrary. The article shows an implementation that
         | returns 10 (ASCII '\n'). But the spec says it doesn't matter,
         | so you should only be testing it for >= 0 (or != EOF) to detect success.
        
           | Bayart wrote:
           | The correct implementation is _obviously_ to return 1 on
           | success !
        
         | woodruffw wrote:
         | That's answered below:
         | 
         | > On success, puts(3) appears to return '\n', the newline or
         | line feed (LF) character, which has ASCII value... 10.
         | 
         | But note that that isn't standard behavior. The language in
         | POSIX[1] is identical to that in the blog post. `puts` is free
         | to return whatever non-negative number it wants on success.
         | 
         | [1]:
         | https://pubs.opengroup.org/onlinepubs/9699919799/functions/p...
        
       | cyberge99 wrote:
       | Apparently there is no available capacity for that site either.
        
         | Bang2Bay wrote:
         | Search https://search.yahoo.com/ for
         | 
         | There is no 'printf'
         | 
         | and look through the cache
        
       | ltr_ wrote:
       | [off topic] I always wondered how '%n' is used in production
       | code.
        
       | mormegil wrote:
       | So, why does puts do "return r ? EOF : '\n';"? Some backwards
       | compatibility? Or is there a logical reason for that?
        
         | _kst_ wrote:
         | That particular implementation probably returns the result of
         | the last fputc() or equivalent that it called.
         | 
         | puts() returns EOF (typically -1) on error, or some unspecified
         | non-negative value on success.
         | 
         | fputc() returns EOF on error or the written character, treated
         | as an unsigned char and converted to int, on success.
         | 
         | Don't expect all puts() implementations to do the same thing.
         | For example, the glibc implementation appears to return the
         | number of characters written on success. Implementations are
         | free to rely on implementation-defined behavior. User code
         | that's intended to be portable cannot.
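
A small sketch of the portable pattern this implies: compare the result of puts() against EOF and nothing else, while fputc()'s success value can be relied on because it is specified. The wrapper names print_line and write_char are made up for illustration.

```c
#include <stdio.h>

/* Returns 0 on success, -1 on failure. Only the EOF comparison is
   portable; the particular non-negative success value is unspecified. */
int print_line(const char *s)
{
    return puts(s) == EOF ? -1 : 0;
}

/* fputc's success value is specified: the character written, as an
   unsigned char converted to int. On error it returns EOF. */
int write_char(int c)
{
    return fputc(c, stdout);
}
```

Code written this way behaves the same whether the underlying puts() returns 10, 1, or a character count.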
        
           | LukeShu wrote:
           | That particular implementation (NetBSD's) (which is
           | transcribed into the article) does something more optimized
           | than making repeated calls to `putchar()`.
           | 
           | But as pdw's link shows, what you suggest is exactly what the
           | historical implementation was. So NetBSD is simply matching
           | historical Unix.
        
         | masklinn wrote:
         | Per the man:
         | 
         | > puts() and fputs() return a nonnegative number on success, or
         | EOF on error.
         | 
         | r is the result of the write, if it's nonzero the write failed
         | and thus so did puts.
        
           | m45t3r wrote:
           | Yeah, but I think the question was why EOF and "\n". It could
           | as easily just return 1 or -1 for example, and it would make
           | more sense I think.
        
             | kevin_thibedeau wrote:
             | puts() always adds a line termination so success means that
             | '\n' is the last char for that implementation.
        
         | pdw wrote:
         | It's what historic Unix did:
         | https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libc...
         | 
         | Why did it do that? I'm not sure, but at the time C did not have
         | 'void' functions: every function returned a value. They
         | probably wanted to make the behavior of the stdlib functions
         | deterministic, even if the return value was useless and
         | undocumented.
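
As an illustration of how a '\n' return value can fall out naturally, here is a puts-like function built from putchar() calls, in the spirit of the historical implementation described in this subthread. It is a sketch in modern C, not a transcription of the linked source, and the name old_style_puts is made up.

```c
#include <stdio.h>

/* Write each character, then the newline, and hand back whatever the
   final putchar returned: '\n' on success, EOF on error. Errors from
   the earlier putchar calls are ignored in this sketch. */
int old_style_puts(const char *s)
{
    int c;
    while ((c = (unsigned char)*s++) != '\0')
        putchar(c);
    return putchar('\n');
}
```

With this shape, returning '\n' on success is not a design decision at all; it is simply the result of the last call.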
        
       | anonymousiam wrote:
       | Compiler optimization can sometimes cause unpredictable or even
       | incorrect behavior. Below is a blob of C code for the TI MSP430
       | compiler that exemplifies at least one of TI's optimization bugs:
       | 
       |     // Define Common Communications Frame
       |     typedef volatile union commFrameType
       |     {
       |         struct
       |         {
       |             unsigned SyncHeader:16;
       |             unsigned MessageID:8;
       |             unsigned short MessageData[msgDataSize];  // ID-unique data
       |             unsigned CRC:8;             // LSB of CCITT-16 for above data
       |         } __attribute__ ((packed)) Frame;
       |         unsigned char  b[16];         // Accessible as raw bytes as well
       |         unsigned short w[8];          // Accessible as raw words as well
       |         unsigned long  l[4];          // Accessible as raw long words as well
       |     } __attribute__ ((packed)) CommFrame;
       | 
       |     static CommFrame IpcMessage = { FRAME_SYNC_R, IpcBlankMessage };
       | 
       |     // If frame was accepted into TX queue, prepare next frame for
       |     // transmission
       |     // IpcMessage.Frame.MessageID++; // Bump up to next message type
       |     // IpcMessage.Frame.MessageID += 1;
       |     // The above two lines that are commented out cause a bizarre
       |     // linker error if either are used instead of the line below.
       |     IpcMessage.Frame.MessageID = IpcMessage.Frame.MessageID + 1; // Bump up to next message type
       | 
       | The MSP-430 is a 16-bit microcontroller and the packed CommFrame
       | structure has Frame.MessageID on an odd-byte boundary. Some
       | processors might raise a SIGBUS, but TI says that it's okay to
       | access a byte on an odd address boundary.
       | 
       | It's pretty silly that i++; and i+=1; don't work, but i=i+1; is
       | just fine.
        
         | secondcoming wrote:
         | 'unsigned MessageID:8;' isn't the same as 'unsigned char
         | MessageId'
        
       | RcouF1uZ4gsC wrote:
       | This is a bit like saying there is no '+';
       | 
       | Because if you put in
       | 
       |     return 1+2+3;
       | 
       | And look at the assembly code, you will see that the compiler
       | generated something like
       | 
       |     return 6;
       | 
       | The compiler is allowed to take advantage of the standard to
       | substitute in more efficient code that does the same thing.
       | 
       | IIRC, for C++, it would actually be ok if std::vector was
       | implemented completely as a compiler intrinsic with no actual
       | header file. (No compiler I am aware of actually does it that
       | way).
        
         | dnautics wrote:
         | yeah but everyone knows that "there is no +". It's an operator,
         | and in C, anyway, operators are special and expected to not
         | necessarily do C-function-ey things, e.g. "take arguments of
         | different types and add them successfully". Not everyone is
         | aware that C has "anointed functions" (including, I believe,
         | malloc) that the compiler is allowed to fiddle with.
        
         | malkia wrote:
         | Is there more info on this? I remember this from Common Lisp
         | (but details evade me) that the compiler can take benefit of
         | certain specific functions and rely on them being... "open
         | coded" - e.g. it can produce more efficient code by replacing
         | these with something more suitable...
         | http://www.sbcl.org/manual/#Open-Coding-and-Inline-Expansion
         | 
         | https://www.thecodingforums.com/threads/what-is-the-meaning-...
        
         | talaketu wrote:
         | > more efficient code that does the same thing
         | 
         | In this case, it produces a different result.
        
           | masklinn wrote:
           | It produces a different ub, which is ub.
           | 
           | Furthermore observability would be defined in terms of the C
           | abstract machine, "observing" by decompiling the program is
           | out of scope.
        
             | talaketu wrote:
             | oh right
             | 
             | > But what if you're not using C99 or newer?
             | 
             | UB - that takes all the fun out of it.
        
         | Someone wrote:
         | Code that does
         | 
         |     #include <vector>
         | 
         | must compile, so that _header_ must exist (whether it is stored
         | in a _file_ is the implementer's choice. AFAIK, the standard
         | carefully avoids the use of the term 'header file')
         | 
         | Also, I think code that doesn't do that include must fail to
         | compile when it tries to use _std::vector_. So, logically, that
         | header must exist.
        
           | gpderetta wrote:
           | Well not really. The preprocessor is part of the compiler, so
           | it only needs to set a flag to tell the compiler proper to
           | enable std::vector.
        
       | rrauenza wrote:
       | Quick Summary:
       | 
       | The C compiler optimizer replaces printf("Hello World!\n") with
       | puts("Hello World!") (puts appends the newline itself), and the
       | implicit return from main() changes from 13 (the return value of
       | printf) to 10 (the return value of puts)
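
The two return values in this summary can be checked directly. In the sketch below (function names invented for illustration) the result of each call is actually used, so a compiler cannot quietly swap one call for the other; the substitution discussed in the article applies when printf's result is discarded.

```c
#include <stdio.h>

/* printf returns the number of characters written: the 12 characters
   of "Hello World!" plus the newline, so 13 on success. */
int hello_printf(void)
{
    return printf("Hello World!\n");
}

/* puts appends the newline itself and returns some non-negative value
   on success; which value is up to the implementation. */
int hello_puts(void)
{
    return puts("Hello World!");
}
```

Both functions print the same line. Only the printf version has a portable, meaningful success value.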
        
         | moffkalast wrote:
         | Calls on puts you say?
        
           | helmholtz wrote:
           | Brilliant.
        
           | enlyth wrote:
           | In other words long volatility
        
       ___________________________________________________________________
       (page generated 2021-10-21 23:00 UTC)