[HN Gopher] Main is usually a function, so when is it not? (2015)
       ___________________________________________________________________
        
        
       Author : ColinWright
       Score  : 150 points
       Date   : 2021-06-14 16:03 UTC (6 hours ago)
        
 (HTM) web link (jroweboy.github.io)
        
       | turtletontine wrote:
       | Really cool post!
       | 
       | Can anyone explain why const makes the array executable? That was
       | the most surprising bit for me.
        
         | dj_mc_merlin wrote:
         | The real effect he's trying to cause is to move the bytes for
         | main from a R/W memory segment (like .data) to an eXecutable
         | one (like .text).
         | 
         | The const keyword tells C that a certain variable should never
         | be modified by code. Doing so would be undefined behaviour. GCC
         | is free to implement whatever guarantees for memory meant for
         | const variables. On some versions and under certain criteria,
         | it would place const variables into .text, so any write to them
         | would cause a SIGSEGV. It can also achieve the same by putting
         | them into other write-protected memory, like .rodata (which is
         | what the newest version of gcc prefers for the code in this
         | article; newer linkers map .rodata without execute permission,
         | so the trick no longer works). Why GCC chooses one over the other
         | and why it would change over time are hard questions.
         | 
         | A more effective way would be to use __attribute__s (on gcc) or
         | #pragma directives to specify that the bytes need to be in the
         | .text segment. However, that ruins the magic a bit.
        
        
       | dhosek wrote:
       | I took one CS class as an undergrad. On those occasions I showed
       | up for class, I sat in the back row and wrote poetry. I think I
       | only turned in one assignment as well. It was supposed to be a
       | very simple infix calculator with 26 statically allocated
       | variables. I thought that was boring so I ended up creating an
       | algebraic solver that could use dynamically allocated variables
       | of any length as long as they began with a letter and were all
       | alphanumeric. The whole thing was done using cweb. The TA gave me
       | partial credit saying that he couldn't understand any of what the
       | code did (in his defense, the cweave output was 20+ pages long
       | and I think my classmates' programs were all a couple pages of C)
       | but that it appeared to work.
        
         | samatman wrote:
         | Well I can relate to that kind of story, although I got it
         | mostly out of my system by college.
         | 
         | I took one CS class as well. I didn't read the syllabus
         | carefully, and thus didn't realize that the lab was 10% of my
         | grade, so I got a final score of 89%. I tried to argue that
         | getting 99% of the rest of the course right should make up for
         | the missing lab work, to no avail. This is how I ended up a
         | chemistry major.
         | 
         | Anyway, I think you would get a lot out of this essay. I did at
         | least.
         | 
         | http://www.marktarver.com/bipolar.html
        
         | anyfoo wrote:
         | Depends on the assignment, and can easily backfire.
         | 
         | When I was TA'ing the CS intro class, which in my University
         | was actually using a functional programming language (ML), I
         | received a bunch of "working" programs from students who
         | already knew how to program, but not with functional
         | programming languages. They would force an imperative style
         | into ML, which was not what the assignment was, and kind of
         | showed that they must not have paid attention at all.
        
       | dang wrote:
       | Some past threads:
       | 
       |  _Main is usually a function, so when is it not? (2015)_ -
       | https://news.ycombinator.com/item?id=15206198 - Sept 2017 (65
       | comments)
       | 
       |  _Main is usually a function. So then when is it not?_ -
       | https://news.ycombinator.com/item?id=12799637 - Oct 2016 (1
       | comment)
       | 
       |  _Main is usually a function - when is it not?_ -
       | https://news.ycombinator.com/item?id=8951283 - Jan 2015 (60
       | comments)
        
       | pcstl wrote:
       | In Haskell, main is a _value_ , not a function. This is sometimes
       | overlooked because in Haskell "values" can be thought of as
       | 0-parameter functions and functions can be used as values, so
       | everything sort of blends together, but I find it gives some
       | fascinating insight into how lazy programming changes things up.
        
         | chowells wrote:
         | I like Haskell as much as anyone, but this is pretty non-
         | responsive. "main is usually a function" is a gcc warning, and
         | the whole joke is that "usually" leaves some pretty broad
         | questions. This post answers some of them.
        
       | shadowgovt wrote:
       | Is there any particular utility to this trick, or is it just a
       | neat side-effect of the linker and compiler being very permissive
       | and treating something that most languages would call a
       | compilation error as merely a warning?
        
         | Jorengarenar wrote:
         | Neat side effect
        
         | rm445 wrote:
         | Ken Thompson wrote a regex engine which compiled (at runtime)
         | regexes into data structures containing executable machine
         | code, and invoked them (from C source) by jumping into the data
         | i.e. treating its location as a function pointer. That's what's
         | happening here except it's the start code inserted by the
         | linker which is jumping into main.
         | 
         | So there's the utility, if you're hardcore enough to build
         | machine code at runtime.
         | 
         | If you wanted to abuse main() particularly, I guess you've got
         | argc and argv in registers, and your hand-compiled main
         | 'function' could maybe have some self-modifying code?
        
           | dhosek wrote:
           | I don't know that that would work since if the code is
           | generated at runtime it would live in .data and not .text. At
           | least for the architecture being targeted, you aren't allowed
           | to create executable code at runtime like that (note that the
           | original poster had to declare his main array as const to be
           | able to have it in the .text segment).
        
             | Sharlin wrote:
             | Coercing a data pointer into a function pointer is
             | undefined behavior in standard C (they don't even need to
             | be the same size), but at least on POSIX platforms the
             | compiler must do the right thing because `dlsym` depends on
             | it working. Generating and executing native code at runtime
             | is not _that_ special, mind; after all JIT compilers are
             | ubiquitous these days!
        
             | twoodfin wrote:
             | Popularity of the no-execute NX bit significantly postdates
             | Unix. As I recall, Microsoft only started flipping it on by
             | default for the 64-bit Windows NT kernel, since so many
             | preexisting 32-bit applications relied on self-modifying
             | code.
        
         | Animats wrote:
         | In C and C++, "main" is special. Too special. For historical
         | reasons, its argument and return types are not checked.
         | 
         | I once argued on the C standard forum that a C compiler should
         | not know about "main". "#include <unix.h>" should contain the
         | usual Unix declaration for "main", and "#include <windows.h>"
         | should contain the Windows declaration, which at the time was,
         | roughly:
         | 
         |     int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
         |                         PWSTR pCmdLine, int nCmdShow);
         | 
         | It's then up to the user to define their startup function to
         | match, with normal type checking.
         | 
         | This gets the compiler out of handling "main" as a special
         | case.
         | 
         | This was generally considered to be the right answer, but would
         | break too much existing code.
        
           | ornitorrincos wrote:
           | To nitpick: Windows also needs to provide a main function (why
           | shouldn't it?).
           | 
           | But I agree that the fact that the standard allows two
           | different declarations of main in a language without
           | polymorphism doesn't help.
           | 
           | And that's before getting into the part where implementations
           | are free to define extra entry points.
        
           | devit wrote:
           | There is no special treatment of main in the major C
           | compilers, the only "magic" thing the compiler does is
           | including the CRT startup object file in the link, which
           | defines _start as a function ultimately calling main, and
           | having the default linker script set the address of "_start"
           | as the executable entry point.
           | 
           | You can pass -nostdlib to gcc to disable linking the CRT
           | startup object (or use ld directly) and you can pass
           | --default-script /dev/null to ld to disable the linker
           | script.
           | 
           | There is no need to declare main or check arguments or return
           | types since in C arguments are both pushed and popped by the
           | caller and the language provides no typing guarantees and
           | thus there is no problem in calling functions with mismatched
           | argument or return type declarations.
        
             | mananaysiempre wrote:
             | Not _quite_ true: there's the weird thing where gcc on i*86
             | will align the stack on entry to a function called main but
             | not any other.
             | 
             |     $ gcc -m32 -O2 -fno-pie -fno-asynchronous-unwind-tables -fomit-frame-pointer -S -masm=intel -xc -o - -
             |     int foo(void); int main(void) { return foo(); }
             |     ^D
             |             .file   "<stdin>"
             |             .intel_syntax noprefix
             |             .text
             |             .section        .text.startup,"ax",@progbits
             |             .p2align 4
             |             .globl  main
             |             .type   main, @function
             |     main:
             |             push    ebp
             |             mov     ebp, esp
             |             and     esp, -16
             |             call    foo
             |             leave
             |             ret
             |             .size   main, .-main
             |             .ident  "GCC: (GNU) 11.1.0"
             |             .section        .note.GNU-stack,"",@progbits
             | 
             | It doesn't do that if you set the historical stack
             | alignment, though (-mpreferred-stack-boundary=2), or if you
             | name the function anything else but main (it even does a
             | tail call). Presumably it's trying to (somewhat) recover
             | from the time when the GCC authors accidentally changed the
             | SysV i386 ABI [1, 2].
             | 
             | [1]: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
             | [2]: https://stackoverflow.com/a/49397524
        
             | anyfoo wrote:
             | Yes there is. I've demonstrated this in a sibling (or
             | rather, cousin) comment, but in short, you can happily not
             | return a value in main even if its type is "int
             | main(void)". Try that with another function, and the
             | compiler should at least warn. This might not be a special
             | case of code generation, but it is a special case of error
             | handling at least.
        
           | user5994461 wrote:
           | In Visual Studio, main() is not the entry point of the
           | program.
           | 
           | The entry point is generated automatically by the compiler.
           | It calls a few functions depending on what the program does,
           | then calls main; I think this has to do with initializing
           | the standard library. You can see the stub using a debugger
           | or a disassembler.
           | 
           | It's possible to set the entry point to any function
           | name. See advanced project settings.
           | 
           | Now about the arguments and return type. With main, the caller
           | is responsible for pushing arguments onto the stack before
           | the call, then popping the stack after the call. The return
           | code is in the EAX register, if I remember correctly.
           | 
           | Because of that, main's signature doesn't matter; the
           | invocation works regardless of the arguments.
           | 
           | People may ask what's the point of knowing any of this? One
           | major use case is to write executable compressors like UPX.
           | Another use case is to make a custom entry point written in
           | assembler.
        
             | anyfoo wrote:
             | Not just in Visual Studio. main is usually (always?) not
             | the entry point on unixoid systems either; that's much more
             | likely to be _start, which calls main() down the line.
             | 
             | Nevertheless, main _is_ treated specially by the compiler
             | for the aforementioned historical reasons, for example by
             | not warning or erroring out if it does not return a value
             | despite its type saying it should.
             | 
             | Observe:
             | 
             |     % echo 'int main(void) { }' > foo.c; clang -c foo.c
             |     <no output>
             | 
             |     % echo 'int foo(void) { }' > foo.c; clang -c foo.c
             |     foo.c:1:17: warning: non-void function does not return a value [-Wreturn-type]
             |     int foo(void) { }
             |                     ^
             |     1 warning generated.
             | 
             | As you can see, clang is happy to ignore the missing return
             | value for main(), but not for foo().
        
       | FranchuFranchu wrote:
       | Why does he add semicolons at the end of the assembly lines?
        
         | Taniwha wrote:
         | They allow you to put multiple instructions on the same line.
         | In this case you must have either ';'s or '\n's in the string;
         | having both doesn't break anything, so I guess it's belt and
         | braces.
        
         | wtetzner wrote:
         | Not sure. I guess it shouldn't hurt anything though, since
         | semicolon is the comment character.
        
           | Taniwha wrote:
           | Depends on the assembler; in some assemblers ';' allows you
           | to put multiple instructions on the same line.
        
         | actually_a_dog wrote:
         | They're not necessary, so I suspect some combination of reflex,
         | consistency for consistency's sake, or, possibly that they were
         | added automatically by his editor/IDE at some point.
        
       | hawski wrote:
       | Time passed. How was the assignment graded?
        
         | labster wrote:
         | If I was the TA in this class, I'd give it -5 (95 of 100) for
         | "doesn't compile on my 486, please write more portable code"
         | just to screw with the student.
        
         | Sebb767 wrote:
         | In my university, it would've probably received full points;
         | reason being that people who pull such shenanigans usually
         | don't see "hello world" as a challenge - assuming, of course,
         | the author could explain it.
        
       | ipython wrote:
       | It's not hard to get the address of data in 32-bit addressing.
       | You just interleave the data inside your assembly, something like
       | the following (pseudocode; I haven't done this in a while):
       | 
       |     ...
       |         call continue
       |         .db "Hello World!\n\0"
       |     continue:
       |         pop eax
       |     ...
       | 
       | Since 'call' just turns into a 'push eip; jmp target'
       | (simplified, sorry), the address of the string is now pushed onto
       | the stack. Popping off the top, eax now contains the address of
       | the string "Hello World!\n\0". Since in the 32-bit ABI most
       | parameters are passed on the stack, many times you don't even
       | need to 'pop' the address off the stack; it'll just be part of
       | your arguments to the function.
       | 
       | Old school malware used this a lot to 1) run regardless of the
       | memory base address it was loaded at and 2) confuse some
       | disassemblers (you can use silly conditionals that are always
       | true or false to control whether you execute the 'call'
       | instruction or not, forcing the disassembler to try and
       | 'disassemble' the string into valid x86 opcodes)
        
         | zh3 wrote:
         | More or less how Fortran works on PDP-11's.
        
       | bregma wrote:
       | In C (and C++) main is always a function with zero parameters,
       | two parameters as described in detail, or some implementation-
       | specific set of parameters. Anything else is undefined behaviour.
       | 
       | In other words, you can write `main` as anything you want, and it
       | might do something when presented to a C compiler, and that might
       | do something when linked through the system linker, but it's not
       | C code.
       | 
       | Doing something that's by definition incorrect and having it
       | maybe do something somewhere sometimes, or maybe not, is not
       | really all that impressive when you think about it.
        
         | samatman wrote:
         | > _Doing something that's by definition incorrect and having
         | it maybe do something somewhere sometimes, or maybe not, is not
         | really all that impressive when you think about it._
         | 
         | See, I would have called this _hacking_, which here on _Hacker
         | News_ is its own special kind of impressive.
        
         | scintill76 wrote:
         | You could make main() a standard C function that merely calls
         | this machine code hack function.
        
       | jpegqs wrote:
       | Can be both, a string and a function:
       | 
       |     char
       |     main
       |     [/*x86*/]
       |     __attribute__
       |     ((section(".text"))
       |     )="WTYH)9Zj8_j7H)9]R"
       |     "H)9^\350\0\0\0\0H)1^R"
       |     "H))Z8<2u\366j<)9Xj9"
       |     ")9j9VY)<$[S_H\xbd^["
       |     "H$@\xcd\200-XP\xf"
       |     "\5XP_j<W\xeb\xe2]"
       |     "Hello World!\n";
       |     /*Linux_Only!*/
        
         | failwhaleshark wrote:
         | Now make it a quine too. ;-)
        
       | im3w1l wrote:
       | Since floats are more mysterious and intimidating than ints, I prefer:
       | 
       | const float main[] = {-8.10373123e+22, 6.16571324e-43,
       | 1.58918456e-40, -7.11823707e-31, 5.81398733e-42, 1.26058568e-39,
       | 6.72382769e-36, 2.17817833e-41, 2.16139414e-29, 1.10873646e+27,
       | 1.76400414e+14, 1.74467096e+22, -221.039566};
        
         | jalbertoni wrote:
         | Genuine question, can you be sure the conversion wouldn't
         | introduce a wrong bit here or there? Maybe in a different
         | architecture or something?
         | 
         | I'm not that good with CPUs past 16 bits, this is really out of
         | my comfort zone heh
        
           | grishka wrote:
           | You only depend on the compiler to interpret these floats
           | correctly and generate their binary representation that
           | decodes into valid instructions. As far as the CPU executing
           | this code is concerned, it's machine code either way.
           | 
           | > Maybe in a different architecture or something?
           | 
           | Of course this isn't portable across CPU architectures,
           | neither is it portable across operating systems due to at
           | least ABI differences.
        
           | banana_giraffe wrote:
           | Unless I'm missing something, this code is already
           | architecture dependent .. adding more architecture
           | dependencies won't really hurt.
        
           | bombcar wrote:
           | I think the format for single precision and double is defined
           | by the standard. Beyond that may be implementation dependent.
        
             | _kst_ wrote:
             | The format for floating-point is specified by the IEEE
             | floating-point standard (or whatever it's officially called
             | these days). C permits but does not require IEEE format.
             | Most implementations these days use it.
        
         | taneliv wrote:
         | Better yet, try to find the corresponding ints (or maybe more
         | realistically shorts or chars) from usual #include headers, and
         | use the #define or const mnemonics for all numbers.
         | 
         | Bonus points for finding them all in the same header file, or
         | with like names, so as to give appearance of them actually
         | meaning something in the context of the prank.
        
       | monocasa wrote:
       | Interestingly, I did this just the other day. I had a testing
       | reason for main to consist of a single specific illegal
       | instruction whose hex I already knew. It was less work for
       | the system's Makefile to compile a .c rather than a .s file, and
       | I knew everything I needed to make this trick work, but didn't
       | know for sure how to disable function prologues for this arch.
       | 
       | It's the first time I had a legitimate excuse to whip out this
       | technique since seeing it in an ancient obfuscated C contest
       | entry for the PDP-11 probably a decade ago. mullender.c I think?
        
       | ltbarcly3 wrote:
       | Well, I guess technically they could expel him because he had
       | someone do part of the assignment for him.
        
       | _kst_ wrote:
       | One of the winners of the 1st International Obfuscated C Code
       | Contest (1984) used this technique.
       | 
       | https://www.ioccc.org/1984/mullender/mullender.c
       | 
       | https://www.ioccc.org/1984/mullender/hint.text
       | 
       |     short main[] = {
       |         277, 04735, -4129, 25, 0, 477, 1019, 0xbef, 0, 12800,
       |         -113, 21119, 0x52d7, -1006, -7151, 0, 0x4bc, 020004,
       |         14880, 10541, 2056, 04010, 4548, 3044, -6716, 0x9,
       |         4407, 6, 5568, 1, -30460, 0, 0x9, 5570, 512, -30419,
       |         0x7e82, 0760, 6, 0, 4, 02400, 15, 0, 4, 1280, 4, 0,
       |         4, 0, 0, 0, 0x8, 0, 4, 0, ',', 0, 12, 0, 4, 0, '#',
       |         0, 020, 0, 4, 0, 30, 0, 026, 0, 0x6176, 120, 25712,
       |         'p', 072163, 'r', 29303, 29801, 'e'
       |     };
        
         | alin23 wrote:
         | That's already mentioned in the third paragraph but it's nice
         | of you to also include the bytecode in the comment.
         | 
         | > Apparently in 1984, a strange program won the IOCCC where
         | main was declared as a short main[] = {...} and somehow this
         | did stuff and printed to the screen!
        
           | ianhanschen wrote:
           | Barely mentioned, and the author makes us all die inside when
           | they say "Too bad it was written for a whole different
           | architecture and compiler so there is really no easy way for
           | me to find out what it did."
        
             | _kst_ wrote:
             | As the hint explains, it's a combination of PDP-11 and VAX
             | machine code, set up so that either system will run its own
             | code and ignore the foreign code.
             | 
             | You can extract a few ASCII strings from the data. As the
             | hint says: "Can you guess what is printed? We knew you
             | couldn't! :-)"
             | 
             | The ASCII strings I found were "vax", "pdp", "str",
             | "write", and " :-)".
        
       | _kst_ wrote:
       | The C standard requires main to be defined as a function, but
       | failure to do so is not a constraint violation, so no diagnostic
       | is required. If you define it as something else, the behavior is
       | undefined.
       | 
       | A conforming C compiler could reject a program that defines main
       | as an array, but is not required to do so.
       | 
       | gcc doesn't complain by default, but with warnings enabled it says
       | "warning: 'main' is usually a function".
        
       ___________________________________________________________________
       (page generated 2021-06-14 23:00 UTC)