[HN Gopher] Main is usually a function, so when is it not? (2015)
___________________________________________________________________
Main is usually a function, so when is it not? (2015)
Author : ColinWright
Score : 150 points
Date : 2021-06-14 16:03 UTC (6 hours ago)
(HTM) web link (jroweboy.github.io)
(TXT) w3m dump (jroweboy.github.io)
| turtletontine wrote:
| Really cool post!
|
| Can anyone explain why const makes the array executable? That was
| the most surprising bit for me.
| dj_mc_merlin wrote:
| The real effect he's trying to cause is to move the bytes for
| main from a R/W memory segment (like .data) to an eXecutable
| one (like .text).
|
| The const keyword tells C that a certain variable should never
| be modified by code. Doing so would be undefined behaviour, so
| GCC is free to place const variables in whatever kind of
| read-only memory it likes. On some versions and under certain
| conditions, it would place const variables into .text, so any
| write would cause a SIGSEGV. It can also achieve the same by
| putting them into other write-protected memory, like .rodata
| (which is what the newest version of gcc prefers for the code in
| this article, making it no longer work). Why GCC chooses one
| over the other and why that would change over time are hard
| questions.
|
| A more effective way would be to use __attribute__s (on gcc) or
| #pragma directives to specify that the bytes need to be in the
| .text segment. However, that ruins the magic a bit.
| dhosek wrote:
| I took one CS class as an undergrad. On those occasions I showed
| up for class, I sat in the back row and wrote poetry. I think I
| only turned in one assignment as well. It was supposed to be a
| very simple infix calculator with 26 statically allocated
| variables. I thought that was boring so I ended up creating an
| algebraic solver that could use dynamically allocated variables
| of any length as long as they began with a letter and were all
| alphanumeric. The whole thing was done using cweb. The TA gave me
| partial credit saying that he couldn't understand any of what the
| code did (in his defense, the cweave output was 20+ pages long
| and I think my classmates' programs were all a couple pages of C)
| but that it appeared to work.
| samatman wrote:
| Well I can relate to that kind of story, although I got it
| mostly out of my system by college.
|
| I took one CS class as well. I didn't read the syllabus
| carefully, and thus didn't realize that the lab was 10% of my
| grade, so I got a final score of 89%. I tried to argue that
| getting 99% of the rest of the course right should make up for
| the missing lab work, to no avail. This is how I ended up a
| chemistry major.
|
| Anyway, I think you would get a lot out of this essay. I did at
| least.
|
| http://www.marktarver.com/bipolar.html
| anyfoo wrote:
| Depends on the assignment, and can easily backfire.
|
| When I was TA'ing the CS intro class, which at my university
| was actually using a functional programming language (ML), I
| received a bunch of "working" programs from students who
| already knew how to program, but not with functional
| programming languages. They would force an imperative style
| into ML, which was not what the assignment was, and kind of
| showed that they must not have paid attention at all.
| dang wrote:
| Some past threads:
|
| _Main is usually a function, so when is it not? (2015)_ -
| https://news.ycombinator.com/item?id=15206198 - Sept 2017 (65
| comments)
|
| _Main is usually a function. So then when is it not?_ -
| https://news.ycombinator.com/item?id=12799637 - Oct 2016 (1
| comment)
|
| _Main is usually a function - when is it not?_ -
| https://news.ycombinator.com/item?id=8951283 - Jan 2015 (60
| comments)
| pcstl wrote:
| In Haskell, main is a _value_, not a function. This is sometimes
| overlooked because in Haskell "values" can be thought of as
| 0-parameter functions and functions can be used as values, so
| everything sort of blends together, but I find it gives some
| fascinating insight into how lazy programming changes things up.
| chowells wrote:
| I like Haskell as much as anyone, but this is pretty non-
| responsive. "main is usually a function" is a gcc warning, and
| the whole joke is that "usually" leaves some pretty broad
| questions. This post answers some of them.
| shadowgovt wrote:
| Is there any particular utility to this trick, or is it just a
| neat side-effect of the linker and compiler being very permissive
| and treating something that most languages would call a
| compilation error as merely a warning?
| Jorengarenar wrote:
| Neat side effect
| rm445 wrote:
| Ken Thompson wrote a regex engine which compiled (at runtime)
| regexes into data structures containing executable machine
| code, and invoked them (from C source) by jumping into the data
| i.e. treating its location as a function pointer. That's what's
| happening here except it's the start code inserted by the
| linker which is jumping into main.
|
| So there's the utility, if you're hardcore enough to build
| machine code at runtime.
|
| If you wanted to abuse main() particularly, I guess you've got
| argc and argv in registers, and your hand-compiled main
| 'function' could maybe have some self-modifying code?
| dhosek wrote:
| I don't know that that would work, since if the code is
| generated at runtime it would live in .data and not .text. At
| least for the architecture being targeted, you aren't allowed
| to create executable code at runtime like that (note that the
| original poster had to declare his main array as const to be
| able to have it in the .text segment).
| Sharlin wrote:
| Coercing a data pointer into a function pointer is
| undefined behavior in standard C (they don't even need to
| be the same size), but at least on POSIX platforms the
| compiler must do the right thing because `dlsym` depends on
| it working. Generating and executing native code at runtime
| is not _that_ special, mind; after all JIT compilers are
| ubiquitous these days!
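|
| A minimal sketch of generating and executing native code at
| runtime, assuming Linux on x86-64 and a system that permits
| writable+executable mappings. The function name
| run_generated_code and the six-byte payload (mov eax, 42; ret)
| are illustrative and not from the thread. Anywhere else, or if
| the mapping is refused, the function just returns -1.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <string.h>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#endif

/* Returns 42 if the generated code ran, or -1 if unsupported or blocked. */
int run_generated_code(void)
{
#if defined(__linux__) && defined(__x86_64__)
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* Plain .data is not executable, so ask the kernel for a page that is.
       Hardened systems may refuse a writable+executable mapping. */
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memcpy(page, code, sizeof code);

    /* Data-to-function pointer conversion: not guaranteed by ISO C,
       but it works on POSIX-style platforms where dlsym relies on it. */
    int (*fn)(void);
    memcpy(&fn, &page, sizeof fn);

    int result = fn();
    munmap(page, 4096);
    return result;
#else
    return -1;   /* assumption: no payload provided for other targets */
#endif
}
```

| Calling run_generated_code() from an ordinary main is enough to
| try it. A real JIT would typically map the page writable first
| and flip it to executable afterwards with mprotect.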
| twoodfin wrote:
| Popularity of the no-execute NX bit significantly postdates
| Unix. As I recall, Microsoft only started flipping it on by
| default for the 64-bit Windows NT kernel, since so many
| preexisting 32-bit applications relied on self-modifying
| code.
| Animats wrote:
| In C and C++, "main" is special. Too special. For historical
| reasons, its argument and return types are not checked.
|
| I once argued on the C standard forum that a C compiler should
| not know about "main". "#include <unix.h>" should contain the
| usual Unix declaration for "main", and "#include <windows.h>"
| should contain the Windows declaration, which at the time was,
| roughly:
|
|     int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
|                         PWSTR pCmdLine, int nCmdShow);
|
| It's then up to the user to define their startup function to
| match, with normal type checking.
|
| This gets the compiler out of handling "main" as a special
| case.
|
| This was generally considered to be the right answer, but would
| break too much existing code.
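|
| A rough sketch of that proposal in plain C. The names
| program_entry and runtime_start are hypothetical: the first
| stands in for whatever a platform header would declare, the
| second for the startup code that calls it. Nothing here is
| special-cased by the compiler, so a definition that disagrees
| with the prototype is an ordinary compile error.

```c
/* What a hypothetical platform header would contain: a normal prototype. */
int program_entry(int argc, char **argv);

/* Hypothetical stand-in for the runtime startup code that calls the entry. */
int runtime_start(int argc, char **argv)
{
    return program_entry(argc, argv);
}

/* The user's definition. Changing its parameter or return types so they
   conflict with the prototype above makes the compiler reject the file. */
int program_entry(int argc, char **argv)
{
    (void)argv;        /* unused in this sketch */
    return argc - 1;   /* exit status 0 when run without extra arguments */
}
```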
| ornitorrincos wrote:
| To nitpick: Windows also needs to provide a main function (why
| shouldn't it?).
|
| But I agree that the fact that the standard allows for 2
| different declarations of main in a language without
| polymorphism doesn't help.
|
| Not to mention the whole "implementations are free to define
| extra entry points" part.
| devit wrote:
| There is no special treatment of main in the major C
| compilers, the only "magic" thing the compiler does is
| including the CRT startup object file in the link, which
| defines _start as a function ultimately calling main, and
| having the default linker script set the address of "_start"
| as the executable entry point.
|
| You can pass -nostdlib to gcc to disable linking the CRT
| startup object (or use ld directly) and you can pass
| --default-script /dev/null to ld to disable the linker
| script.
|
| There is no need to declare main or check arguments or return
| types since in C arguments are both pushed and popped by the
| caller and the language provides no typing guarantees and
| thus there is no problem in calling functions with mismatched
| argument or return type declarations.
| mananaysiempre wrote:
| Not _quite_ true: there's the weird thing where gcc on i*86
| will align the stack on entry to a function called main but
| not any other.
|
|     $ gcc -m32 -O2 -fno-pie -fno-asynchronous-unwind-tables \
|           -fomit-frame-pointer -S -masm=intel -xc -o - -
|     int foo(void);
|     int main(void) { return foo(); }
|     ^D
|             .file   "<stdin>"
|             .intel_syntax noprefix
|             .text
|             .section        .text.startup,"ax",@progbits
|             .p2align 4
|             .globl  main
|             .type   main, @function
|     main:
|             push    ebp
|             mov     ebp, esp
|             and     esp, -16
|             call    foo
|             leave
|             ret
|             .size   main, .-main
|             .ident  "GCC: (GNU) 11.1.0"
|             .section        .note.GNU-stack,"",@progbits
|
| It doesn't do that if you set the historical stack
| alignment, though (-mpreferred-stack-boundary=2), or if you
| name the function anything else but main (it even does a
| tail call). Presumably it's trying to (somewhat) recover
| from the time when the GCC authors accidentally the SysV
| i386 ABI[1,2].
|
| [1]: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838 [2]:
| https://stackoverflow.com/a/49397524
| anyfoo wrote:
| Yes there is. I've demonstrated this in a sibling (or
| rather, cousin) comment, but in short, you can happily not
| return a value in main even if its type is "int
| main(void)". Try that with another function, and the
| compiler should at least warn. This might not be a special
| case of code generation, but it is a special case of error
| handling at least.
| user5994461 wrote:
| In Visual Studio, main() is not the entry point of the
| program.
|
| The entry point is automatically generated by the compiler. It
| calls a few functions, depending on what the program does, and
| then calls main. I think it has to do with initializing the
| standard library. You can see the stub using a debugger or a
| disassembler.
|
| It's possible to set the entry point to any function name. See
| the advanced project settings.
|
| Now about the arguments and return type. With main, the caller
| is responsible for pushing arguments onto the stack before
| the call, then popping the stack after the call. The return
| code is in the EAX register, if I remember correctly.
|
| Because of that, it doesn't matter what the signature of main
| is; the invocation will work irrespective of the arguments.
|
| People may ask what's the point of knowing any of this? One
| major use case is to write executable compressors like UPX.
| Another use case is to make a custom entry point written in
| assembler.
| anyfoo wrote:
| Not just in Visual Studio. main is usually (always?) not
| the entry point on unixoid systems either, that's much more
| likely to be _start, which calls main() down the line.
|
| Nevertheless, main _is_ treated specially by the compiler
| for the aforementioned historical reasons, for example to
| not warn/error out if it does not return a value despite
| the return type clearly saying it should.
|
| Observe:
|
| % echo 'int main(void) { }' > foo.c; clang -c foo.c
|
| <no output>
|
| % echo 'int foo(void) { }' > foo.c; clang -c foo.c
|
| foo.c:1:17: warning: non-void function does not return a
| value [-Wreturn-type]
|
| int foo(void) { }
|                 ^
| 1 warning generated.
|
| As you can see, clang is happy to ignore the missing return
| value for main(), but not for foo().
| FranchuFranchu wrote:
| Why does he add semicolons at the end of the assembly lines?
| Taniwha wrote:
| They allow you to put multiple instructions on the same line -
| in this case you must either have ';'s or '\n's in the string -
| having both doesn't break stuff, I guess it's more belt and
| braces
| wtetzner wrote:
| Not sure. I guess it shouldn't hurt anything though, since
| semicolon is the comment character.
| Taniwha wrote:
| depends on the assembler, in some assemblers ';' allows you
| to put multiple instructions on the same line
| actually_a_dog wrote:
| They're not necessary, so I suspect some combination of reflex,
| consistency for consistency's sake, or, possibly that they were
| added automatically by his editor/IDE at some point.
| hawski wrote:
| Time passed. How was the assignment graded?
| labster wrote:
| If I was the TA in this class, I'd give it -5 (95 of 100) for
| "doesn't compile on my 486, please write more portable code"
| just to screw with the student.
| Sebb767 wrote:
| In my university, it would've probably received full points;
| reason being that people who pull such shenanigans usually
| don't see "hello world" as a challenge - assuming, of course,
| the author could explain it.
| ipython wrote:
| It's not hard to get the address of data in 32-bit addressing.
| You just interleave the data inside your assembly, something like
| the following (pseudocode; I haven't done this in a while):
|
|     ...
|     call continue
|     .db "Hello World!\n\0"
| continue:
|     pop eax
|     ...
|
| Since 'call' just turns into a 'push eip; jmp target'
| (simplified, sorry), the address of the string is now pushed onto
| the stack. Popping off the top, now eax contains the address of
| the string "Hello World!\n\0". Since in 32-bit ABI most
| parameters are passed on the stack, many times you don't even
| need to 'pop' the address off the stack, it'll just be part of
| your arguments to the function.
|
| Old school malware used this a lot to 1) run regardless of the
| memory base address it was loaded at and 2) confuse some
| disassemblers (you can use silly conditionals that are always
| true or false to control whether you execute the 'call'
| instruction or not, forcing the disassembler to try and
| 'disassemble' the string into valid x86 opcodes)
| zh3 wrote:
| More or less how Fortran works on PDP-11's.
| bregma wrote:
| In C (and C++) main is always a function with zero parameters,
| two parameters as described in detail, or some implementation-
| specific set of parameters. Anything else is undefined behaviour.
|
| In other words, you can write `main` as anything you want, and it
| might do something when presented to a C compiler, and that might
| do something when linked through the system linker, but it's not
| C code.
|
| Doing something that's by definition incorrect and having it
| maybe do something somewhere sometimes, or maybe not, is not
| really all that impressive when you think about it.
| samatman wrote:
| > _Doing something that's by definition incorrect and having
| it maybe do something somewhere sometimes, or maybe not, is not
| really all that impressive when you think about it._
|
| See, I would have called this _hacking_, which here on _Hacker
| News_ is its own special kind of impressive.
| scintill76 wrote:
| You could make main() a standard C function that merely calls
| this machine code hack function.
| jpegqs wrote:
| Can be both, a string and a function:
|
|     char main [/*x86*/] __attribute__ ((section(".text")) )=
|         "WTYH)9Zj8_j7H)9]R"
|         "H)9^\350\0\0\0\0H)1^R"
|         "H))Z8<2u\366j<)9Xj9"
|         ")9j9VY)<$[S_H\xbd^["
|         "H$@\xcd\200-XP\xf"
|         "\5XP_j<W\xeb\xe2]"
|         "Hello World!\n";
|     /*Linux_Only!*/
| failwhaleshark wrote:
| Now make it a quine too. ;-)
| im3w1l wrote:
| Floats being more mysterious and intimidating than ints I prefer
|
| const float main[] = {-8.10373123e+22, 6.16571324e-43,
| 1.58918456e-40, -7.11823707e-31, 5.81398733e-42, 1.26058568e-39,
| 6.72382769e-36, 2.17817833e-41, 2.16139414e-29, 1.10873646e+27,
| 1.76400414e+14, 1.74467096e+22, -221.039566};
| jalbertoni wrote:
| Genuine question, can you be sure the conversion wouldn't
| introduce a wrong bit here or there? Maybe in a different
| architecture or something?
|
| I'm not that good with CPUs past 16 bits, this is really out of
| my comfort zone heh
| grishka wrote:
| You only depend on the compiler to interpret these floats
| correctly and generate their binary representation that
| decodes into valid instructions. As far as the CPU executing
| this code is concerned, it's machine code either way.
|
| > Maybe in a different architecture or something?
|
| Of course this isn't portable across CPU architectures,
| neither is it portable across operating systems due to at
| least ABI differences.
| banana_giraffe wrote:
| Unless I'm missing something, this code is already
| architecture dependent .. adding more architecture
| dependencies won't really hurt.
| bombcar wrote:
| I think the format for single precision and double is defined
| by the standard. Beyond that may be implementation dependent.
| _kst_ wrote:
| The format for floating-point is specified by the IEEE
| floating-point standard (or whatever it's officially called
| these days). C permits but does not require IEEE format.
| Most implementations these days use it.
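|
| A small sketch for checking this on a given machine, assuming a
| 4-byte float. The helper names float_bits and claims_ieee754
| are made up for illustration. The first shows the bit pattern
| the compiler actually emitted for a float constant; the second
| reports whether the implementation defines __STDC_IEC_559__
| (an implementation may use IEEE formats without defining it).

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
   Assumes sizeof(float) == 4, which holds on common platforms. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Non-zero if the implementation advertises IEC 60559 (IEEE 754) support. */
int claims_ieee754(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```

| With IEEE 754 single precision, float_bits(1.0f) is 0x3F800000
| and float_bits(-2.0f) is 0xC0000000. Different values would mean
| the float trick needs different constants on that platform.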
| taneliv wrote:
| Better yet, try to find the corresponding ints (or maybe more
| realistically shorts or chars) from usual #include headers, and
| use the #define or const mnemonics for all numbers.
|
| Bonus points for finding them all in the same header file, or
| with like names, so as to give appearance of them actually
| meaning something in the context of the prank.
| monocasa wrote:
| I interestingly did this just the other day. I had a testing
| reason for main to consist of a single specific illegal
| instruction that I know the hex of anyway. It was less work for
| the system's Makefile to compile a .c rather than a .s file, and
| I knew everything I needed to make this trick work, but didn't
| know how to for sure disable function prologues for this arch.
|
| It's the first time I had a legitimate excuse to whip out this
| technique since seeing it in an ancient obfuscated C contest
| entry for the PDP-11 probably a decade ago. mullender.c I think?
| ltbarcly3 wrote:
| Well, I guess technically they could expel him because he had
| someone do part of the assignment for him.
| _kst_ wrote:
| One of the winners of the 1st International Obfuscated C Code
| Contest (1984) used this technique.
|
| https://www.ioccc.org/1984/mullender/mullender.c
|
| https://www.ioccc.org/1984/mullender/hint.text
|
|     short main[] = { 277, 04735, -4129, 25, 0, 477,
|     1019, 0xbef, 0, 12800, -113, 21119, 0x52d7,
|     -1006, -7151, 0, 0x4bc, 020004, 14880, 10541,
|     2056, 04010, 4548, 3044, -6716, 0x9, 4407, 6,
|     5568, 1, -30460, 0, 0x9, 5570, 512, -30419,
|     0x7e82, 0760, 6, 0, 4, 02400, 15, 0, 4, 1280, 4, 0,
|     4, 0, 0, 0, 0x8, 0, 4, 0, ',', 0, 12, 0, 4, 0, '#',
|     0, 020, 0, 4, 0, 30, 0, 026, 0, 0x6176, 120, 25712,
|     'p', 072163, 'r', 29303, 29801, 'e' };
| alin23 wrote:
| That's already mentioned in the third paragraph but it's nice
| of you to also include the bytecode in the comment.
|
| > Apparently in 1984, a strange program won the IOCCC where
| main was declared as a short main[] = {...} and somehow this
| did stuff and printed to the screen!
| ianhanschen wrote:
| Barely mentioned, and the author makes us all die inside when
| they say "Too bad it was written for a whole different
| architecture and compiler so there is really no easy way for
| me to find out what it did."
| _kst_ wrote:
| As the hint explains, it's a combination of PDP-11 and VAX
| machine code, set up so that either system will run its own
| code and ignore the foreign code.
|
| You can extract a few ASCII strings from the data. As the
| hint says: "Can you guess what is printed? We knew you
| couldn't! :-)"
|
| The ASCII strings I found were "vax", "pdp", "str",
| "write", and " :-)".
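|
| One way to pull those strings out, sketched in C. Both the
| PDP-11 and the VAX store 16-bit words low byte first, so each
| short unpacks into two bytes in that order. The function name
| words_to_ascii is made up for this example; non-printable
| bytes are shown as '.'.

```c
#include <stddef.h>

/*
 * Unpack 16-bit words into bytes, low byte first, replacing anything
 * outside printable ASCII with '.'. The caller must provide room for
 * 2 * n + 1 bytes in out.
 */
void words_to_ascii(const short *words, size_t n, char *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned w  = (unsigned)words[i] & 0xFFFFu;  /* keep the low 16 bits */
        unsigned lo = w & 0xFFu;
        unsigned hi = w >> 8;
        out[2 * i]     = (lo >= 0x20 && lo < 0x7F) ? (char)lo : '.';
        out[2 * i + 1] = (hi >= 0x20 && hi < 0x7F) ? (char)hi : '.';
    }
    out[2 * n] = '\0';
}
```

| Fed the last nine entries of the array (0x6176 through 'e'),
| this yields "vax.pdp.str.write."; the pair 14880, 10541 earlier
| in the array decodes to " :-)".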
| _kst_ wrote:
| The C standard requires main to be defined as a function, but
| failure to do so is not a constraint violation, so no diagnostic
| is required. If you define it as something else, the behavior is
| undefined.
|
| A conforming C compiler could reject a program that defines main
| as an array, but is not required to do so.
|
| gcc doesn't complain by default, but with warnings enabled it says
| "warning: 'main' is usually a function".
___________________________________________________________________
(page generated 2021-06-14 23:00 UTC)