[HN Gopher] Demystifying the (Shebang): Kernel Adventures
___________________________________________________________________
Demystifying the (Shebang): Kernel Adventures
Author : thunderbong
Score : 69 points
Date : 2025-04-10 18:21 UTC (4 hours ago)
(HTM) web link (crocidb.com)
(TXT) w3m dump (crocidb.com)
| kazinator wrote:
| Fun fact: you can stick a null byte into the shebang line to
| terminate it, as an alternative to the newline.
|
| It's possible to have a scripting language support extra command
| line arguments after the null byte, which is less disruptive to
| the syntax than recognizing arguments from a second line.
|
| I.e. #!/path/to/interpreter --arg<NUL>--more --args<LF>
|
| Or #!/usr/bin/env interpreter<NUL>--all --args<LF>
|
| On some OS's, you only get one arg: everything after the space,
| to the end of the line, is one argument.
|
| When we stick a <NUL> there, that argument stops there; but our
| interpreter can read the whole line including the <NUL> up to the
| <LF> and then extract additional arguments between <NUL> and <LF>
|
| https://www.nongnu.org/txr/txr-manpage.html#N-74C247FD
|
| The interpreter could get the arguments in other ways, like from
| a second line after the hash bang line. But with the null hack,
| all the processing revolves around just the one hash bang line.
| You can retrofit this logic into an interpreter that already
| knows how to ignore the hash bang line, without doing any work
| beyond getting it to load the line properly with the embedded
| nul, and extract the arguments. You don't have to alter the syntax
| to specially recognize a hash bang continuation line.
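
To make the null hack above concrete, here is a minimal sketch of the interpreter-side step: read the whole first line, treat the part before <NUL> as what the kernel already handled, and pull extra arguments from between <NUL> and <LF>. The function name `split_hashbang` and the whitespace-based splitting of the extra arguments are assumptions for illustration, not how any particular interpreter is documented to do it.

```python
def split_hashbang(first_line: bytes):
    """Split a '#!' line into (kernel-visible text, extra args after NUL)."""
    if not first_line.startswith(b"#!"):
        raise ValueError("not a hash bang line")
    body = first_line[2:].rstrip(b"\r\n")          # drop the line ending
    visible, sep, hidden = body.partition(b"\0")   # split at the first NUL
    # Assumption: extra arguments are separated by plain whitespace.
    extra = hidden.decode().split() if sep else []
    return visible.decode().strip(), extra


line = b"#!/path/to/interpreter --arg\0--more --args\n"
visible, extra = split_hashbang(line)
print(visible)  # /path/to/interpreter --arg
print(extra)    # ['--more', '--args']
```

A line without a <NUL> simply yields an empty list of extra arguments, so existing scripts would be unaffected under this scheme.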
| CalChris wrote:
| Less fun fact: you can't substitute a <cr><nl> for <nl>.
|
| I had a Perl script (way) back in the day that came from a
| Windows system and it wouldn't work on Linux. After I figured
| out <cr><nl> was causing the problem, I worked out what
| binfmt_script (might have been in binfmt_misc) was doing wrong.
| binfmt_script sees "/bin/perl<cr>" and then fails to find that
| interpreter.
|
| So I proposed a one line change which fixed the glitch and
| posted it to LKML ... and promptly got yelled at by Alan Cox
| for breaking compatibility. I dunno if the null byte breaks the
| same compatibility. Chapter and verse weren't cited.
| kazinator wrote:
| Null _de facto_ works, and it's almost certainly due to a
| consequence of the kernel treating the result of extracting
| the argument as a C string. For instance, it might actually
| be scanning past the NUL and earnestly finding the newline.
| Even if that entire datum is copied into the argument vector
| and passed to the interpreter, the interpreter will only see
| the argument up to the null terminator, due to it being a C
| string.
|
| About the only way it could break would be if the kernel used
| a string function to look for the newline, like a range-
| limited form of strchr, and then aborted the hash bang
| dispatch with an error upon not finding the newline, rather
| than accepting that the argument is delimited by a null.
|
| I tested it on various platforms like macOS, Solaris, some
| BSDs, Cygwin, Linux. Far from exhaustive, but good coverage
| of the modern desktop and server landscape.
|
| The null byte would have fixed your Perl script without
| having to convert the line endings; the argument would have
| been delimited, in spite of the line ending in <CR><LF>.
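
The <CR> failure and the <NUL> workaround discussed above can be modelled in a few lines. This is a deliberately simplified sketch of how a kernel might extract the interpreter and its single argument, not a transcription of any kernel's source; `kernel_view` is a hypothetical name, and the assumption that only spaces and tabs count as separators is what leaves the <CR> glued to the path.

```python
import re


def kernel_view(head: bytes):
    """Return (interpreter, arg) roughly as a kernel might extract them.

    Simplified model for illustration only; real kernels differ in detail.
    """
    if not head.startswith(b"#!"):
        raise ValueError("no hash bang")
    line = head[2:].split(b"\n", 1)[0]   # the line ends at the first <LF>
    line = line.split(b"\0", 1)[0]       # a C string ends at the first <NUL>
    line = line.strip(b" \t")            # only spaces and tabs are trimmed
    parts = re.split(rb"[ \t]+", line, maxsplit=1)
    interpreter = parts[0]
    arg = parts[1] if len(parts) > 1 else None  # the rest is ONE argument
    return interpreter, arg


print(kernel_view(b"#!/bin/perl\r\nprint 1;\n"))      # (b'/bin/perl\r', None)
print(kernel_view(b"#!/bin/perl\0\r\nprint 1;\n"))    # (b'/bin/perl', None)
print(kernel_view(b"#!/usr/bin/env interp -a -b\n"))  # (b'/usr/bin/env', b'interp -a -b')
```

In this model the first call looks for a file literally named "/bin/perl<CR>", while the second stops at the <NUL> before the <CR> is ever seen.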
| davis wrote:
| Articles like this are just such a delight. History + common
| software + code snippets is a great combo
| Imustaskforhelp wrote:
| Read your article, it's really nice. I feel much less
| mystified by this.
|
| But can you / somebody please explain what this means
|
| According to the official Kernel Admin Guide:
|
| This Kernel feature allows you to invoke almost (for restrictions
| see below) every program by simply typing its name in the shell.
| This includes for example compiled Java(TM), Python or Emacs
| programs. To achieve this you must tell binfmt_misc which
| interpreter has to be invoked with which binary. Binfmt_misc
| recognises the binary-type by matching some bytes at the
| beginning of the file with a magic byte sequence (masking out
| specified bits) you have supplied. Binfmt_misc can also recognise
| a filename extension aka .com or .exe.
|
| It's another way to tell the Kernel what interpreter to run when
| invoking a program that's not native (ELF). For scripts (text
| files) we mostly use a shebang, but for byte-coded binaries, such
| as Java's JAR or Mono EXE files, it's the way to go!
|
| Like, can you give me an example of what you mean? What are its
| use cases, if any? I read it many times, always with some
| enthusiasm, because the sentence ending in an exclamation
| point makes it feel huge, yet I just can't understand
| its significance.
|
| Does it mean we can have .jar files that run shebang-style, so
| we don't need #!? Can this also be used for main.go, or any
| other language that has issues with #!?
|
| I have seen interpreter-like wrappers for Go, Rust, etc. that
| just compile the file, but they were too complex. I am imagining
| a simple Go file that is valid Go but can be run on Linux simply
| with ./ and gets compiled automatically...
| ckatri wrote:
| The best and most common uses for this are Wine and qemu-
| static.
|
| For example, the following (which I grabbed from Wikipedia)
| `:DOSWin:M::MZ::/usr/bin/wine:` will register `/usr/bin/wine`
| to run as the wrapper for any .exe that gets executed, with no
| extra config needed. It simply sees that you tried to run a PE
| file and will run it in wine.
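
As a rough illustration of what that registration string encodes, the sketch below splits it into the `:name:type:offset:magic:mask:interpreter:flags` fields and checks a file's leading bytes against the rule. The names `BinfmtRule`, `parse_rule` and `matches` are hypothetical, and the sketch assumes the magic and mask are plain text: it does not decode the `\x..` escapes that real rules often use.

```python
from typing import NamedTuple


class BinfmtRule(NamedTuple):
    name: str
    kind: str          # "M" = match magic bytes, "E" = match extension
    offset: int
    magic: bytes
    mask: bytes
    interpreter: str
    flags: str


def parse_rule(spec: str) -> BinfmtRule:
    """Split a registration string on its first character (the delimiter)."""
    delim = spec[0]
    _, name, kind, offset, magic, mask, interpreter, flags = spec.split(delim)
    return BinfmtRule(name, kind, int(offset or 0), magic.encode("latin-1"),
                      mask.encode("latin-1"), interpreter, flags)


def matches(rule: BinfmtRule, filename: str, head: bytes) -> bool:
    """Decide whether a file (name plus leading bytes) falls under the rule."""
    if rule.kind == "E":
        return "." in filename and filename.rsplit(".", 1)[1].encode("latin-1") == rule.magic
    window = head[rule.offset:rule.offset + len(rule.magic)]
    if len(window) < len(rule.magic):
        return False
    mask = rule.mask or b"\xff" * len(rule.magic)  # no mask: compare all bits
    return all((w & m) == (g & m) for w, g, m in zip(window, rule.magic, mask))


rule = parse_rule(":DOSWin:M::MZ::/usr/bin/wine:")
print(rule.interpreter)                           # /usr/bin/wine
print(matches(rule, "setup.exe", b"MZ\x90\x00"))  # True
print(matches(rule, "a.out", b"\x7fELF"))         # False
```

The point of the design is that the match happens on file contents (or extension), so the executable itself needs no `#!` line at all.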
| pkaye wrote:
| Yes, you can use binfmt_misc to allow arbitrary executable file
| formats to be passed to an interpreter matched either by a
| filename extension or a magic number at a specific offset
| within the executable.
|
| https://en.wikipedia.org/wiki/Binfmt_misc
| spudlyo wrote:
| If you found this article interesting, you might also enjoy "My
| Own Private Binary: An Idiosyncratic Introduction to Linux Kernel
| Modules"[0] and the previous discussion[1] of it on HN.
|
| [0]: https://www.muppetlabs.com/~breadbox/txt/mopb.html
|
| [1]: https://news.ycombinator.com/item?id=29291804
| lelandfe wrote:
| > _Since I never remember which one is which, a good way to check
| is using the utility `file`: `file $(which useradd)`_
|
| While we're here, can someone explain why `which` prints some
| locations, and for others the whole darn file? Like `which npm`
| prints the location; `which nvm` prints the whole darn file.
| YardenZamir wrote:
| I can't say the reason, but I can note the pattern. If it's
| something in your path, like a program or a script, `which` will
| show you where it is. If it's a sourced shell function, you will
| see the whole thing.
|
| If you write a function in your current session, for example,
| `which` will show you the content of that function. If you write
| that command in a file and put that file in your path, `which`
| will show you where it is.
| awbraunstein wrote:
| `nvm` isn't a file, it is a bash function defined in some file
| (likely ~/.nvm/nvm.sh). So when you say `which nvm` it prints
| out the definition of the `nvm` function. This is set up when
| you added something like:
|
|     export NVM_DIR="$HOME/.nvm"
|     [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
|     [ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
|
| to your bashrc.
| AlienRobot wrote:
| That sounds odd. Try using command -v instead?
| cstrahan wrote:
| Are you sure that what is being printed is the contents of a
| file? And which shell are you using?
|
| If your which command is a shell builtin, and nvm is a
| function, then you're likely seeing the content of that
| function.
| adrianmonk wrote:
| For this situation, in bash, use 'type nvm' (instead of 'which
| nvm' or 'file nvm'). It will tell you what 'nvm' is
| (executable, shell alias, shell function, shell builtin, etc.),
| which will probably solve the mystery.
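
The distinction drawn in this subthread, files on the path versus functions that live only inside the shell, can be sketched with a plain path lookup. The example below is an illustration under assumptions: it uses Python's `shutil.which` as a stand-in for a path-only lookup on a POSIX system, and the names `mytool` and `my_shell_function` are made up.

```python
import os
import shutil
import tempfile


def lookup_on_path(name: str, search_path: str):
    """Path-only lookup: finds executable files, knows nothing about shells."""
    return shutil.which(name, path=search_path)


with tempfile.TemporaryDirectory() as bindir:
    # Create a small executable script in a directory we treat as the path.
    tool = os.path.join(bindir, "mytool")
    with open(tool, "w") as f:
        f.write("#!/bin/sh\necho hi\n")
    os.chmod(tool, 0o755)

    found = lookup_on_path("mytool", bindir)               # a real file
    missing = lookup_on_path("my_shell_function", bindir)  # no such file

print(found is not None)  # True
print(missing)            # None
```

A lookup like this can only ever print a location. Printing a function body requires a lookup that runs inside the shell itself, which is why a builtin `which` or `type` can show the definition of something like `nvm`.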
| amelius wrote:
| How do I fix my kernel so that I can use the setuid bit with
| shebang?
___________________________________________________________________
(page generated 2025-04-10 23:00 UTC)