[HN Gopher] Demystifying the (Shebang): Kernel Adventures
       ___________________________________________________________________
        
       Demystifying the (Shebang): Kernel Adventures
        
       Author : thunderbong
       Score  : 69 points
       Date   : 2025-04-10 18:21 UTC (4 hours ago)
        
 (HTM) web link (crocidb.com)
 (TXT) w3m dump (crocidb.com)
        
       | kazinator wrote:
       | Fun fact: you can stick a null byte into the shebang line to
       | terminate it, as an alternative to the newline.
       | 
       | It's possible to have a scripting language support extra command
       | line arguments after the null byte, which is less disruptive to
       | the syntax than recognizing arguments from a second line.
       | 
       | I.e.
       | 
       |     #!/path/to/interpreter --arg<NUL>--more --args<LF>
       | 
       | Or
       | 
       |     #!/usr/bin/env interpreter<NUL>--all --args<LF>
       | 
       | On some OS's, you only get one arg: everything after the space,
       | to the end of the line, is one argument.
       | 
       | When we stick a <NUL> there, that argument stops there; but our
       | interpreter can read the whole line including the <NUL> up to the
       | <LF> and then extract additional arguments between <NUL> and <LF>
       | 
       | https://www.nongnu.org/txr/txr-manpage.html#N-74C247FD
       | 
       | The interpreter could get the arguments in other ways, like from
       | a second line after the hash bang line. But with the null hack,
       | all the processing revolves around just the one hash bang line.
       | You can retrofit this logic into an interpreter that already
       | knows how to ignore the hash bang line, without doing any work
       | beyond getting it to load the line properly with the embedded
       | NUL, and extract the arguments. You don't have to alter the
       | syntax to specially recognize a hash bang continuation line.
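A rough sketch of the interpreter side of this trick, in POSIX shell: read the hash bang line up to the <LF>, then treat whatever follows the first <NUL> as extra arguments. The function name `hashbang_extra_args`, the interpreter name `interp`, and the choice to split the extra arguments on spaces are assumptions made for illustration; a real interpreter (TXR included) may apply its own splitting rules.

```sh
# Build a sample script whose hash bang line carries extra arguments
# after a NUL byte (written as \000 in the printf format).
script=$(mktemp)
printf '#!/usr/bin/env interp\000--all --args\n(script body)\n' > "$script"

# hashbang_extra_args FILE
# Print the words that follow the first NUL on line 1, one per line.
# Prints nothing when the line contains no NUL.
hashbang_extra_args() {
    head -n 1 "$1" |      # the hash bang line, NUL included
        tr '\000' '\n' |  # NUL -> LF, so the extras land on line 2
        sed -n '2p' |     # keep only what followed the first NUL
        tr -s ' ' '\n'    # assumption: extras are space-separated
}

hashbang_extra_args "$script"   # prints --all and --args on separate lines
rm -f "$script"
```

The kernel never sees the part after the NUL, so this only matters to an interpreter that re-reads its own script file, as described above.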
        
         | CalChris wrote:
         | Less fun fact: you can't substitute a <cr><nl> for <nl>.
         | 
         | I had a Perl script (way) back in the day that came from a
         | Windows system and it wouldn't work on Linux. After I figured
         | out <cr><nl> was causing the problem, I figured out what
         | binfmt_script (might have been in binfmt_misc) was doing wrong.
         | binfmt_script sees "/bin/perl<cr>" and then fails to find that
         | interpreter.
         | 
         | So I proposed a one line change which fixed the glitch and
         | posted it to LKML ... and promptly got yelled at by Alan Cox
         | for breaking compatibility. I dunno if the null byte breaks the
         | same compatibility. Chapter and verse weren't cited.
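To make the failure mode concrete: with <cr><nl> line endings, the interpreter path on the first line ends in a stray carriage return, so the kernel looks for a file literally named "/bin/perl<cr>". A small sketch in POSIX shell that detects this and writes a converted copy; the function name `hashbang_has_cr` and the sample file names are made up for illustration.

```sh
# hashbang_has_cr FILE
# Succeed (exit 0) if line 1 starts with "#!" and ends with a carriage return.
hashbang_has_cr() {
    cr=$(printf '\r')
    head -n 1 "$1" | grep -q "^#!.*${cr}\$"
}

# A sample script saved with Windows line endings.
dir=$(mktemp -d)
printf '#!/bin/perl\r\nprint "hi\\n";\r\n' > "$dir/from_windows.pl"

if hashbang_has_cr "$dir/from_windows.pl"; then
    # Strip every CR to get Unix line endings.
    tr -d '\r' < "$dir/from_windows.pl" > "$dir/fixed.pl"
    echo "converted: hash bang line had a trailing CR"
    head -n 1 "$dir/fixed.pl"
fi
rm -r "$dir"
```

This is the conventional workaround of converting the line endings, not the kernel change proposed in the comment.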
        
           | kazinator wrote:
           | Null _de facto_ works, and it's almost certainly a
           | consequence of the kernel treating the result of extracting
           | the argument as a C string. For instance, it might actually
           | be scanning past the NUL and earnestly finding the newline.
           | Even if that entire datum is copied into the argument vector
           | and passed to the interpreter, the interpreter will only see
           | the argument up to the null terminator, due to it being a C
           | string.
           | 
           | About the only way it could break would be if the kernel used
           | a string function to look for the newline, like a range-
           | limited form of strchr, and then aborted the hash bang
           | dispatch with an error upon not finding the newline, rather
           | than accepting that the argument is delimited by a null.
           | 
           | I tested it on various platforms like macOS, Solaris, some
           | BSDs, Cygwin, Linux. Far from exhaustive, but good coverage
           | of the modern desktop and server landscape.
           | 
           | The null byte would have fixed your Perl script without
           | having to convert the line endings; the argument would have
           | been delimited, in spite of the line ending in <CR><LF>.
        
       | davis wrote:
       | Articles like this are just such a delight. History + common
       | software + code snippets is a great combo
        
       | Imustaskforhelp wrote:
       | Read your article, it's really nice. I really feel much less
       | mystified by this.
       | 
       | But can you / somebody please explain what this means?
       | 
       | According to the official Kernel Admin Guide:
       | 
       | This Kernel feature allows you to invoke almost (for restrictions
       | see below) every program by simply typing its name in the shell.
       | This includes for example compiled Java(TM), Python or Emacs
       | programs. To achieve this you must tell binfmt_misc which
       | interpreter has to be invoked with which binary. Binfmt_misc
       | recognises the binary-type by matching some bytes at the
       | beginning of the file with a magic byte sequence (masking out
       | specified bits) you have supplied. Binfmt_misc can also recognise
       | a filename extension aka .com or .exe.
       | 
       | It's another way to tell the Kernel what interpreter to run when
       | invoking a program that's not native (ELF). For scripts (text
       | files) we mostly use a shebang, but for byte-coded binaries, such
       | as Java's JAR or Mono EXE files, it's the way to go!
       | 
       | Can you give me an example of what you mean? What are its use
       | cases, if any? I read it many times, always with some
       | enthusiasm, because the sentence ending in an exclamation point
       | makes it feel huge, yet I just can't understand its
       | significance.
       | 
       | Does it mean we can have .jar files that run the way shebang
       | scripts do, without needing #!? Can this also be used for
       | main.go, or for any other language that has issues with #!?
       | 
       | I see there are some interpreters for golang, rust etc. which
       | just compile the file first, but they seemed too complex. I am
       | imagining something like a simple Go file which is valid golang
       | but can be run by Linux simply with ./ and gets compiled
       | automatically...
        
         | ckatri wrote:
         | The best and most common uses for this are Wine and
         | qemu-static.
         | 
         | For example, the following (which I grabbed from Wikipedia)
         | `:DOSWin:M::MZ::/usr/bin/wine:` will register `/usr/bin/wine`
         | to run as the wrapper for any .exe that gets executed, with no
         | extra config needed. It simply sees that you tried to run a PE
         | file and will run it in wine.
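For reference, that registration string follows the layout `:name:type:offset:magic:mask:interpreter:flags` described in the kernel's binfmt_misc documentation. A small sketch in POSIX shell that assembles such strings; the helper name `binfmt_rule`, the `GoSource` rule, and the `/usr/local/bin/gorun-wrapper` path are hypothetical and only there to show the shape of an extension-based rule.

```sh
# binfmt_rule NAME TYPE OFFSET MAGIC MASK INTERPRETER [FLAGS]
# Print a binfmt_misc registration string. TYPE is M (match magic bytes)
# or E (match filename extension); unused fields are passed as ''.
binfmt_rule() {
    printf ':%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "${7-}"
}

# The Wine rule quoted above: files starting with the bytes "MZ".
binfmt_rule DOSWin M '' MZ '' /usr/bin/wine
# prints :DOSWin:M::MZ::/usr/bin/wine:

# Hypothetical extension rule: hand *.go files to a wrapper script.
# For type E the "magic" field holds the extension without the dot.
binfmt_rule GoSource E '' go '' /usr/local/bin/gorun-wrapper
# prints :GoSource:E::go::/usr/local/bin/gorun-wrapper:

# Registering a rule needs root and a mounted binfmt_misc filesystem:
#   binfmt_rule DOSWin M '' MZ '' /usr/bin/wine > /proc/sys/fs/binfmt_misc/register
```

Whether a rule like the second one is a good idea depends entirely on the wrapper, which would have to compile and run the file itself.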
        
         | pkaye wrote:
         | Yes, you can use binfmt_misc to allow an arbitrary executable
         | file format to be passed to an interpreter matched either by a
         | filename extension or a magic number at a specific offset
         | within the executable.
         | 
         | https://en.wikipedia.org/wiki/Binfmt_misc
        
       | spudlyo wrote:
       | If you found this article interesting, you might also enjoy "My
       | Own Private Binary: An Idiosyncratic Introduction to Linux Kernel
       | Modules"[0] and the previous discussion[1] of it on HN.
       | 
       | [0]: https://www.muppetlabs.com/~breadbox/txt/mopb.html
       | 
       | [1]: https://news.ycombinator.com/item?id=29291804
        
       | lelandfe wrote:
       | > _Since I never remember which one is which, a good way to check
       | is using the utility `file`: `file $(which useradd)`_
       | 
       | While we're here, can someone explain why `which` prints some
       | locations, and for others the whole darn file? Like `which npm`
       | prints the location; `which nvm` prints the whole darn file.
        
         | YardenZamir wrote:
         | I can't say the reason, but I can note the pattern. If it's
         | something in your path, like a program or a script, `which`
         | will show you where it is. If it's a sourced shell function,
         | you will see the whole thing.
         | 
         | If you write a function in your current session, for example,
         | `which` will show you the content of that command. If you write
         | that command in a file and put that file in your path, `which`
         | will show you where it is.
        
         | awbraunstein wrote:
         | `nvm` isn't a file, it is a bash function defined in some file
         | (likely ~/.nvm/nvm.sh). So when you say `which nvm` it prints
         | out the definition of the `nvm` function. This is set up when
         | you added something like:
         | 
         |     export NVM_DIR="$HOME/.nvm"
         |     [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
         |     [ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion
         | 
         | to your bashrc.
        
         | AlienRobot wrote:
         | That sounds odd. Try using command -v instead?
        
         | cstrahan wrote:
         | Are you sure that what is being printed is the contents of a
         | file? And which shell are you using?
         | 
         | If your which command is a shell builtin, and nvm is a
         | function, then you're likely seeing the content of that
         | function.
        
         | adrianmonk wrote:
         | For this situation, in bash, use 'type nvm' (instead of 'which
         | nvm' or 'file nvm'). It will tell you what 'nvm' is
         | (executable, shell alias, shell function, shell builtin, etc.),
         | which will probably solve the mystery.
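A quick way to see the difference the replies describe, sketched in POSIX shell: `command -v` prints a path for an executable found on PATH, but only the bare name for a shell function. The function `greet` is a stand-in for something like `nvm`.

```sh
# A shell function, defined the way a sourced script such as nvm.sh would.
greet() {
    echo "hello"
}

# For a function, command -v prints just the name: there is no file to point at.
command -v greet

# For an executable on PATH, command -v prints its full path
# (for example /bin/ls or /usr/bin/ls, depending on the system).
command -v ls

# type gives a human-readable description; the exact wording varies by shell.
type greet
type ls
```

A `which` that is a shell builtin can see functions in the same way, which would explain it printing a whole definition for `nvm`.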
        
       | amelius wrote:
       | How do I fix my kernel so that I can use the setuid bit with
       | shebang?
        
       ___________________________________________________________________
       (page generated 2025-04-10 23:00 UTC)