[HN Gopher] Handling argc==0 in the Linux kernel
       ___________________________________________________________________
        
       Handling argc==0 in the Linux kernel
        
       Author : rwmj
       Score  : 151 points
       Date   : 2022-02-03 10:01 UTC (1 days ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | eriksjolund wrote:
       | A few years ago I wrote a small program and script that search
       | for executables that crash when they are executed with argc==0
       | https://github.com/eriksjolund/empty-argv-segfault-check
        
       | ezekiel68 wrote:
       | It's amusing to me that some people seem to think it's possible
       | to create "secure computing" with no footguns whatsoever. A
       | surprising amount of the infrastructure upon which we write
       | computer programs (including a kernel and its extended OS) relies
       | upon "mere" conventions. Indeed one of the lowest levels of all
       | is named the "calling convention". Software which breaks these
       | conventions, yet somehow "runs" or functions to some extent
       | before causing problems is broken software, period. It is not a
       | cool hack or an interesting lead for further research. Any hand-
       | wringing about it whatsoever is a waste of time.
       | 
       | If some individuals scale the protective barrier and jump to
       | their deaths in the Grand Canyon, this does not indicate any
       | deficiency in the protective barrier. It was not (and should not)
       | be designed to prevent such misuse.
        
       | netr0ute wrote:
       | How did polkit even fall victim to this? I sometimes feel
       | hesitant to include an argv[0] in my execvX functions because I
       | know that it's only "a convention" and not an enforced rule, and
       | no reasonable program should expect it to be there.
        
         | josephcsible wrote:
         | I wrote a summary here:
         | https://news.ycombinator.com/item?id=30091667
         | 
         | The tl;dr is that pkexec writes to argv, and envp comes right
         | after argv on the stack, so a missing check caused it to
         | accidentally write to envp when argc == 0.
        
       | charcircuit wrote:
       | argc has always been able to be 0 and I've always handled that
       | case
       | 
       | It's not the kernels job to change this just because some
       | developers don't understand it may be 0.
        
       | dmatech wrote:
       | It seems abundantly clear that programs are allowed to pass an
       | empty list as argv to execve, although these programs might not
       | be POSIX-compliant. As for the "main" function, argc is only
       | guaranteed to be nonnegative. A well-written program therefore
       | _cannot_ assume that argc is not zero. Additionally, there is no
       | guarantee that a POSIX-compliant program will be called by
       | another POSIX-compliant program.
       | 
       | So it looks like pkexec made the mistake here by not carefully
       | adhering to the standard. One can argue that argc==0 is a bad
       | idea, and it is. But it's technically legal, and changing this
       | would break userspace programs.
       | 
       | 1.
       | https://pubs.opengroup.org/onlinepubs/9699919799/functions/e...
       | 
       | 2. https://eel.is/c++draft/basic.start.main
        
         | tptacek wrote:
         | Obviously Linux programs _can_ pass an empty list to argv,
         | since that 's how we got the pkexec vulnerability.
        
         | stefan_ wrote:
         | argc = 0 is a bad idea? When did we enter into this insane
         | parallel universe?
         | 
         | When I first came across (argc, argv) a long long time ago I
         | was befuddled by the choice of having argv[0] be something
         | vaguely related to your binary, maybe the path to it. That
         | seemed a very weird choice seeing how most programs have no use
         | for that and now all the arguments you actually care about are
         | shifted.
         | 
         | So now you have a patch to ensure argc is always 1. Only of
         | course if you accept that there are programs that misbehave
         | with argc=0, surely you must also accept there are programs
         | that misbehave if argv[0] is not the program path? This is
         | after all the convention that got you into this misery in the
         | first place!
         | 
         | That's the moment the stupidity fully sets in. So now you write
         | a patch to make sure argv[0] is really the path! Except just as
         | we have discovered with this innocuous patch, you could write a
         | dissertation on the question of "what _exactly_ is meant with
         | the program path " alone. And that really whoever was relying
         | on some specific interpretation on the value of argv[0] would
         | have been much better off with some other syscall to tell him
         | what he was after.
        
           | Someone wrote:
           | > _When I first came across (argc, argv) a long long time ago
           | I was befuddled by the choice of having argv[0] be something
           | vaguely related to your binary, maybe the path to it._
           | 
           | I've seen a Unix (?maybe HP-UX?) that seemed to construct the
           | _argv_ strings by taking a copy of the entire command line
           | and writing zero bytes at the end of each argument.
           | 
           | I don't think that's too weird. Why do multiple allocations
           | to make room for the strings if you know a reasonably tight
           | upper limit to the number of bytes you need to store all of
           | them? (Reasonably tight because the command line _could_
           | contain consecutive spaces and because of backspaces and
           | quotes around arguments)
           | 
           | On that OS, _argv* strings also were writable, so programs
           | could even replace those zeroes by spaces to, in many cases
           | (I don't remember how that worked with quoted arguments), get
           | back the command line used to launch the program:
           | for(i=1; i<argc; ++i) argv[i][-1] = ' ';       printf("%s\n",
           | argv[0]);
           | 
           | (Yes, on top of, AFAIK, not being guaranteed to work by any
           | standard, that's undefined behavior in C, but it did work on
           | that OS)
           | 
           | So, I guess passing something resembling the executable's
           | name was something they got for free. It might even
           | originally have been a bug, for all I know._
        
       | kortex wrote:
       | So let me paraphrase to see if I understand the context
       | correctly:
       | 
       | - the only guaranteeS are argc>=0, argv[argc] == NULL, and envp
       | comes right after argv
       | 
       | - argc is _almost always_ >= 1
       | 
       | - a widely used sbin (polkit) just assumes argv is always at
       | least 1 element and so iterates starting at argv[1] (without
       | checking argc)
       | 
       | - said polkit also mutates argv, so if it is fed with argc
       | actually zero, it actually mutates envp
       | 
       | - some devs want a patch to the kernel to prevent calling execve
       | with argc==0, (which probably should have been in the original
       | spec, but that ship has sailed) as a way of preventing other
       | similar footguns.
       | 
       | - other devs are like "that's gonna potentially break userspace"
       | which is a valid concerne
       | 
       | Is that more or less it?
        
         | AdamJacobMuller wrote:
         | aside from "that ship has sailed" yes.
         | 
         | The debate seems to be if they can actually do it safely or
         | not, and from the article it seems like they think they can
         | actually do it safely.
        
         | MrStonedOne wrote:
        
         | johnny22 wrote:
         | wasn't it pkexec specifically, not polkit generally that was
         | the problem?
        
       | SiebenHeaven wrote:
       | "We do not break the userland"
        
         | kardos wrote:
         | Indeed. But a sysctl to switch the patch on/off seems like the
         | pragmatic solution
        
           | josefx wrote:
           | That isn't pragmatic, it silently breaks programs that rely
           | on specified behavior just to fix one of many self inflicted
           | security issues polkit had over the last decade.
        
             | kardos wrote:
             | The sysctl can have three settings: 0 to do nothing, 1 to
             | emit a warning, 2 to fully enable the patch that blocks
             | argc=0. Use 1 by default as not to break userspace, let
             | people opt-in to 2 for the additional security
        
             | TheRealPomax wrote:
             | Which is fine:
             | https://news.ycombinator.com/item?id=30208963 is pretty on
             | the money here. Patch this behaviour, and fix the extremely
             | low number of offending applications concurrently.
        
             | colejohnson66 wrote:
             | What specified behavior?
        
               | josefx wrote:
               | Posix apparently explicitly allows calling programs with
               | an empty argv, so it isn't just a Linux implementation
               | detail Polkit failed to handle.
        
         | [deleted]
        
       | dtech wrote:
       | I'm continually (positively) amazed at the commitment to
       | backwards compatibility from the Linux kernel team and
       | continually abhorred by the number of footguns in C/POSIX/Linux.
        
         | mschuster91 wrote:
         | At least for the old POSIX stuff this is sort of excusable...
         | it operated under very much more relaxed assumptions, primarily
         | that people could trust each other to a somewhat large degree.
         | 
         | Obviously, the rise of mass-population Internet access and
         | financial (or "lulz") incentives for hackers changed the game,
         | and we're still paying for all the decisions to be backwards-
         | compatible, even decades later.
        
           | pjmlp wrote:
           | The Morris worm already shown what the future would look like
           | in the early 80's.
        
           | [deleted]
        
         | bch wrote:
         | I think the spec has not meaningfully changed in this regard
         | since C89, and there's a case of collective false memories[0]
         | happening here.
         | 
         | From a copy[1] of C89 I see the relevant description as:
         | * The value of argc shall be nonnegative.            *
         | argv[argc] shall be a null pointer.
         | 
         | So the spec is coherent and explained plainly.
         | Developers/application code just needs to be mindful of
         | what/how it's dereferencing.
         | 
         | [0] https://www.discovermagazine.com/mind/collective-false-
         | memor...
         | 
         | [1] http://port70.net/~nsz/c/c89/c89-draft.html
        
           | jandrese wrote:
           | 0 is a nonnegative number.
        
             | bch wrote:
             | > 0 is a nonnegative number.
             | 
             | Correct. So it needs to be accounted for.
        
           | dtech wrote:
           | False memories? What are you blabbering about.
           | 
           | Just because it's in the spec doesn't mean it's not a
           | footgun. If something is in the spec and it blows up in you
           | face if you're not careful that's by definition a footgun.
        
           | [deleted]
        
           | johncolanduoni wrote:
           | Just because it's in a spec, doesn't mean it's not a footgun.
           | No "collective false memories" required.
        
             | inetknght wrote:
             | There's no footgun here except for apps which make false
             | assumptions about argc and/or argv.
        
               | tremon wrote:
               | Indeed, that's the footgun.
        
               | inetknght wrote:
               | At that point using C is the footgun.
               | 
               | It's always been the case that argc is not-negative which
               | means it could be zero. It's always been the case that
               | argv[argc] is null. People who assume otherwise are poor
               | C developers.
        
               | zucker42 wrote:
               | Yes, C is full of footguns; that's one fact that has
               | motivated the creation of alternate systems programming
               | languages.
               | 
               | I think your definition of footgun is different from mine
               | and that of the commenters you're responding to (and how
               | I've generally seen it used).
        
           | tedunangst wrote:
           | There is some additional commentary in the rationale section
           | of posix.
           | 
           | https://pubs.opengroup.org/onlinepubs/9699919799/functions/e.
           | ..
        
             | bch wrote:
             | This is pretty interesting - some tension between what C
             | guarantees and what _Strictly Conforming POSIX_ requires.
             | I'm not saying there's no confusion in the real world (it's
             | obvious there is). Is Postel's Law                  Be
             | conservative in what you send; be liberal in what you
             | accept.
             | 
             | relevant here? I'm not sure.
             | 
             | The interesting looking parts of @tedunangst linked
             | document are:                  The argument argv is an
             | array of character pointers to null-terminated strings. The
             | application shall ensure that the last member of this array
             | is a null pointer. These strings shall constitute the
             | argument list available to the new process image. The value
             | in argv[0] should point to a filename string that is
             | associated with the process being started by one of the
             | exec functions.
             | 
             | [...]                  Early proposals required that the
             | value of argc passed to main() be "one or greater". This
             | was driven by the same requirement in drafts of the ISO C
             | standard. In fact, historical implementations have passed a
             | value of zero when no arguments are supplied to the caller
             | of the exec functions. This requirement was removed from
             | the ISO C standard and subsequently removed from this
             | volume of POSIX.1-2017 as well. The wording, in particular
             | the use of the word should, requires a Strictly Conforming
             | POSIX Application to pass at least one argument to the exec
             | function, thus guaranteeing that argc be one or greater
             | when invoked by such an application. In fact, this is good
             | practice, since many existing applications reference
             | argv[0] without first checking the value of argc.
             | The requirement on a Strictly Conforming POSIX Application
             | also states that the value passed as the first argument be
             | a filename string associated with the process being
             | started. Although some existing applications pass a
             | pathname rather than a filename string in some
             | circumstances, a filename string is more generally useful,
             | since the common usage of argv[0] is in printing
             | diagnostics. In some cases the filename passed is not the
             | actual filename of the file; for example, many
             | implementations of the login utility use a convention of
             | prefixing a <hyphen-minus> ( '-' ) to the actual filename,
             | which indicates to the command interpreter being invoked
             | that it is a "login shell".
        
         | im3w1l wrote:
         | I don't think the behavior is wrong per se. Rather the issue is
         | that it was tribal knowledge that the way you got the name of
         | your binary was through argv[0]. And that this misinformation
         | spread unchecked.
        
       | josephcsible wrote:
       | Anything the kernel does to try to mitigate this would break
       | userspace. It's one thing to do that when necessary to fix a
       | kernel vulnerability, but I don't like the idea of breaking
       | userspace just to try to work around some buggy userspace
       | programs.
        
         | chippiewill wrote:
         | I'll be very interested to see if Linus bites at all.
         | 
         | This is probably the most sympathetic I've been to the Kernel
         | breaking userspace programs that rely on bad behaviour, but
         | Linus has been pretty clear on the matter in the past.
         | 
         | > WE DO NOT BREAK USERSPACE!
         | 
         | https://lkml.org/lkml/2012/12/23/75
        
           | Arnavion wrote:
           | Eh, Linux has broken userspace in the past. Twice for me
           | personally in the last ~10 years, one of which could've
           | resulted in hardware damage if the stars had aligned slightly
           | differently :D I wouldn't count it out.
        
         | phendrenad2 wrote:
         | They should take Windows 95 approach and try to detect programs
         | that need special behavior, perhaps even include a list of
         | binary names. This goes against the ethos of clean elegant
         | solutions that the FOSS world likes, but at some point we have
         | to realize that the kernel is enterprise code, and we need
         | enterprise solutions sometimes (read: hacky).
        
           | rwmj wrote:
           | In open source this isn't necessary since you can just go in
           | and fix the broken programs and not carry deadweight
           | workarounds forever. And some don't care too much about
           | broken closed source programs.
        
             | pjmlp wrote:
             | I would bet IBM and Red-Hat care a lot.
        
               | rwmj wrote:
               | Red Hat works with software partners to port and fix
               | their software for Linux - in fact I was doing some of
               | that exact activity earlier today, helping to port a
               | program for a partner to GCC 12 (combined with hardening
               | flags, it breaks a bunch of previously working C/C++
               | code).
        
             | phendrenad2 wrote:
             | You're exactly right. However, not all programs running on
             | Linux are FOSS. And while you may not care about broken
             | closed-source programs, you can't really say "we don't
             | care" because you don't speak for the entire FOSS
             | community, and I think if you look at it again, you'll find
             | that the FOSS community DOES care (in actions not so much
             | in words) about corporate sponsorship, especially for the
             | Linux kernel.
        
         | josefx wrote:
         | Only way to completely fix buggy user land is to error on any
         | attempt to execute a program. Before this polkit had an
         | authentication bypass last year, trusted user supplied ids
         | around 2018^1, ... . The list goes on.
         | 
         | ^1 which ironically enough was an attempt to avoid racy uid
         | checks on setuid binaries.
         | 
         | https://gitlab.freedesktop.org/polkit/polkit/-/commits/maste...
        
         | 10000truths wrote:
         | Yeah, I'm surprised that this patch is getting traction at all.
         | If it's a bug in the application, it should be fixed in the
         | application. The kernel shouldn't be changing the kernel-
         | userspace interface to work around userspace bugs.
         | 
         | There are plenty of bespoke binaries out there with no/lost
         | source code that won't get the benefit of scrutiny from the
         | kernel dev team. I certainly wouldn't assume that none of them
         | invoke execve with a NULL or 0-item arg parameter, and merging
         | this patch would break them without recourse.
        
           | jandrese wrote:
           | Given that this behavior is already disallowed (and faulting)
           | on BSD and at least some commercial Unix flavors there
           | shouldn't be too much software out there that relies on it.
           | 
           | I would lean towards making it an API error BSD style and
           | taking breakage on a case by case basis. I'm not the person
           | who would be called up however, so it's easy for me to say.
           | 
           | Worst case you can create a sysctl variable to enable/disable
           | the check for those corner cases where it can't be fixed and
           | other precautions can be taken against the security
           | vulnerability.
        
             | josephcsible wrote:
             | > Given that this behavior is already disallowed (and
             | faulting) on BSD and at least some commercial Unix flavors
             | there shouldn't be too much software out there that relies
             | on it.
             | 
             | How much commercial software targets UNIX these days?
        
               | pjmlp wrote:
               | Enough to keep sales of AIX, Solaris, mainframes and
               | several POSIX based RTOS going.
               | 
               | And that little one named macOS as well.
        
           | kelnos wrote:
           | Platform APIs should be designed in such a way that it is
           | easy to use them securely. That's why we don't use strcpy()
           | and friends anymore. (Yes, I know, not quite the same, as you
           | can mitigate this with a new function and implore people not
           | to use the old one.)
           | 
           | Sure, the pkexec bug is now fixed, but I expect we'll see
           | another similar one eventually. I know I personally have
           | written a lot of C programs that assume argc>0 and
           | argv!=NULL, and I'm sure there are thousands of such
           | programs.
           | 
           | I think returning EINVAL on exec in this case may be too
           | much. I think the right move is probably for the kernel to
           | just fix things up when exec is called in this way, so argv
           | will always have at least one element, and that element 0
           | won't be NULL. I think any program that would break with that
           | change is something I'd be able to live with.
        
             | dmatech wrote:
             | It's very easy to use argc/argv securely as they currently
             | are. POSIX clearly states that argc can be zero, but polkit
             | didn't know that for some reason. When you're writing
             | privileged software, you can't make assumptions about user
             | input (and the specs) like this.
             | 
             | I get that the vulnerability could have been prevented if
             | Linux deviated from the spec like some of the BSDs might
             | have, but we shouldn't make it the responsibility of kernel
             | developers to make logic errors in user applications less
             | likely.
        
           | takeda wrote:
           | Unlike other bugs, this change shouldn't break any
           | application that is being called. argc >= 1 is the normal use
           | case.
           | 
           | So this would break anything that calls execve() with
           | argc==0. Is such call providing any functionality?
           | 
           | In addition to returning EINVAL being produced by not well
           | written callers, another option would be to force argc==0 and
           | fill up argv[0] with process name. That fix would make
           | everything continue to work.
        
             | yuubi wrote:
             | The ldd program that lists the shared libraries used by an
             | executable at least used to exec with an empty argv, which
             | the dynamic linker detected and listed libraries instead of
             | linking and running. I learned that when I screwed up an
             | inetd configuration and saw a list of libraries instead of
             | the desired results.
             | 
             | Edit: just tried on windows 10 WSL, and it had a sad (blue
             | screen).
        
       | bombcar wrote:
       | Instead of fixing this in the kernel, couldn't this be added to
       | SELinux as something it keeps an eye out for?
       | 
       | The kernel could _log_ it, and hopefully track them down, but
       | breaking the ABI seems harsh.
        
         | staticassertion wrote:
         | Maybe, but not necessarily. If a program's job is "elevate
         | other program's privileges arbitrarily" your policy is gonna be
         | pretty loose. Also, a lot of distros don't run SELinux.
         | 
         | It's not to say that SELinux wouldn't be helpful, it's just
         | that ideally we'd not have these vulnerabilities to begin with.
        
       | amelius wrote:
       | I wish the kernel could handle
       | argc==ARBITRARY_NUMBER
       | 
       | It's crazy that with 64GB of internal memory, I'm still getting
       | "argument list too long".
        
       | staticassertion wrote:
       | It seems straightforward to just enumerate every setuid program
       | that ships on the vast majority of distros and check for this
       | issue and patch what's there. That seems like the simplest
       | approach with no concern for breaking anything.
       | 
       | Also, a sysctl for rejecting the call would seem reasonable.
        
         | hedora wrote:
         | Agreed. Also, the existence of this bug makes me wonder if I
         | can simply delete polkit. What other rookie mistakes lie
         | within? I learned how to parse argv properly in CS 50 (so,
         | before CS 101).
        
           | WesolyKubeczek wrote:
           | The whole discussion of alternatives is, alas, akin to
           | passing a hot potato around. I, for one, don't quite see how
           | keeping a daemon whose only job is to execute arbitrary
           | privileged commands any safer. You can have a hole where it
           | parses the request. You can have a hole where it checks
           | whether the requesting party is indeed authorized to make
           | such requests. You can have a hole at the point of it
           | invoking execve(). Indeed, you can put in the same hole
           | pkexec used to have! You still need to exercise due diligence
           | to write that daemon as tightly as can be so all those places
           | don't pose a risk.
           | 
           | But with having a resident daemon approach you lose the
           | process ownership, which sometimes is damn handy to have.
           | Accidentally granting root to do a wrong thing and then not
           | even being able to kill the thing with Ctrl-C or xkill is
           | quite disempowering.
        
             | hedora wrote:
             | I still haven't gotten over the transition from init to
             | systemd. One reason is that I'm unconvinced it makes sense
             | to have a daemon that passes out root privileges after
             | boot. I'm a fan of "keep it simple," and UNIX/Linux/BSD
             | survived for decades without such a daemon.
        
           | rwmj wrote:
           | pkexec was a kludge to replace consolehelper and both were
           | actually supposed to have been removed in 2013(!) but the
           | transition didn't happen fully. In fact some are still using
           | consolehelper. For the whole sorry tale,
           | https://lwn.net/Articles/883547/
           | https://lwn.net/SubscriberLink/883547/f9a8f4e4be157f91/
        
             | fulafel wrote:
             | Interesting.
             | 
             | Ironically the reason pkexec exists on Ubuntu server
             | installs seems to be that they want to have a package
             | management service (PackageKit) hanging off dbus even in
             | headless setups, for reasons that aren't obvious. There
             | have been earlier local root bugs caused by PackageKit as
             | well[1].
             | 
             | (The irony being, the LWN documented discussion where a dev
             | defends ipc-accessible privileged daemons as more secure
             | and justifies pkexec as facilitating access to these).
             | 
             | [1] https://ubuntu.com/security/notices/USN-4538-1
        
               | simcop2387 wrote:
               | I believe it gets used for the unattended-upgrades system
               | to manage security updates. Not sure if that's the best
               | way to do it but I believe it's using packagekit to do
               | the actual work.
        
           | captainmuon wrote:
           | Polkit (I think it is polkit) is so annoying. When I forget
           | to use sudo, it asks me in a nonstandard prompt for a
           | password. But I don't have a password on some machines, I
           | logged in via ssh key and have passwordless sudo.
           | 
           | At the same time, you can _still_ not elevate inside an
           | application. I would love to run a command to open a single
           | file as root in vim, or VS code, or copy one file as root in
           | Nautilus. I think you can click a little lock in Gnome
           | settings somewhere and unlock certain pages, but it is far
           | from widespread.
        
         | gnulinux wrote:
         | What are some setuid program in popular distros? The only ones
         | I'm aware of are `sudo` and `su`? Do programs such as `systemd`
         | require setuid?
        
           | soneil wrote:
           | I count 20 on mine. Surprisingly, none of them are systemd.
           | (find / -perm 4000)
        
             | bonzini wrote:
             | systemd developers generally prefer having a privileged
             | daemon, and an unprivileged program doing RPC with the
             | other. See for example systemd/systemctl,
             | timesyncd/timesyncctl, etc.
        
           | josephcsible wrote:
           | There's a bunch, e.g., mount, umount, fusermount, crontab,
           | passwd, and chsh.
        
         | captainmuon wrote:
         | It seems a bit extreme to add a syscall for this. As you said,
         | they have to bite the bullet and fix the suid programs.
         | 
         | And why not just fix the exec call in userland? I'm not sure
         | who supplies it - the C standard library or the kernel headers.
         | You would fix all the cases of _accidentally_ calling something
         | with argc==0 and it would not run afoul of the kernel breaking
         | userspace.
        
       | tptacek wrote:
       | It's weird to have strong opinions about userspace here given
       | that the code search Kees Cook provided includes only test cases
       | (and, amusingly, what looks like one piece of shellcode) --- the
       | followup search on `execlp` turned up some things that looked
       | like they might be non-test hits, but it turned out that only the
       | test case hits were argc==0.
       | 
       | There's also comments throughout the thread that POSIX requires
       | programs to accept argc==0, which does not appear to be the case
       | --- the opposite seems closer to the truth, in that "strictly
       | conforming" POSIX programs must have argc>0.
       | 
       | Not breaking userspace is an important goal, but breaking bad
       | test cases doesn't meaningfully break userland.
       | 
       | The flip side of this is that people went looking for additional
       | instances of the pkexec bug and none have, as far as I can see,
       | turned up yet.
        
         | tedunangst wrote:
         | "But what if it breaks userland" does seem to be something of a
         | fig leaf. Change would imply there may have been something
         | wrong with the previous code.
        
       | tremon wrote:
       | I don't really understand the rationale for changing the Linux
       | kernel at all here; yes, this is yet another footgun (tm), but
       | this is purely a userspace problem. Especially given the examples
       | regarding backwards compatibility given in the comments, I would
       | expect a long-lasting userspace fix to come in the form of a new,
       | better libc API. Not by prohibiting certain behaviour from the
       | kernel-side of things.
       | 
       | That said, why is the obvious fix not simply "always populate
       | argv[0] with pathname if called with NULL argv"?
        
         | [deleted]
        
         | MatmaRex wrote:
         | Your question is answered in the article:
         | 
         | > Given the code we've found that depends on NULL argv, I think
         | we likely can't make the change outright, so we're down this
         | weird rabbit hole of trying to reject what we can and create
         | work-around behaviors for the cases that currently exist.
        
           | tremon wrote:
           | It's not clear from that quote if they're talking about code
           | calling execve(,NULL,) or a program expecting main() to be
           | entered with argv[0]==NULL.
           | 
           | (edit: found an answer here: https://lwn.net/ml/linux-
           | kernel/5e963fab-88d4-2039-1cf4-6661...)
           | 
           |  _multicall programs will try to use argv[0] and might crash
           | in this scenario. If we 're going to fake an argv, I guess we
           | should try to do it right._
        
             | formerly_proven wrote:
             | (multicall = things like busybox which look at argv[0] to
             | determine what program they should act like)
        
           | Animats wrote:
           | Right, backwards compatibility with existing bugs.
           | 
           | It really isn't a kernel problem at all, of course.
           | 
           | The POSIX spec says "The first argument, argv[0], is required
           | and must contain the name of the executable file for the new
           | process image."[1] Linux is not fully POSIX-compliant, but it
           | generally follows the POSIX standard for kernel calls defined
           | in POSIX.
           | 
           | FreeBSD's man page says "At least one argument must be
           | present in the array; by custom, the first element should be
           | the name of the executed program (for example, the last
           | component of path)."[2] So, "must", compatible with the POSIX
           | spec.
           | 
           | QNX implements the POSIX API, pretty much per the spec. The
           | QNX spec says: "The value in argv[0] must point to a filename
           | that's associated with the process being started."[4] So,
           | "must", again compatible with the spec.
           | 
           | The Linux man page says, unfortunately, "By convention, the
           | first of these strings (i.e., argv[0]) should contain the
           | filename associated with the file being executed."[3] That's
           | the problem. POSIX says "must" but the Linux page only says
           | "should". That was a mistake.
           | 
           | One somewhat kludgy fix would be to disallow argv[0] == NULL
           | && envp[0] != NULL. That allows launching a program with no
           | args at all, which might be used somewhere in some ancient
           | program, while disallowing launch with no command line args
           | but with environment args, which is probably a bug.
           | 
           | (I liked the QNX approach to program loading. The kernel
           | isn't involved. The "exec" functions link to a shared object,
           | which requests memory and does the loading using regular I/O
           | operations. The kernel contains no loader and is thus immune
           | to problems from malformed executables.)
           | 
           | [1] https://support.sas.com/documentation/onlinedoc/sasc/doc7
           | 50/...
           | 
           | [2] https://manpages.debian.org/unstable/freebsd-
           | manpages/execve...
           | 
           | [3] https://www.man7.org/linux/man-pages/man2/execve.2.html
           | 
           | [4] https://www.qnx.com/developers/docs/6.4.0/neutrino/lib_re
           | f/e...
        
             | hvdijk wrote:
             | That isn't the POSIX spec.
        
               | Animats wrote:
               | You're right. _This_ is (I think) the actual spec.[1] The
               | other one is SAS 's reference which claims to describe
               | the POSIX spec. The actual spec says "should". Bleah.
               | 
               |  _" The arguments represented by arg0,... are pointers to
               | null-terminated character strings. These strings shall
               | constitute the argument list available to the new process
               | image. The list is terminated by a null pointer. The
               | argument arg0 should point to a filename string that is
               | associated with the process being started by one of the
               | exec functions."_
               | 
               |  _" The argument argv is an array of character pointers
               | to null-terminated strings. The application shall ensure
               | that the last member of this array is a null pointer.
               | These strings shall constitute the argument list
               | available to the new process image. The value in argv[0]
               | should point to a filename string that is associated with
               | the process being started by one of the exec functions."_
               | 
               | [1] https://pubs.opengroup.org/onlinepubs/9699919799/func
               | tions/e...
        
               | bch wrote:
               | I think what you're describing above is the qualified
               | _"Strictly Conforming POSIX"._
        
               | [deleted]
        
               | torstenvl wrote:
               | > _The actual spec says "should". Bleah._
               | 
               | I don't agree with your reading that argv being non-NULL
               | is a "should."
               | 
               | The POSIX Specification does not say "The value in
               | argv[0] should _exist_ " it says "The value in argv[0]
               | should _point to a filename string..._ "
               | 
               | The value _existing_ is a foundational assumption in the
               | specification. It is therefore required.
               | 
               | (This is consistent with the fact that it's explicitly
               | required that argv[argc] == NULL, which necessarily means
               | that argv != NULL.)
        
             | tedunangst wrote:
             | FreeBSD did not require non null argv[0] until 9 days ago.
             | 
             | https://github.com/freebsd/freebsd-
             | src/commit/773fa8cd136a57...
        
               | kevans91 wrote:
               | Only a number of years too late...
               | 
               | It's worth noting that as far as changes in FreeBSD
               | forbidding long-standing permitted behavior goes, this
               | one was notably very uncontroversial.
        
         | akira2501 wrote:
         | That was my issue with the article as well:
         | 
         | "Fixing this issue is a simple matter of making pkexec check
         | that argc is at least one. But there are surely other programs
         | out there containing similar assumptions."
         | 
         | That's _not_ an assumption. That's just a bug, in a setuid
         | binary.
        
           | sedatk wrote:
           | It doesn't matter if it's a bug in the program or not. "We
           | don't break the userspace" --Linus Torvalds
        
             | akira2501 wrote:
             | I'm describing an issue with the article, not with the
             | kernel development ideology. What I don't understand is why
             | Corbett decided to write this up as an "assumption" rather
             | than a "bug" which it clearly is.
        
               | zerocrates wrote:
               | It's both, what's the problem with what he wrote?
               | 
               | It's a bug that results from the developers incorrectly
               | assuming that argc is always at least 1.
        
       | mortenlarsen wrote:
       | OpenBSD has handled this case for some years. I do not know if
       | there was any breakage or fallout from this.
       | 
       | https://github.com/openbsd/src/commit/74212563870067f5b1e271...
        
         | roastedpeacock wrote:
         | Thanks for answering my next question, how much world of hurt
         | would be involved with this solution in terms of breaking
         | userspace.
         | 
         | Kind of insane how Linux kernel developers didn't test and
         | resolve this in a similar manner before this became an issue.
        
           | gnubison wrote:
           | argc==0 is a well known part of the API, even if not everyone
           | handles it correctly. I guarantee that >95% of kernel
           | developers (if not 100% of them) understand this design and
           | its flaws.
        
         | simcop2387 wrote:
         | That's not a bad precedent for saying it shouldn't break much
         | of anything if it's done. I also love the name of the label for
         | handling the error condition there.
        
           | cozzyd wrote:
           | Slightly funnier would be "harmful"
        
         | sedatk wrote:
         | FWIW, Windows (MSVC) also guarantees "argc > 0" and non-empty
         | argv with non-empty argv[0]. It also returns EINVAL if execve
         | validation fails.
        
           | garaetjjte wrote:
           | Windows doesn't have argv at all, it uses single string as
           | cmdline arguments.
        
             | sedatk wrote:
             | The C runtime bridges that gap.
             | https://docs.microsoft.com/en-us/cpp/cpp/main-function-
             | comma...
        
       ___________________________________________________________________
       (page generated 2022-02-04 23:00 UTC)