[HN Gopher] Deep Down the Rabbit Hole: Bash, OverlayFS, and a 30...
       ___________________________________________________________________
        
       Deep Down the Rabbit Hole: Bash, OverlayFS, and a 30-Year-Old
       Surprise
        
       Author : Deeg9rie9usi
       Score  : 54 points
       Date   : 2025-06-25 13:43 UTC (9 hours ago)
        
 (HTM) web link (sigma-star.at)
        
       | x0x0 wrote:
       | Saving this to explain why software is hard.
       | 
       | For a long time, inode numbers from readdir() had certain
       | semantics. Supporting overlay filesystems required changing those
       | semantics. Piles of software were written against the old
       | semantics; and even some of the most common have not been
       | upgraded.
        
         | JdeBP wrote:
          | The opposite, if anything. Very little was written against the
          | old semantics; most of the time the supplied C library
          | provided what was needed, so the code that _did_ rely upon
          | the old semantics barely got exercised. In other words, a
          | little-used shim that had been broken wasn't noticed until
          | just the right combination of circumstances got the shim used
          | on a platform where it would break.
         | 
          | What there _are_ piles of is software that reinvents the C
          | library, all too often in little bits of conditionally-compiled
          | code that has either been reinvented or nicked from some old C
          | library and sits unused on every platform that the application
          | is nowadays ported to. Every time I see a build log dutifully
          | informing me that it has checked for <string.h> or some other
          | thing that has been standard for 35 years, I wonder (a) why
          | that is thought to be necessary in 2025, and (b) what sort of
          | shims would get used if the check ever failed.
        
       | saurik wrote:
       | FWIW, if you are cross-compiling, while you might get a vaguely
       | usable result by ignoring all of the warnings and letting worst-
       | common-denominator defaults get applied, you absolutely should be
       | paying more attention and either manually providing autoconf the
       | answers it needs or (if at all possible, as this is more general)
        | making sure to tell it how to run a binary on the target system
       | (maybe in an emulator or over ssh)... you shouldn't just be
       | YOLOing a cross-compile like this and expecting it to work (not
       | to say that this wasn't a good bug in the fallback to fix, just
       | that the premise is awkward).
        
         | iforgotpassword wrote:
         | Like for example when compiling Linux (plus user space) from
         | Windows XP using only the official Services for Unix package
         | from Microsoft as a starting point.
        
       | pogopop77 wrote:
       | Interesting investigation, good read. Definitely illustrates how
        | new paradigms (e.g. overlay filesystems) can subtly affect
       | behaviors in ways that are complex to track down.
        
       | akoboldfrying wrote:
       | Remember, folks: It's not enough to check $WEARING_PANTS before
       | stepping outside. You need to check !$PANTS_BROKEN && !$SOLARIS
       | too.
        
       | jwilk wrote:
       | > Once the bug report becomes publicly visible, it will be linked
       | here.
       | 
        | Here it is:
        | https://lists.gnu.org/archive/html/bug-bash/2025-06/msg00149...
        
       | chubot wrote:
       | Wow great bug!
       | 
       | > Bash forgot to reset errno before the call. For about 30 years,
       | no one noticed
       | 
       | I have to say, this part of the POSIX API is maddening!
       | 
       | 99% of the time, you don't need to set errno = 0 before making a
       | call. You check for a non-zero return, and only then look at
       | errno.
       | 
       | But SOMETIMES you need to set errno = 0, because in this case
       | readdir() returns NULL on both error and EOF.
       | 
       | I actually didn't realize this before working on
       | https://oils.pub/
       | 
       | ---
       | 
        | And it should go without saying: Oils simply uses libc - we don't
        | need to support systems with a broken getcwd()!
       | 
        | Although a funny thing is that I just fixed a bug related to $PWD
        | that AT&T ksh (the original Korn shell, which bash borrowed many
        | features from) hasn't fixed for 30+ years either!
       | 
       | (and I didn't realize it was still maintained)
       | 
       | https://www.illumos.org/issues/17442
       | 
       | https://github.com/oils-for-unix/oils/issues/2058
       | 
       | There is a subtle issue with respect to:
       | 
       | 1) "trusting" the $PWD value you inherit from another process
       | 
        | 2) Respecting symlinks - this is the reason the shell can't just
        | call getcwd()!
        | 
        |     /* drop the inherited $PWD (p) unless it names the same file as "." */
        |     if (*p != '/' || stat(p, &st1) || stat(".", &st2) ||
        |         st1.st_dev != st2.st_dev || st1.st_ino != st2.st_ino)
        |         p = 0;
       | 
       | Basically, the shell considers BOTH the inherited $PWD and the
       | value of getcwd() to determine its $PWD. It can't just use one or
       | the other!
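Chubot's point about readdir() is the crux of the bug. Because readdir() returns NULL both at end-of-directory and on error, errno is the only way to tell the two apart, and it is only meaningful if it was cleared before the call. POSIX specifies that errno is left unchanged at end-of-directory. A minimal sketch of the pattern follows; the function name `count_dir_entries` is hypothetical and does not come from bash or Oils.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count the entries in a directory, distinguishing a read error from
 * the normal end of the stream. readdir() returns NULL in both cases,
 * so errno must be cleared before each call and checked afterwards.
 * Returns the entry count, or -1 on error (with errno set). */
long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;                 /* opendir() sets errno on failure */

    long count = 0;
    for (;;) {
        errno = 0;                 /* the step that was missing for ~30 years */
        struct dirent *ent = readdir(dir);
        if (ent == NULL)
            break;                 /* EOF or error; errno decides which */
        count++;
    }

    int saved = errno;             /* nonzero only if readdir() failed */
    closedir(dir);                 /* may itself modify errno */
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return count;
}
```

Without the `errno = 0` line, a stale value left over from any earlier failed call would make a clean end-of-directory look like an error.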
        
       | justincormack wrote:
        | Most of the stuff that configure scripts check is obsolete, and it
        | breaks in situations like this because many of the checks can't
        | work without running code on the target. The check likely doesn't
        | apply to any system that has existed in decades. Lots of systems
        | have disabled it, e.g. Nix in 2017 [1].
       | 
       | [1]
       | https://github.com/NixOS/nixpkgs/commit/dff0ba38a243603534c9...
        
         | arp242 wrote:
         | I had a look at the bash source code a few years back, and
         | there are tons of hacks and workarounds for 1980s-era systems.
         | Looking at the git log, GETCWD_BROKEN was added in bash 1.14
         | from 1996, presumably to work around some system at the time (a
         | system which was perhaps already old in 1996, but it's not
         | detailed which).
         | 
         | Also, that getcwd.c which contains the getcwd() fallback and
         | bug is in K&R C, which should be a hint at how well maintained
         | all of this is. Bash takes "don't fix it if it ain't broke" to
         | new levels, to the point of introducing breakage like here (the
         | bash-malloc is also notorious for this - no idea why that's
         | still enabled by default).
        
       | malkia wrote:
       | Autoconf is the prime example of easy vs simple.
       | 
        | On the surface, it looks easy to extend support to any kind of
        | operating system there is, based on auto-detection and then #if
        | HAVE_THIS or #if HAVE_THAT, but it breaks in ways that may be
        | really hard to untangle later.
       | 
        | I'd rather have a limited set of configurations targeting
        | specific platforms/flavors, so that no matter how I compile it, I
        | know what is `#define`-d and what is not, instead of guessing at
        | what the "host" might have.
        
       ___________________________________________________________________
       (page generated 2025-06-25 23:01 UTC)