[HN Gopher] The history and use of /etc./glob in early Unixes
       ___________________________________________________________________
        
       The history and use of /etc./glob in early Unixes
        
       Author : zdw
       Score  : 61 points
       Date   : 2025-01-13 05:44 UTC (17 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | yjftsjthsd-h wrote:
       | > PS: I don't know why expanding shell wildcards used a separate
       | program in V6 and earlier, but part of it may have been to keep
       | the shell smaller and more minimal so that it required less
       | memory.
       | 
       | See, I thought it was a nice separation of concerns and wondered
       | why we lost such a nice approach, until I read:
       | 
       | > How escaping wildcards works in the V5 and V6 shell is that all
       | characters in commands and arguments are restricted to being
       | seven-bit ASCII. The shell and /etc/glob both use the 8th bit to
       | mark quoted characters, which means that such quoted characters
       | don't match their unquoted versions and won't be seen as
       | wildcards by either the shell
       | 
       | at which point I suddenly became a fan of ditching it. I do
       | wonder if there's not some better way to factor that
       | functionality out...
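
A minimal Python sketch of the high-bit quoting scheme described in the quote above, not the original C. The function names and the backslash-only quoting rule are assumptions made for illustration; the only detail taken from the quoted text is that quoted characters get the 8th bit set, so they no longer compare equal to wildcard characters.

```python
QUOTE_BIT = 0x80  # 8th bit marks a quoted character (per the quoted article text)


def tokenize(arg):
    """Convert a word to bytes, setting the high bit on backslash-quoted chars.

    Hypothetical helper: only backslash quoting is modelled here.
    """
    out = bytearray()
    chars = iter(arg)
    for ch in chars:
        if ord(ch) > 0x7F:
            # The scheme only works if input is restricted to 7-bit ASCII.
            raise ValueError("input must be seven-bit ASCII")
        if ch == "\\":
            quoted = next(chars, "\\")  # a trailing backslash quotes itself
            if ord(quoted) > 0x7F:
                raise ValueError("input must be seven-bit ASCII")
            out.append(ord(quoted) | QUOTE_BIT)
        else:
            out.append(ord(ch))
    return bytes(out)


def has_wildcards(word):
    """True if any *unquoted* wildcard character is present."""
    return any(b in b"*?[" for b in word)


def strip_quotes(word):
    """Clear the high bit again, recovering the literal text."""
    return bytes(b & 0x7F for b in word).decode("ascii")
```

With this sketch, `tokenize("a\\*")` contains no byte equal to `*`, so `has_wildcards` reports nothing to expand, and `strip_quotes` recovers the literal `a*` afterwards. The cost is visible in `tokenize`: any input outside seven-bit ASCII has to be rejected.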
        
         | hnlmorg wrote:
         | The way Murex works is each parameter is first compiled into an
         | AST, and then globbing only works against the unquoted tokens.
         | 
         | Globbing is also a separate built in, which allows for other
         | types of wildcard matches like regex too. Eg
         | https://murex.rocks/tour.html#filesystem-wildcards-globbing
         | 
         | So you have the best of both worlds: inline globbing for
         | convenience and also wildcard matching as a function too.
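
A rough Python sketch of the idea in the comment above: keep each parameter as a list of parsed segments and let only the unquoted ones act as wildcards. This is not Murex's implementation; `Segment`, `to_pattern`, and `matches` are hypothetical names, and Python's standard `glob.escape` and `fnmatch.fnmatchcase` stand in for whatever matcher a real shell would use.

```python
import fnmatch
import glob
from dataclasses import dataclass


@dataclass
class Segment:
    """One piece of a parsed parameter (hypothetical AST node)."""
    text: str
    quoted: bool


def to_pattern(segments):
    """Build a glob pattern in which quoted segments match only literally."""
    return "".join(
        glob.escape(seg.text) if seg.quoted else seg.text for seg in segments
    )


def matches(name, segments):
    """Case-sensitive match of a file name against the parsed parameter."""
    return fnmatch.fnmatchcase(name, to_pattern(segments))
```

For example, a parameter parsed as unquoted `foo`, quoted `*`, unquoted `*.txt` becomes the pattern `foo[*]*.txt`, which matches `foo*bar.txt` but not `fooXbar.txt`. No character-level marking is needed because the quoting information lives in the parsed structure.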
        
         | eru wrote:
         | > at which point I suddenly became a fan of ditching it. I do
         | wonder if there's not some better way to factor that
         | functionality out...
         | 
         | Just use backslash escaping like we do practically everywhere
         | else in the Unix world?
        
           | rini17 wrote:
           | That's kind of a cure worse than the disease. Just ditch
           | escaping completely.
        
             | yjftsjthsd-h wrote:
             | If you completely ditch escaping, how do you handle
             | filenames that contain special characters (in this context,
             | mostly ? and *, but ()[] are also perennial favorites)? And
             | to preempt the most obvious answer: No, you can't just ban
             | them, because existing OSs and filesystems allow them and
             | you need interoperability.
        
               | rini17 wrote:
               | There are ways, no idea why doing anything here is so
               | reviled.
               | 
               | Find and xargs can delimit filenames by NUL, which is
               | not allowed in filenames. Best practice in SQL was to
               | abandon parameter escaping completely and pass them out
               | of band. For internal representation, use array
               | data structures with length information.
               | 
               | Actually, would it be that bad, to ban * and ? in
               | filenames? If you accept them in the name of interop,
               | something inevitably breaks later. Better to fail
               | upfront. Many applications do sanitize filenames already
               | and when they need to use binary data as file name,
               | convert it to hex instead. It's a hassle otherwise.
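
A small Python sketch of the two out-of-band techniques mentioned in the comment above: NUL-delimited file name lists (in the style of `find -print0`) and SQL parameters bound separately from the query text. The function names, table name, and sample file names are made up for illustration.

```python
import sqlite3


def split_nul(data):
    """Split NUL-delimited output (as from `find -print0`) into file names.

    NUL cannot appear inside a file name, so no escaping is needed.
    """
    return [name for name in data.split(b"\0") if name]


def store_and_find(name):
    """Round-trip a file name through SQLite using bound parameters.

    The name is passed out of band via `?` placeholders, so quotes and
    wildcard characters in it are never interpreted as SQL.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE files (name TEXT)")  # hypothetical table
        con.execute("INSERT INTO files VALUES (?)", (name,))
        rows = con.execute(
            "SELECT name FROM files WHERE name = ?", (name,)
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]
```

Names containing spaces, `*`, `?`, quotes, or even newlines pass through both functions unchanged, because the delimiter and the query structure are carried separately from the data.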
        
         | Joker_vD wrote:
         | Why would I want to factor out some syntactic functionality of
         | one specific (and not very well thought out) shell to reuse,
         | again?
         | 
         | But if you really insist, you can write your own glob(1) that
         | would invoke glob(3) for you, sure. There is also wordexp(3)
         | although I believe its implementations had security problems
         | for quite some time?
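
A minimal sketch of such a standalone expander, written in Python with the standard `glob` module rather than C and glob(3), so its matching rules are Python's and may differ in detail. `expand` and `main` are hypothetical names.

```python
import glob
import sys


def expand(patterns):
    """Return the matches for each pattern, sorted per pattern, in order."""
    matches = []
    for pattern in patterns:
        matches.extend(sorted(glob.glob(pattern)))
    return matches


def main(argv):
    """Print one match per line; exit status 1 if nothing matched."""
    paths = expand(argv)
    for path in paths:
        sys.stdout.write(path + "\n")
    return 0 if paths else 1
```

Calling `main(sys.argv[1:])` from a script, and quoting the patterns so the invoking shell does not expand them first, gives a preview of what a wildcard would match before any command is run on it.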
        
         | p_l wrote:
         | Important thing to remember is that even after the move to
         | PDP-11, early Unix systems had to deal with 32kB as entire
         | space available to userland program, both code and data
         | (including stack)
        
           | kjs3 wrote:
           | You mean 32k _words_, not 32k _bytes_, right[1]? And AFAIK
           | by V5 or V6, Unix could use split instruction and data if the
           | MMU supported it giving a bit more headroom. But, yeah,
           | memory was very tight, and a lot of very clever tricks were
           | used to get around it.
           | 
           | [1] Even worse, the top 4kW/8kB was reserved for I/O.
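
The arithmetic behind the figures in the two comments above, as a short Python sketch. It assumes the PDP-11's 16-bit addresses and 2-byte words, and takes the 4 kW I/O reservation from the footnote as given.

```python
ADDRESS_BITS = 16        # PDP-11 addresses are 16 bits wide
BYTES_PER_WORD = 2       # a PDP-11 word is 16 bits

address_space_bytes = 2 ** ADDRESS_BITS                       # 65536 bytes (64 kB)
address_space_words = address_space_bytes // BYTES_PER_WORD   # 32768 words (32 kW)

io_reserved_words = 4 * 1024                                  # 4 kW, per the footnote
io_reserved_bytes = io_reserved_words * BYTES_PER_WORD        # 8 kB

usable_words = address_space_words - io_reserved_words        # 28672 words (28 kW)
usable_bytes = usable_words * BYTES_PER_WORD                  # 57344 bytes (56 kB)
```

So "32k" is correct as a count of words (64 kB), and under the footnote's assumption roughly 56 kB of that remains for code, data, and stack combined.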
        
         | BoingBoomTschak wrote:
         | There's a sane language that never went that route:
         | https://www.tcl.tk/man/tcl9.0/TclCmd/glob.html
         | 
         | It also ditched another special case recently: the leading ~.
        
       | timewizard wrote:
       | Sweet.
       | 
       | I use xterm.js a lot and have a "shell backbone" that I use to
       | make shell based access to APIs, S3 and other things "cloud."
       | This is essentially how I implement globbing as well. The
       | convenience is that you can run glob by itself to get an idea of
       | exactly what kind of automated nightmare you are about to kick
       | off.
       | 
       | Anyways.. mine currently has V3 behavior. My shell command exec
       | routine could actually benefit from that hack. What's old is new
       | again?
        
       | ginko wrote:
       | Why is there a period after etc in the title? Another example of
       | HN's stupid automated title editing?
        
         | mkl wrote:
         | Probably the submitter typed it on a phone instead of
         | copy-paste and "etc" got autoincorrected.
        
       | rollcat wrote:
       | This is php.ini level of madness, and I'm glad it's gone from
       | (semi-)modern shells. A formal (e.g. programming) language should
       | be defined in its entirety by its formal grammar, its semantics
       | by a formal spec, etc. There's barely any good reason to let the
       | system administrator change the logic and semantics of deployed
       | code.
       | 
       | You could argue that Lisp reader macros also somewhat violate
       | this rule. As a longtime Lisp fan, I dislike reader macros, but
       | I'm more conflicted about macros in general. A good macro system
       | should aim to provide enough context for IDEs and LSPs to aid the
       | developer, but Lisp macros are entirely about just transforming
       | the AST. It's usually just better to evolve the language.
        
         | JdeBP wrote:
         | It's not there to give the system administrator flexibility.
         | It's there because early Unix was heavily constrained, and
         | doing things with lots of little overlays (and what was decades
         | later known as "Bernstein chaining") rather than 1 big program
         | was the way to architect stuff. exit(1), goto(1), and if(1)
         | were all external commands in the Thompson shell.
         | 
         | * https://v6sh.org
        
           | rollcat wrote:
           | I would argue with almost anyone else, that this is a poor
           | design, but...
           | 
           | Thank you for your perspective, work, and contributions.
        
             | hnlmorg wrote:
             | The other thing to bear in mind is that it's undergone
             | literally decades of evolution while still being backwards
             | compatible.
             | 
             | The shells weren't originally intended to be Turing
             | complete. They were just a job launcher. What you use today
             | would have been unimaginable when these shells were first
             | designed.
             | 
             | Whereas all other programming languages have had a
             | drastically smaller evolution in comparison and yet still
             | had a worse compatibility story.
             | 
             | It's very easy to be critical of the Bourne shell (and
             | compatible shells too) because they are archaic by modern
             | standards. But they weren't written to solve modern
             | problems. So it's like looking at a bicycle and complaining
             | how the designers didn't design a sports car while ignoring
             | the fact that the technology didn't exist and that push
             | bikes are still good enough for millions to use daily.
        
             | pwg wrote:
             | You are likely looking at this design from a modern system
             | perspective.
             | 
             | But the PDP-11 system that many of these designs were made
             | upon had a minimum memory size of 4K bytes and with varying
             | models that had different maximum memory sizes that are
             | smaller than a single JPEG photo in today's world: PDP
             | 11/45 max memory 256kbyte - PDP 11/70 max memory 4Mbyte.
             | 
             | And this was the total memory for everything, the OS, and
             | the users, and the system supported multiple users sharing
             | the same machine at the same time.
             | 
             | With those resource constraints, the design rules that
             | determine good from poor are radically different than with
             | one of today's systems with multiple GB of RAM.
        
         | ars wrote:
         | What in the world is "php.ini level of madness"?
         | 
         | If you are trying to attack php you are not doing a good job of
         | it, especially because there were good reasons for using a
         | separate program for glob.
        
       | gjvc wrote:
       | binaries in /etc/ -- i mean __really__
        
         | NekkoDroid wrote:
         | Fun fact: the linux kernel itself actually also looks for
         | `/etc/init` before it even looks for `/bin/init`
         | 
         | https://github.com/torvalds/linux/blob/4a5df37964673effcd9f8...
        
         | tedunangst wrote:
         | Yes, really. That's what /etc was for.
        
           | gjvc wrote:
           | I know. I'm saying it's sick. I hate computers.
        
             | kps wrote:
             | Why sick? That was the directory for binaries that weren't
             | meant to be run directly -- `getty`, `login`, etc.
             | 
             | Today there's _much_ more software, so some things got
             | moved into finer-grained locations like /libexec and
             | /sbin. That wasn't the case in the /etc/glob era when the
             | entire UNIX system was smaller than today's average web
             | page.
        
               | gjvc wrote:
               | and /sbin was full
        
         | stevekemp wrote:
         | Even now you'll come across this, for example "/etc/rmt"
         | probably exists, and other tape-related binaries if installed.
        
       | imglorp wrote:
       | User defined functions were implemented similarly as external
       | execs in early shells. As the script was parsed, functions were
       | dropped into /tmp without their wrappings and then called as
       | external programs. Since they would still reference parameters as
       | $1, $2 etc, it just worked: function bodies and standalone sh
       | scripts had the same interface! Such a clever idea to avoid
       | managing an interpreted call stack in the parent.
        
       | amelius wrote:
       | Recent versions of Bash don't expand the * (et cetera) patterns
       | when there is no match, which, although sometimes useful, still
       | feels like a hack to me.
        
         | Joker_vD wrote:
         | That's been around since the original Bourne shell; /etc/glob,
         | from what I can see from its source, would refuse to run the
         | command if the resulting expansion turned out completely empty;
         | and the globs with no matches would be simply removed.
        
           | amelius wrote:
           | That's not how it works in recent Ubuntu releases. If there
           | is no match, the command runs with the wildcard chars not
           | substituted.
           | 
           |     # echo foo*bar
           |     foo*bar
        
         | pwg wrote:
         | The action to take upon no match is configurable in recent Bash
         | versions.
         | 
         | The 'failglob' shopt option will cause an error to be generated
         | if a glob matches nothing.
         | 
         | The 'nullglob' shopt option toggles between no match expanding
         | to an empty string and the traditional default of no match
         | leaving the glob characters untouched.
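
A Python sketch of the three no-match behaviours described in the comment above. It only models the policies; it is not how Bash implements them, and `expand_arg` and its `policy` strings are hypothetical names borrowed from the shopt options for readability.

```python
import glob


def expand_arg(word, policy="keep"):
    """Expand one argument, applying a no-match policy.

    policy="keep":     leave the word untouched (the traditional default)
    policy="nullglob": expand to nothing
    policy="failglob": raise an error instead of running the command
    """
    if not any(ch in word for ch in "*?["):
        return [word]  # no wildcard characters, nothing to expand
    matches = sorted(glob.glob(word))
    if matches:
        return matches
    if policy == "keep":
        return [word]
    if policy == "nullglob":
        return []
    if policy == "failglob":
        raise LookupError("no match: " + word)
    raise ValueError("unknown policy: " + policy)
```

With no matching file present, `expand_arg("foo*bar")` returns `["foo*bar"]`, which corresponds to the `echo foo*bar` output shown earlier in the thread; the other two policies return an empty list or raise.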
        
       | miohtama wrote:
       | The linked C source file is an excellent example of ancient C,
       | when it was still closer to high-level assembly:
       | 
       | https://www.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/glob.c
        
         | tomtomtom777 wrote:
         | And when buffer overflows were (attempted to be) avoided by
         | guesstimating a large enough buffer size.
        
         | LeFantome wrote:
         | I assume that you are referring to the liberal use of "goto".
         | Of course, "if", "while", and even "switch" are also used.
         | Quite the mix.
         | 
         | Directly calling into system calls ("write") is interesting.
        
       ___________________________________________________________________
       (page generated 2025-01-13 23:02 UTC)