[HN Gopher] The history and use of /etc./glob in early Unixes
___________________________________________________________________
The history and use of /etc./glob in early Unixes
Author : zdw
Score : 61 points
Date : 2025-01-13 05:44 UTC (17 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| yjftsjthsd-h wrote:
| > PS: I don't know why expanding shell wildcards used a separate
| program in V6 and earlier, but part of it may have been to keep
| the shell smaller and more minimal so that it required less
| memory.
|
| See, I thought it was a nice separation of concerns and wondered
| why we lost such a nice approach, until I read:
|
| > How escaping wildcards works in the V5 and V6 shell is that all
| characters in commands and arguments are restricted to being
| seven-bit ASCII. The shell and /etc/glob both use the 8th bit to
| mark quoted characters, which means that such quoted characters
| don't match their unquoted versions and won't be seen as
| wildcards by either the shell
|
| at which point I suddenly became a fan of ditching it. I do
| wonder if there's not some better way to factor that
| functionality out...
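|
| A minimal sketch of that 8th-bit trick, written in Python rather
| than the original C and not taken from the V5/V6 source. The names
| QUOTE, quote, unquote and match are hypothetical, and comparing a
| quoted byte with its marker stripped is an assumption made for
| illustration:

```python
# Sketch of "the 8th bit marks a quoted character"; not the V6 code.
QUOTE = 0x80  # high bit; plain seven-bit ASCII never sets it


def quote(text):
    """Mark every character as quoted by setting its 8th bit."""
    return bytes(ord(c) | QUOTE for c in text)


def unquote(word):
    """Strip the marker bit before the word is handed to a command."""
    return "".join(chr(b & 0x7F) for b in word)


def match(pattern, name):
    """Match name against pattern; only unmarked * and ? are wildcards."""
    if not pattern:
        return not name
    head = pattern[0]
    if head == ord("*"):
        # An unmarked '*' matches any run of characters, including none.
        return any(match(pattern[1:], name[i:]) for i in range(len(name) + 1))
    if not name:
        return False
    if head == ord("?"):
        # An unmarked '?' matches exactly one character.
        return match(pattern[1:], name[1:])
    # Assumption: a marked byte is compared literally, marker removed.
    return (head & 0x7F) == name[0] and match(pattern[1:], name[1:])
```

| Here quote("*") + b".c" only matches a file literally named "*.c",
| while the unquoted b"*.c" matches any name ending in ".c". The
| price is the one quoted above: no 8-bit characters anywhere.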
| hnlmorg wrote:
| The way Murex works is each parameter is first compiled into an
| AST, and then globbing only works against the unquoted tokens.
|
| Globbing is also a separate built in, which allows for other
| types of wildcard matches like regex too. Eg
| https://murex.rocks/tour.html#filesystem-wildcards-globbing
|
| So you have the best of both worlds: inline globbing for
| convenience and also wildcard matching as a function too.
| eru wrote:
| > at which point I suddenly became a fan of ditching it. I do
| wonder if there's not some better way to factor that
| functionality out...
|
| Just use backslash escaping like we do practically everywhere
| else in the Unix world?
| rini17 wrote:
| That's kind of a cure worse than the disease. Just ditch
| escaping completely.
| yjftsjthsd-h wrote:
| If you completely ditch escaping, how do you handle
| filenames that contain special characters (in this context,
| mostly ? and *, but ()[] are also perennial favorites)? And
| to preempt the most obvious answer: No, you can't just ban
| them, because existing OSs and filesystems allow them and
| you need interoperability.
| rini17 wrote:
| There are ways, no idea why doing anything here is so
| reviled.
|
| Find and xargs can delimit filenames by NUL, which is
| not allowed in filenames. Best practice in SQL was to
| abandon parameter escaping completely and pass parameters
| out of band. For internal representation, use array
| data structures with length information.
|
| Actually, would it be that bad, to ban * and ? in
| filenames? If you accept them in the name of interop,
| something inevitably breaks later. Better to fail
| upfront. Many applications do sanitize filenames already
| and when they need to use binary data as file name,
| convert it to hex instead. It's a hassle otherwise.
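|
| A rough sketch of the out-of-band idea, assuming Python's standard
| subprocess module: the names are split on NUL (as `find -print0`
| would emit them) and passed as separate argv entries, so no shell
| ever parses them. split_nul, run_with_names and demo are
| hypothetical helper names, not part of any existing tool:

```python
import os
import subprocess
import sys


def split_nul(data):
    """Split a NUL-delimited byte stream into individual file names."""
    return [name for name in data.split(b"\0") if name]


def run_with_names(argv_prefix, names):
    """Run a command with each name as its own argument, bypassing the shell."""
    argv = list(argv_prefix) + [os.fsdecode(name) for name in names]
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout


def demo():
    """Have a child Python process echo back the arguments it received."""
    names = split_nul(b"a*b\0c d?\0")
    child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    return run_with_names(child, names).decode().strip()
```

| Because the argument list is an array with explicit boundaries,
| the `*`, `?` and space in these names need no escaping at all.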
| Joker_vD wrote:
| Why would I want to factor out some syntactic functionality of
| one specific (and not very well thought out) shell to reuse,
| again?
|
| But if you really insist, you can write your own glob(1) that
| would invoke glob(3) for you, sure. There is also wordexp(3)
| although I believe its implementations had security problems
| for quite some time?
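|
| A small sketch of such a standalone glob command. It uses Python's
| standard glob module as a stand-in for glob(3), which is a swap
| made for illustration; expand and main are hypothetical names, and
| keeping an unmatched pattern as-is is just one possible policy:

```python
import glob


def expand(patterns):
    """Expand each pattern; keep a pattern literally when nothing matches."""
    words = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        words.extend(matches if matches else [pattern])
    return words


def main(argv):
    """Print one expanded word per line; argv[0] is the program name."""
    for word in expand(argv[1:]):
        print(word)
    return 0
```

| Calling main(sys.argv) from a script gives a command whose output
| can be inspected before handing the same patterns to anything
| destructive.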
| p_l wrote:
| Important thing to remember is that even after the move to
| PDP-11, early Unix systems had to deal with 32kB as entire
| space available to userland program, both code and data
| (including stack)
| kjs3 wrote:
| You mean 32k _words_ , not 32k _bytes_ , right[1]? And AFAIK
| by V5 or V6, Unix could use split instruction and data if the
| MMU supported it giving a bit more headroom. But, yeah,
| memory was very tight, and a lot of very clever tricks were
| used to get around it.
|
| [1] Even worse, the top 4kW/8kB was reserved for I/O.
| BoingBoomTschak wrote:
| There's a sane language that never went that route:
| https://www.tcl.tk/man/tcl9.0/TclCmd/glob.html
|
| It also ditched another special case recently: the leading ~.
| timewizard wrote:
| Sweet.
|
| I use xterm.js a lot and have a "shell backbone" that I use to
| make shell based access to APIs, S3 and other things "cloud."
| This is essentially how I implement globbing as well. The
| convenience is that you can run glob by itself to get an idea of
| exactly what kind of automated nightmare you are about to kick
| off.
|
| Anyways.. mine currently has V3 behavior. My shell command exec
| routine could actually benefit from that hack. What's old is new
| again?
| ginko wrote:
| Why is there a period after etc in the title? Another example of
| HN's stupid automated title editing?
| mkl wrote:
| Probably the submitter typed it on a phone instead of copy-
| paste and "etc" got autoincorrected.
| rollcat wrote:
| This is php.ini level of madness, and I'm glad it's gone from
| (semi-)modern shells. A formal (e.g. programming) language should
| be defined in its entirety by its formal grammar, its semantics
| by a formal spec, etc. There's barely any good reason to let the
| system administrator change the logic and semantics of deployed
| code.
|
| You could argue that Lisp reader macros also somewhat violate
| this rule. As a longtime Lisp fan, I dislike reader macros, but
| I'm more conflicted about macros in general. A good macro system
| should aim to provide enough context for IDEs and LSPs to aid the
| developer, but Lisp macros are entirely about just transforming
| the AST. It's usually just better to evolve the language.
| JdeBP wrote:
| It's not there to give the system administrator flexibility.
| It's there because early Unix was heavily constrained, and
| doing things with lots of little overlays (and what was decades
| later known as "Bernstein chaining") rather than 1 big program
| was the way to architect stuff. exit(1), goto(1), and if(1)
| were all external commands in the Thompson shell.
|
| * https://v6sh.org
| rollcat wrote:
| I would argue with almost anyone else, that this is a poor
| design, but...
|
| Thank you for your perspective, work, and contributions.
| hnlmorg wrote:
| The other thing to bear in mind is that it's undergone
| literally decades of evolution while still being backwards
| compatible.
|
| The shells weren't originally intended to be Turing
| complete. They were just a job launcher. What you use today
| would have been unimaginable when these shells were first
| designed.
|
| Whereas all other programming languages have had a
| drastically smaller evolution in comparison and yet still
| had a worse compatibility story.
|
| It's very easy to be critical of the Bourne shell (and
| compatible shells too) because they are archaic by modern
| standards. But they weren't written to solve modern
| problems. So it's like looking at a bicycle and complaining
| that the designers didn't design a sports car, while ignoring
| the fact that the technology didn't exist and that push bikes
| are still good enough for millions to use daily.
| pwg wrote:
| You are likely looking at this design from a modern system
| perspective.
|
| But the PDP-11 systems that many of these designs were made
| for had a minimum memory size of 4K bytes, and the various
| models had maximum memory sizes smaller than a single JPEG
| photo in today's world: PDP 11/45 max memory 256kbyte,
| PDP 11/70 max memory 4Mbyte.
|
| And this was the total memory for everything, the OS, and
| the users, and the system supported multiple users sharing
| the same machine at the same time.
|
| With those resource constraints, the design rules that
| determine good from poor are radically different than with
| one of today's systems with multiple GB of RAM.
| ars wrote:
| What in the world is "php.ini level of madness"?
|
| If you are trying to attack php you are not doing a good job of
| it, especially because there were good reasons for using a
| separate program for glob.
| gjvc wrote:
| binaries in /etc/ -- i mean __really__
| NekkoDroid wrote:
| Fun fact: the linux kernel itself actually also looks for
| `/etc/init` before it even looks for `/bin/init`
|
| https://github.com/torvalds/linux/blob/4a5df37964673effcd9f8...
| tedunangst wrote:
| Yes, really. That's what /etc was for.
| gjvc wrote:
| I know. I'm saying it's sick. I hate computers.
| kps wrote:
| Why sick? That was the directory for binaries that weren't
| meant to be run directly -- `getty`, `login`, etc.
|
| Today there's _much_ more software, so some things got
| moved into finer-grained locations like /libexec and
| /sbin. That wasn't the case in the /etc/glob era when the
| entire UNIX system was smaller than today's average web
| page.
| gjvc wrote:
| and /sbin was full
| stevekemp wrote:
| Even now you'll come across this, for example "/etc/rmt"
| probably exists, and other tape-related binaries if installed.
| imglorp wrote:
| User defined functions were implemented similarly as external
| execs in early shells. As the script was parsed, functions were
| dropped into /tmp without their wrappings and then called as
| external programs. Since they would still reference parameters as
| $1, $2 etc, it just worked: function bodies and standalone sh
| scripts had the same interface! Such a clever idea to avoid
| managing an interpreted call stack in the parent.
| amelius wrote:
| Recent versions of Bash don't expand the * (et cetera) patterns
| when there is no match, which, although sometimes useful, I
| still feel is a hack.
| Joker_vD wrote:
| That's been around since the original Bourne shell; /etc/glob,
| from what I can see from its source, would refuse to run the
| command if the resulting expansion turned out completely empty;
| and the globs with no matches would be simply removed.
| amelius wrote:
| That's not how it works in recent Ubuntu releases. If there
| is no match, the command runs with the wildcard chars not
| substituted.
|
|   # echo foo*bar
|   foo*bar
| pwg wrote:
| The action to take upon no match is configurable in recent Bash
| versions.
|
| The 'failglob' shopt option will cause an error to be generated
| if a glob matches nothing.
|
| The 'nullglob' shopt option toggles between no match expanding
| to an empty string and the traditional default of no match
| leaving the glob characters untouched.
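|
| The three behaviours can be modelled roughly as follows. This is
| a Python emulation using the standard fnmatch module, not Bash
| itself; expand_with_policy and the policy strings are hypothetical
| names chosen to mirror the shopt options described above:

```python
import fnmatch


def expand_with_policy(pattern, names, policy="literal"):
    """Expand pattern against names under one of three no-match policies.

    "literal"  - leave the pattern untouched (the traditional default)
    "nullglob" - the pattern expands to nothing
    "failglob" - raise an error instead of running the command
    """
    matches = sorted(fnmatch.filter(names, pattern))
    if matches:
        return matches
    if policy == "literal":
        return [pattern]
    if policy == "nullglob":
        return []
    if policy == "failglob":
        raise LookupError("no match: " + pattern)
    raise ValueError("unknown policy: " + policy)
```

| In Bash itself the corresponding switches are `shopt -s nullglob`
| and `shopt -s failglob`.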
| miohtama wrote:
| The linked C source file is an excellent example of ancient C,
| when it was still closer to high-level assembly:
|
| https://www.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/glob.c
| tomtomtom777 wrote:
| And when buffer overflows were (attempted to be) avoided by
| guesstimating a large enough buffer size.
| LeFantome wrote:
| I assume that you are referring to the liberal use of "goto".
| Of course, "if", "while", and even "switch" are also used.
| Quite the mix.
|
| Directly calling into system calls ("write") is interesting.
___________________________________________________________________
(page generated 2025-01-13 23:02 UTC)