[HN Gopher] Unix shells and the current directory
       ___________________________________________________________________
        
       Unix shells and the current directory
        
       Author : ingve
       Score  : 76 points
       Date   : 2023-11-26 07:41 UTC (15 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
        
       | ChrisSD wrote:
       | > Complicating this picture is shells. For a long time, many
       | shells have kept track of a name for their current directory
       | themselves, often materializing this in the '$PWD' environment
       | variable. The shell has to keep track of this name as a text
       | string or the rough equivalent, which makes it potentially less
       | accurate than the kernel's version. However, it has some
       | advantages, because unlike the kernel, the shell knows what name
       | you typed in order to get to the directory, which may not be the
       | actual filesystem name of the directory because of things like
       | symbolic links. Shells often use this knowledge so that names
       | like '..' and even '.' work on the text version, not the
       | filesystem version.
       | 
       | Related reading: Lexical File Names in Plan 9 or Getting Dot-Dot
       | Right (https://9p.io/sys/doc/lexnames.html)
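A rough Python sketch of the "text version" described in the quote. In logical mode, a shell computes the new `$PWD` by string manipulation alone and never reads a symlink. `logical_cd` is a hypothetical helper. It follows POSIX `cd -L` only loosely: it ignores CDPATH, does no existence checks, and has no special case for a leading `//`.

```python
import posixpath


def logical_cd(pwd, arg):
    """Return the new $PWD a logical-mode shell would record for `cd arg`.

    Pure string manipulation: no symlink is read and nothing is checked on disk.
    """
    # An absolute argument replaces $PWD; a relative one is appended to it.
    candidate = posixpath.join(pwd, arg)
    # Collapse '.', '..' and doubled slashes lexically, so 'link/..' becomes
    # the directory that contains 'link', wherever 'link' actually points.
    return posixpath.normpath(candidate)


print(logical_cd("/home/u/link", ".."))  # /home/u
```

A real shell would then pass the result to chdir(). The kernel only ever sees that final string, not the route the user typed to get there.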
        
       | tyingq wrote:
       | Where it exists, there's also /proc/self/cwd, so you can do:
       | 
       |     readlink -e /proc/self/cwd
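The same check from Python, assuming a Linux-style /proc. /proc/self always refers to whichever process reads it. With readlink(1), that process is readlink itself, which inherits the shell's cwd, so the answer is the same. `kernel_cwd` is an illustrative name, and falling back to `os.getcwd()` on systems without /proc is an assumption about what you would want there.

```python
import os


def kernel_cwd():
    """Return the kernel's view of this process's current directory."""
    try:
        # Linux: /proc/self/cwd is a magic symlink to the reading process's cwd.
        return os.readlink("/proc/self/cwd")
    except OSError:
        # No /proc on this system (assumption: e.g. macOS); fall back to getcwd().
        return os.getcwd()


print(kernel_cwd() == os.getcwd())
```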
        
         | rezonant wrote:
         | While readlink is the correct way to just read the link, I
         | often just use `file ...` since it will also show the symbolic
         | link destination (on GNU at least).
        
           | didntcheck wrote:
           | ls -l also works. I often do
           | 
           |     ls -l /proc/`pidof foo`/fd
           | 
           | to see what files a taciturn process is working on
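A Python version of the `ls -l /proc/<pid>/fd` trick. This is Linux-only. `open_files` is a hypothetical helper. Reading another process's fd directory generally works only for your own processes unless you are root.

```python
import os


def open_files(pid="self"):
    """Map each open file descriptor of a process to the name the kernel reports.

    Linux-only: reads the /proc/<pid>/fd symlinks, like `ls -l /proc/<pid>/fd`.
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # Closed between listdir() and readlink(), e.g. the fd listdir() used.
            pass
    return result


for fd, target in sorted(open_files().items()):
    print(fd, "->", target)
```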
        
       | seligman99 wrote:
       | Along the same lines, in the Windows world:
       | 
       | The current directory is managed with
       | SetCurrentDirectory/GetCurrentDirectory, however the cmd.exe
       | command-line shell also stores the current directory for each
       | drive in an environment variable like "=C:", and the CRT and
       | the shell hide all environment variables that start with "=".
       | 
       | It gets mightily confused if these two concepts of current
       | directory ever diverge.
        
         | c0pium wrote:
         | Who is still using cmd.exe? I understand that there are system
         | processes that still need it, but if you see a human using cmd
         | in the year of our lord 2023, that's a cry for help!
        
           | schemescape wrote:
           | What should I be using instead?
           | 
           | I don't mind cmd.exe and it launches instantly (same reason I
           | frequently use notepad.exe for quick edits). That latter
           | quality is very hard to find :)
           | 
           | Edit: but if you meant for scripting, yeah, batch files are
           | terrible.
        
             | hiccuphippo wrote:
             | Personally I use the bash that comes with git for Windows.
             | I only need to use cmd.exe for creating symlinks since
             | mklink is a built-in.
        
             | switch007 wrote:
             | On my Windows 10 with no profile it takes 1-2 seconds
             | (Ryzen 3600/M.2/32GB RAM). Like, what is it doing? I get
             | annoyed if bash on Linux takes like 250ms.
        
           | PhilipRoman wrote:
           | It's installed everywhere on any version of Windows and works
           | fine for interactive tasks (personally I wouldn't write
           | anything but the simplest scripts for it; anything with for
           | loops is a big no-no).
        
             | vel0city wrote:
             | Powershell is installed everywhere on any version of
             | Windows that still receives security updates.
        
           | toast0 wrote:
           | I do. I don't like PowerShell (and it took me years to
           | realize it wasn't a diagnostic tool for power management),
           | and I find bash for Windows to be ill-fitting. I don't do a
           | lot of stuff in the command line on Windows, so working like
           | it has for decades is a plus.
        
       | Someone wrote:
       | Apart from history and standards, what is the reason for having
       | the path to the current directory or even the current directory
       | known to the kernel?
       | 
       | The shell already seems to track it, so presumably that logic
       | could have been part of the standard library and tracked in
       | user mode.
       | 
       | If the kernel has to track the current directory (e.g. for
       | performance reasons, to make accessing files relative to a
       | particular directory more efficient), wouldn't just remembering
       | the device ID and inode be easier for the kernel?
       | 
       | Alternatively, there could be kernel calls taking (device, inode)
       | pairs, and the kernel could be completely ignorant of the
       | 'current directory' concept.
       | 
       | That can work: apart from calling them 'directory IDs' instead of
       | 'inodes', that's what the first Mac OS hierarchical file system
       | did; paths were second-class citizens there.
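Portable POSIX has no call that opens a file by (device, inode). However, the `*at()` family (openat(2) and friends) comes close to the second idea. The caller passes a directory file descriptor, and the kernel resolves names relative to that descriptor instead of the cwd. Below is a minimal Python sketch. `os.open(..., dir_fd=...)` maps onto openat(2) where it is supported, and both helper names are illustrative.

```python
import os


def cwd_identity():
    """(st_dev, st_ino) of the current directory: the pair the comment proposes."""
    st = os.stat(".")
    return st.st_dev, st.st_ino


def read_relative(dir_path, name):
    """Read `name` relative to a directory handle, ignoring the cwd entirely."""
    dfd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # dir_fd= makes CPython call openat(2) instead of open(2).
        fd = os.open(name, os.O_RDONLY, dir_fd=dfd)
        with os.fdopen(fd, "rb") as f:  # fdopen takes ownership and closes fd
            return f.read()
    finally:
        os.close(dfd)


print(cwd_identity())
print(read_relative("/dev", "null"))  # b''
```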
        
         | cryptonector wrote:
         | The kernel doesn't have to know the current directory's path.
         | It's enough that it knows (and retains an open file reference to)
         | the current directory's inode/dnode.
         | 
         | However, if you want things like DTrace, eBPF, or even just
         | reading the /proc/PID/cwd symlink to be useful, it helps to
         | cache the actual path in the kernel. A DTrace/eBPF script will
         | not be able to loop to chase ..s, much less will it be able to
         | do the I/O needed to work out the cwd.
         | 
         | The same applies to the names of the files that each FD refers
         | to.
         | 
         | Caching these things is just for observability.
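One visible consequence of the kernel caching names can be shown in a Python sketch for Linux. If you remove a directory that is still the process's cwd, /proc/self/cwd keeps reporting the last known name with a " (deleted)" suffix, while getcwd() fails with ENOENT. `names_after_rmdir` is a hypothetical helper, and it restores the original cwd with fchdir() afterwards.

```python
import os
import tempfile


def names_after_rmdir():
    """chdir into a scratch directory, remove it, and ask two ways for its name.

    Linux-specific: relies on /proc/self/cwd.
    """
    victim = os.path.join(os.path.realpath(tempfile.mkdtemp()), "gone")
    os.mkdir(victim)
    saved = os.open(".", os.O_RDONLY)  # handle on the original cwd, for restoring
    try:
        os.chdir(victim)
        os.rmdir(victim)  # Linux allows removing a directory that is still a cwd
        proc_name = os.readlink("/proc/self/cwd")  # cached name, tagged " (deleted)"
        try:
            getcwd_name = os.getcwd()
        except FileNotFoundError:  # getcwd() reports ENOENT for an unlinked cwd
            getcwd_name = None
        return victim, proc_name, getcwd_name
    finally:
        os.fchdir(saved)
        os.close(saved)


print(names_after_rmdir())
```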
        
         | didntcheck wrote:
         | Yeah, I remember being surprised by two things when I first
         | started learning how Unix is implemented
         | 
         | * That $PATH is just an ordinary env var, that many programs
         | use by convention
         | 
         | * That CWD _isn't_, and is in fact a first-class kernel
         | concept. I had assumed that it was just a conventional envvar
         | that stdlibs prepended to relative paths, so that only absolute
         | paths were ever passed to syscalls
         | 
         | I'm sure there are good reasons why the other way wouldn't work,
         | it just amused me that I'd got it wrong in both ways
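The contrast becomes concrete when you look at how PATH lookup works. It happens entirely in user space, done by shells and libc's execvp(), so it can be sketched with nothing but string splitting and stat calls. `find_in_path` is a hypothetical, simplified stand-in for Python's own `shutil.which()`, and it skips refinements such as shells' command hash tables.

```python
import os


def find_in_path(command, search_path=None):
    """Roughly how shells and execvp() find a command: walk $PATH in user space."""
    if "/" in command:
        return command  # names containing a slash skip the PATH search entirely
    if search_path is None:
        search_path = os.environ.get("PATH", os.defpath)  # just an ordinary env var
    for directory in search_path.split(os.pathsep):
        # A zero-length entry means the current directory (a legacy POSIX rule).
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_in_path("sh"))
```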
        
         | benou wrote:
         | My guess is the kernel needs to know the current directory of a
         | process so that when said process tries to open a file without
         | an absolute path (e.g. just "file.txt" and not "/tmp/file.txt"),
         | it can open "$CWD/file.txt".
         | 
         | This must be tracked by the kernel, because not all syscalls go
         | through libc, you can issue the open syscall directly from a
         | process.
         | 
         | There might be other reasons, but I'd bet it's the main one.
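You can quickly confirm that relative names are resolved by the kernel rather than by any user-space string. Change directory, deliberately set `$PWD` to a bogus value, and then create a file through a relative path with `os.open`, which hands the name straight to the kernel. Below is a Python sketch. `create_relative` is a hypothetical helper, and the temporary directory is an assumption of the demo.

```python
import os
import tempfile


def create_relative(name):
    """chdir somewhere new, make $PWD lie, then create `name` via a relative path.

    Returns the directory the kernel resolved the relative name against.
    """
    target = os.path.realpath(tempfile.mkdtemp())
    saved_fd = os.open(".", os.O_RDONLY)
    saved_pwd = os.environ.get("PWD")
    try:
        os.chdir(target)
        os.environ["PWD"] = "/nonexistent"  # user-space bookkeeping the kernel never reads
        # os.open() passes the relative name unchanged to the kernel's open call.
        fd = os.open(name, os.O_WRONLY | os.O_CREAT, 0o644)
        os.close(fd)
        return target
    finally:
        os.fchdir(saved_fd)
        os.close(saved_fd)
        if saved_pwd is None:
            os.environ.pop("PWD", None)
        else:
            os.environ["PWD"] = saved_pwd


print(create_relative("file.txt"))
```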
        
         | toast0 wrote:
         | A reference to the current directory is needed in order to open
         | relative filenames. You could conceivably retain a string path
         | rather than a reference, but the behavior would be different
         | when directories are renamed or unlinked.
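Here is the difference toast0 describes, in a small Python sketch for Linux and other Unixes. If you rename the current directory out from under the process, relative opens keep working because the kernel holds a reference to the directory itself. The old path string, however, no longer resolves. `open_after_rename` and the `old`/`new` names are illustrative.

```python
import os
import tempfile


def open_after_rename():
    """Rename the cwd out from under the process and keep using relative names."""
    parent = os.path.realpath(tempfile.mkdtemp())
    old, new = os.path.join(parent, "old"), os.path.join(parent, "new")
    os.mkdir(old)
    with open(os.path.join(old, "f.txt"), "w") as f:
        f.write("hello")
    saved = os.open(".", os.O_RDONLY)  # handle on the original cwd, for restoring
    try:
        os.chdir(old)
        os.rename(old, new)  # the saved string `old` now names nothing
        with open("f.txt") as f:  # still works: the cwd is a reference, not a string
            data = f.read()
        stale_string_works = os.path.exists(os.path.join(old, "f.txt"))
        return data, stale_string_works, os.getcwd() == new
    finally:
        os.fchdir(saved)
        os.close(saved)


print(open_after_rename())
```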
        
         | siebenmann wrote:
         | The current directory is a long-standing Unix concept, so you'd
         | have to trace its history back quite far to hear arguments
         | about why it was there. One obvious reason is that relative
         | paths are convenient for all sorts of reasons and they require
         | a point to be relative to, which is basically 'the current
         | directory' in some form.
         | 
         | The kernel knowing the name for the current directory is not
         | specific to current directories; it is part of a general system
         | of caching the name mappings for directory entries ('dnodes' in
         | Linux, a 'name cache' in FreeBSD). Unix kernels added these
         | caches because Unix programs spend a lot of time looking up
         | names, making the operation worth optimizing in general. Once
         | you have a general name cache, you might as well pin the
         | entries for actively used entities like current directories and
         | open files so that they don't get expired out of the cache and
         | you always know (some) name for them.
         | 
         | (One useful complexity of name caches is that you can cache
         | negative entries, i.e., that a given name is not present in a
         | directory. In the modern Unix shared library environment where
         | shared libraries may be probed for in a whole collection of
         | directories every time a program starts up, I suspect this
         | saves a nice chunk of kernel CPU time.)
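Below is a toy model of the caching described above, including negative entries, applied to a loader probing a library search path. Everything here is hypothetical: `NameCache`, `find_library`, and the integer directory IDs stand in for kernel structures. A real Linux dcache or FreeBSD namecache also has to handle invalidation, eviction and locking.

```python
class NameCache:
    """Toy (directory, name) -> inode cache that also remembers misses."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup  # function (dir_id, name) -> inode or None
        self._entries = {}
        self.misses = 0  # how many times the slow path had to run

    def lookup(self, dir_id, name):
        key = (dir_id, name)
        if key not in self._entries:
            self.misses += 1
            # A None result is stored too: that is the negative entry.
            self._entries[key] = self._slow_lookup(dir_id, name)
        return self._entries[key]


def find_library(cache, search_path, name):
    """Probe each directory in order, roughly as a dynamic loader does."""
    for dir_id in search_path:
        inode = cache.lookup(dir_id, name)
        if inode is not None:
            return inode
    return None


fake_fs = {(3, "libfoo.so"): 42}  # the library only exists in directory 3
cache = NameCache(lambda d, n: fake_fs.get((d, n)))
for _ in range(2):  # two "program startups"
    find_library(cache, [0, 1, 2, 3], "libfoo.so")
print(cache.misses)  # 4: the second startup was answered entirely from cache
```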
        
       | comex wrote:
       | > Shells often use this knowledge so that names like '..' and
       | even '.' work on the text version, not the filesystem version.
       | 
       | Which has the odd result that '..' behaves differently between
       | shell builtins and normal commands. `cd ..; ls` uses the text
       | version, but `ls ..` uses the filesystem version. `cat < ../x`
       | uses the text version, but `cat ../x` uses the filesystem
       | version.
       | 
       | I like the text behavior in theory, but this inconsistency is
       | weird enough that I question the benefit of having the text
       | behavior at all.
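comex's inconsistency can be reproduced in Python on a scratch tree. The names `real`, `sub`, `sibling` and `link` are made up for the demo. From inside a directory reached through a symlink, the shell's `cd ..` trims `$PWD` as text. By contrast, `ls ..` passes `..` to the kernel, which follows the directory's real parent.

```python
import os
import tempfile


def dotdot_two_ways():
    """From inside a symlinked directory, list '..' the shell way and the kernel way.

    Scratch tree: base/real/sub/, base/real/sibling, base/link -> real/sub
    """
    base = os.path.realpath(tempfile.mkdtemp())
    os.makedirs(os.path.join(base, "real", "sub"))
    open(os.path.join(base, "real", "sibling"), "w").close()
    os.symlink(os.path.join("real", "sub"), os.path.join(base, "link"))

    saved = os.open(".", os.O_RDONLY)  # handle on the original cwd, for restoring
    try:
        pwd = os.path.join(base, "link")  # what the shell would record in $PWD
        os.chdir(pwd)
        # `cd ..; ls`: the shell trims $PWD as text, then lists that directory.
        shell_view = sorted(os.listdir(os.path.normpath(os.path.join(pwd, ".."))))
        # `ls ..`: ls hands '..' to the kernel, which uses the real parent.
        kernel_view = sorted(os.listdir(".."))
        return shell_view, kernel_view
    finally:
        os.fchdir(saved)
        os.close(saved)


print(dotdot_two_ways())  # (['link', 'real'], ['sibling', 'sub'])
```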
        
       | 1vuio0pswjnm7 wrote:
       | "Sometimes people then write shell scripts and other code that
       | assumes '$PWD' is accurate if it's present, which is not
       | necessarily true."
       | 
       | Unless one only wants the current directory name at the start of
       | a script, why not just use the builtin pwd command, $(pwd)? Or
       | getcwd() if it's "other code".
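Sometimes code does want to honor `$PWD`, for example to keep symlinked names. A cautious approach is to trust it only after checking it, roughly along the lines POSIX sets out for `pwd -L`. The value must be absolute, free of `.` and `..` components, and name the same directory as `.`. Below is a Python sketch. `current_dir` is a hypothetical helper, not a standard API.

```python
import os


def current_dir():
    """Return $PWD only if it still names the current directory, else getcwd()."""
    pwd = os.environ.get("PWD", "")
    # Roughly the POSIX `pwd -L` conditions: absolute, no '.' or '..' components...
    if pwd.startswith("/") and not any(p in (".", "..") for p in pwd.split("/")):
        try:
            # ...and it must actually refer to the current directory right now.
            if os.path.samefile(pwd, "."):
                return pwd
        except OSError:
            pass  # $PWD names something that no longer exists (or never did)
    return os.getcwd()


print(current_dir())
```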
        
       ___________________________________________________________________
       (page generated 2023-11-26 23:01 UTC)