[HN Gopher] Unix shells and the current directory
___________________________________________________________________
Unix shells and the current directory
Author : ingve
Score : 76 points
Date : 2023-11-26 07:41 UTC (15 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| ChrisSD wrote:
| > Complicating this picture is shells. For a long time, many
| shells have kept track of a name for their current directory
| themselves, often materializing this in the '$PWD' environment
| variable. The shell has to keep track of this name as a text
| string or the rough equivalent, which makes it potentially less
| accurate than the kernel's version. However, it has some
| advantages, because unlike the kernel, the shell knows what name
| you typed in order to get to the directory, which may not be the
| actual filesystem name of the directory because of things like
| symbolic links. Shells often use this knowledge so that names
| like '..' and even '.' work on the text version, not the
| filesystem version.
|
| Related reading: Lexical File Names in Plan 9 or Getting Dot-Dot
| Right (https://9p.io/sys/doc/lexnames.html)
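To make the quoted distinction concrete, here is a minimal Python sketch, not taken from the article. It enters a directory through a symbolic link and compares the name a shell would typically remember with the name the kernel reports. The helper names `names_after_cd` and `demo` are hypothetical, and the "logical" name only approximates how shells are described as maintaining $PWD.

```python
import os
import tempfile


def names_after_cd(typed_path):
    """Change into typed_path and return (logical_name, kernel_name).

    logical_name approximates what a shell stores in $PWD: the old
    directory joined with the text the user typed, normalised as a string.
    kernel_name is what getcwd() reports after the chdir.
    """
    saved = os.getcwd()
    logical = os.path.normpath(os.path.join(saved, typed_path))
    try:
        os.chdir(typed_path)
        kernel = os.getcwd()
    finally:
        os.chdir(saved)  # leave the caller's working directory untouched
    return logical, kernel


def demo():
    """Enter a directory through a symlink and compare the two names."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)  # the temp dir may itself sit behind a symlink
        os.mkdir(os.path.join(root, "real"))
        os.symlink(os.path.join(root, "real"), os.path.join(root, "link"))
        logical, kernel = names_after_cd(os.path.join(root, "link"))
        return os.path.basename(logical), os.path.basename(kernel)
```

Calling `demo()` should show the shell-style name ending in `link` while the kernel's name ends in `real`.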
| tyingq wrote:
| Where it exists, there's also /proc/self/cwd, so you can do:
| readlink -e /proc/self/cwd
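The same lookup can be sketched from a program. This assumes a Linux-style /proc; the function name `kernel_cwd_via_proc` is hypothetical, and it returns None where /proc/PID/cwd is not available or not readable.

```python
import os


def kernel_cwd_via_proc(pid="self"):
    """Return the target of /proc/<pid>/cwd, or None if it cannot be read."""
    link = f"/proc/{pid}/cwd"
    try:
        return os.readlink(link)
    except OSError:
        # No /proc on this system, no such process, or no permission.
        return None
```

For the calling process, the result is expected to match `os.getcwd()` as long as the directory still exists.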
| rezonant wrote:
| While readlink is the correct way to just read the link, I
| often just use `file ...` since it will also show the symbolic
| link destination (on GNU at least).
| didntcheck wrote:
| ls -l also works. I often do ls -l
| /proc/`pidof foo`/fd
|
| to see what files a taciturn process is working on
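A rough programmatic equivalent of that `ls -l /proc/PID/fd` trick, again assuming a Linux-style /proc. The function name `open_files` is hypothetical.

```python
import os


def open_files(pid="self"):
    """Map each open file descriptor of a process to its link target.

    Returns an empty dict if /proc/<pid>/fd cannot be listed.
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    try:
        entries = os.listdir(fd_dir)
    except OSError:
        return result
    for name in entries:
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
    return result
```

Targets are not always paths: sockets and pipes show up as strings such as `socket:[...]`.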
| seligman99 wrote:
| Along the same lines, in the Windows world:
|
| The current directory is managed with
| SetCurrentDirectory/GetCurrentDirectory, however the cmd.exe
| command-line shell also stores the current directory for each
| drive in an environment variable like "=C:", and the CRT and
| shell hide all environment variables that start with a "=".
|
| It gets mightily confused if these two concepts of current
| directory ever diverge.
| c0pium wrote:
| Who is still using cmd.exe? I understand that there are system
| processes that still need it, but if you see a human using cmd
| in the year of our lord 2023, that's a cry for help!
| schemescape wrote:
| What should I be using instead?
|
| I don't mind cmd.exe and it launches instantly (same reason I
| frequently use notepad.exe for quick edits). That latter
| quality is very hard to find :)
|
| Edit: but if you meant for scripting, yeah, batch files are
| terrible.
| hiccuphippo wrote:
| Personally I use the bash that comes with git for Windows.
| I only need to use cmd.exe for creating symlinks since
| mklink is a built-in.
| switch007 wrote:
| On my Windows 10 with no profile it takes 1-2 seconds
| (Ryzen 3600/M2/32GB RAM). Like, what is it doing? I get
| annoyed if bash on Linux takes like 250ms.
| PhilipRoman wrote:
| It's installed everywhere on any version of windows and works
| fine for interactive tasks (personally I wouldn't write
| anything but the simplest scripts for it, anything with for
| loops is a big no-no)
| vel0city wrote:
| Powershell is installed everywhere on any version of
| Windows that still receives security updates.
| toast0 wrote:
| I do. I don't like PowerShell (and it took me years to
| realize it wasn't a diagnostic tool for power management),
| and I find bash for Windows to be ill fitting. I don't do a
| lot of stuff in the command line on Windows, so working like
| it has for decades is a plus.
| Someone wrote:
| Apart from history and standards, what is the reason for having
| the path to the current directory or even the current directory
| known to the kernel?
|
| The shell already seems to track it, so presumably, that logic
| could have been part of the standard library, and get tracked
| from user-mode.
|
| If the kernel has to track the current directory (e.g. for
| performance reasons, to make accessing files relative to a
| particular directory more efficient), wouldn't just remembering
| the device ID and inode be easier for the kernel?
|
| Alternatively, there could be kernel calls taking (device, inode)
| pairs, and the kernel could be completely ignorant of the
| 'current directory' concept.
|
| That can work; except for naming them 'directory ID' instead of
| 'inode', that's what the first Mac OS hierarchical file system
| did; paths were second-class citizens here.
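As a hedged illustration of identifying a directory without a path, the Python sketch below reads a directory's (device, inode) pair and then opens a file relative to an open directory descriptor instead of a path string. The second part uses the openat-style `dir_fd` parameter, which is the closest widely available Unix interface; it is not the Mac OS API mentioned above. The helper names are hypothetical.

```python
import os
import tempfile


def directory_identity(path="."):
    """Return the (device, inode) pair that identifies a directory."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def read_relative_to(dir_path, name):
    """Read a file by name relative to an open directory descriptor."""
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # From here on, the lookup does not use dir_path as a string.
        fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
        with os.fdopen(fd, "rb") as f:
            return f.read()
    finally:
        os.close(dir_fd)


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "file.txt"), "wb") as f:
            f.write(b"hello")
        same = directory_identity(root) == directory_identity(os.path.join(root, "."))
        return same, read_relative_to(root, "file.txt")
```

Two different spellings of the same directory yield the same pair, which is why such a pair can stand in for a name.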
| cryptonector wrote:
| The kernel doesn't have to know the current directory's path.
| It's enough that it know -and retain an open file reference to-
| the current directory's inode/dnode.
|
| However, if you want things like DTrace, eBPF, or even just
| reading the /proc/PID/cwd symlink to be useful, it helps to
| cache the actual path in the kernel. A DTrace/eBPF script will
| not be able to loop to chase ..s, much less will it be able to
| do the I/O needed to work out the cwd.
|
| The same applies to the names of the files that each FD refers
| to.
|
| Caching these things is just for observability.
| didntcheck wrote:
| Yeah, I remember being surprised by two things when I first
| started learning about Unix implementation
|
| * That $PATH is just an ordinary env var, that many programs
| use by convention
|
| * That CWD _isn't_, and is in fact a first-class kernel
| concept. I had assumed that it was just a conventional envvar
| that stdlibs prepended before passing absolute paths to
| syscalls
|
| I'm sure there's good reasons why the other way wouldn't work,
| it just amused me that I'd got it wrong in both ways
| benou wrote:
| My guess is the kernel needs to know the current directory of a
| process so that when said process tries to open a file without
| an absolute path (eg. just "file.txt" and not "/tmp/file.txt"),
| it can open "$CWD/file.txt".
|
| This must be tracked by the kernel, because not all syscalls go
| through libc; you can issue the open syscall directly from a
| process.
|
| There might be other reasons, but I'd bet it's the main one.
| toast0 wrote:
| A reference to the current directory is needed in order to open
| relative filenames. You could conceivably retain a string path
| rather than a reference, but the behavior would be different
| when directories are renamed or unlinked.
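The difference between keeping a reference and keeping a string can be sketched in Python as follows. This is a small experiment under stated assumptions, not a description of any particular kernel: it assumes a Unix-like system where a directory can be renamed while it is some process's current directory. The name `demo` is hypothetical.

```python
import os
import tempfile


def demo():
    """Rename the current directory, then compare reference and string."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        old, new = os.path.join(root, "old"), os.path.join(root, "new")
        os.mkdir(old)
        with open(os.path.join(old, "file.txt"), "w") as f:
            f.write("hello")
        saved = os.getcwd()
        os.chdir(old)
        remembered = os.getcwd()  # a string snapshot, like a naive $PWD
        try:
            os.rename(old, new)
            with open("file.txt") as f:  # relative open, resolved by the kernel
                via_reference = f.read()
            string_still_works = os.path.exists(
                os.path.join(remembered, "file.txt"))
            kernel_name = os.path.basename(os.getcwd())
        finally:
            os.chdir(saved)
        return via_reference, string_still_works, kernel_name
```

The relative open keeps working after the rename, while the remembered string no longer leads to the file. On Linux the third value is expected to be the new directory name, since the kernel's name follows the rename.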
| siebenmann wrote:
| The current directory is a long-standing Unix concept, so you'd
| have to trace its history back quite far to hear arguments
| about why it was there. One obvious reason is that relative
| paths are convenient for all sorts of reasons and they require
| a point to be relative to, which is basically 'the current
| directory' in some form.
|
| The kernel knowing the name for the current directory is not
| specific to current directories; it is part of a general system
| of caching the name mappings for directory entries ('dentries' in
| Linux, a 'name cache' in FreeBSD). Unix kernels added these
| caches because Unix programs spend a lot of time looking up
| names, making the operation worth optimizing in general. Once
| you have a general name cache, you might as well pin the
| entries for actively used entities like current directories and
| open files so that they don't get expired out of the cache and
| you always know (some) name for them.
|
| (One useful complexity of name caches is that you can cache
| negative entries, ie that a given name is not present in a
| directory. In the modern Unix shared library environment where
| shared libraries may be probed for in a whole collection of
| directories every time a program starts up, I suspect this
| saves a nice chunk of kernel CPU time.)
| comex wrote:
| > Shells often use this knowledge so that names like '..' and
| even '.' work on the text version, not the filesystem version.
|
| Which has the odd result that '..' behaves differently between
| shell builtins and normal commands. `cd ..; ls` uses the text
| version, but `ls ..` uses the filesystem version. `cat < ../x`
| uses the text version, but `cat ../x` uses the filesystem
| version.
|
| I like the text behavior in theory, but this inconsistency is
| weird enough that I question the benefit of having the text
| behavior at all.
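The two readings of '..' can be compared directly. This is a minimal Python sketch; the helper names are hypothetical, and `lexical_parent` only approximates what a shell builtin is described as doing above.

```python
import os
import tempfile


def lexical_parent(path):
    """Resolve '..' on the text of the path (the `cd ..` behaviour)."""
    return os.path.normpath(os.path.join(path, ".."))


def physical_parent(path):
    """Resolve '..' through the filesystem (the `ls ..` behaviour)."""
    return os.path.realpath(os.path.join(path, ".."))


def demo():
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        os.makedirs(os.path.join(root, "real", "deep"))
        # root/link points at root/real/deep
        os.symlink(os.path.join(root, "real", "deep"), os.path.join(root, "link"))
        via_link = os.path.join(root, "link")
        return (os.path.relpath(lexical_parent(via_link), root),
                os.path.relpath(physical_parent(via_link), root))
```

Starting from `root/link`, the text version of '..' lands on `root` itself, while the filesystem version lands on `root/real`.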
| 1vuio0pswjnm7 wrote:
| "Sometimes people then write shell scripts and other code that
| assumes '$PWD' is accurate if it's present, which is not
| necessarily true."
|
| Unless one only wants the current directory name at the start of
| a script, why not just use the builtin pwd command, $(pwd)? Or
| getcwd() if it's "other code".
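For "other code" that would still like the friendlier $PWD name when it is accurate, one option is to verify it before trusting it. The Python sketch below follows the rule POSIX gives for `pwd -L`: accept $PWD only if it is an absolute path with no '.' or '..' components that names the current directory. The function name `trusted_pwd` and its `environ` parameter are hypothetical.

```python
import os
import tempfile


def trusted_pwd(environ=None):
    """Return $PWD if it really names the current directory, else getcwd()."""
    env = os.environ if environ is None else environ
    candidate = env.get("PWD")
    if candidate and os.path.isabs(candidate):
        has_dots = any(part in (".", "..") for part in candidate.split("/"))
        try:
            # samefile compares device and inode, not the strings.
            if not has_dots and os.path.samefile(candidate, "."):
                return candidate
        except OSError:
            pass  # stale or unreadable $PWD: fall through to the kernel
    return os.getcwd()


def demo():
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        real, link = os.path.join(root, "real"), os.path.join(root, "link")
        os.mkdir(real)
        os.symlink(real, link)
        saved = os.getcwd()
        os.chdir(real)
        try:
            accepted = trusted_pwd({"PWD": link}) == link
            rejected = trusted_pwd({"PWD": root}) == real
            missing = trusted_pwd({}) == real
        finally:
            os.chdir(saved)
        return accepted, rejected, missing
```

A $PWD that reaches the directory through a symlink is accepted, while a stale or missing one falls back to the kernel's answer.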
___________________________________________________________________
(page generated 2023-11-26 23:01 UTC)