Subj : Re: Windows vs Linux
To   : boraxman
From : tenser
Date : Mon Apr 25 2022 14:34:56

On 25 Apr 2022 at 01:01p, boraxman pondered and said...

bo> te> Ahh, typical ESR foolishness: that book does not have a
bo> te> great reputation for a reason. CSV obviously _is_ a
bo> te> delimited text format. Since you mention /etc/passwd,
bo> te> suppose a site wanted to put a colon in the GECOS field;
bo> te> how would one do it? Or they wanted to put arbitrary
bo> te> commas in fields (say, 'LastName, FirstName' was the
bo> te> local convention), but you still wanted compatibility
bo> te> with tools like `chfn` and `finger`?
bo> te>
bo> te> There's a reason structured data formats have become
bo> te> popular.
bo>
bo> He explains it. You use an escape character.

With /etc/passwd? Nope. That doesn't work, because the parsers are
built into a library. And even on systems where I can hack the
library, I might use something like LDAP or NIS, or even shell
scripts and rsync or rdist to copy those files to machines where I
can't hack the library for some reason. So no, that doesn't work
for /etc/passwd.

Or did you mean for delimited text formats generally? In which case,
don't CSV files support quoted strings?

bo> I've written a CSV parser, and one which is based on the delimited
bo> format. The latter is far easier.

If one is going to appeal to authority or personal experience, it's
best if one checks one's priors. I learned compilers from Al Aho,
and I've written parsers for full programming languages with
context-sensitive grammars. Some of them are Internet facing and
used daily by millions of users. So I think I speak with some
authority when I say that CSV is not significantly harder than
simple delimited lines of text, which are themselves trivial to
parse. However, neither is very extensible. Consider what happens
when one needs to add a new field.
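To make the quoted-strings point concrete, here's a minimal sketch
using Python's standard csv module (the data is invented for
illustration): a field containing the delimiter survives a round
trip because the writer quotes it, RFC 4180 style.

```python
import csv
import io

# Made-up rows: the first field contains a literal comma, the
# "LastName, FirstName" convention from the quote above.
rows = [["Lastname, Firstname", "Room 42"],
        ["plain", "fields"]]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
encoded = buf.getvalue()
# The writer emits "Lastname, Firstname" in quotes, so the embedded
# comma is data, not a delimiter.

decoded = list(csv.reader(io.StringIO(encoded)))
assert decoded == rows  # lossless round trip
```

No escape character needed by the user; the quoting is part of the
format itself, which is exactly what a bare colon-delimited file
lacks.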
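And the extensibility problem can be sketched the same way, with a
hypothetical /etc/passwd line (made-up account data; the naive
split below stands in for what a libc-style parser effectively
does):

```python
# A V7-style passwd line is purely positional: the parser bakes in
# "field 3 is always an integer and it's always the UID".
line = "alice:x:1000:1000:Alice Example,Room 42:/home/alice:/bin/sh"

name, pw, uid, gid, gecos, home, shell = line.split(":")
uid = int(uid)  # the integer-ness of field 3 is implicit, undeclared

# There is no escape mechanism: a colon anywhere in GECOS would
# shift every later field. And a consumer that unpacks exactly
# seven fields breaks the moment an eighth (say, an expiry date)
# is bolted on:
extended = line + ":2030-01-01"
try:
    name, pw, uid, gid, gecos, home, shell = extended.split(":")
except ValueError:
    pass  # too many values to unpack -- the old parser chokes
```

Which is why, as noted below, the extra fields ended up in separate
files (shadow on Linux, master.passwd on BSD) with tools to keep
them in sync, rather than in new columns of the old file.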
To go back to the /etc/passwd example: when this last happened, both
Linux and BSD had to invent a new file format that lived in a
separate file next to the legacy V7 format file, and they had to
develop specialized tools to keep these in sync.

Delimited lines of text are great because they're simple to use and
easy to get going. They work well in Unix pipelines because most
filters evolved to work best with that kind of textual data. They're
not so great because they generally don't evolve gracefully: too
much is implicit in the format itself ("field 3 is always an integer
and it's always the user ID number"). There are no universally
agreed-upon formats to represent the full range of data expressible
on modern machines. This is why schematized, structured formats are
useful, though they are harder to get started with. However, once
you start using those, informally specified things like Unix filters
start to break down because they don't understand the structured
format. This naturally led to the rise of things like PowerShell,
which attempt to fit a much richer data model into the filter
paradigm. Things like nushell, or Michael Greenberg's work on
formally specifying the shell, and smoosh, are more recent advances.

bo> CSV is OK to use from a user's POV (and if you have a parser
bo> already, better than a format only accessible to its parent
bo> application), but if you were making your own format, you
bo> wouldn't use it.

Practically every programming language in common use today has a
high-quality CSV library available.

bo> Steve Ballmer. Sheesh!

I remember that too. So what? You missed the point. Microsoft
invested heavily in the developer experience for Windows, and
developers wanted to use Windows.

bo> te> I'd put this rather differently. Unix wasn't so much designed
bo> te> as it emerged as a reaction to overly complex systems squeezed
bo> te> onto a tiny (but affordable!) machine.
bo> te> That first machine seemed promising and gave way to another
bo> te> small but affordable machine; pipes came a few years later.
bo>
bo> Perhaps, but I find it more pragmatic. Solutions born from people
bo> trying to solve problems have stood the test of time. They may not
bo> be optimal, often aren't, and you could do better if we tried
bo> again, but they are established and understood.

More pragmatic than what, exactly? The interesting thing about a
research system is that it is designed to solve problems that are
interesting at some place and some point in time. Unix is one of
those very rare systems indeed where the research interests
coincided with commercial interests in such a way that it could
_successfully_ make the jump from research to commercial
development. However, that doesn't mean that the system doesn't owe
its origins -- not to mention its major design principles -- to the
research context it was created in. The point is that Unix wasn't
designed as a pragmatic solution to production data processing
problems so much as it evolved to answer interesting research
questions. What's even more interesting is that every system since
has similarly had the benefit of that research.

To bring this back to the original point -- again -- you may prefer
Linux, but truly, there's very little in there that cannot be
implemented on just about any other base system.

--- Mystic BBS v1.12 A47 2021/12/24 (Linux/64)
 * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)