[HN Gopher] Earth-1217: where every CSV file has two header rows...
       ___________________________________________________________________
        
       Earth-1217: where every CSV file has two header rows: column names
       and units
        
       Author : edward
       Score  : 32 points
       Date   : 2024-02-18 20:30 UTC (2 hours ago)
        
 (HTM) web link (hachyderm.io)
 (TXT) w3m dump (hachyderm.io)
        
       | justsomehnguy wrote:
       | > where every CSV file has two header rows
       | 
       |  _sigh_
       | 
       | Why do people always try to make it _harder_?
       | Name,string;Qty,units;Weight,kg;Volume,m^3;Just for the sake of
       | it with spaces,string;No don't use comma in the header names - it
       | doesn't make sense,string;
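
The single-row header proposed above can be split with a few lines of Python. This is only a sketch of one possible reading of the comment: it assumes `;` separates columns, `,` separates a column name from its unit, and a trailing `;` is allowed. The function name `parse_unit_header` is hypothetical.

```python
def parse_unit_header(line):
    """Split 'Name,string;Qty,units;' into a list of (name, unit) pairs.

    Assumed layout (taken from the comment, not from any standard):
    ';' separates columns, ',' separates the name from the unit.
    """
    columns = []
    for cell in line.strip().split(";"):
        if not cell:
            # Tolerate the trailing ';' seen in the comment's example.
            continue
        # partition leaves the unit empty when a cell has no comma.
        name, _, unit = cell.partition(",")
        columns.append((name.strip(), unit.strip()))
    return columns


header = "Name,string;Qty,units;Weight,kg;Volume,m^3;Just for the sake of it with spaces,string;"
columns = parse_unit_header(header)
for name, unit in columns:
    print(f"{name!r} is measured in {unit!r}")
```

Header names containing a comma would break this split, which is exactly the restriction the comment asks for.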
        
         | readthenotes1 wrote:
         | I think it has more to do with Excel properly recognizing the
         | data type rather than humans doing so. It has been too long
         | since I dealt with it to recall the troubles but I know there
         | are things that should be easy that aren't since you can't tell
         | Excel how to interpret the data.
        
           | Kluggy wrote:
           | In the fictional world where excel understands the second
           | header row, it could just as easily understand a unit
           | delimiter and units in a single header row.
           | 
           | Either would make my day as a programmer though.
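
For comparison, the two-header-row layout from the title is also easy to read with the standard `csv` module. A minimal sketch, assuming the first row holds column names and the second holds units; the function name `read_with_units` and the sample data are made up for illustration.

```python
import csv
import io


def read_with_units(text):
    """Return (units, rows) for CSV text whose first two rows are headers.

    units maps column name -> unit string; rows is a list of dicts.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    names = next(reader)  # first header row: column names
    units = next(reader)  # second header row: units
    if len(names) != len(units):
        raise ValueError("header rows differ in length")
    rows = [dict(zip(names, record)) for record in reader]
    return dict(zip(names, units)), rows


# Hypothetical sample data.
sample = "Name,Qty,Weight\nstring,units,kg\nbolt,12,0.5\nnut,40,0.1\n"
units, rows = read_with_units(sample)
print(units)
print(rows)
```

A reader that does not know about the second header row would treat `string,units,kg` as the first data record, which is the compatibility cost discussed in this thread.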
        
         | 2nenaoki wrote:
         | SSV?
        
         | bobbylarrybobby wrote:
         | Why not do it the other way? Name;string,Qty;units so that if
         | you just double click to open in excel, or use a naive reader
         | that doesn't understand the format (which most won't), the
         | column names are basically correct?
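
The difference between a naive reader and a unit-aware reader under this variant can be sketched as follows. The assumptions are that `,` stays the field delimiter and `;` separates name from unit inside each header cell; `read_aware` and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample in the "Name;string,Qty;units" style.
sample = "Name;string,Qty;units,Weight;kg\nbolt,12,0.5\n"

# A naive reader still gets usable, if slightly noisy, column names.
naive_rows = list(csv.DictReader(io.StringIO(sample, newline="")))
print(naive_rows[0])


def read_aware(text):
    """Split each header cell on ';' into a name and a unit."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    mapping = {}
    for field in reader.fieldnames:
        name, _, unit = field.partition(";")
        mapping[field] = (name, unit)
    units = {name: unit for name, unit in mapping.values()}
    rows = [
        {mapping[key][0]: value for key, value in record.items()}
        for record in reader
    ]
    return units, rows


units, rows = read_aware(sample)
print(units)
print(rows)
```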
        
         | secondcoming wrote:
         | > No don't use comma in the header names
         | 
         | You just know that someone somewhere will do just that and
         | you'll have to deal with it as best you can.
        
       | demondemidi wrote:
       | There's a simpler answer: if you need something more
       | sophisticated than CSV, then don't use CSV.
       | 
       | This attitude in young programmers kills me. It's the old "let's
       | make one tool that does everything, and I (re)invented it!"
       | attitude they get after a few years on the job.
        
         | elzbardico wrote:
         | I am old too, but if anything I think that young programmers
         | nowadays are not nearly as iconoclastic and rebellious as they
         | used to be. And I do miss that.
        
           | demondemidi wrote:
           | That's because of the Cambrian explosion in what being a
           | "programmer" means today. There's no time to learn all the
           | history anymore. At the start of the PC era it was possible
           | to understand every microprocessor architecture on the market
           | and how to program it, and programming languages for PC
           | applications were sparse (assembly was pretty much the only
           | viable choice, although some profitable applications were
           | written in Pascal). I may be off base here, but I think 40
           | years ago the majority of programmers knew 80-90% of the
           | history, and you just can't do that today unless your career
           | is "historian".
        
             | nullserver wrote:
             | These days it's a massive challenge to stay on top of
             | JavaScript and its frameworks. Never mind anything else.
        
         | croes wrote:
         | Isn't it the opposite?
         | 
         | Instead of making something completely new that does
         | everything, extend the well-known old format with some useful
         | features?
        
           | demondemidi wrote:
           | My $0.02: For something as deeply embedded and controversial
           | as CSV this won't work. I say this because I've been watching
           | the battle since the 1990s. CSV has a lot of problems, but it
           | is only an issue if you try to make it do too much. For a
           | current example look at JSON. The original BNF grammar fit on
           | the back of a business card. It has major issues too. Yet it
           | hasn't successfully been replaced or augmented without
           | creating even worse problems, like compatibility. Putting
           | lipstick on a pig, like this suggestion for CSV, just creates
           | incompatibility and even more headaches.
        
         | Klonoar wrote:
         | No programmer I know _wants_ to use CSV. It's often a
         | requirement foisted on them by another team or entity that they
         | just have to work around.
        
           | IshKebab wrote:
           | Right. And that other team should not use CSV. Adding
           | multiple column headers isn't any more of a solution because
           | it also requires that other team to fix their shit.
        
             | anon84873628 wrote:
             | In other words it's a classic multi-agent coordination
             | problem (aka network effects), and "why doesn't one side
             | unilaterally start acting differently" is a question that
             | game theory answers well.
        
       | SiempreViernes wrote:
       | Imagine their face when they discover they actually live on
       | Earth-1217: https://docs.astropy.org/en/stable/io/ascii/ecsv.html
        
       | malkia wrote:
       | I suggest listening to this
       | https://www.youtube.com/watch?v=OBmNu6lQU2o while reading :)
        
       | Animats wrote:
       | Column headers are a known issue in research data management.[1]
       | Research projects tend to generate large amounts of data from
       | instruments. The raw data is often archived. Years or decades
       | later, someone may find something important in that expensively
       | collected data.
       | 
       | [1] https://libguides.d.umn.edu/c.php?g=307731&p=2176534
        
       | csrtraveler wrote:
       | Only Microsoft can fix this; no enhanced CSV proposal will matter
       | until it's baked into Excel. TSV is marginally better, but not
       | enough.
       | 
       | CSV is the universal file format, and will be forever; nothing
       | else is ever as readable and as portable. You can't easily get
       | JSON out of an Excel sheet, or easily read it in any text editor.
       | 
       | CSV needs just a few tweaks to make it easier to avoid comma,
       | double-quote, and newline issues that suck a bazillion dollars of
       | productivity out of the economy each year with broken data
       | imports.
       | 
       | This is probably the lowest hanging fruit in all of technology: a
       | small change to Excel to dramatically improve data processing /
       | import-export for all humanity.
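
The comma, double-quote, and newline issues mentioned above come down to quoting. As a small illustration (not a claim about how Excel behaves), Python's standard `csv` module round-trips such values when both writer and reader follow the same quoting rules, while a plain split on commas does not. The sample values are made up.

```python
import csv
import io

# Made-up values containing a comma, double quotes, and a newline.
rows = [
    ["name", "note"],
    ["widget, large", 'said "ok"\nthen left'],
]

# Write with the default dialect: fields are quoted only when needed,
# and embedded double quotes are doubled.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()
print(encoded)

# A quoting-aware reader recovers the original values exactly.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

# A naive line split followed by split(",") miscounts the fields.
naive_fields = encoded.splitlines()[1].split(",")
print(len(decoded[1]), "fields when parsed,", len(naive_fields), "when split naively")
```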
        
         | no_wizard wrote:
         | I agree. Excel being so dominant in its market segments makes
         | data interop harder for no other reason than they use
         | proprietary default formats and don't quite support CSV in a
         | conformant way
         | 
         | Not quite related, but I once saw a proposal that file formats
         | should be regulated to require documented interoperability.
         | The argument, in a nutshell, is that this change would foster
         | a ton of competition and evolve standards across the industry,
         | to the benefit of all.
         | 
         | If XLSX was documented for interop it would also go a long way
         | in data independence
        
           | zokier wrote:
           | You mean like ECMA-376, ISO/IEC 29500? See also
           | https://learn.microsoft.com/en-us/openspecs/main/ms-openspec...
           | 
           | A historical note:
           | https://www.computerworld.com/article/2526864/microsoft-offe...
        
         | secondcoming wrote:
         | How can they fix anything if other software isn't written to
         | obey, or expect, headers?
        
           | anon84873628 wrote:
           | Instead of counting the number of distinct software
           | products, look at the number of physical "integration
           | operations" that happen every day.
           | 
           | A good portion of those (whether it is "the majority", who
           | can say) are actually between Excel and itself, e.g. two
           | people in a company sending files to each other.
           | 
           | This is a multi-agent coordination problem. If Excel (the
           | dominant market participant) makes a change because it is
           | helpful to their own users anyway, that alters the incentive
           | calculus for the other software products too. Suddenly there
           | is a large enough market / population of users / number of
           | integration events where making an investment in the
           | enhancement makes sense.
        
             | politelemon wrote:
             | I agree it makes a lot of sense, and to help with the
             | "marketing" they could even give it a slightly distinct but
             | easy name to indicate it's a CSV flavour. Having something
             | repeatable verbally makes it easier in conversations. I've
             | sent you an Earthy CSV. Oh you can use x, it supports
             | earthy.
        
       | zokier wrote:
       | Or just use something like parquet that has both typed columns
       | and allows arbitrary metadata for columns:
       | https://mungingdata.com/pyarrow/arbitrary-metadata-parquet-t...
        
         | anon84873628 wrote:
         | Well, sending data between machines isn't the problem.
         | Presumably the reason for choosing CSV in the first place is to
         | make it editable by humans in a basic text editor. People get
         | into trouble when they forget that or try to use it for other
         | use cases instead.
        
       | nicoburns wrote:
       | What I really want is a tabular format that enables formatting to
       | be interpolated between sections of data with a clean separation
       | of structured data and formatted human-readable content. It would
       | need a new editor of course, but I reckon it could be popular if
       | that editor was good.
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:00 UTC)