[HN Gopher] Earth-1217: where every CSV file has two header rows...
___________________________________________________________________
Earth-1217: where every CSV file has two header rows: column names
and units
Author : edward
Score : 32 points
Date : 2024-02-18 20:30 UTC (2 hours ago)
(HTM) web link (hachyderm.io)
(TXT) w3m dump (hachyderm.io)
| justsomehnguy wrote:
| > where every CSV file has two header rows
|
| _sigh_
|
| Why do people always try to make it _harder_?
| Name,string;Qty,units;Weight,kg;Volume,m^3;Just for the sake of
| it with spaces,string;No don't use comma in the header names - it
| doesn't make sense,string;
| readthenotes1 wrote:
| I think it has more to do with Excel properly recognizing the
| data type rather than humans doing so. It has been too long
| since I dealt with it to recall the troubles but I know there
| are things that should be easy that aren't since you can't tell
| Excel how to interpret the data.
| Kluggy wrote:
| In the fictional world where excel understands the second
| header row, it could just as easily understand a unit
| delimiter and units in a single header row.
|
| Either would make my day as a programmer though.
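For readers outside Earth-1217, here is a minimal sketch of what consuming the two-header-row layout could look like with Python's standard `csv` module. The function name `read_with_units` and the sample data are made up for illustration; nothing in the thread specifies an actual format beyond "column names, then units".

```python
import csv
import io


def read_with_units(text):
    """Parse CSV text whose first row holds column names and whose
    second row holds units (hypothetical layout, per the thread title)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    names = next(reader)   # header row 1: column names
    units = next(reader)   # header row 2: units
    if len(names) != len(units):
        raise ValueError("name row and unit row differ in length")
    rows = [dict(zip(names, row)) for row in reader]
    return dict(zip(names, units)), rows


# Illustrative sample data, not taken from any real file.
sample = "Name,Qty,Weight\nstring,units,kg\nbolt,12,0.5\n"
units, rows = read_with_units(sample)
print(units)
print(rows)
```

A reader that is unaware of the convention would treat the units row as the first data record, which is the compatibility worry raised elsewhere in the thread.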
| 2nenaoki wrote:
| SSV?
| bobbylarrybobby wrote:
| Why not do it the other way? Name;string,Qty;units so that if
| you just double click to open in excel, or use a naive reader
| that doesn't understand the format (which most won't), the
| column names are basically correct?
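A rough sketch of the single-row variant suggested above, where commas still separate columns and a semicolon separates each name from its unit. The helper names `split_header` and `read_name_unit_header` are hypothetical, and the sketch assumes column names never contain a semicolon.

```python
import csv
import io


def split_header(cell, sep=";"):
    """Split 'Name;unit' into (name, unit); a cell without sep gets an empty unit."""
    name, _, unit = cell.partition(sep)
    return name.strip(), unit.strip()


def read_name_unit_header(text):
    """Parse CSV text whose single header row holds 'name;unit' cells."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = [split_header(cell) for cell in next(reader)]
    names = [name for name, _ in header]
    units = dict(header)
    rows = [dict(zip(names, row)) for row in reader]
    return units, rows


# Illustrative sample data in the "Name;string,Qty;units" style.
sample = "Name;string,Qty;units,Weight;kg\nbolt,12,0.5\n"
units, rows = read_name_unit_header(sample)
print(units)
print(rows)
```

A naive reader that ignores the convention still gets one header row, with column names like `Qty;units`, which is the "basically correct" fallback the comment describes.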
| secondcoming wrote:
| > No don't use comma in the header names
|
| You just know that someone somewhere will do just that and
| you'll have to deal with it as best you can.
| demondemidi wrote:
| There's a simpler answer: if you need something more
| sophisticated than CSV, then don't use CSV.
|
| This attitude in young programmers kills me. It's the old "let's
| make one tool that does everything, and i (re)invented it!"
| attitude they get after a few years on the job.
| elzbardico wrote:
| I am old too, but if anything I think that young programmers
| nowadays are not nearly the iconoclasts and rebels they used
| to be. And I do miss that.
| demondemidi wrote:
| that's because of the cambrian explosion of what being a
| "programmer" means today. there's no time to learn all the
| history anymore. at the start of the PC era it was possible
| to understand every microprocessor architecture on the market
| and how to program them and programming languages for PC
| applications were sparse (assembly was pretty much the only
| viable choice although some profitable applications were
| written in Pascal). i may be off base here but i think 40
| years ago the majority of programmers knew 80-90% of the
| history and you just can't do that today unless your career
| is "historian".
| nullserver wrote:
| These days it's a massive challenge to stay on top of
| JavaScript and its frameworks. Never mind anything else.
| croes wrote:
| Isn't it the opposite?
|
| Instead of making something completely new that does
| everything, extend the well-known old format with some useful
| features?
| demondemidi wrote:
| My $0.02: For something as deeply embedded and controversial
| as CSV this won't work. I say this because I've been watching
| the battle since the 1990s. CSV has a lot of problems, but it
| is only an issue if you try to make it do too much. For a
| current example look at JSON. The original BNF grammar fit on
| the back of a business card. It has major issues too. Yet it
| hasn't successfully been replaced or augmented without
| creating even worse problems, like compatibility. Putting
| lipstick on a pig, like this suggestion for CSV, just creates
| incompatibility and even more headaches.
| Klonoar wrote:
| No programmer I know _wants_ to use CSV. It's often a
| requirement foisted on them by another team or entity that they
| just have to work around.
| IshKebab wrote:
| Right. And that other team should not use CSV. Adding
| multiple column headers isn't any more of a solution because
| it also requires that other team to fix their shit.
| anon84873628 wrote:
| In other words it's a classic multi-agent coordination
| problem (aka network effects), and "why doesn't one side
| unilaterally start acting differently" is a question well
| answered by game theory.
| SiempreViernes wrote:
| Imagine their face when they discover they actually live on
| Earth-1217: https://docs.astropy.org/en/stable/io/ascii/ecsv.html
| malkia wrote:
| I suggest listening to this
| https://www.youtube.com/watch?v=OBmNu6lQU2o while reading :)
| Animats wrote:
| Column headers are a known issue in research data management.[1]
| Research projects tend to generate large amounts of data from
| instruments. The raw data is often archived. Years or decades
| later, someone may find something important in that expensively
| collected data.
|
| [1] https://libguides.d.umn.edu/c.php?g=307731&p=2176534
| csrtraveler wrote:
| Only Microsoft can fix this; no enhanced CSV proposal will matter
| until it's baked into Excel. TSV is marginally better, but not
| enough.
|
| CSV is the universal file format, and will be forever; nothing
| else is ever as readable and as portable. You can't easily get
| JSON out of an Excel sheet, or easily read it in any text editor.
|
| CSV needs just a few tweaks to make it easier to avoid comma,
| double-quote, and newline issues that suck a bazillion dollars of
| productivity out of the economy each year with broken data
| imports.
|
| This is probably the lowest hanging fruit in all of technology: a
| small change to Excel to dramatically improve data processing /
| import-export for all humanity.
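To make the comma, double-quote, and newline issues concrete, here is a small sketch using Python's standard `csv` module. The sample row is invented for illustration. It shows the quoting rules round-tripping awkward values, and how naive splitting on commas or line breaks miscounts fields and records.

```python
import csv
import io

# Illustrative row containing a comma, double quotes, and an embedded newline.
row = ['Widget, large', 'He said "hi"', 'line one\nline two']

# Write with the csv module's default (minimal) quoting.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# A quote-aware reader recovers the original three fields.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))

# Naive approaches break: splitting on commas yields an extra field,
# and splitting on line breaks yields two "records" instead of one.
naive_fields = encoded.split(",")
naive_lines = encoded.splitlines()

print(repr(encoded))
print(decoded == row)
print(len(naive_fields), len(naive_lines))
```

Broken imports often come from tools on one side doing the naive version while the other side does the quote-aware one.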
| no_wizard wrote:
| I agree. Excel being so dominant in its market segments makes
| data interop harder for no other reason than they use
| proprietary default formats and don't quite support CSV in a
| conformant way.
|
| Not quite related, but I once saw a proposal that file formats
| should be regulated to require documented interoperability.
| The argument, in a nutshell, is that this would foster a ton
| of competition and push the industry to evolve standards, to
| the benefit of all.
|
| If XLSX were documented for interop, it would also go a long
| way toward data independence.
| zokier wrote:
| You mean like ECMA-376, ISO/IEC 29500? See also
| https://learn.microsoft.com/en-us/openspecs/main/ms-
| openspec...
|
| A historical note:
| https://www.computerworld.com/article/2526864/microsoft-
| offe...
| secondcoming wrote:
| How can they fix anything if other software isn't written to
| obey, or expect, headers?
| anon84873628 wrote:
| Instead of counting the number of distinct software
| products, look at the number of physical "integration
| operations" that happen every day.
|
| A good portion of those (whether it is "the majority", who
| can say) are actually between Excel and itself, e.g. two
| people in a company sending files to each other.
|
| This is a multi-agent coordination problem. If Excel (the
| dominant market participant) makes a change because it is
| helpful to their own users anyway, that alters the incentive
| calculus for the other software products too. Suddenly there
| is a large enough market / population of users / number of
| integration events where making an investment in the
| enhancement makes sense.
| politelemon wrote:
| I agree it makes a lot of sense, and to help with the
| "marketing" they could even give it a slightly distinct but
| easy name to indicate it's a CSV flavour. Having something
| repeatable verbally makes it easier in conversations. I've
| sent you an Earthy CSV. Oh you can use x, it supports
| earthy.
| zokier wrote:
| Or just use something like parquet that has both typed columns
| and allows arbitrary metadata for columns:
| https://mungingdata.com/pyarrow/arbitrary-metadata-parquet-t...
| anon84873628 wrote:
| Well, sending data between machines isn't the problem.
| Presumably the reason for choosing CSV in the first place is to
| make it editable by humans in a basic text editor. People get
| into trouble when they forget that or try to use it for other
| use cases instead.
| nicoburns wrote:
| What I really want is a tabular format that enables formatting to
| be interpolated between sections of data with a clean separation
| of structured data and formatted human-readable content. It would
| need a new editor of course, but I reckon it could be popular if
| that editor was good.
___________________________________________________________________
(page generated 2024-02-18 23:00 UTC)