[HN Gopher] The shape of data
___________________________________________________________________
The shape of data
Author : luu
Score : 50 points
Date : 2022-03-29 20:48 UTC (1 days ago)
(HTM) web link (www.scattered-thoughts.net)
(TXT) w3m dump (www.scattered-thoughts.net)
| [deleted]
| cupofpython wrote:
| Honestly, I don't see the issue with JSON. It is capturing
| user-generated content. It's not that '43' is logged as a string
| instead of an int; it's that '43' is the raw data in quotes. To
| me, that is in the same spirit as using "read" instead of "eval",
| as mentioned elsewhere. Yes, the read-print loop fails for JSON,
| but only when you are working with code-generated values. At the
| end of the day, a user typed the 4 and 3 keys on their keyboard
| and that was captured. To say it is an int or a str or whatever
| brings back the need to understand memory representations.
|
| For example, when parsing JSON with Python, you can apply the
| same principles you would to Python objects. That is, assume the
| item is in the format you know it should be (or test it first to
| be safe).
|
| So even though the JSON is {"43": ["bob", "alice"]}, you can do
| an int() cast if you need to do something with that data that
| requires it to have a type. Otherwise it is represented as it was
| typed.
|
| I do agree with the article overall though!
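
A minimal sketch of the "cast when you need a type" approach described above, using Python's standard `json` module. The `int_keys` helper is a hypothetical name introduced here for illustration, and the sketch assumes every key is an integer string; it is not taken from the article or the comment.

```python
import json

# The raw text as it was captured: the key is the string "43".
raw = '{"43": ["bob", "alice"]}'
parsed = json.loads(raw)


def int_keys(obj):
    """Hypothetical helper: return a copy of obj with every key cast to int.

    Raises ValueError if a key is not an integer string, which doubles as
    the "test it first to be safe" step.
    """
    return {int(key): value for key, value in obj.items()}


typed = int_keys(parsed)
print(parsed)
print(typed)
```

The parsed object keeps the key exactly as typed; the cast only happens at the point where the code actually needs a number.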
| joshlemer wrote:
| So, JSON has non-string values in other positions (as elements
| of arrays, or values in an object). Wouldn't your argument also
| lead to the conclusion that we don't need numbers at all, since
| we could get by with
|
| { "foo": "42", "bar": ["1", "2", "3"] }
|
| There's also the issue of values with multiple equivalent
| string representations. I want 42.1 to equal 42.10 and 42.100.
| I also want {"foo":1,"bar":2} to equal {"bar":2,"foo":1} but
| with just strings you don't get that:
|
| { "{\"foo\":1,\"bar\":2}": 1, "{\"bar\":2,\"foo\":1}": 1,
| "42.1": 2, "42.10": 2, "42.100": 2 }
|
| This should have 2 keys but has 5.
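
The key-counting point above can be checked with a short Python sketch. The `canonical` function is a hypothetical helper introduced here, and it assumes flat objects with hashable values; it only illustrates comparing keys by parsed value rather than by spelling.

```python
import json

# The five string keys from the example above.
string_keyed = {
    '{"foo":1,"bar":2}': 1,
    '{"bar":2,"foo":1}': 1,
    "42.1": 2,
    "42.10": 2,
    "42.100": 2,
}


def canonical(text):
    """Hypothetical helper: parse a key and return a hashable value.

    Flat objects become a frozenset of (key, value) pairs so that member
    order no longer matters; numbers are returned as parsed.
    """
    value = json.loads(text)
    if isinstance(value, dict):
        return frozenset(value.items())
    return value


# Re-key the mapping by parsed value instead of by string spelling.
by_value = {canonical(key): count for key, count in string_keyed.items()}

print(len(string_keyed))
print(len(by_value))
```

Compared as strings, all five spellings are distinct. Compared as parsed values, the two objects collapse into one key and the three number spellings into another.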
| thedudeabides5 wrote:
| This wishlist sounds like rose.ai
|
| "Wishlist
|
| Data model:
|
| - A small set of primitives
|   - eg writing an inspector gui
|   - eg searching for references to some id
| - But still able to represent types and invariants
| - Able to reify changes as data
|   - eg for undo log
|   - eg for real-time collaboration
|
| - All data has some name/path/location by which it can be
|   referred to
|   - eg no hidden state in closures
|   - eg no hidden closures in the event loop queues
| - Avoid depending on pointers for identity
|
| Data notation:
|
| - A textual representation which is easy to read/write
| - Used consistently everywhere - one standard way of picturing
|   data
| - Self-describing - doesn't require out-of-band type/schema
|
| - Uses layering to add capabilities while mimicking familiar
|   notation
| - Uses shorthands and exploits context to reduce redundant
|   information
|   - eg clojure namespace aliases
|   - eg unison names
|
| Code:
|
| - The notation for code is a superset of the notation for data
|   - eg can print data and copy-paste into code / repl
|
| - Can choose the mapping between tags in data notation and types
|   in code
| - Code can be represented as data with low mental distance
|
| - The codebase is also data - can trivially analyze whole thing
|   including dependencies without having to execute side effects
|
| - Maybe, if possible, reify the execution of code as data
|
| Crucially, the data model and the data notation need to be co-
| designed, because it's so easy to make choices in the data model
| that prevent creating a good data notation later."
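
As a rough illustration of the "can print data and copy-paste into code / repl" item, here is a Python sketch comparing two round trips: `repr` with `ast.literal_eval`, and `json.dumps` with `json.loads`. The sample `data` value is made up for this example; the behaviour shown is that of the standard library, not something the article specifies.

```python
import ast
import json

# Made-up sample value with an integer key.
data = {43: ["bob", "alice"], "ratio": 42.1}

# Round trip through Python's own literal notation: print, then read.
printed = repr(data)
read_back = ast.literal_eval(printed)

# Round trip through JSON: object keys must be strings, so 43 becomes "43".
via_json = json.loads(json.dumps(data))

print(read_back == data)
print(via_json == data)
```

`ast.literal_eval` only reads literals and never executes code, which is the "read" rather than "eval" distinction raised earlier in the thread. The JSON round trip changes the integer key into a string, so the value that comes back is not equal to the one that went in.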
___________________________________________________________________
(page generated 2022-03-30 23:01 UTC)