[HN Gopher] A Journey building a fast JSON parser and full JSONPath
___________________________________________________________________
A Journey building a fast JSON parser and full JSONPath
Author : atomicnature
Score : 105 points
Date : 2023-10-12 06:36 UTC (14 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tomthe wrote:
| I like the "Simple Encoding Notation" (SEN) of the underlying
| library: https://github.com/ohler55/ojg/blob/develop/sen.md
|
| " A valid example of a SEN document is:
|
| { one: 1 two: 2 array: [a b c] yes: true } "
| koito17 wrote:
| An interesting observation: if you move the colon to the
| opposite side then you get valid EDN data!
|
| {:one 1 :two 2 :array [a b c] :yes true}
|
| cf. https://github.com/edn-format/edn
|
| Likewise, commas are considered whitespace. They are sometimes
| added to make lengthy maps easier to read.
| kubanczyk wrote:
| > Which is the same as the following JSON:
|
|     { "one": 1, "two": 2, "array": ["a", "b", "c"], "yes": true }
|
| That example also caught my attention, but in a bad way. It
| looks just like a comeback of one of the worst ideas of YAML.
|
| My immediate question would be what's the JSON for this SEN
| I've crafted:
|
|     { array: [string1 string2 "true" true True TRUE yes y] }
|
| For more fun, there's a single problematic entry here, can you
| spot it?
|
|     1.20.4 1.204.4 1.20 1.204 1.20.0 1.20.00 1.20-rc2
|
| Or, level expert, there's exactly one problem here as well:
|
|     0a1f 0bfd 0c0c 0d01 0e02
| tomthe wrote:
| Thank you for thinking more deeply about this than I did! But
| I do not see a problem in your first example; only `true` is
| the actual boolean true (according to my browser and the
| linked definition on https://www.json.org)
|
| I don't get your other examples, can you explain? I assumed
| that 1.20.4 is not a valid SEN entry, because it starts with
| a digit but is not a number.
| jarym wrote:
| I'm not following: `{ array: [string1 string2 "true" true
| True TRUE yes y] }` doesn't look like valid SEN or JSON.
| `y`, `yes`, `True`, and `TRUE` aren't valid
| keywords/variables/consts, and `string1` and `string2` look
| like variable references, which aren't something SEN or JSON
| support. The closest valid thing I can imagine is:
|
|     { array: ["string1" "string2" "true" true "True" "TRUE" "yes" "y"] }
| mjpa86 wrote:
| aren't they implied strings? If "[a b c]" is an array of 3
| strings, "a", "b" and "c", then True is a string "True".
| That's the problem.
| jarym wrote:
| I must be missing why you think they're implied strings -
| I don't see that in the spec. What I do see is:
|
| "Strings can also be delimited with a single quote
| character which allows for a string to be either "abc" or
| 'abc'."
|
| There's no mention of having a string without a
| delimiter.
| ReleaseCandidat wrote:
| The example below is this:
|
| > array: [a b c]
| jarym wrote:
| ohhh I see it now, that looks like a recipe for...
| issues.
| pjc50 wrote:
| Let me guess: 0e02 is interpreted as floating point?
| k_process wrote:
| Ditto 1.20, and when interpreted as floating point the
| trailing zero loses significance. So as a version this is
| indistinguishable from 1.2
| lazyasciiart wrote:
| Am I missing something about the definition of "tokenStart"? It
| can be 'letter' or three other characters: but all those other
| characters (and more) are already in the definition of
| 'letter'?
| pjc50 wrote:
| See the comment upthread about S-expressions, but given that
| this doesn't have a marker for "atom", which it badly needs,
| isn't it strictly worse than S-expressions?
| ithkuil wrote:
| reminder of recent efforts at standardizing JSONPath:
| https://datatracker.ietf.org/wg/jsonpath/about/
| baz00 wrote:
| Is JSON XML yet? Nearly!
|
| I'm going to invent Baz's 11th law of computing here: any data
| format that isn't XML will evolve into a badly specified version
| of XML over time.
| kevingadd wrote:
| With respect for the pain everyone has suffered through due to
| XML... at this point I prefer XML with a good schema to JSON
| any day, even if it's more verbose and more awkward to hand-
| edit. It's just so much easier to validate it or generate code
| to handle it, and you get things like XSLT or XPath if you want
| them.
| Deukhoofd wrote:
| I mean, you can use JSON Schema as well to have similar
| functionality to XML Schema.
| znpy wrote:
| That's exactly the point being made: json is becoming xml.
| tgv wrote:
| The point also feels like passive-aggressively ignoring
| the reason why people use JSON and not XML.
| w23j wrote:
| Can you name some of these reasons? Or give me link?
| Honest question!
| alpaca128 wrote:
| One reason would be massively reduced syntax overhead and
| better readability. I've seen plenty of XML files where
| XML syntax makes up more than 50% of the file's content,
| and trying to read the actual content is tedious. Now
| JSON isn't ideal either - technically you could get rid
| of all commas, colons, and the quotes around most keys -
| but I sure prefer `{"foo": "some \"stuff\""}` over
| something like `<foo><![CDATA[some <stuff>]]></foo>`
| w23j wrote:
| I agree, I would prefer JSON (or YAML) for example for
| configuration files. That is for stuff that humans
| actually read. I was thinking about using JSON/XML as a
| data exchange format between computers, because the
| context of this discussion has revolved about things like
| JSON/XML-Schema, JSON/XPath and SOAP/OpenAPI. There is a
| large trend to replace XML with JSON as data format for
| inter machine communication, and it is confusing to me.
| tgv wrote:
| XML is too unwieldy for human consumption. Editing it is
| error-prone, and those schema-directed editors are even
| worse, because everything requires clicking and clicking
| and clicking.
|
| For machine-to-machine communication, it's very well
| suited, but most data is simple enough, and the XML
| libraries I've used tended to be --let's say-- over-
| engineered, while there are no hoops to jump through when
| you want to parse JSON.
|
| And one thing I always disliked about XML was the CDATA
| section: it makes the message even harder to read, and
| it's not like you're going to use that binary data
| unparsed/unchecked.
|
| XML just tried to formalize data transfer and description
| prematurely, which made it rigid and not even
| sufficiently powerful. I must say that XSLT and XPath
| were great additions, though.
| eviks wrote:
| It's unreadable
| Devasta wrote:
| Honestly, a lot of people use JSON because that's what they
| have always used; XML's heyday was like 15 years ago. You
| could be a very senior engineer now and have never touched
| XML.
| w23j wrote:
| I haven't looked at JSON Schema in detail so please correct
| me if I am wrong, but I had the impression that the JSON
| Schema specification is still largely unfinished and
| evolving. That means you need to know which version the
| tool you use supports. And when I was looking for JSON
| Schema validators for Java all I found were projects on
| GitHub, which often were abandoned and referred the user to
| another GitHub project which was also abandoned. There does
| not seem to be support from an established project or
| vendor.
|
| Compare that to XML where we have a plethora of established
| tools (Woodstox, JAXB, etc.).
|
| What I have trouble understanding, which everybody else
| just seems to accept as obvious, is why one would take on
| these problems. Is JSON Schema more powerful than XML
| Schema? Does the use of JSON have advantages over using
| XML? When we are talking about a client program calling a
| server API with JSON/XML, why do we care about the format
| of data exchanged? What advantages does JSON have in this
| case in contrast to XML (or for that matter a binary format
| like Protocol Buffers)? Isn't this the most boring part of
| the application, which you would want to just get out of
| the way and work? What are the advantages of JSON over XML
| that would lead me to deal with the problems of evolving
| specifications and unreliable tooling?
|
| (And just to repeat, since everybody seems to have a
| different opinion about this than me, I must be missing
| something and really would like to learn what!)
| pydry wrote:
| All schema languages are a bit like that. You can almost
| always add another layer on top of the validation and
| screw down the validation a bit harder. The strictest
| validation will only be achievable using a turing
| complete language.
|
| OpenAPI is probably used a bit more than json schema, but
| it's contextually limited to APIs (which, to be fair, is
| mostly what JSON is used for).
| w23j wrote:
| I probably phrased my question poorly. Why would I use a
| tool which is unmaintained or poorly maintained for a
| probably already outdated version of a spec, when I can use
| something else that has been used for years by countless
| companies in production? The advantages must be huge.
| And I don't know what they are.
|
| OpenAPI is another example. There are threads on hacker
| news about generating code from OpenAPI specs. These
| always seem to say "oh, yes don't use tool X, use tool Y
| it does not have that problem, although it also doesn't
| support Z". The consensus seems to be to not generate
| code from an OpenAPI specification but to just use it as
| documentation, since all generators are more or less
| broken. Contrast that with for example JAXB (which is not
| an exact replacement I know), which has been battle
| tested for years.
| pydry wrote:
| I've used jsonschema and it was fine. I didn't think it
| was poorly maintained. By contrast, most XML libraries
| I've used had a myriad of broken edge cases and
| security vulnerabilities brought on by XML's
| overcomplication and the maintainers' inability to keep
| up.
|
| >The consensus seems to be to not generate code from an
| OpenAPI specification but to just use it as
| documentation, since all generators are more or less
| broken.
|
| OpenAPI still functions just fine as a means of
| documentation and validation.
|
| I'm allergic to all forms of code generation, to be
| honest. If there is an equivalent of XML in this I
| imagine it's even more horrendous. I can just imagine
| chasing down compiler errors indirectly caused by an XML
| switch not set _shudder_.
|
| >Contrast that with for example JAXB
|
| JAXB looks like a bolt on to work around XML's
| deficiencies. There's no need to marshal JSON to special
| funky data structures in your code because lists and
| hashmaps are already built in. You can just use those. An
| equivalent doesn't need to exist.
|
| For schema validation, I think XML has, what, 3 ways of
| doing it? DTDs? XMLSchema? And now JAXB does a bit of
| that on the side too? Does that sound like a healthy
| ecosystem to you? Because it sounds like absolute dogshit
| to me.
| Deukhoofd wrote:
| > I'm allergic to all forms of code generation, to be
| honest. If there is an equivalent of XML in this I
| imagine it's even more horrendous. I can just imagine
| chasing down compiler errors indirectly caused by an XML
| switch not set shudder.
|
| WSDL comes to mind
| w23j wrote:
| I see. Thanks for taking the time to reply!
| Deukhoofd wrote:
| > That means you need to know which version the tool you
| use supports
|
| Honestly the same issue with versioning has been my
| primary issue with XML Schemas in the past. XSD 1.1 for
| example came out over a decade ago, but is still very
| badly supported in most tooling I tried out.
|
| > When we are talking about a client program calling a
| server API with JSON/XML, why do we care about the format
| of data exchanged?
|
| We shouldn't care much, beyond debuggability (can a
| developer easily see what's going on), (de)serialization
| speed, and bandwidth use. JSON and protobuf tend to be a
| decent chunk smaller than XML, JSON is a bit easier to
| read, and Protobuf is faster to (de)serialize. This means
| they should generally be preferred.
|
| In the case of a client program calling a server API I'd
| personally have the server do the required validation on
| a deserialized object, instead of doing so through a
| schema. This is generally easier to work on for all
| developers in my team, and gets around all the issues
| with tooling. The only real reason I use schemas is when
| I'm writing a file by hand, and want autocompletion and
| basic validations. In that case versioning and tooling
| issues are completely in my control.
| Traubenfuchs wrote:
| As someone who greatly enjoyed the rigidity of SOAP/xml, which
| made proper architectural planning and careful deprecation
| mandatory, I wonder where we went so wrong. I feel like it's
| all connected to the impreciseness and typelessness of
| JavaScript. SOAP/xml to generate well defined client and server
| entry points in Java is how things should be done and SoapUI
| was a pleasure to use.
| Devasta wrote:
| Honestly, I think a big reason is that Stack Overflow didn't
| exist at XML's peak, so you had people generating XML by
| concatenation, with predictably disastrous results.
|
| One of the first XSLT transforms I was ever given to maintain
| generated XML by the same method.
|     <xsl:text><PRICE></xsl:text><xsl:value-of select="PRICE"/><xsl:text></PRICE></xsl:text>
|
| and so on.
| pjc50 wrote:
| > made proper architectural planning and careful deprecation
| mandatory
|
| That's why it never caught on.
|
| The ability of JSON/Javascript to tape together kinda-working
| solutions _before and instead of_ any kind of specification
| works is hugely powerful, because it allows iterating on the
| requirements by having actual users use the app.
| touisteur wrote:
| I mean I've always found this enlightening, when hearing
| json is 'simple':
| https://seriot.ch/projects/parsing_json.html
| aidos wrote:
| The S stands for Simple
|
| http://harmful.cat-v.org/software/xml/soap/simple
| PhilipRoman wrote:
| Thanks for sharing this, somehow I missed this while
| reading cat-v. Definitely applicable to a couple of other
| technologies too...
| another2another wrote:
| Oh that was a good read.
|
| I lived through all that, and can totally understand why
| people turned away in disgust and agreed on REST instead.
| usrusr wrote:
| In my experience SOAP was near-universally used as an RPC
| encoding, where the schema was whatever types the exposed API
| defined and no-one gave the tiniest anything about the data
| representation on the wire. If you insisted on schema first
| SOAP, people looked at you as if you had fallen through a
| dimensional gate from an alternative history parallel
| universe full of Zeppelins and domesticated dinosaurs. JSON
| on the other hand came riding on that REST wave, where the
| data models on the wire were given more consideration than
| just an outcome of the serialization process best never looked
| at. Some people even considered idempotency more than just a
| funny sequence of letters. No, I'm not surprised at all the
| SOAP mindset disappeared. (But SoapUI was really a pleasure
| to use, spent an ungodly amount of hours staring at that
| thing, never in anger)
| nine_k wrote:
| I'd say there must exist a more ancient law, stating that a
| representation of s-expressions is reinvented whenever a need
| arises for a generic data format.
|
| S-expressions are the most direct representation of a tree:
| (root node node ...). Trees are everywhere, they represent any
| nested structure; lists are logically a subset of trees.
|
| XML is a tree. It has the weird "attribute" node types, a
| legacy of SGML text markup notation. JSON is a tree, obviously.
| So is protobuf, thrift, etc. They all could be serialized as
| s-expressions.
|
| Now, a schema that describes a tree is also a tree. Hence XML
| Schema, JSONSchema, etc.
|
| More, an abstract program that describes a transformation of a
| tree is also a tree; this produces homoiconic languages, from
| XSLT to Lisps.
|
| There is nothing special about XML; it's just a particular case
| of a generic law.
| baz00 wrote:
| Completely agree on all points. But there is something
| special about XML: everyone has failed to make something
| better.
| alpaca128 wrote:
| If you said nothing better became an industry standard I
| could see your point, but how exactly is XML better than
| s-expressions? Or, if you want something less generalized,
| KDL (which is roughly XML with 90% less syntax overhead)?
| baz00 wrote:
| XML has superset defined functionality of standardised
| schemas, transformations and query. The same is not true
| for s-expressions.
|
| I've not looked at KDL before but a quick scan suggests
| it's interesting. I will look into it.
| jerf wrote:
| XML has a lot more defined structure than s-expressions.
| S-expressions make cute demos when people just take some
| chunk of data and blast out a conversion to drop into the
| conversation and hold it up as a standard, but it's not a
| fair comparison to take something actually defined and
| then splat out an undefined ad-hoc format in the spur of
| the moment. Of course the latter looks awesome by
| comparison; the example was literally structured to look
| awesome in this exact context.
|
| When you read the s-expression alternatives proposed to
| XML with an eye to "How would I actually code against
| this? How would I actually convince multiple people to
| use the _exact_ same standard as me? How do I support
| _all_ the use cases of interest to me?" they completely
| fall apart. They're _too_ simple. The very fact I have to
| use the plural for _s-expression alternative_ since no
| two of them are ever _quite_ the same says quite a bit.
|
| When you need that structure, XML is actually a very good
| choice; the error people made was using it when they
| didn't need that structure. Note how much of the
| complaint about using XML, even in this very
| conversation, is (quite correctly!) "what do I do with
| all these extra structural elements?" If you don't have a
| clear answer to that, don't use XML. If you do, don't jam
| it into s-exprs or JSON either, you end up with an even
| worse mess.
| alpaca128 wrote:
| > How would I actually convince multiple people to use
| the exact same standard as me?
|
| The same way you agree on an XML schema? I don't know if
| I quite understand what you want to say - as I see it
| both are tree structured formats which means they both
| can represent the same information, just that
| s-expressions are less verbose but XML has more existing
| tooling for defining & validating a structure. Though the
| latter is more an aspect of the ecosystem than the format
| itself.
| dragonwriter wrote:
| > Completely agree on all points. But there is something
| special about XML: everyone has failed to make something
| better.
|
| XML's decline from its peak of adoption means lots of people
| working with data disagree with you.
| hardware2win wrote:
| Why focus on s expr then?
|
| Every data format will eventually evolve into a tree
| nine_k wrote:
| S-exprs are just the simplest.
| cxr wrote:
| > I'd say there must exist a more ancient law, stating that a
| representation of s-expressions is reinvented whenever a need
| arises for a generic data format.
|
| That more ancient law would be Greenspun's tenth rule, FYI--or a
| corollary to it, at least.
|
| The law proposed here (as Baz's 11th law) was intended to be
| a humorous and obvious pastiche crafted with Greenspun's quip
| in mind, with the idea being that the reader would be in on
| the joke (being already familiar with it).
|
| 1. <https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule>
| tannhaeuser wrote:
| JSON can be parsed using SGML [1], by instructing SGML to
| interpret JSON tokens such as colons, quotation marks, and
| curly braces as markup. The underlying technique for custom
| lightweight markup is called SHORTREF and can be applied to
| markdown etc. as well.
|
| So considering XML is subsetted from SGML, I guess the answer
| is closer to yes than thought.
|
| Though probably it's worth citing the following quote from that
| paper:
|
| > _If the sweet spot for XML and SGML is marking up "prose
| documents", the sweet spot for JSON is collections of atomic
| values._
|
| [1]:
| https://www.balisage.net/Proceedings/vol17/html/Walsh01/Bali...
| lifthrasiir wrote:
| > JSON can be parsed using SGML, [...]. So considering XML is
| subsetted from SGML, I guess the answer is closer to yes than
| thought.
|
| In other words, SGML was far more powerful than what we
| actually needed. Of course, we say that with the benefit of
| hindsight.
| tannhaeuser wrote:
| > _SGML was way too powerful_
|
| The widespread use of markdown and other lightweight markup
| rather than rigid XML-style fully tagged markup for
| authoring suggests otherwise, though. And so does the continued
| use of HTML chock full of SGMLisms such as tag inference
| and attribute shortforms that weren't included in the XML
| subset/profile when XML (XHTML) was created to replace
| HTML.
|
| So while XML isn't used as an authoring format on the web
| (nor as delivery format), it's still useful as canonical
| archival format I guess.
| lifthrasiir wrote:
| SGML is a meta-language unlike every other example in
| your reply, so the prevalence of such semi-structured
| languages (including SGML applications) doesn't justify
| SGML itself. Even HTML is not exactly an SGML application
| (except for HTML 4), and to my knowledge implementing
| HTML with a generic SGML implementation was rarely done.
| So the fact that SGML is a near superset of both JSON and
| XML doesn't mean much.
| dgellow wrote:
| XML has other abominations such as XSLT.
| baz00 wrote:
| I'd definitely rather write XSLT than YAML festering in the
| same pot as go-template.
| strken wrote:
| People say this, and yet XML's origins as a markup language
| make it baffling as a data format. No sane human being should
| choose a data format with such confusion between properties
| that no user knows whether to go with
|
|     <Foo><Shininess>HIGH</Shininess><Luck>7</Luck></Foo>
|
| or
|
|     <Foo shininess="HIGH" luck="7" />
|
| and yet countless thousands decided to do just that, for
| reasons that are totally inexplicable to me.
|
| Obviously as a markup language this is fine; as a _data format_
| it's bizarre, since the division between attribute vs child
| doesn't match most in-memory data structures.
| pydry wrote:
| Yeah, it's a weird attitude. XML died out because it was an
| overcomplicated design-by-committee mess. Quite apart from
| the fact that it wouldn't map cleanly to lists and
| hashmaps, necessitating a query language, it also led to
| embarrassing debacles like the billion laughs vulnerability -
| a problem in the very core of XML.
|
| With some niche exceptions where it has clung on, XML
| basically died. It's time to move on. The fact that we do
| similar sorts of stuff with JSON like data transformations
| and schema validation does not, in any way, shape or form,
| invalidate its flaws.
| baz00 wrote:
| XML is fine.
|
| The overcomplicated mess was the WS-* garbage.
| smikhanov wrote:
| > no user knows
|
| The described problem literally doesn't exist in XML. Your
| XML-validating editor will check your document against the
| schema and will not allow for an attribute where the sub-
| element is required and vice versa.
| tyingq wrote:
| I believe they mean for designing the schema in the first
| place. Meaning the impedance match between JSON and their
| chosen language is usually more natural.
| tannhaeuser wrote:
| I'm not disagreeing but the reason XML was used as data
| format is that it has native support in browsers (remember
| XML was created as a simplified SGML subset for eventually
| replacing HTML), the idea being that you can display service
| payloads via simple stylesheet applications or element
| replacement/decoration rather than having to rely on
| JavaScript or other Turing-complete environment for arbitrary
| scripting which was seen as having no place as a central
| technique in classic document-oriented browsing.
|
| JSON only became popular because of similar opportunistic
| effects (ie being already part of the stack via eval()). If
| you look at how typical non-JS backends such as Java or .net
| deal with service request/response data, there's absolutely
| no advantage for either JSON or XML - both are represented as
| class/structure and (de-)serialized via binding frameworks
| and annotations.
| strken wrote:
| There's no particular machine advantage to any human-
| readable format over an equivalent binary format, sure.
| However, if you look at human-"readable" formats that
| predate XML (like HL7[0]) you can appreciate the advantages
| of a tree-like structure with labelled fields when it comes
| to human comprehension. I think XML is often difficult for
| humans to read, and certainly to write, and since this is
| the only reason to use either language it's an important
| factor.
|
| I guess you could argue we should all use Protocol Buffers,
| pickle, Thrift, etc.[1] and only switch to JSON for
| debugging. I wouldn't disagree. Protobuf is apparently
| faster than JSON in the browser.
|
| [0] See https://www.interfaceware.com/hl7-message-structure
| for an example message
|
| [1] I missed Corba and spent the early years of my
| professional life trying not to touch the SOAP, just in
| case I dropped it
| nrclark wrote:
| JSON does have one advantage over XML: it maps cleanly onto
| primitive types in Python and many other languages. XML
| attributes don't really have an unambiguous way to be
| represented using list and map primitives (other than maybe
| an "everything is a map" model, which sucks from a
| usability perspective).
| aforwardslash wrote:
| I beg to differ. JSON only provides a subset of commonly
| available data types (quick example: show me a proper 64
| bit int, a proper date type or a proper money type). And
| "everything is a map" is pretty much how python works,
| but they prefer to call it dicts. I could go on and
| explain how JSON is evolving to have exactly all the
| problems of xml without any of the advantages, and how
| people keep reinventing the wheel (pun intended for
| python fans) ignoring why xml is the way it is (and it is
| quite a bit more robust than anything JSON). XML's biggest
| defect was verbosity, especially in an HTTP 1.0 context.
| With HTTP 1.1 (so nowadays, legacy tech), most of these
| problems disappear. I know, parsing JSON is quite simple -
| the reason is that the format is lacking.
| the8472 wrote:
| > since the division between attribute vs child doesn't match
| most in-memory data structures.
|
| vtables are attributes for pointers. hypergraphs (as used in
| some tagging systems) have attributes on everything,
| including attributes. CBOR has optional type-tags on its
| items.
| baz00 wrote:
| Actually you should never use attributes in XML at all to
| represent data. Your first example is correct.
|
| Everyone is just confused because people who didn't know this
| designed HTML. But also everyone is confused because HTML and
| XML aren't necessarily related other than some parentage in
| SGML.
| tannhaeuser wrote:
| Nope. In markup, _attributes_ are for "metadata", that is,
| anything not rendered to the reader/user, as opposed to
| (element) _content_. The entire purpose of markup is to
| provide a rich text format via decorating plain text usable
| from any text editor. Data exchange, or any other
| application where there is no concept of "rendering to the
| user", is no primary application for markup.
|
| If anything, what's wrong with HTML in this respect is that
| JavaScript and CSS can be put inline into content when
| these should always go into attributes and/or external
| resources linked via src/href attributes. And this flaw
| shows indeed where HTML deviates from SGML proper: when the
| style and script elements were introduced, their "content"
| needed to be put into SGML comment tags <!-- and --> such
| that browsers wouldn't render JavaScript snd CSS as text
| content. I mean, who came up with this brain-dead design?
|
| But CSS is a lost cause anyway. What does it tell you about
| its designers that, starting with a markup language
| already having pretty intense syntactic constructs, they
| chose to tunnel _yet another_ item=value syntax through
| regular markup attributes? Like replacing <h2
| bgcolor=black> by <h2 style="background-color: black"> and
| then claiming attributes are for "behavior" or whatever
| nonsense after the fact. Whoever came up with this clearly
| wasn't a CompSci person. And the syntactic proliferation in
| CSS got completely out of hand, for the simple reason
| that HTML evolution was locked down while W3C was focussed
| on XML/XHTML for over a decade, while the CSS spec process
| was lenient.
| Communitivity wrote:
| I haven't used XML in a long while, but there was a trick I
| had when I designed schemas, back when I did use XML all the
| time. Use an attribute if the data is a primitive String,
| number, or boolean. Break into multiple attributes if the
| data is structured but has only one level and has few
| children. Otherwise use an element. The three rules are
| simple, but produce schemas easy to read, easy to maintain,
| and easy to implement against. One code smell is if you start
| winding up with tons of attributes on one element. That may
| mean you should break the logical concept that element
| represents into multiple concepts, have those concepts be
| nested elements, each with its related attributes.
| OnlyMortal wrote:
| With the origins in SGML in the early 90s, there were some
| basic editors for manual creation.
|
| I suspect the popularity was due to the sax parser and
| "interop" between C++ and Java.
|
| To me coming from ObjC++, json is just a serialised
| dictionary.
| heresie-dabord wrote:
| Corollary: The number (N) of ad hoc support tools needed to do
| any serious work with a given mark-up language is proportional
| to the naivety of the implementation (Y).
| baz00 wrote:
| I like this one a lot.
| crabmusket wrote:
| I see this take often and I think it's pretty bad. JSON (data
| format) and XML (markup format) are very different. Building
| tools for JSON doesn't change that in any way.
|
| And it turns out that both JSON and XML are used for data
| interchange, and when people have data interchange problems,
| they build tooling to help solve those problems (like schema
| validation). That doesn't make JSON "like XML", it just means
| they're discovering the same problem and solving it for the
| format they're using.
| deepakarora3 wrote:
| Nice work! I see that this is for processing / parsing large
| data sets where documents do not conform to a fixed structure,
| and for the Go language.
|
| I made something similar in Java - unify-jdocs -
| https://github.com/americanexpress/unify-jdocs - though this is
| not for parsing - it is more for reading and writing when the
| structure of the document is known - read and write any JSONPath
| in one line of code and use model documents to define the
| structure of the data document (instead of using JSONSchema which
| I found very unwieldy to use) - no POJOs or model classes - along
| with many other features. Posting here as the topic is relevant
| and it may help people in the Java world. We have used it
| intensively within Amex for a very large complex project and it
| has worked great for us.
| latchkey wrote:
| We all know the builtin golang JSON parser is slow.
|
| How about doing comparisons against other implementations?
|
| Like this one: https://github.com/json-iterator/go
|
| Update: found this outdated repo:
| https://github.com/ohler55/compare-go-json
| pstuart wrote:
| Slightly tangential, but Go's JSON handling has long had room for
| improvement and it looks like there's going to be a serious
| overhaul of its capabilities and implementation:
| https://github.com/golang/go/discussions/63397 -- I'm looking
| forward to seeing this land.
___________________________________________________________________
(page generated 2023-10-12 21:01 UTC)