[HN Gopher] On-demand JSON: A better way to parse documents?
___________________________________________________________________
On-demand JSON: A better way to parse documents?
Author : warpech
Score : 49 points
Date : 2024-02-09 20:19 UTC (1 days ago)
(HTM) web link (onlinelibrary.wiley.com)
(TXT) w3m dump (onlinelibrary.wiley.com)
| kristianp wrote:
| So they're creating a DOM-like API in front of a SAX-style parser
| and getting faster results (barring FPGA and GPU research). It's
| released as part of simdjson.
|
| I wonder if that kind of front end was done in the age of SAX
| parsers?
|
| Such a well-written paper.
| phh wrote:
| > I wonder if that kind of front end was done in the age of SAX
| parsers?
|
| I thought that XPath over SAX was a thing, and that XSLT was doing
| SAX-like parsing, but it turns out I'm wrong. Which is logical,
| considering XPath can refer to previous nodes. That being said,
| it looks like there is streamable XSLT in XSLT 3.0, but that
| looks more niche.
| jbverschoor wrote:
| Often a combination of SAX and DOM is useful. You get many
| GBs of SAX stream, but it usually contains the same kind of
| documents. Creating a DOM at the end of a specific token means
| fast processing, but still the ease of use of DOM.
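
A minimal sketch of the SAX-plus-DOM combination described above, assuming Python's standard `xml.etree.ElementTree.iterparse` as the streaming layer. The helper name `iter_records`, the `doc` tag, and the sample feed are hypothetical, made up for illustration; this is not how simdjson or any parser mentioned in the thread is implemented.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(stream, tag):
    """Stream the input, yielding one small element tree per closing `tag`."""
    # Only "end" events are requested, so each yielded element is complete.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield elem    # caller gets DOM-style access to this record only
            elem.clear()  # then drop the record's children, text and attributes


# Hypothetical sample: a long stream made of the same kind of small document.
xml_data = (
    b"<feed>"
    b"<doc id='1'><title>a</title></doc>"
    b"<doc id='2'><title>b</title></doc>"
    b"</feed>"
)

titles = [
    (doc.get("id"), doc.findtext("title"))
    for doc in iter_records(io.BytesIO(xml_data), "doc")
]
print(titles)
```

Each record is read through the tree API (`get`, `findtext`) while the stream as a whole is never held as one large tree; the cleared elements stay attached to the root as empty shells, so a very long feed may also want those detached.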
| xiphias2 wrote:
| I don't really understand what's new here compared to what
| SIMDJSON supported already.
|
| Anyways, it's the best JSON parser I found (in any language), I
| implemented fastgron (https://github.com/adamritter/fastgron) on
| top of it because of the on demand library performance.
|
| One problem with the library was that it needed extra padding at
| the end of the JSON, so it didn't support streaming / memory
| mapping.
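
To illustrate why a padding requirement gets in the way of memory mapping: a mapped file ends where the file ends, so the usual workaround is to copy the bytes into a larger buffer first. The sketch below shows only that copy step. The helper name `padded_copy` and the padding size of 64 bytes are assumptions for illustration, not values taken from the thread.

```python
PADDING = 64  # assumed padding size in bytes; check the parser's documentation


def padded_copy(data, padding=PADDING):
    """Copy `data` into a zero-filled buffer with `padding` spare bytes at the end.

    Returns the buffer and the length of the real JSON inside it.
    """
    n = len(data)
    buf = bytearray(n + padding)  # bytearray(size) is zero-initialised
    buf[:n] = data                # the copy a memory-mapped file cannot avoid
    return buf, n


buf, n = padded_copy(b'{"a": 1}')
print(n, len(buf))
```

The extra copy is the cost being pointed at here: the parser can read past the end of the document safely, but the input can no longer be consumed in place.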
| asa400 wrote:
| Nice work! I will have to check out your implementation and see
| if I can borrow any of your optimization ideas. I built jindex
| (https://github.com/ckampfe/jindex) because I also wanted a
| faster gron!
| TkTech wrote:
| This on-demand model has been implemented in simdjson for
| a while. This is just the release of the paper.
|
| Previously, simdjson only had a DOM model, where the entire
| document was parsed in one shot.
| hwestiii wrote:
| On the face of it, this sounds kind of like the XML::Twig Perl module.
___________________________________________________________________
(page generated 2024-02-10 23:00 UTC)