[HN Gopher] On-demand JSON: A better way to parse documents?
       ___________________________________________________________________
        
       On-demand JSON: A better way to parse documents?
        
       Author : warpech
       Score  : 49 points
       Date   : 2024-02-09 20:19 UTC (1 day ago)
        
 (HTM) web link (onlinelibrary.wiley.com)
 (TXT) w3m dump (onlinelibrary.wiley.com)
        
       | kristianp wrote:
       | So they're creating a DOM-like API in front of a SAX-style parser
       | and getting faster results (barring FPGA and GPU research). It's
       | released as part of simdjson.
       | 
       | I wonder if that kind of front end was done in the age of SAX
       | parsers?
       | 
       | Such a well-written paper.
        
         | phh wrote:
         | > I wonder if that kind of front end was done in the age of SAX
         | parsers?
         | 
         | I thought that XPath over SAX was a thing, and that XSLT was
         | doing SAX-like parsing, but it turns out I'm wrong. Which is
         | logical, considering XPath can refer to previous nodes. That
         | being said, it looks like there is streamable XSLT in XSLT 3.0,
         | but that looks more niche.
        
           | jbverschoor wrote:
           | Often a combination of SAX and DOM is useful. You get many
           | GBs of SAX stream, but it usually contains the same kind of
           | documents. Creating a DOM at the end of a specific token
           | means fast processing, but still the ease of use of DOM.
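           | 
           | A minimal sketch of that combination in Python, using the
           | standard library's xml.dom.pulldom: events are pulled
           | stream-style, and a DOM subtree is built only when a
           | record's start tag is seen. The <feed>/<doc>/<title> layout
           | and the titles() helper are made up for illustration.

```python
from xml.dom import pulldom

# Hypothetical stream: many small <doc> records inside one large envelope.
STREAM = (
    "<feed>"
    "<doc id='1'><title>first</title></doc>"
    "<doc id='2'><title>second</title></doc>"
    "</feed>"
)


def titles(xml_text):
    """Yield (id, title) pairs, building a DOM only for each <doc> subtree."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        # Walk the stream event by event until a record starts.
        if event == pulldom.START_ELEMENT and node.tagName == "doc":
            # Materialise just this record as a small DOM tree.
            events.expandNode(node)
            title = node.getElementsByTagName("title")[0]
            yield node.getAttribute("id"), title.firstChild.data


if __name__ == "__main__":
    for doc_id, title in titles(STREAM):
        print(doc_id, title)
```

           | Each expanded <doc> can be dropped once it has been
           | processed, so memory use tracks the size of one record
           | rather than the whole stream.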
        
       | xiphias2 wrote:
       | I don't really understand what's new here compared to what
       | SIMDJSON supported already.
       | 
       | Anyway, it's the best JSON parser I've found (in any language). I
       | implemented fastgron (https://github.com/adamritter/fastgron) on
       | top of it because of the on-demand library's performance.
       | 
       | One problem with the library was that it needed extra padding at
       | the end of the JSON, so it didn't support streaming / memory
       | mapping.
        
         | asa400 wrote:
         | Nice work! I will have to check out your implementation and see
         | if I can borrow any of your optimization ideas. I built jindex
         | (https://github.com/ckampfe/jindex) because I also wanted a
         | faster gron!
        
         | TkTech wrote:
         | This on-demand model has been implemented in simdjson for
         | a while. This is just the release of the paper.
         | 
         | Previously, simdjson only had a DOM model, where the entire
         | document was parsed in one shot.
        
       | hwestiii wrote:
       | On the face of it, this sounds kind of like the XML::Twig Perl
       | module.
        
       ___________________________________________________________________
       (page generated 2024-02-10 23:00 UTC)