[HN Gopher] Lexbor - An open source HTML Renderer library
       ___________________________________________________________________
        
       Lexbor - An open source HTML Renderer library
        
       Author : bratao
       Score  : 136 points
       Date   : 2024-06-11 20:16 UTC (1 day ago)
        
 (HTM) web link (github.com)
        
       | bratao wrote:
       | We have been using https://github.com/rushter/selectolax as a
       | faster alternative to BeautifulSoup with html5lib because many
       | malformed webpages in the wild don't work with lxml.
        
         | thomasfromcdnjs wrote:
         | Ah this answers my question in another comment.
         | 
         | Thanks!
        
         | nwellnhof wrote:
         | The problem is that libxml2's 20-year-old HTML parser never
         | supported HTML5 [1], leading to more and more problems with
         | downstream consumers like lxml, PHP or Nokogiri. PHP recently
         | switched to Lexbor [2] and Nokogiri to libgumbo [3]. That said,
         | I'm hopeful to receive enough funding to implement an HTML5
         | parser in libxml2.
         | 
         | [1] https://gitlab.gnome.org/GNOME/libxml2/-/issues/211
         | 
         | [2] https://wiki.php.net/rfc/domdocument_html5_parser
         | 
         | [3] https://github.com/sparklemotion/nokogiri/issues/2204
        
           | postepowanieadm wrote:
           | libxml is an XML parser; HTML5 is not XML.
        
       | hliyan wrote:
       | Rarely does one see a C++ quick start guide that's actually this
       | quick: https://lexbor.com/docs/lexbor/#quick_start
        
         | boxed wrote:
         | C, not C++
        
         | lelanthran wrote:
         | > Rarely does one see a C++ quick start guide that's actually
         | this quick: https://lexbor.com/docs/lexbor/#quick_start
         | 
         | Could be because it isn't C++?
        
         | zamadatix wrote:
         | Step 1 is a bit of a "draw the rest of the owl" step in that
         | it's either done for you on your specific platform with default
         | settings already or you have to go do all of the actually hard
         | stuff of building the app (and sure enough that's where the
         | typical cmake build step is hidden as well). Step 2 is just
         | "and remember to link your code against the hard part when you
         | compile it, by the way here's a single minimal example".
        
       | thomasfromcdnjs wrote:
       | Inspiring infrastructure.
       | 
       | The module aspect is super cool. Is there much adoption by other
       | projects using the individual modules, e.g. a web parser using
       | the dom module?
        
       | chearon wrote:
       | The title made me think this could actually lay out and paint
       | HTML, but I couldn't find anything remotely layout-related in the
       | source tree. Then I found this comment saying even block sizing
       | isn't done:
       | https://github.com/lexbor/lexbor/issues/219#issuecomment-207....
       | Looks like nice groundwork, though. It's nice to see things
       | like parsing and Unicode being part of the same source tree.
        
         | nicoburns wrote:
         | We have a decent chunk of layout and paint implemented in an
         | HTML renderer I'm working on
         | (https://github.com/DioxusLabs/blitz), which is targeting the
         | "electron" use case (but with a Rust scripting interface rather
         | than a JS one).
         | 
         | The implementation is currently very immature and there are a
         | lot of bugs and missing features (I only got a first cut of
         | inline layout working yesterday, though we already have flexbox
         | and grid implemented), but we're already seeing pretty decent
         | results on a bunch of real-world web pages and hope to be at
         | the point where we can render most of the web (excl. JS) in the
         | next 6 - 12 months.
         | 
         | There are some screenshots on the PR for the inline layout
         | branch https://github.com/DioxusLabs/blitz/pull/63
        
       | troupo wrote:
       | Quite unusual to see Elixir among languages supported via
       | bindings
        
         | lelanthran wrote:
         | > Quite unusual to see Elixir among languages supported via
         | bindings
         | 
         | Not due to difficulty, usually. Bindings to non-mainstream
         | languages are unusual to see.
         | 
         | I've never heard of a language that couldn't interface with C
         | in one way or another; it's one of the advantages of using C
         | over (say) C++.
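
As a rough illustration of the point about interfacing with C, here is a
minimal Python sketch that calls the C standard library's strlen through
the standard ctypes module. The helper name c_strlen is hypothetical, and
the fallback to ctypes.CDLL(None) assumes a Unix-like system where libc is
already loaded into the running process; neither comes from the thread.

```python
import ctypes
import ctypes.util

# Locate the C standard library. find_library may return None on some
# systems; falling back to CDLL(None) is an assumption that holds on
# typical Linux and macOS setups, not a portable guarantee.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical helper: length of a NUL-terminated byte string via C."""
    return libc.strlen(data)
```

Bindings to a C library such as Lexbor follow the same basic pattern in
most languages: load the shared library, declare each function's
signature, and wrap the calls in idiomatic helpers.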
        
       ___________________________________________________________________
       (page generated 2024-06-12 23:00 UTC)