[HN Gopher] Lexbor - An open source HTML Renderer library
___________________________________________________________________
Lexbor - An open source HTML Renderer library
Author : bratao
Score : 136 points
Date : 2024-06-11 20:16 UTC (1 day ago)
(HTM) web link (github.com)
| bratao wrote:
| We have been using https://github.com/rushter/selectolax as a
| faster alternative to BeautifulSoup with html5lib because many
| malformed webpages in the wild don't work with lxml.
| thomasfromcdnjs wrote:
| Ah this answers my question in another comment.
|
| Thanks!
| nwellnhof wrote:
| The problem is that libxml2's 20-year-old HTML parser never
| supported HTML5 [1], leading to more and more problems with
| downstream consumers like lxml, PHP or Nokogiri. PHP recently
| switched to Lexbor [2] and Nokogiri to libgumbo [3]. That said,
| I'm hopeful of receiving enough funding to implement an HTML5
| parser in libxml2.
|
| [1] https://gitlab.gnome.org/GNOME/libxml2/-/issues/211
|
| [2] https://wiki.php.net/rfc/domdocument_html5_parser
|
| [3] https://github.com/sparklemotion/nokogiri/issues/2204
| postepowanieadm wrote:
| libxml is an XML parser; HTML5 is not XML.
| hliyan wrote:
| Rarely does one see a C++ quick start guide that's actually this
| quick: https://lexbor.com/docs/lexbor/#quick_start
| boxed wrote:
| C, not C++
| lelanthran wrote:
| > Rarely does one see a C++ quick start guide that's actually
| this quick: https://lexbor.com/docs/lexbor/#quick_start
|
| Could be because it isn't C++?
| zamadatix wrote:
| Step 1 is a bit of a "draw the rest of the owl" step in that
| it's either done for you on your specific platform with default
| settings already or you have to go do all of the actually hard
| stuff of building the app (and sure enough that's where the
| typical cmake build step is hidden as well). Step 2 is just
| "and remember to link your code against the hard part when you
| compile it, by the way here's a single minimal example".
| thomasfromcdnjs wrote:
| Inspiring infrastructure.
|
| The module aspect is super cool. Is there much adoption by
| other projects using the individual modules, e.g. a web parser
| using the DOM module?
| chearon wrote:
| The title made me think this could actually lay out and paint
| HTML, but I couldn't find anything remotely layout-related in the
| source tree. Then I found this comment saying even block sizing
| isn't done:
| https://github.com/lexbor/lexbor/issues/219#issuecomment-207....
| Looks like nice groundwork, though. It's nice to see things
| like parsing and Unicode being part of the same source tree.
| nicoburns wrote:
| We have a decent chunk of layout and paint implemented in an
| HTML renderer I'm working on
| (https://github.com/DioxusLabs/blitz), which is targeting the
| "Electron" use case (but with a Rust scripting interface rather
| than a JS one).
|
| The implementation is currently very immature and there are a
| lot of bugs and missing features: I only got a first cut of
| inline layout working yesterday, although flexbox and grid are
| already implemented. Still, we're already seeing pretty decent
| results on a bunch of real-world web pages and hope to be at
| the point where we can render most of the web (excl. JS) in the
| next 6-12 months.
|
| There are some screenshots on the PR for the inline layout
| branch https://github.com/DioxusLabs/blitz/pull/63
| troupo wrote:
| Quite unusual to see Elixir among languages supported via
| bindings
| lelanthran wrote:
| > Quite unusual to see Elixir among languages supported via
| bindings
|
| Not due to difficulty, usually. Bindings to non-mainstream
| languages are unusual to see.
|
| I've never heard of a language that couldn't interface with C
| in one way or another; it's one of the advantages of using C
| over (say) C++.
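|
| To illustrate the point about interfacing with C (this sketch is
| not from the thread): Python's standard ctypes module can call a
| plain C function with no wrapper code. The example calls strlen
| from the system C library rather than Lexbor itself, and assumes
| a POSIX-like system; load_libc and c_strlen are hypothetical
| helper names chosen for this example.

```python
import ctypes
import ctypes.util


def load_libc():
    """Load the C standard library (assumes a POSIX-like system)."""
    # find_library may return None; CDLL(None) then falls back to the
    # symbols already loaded into the current process, which include libc.
    path = ctypes.util.find_library("c")
    return ctypes.CDLL(path)


libc = load_libc()

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string via C's strlen."""
    return libc.strlen(data)
```

| Calling c_strlen(b"lexbor") returns 6. Bindings to a C library
| such as Lexbor follow the same pattern in principle: load the
| shared library, declare each function's argument and return
| types, then call it.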
___________________________________________________________________
(page generated 2024-06-12 23:00 UTC)