[HN Gopher] The Web Assembly Shaper
___________________________________________________________________
The Web Assembly Shaper
Author : panic
Score : 58 points
Date : 2023-07-09 08:01 UTC (15 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| nynx wrote:
| This is a great use case for WebAssembly.
| lyu07282 wrote:
| This gives me JavaScript-in-PDF vibes, but it's probably very
| useful for highly complex writing systems. It also goes to show
| what an incredibly complex subject this is: Harfbuzz is pretty
| much the only open source text shaping library that does it
| properly, as far as I can tell. Behdad Esfahbod [1] is probably
| one of the most important developers on the planet, considering
| how omnipresent his work has been to this day. Harfbuzz is used in
| Firefox, Chrome and Safari for example.
|
| [1] https://en.wikipedia.org/wiki/Behdad_Esfahbod
| jfk13 wrote:
| > Harfbuzz is used in Firefox, Chrome and Safari for example
|
| Not to dispute harfbuzz's significance -- it's a crucial
| component of much of the software we use every day -- but are
| you sure about Safari? I thought it relied on Apple's Core Text
| framework to handle font shaping.
| lyu07282 wrote:
| Not sure; WebKit does use Harfbuzz, though.
| armitron wrote:
| Safari (and macOS) doesn't use Harfbuzz.
| raphlinus wrote:
| I'll give some context. Shaping is converting a string of Unicode
| code points into a sequence of glyphs, each with an advance and
| also finer positioning. For typical Latin fonts, shaping is
| straightforward; each code point generally corresponds to one
| glyph, but sometimes there are ligatures ("fi" is often one
| glyph), and then there is kerning, usually placing pairs of
| glyphs such as AV closer together if there would be a gap between
| them. But for complex fonts such as Nastaliq (used in writing
| Urdu), the rules get a _lot_ more complicated.
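|
| A minimal sketch of the simple Latin case described above, not
| taken from the thread or from any real shaping engine. The font
| data (ADVANCES, LIGATURES, KERNING) and the glyph names are
| invented for illustration; real fonts use numeric glyph IDs and
| OpenType lookup tables.
|

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str      # hypothetical glyph name; real fonts use numeric glyph IDs
    advance: int   # horizontal advance in font units


# Hypothetical toy font data, invented for this example.
ADVANCES = {"A": 600, "V": 600, "f": 300, "i": 250, "f_i": 520}
LIGATURES = {("f", "i"): "f_i"}
KERNING = {("A", "V"): -80}
DEFAULT_ADVANCE = 500


def shape(text: str) -> list[Glyph]:
    """Turn a string into positioned glyphs: ligatures first, then kerning."""
    # Pass 1: map code points to glyph names, merging ligature pairs.
    names = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in LIGATURES:
            names.append(LIGATURES[pair])
            i += 2
        else:
            names.append(text[i])
            i += 1
    glyphs = [Glyph(n, ADVANCES.get(n, DEFAULT_ADVANCE)) for n in names]

    # Pass 2: kerning shortens (or lengthens) the advance of the left glyph.
    for left, right in zip(glyphs, glyphs[1:]):
        left.advance += KERNING.get((left.name, right.name), 0)
    return glyphs
```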
|
| OpenType shaping is currently defined in terms of a whole bunch
| of rewrite rules, with some script-specific knowledge (such as
| reordering vowels in Indic) added in by the shaping engine. The
| original idea is that it would be declarative and fairly easy for
| font designers to work with, but in practice it's pretty clunky,
| and doesn't scale well as complexity goes up. It's also slow to
| evaluate on modern computers because sequence matching is very
| branchy, and it's not unusual to require dozens of passes.
|
| A particularly striking example is hieroglyphics. These compose
| multiple elements together into a block, with rules not unlike
| CSS grid. Sometimes there's vertical stacking, sometimes
| horizontal, occasionally more complex interactions like nestling
| inside an L. It _is_ possible to encode this into OpenType rules,
| but it's very much a hack - at heart you need to do fairly
| simple geometry calculations to add up the total widths and
| divide them proportionally, but think about writing that as a
| regex and you'll get some idea how it comes out.
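|
| To illustrate how simple that geometry is in ordinary code, here
| is a rough sketch of the "add up the widths and divide them
| proportionally" step for one horizontal row. The function name
| and the flat list of widths are assumptions made for this
| example; real hieroglyphic layout also handles stacking and
| nesting.
|

```python
def fit_row(widths: list[float], block_width: float) -> list[tuple[float, float]]:
    """Scale a horizontal row of elements so that it exactly fills block_width.

    Returns one (x_offset, scaled_width) pair per element, in order.
    """
    total = sum(widths)
    if total <= 0:
        raise ValueError("row must have a positive total width")
    scale = block_width / total  # one shared factor keeps the proportions

    placed = []
    x = 0.0
    for w in widths:
        placed.append((x, w * scale))
        x += w * scale
    return placed
```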
|
| This proposal replaces the OpenType shaping rules with a call
| into WASM, where you can do these sorts of calculations
| straightforwardly and in a single pass. It is another Turing
| complete language (as is OpenType shaping, as proved by Behdad a
| few years ago, and TrueType hints), but there are excellent
| off-the-shelf implementations, and it's well known how to run it
| securely sandboxed.
|
| Even for Latin, this kind of thing would be useful for making a
| font that looks like real handwriting, for example. You can fake
| that in OpenType to a certain extent, but it goes beyond what the
| format was designed to handle.
|
| This is a first cut, as it only affects positioning of premade
| glyphs. One thing I'd like to see going forward is adjusting
| variation parameters. As an example, a typical Devanagari (Hindi)
| font has 6 or so different widths of "i" depending on the width
| of the consonant cluster it composes with (so ri or lki). With
| variation connected to shaping, you could have one glyph of
| variable width, and the shaping engine could just set the
| variation to the right value, and with greater precision to boot.
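|
| A hedged sketch of what "set the variation to the right value"
| could look like. It assumes a single hypothetical width axis
| along which the glyph's width changes linearly between
| min_width and max_width; none of these names come from the
| proposal, which does not currently expose variations.
|

```python
def axis_value_for_width(target: float, min_width: float, max_width: float) -> float:
    """Return a normalized axis coordinate in [0, 1] for the wanted width.

    Assumes the glyph width interpolates linearly along the axis, and
    clamps targets that fall outside the range the glyph can reach.
    """
    if max_width <= min_width:
        raise ValueError("max_width must be greater than min_width")
    t = (target - min_width) / (max_width - min_width)
    return min(1.0, max(0.0, t))
```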
|
| If I were designing a new font format from scratch, this is
| _definitely_ the way I'd do it. I'm excited to see where it
| goes.
| pavlov wrote:
| _> "The original idea is that it would be declarative and
| fairly easy for font designers to work with, but in practice
| it's pretty clunky, and doesn't scale well as complexity goes
| up."_
|
| A sentence that also applies to CSS. Declarative systems for
| design tend to fail.
|
| I dream of the day when baroque CSS layout rules can also be
| replaced with tiny WASM programs for computing the exact layout
| your application needs.
| rikroots wrote:
| I was researching how to get a WASMified version of harfbuzz into
| my frontend 2D canvas library (because the text implementation in
| the 2D Canvas API is genuinely dire) when I discovered this issue
| had already been addressed back in 2019, when the guy who
| develops Photopea asked the question and got a positive answer:
| https://github.com/harfbuzz/harfbuzzjs/issues/10
| lioeters wrote:
| I'll try a summary: WASM shaper is part of Harfbuzz, a text
| shaping engine for font rendering. It allows embedding WASM code
| in the font for influencing the way text is rendered into glyphs.
|
| > The WASM code inside a font is expected to export a function
| called shape which takes five int32 arguments and returns an
| int32 status value.
|
| > ..The general goal of WASM shaping involves receiving and
| manipulating a buffer contents structure, which is an array of
| infos and positions (as defined below). Initially this buffer
| will represent an input string in Unicode codepoints. By the end
| of your shape function, it should represent a set of glyph IDs
| and their positions.
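|
| As a rough model of the quoted description, and not the actual
| Harfbuzz API: the class and field names below are illustrative,
| CMAP and ADVANCE are made-up font data, and this shape takes
| the buffer directly instead of five int32 tokens. Treating a
| nonzero return as success is also an assumption.
|

```python
from dataclasses import dataclass, field


@dataclass
class GlyphInfo:
    codepoint: int  # Unicode code point on input, glyph ID on output
    cluster: int    # index of the source character this entry came from


@dataclass
class GlyphPosition:
    x_advance: int = 0
    y_advance: int = 0
    x_offset: int = 0
    y_offset: int = 0


@dataclass
class BufferContents:
    info: list[GlyphInfo] = field(default_factory=list)
    position: list[GlyphPosition] = field(default_factory=list)


# Hypothetical font data: code point -> glyph ID, glyph ID -> advance.
CMAP = {ord("a"): 1, ord("b"): 2}
ADVANCE = {0: 500, 1: 550, 2: 600}  # glyph 0 stands in for "missing glyph"


def buffer_from_text(text: str) -> BufferContents:
    """Build the initial buffer: one info/position pair per code point."""
    buf = BufferContents()
    for index, ch in enumerate(text):
        buf.info.append(GlyphInfo(codepoint=ord(ch), cluster=index))
        buf.position.append(GlyphPosition())
    return buf


def shape(buf: BufferContents) -> int:
    """Rewrite the buffer in place from code points to glyph IDs and advances.

    Returns 1 on success and 0 on failure (an assumed status convention).
    """
    if len(buf.info) != len(buf.position):
        return 0
    for info, pos in zip(buf.info, buf.position):
        info.codepoint = CMAP.get(info.codepoint, 0)
        pos.x_advance = ADVANCE[info.codepoint]
    return 1
```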
| slimsag wrote:
| Font rendering is incredibly complex, perhaps one of the most
| complex things in computer software.
|
| It already required execution of a virtual machine:
|
| > TrueType systems include a virtual machine that executes
| programs inside the font, processing the "hints" of the glyphs,
| in TrueType called "instructions". These distort the control
| points which define the outline, with the intention that the
| rasterizer produce fewer undesirable features on the glyph.
|
| Now it just _also_ requires potential execution of arbitrary
| WASM code, too.
| tadfisher wrote:
| There have been numerous CVEs in various text-rendering
| engines for just this reason, and that's because TTF was
| invented before it became common to render arbitrary fonts
| downloaded from the Internet. A new format with a
| well-defined WASM interface would be a welcome development; bonus
| points for a TTF-to-$newFormat conversion tool.
| bradrn wrote:
| Note also:
|
| > Specifically, Harfbuzz is purely responsible for _shaping_;
| although Harfbuzz does have APIs for accessing glyph outlines,
| typically other libraries in the free software text rendering
| stack are responsible for text segmentation into runs, outline
| scaling and rasterizing, setting text on lines, and so on.
|
| > Harfbuzz is therefore restricted to turning a buffer of
| codepoints for a segmented run of the same script, language,
| font, and variation settings, into glyphs and positioning them.
| This is also all that you can do with the WASM shaper; you can
| influence the process of mapping a string of characters into an
| array of glyphs, you can determine how those glyphs are
| positioned and their advance widths, but you cannot manipulate
| outlines, variations, line breaks, or affect text layout
| between texts of different font, variation, language, script or
| OpenType feature selection.
| pdpi wrote:
| Perhaps counterintuitively, this is _exactly_ the sort of
| use case WASM is meant for. A VM that needs to be simultaneously
| lightweight/cheap and very high performance (because it's
| being used to do the heavy lifting for a slower/heavier
| runtime), but is still expected to run untrusted code and is
| therefore thoroughly sandboxed.
___________________________________________________________________
(page generated 2023-07-09 23:02 UTC)