[HN Gopher] The Web Assembly Shaper
       ___________________________________________________________________
        
       The Web Assembly Shaper
        
       Author : panic
       Score  : 58 points
       Date   : 2023-07-09 08:01 UTC (15 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | nynx wrote:
       | This is a great use case for WebAssembly.
        
       | [deleted]
        
       | lyu07282 wrote:
       | This gives me JavaScript-in-PDF vibes, but it's probably very
       | useful for highly complex writing systems. Also goes to show what
       | an incredibly complex subject this is, Harfbuzz is pretty much
       | the only open source text shaping library that does it properly
       | as far as I can tell. Behdad Esfahbod [1] is probably one of the
       | most important developers on the planet considering how
       | omnipresent his work has been to this day. Harfbuzz is used in
       | Firefox, Chrome and Safari for example.
       | 
       | [1] https://en.wikipedia.org/wiki/Behdad_Esfahbod
        
         | jfk13 wrote:
         | > Harfbuzz is used in Firefox, Chrome and Safari for example
         | 
         | Not to dispute harfbuzz's significance -- it's a crucial
         | component of much of the software we use every day -- but are
         | you sure about Safari? I thought it relied on Apple's Core Text
         | framework to handle font shaping.
        
           | lyu07282 wrote:
           | Not sure; WebKit does use Harfbuzz, though.
        
         | armitron wrote:
         | Safari (and macOS) doesn't use Harfbuzz.
        
         | baybal2 wrote:
         | [dead]
        
       | raphlinus wrote:
       | I'll give some context. Shaping is converting a string of Unicode
       | code points into a sequence of glyphs, each with an advance and
       | also finer positioning. For typical Latin fonts, shaping is
       | straightforward; each code point generally corresponds to one
       | glyph, but sometimes there are ligatures ("fi" is often one
       | glyph), and then there is kerning, usually placing pairs of
       | glyphs such as AV closer together if there would be a gap between
       | them. But for complex fonts such as Nastaliq (used in writing
       | Urdu), the rules get a _lot_ more complicated.
       | 
       | OpenType shaping is currently defined in terms of a whole bunch
       | of rewrite rules, with some script-specific knowledge (such as
       | reordering vowels in Indic) added in by the shaping engine. The
       | original idea is that it would be declarative and fairly easy for
       | font designers to work with, but in practice it's pretty clunky,
       | and doesn't scale well as complexity goes up. It's also slow to
       | evaluate on modern computers because sequence matching is very
       | branchy, and it's not unusual to require dozens of passes.
       | 
       | A particularly striking example is hieroglyphics. These compose
       | multiple elements together into a block, with rules not unlike
       | CSS grid. Sometimes there's vertical stacking, sometimes
       | horizontal, occasionally more complex interactions like nestling
       | inside an L. It _is_ possible to encode this into OpenType rules,
       | but it's very much a hack - at heart you need to do fairly
       | simple geometry calculations to add up the total widths and
       | divide them proportionally, but think about writing that as a
       | regex and you'll get some idea how it comes out.
       | 
       | This proposal replaces the OpenType shaping rules with a call
       | into WASM, where you can do these sorts of calculations
       | straightforwardly and in a single pass. It is another Turing
       | complete language (as is OpenType shaping, as proved by Behdad a
       | few years ago, and TrueType hints), but there are excellent off-
       | the-shelf implementations and it's well known how to run it
       | securely sandboxed.
       | 
       | Even for Latin, this kind of thing would be useful for making a
       | font that looks like real handwriting, for example. You can fake
       | that in OpenType to a certain extent, but it goes beyond what the
       | format was designed to handle.
       | 
       | This is a first cut, as it only affects positioning of premade
       | glyphs. One thing I'd like to see going forward is adjusting
       | variation parameters. As an example, a typical Devanagari (Hindi)
       | font has 6 or so different widths of "i" depending on the width
       | of the consonant cluster it composes with (so ri or lki). With
       | variation connected to shaping, you could have one glyph of
       | variable width, and the shaping engine could just set the
       | variation to the right value, and with greater precision to boot.
       | 
       | If I were designing a new font format from scratch, this is
       | _definitely_ the way I'd do it. I'm excited to see where it
       | goes.
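
       A minimal sketch of the "add up the total widths and divide them
       proportionally" step from the hieroglyph example above. It is plain
       Python, not the Harfbuzz WASM interface; the function name fit_row,
       the sign widths and the block width of 1000 units are all made up
       for illustration.

```python
from fractions import Fraction


def fit_row(widths, block_width):
    """Scale a horizontal row of signs so it exactly fills block_width.

    Returns one (x_offset, scaled_width) pair per sign, in input order.
    All names and units here are hypothetical.
    """
    total = sum(widths)
    if total <= 0:
        raise ValueError("row must have a positive total width")
    # One scale factor for the whole row keeps the signs in proportion.
    scale = Fraction(block_width, total)
    placed = []
    x = Fraction(0)
    for w in widths:
        scaled = w * scale
        placed.append((x, scaled))
        x += scaled  # the next sign starts where this one ends
    return placed


# Three signs with natural widths 600, 300 and 300 squeezed into 1000 units.
row = fit_row([600, 300, 300], 1000)
for x_offset, width in row:
    print(float(x_offset), float(width))
```

       In an imperative language this is a sum, a division and a loop.
       Expressed as sequence-matching rewrite rules, each combination of
       widths would have to be handled as its own case, which is the
       clunkiness the comment describes.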
        
         | pavlov wrote:
         | _> "The original idea is that it would be declarative and
         | fairly easy for font designers to work with, but in practice
         | it's pretty clunky, and doesn't scale well as complexity goes
         | up."_
         | 
         | A sentence that also applies to CSS. Declarative systems for
         | design tend to fail.
         | 
         | I dream of the day when baroque CSS layout rules can also be
         | replaced with tiny WASM programs for computing the exact layout
         | your application needs.
        
       | rikroots wrote:
       | I was researching how to get a WASMified version of harfbuzz into
       | my frontend 2d canvas library (because: text implementation in
       | the 2D Canvas API is genuinely dire) when I discovered this issue
       | had already been addressed back in 2019, when the guy who
       | develops Photopea asked the question and got a positive answer -
       | https://github.com/harfbuzz/harfbuzzjs/issues/10
        
       | lioeters wrote:
       | I'll try a summary: WASM shaper is part of Harfbuzz, a text
       | shaping engine for font rendering. It allows embedding WASM code
       | in the font for influencing the way text is rendered into glyphs.
       | 
       | > The WASM code inside a font is expected to export a function
       | called shape which takes five int32 arguments and returns an
       | int32 status value.
       | 
       | > ..The general goal of WASM shaping involves receiving and
       | manipulating a buffer contents structure, which is an array of
       | infos and positions (as defined below). Initially this buffer
       | will represent an input string in Unicode codepoints. By the end
       | of your shape function, it should represent a set of glyph IDs
       | and their positions.
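
       To make the buffer idea in the quote concrete, here is a toy model
       in Python: the buffer starts as Unicode code points and ends as
       glyph IDs with advances. This is only a sketch and is not the real
       Harfbuzz WASM interface (which, per the quote, exports a shape
       function taking five int32 arguments). The Item type, the font
       tables and every glyph ID, advance and kerning value below are
       invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical font data; none of these values come from a real font.
CMAP = {"A": 1, "V": 2, "f": 3, "i": 4}       # character -> glyph ID
LIGATURES = {(3, 4): 5}                       # f + i -> one "fi" glyph
ADVANCE = {1: 650, 2: 640, 3: 300, 4: 250, 5: 520}
KERNING = {(1, 2): -80, (2, 1): -80}          # tighten the AV / VA pairs
DEFAULT_ADVANCE = 500                         # used for unmapped glyphs


@dataclass
class Item:
    id: int             # code point on input, glyph ID on output
    cluster: int        # index of the source character
    x_advance: int = 0  # filled in by shape()


def make_buffer(text):
    """Build the input buffer: one item per Unicode code point."""
    return [Item(ord(ch), i) for i, ch in enumerate(text)]


def shape(buffer):
    """Turn a buffer of code points into glyph IDs with advances."""
    # 1. Map each code point to a glyph ID (0 stands for "missing glyph").
    glyphs = [Item(CMAP.get(chr(it.id), 0), it.cluster) for it in buffer]

    # 2. Ligature substitution: replace known glyph pairs with one glyph.
    out = []
    i = 0
    while i < len(glyphs):
        pair = (glyphs[i].id, glyphs[i + 1].id) if i + 1 < len(glyphs) else None
        if pair in LIGATURES:
            out.append(Item(LIGATURES[pair], glyphs[i].cluster))
            i += 2
        else:
            out.append(glyphs[i])
            i += 1

    # 3. Positioning: base advance plus pair kerning with the next glyph.
    for j, g in enumerate(out):
        g.x_advance = ADVANCE.get(g.id, DEFAULT_ADVANCE)
        if j + 1 < len(out):
            g.x_advance += KERNING.get((g.id, out[j + 1].id), 0)
    return out


shaped = shape(make_buffer("AVfi"))
for g in shaped:
    print(g.id, g.cluster, g.x_advance)
```

       Four code points go in and three glyphs come out: "f" and "i"
       merge into one ligature glyph, and the advance of "A" shrinks so
       that "V" sits closer to it.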
        
         | slimsag wrote:
         | Font rendering is incredibly complex, perhaps one of the most
         | complex things in computer software.
         | 
         | It already required execution of a virtual machine:
         | 
         | > TrueType systems include a virtual machine that executes
         | programs inside the font, processing the "hints" of the glyphs,
         | in TrueType called "instructions". These distort the control
         | points which define the outline, with the intention that the
         | rasterizer produce fewer undesirable features on the glyph.
         | 
         | Now it just _also_ requires potential execution of arbitrary
         | WASM code, too.
        
           | tadfisher wrote:
           | There have been numerous CVEs in various text-rendering
           | engines for just this reason, and that's because TTF was
           | invented before it became common to render arbitrary fonts
           | downloaded from the Internet. A new format with a well-
           | defined WASM interface would be a welcome development; bonus
           | points for a TTF-to-$newFormat conversion tool.
        
         | bradrn wrote:
         | Note also:
         | 
         | > Specifically, Harfbuzz is purely responsible for _shaping_;
         | although Harfbuzz does have APIs for accessing glyph outlines,
         | typically other libraries in the free software text rendering
         | stack are responsible for text segmentation into runs, outline
         | scaling and rasterizing, setting text on lines, and so on.
         | 
         | > Harfbuzz is therefore restricted to turning a buffer of
         | codepoints for a segmented run of the same script, language,
         | font, and variation settings, into glyphs and positioning them.
         | This is also all that you can do with the WASM shaper; you can
         | influence the process of mapping a string of characters into an
         | array of glyphs, you can determine how those glyphs are
         | positioned and their advance widths, but you cannot manipulate
         | outlines, variations, line breaks, or affect text layout
         | between texts of different font, variation, language, script or
         | OpenType feature selection.
        
         | pdpi wrote:
         | Perhaps counterintuitively, this is _exactly_ the sort of
         | use case WASM is meant for. A VM that needs to be simultaneously
         | lightweight/cheap and very high performance (because it's
         | being used to do the heavy lifting for a slower/heavier
         | runtime), but is still expected to run untrusted code and is
         | therefore thoroughly sandboxed.
        
       ___________________________________________________________________
       (page generated 2023-07-09 23:02 UTC)