suckless.org

       Implement the Unicode Bidirectional Algorithm (UAX #9) - libgrapheme - unicode string library
       git clone git://git.suckless.org/libgrapheme
       ---
       commit 5998352d2d2e6e37531548f8e986abae5ff8ef02
       parent dd15fea026c3e0b389381ae8cc08e0f39fa1a8f7
       Author: Laslo Hunhold <dev@frign.de>
       Date:   Tue, 25 Oct 2022 13:20:47 +0200
       
       Implement the Unicode Bidirectional Algorithm (UAX #9)
       
       To be frank, I had never heard of this until I started learning more
       about Unicode, but it is an absolute must for all languages written
       from right to left (Hebrew, Arabic, Farsi, etc.) and for any case
       where you mix RTL and LTR languages.
       
       The Unicode Bidirectional Algorithm is the normative procedure you apply
       to a string to obtain embedding levels, which can then be used to
       reorder the string such that you obtain the proper reading direction.
       The central aspect is that strings are always stored in logical
       (reading) order in memory and only reordered for presentation on the
       screen.
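
       To illustrate how embedding levels drive the reordering, here is a
       minimal sketch of rule L2 of UAX #9: from the highest level down to
       the lowest odd level, every maximal run of characters at that level
       or higher is reversed. The names visual_order() and reverse_range()
       are made up for this illustration and are not part of libgrapheme;
       the sketch also assumes that all levels are non-negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the half-open range [from, to) of the index array. */
static void
reverse_range(size_t *idx, size_t from, size_t to)
{
	while (from + 1 < to) {
		size_t tmp = idx[from];

		idx[from++] = idx[--to];
		idx[to] = tmp;
	}
}

/*
 * Hypothetical helper (not libgrapheme API): fill order[0..len) with the
 * logical indices of a line in visual order, given one resolved embedding
 * level per code point. Assumption: every level is >= 0.
 */
static void
visual_order(const int8_t *level, size_t len, size_t *order)
{
	int max = 0, min_odd = INT8_MAX, l;
	size_t i, start;

	for (i = 0; i < len; i++) {
		assert(level[i] >= 0);
		order[i] = i;
		if (level[i] > max)
			max = level[i];
		if ((level[i] & 1) && level[i] < min_odd)
			min_odd = level[i];
	}

	/* with no odd level present, min_odd > max and nothing is reversed */
	for (l = max; l >= min_odd; l--) {
		for (i = 0; i < len;) {
			if (level[i] < l) {
				i++;
				continue;
			}
			/* reverse the maximal run of levels >= l */
			start = i;
			while (i < len && level[i] >= l)
				i++;
			reverse_range(order, start, i);
		}
	}
}
```

       For the levels 0 0 0 0 1 1 1 (four LTR code points followed by three
       RTL ones) this yields the order 0 1 2 3 6 5 4: the string in memory
       is untouched and only the presentation order changes.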
       
       As far as I know, the only widely used implementations of the
       algorithm are ICU and GNU fribidi, and as usual it's pretty convoluted
       to use them. There are many memory allocations, kitchen-sink-madness
       and legacy cruft, but the demand is there (there's even a bidi-patch
       for dwm[0]).
       
       What's special about this implementation? There are no memory
       allocations at runtime. The user provides an array of 32-bit integers,
       which is then filled with the embedding levels. The levels themselves
       only range from -1 to 125 (by the standard!) and would fit in a signed
       8-bit integer, but the algorithm naturally needs a scratchpad to store
       processing data, hence the wider element type.
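
       To make the scratchpad idea concrete, here is a sketch of how one
       32-bit slot per code point could carry both a signed 8-bit level and
       24 bits of working state. The bit layout and the slot_*() names are
       purely assumptions for illustration; the layout this commit actually
       uses may well differ.

```c
#include <stdint.h>

/* Hypothetical layout: bits 0-7 hold the level, bits 8-31 hold scratch. */
#define SLOT_LEVEL_MASK  UINT32_C(0x000000FF)
#define SLOT_SCRATCH_MAX UINT32_C(0x00FFFFFF)

/* Pack a level in [-128, 127] and 24 bits of scratch data into one slot. */
static uint_least32_t
slot_pack(int level, uint_least32_t scratch)
{
	/* conversion to unsigned is modulo 2^N, so -1 becomes 0xFF here */
	return ((uint_least32_t)level & SLOT_LEVEL_MASK) |
	       ((scratch & SLOT_SCRATCH_MAX) << 8);
}

/* Recover the signed level without relying on narrowing casts. */
static int
slot_level(uint_least32_t slot)
{
	int v = (int)(slot & SLOT_LEVEL_MASK);

	return (v >= 128) ? v - 256 : v;
}

/* Recover the 24 bits of scratch data. */
static uint_least32_t
slot_scratch(uint_least32_t slot)
{
	return (slot >> 8) & SLOT_SCRATCH_MAX;
}
```

       With such a layout the caller's buffer doubles as working memory
       during processing and as the level output afterwards, so nothing has
       to be allocated behind the user's back.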
       
       A complication of the algorithm is that, at some point, you have to
       break the paragraph into lines, and the line breaks affect the level
       determination. GNU fribidi and ICU make this very complicated and
       hard to understand. The API is not final as you see it here, but the
       final process will be (each number corresponding to a function):
       
               1) "preprocessing" the string up to the part where the algorithm
                  does not depend on the line breaks
               2) determining line embedding levels for a line
                  (by specifying the preprocessed data buffer and an output
                  level-buffer)
               3) reordering a line (by specifying the preprocessed data buffer
                  and an output string that is allowed to be the input string)
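
       As an example of why step 2 depends on the line breaks, consider the
       part of rule L1 of UAX #9 that resets whitespace at the end of a line
       to the paragraph embedding level. The sketch below implements only
       that part of the rule; line_levels() and enum toy_class are invented
       for this illustration and do not reflect the API of this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy classification, just enough for this sketch. */
enum toy_class { TOY_OTHER, TOY_WS };

/*
 * Hypothetical helper (not libgrapheme API): derive the levels of the
 * line [start, end) from paragraph-wide preprocessed levels. Whitespace
 * at the end of the line is reset to the paragraph level 'base'
 * (simplified rule L1; separators and isolates are ignored here).
 */
static void
line_levels(const int8_t *para, const enum toy_class *cls,
            size_t start, size_t end, int8_t base, int8_t *out)
{
	size_t i, n = end - start;

	/* start from the line-independent, preprocessed levels */
	for (i = 0; i < n; i++)
		out[i] = para[start + i];

	/* walk backwards over the trailing whitespace of this line */
	for (i = n; i > 0 && cls[start + i - 1] == TOY_WS; i--)
		out[i - 1] = base;
}
```

       Take five code points with preprocessed levels 1 1 1 1 1 in a
       paragraph of level 0, where the third one is a space. Kept on one
       line, the space stays at level 1. If the line is broken right after
       the space, the first line gets the levels 1 1 0, which is why the
       final levels can only be determined once the line breaks are known.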
       
       Conformance is obviously a high priority: there are over a million
       automated conformance tests for the bidirectional algorithm, split
       across the files BidiTest.txt and BidiCharacterTest.txt, which are
       automatically parsed into the header gen/bidirectional-test.h.
       
       Currently, only BidiTest.txt is used for tests (all of which we pass),
       given that bracket pairs have not been implemented yet. This and
       (maybe) Arabic shaping are what is left to be implemented, but this
       here is already a big step.
       
       One more note: Yes, the data files are very large, but they compress
       down very well and the tarball stays below 800K. It's very important
       to me that there's no need to pull any data from the web for compilation
       or testing for obvious reasons.
       
       [0]:https://dwm.suckless.org/patches/bidi/
       
       Signed-off-by: Laslo Hunhold <dev@frign.de>
       
       