* * * * *

Adventures in Utext

> There is one point on the ASCII (American Standard Code for Information
> Interchange) ↔︎ JS (JavaScript) spectrum that I haven’t seen, and it’s one
> that, as I use Unicode in more complex ways on Gwern.net and have learned
> how many obscure features or characters Unicode has, I increasingly think
> has been neglected: only UTF (Unicode Transformation Format)-8 text
> rendered by a monospace font. Not ASCII, not a weird subset of SGML
> (Standard Generalized Markup Language), not troff, not raw terminal codes,
> not bitmaps encoded in ASCII—just UTF-8. This document format does only
> what pure Unicode text can do—but does everything that pure Unicode can do,
> which turns out to be a lot. What if we take Unicode literally, but not
> seriously?
>
> Your typical plain text output strips all formatting. At the most
> ambitious, it might have a Unicode superscript or fraction. But we can do
> so much more!
>
> “Utext: Rich Unicode Documents · Gwern.net [1]”

That was an interesting read (your mileage may vary). To generate the
gopher and Gemini versions of my blog, I parse the HTML (HyperText Markup
Language) [2] and generate either plain text (for gopher) or Gemtext for
Gemini. And I'm still not entirely happy with the output. For emphasized
text, I would translate that to “*emphasized*”, which is … okay, I guess?
And for [DELETED-deleted-DELETED] text, which was harder to deal with, I
ended up with “[DELETED-deleted-DELETED]” text. There's no excuse for that.

But after reading about Utext, and Unicode's COMBINING SHORT STROKE OVERLAY
[3] and COMBINING LOW LINE [4], I thought I might try using those for some
typographical niceties that you don't normally get with plain text. And
that's when I learned that not all virtual terminals support all of Unicode
all that well. And wrapping text is … not that trivial anymore [5].

Ah well. For now, it seems to be working, but it remains to be seen if I
like the results.
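
For the curious, here is a minimal sketch of the idea in Python. It is not
the code that generates this blog; the names `overlay`, `strike`, `underline`
and `display_width` are made up for illustration. It appends a combining
mark after each character, and it shows why wrapping gets harder: the string
gets longer in code points while its width on screen stays the same.

```python
import unicodedata

SHORT_STROKE_OVERLAY = "\u0335"  # COMBINING SHORT STROKE OVERLAY
LOW_LINE = "\u0332"              # COMBINING LOW LINE


def overlay(text, mark):
    """Append a combining mark after every non-whitespace character."""
    return "".join(ch if ch.isspace() else ch + mark for ch in text)


def strike(text):
    """Struck-through text via COMBINING SHORT STROKE OVERLAY."""
    return overlay(text, SHORT_STROKE_OVERLAY)


def underline(text):
    """Underlined text via COMBINING LOW LINE."""
    return overlay(text, LOW_LINE)


def display_width(text):
    """Count columns, skipping combining marks (they occupy no cell).

    Simplification: wide (East Asian) characters and other zero-width
    characters are not handled here.
    """
    return sum(1 for ch in text if not unicodedata.combining(ch))


deleted = strike("deleted")
print(deleted)                                # rendering depends on the terminal
print(len(deleted), display_width(deleted))   # 14 code points, 7 columns
```

A line wrapper that measures text in code points would treat the struck
word as 14 characters wide and break lines too early, so the wrapping code
has to measure in columns, something like `display_width()`, instead.
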

Update on Friday, December 8th, 2023

I reverted this change due to issues [6].

[1] https://gwern.net/utext
[2] gopher://gopher.conman.org/0Phlog:2021/12/06.2
[3] https://en.wikipedia.org/wiki/Strikethrough#Unicode
[4] https://en.wikipedia.org/wiki/Underscore#Unicode
[5] https://www.unicode.org/reports/tr14/
[6] gopher://gopher.conman.org/0Phlog:2023/12/08.1

Email Sean Conner at sean@conman.org.