hngopher.com

       [HN Gopher] Unicode 15.0 Slide Show
       ___________________________________________________________________
        
       Unicode 15.0 Slide Show
        
       Author : optimalsolver
       Score  : 76 points
       Date   : 2023-02-26 16:10 UTC (6 hours ago)
        
 (HTM) web link (www.babelstone.co.uk)
 (TXT) w3m dump (www.babelstone.co.uk)
        
       | paulirish wrote:
       | Perhaps a better match for some folks' expectations, the Unicode
       | Consortium's YouTube has plenty of talks on
       | https://youtube.com/@unicode. Low view counts but often quite
       | fascinating
        
       | raffy wrote:
       | I have a tool which lets you scroll through all characters in a
       | large document. It also shows names, scripts, and IDNA-like
       | status.
       | 
       | https://adraffy.github.io/ens-normalize.js/test/chars.html
        
       | pavlov wrote:
       | Wikipedia has the "no original research" rule. Unicode really
       | should have had a similar "no original designs" rule.
       | 
       | Too late now -- it's become an annually refreshed collection of
       | fun fashionable clip art instead of an impartial repository of
       | humankind's symbols.
        
         | chrisshroba wrote:
         | Why shouldn't emoji be an annually refreshed collection of fun
         | fashionable clip art? That's exactly how users use emoji - for
         | culturally relevant references. As a user, I get excited to see
         | all the new emoji each release!
        
           | csande17 wrote:
           | The reason why this is a bad idea is that it's hard to remove
           | emoji when they go _out_ of fashion. You can't just pull
           | characters out of the set without breaking the backwards
           | compatibility assumptions made by every programming language
           | that supports Unicode identifiers, and you can't change the
           | meaning of characters without a painful, politically
           | controversial collaboration between every large tech company
           | (see the pistol emoji).
           | 
           | Even ZWJ sequences are forever: we're still stuck with the
           | "eye in speech bubble" emoji despite the fact that it's a
           | logo for a defunct anti-bullying campaign that doesn't even
           | have a working website anymore.
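
A minimal Python sketch of what "ZWJ sequence" means for the emoji mentioned above, using only the standard library. The helper name `describe` is made up for illustration; the code point sequence is the one generally listed for "eye in speech bubble".

```python
import unicodedata

# "Eye in speech bubble" is not one code point but a sequence:
# EYE + VS16 + ZERO WIDTH JOINER + LEFT SPEECH BUBBLE + VS16.
EYE_IN_SPEECH_BUBBLE = "\U0001F441\uFE0F\u200D\U0001F5E8\uFE0F"

def describe(text):
    """Return (code point, name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

for code, name in describe(EYE_IN_SPEECH_BUBBLE):
    print(code, name)
```

Because the sequence is built from ordinary, permanently assigned code points, text containing it stays valid even if vendors stop drawing it as a single glyph.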
        
             | int_19h wrote:
             | Why does it matter, though? The symbols are already drawn,
             | and the code that handles the emoji range is already
             | written. As for "obsolete" emojis, regardless of the
             | original intent, they will be appropriated in no time if
             | some suitable use comes along; the official name in the
             | Unicode chart is merely a historical curiosity (and,
             | indeed, those names are wrong for many "legitimate"
             | characters, but can't be changed for the same back-compat
             | reasons).
        
         | Groxx wrote:
         | I think there's plenty of evidence that people use pictograms.
         | Just look at the millions of discord emoji and chat stickers.
         | In that respect, they're showing remarkable restraint in only
         | adding _twenty_.
         | 
         | tbh my bigger worry is simply around implementation scope and
         | complexity - kinda like with browsers, Unicode and text rendering
         | keep making it harder to create new competitors / complete
         | fonts / etc. Emoji are rather small in number and simple to
         | implement though, if somewhat costly to include all those
         | images / complex svgs.
        
         | tokinonagare wrote:
         | Unicode needs a split to separate the serious work of encoding
         | existing writing systems and the politically correct meme
         | glyphs that become weirder by the day.
        
           | jurimasa wrote:
           | What makes a writing system such, and why are emoji not a
           | writing system?
        
           | Eisenstein wrote:
           | What makes a meme glyph politically correct and why are they
           | in unicode?
        
           | Name_Chawps wrote:
           | This split already exists. Emoji are kept in a separate block
           | from other characters.
        
         | Thing123456 wrote:
         | Here's the blog post for the Unicode 15 release:
         | https://blog.unicode.org/2022/09/announcing-unicode-standard...
         | 
         | A tiny portion consists of "fun fashionable clip art." The vast
         | majority of the changes are new scripts and symbols that allow
         | people to correctly input existing texts.
        
         | avgcorrection wrote:
         | All threads need at least one irrelevant pet peeve-aside.
        
         | satvikpendem wrote:
         | So what? Unicode follows language and language is descriptivist
         | not prescriptivist.
        
         | legrande wrote:
         | > it's become an annually refreshed collection of fun
         | fashionable clip art instead of an impartial repository of
         | humankind's symbols
         | 
         | Constraints are sometimes good. It's great to see how something
         | like the Eggplant & Peach emoji got hijacked and used as a
         | sexual reference.
        
         | photochemsyn wrote:
         | You can hire a ghostwriter to write a biography about yourself
         | and pay a publishing house for a limited edition print run and
         | voila, you can have your own wikipedia page based on an
         | authorized source.
         | 
         | Wikipedia's view of what's 'original research' has always been a
         | bit murky, on top of that. Are the primary documents that
         | historians rely on acceptable as sources, or only if they've
         | been synthesized by an 'accredited historian' (whatever that
         | is) in book or research journal form? If publishing original
         | research is wrong, why publish a journalist's original research
         | which is incorporated into a newspaper article? What about
         | original research published as a blog post, is that now an
         | acceptable secondary source?
        
           | skissane wrote:
           | > You can hire a ghostwriter to write a biography about
           | yourself and pay a publishing house for a limited edition
           | print run and voila, you can have your own wikipedia page
           | based on an authorized source
           | 
           | It isn't as easy as you make it sound - to establish
           | notability, they don't accept self-published / vanity press
           | sources - so you'd need to get your book published by a
           | publisher with an established track record. That's a lot
           | harder - it is either something that money can't buy, or at
           | least you'd need a lot lot more money to buy it than self-
           | publishing charges
           | 
           | Furthermore, one high quality reliable source is generally
           | not considered enough for notability, they want multiple
           | sources. If you get your biography published, and then get
           | some journalists in established media outlets to publish
           | articles about it, you'll meet that hurdle too. But if you
           | manage that, you probably actually are notable, as
           | opposed to just some random nobody trying to buy their way
           | into Wikipedia
        
         | chungy wrote:
         | Unicode's stuck to that principle better than Wikipedia has
         | stuck to its principle.
        
         | kens wrote:
         | Unicode has completely separate rules for characters and
         | emojis. Characters and scripts generally need solid
         | documentation that they are real existing symbols. Emojis, on
         | the other hand, are accepted largely on how likely they are to
         | be used. My impression is that the Unicode committee would
         | prefer to deal with scripts and characters, but they got stuck
         | with emojis for historical reasons and that's what most people
         | care about.
         | 
         | Refs: https://www.unicode.org/emoji/proposals.html
         | http://www.unicode.org/pending/proposals.html
        
           | avgcorrection wrote:
           | No, man. Digital communication should follow whatever history
           | for inclusion that text before 1850 did. Write a `:P` (you
           | know: HN and emojis...) as monastic marginalia for three
           | hundred years and then maybe we'll spare one code point for
           | it.
        
       | dhosek wrote:
       | I kind of expected this to be more an overview of the new stuff
       | in Unicode 15.0. As the author of a Rust Unicode crate
       | (finl_unicode), I always like to dig through the release notes to
       | see what sort of strange new stuff is on offer.
        
       | PostOnce wrote:
       | Tangent:
       | 
       | I recognized the domain and tried to remember why, and now I
       | remember.
       | 
       | I'm working on a game, and babelstone.co.uk has probably the
       | world's most comprehensive (and high quality) set of runic fonts:
       | 
       | https://www.babelstone.co.uk/Fonts/
       | 
       | https://www.babelstone.co.uk/Fonts/Runic.html
       | 
       | https://www.babelstone.co.uk/Fonts/AngloSaxon.html
        
       | [deleted]
        
       | willm wrote:
       | I found this more entertaining than the new Avatar movie.
        
       | TheRealPomax wrote:
       | Andrew's Babelmap [1] is one of those applications that, if you
       | do anything text or typography related, is basically required
       | owning. With a donation, of course.
       | 
       | [1] https://www.babelstone.co.uk/Software/BabelMap.html
        
         | virtualritz wrote:
         | I'm usually ok with what macOS Character Viewer offers. I am
         | rarely on Windows and didn't know about BabelMap. It looks like
         | it fills the gap there.
         | 
         | I work mostly on Linux so I hacked a Character Viewer clone in
         | Rust over a weekend recently[1].
         | 
         | It just does what I need but I'm planning to add features to it
         | if I find them useful.
         | 
         | So I am curious: what functions does BabelMap offer that you
         | can't live without, especially as a typographer?
         | 
         | [1] https://github.com/virtualritz/glyphana
        
           | arm wrote:
           | Since you mentioned macOS, it would be remiss of me to not
           | mention UnicodeChecker:
           | 
           | https://earthlingsoft.net/UnicodeChecker/index.html
        
       | mycall wrote:
       | I would love to see someone make an image to unicode "curve
       | fitting" algorithm or converter, similar to ANSIDRAW.
        
         | hollasch wrote:
         | See https://shapecatcher.com/.
        
       | einpoklum wrote:
       | This is the most important part of Unicode for me:
       | 
       | https://www.unicode.org/reports/tr9/tr9-46.html
       | 
       | because I speak a right-to-left language. Whoever wants to write
       | an application involving text entry, and truly support
       | localization or internationalization, should take the time to
       | read at least section 3:
       | 
       | https://www.unicode.org/reports/tr9/tr9-46.html#Basic_Displa...
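
As a small illustration of the input the bidirectional algorithm works from, the Python sketch below looks up each character's bidirectional class with the standard library. It does not implement the algorithm itself, and the helper name `bidi_classes` is made up for this example.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each code point."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letter, Hebrew alef, Arabic alef, European digit,
# Arabic-Indic digit, and a space.
sample = "a\u05D0\u06271\u0661 "

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Strong classes (`L`, `R`, `AL`) set the direction of a run, while weak and neutral ones (`EN`, `AN`, `WS`) take their direction from context, which is why mixed digits and punctuation are where text entry tends to go wrong.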
        
       | phkahler wrote:
       | GNU unifont has the entire BMP but is a bitmap font. Is there an
       | equivalent monospaced scalable font we can use in GPL software?
        
         | politelemon wrote:
         | Noto Sans?
         | https://fonts.google.com/noto/specimen/Noto+Sans+Mono
        
       | troymc wrote:
       | Some random characters didn't render in my browser. Upon
       | inspection:
       | 
       | font-family: Georgia, Serif;
       | 
       | I don't think those fonts support all of Unicode. Google created
       | their Noto fonts [1] for this purpose; I wonder why those aren't
       | being used.
       | 
       | [1] https://en.wikipedia.org/wiki/Noto_fonts
        
         | mistrial9 wrote:
         | gentium is a font with a very large number of glyphs also
         | 
         | https://software.sil.org/gentium/
        
           | jfk13 wrote:
           | But only for Latin/Greek/Cyrillic scripts; it makes no claim
           | to be a pan-Unicode font (family).
        
         | jfk13 wrote:
         | Browsers will generally do "fallback" to some other font, if
         | the font(s) named in the CSS don't support the characters
         | present in the text. But for some of the rarer characters, you
         | may not have any available font that supports them.
        
           | Someone wrote:
           | See https://en.wikipedia.org/wiki/Fallback_font. It typically
           | isn't a browser feature, but an OS one.
           | 
           | Since 1998 MacOS has had a "last resort" font that has glyphs
           | (not necessarily unique) for every Unicode code point. They
           | donated it to Unicode
           | (https://en.wikipedia.org/wiki/Fallback_font#Unicode_Last_Res...),
           | so I expect most OSes running
           | full-blown modern browsers to have it or something similar
           | (those running smaller browser engines may be too space
           | constrained to have room for it)
        
         | abudabi123 wrote:
         | http://www.chinaknowledge.de/Literature/Science/shuowenjiezi.html
         | 
         | I have the noto fonts and ctext dot org's hana fonts but still
         | see tofu in the above page. Whatever font is used on the
         | iPhone's Pleco app the correctness depends on context where you
         | are in the app.
         | 
         | These two examples often are confused: Ri Yue
        
       | nanis wrote:
       | And yet there is still no unambiguous lower case "I" or upper
       | case "i".
        
         | ClumsyPilot wrote:
         | That's the job of a font, not encoding
        
           | nanis wrote:
           | No, the lack of dedicated codepoints, which is what makes
           | those mappings ambiguous, is due to the way Unicode decided to
           | save two codepoints for seemingly no good reason.
           | 
           | What should be _the_ value of `"I".lower()`? Or, "i".upper()?
           | 
           | And please don't bring up locales. The whole point of
           | accepting the complexity of Unicode is to be able to take a
           | document which stands on its own without external references.
           | 
           | > Early character encodings also conflicted with one another.
           | That is, two encodings could use the same number for two
           | different characters, or use different numbers for the same
           | character.
           | 
           | > The Unicode Standard provides a unique number for every
           | character, no matter what platform, device, application or
           | language.[1]
           | 
           | Those statements are outright lies: Unicode does not provide
           | a unique number for "upper case Turkish dotless i". Nor does
           | it provide one for "lower case Turkish dotted i".
           | 
           | If it did, it would be possible to correctly map "i" to "I"
           | or "İ" and "I" to "i" or "ı" without having to know anything
           | other than the source codepoint.
           | 
           | The font does not even come into play here.
           | 
           | [1]: https://unicode.org/standard/WhatIsUnicode.html
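
The ambiguity described above can be seen in a short Python sketch. Python's `str.lower()` and `str.upper()` apply the default, locale-independent mappings; `turkish_upper` and `turkish_lower` are hypothetical helpers written for this illustration, not a library API, and real software would more likely rely on a locale-aware library such as ICU.

```python
# Default (locale-independent) mappings, as Python's str methods apply them.
print("I".lower() == "i")             # True
print("i".upper() == "I")             # True
print("\u0131".upper() == "I")        # dotless i uppercases to ASCII I: True
print("\u0130".lower() == "i\u0307")  # dotted capital I becomes i + combining dot: True

def turkish_upper(text):
    """Uppercase with Turkish rules: i -> U+0130, dotless i (U+0131) -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

def turkish_lower(text):
    """Lowercase with Turkish rules: U+0130 -> i, I -> dotless i (U+0131)."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

print(turkish_upper("istanbul"))    # İSTANBUL
print(turkish_lower("DIYARBAKIR"))  # dıyarbakır
```

The helpers only give the right answer when the caller already knows the text is Turkish, which is exactly the external knowledge the comment objects to.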
        
             | Kwpolska wrote:
             | Unicode is for representing text, not allowing arbitrary
             | manipulation of it. It isn't the job of Unicode to encode
             | those relationships. Also, the Turkish `i` stuff is just
             | the tip of the iceberg. Should Unicode be able to round-
             | trip `'ß'.upper().lower()`? Keeping the existing
             | capitalization of ß -> SS, you need to define an "uppercase
             | S that used to be ß" character. Then there's the Dutch
             | `ij`, in which both characters are either uppercase or
             | lowercase (`Ij` at the start of a word is incorrect).
             | There's a ligature in Unicode, but it's only for
             | compatibility with some legacy keymaps. But is there a
             | point in adding a new version of "S" that a lot of software
             | would not recognize as equivalent to the plain old ASCII
             | "S" (and one might end up far away from a ß due to copy-
             | pasting or stuff), bringing weird bugs and security issues?
             | Should the Dutch throw out all their keyboards just so they
             | get a new key for the special IJ ligature?
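
The round-trip and Dutch cases above can be checked with Python's default case mappings. This is a sketch of how one implementation behaves, assuming only the standard library; the variable names are illustrative.

```python
SHARP_S = "\u00DF"       # ß, LATIN SMALL LETTER SHARP S
IJ_LIGATURE = "\u0133"   # LATIN SMALL LIGATURE IJ

# Uppercasing ß expands to two letters, so lowercasing cannot restore it.
round_trip = SHARP_S.upper().lower()
print(SHARP_S.upper() == "SS")  # True
print(round_trip == "ss")       # True
print(round_trip == SHARP_S)    # False

# Case-insensitive comparison is expected to use case folding instead.
print(SHARP_S.casefold() == "SS".casefold())  # True

# Dutch "ij" typed as two letters: only the first one gets capitalized.
print("ijsselmeer".capitalize())  # Ijsselmeer

# The compatibility ligature, by contrast, changes case as a single unit.
print(IJ_LIGATURE.upper() == "\u0132")  # True
```

Neither result is a bug in the code point assignments; both follow from treating case conversion as a per-character mapping with no knowledge of language or of the original spelling.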
        
             | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-02-26 23:00 UTC)