[HN Gopher] Unicode 15.0 Slide Show
___________________________________________________________________
Unicode 15.0 Slide Show
Author : optimalsolver
Score : 76 points
Date : 2023-02-26 16:10 UTC (6 hours ago)
(HTM) web link (www.babelstone.co.uk)
(TXT) w3m dump (www.babelstone.co.uk)
| paulirish wrote:
| Perhaps a better match for some folks' expectations, the Unicode
| Consortium's YouTube has plenty of talks on
| https://youtube.com/@unicode. Low view counts but often quite
| fascinating
| raffy wrote:
| I have a tool which lets you scroll through all characters in a
| large document. It also shows names, scripts, and IDNA-like
| status.
|
| https://adraffy.github.io/ens-normalize.js/test/chars.html
| pavlov wrote:
| Wikipedia has the "no original research" rule. Unicode really
| should have had a similar "no original designs" rule.
|
| Too late now -- it's become an annually refreshed collection of
| fun fashionable clip art instead of an impartial repository of
| humankind's symbols.
| chrisshroba wrote:
| Why shouldn't emoji be an annually refreshed collection of fun
| fashionable clip art? That's exactly how users use emoji - for
| culturally relevant references. As a user, I get excited to see
| all the new emoji each release!
| csande17 wrote:
| The reason why this is a bad idea is that it's hard to remove
| emoji when they go _out_ of fashion. You can't just pull
| characters out of the set without breaking the backwards
| compatibility assumptions made by every programming language
| that supports Unicode identifiers, and you can't change the
| meaning of characters without a painful, politically
| controversial collaboration between every large tech company
| (see the pistol emoji).
|
| Even ZWJ sequences are forever: we're still stuck with the
| "eye in speech bubble" emoji despite the fact that it's a
| logo for a defunct anti-bullying campaign that doesn't even
| have a working website anymore.
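
To make the ZWJ point concrete, here is a minimal sketch that takes the "eye in speech bubble" sequence apart. It assumes the sequence is U+1F441 U+FE0F U+200D U+1F5E8 U+FE0F and uses only the standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER glues emoji into one displayed glyph

# Assumed code point sequence for "eye in speech bubble".
eye_in_speech_bubble = "\U0001F441\uFE0F" + ZWJ + "\U0001F5E8\uFE0F"

# One user-visible emoji is really five separate code points.
names = [unicodedata.name(ch) for ch in eye_in_speech_bubble]
for ch, name in zip(eye_in_speech_bubble, names):
    print(f"U+{ord(ch):04X} {name}")

# Splitting on the joiner recovers the two component emoji, which is
# what a renderer without the combined glyph falls back to showing.
parts = eye_in_speech_bubble.split(ZWJ)
```

Because the sequence is built only from existing code points, retiring it would not free up any code point; it would only change how fonts render the combination.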
| int_19h wrote:
| Why does it matter, though? The symbols are already drawn,
| and the code that handles the emoji range is already
| written. As for "obsolete" emojis, regardless of the
| original intent, they will be appropriated in no time if
| some suitable use comes along; the official name in the
| Unicode chart is merely a historical curiosity (and,
| indeed, those names are wrong for many "legitimate"
| characters, but can't be changed for the same back-compat
| reasons).
| Groxx wrote:
| I think there's plenty of evidence that people use pictograms.
| Just look at the millions of discord emoji and chat stickers.
| In that respect, they're showing remarkable constraint in only
| adding _twenty_.
|
| tbh my bigger worry is simply around implementation scope and
| complexity - kinda like browsers, Unicode and text rendering is
| perpetually getting harder to create new competitors / complete
| fonts / etc. Emoji are rather small in number and simple to
| implement though, if somewhat costly to include all those
| images / complex svgs.
| tokinonagare wrote:
| Unicode needs a split to separate the serious work of encoding
| existing writing systems and the politically correct meme
| glyphs that become weirder by the day.
| jurimasa wrote:
| What makes a writing system such, and why emoji are not a
| writing system?
| Eisenstein wrote:
| What makes a meme glyph politically correct and why are they
| in unicode?
| Name_Chawps wrote:
| This split already exists. Emoji are kept in a separate block
| from other characters.
| Thing123456 wrote:
| Here's the blog post for the Unicode 15 release:
| https://blog.unicode.org/2022/09/announcing-unicode-standard...
|
| A tiny portion consists of "fun fashionable clip art." The vast
| majority of the changes are new scripts and symbols that allow
| people to correctly input existing texts.
| avgcorrection wrote:
| All threads need at least one irrelevant pet peeve-aside.
| satvikpendem wrote:
| So what? Unicode follows language and language is descriptivist
| not prescriptivist.
| legrande wrote:
| > it's become an annually refreshed collection of fun
| fashionable clip art instead of an impartial repository of
| humankind's symbols
|
| Constraints are sometimes good. It's great to see how something
| like the Eggplant & Peach emoji got hijacked and used as a
| sexual reference.
| photochemsyn wrote:
| You can hire a ghostwriter to write a biography about yourself
| and pay a publishing house for a limited edition print run and
| voila, you can have your own wikipedia page based on an
| authorized source.
|
| Wikipedia's view of what's 'original research' has always been a
| bit murky, on top of that. Are the primary documents that
| historians rely on acceptable as sources, or only if they've
| been synthesized by an 'accredited historian' (whatever that
| is) in book or research journal form? If publishing original
| research is wrong, why publish a journalist's original research
| which is incorporated into a newspaper article? What about
| original research published as a blog post, is that now an
| acceptable secondary source?
| skissane wrote:
| > You can hire a ghostwriter to write a biography about
| yourself and pay a publishing house for a limited edition
| print run and voila, you can have your own wikipedia page
| based on an authorized source
|
| It isn't as easy as you make it sound - to establish
| notability, they don't accept self-published / vanity press
| sources - so you'd need to get your book published by a
| publisher with an established track record. That's a lot
| harder - it is either something that money can't buy, or at
| least you'd need a lot lot more money to buy it than self-
| publishing charges
|
| Furthermore, one high quality reliable source is generally
| not considered enough for notability, they want multiple
| sources. If you get your biography published, and then get
| some journalists in established media outlets to publish
| articles about it, you'll meet that hurdle too. But if you
| manage that, you probably actually are notable, as
| opposed to just some random nobody trying to buy their way
| into Wikipedia
| chungy wrote:
| Unicode's stuck to that principle better than Wikipedia has
| stuck to its principle.
| kens wrote:
| Unicode has completely separate rules for characters and
| emojis. Characters and scripts generally need solid
| documentation that they are real existing symbols. Emojis, on
| the other hand, are accepted largely on how likely they are to
| be used. My impression is that the Unicode committee would
| prefer to deal with scripts and characters, but they got stuck
| with emojis for historical reasons and that's what most people
| care about.
|
| Refs: https://www.unicode.org/emoji/proposals.html
| http://www.unicode.org/pending/proposals.html
| avgcorrection wrote:
| No, man. Digital communication should follow whatever history
| for inclusion that text before 1850 did. Write a `:P` (you
| know: HN and emojis...) as monastic marginalia for three
| hundred years and then maybe we'll spare one code point for
| it.
| dhosek wrote:
| I kind of expected this to be more an overview of the new stuff
| in Unicode 15.0. As the author of a Rust Unicode crate
| (finl_unicode), I always like to dig through the release notes to
| see what sort of strange new stuff is on offer.
| PostOnce wrote:
| Tangent:
|
| I recognized the domain and tried to remember why, and now I
| remember.
|
| I'm working on a game, and babelstone.co.uk has probably the
| world's most comprehensive (and high quality) set of runic fonts:
|
| https://www.babelstone.co.uk/Fonts/
|
| https://www.babelstone.co.uk/Fonts/Runic.html
|
| https://www.babelstone.co.uk/Fonts/AngloSaxon.html
| [deleted]
| willm wrote:
| I found this more entertaining than the new Avatar movie.
| TheRealPomax wrote:
| Andrew's Babelmap [1] is one of those applications that, if you
| do anything text or typography related, is basically required
| owning. With a donation, of course.
|
| [1] https://www.babelstone.co.uk/Software/BabelMap.html
| virtualritz wrote:
| I'm usually ok with what macOS Character Viewer offers. I am
| rarely on Windows and didn't know about BabelMap. It looks like
| it fills the gap there.
|
| I work mostly on Linux so I hacked a Character Viewer clone in
| Rust over a weekend recently[1].
|
| It just does what I need but I'm planning to add features to it
| if I find them useful.
|
| So I am curious: what functions does BabelMap offer that you
| can't live without, especially as a typographer?
|
| [1] https://github.com/virtualritz/glyphana
| arm wrote:
| Since you mentioned macOS, it would be remiss of me to not
| mention UnicodeChecker:
|
| https://earthlingsoft.net/UnicodeChecker/index.html
| mycall wrote:
| I would love to see someone make an image to unicode "curve
| fitting" algorithm or converter, similar to ANSIDRAW.
| hollasch wrote:
| See https://shapecatcher.com/.
| einpoklum wrote:
| This is the most important part of Unicode for me:
|
| https://www.unicode.org/reports/tr9/tr9-46.html
|
| because I speak a right-to-left language. Whoever wants to write
| an application involving text entry, and truly support
| localization or internationalization, should take the time to
| read at least section 3:
|
| https://www.unicode.org/reports/tr9/tr9-46.html#Basic_Displa...
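
As a small illustration of what that section builds on, the sketch below looks up the bidirectional class of a few characters with Python's standard `unicodedata` module. It shows only the per-character classification that the algorithm takes as input, not the reordering itself; the sample characters and labels are chosen here for illustration.

```python
import unicodedata

# Sample characters (picked for illustration) and a readable label each.
samples = {
    "latin a": "a",
    "hebrew alef": "\u05d0",
    "arabic alef": "\u0627",
    "european digit": "1",
    "arabic-indic digit": "\u0661",
    "space": " ",
}

# unicodedata.bidirectional returns the Bidi_Class short name, e.g.
# 'L' (left-to-right), 'R' (right-to-left), 'AL' (Arabic letter).
classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in classes.items():
    print(f"{label:20} {bidi_class}")
```

Text entry code that assumes every character behaves like class `L` is typically what breaks for right-to-left users.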
| phkahler wrote:
| GNU unifont has the entire BMP but is a bitmap font. Is there an
| equivalent monospaced scalable font we can use in GPL software?
| politelemon wrote:
| Noto Sans?
| https://fonts.google.com/noto/specimen/Noto+Sans+Mono
| troymc wrote:
| Some random characters didn't render in my browser. Upon
| inspection:
|
| font-family: Georgia, Serif;
|
| I don't think those fonts support all of Unicode. Google created
| their Noto fonts [1] for this purpose; I wonder why those aren't
| being used.
|
| [1] https://en.wikipedia.org/wiki/Noto_fonts
| mistrial9 wrote:
| gentium is a font with a very large number of glyphs also
|
| https://software.sil.org/gentium/
| jfk13 wrote:
| But only for Latin/Greek/Cyrillic scripts; it makes no claim
| to be a pan-Unicode font (family).
| jfk13 wrote:
| Browsers will generally do "fallback" to some other font, if
| the font(s) named in the CSS don't support the characters
| present in the text. But for some of the rarer characters, you
| may not have any available font that supports them.
| Someone wrote:
| See https://en.wikipedia.org/wiki/Fallback_font. It typically
| isn't a browser feature, but an OS one.
|
| Since 1998 MacOS has had a "last resort" font that has glyphs
| (not necessarily unique) for every Unicode code point. They
| donated it to Unicode
| (https://en.wikipedia.org/wiki/Fallback_font#Unicode_Last_Res...),
| so I expect most OSes running
| full-blown modern browsers to have it or something similar
| (those running smaller browser engines may be too space
| constrained to have room for it)
| abudabi123 wrote:
| http://www.chinaknowledge.de/Literature/Science/shuowenjiezi.html
|
| I have the Noto fonts and ctext dot org's Hana fonts but still
| see tofu in the above page. Whatever font the iPhone's Pleco
| app uses, its correctness depends on where you are in the
| app.
|
| These two examples often are confused: 日 (ri) and 曰 (yue)
| nanis wrote:
| And yet there is still no unambiguous lower case "I" or upper
| case "i".
| ClumsyPilot wrote:
| That's the job of a font, not the encoding.
| nanis wrote:
| No, the fact that there is no such codepoint, which makes those
| mappings ambiguous, is due to the way Unicode decided to save
| two codepoints for seemingly no good reason.
|
| What should be _the_ value of `"I".lower()`? Or, "i".upper()?
|
| And please don't bring up locales. The whole point of
| accepting the complexity of Unicode is to be able to take a
| document which stands on its own without external references.
|
| > Early character encodings also conflicted with one another.
| That is, two encodings could use the same number for two
| different characters, or use different numbers for the same
| character.
|
| > The Unicode Standard provides a unique number for every
| character, no matter what platform, device, application or
| language.[1]
|
| Those statements are outright lies: Unicode does not provide
| a unique number for "upper case Turkish dotless i". Nor does
| it provide one for "lower case Turkish dotted i".
|
| If it did, it would be possible to correctly map "i" to "I"
| or "İ" and "I" to "i" or "ı" without having to know anything
| other than the source codepoint.
|
| The font does not even come into play here.
|
| [1]: https://unicode.org/standard/WhatIsUnicode.html
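
The ambiguity described above can be reproduced in a few lines. This is a minimal sketch: the first part shows Python's built-in, locale-independent mappings, and `turkish_lower` / `turkish_upper` are hypothetical helpers written for this illustration, not part of any library.

```python
# Default, locale-independent case mappings built into Python's str type.
default_lower_I = "I".lower()            # plain 'i'
default_upper_i = "i".upper()            # plain 'I'
dotted_capital_lower = "\u0130".lower()  # U+0130 maps to 'i' + U+0307
dotless_small_upper = "\u0131".upper()   # U+0131 maps to plain 'I'


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using Turkish dotted/dotless rules."""
    # Handle the two special letters first, then defer to the default.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using Turkish dotted/dotless rules."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


# ascii() keeps the output readable on terminals without the glyphs.
print(ascii(default_lower_I), ascii(turkish_lower("I")))
print(ascii(default_upper_i), ascii(turkish_upper("i")))
```

The caller has to decide which pair of functions applies; nothing in the code point "I" or "i" itself records whether the text is Turkish.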
| Kwpolska wrote:
| Unicode is for representing text, not allowing arbitrary
| manipulation of it. It isn't the job of Unicode to encode
| those relationships. Also, the Turkish `i` stuff is just
| the tip of the iceberg. Should Unicode be able to round-trip
| `'ß'.upper().lower()`? Keeping the existing capitalization of
| ß -> SS, you would need to define an "uppercase S that used to
| be ß" character. Then there's the Dutch
| `ij`, in which both characters are either uppercase or
| lowercase (`Ij` at the start of a word is incorrect).
| There's a ligature in Unicode, but it's only for
| compatibility with some legacy keymaps. But is there a
| point in adding a new version of "S" that a lot of software
| would not recognize as equivalent to the plain old ASCII
| "S" (and one might end up far away from a ß due to
| copy-pasting or stuff), bringing weird bugs and security issues?
| Should the Dutch throw out all their keyboards just so they
| get a new key for the special IJ ligature?
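
The round-trip problem is easy to see with Python's built-in string methods. A minimal sketch, with variable names chosen for illustration:

```python
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

# Uppercasing expands to two letters, so lowercasing cannot get back.
upper_then_lower = sharp_s.upper().lower()

# Case-insensitive comparison sidesteps this by folding both sides.
folded_equal = "stra\u00dfe".casefold() == "STRASSE".casefold()

# U+1E9E, the capital sharp s, lowercases to the small sharp s, but the
# default uppercase mapping of the small letter does not produce it.
capital_sharp_s = "\u1e9e"

# Dutch "ij": capitalize() only touches the first code point.
dutch = "ijsselmeer".capitalize()

# The compatibility ligature U+0133 does uppercase as a single unit.
ij_ligature_upper = "\u0133".upper()

print(ascii(sharp_s.upper()), ascii(upper_then_lower), folded_equal, dutch)
```

For comparisons, `casefold()` is usually the right tool; for display-quality capitalization of Dutch or German text, language-specific handling on top of the default mappings is still needed.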
| [deleted]
___________________________________________________________________
(page generated 2023-02-26 23:00 UTC)