[HN Gopher] Unicode character "ꙮ" (U+A66E) is being updated
___________________________________________________________________
Unicode character "ꙮ" (U+A66E) is being updated
Author : SerCe
Score : 251 points
Date : 2022-09-19 11:39 UTC (11 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| iLoveOncall wrote:
| I want to be that person that has so much time on their hands they
| can afford to waste it on pointless things like this.
| shadowgovt wrote:
| There's a career path to get there. It involves becoming
| someone who cares deeply about the ways and means of digitizing
| data stored in analog media. Drill down deep enough, and you'll
| find yourself in a fascinating world of coding an error.
|
| There are things like the "ghost characters," which are
| codepoints in Japanese that map to characters that were
| basically transcription errors when the team was putting
| together a full set of Kanji. Some characters with an extra
| horizontal line snuck into the set; they were likely caused by
| a transcription error because the character got split onto two
| pieces of paper by lines of text being copy-pasted into a
| records book, and the shadow cast by the thin extra layer of
| paper was misinterpreted as another stroke.
|
| https://www.dampfkraft.com/ghost-characters.html
| hidudeurcool wrote:
| dmz73 wrote:
| And then people wonder why software developers don't care to
| support Unicode properly. The first 60,000+ characters made sense,
| then a few more were needed and Unicode suddenly got to play with
| 1,000,000+ and just went off the rails.
| lifthrasiir wrote:
| You can support Unicode without ever having to display all
| possible characters "correctly".
| perihelions wrote:
| Related thread, about non-existent CJK characters ending up in
| Unicode through transcription mistakes ("ghost characters"):
|
| https://news.ycombinator.com/item?id=32095502 ( _" A Spectre Is
| Haunting Unicode"_, 180 comments)
|
| edit to add: The top thread in the 2020 repost was about ,
|
| https://news.ycombinator.com/item?id=24955536
| sshine wrote:
| (+D)+Shan +-+
| SnooSux wrote:
| Be not afraid
| drewzero1 wrote:
| Bee not afraid?
| msla wrote:
| Bee Nut Afraid.
|
| (When an apiarist is terrified.)
| thechao wrote:
| +-+no( _ no)
| loudmax wrote:
| > ===
|
| The James Webb Space Telescope.
| throwaway98797 wrote:
| cant unsee
| Izkata wrote:
| Don't worry, you'll forget about this one when it gets six
| more eyes.
| rafaelturk wrote:
| Finally!
| hulitu wrote:
| > Unicode character "ꙮ" (U+A66E) is being updated
|
| I fear this will lead to a lot of "bug fixes and performance
| improvements" in Android. /s
| vintermann wrote:
| Biblically accurate O?
| Waterluvian wrote:
| I'm not sure how I feel about this. I'm not an expert by any
| means.
|
| But something just doesn't feel right when you've got unicode
| with a character with one known use from forever ago.
|
| Doesn't this open up the flood gates to just a ridiculous amount
| of work or else biased gatekeeping?
|
| How much work would it be to implement your own font of the
| entire unicode set? Or is that not actually a thing and fonts
| implement as-desired subsets?
| lifthrasiir wrote:
| > How much work would it be to implement your own font of the
| entire unicode set? Or is that not actually a thing and fonts
| implement as-desired subsets?
|
| You can't, and you are not expected to do so. You are limited
| by the OpenType limit (65,535 glyphs), various shaping rules that
| possibly increase the number of required glyphs, and lack of
| local or historical typographic convention. Your best bet is
| either to recruit a large number of experts (e.g. Google Noto
| fonts) or to significantly sacrifice quality (e.g. GNU
| Unifont).
| poizan42 wrote:
| A single OpenType font file is limited to 65,535 glyphs.
| Nothing stops your font from being implemented as a series of
| .otf files (besides what people think of as a "font" when it
| comes to usage on computers).
|
| But yes, time constraints are the limiting factor. I don't
| think anyone is going to dedicate their entire life to making
| a single font.
| lifthrasiir wrote:
| While you are right that one logical font can consist of
| multiple font files (or possibly an OpenType collection),
| this constraint does affect most typical fonts. Wide-
| coverage CJK fonts already hit this limit. Fonts supporting
| only one of Chinese, Japanese and Korean don't need that
| many glyphs, and probably even two of them will be okay,
| but fonts with all three sets of glyphs won't. It is
| therefore common to provide three versions of fonts, all
| differently named.
| Waterluvian wrote:
| I wasn't aware of the 2^16 limitation. Thank you for the
| notes!
| aasasd wrote:
| I'll tell you more: there are Unicode glyphs without known
| usage.
| gumby wrote:
| There are quite a few such characters in Unicode because
| academic articles about things like cuneiform need to be
| digitized too. And because the historical record is so sparse,
| we often have vanishingly few, or only one example of a
| character, and perhaps no way to know if it was a misprint or a
| real character.
|
| Actually this character seems like a scribe's joke, no
| different from the illustrated characters at the beginning of
| medieval paragraphs (all of which are represented in Unicode as
| A, B or whatever). But the point still holds.
|
| It even holds for modern languages -- consider the ghost
| characters needed for round trip compatibility:
| https://weekly-geekly.imtqy.com/articles/418717/index.html
|
| (actually cuneiform is a poor example; perhaps Linear A would
| have been a better example)
| diimdeep wrote:
| Being stuck on macOS Catalina with Unicode 12, I think there is a
| way to upgrade to newer versions and get new emoji support [1][2]
|
| [1] https://apple.stackexchange.com/questions/278937/is-
| there-a-... [2] https://forums.macrumors.com/threads/updating-
| maverickss-emo...
| quickthrower2 wrote:
| Crazy that it renders in HN comments (which reject a lot of
| Unicode)
| etamponi wrote:
| By the same reasoning, the 7-eyed O has now been used more than
| once, so it deserves a glyph! So the right way to do this is to
| introduce a new character for the correct glyph, and also leave
| the current one (perhaps changing the title). Otherwise these
| tweets won't make sense when read by someone who has updated to
| Unicode 15.0
| echelon wrote:
| _This_ thread on HN won't make sense in the future if the
| Unicode body replaces
|
| Make a new character!
| koboll wrote:
| Honestly it probably deserves the Pluto treatment:
| decertification as a character. One historical use in the 1400s
| doesn't merit a character and never did.
| Pinus wrote:
| Isn't there an entire Unicode block for the symbols on the
| Phaistos disc? Yes:
| https://en.wikipedia.org/wiki/Phaistos_Disc_(Unicode_block) .
| I suppose those occur in quite a few documents _about_ the
| disc, even though the disc itself is the only known document
| written _in_ those symbols.
| colejohnson66 wrote:
| Unicode's mission is to make _every_ document "roundtrip-
| able". Even if a character is only used once, it should be
| possible to save a plaintext version of the containing
| document without losing any information. Roughly, I should be
| able to put a transcription of that one translation from the
| 1400s on Wikisource without using images.
|
| You may disagree with me, and that's fine, but it doesn't
| change Unicode's mission. Besides, there's room for 1,112,064
| codepoints[a], and only 149,186 are in use. It's predicted
| we'll never use it up, so what harm is there in one codepoint
| no one will ever need?
|
| [a]: U+10'FFFF max; it used to be U+7FFF'FFFF, but UTF-16 and
| surrogates ruined that
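The 1,112,064 figure quoted above can be checked with a little arithmetic. This is a minimal sketch, assuming the usual definition of a Unicode scalar value: any code point from U+0000 through U+10FFFF that is not a UTF-16 surrogate (U+D800 through U+DFFF). The variable names are illustrative only.

```python
# Code points run from U+0000 through U+10FFFF inclusive.
MAX_CODE_POINT = 0x10FFFF
total_code_points = MAX_CODE_POINT + 1

# Surrogates U+D800..U+DFFF are reserved for UTF-16 and never
# stand for characters on their own.
surrogates = 0xDFFF - 0xD800 + 1

# What remains is the space available for characters.
scalar_values = total_code_points - surrogates
print(f"{total_code_points:,} - {surrogates:,} = {scalar_values:,}")
```

The subtraction is the part that "UTF-16 and surrogates" account for: the 2,048 surrogate code points are carved out of the 17 planes of 65,536 code points each.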
| djur wrote:
| Unicode doesn't have a character for every illuminated
| initial, nor should it. I'm not clear on why this character
| should be considered any differently.
| skyyler wrote:
| http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
|
| It was introduced with other "ocular O"s which are
| seemingly more commonly used than this one.
|
| It's not quite an illuminated initial.
| akavel wrote:
| Wow, this is probably the most actually useful and
| interesting comment in this whole discussion, thanks! For
| anyone interested, the most relevant quotes from the
| document are in particular:
|
| _" This document requests the addition of a number of
| Cyrillic characters to be added to the UCS. It also
| requests clarification in the Unicode Standard of four
| existing characters. This is a large proposal. While all
| of the characters are either Cyrillic characters (plus a
| couple which are used with the Cyrillic script), they are
| used by different communities. Some are used for non-
| Slavic minority languages and others are used for early
| Slavic philology and linguistics, while others are used
| in more recent ecclesiastical contexts. We considered the
| possibility of dividing the proposal into several
| proposals, but since this proposal involves changes to
| glyphs in the main Cyrillic block, adds a character to
| the main Cyrillic block, adds 16 characters to the
| Cyrillic Supplement block, adds 10 characters to the new
| Cyrillic Extended-A block currently under ballot, creates
| two entirely new Cyrillic blocks with 55 and 26
| characters respectively, as well as adding two characters
| to the Supplementary Punctuation block, it seemed best
| for reviewers to keep everything together in one
| document._
|
| _(...)_
|
| _MONOCULAR O , BINOCULAR O , DOUBLE MONOCULAR O , and
| MULTIOCULAR O are used in words which are based on the
| root for 'eye'. The first is used when the wordform is
| singular, as k; the second and third are used in the root
| for 'eye' when the wordform is dual, as chi, chi; and the
| last in the epithet 'many-eyed' as in serafimi
| mnogochityii 'many-eyed seraphim'. It has no upper-case
| form. See Figures 34, 41, 42, 55. "_
| j-bos wrote:
| Because it's already been added to unicode. Now it's not
| a question of whether or not to add, rather to remove,
| and unicode almost by definition does not remove.
| thayne wrote:
| Unicode does have deprecated code points though. Not that
| I necessarily think making this character deprecated
| makes sense.
| lmm wrote:
| Meanwhile one still can't roundtrip regular Japanese
| without some kind of funky out-of-band signalling. By
| itself this kind of thing is harmless, but it speaks to
| poor prioritization from Unicode.
| bityard wrote:
| Today, I wrote a document by hand containing a new symbol
| that only looks like genitalia if you squint really hard.
| Where do I apply to have it included in unicode so that it
| can be digitized properly?
| lucumo wrote:
| Rule-lawyering wise-asses try to mess with many policies.
| It's rarely a sensible indictment of a policy, nor is it
| very effective. Anyone dealing with such people just
| ignores them.
| fluoridation wrote:
| What's the criterion that includes the document in the
| tweet, but excludes the document referenced by the GP?
| bzxcvbn wrote:
| https://www.unicode.org/pending/proposals.html
|
| https://www.unicode.org/emoji/proposals.html#selection_fa
| cto...
| fluoridation wrote:
| I don't see anything on the inclusion of symbols that
| are not icons, such as U+A66E, or the symbol proposed by
| bityard.
| koala_man wrote:
| Can you reuse or ?
| 0xbadcafebee wrote:
| And for years we've just been using eggplants!
| 411111111111111 wrote:
| ( no >= [?] <= ) no mi + - +
|
| ~ ( Jjut Jo Jjut ) ~
|
| ( // y . // y )
| [deleted]
| layer8 wrote:
| > Unicode's mission is to make every document "roundtrip-
| able".
|
| Only for characters from existing coded character sets.
| modzu wrote:
| why isnt the artist formerly known as prince in unicode?
| koboll wrote:
| Okay, let's take a look at the context where the
| multiocular o was used:
| https://en.wikipedia.org/wiki/Multiocular_O
|
| I see that near it, there is an ef (F) with a very tall
| stem.
|
| Why should that not be included as a standard unicode
| character? Surely it is used more often than the
| multiocular o.
|
| You may say "it's a decorative flourish", which is of
| course true, but so is the multiocular o. Should we allow
| every conceivable decorative flourish into unicode? What is
| the standard for where flourishes become distinct
| characters?
| shp0ngle wrote:
| The thing is, this is just a decorative way to write "o".
| It's not a specific letter by any definition.
|
| I can't speak of other letters that were added in the same
| batch in 2007. Some of them seem meaningful, I dunno, I
| don't speak Old Church Slavonic (although I am told it
| sounds like Croatian, which I understand a little)
|
| http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
| kadoban wrote:
| For as inclusive as that mission is, it seems weird to me
| how limited in certain areas unicode is. For instance,
| people use peach emoji since there isn't one for butt,
| eggplant since there's no penis, etc.
|
| This doesn't contradict the stated goal exactly, but it
| seems against the spirit of it at least.
| yakireev wrote:
| One could argue that emoji should have never been added
| to Unicode in the first place. Peaches and butts are
| images, pictures, illustrations, whatever - but they are
| not characters. There's no writing system which has a
| colored drawing of a peach as a character.
| eternityforest wrote:
| But that doesn't change the fact that most people use
| them and like them, and there is not much technical
| disruption. They just chose practicality over purity.
| yakireev wrote:
| Most people (me included) like funny cat videos and send
| funny cat videos. Shall we include some to Unicode?
|
| I mean, this ship has long sailed, but that was a mistake
| nevertheless. Not everything has to be a unicode
| character.
| PeterisP wrote:
| Yes there is - a widely used character set (when Unicode
| talks about "writing systems" it explicitly includes all
| the computer character sets used in practice pre-unicode)
| used by japanese 'featurephones' had emoji characters, so
| in order to be able to include that character set in
| unicode, unicode had to add emoji.
| [deleted]
| bzxcvbn wrote:
| Yes there is. We're using it right now. Even linguists
| are studying the use of emoji today.
| vcxy wrote:
| vcxy wrote:
| I tried to reply with just a unicode penis but that got
| flagged immediately, so I'll be more substantial and
| leave out the actual penis. It appears in Egyptian
| hieroglyphs, so actually there is a penis included in
| unicode.
| bhk wrote:
| Cataloging every doodle ever drawn inline with text by
| anyone at any time in history would exhaust any finite set
| of code points.
| tzs wrote:
| If that was once its mission, it was clearly abandoned long
| ago. They rejected Klingon characters on the grounds that
| it has low usage for communication, and that many of the
| people who do communicate in Klingon use a latinized form.
|
| ꙮ seems to just be a fancy way of writing O. I haven't seen
| anything that says it has a different meaning. The
| arguments for excluding Klingon seem to apply even more so
| to ꙮ.
| mananaysiempre wrote:
| If you look through the old mailing list postings, the
| oft-left-implicit problem with Klingon (as well as
| Tengwar, Emerson's pet project) is that it may get people
| into legal trouble (even though in a reasonable world it
| shouldn't be able to). So in the unofficial CSUR / UCSUR
| they remain.
|
| A weird solitary character from the 1400s isn't subject
| to that, and even if it's a mistake it's probably not
| worth breaking compatibility at this point (I think the
| last such break with code points genuinely changing
| meanings was to repair a mistaken CJK unification some
| time in the 00s, and the Consortium may even have tied
| its own hands in that regard with the ever-more-strict
| stability policies).
|
| Similarly, for example, old ISO keyboard symbols (the [?]
| for erase backwards, but also a ton of virtually unused
| ones) were thrown in indiscriminately at the beginning of
| the project when attempting to cover every existing
| encoding, but when the ISO decided to extend the
| repertoire they were told to kindly provide examples of
| running-text (not iconic) usage in a non-member-body-
| controlled publication. (Crickets. The ISO keyboard input
| model itself only vaguely corresponds to how input
| methods for QWERTY-adjacent keyboards work in existing
| systems--as an attempt at rationalization, it seems to
| mostly be a failed one.)
| bobsmooth wrote:
| Unless it's legitimately someone's native tongue,
| conlangs shouldn't be in unicode. If there are kids out
| there that are native Klingon speakers, then you can make
| the argument it should be included.
| reaperducer wrote:
| _One historical use in the 1400s doesn't merit a character
| and never did_
|
| One _known and surviving_ use. It is possible that it exists
| in other places, since the vast majority of the planet's
| written work has not been digitized. It may also have been
| used in other places that have not survived.
|
| Just because it's not important to you does not mean it is
| not important.
|
| The fact that it survived for 600 years makes it interesting
| and worth saving. It is infinitely unlikely that anything you
| do, write, or say will last that long.
| bhaney wrote:
| > It is infinitely unlikely that anything you do, write, or
| say will last that long
|
| Ouch
| koboll wrote:
| Sure it's possible, but there should be a higher bar than
| "it's possible it's used more than once" for meriting
| inclusion in the standard keyboard of billions of devices
| worldwide.
| tsimionescu wrote:
| The thing is, looking at the page, there are many other
| characters that were not added - the large red S-looking
| characters, for example. But for some "bizarre" reason,
| those were not included in Unicode...
|
| Of course, the simple answer is that Unicode actually
| includes any character that someone cares enough to ask to
| be added, with rare exceptions.
| runarberg wrote:
| idk. when the word Planet was redefined such that Pluto was
| no longer a planet, it kind of ruined the word Planet. It
| suddenly wasn't nearly as useful as a word as it used to
| (even though now it has a precise meaning). For most people
| that use the word, it won't matter (and is actually rather
| exciting) that they keep discovering new planets in our solar
| system.
|
| If they'd treat the word characters the same way, it would
| only serve to confuse and do no favors to the remaining
| glyphs.
| BlueTemplar wrote:
| This is temporary though, soon people will look at you
| funny if you say that Pluto is a planet - and/or they might
| not even have heard of it (though of course that is still
| worth learning about in a History of Science context).
|
| We do NOT keep discovering new planets, rather minor
| planets (I agree that the term is confusing), more than a
| million of them discovered in the Solar System now, like
| the 9007 James Bond.
| runarberg wrote:
| It could go either way, it is not always that the
| scientific meaning wins out, especially not when even
| scientists don't find the new definition useful.
|
| When I think of a planet, I think of a world that has
| active geology that isn't a moon (I know excluding moons
| is arbitrary, and perhaps I shouldn't do that; but hey,
| that's language for you). I honestly don't care about the
| orbit, and I bet that when most people think about
| planets they aren't thinking about the orbit either, let
| alone whether the planet has cleared the orbit or not. I
| doubt that will change.
| gerikson wrote:
| > When I think of a planet, I think of a world that has
| active geology
|
| Wouldn't that definition rule out gas giants?
| runarberg wrote:
| Yeah, probably strictly... But I'm not a planetary
| scientist. I'm merely a user of language, and I don't
| need to be rigorous in my definitions. And to me the
| weather patterns on Jupiter are an interesting enough
| feature to count as geology (even though they are probably
| not strictly geology).
| jameshart wrote:
| Not just that, but whether or not Mars is still
| geologically active is still an open question. If you
| admit planets on the basis that they have a history of
| geological activity, then Ceres is a planet too.
|
| I don't think anybody considers geological activity as
| particularly useful for classifying things as 'planet' or
| 'not planet'.
| runarberg wrote:
| Why shouldn't Ceres be a planet? If Pluto gets to be a
| planet then Ceres is definitely a planet.
|
| But there is still active geology on Mars. There
| are still moisture, winds and glaciers that are shaping
| the environment. I consider that to be geologically
| active.
| [deleted]
| PeterisP wrote:
| At the moment this character is used in many documents and
| databases - including comments in this thread, the article
| mentioned there, etc.
|
| There could have been a good case not to include it back in
| 2007, but once it has been included, excluding it would break
| stuff.
| BlueTemplar wrote:
| And updating it rather than adding a new, correct one,
| might make the current uses confusing ?
|
| Speaking of which, do we have any similar hexagonal symbol
| ?
| jotato wrote:
| My thought as well
| baybal2 wrote:
| Unicode's basic rule is that character definitions never ever
| change, even when enumerated erroneously.
| Arnt wrote:
| Yes, but this is a change either way, because that
| codepoint's definition referred to that character. Either the
| reference or the description of the appearance has to change.
| echelon wrote:
| Make a new character. Updating the existing character ruins
| the meaning of all previous usages.
|
| It's like trying to change an API. Don't disrespect your
| existing users. Make a new version.
|
| ( [?]?)
|
| Think of all the ASCII art this botches. That has to have
| some historical importance to the Unicode standards body.
|
| ([?]_)
|
| For scholarly digital (unprinted) documents where the
| correct character rendering matters, erroneous past usages
| can be trivially found with grep, a date search, and easily
| corrected. The domain experts will familiarize themselves
| with this issue and fix the problem. Don't take a shotgun
| to it!
|
| This message wꙮn't have the ꙮriginally intended meaning if
| the characters are updated from underneath.
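The "trivially found with grep" step mentioned above could look roughly like the following. This is only a sketch: `find_occurrences` and the sample text are made up for illustration, and a real audit would also need the date search the comment mentions.

```python
MULTIOCULAR_O = "\ua66e"  # U+A66E, the character under discussion


def find_occurrences(text: str, target: str = MULTIOCULAR_O) -> list:
    """Return (line_number, column) pairs, both 1-based, for each hit."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        col = line.find(target)
        while col != -1:
            hits.append((line_no, col + 1))
            col = line.find(target, col + 1)
    return hits


# Hypothetical sample text: one hit on line 1, two on line 3.
sample = "serafimi mn\ua66egochitii\nno match here\n\ua66e\ua66e"
print(find_occurrences(sample))
```

Searching by code point rather than by appearance is what makes this work regardless of which glyph the installed font happens to draw.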
| nerfhammer wrote:
| why not make an additional eye a diacritic mark so you can just
| add an arbitrary number of eyes
| martin_a wrote:
| Uff.
|
| I'm not sure we have space for another glyph in Unicode. Looks
| pretty packed in here...
| BlueTemplar wrote:
| UTF-8 is still more than 80% empty, and can be potentially
| extended...
| colejohnson66 wrote:
| _Theoretically_ , UTF-8 can encode up to 31 bits
| (U+7FFF'FFFF)[0], but for compatibility with UTF-16's
| surrogates, it's officially capped to 21 bits with the max
| being U+10'FFFF[1]. That decision was made November 2003,
| so there's two decades of software written with hard caps
| of U+10'FFFF.
|
| [0]: https://www.rfc-editor.org/rfc/rfc2279
|
| [1]: https://www.rfc-editor.org/rfc/rfc3629#section-3
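To make the 31-bit versus 21-bit point concrete, here is a rough sketch of how many bytes a code point would need under the original six-byte scheme (RFC 2279). The helper name `utf8_length_rfc2279` is invented for this example; the range boundaries follow the table in that RFC.

```python
def utf8_length_rfc2279(code_point: int) -> int:
    """Bytes needed under the original UTF-8 scheme (up to 31 bits)."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Inclusive upper bound of each length class, 1 through 6 bytes.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for n_bytes, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return n_bytes
    raise ValueError("value needs more than 31 bits")


# Today's cap, U+10FFFF, fits in 4 bytes, so the 5- and 6-byte forms
# are simply never produced by a conforming encoder.
four_byte_max = utf8_length_rfc2279(0x10FFFF)
old_max = utf8_length_rfc2279(0x7FFFFFFF)
actual = len(chr(0x10FFFF).encode("utf-8"))
print(four_byte_max, old_max, actual)
```

Python itself enforces the modern cap: `chr(0x110000)` raises `ValueError`, which is one small instance of the "software written with hard caps" the comment describes.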
| RcouF1uZ4gsC wrote:
| I think the big issue with Unicode is that it is centralized and
| there are politics about what characters get included (see
| Klingon)
|
| I think I have a solution to decentralize Unicode:
|
| 1. Extend Unicode to 128-bits. We can still use UTF-8 variable
| length encoding which will limit the real size.
|
| 2. Use a blockchain to coordinate the characters. That way
| whoever wants to add a character can do it without gatekeeping.
|
| These simple suggestions will go a long way in making Unicode
| less centralized.
| dhosek wrote:
| This is not exactly a correct description. Unicode does _not_
| specify the appearance of characters, only their meaning. It
| seems what's changed is the reference presentation of the
| character in the Unicode tables, not the character itself.
| Unicode goes to great lengths to preserve backwards compatibility
| so changing the meaning of a code point would violate that
| principle. Your OS or application providing Unicode 15.0.0
| support will not change the appearance of U+A66E. The appearance
| is dependent on the font.
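The distinction between a character's identity and its appearance can be seen from any Unicode-aware language. A small sketch using Python's standard `unicodedata` module follows; it assumes only that the interpreter's bundled Unicode database includes U+A66E, which has been in the standard since 5.1.

```python
import unicodedata

ch = "\ua66e"

# These properties are data about the code point itself. They say
# nothing about how many eyes a font chooses to draw.
name = unicodedata.name(ch)
category = unicodedata.category(ch)
utf8 = ch.encode("utf-8")

print(f"U+{ord(ch):04X}", name, category, utf8.hex())
```

None of these values depend on the installed fonts, which is the point made above: the same bytes will render with however many eyes the font's glyph has.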
| idlewords wrote:
| They should put in a few additional eyes as hot spares.
| xanathar wrote:
| So it's a Unicode character that represents a... blob with 10
| eyes?
|
| _Hordes of Wizards of the Coast lawyers getting ready for the
| big fight_
| gedy wrote:
| Name checks out:
| https://forgottenrealms.fandom.com/wiki/Xanathar_(original)
| supernewton wrote:
| Nah, Beholders have 11 eyes, so we're good here.
| tsimionescu wrote:
| I feel like the spelling should be updated to Behlders, or
| better yet, Behlders, to reflect that (of course, this would
| only make sense once the glyph update actually hits).
| ElfinTrousers wrote:
| Am I alone in thinking that this is not so much a separate
| character, as a doodle a bored monk made to relieve a tiny bit of
| the tedium of copying manuscripts?
| BearOso wrote:
| And its new official name shall be the Trypophobigon.
| xashor wrote:
| Too bad I have to adjust my business cards for .world
| Traubenfuchs wrote:
| remedan wrote:
| We do have a Unicode character for a gun: U+1F52B PISTOL. Most
| fonts that have it choose to style it as a water gun, though.
| dafoex wrote:
| There's an emoji for handgun, but Apple and other big tech
| decided it needed to be a water gun. There is also a rifle
| character intended to represent the sport of shooting in a
| pentathlon, but again Apple threw its weight around and, while
| the character became codified in Unicode, it never became an
| emoji and no font from big tech supports it.
| jrockway wrote:
| I guess because the goal of Unicode is to be able to represent
| every character that's appeared in language. This one is in a
| published book, while guns and a sexual intercourse symbol
| aren't.
|
| Emoji was a weird value add that Japanese mobile providers
| added to their phones before Unicode. To get them to move to
| Unicode, they had to keep them. That's why there's a Tokyo
| Tower emoji, but not an Eiffel Tower. That's why the post
| office has a @ on it. That people get any use out of emoji
| outside of Japan is really pure luck.
| ElfinTrousers wrote:
| That seems actually logical when you consider that kanji
| presumably began as simple depictions of objects that could
| be drawn quickly. Perhaps the only difference between emoji
| and kanji is time.
| shadowgovt wrote:
| I've even heard emoji referred to as "the carrot that keeps
| the implementations current." Every time a new version of
| Unicode is published, a few more emoji are tacked on. It acts
| as incentive for all the cellphone carriers and such to put
| the money into updating their implementations, because nobody
| wants to be the one on the block with the one phone that
| can't render "Mirror Ball" .
|
| (ETA: LOL, Hacker News drops "Mirror Ball"
| https://emojipedia.org/mirror-ball/ from the comment when you
| post)
| Traubenfuchs wrote:
| I believe the majority of emoji do not work on hacker news.
| jrockway wrote:
| Incidentally, Windows doesn't have the mirror ball. I guess
| it is a carrot to get me to upgrade to Windows 11, which I
| am skipping. (The key with Windows is to only use the good
| versions; XP, 7, 10, ???. Hoping ??? arrives soon ;)
| int_19h wrote:
| It's not in Win11 yet.
| Dwedit wrote:
| There are hieroglyph dicks in unicode, see U+130B8.
| Traubenfuchs wrote:
| I even posted phallus with emission in my comment above.
|
| I can see it on latest iOS, but not on Windows 10 + Chrome.
| rizoma_dev wrote:
| I'm always happy to see some esoteric unicode updates
| diimdeep wrote:
| Here[1][2] is the scan of manuscript from 1429, image #251
|
| [1] https://lib-fond.ru/lib-rgb/304-i/f-304i-308/#image-251 [2]
| https://web.archive.org/web/20110927102700/https://www.stsl....
| aasasd wrote:
| So the text at that point literally talks about 'many-eyed
| seraphim'. The eyes symbol is a pure gag--seems to be spliced
| in place of the letter 'o' in the word 'eye' just a little down
| the line. (However, Old Slavonic is a tough read due to no
| spaces, so I'm not sure about that word. But at least it's not
| the Glagolitic script, which was just ridiculous and actually
| had multi-circle letters.)
| klyrs wrote:
| It's curious that the red ink blobs behind the "eyes" aren't
| included in the unicode glyph either...
| msoad wrote:
| This is similar to "man in business suit levitating" emoji.
|
| How does this stuff make it into Unicode?!
| shp0ngle wrote:
| Levitating man is just a Unicode encoding of a character from
| the old Webdings (or Wingdings?) font.
|
| There was an accepted proposal to add many Wingdings and
| Webdings characters as Unicode code points. Thus, levitating
| man in a suit.
| octoberfranklin wrote:
| I miss the good old days when character sets didn't feel the
| need for _annual updates_.
| baltimore wrote:
| Is there any end to this? E.g., why not include Galileo's
| pictograms of Saturn as seen here:
| https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=...
| no-reply wrote:
| I'll check back in the future.
| xenonite wrote:
| Sadly, ꙮ is not eligible for engraving by Apple on AirPods.
| colejohnson66 wrote:
| As of right now, it's available for "adoption":
| https://www.unicode.org/consortium/adopt-a-character.html
| adhesive_wombat wrote:
| Meanwhile I check back every now and again on MUFI (Medieval
| Unicode Font Initiative) [1] and it's still not in.
|
| [1]: https://mufi.info
| politelemon wrote:
| Here's the original tweet where the discrepancy was noticed in
| 2020, and a photograph of a page inside the book where it's used:
|
| https://twitter.com/etiennefd/status/1322673792452354048
| pippy wrote:
| Unicode can be ridiculous at times. It contains a character
| used once in a single manuscript in an extinct language, but not a
| standardized glyph for an external URL link.
| lordnacho wrote:
| Wait a minute, how will we refer to the old glyph in the future?
| Once this is updated the articles such as this one will have the
| new shape.
| martin_a wrote:
| "The character formerly known as U+A66E"
| lifthrasiir wrote:
| There was a joke that U+A66E should retain seven eyes and
| further eyes should be added with a ZWJ sequence [1]. If that
| character somehow got _very_ popular in modern texts, updating
| its glyph may result in an interoperability problem so such a
| solution would have been needed. But that didn't happen so the
| glyph itself has been updated instead.
|
| [1] https://twitter.com/BabelStone/status/1323440365429542919
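For readers unfamiliar with ZWJ sequences, the mechanism behind the joke is plain string concatenation with U+200D ZERO WIDTH JOINER between the parts. The sketch below builds one real emoji sequence and one purely hypothetical "extra eyes" sequence; the latter is not defined anywhere, and using MONOCULAR O (U+A669) as the "eye" is an assumption made only for illustration.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# A real ZWJ sequence: MAN + ZWJ + WOMAN + ZWJ + GIRL (family emoji).
family = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467"])

# Hypothetical only: seven-eyed U+A66E plus three joined "eyes".
# No font or standard defines this sequence.
joke = ZWJ.join(["\ua66e", "\ua669", "\ua669", "\ua669"])

# Each sequence is several code points that a supporting font may
# draw as a single glyph; unsupported sequences fall back to the
# individual characters side by side.
print(len(family), [f"U+{ord(c):04X}" for c in family])
print(len(joke), [f"U+{ord(c):04X}" for c in joke])
```

The fallback behaviour is why this approach would have kept old text readable: software unaware of the sequence would still show the original seven-eyed glyph followed by the extra pieces.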
| memorable wrote:
| Alternative frontend version:
| https://nitter.net/jonty/status/1571615998335123457
| jerf wrote:
| When my kids were young, I accidentally flubbed the pronunciation
| of "Santa Claus" once and said something that sounded a lot like
| "Centiclops", which I decided to roll with. Centiclops is a lot
| like a cyclops with one eye, except the as a reading of the roots
| clearly indicates, this is a creature with 100 eyes.
|
| Today I learn that Centiclops effectively has a Unicode
| character. As Centiclops' representative in the world of the non-
| imaginary, we accept that a Unicode character with a hundred eyes
| is not practical and we accept the representation with just a few
| eyes, but generally agree that upgrading from 7 to 10 is a nice
| improvement, as 7 does not evenly divide into 100 but 10 does.
| This is important, because... reasons.
| sshine wrote:
| From "The House of Asterion" by Jorge Luis Borges:
|
| "It is true that I never leave my house, but it is also true
| that its doors (whose numbers are infinite) (footnote: The
| original says fourteen, but there is ample reason to infer
| that, as used by Asterion, this numeral stands for infinite.)
| are open day and night to men and to animals as well."
|
| https://klasrum.weebly.com/uploads/9/0/9/1/9091667/the_house...
| thaumasiotes wrote:
| > Centiclops is a lot like a cyclops with one eye, except
| th[at] as a reading of the roots clearly indicates, this is a
| creature with 100 eyes.
|
| Not in any normal sense of "roots". _Cent_ is a Latin root
| meaning 100. _ops_ is a Greek form meaning eye. The -i-
| indicates that the word is being formed in Latin, and the -cl-
| is entirely spurious. The original Greek word divides as cycl-
| ops, not cy-clops.
| inopinatus wrote:
| In any case, there is already an ancient, general, and
| perfectly serviceable epithet _Panoptes_.
| martyvis wrote:
| A bit like the heli-copter | helico-pter thing.
| layer8 wrote:
| There should be a combining "eye" character so that you can
| have as many or few eyes as you like.
|
| Though to be honest, that Unicode character looks more like a
| bunch of cells forming a tissue to me than eyes.
| doodpants wrote:
| Or perhaps this character is an accurate representation of a
| Dekaclops.
| jerf wrote:
| My client finds your proposal offensive and an appropriation
| of his culture, and also that Dekaclops guy is mean and
| smells bad and hasn't returned the lawnmower my client lent
| him even though my client has clearly referred to the need to
| mow his lawn several times now so he totally doesn't deserve
| a Unicode character.
| tzot wrote:
| It'd be dekaops, because the -cl- is part of "cy_cl_e"+"ops"
| (one round eye, with the one dropped because it's
| inferred). So "cycle" out, "deka" in.
| tempodox wrote:
| "Santa Clause" would translate to "holy clause". There might be
| such a thing but I think you meant Santa Claus :)
| dtparr wrote:
| Maybe just a big fan of the Tim Allen movie?
| jerf wrote:
| My fingers love adding the e's on the end of any worde that
| can conceivably take them. Also have that problem with any
| word that can take an "ly" even if I don't meanly it.
|
| Fixed, thanks.
| JohnFen wrote:
| I thought "santa" meant "saint"?
| 0xbadcafebee wrote:
| It does; the character originates from Saint Nicholas (or
| Odin, depending on who you ask).
| felix318 wrote:
| "Santa" means "female saint" in Italian and Spanish.
| Perhaps the English "santa" came from another language but
| I always found the name "Santa Claus" just horrible.
| Archelaos wrote:
| The first mention of this version of Saint Nicholas's
| name has the form "St. A Claus" and appeared in the New-
| York Gazette of 20 Dec 1773.[1] The same issue also first
| reported some incident regarding tea in Boston harbour.
| Nice coincidence.
|
| [1] Source: https://boston1775.blogspot.com/2016/12/st-claus-was-celebra...
| nohuck13 wrote:
| The name Santa Claus evolved from Nick's Dutch nickname,
| Sinter Klaas, a shortened form of Sint Nikolaas (Dutch
| for Saint Nicholas)
|
| https://www.history.com/.amp/topics/christmas/santa-claus#si...
| thaumasiotes wrote:
| > I thought "santa" meant "saint"?
|
| Well, _santa_ is a Spanish word meaning "holy" and _saint_
| is a cognate French word meaning the same thing. They
| descend from Latin _sanctus_; compare _sanctify_.
|
| When the prayer goes "holy Mary, mother of god", "holy
| Mary" is an exact equivalent of "santa Maria".
| robocat wrote:
| Might as well mention "Sancta Maria" in Latin, for example
| from the Christian Hail Mary[1], a recorded Latin version[2],
| written Latin next to English and Spanish[3], and of course
| translations into _thousands_ of languages[4], although
| unfortunately mostly written using _/A-Z/i_. (I am an atheist
| interested in languages.)
|
| [1] https://en.m.wikipedia.org/wiki/Hail_Mary
|
| [2] https://glaemscrafu.jrrvf.com/english/avemaria.html
|
| [3] https://hymnary.org/text/hail_mary_full_of_grace_the_lord_is...
|
| [4] http://www.marysrosaries.com/Rosary_prayers_in_different_lan...
| fortran77 wrote:
| I thought it was a misspelling of Satan, but maybe that's
| because I'm Jewish.
| wongarsu wrote:
| Saint is more or less the same as holy, just used as a
| title. It comes from Old French saint, seinte "holy, pious,
| devout," from Latin sanctus "holy, consecrated"
| kratom_sandwich wrote:
| I love this character and I love the fact that it is being updated.
| Just to get this right: at some point some person chose to doodle
| the letter instead of writing it the correct way and now we have
| a corresponding Unicode character? Sort of amazing and it also
| makes you think ...
| lmkg wrote:
| There was a... "tradition" is a strong word, perhaps "trend" is
| better. Scribes copying the Bible or related works in Cyrillic
| would stylize the letter O (equivalent to Roman O) at the
| beginning of the word for "eye" to look like
| an eye. There are a variety of glyphs along these lines: Ꙩ, Ꙫ, Ꙭ.
| All of them, including ꙮ, were added to Unicode as a single
| group.
|
| The glyph "ꙮ" was used to refer to an Angel with a whole buncha
| eyeballs, as one does. In terms of texts that survive today,
| this specific glyph has exactly one use in a single manuscript
| from the 1400's. It might have been used more, in texts which
| don't survive. But it is part of a larger trend, and I bet that
| its inclusion in Unicode depends strongly on that.
|
| But yeah, in itself the character exists solely so that modern
| computers are capable of a more-faithful rendition of the
| transcription of a single handwritten copy of the Book of
| Psalms.
| happytoexplain wrote:
| Thank you for describing the missing context. I couldn't
| understand why this stylized letter deserved a code point
| more than the uncountable others. I don't necessarily agree
| still, but the fact that this character was only unique
| _within a larger trend_ makes it much more reasonable.
| henriquecm8 wrote:
| So you are saying that the glyph is now more biblically
| accurate?
| int_19h wrote:
| The Bible doesn't specify how many eyes seraphim have.
|
| "In the center, around the throne, were four living
| creatures, and they were covered with eyes, in front and in
| back. ... Each of the four living creatures had six wings
| and was covered with eyes all around, even under its
| wings."
| vintermann wrote:
| Hah, and here I thought I was making a joke when I called it
| a biblically accurate O!
| cyral wrote:
| > modern computers are capable of a more-faithful rendition
| of the transcription of a single handwritten copy of the Book
| of Psalms.
|
| I wonder if there is even a copy of the book transcribed to
| actual characters or if it only exists as scanned PDF copies?
| If anyone did transcribe it, would they have any knowledge
| that the character even exists on computers?
| cillian64 wrote:
| It does raise interesting questions about what counts as
| decoration/formatting and what counts as part of the actual
| text. You could view these ocular O characters as purely
| decorative (like the fancy first character in a paragraph) but
| they could also be seen as a quirk of spelling which should be
| represented in unicode.
|
| But the multiocular O really does seem like one monk got bored
| one time and did some doodling.
| Arnt wrote:
| I attended a Unicode meeting (or maybe two? not sure?) and came
| away with the impression that Unicode is like those open source
| projects that are used by half of the world and maintained by a
| handful of skilled and benevolent people.
|
| In Unicode's case I think most of them are paid, at least.
| shp0ngle wrote:
| That is what I understood too. It doesn't seem particularly
| hard to add new letters to Unicode either, if you try a bit.
|
| However, that is a bit harder with emojis, which have their own
| subcommittee that seems to be more bureaucratic and also
| more popular than the rest of Unicode. Everyone wants to make
| a new emoji.
| Stamp01 wrote:
| I don't understand why this character needs to exist given that,
| at least according to the author, it has only been seen once in
| the wild, and it's semantically identical to another more widely
| used character.
|
| I'm glad I'm not responsible for unicode. Clearly I have the
| wrong mindset for it.
| 1-6 wrote:
| I agree with your mindset. It's time for a unicode replacement.
| lifthrasiir wrote:
| Surprisingly many characters in Unicode were recorded only a few
| times, if not once, before their assignment. Chinese characters, for
| example, have a lot of them, because it was relatively common
| to make a new character for a newborn before modern times, and
| some of them have survived through literature but otherwise
| seen no use (e.g. U+21E2B only appears once in the _Records
| of the Three Kingdoms_, San Guo Zhi). But they have still
| received code points because they are considered essential for
| digitization of historical works, and multiocular O is no
| different.
| bogwog wrote:
| Imagine you're a historian from the future studying some old
| document, and you spot a weird character that you've never seen
| before. Wouldn't it be useful to be able to search for that
| character to see if it shows up in any other document? A simple
| OCR scan will bring up all the information you could ever need
| for that one weird symbol.
| PeterisP wrote:
| Perhaps it's relevant to look at how it was introduced - as a
| "package deal" with many, many characters from medieval
| Cyrillic literature, as described in this proposal
| https://www.unicode.org/L2/L2007/07003r-n3194r-cyrillic.pdf
|
| It certainly made sense to include this package in Unicode, and
| the vast majority of those characters certainly should be in
| this proposal. You do have to draw the line somewhere, and
| obviously those close to the line will be debatable, no matter
| where you choose to draw it, like this particular symbol - but
| once you've decided that you will include the one-eyed O (small
| and capital) and the two-eyed O (small and capital), then
| putting in the many-eyed O as well to complete the set doesn't
| seem so far-fetched.
| shadowgovt wrote:
| It's been seen once in the in-print wild.
|
| There's no way to know how many since-written documents will
| break if a whole codepoint is dropped.
| wheybags wrote:
| This kind of stupid thing is my problem with Unicode. We have all
| this baggage for stuff that _nobody uses_, and we need to deal
| with it forever. The worst for me is that there is no possible
| way to encode a grapheme cluster as a constant size, so using
| Unicode makes it impossible to have simple character access like
| an old-style C string, no matter how big you make your char, even
| though it's totally possible with damn near every language that
| people actually use.
|
| So then we all end up paying this massive complexity tax
| everywhere to pay for support for some Mongolian script that died
| out 200 years ago (or multi-codepoint encodings of simple things
| like é - just why, it was so avoidable).
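|
| A minimal sketch of the "multi codepoint" case, assuming the
| character meant above is an accented e such as é. It uses only
| Python's standard unicodedata module and compares the precomposed
| form with the base-letter-plus-combining-accent form. The helper
| name code_points is made up for this illustration.

```python
import unicodedata

# Two encodings of the same visible character (an e with an acute accent).
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def code_points(text):
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The two strings differ when compared code point by code point...
same_raw = composed == decomposed
# ...but match once the decomposed form is normalized to NFC (composed form).
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
# NFD goes the other way and splits the precomposed character.
split_nfd = unicodedata.normalize("NFD", composed) == decomposed

if __name__ == "__main__":
    print(code_points(composed), code_points(decomposed))
    print(same_raw, same_nfc, split_nfd)
```

| Normalizing both sides to the same form before comparing is the
| usual way to make the two spellings interchangeable.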
| JohnFen wrote:
| I hear you. I loathe working with Unicode for this exact
| reason. It's a bit of a nightmare due to its complexity.
|
| That said, what it's trying to do is enormously complex.
| svat wrote:
| > _encode a grapheme cluster as a constant size [...] totally
| possible with damn near every language that people actually
| use_
|
| This is not true. For a concrete example: the languages Hindi
| and Marathi, with ~500 million speakers, use the Devanagari
| script (also used by Nepali and Sanskrit), in which a grapheme
| cluster is (usually) a sequence of consonants followed by a
| vowel. For instance, something like "bhuktvā" (भुक्त्वा) would
| be two grapheme clusters, one (भु) for "bhu" and one (क्त्वा)
| for "ktvā". In Unicode each vowel and consonant (here, bh, u,
| k, t, v, ā) is separately encoded, which is the only reasonable
| thing to do, and inevitably means that grapheme clusters can
| have different lengths (number of code points). The alternative
| would have been to encode every possible (sequence of
| consonants + vowel) as a single codepoint, which gets
| ridiculous quickly: these sequences can be up to 5 consonants
| long, so you'd end up having to encode (33^5 * 13 ≈ 500M)
| codepoints for Devanagari alone (or completely prevent certain
| sequences of consonants from being expressed, which makes no
| sense either), not to mention that most of the scripts of the
| Indian subcontinent and south-east Asia follow the same
| principle and have similar issues (e.g. Bengali with 250M
| speakers, Telugu, Javanese, Punjabi, Kannada, Gujarati, Thai
| with over 50M speakers each, etc).
|
| (See chapters 12-17 of the Unicode standard, currently version
| 15: https://www.unicode.org/versions/Unicode15.0.0/ch12.pdf)
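|
| A rough sketch of the example above, assuming the word is spelled
| with the Devanagari code points listed in the code. Note that the
| encoded sequence also contains virama signs joining the consonants.
| The two clusters are split by hand, following the description in
| the comment, because Python's standard library has no grapheme
| cluster segmentation. The helper name describe is hypothetical.
| The last line just evaluates the 33^5 * 13 estimate.

```python
import unicodedata

# Assumed Devanagari spelling of the example word from the comment.
word = "\u092d\u0941\u0915\u094d\u0924\u094d\u0935\u093e"

# The two clusters named in the comment ("bhu" and "ktva"), split by hand.
clusters = [word[:2], word[2:]]


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Number of code points in each cluster; the two clusters are not the same size.
cluster_lengths = [len(c) for c in clusters]

# The expression from the comment: 33 consonants in each of 5 positions,
# followed by one of 13 vowels.
precomposed_estimate = 33**5 * 13

if __name__ == "__main__":
    for cluster in clusters:
        for code_point, name in describe(cluster):
            print(code_point, name)
        print("---")
    print(cluster_lengths, precomposed_estimate)
```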
| gnulinux wrote:
| Have you ever written software before Unicode? We had N
| different encodings for each language, each culture, each
| country. There were all kinds of bugs creeping up, and software
| that works perfectly well could be buggy for one random
| language. Unicode abstracted all of this away from the
| programmer in a pretty simple fashion. I simply do not see how
| we're paying the "complexity tax" by using Unicode: unless
| you're writing a _library_ that handles Unicode (which you
| shouldn't do, you should use existing libraries), you don't
| need to know anything about Unicode.
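|
| A small sketch of the pre-Unicode situation described above. The
| byte value and the three legacy encodings are picked only for
| illustration; all three codecs ship with Python. The same byte
| decodes to a different character under each encoding, while Unicode
| gives each of those characters its own code point.

```python
# One byte, three legacy single-byte encodings (chosen for illustration).
raw = b"\xe9"

# The same byte means a different character under each encoding.
decoded = {name: raw.decode(name) for name in ("latin-1", "cp1251", "koi8_r")}

# In Unicode all three characters coexist in one string, and UTF-8
# round-trips that string without loss.
text = "".join(decoded.values())
round_trip = text.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    for name, ch in decoded.items():
        print(name, f"U+{ord(ch):04X}", ch)
    print(round_trip == text)
```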
| mkipper wrote:
| Before Unicode, everyone who came up with a character encoding
| scheme probably thought their system was good enough for any
| reasonable use-case. But they all had limitations that made
| them inadequate for things less obscure than representing some
| dead Mongolian language.
|
| It would be nice if we could come up with some magical system
| that optimally encodes all the text that "matters" and ignores
| everything else, but history has shown that to be very hard. So
| we're left with Unicode, which takes the approach of giving us
| (effectively) infinite code points to represent characters,
| with (effectively) infinite ways to visually represent them.
| That does lead to a bunch of "unnecessary" baggage and
| headaches, but it also solves a bunch of real problems that you
| probably don't know exist.
|
| Unicode is a pain in the ass, but it's a solution to a very
| hard problem. You can feel free to design your own solution,
| but you'll probably run head-first into all the problems
| Unicode was trying to solve from 40 years ago.
| lifthrasiir wrote:
| Your notion of character doesn't necessarily match others', and
| there are many cases where the number of possible "characters"
| in some notion is unbounded. Unicode provides a very well-
| defined superset of those notions _for you_. Collecting
| characters is only a minor portion of their jobs.
| BlueTemplar wrote:
| I'm getting the impression that this is only "obvious" from a
| Latin-Cyrillic-Greek alphabet point of view?
|
| P.S.: Also, even for those, it would seem that one of the big
| reasons things like combining characters were added to
| Unicode was to be backwards compatible even with mutually
| incompatible encodings?
___________________________________________________________________
(page generated 2022-09-19 23:00 UTC)