[HN Gopher] Why can't you reverse a string with a flag emoji?
___________________________________________________________________
Why can't you reverse a string with a flag emoji?
Author : da12
Score : 106 points
Date : 2022-01-27 18:35 UTC (4 hours ago)
(HTM) web link (davidamos.dev)
(TXT) w3m dump (davidamos.dev)
| zanzibar735 wrote:
| Of course you can reverse a string with a flag emoji. You just
| need to treat a "string" as a collection of Extended Grapheme
| Clusters, and then you reverse the order of the EGCs. So if the
| string is `a<flag unicode bytes>b`, the output should be `b<flag
| unicode bytes>a`.
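|
| In Python, for instance, a minimal sketch of that approach (this
| assumes the third-party `regex` module, whose \X pattern matches
| one extended grapheme cluster):
|
|     import regex  # third-party; pip install regex
|
|     def reverse_graphemes(s):
|         # \X keeps flag pairs, combining marks and ZWJ
|         # sequences together as single units
|         return ''.join(reversed(regex.findall(r'\X', s)))
|
|     us = '\U0001F1FA\U0001F1F8'  # regional indicators U + S
|     assert reverse_graphemes('a' + us + 'b') == 'b' + us + 'a'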
| Crazyontap wrote:
| The section of the linked Wikipedia article(1) on how the family
| emoji is rendered using a zero-width joiner is quite amazing
|
| (1) https://en.wikipedia.org/wiki/Emoji#Joining
|
| edit: forgot HN doesn't render emojis. Better to read it
| directly on Wikipedia, I guess.
| codezero wrote:
| You also can't URL-encode a string (in JS, at least) if you
| truncate an emoji at the beginning or end of it.
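|
| Presumably that's because JS strings are UTF-16, so truncating
| mid-emoji leaves a lone surrogate that can't be encoded as UTF-8
| for percent-encoding. A rough Python sketch of the same failure
| (the byte slicing just manufactures the lone surrogate):
|
|     s = '\U0001F600'                 # one emoji, 2 UTF-16 units
|     units = s.encode('utf-16-le')
|     half = units[:2].decode('utf-16-le', 'surrogatepass')
|     half.encode('utf-8')  # UnicodeEncodeError, like JS URIError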
| coreyp_1 wrote:
| If you think the Unicode flag emoji take a lot of bytes, then
| consider the family emoji!
| (https://unicode.org/emoji/charts/full-emoji-list.html#family)
|
| I'm in the process of designing a scripting language and
| implementing it in C++. I plan to put together a YouTube series
| about it. (Doesn't everyone want to see Bison and Flex mixed with
| proper unit tests and C++20 code?)
|
| Due to my future intended use case, I needed good support for
| Unicode. I thought that I could write it myself, and I was wrong.
| I wasted two weeks (in my spare time, mostly evenings) trying to
| cobble together things that should work, identifying patterns,
| figuring out how to update it as Unicode itself is updated,
| thinking about edge cases, i18n, zalgo text, etc. And then I
| finally reached the point where I knew enough to know that I was
| making the wrong choice.
|
| I'm now using ICU. (https://icu.unicode.org/) It's huge, it was
| hard to get it working in my environment, and there are very few
| examples of its usage online, but after the initial setup dues
| are paid, it WORKS.
|
| Aside: Yes, I know I'm crazy for implementing a programming
| language that I intend for serious usage. Yes, I have good
| reasons for doing it, and yes I have considered alternatives. But
| it's fun, so I'm doing it anyways.
|
| Moral of the story: Dealing with Unicode is hard, and if you
| think it shouldn't be that hard, then you probably don't know
| enough about the problem!
| josephg wrote:
| Handling unicode can be fine, depending on what you're doing.
| The hard parts are:
|
| - Counting, rendering and collapsing grapheme clusters (like
| the flag emoji)
|
| - Converting between legacy encodings (Shift JIS, KOI8, etc.) and
| UTF-8 / UTF-16
|
| - Canonicalization
|
| If all you need is to deal with utf8 byte buffers, you don't
| need all that stuff. And your code can stay simple, small and
| fast.
|
| IIRC the rust standard library doesn't bother supporting any of
| the hard parts in unicode. The only real unicode support in std
| is utf8 validation for strings. All the complex aspects of
| unicode are delegated to 3rd party crates.
|
| By contrast, nodejs (and web browsers) do all of this. But they
| implement it in the same way you're suggesting - they simply
| call out to libicu.
| tialaramex wrote:
| > The only real unicode support in std is utf8 validation for
| strings.
|
| Rust's core library gives char methods such as is_numeric
| which asks whether this Unicode codepoint is in one of
| Unicode's numeric classes such as the letter-like-numerics
| and various digits. (Rust does provide char with
| is_ascii_digit and is_ascii_hexdigit if that's all you
| actually cared about)
|
| So yes, the Rust standard library is carrying around the
| entire Unicode character class rule list, among other things.
| Of course, Rust's library isn't all built into your binary:
| if you never use these features, your binary doesn't get that
| code.
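|
| For comparison, Python's str methods expose the same Unicode
| numeric classes:
|
|     '5'.isnumeric()       # True
|     '\u00bd'.isnumeric()  # True: VULGAR FRACTION ONE HALF
|     '\u00bd'.isdigit()    # False: numeric, but not a digit
|     '\u216b'.isnumeric()  # True: ROMAN NUMERAL TWELVE
|
| so that character class table travels with Python's runtime too.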
| Gigachad wrote:
| It always feels like the most work goes into the least-used
| emoji. So many revisions and additions to the family emoji, and
| yet it's one of the ones I don't recall anyone ever using.
|
| I think the trap Unicode got into is that technically they can
| have infinite emoji, so they just don't ever have a way to say
| no to new proposals.
| masklinn wrote:
| > It always feels like the most work goes into the least-used
| emoji.
|
| I always feel like those emoji were added on purpose in order
| to force implementations to fix their unicode support. Before
| emoji were added, most software had completely broken support
| for anything beyond the BMP (case study: MySQL's so-called
| "UTF8" encoding). The introduction of emoji, and their
| immediate popularity, forced many systems to better support
| astral planes (that is officially acknowledged:
| https://unicode.org/faq/emoji_dingbats.html#EO1)
|
| Progressively, emoji using more advanced features got
| introduced, which forced systems (and developers) to fix their
| unicode handling, or at least improve it somewhat: e.g. skin
| tones via combining codepoints, etc.
|
| > I think the trap Unicode got into is that technically they
| can have infinite emoji, so they just don't ever have a way to
| say no to new proposals.
|
| You should try to follow a new character through the process,
| because that's absolutely not what happens and shepherding a
| new emoji through to standardisation is not an easy task. The
| unicode consortium absolutely does say no, and has many
| reasons to do so. There's an entire page on just proposal
| guidelines (https://unicode.org/emoji/proposals.html), and
| following it does not in any way ensure it'll be accepted.
| mike_hock wrote:
| WTF business do emojis have in Unicode? The BMP is all
| there ever should have been. Standardize the actual writing
| systems of the world, so everyone can write in their
| language. And once that is done, the standard doesn't need
| to change for a hundred years.
|
| What we need now is a standardized, sane subset of Unicode
| that implementations can support while rejecting the insane
| scope creep that got added on top of that. I guess the BMP
| is a good start, even though it already contains
| superfluous crap like "dingbats" and boxes.
| laumars wrote:
| They do say no though. Frequently too.
|
| The problem with Unicode is simply that it's trying to solve
| a very hard problem.
| tialaramex wrote:
| Exactly this. Humans have _incredibly_ complicated writing
| systems, and all Unicode wants to do is encode them all.
| Keep in mind that the trivial toy system we're more
| familiar with, ASCII, already has some pretty strange
| features because even to half-arse one human writing system
| they needed those features.
|
| Case is totally wild: it only applies to a fraction of the
| symbols in ASCII, but in the process it means they each
| need two codepoints, and you're expected to carry around
| tech for switching back and forth between cases.
|
| And then there are several distinct types of white space,
| each gets a codepoint, some of them try to mess with your
| text's "position" which may not make any sense in the
| context where you wanted to use it. What does it mean to
| have a "horizontal tab" between two parts of the text I
| wanted to draw on this mug? I found a document which says
| it is the same as "eight spaces" which seems wrong because
| surely if you wanted eight spaces you'd just write eight
| spaces.
|
| And after all that ASCII doesn't have working quotation
| marks, it doesn't understand how to spell a bunch of common
| English words like naïve or café, pretty disappointing.
| xxpor wrote:
| >Humans have incredibly complicated writing systems
|
| Not only that, there isn't even agreement about what's
| correct all the time!
|
| >it doesn't understand how to spell a bunch of common
| English words like naïve or café, pretty disappointing.
|
| A perfect example of this, since I would argue English
| doesn't have any diacritics at all. So the use of café is
| code switching. :)
| mattkrause wrote:
| Not a New Yorker writer, I see....
| mappu wrote:
| If you like this, you may also like why len(emoji) is still not 1
| in Python 3 despite all the unicode breakage:
| https://storytime.ivysaur.me/posts/grapheme-clusters/
|
| I do feel like these are all 'gotcha' questions - I haven't seen
| any real-world requirement to reverse a string and then have it
| be displayed in a useful way.
| raffy wrote:
| Kinda related: I am developing a library for ENS (Ethereum Name
| Service) name normalization: https://github.com/adraffy/ens-
| normalize.js
|
| I'm trying to find the best combination of UTS-46, UTS-51,
| UTS-39, and prior work on IDN resolution w/r/t confusables:
| https://adraffy.github.io/ens-normalize.js/test/report-confu...
|
| Personally, I found the Unicode spec very messy. Critical
| information is all over the place. You can see the direct effect
| of this when you compare various packages across different
| languages and discover that every library disagrees in multiple
| places. Even JS String.normalize() isn't consistent in the latest
| version of most browsers: https://adraffy.github.io/ens-
| normalize.js/test/report-nf.ht... (fails in Chrome, Safari)
|
| The major difference between ENS and DNS is emoji are front and
| center. ENS resolves by computing a hash of a name in a
| canonicalized form. Since resolution must happen in a
| decentralized way, simply punting to punycode and relying on
| custom logic for Unicode handling isn't possible. On-chain
| records are 1:1, so there's no
| fuzzy matching either. Additionally, ENS is actively registering
| names, so any improvement to the system must preserve as many
| names as possible.
|
| At the moment, I'm attempting to improve upon the confusables in
| the Common/Greek/Latin/Cyrillic scripts, and will combine these
| new groupings with mixed-script limitations similar to the IDN
| handling in Chromium.
|
| Interactive Demo: https://adraffy.github.io/ens-
| normalize.js/test/resolver.htm...
|
| Also this emoji report is pretty cool:
| https://adraffy.github.io/ens-normalize.js/test/report-emoji...
| [deleted]
| xmprt wrote:
| This is a cool article about Unicode encoding; however, I still
| feel like it should be possible to reverse strings with flag
| emojis. I don't see why computers can't handle multi-rune
| symbols in the same way that they handle multi-byte runes. We
| could combine all the runes that should form a single symbol and
| make sure that we maintain the ordering of those runes in the
| reversed string. Of course, that means that naive string
| reversal doesn't work anymore, but naive string reversal
| wouldn't work in the world of UTF-8 either if we just went byte
| by byte.
| happytoexplain wrote:
| Swift, for example, does what you're saying. I thought that the
| reason many languages don't do it that way is that part of the
| definition of an array (or at least what's expected by
| convention) is constant-time operations. If you treat a string
| as an array,
| then having to deal with variable-length units breaks that
| rule. That's why, when there _is_ an API for dealing with
| grapheme clusters, it is usually a special case that duplicates
| an array-like API, instead of literally using an array.
|
| I actually don't know how/why Python is apparently using code
| points, since they are variable length. That seems like a
| compromise between using code units and using grapheme clusters
| that gets you the worst of both worlds.
|
| Edit: Maybe it uses UTF-32 under the hood when it's doing array
| operations on code points?
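|
| For what it's worth, CPython answers this with PEP 393: each
| string is stored at a fixed width chosen when it is created (1,
| 2, or 4 bytes per code point), so indexing stays constant-time.
| A quick check:
|
|     import sys
|     e = '\U0001F600'
|     sys.getsizeof('aaaa') - sys.getsizeof('aaa')           # 1
|     sys.getsizeof('\u0394'*4) - sys.getsizeof('\u0394'*3)  # 2
|     sys.getsizeof(e*4) - sys.getsizeof(e*3)                # 4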
| kevin_thibedeau wrote:
| This misses the real problem with flag emoji: they are composed
| of codepoints that can appear in any order. With other emoji you
| get a base codepoint with potential combining characters. Using
| a table of combining character ranges you can skip over them and
| isolate the logical glyph sequences. Unlike flags, they don't
| need surrounding context to be parsed out.
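|
| A small Python illustration of that context dependence (the `ri`
| helper is just shorthand for building regional indicators):
|
|     def ri(letter):  # 'A'..'Z' -> regional indicator symbol
|         return chr(0x1F1E6 + ord(letter) - ord('A'))
|
|     s = ri('F') + ri('R') + ri('U') + ri('S')
|     # s renders as two flags: FR, US
|     # s[1:] renders as the RU flag plus a lone S indicator;
|     # the pairing of every later codepoint shifted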
| uniqueuid wrote:
| Thanks for that interesting detail!
|
| If such re-purposing continues, it might be easier to go
| straight to utf-32 for some use cases.
| dhosek wrote:
| Nope, because the repurposing is independent of how the
| Unicode is represented. There's absolutely no advantage to
| having a string in UTF-32 over UTF-8 since you'll still need
| to examine every character and the added overhead for
| converting byte strings in UTF-8 to 32-bit code points is by
| far offset by the huge memory increase necessary to store
| UTF-32.
|
| What's more, it's really not that difficult to start at the
| end of a valid UTF-8 string and get the characters in reverse
| order. UTF-8 is well-designed that way in that there's never
| ambiguity about whether you're looking at the beginning byte
| of a code point.
| colejohnson66 wrote:
| > UTF-8 is well-designed that way in that there's never
| ambiguity about whether you're looking at the beginning
| byte of a code point.
|
| To expand, if the most-significant-bit is a 0, it's an
| ASCII codepoint. If the top two are '10', it's a
| continuation byte, and if they're '11', it's the start of a
| multibyte codepoint (the other most-significant-bits
| specify how long it is to facilitate easy codepoint
| counting).
|
| So a naive codepoint reversal algorithm would start at the
| end, and move backwards until it sees either an ASCII
| codepoint or the start of a multibyte one. Upon reaching
| it, copy those 1-4 bytes to the start of a new buffer.
| Continue until you reach the start.
|
| [0]: https://en.wikipedia.org/wiki/UTF-8#Encoding
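|
| A sketch of that algorithm in Python, working directly on the
| UTF-8 bytes:
|
|     def reverse_codepoints(buf: bytes) -> bytes:
|         # Continuation bytes look like 0b10xxxxxx; anything
|         # else starts a new 1-4 byte code point.
|         out, end = bytearray(), len(buf)
|         for i in range(len(buf) - 1, -1, -1):
|             if buf[i] & 0xC0 != 0x80:    # lead or ASCII byte
|                 out += buf[i:end]        # copy whole code point
|                 end = i
|         return bytes(out)
|
|     rev = reverse_codepoints('a\u00f1b'.encode())  # 'añb'
|     assert rev.decode() == 'b\u00f1a'              # 'bña'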
| jug wrote:
| I think that somewhere in this answer lies a reason why Windows
| still doesn't support flag emoji. I don't count Microsoft Edge
| as "Windows" in this case, but as Chromium. Windows doesn't
| support flag emoji in its native text boxes, but it does
| support even colorized emoji.
|
| But then again, flags seem to be not only Unicode-hard but
| post-Unicode-hard.
| masklinn wrote:
| > But then again, flags seem to be not only Unicode-hard but
| post-Unicode-hard.
|
| Flags are not that hard; they're a very specific block that
| combines in a very predictable way. They're little more than
| ligatures. Family emoji are much harder.
|
| And this is not "post-Unicode" in any way.
| cygx wrote:
| _Flags are not that hard; they're a very specific block that
| combines in a very predictable way._
|
| But before their introduction, you could decide if there's
| a grapheme cluster break between codepoints just by looking
| at the two codepoints in question. Now, you may need to
| parse a whole sequence of codepoints to see how flags pair
| up.
| otagekki wrote:
| If flag emojis are really a combination of 2 special characters,
| the reversal of the U.S. flag should result in the Soviet Union
| flag.
| TonyTrapp wrote:
| It's up to the installed fonts really. I don't know if the
| combination of S + U is standardized as a Soviet Union flag
| emoji, but even if it is, your locally installed fonts may not
| contain every single flag emoji, so the browser would still
| fall back to rendering the two letters instead.
| masklinn wrote:
| > the reversal of the U.S. flag should result in having the
| Soviet Union flag.
|
| Except it has been deleted from the ISO 3166-2 registry, so not
| having it is perfectly valid (arguably more so than having it).
| jameshart wrote:
| I was _so_ disappointed that didn't turn out to be the case.
| brewmarche wrote:
| Just tried reversing a Spanish flag with Python and indeed I
| got Sweden back
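|
| For anyone who wants to try it:
|
|     flag = '\U0001F1EA\U0001F1F8'  # E + S: renders as Spain
|     flag[::-1]                     # S + E: renders as Sweden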
| ezfe wrote:
| Works in Swift, which is the benefit of Swift having the most
| painful String API possible:
|
| let v = "Flag: " String(v.reversed()) // Output: :galF v.count //
| Output: 7
| jiveturkey wrote:
| Interesting article. Written for beginners, conversationally. Has
| excessive amounts of whitespace, for "readability" I guess. But
| at the same time, it dives quite deep, and I don't think this
| "style" of presentation matches up with the amount of time a more
| novice reader is going to devote to a single long-form article.
|
| As to the content, for all the deep dive, a simple link to
| https://unicode.org/reports/tr51/#Flags and what an emoji is,
| would have saved so much exposition. I also wish he'd touched on
| normalization. With the amount of time he's demanding from
| readers he could have mentioned this important subject. Because
| then he could discuss why (starting from his emoji example)
| a-grave (à) might or might not be reversible, depending on how
| the character is composed.
|
| Also wish he'd pointed to some libraries that can do such
| reversals.
| faebi wrote:
| Why reverse them when one can barely implement, display, and
| edit them correctly? I never could make them work perfectly in
| VIM.
| Also I had to open a bug in Firefox recently:
|
| _Flag emojis and others are displayed in double the size on
| Windows 10 using Firefox Nightly_
| https://bugzilla.mozilla.org/show_bug.cgi?id=1746795
| [deleted]
| nottorp wrote:
| So basically unicode along with c++ are great job security if you
| do bother to learn them.
|
| There's another word that comes to mind when thinking about those
| two: metastasis.
| [deleted]
| ts4z wrote:
| Let me cheat a bit and say Unicode comes in three flavors: UTF-8,
| UCS-2 aka UTF-16, and UTF-32. UTF-8 is byte-oriented, UTF-16 is
| double-byte oriented, and UTF-32 nobody uses because you waste
| half the word almost all of the time.
|
| You can't reverse the _bytes_ in UTF-8 or UTF-16, because you'll
| scramble the encoding. But you could parse the string,
| codepoint-at-a-time, handling the specifics of UTF-8, or UTF-16
| with its surrogate pairs, and reverse those. This sounds
| equivalent to reversing UTF-32, and I believe it is what the
| original poster was imagining.
|
| Except you can't do that, because Unicode has combining
| characters. Now, I'm American and too stupid to type anything
| other than ASCII, but I know about n + ~ = ñ. If you have the
| pre-composed version of ñ, you can reverse the codepoint (it's
| one codepoint). If you don't have it, and you have n + combining
| ~, you can't reverse it, or in the word "año" you might put the
| ~ on the "o". (Even crazier things happen when you get to the
| ligatures in Arabic; IIRC one of those is about 20 codepoints.)
|
| So we can't just reverse codepoints, even with ancient versions
| of Unicode. Other posters have talked about the even more exotic
| stuff like emoji + skin tone. It's necessary to be very careful.
|
| Now, the old fart in me says that ASCII never had this problem.
| But the old fart in me knows about CRLF in text protocols, and
| that's never LFCR; and that if you want to make a ñ in ASCII you
| must send n ^H ~. I guess you can reverse that, but if you want
| to do more exotic things it becomes less obvious.
|
| (IIRC UCS-2 is the deadname, now we call it UTF-16 to remind us
| to always handle surrogate pairs correctly, which we don't.)
|
| TLDR: Strings are hard.
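|
| (The año example is easy to demonstrate in Python, where the
| two normalization forms reverse differently:
|
|     import unicodedata
|     nfc = unicodedata.normalize('NFC', 'a\u00f1o')  # precomposed
|     nfd = unicodedata.normalize('NFD', 'a\u00f1o')  # n + U+0303
|     nfc[::-1]  # 'oña', fine
|     nfd[::-1]  # 'õna', the tilde jumped onto the o
|
| Same text, same rendering, different reversals.)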
| progbits wrote:
| Semi-related (about length of emoji "characters", not reversing):
| https://hsivonen.fi/string-length/
|
| Previously discussed:
|
| https://news.ycombinator.com/item?id=20914184
|
| https://news.ycombinator.com/item?id=26591373
|
| As for this article & Python - as usual, it is biased towards
| convenience and implicit behavior rather than properly handling
| all edge cases.
|
| Compare with Rust where you can't "reverse" a string - that is
| not a defined operation. But you can either break it into a
| sequence of characters or graphemes and then reverse that, with
| expected results: https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
|
| (Sadly the grapheme segmentation is not part of standard library,
| at least yet)
| aidenn0 wrote:
| > The answer is: it depends. There isn't a canonical way to
| reverse a string, at least that I'm aware of.
|
| Unicode defines grapheme clusters[1] that represent "user-
| perceived characters". Separating a string into those and
| reversing them seems like a pretty good way to go about it.
|
| 1: http://www.unicode.org/reports/tr29/
| qqii wrote:
| > Challenge: How would you go about writing a function that
| reverses a string while leaving symbols encoded as sequences of
| code points intact? Can you do it from scratch? Is there a
| package available in your language that can do it for you? How
| did that package solve the problem?
|
| So are there any good libraries that can deal with code points
| that are merged together into a single pictograph and reverse
| them "as expected"?
| da12 wrote:
| If you're using Python, check out grapheme:
| https://github.com/alvinlindstam/grapheme
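|
| Rough usage, for the flag case from the article:
|
|     import grapheme  # pip install grapheme
|
|     s = 'a\U0001F1FA\U0001F1F8b'   # a + US flag + b
|     grapheme.length(s)             # 3
|     ''.join(reversed(list(grapheme.graphemes(s))))
|     # -> 'b' + flag + 'a', with the flag intact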
| tl wrote:
| This is a nice dive into limitations in Python's unicode handling
| and at the end, how to work around some problems. But you could
| use languages with proper unicode support like Swift or Elixir
| (weirdly, HN is fighting flags in comment code, which makes
| examples harder to demonstrate).
| anamexis wrote:
| HN doesn't allow any emoji.
| mlindner wrote:
| The author tries to define "character" when there isn't actually
| any definition of what that even means. "Character" is a term
| limited to languages that actually use characters, and not all
| text is made up of characters.
| yoyohello13 wrote:
| Maybe I'm missing some prerequisite knowledge here, but why would
| I assume `flag="us"` is an emoji? Looking at that first block of
| code, there is no reason for me to think "us" is a single
| character.
|
| Edit: Turns out my browser wasn't rendering the flags.
| ljm wrote:
| If it's Windows, it doesn't actually use flags for those
| emojis, it renders a country code instead. If it wasn't
| supported you would just see the glyph for an unknown
| character.
|
| The reason is that they didn't want to be caught up in any
| arguments about what flag to render for a country during any
| dispute, as with, e.g., the flag for Afghanistan after the
| Taliban took control.
| happytoexplain wrote:
| In Windows Chrome, it doesn't render the emoji for me. In
| Android Chrome, it renders a flag emoji - not the raw region
| indicators (which look like the letters "u" and "s").
| Benlights wrote:
| I had the same issue when I read the article; I kept on getting
| stuck and asking myself what I was missing.
| greenyoda wrote:
| In my browser (Firefox on Windows), the thing between the
| quotes in the first block of code looks like a picture of the
| US flag cropped to a circle, not like the characters "us".
| yoyohello13 wrote:
| Ah, I see. I just opened it in Firefox. It looks like some JS
| library is not getting loaded in Edge. The author was talking
| about "us", "so", etc. looking like one character and I
| thought I was going crazy, lol.
| da12 wrote:
| A whole lesson in Unicode in itself right there with your
| experience, haha!
| bialpio wrote:
| Reminds me of an image that renders differently on Macs
| (https://www.bleepingcomputer.com/news/technology/this-
| image-...), I bet it'd make for a fun conversation that
| could make the participants question their sanity. :-)
| masklinn wrote:
| There should not be any JS involved though, only a font
| able to render these grapheme clusters.
|
| Do you see the US flag after "copy and paste this emoji" on
| https://emojipedia.org/flag-united-states/?
| jfk13 wrote:
| I don't think that's about a JS library. Firefox bundles an
| emoji font that supports some things -- such as the flags
| -- that aren't supported by Segoe UI Emoji on Windows, so
| it has additional coverage for such character sequences.
| yoyohello13 wrote:
| That makes sense. I saw a failure to load a JS module in
| the console and assumed that was part of the problem.
| jug wrote:
| I'm not surprised the flag had two components, but I _was_
| surprised the US flag was literally made of U and S, haha!
|
| I definitely thought it'd be something like [I am a Flag] and
| [The flag ID between 0 and 65535]. And reversing it would be
| [Flag ID] + [I am a Flag], which would not be a defined
| "component" and would instead be rendered as two individual
| nonsense characters.
| andylynch wrote:
| You might also have noticed this is partly a well-thought-out
| hack to make Unicode less sensitive to disagreements and
| changes in consensus on which flags are encoded, or even the
| names of the countries concerned!
| happytoexplain wrote:
| I guessed that it would become the USSR flag (US -> SU), but
| apparently Unicode doesn't define that one! I wonder why. That
| would have been humorous.
| bloak wrote:
| As I understand it, there is no two-letter ISO code for the
| USSR because when they update the standard they remove
| countries that no longer exist. In at least one case they have
| reused a code: CS has been both "Czechoslovakia" and
| "Serbia and Montenegro", neither of which currently exist.
|
| As a result, two-letter ISO codes are useless for many
| potential applications, such as, for example, recording which
| country a book was published in, unless you supplement them
| with a reference to a particular version of the standard.
|
| Is there a way of getting the Czechoslovakian flag as an emoji?
| And did Serbia and Montenegro get round to making a flag?
| happytoexplain wrote:
| Ah, I didn't realize they reused codes from ISO 3166-3. I
| figured, because they keep these regions around in their own
| set, there was some implication that the codes would not be
| reused.
| ts4z wrote:
| IIRC Unicode doesn't define country codes. It was a workaround
| for a political issue of which countries recognize which other
| countries.
|
| It would have been difficult to get the CN delegation to sign
| off on a list that contained TW, although there are probably
| others.
| andylynch wrote:
| There are many more than I realised - Wikipedia has a decent
| list https://en.m.wikipedia.org/wiki/List_of_states_with_limi
| ted_...
| chungy wrote:
| Unicode doesn't define any flags, really. That's up to the font
| rendering on systems/libraries.
| happytoexplain wrote:
| True, but Unicode explicitly defines "SU" as a deprecated
| combination, regardless of flags. Seems like they omit
| everything from the list of "no longer used" country codes,
| with some exceptions. I would think they would have no reason
| not to allow historical regions.
| WA9ACE wrote:
| I feel like I'm obligated to share this almost 20-year-old
| Spolsky post that gave me my understanding of characters.
|
| https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...
| xmprt wrote:
| In that same vein, here's my introduction to Unicode about 10
| years ago from Tom Scott.
|
| https://www.youtube.com/watch?v=MijmeoH9LT4
| zerox7felf wrote:
| Poor man gave me and many others something like half of our
| introduction to computer science, but has gotten far more
| fame as the "emoji guy" for his repeated bouts with this
| particular part of unicode :)
| ciupicri wrote:
| That's more about the UTF-8 encoding than Unicode itself.
| bandyaboot wrote:
| Would be interesting to see the list of flag emojis that, when
| reversed, become a different flag emoji.
| jfk13 wrote:
| There are plenty of country codes that when reversed become a
| different, valid country code: e.g. Israel (IL) when reversed
| is Lithuania (LI); Australia (AU) becomes Ukraine (UA).
|
| Whether "reversing flag emojis" causes such transformations
| will depend on what is meant by "reversing", which is kind of
| the whole point here: there are a number of possible
| interpretations of "reverse".
| alfredxing wrote:
| Related -- I did a deep dive a couple years ago on emoji
| codepoints and how they're encoded in the Apple emoji font file,
| with the end goal of extracting the embedded images --
| https://github.com/alfredxing/emoji
| utopcell wrote:
| There are Unicode characters that reverse the parsing order
| themselves. This has been the basis of a code injection attack,
| analyzed in [1].
|
| [1] ``Trojan Source: Invisible Vulnerabilities'':
| https://trojansource.codes/trojan-source.pdf
| uniqueuid wrote:
| Upper and lower codepoints are really way too obscure and can
| create issues you didn't even know you had.
|
| I once had the very unpleasant experience of debugging a case
| where data saved with R on windows and loaded on macOS ended up
| with individually double-encoded codepoints.
|
| Not fun.
| randpx wrote:
| Try reversing the Canadian flag (CA) and you get the Ascension
| Island Flag (AC). Great article, but completely misses the point.
| Mesopropithecus wrote:
| Unfortunately the HN text input won't let me do this, but a funny
| starter for the article would have been this:
|
| '(Spanish flag)'[::-1]
|
| basically ''.join([chr(127466), chr(127480)]) vs.
| ''.join([chr(127466), chr(127480)])[::-1]
|
| I'll add this to my collection of party tricks and show myself
| out.
|
| Cool article!
| dhosek wrote:
| On the challenge front, there are things like á, which might be
| a single code point or two code points (a + combining ´). Then
| there are the really challenging things like a letter with two
| combining marks, where, if the components are individual
| characters, the order of the marks is not guaranteed to be
| consistent.
| saltminer wrote:
| Then you have stuff like zalgo text (http://eeemo.net/) which
| takes pride in abusing code points
| happytoexplain wrote:
| Which is why these APIs should always make normalization
| available: https://unicode.org/reports/tr15/
| treesknees wrote:
| But you can, and did, reverse a string. It seems you would need
| more details, such as a request to reverse the meaning or
| interpretation of the string, which is what the author is getting
| at.
|
| If someone challenges you to reverse an image, what do you do? Do
| you invert the colors? Mirror horizontally? Mirror vertically?
| Just reverse the byte order?
| wahern wrote:
| There's a specification problem here. I like to say that a
| "string" isn't a data structure, it's the absence of one.
| Discussing "strings" is pointless. It follows that comparing
| programming languages by their "string" handling is likewise
| pointless.
|
| Case in point: a "struct" in languages like C and Rust is
| literally a specification of how to treat segments of a
| "string" of contiguous bytes.
| shadowgovt wrote:
| Even the most basic ASCII string is still a data structure.
|
| Is it a Pascal string (length byte followed by data) or a C
| string (arbitrary run of bytes terminated by a null
| character)?
| wahern wrote:
| You qualified "string" with "ASCII", and also tacitly
| admitted you still need more information than the octets
| themselves--the length.
|
| Of course, various programming languages have primitives
| and concepts which they may label "string". But you still
| need to specify that _context_ , drawing in the additional
| specification those languages provide. Plus, traditionally
| and in practice, such concepts often serve the function of
| importing or exporting unstructured data. So even in the
| context of a specific programming language, the label
| "string" is often used to _elide_ details necessary to
| understanding the content and semantics of some particular
| chunk of data.
| shadowgovt wrote:
| I think I understand the difference; you're using
| "string" the way I would use "blob" or "untyped byte
| array."
|
| Shifting definitions to yours, I agree.
| avianlyric wrote:
| In languages like C, "string" isn't a proper data structure;
| it's a `char` array, which itself is little more than an `int`
| array or `byte` array.
|
| But these languages don't provide true "string" support. They
| just have a vaguely useful type alias that renames a byte
| array to a char array, and a bunch of byte array functions
| that have been renamed to sound like string functions. In
| reality all the language supports are byte arrays, with some
| syntactic sugar so you can pretend they're strings.
|
| Newer languages, like Go and Python 3, that were created in
| the world of Unicode provide true string types, where the
| type primitives properly deal with the idea of variable-length
| characters and provide tools to make it easy to manipulate
| strings and characters as independent concepts. If you want
| to ignore Unicode, because your specific application doesn't
| need to understand it, then you cast your strings into byte
| arrays, and all pretences of true string manipulation vanish
| at the same time.
|
| This is not to say that C can't handle Unicode; it's just that
| the language doesn't provide true primitives to manipulate
| strings and instead relies on libraries to provide that
| functionality, which is a perfectly valid approach. Just as
| baking more complex string primitives into your language
| is also a perfectly valid approach. It's just a question of
| trade-offs and use cases, i.e. the problem at the heart of
| all good engineering.
| samatman wrote:
| We would all be better off if this were actually true.
|
| Tragically, in C, a string is just _barely_ a data structure,
| because it must have \0 at the end.
|
| If it were the complete absence of a data structure, we would
| need some way to get at the length of it, and could treat a
| slice of it as the same sort of thing as the thing itself.
| jameshart wrote:
| Yep, it's as meaningful a programming task as 'reverse this
| double-precision float'.
| egypturnash wrote:
| Galaxy brain image reversal: completely redraw it from scratch,
| with a viewpoint 180 degrees from the original.
| ravi-delia wrote:
| New computer vision challenge
| zwerdlds wrote:
| In normal conditions you can check for a ZWJ, but with regional
| indicator chars, you would have to consider the regional char
| block as a single char in the reversal. Given that it isn't
| necessarily locale-dependent but presentation-layer-dependent,
| there might not be enough info to decide how to act.
| jerf wrote:
| So, in terms of acing interviews, increasingly one of the best
| answers to the question "Write some code that reverses a string"
| is that in a world of unicode, "reversing a string" is no longer
| possible or meaningful.
|
| You'll probably be told "oh, assume US ASCII" or something, but
| in the meantime, if you can back that up when they dig into it,
| you'll look really smart.
| Someone wrote:
| Even ASCII can be argued to be problematic.
|
| What is "3 >= 2", reversed?
|
| What is "Rijksmuseum", reversed?
| (https://en.wikipedia.org/wiki/IJ_(digraph); capitalization
| isn't simple here, either:
| https://en.wikipedia.org/wiki/IJ_(digraph)#Capitalisation)
| greenyoda wrote:
| > "reversing a string" is no longer possible or meaningful.
|
| If you really wanted to, you could write a string reversal
| algorithm that treated two-character emojis as an indivisible
| element of the string and preserved its order (just as you'd
| need to preserve the order of the bytes in a single multi-byte
| UTF-8 character). You'd just need to carefully specify what you
| mean by the terms "string", "character" and "reverse" in a way
| that includes ordered, multi-character sequences like flag
| emojis.
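|
| A minimal sketch of that, handling only the flag case (real
| grapheme segmentation has many more rules):
|
|     RI = range(0x1F1E6, 0x1F200)  # regional indicator symbols
|
|     def reverse_keeping_flags(s):
|         units, i = [], 0
|         while i < len(s):
|             # two adjacent regional indicators = one flag unit
|             step = 2 if (i + 1 < len(s) and ord(s[i]) in RI
|                          and ord(s[i + 1]) in RI) else 1
|             units.append(s[i:i + step])
|             i += step
|         return ''.join(reversed(units))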
| happytoexplain wrote:
| I would argue that it is possible and meaningful. AFAIK
| extended grapheme clusters are well defined by the standard,
| and are very well suited to the default meaning of when
| somebody says "character", so, given no other information, it's
| reasonable to reverse a string based on them. I guess the issue
| is "reverse a string" lacks details, but I think that's
| different from "not meaningful".
| viktorcode wrote:
| You certainly can. `print(String(flag.reversed()))` in Swift
| reverses emojis correctly.
| Spivak wrote:
| Reversing a string is still meaningful. Take a step back
| outside the implementation and imagine handing a Unicode string
| to a human. They could without any knowledge look at the
| characters they see and produce the correct string reversal.
|
| There is a solution to this which is to compute the list of
| grapheme clusters, and reverse that.
|
| https://unicode.org/reports/tr29/
| akersten wrote:
| > imagine handing a Unicode string to a human. They could
| without any knowledge look at the characters they see and
| produce the correct string reversal.
|
| I really highly doubt it.
|
| How do you reverse this?: mrHban , hdhh slsl@.
|
| Can you do it without any knowledge about whether what looks
| like one character is actually a special case joiner between
| two adjacent codepoints that only happens in one direction?
| Can you do it without knowing that this string appears
| wrongly in the HN textbox due to an apparent RTL issue?
|
| It's just not well-defined to reverse a string, and the
| reason we say it's not meaningful is that no User Story ever
| starts "as a visitor to this website I want to be able to see
| this string in opposite order, no not just that all the bytes
| are reversed, but you know what I mean."
| adolph wrote:
| Is a RTL character string already "reversed" from a LTR
| POV?
|
| Is an absolute value signed as positive?
| Spivak wrote:
| I mean no but only because I don't understand the
| characters. Someone who reads Arabic (I assume based on the
| shape) would have no trouble. You're nitpicking cases where,
| for _some readers_, visual characters might be hard to
| distinguish, but it doesn't change the fact that _there
| exists a correct answer_ for every piece of text that will
| be obvious to readers of that text, which is the definition
| of a grapheme cluster.
| akersten wrote:
| > the fact that there exists a correct answer for every
| piece of text that will be obvious to readers of that
| text which is the definition of a grapheme cluster.
|
| No, I insist there is _not_ a single "correct answer,"
| even if a reader has perfect knowledge of the language(s)
| involved. Now remember, this is already moving the
| goalposts, since it was claimed that a human needed "no
| knowledge" to get to this allegedly "correct answer."
|
| You already admit that people who don't speak Arabic will
| have trouble finding the "grapheme clusters," but even
| two people who speak Arabic may do your clustering or
| not, depending on some implicit feeling of "the right way
| to do it" vs taking the question literally and pasting
| the smallest highlight-able selection of the string in
| reverse at a time.
|
| Anyway, take a string like this: "here is some Arabic
| text: <RLM> <Arabic codepoints> <LRM> And back to
| English"
|
| Whether you discard the ordering marks[0], keep them, or
| invert them is an implementation decision that already
| produces three completely different strings. Unless we
| want to write a rulebook for the right way to reverse a
| string, it remains an impossibility to declare anything
| the correct answer, and because there is no _reason_ to
| reverse such a string outside of contrived interview
| questions and ivory tower debates, it is also
| meaningless.
|
| [0]: https://en.m.wikipedia.org/wiki/Right-to-left_mark
| https://en.m.wikipedia.org/wiki/Left-to-right_mark
| Spivak wrote:
| You added the requirement that it be a single correct
| answer. I just asserted that there existed a correct
| answer. You're being woefully pedantic -- a human who can
| read the text presented to them but _no knowledge of
| unicode_ was my intended meaning. Grapheme clusters are
| language dependent and chosen for readers of languages
| that use the characters involved. There's no implicit
| feeling, this is what the standards body has decided is
| the "right way to do it." If you want to use different
| grapheme clusters because you think the Unicode people
| are wrong then fine, use those. You can still reverse the
| string.
|
| Like what are you even arguing? You declared that
| something was impossible and then ended by saying it's
| not only possible but so possible that there are
| many reasonable correct answers. Pick one and call it a
| day.
| akersten wrote:
| > Like what are you even arguing?
|
| It is impossible to "correctly reverse a string" because
| "reverse a string" is not well defined. We explored many
| different potential definitions of it, to show that there
| is no meaningful singular answer.
|
| > You added the requirement that it be a single correct
| answer.
|
| Your original post says "they could produce _the_ correct
| string reversal"?
| happytoexplain wrote:
| >what looks like one character is actually a special case
| joiner between two adjacent codepoints
|
| Are you referring to a grouping not covered by the
| definition of grapheme clusters (which I am only passingly
| familiar with)? If so, then I don't think it's any more
| non-meaningful to reverse it than to reverse an English
| string. The result is gibberish to humans either way - it
| sounds more like you're saying that there is no universally
| "meaningful to humans" way to reverse some text in
| potentially any language, which is true regardless of what
| encoding or written language you're using. I was thinking
| of it more from the programmer side - i.e. that Unicode
| provides ways to reverse strings that are more "meaningful"
| (as opposed to arbitrary) than e.g. just reversing code
| points.
| nonameiguess wrote:
| You can even demonstrate a similar concept with English and
| Latin characters. There is no single thing called a
| "grapheme" linguistically. There are actually two different
| types of graphemes. The character sequence "sh" in English
| is a single referential grapheme but two analogical
| graphemes. Depending on what the specification means,
| "short" could be reversed as either "trosh" or "trohs".
| That's without getting into transliteration. The word for
| Cherokee in the Cherokee language is "Tsalagi" but the "ts"
| is a Latin transliteration of a single Cherokee character.
| Should we count that as one grapheme or two?
|
| Of course, if an interviewer is really asking you how to do
| this, they're probably either 1) working in bioinformatics,
| in which case there are exactly four ASCII characters they
| really care about and the problem is well-defined, or 2)
| it's implementing something like rev | cut -d '-' -f1 | rev
| to get rid of the last field and it doesn't matter how you
| implement "rev" just so long as it works exactly the same
| in reverse and you can always recover the original string.
| Spivak wrote:
| The fact that how to reverse a piece of text is locale-
| dependent doesn't mean it's impossible. Basically any
| transformation on text will be locale-dependent. Hell,
| _length_ is locale-dependent.
| lloeki wrote:
| Should it reverse a BOM as well or keep it first?
| Spivak wrote:
| Keep it first? Like that's not a gotcha. Your input is a
| string and the output is that string visually reversed.
| What it looks like in memory is irrelevant.
| paxys wrote:
| UTF-8 string reversal has been a thing for a long time in
| most/all programming languages. It may not work perfectly in
| 100% of the cases, but that doesn't mean reversing a string is
| no longer possible.
| jerf wrote:
| "It may not work perfectly in 100% of the cases, but that
| doesn't mean reversing a string is no longer possible."
|
| It depends on your point of view. From a strict point of
| view, it _does_ exactly mean it is no longer possible. By
| contrast, we all 100% knew what reversing an ASCII string
| meant, with no ambiguity.
|
| It also depends on the version of Unicode you are using, and
| oh by the way, unicode strings do not come annotated with the
| version they are in. Since it's supposed to be backwards
| compatible hopefully the latest works, but I'd be unsurprised
| if someone can name something whose correct reversal depends
| on the version of Unicode. And, if not now, then in some
| later not-yet-existing pair of Unicode standards.
| pwdisswordfish9 wrote:
| > By contrast, we all 100% knew what reversing an ASCII
| string meant, with no ambiguity.
|
| Not if the ASCII string employed the backspace control
| character to accomplish what is today done with Unicode
| combining characters.
|
| Or, in fact, if it employed any other kind of control
| sequence.
| thaumasiotes wrote:
| I always thought it was interesting that ASCII is
| transparently just a bunch of control codes for a
| typewriter (where "strike an 'a'" is a mechanical
| instruction no different from "reset the carriage
| position"), but when we wanted to represent symbolic data
| we copied it and included all of the nonsensical
| mechanical instructions.
| adzm wrote:
| Well, the control codes were specifically for TTYs rather
| than typewriters; many of the control codes still make
| sense from that standpoint.
| jameshart wrote:
| Like... \r\n
| jcelerier wrote:
| > It may not work perfectly in 100% of the cases, but that
| doesn't mean reversing a string is no longer possible.
|
| I don't understand why in maths finding one single counter-
| example is enough to disprove a theorem, yet in programming
| people seem to be happy with a 99.x% success rate. To me,
| "It may not work perfectly in 100% of the cases" exactly
| means "no longer possible" as "possible" used to imply that
| it would work consistently, 100% of the time.
| tux3 wrote:
| It is very useful in engineering to do things that are
| mathematically impossible, by simply ignoring or rejecting
| the last 1%.
|
| Sometimes that's unacceptable, because you really do care
| about 100% of cases. When it isn't, you get really cool
| "impossible" tools out of it :)
| paxys wrote:
| Because programming is not a science (or at most it is an
| applied science).
|
| By your logic any software that has a single bug would be
| useless, and if that were the case this entire profession
| wouldn't exist.
| jameshart wrote:
| I'd go further and argue that _in general_ reversing a string
| isn't possible or meaningful.
|
| It's just not a thing people do, so it's just... not very
| interesting to argue about what the 'correct' way to do it is.
|
| Similarly, any argument over whether a string has n characters
| or n+1 characters in it is almost entirely meaningless and
| uninteresting for real world string processing problems. Allow
| me to let you into a secret:
|
| _there's never really such a thing as a 'character limit'_
|
| There might be a 'printable character width' limit; or there
| might be a 'number of bytes of storage' limit. Which means
| interesting questions about a string include things like 'how
| wide is it when displayed in this font?' or 'how many bytes
| does it take to store or transmit it?'... But there's rarely
| any point where, for a general string, it is really interesting
| to know 'how many characters does the string contain?'
|
| Processing direct user text input is the only situation where
| you really need a rich notion of 'character', because you need
| to have a clear sense of what will happen if the user moves a
| cursor using a left or right arrow, and for exactly what will
| be deleted when a user hits backspace, or copied/cut and pasted
| when they operate on a selection. The ij ligature might be a
| single glyph, but is it a single character? When does it
| matter? Probably not at all unless you're trying to decide
| whether to let a user put a cursor in the middle of it or not.
|
| And next to that, arguing that there is such a thing as a
| 'correct' way to reverse "Rijndael" according to a strict
| reading of Unicode glyph composability rules seems like a
| supremely silly thing to try to do.
|
| I'd much rather, when asked to reverse a string, more
| developers simply said 'that doesn't make sense, you can't
| arbitrarily chunk up a string and reassemble it in a different
| order and expect any good to come of it'.
| Beldin wrote:
| Interestingly, on my phone the so-called flag is not a flag at
| all, but "US" in outline.
|
| So Python behaves as expected: the 2-character string, when
| reversed, becomes "SU". Similar stuff happens with the other
| "flag" strings.
|
| I'm sure the emojis on my phone are outdated. I'm not sure how
| that affects whether I see a flag or letters.
| pilsetnieks wrote:
| Thankfully, there isn't an assigned ISO 3166-1 2-letter country
| code for SU currently; people may have interesting reactions
| seeing what happens when reversing a US flag emoji if there
| were.
| nextstep wrote:
| Compare all of this nonsense to how it's done in Swift. String
| APIs in Swift are great: intuitive and do what you expect.
| exdsq wrote:
| Am I missing something or is this Day 1 of a programming course
| in C?
| techwiz137 wrote:
| It's pretty funny that reversing the American flag yields Soviet
| Union(SU).
| emodendroket wrote:
| What I'd like to know is, given the explosion of the character
| set for emoji, does the rationale for Han unification still make
| sense? The case for not allowing national variants seems less and
| less compelling with every emoji they add.
|
| This is a bit of a hobby horse, but imagine if every time you
| read an article in English on your phone some of the letters were
| replaced with "equivalent" Greek or Cyrillic ones, and you can
| get an idea of the annoyance. Yeah, you can still read it with a
| bit of thought, but who wants to read that way?
| AlanYx wrote:
| I agree that Han unification was an unfortunate design
| decision, but I'd argue that the consortium is following a
| consistent approach to the Han unification with emoji. For
| example, they treat "regional" vendor variations in emoji as a
| font issue. If you get a message with the gun emoji, unless you
| have out-of-band information regarding which vendor variant is
| intended, there's no way in software to know if it should be
| displayed as a water gun (Apple "regional" variant) or a weapon
| (other vendor variants). Which is not that different from a
| common problem stemming from Han unification.
| emodendroket wrote:
| I don't disagree, but my point is more that their concern was
| about having "too many characters" in Unicode, which no
| longer seems to be a real concern, so what would be the harm
| of adding national variants?
| hougaard wrote:
| In other news, water is wet :)
| michaelsbradley wrote:
| See chapter 7 in _Hacking the Planet (with Notcurses)_ for a
| short treatment of encodings, extended grapheme clusters, etc.
|
| https://nick-black.com/htp-notcurses.pdf#page53
| smegsicle wrote:
| did they think all those skintone emojis are individual
| codepoints?
| advisedwang wrote:
| They might have thought that `reverse()` had some kind of
| unicode-aware handling. I believe `upper()`/`lower()` do.
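|
| They do; e.g. Python's upper() knows about one-to-many case
| mappings:
|
|     'stra\u00dfe'.upper()  # 'STRASSE': the sharp s becomes SS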
| daveslash wrote:
| When I first realized that the skin tone emojis were a code-
| point + a color code-point modifier, I tried to see what other
| colors there were and if I could apply those to _other_ emojis.
| The immature child in me looked to see if there was a red color
| code point and, if so, whether I could use it to make a _"blood
| poop"_ emoji. Turns out... no.
| codingkev wrote:
| Yes, this allows for easy building of flag emojis as long as you
| know the ISO 3166 two-letter country code.
|
| Example: https://github.com/kennell/flagz/blob/master/flagz.py
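|
| The core trick is just an offset into the regional indicator
| block; something like:
|
|     def flag(code):  # two-letter ISO 3166 code, e.g. 'us'
|         return ''.join(chr(0x1F1E6 + ord(c) - ord('A'))
|                        for c in code.upper())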
| sltkr wrote:
| So what was the deal with the Scottish flag?
| gsnedders wrote:
| From Wikipedia:
|
| > A separate mechanism (emoji tag sequences) is used for
| regional flags, such as England, Scotland, Wales, Texas or
| California. It uses U+1F3F4 WAVING BLACK FLAG and formatting
| tag characters instead of regional indicator symbols. It is
| based on ISO 3166-2 regions with hyphen removed and lowercase,
| e.g. GB-ENG -> gbeng, terminating with U+E007F CANCEL TAG. Flag
| of England is therefore represented by a sequence U+1F3F4,
| U+E0067, U+E0062, U+E0065, U+E006E, U+E0067, U+E007F.
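|
| That sequence is straightforward to build in Python (a sketch
| from the description above):
|
|     BLACK_FLAG, CANCEL = '\U0001F3F4', '\U000E007F'
|
|     def subdivision_flag(code):  # e.g. 'GB-SCT'
|         letters = code.replace('-', '').lower()
|         # tag characters mirror ASCII at U+E0000 + ordinal
|         return (BLACK_FLAG
|                 + ''.join(chr(0xE0000 + ord(c)) for c in letters)
|                 + CANCEL)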
| ghostly_s wrote:
| This was the only part that was surprising to me, and as it
| turns out my surprise mostly stems from still not really
| understanding how the United Kingdom works.
| tialaramex wrote:
| Don't worry, "How the United Kingdom works" is a political
| question and so subject to change.
|
| For example, Wales was essentially just straight up
| conquered, and so for long periods Wales did not have any
| distinct legal identity from England. You'll see that today
| there's a bunch of laws which are for _England and Wales_
| but notably not Scotland, including criminal laws. In
| living memory Wales got some measure of independent control
| over its own affairs, via an elected "Assembly" but what
| powers are "devolved" to this assembly are in effect the
| gift of the Parliament, in Westminster, which is sovereign.
| Whether taking away those powers would go well is a good
| question.
|
| On the other hand, Northern Ireland is what's left of
| English/ British dominion over the entire island of
| Ireland, most of which today is the Republic of Ireland, a
| sovereign entity with its own everything. It's only existed
| for about a century, and is a result of the agreed
| "partition" when the Irish rebelled because _most of the
| Irish_ wanted independence but those in the North not so
| much. Feel free to read about euphemistically named
| "Troubles". In the modern era, Northern Ireland, like
| Wales, gets a devolved government in Stormont. Unlike
| Wales, the Northern Ireland government is a total mess, and
| e.g. they have abortion (like the rest of the UK, and like
| the rest of Ireland) only because Stormont was so broken
| that Westminster imposed abortion legalisation on them
| since they weren't actually governing. If you think the US
| Congress is dysfunctional, check out Stormont...
|
| Finally Scotland was for a very long time an independent
| but closely related sovereign nation. It _agreed_ to join
| this United Kingdom about three hundred years ago in the
| Acts of Union after about a century with the same Monarch
| ruling both countries. However, it too got a devolved
| government, a Parliament, probably the most powerful of the
| three, in Holyrood, Edinburgh, in the 20th century, and it
| has a relatively powerful pro-independence politics, the
| Scottish National Party is the dominant power in Scottish
| politics, although how many of its voters _actually_
| support independence per se is tricky to judge.
|
| Brexit changed all this again, because as part of the EU a
| bunch of the powers you could reasonably localise, and so
| were "devolved" to Wales, Scotland and Northern Ireland,
| had been controlled by EU law. So Westminster could _say_
| they were devolved, knowing that the constituent entities
| couldn't actually do much with this supposed power. Having
| left the EU, those powers were among the things Brexiteers
| seemed to have imagined now lay at Westminster, but of
| course the devolved countries said no, these are our
| powers, we get to decide e.g. how agricultural subsidies
| are distributed to suit our farmers.
|
| That's even more fun in Northern Ireland, because they
| share a border with the Republic, an EU member, and so
| they're not allowed to have certain rules that would
| obviously result in a physical border with guards and so
| on. Their Unionists (the people who are why it isn't just
| part of the Republic of Ireland because they want to be in
| the United Kingdom) feel like they were sold out by
| Westminster politicians, while the Republicans (those who'd
| rather be part of the Republic) see this as potentially a
| further argument in favour of that. All of which isn't
| helping at all to keep the peace between these rivals, that
| peace being the whole reason we don't want to put up a
| border...
| dhosek wrote:
| Most flags use the ISO 2-character country code to access their
| values. However, some flags don't map to 2-character country
| codes (Scotland being one example). In this case it uses the
| sequence black flag, GBSCT (for Great Britain-Scotland,
| represented using the tag latin small letter codes for the
| letters) then cancel tag to end the sequence. Changing the
| middle five to be GBENG gives the English flag and GBWLS gives
| the Welsh flag.
| [deleted]
| architectdrone wrote:
| Humorously, on my local machine, I only see the string "us", and
| was rather confused when he was asserting that it was a single
| character :D
___________________________________________________________________
(page generated 2022-01-27 23:00 UTC)