[HN Gopher] The bottom emoji breaks rust-analyzer
       ___________________________________________________________________
        
       The bottom emoji breaks rust-analyzer
        
       Author : todsacerdoti
       Score  : 120 points
       Date   : 2023-02-13 16:22 UTC (6 hours ago)
        
 (HTM) web link (fasterthanli.me)
 (TXT) w3m dump (fasterthanli.me)
        
       | fasterthanlime wrote:
       | This submission went from #5 to #42 in a minute so I'm assuming
       | it's been deranked. An explanation would be nice, in case a
       | moderator is around.
       | 
       | edit: now it's no longer there at all. Welp.
        
         | Macha wrote:
         | Could be automated, there's the flamebait detector that deranks
         | if it gets a high comments/vote ratio.
        
         | steveklabnik wrote:
         | (not a mod just been on this website forever)
         | 
         | At this exact moment, there's 49 points, but 45 comments. If
         | comments >= points, you get a _significant_ ranking penalty
         | (the "flamebait detector", iirc). I can't say for sure that
         | that's happened, but given how close the two numbers are, I bet
         | at some point that was true.
        
           | fasterthanlime wrote:
           | Ahhh, that sounds likely, thanks!
        
       | aendruk wrote:
       | Misleading headline.
       | 
       | > the actual bug: let's add to our code.. an emoji! _Any_ emoji.
       | 
       | > rust-analyzer adheres to the LSP spec. And _lsp-mode_ doesn't.
       | 
       | So "emojis break an Emacs extension".
       | 
       | Though for me the real takeaway is that LSP specifies UTF-16
       | offsets. That sounds unpleasant to work with.
        
         | pja wrote:
         | > Though for me the real takeaway is that LSP specifies UTF-16
         | offsets. That sounds unpleasant to work with.
         | 
         | Speaking as someone who has written an lsp server: yes, it is
         | indeed unpleasant to work with.
        
         | Ygg2 wrote:
         | Thanks. For people looking for the bug, see
         | https://github.com/emacs-lsp/lsp-mode/issues/2080
        
       | lacrosse_tannin wrote:
       | story of my life with emacs.
        
       | bhaney wrote:
       | First time I've heard "pleading face" referred to as "the bottom
       | emoji"
        
         | bigbottomenergy wrote:
         | Hoo _ooo_ boy if that ain't on the nose though.
        
         | bitwize wrote:
         | Hacker culture is queer now, and queer, kinky (but SSC) sex is
         | an integral part of that.
         | 
         | If it helps any, its official name is U+1F97A, FACE WITH
         | PLEADING EYES.
        
         | dragonwriter wrote:
         | It's a very weird choice of description _in this context_,
         | since it is neither about the domain in which that meaning
         | applies, nor, even, anything really specific to _that_ emoji
         | in particular. It's exactly like unnecessarily dropping U+1F346
         | AUBERGINE into a conversation that applies to any emoji... and
         | calling it "the male genitalia emoji".
        
           | l3mure wrote:
           | Never before has a programmer joked about getting screwed by
           | a bug, the audacity!
        
         | OJFord wrote:
         | I chuckled at the headline, assumed it would be 'peach', now
         | I'm just confused? The article doesn't seem to address it,
         | except to link https://unicode-table.com/en/1F97A/ as 'U+1F97A
         | Bottom Face Emoji' (which that page doesn't say at all), so it
         | seems very deliberate; must be some joke we're not in on (on
         | the 'right' subreddits or whatever), I suppose.
        
           | TaylorAlexander wrote:
           | In communities that enjoy sexual power play you have a
           | top/dominant and a bottom/submissive.
           | 
           | This is particularly prevalent amongst trans people on
           | Twitter (where I see it used a lot): you see a lot of people
           | using this emoji under, for example, a powerful-looking selfie
           | of someone who appears dominant, as a playful offer of
           | submission.
        
             | tomxor wrote:
             | That's kind of weird to label it based on sexual roles by
             | default. How would they go about explaining that label to a
             | child? How about puss in boots face?
        
               | OJFord wrote:
               | No no, that's why some people here are confused and out
               | of the loop - the actual unicode name is _not_ 'bottom
               | face', but 'pleading face' or 'face with pleading eyes'
               | (I'm not sure of a good reference for 'actual' names, got
               | those from two different unicode dictionary type sites).
        
               | tomxor wrote:
               | I'm referring to the author's naming not the unicode
               | name. Feels like they have a gender agenda to push (sorry
               | :D).
        
               | OJFord wrote:
               | Oh I see. Yes, it does seem a bit odd, but it's their blog
               | to do what they want with, I suppose - they don't need to
               | explain it to a child. Also, it's possibly just so familiar
               | to them that they didn't think about it and don't know it
               | as anything else.
        
               | TaylorAlexander wrote:
               | Top/bottom don't even have anything to do with gender,
               | and it wouldn't matter if they did.
        
               | tomxor wrote:
               | It's a fucking pleading face and they made it about
               | sexual roles for no good reason... It's stupid.
        
               | TaylorAlexander wrote:
               | Every meme is stupid. That's the point.
               | 
               | Also, pleading is an actual part of this very common
               | sexual dynamic. If you're just made uncomfortable by sex
               | that's fine but you don't have to turn that into
               | judgment of people who aren't.
        
               | tomxor wrote:
               | > If you're just made uncomfortable by sex that's fine
               | but you don't have to turn that into judgment of people
               | who aren't.
               | 
               | You accuse me of making assumptions and then make your
               | own incorrect ones.
               | 
               | I find it annoying when people weave their personal non-
               | technical agendas into technical discussions. I can
               | anticipate your response now... there is no agenda. But
               | go read the article and you will notice it is not even
               | specific to this one emoji.
               | 
               | [edit]
               | 
               | Keep up the good work mr/ms downvoter, I've got a lot of
               | points to burn and no shop to spend them in. Good to know
               | I'm pissing off one person who is chronically offended by
               | my skepticism.
        
               | TaylorAlexander wrote:
               | > Keep up the good work mr/ms downvoter
               | 
               | It's they/them downvoter actually.
               | 
               | But seriously, as far as I can tell it's not possible to
               | downvote people who have replied directly to me. It's
               | just other people downvoting you because what you're
               | saying is unpopular.
               | 
               | You're certainly not pissing me off. That's why I'm
               | trying to calmly explain your errors. But it sounds like
               | you've made up your mind and take disagreement as some
               | kind of encouragement.
        
               | tomxor wrote:
               | Yes, I know it's not you.
               | 
               | > But it sounds like you've made up your mind and take
               | disagreement as some kind of encouragement.
               | 
               | I do find disagreement of silent onlookers with nothing
               | to contribute irritating yes. Life is too short to fear
               | controversial or unpopular ideas so I take it as
               | encouragement. I'm happy to change my opinion in the face
               | of compelling arguments, not thoughtless people.
        
               | sophacles wrote:
               | > fucking pleading face...about sexual roles
               | 
               | Exactly.
               | 
               | (also, you are so worried about the kids, why are you
               | making sex about anger in your anger over sex?)
        
               | tomxor wrote:
               | I'm not worried about kids, and I don't care about sex,
               | I'm making a point that this has been arbitrarily
               | sexualised for no reason. Obviously you wouldn't give it
               | that label by default, so what is the author trying to do
               | here? This is supposed to be a technical problem.
               | 
               | I also find it funny that pointing out something is a
               | weird and distracting choice is considered angry, and I
               | can only assume by the onslaught of downvotes,
               | uninclusive or something... or maybe my "what if" has
               | been interpreted as a "think of the children", hard to
               | know when people don't bother forming an argument.
        
               | bigbottomenergy wrote:
               | > I'm making a point that this has been arbitrarily
               | sexualised for no reason.
               | 
               | Wow, you must spend literally every waking second of
               | every single day enraged.
        
               | tomxor wrote:
               | whoosh
        
               | sophacles wrote:
               | The pattern: It's a fucking $X and they $Y
               | 
               | Idiomatically signifies exasperated frustration/anger in
               | many contexts.
               | 
               | My entire comment was just a play on words, pointing out
               | that the word "fucking" means "having sex" (i know it's
               | one of those words with a ton of shades and nuance so
               | there are lots of uses for it but for the play on words,
               | let's be literal - also easier when the context is sexual
               | to begin with). When read in that light - "a having sex
               | pleading face and they made it about sexual roles" is
               | amusing, no?
        
               | tomxor wrote:
               | Fair enough lol, lost in the medium of text I think.
               | "Fuck" is funnily enough one of if not the most adaptive
               | English words.
        
               | dllthomas wrote:
               | If it were a "fucking pleading" face, it would pretty
               | explicitly be about sexual roles.
        
               | bigbottomenergy wrote:
               | Imagine my surprise in Latin American places where
               | Activo/Passivo is exclusively used to describe
               | top/bottom. I kinda have to assume that those words mean
               | more than their cognates in context, or else I kinda
               | feel bad for gay Latin communities being shoe-horned into
               | archaic roles.
        
               | yusefnapora wrote:
               | Perhaps the author assumes that a child with a deep
               | interest in rust Unicode edge cases has likely been on
               | the internet before, and may well have been exposed to
               | the existence of sex?
        
               | TaylorAlexander wrote:
               | The emoji has another name, "pleading face". This is
               | basically how it came to be used by bottoms. Tops don't
               | plead, but bottoms do.
               | 
               | There's lots of emojis that the culture has given
               | alternate meanings to, for example the peach and the
               | eggplant.
               | 
               | You could ask "how would I explain this to a child" to
               | anything adults talk about that is sexual in nature.
               | Usually the answer is "don't."
               | 
               | I don't think TFA was written for children, and the author
               | wanted to include this common internet meme in their
               | article title.
        
             | verpen wrote:
             | [flagged]
        
               | TaylorAlexander wrote:
               | The term "perverts" implies this behavior is abnormal but
               | it's very common. But in the USA for example the dominant
               | culture pretends like this doesn't happen and encourages
               | repression of these desires. I'd rather adults be free to
               | express themselves in healthy ways than judge and label
               | them as "perverts" despite their feelings being common
               | and normal.
        
               | peyton wrote:
               | I don't mind, but it's definitely okay for people to be
               | offended by the use of (what seems to be?) BDSM fetish
               | community terminology. I don't think this is a common
               | term. If I referred to the cat face emoji as the pussy
               | emoji or the cunt emoji in a Rust programming language
               | article, I'd expect some pushback.
        
               | hexmiles wrote:
               | I think it also depends on the social group/context. In
               | some places the word bottom has kinda lost some of its
               | "sexual" connotation and become just a "fun" word for
               | submissive/unassertive.
               | 
               | (To be clear, I also think it's okay not to like this kind
               | of language.)
        
               | verpen wrote:
               | Well, I'd rather that creepy males didn't parade their
               | sexual fetishes everywhere and pretend that it's totally
               | normal and fine to do so.
        
               | sc11 wrote:
               | It's used by lesbians just as often and giving vs
               | receiving has nothing to do with fetishes. You can have
               | the most vanilla sex imaginable and still have a
               | top/bottom.
        
               | mplewis wrote:
               | You really created a new account just to post a rude
               | comment like this?
        
           | tom_ wrote:
           | Presumably this:
           | https://en.m.wikipedia.org/wiki/Gay_sex_roles
        
         | thomasahle wrote:
         | Was I the only one looking for ⊥?
        
           | DoctorNick wrote:
           | Yes.
        
           | dllthomas wrote:
           | You were not.
        
         | DoctorOW wrote:
         | It's common in gay circles
        
           | enriquto wrote:
           | Interesting. Is there a corresponding "top" emoji?
        
             | HL33tibCe7 wrote:
             | Appropriately, it doesn't exist
        
               | drewbug01 wrote:
               | I appreciate the subtle humor here, but I suspect most of
               | the HN crowd won't quite get it.
        
             | thomasahle wrote:
             | Unicode has these four:
             | https://en.wikipedia.org/wiki/List_of_logic_symbols#:~:text=...
        
             | DoctorOW wrote:
             | Tops are considered less likely to use emoji from my
             | understanding. They're supposed to be more stoic.
        
         | chungy wrote:
         | There was a discussion on HN recently about it:
         | https://news.ycombinator.com/item?id=34454165
        
         | avgcorrection wrote:
         | First I rolled my eyes when some HN users had to explain to
         | other HN users what that "puppy dog eyes" thing meant. But I
         | felt a bit worse when I first heard _that_ name for it, being
         | used as if it was canonical.
        
         | fasterthanlime wrote:
         | Take it as a sign that you need to hang out with more diverse
         | crowds!
        
           | favaq wrote:
           | If anything I've taken it as a sign that I made the right
           | choice not hanging out with diverse crowds.
        
           | decremental wrote:
           | Or people just need to stop being gross.
        
           | bhaney wrote:
           | I'm not really sure why I should take it as a sign of that.
           | What crowd does this indicate I don't hang out with?
        
             | Ygg2 wrote:
             | LGBT+ BDSM enthusiasts.
        
             | tullo_x86 wrote:
             | Queer people. This is an easily-inferred reference for
             | anyone who's decently-good friends with at least one gay
             | guy or lesbian.
        
               | bhaney wrote:
               | I have a handful of very close friends who are gay, but
               | I've never heard of this bottom-face emoji thing before
               | today.
               | 
               | And I'm a little concerned why not knowing about
               | something that seems like an esoteric piece of fetish-
               | related in-group communication implies that I don't hang
               | out with queer people? Is there an expectation that in
               | order to hang out with people belonging to some group, I
               | need to learn specific lingo regarding that group's
               | sexual power dynamics?
        
               | justincredible wrote:
               | Yes, otherwise you're a bigot trying to erase queer
               | people. /s
        
               | mplewis wrote:
               | You could simply get to know people better and you'd end
               | up learning more about their culture and language.
        
       | pnathan wrote:
       | So this all justifies my general stance of "find a Unicode
       | expert" when questions of unicode come up. But that's pretty
       | wilfully blind (although practical).
       | 
       | I would like to understand why UTF-32 didn't catch on as The
       | Standard Unicode for the modern world. It seems that - albeit
       | memory wasteful - it would sidestep a lot of these issues.
        
         | jcranmer wrote:
         | One of the root problems here is that a concept like
         | "character" is extremely underdefined. What you want differs
         | between counting the memory a string will take up, indicating
         | a particular spot in the string, mapping to arrow keys or the
         | backspace key in a visual program, knowing how far to indent to
         | drop a caret in a visual error report, and knowing the width of
         | the string in a visual (especially fixed-width) text display.
         | For ASCII text, you can use the _same_ value to represent all
         | of these slightly different definitions, but in non-ASCII, you
         | need to use _different_ definitions, and there's no one true
         | definition that solves all use cases (no, not even grapheme
         | clusters).
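
As an illustration of the point above, here is a minimal Python sketch that measures the same string three ways. The helper name `measures` is hypothetical, and grapheme clusters are left out because the Python standard library has no segmentation API for them.

```python
def measures(s: str) -> dict:
    """Return three different 'lengths' of the same string."""
    return {
        # Python's str length counts Unicode code points.
        "code_points": len(s),
        # Storage size when encoded as UTF-8.
        "utf8_bytes": len(s.encode("utf-8")),
        # Number of 16-bit code units when encoded as UTF-16.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


ascii_text = "abc"
pleading = "\U0001F97A"  # U+1F97A, the emoji discussed in the thread
accented = "e\u0301"     # 'e' followed by U+0301 COMBINING ACUTE ACCENT

for sample in (ascii_text, pleading, accented):
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(sample), measures(sample))
```

For the ASCII sample all three numbers agree; for the other two samples they do not, which is why a single "length" stops being enough.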
        
         | harikb wrote:
         | UTF-8 has no upper limit on the number of possible characters /
         | emoji, now or in the future.
         | 
         | Everything else has.
         | 
         | And then there is UTF-16, which has all the pains of UTF-8 with
         | none of the advantages of UTF-32.
        
           | doubleunplussed wrote:
           | That's not quite right. UTF-8 is not arbitrary length.
           | 
           | Officially, it's at most four bytes, of which 21 bits are
           | usable for encoding codepoints - so that's an upper limit of
           | 2^21 codepoints.
           | 
           | There is an initial byte encoding the length as a series of
           | ones, so if you went ahead and extended the standard to
           | simply allow more bytes, you could get up to 8 bytes, of
           | which 48 bits would be usable.
           | 
           | I can see that a six-byte version with 31 data bits was
           | previously standardised before they settled on four.
           | 
           | I guess you could extend it further by allowing more than one
           | initial byte encoding the length, then it would be arbitrary
           | length. But at that point I'm not sure if it loses its self-
           | synchronising ability, and in any case it would be a
           | different standard at that point.
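
The 21-bit and 31-bit figures above can be checked with a little arithmetic. This is a rough sketch covering only sequence lengths 1 to 6; the function name `utf8_payload_bits` is made up for illustration, and it assumes the usual layout where a multi-byte lead byte spends one bit per byte of length plus a terminating zero, and every continuation byte carries 6 data bits.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Data bits available in a UTF-8-style sequence of n_bytes (1..6)."""
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    if not 2 <= n_bytes <= 6:
        raise ValueError("this sketch only covers lengths 1 to 6")
    # Lead byte: n_bytes ones, a zero, then the remaining data bits.
    lead_bits = 8 - (n_bytes + 1)
    # Each continuation byte is 10xxxxxx, so 6 data bits apiece.
    return lead_bits + 6 * (n_bytes - 1)


for n in range(1, 7):
    print(n, "bytes ->", utf8_payload_bits(n), "data bits")

# Unicode's highest code point needs fewer values than 21 bits can express.
print(0x10FFFF < 2 ** utf8_payload_bits(4))
```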
        
         | layer8 wrote:
         | UTF-32 doesn't completely solve this either, because grapheme
         | clusters.
        
           | pnathan wrote:
           | oh, _bother_. :-/
        
           | bsder wrote:
           | And endianness.
        
             | layer8 wrote:
             | Since Unicode only uses 21 bits, it would be possible to
             | define an endianness-safe 32-bit encoding. E.g. shift left
             | by one and set the LSB.
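
One possible reading of this idea, sketched in Python: after shifting left by one and setting the low bit, every value fits in 22 bits, so the least significant byte is always odd and the most significant byte is always zero. A decoder could then infer the byte order from the first byte alone. The functions `encode_unit` and `decode_unit` are hypothetical names, and this is an interpretation rather than a defined encoding.

```python
def encode_unit(cp: int, byteorder: str) -> bytes:
    """Encode one code point as 4 bytes in the given byte order."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Shift left by one and set the LSB, as described above.
    return ((cp << 1) | 1).to_bytes(4, byteorder)


def decode_unit(unit: bytes) -> int:
    """Decode 4 bytes without being told the byte order."""
    if len(unit) != 4:
        raise ValueError("expected exactly 4 bytes")
    # An odd first byte must be the least significant byte (little-endian);
    # otherwise the first byte is the always-zero most significant byte.
    byteorder = "little" if unit[0] & 1 else "big"
    return int.from_bytes(unit, byteorder) >> 1


for order in ("big", "little"):
    raw = encode_unit(0x1F97A, order)
    print(order, raw.hex(), hex(decode_unit(raw)))
```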
        
         | jffry wrote:
         | > why UTF-32 didn't catch on
         | 
         | > memory wasteful
         | 
         | The answer is in the question really. If you've got a big pile
         | of mostly-ascii data, quadrupling memory/storage to encode it
         | as UTF32 is going to be a pretty tough sell
        
           | ok123456 wrote:
           | UTF-32 isn't even guaranteed to be a single code point.
        
             | deathanatos wrote:
             | No, UTF-32 code units are Unicode scalar values[1], always.
             | 
             | They are not grapheme clusters, such as the "family: man,
             | woman, boy" emoji from TFA.
             | 
             | [1] which is approximately what I think you're saying here.
             | I.e., you're trying to say that a code point might span
             | multiple UTF-32 code units; that is not correct. (It should
             | be simple to see how a code point, which has the range [0,
             | 0x10FFFF], can always fit into a u32.)
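
A small Python sketch of the distinction drawn above. It assumes the family emoji is the sequence man, ZERO WIDTH JOINER, woman, ZERO WIDTH JOINER, boy; `utf32_units` is a hypothetical helper name.

```python
# One visible glyph, built from five code points joined by U+200D.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F466"


def utf32_units(s: str) -> int:
    """Count 32-bit code units in the UTF-32 encoding of s."""
    # The -le variant is used so no byte order mark is prepended.
    return len(s.encode("utf-32-le")) // 4


# Every code point occupies exactly one UTF-32 code unit...
print(len(FAMILY), utf32_units(FAMILY))

# ...because the largest code point fits comfortably in 32 bits.
print(0x10FFFF < 2 ** 32)
```

So UTF-32 gives a fixed width per code point, but the single emoji a reader sees still spans several units.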
        
           | Shish2k wrote:
           | How many "big piles of mostly-ascii data" _are_ there though?
           | (Does anyone want to write a script which searches /dev/mem
           | and categorises pages of RAM into ascii-or-not-ascii so we
           | can get some meaningful numbers? :P )
           | 
           | (If you're doing number-crunching on giant CSVs, maybe I can
           | see it being important, but all the ascii files on my desktop
           | that I can think of are pretty trivial)
        
             | layer8 wrote:
             | If we designed computing technology from scratch today, we
             | might be using 32-bit bytes, or maybe even 64-bit ones. If
             | memory usage is not a concern, there's really no need to
             | have smaller units.
             | 
             | Our world however runs on 8-bit bytes, so it makes some
             | sense for text to be based on that.
             | 
             | But also, consider Base64 in UTF-32-encoded JSON. ;)
        
             | jcranmer wrote:
             | > How many "big piles of mostly-ascii data" are there
             | though?
             | 
             | Well, the use case mentioned in the article is a pretty
             | good one: program source code. Even if you're going to be
             | writing in a foreign language, all of the fancy punctuation
             | and whitespace that does useful stuff in the language ends
             | up being ASCII, and a good hunk of the standard library is
             | likely to have ASCII names for types and functions, etc.
        
             | RobotToaster wrote:
             | >How many "big piles of mostly-ascii data" are there
             | though?
             | 
             | You just posted this to one.
        
             | steveklabnik wrote:
             | Pretty much every website, due to HTML being ASCII.
        
               | ravi-delia wrote:
               | I feel like compression will do a better job than
               | deciding on an encoding scheme ahead of time, no? Once
               | gzipped I wouldn't expect a difference
        
               | steveklabnik wrote:
               | I thought I'd read something about it, but when I
               | googled, what I did find was this old HN comment:
               | https://news.ycombinator.com/item?id=8514519
               | 
               | > UTF-8 + gzip is 32% smaller than UTF-32 + gzip using
               | the HN frontpage as corpus.
        
               | JoshTriplett wrote:
               | You still have to decompress it on the other end, to
               | actually parse and use it. At which point you have four
               | times the memory usage, unless you turn it into some
               | smaller in-memory encoding...such as UTF-8.
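
The memory argument can be tried out with a short Python sketch. The sample markup is invented, and `zlib` (DEFLATE) stands in for gzip here; actual compression ratios will depend on the input, so none are claimed.

```python
import zlib

# Stand-in for a mostly-ASCII document such as HTML.
text = "<p>Hello, world!</p>\n" * 1000

utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")  # -le variant: no byte order mark

# Both encodings shrink on the wire once compressed.
wire_utf8 = zlib.compress(utf8)
wire_utf32 = zlib.compress(utf32)

# After decompression the receiver holds the full-size buffer again.
restored = zlib.decompress(wire_utf32)

print("in memory:", len(utf8), "vs", len(restored), "bytes")
print("on the wire:", len(wire_utf8), "vs", len(wire_utf32), "bytes")
```

For pure ASCII input the decompressed UTF-32 buffer is exactly four times the size of the UTF-8 one, regardless of how well it compressed in transit.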
        
       | bitwize wrote:
       | Next time just use Visual Studio Code for Rust development.
        
       | recursive wrote:
       | Small mistake.
       | 
       | > High surrogates are D800-DB7F
       | 
       | Akshually, high surrogates extend all the way to DBFF.
       | 
       | https://unicode-table.com/en/blocks/high-surrogates/
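
The range can be confirmed from the UTF-16 arithmetic itself. A minimal Python sketch, with `to_surrogates` as a hypothetical helper name:

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


for cp in (0x10000, 0x1F97A, 0x10FFFF):
    high, low = to_surrogates(cp)
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")
```

The highest code point, U+10FFFF, yields a high surrogate of DBFF, so the high-surrogate range does run from D800 through DBFF.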
        
       | teddyh wrote:
       | This is a bug in lsp-mode, a third-party Emacs package which is
       | an LSP client. Since Emacs version 29, Emacs will include in its
       | standard distribution a _different_ LSP client package, _eglot_,
       | which does not seem to have this bug:
       | 
       | https://github.com/joaotavora/eglot/blob/e501275e06952889056...
        
       | zamalek wrote:
       | To be fair, other operating systems took a while to get unicode
       | right.
        
       | secondcoming wrote:
       | I gave up trying to read this. Where was the bug?
        
         | yoru-sulfur wrote:
         | lsp-mode: https://github.com/emacs-lsp/lsp-mode/issues/2080
        
         | deathanatos wrote:
         | The LSP protocol sends indexes. Insanely, those indexes are in
         | terms of UTF-16 code units. Emacs's LSP client implementation
         | here is sending the wrong index: 8 for the emoji's index, but 9
         | for the index of the next "character". But an emoji spans two
         | UTF-16 code units, so the next index is 10.
         | 
         | Rust-analyzer simply crashes here, but it's been fed hot
         | garbage by the editor. One might argue it shouldn't crash. TFA
         | digs into the details around that, too, because Amos leaves no
         | stone unturned.
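
The 8, 9, 10 arithmetic above can be reproduced with a short Python sketch. The sample line and the helper name `utf16_offset` are invented for illustration; the only assumption is that the client tracks positions as code point indexes and must convert them to UTF-16 code units before sending.

```python
def utf16_offset(line: str, cp_index: int) -> int:
    """Convert a code point index into a UTF-16 code unit offset."""
    # Encode everything before the index and count 16-bit units.
    return len(line[:cp_index].encode("utf-16-le")) // 2


# Eight ASCII characters, then U+1F97A, then more text.
line = "// hi!! \U0001F97A ok"

emoji_index = line.index("\U0001F97A")
print(emoji_index, utf16_offset(line, emoji_index))          # emoji start
print(emoji_index + 1, utf16_offset(line, emoji_index + 1))  # next character
```

The emoji starts at offset 8 either way, but the character after it sits at code point index 9 and UTF-16 offset 10, which is the mismatch described above.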
        
           | teddyh wrote:
           | Nitpick: lsp-mode is _not_ "Emacs's LSP client". Emacs
           | recently chose to include "eglot-mode" as part of Emacs
           | itself, and _eglot-mode_ must therefore be considered to be
           | Emacs' official LSP client, not "lsp-mode".
        
           | zerocrates wrote:
           | I imagine this "UTF-16 code unit indexing" decision is just
           | an artifact of the fact that that's how JavaScript works with
           | strings, and LSP comes from VSCode.
        
       | kerkeslager wrote:
       | That's an awful lot of time spent talking about your Emacs
       | config...
        
         | mi_lk wrote:
         | +1 appreciate the effort, but just get to the point thanks...
        
           | dmm wrote:
           | I liked the emacs config discussion.
        
         | kelnos wrote:
         | Agreed, but I personally enjoyed the detour into "wow, our
         | tools have terrible UX" territory. Totally get that some people
         | would find it superfluous, but I thought it was fun.
         | 
         | I'm a vim user, and find the landscape to be pretty bad there
         | too; it's nice to see that emacs is no better (and IMO worse,
         | based at least on this one example).
        
         | Ygg2 wrote:
         | It's fasterthanlime. It always takes ages to get to the point.
        
           | deathanatos wrote:
           | I love the author's style. Not every blog need do it, but I
           | appreciate that fasterthanli.me does it.
           | 
           | To me, it is intellectually honest: this shows a reader every
           | step along the way, every painful trail that must be overcome
           | from point A to point B. Nothing is omitted. And I think the
           | sooner we all did this, as an industry, the sooner the very
           | _many_ problems and bugs that exist (that get hit before we
           | can even "get to the point", as you say) would get dragged
           | into the light, and maybe we'd progress, as a society,
           | towards having computers that weren't shit.
        
             | Ygg2 wrote:
             | > Nothing is omitted
             | 
             | To quote an internet reviewer: Brevity is the soul of wit.
             | That means stop wasting my time. Keep it nice and simple.
             | 
             | Look. Like what you want; I'm free to prefer a shorter
             | form, and to point out this is part of the author's style.
        
       ___________________________________________________________________
       (page generated 2023-02-13 23:01 UTC)