[HN Gopher] The bottom emoji breaks rust-analyzer
___________________________________________________________________
The bottom emoji breaks rust-analyzer
Author : todsacerdoti
Score : 120 points
Date : 2023-02-13 16:22 UTC (6 hours ago)
(HTM) web link (fasterthanli.me)
(TXT) w3m dump (fasterthanli.me)
| fasterthanlime wrote:
| This submission went from #5 to #42 in a minute so I'm assuming
| it's been deranked. An explanation would be nice, in case a
| moderator is around.
|
| edit: now it's no longer there at all. Welp.
| [deleted]
| Macha wrote:
| Could be automated, there's the flamebait detector that deranks
| if it gets a high comments/vote ratio.
| steveklabnik wrote:
| (not a mod just been on this website forever)
|
| At this exact moment, there's 49 points, but 45 comments. If
| comments >= points, you get a _significant_ ranking penalty
| (the "flamebait detector", iirc). I can't say for sure that
| that's happened, but given how close the two numbers are, I bet
| at some point that was true.
| fasterthanlime wrote:
| Ahhh, that sounds likely, thanks!
| aendruk wrote:
| Misleading headline.
|
| > the actual bug: let's add to our code.. an emoji! _Any_ emoji.
|
| > rust-analyzer adheres to the LSP spec. And _lsp-mode_ doesn't.
|
| So "emojis break an Emacs extension".
|
| Though for me the real takeaway is that LSP specifies UTF-16
| offsets. That sounds unpleasant to work with.
| [deleted]
| pja wrote:
| > Though for me the real takeaway is that LSP specifies UTF-16
| offsets. That sounds unpleasant to work with.
|
| Speaking as someone who has written an lsp server: yes, it is
| indeed unpleasant to work with.
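To give a feel for why UTF-16 offsets are awkward on the server side: a server that keeps its text as UTF-8 (as Rust strings are) has to translate every position it receives or sends. Below is a minimal Python sketch of that translation for a single line. The function names and the sample line are invented for illustration and do not come from any LSP library.

```python
def utf8_to_utf16_offset(line: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset within `line` to a UTF-16 code unit offset."""
    # Decoding raises UnicodeDecodeError if byte_offset falls inside a character.
    prefix = line.encode("utf-8")[:byte_offset].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


def utf16_to_utf8_offset(line: str, utf16_offset: int) -> int:
    """Convert a UTF-16 code unit offset within `line` to a UTF-8 byte offset."""
    # Decoding raises UnicodeDecodeError if the offset splits a surrogate pair.
    prefix = line.encode("utf-16-le")[: utf16_offset * 2].decode("utf-16-le")
    return len(prefix.encode("utf-8"))


line = 'let s = "\U0001F97A";'  # hypothetical source line containing U+1F97A
after_emoji_utf8 = 13  # 9 ASCII bytes, then 4 bytes for the emoji

print(utf8_to_utf16_offset(line, after_emoji_utf8))  # 9 ASCII units + 2 for the emoji
print(utf16_to_utf8_offset(line, 11))
```

Both directions re-encode the prefix, which is simple but linear in the line length; a real server would probably cache a per-line index instead.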
| Ygg2 wrote:
| Thanks. For people looking for the bug, see
| https://github.com/emacs-lsp/lsp-mode/issues/2080
| lacrosse_tannin wrote:
| story of my life with emacs.
| bhaney wrote:
| First time I've heard "pleading face" referred to as "the bottom
| emoji"
| bigbottomenergy wrote:
| Hoo _ooo_ boy if that ain't on the nose though.
| bitwize wrote:
| Hacker culture is queer now, and queer, kinky (but SSC) sex is
| an integral part of that.
|
| If it helps any, its official name is U+1F97A, FACE WITH
| PLEADING EYES.
| dragonwriter wrote:
| It's a very weird choice of description _in this context_,
| since it is neither about the domain in which that
| meaning applies, nor, even, anything really specifically about
| _that_ emoji in particular. It's exactly like unnecessarily
| dropping U+1F346 AUBERGINE into a conversation that applies to
| any emoji...and calling it "the male genitalia emoji".
| l3mure wrote:
| Never before has a programmer joked about getting screwed by
| a bug, the audacity!
| [deleted]
| OJFord wrote:
| I chuckled at the headline, assumed it would be 'peach', now
| I'm just confused? The article doesn't seem to address it,
| except to link https://unicode-table.com/en/1F97A/ as 'U+1F97A
| Bottom Face Emoji' (which that page doesn't say at all), so
| seems very deliberate, must be some joke we're not in on (not on
| the 'right' subreddits for it or whatever) I suppose.
| TaylorAlexander wrote:
| In communities that enjoy sexual power play you have a
| top/dominant and a bottom/submissive.
|
| This is particularly prevalent amongst trans people on
| Twitter (where I see it used a lot): you see a lot of people
| using this emoji under, for example, a powerful-looking selfie
| of someone who appears dominant, as a playful offer of
| submission.
| tomxor wrote:
| That's kind of weird to label it based on sexual roles by
| default. How would they go about explaining that label to a
| child? How about puss in boots face?
| [deleted]
| OJFord wrote:
| No no, that's why some people here are confused and out
| of the loop - the actual unicode name is _not_ 'bottom
| face', but 'pleading face' or 'face with pleading eyes'
| (I'm not sure of a good reference for 'actual' names, got
| those from two different unicode dictionary type sites).
| tomxor wrote:
| I'm referring to the author's naming not the unicode
| name. Feels like they have a gender agenda to push (sorry
| :D).
| OJFord wrote:
| Oh I see. Yes, it does seem a bit odd, but it's their blog to do
| what they want with I suppose - they don't need to
| explain it to a child. Also it's possibly just so
| familiar to them that they didn't think about it and
| don't know it as anything else.
| TaylorAlexander wrote:
| Top/bottom don't even have anything to do with gender,
| and it wouldn't matter if they did.
| tomxor wrote:
| It's a fucking pleading face and they made it about
| sexual roles for no good reason... It's stupid.
| TaylorAlexander wrote:
| Every meme is stupid. That's the point.
|
| Also, pleading is an actual part of this very common
| sexual dynamic. If you're just made uncomfortable by sex
| that's fine but you don't have to turn that into
| judgment of people who aren't.
| tomxor wrote:
| > If you're just made uncomfortable by sex that's fine
| but you don't have to turn that into judgment of people
| who aren't.
|
| You accuse me of making assumptions and then make your
| own incorrect ones.
|
| I find it annoying when people weave their personal non-
| technical agendas into technical discussions. I can
| anticipate your response now... there is no agenda. But
| go read the article and you will notice it is not even
| specific to this one emoji.
|
| [edit]
|
| Keep up the good work mr/ms downvoter, I've got a lot of
| points to burn and no shop to spend them in. Good to know
| I'm pissing off one person who is chronically offended by
| my skepticism.
| TaylorAlexander wrote:
| > Keep up the good work mr/ms downvoter
|
| It's they/them downvoter actually.
|
| But seriously, as far as I can tell it's not possible to
| downvote people who have replied directly to me. It's
| just other people downvoting you because what you're
| saying is unpopular.
|
| You're certainly not pissing me off. That's why I'm
| trying to calmly explain your errors. But it sounds like
| you've made up your mind and take disagreement as some
| kind of encouragement.
| tomxor wrote:
| Yes, I know it's not you.
|
| > But it sounds like you've made up your mind and take
| disagreement as some kind of encouragement.
|
| I do find disagreement of silent onlookers with nothing
| to contribute irritating yes. Life is too short to fear
| controversial or unpopular ideas so I take it as
| encouragement. I'm happy to change my opinion in the face
| of compelling arguments, not thoughtless people.
| sophacles wrote:
| > fucking pleading face...about sexual roles
|
| Exactly.
|
| (also, you are so worried about the kids, why are you
| making sex about anger in your anger over sex?)
| tomxor wrote:
| I'm not worried about kids, and I don't care about sex,
| I'm making a point that this has been arbitrarily
| sexualised for no reason. Obviously you wouldn't give it
| that label by default, so what is the author trying to do
| here? This is supposed to be a technical problem.
|
| I also find it funny that pointing out something is a
| weird and distracting choice is considered angry, and I
| can only assume by the onslaught of downvotes,
| uninclusive or something... or maybe my "what if" has
| been interpreted as a "think of the children", hard to
| know when people don't bother forming an argument.
| bigbottomenergy wrote:
| > I'm making a point that this has been arbitrarily
| sexualised for no reason.
|
| Wow, you must spend literally every waking second of
| every single day enraged.
| tomxor wrote:
| whoosh
| sophacles wrote:
| The pattern: It's a fucking $X and they $Y
|
| Idiomatically signifies exasperated frustration/anger in
| many contexts.
|
| My entire comment was just a play on words, pointing out
| that the word "fucking" means "having sex" (i know it's
| one of those words with a ton of shades and nuance so
| there are lots of uses for it but for the play on words,
| let's be literal - also easier when the context is sexual
| to begin with). When read in that light - "a having sex
| pleading face and they made it about sexual roles" is
| amusing, no?
| tomxor wrote:
| Fair enough lol, lost in the medium of text I think.
| "Fuck" is funnily enough one of if not the most adaptive
| english words.
| dllthomas wrote:
| If it were a "fucking pleading" face, it would pretty
| explicitly be about sexual roles.
| bigbottomenergy wrote:
| Imagine my surprise in Latin American places where
| Activo/Passivo is exclusively used to describe
| top/bottom. I kinda have to assume that those words mean
| more than their cognates in context, or else I kinda
| feel bad for gay Latin communities being shoe-horned into
| archaic roles.
| yusefnapora wrote:
| Perhaps the author assumes that a child with a deep
| interest in rust Unicode edge cases has likely been on
| the internet before, and may well have been exposed to
| the existence of sex?
| TaylorAlexander wrote:
| The emoji has another name, "pleading face". This is
| basically how it came to be used by bottoms. Tops don't
| plead, but bottoms do.
|
| There's lots of emojis that the culture has given
| alternate meanings to, for example the peach and the
| eggplant.
|
| You could ask "how would I explain this to a child" to
| anything adults talk about that is sexual in nature.
| Usually the answer is "don't."
|
| I don't think TFA was written for children and the author
| wanted to include this common internet meme into their
| article title.
| verpen wrote:
| [flagged]
| TaylorAlexander wrote:
| The term "perverts" implies this behavior is abnormal but
| it's very common. But in the USA for example the dominant
| culture pretends like this doesn't happen and encourages
| repression of these desires. I'd rather adults be free to
| express themselves in healthy ways than judge and label
| them as "perverts" despite their feelings being common
| and normal.
| peyton wrote:
| I don't mind, but it's definitely okay for people to be
| offended by the use of (what seems to be?) BDSM fetish
| community terminology. I don't think this is a common
| term. If I referred to the cat face emoji as the pussy
| emoji or the cunt emoji in a Rust programming language
| article, I'd expect some pushback.
| hexmiles wrote:
| I think it also depends on the social group/context. In
| some places the word "bottom" kinda lost some of its
| "sexual" connotation and became just a "fun" word for
| submissive/unassertive.
|
| (To be clear, I also think it is okay not to like this kind
| of language.)
| verpen wrote:
| Well, I'd rather that creepy males didn't parade their
| sexual fetishes everywhere and pretend that it's totally
| normal and fine to do so.
| sc11 wrote:
| It's used by lesbians just as often and giving vs
| receiving has nothing to do with fetishes. You can have
| the most vanilla sex imaginable and still have a
| top/bottom.
| mplewis wrote:
| You really created a new account just to post a rude
| comment like this?
| tom_ wrote:
| Presumably this:
| https://en.m.wikipedia.org/wiki/Gay_sex_roles
| thomasahle wrote:
| Was I the only one looking for ⊥?
| DoctorNick wrote:
| Yes.
| dllthomas wrote:
| You were not.
| DoctorOW wrote:
| It's common in gay circles
| enriquto wrote:
| Interesting. Is there a corresponding "top" emoji?
| HL33tibCe7 wrote:
| Appropriately, it doesn't exist
| drewbug01 wrote:
| I appreciate the subtle humor here, but I suspect most of
| the HN crowd won't quite get it.
| thomasahle wrote:
| Unicode has these four:
| https://en.wikipedia.org/wiki/List_of_logic_symbols#:~:text=...
| DoctorOW wrote:
| Tops are considered less likely to use emoji from my
| understanding. They're supposed to be more stoic.
| [deleted]
| chungy wrote:
| There was a discussion on HN recently about it:
| https://news.ycombinator.com/item?id=34454165
| avgcorrection wrote:
| First I rolled my eyes when some HN users had to explain to
| other HN users what that "puppy dog eyes" thing meant. But I
| felt a bit worse when I first heard _that_ name for it, being
| used as if it was canonical.
| fasterthanlime wrote:
| Take it as a sign that you need to hang out with more diverse
| crowds!
| favaq wrote:
| If anything I've taken it as a sign that I made the right
| choice not hanging out with diverse crowds.
| decremental wrote:
| Or people just need to stop being gross.
| bhaney wrote:
| I'm not really sure why I should take it as a sign of that.
| What crowd does this indicate I don't hang out with?
| Ygg2 wrote:
| LGBT+ BDSM enthusiast.
| tullo_x86 wrote:
| Queer people. This is an easily-inferred reference for
| anyone who's decently-good friends with at least one gay
| guy or lesbian.
| bhaney wrote:
| I have a handful of very close friends who are gay, but
| I've never heard of this bottom-face emoji thing before
| today.
|
| And I'm a little concerned why not knowing about
| something that seems like an esoteric piece of fetish-
| related in-group communication implies that I don't hang
| out with queer people? Is there an expectation that in
| order to hang out with people belonging to some group, I
| need to learn specific lingo regarding that group's
| sexual power dynamics?
| justincredible wrote:
| Yes, otherwise you're a bigot trying to erase queer
| people. /s
| mplewis wrote:
| You could simply get to know people better and you'd end
| up learning more about their culture and language.
| pnathan wrote:
| So this all justifies my general stance of "find a Unicode
| expert" when questions of unicode come up. But that's pretty
| wilfully blind (although practical).
|
| I would like to understand why UTF-32 didn't catch on as The
| Standard Unicode for the modern world. It seems that - albeit
| memory wasteful - it would sidestep a lot of these issues.
| jcranmer wrote:
| One of the root problems here is that a concept like
| "character" is extremely underdefined. What you want differs
| depending on whether you are counting the memory a string will
| take up, indicating a particular spot in the string, mapping to
| arrow keys or the backspace key in a visual program, knowing how
| far to indent to drop a caret in a visual error report, or
| knowing the width of the string in a visual (especially
| fixed-width) text display. For ASCII text, you can use the
| _same_ value to represent all of these slightly different
| definitions, but in non-ASCII, you need to use _different_
| definitions, and there's no one true definition that solves all
| use cases (no, not even grapheme clusters).
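The point about competing definitions of "length" can be made concrete with a small Python sketch. It reports four different counts for the same string; the helper name `text_sizes` is made up here. Grapheme clusters and display width are left out because the Python standard library has no segmentation for them.

```python
def text_sizes(s: str) -> dict:
    """Return several different 'lengths' of the same string."""
    return {
        "code_points": len(s),  # Python str is indexed by code point
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        "utf32_code_units": len(s.encode("utf-32-le")) // 4,
    }


samples = {
    "ascii": "hello",
    "pleading face": "\U0001F97A",
    "decomposed e-acute": "e\u0301",  # 'e' + COMBINING ACUTE ACCENT, drawn as one character
}

for name, text in samples.items():
    print(name, text_sizes(text))
```

For the ASCII sample all four numbers agree; for the other two they do not, and none of them matches what a user would call "one character" in the last case.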
| harikb wrote:
| UTF-8 has no upper limit on the number of possible characters /
| emojis, now or in the future.
|
| Everything else has.
|
| And then there is UTF-16, which has all the pains of UTF-8 with
| none of the advantages of UTF-32.
| doubleunplussed wrote:
| That's not quite right. UTF-8 is not arbitrary length.
|
| Officially, it's at most four bytes, of which 21 bits are
| usable for encoding codepoints - so that's an upper limit of
| 2^21 codepoints.
|
| There is an initial byte encoding the length as a series of
| ones, so if you went ahead and extended the standard to
| simply allow more bytes, you could get up to 8 bytes, of
| which 48 bits would be usable.
|
| I can see that a six-byte version with 31 data bits was
| previously standardised before they settled on four.
|
| I guess you could extend it further by allowing more than one
| initial byte encoding the length, then it would be arbitrary
| length. But at that point I'm not sure if it loses its self-
| synchronising ability, and in any case it would be a
| different standard at that point.
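The 21-bit and 31-bit figures above can be checked with a little arithmetic. This is a sketch that assumes the original UTF-8 layout (a lead byte with n one-bits and a zero, followed by continuation bytes carrying 6 bits each) and only covers the 1 to 6 byte forms; the function name is invented for illustration.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Data bits in an n-byte sequence under the original 1-6 byte UTF-8 layout."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("this sketch only covers 1 to 6 byte sequences")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    lead_bits = 7 - n_bytes  # lead byte: n ones, one zero, then payload bits
    return lead_bits + 6 * (n_bytes - 1)  # each continuation byte is 10xxxxxx


for n in range(1, 7):
    print(n, utf8_payload_bits(n))

# Today's limit: the highest code point, U+10FFFF, needs four bytes.
print(len(chr(0x10FFFF).encode("utf-8")))
```

Note that 21 bits could address more values than Unicode actually allows, since code points stop at U+10FFFF.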
| layer8 wrote:
| UTF-32 doesn't completely solve this either, because grapheme
| clusters.
| pnathan wrote:
| oh, _bother_. :-/
| bsder wrote:
| And endianness.
| layer8 wrote:
| Since Unicode only uses 21 bits, it would be possible to
| define an endianness-safe 32-bit encoding. E.g. shift left
| by one and set the LSB.
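One possible reading of the "shift left by one and set the LSB" idea, sketched in Python: since a code point needs at most 21 bits, the shifted value fits in the low three bytes, so the top byte is always zero and the bottom byte is always odd. A decoder can then tell the byte order from the unit itself. The function names and the decoding rule are assumptions of this sketch, not something the comment spells out.

```python
MAX_CODE_POINT = 0x10FFFF


def encode_unit(cp: int) -> int:
    """Shift the code point left by one bit and set the least significant bit."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return (cp << 1) | 1


def decode_unit(raw: bytes) -> int:
    """Decode a 4-byte unit written in either byte order."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    if raw[0] & 1 and raw[3] == 0:
        value = int.from_bytes(raw, "little")  # odd low byte came first
    elif raw[3] & 1 and raw[0] == 0:
        value = int.from_bytes(raw, "big")  # odd low byte came last
    else:
        raise ValueError("not a valid unit in either byte order")
    return value >> 1


unit = encode_unit(0x1F97A)
print(decode_unit(unit.to_bytes(4, "little")) == decode_unit(unit.to_bytes(4, "big")))
```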
| jffry wrote:
| > why UTF-32 didn't catch on
|
| > memory wasteful
|
| The answer is in the question really. If you've got a big pile
| of mostly-ascii data, quadrupling memory/storage to encode it
| as UTF32 is going to be a pretty tough sell
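The "quadrupling" is easy to confirm for pure ASCII input. A short Python sketch, where the sample text is made up and the `-le` codec variants are used so that no byte order mark is counted:

```python
def encoded_sizes(text: str) -> dict:
    """Size in bytes of `text` under three encodings (no byte order mark)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


sample = 'fn main() { println!("hello"); }\n' * 1000  # pure ASCII stand-in for source code
sizes = encoded_sizes(sample)
print(sizes)
print(sizes["utf-32-le"] / sizes["utf-8"])  # exactly 4.0 for pure ASCII
```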
| ok123456 wrote:
| UTF-32 isn't even guaranteed to be a single code point.
| deathanatos wrote:
| No, UTF-32 code units are Unicode scalar values[1], always.
|
| They are not grapheme clusters, such as the "family: man,
| woman, boy" emoji from TFA.
|
| [1] which is approximately what I think you're saying here.
| I.e., you're trying to say that a code point might span
| multiple UTF-32 code units; that is not correct. (It should
| be simple to see how a code point, which has the range [0,
| 0x10FFFF], can always fit into a u32.)
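The distinction between code units and grapheme clusters can be seen by splitting the family emoji into UTF-32 units. A small Python sketch (the helper name `utf32_units` is invented here):

```python
def utf32_units(s: str) -> list:
    """Return the UTF-32 code units of `s` as integers."""
    raw = s.encode("utf-32-le")
    return [int.from_bytes(raw[i:i + 4], "little") for i in range(0, len(raw), 4)]


# "family: man, woman, boy" is three emoji joined by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F466"

units = utf32_units(family)
print([f"U+{u:04X}" for u in units])
print(units == [ord(c) for c in family])  # each unit is exactly one code point
print(len(units))  # several units, although it is usually drawn as a single glyph
```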
| Shish2k wrote:
| How many "big piles of mostly-ascii data" _are_ there though?
| (Does anyone want to write a script which searches /dev/mem
| and categorises pages of RAM into ascii-or-not-ascii so we
| can get some meaningful numbers? :P )
|
| (If you're doing number-crunching on giant CSVs, maybe I can
| see it being important, but all the ascii files on my desktop
| that I can think of are pretty trivial)
| layer8 wrote:
| If we'd design computing technology from scratch today, we
| might be using 32-bit bytes, or maybe even 64-bit ones. If
| memory usage is not a concern, there's really no need to
| have smaller units.
|
| Our world however runs on 8-bit bytes, so it makes some
| sense for text to be based on that.
|
| But also, consider Base64 in UTF-32-encoded JSON. ;)
| jcranmer wrote:
| > How many "big piles of mostly-ascii data" are there
| though?
|
| Well, the use case mentioned in the article is a pretty
| good one: program source code. Even if you're going to be
| writing in a foreign language, all of the fancy punctuation
| and whitespace that does useful stuff in the language ends
| up being ASCII, and a good hunk of the standard library is
| likely to have ASCII names for types and functions, etc.
| RobotToaster wrote:
| >How many "big piles of mostly-ascii data" are there
| though?
|
| You just posted this to one.
| steveklabnik wrote:
| Pretty much every website, due to HTML being ASCII.
| ravi-delia wrote:
| I feel like compression will do a better job than
| deciding on an encoding scheme ahead of time, no? Once
| gzipped I wouldn't expect a difference
| steveklabnik wrote:
| I thought I'd read something about it, but when I
| googled, what I did find was this old HN comment:
| https://news.ycombinator.com/item?id=8514519
|
| > UTF-8 + gzip is 32% smaller than UTF-32 + gzip using
| the HN frontpage as corpus.
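Anyone wanting to repeat that kind of measurement could start from a sketch like the following. It uses Python's standard `gzip` module; the sample string is a made-up, highly repetitive stand-in, so its numbers say little about real pages and no particular ratio is claimed here.

```python
import gzip


def compressed_sizes(text: str) -> dict:
    """Raw and gzip-compressed byte sizes of `text` as UTF-8 and UTF-32."""
    result = {}
    for enc in ("utf-8", "utf-32-le"):
        raw = text.encode(enc)
        result[enc] = {"raw": len(raw), "gzip": len(gzip.compress(raw))}
    return result


# Replace this with real page text to get a meaningful comparison.
sample = '<li><a href="item?id=1">The bottom emoji breaks rust-analyzer</a></li>\n' * 200

for enc, sizes in compressed_sizes(sample).items():
    print(enc, sizes)
```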
| JoshTriplett wrote:
| You still have to decompress it on the other end, to
| actually parse and use it. At which point you have four
| times the memory usage, unless you turn it into some
| smaller in-memory encoding...such as UTF-8.
| bitwize wrote:
| Next time just use Visual Studio Code for Rust development.
| recursive wrote:
| Small mistake.
|
| > High surrogates are D800-DB7F
|
| Akshually, high surrogates extend all the way to DBFF.
|
| https://unicode-table.com/en/blocks/high-surrogates/
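The surrogate ranges follow from the UTF-16 encoding arithmetic, which a few lines of Python can reproduce. The function name is made up for this sketch; the formula is the standard one (subtract 0x10000, split the 20-bit result into two 10-bit halves).

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary-plane code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = cp - 0x10000  # 20-bit value
    high = 0xD800 + (offset >> 10)  # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


print([hex(u) for u in to_surrogates(0x1F97A)])   # the emoji from the article
print([hex(u) for u in to_surrogates(0x10FFFF)])  # highest code point: high surrogate is 0xdbff
```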
| KingLancelot wrote:
| [dead]
| teddyh wrote:
| This is a bug in lsp-mode, a third-party Emacs package which is
| an LSP client. Starting with version 29, Emacs will include in its
| standard distribution a _different_ LSP client package, _eglot_,
| which does not seem to have this bug:
|
| https://github.com/joaotavora/eglot/blob/e501275e06952889056...
| zamalek wrote:
| To be fair, other operating systems took a while to get unicode
| right.
| secondcoming wrote:
| I gave up trying to read this. Where was the bug?
| yoru-sulfur wrote:
| lsp-mode: https://github.com/emacs-lsp/lsp-mode/issues/2080
| deathanatos wrote:
| The LSP protocol sends indexes. Insanely, those indexes are in
| terms of UTF-16 code units. Emacs's LSP client implementation
| here is sending the wrong index: 8 for the emoji's index, but 9
| for the index of the next "character". But an emoji spans two
| UTF-16 code units, so the next index is 10.
|
| Rust-analyzer simply crashes here, but it's been fed hot
| garbage by the editor. One might argue it shouldn't crash. TFA
| digs into the details around that, too, because Amos leaves no
| stone unturned.
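The index mismatch described above (8, then 9 versus 10) can be reproduced with a short Python sketch. The sample line is invented so that the emoji lands at index 8; it is not the line from the article, and `utf16_index` is a hypothetical helper name.

```python
def utf16_index(text: str, code_point_index: int) -> int:
    """UTF-16 code unit index corresponding to a code point index in `text`."""
    return len(text[:code_point_index].encode("utf-16-le")) // 2


line = "// face \U0001F97A!"  # eight ASCII characters, then U+1F97A, then '!'

emoji_at = line.index("\U0001F97A")  # counted in code points
next_at = emoji_at + 1               # what a client counting code points would send

print(emoji_at, utf16_index(line, emoji_at))  # same value: only ASCII precedes the emoji
print(next_at, utf16_index(line, next_at))    # differ by one: the emoji is two UTF-16 units
```

A client that sends the code point count as if it were a UTF-16 offset ends up pointing into the middle of the surrogate pair.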
| teddyh wrote:
| Nitpick: lsp-mode is _not_ "Emacs's LSP client". Emacs
| recently chose to include "eglot-mode" as part of Emacs
| itself, and _eglot-mode_ must therefore be considered to be
| Emacs' official LSP client, not "lsp-mode".
| zerocrates wrote:
| I imagine this "UTF-16 code unit indexing" decision is just
| an artifact of the fact that that's how JavaScript works with
| strings, and LSP comes from VSCode.
| kerkeslager wrote:
| That's an awful lot of time spent talking about your Emacs
| config...
| [deleted]
| mi_lk wrote:
| +1 appreciate the effort, but just get to the point thanks...
| dmm wrote:
| I liked the emacs config discussion.
| kelnos wrote:
| Agreed, but I personally enjoyed the detour into "wow, our
| tools have terrible UX" territory. Totally get that some people
| would find it superfluous, but I thought it was fun.
|
| I'm a vim user, and find the landscape to be pretty bad there
| too; it's nice to see that emacs is no better (and IMO worse,
| based at least on this one example).
| Ygg2 wrote:
| It's fasterthanlime. It always takes ages to get to the point.
| deathanatos wrote:
| I love the author's style. Not every blog need do it, but I
| appreciate that fasterthanli.me does it.
|
| To me, it is intellectually honest: this shows a reader every
| step along the way, every painful trail that must be overcome
| from point A to point B. Nothing is omitted. And I think the
| sooner we all did this, as an industry, the sooner the very
| _many_ problems and bugs that exist (that get hit before we
| can even "get to the point", as you say) would get dragged
| into the light, and maybe we'd progress, as a society,
| towards having computers that weren't shit.
| Ygg2 wrote:
| > Nothing is omitted
|
| To quote an internet reviewer: Brevity is the soul of wit.
| That means stop wasting my time. Keep it nice and simple.
|
| Look. Like what you want, I'm free to prefer a shorter
| form, and to point out this is part of the author's style.
___________________________________________________________________
(page generated 2023-02-13 23:01 UTC)