[HN Gopher] DuckDuckGo \u202E
___________________________________________________________________
DuckDuckGo \u202E
Author : zeepzeep
Score : 146 points
Date : 2022-02-15 20:11 UTC (2 hours ago)
(HTM) web link (duckduckgo.com)
(TXT) w3m dump (duckduckgo.com)
| kroltan wrote:
| It's intentional, if you inspect the `innerText` you'll see it's
| reversed there too:
| zero_click_wrapper.innerText.codePointAt(0)
|
| Evaluates to 32. And if you think 32 = 0x20 could mean the next
| one would be 0x2E, then no, codePointAt(1) is 0x55.
| nneonneo wrote:
| `innerText` doesn't include the RTL marker, probably due to the
| fact that it is supposed to reflect the "rendered" appearance
| of the element (i.e. deleting certain invisible characters).
| However, `textContent` shows the RTL marker as expected.
|
| I'm on the side of this being an unintentional effect.
| benbristow wrote:
| Reversed: U+202E RIGHT-TO-LEFT OVERRIDE, decimal: 8238, HTML: No
| visual representation, UTF-8: 0xE2 0x80 0xAE, block: General
| Punctuation
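The figures quoted in this comment can be checked in a few lines of JavaScript. This is a minimal sketch, assuming a runtime that provides `TextEncoder` (current browsers and Node.js do); the variable names are illustrative only.

```javascript
// U+202E RIGHT-TO-LEFT OVERRIDE, written as an escape so this source stays readable.
const RLO = "\u202E";

// Decimal value of the code point.
const decimal = RLO.codePointAt(0);

// UTF-8 encoding, formatted as hex bytes.
const utf8Hex = Array.from(new TextEncoder().encode(RLO), (b) =>
  "0x" + b.toString(16).toUpperCase().padStart(2, "0")
);

console.log(decimal);           // 8238
console.log(utf8Hex.join(" ")); // 0xE2 0x80 0xAE
```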
| gambler wrote:
| Extremely bad design. This kind of complexity should have been
| moved to some kind of post-processing spec rather than core
| Unicode. It's already causing issues and will cause more. The
| more universal something is, the more effort should be applied to
| keeping it simple.
| [deleted]
| the_mitsuhiko wrote:
| I strongly disagree. This is a necessary part to shared content
| text and pushing this type of functionality into another layer
| makes a lot of content inaccessible in basic text format.
| This is precisely the type of control character that makes
| Unicode such a powerful and successful system.
| [deleted]
| mananaysiempre wrote:
| ... It's not clear how? Except by telling every speaker of
| Arabic and Hebrew saying they want some of that delicious
| "plain text" action to go screw themselves (there are _no_
| purely-RTL texts, only bidirectional ones, not least because of
| the Indic numerals). AFAIU (at least from the full-length
| horror novel that is the CDRA) IBM tried presentation-order
| (and no-complex-shaping) RTL text for decades and gave up, so
| Unicode bidi is essentially the result of said giving up (and
| the "Arabic Presentation Forms" block the foul-smelling corpse
| of the idea).
|
| Specify the dominant direction of your user-input-containing
| elements, people, and/or enclose the input in U+2068 FSI ...
| U+2069 PDI (after balancing outstanding bidi controls inside).
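The advice above (balance the bidi controls inside user input, then wrap it in FSI ... PDI) might be sketched as follows. The helper names `balanceBidiControls` and `isolate` are hypothetical, and the balancing rule is one simplified reading of how PDF and PDI pair up with their openers, not a full implementation of the Unicode bidirectional algorithm.

```javascript
// Bidi control characters, written as escapes so nothing here is invisible.
const PDF = "\u202C"; // closes LRE, RLE, LRO, RLO
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // closes LRI, RLI, FSI
const EMBEDDING_OPENERS = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]);
const ISOLATE_OPENERS = new Set(["\u2066", "\u2067", "\u2068"]);

// Hypothetical helper: drop stray closers and append missing ones.
function balanceBidiControls(text) {
  const stack = []; // closers still owed, innermost last
  let out = "";
  for (const ch of text) {
    if (EMBEDDING_OPENERS.has(ch)) {
      stack.push(PDF);
    } else if (ISOLATE_OPENERS.has(ch)) {
      stack.push(PDI);
    } else if (ch === PDF) {
      // A PDF only matches an embedding or override opened at this level.
      if (stack[stack.length - 1] !== PDF) continue; // stray: drop it
      stack.pop();
    } else if (ch === PDI) {
      // A PDI closes the nearest isolate and any embeddings opened inside it.
      const i = stack.lastIndexOf(PDI);
      if (i === -1) continue; // stray: drop it
      stack.length = i;
    }
    out += ch;
  }
  // Close whatever is still open, innermost first.
  return out + stack.reverse().join("");
}

// Hypothetical helper: isolate untrusted text from its surroundings.
function isolate(text) {
  return FSI + balanceBidiControls(text) + PDI;
}
```

With this in place, a lone U+202E in user input is closed before the wrapper's PDI, and a stray PDI in the input cannot end the wrapper early.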
| soheil wrote:
| What's next, searching for the word death causes you to die?
| soheil wrote:
| Where does DDG get its search result? Do they scrape Google? If
| so how do they not get banned both technically and legally?
| thesuitonym wrote:
| They have their own web crawlers, as well as a deal with Bing
| (And perhaps others)
| sp332 wrote:
| https://help.duckduckgo.com/duckduckgo-help-pages/results/so...
| echelon wrote:
| You still have to be mindful of \u202e in anything new that
| you're writing, but browsers do a much better job of not having
| it bleed across elements like they did back in the 2000s.
|
| Back in the era of forums that didn't support unicode correctly
| (2005ish?), it was trollish fun to post messages containing
| \u202E and watch the UI and all subsequent messages and elements
| get messed up. (One stray \u202E would flip the entire page
| contents following it.) I never took it to a level of abuse since
| it was easy to remove and then ban offenders, but it was fun in a
| one-off thread, and it always had great reactions.
|
| I patched my own software to handle it, but I don't recall anyone
| really abusing it in a widespread manner. (Contrast this with the
| era of prolific and widely abused AOL/AIM exploits that would
| kill your IM client with malformed messages.)
|
| IIRC, a bunch of messaging clients also didn't (or still don't)
| handle \u202e termination and it sometimes bled into new messages
| and even the text input box. That was pretty horrible and
| unfixable without restarting.
|
| Obligatory XKCD: https://xkcd.com/1137/
|
| Some shenanigans in the wild:
|
| https://www.reddit.com/r/Unicode/comments/hc1rxi/i_put_a_rig...
|
| https://twitter.com/mkolsek/status/1237123571341803522
|
| (These are way tamer than the effects used to be.)
|
| (Also, HN filters it out. I tried to have some fun. :P)
| splch wrote:
| Oh that's cute! Translation for anyone curious / lazy:
|
| Punctuation General :block ,0xAE 0x80 0xE2 :8-UTF ,representation
| visual No :HTML ,8238 :decimal ,OVERRIDE LEFT-TO-RIGHT 202E+U
|
| Love the demos :)
| heartbeats wrote:
| Why can't I just disable RTL on my system?
|
| I do not speak a word of Arabic. There is no circumstance in
| which my life will be materially improved by correct RTL text
| rendering. I might want proper display of individual characters
| so I can copy-paste them, but I have no use for RTL text.
|
| On the other hand, RTL causes a lot of unpleasant problems like
| this. Why can't I simply coerce all foreign languages into LTR?
| hnlmorg wrote:
| If there was ever a clear signal that working with Unicode is
| incredibly hard, it would be the fact that no one on HN can
| decide if this is accidental or intentional.
| [deleted]
| tedunangst wrote:
| A significant portion of the problem seems to be that some
| people can't even identify what's going on because the tools
| they're using to inspect the page are also showing it reversed.
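One way around the inspection problem described here is to print the string with its bidi controls turned into visible escapes before looking at it. A minimal sketch follows; `showBidiControls` is a hypothetical helper, and the character class covers the directional marks plus the embedding, override, and isolate controls.

```javascript
// Bidi controls: ALM, LRM, RLM, the embedding/override set, and the isolate set.
const BIDI_CONTROLS = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper: make bidi controls visible as \uXXXX escapes.
function showBidiControls(text) {
  return text.replace(BIDI_CONTROLS, (ch) =>
    "\\u" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(showBidiControls("\u202EReversed")); // \u202EReversed
```

The output contains a literal backslash escape rather than the control character, so a console or source viewer has nothing left to reorder.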
| divbzero wrote:
| Let me take a stab at a definitive answer:
|
| - It is unintentional for DuckDuckGo. The code for DuckDuckGo
| works correctly but no one who wrote that code thought about
| whether a reversal would happen.
|
| - It is intentional for the browser. The code for the browser
| works correctly and someone who wrote that code actively
| thought through how to make a reversal happen.
|
| I don't think 'accidental' is the right word to use in either
| case because the outcome is what you would want.
| shockeychap wrote:
| This! Also, https://news.ycombinator.com/item?id=21105625
| tshaddox wrote:
| It certainly looks like a simple template that DDG applies
| consistently to all queries for a UTF-8 byte literal. It's the
| exact same template for a query for a more straightforward
| literal, like u0041.
|
| So I think it's fair to say that it's not intentional in the
| sense of being a deliberately added easter egg. Of course, they
| might be aware of the behavior and decided to leave it that
| way.
| barbazoo wrote:
| And some of us don't even get what this is about. Should I be
| seeing DDG doing something particular here?
| dtech wrote:
| The "answer" tab is right to left
| barbazoo wrote:
| I had that turned off. Thanks for explaining it.
| [deleted]
| iqanq wrote:
| It's accidental, because other characters are also displayed:
| https://duckduckgo.com/?q=u20aa
| Retr0id wrote:
| It's intentional, because there is no RTL override in the
| HTML source, the string is merely reversed.
| dzaima wrote:
| but there is, see:
|
| document.querySelector(".zci__body").textContent.charCodeAt(0)
| document.querySelector(".zci__body").textContent.substring(1)
| progval wrote:
| > no RTL override in the HTML source, the string is merely
| reversed
|
| What? After opening the source, ctrl-f "representation"
| selects the reversed word. The source view just happens to
| interpret the RTL override.
| Jerrrry wrote:
| Stacking combining diacritics[1] is also fun, to make extremely
| tall text.
|
| Also fun is enumerating all the characters in the Private Use
| Area[2] to see what UI symbols can be inserted into unintended
| places.
|
| [1] https://www.unicode.org/charts/PDF/U0300.pdf
|
| [2] http://www.unicode.org/faq/private_use.html
| https://www.unicode.org/charts/PDF/UE000.pdf
| amelius wrote:
| > This is often abused by hackers to disguise file extensions:
| when using it in the file name my-text.'U+202E'cod.exe, the file
| name is actually displayed as my-text.exe.doc
|
| So every programmer has to know about and support U+202E, but not
| filesystem programmers?
| mananaysiempre wrote:
| More like UI programmers? It seems that almost everyone has
| agreed that text-processing smarts inside a filesystem are a
| bad idea (see: the NTFS collation table, the APFS transition
| away from ancient-version-NFD-but-not-quite), although there is
| that island of (admittedly very smart) -insensitive but
| -preserving holdouts (casing on Windows, normalization on ZFS).
| Linus rants on the topic[1] passionately, if not very
| informatively.
|
| Note that U+202E is a _control code_ that has effect on
| _display_ , not the logical order of the text (much like, say,
| a bare CR), so I can't say what the filesystem is doing wrong
| here (except maybe for not rejecting this outright, but see re
| smarts above, this probably needs to be done on a higher
| level). You don't blame the filesystem for believing the
| filename "A\rB.txt" starts with A and not B, do you? Even
| though ls will say otherwise.
|
| Bidi IRIs (which _are_ at that higher level) are kind of
| horrendous, though.
|
| [1] https://yarchive.net/comp/linux/utf8.html
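The distinction drawn above between logical order and display order can be made concrete with the file name from the quoted example. This is a sketch only: `hasBidiFormatting` is a hypothetical check of the kind a UI layer might run before showing a name, not something any particular filesystem or file manager is known to do.

```javascript
const RLO_CHAR = "\u202E";

// The file name from the quoted example, in logical (stored) order.
const fileName = "my-text." + RLO_CHAR + "cod.exe";

// The stored name is a plain sequence of characters; its last extension is "exe".
const storedExtension = fileName.slice(fileName.lastIndexOf(".") + 1);

// Hypothetical UI-level check: flag names containing explicit bidi
// embedding, override, or isolate controls before displaying them.
function hasBidiFormatting(name) {
  return /[\u202A-\u202E\u2066-\u2069]/.test(name);
}

console.log(storedExtension);                  // exe
console.log(hasBidiFormatting(fileName));      // true
console.log(hasBidiFormatting("my-text.doc")); // false
```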
| tedunangst wrote:
| What do you want the filesystem programmer to do?
| foxfluff wrote:
| if (!isascii(c)) panic("stupid user");
| tyingq wrote:
| That's pretty much correct. Most of the filesystems I'm aware
| of just treat filenames as a "string of bytes" with some list
| of characters that aren't allowed, and perhaps a few other
| rules. Other than that, it's a free-for-all on names.
| jamescodesthing wrote:
| Same works for urls.
| TadeusTaD wrote:
| Instantly reminded me of a relevant xkcd: https://xkcd.com/1137/
| zeepzeep wrote:
| Hey that's new to me, I'll use this, thanks.
| tobz1000 wrote:
| Easter egg or bug?
| Waterluvian wrote:
| Poe's Law applied to coding easter eggs? :D
| rackjack wrote:
| Easter bug?
| zeepzeep wrote:
| That's the question!
|
| (I think it's unintended though)
| oneplane wrote:
| bug egg? it's also an instant answer from the community (the
| little info icon on the right hand side) so perhaps just
| presented that way due to how it was delivered by that specific
| community member.
| jfk13 wrote:
| Similarly, if I try https://www.google.com/search?q=u202e, the
| second result I currently get (YMMV) is from
| https://unicode-table.com/, and almost the entire snippet shows up
| backwards in the search results.
| Sebb767 wrote:
| I'm not sure whether this is a bug or a feature^Weaster egg
| BitwiseFool wrote:
| I'm out of the loop, what kind of Easter Egg is it?
| brimble wrote:
| The text in the instant-answer bar is reversed for this
| result. Which could plausibly either be on purpose, or a
| result of the character itself being inserted and not
| escaped, so having its intended effect.
| pwdisswordfish9 wrote:
| Oversight, probably. By default, the code point is displayed
| next to that description, and they don't turn that off for
| bidirectional control characters.
|
| https://duckduckgo.com/?q=u1f4a9
|
| (Yes, I have that one memorized)
| [deleted]
| gunapologist99 wrote:
| Are there any lists of unicode characters (like the OWASP one)
| that should be blacklisted from most apps (not just for XSS, but
| even for desktop apps)?
|
| Are there any good security guides/best practices for unicode
| sanitation?
| wongarsu wrote:
| How are users supposed to write "`bvr l duckduckgo.com kdy
| lkhpsh byntrnt" without \u202E? It's perfectly normal for RTL
| languages to switch text direction in the middle of a sentence.
| harambae wrote:
| Not a full security guide, but if you haven't seen this before
| it's useful to have...
|
| https://github.com/danielmiessler/SecLists/blob/master/Fuzzi...
| adamrezich wrote:
| I've seen this before but either this is new since last time
| or I missed it, either way: lol
|
| # Human injection
| #
| # Strings which may cause human to reinterpret worldview
|
| If you're reading this, you've been in a coma for almost 20
| years now. We're trying a new technique. We don't know where
| this message will end up in your dream, but we hope it works.
| Please wake up, we miss you.
| sterlind wrote:
| please don't blacklist U+202D and U+202E or the Private Use
| Area. my conlang has a right-to-left cursive script, and it's
| not in Unicode. the characters live in the PUA and my font
| renders them as a fallback. there's no mechanism for fonts to
| ask for RTL, so I have to use bidi override.
| sp332 wrote:
| I don't think this is a good place for a blacklist. Text
| effects should be encapsulated and reset at the end of the text
| block, the way bold or italic effects are.
| thecosmicfrog wrote:
| Reminds me of searching for the terms "do a barrel roll",
| "recursion" or "askew" on Google. I'm sure there's plenty of
| others.
| ryukoposting wrote:
| And somehow, the "external link" icon is outside the scope of
| Unicode.
| joelbondurant4 wrote:
| lucideer wrote:
| Everyone here is asking if this is an "intentional easter-egg" or
| an "accidental bug"
|
| But what about accidentally working-as-intended?
|
| Sure it's a little trickier to read, but it's certainly not a
| "bug" that will cause any damage / danger / instability / etc.
| gambler wrote:
| Problem is, this behavior is so outside of the range of common
| expectations, it's really hard to say if it's harmless or not
| and what are the worst cases for (ab)using it.
| thrdbndndn wrote:
| I don't get your take.
|
| Even the most strict definition of bug doesn't imply it has to
| "cause any damage / danger / instability / etc." to be one.
|
| And I won't call it "work as intended" when the purpose of this
| feature is to provide an answer for a human to read, and it
| failed on that.
| evolve2k wrote:
| I'd warmly beg to differ, I personally think it's
| illustrating how it is supposed to work, most eloquently.
___________________________________________________________________
(page generated 2022-02-15 23:00 UTC)