[HN Gopher] ROT8000
       ___________________________________________________________________
        
       ROT8000
        
       Author : edent
       Score  : 172 points
       Date   : 2021-09-22 13:07 UTC (9 hours ago)
        
 (HTM) web link (rot8000.com)
 (TXT) w3m dump (rot8000.com)
        
       | rsj_hn wrote:
       | This is cool, but I wish more people would use a more
       | aesthetically pleasing cipher, like morse code:
       | 
       | https://onlineasciitools.com/convert-ascii-to-morse
        
       | dash2 wrote:
       | Who are these nutcases? https://www.master-list2000.com/ and what
       | is pl41nt3xt?
        
         | kevinmgranger wrote:
         | Where did you find this / how is it relevant? Was this in the
         | OP but then removed?
        
         | chris_st wrote:
         | > _what is pl41nt3xt?_
         | 
         | Leet-speak for "plaintext".
        
           | genewitch wrote:
           | 1337 4 "plaintext" nub
        
         | ditherstudies wrote:
         | Hi, I created rot8000 for The Wrong, an online biennial of
         | digital art -- specifically for the pl41nt3xt pavilion, which
         | included text-only works. The pavilion was taken down when the
         | biennial ended, and it looks like that link is no longer valid.
        
       | jstanley wrote:
       | Interesting, I have made a similar project, except instead of
       | rotN, it encodes the input as UTF-8, and then shifts up the
       | codepoints to display each byte as a different character to what
       | it would normally be. The invariant is that `byte & 0xff` is the
       | real byte value.
       | 
       | I call it "Mojibake Steganography":
       | https://incoherency.co.uk/mojibake/
       | 
       | I think in principle (judging by the description of rot8000), my
       | tool should be able to decode rot8000 messages natively, but it
       | doesn't seem to work on the example given here. From looking
       | directly at the codepoints given, I think the example is wrong.
       | It starts:
       | 
       | u+7c5d u+7c71 u+7c6e - which works out to "]qn" instead of "The",
       | unless I am misunderstanding something. And in fact that looks
       | definitely wrong if we're expecting ASCII output because they're
       | all more than 127 away from 0x8000, no matter how it works.
       | 
       | The rot8000 page says:
       | 
       | > It also bypasses 32 control characters, technically making it
       | rotFFE0, sometimes with an additional offset.
       | 
       | I definitely don't understand how this is meant to work. Why does
       | skipping 32 control characters turn it from rot8000 into rotFFE0?
       | Should that say 7FE0? I still don't see how ASCII is coming out
       | as 7Cxx.
       | 
       | Taking `char - 0x7c09` gets the expected ASCII output.
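The two readings of the example can be compared in a few lines of Python. This is only a sketch: the three code points and the 0x7c09 offset are taken from the comment above, and the variable names are made up for illustration.

```python
# Code points quoted from the start of the rot8000 example text.
codepoints = [0x7C5D, 0x7C71, 0x7C6E]

# Mojibake-style reading: keep only the low byte of each code point.
low_bytes = "".join(chr(cp & 0xFF) for cp in codepoints)

# Alternative reading: subtract the empirically found offset 0x7c09.
shifted = "".join(chr(cp - 0x7C09) for cp in codepoints)

print(low_bytes)  # ]qn
print(shifted)    # The
```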
        
         | NelsonMinar wrote:
         | I like your system!
         | 
         | One nice property of rot13 is it reverses itself;
         | rot13(rot13(X)) = X. At least, for basic ASCII alphabet. Your
         | UTF-8 encoding step makes that impossible. I wonder if there's
         | a sensible Unicode-friendly algorithm that has that rot13
         | property.
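For reference, the self-inverse property mentioned here is easy to check with the rot_13 codec in Python's standard library. The helper name `rot13` is just a convenience wrapper for this sketch.

```python
import codecs

def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; other characters pass through."""
    return codecs.encode(text, "rot_13")

message = "Hello, World!"
once = rot13(message)
twice = rot13(once)

print(once)   # Uryyb, Jbeyq!
print(twice)  # Hello, World!
```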
        
           | tiziano88 wrote:
           | Like the one in the original post?
        
         | Tepix wrote:
         | Cool! Works even with Emoji.
         | 
         | However the feature of rot13 (and rot8000) that you can use the
         | same operation to "decrypt" it again is unfortunately missing
         | in your variant.
        
         | robinhouston wrote:
         | If you look at the code [1], it skips:
         | 
         | * control characters
         | * whitespace
         | * surrogate code units (U+D800 - U+DFFF)
         | 
         | 1.
         | https://github.com/rottytooth/rot8000/blob/main/Rottytooth.R...
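A rough Python sketch of a rotation that skips those three groups might look like the following. It is not the linked implementation: the names `is_skipped`, `VALID`, `TABLE` and `rot_half` are hypothetical, and the exact set of skipped characters is assumed to be the Cc category, `str.isspace()` and the surrogate range.

```python
import unicodedata

def is_skipped(cp: int) -> bool:
    """True for code points this sketch leaves untouched (an assumption)."""
    ch = chr(cp)
    return (
        0xD800 <= cp <= 0xDFFF                 # surrogate code units
        or unicodedata.category(ch) == "Cc"    # control characters
        or ch.isspace()                        # whitespace
    )

# All BMP code points that take part in the rotation, in ascending order.
VALID = [cp for cp in range(0x10000) if not is_skipped(cp)]

# Use an even number of entries so that rotating by half is self-inverse.
HALF = len(VALID) // 2
ROTATED = VALID[: 2 * HALF]
TABLE = {cp: ROTATED[(i + HALF) % (2 * HALF)] for i, cp in enumerate(ROTATED)}

def rot_half(text: str) -> str:
    """Rotate each eligible character halfway round the valid list."""
    return text.translate(TABLE)

scrambled = rot_half("Hello, world!")
restored = rot_half(scrambled)
print(restored)  # Hello, world!
```

Because skipped characters are removed from the list before rotating, the effective offset is smaller than 0x8000, which would be consistent with the odd-looking offset observed elsewhere in the thread.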
        
       | sandwell wrote:
       | I wonder if it is possible to generate a ROT8000 quine, that is a
       | phrase like "hello world" which yields a semantically matching
       | phrase in some other language?
        
         | kderbyma wrote:
         | the BabbelROT
        
       | stavros wrote:
       | 'ro-,tat is pronounced ROH-tat, by the way.
        
       | azhenley wrote:
       | Is there any benefit to make it so that the function must be
       | applied X times to restore the original text? E.g., ROT2000.
        
         | Tepix wrote:
         | Yes you can use the same operation to encrypt and decrypt it if
         | X == 2.
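The number of applications needed to get the original text back follows from modular arithmetic. Here is a small sketch, assuming a plain rotation over a fixed-size alphabet with no skipped characters; the function name is invented for the example.

```python
from math import gcd

def applications_to_restore(shift: int, alphabet_size: int) -> int:
    """Smallest k >= 1 such that k * shift is a multiple of alphabet_size."""
    return alphabet_size // gcd(shift, alphabet_size)

print(applications_to_restore(13, 26))            # 2
print(applications_to_restore(0x8000, 0x10000))   # 2
print(applications_to_restore(0x2000, 0x10000))   # 8
```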
        
       | jhvkjhk wrote:
       | By extending rot13 to Unicode characters, it supports encrypting
       | emoji message automatically!
        
         | Uberphallus wrote:
         | No. Emojis go from U+10000 to U+1FFFF, and this rotates chars
         | U+0 through U+FFFF (hence the U+8000 middle point).
        
           | aidenn0 wrote:
           | Right, you'd need rot 0x88000 to cover all 17 planes. Downside
           | would be that the space is not fully packed so you'd get a
           | lot of invalid characters.
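The arithmetic behind 0x88000, and the "invalid characters" downside, can be checked quickly in Python. This sketch assumes a plain rotation over the whole code space with nothing skipped.

```python
import unicodedata

PLANES = 17
TOTAL = PLANES * 0x10000   # size of the Unicode code space
HALF = TOTAL // 2          # self-inverse rotation distance

print(hex(TOTAL), hex(HALF))  # 0x110000 0x88000

# Rotating an ASCII letter lands in plane 8, which has no assigned characters.
rotated = chr((ord("a") + HALF) % TOTAL)
print(f"U+{ord(rotated):05X}", unicodedata.category(rotated))  # U+88061 Cn
```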
        
           | tragomaskhalos wrote:
           | Pedantic point: some (early) emoji have codes below this
        
       | cjfd wrote:
       | I am just getting boxes with hex codes in them if I type ascii
       | letters so that is not so very nice. Even if you have all of the
       | required fonts I am not sure it is that great to get characters
       | from a completely foreign language. Also, I suppose, one could
       | end up with surrogate code points which do not have a character
       | representation. To summarize: I think this sounded like more fun
       | in theory than it turns out to be in practice.
        
       | BoppreH wrote:
       | Nice idea. I often use base64 for this, since it's somewhat
       | recognizable and there are tons of decoding tools available.
       | 
       | Base64 does lengthen the text by a third, which may or may not be
       | a problem. On the other hand, it doesn't need special handling of
       | control characters, and manages to hide word lengths well.
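The size overhead is easy to see with the standard library. This is a minimal sketch, and the sample text is arbitrary.

```python
import base64

text = b"The quick brown fox jumps over"
encoded = base64.b64encode(text)

# Every 3 input bytes become 4 output characters.
print(len(text), len(encoded))          # 30 40

# Spaces disappear from the output, so word boundaries are not visible.
print(b" " in text, b" " in encoded)    # True False
```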
        
         | arethuza wrote:
         | Many years ago I was involved in finding and fixing a messaging
         | bug that only appeared when the base64 encoded payload had a
         | length that was a multiple of 87 bytes (it might have been some
         | other value - it was 15+ years ago).
         | 
         | Bug was in a C++ base64 encoder component.
        
       | GekkePrutser wrote:
       | I wonder how this will play with Unicode's highly complex
       | combination rules.. (e.g. frowning face + brown texture = brown
       | frowning face).
       | 
       | I bet using ROT on this will lead to unintended consequences
       | because the original characters won't combine but the replaced
       | ones will.
       | 
       | In any case, ROT is a dumb thing to do, so it doesn't have any
       | real-world use.
        
         | kapp_in_life wrote:
         | I don't think it would matter, right? The output might have fewer
         | characters but inverting it would still show the original text.
         | Like hypothetically
         | 
         | ab => frowning face + brown texture = brown frowning face => ab
        
           | contravariant wrote:
           | If this always happens one way sure, but what if you also
           | happened to include the symbol that gets translated to "brown
           | frowning face"?
        
           | GekkePrutser wrote:
           | True, but some apps don't have the ability to show all these
           | variations and may leave them out (simplify to just a face
           | icon) when copying/pasting. Unicode interpretation is a
           | really complex bundle of quirks these days so I'm pretty sure
           | things will start going wrong.
        
         | kasitmp wrote:
         | Real world use case: geocaching.com uses it to hide hints, so
         | you don't read and spoil yourself by accident. It's pretty much
         | accepted and adopted by the users. I also would ban words like
         | "dumb" or, for another example "easy" in IT and CS contexts.
        
         | kyle-rb wrote:
         | In this case, it would be very unlikely to actually happen, for
         | a few reasons.
         | 
         | Almost all combining rules (including skin tone modifiers)
         | require a zero-width joiner character between the person emoji
         | and the modifier emoji. So really it's frowning face + ZWJ +
         | brown texture = brown frowning face. (Although technically I
         | don't think frowning face can be modified.) Also, there are
         | relatively few ZWJ combinations.
         | 
         | Technically, there are some older combination emojis that
         | predate ZWJ, mainly the flags, which are composed of two
         | single-letter emojis, e.g. regional-indicator-U + regional-
         | indicator-S = United States flag. So I guess it might be
         | possible to get a couple of those.
         | 
         | And in any case, I think this page assumes that you're staying
         | within the bounds of the basic multilingual plane (it mentions
         | a self-inverting transform would be ROT32768), which doesn't
         | include emojis or skin tone modifiers.
         | 
         | [1] https://emojipedia.org/emoji-zwj-sequence/
        
           | jfk13 wrote:
           | > Almost all combining rules (including skin tone modifiers)
           | require a zero-width joiner character
           | 
           | No, the skin tone modifiers apply directly to eligible person
           | emojis; no ZWJ is involved. (Unless _other_ modifiers that
           | require ZWJ are also present, such as the gender signs.)
           | 
           | https://unicode.org/emoji/charts/full-emoji-modifiers.html
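The difference between the kinds of sequence discussed in this subthread shows up when the code points are written out. The emoji below are picked purely as examples.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Skin tone: base emoji followed directly by a modifier, no joiner.
waving_hand_medium = "\U0001F44B" + "\U0001F3FD"

# ZWJ sequence: woman + ZWJ + laptop ("woman technologist").
woman_technologist = "\U0001F469" + ZWJ + "\U0001F4BB"

# Flag: two regional indicator symbols, also without a joiner.
us_flag = "\U0001F1FA" + "\U0001F1F8"

for label, seq in [("skin tone", waving_hand_medium),
                   ("zwj", woman_technologist),
                   ("flag", us_flag)]:
    names = [unicodedata.name(ch) for ch in seq]
    print(label, ZWJ in seq, names)
```

All of the non-joiner code points here are above U+FFFF, so a rotation limited to the BMP would leave them unchanged.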
        
       | maerF0x0 wrote:
       | This makes me wonder how many of the craigslist (or other
       | channel) posts all in Asian language characters are actually
       | secret messages?
        
         | genewitch wrote:
         | There's a couple of browser plugins that do this, with a
         | password, so long as someone else knows the password it will
         | decode. I'm not near my machine that has it, but I know it does
         | Korean, Japanese, and Chinese characters - you choose which set
         | you want. And it doesn't back-translate to anything useful,
         | it's just encoding.
        
       | ajanuary wrote:
       | Bar bs gur avpr dhnyvgvrf bs ebg13 vf gung vf fgvyy cerfreirf
       | fbzr fgehpgher. Nf jryy nf n pregnva nrfgurgvp nccrny, pbzzba
       | jbeqf va pbzzhavgvrf gung hfr vg urnivyl orpbzr erpbtavfnoyr.
       | Juvyr gung qbrf fyvtugyl qvzvavfu vg'f hfr nf n fcbvyre-grkg
       | zrpunavfz, vg qbrf nqq gb gur phygher.
       | 
       | Lei Shen Zi Lai Dang Dang Dang Du  Shen Zhe  Zi Zhuo Luo  Shen Zi
       | Zhuo Luo Lei  Zhuo Duan Zhe Si Du  Nu Lei Luo Xian Luo Lei Cun
       | Luo Xian  Fan Luo Xian Xian  Xian Zi Lei Ni Li Zi Ni Lei Luo
       | Hata Mi Ni Xian Zi  Xian Nu Duan Li Luo Xian Pai Yan  Ying Zhuo
       | Xu Xian  Shen Duan Di Luo Xian  Xu Zi  Cun Xu Xian Ni Duan Fan
       | Fan Kume  Shen Shen Lei Luo  Si Luo Zhe Xian Luo Du  Duan Zhe Si
       | Fan Luo Xian Xian  Zi Duan Zhe Zi Duan Fan Xu Xian Xu Zhe Yue
       | Duan Xian  Duan  Xian Nu Shen Xu Fan Luo Lei Lu Zi Luo Qian Zi
       | Shen Luo Li Zhuo Duan Zhe Xu Xian Shen Yan  Mi Ni Zi  [?] Shen
       | Lei Di Xu Zhe Yue  [?] Xu Zi Zhuo  Zhe Shen Zhe Lu Fan Duan Zi Xu
       | Zhe  Xian Li Lei Xu Nu Zi Xian  Lian Xu Xian Lian  Duan Nu Nu Luo
       | Duan Fan Xu Zhe Yue Yan  Zhou  Zi Zhuo Xu Zhe Di  Zi Zhuo Luo Lei
       | Luo  Xu Xian  Duan  Ni Xian Luo  Li Duan Xian Luo  Ying Shen Lei
       | Shen Zhe Fan Kume  Lei Shen Zi Duan Zi Xu Zhe Yue  Li Zhuo Duan
       | Lei Duan Li Zi Luo Lei Xian  Xu Zhe  Zi Zhuo Luo  Fan Luo Zi Zi
       | Luo Lei  Li Duan Zi Luo Yue Shen Lei Kume Yan
        
       | amptorn wrote:
       | This is a very bad idea because it's going to rotate ordinary
       | characters to code points where Unicode normalization has an
       | effect, including combining characters, whitespace, control
       | characters... After normalization, rotating back will produce
       | garbage.
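A concrete case of this failure mode, sketched in Python. Note the assumption: `rot_bmp` is a simplified stand-in that just adds 0x8000 modulo 0x10000 (by flipping the top bit) and skips nothing, so it is not the real rot8000 mapping.

```python
import unicodedata

def rot_bmp(text: str) -> str:
    """Simplified stand-in: flip the top bit of each BMP code point."""
    return "".join(
        chr(ord(c) ^ 0x8000) if ord(c) <= 0xFFFF else c for c in text
    )

original = "\ua12b"                                  # a Yi syllable
encoded = rot_bmp(original)                          # U+212B ANGSTROM SIGN
normalized = unicodedata.normalize("NFC", encoded)   # becomes U+00C5
decoded = rot_bmp(normalized)                        # U+80C5, not U+A12B

print(decoded == original)                    # False
print(rot_bmp(rot_bmp(original)) == original) # True
```

Without the normalization step the round trip works. The damage only appears once some piece of software in between normalizes the rotated text.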
        
         | Spooky23 wrote:
         | Depends on what you're trying to do. Might be a viable strategy
         | for avoiding filters that are aware of things like base64.
        
         | GekkePrutser wrote:
         | Oops I was just writing the same, I didn't realise someone had
         | already mentioned this.
         | 
         | But anyway ROT in itself is a pretty stupid idea anyway,
         | usually just done for show.
        
           | wutbrodo wrote:
           | > But anyway ROT in itself is a pretty stupid idea anyway,
           | usually just done for show.
           | 
           | How do you figure? It feels like the simplest way to handle
           | eg spoilers in a universally portable and widely-recognizable
           | way.
        
           | woodruffw wrote:
           | The website explains the primary actual use case for ROT-
           | style transforms:
           | 
           | > It is used to enclose the text in a sealed wrapper that the
           | reader must choose to open - e.g. for posting things that
           | might offend some readers, or spoilers.
           | 
           | AFAIK, this has been a common use of ROT13 since the 1980s.
           | It also preserves substring search and message length (unlike
           | BaseN encodings), which are occasionally useful properties.
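Both properties can be illustrated with rot13 against base64. This is a sketch with an arbitrary sample string; the variable names are illustrative.

```python
import base64
import codecs

message = "big spoiler ahead"
needle = "spoiler"

# rot13 maps character by character, so length and substrings survive.
rot_message = codecs.encode(message, "rot_13")
rot_needle = codecs.encode(needle, "rot_13")
same_length = len(rot_message) == len(message)
found_in_rot13 = rot_needle in rot_message

# base64 works on 3-byte groups, so an unaligned substring encodes differently.
b64_message = base64.b64encode(message.encode()).decode()
b64_needle = base64.b64encode(needle.encode()).decode().rstrip("=")
found_in_base64 = b64_needle in b64_message

print(same_length, found_in_rot13, found_in_base64)  # True True False
```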
        
             | hermitdev wrote:
             | Yeah, in other words: it's not intended to hold up to
             | scrutiny, just hold up to a glance.
        
               | Phrodo_00 wrote:
               | I don't get this line of thinking. Nowhere does it say
               | it's supposed to have any security uses.
        
         | shakna wrote:
         | > including combining characters, whitespace, control
         | characters...
         | 
         | It actually skips whitespace, control characters and surrogate
         | pairs [0].
         | 
         | [0]
         | https://github.com/rottytooth/rot8000/blob/main/Rottytooth.R...
        
       | SommaRaikkonen wrote:
       | For those who want to test it out: http://rot8000.com/Index
        
       | Gormisdomai wrote:
       | I'm curious why it successfully rotates some emoji but not
       | others.
       | 
       | E.g. stars and hearts get rotated but sunglasses do not
       | 
       | (EDIT: rewrote my example to use words because HN doesn't render
       | emoji, duh)
        
         | OskarS wrote:
         | From the article:
         | 
         | > While rot13 is the self-inverse for a 26-character system,
         | and rot47 for ANSI, the Basic Multilingual Plane of Unicode
         | requires rot32768 (or 8000 in hex) for a reciprical cypher
         | 
         | Not all emoji are in the BMP; at least some are in the
         | Supplementary Multilingual Plane.
         | 
         | It's weird to me that if you're gonna do this dumb "rot13 but
         | for Unicode", you'd only do it for the BMP, and not ALL of
         | Unicode.
        
         | jsjohnst wrote:
         | Technical answer:
         | 
         | Star = U+2B50 which is less than U+FFFF
         | 
         | Sunglasses = U+1F576 which is greater than U+FFFF
         | 
         | The detail you might be missing is that some emoji existed in
         | Unicode before color graphic "emoji" was actually a thing. The
         | stars (and hearts) are examples of ones which used to be just a
         | basic shape in the font but now are commonly full color
         | graphical "images".
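The same check in Python, as a sketch. The helper `in_bmp` is a made-up name, and the two code points are the ones given above.

```python
def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF

star = "\u2b50"            # U+2B50
sunglasses = "\U0001F576"  # U+1F576

print(in_bmp(star))        # True
print(in_bmp(sunglasses))  # False
```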
        
       | roamerz wrote:
       | I saw this and thought yay a new version of Rise of the Triad!
        
       ___________________________________________________________________
       (page generated 2021-09-22 23:01 UTC)