hngopher.com

       [HN Gopher] Whence '\n'?
       ___________________________________________________________________
        
       Whence '\n'?
        
       Author : lukastyrychtr
       Score  : 189 points
       Date   : 2024-10-05 09:23 UTC (1 days ago)
        
 (HTM) web link (rodarmor.com)
 (TXT) w3m dump (rodarmor.com)
        
       | cpach wrote:
       | Previous discussion:
       | https://news.ycombinator.com/item?id=41564527
        
       | nasso_dev wrote:
       | > This post was inspired by another post about exactly the same
       | thing. I couldn't find it when I looked for it, so I wrote this.
       | All credit to the original author for noticing how interesting
       | this rabbit hole is.
       | 
       | I think the author may be thinking of Ken Thompson's Turing Award
       | lecture "Reflections on Trusting Trust".
        
         | Karellen wrote:
         | Although that presentation does point out that the technique is
         | more generally used in quines. Given that there is a fair
         | amount of research, papers and commentary on quines, it's
         | possible that the author may have read something along those
         | lines.
         | 
         | https://en.wikipedia.org/wiki/Quine_(computing)
        
         | ktm5j wrote:
         | I totally missed that bit when reading the OP, but it definitely
         | made me think of that paper, so maybe.
        
         | yen223 wrote:
         | I don't think so. I too recall seeing a post about this exact
         | piece of trivia ('\n' in rust) years ago, but I couldn't find
         | the source anymore.
        
           | tylerhou wrote:
           | It might have been https://research.swtch.com/nih ?
        
             | yen223 wrote:
             | There's nothing in that article about Rust?
        
         | yuchi wrote:
         | Also have a read of this fabulous short story from 2009:
         | https://www.teamten.com/lawrence/writings/coding-machines/
        
       | kijin wrote:
       | The incorrect capitalization made me think that, perhaps, there's
       | a scarcely known escape sequence \N that is different from \n.
       | Maybe it matches any character that isn't a newline? Nope, just
       | small caps in the original article.
        
         | paulddraper wrote:
         | There is actually.
         | 
         | Many systems use \N in CSVs or similar as NULL, to distinguish
         | from an empty string.
         | 
         | I figured this is what the article was about?
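
A minimal Python sketch of the convention mentioned above, assuming a tab-separated text dump in the style of PostgreSQL's COPY text format, where a field consisting of the two characters \N stands for NULL. The name `parse_row` is hypothetical, and the sketch ignores the other escaping (tabs, backslashes) that real dump formats perform.

```python
from typing import List, Optional

NULL_MARKER = r"\N"  # backslash followed by capital N, two characters


def parse_row(line: str) -> List[Optional[str]]:
    """Split one tab-separated line, mapping the NULL marker to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if field == NULL_MARKER else field for field in fields]


# Second field is an empty string, third field is NULL.
row = parse_row("alice\t\t\\N\n")
print(row)
```

The empty string and the NULL marker stay distinguishable after parsing, which is the point of reserving \N.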
        
         | cpach wrote:
         | If you do view source, it's actually \n, but it's not displayed
         | as such because of this CSS rule:
         | 
         |     .title {
         |       font-variant: small-caps;
         |     }
        
           | sedatk wrote:
           | So, the HN title is wrong.
        
             | isatty wrote:
             | The original title is.
        
               | niederman wrote:
               | No, the original title is correct, small caps are just an
               | alternate way of setting lowercase letters.
        
               | neuroelectron wrote:
               | When have you ever seen small caps in use on this
               | website?
        
               | deathanatos wrote:
               | In addition to what others have said about smallcaps
               | being a stylistic rendering, if you copy & paste the
               | original title, you'll get
               | 
               |     Whence '\n'?
        
         | deathanatos wrote:
         | Python has a \N escape sequence. It inserts a Unicode character
         | by name. For example,
         | 
         |     '\N{PILE OF POO}'
         | 
         | is the Unicode string containing a single USV, the pile of poop
         | emoji.
         | 
         | Much more self-documenting than doing it with a hex sequence
         | with \u or \U.
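
A short Python check of the equivalence described above: the named escape, the 8-digit hex escape, and a run-time lookup by name should all yield the same one-character string. The variable names are illustrative only.

```python
import unicodedata

by_name = "\N{PILE OF POO}"                    # named escape in a literal
by_hex = "\U0001F4A9"                          # same code point, hex escape
looked_up = unicodedata.lookup("PILE OF POO")  # same lookup done at run time

same = by_name == by_hex == looked_up
code_point = hex(ord(by_name))
print(same, code_point, len(by_name))
```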
        
         | binary132 wrote:
         | That is in fact why I clicked this article. Oh well. Still a
         | fun read. :)
        
       | archmaster wrote:
       | if only this went into where the ocaml escape came from :)
        
         | diath wrote:
         | It does, it links to this:
         | https://github.com/ocaml/ocaml/blob/4d6ecfb5cf4a5da814784dee...
        
           | fiddlerwoaroof wrote:
           | But this doesn't really explain anything: '\010' isn't really
           | any more primitive than '\x0a': they're just different
           | representations of the same bit sequence
        
             | fluoridation wrote:
             | But it is more primitive than '\n', and can be rendered
             | into binary without any further arbitrary conversion steps
             | (arbitrary in that there's nothing in '\n' that says it
             | should mean 10). It's just "transform the number after the
             | backslash into the byte with that value".
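
The "transform the number after the backslash into the byte with that value" step can be sketched in Python as follows. This is an illustration, not OCaml's actual lexer: it assumes escapes of exactly three decimal digits, handles only that one escape form, and `decode_decimal_escapes` is a hypothetical name.

```python
import re

# Backslash followed by exactly three decimal digits, as in '\010'.
DECIMAL_ESCAPE = re.compile(r"\\([0-9]{3})")


def decode_decimal_escapes(text: str) -> bytes:
    """Replace each \\ddd with the byte whose value is the decimal number ddd."""
    def to_char(match):
        value = int(match.group(1), 10)
        if value > 255:
            raise ValueError(f"escape out of byte range: {match.group(0)}")
        return chr(value)

    # latin-1 maps code points 0-255 one-to-one onto byte values.
    return DECIMAL_ESCAPE.sub(to_char, text).encode("latin-1")


newline = decode_decimal_escapes(r"\010")
print(list(newline))
```

No lookup table is involved: the digits themselves carry the value, which is why this form is a natural end of the chain.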
        
       | dist-epoch wrote:
       | I remember a similar article for some C compiler, and it turned
       | out the only place the value 0x0A appeared was in the compiler
       | binary, because in the source code it had something like "\\\n"
       | -> "\n"
        
       | atoav wrote:
       | One rule of programming I figured out pretty quick is: if there
       | are two ways of doing it and there is a 50/50 chance of one being
       | correct and the other one isn't, chances are you will get it
       | wrong the first time.
        
         | chgs wrote:
         | The USB rule.
         | 
         | First time is the wrong way up
         | 
         | Second time is also the wrong way up
         | 
         | Third time works
        
           | fader wrote:
           | It's because of the quantum properties of USB connectors.
           | They have spin 1/2.
        
             | SAI_Peregrinus wrote:
             | I thought it was because USB connectors occupy 4 spatial
             | dimensions.
        
               | PaulDavisThe1st wrote:
               | That's good, because otherwise we'd never be able to find
               | them _when_ we need them.
        
               | inopinatus wrote:
               | Instead we always find a USB type mini B when needing a
               | micro B, a micro B when needing a type C, and a type C
               | when needing an extended micro B. If you reveal a spare
               | extended micro B whilst rummaging around then it will in
               | addition transpire that the next cable needed will be a
               | mini B, irrespective of any prior expectation you may
               | have held about the device in question.
               | 
               | A randomly occurring old-school full-size type B may be
               | encountered during any cable search, approximately 1% of
               | the time, usually at the same moment your printer jams.
               | 
               | What I really don't understand, however, is why I keep
               | finding DB13W3s in my closet
        
               | kstrauser wrote:
               | Just 3, plus 1 imaginary.
        
           | jancsika wrote:
           | It's like the Two Generals' Problem embedded in a single
           | connector.
           | 
           | You never _really_ know it's right until you take it out and
           | test the friction against the other orientation.
        
           | dtgriscom wrote:
           | I boosted my USB plugged-in-successfully-on-first-try rate
           | when I imagined the offset block in the cable male USB
           | connector as being heavy, so it should be below the
           | centerline when plugged into a laptop's female USB connector.
           | (Only works when the connector is horizontal, but better than
           | nothing.)
        
           | dailykoder wrote:
           | It's actually super easy and, at least for me, was always
           | intuitive. Most USB cables have their logo or something else
           | engraved on the "top" with the air gap. And since the ports
           | are mostly arranged the same way, there is rarely any
           | problem. Maybe I am just too dumb to understand jokes, but it
           | always confused me :(
        
             | switch007 wrote:
             | People don't always have perfect sight, lighting etc to see
             | it. Or know about that tip. Or remember what it signifies.
             | Often you're fumbling, doing 2 things at once.
        
             | gweinberg wrote:
             | It's really only the sideways ones which give people
             | trouble. Especially if it's sideways on the back of a
             | computer (or tv), so you can't really see what you're
             | doing.
        
               | crote wrote:
               | Desktop computers are fairly easy too. The vast majority
               | of towers have the motherboard on the right-hand side, so
               | that can be treated as the "down" direction USB-wise.
        
             | dfc wrote:
             | I can't find a reference now. But from what I remember the
             | logo is supposed to be on top facing the user when plugging
             | a device in. This was part of the standard that defined the
             | size/shape/etc of what USB is.
        
             | chupasaurus wrote:
             | Intel added the satiric text about the rule with double-
             | tongue depiction in one of their whitepapers around USB3
             | publication for a reason. Sadly couldn't find it.
        
           | crote wrote:
           | USB-C changed that to "It'll physically fit the first time,
           | but good luck figuring out if it's going to work!"
        
       | ncruces wrote:
       | I'm guessing the "other post" that inspired this might be:
       | https://research.swtch.com/nih
        
         | dang wrote:
         | Discussed here:
         | 
         |  _Running the "Reflections on Trusting Trust" Compiler_ -
         | https://news.ycombinator.com/item?id=38020792 - Oct 2023 (67
         | comments)
        
       | tzot wrote:
       | I always thought, maybe because of C, that \0??? is an octal
       | escape; so in my mind \012 is \x0a or 0x0a, and \010 is 0x08.
       | 
       | So I find this quite confusing; maybe OCaml does not have octal
       | escapes but decimal ones, and \09 is the Tab character. I haven't
       | checked.
        
         | dpassens wrote:
         | It is indeed a decimal escape:
         | https://ocaml.org/manual/5.2/lex.html#char-literal
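
To make the octal-versus-decimal confusion concrete, here is a small Python sketch that reads the same digit strings both ways. The function names are hypothetical; they only model how the digits are interpreted, not either language's full escape syntax.

```python
def c_style_value(digits: str) -> int:
    """Interpret the digits after a backslash as octal, as C does."""
    return int(digits, 8)


def ocaml_style_value(digits: str) -> int:
    """Interpret the digits after a backslash as decimal, as OCaml does."""
    return int(digits, 10)


for digits in ("010", "012"):
    print(digits, c_style_value(digits), ocaml_style_value(digits))
```

Under these assumptions a newline (value 10) is written \012 in the C reading and \010 in the OCaml reading, which matches the confusion described above.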
        
         | fanf2 wrote:
         | Yeah backslash-decimal character escapes are really rare, the
         | only string syntaxes I know of that have them are in O'Caml,
         | Lua, and DNS
        
           | binary132 wrote:
           | Is O'Caml an Irish fork of OCaml? :)
        
         | syncsynchalt wrote:
         | There's some truth in that direction, but it's not related to
         | backslash escapes (which are symbolic/mnemonic, \n is
         | "[Ne]wline", \r is "carriage [R]eturn", \t is "[T]ab", and so
         | on).
         | 
         | Instead, consider the convention of control characters, such as
         | ^C (interrupt), ^G (bell), or ^M (carriage return). Those are
         | characters in the C0 control set, where ^C is 0x03, ^G is 0x07,
         | and ^M is 0x0D. You're seeing a bit of cleverness that goes back
         | to pre-Unix days: to represent the invisible C0 characters in
         | ASCII, a terminal prepends the "^" character and prints the
         | character OR-0x40, shifting it into a visible range.
         | 
         | You may want to pull up an ASCII table such as
         | https://www.asciitable.com to follow along. Each control
         | character (first column) is mapped to the ^character two
         | columns over, on that table.
         | 
         | That's why \0 is represented with the odd choice of ^@, the
         | escape key becomes ^[, and other hard-to-remember equivalents.
         | These weren't choices made by Unix authors, they're artifacts
         | of ASCII numbering.
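
The caret mapping can be sketched in a few lines of Python. This version flips bit 0x40 with XOR, which for the C0 range (0x00 to 0x1F) gives the same result as adding 0x40 and also covers the conventional ^? for DEL. The name `caret_notation` is hypothetical.

```python
def caret_notation(code: int) -> str:
    """Return the ^X form of a C0 control character (0x00-0x1F) or DEL."""
    if not (0 <= code <= 0x1F or code == 0x7F):
        raise ValueError("not a control character")
    return "^" + chr(code ^ 0x40)


examples = {
    name: caret_notation(code)
    for name, code in [
        ("NUL", 0x00), ("ETX", 0x03), ("BEL", 0x07),
        ("CR", 0x0D), ("ESC", 0x1B), ("DEL", 0x7F),
    ]
}
print(examples)
```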
        
       | gjvc wrote:
       | this is a nothingburger of an article
        
       | coolio1232 wrote:
       | I thought this was going to be about '\N' but there's only '\n'
       | here.
        
         | dang wrote:
         | It's in the html doc title but the article doesn't deliver.
        
       | ynfnehf wrote:
       | First place I read about this idea (specifically newlines, not in
       | general trusting trust) was day 42 in
       | https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...
       | 
       | "For example, my compiler interprets "\n" (a sequence of
       | backslash and character "n") in a string literal as "\n" (a
       | newline character in this case). If you think about this, you
       | would find this a little bit weird, because it does not have
       | information as to the actual ASCII character code for "\n". The
       | information about the character code is not present in the source
       | code but passed on from a compiler compiling the compiler.
       | Newline characters of my compiler can be traced back to GCC which
       | compiled mine."
        
       | happytoexplain wrote:
       | This is over my head. Why did we need to take a trip to discover
       | why \n is encoded as a byte with the value 10? Isn't that
       | expected? The author and HN comments don't say, so I feel stupid.
        
         | kibwen wrote:
         | The point is to ask "who" encoded that byte as the value of 10.
         | If you're writing a parser and you parse a newline as the
         | escape sequence `\n`, then where did the value 10 come from? If
         | you instead parse a newline as the integer literal `10`, then
         | where does the actual binary value 1010 come from?
         | 
         | The ultimate point of this exercise is to alter your perception
         | of what a compiler is (in the same way as the famous
         | Reflections On Trusting Trust presentation).
         | 
         | Which is to say: your compiler is not something that _outputs_
         | your program; your compiler is also _input_ to your program.
         | And as a program itself, your compiler's compiler was an input
         | to your compiler, which makes it transitively an input to your
         | program, and the same is true of your compiler's compiler's
         | compiler, and your compiler's compiler's compiler's compiler,
         | and your compiler's compiler's compiler's compiler's compiler,
         | and...
        
         | mikl wrote:
         | The interesting point is how the value of 10 is not defined in
         | Rust's source code, but passed down as "word of mouth" from
         | compiler to compiler.
        
         | yen223 wrote:
         | If you had to rebuild the rust compiler from scratch, and all
         | you had was rustc's source code, there's nothing in the source
         | code to tell you what '\n' actually maps to.
         | 
         | It's an interesting real-world example of the Ken Thompson
         | hack.
        
         | crote wrote:
         | The thing is, why 10? Why not 9 or 11? The code says "if you
         | see 'string of newline character', output 'newline character'".
         | How does the compiler know what a newline character is? Its
         | code in turn just says "if you see 'string of newline
         | character', treat it as 'newline character'"...
         | 
         | As a human I can just Google "C string escape codes", but that
         | table is nowhere to be found inside the compiler. If C 2025 is
         | going to define Start of Heading as \h, is `'h' =>
         | cooked.push('\h')` going to magically start working? How could
         | it possibly know?
         | 
         | Clearly at some point someone must've manually programmed a
         | `'n' => 10` mapping, but _where is it_!?
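
A hedged Python sketch of the situation described above. The first table is "self-referential": the number 10 appears nowhere in it, so the meaning of the newline escape is inherited from whatever compiled this file. The second table is what a bootstrap stage would need, with the values spelled out as numbers. All names here are hypothetical and this is not rustc's actual code.

```python
# Values inherited from the host implementation: no numbers in sight.
SELF_REFERENTIAL = {"n": "\n", "t": "\t", "\\": "\\"}

# Values a first-generation compiler would have to state explicitly.
BOOTSTRAP = {"n": chr(10), "t": chr(9), "\\": chr(92)}


def unescape(source: str, table: dict) -> str:
    """Cook backslash escapes in source using the given escape table."""
    cooked = []
    chars = iter(source)
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, None)  # None means a trailing backslash
            if escaped not in table:
                raise ValueError(f"unknown escape: \\{escaped}")
            cooked.append(table[escaped])
        else:
            cooked.append(ch)
    return "".join(cooked)


cooked = unescape(r"line one\nline two", SELF_REFERENTIAL)
print(SELF_REFERENTIAL == BOOTSTRAP, ord(cooked[8]))
```

Both tables compare equal at run time, yet only the second one would let someone reconstruct the mapping from the source text alone. A hypothetical new escape such as \h would fail in the first style until some earlier stage had been taught its value numerically.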
        
       | amelius wrote:
       | Why backslash?
        
         | o11c wrote:
         | Because backslash is a modern invention with no prior meaning
         | in text. It was invented to allow writing the mathematical
         | "and" and "or" symbols as /\ and \/.
        
           | dTal wrote:
           | Hm. According to Wiki, "As of November 2022, efforts to
           | identify either the origin of this character or its purpose
           | before the 1960s have not been successful."
           | 
           | While your rationale _was_ used to argue for its inclusion in
           | ASCII, as an origin story however it is very unlikely, as
           | (according to wiki again):  "The earliest known reference
           | found to date is a 1937 maintenance manual from the Teletype
           | Corporation with a photograph showing the keyboard of its
           | Kleinschmidt keyboard perforator WPE-3 using the Wheatstone
           | system."
           | 
           | The Kleinschmidt keyboard perforator was used for sending
           | telegraphs, and is not well equipped with mathematical
           | symbols, or indeed any symbols at all besides forward slash,
           | backslash, question mark, and equals sign. Not even period!
        
       | amelius wrote:
       | A more interesting question: what would our code look like if
       | ASCII (or strings in general) didn't have escape codes?
        
         | wongarsu wrote:
         | In PHP you see a lot of print('Hello World' . PHP_EOL); where
         | the dot is string concatenation (underrated choice imho) and
         | PHP_EOL is a predefined constant that maps to \n or \r\n
         | depending on platform. You could easily extend that to have
         | global constants for all non-printable ascii characters.
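
The same idea can be sketched in Python (which offers `os.linesep` as a rough counterpart to PHP_EOL). Here every C0 control character gets a named constant built from its number, so no backslash escape is needed anywhere. The names `C0_NAMES` and `CONTROL` are hypothetical; the abbreviations are the standard ASCII ones.

```python
# Standard abbreviations for the 32 C0 control characters, in code order.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

# Build each constant from its position in the list.
CONTROL = {name: chr(code) for code, name in enumerate(C0_NAMES)}

greeting = "Hello World" + CONTROL["LF"]
print(repr(greeting))
```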
        
         | jiehong wrote:
         | The font you use could choose to display something for control
         | characters, so they would have a visible shape on top of having
         | a meaning.
         | 
         | Perhaps like [0] (Unicode notation).
         | 
         | [0]: https://rootr.net/im/ASCII/ASCII.gif
        
         | zokier wrote:
         | It would depend more on what we are intending to do, are we
         | controlling a terminal, or are we writing to a file (with
         | specific format).
         | 
         | Terminal control is fairly easy answer, there would be some
         | other API to control cursor position, so the code would need to
         | call some function to move the cursor to next line.
         | 
         | For files, it would depend on what the format is. So we might
         | be writing just `<p>hello world</p>` instead of `hello
         | world\n`. In fact I find it bit weird that we are using
         | teletype (and telegraph etc) control protocol (what ASCII
         | mostly is) as our "general purpose text" format; it doesn't
         | make much sense to me.
        
         | ivanjermakov wrote:
         | This is actually a valid problem when writing quines[1]. You
         | need to escape string delimiters in a way without using its
         | literal.
         | 
         | This is what `chr(39)` is for in the following Python quine:
         | 
         |     a = 'a = {}{}{}; print(a.format(chr(39), a, chr(39)))'; print(a.format(chr(39), a, chr(39)))
         | 
         | [1]: https://en.wikipedia.org/wiki/Quine_(computing)
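
One way to check the quine above is to run it and compare its output with its own source. The sketch below does that in Python, assuming the quine is written on a single line; `run_and_capture` is a hypothetical helper.

```python
import contextlib
import io

# The one-line quine from the comment above, stored as a string.
QUINE = (
    "a = 'a = {}{}{}; print(a.format(chr(39), a, chr(39)))'; "
    "print(a.format(chr(39), a, chr(39)))"
)


def run_and_capture(source: str) -> str:
    """Execute source in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = run_and_capture(QUINE)
is_quine = output == QUINE + "\n"  # print() appends one trailing newline
print(is_quine)
```

Using `chr(39)` lets the program emit a single quote without ever writing one inside its own string literal, which is the escaping problem the comment describes.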
        
       | binary132 wrote:
       | Cool, but actually it was just 0x0A all along! The symbolic
       | representation was always just an alias. It didn't actually go
       | through 81 generations of rustc to get back to "where it really
       | came from", as you'd be able to see if you could see the real
       | binary behind the symbols. Yes I am just being a bit silly for
       | fun, but at the same time, I know I personally often commit the
       | error of imagining that my representations and abstractions are
       | the real thing, and forgetting that they're really just
       | headstuff.
        
       | i4k wrote:
       | This is fascinating and terrifying.
        
       | phibz wrote:
       | Backslash escape codes are a convention. They're so pervasive
       | that we sometimes forget this. It could just as easily be some
       | other character that is special and treated as an escape token.
        
       | gnulinux wrote:
       | This is a fascinating post. It reads to me like some kind of
       | cross between literate-programming and poetry. It's really trying
       | to explain the idea that when you run `just foo` the very 0x0A
       | byte comes from possibly hundreds of cycles of code generation.
       | Back in the day, someone encoded this information into OCaml
       | compiler -- _somehow_ -- and years later here in my computer 0x0A
       | information is stored due to this history.
       | 
       | But the way in which this phenomenon is explained is via actual
       | code. The code itself is beside the point of course, it's not
       | like anyone will ever run or compile this specific code, but it's
       | put there for humans to follow the discussion.
        
       ___________________________________________________________________
       (page generated 2024-10-06 23:00 UTC)