[HN Gopher] Whence '\n'?
___________________________________________________________________
Whence '\n'?
Author : lukastyrychtr
Score : 189 points
Date : 2024-10-05 09:23 UTC (1 days ago)
(HTM) web link (rodarmor.com)
(TXT) w3m dump (rodarmor.com)
| cpach wrote:
| Previous discussion:
| https://news.ycombinator.com/item?id=41564527
| nasso_dev wrote:
| > This post was inspired by another post about exactly the same
| thing. I couldn't find it when I looked for it, so I wrote this.
| All credit to the original author for noticing how interesting
| this rabbit hole is.
|
| I think the author may be thinking of Ken Thompson's Turing Award
| lecture "Reflections on Trusting Trust".
| Karellen wrote:
| Although that presentation does point out that the technique is
| more generally used in quines. Given that there is a fair
| amount of research, papers and commentary on quines, it's
| possible that the author may have read something along those
| lines.
|
| https://en.wikipedia.org/wiki/Quine_(computing)
| ktm5j wrote:
| I totally missed that bit when reading the OP, but it definitely
| made me think of that paper, so maybe.
| yen223 wrote:
| I don't think so. I too recall seeing a post about this exact
| piece of trivia ('\n' in rust) years ago, but I couldn't find
| the source anymore.
| tylerhou wrote:
| It might have been https://research.swtch.com/nih ?
| yen223 wrote:
| There's nothing in that article about Rust?
| yuchi wrote:
| Also have a read of this fabulous short web from 2009:
| https://www.teamten.com/lawrence/writings/coding-machines/
| kijin wrote:
| The incorrect capitalization made me think that, perhaps, there's
| a scarcely known escape sequence \N that is different from \n.
| Maybe it matches any character that isn't a newline? Nope, just
| small caps in the original article.
| paulddraper wrote:
| There is actually.
|
| Many systems use \N in CSVs or similar as NULL, to distinguish
| from an empty string.
|
| I figured this is what the article was about?
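|
| As a rough illustration of that convention (a sketch only: the
| tab delimiter and the helper name parse_fields are assumptions
| for the example, and real dump formats also have other escapes
| that this ignores):

```python
# Marker that some tab-separated dump formats use for NULL.
NULL_MARKER = r'\N'


def parse_fields(line: str, delimiter: str = '\t') -> list:
    """Split one line into fields, mapping the NULL marker to None.

    An empty field stays an empty string, so NULL and '' remain distinct.
    """
    return [None if field == NULL_MARKER else field
            for field in line.split(delimiter)]


row = parse_fields('1\t\\N\t')
print(row)
```

| The point is only that \N and the empty string come out as
| different values.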
| cpach wrote:
| If you do view source, it's actually \n, but it's not displayed
| as such because of this CSS rule: .title {
| font-variant: small-caps; }
| sedatk wrote:
| So, the HN title is wrong.
| isatty wrote:
| The original title is.
| niederman wrote:
| No, the original title is correct, small caps are just an
| alternate way of setting lowercase letters.
| neuroelectron wrote:
| When have you ever seen small caps in use on this
| website?
| deathanatos wrote:
| In addition to what others have said about smallcaps
| being a stylistic rendering, if you copy & paste the
| original title, you'll get Whence '\n'?
| deathanatos wrote:
| Python has a \N escape sequence. It inserts a Unicode character
| by name. For example, '\N{PILE OF POO}'
|
| is the Unicode string containing a single USV, the pile of poop
| emoji.
|
| Much more self-documenting than doing it with a hex sequence
| with \u or \U.
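|
| A small check of the equivalence, using only the standard
| library (the variable names are just for illustration):

```python
import unicodedata

# '\N{...}' selects a character by its Unicode name.
by_name = '\N{PILE OF POO}'
# The same code point written with an eight-digit hex escape.
by_hex = '\U0001F4A9'

# Both literals denote the same one-character string.
same = by_name == by_hex
# unicodedata.name() goes the other way, from character to name.
name = unicodedata.name(by_hex)
print(same, name)
```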
| binary132 wrote:
| That is in fact why I clicked this article. Oh well. Still a
| fun read. :)
| archmaster wrote:
| if only this went into where the ocaml escape came from :)
| diath wrote:
| It does, it links to this:
| https://github.com/ocaml/ocaml/blob/4d6ecfb5cf4a5da814784dee...
| fiddlerwoaroof wrote:
| But this doesn't really explain anything: '\010' isn't really
| any more primitive than '\x0a': they're just different
| representations of the same bit sequence
| fluoridation wrote:
| But it is more primitive than '\n', and can be rendered
| into binary without any further arbitrary conversion steps
| (arbitrary in that there's nothing in '\n' that says it
| should mean 10). It's just "transform the number after the
| backslash into the byte with that value".
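|
| That "number after the backslash" step might be sketched like
| this. It is not the OCaml lexer's actual code, and the function
| name is made up; note that int() still leans on the host
| language to know what the digits mean.

```python
def decode_decimal_escape(literal: str) -> bytes:
    """Turn a backslash followed by three decimal digits into one byte."""
    digits = literal[1:]
    well_formed = (
        len(literal) == 4
        and literal[0] == '\\'
        and digits.isascii()
        and digits.isdigit()
    )
    if not well_formed:
        raise ValueError(f'not a decimal escape: {literal!r}')
    value = int(digits, 10)  # base 10, unlike C's octal escapes
    if value > 255:
        raise ValueError(f'escape out of byte range: {literal!r}')
    return bytes([value])


newline = decode_decimal_escape('\\010')
print(newline)
```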
| dist-epoch wrote:
| I remember a similar article for some C compiler, and it turned
| out the only place the value 0x0A appeared was in the compiler
| binary, because in the source code it had something like "\\\n"
| -> "\n"
| atoav wrote:
| One rule of programming I figured out pretty quick is: if there
| are two ways of doing it and there is a 50/50 chance of one being
| correct and the other one isn't, chances are you will get it
| wrong the first time.
| chgs wrote:
| The USB rule.
|
| First time is the wrong way up
|
| Second time is also the wrong way up
|
| Third time works
| fader wrote:
| It's because of the quantum properties of USB connectors.
| They have spin 1/2.
| SAI_Peregrinus wrote:
| I thought it was because USB connectors occupy 4 spatial
| dimensions.
| PaulDavisThe1st wrote:
| That's good, because otherwise we'd never be able to find
| them _when_ we need them.
| inopinatus wrote:
| Instead we always find a USB type mini B when needing a
| micro B, a micro B when needing a type C, and a type C
| when needing an extended micro B. If you reveal a spare
| extended micro B whilst rummaging around then it will in
| addition transpire that the next cable needed will be a
| mini B, irrespective of any prior expectation you may
| have held about the device in question.
|
| A randomly occurring old-school full-size type B may be
| encountered during any cable search, approximately 1% of
| the time, usually at the same moment your printer jams.
|
| What I really don't understand, however, is why I keep
| finding DB13W3s in my closet
| kstrauser wrote:
| Just 3, plus 1 imaginary.
| jancsika wrote:
| It's like the Two Generals' Problem embedded in a single
| connector.
|
| You never _really_ know it's right until you take it out and
| test the friction against the other orientation.
| dtgriscom wrote:
| I boosted my USB plugged-in-successfully-on-first-try rate
| when I imagined the offset block in the cable male USB
| connector as being heavy, so it should be below the
| centerline when plugged into a laptop's female USB connector.
| (Only works when the connector is horizontal, but better than
| nothing.)
| dailykoder wrote:
| It's actually super easy and, at least for me, was always
| intuitive. Most USB cables have their logo or something else
| engraved on the "top" with the air gap. And since the ports
| are mostly arranged the same way, there is rarely any
| problem. Maybe I am just too dumb to understand jokes, but it
| always confused me :(
| switch007 wrote:
| People don't always have perfect sight, lighting etc to see
| it. Or know about that tip. Or remember what it signifies.
| Often you're fumbling, doing 2 things at once.
| gweinberg wrote:
| It's really only the sideways ones which give people
| trouble. Especially if it's sideways on the back of a
| computer (or TV), so you can't really see what you're
| doing.
| crote wrote:
| Desktop computers are fairly easy too. The vast majority
| of towers have the motherboard on the right-hand side, so
| that can be treated as the "down" direction USB-wise.
| dfc wrote:
| I can't find a reference now. But from what I remember the
| logo is supposed to be on top facing the user when plugging
| a device in. This was part of the standard that defined the
| size/shape/etc of what USB is.
| chupasaurus wrote:
| Intel added the satiric text about the rule with double-
| tongue depiction in one of their whitepapers around USB3
| publication for a reason. Sadly couldn't find it.
| crote wrote:
| USB-C changed that to "It'll physically fit the first time,
| but good luck figuring out if it's going to work!"
| ncruces wrote:
| I'm guessing the "other post" that inspired this might be:
| https://research.swtch.com/nih
| dang wrote:
| Discussed here:
|
| _Running the "Reflections on Trusting Trust" Compiler_ -
| https://news.ycombinator.com/item?id=38020792 - Oct 2023 (67
| comments)
| tzot wrote:
| I always thought, maybe because of C, that \0??? is an octal
| escape; so in my mind \012 is \x0a or 0x0a, and \010 is 0x08.
|
| So I find this quite confusing; maybe OCaml does not have octal
| escapes but decimal ones, and \09 is the Tab character. I haven't
| checked.
| dpassens wrote:
| It is indeed a decimal escape:
| https://ocaml.org/manual/5.2/lex.html#char-literal
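|
| The two readings of the same three digits, side by side (Python
| is used here only as a calculator; its own '\012' happens to be
| an octal escape like C's):

```python
digits = '010'

# C-style reading: the digits are octal, so this is 8 (backspace).
as_octal = int(digits, 8)
# OCaml-style reading: the digits are decimal, so this is 10 (line feed).
as_decimal = int(digits, 10)

# Python follows the C convention, so octal 012 is the line feed.
python_octal_newline = '\012'

print(as_octal, as_decimal, python_octal_newline == '\n')
```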
| fanf2 wrote:
| Yeah backslash-decimal character escapes are really rare, the
| only string syntaxes I know of that have them are in O'Caml,
| Lua, and DNS
| binary132 wrote:
| Is O'Caml an Irish fork of OCaml? :)
| syncsynchalt wrote:
| There's some truth in that direction, but it's not related to
| backslash escapes (which are symbolic/mnemonic, \n is
| "[Ne]wline", \r is "carriage [R]eturn", \t is "[T]ab", and so
| on).
|
| Instead, consider the convention of control characters, such as
| ^C (interrupt), ^G (bell), or ^M (carriage return). Those are
| characters in the C0 control set, where ^C is 0x03, ^G is 0x07,
| and ^M is 0x0D. You're seeing a bit of cleverness that goes back
| to pre-Unix days: to represent the invisible C0 characters in
| ASCII, a terminal prepends the "^" character and prints the
| character XOR 0x40, shifting it into a visible range.
|
| You may want to pull up an ASCII table such as
| https://www.asciitable.com to follow along. Each control
| character (first column) is mapped to the ^character two
| columns over, on that table.
|
| That's why \0 is represented with the odd choice of ^@, the
| escape key becomes ^[, and other hard-to-remember equivalents.
| These weren't choices made by Unix authors, they're artifacts
| of ASCII numbering.
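|
| A minimal sketch of that mapping, flipping bit 0x40 (for the C0
| range this is the same as adding 0x40, and it also yields the
| conventional ^? for DEL). The function name is made up for the
| example:

```python
def caret_notation(ch: str) -> str:
    """Render a control character as '^' plus its visible partner."""
    code = ord(ch)
    if code < 0x20 or code == 0x7F:
        # Flip bit 0x40: 0x03 becomes 0x43 ('C'), 0x00 becomes 0x40 ('@').
        return '^' + chr(code ^ 0x40)
    # Printable characters are shown unchanged.
    return ch


for sample in ['\x00', '\x03', '\x07', '\r', '\x1b']:
    print(hex(ord(sample)), caret_notation(sample))
```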
| gjvc wrote:
| this is a nothingburger of an article
| coolio1232 wrote:
| I thought this was going to be about '\N' but there's only '\n'
| here.
| dang wrote:
| It's in the html doc title but the article doesn't deliver.
| ynfnehf wrote:
| First place I read about this idea (specifically newlines, not in
| general trusting trust) was day 42 in
| https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...
|
| "For example, my compiler interprets "\n" (a sequence of
| backslash and character "n") in a string literal as "\n" (a
| newline character in this case). If you think about this, you
| would find this a little bit weird, because it does not have
| information as to the actual ASCII character code for "\n". The
| information about the character code is not present in the source
| code but passed on from a compiler compiling the compiler.
| Newline characters of my compiler can be traced back to GCC which
| compiled mine."
| happytoexplain wrote:
| This is over my head. Why did we need to take a trip to discover
| why \n is encoded as a byte with the value 10? Isn't that
| expected? The author and HN comments don't say, so I feel stupid.
| kibwen wrote:
| The point is to ask "who" encoded that byte as the value of 10.
| If you're writing a parser and you parse a newline as the
| escape sequence `\n`, then where did the value 10 come from? If
| you instead parse a newline as the integer literal `10`, then
| where does the actual binary value 1010 come from?
|
| The ultimate point of this exercise is to alter your perception
| of what a compiler is (in the same way as the famous
| Reflections On Trusting Trust presentation).
|
| Which is to say: your compiler is not something that _outputs_
| your program; your compiler is also _input_ to your program.
| And as a program itself, your compiler's compiler was an input
| to your compiler, which makes it transitively an input to your
| program, and the same is true of your compiler's compiler's
| compiler, and your compiler's compiler's compiler's compiler,
| and your compiler's compiler's compiler's compiler's compiler,
| and...
| mikl wrote:
| The interesting point is how the value of 10 is not defined in
| Rust's source code, but passed down as "word of mouth" from
| compiler to compiler.
| yen223 wrote:
| If you had to rebuild the rust compiler from scratch, and all
| you had was rustc's source code, there's nothing in the source
| code to tell you what '\n' actually maps to.
|
| It's an interesting real-world example of the Ken Thompson
| hack.
| crote wrote:
| The thing is, why 10? Why not 9 or 11? The code says "if you
| see 'string of newline character', output 'newline character'".
| How does the compiler know what a newline character is? Its
| code in turn just says "if you see 'string of newline
| character', treat it as 'newline character'"...
|
| As a human I can just Google "C string escape codes", but that
| table is nowhere to be found inside the compiler. If C 2025 is
| going to define Start of Heading as \h, is `'h' =>
| cooked.push('\h')` going to magically start working? How could
| it possibly know?
|
| Clearly at some point someone must've manually programmed a
| `'n' => 10` mapping, but _where is it_!?
| amelius wrote:
| Why backslash?
| o11c wrote:
| Because backslash is a modern invention with no prior meaning
| in text. It was invented to allow writing the mathematical
| "and" and "or" symbols as /\ and \/.
| dTal wrote:
| Hm. According to Wiki, "As of November 2022, efforts to
| identify either the origin of this character or its purpose
| before the 1960s have not been successful."
|
| While your rationale _was_ used to argue for its inclusion in
| ASCII, as an origin story however it is very unlikely, as
| (according to wiki again): "The earliest known reference
| found to date is a 1937 maintenance manual from the Teletype
| Corporation with a photograph showing the keyboard of its
| Kleinschmidt keyboard perforator WPE-3 using the Wheatstone
| system."
|
| The Kleinschmidt keyboard perforator was used for sending
| telegraphs, and is not well equipped with mathematical
| symbols, or indeed any symbols at all besides forward slash,
| backslash, question mark, and equals sign. Not even period!
| amelius wrote:
| A more interesting question: what would our code look like if
| ASCII (or strings in general) didn't have escape codes?
| wongarsu wrote:
| In PHP you see a lot of print('Hello World' . PHP_EOL); where
| the dot is string concatenation (underrated choice imho) and
| PHP_EOL is a predefined constant that maps to \n or \r\n
| depending on platform. You could easily extend that to have
| global constants for all non-printable ascii characters.
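|
| Roughly the same idea in Python, as a sketch: os.linesep is the
| real standard-library counterpart of PHP_EOL, while the LF, CR
| and TAB names are made up for the example.

```python
import os

# Named constants instead of escape sequences.
LF = chr(10)
CR = chr(13)
TAB = chr(9)

# Platform line ending, analogous to PHP_EOL.
EOL = os.linesep

message = 'Hello World' + LF
print(message.encode('ascii'))
```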
| jiehong wrote:
| The font you use could choose to display something for control
| characters, so they would have a visible shape on top of having
| a meaning.
|
| Perhaps like [0] (Unicode notation).
|
| [0]: https://rootr.net/im/ASCII/ASCII.gif
| zokier wrote:
| It would depend more on what we are intending to do, are we
| controlling a terminal, or are we writing to a file (with
| specific format).
|
| Terminal control is fairly easy answer, there would be some
| other API to control cursor position, so the code would need to
| call some function to move the cursor to next line.
|
| For files, it would depend on what the format is. So we might
| be writing just `<p>hello world</p>` instead of `hello
| world\n`. In fact I find it bit weird that we are using
| teletype (and telegraph etc) control protocol (what ASCII
| mostly is) as our "general purpose text" format; it doesn't
| make much sense to me.
| ivanjermakov wrote:
| This is actually a valid problem when writing quines[1]. You
| need to escape string delimiters in a way without using its
| literal.
|
| This is what `chr(39)` is for in the following Python quine:
| a = 'a = {}{}{}; print(a.format(chr(39), a, chr(39)))';
| print(a.format(chr(39), a, chr(39)))
|
| [1]: https://en.wikipedia.org/wiki/Quine_(computing)
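|
| One way to check that the snippet really reproduces itself,
| assuming it is meant to be a single line (the dump above wraps
| it). The helper name is made up for the example:

```python
import contextlib
import io

# The one-line quine from the comment, held as data so it can be executed.
QUINE = "a = 'a = {}{}{}; print(a.format(chr(39), a, chr(39)))'; print(a.format(chr(39), a, chr(39)))"


def run_and_capture(source: str) -> str:
    """Execute source in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = run_and_capture(QUINE)
# print() appends one newline; apart from that the output is the source.
is_quine = output == QUINE + '\n'
print(is_quine)
```

| chr(39) is what lets the program emit a single quote without
| ever writing one inside its own string literal.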
| binary132 wrote:
| Cool, but actually it was just 0x0A all along! The symbolic
| representation was always just an alias. It didn't actually go
| through 81 generations of rustc to get back to "where it really
| came from", as you'd be able to see if you could see the real
| binary behind the symbols. Yes I am just being a bit silly for
| fun, but at the same time, I know I personally often commit the
| error of imagining that my representations and abstractions are
| the real thing, and forgetting that they're really just
| headstuff.
| i4k wrote:
| This is fascinating and terrifying.
| phibz wrote:
| Backslash escape codes are a convention. They're so pervasive
| that we sometimes forget this. It could just as easily be some
| other character that is special and treated as an escape token.
| gnulinux wrote:
| This is a fascinating post. It reads to me like some kind of
| cross between literate-programming and poetry. It's really trying
| to explain the idea that when you run `just foo` the very 0x0A
| byte comes from possibly hundreds of cycles of code generation.
| Back in the day, someone encoded this information into the OCaml
| compiler -- _somehow_ -- and years later here in my computer 0x0A
| information is stored due to this history.
|
| But the way in which this phenomenon is explained is via actual
| code. The code itself is beside the point of course, it's not
| like anyone will ever run or compile this specific code, but it's
| put there for humans to follow the discussion.
___________________________________________________________________
(page generated 2024-10-06 23:00 UTC)