hngopher.com

       [HN Gopher] The Elegance of the ASCII Table
       ___________________________________________________________________
        
       The Elegance of the ASCII Table
        
       Author : thewub
       Score  : 256 points
       Date   : 2024-07-22 22:31 UTC (1 days ago)
        
 (HTM) web link (danq.me)
        
       | lucasoshiro wrote:
       | Once I saw a case-insensitive switch in C using that pattern of
       | letters:
       | 
       |     switch (my_char | 0x20) {
       |         case 'a': /* ... */ break;
       |         case 'b': /* ... */ break;
       |     }
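A minimal C sketch of the `| 0x20` trick, assuming 7-bit ASCII input. The names `ascii_isalpha`, `ascii_tolower`, `ascii_toupper` and `ascii_casecmp` are hypothetical helpers, not library functions. The letter check matters because OR-ing 0x20 into a non-letter such as `[` (0x5B) would silently turn it into `{` (0x7B).

```c
/* Hypothetical helpers: ASCII-only case folding via bit 5 (0x20). */
static int ascii_isalpha(int c) {
    int folded = c | 0x20;          /* 'A'..'Z' and 'a'..'z' both land in 'a'..'z' */
    return folded >= 'a' && folded <= 'z';
}

int ascii_tolower(int c) {
    return ascii_isalpha(c) ? (c | 0x20) : c;   /* set the case bit */
}

int ascii_toupper(int c) {
    return ascii_isalpha(c) ? (c & ~0x20) : c;  /* clear the case bit */
}

/* Case-insensitive comparison in the spirit of stricmp/strcasecmp. */
int ascii_casecmp(const char *a, const char *b) {
    for (;; a++, b++) {
        int ca = ascii_tolower((unsigned char)*a);
        int cb = ascii_tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}
```

Unlike `tolower()` from `<ctype.h>`, these ignore the current locale, which can be handy for things like protocol keywords.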
        
         | mananaysiempre wrote:
         | This can be made to work for ASCII and EBCDIC simultaneously
         | for extra esoterica points:
         | 
         |     switch (my_char | ('A' ^ 'a')) {
         |         case 'A' | 'a': /* ... */ break;
         |         /* ... */
         |     }
         | 
         | I don't know if this is too fancy to have ever made it into
         | real code, but I believe I've seen places in the ICU source
         | that still say ('A' <= x <= 'I' || 'J' <= x <= 'R' || 'S' <= x
         | <= 'Z') instead of just ('A' <= x <= 'Z'), EBCDIC letters being
         | arranged in those three contiguous ranges.
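A sketch of the encoding-neutral variant, assuming (as the comment does) that the two cases of each letter differ in exactly one bit: 0x20 in ASCII and 0x40 in EBCDIC, where `'A'` is 0xC1 and `'a'` is 0x81. `CASE_BIT`, `case_key` and `is_upper_portable` are hypothetical names; the range test mirrors the three EBCDIC letter blocks and still works unchanged in ASCII.

```c
/* The bit separating 'A' from 'a' in the execution character set:
   0x20 in ASCII, 0x40 in EBCDIC. Assumed to be one bit for every letter. */
#define CASE_BIT ('A' ^ 'a')

/* Case-insensitive key: lowercase in ASCII, uppercase in EBCDIC.
   Only meaningful for letters, just like the switch above. */
int case_key(int c) {
    return c | CASE_BIT;
}

/* Uppercase test valid in both encodings: EBCDIC has gaps after 'I' and 'R'. */
int is_upper_portable(int c) {
    return ('A' <= c && c <= 'I') ||
           ('J' <= c && c <= 'R') ||
           ('S' <= c && c <= 'Z');
}
```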
        
         | Sharlin wrote:
         | Yes, that's very intentional and just masking (or setting) the
         | bit is the intended way to do case-insensitive comparison of
         | the letter range in ASCII (eg. stricmp in C), or to transform
         | text to lower or upper case (tolower, toupper).
         | 
         | But what's more, ever wondered where _control_ (Ctrl) key
         | presses like Ctrl-H for backspace, or Ctrl-M for carriage
         | return, come from? Well, inspecting the ASCII chart it becomes
         | evident: the Ctrl key simply clears bit 6 (0x40), turning an
         | uppercase letter into its respective _control_ character! (In
         | practice terminals keep only the low five bits, i.e. AND the
         | key's code with 0x1F, so lowercase letters work too.)
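A small sketch of the Ctrl mapping, assuming a terminal that follows the common convention of keeping only the low five bits of the key's code; for `@`, `A`-`Z` and `[ \ ] ^ _` (0x40-0x5F) that is the same as clearing bit 6. `ctrl` and `caret` are hypothetical names; `caret` produces the `^X` caret notation that tools such as `cat -v` display, where DEL is shown as `^?`.

```c
/* Ctrl+key as most terminals send it: keep the low five bits. */
int ctrl(int key) {
    return key & 0x1F;
}

/* Caret notation for a control code (0x00-0x1F or 0x7F): flipping bit 6
   maps 0x08 to 'H' and DEL (0x7F) to '?'. Writes 2 chars + NUL into out. */
void caret(int code, char out[3]) {
    out[0] = '^';
    out[1] = (char)(code ^ 0x40);
    out[2] = '\0';
}
```

The same arithmetic covers the other examples in this thread: C-l is 0x0C (form feed), C-@ is 0 (NUL), Ctrl-A is 1.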
        
           | lucasoshiro wrote:
           | Nice!
           | 
           | I'm an emacs user, and when I use a readline-based REPL I use
           | ctrl-M a lot. I thought it was inherited from the emacs
           | keybindings, like many other shortcuts from GNU readline
        
             | jerf wrote:
             | Then an additional useful command: In the out-of-the-box
             | emacs bindings, C-q is the "quoted insert" command. It will
             | take the next character and directly insert it into the
             | buffer. This is useful for things like tab or control
             | characters where emacs would normally use the keystroke to
             | do something else. I've been working in an email-related
             | space lately so I've been doing a good amount of C-q C-m
             | for inserting literal CRs, and C-q TAB for a few places
             | where I want a literal tab in the source, in a buffer that
             | interprets a normal TAB as a command to indent the
             | current row. I mention this because you can use the ASCII
             | table to work out how to insert a particular control
             | character with your keyboard literally, if you need to
             | insert one of the handful of other characters you may be
             | interested in every so often, like C-l for "form feed" (now
             | used for "page feed" in some older printer-related
             | contexts) or C-@ for NUL if you're doing something weird
             | with binary files in a "text" buffer.
        
           | flohofwoe wrote:
           | ...it's a bit of a shame that the same upper/lowercase trick
           | doesn't apply to all UNICODE codepoints (at least those that
           | have upper/lower variants).
           | 
           | It seems to work for codepoints up to U+00FF, for instance:
           | - Å (U+00C5) vs å (U+00E5)
           | 
           | ...but above 0xFF lowercase follows uppercase:
           | - Ă (U+0102) vs ă (U+0103)
           | 
           | Typical for UNICODE though, nothing makes sense ;)
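A deliberately partial sketch of why a single bit stops working above U+00FF. It assumes code points are plain `uint32_t` values and handles only ASCII, Latin-1 and the regular run U+0100-U+012F of Latin Extended-A, where uppercase sits on the even code point and lowercase on the following odd one. `to_lower_latin` is a hypothetical name; real code should use the full Unicode case mapping tables (for example via ICU) instead.

```c
#include <stdint.h>

/* Hypothetical, incomplete lowercase mapping for two small ranges only. */
uint32_t to_lower_latin(uint32_t cp) {
    /* ASCII and Latin-1: the two cases differ by 0x20.
       U+00D7 MULTIPLICATION SIGN sits in the gap and has no case. */
    if ((cp >= 'A' && cp <= 'Z') ||
        (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7))
        return cp + 0x20;
    /* U+0100..U+012F: adjacent pairs, uppercase even, lowercase odd. */
    if (cp >= 0x100 && cp <= 0x12F)
        return cp | 1;
    return cp;  /* everything else is left unchanged in this sketch */
}
```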
        
             | Findecanor wrote:
             | That's because U+00A0-U+00FF encode an earlier
             | character set: "ISO Latin-1" (ISO 8859-1), itself based on
             | DEC's "Multinational Character Set". The upper/lowercase
             | trick does not apply to ß/ÿ in Latin-1, but it does in MCS,
             | where Ÿ/ÿ are at a different pair of code points.
             | 
             | ISO Latin-1 was the character set on many Unix systems,
             | Amiga OS, MS-Windows (as "Windows-1252" with extra chars),
             | and was for many years the default character set on the
             | web.
        
       | dwheeler wrote:
       | The encodings we use today have a surprisingly deep and complex
       | history. For more, see: "The Evolution of Character Codes,
       | 1874-1968"
       | https://ia800606.us.archive.org/17/items/enf-ascii/ascii.pdf
        
         | rrwo wrote:
         | Thanks for posting that.
         | 
         | People tend to overlook that the technologies we use today have
         | a much older history.
        
       | jiveturkey wrote:
       | ebcdic is also quite elegant
       | 
       | https://news.ycombinator.com/item?id=13543715
        
         | gerdesj wrote:
         | It's shit if you don't routinely speak or write English. On
         | those grounds, I'll decry it as not only shit but purposely
         | shit.
         | 
         | OK, a bit over the top ... the designers of EBCDIC had a rather
         | tight set of constraints to deal with, none of which included:
         | "be inclusive". Again, if I really had to be charitable (I
         | looked after a System/36, back in the day), the hardware was
         | rather shit too, sorry ... constrained. Yes constrained. Why
         | should six inch fans fire up reliably after a few years of use
         | and not need a poke after an IPL? No real dust snags and I
         | carefully sprayed some WD40 on the one that I could get at. I
         | have modern Dells and HPs in horrid environments that do better
         | with shitty plastic fans.
         | 
         | EBCDIC is not elegant at all unless excluding non English
         | characters in an encoding system is your idea of elegant.
         | 
         | According to this: https://en.wikipedia.org/wiki/EBCDIC it
         | expended loads of effort on control characters (e.g. "SM/SW")
         | instead of on language.
         | 
         | ASCII and EBCDIC and that basically say: fuck you foreigners!
         | 
         | We now have hardware that is apparently capable of messianic
         | feats. Let's do the entirety of humanity some justice and
         | really do something elegant. It won't involve EBCDIC.
        
         | theamk wrote:
         | It's really not.
         | 
         | In base-2 machines, the letters are mixed with punctuation,
         | which is pretty horrible design which makes simple things
         | complex, and does not actually bring anything new to the table.
         | 
         | In BCD machines it is slightly better, except letters aren't
         | contiguous either - row 0 is bad, but it's the extra space
         | between R and S which is really ugly. And it's unusable with
         | BCD operations anyway, as high nibble values are used
         | extensively.
         | 
         | Naive sorting simply does not work... lowercase before
         | uppercase, punctuation in the middle of the alphabet, numbers
         | after letters.
         | 
         | I see no elegance there, it's like the worst example of legacy
         | code.
        
           | jiveturkey wrote:
           | It was designed for a specific purpose ... elegance in
           | context
        
             | theamk wrote:
             | Which would that "specific purpose" be? Even punch cards
             | have alphabet interrupted between R and S.
             | 
             | And the whole punch card -> 8-bit mapping is pretty
             | illogical, just like the cards themselves. How come "no
             | punches in the zone rows" doesn't correspond to 0 high bits?
             | 
             | (and don't get me started on punch cards... it started with
             | "let's do 1 hole per column for digits" - OK, makes sense;
             | then "let's do 2 hole/column for uppercase" - I guess OK
             | but why did you put extra char in the middle... but then
             | it's 4 holes/column for superscripts? 3-6 holes/col for
             | punctuation? If someone were to design punch cards today
             | but using same requirements, they could easily come up with
             | a much more logical schema)
        
       | Dwedit wrote:
       | Many old NES/SNES games had a simpler character encoding system,
       | with 0-9 and A-Z at the beginning of the table. No conversion
       | required to display hex.
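A sketch of the difference being described. `ascii_hex_digit` shows the branch ASCII needs because `9` (0x39) and `A` (0x41) are not adjacent; `tile_for_char` assumes a hypothetical game font whose tiles 0x00-0x09 are the digits and 0x0A-0x23 are A-Z, in which case a nibble's value already is its tile index. All three names are illustrative.

```c
/* ASCII: '9' is 0x39 but 'A' is 0x41, so values >= 10 need their own offset. */
char ascii_hex_digit(unsigned nibble) {
    return (char)(nibble < 10 ? '0' + nibble : 'A' + (nibble - 10));
}

/* Hypothetical tile layout: 0x00-0x09 are '0'-'9', 0x0A-0x23 are 'A'-'Z'.
   Showing arbitrary text still needs a mapping... */
unsigned tile_for_char(char c) {
    if (c >= '0' && c <= '9') return (unsigned)(c - '0');
    if (c >= 'A' && c <= 'Z') return (unsigned)(c - 'A' + 10);
    return 0x24;  /* hypothetical blank tile for anything else */
}

/* ...but a hex nibble is already its own tile index: no conversion at all. */
unsigned tile_for_nibble(unsigned nibble) {
    return nibble;
}
```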
        
       | ggm wrote:
       | man ascii
       | 
       | is never far from my fingers. combined with od -c and od -x it
       | gets the job done. I don't think as fluently in Octal as I used
       | to. Hex has become ubiquitous.
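For the curious, a rough sketch of how `od -c` renders each byte, assuming the common behaviour of printing printable ASCII as itself, a handful of controls as C-style escapes, and everything else (DEL included) as three octal digits. `od_c` is a hypothetical name, column spacing is ignored, and details such as how a backslash itself is shown may differ between implementations.

```c
#include <stdio.h>

/* Hypothetical helper: render one byte roughly the way `od -c` does.
   out must have room for 4 bytes (3 characters + NUL). */
void od_c(unsigned char c, char out[4]) {
    const char *e = NULL;
    switch (c) {                          /* controls with a C-style escape */
    case '\0': e = "\\0"; break;
    case '\a': e = "\\a"; break;
    case '\b': e = "\\b"; break;
    case '\t': e = "\\t"; break;
    case '\n': e = "\\n"; break;
    case '\v': e = "\\v"; break;
    case '\f': e = "\\f"; break;
    case '\r': e = "\\r"; break;
    }
    if (e != NULL) {
        snprintf(out, 4, "%s", e);
    } else if (c >= 0x20 && c < 0x7F) {   /* printable ASCII, space included */
        out[0] = (char)c;
        out[1] = '\0';
    } else {                              /* other controls, DEL, 8-bit bytes */
        snprintf(out, 4, "%03o", (unsigned)c);
    }
}
```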
        
         | fsckboy wrote:
         | you mean?
         | 
         |     ascii
        
           | ggm wrote:
           | No, I don't. I live in a different universe to you:
           | 
           |     % (uname; cd /usr/ports; ls -d */ascii)
           |     FreeBSD
           |     zsh: no matches found: */ascii
           |     % which ascii
           |     ascii not found
           |     %
           | 
           | It's the same on OS X, and Debian doesn't install that command
           | by default. If you live inside a POSIX/IEEE 1003 system and
           | want to know the ascii table reliably then the command I run
           | is the one which works. If your distribution doesn't ship
           | manuals by default you have bigger problems.
        
             | amszmidt wrote:
             | "man ascii" has as much guarantee to work on a POSIX system
             | as a command called "ascii", since neither (specifically, a
             | man page called "ascii") is part of the standard.
             | 
             | So you will either get command not found, or man page not
             | found.
        
           | bandie91 wrote:
           | man 7 ascii
        
       | transfire wrote:
       | One downside of ASCII is the lack of two extra "letters"
       | (whatever they might be, e.g. perhaps German ß), as it makes it
       | impossible to represent base 64 alphanumerically. So we ended up
       | with many alternatives picking two arbitrary punctuation marks.
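A sketch of the arithmetic behind this: 26 + 26 + 10 = 62 alphanumerics, two short of 64, so RFC 4648 base64 adds `+` and `/`, and its URL-safe variant uses `-` and `_`. `b64_symbol` and `b64_block` are hypothetical names; the block encoder handles a single full 3-byte group only, with no padding.

```c
/* Map a 6-bit value (0-63) to its base64 symbol (RFC 4648 alphabets). */
char b64_symbol(unsigned v, int url_safe) {
    if (v < 26) return (char)('A' + v);          /* 0-25:  A-Z */
    if (v < 52) return (char)('a' + (v - 26));   /* 26-51: a-z */
    if (v < 62) return (char)('0' + (v - 52));   /* 52-61: 0-9 */
    if (v == 62) return url_safe ? '-' : '+';    /* the two "extra letters" */
    return url_safe ? '_' : '/';                 /* v == 63 */
}

/* Encode one 3-byte group (24 bits) as four 6-bit symbols. */
void b64_block(const unsigned char in[3], char out[5], int url_safe) {
    unsigned long n = ((unsigned long)in[0] << 16) |
                      ((unsigned long)in[1] << 8) | in[2];
    for (int i = 0; i < 4; i++)
        out[i] = b64_symbol((unsigned)((n >> (18 - 6 * i)) & 0x3F), url_safe);
    out[4] = '\0';
}
```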
        
       | jolmg wrote:
       | > So when you're reading 7-bit ASCII, if it starts with 00, it's
       | a non-printing character. Otherwise it's a printing character.
       | 
       | > The first printing character is space; it's an invisible
       | character, but it's still one that has meaning to humans, so it's
       | not a control character (this sounds obvious today, but it was
       | actually the source of some semantic argument when the ASCII
       | standard was first being discussed).
       | 
       | Hmm.. Interesting that space is considered a printing character
       | while horizontal tab and newline are control characters. They're
       | all invisible and move the cursor, but I guess it makes sense.
       | Space is very specific: it moves the cursor exactly one
       | character cell, so it's like an invisible character. Newline can
       | either imply movement straight down, or down and to the left,
       | depending on a configuration or platform (e.g. DOS vs UNIX line
       | endings). Horizontal tab can also move you a configurable amount
       | rightwards, and perhaps it might've been thought a bit
       | differently, given there's also a vertical tab, which I've got no
       | idea on how it was used. Maybe it's the newline-equivalent for
       | tables, e.g. "id\tcolor\v1\tred\v2\tblue\v" or something like
       | that.
       | 
       | Interesting also that BS is a control char while DEL is a
       | printing(?) char. I guess that's because BS implies just movement
       | leftwards over the text, while DEL is all ones like running a
       | black sharpie through text. Guess that's what makes it printing.
       | Wonder if there were DEL keys on typewriters that just stamped a
       | black square, and on keypunchers that just punched 7 holes, so
       | people would press "backspace" to go back then "delete" to
       | overwrite.
       | 
       | I've used ASCII a lot, but even after so many years, I'm getting
       | moments where it's like "oh this piece isn't just here, it
       | _needs_ to be here for a deep reason". It's like a jigsaw
       | puzzle.
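A sketch of the quoted rule as a bit test, assuming 7-bit codes (0x00-0x7F). The two high bits of a 7-bit code are bits 6 and 5; when both are 0 (0x00-0x1F) the character is a control. DEL (0x7F, all ones) is the single control character outside that column, as kragen explains further down. `ascii_is_control` and `ascii_is_printing` are hypothetical names.

```c
/* Control: high bits 00 (0x00-0x1F), plus DEL (0x7F). 7-bit input only. */
int ascii_is_control(unsigned c) {
    return c < 0x80 && ((c & 0x60) == 0 || c == 0x7F);
}

/* Printing: everything else in 7-bit ASCII, space (0x20) included. */
int ascii_is_printing(unsigned c) {
    return c < 0x80 && !ascii_is_control(c);
}
```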
        
         | pwg wrote:
         | You also have to keep in mind the "interface" for 1962-1968.
         | The printer teletype machine.
         | 
         | The "control codes" were to "control" the printhead. So
         | "carriage return" meant move the "print carriage" back to the
         | left margin. "New line" meant move the paper platen one line
         | height of rotation to move the paper to the next line. In that
         | context, "back space" was "move print head one space left"
         | (rather more like a "reverse space"). The article does mention
         | that there was some debate about whether space should be
         | considered "printable", but if you consider a mechanical
         | printer, as the head is moving to the right and banging out
         | characters onto the paper, the spaces between words do, sort
         | of, look like "printables" (of a sort, a "print nothing"
         | character as it were).
         | 
         | Tabs being control characters then makes a bit more sense, in
         | that they cause the printhead to jump some fixed distance to
         | the right.
         | 
         | The article stated why DEL is where it is (all ones) -- so that
         | for punched paper tape, one could get a punch-out of every
         | position, which was then interpreted as "nothing here" by the
         | tape reading machine.
         | 
         | As for typewriters, no, none had a "black box" blot out key.
         | Correction (for typewriters without built in correction tape)
         | was one of: retype the page, apply an eraser (and hopefully not
         | damage the paper surface too much) then retype character and
         | continue, or apply correction fluid (white-out) and retype
         | character and continue.
         | 
         | For those typewriters with built in correction tape options (at
         | least some IBM Selectric models, possibly more) the typewriter
         | would retype the character using the "white-out" ribbon, then
         | retype the replacement character using the normal "typewriting"
         | ribbon.
        
           | EvanAnderson wrote:
           | > The article stated why DEL is where it is (all ones) -- so
           | that for punched paper tape, one could get a punch-out of
           | every position...
           | 
           | I saw an analogous use of backspace on some OS I ran into 30
           | years ago cruising around either Tymnet or TELENET. (I wish I
           | could remember the OS...)
           | 
           | The password prompt assumed local echo. After entering a
           | password the host would send a series of backspaces and
           | various patterns of characters (####, **, etc) to overprint
           | the locally-echoed (and printed) characters.
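A hypothetical reconstruction of the overprint trick, assuming the host knows how many characters were locally echoed: back the print head up with BS, strike a mask character over every cell, and repeat for each pattern. `overprint_mask` and the mask characters are illustrative only, not taken from any particular system.

```c
#include <stddef.h>

/* Emit, for each mask character, n backspaces followed by n copies of it.
   out needs room for 2 * n * (number of mask chars) + 1 bytes.
   Returns the number of bytes written (excluding the NUL). */
size_t overprint_mask(size_t n, const char *masks, char *out) {
    size_t o = 0;
    for (const char *m = masks; *m != '\0'; m++) {
        for (size_t i = 0; i < n; i++) out[o++] = '\b';  /* back over the password */
        for (size_t i = 0; i < n; i++) out[o++] = *m;    /* strike the mask character */
    }
    out[o] = '\0';
    return o;
}
```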
        
             | kmoser wrote:
             | On the login to the first timesharing system I used, it
             | would prompt for your password, then type eight M's, W's,
             | and X's on top of each other (on paper, of course, since
             | this was using a Teletype terminal), so when you actually
             | typed your password the characters would be printed on top
             | of those already obscured lines.
        
           | rob74 wrote:
           | > _For those typewriters with built in correction tape
           | options (at least some IBM Selectric models, possibly more)
           | the typewriter would retype the character using the "white-
           | out" ribbon_
           | 
           | there was also a solution for cheaper typewriters: small
           | sheets of "white-out" paper (known under the genericized
           | brand name "Tipp-Ex" here in Germany) that you could hold
           | between the ink ribbon and the paper to "overwrite" a typo.
        
           | tivert wrote:
           | > Tab's being control characters then make a bit more sense,
           | in that they cause the printhead to jump some fixed distance
           | to the right.
           | 
           | Isn't that incorrect? Tab doesn't jump a " _fixed_ distance
           | to the right, " it jumps a _variable_ distance to the next
           | tab-stop to the right.
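A sketch of the print-head bookkeeping discussed in this subthread, assuming fixed tab stops every 8 columns (a common default; many terminals let you set stops anywhere). It illustrates tivert's point: the distance a tab moves depends on the current column. `column_after` is a hypothetical name.

```c
/* Track a teletype-style print head column (0-based) after printing s,
   assuming tab stops every 8 columns. */
unsigned column_after(const char *s, unsigned col) {
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '\r': col = 0; break;                  /* carriage return: left margin */
        case '\b': if (col > 0) col--; break;       /* backspace: one cell left */
        case '\t': col = (col / 8 + 1) * 8; break;  /* tab: to the *next* stop */
        case '\n': break;                           /* line feed moves the paper, not the head */
        default:
            if (*s >= 0x20 && *s < 0x7F) col++;     /* printing characters, space included */
            break;
        }
    }
    return col;
}
```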
        
             | bandie91 wrote:
             | yeah, he must have meant that it jumps to a fixed position
        
         | kragen wrote:
         | del is not a printing character. it's a control character. if
         | you run a paper tape full of del characters through a teletype
         | it does not print anything. it has to have that bit pattern,
         | even though it greatly complicates the mechanics of the
         | teletype (which has to do all the digital logic with cams and
         | levers) because that way it can be punched over any character
         | on the paper tape to delete it
         | 
         | a figure caption in this page says 'This is a historical
         | throwback to paper tape, where the keyboard would punch some
         | permutation of seven holes to represent the ones and zeros of
         | each character. You can't delete holes once they've been
         | punched, so the only way to mark a character as invalid was to
         | rewind the tape and punch out all the holes in that position:
         | i.e. all 1s.' which is mostly correct, except that it wasn't a
         | _historical throwback_ ; paper tape was perhaps the most
         | important medium for ascii not just in 01963 and 01967 but
         | probably in 01973, maybe even in 01977. teletype owners today
         | are still using paper tape that was manufactured during the
         | vietnam war, where it was used in unprecedented volume for
         | routing teletype messages by hand
         | 
         | the dominant early pc operating system, cp/m (if it's not
         | overly grandiose to call it an 'operating system') had system
         | calls for reading and writing the console, the disk, and the
         | paper tape punch and reader. when i hooked up a modem to my
         | cp/m system to call bbses, i hooked it up as the punch and
         | reader
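A sketch of why DEL has to be all ones, assuming 7-bit codes: a punch can only add holes, so punching over an existing frame behaves like a bitwise OR, and OR-ing 0x7F into any 7-bit code yields 0x7F. `overpunch_delete` and `read_tape` are hypothetical names; the reader simply skips DEL frames, matching the "prints nothing" behaviour described above.

```c
#include <stddef.h>

#define DEL 0x7F  /* all seven holes punched */

/* Overpunching only adds holes: the result is the OR of old and new. */
unsigned char overpunch_delete(unsigned char code) {
    return (unsigned char)(code | DEL);
}

/* Copy a 7-bit tape, skipping deleted (DEL) frames. Returns bytes kept. */
size_t read_tape(const unsigned char *tape, size_t len, unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++)
        if (tape[i] != DEL)
            out[o++] = tape[i];
    return o;
}
```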
        
           | jart wrote:
           | > so the only way to mark a character as invalid was to
           | rewind the tape and punch out all the holes in that position
           | 
           | So that's why \177 (DEL) is the loneliest control character.
           | Wow. Thank you!
        
             | kragen wrote:
             | happy to help
        
           | 91bananas wrote:
           | just... this is why this forum exists. thank you
        
             | kragen wrote:
             | you're welcome. i'll try to remember your comment the next
             | time someone replies to me with something like
             | https://news.ycombinator.com/item?id=40993821 or
             | https://news.ycombinator.com/item?id=40993328 or
             | https://news.ycombinator.com/item?id=40992456
        
         | kazinator wrote:
         | Space doesn't just move the cursor on a display; it will
         | obliterate a character cell with a space glyph.
         | 
         | When a display terminal has nondestructive backspace (backspace
         | character doesn't erase), it can be software emulated with BS-
         | SPACE-BS.
         | 
         | At your Linux terminal, you can do "stty echoe" (echo erase) to
         | turn this on (affecting the echoing of backspace characters
         | that are input, not all backspace characters).
         | 
         | Dial-up BBSes had this as a configurable setting also.
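A sketch of what `stty echoe`-style erasing amounts to, assuming a simple line buffer: on BS or DEL the last buffered character is dropped and BS SPACE BS is echoed, so even a terminal with a nondestructive backspace ends up with a blank cell. `line_edit` is a hypothetical name and buffer sizes are the caller's responsibility.

```c
#include <stddef.h>

/* Process n input bytes into line (length tracked in *len) and write the
   echo into echo (up to 3 bytes per input byte). Returns bytes echoed. */
size_t line_edit(const char *in, size_t n, char *line, size_t *len, char *echo) {
    size_t e = 0;
    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        if (c == '\b' || c == 0x7F) {          /* erase request */
            if (*len > 0) {
                (*len)--;
                echo[e++] = '\b';              /* back up... */
                echo[e++] = ' ';               /* ...blank the cell... */
                echo[e++] = '\b';              /* ...and back up again */
            }
        } else {
            line[(*len)++] = c;
            echo[e++] = c;
        }
    }
    return e;
}
```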
        
         | california-og wrote:
         | While DEL didn't stamp a black square on typewriters, it
         | sometimes did so (or something similar, like diagonal stripes)
         | in various digital character sets. ISO 2047[0] established the
         | graphical representations for the control characters of the
         | 7-bit coded character set in 1975, mainly for debugging reasons.
         | This graphical representation for DEL was used by Apple IIGS,
         | TRS-80 and even Amiga!
         | 
         | [0]: https://en.m.wikipedia.org/wiki/ISO_2047
        
         | nikau wrote:
         | Logically, space maps to a character people use with pen and
         | paper, unlike tab.
        
         | layer8 wrote:
         | Space is what is represented in the output, i.e. in one cell of
         | the terminal grid, whereas control characters like Tab and
         | CR/LF don't map onto such an output representation. If you want
         | to represent the printed contents of each "grid cell" of a
         | printout or a textmode screen buffer, you don't need the
         | control characters, only the printable characters. The
         | printable characters are what you'd need in a screen font.
        
       | BobbyTables2 wrote:
       | I always lament that since at least the 1980s or so, it seems
       | the vast majority of the control characters were never used for
       | their intended purpose.
       | 
       | Instead, we crudely use commas and tabs as delimiters instead of
       | something like US (#31) for fields and RS (#30) for records.
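A sketch of what using the separators looks like, assuming fields are NUL-terminated C strings that never contain US or RS themselves (the replies argue about exactly that assumption). `write_table` is a hypothetical name; note that a comma inside a field needs no quoting.

```c
#include <stddef.h>
#include <string.h>

#define US 0x1F  /* unit separator: between fields */
#define RS 0x1E  /* record separator: after each record */

/* Serialize rows x cols cells (row-major) into out, which must be large
   enough. Returns bytes written, excluding the terminating NUL. */
size_t write_table(const char *const *cells, size_t rows, size_t cols, char *out) {
    size_t o = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            const char *f = cells[r * cols + c];
            size_t n = strlen(f);
            memcpy(out + o, f, n);
            o += n;
            if (c + 1 < cols) out[o++] = US;
        }
        out[o++] = RS;
    }
    out[o] = '\0';
    return o;
}
```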
        
         | thaumasiotes wrote:
         | That's because the intended purpose is either useless (for
         | machine control characters) or useless and logically impossible
         | (for delimiters).
         | 
         | What do you do if you have a record that includes a record
         | separator character? Given that you have this problem anyway,
         | why do you want a character dedicated to achieving the same
         | thing that a comma achieves?
        
           | penteract wrote:
           | The record separator isn't on people's keyboards, so it's
           | less likely to show up where it's not expected. Also it's
           | less likely to legitimately occur in something like a name,
           | so there are many users of CSVs who can say they will never
           | need to consider data containing a record separator, and they
           | will be right more often than those who never consider data
           | containing a comma.
           | 
           | Of course, the fact that record separators aren't on
           | keyboards is probably why CSVs use commas.
        
             | thaumasiotes wrote:
             | > Also it's less likely to legitimately occur in something
             | like a name, so there are many users of CSVs who can say
             | they will never need to consider data containing a record
             | separator, and they will be right more often than those who
             | never consider data containing a comma.
             | 
             | No, they'll be right exactly as often, 0% of the time.
             | 
             | But their mistake will show up less frequently, causing
             | more problems when it does.
             | 
             | As soon as it's possible for some of your data to come from
             | someone else's dataset, you're guaranteed to have to
             | accommodate record separators within your data as well as
             | within the metadata. You're better off using a system that
             | plans for this inevitability than one that pretends it
             | can't happen at all.
        
               | penteract wrote:
               | > No, they'll be right exactly as often, 0% of the time.
               | 
               | > But their mistake will show up less frequently, causing
               | more problems when it does.
               | 
               | Enough people use CSVs (and have limited, small-scale
               | use-cases) that I'd be willing to bet "less frequently"
               | means never for at least 1% of people who use CSVs.
               | 
               | I don't know whether the chance of no problems is worth
               | the increased difficulty of problems that do occur -
               | considering that balance feels a bit silly because if
               | you're aware there could be a problem in a context where
               | you could choose between commas and unit separators, you
               | could just add validation or escaping.
        
               | thaumasiotes wrote:
               | > considering that balance feels a bit silly because if
               | you're aware there could be a problem in a context where
               | you could choose between commas and record separators,
               | you could just add validation or escaping.
               | 
               | As soon as you have validation or escaping, having a
               | record separator character loses its entire purpose. The
               | existence of the character is predicated on the idea that
               | you don't have to do that, and that idea is false.
               | 
               | That's why the character is never used. It's a conceptual
               | mistake that was accidentally enshrined in a series of
               | encoding standards that had enough free space to
               | accommodate it.
        
               | penteract wrote:
               | > As soon as you have validation or escaping, having a
               | record separator character loses its entire purpose. The
               | existence of the character is predicated on the idea that
               | you don't have to do that, and that idea is false.
               | 
               | I disagree with this - the data needs to be stored
               | somehow, and while other characters (like comma) can be
               | used, having a dedicated character can help - for example
               | if the data might legitimately contain commas or newlines
               | but not unit separators or record separators, then
               | escaping isn't needed if you use unit/record separators
               | (although validation is still necessary).
        
               | Symbiote wrote:
               | I agree.
               | 
               | TSV is widely used, but lacks a way to escape the tab and
               | new line characters. RS-V is the same, but allows
               | including tabs and new lines in records.
        
               | Dylan16807 wrote:
               | > As soon as you have validation or escaping, having a
               | record separator character loses its entire purpose.
               | 
               | Not true. Validation is easier than escaping.
        
             | yardshop wrote:
             | In the DOS days, you could "type" control characters by
             | pressing Ctrl and the corresponding letter key, Ctrl+M is
             | Carriage Return, Ctrl+H is Backspace, Ctrl+Z is End Of
             | File, etc.
             | 
             | It was probably possible to type an RS with Ctrl+Shift+.
             | and the others with similar combos.
        
               | jki275 wrote:
               | you can still type them -- alt + 030(for instance) on the
               | keypad will insert that RS character. In Windows at least
               | -- not sure about the other OS.
        
               | Symbiote wrote:
               | On Linux terminals entering control characters is done
               | with the control key, Ctrl-G for example, but they will
               | often be intercepted by the program that is running.
               | 
               | Bash will insert the control character (rather than
               | interpret it) if you prefix it with Ctrl-V.
        
               | penteract wrote:
               | In a desktop linux terminal, Ctrl-^ or Ctrl-~ work for
               | me. In a tty, I need to press Ctrl-V before them.
        
               | jart wrote:
               | Yeah Linux still works exactly this way. The modern WIN32
               | API even works that way too. When you ReadConsoleInput()
               | it gives you teletypewriter style keyboard codes. When I
               | wrote a termios driver for Cosmopolitan to have a Linux-
               | style shell in CMD it really didn't take much to
               | translate them into the Linux style. We're all still
               | using glorified teletypes at the end of the day. It will
               | always be the substrate of our world. One system built
               | upon another older system.
        
               | _flux wrote:
               | I think it's worth mentioning that Ctrl-A is ascii 1,
               | Ctrl-B ascii 2, etc, as it is in Unix today.
        
             | keybored wrote:
             | I can't think of a case where someone would write a control
             | character like that into something intended for text on
             | purpose. So you might as well disallow it.
        
               | jerf wrote:
               | The situation that comes up the most often that you need
               | to consider is when someone embeds the same sort of file
               | into itself, or chunks of the same sort of file into
               | itself. If using the ASCII characters to delimit fields
               | was common, you'd need to consider that over the course
               | of some moderately interesting system's life time the
               | odds of someone copying and pasting something from an
               | encoded file into the spreadsheet application and picking
               | up the ASCII control characters with it is basically
               | 100%. And while we may be able to say with some
               | confidence that nobody is going to embed a CSV file into
               | a CSV file (and I say only _some_ confidence, the world
               | is weird and I'm _sure_ someone will read this who has
               | actually seen someone do this), there's other situations
               | like HTML-in-HTML (for example, every HTML tutorial ever)
               | that are guaranteed by their nature.
               | 
               | It is still valid to disallow the ASCII control
               | characters, one just has to make sure that it is done
               | comprehensively, in all places users may input them. But
               | that's not created by using ASCII control characters,
               | that's a consequence of the "ban the control characters
               | entirely" approach regardless of what the control
               | characters are.
               | 
               | It's neat when you can get away with it, but I generally
               | prefer to define a robust encoding scheme instead. A
               | minimal one like "replace backslash with double-
               | backslash, replace control characters with backslashed
               | characters" and "replace backslash sequences with their
               | control characters, including backslash-backslash as a
               | single backslash" can be inserted almost anywhere in just
               | a few lines of string replace (or stream processing if
               | you need the speed). The only tricky bit is you need to
               | make sure you get the order correct or you corrupt data,
               | and while I've done this enough to have it almost
               | memorized now I do recall _feeling_ like the correct
               | order is backwards from what I naturally wanted the first
               | few times. But it is simple and robust if you get it
               | right.
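A minimal C sketch of the backslash scheme described above. It assumes the only control characters that need protecting are the four ASCII separators FS, GS, RS and US (0x1C-0x1F). The letters written after the backslash (`\F`, `\G`, `\R`, `\U`) and the function names are hypothetical choices for illustration, not an existing format. Both directions work in a single left-to-right pass, so each output character comes from exactly one input token. This sidesteps the replacement-order trap mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mapping: separator byte -> letter written after a backslash. */
static char sep_to_letter(unsigned char c) {
    switch (c) {
    case 0x1C: return 'F'; /* FS */
    case 0x1D: return 'G'; /* GS */
    case 0x1E: return 'R'; /* RS */
    case 0x1F: return 'U'; /* US */
    default:   return 0;   /* not a separator */
    }
}

/* Reverse of sep_to_letter; -1 for letters that are not part of the scheme. */
static int letter_to_sep(char l) {
    switch (l) {
    case 'F': return 0x1C;
    case 'G': return 0x1D;
    case 'R': return 0x1E;
    case 'U': return 0x1F;
    default:  return -1;
    }
}

/* Escape src into dst (NUL-terminated). dst needs 2 * strlen(src) + 1 bytes.
   A backslash becomes two backslashes; a separator becomes backslash + letter. */
static void escape_field(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;
        char l = sep_to_letter(c);
        if (c == '\\')  { *dst++ = '\\'; *dst++ = '\\'; }
        else if (l)     { *dst++ = '\\'; *dst++ = l; }
        else            { *dst++ = (char)c; }
    }
    *dst = '\0';
}

/* Undo escape_field. Returns 0 on success, -1 on an unknown escape or a
   trailing lone backslash (dst contents are then unspecified). */
static int unescape_field(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    while (*src) {
        if (*src != '\\') { *dst++ = *src++; continue; }
        src++;                                   /* skip the backslash */
        if (*src == '\\') { *dst++ = '\\'; src++; continue; }
        int sep = letter_to_sep(*src);
        if (sep < 0) return -1;
        *dst++ = (char)sep;
        src++;
    }
    *dst = '\0';
    return 0;
}
```

Other control characters can be added to both switch statements. Anything not in the table is copied through unchanged by `escape_field`, and `unescape_field` rejects it when it follows a backslash.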
        
               | keybored wrote:
               | Someday I will create both formats: one where control
               | characters are banned (and never accepted) and one where they
               | are escaped. That ought to be good enough for all needs!
               | 
               | (A trivial evening project for some; not for all of us)
        
           | AdamH12113 wrote:
           | _> What do you do if you have a record that includes a record
           | separator character?_
           | 
           | You use the ASCII escape character (0x1B), which is designed
           | for exactly that purpose.
        
           | keybored wrote:
           | > What do you do if you have a record that includes a record
           | separator character?
           | 
           | This comes up every time. Options:
           | 
           | 1. You disallow it. And you might as well disallow all the
           | control codes except the carriage return, line feed, and
           | other "spacing" characters. Because what are they doing in
           | the data proper? They are in-band signals.
           | 
           | 2. You use the Escape character to escape them
           | 
           | 3. Weirdest option: if you really want to nest in a limited
           | way you can still use the group and file separator characters
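Option 2, which is also the ESC suggestion above, can be sketched in C as a reader that treats any byte following ESC (0x1B) as literal data. An escaped RS therefore does not end the record. This assumes a writer that prefixes every literal RS and ESC in the data with ESC, and it assumes NUL-free text input. `next_record` and the `ASCII_*` macros are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ASCII_ESC 0x1B /* escape: the next byte is data, not a delimiter */
#define ASCII_RS  0x1E /* record separator */

/* Copy the next record from *in into out (NUL-terminated), dropping the ESC
   prefix from escaped bytes, and advance *in past the terminating RS.
   Returns the record length, or (size_t)-1 when the input is exhausted.
   out needs room for strlen(*in) + 1 bytes. */
static size_t next_record(const char **in, char *out) {
    assert(in != NULL && *in != NULL && out != NULL);
    const char *p = *in;
    size_t n = 0;
    if (*p == '\0') return (size_t)-1;
    while (*p != '\0' && *p != ASCII_RS) {
        /* ESC followed by a byte: keep that byte literally. A lone ESC at
           the very end of the input is kept as-is. */
        if (*p == ASCII_ESC && p[1] != '\0') p++;
        out[n++] = *p++;
    }
    if (*p == ASCII_RS) p++;   /* consume the separator */
    out[n] = '\0';
    *in = p;
    return n;
}
```

With this loop, a trailing RS at the very end of the input does not produce an extra empty record. Whether it should is a format decision.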
        
           | NoMoreNicksLeft wrote:
           | Well, that's what an escape is for. Are we really having a
           | serious discussion in 2024, where someone is suggesting that
           | it's not the responsibility of the software engineer to
           | sanitize inputs before chucking the data into some sort of
           | database?
        
         | EvanAnderson wrote:
         | I did some ETL work that used the ASCII delimiter characters.
         | It was very enjoyable. I didn't have to worry about escaping or
         | parsing escaped strings. The control codes were guaranteed to
         | be illegal in input. It was refreshing.
        
           | theamk wrote:
           | Could you do the same with TSV? A lot of datasets can either
           | prohibit tabs in data, or convert it to spaces in early
           | ingestion.
        
             | EvanAnderson wrote:
             | TSV is a joy compared to CSV, for sure. CLI tools that
             | output TSV are what immediately spring to mind.
        
             | red_admiral wrote:
             | Yes, and as long as you remember to turn off the "TAB
             | produces 4 spaces" thing in your editor (grumble makefiles
             | grumble) it's really nice to work with.
        
         | fukawi2 wrote:
         | I recall working on a PICK D3 system, which was a "multivalue"
         | database. Each field could have multiple values, those values
         | could have sub values, and a third level beyond that.
         | 
         | Values were separated with char(254), subvalues were separated
         | with char(253), and the third level were char(252) separated.
         | 
         | It was... unique, but worked. And to be fair, PICK originated
         | in the 60's, so this method probably evolved in parallel to the
         | ASCII table!
        
         | red_admiral wrote:
         | As long as your data is not binary, and so does not contain record
         | separators itself, this would be a thousand times better than
         | CSV (because text data _does_ often contain commas and double
         | quotes).
         | 
         | The only thing you'd need is editors to support some way of
         | entering and displaying the RS, and CTRL+^ is a bit of a kludge
         | as it ends up CTRL+SHIFT+6.
         | 
         | Of course, if a record itself can contain RS for subrecords,
         | things become more complicated. I guess you could use `\^`.
        
         | tracker1 wrote:
         | That's my thought as well... I remember using them pre-xhr web
         | in order to send data from the server to JS, which I could then
         | split up pretty easily on the client side. I still don't know
         | why we are so tethered to CSV.
        
         | yencabulator wrote:
         | Ah, Deborah Records. Little Debbie Records, we call her.
        
       | th0ma5 wrote:
       | I heard someone describe the ASCII table as a state machine.
       | Guess I could understand that as a state machine needed to parse
       | it? This is surprisingly hard to search for but I was wondering
       | if anyone knows what they were talking about.
        
         | EvanAnderson wrote:
         | They might be talking about using escape sequences to map
         | additional codepoints into ASCII. It was designed to be
         | extensible. See:
         | https://web.archive.org/web/20150810075144/http://bobbemer.c...
        
         | kevin_thibedeau wrote:
         | Bespoke hardware for text handling isn't a thing these days but
         | would have been in the 60's and 70's. A table layout that can
         | be easily decoded in hardware simplifies the necessary
         | circuitry for responding to control characters or converting
         | binary numbers to/from decimal when the microprocessor hadn't
         | been invented yet.
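The digit layout is one of these hardware-friendly choices that still shows up in software. '0' through '9' sit at 0x30 through 0x39, so the low four bits of each digit character are its value, and OR-ing 0x30 turns a value back into a character. Here is a small C sketch that assumes plain ASCII input. The function names are illustrative, and `parse_decimal` deliberately ignores overflow.

```c
#include <assert.h>

/* '0'..'9' are 0x30..0x39: the low nibble is the digit's value. */
static unsigned digit_value(char c) {
    assert(c >= '0' && c <= '9');
    return (unsigned)c & 0x0F;
}

/* The reverse: put a value 0..9 into the 0x30 column. */
static char digit_char(unsigned v) {
    assert(v <= 9);
    return (char)(0x30 | v);
}

/* Parse an unsigned decimal string using only the masking trick above.
   No overflow or sign handling: a sketch, not a replacement for strtoul. */
static unsigned long parse_decimal(const char *s) {
    unsigned long n = 0;
    for (; *s; s++)
        n = n * 10 + digit_value(*s);
    return n;
}
```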
        
         | gumby wrote:
         | It was originally implemented in actual hardware (rods and
         | bars). Just look inside a teletype, like a KSR-23 (pre ascii,
         | but similar)
        
       | EvanAnderson wrote:
       | I would be remiss not to post a link to the late Bob Bemer's[0]
       | website.
       | 
       | https://web.archive.org/web/20150801005415/http://bobbemer.c...
       | 
       | He was considered the "father of ASCII". He wrote very well and
       | gives clear explanations for the motivations behind the design of
       | ASCII.
       | 
       | [0] https://en.m.wikipedia.org/wiki/Bob_Bemer
        
       | thristian wrote:
       | > _That, I'm afraid, is because ASCII was based not on modern
       | computer keyboards but on the shifted positions of a Remington
       | No. 2 mechanical typewriter - whose shifted layout was the
       | closest compromise we could find as a standard at the time, I
       | imagine._
       | 
       | According to Wikipedia[1], American typewriters were pretty
       | consistent in keyboard layout until the IBM Selectric electric
       | typewriter. Apparently "small" characters (like apostrophe,
       | double-quote, underscore, and hyphen) should be typed with less
       | pressure to avoid damaging the platen, and IBM decided the
       | Selectric could be simpler if those symbols were grouped on
       | dedicated keys instead of sharing keys with "high pressure"
       | symbols, so they shuffled the symbols around a bit, resulting in
       | a layout that would look very familiar to a modern PC user.
       | 
       | Because IBM electric typewriters were so widely used (at least in
       | English speaking countries), any computer company that wanted to
       | sell to businesses wanted a Selectric-style layout, including the
       | IBM PC.
       | 
       | Meanwhile, in other countries where typewriters in general
       | weren't so popular or useful, the earliest computers had ASCII-
       | style punctuation layout for simplicity, and later computers
       | didn't have any pressing need to change, so they stuck with it.
       | Japanese keyboards, for example, are still ASCII-style to this
       | day.
       | 
       | [1] https://en.wikipedia.org/wiki/IBM_Selectric#Keyboard_layout
        
         | sixothree wrote:
         | I never realized my first computer used ascii directly for the
         | shifted number keys.
         | 
         | https://en.wikipedia.org/wiki/TRS-80_Color_Computer#/media/F...
        
           | Mountain_Skies wrote:
           | So many things on the CoCo turned out to be the way they were
           | for cost saving reasons. Tandy was good at saving pennies
           | everywhere it could. When I took 'Typing' in high school, my
           | muscle memory was in a constant fight between the IBM
           | Selectric layout of the typewriters at school and the CoCo at
           | home.
        
           | kps wrote:
           | https://en.wikipedia.org/wiki/Bit-paired_keyboard
        
       | kragen wrote:
       | unfortunately this page is based on mackenzie's book. mackenzie
       | is the ibm guy who spent decades trying to kill ascii, promoting
       | its brain-damaged ebcdic as a superior replacement (because it
       | was more compatible, at least if you were already an ibm
       | customer). he spends most of his fucking book trumpeting the
       | virtues of ebcdic actually
       | 
       | bob bemer more or less invented ascii. he was also an ibm guy
       | before mackenzie's crowd pushed him out of ibm for promoting it.
       | he wrote a much better book about the history of ascii which is
       | also freely available online, really more a pamphlet than a book,
       | called "a story of ascii":
       | https://archive.org/details/ascii-bemer/page/n1/mode/2up
       | 
       | tom jennings, who invented fido, also wrote a history of ascii,
       | called 'an annotated history of some character codes or ascii:
       | american standard code for information infiltration'; it's no
       | longer online at his own site, but for the time being the archive
       | has preserved it:
       | https://web.archive.org/web/20100414012008/http://wps.com/pr...
       | 
       | jennings's history is animated by a palpable rage at mackenzie's
       | self-serving account of the history of ascii, partly because
       | bemer hadn't really told his own story publicly. so jennings goes
       | so far as to write punchcard codes (and mackenzie) out of ascii's
       | history entirely, deriving it purely from teletypewriter codes--
       | from which it does undeniably draw many features, but after all,
       | bemer was a punchcard guy, and ascii's many excellent virtues for
       | collation show it
       | 
       | as dwheeler points out, the accomplished informatics archivist
       | eric fischer has also written an excellent history of the
       | evolution of ascii. though, unlike bemer, fischer wasn't actually
       | at the standardization meetings that created ascii, he is more
       | careful and digs deeper than either bemer or jennings, so it
       | might be better to read him first:
       | https://archive.org/details/enf-ascii/
       | 
       | it would be a mistake to credit ascii entirely to bemer; aside
       | from the relatively minor changes in 01967 (including making
       | lowercase official), the draft was extensively revised by the
       | standards committees in the years leading up to 01963, including
       | dramatic improvements in the control-character set
       | 
       | for the historical relationship between ascii character codes and
       | keyboard layouts, see
       | https://en.wikipedia.org/wiki/Bit-paired_keyboard
        
       | gumby wrote:
       | I wish the author had included the full ascii chart in 4 bits
       | across / 4 bits down. You can mask a single bit to change case
       | and that is super obvious that way.
       | 
       | The charts that simply show you the assignments in hex and octal
       | obscure the elegance of the design.
        
         | AdamH12113 wrote:
         | The third and fourth columns of the table are only a single bit
         | apart from each other. If you mentally swap the first two
         | columns, you get a Gray code ordering of the most significant
         | bits, which is pretty close to what you're looking for.
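The 4-column, 32-row layout can also be generated directly, which makes the single-bit relationships visible without a separate chart. Bits 6-5 pick the column and bits 4-0 pick the row. This C sketch uses hypothetical function names. It also shows that 'A' and 'a' share a row and differ only in bit 5 (0x20), which is the case-toggling trick from the top of the thread.

```c
#include <assert.h>
#include <stdio.h>

/* In a 4-column x 32-row layout, bits 6-5 select the column (0..3)
   and bits 4-0 select the row (0..31). */
static unsigned ascii_column(unsigned char c) { return (c >> 5) & 0x3; }
static unsigned ascii_row(unsigned char c)    { return c & 0x1F; }

/* Columns 2 (0x40-0x5F) and 3 (0x60-0x7F) differ only in bit 5 (0x20),
   so flipping that bit swaps the case of a letter. Non-letters pass through. */
static unsigned char toggle_case(unsigned char c) {
    int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    return is_letter ? (unsigned char)(c ^ 0x20) : c;
}

/* Print the chart, one row per line across the four columns.
   Control characters and DEL have no glyph, so they are shown in hex. */
static void print_chart(void) {
    for (unsigned row = 0; row < 32; row++) {
        for (unsigned col = 0; col < 4; col++) {
            unsigned c = (col << 5) | row;
            if (c < 0x20 || c == 0x7F) printf("  %02X", c);
            else                       printf("   %c", (int)c);
        }
        putchar('\n');
    }
}
```

In the printed chart, row 13 holds 0x0D (carriage return), '-', 'M' and 'm' side by side.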
        
           | gumby wrote:
           | Found it:
           | https://en.wikipedia.org/wiki/ASCII#/media/File:USASCII_code...
        
         | kalleboo wrote:
         | It was at some point looking at a chart like that where it also
         | dawned on me where the control codes like ^D, ^H, ^[ etc came
         | from
        
           | kr2 wrote:
           | I was going to ask you to please explain as I didn't
           | understand, but I am guessing you are talking about the same
           | thing as this comment[1] right? That's super cool
           | 
           | https://news.ycombinator.com/item?id=41042570
        
             | gumby wrote:
             | Yes, though ^m, ^[ et al aren't so much elegant as
             | coincidental; but look at A and a, for example.
             | 
             | I found the chart I was looking for:
             | https://en.wikipedia.org/wiki/ASCII#/media/File:USASCII_code...
        
       | netcraft wrote:
       | I've searched off and on for a great stylistic representation of
       | the ASCII table; I'd love a poster to hang on my wall, or possibly
       | even something I could get as a tattoo.
        
       | yawl wrote:
       | I also wrote a chat novel about ASCII:
       | https://www.lostlanguageofthemachines.com/chapter2/chat
        
       | userbinator wrote:
       | _You might be familiar with carriage return (0D) and line feed
       | (10)_
       | 
       | You mean 0D and 0A, or 13 and 10, but that mix of bases really
       | stood out to me in an otherwise good article. I'm one of numerous
       | others who have memorised most of the base ASCII table, and quite
       | a few of the symbols as well as extended ASCII (CP437), mainly
       | because it comes in handy for reading programs without needing a
       | disassembler. Those who do a lot of web development may find the
       | sequence 3A 2F 2F familiar too, as well as 3Ds and 3Fs.
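A small C helper can render a string as the hex pairs you would see in a dump, which is useful for practising reading these byte values. The name `hex_bytes` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of s into out as space-separated uppercase hex pairs.
   out needs room for 3 * strlen(s) bytes (1 byte if s is empty). */
static void hex_bytes(const char *s, char *out) {
    char *p = out;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i) *p++ = ' ';
        p += sprintf(p, "%02X", (unsigned)(unsigned char)s[i]);
    }
    *p = '\0';
}
```

For example, `hex_bytes("://", buf)` produces `3A 2F 2F`, and `hex_bytes("\r\n", buf)` produces `0D 0A`.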
       | 
       | I can see the rationale for <=> being in that order, but [\] and
       | {|} are less obvious, as well as why their position is 1 column
       | to the left of <=>.
        
         | yardshop wrote:
         | > You mean 0D and 0A, or 13 and 10
         | 
         | He fixed it.
        
       | KingOfCoders wrote:
       | For everyone who doesn't need a,u,o. Or software that needs to
       | take a,u,o. For everyone else, UTF is a blessing.
        
         | bigstrat2003 wrote:
         | Which, given the people who designed this and the time they
         | were designing for, was most of them (and most of their
         | audience). Don't confuse "this old standard doesn't adequately
         | cover all cases today" with "this old standard sucked at the
         | time".
        
         | lmm wrote:
         | > For everyone else, UTF is a blessing.
         | 
         | Except people who want to use Japanese and not have it render
         | weirdly, something that was easy in any internationalised
         | software that used the traditional codepage system, but is
         | practically impossible in Unicode-based software.
        
           | Retr0id wrote:
           | Where can I learn more about this issue?
        
             | zokier wrote:
             | https://en.wikipedia.org/wiki/Han_unification
        
             | p_l wrote:
             | Probably referring to so-called "Han unification", which
             | tried to use the same codepoints for different glyphs to reduce
             | code space for ideograms derived from Chinese ones.
             | 
             | But that only causes confusion because you need to provide
             | external information about which way to interpret them, just
             | like a code page.
        
       | blahedo wrote:
       | Another piece of elegance: by putting the uppercase letters in
       | the block beginning at 0x40 (well, 0x41) it means that all the
       | control codes at the start of the table line up with a letter (or
       | one of a small set of other punctuation: @[\]^_), giving both a
       | natural shorthand visual representation and a way to enter them
       | with an early keyboard, by joining the pressing of the letter
       | with... the Control key. Control-M (often written ^M) is carriage
       | return because carriage return is 0x0D and M is 0x4D.
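Here is a C sketch of that mapping. Clearing the top two bits of a 7-bit code (`key & 0x1F`) turns '@' through '_' into 0x00 through 0x1F. Lowercase letters differ from uppercase only in bit 5, so Ctrl-m and Ctrl-M give the same code. This models the classic terminal convention only; real keyboards and terminal emulators add their own rules. DEL (0x7F) is conventionally written ^? rather than being derived this way. The function names are illustrative.

```c
#include <assert.h>

/* Classic Control-key behaviour: keep only the low five bits, mapping
   '@'..'_' (0x40-0x5F) and '`'..DEL (0x60-0x7F) onto 0x00-0x1F. */
static unsigned char ctrl(unsigned char key) {
    return key & 0x1F;
}

/* Caret notation for a control code: ^M for 0x0D, ^[ for ESC, ^? for DEL. */
static void caret_notation(unsigned char c, char out[3]) {
    assert(c < 0x20 || c == 0x7F);
    out[0] = '^';
    out[1] = (c == 0x7F) ? '?' : (char)(c | 0x40);
    out[2] = '\0';
}
```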
        
       | WalterBright wrote:
       | Too bad we now have Unicode, an elegant castle covered with ugly
       | graffiti and ramshackle addons. For example:
       | 
       | 1. normalization
       | 
       | 2. backwards running text (hey, why not add spiral running text?)
       | 
       | 3. fonts
       | 
       | 4. invisible characters
       | 
       | 5. multiple code points with the same glyph
       | 
       | 6. glyphs defined by multiple code points (gee, I thought Unicode
       | was to get away from that mess of code pages!)
       | 
       | 7. made up languages (Elvish? Come on!)
       | 
       | 8. you vote for my made-up emoticon, and I'll vote for yours!
        
         | jart wrote:
         | Hey at least we got the astral planes.
         | https://justine.lol/dox/unicode.txt
        
         | Retr0id wrote:
         | Language itself is a pile of ugly graffiti and ramshackle
         | addons. It would be weird if Unicode _didn't_ reflect this.
        
         | p_l wrote:
         | How to say you don't know what Unicode is for without saying
         | it.
         | 
         | 1, 2, 4, 5, 6, and, unfortunately, 8, all fall under "ability
         | to encode written text from all human languages". And that
         | includes historical ones. Some of the issues (5 & 6) are due to
         | semantic differences even if the resulting glyph looks the same.
         | Unfortunately, you can't expect programmers to understand pesky
         | little things like languages having different writing, so you
         | end up with normalisation to handle the fact that one system
         | sent "a + ogonek accent" and another (properly) sent "a with
         | ogonek" (these print the same but are semantically different!),
         | and now you need to figure out normalisation in order to be
         | able to compare strings.
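The comparison problem is easy to demonstrate in C. The precomposed and decomposed spellings of a-with-ogonek render alike, but they are different byte sequences with different code point counts, so a byte-wise `strcmp` reports them unequal. This sketch only illustrates the problem. Actual normalization (NFC/NFD) needs the Unicode data tables, typically through a library such as ICU, and is not attempted here. The helper assumes valid UTF-8 input, and its name is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a valid UTF-8 string: every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new code point. */
static size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    return n;
}

/* U+0105 LATIN SMALL LETTER A WITH OGONEK (precomposed, the NFC form) */
static const char a_ogonek_nfc[] = "\xC4\x85";
/* U+0061 LATIN SMALL LETTER A + U+0328 COMBINING OGONEK (decomposed, NFD) */
static const char a_ogonek_nfd[] = "a\xCC\xA8";
```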
         | 
         | 7, just like 8, comes down to proposals of specific new forms of
         | writing to add to Unicode. Elvish has had one since 1997 but has
         | only now gotten a tentative "we will talk about it". Klingon,
         | which is IIRC a more complete language, including native speakers
         | (...weird things happen sometimes), does not have one outside of
         | the private use area.
         | 
         | Emojis were added because they were used with incompatible
         | encodings first, even before unicode happened, and without
         | including something like SIXEL into unicode they were
         | unrepresentable (and with SIXEL would lose semantic
         | information)
        
           | news_to_me wrote:
           | > "a + ogonek accent" and another (properly) sent "a with
           | ogonek" (these print the same but are semantically
           | different!)
           | 
           | How can these possibly be semantically different? Isn't the
           | point of combining characters to create semantic characters
           | that are the combination of those parts?
        
             | p_l wrote:
             | There's a semantic difference between "accented letter" and
             | "different letter that happens to visually look like
             | _another language's_ accented letter".
             | 
             | "Ą" in Polish is not "A" with some accent. And the idea
             | behind unicode was to preserve human written text,
             | including keeping track of things like "this is letter A1
             | with an accent, but this is letter A2 that looks visually
             | similar to A1 with accent but is different semantically".
             | Of course then worries about code page size resulted in the
             | stupidity of Han unification, so Unicode _is_ a bit broken.
        
               | Dylan16807 wrote:
               | Unless there's some nuance I'm missing, I think you're
               | reading too much into the word "accent".
               | 
               | Especially because the codepoint is actually called
               | "Combining Ogonek".
               | 
               | And for anyone writing in Cyrillic, it's actually more
               | accurate to use the combining form, even as its own
               | letter, because the only precomposed form technically
               | uses a latin A.
               | 
               | But my main point is that I do not think there is
               | supposed to be any semantic difference in Unicode based
               | on whether you use precomposed or decomposed code points.
        
         | chthonicdaemon wrote:
         | All languages are made up. For that matter, all glyphs are made
         | up, too.
        
           | bandie91 wrote:
           | there is not only a quantitative difference between a conlang
           | designed by a small group (or 1 person) and a "human"
           | language developed organically in the span of centuries by
           | millions of speakers, but also a qualitative one.
        
         | Aardwolf wrote:
         | For me it's how they inconsistently, backwards-incompatibly,
         | make some existing characters outside of the emoji-plane (and
         | especially when in technical/mathematical blocks) render
         | colored by default, rather than keeping everything color-related
         | in the emoji plane (making copies if needed rather than
         | affecting old character, the semantics are very different
         | anyway), e.g. https://imgur.com/a/Ugi7K1i and
         | https://imgur.com/a/UMppZHG
        
         | mnau wrote:
         | As someone whose native language isn't representable
         | purely by ASCII, I celebrate it. Plus the first 128 codepoints
         | are the same as ASCII in UTF-8.
         | 
         | Is Unicode kind of messy? Sure, but that's just natural
         | consequences of writing systems being messy. Every point you
         | made was for a sensible reason that is in a scope of Unicode
         | mission (representing all text in all writing systems).
        
         | RiverCrochet wrote:
         | > covered with ugly graffiti and ramshackle addons
         | 
         | Unfortunately, there is plenty of precedent for this
         | ramshacklism. Like ACK/NAK - those are protocol signals, not
         | characters! ENQ? What even is Shift In/Shift Out (SI/SO)? Then
         | the database characters toward the end there FS, RS, GS, US.
         | 
         | > backwards running text (hey, why not add spiral running
         | text?)
         | 
         | You jest, but you do have cursor positioning ANSI sequences
         | which are designed to let text draw anywhere on your screen.
         | And make it blink! You also don't find it weird to have a
         | destructive "clear-screen" sequence?
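The sequences being alluded to are short byte strings that start with ESC (0x1B) followed by '['. Here is a C sketch that assumes a VT100/ECMA-48 compatible terminal. The helper and constant names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ECMA-48 "Cursor Position" (CUP): ESC [ row ; col H, 1-based.
   Returns what snprintf returns: the length of the full sequence. */
static int cursor_position(char *buf, size_t size, unsigned row, unsigned col) {
    return snprintf(buf, size, "\x1b[%u;%uH", row, col);
}

static const char SGR_BLINK[]    = "\x1b[5m"; /* Select Graphic Rendition: blink */
static const char SGR_RESET[]    = "\x1b[0m"; /* reset all attributes */
static const char CLEAR_SCREEN[] = "\x1b[2J"; /* Erase in Display: entire screen */
```

On such a terminal, writing `CLEAR_SCREEN` and then the output of `cursor_position(buf, sizeof buf, 1, 1)` clears the display and moves the cursor to the top-left corner.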
         | 
         | > glyphs defined by multiple code points
         | 
         | I wonder when they started putting the slash across the 0 to
         | differentiate from the O.
         | 
         | > you vote for my made-up emoticon, and I'll vote for yours!
         | 
         | I mean you do have the private Unicode range where you can
         | actually do that. But before that, SIXEL graphics.
        
           | kps wrote:
           | > Like ACK/NAK - those are protocol signals, not characters!
           | 
           | American Standard Code _for Information Interchange_
        
         | saagarjha wrote:
         | Unicode is quite elegant in its encoding too. If you're going
         | to criticize it for its content, maybe start with talking about
         | how ASCII also has invisible characters and those that people
         | rarely use.
        
         | NoMoreNicksLeft wrote:
         | They're all made-up languages, some were just made-up a little
         | bit more transparently.
        
         | yencabulator wrote:
         | I can't wait for when the majority of Unicode codepoints/glyphs
         | are emojis that are no longer fashionable! That'll be a really
         | weird relic of history, later.
        
         | shepherdjerred wrote:
         | What would be the alternative? I think Unicode is pretty great.
         | 
         | You can pretty easily imagine a world where we had a bunch of
         | different encodings with none being dominant.
        
         | Findecanor wrote:
         | > 2. backwards running text (hey, why not add spiral running
         | text?)
         | 
         | Unicode encodes code points in logical order rather than visual
         | order: the order in which text is supposed to be collated and
         | spoken rather than the visual order.
         | 
         | One tricky issue is when both directions exist in the same
         | text. Unicode can encode nesting of text in one direction
         | within another. For example, text consisting of an English word
         | and a Hebrew word can be encoded as either the English embedded
         | in Hebrew or the Hebrew embedded in English: both would render
         | the same but collate differently.
         | 
         | Is there a better way?
        
         | MisterTea wrote:
         | Most of this is pretty useful for reproducing a wide gamut of
         | human language. It gets completely fucked when it comes to
         | fonts with PNGs embedded in SVGs and other INSANE matryoshka
         | doll nesting of bitmap/vector rendering technologies.
         | 
         | I also half hate emoji as it pollutes human writable text with
         | bitmaps that are difficult to reproduce by hand on paper with a
         | writing instrument - it's not text. I say half hate as it
         | allows us a standard set of icons to use that can be easily
         | rendered in-line with text or on their own.
        
       | zokier wrote:
       | I think that adopting ASCII as the general purpose text encoding
       | was one of the great mistakes of early computing. It originated
       | as control interface for teletypes and such, and that's arguably
       | where it should have remained. For storing and processing (plain)
       | text ASCII doesn't really fit that well, control characters are a
       | hindrance and the code space would have been useful for
       | additional characters. The ASCII set of printables was definitely
       | a compromise formed by the limited code space.
        
         | kstenerud wrote:
         | It's one of the greatest triumphs of early computing. Not only
         | did it harmonize text representation and transmission in a
         | backwards compatible manner; the fact that they deliberately
         | kept it 7 bit for so long also helped for developing a sane set
         | of other language character sets (ISO-8859), and paved the way
         | for a smooth transition to Unicode (UTF-8) - which is now the
         | dominant encoding worldwide.
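A UTF-8 encoder makes the backward-compatibility point concrete. Code points below 0x80 encode as the same single byte. Every byte of a multi-byte sequence has its high bit set, so ASCII bytes never appear inside an encoded non-ASCII character. Here is a minimal C sketch (hypothetical `utf8_encode`). It assumes the caller passes a Unicode code point and provides a 4-byte buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point as UTF-8 into out. Returns the number of bytes
   written, or 0 for surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4]) {
    if (cp < 0x80) {                       /* ASCII: identical single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```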
        
           | ddingus wrote:
           | Yes, seconded easily
        
         | hackit2 wrote:
         | Yeah, you were not around when a kb of memory took up half your
         | room. Looking back it doesn't make sense, but at the time a byte
         | was whatever you wanted it to be. Considering the number of
         | letters in the English alphabet is 26, it is reasonable for a
         | byte to be 5 bits, giving you a total of 32 possible states,
         | which leaves you with 6 values that could be used as control
         | characters. However, let's not forget there are 7,164 other
         | languages in the world, and they all have their own unique way
         | of doing things.
         | 
         | Oh yeah, let's not forget that at the time you had other
         | nationalistic countries/territories/people with their own
         | superior technology all vying for the top position, all while
         | trying to outdo each other. Then you also had manipulative
         | monopolies/trade embargoes and wars.
         | 
         | It isn't perfect but people aren't perfect.
        
         | ddingus wrote:
         | No way!
         | 
         | No amount of extra characters was going to address what Unicode
         | did.
         | 
         | ASCII was not a mistake at all. Adopting it unified what was
         | surely going to be a real mess.
         | 
         | At the time it made sense, and the control functions were
         | needed. Still are.
        
           | zokier wrote:
           | > At the time it made sense, and the control functions were
           | needed. Still are.
           | 
           | Control characters were needed for terminals. They never made
           | sense for text. Mixing the two matters is the problem.
        
             | ddingus wrote:
             | It isn't a problem. The text is the UX.
             | 
             | What else would you have proposed, or would propose?
        
       | augusto-moura wrote:
       | Useful tip, on linux (not sure about other *nixes) you can view
       | the ascii table by opening its manpage:
       |
       |     man ascii
       |
       | It's been useful to me more than once every year, mostly to know
       | about shell escape codes and when doing weird character ranges in
       | regex and C.
       | 
       | It can be a bit confusing, but the gist is that you have 2 chars
       | being shown in each line. I would prefer a view where you see the
       | same char with shift and/or ctrl flags, but you can only ask so
       | much
        
         | dailykoder wrote:
         | Damn, thanks!
         | 
         | Why the hell did I never try this? Maybe because typing ascii
         | table into my favorite search engine and clicking one of the
         | first links was fast enough
        
           | omnicognate wrote:
           | I used to do that until the experience became degraded
           | enough, reflecting the general state of the web, that I took
           | the time to look for a better way and found `man ascii`.
        
         | bell-cot wrote:
         | Similar in FreeBSD. It has octal, hex, decimal, and binary
         | ASCII tables, along with the full names of the control
         | characters.
        
         | INTPenis wrote:
         | The reason I know this is because in 2004 I was squatting in an
         | apartment with no TV and no internet. So each day after work I
         | would go home and just read manpages for fun.
         | 
         | Ended up learning ipfw through the firewall manpage on FreeBSD,
         | and using my skills to set up and manage an IPFW firewall at work.
         | 
         | It's amazing how much you get done with no TV and no internet.
         | Also played a lot of nethack.
        
           | w0m wrote:
           | I learned vim proper by reading :help on an eeepc while
           | flying back and forth over the Atlantic alone one year.
        
         | layer8 wrote:
         | Or even simpler use the _ascii_ command, when installed:
         | https://packages.debian.org/bookworm/ascii
        
         | bodyfour wrote:
         | > not sure about other *nixes
         | 
         | Should be available on any UNIX, it was added to V7 UNIX back
         | in the 1970s:
         | https://github.com/dspinellis/unix-history-repo/blob/Researc...
         | 
         | Even before that, it existed as a standalone text file
         | https://github.com/dspinellis/unix-history-repo/blob/8cf2a84...
         | This still exists on many systems -- for instance as
         | /usr/share/misc/ascii on MacOS
        
         | fitsumbelay wrote:
         | strange: on MacOS 14.5 I get output for `man ascii` but `ascii`
         | goes "command not found"
        
           | AnimalMuppet wrote:
           | On my Linux VM, it's the same, and it's because 'man ascii'
           | shows ascii(7), a section 7 page, not a section 1 (command)
           | page. It's not a man page for a program. It's just a man page.
        
         | irrational wrote:
         | Works on mac
        
       | snvzz wrote:
       | The ASCII table is defective; it is missing a dedicated code for
       | newline.
       | 
       | CR and LF aren't dedicated, and have precise cursor movement
       | meanings, rather than being a logical line ender.
       | 
       | There was a proposal in the 80s to reassign the (otherwise
       | useless) VT (vertical tab) character for that purpose.
       | Unfortunately unfruitful.
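Because ASCII has no single "newline" code, programs that read text from several sources often fold the CR/LF conventions into one. A minimal sketch in C, assuming LF is chosen as the logical line ender; `normalize_newlines` is a hypothetical helper, not a standard function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrites buf in place so that CR LF pairs
   (DOS/Windows) and lone CRs (classic Mac OS) each become a single
   LF (Unix). buf holds len bytes; the output is never longer than
   the input. Returns the new length. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[out++] = '\n';
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;  /* skip the LF half of a CR LF pair */
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

The CR is converted first and a directly following LF is then dropped, so a CR LF pair never turns into two line breaks.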
        
         | bregma wrote:
         | A separate control character was not needed to indicate where
         | your Hollerith string ended: it ended at the end of the
         | Hollerith string. If you wanted to render a Hollerith string
         | onto print media, you'd often want to feed the line and then
         | return the carriage before printing the next Hollerith string.
         | Of course, that wasn't strictly necessary if you were using a
         | line printer, which would just print the line and advance.
         | 
         | The filesystems I used had 5 kinds of file: random-access,
         | sequential, ISAM, Fortran-carriage-control and carriage-return-
         | carriage-control. The only people who used the latter were the
         | eggheads that used that new-fangled C programming language
         | brought over from Bell Labs' experimental Unix system.
         |
         | You're probably just looking for the record separator (octal
         | 036). If you are storing multiple text records in a block of memory,
         | that would be the ideal ASCII code to separate them.
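A sketch of the record-separator approach in C, assuming several text records are packed into one buffer and delimited by RS (octal 036, hex 0x1E); the helper names `record_len` and `count_records` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ASCII_RS '\036'  /* Record Separator: octal 036, decimal 30 */

/* Length of the record at the start of buf: the number of bytes
   before the next RS, or len if no RS remains. */
size_t record_len(const char *buf, size_t len)
{
    const char *rs = len ? memchr(buf, ASCII_RS, len) : NULL;
    return rs ? (size_t)(rs - buf) : len;
}

/* Number of RS-separated records in buf[0..len). A trailing RS yields
   an empty final record; an empty buffer counts as one empty record. */
size_t count_records(const char *buf, size_t len)
{
    size_t n = 0, pos = 0;
    while (pos <= len) {
        pos += record_len(buf + pos, len - pos) + 1;  /* record plus its RS */
        n++;
    }
    return n;
}
```

Writing the separator as the octal escape `\036` also avoids a C pitfall: a hex escape such as `\x1e` would swallow a following `b` or `e` as extra hex digits.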
        
         | Gormo wrote:
         | > Unfortunately unfruitful.
         | 
         |  _Fortunately_ unfruitful, since if it had gained adoption,
         | there'd be a mix of _three_ different line endings (and
         | combinations thereof) in widespread use, instead of two.
        
       | 1vuio0pswjnm7 wrote:
       | Mentioned in footnote 7:
       | 
       | https://ia601808.us.archive.org/2/items/mackenzie-coded-char...
        
       | red_admiral wrote:
       | The "16 rows x 8 columns" version, with the lowercase letters
       | added, seems the most elegant one to me because it makes the
       | internal structure of the thing visible. For example, to
       | lowercase a letter, you set bit 6; a decimal digit is the prefix
       | 011 followed by the binary encoding of the digit etc.
       | 
       | It also makes clear why ESC can be entered as `^[` or ENTER
       | (technically CR) as `^M` on some terminals (still works in my
       | xterm), because the effect of the control key is to unset bits 6
       | and 7 in the original set-up.
       | 
       | Of course you can color in the fields too, if you want.
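The bit patterns in the 16x8 layout can be checked directly in C. A minimal sketch with hypothetical helper names; the code names bits by hex value (0x20, 0x40) rather than by position, which sidesteps the "bit 5 or bit 6?" ambiguity. Terminals differ in what Ctrl does to non-letter keys, so `ascii_ctrl` models only the classic rule:

```c
#include <assert.h>

#define CASE_BIT 0x20  /* 'A' is 0x41, 'a' is 0x61: they differ only here */

/* Lowercase an ASCII letter by setting the 0x20 bit; other bytes pass through. */
char ascii_to_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c | CASE_BIT) : c;
}

/* '0'..'9' are 0x30..0x39 (binary 011 0000 .. 011 1001), so the low
   four bits are the digit's value. Returns -1 for non-digits. */
int ascii_digit_value(char c)
{
    return (c >= '0' && c <= '9') ? (c & 0x0F) : -1;
}

/* Classic Ctrl behaviour: keep only the low five bits, clearing both
   0x40 and 0x20, so '[' (0x5B) -> ESC (0x1B) and 'M' or 'm' -> CR (0x0D). */
char ascii_ctrl(char c)
{
    return (char)(c & 0x1F);
}
```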
        
       | johanneskanybal wrote:
       | Kind of hard to read something where the author treats every
       | non-English language as no more worthy than emoji. It was good
       | in the '50s, but it stayed important four or five decades too long.
        
       | georgehotelling wrote:
       | Dark grey #303030 text on slightly darker grey #1B1C21 background
       | is really hard to read. Maybe I'm just getting old, but I also
       | assume the audience for a blog post about the ASCII table was
       | born in a year that starts with 19.
        
         | Retr0id wrote:
         | The background is white on my machine, are you using some kind
         | of extension to force "dark mode"?
        
           | georgehotelling wrote:
           | I'm using pi-hole, uBlock Origin, and Privacy Badger on
           | Firefox. I checked my network tab before complaining and
           | didn't see any resources that failed to load.
        
       | bloak wrote:
       | Vaguely related: Apart from £ and €, a typical GB keyboard has a
       | couple of non-ASCII characters printed on it: ¬ and ¦. The key
       | labelled ¦ is usually mapped to |, but the key labelled ¬ often
       | gives you an actual ¬, though I can't remember many occasions on
       | which I've wanted one of them. Apparently the characters ¬ and ¦
       | are in EBCDIC.
        
       | PaulHoule wrote:
       | Beats EBCDIC
       | 
       | https://en.wikipedia.org/wiki/EBCDIC
       | 
       | On the 4th floor of my building the computer systems lab has a
       | glass front that has what looks like a punch card etched in
       | frosted glass but if you look closer it was made by sticking
       | stickers on the glass.
       | 
       | I made a "punchcard decoder" on a 4x6 card to help people decode
       | the message on the wall
       | 
       | https://mastodon.social/@UP8/112836035703067309
       | 
       | The EBCDIC code was designed to be compatible with this encoding
       | which has all sorts of weird features, for instance the "/" right
       | between "R" and "Z"; letters don't form a consecutive block so
       | testing to see if a char is a letter is more complex than in
       | ASCII.
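To make the "more complex than in ASCII" point concrete, here is a sketch of an uppercase-letter test in both encodings. The EBCDIC ranges are assumed from the common US code page 037 (A-I at 0xC1-0xC9, J-R at 0xD1-0xD9, S-Z at 0xE2-0xE9), and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* ASCII: one contiguous range, 'A' (0x41) to 'Z' (0x5A). */
bool ascii_is_upper(unsigned char c)
{
    return c >= 0x41 && c <= 0x5A;
}

/* EBCDIC (code page 037): three ranges that mirror the punched-card
   zones, with non-letter codes in the gaps between them. */
bool ebcdic_is_upper(unsigned char c)
{
    return (c >= 0xC1 && c <= 0xC9)    /* A-I */
        || (c >= 0xD1 && c <= 0xD9)    /* J-R */
        || (c >= 0xE2 && c <= 0xE9);   /* S-Z */
}
```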
       | 
       | I am thinking of redoing that card to put the alphabet in order.
       | A column in a punched card has between 0 and 3 punches: 0 is a
       | space; 1 is a digit or a symbol in the first column; if one of
       | the rows at the top is punched, you combine that with the number
       | of the other punched row on the left 3x9 grid. If three holes are
       | punched, one of them is an 8 (unless you've got one of the
       | extended charsets) and you have one of the symbols in the right
       | 3x6. Note the ¬ and ¢, which are not in ASCII but are in Latin-1.
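The zone-plus-digit rule for letters can be sketched as a decoder for one- and two-punch columns. This assumes the IBM 029 assignments for these codes (12 alone is &, 11 alone is -); three-punch symbol codes are left out, and `hollerith_decode` and `NO_PUNCH` are hypothetical names:

```c
#include <assert.h>

#define NO_PUNCH (-1)

/* zone:  12, 11, 0, or NO_PUNCH (the rows at the top of the card)
   digit: 1..9 or NO_PUNCH (a lone 0 punch is passed as zone 0)
   Returns '?' for combinations this sketch does not cover. */
char hollerith_decode(int zone, int digit)
{
    if (zone == NO_PUNCH) {
        if (digit == NO_PUNCH)
            return ' ';                          /* blank column */
        return (digit >= 1 && digit <= 9) ? (char)('0' + digit) : '?';
    }
    if (digit == NO_PUNCH) {
        switch (zone) {
        case 12: return '&';
        case 11: return '-';
        case 0:  return '0';
        }
        return '?';
    }
    if (digit < 1 || digit > 9)
        return '?';
    switch (zone) {
    case 12: return (char)('A' + digit - 1);     /* 12-1..12-9 -> A..I */
    case 11: return (char)('J' + digit - 1);     /* 11-1..11-9 -> J..R */
    case 0:  return digit == 1 ? '/'             /* 0-1 -> / just before S */
                               : (char)('S' + digit - 2);  /* 0-2..0-9 -> S..Z */
    }
    return '?';
}
```

The 0 zone has only eight letters (S-Z), which is why `/` occupies the 0-1 slot and why the alphabet ends up split into three runs of codes.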
        
       | senkora wrote:
       | Fun fact: sorting ASCII numerically puts all the uppercase
       | letters first, followed by all the lowercase letters (ABC...
       | abc...). A more typical dictionary ordering would be more like
       | AaBbCc... (or to even consider A and a at the same sort level and
       | only use them to break ties if the words are otherwise
       | identical).
       | 
       | The order used by ASCII is sometimes called "ASCIIbetical", which
       | I think is wonderful.
       | 
       | https://en.wiktionary.org/wiki/ASCIIbetical
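A sketch of the two orderings in C, using `qsort` with two comparators: plain `strcmp` gives ASCIIbetical order, while the hypothetical `cmp_dict` compares case-folded letters first and only falls back to byte order to break ties. It handles ASCII letters only; locale-aware collation (e.g. `strcoll`) is more involved:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of strings: raw byte ("ASCIIbetical") order. */
int cmp_ascii(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Compare case-insensitively first; if the words differ only in case,
   break the tie with byte order, which puts the uppercase form first. */
int cmp_dict(const void *a, const void *b)
{
    const char *s = *(const char *const *)a;
    const char *t = *(const char *const *)b;
    for (size_t i = 0; s[i] != '\0' || t[i] != '\0'; i++) {
        int x = tolower((unsigned char)s[i]);
        int y = tolower((unsigned char)t[i]);
        if (x != y)
            return x - y;
    }
    return strcmp(s, t);
}
```

Sorting `{"banana", "Apple", "apple", "Cherry"}` with `cmp_ascii` gives Apple, Cherry, apple, banana; with `cmp_dict` it gives Apple, apple, banana, Cherry.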
        
         | NoMoreNicksLeft wrote:
         | I thought the point of that was that a single bitflip makes an
         | uppercase lower, or vice versa...
        
           | Dylan16807 wrote:
           | It can't be "the" point, because AaBbCc would also let you
           | use a single bit to control case, the bottom bit.
        
       | DonHopkins wrote:
       | The Apple ][ and TTYs and other old computers had "bit pairing
       | keyboards", where the punctuation marks above the digits were
       | aligned with the ASCII values of the corresponding digits,
       | different by one bit.
       |
       |     Typewriter: !@#$%^&*()
       |     Apple:      !"#$%&'()
       |     Digits:     1234567890
       | 
       | https://en.wikipedia.org/wiki/Bit-paired_keyboard
       | 
       | >A bit-paired keyboard is a keyboard where the layout of shifted
       | keys corresponds to columns in the ASCII (1963) table,
       | archetypally the Teletype Model 33 (1963) keyboard. This was
       | later contrasted with a typewriter-paired keyboard, where the
       | layout of shifted keys corresponds to electric typewriter
       | layouts, notably the IBM Selectric (1961). The difference is most
       | visible in the digits row (top row): compared with mechanical
       | typewriters, bit-paired keyboards remove the _ character from 6
       | and shift the remaining &'() from 7890 to 6789, while
       | typewriter-paired keyboards replace 3 characters: Shift+2 from
       | " to @, Shift+6 from _ to ^, and Shift+8 from ' to *. An
       | important subtlety is that ASCII was based on mechanical
       | typewriters, but electric typewriters became popular during the
       | same period that ASCII was adopted, and made their own changes to
       | layout.[1] Thus differences between bit-paired and (electric)
       | typewriter-paired keyboards are due to the differences of both of
       | these from earlier mechanical typewriters.
       | 
       | >[...] Bit-paired keyboard layouts survive today only in the
       | standard Japanese keyboard layout, which has all shifted values
       | of digits in the bit-paired layout.
       | 
       | >[...] For this reason, among others (such as ease of collation),
       | the ASCII standard strove to organize the code points so that
       | shifting could be implemented by simply toggling a bit. This is
       | most conspicuous in uppercase and lowercase characters: uppercase
       | characters are in columns 4 (100) and 5 (101), while the
       | corresponding lowercase characters are in columns 6 (110) and 7
       | (111), requiring only toggling the 6th bit (2nd high bit) to
       | switch case; as there are only 26 letters, the remaining 6 points
       | in each column were occupied by symbols or, in one case, a
       | control character (DEL, in 127).
       | 
       | >[...] In the US, bit-paired keyboards continued to be used into
       | the 1970s, including on electronic keyboards like the HP 2640
       | terminal (1975) and the first model Apple II computer (1977).
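The bit-pairing idea can be modeled as a function on ASCII codes. A minimal sketch, assuming a bit-paired layout like the one quoted above: Shift on the digits 1-9 flips the 0x10 bit (0x31 '1' becomes 0x21 '!'), and, on keyboards that produce lowercase at all, Shift on a letter flips the 0x20 bit. Other paired keys (such as ; and +) are left out, and `bitpaired_shift` is a hypothetical name:

```c
#include <assert.h>

/* Hypothetical model of Shift on a bit-paired keyboard.
   Digits 1-9 (0x31-0x39) pair with ! " # $ % & ' ( ) (0x21-0x29),
   differing only in the 0x10 bit; lowercase letters pair with
   uppercase ones via the 0x20 bit. Other keys are returned unchanged. */
char bitpaired_shift(char c)
{
    if (c >= '1' && c <= '9')
        return (char)(c ^ 0x10);
    if (c >= 'a' && c <= 'z')
        return (char)(c ^ 0x20);
    return c;
}
```

Shift+0 has no partner here: flipping 0x10 on '0' (0x30) would give space (0x20), which matches the empty slot above 0 in the Apple row shown earlier.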
        
       | aronhegedus wrote:
       | Was a really fun article to read/podcast to listen to.
       | 
       | Favorite fact is that 127 is DEL because, on punched tape, punching
       | all seven holes (binary 1111111) wipes out whatever character was
       | there. I love those little nuggets of history
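A small sketch of why 127 works as a universal eraser, assuming 7-bit tape where each punched hole is a 1 bit and a hole, once punched, cannot be removed. Overpunching therefore ORs bits together, and only 0x7F absorbs every possible code; `overpunch` is a hypothetical name:

```c
#include <assert.h>

#define ASCII_DEL 0x7F  /* 127 = binary 1111111: every data hole punched */

/* Punching more holes over an existing character ORs the bits
   together; holes can be added but never taken away. */
unsigned char overpunch(unsigned char on_tape, unsigned char punched)
{
    return (unsigned char)(on_tape | punched);
}
```

Overpunching with any other code just produces a different character: 'A' (0x41) over 'B' (0x42) reads back as 'C' (0x43).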
        
       | pixelbeat__ wrote:
       | I wrote about ASCII and UTF-8 elegance at:
       | 
       | https://www.pixelbeat.org/docs/utf8_programming.html
        
       ___________________________________________________________________
       (page generated 2024-07-23 23:10 UTC)