[HN Gopher] The Elegance of the ASCII Table
___________________________________________________________________
The Elegance of the ASCII Table
Author : thewub
Score : 256 points
Date : 2024-07-22 22:31 UTC (1 days ago)
(HTM) web link (danq.me)
(TXT) w3m dump (danq.me)
| lucasoshiro wrote:
| Once I saw a case-insensitive switch in C using that pattern of
| letters:
|
|     switch (my_char | 0x20) {
|         case 'a': ... break;
|         case 'b': ... break;
|     }
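A minimal sketch of the trick above, assuming an ASCII execution character set. The names `is_ascii_letter` and `dispatch`, and the return codes, are made up for illustration. Setting bit 5 (0x20) maps 'A'..'Z' onto 'a'..'z', but it also changes some non-letters ('@' becomes '`'), so a range check is still needed wherever arbitrary bytes can arrive.

```c
#include <stdbool.h>

/* True if c is an ASCII letter: fold to lower case by setting
   bit 5 (0x20), then do a single range check. */
bool is_ascii_letter(unsigned char c)
{
    unsigned char folded = c | 0x20;
    return folded >= 'a' && folded <= 'z';
}

/* Hypothetical command dispatcher: 1 for a/A, 2 for b/B, 0 otherwise.
   Only 0x41 and 0x61 can produce 'a' after OR-ing in 0x20, so the
   case labels cannot match a non-letter by accident. */
int dispatch(unsigned char c)
{
    switch (c | 0x20) {
    case 'a': return 1;
    case 'b': return 2;
    default:  return 0;
    }
}
```

For example, `dispatch('A')` and `dispatch('a')` both return 1, while `is_ascii_letter('@')` is false even though `'@' | 0x20` is a printable character.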
| mananaysiempre wrote:
| This can be made to work for ASCII and EBCDIC simultaneously
| for extra esoterica points:
|
|     switch (my_char | 'A' ^ 'a') {
|         case 'A' | 'a': /* ... */ break;
|         /* ... */
|     }
|
| I don't know if this is too fancy to have ever made it into
| real code, but I believe I've seen places in the ICU source
| that still say ('A' <= x <= 'I' || 'J' <= x <= 'R' || 'S' <= x
| <= 'Z') instead of just ('A' <= x <= 'Z'), EBCDIC letters being
| arranged in those three contiguous ranges.
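The chained comparisons quoted above are shorthand; in C, `'A' <= x <= 'I'` parses as `('A' <= x) <= 'I'`, so real code has to spell out both bounds. A sketch of the three-range check, with `is_upper_portable` as a made-up name (this is not taken from the ICU source):

```c
#include <stdbool.h>

/* True if x is an upper-case letter under either ASCII or EBCDIC.
   Character constants take their values from the execution character
   set, so the same source adapts to both encodings. Under ASCII the
   three ranges happen to be adjacent and cover 'A'..'Z' exactly. */
bool is_upper_portable(int x)
{
    return ('A' <= x && x <= 'I')
        || ('J' <= x && x <= 'R')
        || ('S' <= x && x <= 'Z');
}
```

The C standard only guarantees that the digits '0'..'9' are contiguous, which is why a single `'A' <= x && x <= 'Z'` test is not strictly portable.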
| Sharlin wrote:
| Yes, that's very intentional and just masking (or setting) the
| bit is the intended way to do case-insensitive comparison of
| the letter range in ASCII (eg. stricmp in C), or to transform
| text to lower or upper case (tolower, toupper).
|
| But what's more, ever wondered whence the _control_ (Ctrl) key
| presses like Ctrl-H to backspace, or Ctrl-M for carriage
| return? Well, inspecting the ASCII chart it becomes evident:
| the Ctrl key simply masks bit 6 (0x40), turning a letter into
| its respective _control_ character!
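As a rough illustration of the Ctrl mapping, assuming ASCII: `ctrl` below is a hypothetical helper, and masking with 0x1F (clearing bit 5 as well as bit 6) is an assumption chosen so that lower-case letters give the same result as upper-case ones.

```c
/* Control character for Ctrl plus a key from the '@'..'_' column
   (0x40..0x5F): keep only the low five bits. Clearing bit 5 as well
   means 'h' and 'H' both map to backspace. */
unsigned char ctrl(unsigned char key)
{
    return key & 0x1F;
}
```

With this, `ctrl('H')` is 0x08 (BS), `ctrl('M')` is 0x0D (CR), `ctrl('[')` is 0x1B (ESC) and `ctrl('@')` is 0x00 (NUL).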
| lucasoshiro wrote:
| Nice!
|
| I'm an emacs user, and when I use a readline-based REPL I use
| ctrl-M a lot. I thought it was inherited from the emacs
| keybindings, like many other shortcuts from GNU readline
| jerf wrote:
| Then an additional useful command: In the out-of-the-box
| emacs bindings, C-q is the "quoted insert" command. It will
| take the next character and directly insert it into the
| buffer. This is useful for things like tab or control
| characters where emacs would normally use the keystroke to
| do something else. I've been working in an email-related
| space lately so I've been doing a good amount of C-q C-m
| for inserting literal CRs, and C-q TAB for a few places
| where I want a literal tab in the source, in a buffer that
| interprets a normal TAB as a command to indentify the
| current row. I mention this because you can use the ASCII
| table to work out how to insert a particular control
| character with your keyboard literally, if you need to
| insert one of the handful of other characters you may be
| interested in every so often, like C-l for "form feed" (now
| used for "page feed" in some older printer-related
| contexts) or C-@ for NUL if you're doing something weird
| with binary files in a "text" buffer.
| flohofwoe wrote:
| ...it's a bit of a shame that the same upper/lowercase trick
| doesn't apply to all UNICODE codepoints (at least those that
| have upper/lower variants).
|
| It seems to work for codepoints up to U+00FF, for instance:
| - Å (U+00C5) vs å (U+00E5)
|
| ...but above 0xFF lowercase follows uppercase:
| - Ă (U+0102) vs ă (U+0103)
|
| Typical for UNICODE though, nothing makes sense ;)
| Findecanor wrote:
| That's because U+00A0-U+00FF are encoding an earlier
| character set: "ISO Latin-1" (ISO 8859-1), itself based on
| DEC's "Multinational Character Set". The upper/lowercase
| trick does not apply to ß/ÿ but does in MCS where Ÿ/ÿ are
| at a different pair of code points.
|
| ISO Latin-1 was the character set on many Unix systems,
| Amiga OS, MS-Windows (as "Windows-1252" with extra chars),
| and was for many years the default character set on the
| web.
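To make the two layouts concrete, here is a sketch working on raw code points. Both function names are made up, and the ranges are deliberately narrow: they cover only the Latin-1 letters and the first, regularly alternating stretch of Latin Extended-A. It is not a general Unicode case mapping.

```c
#include <stdint.h>

/* Latin-1 letters U+00C0..U+00DE: lower case is the same code point
   with bit 5 (0x20) set, as in ASCII. U+00D7 (multiplication sign)
   sits in the middle of the range and is not a letter; U+00DF (sharp
   s) lies outside the range and is returned unchanged. */
uint32_t latin1_to_lower(uint32_t cp)
{
    if (cp >= 0x00C0 && cp <= 0x00DE && cp != 0x00D7)
        return cp | 0x20;
    return cp;
}

/* Latin Extended-A, U+0100..U+012F: pairs are adjacent, upper case on
   even code points and lower case on the following odd one. */
uint32_t latin_ext_a_to_lower(uint32_t cp)
{
    if (cp >= 0x0100 && cp <= 0x012F && (cp & 1) == 0)
        return cp + 1;
    return cp;
}
```

So `latin1_to_lower(0x00C5)` gives 0x00E5 and `latin_ext_a_to_lower(0x0102)` gives 0x0103, matching the two examples in the thread.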
| dwheeler wrote:
| The encodings we use today have a surprisingly deep and complex
| history. For more, see: "The Evolution of Character Codes,
| 1874-1968"
| https://ia800606.us.archive.org/17/items/enf-ascii/ascii.pdf
| rrwo wrote:
| Thanks for posting that.
|
| People tend to overlook that the technologies we use today have
| a much older history.
| jiveturkey wrote:
| ebcdic is also quite elegant
|
| https://news.ycombinator.com/item?id=13543715
| gerdesj wrote:
| It's shit if you don't routinely speak or write English. On
| those grounds, I'll decry it as not only shit but purposely
| shit.
|
| OK a bit over the top ... the designers of EBCDIC had a rather
| tight set of constraints to deal with, none of which included:
| "be inclusive". Again, if I really had to be charitable (I
| looked after a System/36, back in the day), the hardware was
| rather shit too, sorry ... constrained. Yes constrained. Why
| should six inch fans fire up reliably after a few years of use
| and not need a poke after an IPL? No real dust snags and I
| carefully sprayed some WD40 on the one that I could get at. I
| have modern Dells and HPs in horrid environments that do better
| with shitty plastic fans.
|
| EBCDIC is not elegant at all unless excluding non-English
| characters in an encoding system is your idea of elegant.
|
| According to this: https://en.wikipedia.org/wiki/EBCDIC it
| expended loads of effort with dealing with control eg: "SM/SW"
| instead of language.
|
| ASCII and EBCDIC and that basically say: fuck you foreigners!
|
| We now have hardware that is apparently capable of messianic
| feats. Let's do the entirety of humanity some justice and
| really do something elegant. It won't involve EBCDIC.
| theamk wrote:
| It's really not.
|
| In base-2 machines, the letters are mixed with punctuation,
| which is pretty horrible design which makes simple things
| complex, and does not actually bring anything new to the table.
|
| In BCD machines it is slightly better, except letters aren't
| contiguous either - row 0 is bad, but it's the extra space
| between R and S which is really ugly. And it's unusable with
| BCD operations anyway, as high nibble values are used
| extensively.
|
| Naive sorting simply does not work... lowercase before
| uppercase, punctuation in the middle of the alphabet, numbers
| after letters.
|
| I see no elegance there, it's like the worst example of legacy
| code.
| jiveturkey wrote:
| It was designed for a specific purpose ... elegance in
| context
| theamk wrote:
| Which would that "specific purpose" be? Even punch cards
| have alphabet interrupted between R and S.
|
| And the whole punch card -> 8-bit is pretty illogical, just
| like the cards themselves. How come no punches in zone
| don't correspond to 0 high bits?
|
| (and don't get me started on punch card.. it started with
| "let's do 1 hole per column for digits" - OK, makes sense;
| then "let's do 2 hole/column for uppercase" - I guess OK
| but why did you put extra char in the middle... but then
| it's 4 holes/column for superscripts? 3-6 holes/col for
| punctuation? If someone were to design punch cards today
| but using same requirements, they could easily come up with
| a much more logical schema)
| Dwedit wrote:
| Many old NES/SNES games had a simpler character encoding system,
| with 0-9 and A-Z at the beginning of the table. No conversion
| required to display hex.
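A small sketch of the difference, assuming a game encoding where codes 0..9 are the digits and 10..35 are A-Z. That layout is an assumption based on the comment above, not any specific game's table, and both function names are invented.

```c
/* ASCII: digits and letters are separate runs, so turning a nibble
   into a character needs a branch (or a lookup table). */
char hex_digit_ascii(unsigned nibble)
{
    nibble &= 0xF;
    return (char)(nibble < 10 ? '0' + nibble : 'A' + (nibble - 10));
}

/* Hypothetical tile encoding with 0-9 at codes 0..9 and A-Z at
   10..35: the nibble value already is the character code. */
unsigned char hex_digit_tile(unsigned nibble)
{
    return (unsigned char)(nibble & 0xF);
}
```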
| ggm wrote:
| man ascii
|
| is never far from my fingers. combined with od -c and od -x it
| gets the job done. I don't think as fluently in Octal as I used
| to. Hex has become ubiquitous.
| fsckboy wrote:
| you mean ? ascii
| ggm wrote:
| No I don't - I live in a different universe to you:
|
|     % (uname; cd /usr/ports; ls -d */ascii)
|     FreeBSD
|     zsh: no matches found: */ascii
|     % which ascii
|     ascii not found
|     %
|
| It's the same on OSX and debian by default doesn't install
| that command. If you live inside a POSIX/IEEE 1003 system and
| want to know the ascii table reliably then the command I run
| is the one which works. If your distribution doesn't ship
| manuals by default you have bigger problems.
| amszmidt wrote:
| "man ascii" has as much guarantee to work on a POSIX system
| as a command called "ascii" seeing neither (specifically a
| man page called "ascii") are part of the standard.
|
| So you will either get command not found, or man page not
| found.
| bandie91 wrote:
| man 7 ascii
| transfire wrote:
| One downside of ASCII is the lack of two extra "letters"
| (whatever they might be, e.g. perhaps German ß), as it makes it
| impossible to represent base 64 alphanumerically. So we ended up
| with many alternatives picking two arbitrary punctuation marks.
| jolmg wrote:
| > So when you're reading 7-bit ASCII, if it starts with 00, it's
| a non-printing character. Otherwise it's a printing character.
|
| > The first printing character is space; it's an invisible
| character, but it's still one that has meaning to humans, so it's
| not a control character (this sounds obvious today, but it was
| actually the source of some semantic argument when the ASCII
| standard was first being discussed).
|
| Hmm.. Interesting that space is considered a printing character
| while horizontal tab and newline are control characters. They're
| all invisible and move the cursor, but I guess it makes sense.
| Space is uniquely very specific in how the cursor is moved one
| character space, so it's like an invisible character. Newline can
| either imply movement straight down, or down and to the left,
| depending on a configuration or platform (e.g. DOS vs UNIX line
| endings). Horizontal tab can also move you a configurable amount
| rightwards, and perhaps it might've been thought a bit
| differently, given there's also a vertical tab, which I've got no
| idea on how it was used. Maybe it's the newline-equivalent for
| tables, e.g. "id\tcolor\v1\tred\v2\tblue\v" or something like
| that.
|
| Interesting also that BS is a control char while DEL is a
| printing(?) char. I guess that's because BS implies just movement
| leftwards over the text, while DEL is all ones like running a
| black sharpie through text. Guess that's what makes it printing.
| Wonder if there were DEL keys on typewriters that just stamped a
| black square, and on keypunchers that just punched 7 holes, so
| people would press "backspace" to go back then "delete" to
| overwrite.
|
| I've used ASCII a lot, but even after so many years, I'm getting
| moments where it's like "oh this piece isn't just here, it
| _needs_ to be here for a deep reason". It's like a jigsaw
| puzzle.
| pwg wrote:
| You also have to keep in mind the "interface" for 1962-1968.
| The printer teletype machine.
|
| The "control codes" were to "control" the printhead. So
| "carriage return" meant move the "print carriage" back to the
| left margin. "New line" meant move the paper platen one line
| height of rotation to move the paper to the next line. In that
| context, "back space" was "move print head one space left"
| (rather more like a "reverse space"). The article does mention
| that there was some debate about whether space should be
| considered "printable", but if you consider a mechanical
| printer, as the head is moving to the right and banging out
| characters onto the paper, the spaces between words do, sort
| of, look like "printables" (of a sort, a "print nothing"
| character as it were).
|
| Tab's being control characters then make a bit more sense, in
| that they cause the printhead to jump some fixed distance to
| the right.
|
| The article stated why DEL is where it is (all ones) -- so that
| for punched paper tape, one could get a punch-out of every
| position, which was then interpreted as "nothing here" by the
| tape reading machine.
|
| As for typewriters, no, none had a "black box" blot out key.
| Correction (for typewriters without built in correction tape)
| was one of: retype the page, apply an eraser (and hopefully not
| damage the paper surface too much) then retype character and
| continue, or apply correction fluid (white-out) and retype
| character and continue.
|
| For those typewriters with built in correction tape options (at
| least some IBM Selectric models, possibly more) the typewriter
| would retype the character using the "white-out" ribbon, then
| retype the replacement character using the normal "typewriting"
| ribbon.
| EvanAnderson wrote:
| > The article stated why DEL is where it is (all ones) -- so
| that for punched paper tape, one could get a punch-out of
| every position...
|
| I saw an analogous use of backspace on some OS I ran into 30
| years ago cruising around either Tymnet or TELENET. (I wish I
| could remember the OS...)
|
| The password prompt assumed local echo. After entering a
| password the host would send a series of backspaces and
| various patterns of characters (####, **, etc) to overprint
| the locally-echoed (and printed) characters.
| kmoser wrote:
| On the login to the first timesharing system I used, it
| would prompt for your password, then type eight M's, W's,
| and X's on top of each other (on paper, of course, since
| this was using a Teletype terminal), so when you actually
| typed your password the characters would be printed on top
| of those already obscured lines.
| rob74 wrote:
| > _For those typewriters with built in correction tape
| options (at least some IBM Selectric models, possibly more)
| the typewriter would retype the character using the "white-
| out" ribbon_
|
| there was also a solution for cheaper typewriters: small
| sheets of "white-out" paper (known under the genericized
| brand name "Tipp-Ex" here in Germany) that you could hold
| between the ink ribbon and the paper to "overwrite" a typo.
| tivert wrote:
| > Tab's being control characters then make a bit more sense,
| in that they cause the printhead to jump some fixed distance
| to the right.
|
| Isn't that incorrect? Tab doesn't jump a "_fixed_ distance to
| the right," it jumps a _variable_ distance to the next tab-stop
| to the right.
| bandie91 wrote:
| yeah, he must have meant that it jumps to a fixed position
| kragen wrote:
| del is not a printing character. it's a control character. if
| you run a paper tape full of del characters through a teletype
| it does not print anything. it has to have that bit pattern,
| even though it greatly complicates the mechanics of the
| teletype (which has to do all the digital logic with cams and
| levers) because that way it can be punched over any character
| on the paper tape to delete it
|
| a figure caption in this page says 'This is a historical
| throwback to paper tape, where the keyboard would punch some
| permutation of seven holes to represent the ones and zeros of
| each character. You can't delete holes once they've been
| punched, so the only way to mark a character as invalid was to
| rewind the tape and punch out all the holes in that position:
| i.e. all 1s.' which is mostly correct, except that it wasn't a
| _historical throwback_ ; paper tape was perhaps the most
| important medium for ascii not just in 01963 and 01967 but
| probably in 01973, maybe even in 01977. teletype owners today
| are still using paper tape that was manufactured during the
| vietnam war, where it was used in unprecedented volume for
| routing teletype messages by hand
|
| the dominant early pc operating system, cp/m (if it's not
| overly grandiose to call it an 'operating system') had system
| calls for reading and writing the console, the disk, and the
| paper tape punch and reader. when i hooked up a modem to my
| cp/m system to call bbses, i hooked it up as the punch and
| reader
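The paper-tape reasoning for DEL can be modelled in a couple of lines. This is only a sketch: `overpunch` is a made-up name, and it treats a tape frame as a 7-bit value where punching can add holes but never remove them.

```c
#define ASCII_DEL 0x7F

/* Punching over an already-punched 7-bit frame can only add holes,
   which is a bitwise OR of the old and new patterns. */
unsigned char overpunch(unsigned char existing, unsigned char punched)
{
    return (existing | punched) & 0x7F;
}
```

Whatever was there before, `overpunch(x, ASCII_DEL)` is always 0x7F, so DEL is the one code that can be punched over any character. Overpunching with anything else just produces a different character: `overpunch('A', 'B')` is 'C'.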
| jart wrote:
| > so the only way to mark a character as invalid was to
| rewind the tape and punch out all the holes in that position
|
| So that's why \177 (DEL) is the loneliest control character.
| Wow. Thank you!
| kragen wrote:
| happy to help
| 91bananas wrote:
| just... this is why this forum exists. thank you
| kragen wrote:
| you're welcome. i'll try to remember your comment the next
| time someone replies to me with something like
| https://news.ycombinator.com/item?id=40993821 or
| https://news.ycombinator.com/item?id=40993328 or
| https://news.ycombinator.com/item?id=40992456
| kazinator wrote:
| Space doesn't just move the cursor on a display; it will
| obliterate a character cell with a space glyph.
|
| When a display terminal has nondestructive backspace (backspace
| character doesn't erase), it can be software emulated with BS-
| SPACE-BS.
|
| At your Linux terminal, you can do "stty echoe" (echo erase) to
| turn this on (affecting the echoing of backspace characters
| that are input, not all backspace characters).
|
| Dial-up BBSes had this as a configurable setting also.
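The BS-SPACE-BS emulation can be sketched as a helper that builds the byte sequence to send. `build_erase_sequence` is a hypothetical name, and the sketch assumes a terminal where backspace only moves the cursor left and a space overwrites the cell under it.

```c
#include <stddef.h>

/* Fill out with n copies of BS, SPACE, BS and NUL-terminate it.
   Each triple steps back one cell, blanks it, and steps back again.
   Returns the number of bytes written (excluding the NUL), or 0 if
   the buffer is too small. */
size_t build_erase_sequence(char *out, size_t cap, size_t n)
{
    size_t needed = 3 * n;
    if (cap < needed + 1)
        return 0;
    for (size_t i = 0; i < n; i++) {
        out[3 * i]     = '\b';
        out[3 * i + 1] = ' ';
        out[3 * i + 2] = '\b';
    }
    out[needed] = '\0';
    return needed;
}
```

Writing the resulting string to the terminal erases the last `n` cells and leaves the cursor where the first erased character was.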
| california-og wrote:
| While DEL didn't stamp a black square on typewriters, it
| sometimes did so (or something similar, like diagonal stripes)
| in various digital character sets. ISO 2047[0] established the
| graphical representations for the control characters of the
| 7-bit coded character set in 1975, mainly for debugging reasons.
| This graphical representation for DEL was used by Apple IIGS,
| TRS-80 and even Amiga!
|
| [0]: https://en.m.wikipedia.org/wiki/ISO_2047
| nikau wrote:
| Logically space maps to a character people use with pen and
| paper unlike tab
| layer8 wrote:
| Space is what is represented in the output, i.e. in one cell of
| the terminal grid, whereas control characters like Tab and
| CR/LF don't map onto such an output representation. If you want
| to represent the printed contents of each "grid cell" of a
| printout or a textmode screen buffer, you don't need the
| control characters, only the printable characters. The
| printable characters are what you'd need in a screen font.
| BobbyTables2 wrote:
| I always lament that since at least the 1980s or so, it seems the
| vast majority of the control characters were never used for their
| intended purpose.
|
| Instead, we crudely use commas and tabs as delimiters instead of
| something like RS (#30).
| thaumasiotes wrote:
| That's because the intended purpose is either useless (for
| machine control characters) or useless and logically impossible
| (for delimiters).
|
| What do you do if you have a record that includes a record
| separator character? Given that you have this problem anyway,
| why do you want a character dedicated to achieving the same
| thing that a comma achieves?
| penteract wrote:
| The record separator isn't on people's keyboards, so it's
| less likely to show up where it's not expected. Also it's
| less likely to legitimately occur in something like a name,
| so there are many users of CSVs who can say they will never
| need to consider data containing a record separator, and they
| will be right more often than those who never consider data
| containing a comma.
|
| Of course, the fact that record separators aren't on
| keyboards is probably why CSVs use commas.
| thaumasiotes wrote:
| > Also it's less likely to legitimately occur in something
| like a name, so there are many users of CSVs who can say
| they will never need to consider data containing a record
| separator, and they will be right more often than those who
| never consider data containing a comma.
|
| No, they'll be right exactly as often, 0% of the time.
|
| But their mistake will show up less frequently, causing
| more problems when it does.
|
| As soon as it's possible for some of your data to come from
| someone else's dataset, you're guaranteed to have to
| accommodate record separators within your data as well as
| within the metadata. You're better off using a system that
| plans for this inevitability than one that pretends it
| can't happen at all.
| penteract wrote:
| > No, they'll be right exactly as often, 0% of the time.
|
| > But their mistake will show up less frequently, causing
| more problems when it does.
|
| Enough people use CSVs (and have limited, small-scale
| use-cases) that I'd be willing to bet "less frequently"
| means never for at least 1% of people who use CSVs.
|
| I don't know whether the chance of no problems is worth
| the increased difficulty of problems that do occur -
| considering that balance feels a bit silly because if
| you're aware there could be a problem in a context where
| you could choose between commas and unit separators, you
| could just add validation or escaping.
| thaumasiotes wrote:
| > considering that balance feels a bit silly because if
| you're aware there could be a problem in a context where
| you could choose between commas and record separators,
| you could just add validation or escaping.
|
| As soon as you have validation or escaping, having a
| record separator character loses its entire purpose. The
| existence of the character is predicated on the idea that
| you don't have to do that, and that idea is false.
|
| That's why the character is never used. It's a conceptual
| mistake that was accidentally enshrined in a series of
| encoding standards that had enough free space to
| accommodate it.
| penteract wrote:
| > As soon as you have validation or escaping, having a
| record separator character loses its entire purpose. The
| existence of the character is predicated on the idea that
| you don't have to do that, and that idea is false.
|
| I disagree with this - the data needs to be stored
| somehow, and while other characters (like comma) can be
| used, having a dedicated character can help - for example
| if the data might legitimately contain commas or newlines
| but not unit separators or record separators, then
| escaping isn't needed if you use unit/record separators
| (although validation is still necessary).
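A sketch of the "validate instead of escape" approach discussed here, using US (0x1F) between fields and RS (0x1E) between records. The function names are invented, and the format itself (NUL-terminated strings, no escaping at all) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define ASCII_RS 0x1E  /* record separator: between records */
#define ASCII_US 0x1F  /* unit separator: between fields    */

/* A field may be stored unescaped only if it contains neither
   separator. Commas, tabs and newlines are all fine. */
bool field_is_clean(const char *field)
{
    for (; *field != '\0'; field++)
        if (*field == ASCII_US || *field == ASCII_RS)
            return false;
    return true;
}

/* Count the fields in one record by counting unit separators. */
size_t count_fields(const char *record)
{
    size_t n = 1;
    for (; *record != '\0'; record++)
        if (*record == ASCII_US)
            n++;
    return n;
}
```

A writer would call `field_is_clean` on every field and reject (or fall back to escaping) anything that fails, which is the validation step both sides of the argument agree is still required.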
| Symbiote wrote:
| I agree.
|
| TSV is widely used, but lacks a way to escape the tab and
| new line characters. RS-V is the same, but allows
| including tabs and new lines in records.
| Dylan16807 wrote:
| > As soon as you have validation or escaping, having a
| record separator character loses its entire purpose.
|
| Not true. Validation is easier than escaping.
| yardshop wrote:
| In the DOS days, you could "type" control characters by
| pressing Ctrl and the corresponding letter key, Ctrl+M is
| Carriage Return, Ctrl+H is Backspace, Ctrl+Z is End Of
| File, etc.
|
| It was probably possible to type an RS with Ctrl+Shift+.
| and the others with similar combos.
| jki275 wrote:
| you can still type them -- alt + 030 (for instance) on the
| keypad will insert that RS character. In Windows at least
| -- not sure about the other OS.
| Symbiote wrote:
| On Linux terminals entering control characters is done
| with the control key, Ctrl-G for example, but they will
| often be intercepted by the program that is running.
|
| Bash will insert the control character (rather than
| interpret it) if you prefix it with Ctrl-V.
| penteract wrote:
| In a desktop linux terminal, Ctrl-^ or Ctrl-~ work for
| me. In a tty, I need to press Ctrl-V before them.
| jart wrote:
| Yeah Linux still works exactly this way. The modern WIN32
| API even works that way too. When you ReadConsoleInput()
| it gives you teletypewriter style keyboard codes. When I
| wrote a termios driver for Cosmopolitan to have a Linux-
| style shell in CMD it really didn't take much to
| translate them into the Linux style. We're all still
| using glorified teletypes at the end of the day. It will
| always be the substrate of our world. One system built
| upon another older system.
| _flux wrote:
| I think it's worth mentioning that Ctrl-A is ascii 1,
| Ctrl-B ascii 2, etc, as it is in Unix today.
| keybored wrote:
| I can't think of a case where someone would write a control
| character like that into something intended for text on
| purpose. So you might as well disallow it.
| jerf wrote:
| The situation that comes up the most often that you need
| to consider is when someone embeds the same sort of file
| into itself, or chunks of the same sort of file into
| itself. If using the ASCII characters to delimit fields
| was common, you'd need to consider that over the course
| of some moderately interesting system's life time the
| odds of someone copying and pasting something from an
| encoded file into the spreadsheet application and picking
| up the ASCII control characters with it is basically
| 100%. And while we may be able to say with some
| confidence that nobody is going to embed a CSV file into
| a CSV file (and I say only _some_ confidence, the world
| is weird and I'm _sure_ someone will read this who has
| actually seen someone do this), there's other situations
| like HTML-in-HTML (for example, every HTML tutorial ever)
| that are guaranteed by their nature.
|
| It is still valid to disallow the ASCII control
| characters, one just has to make sure that it is done
| comprehensively, in all places users may input them. But
| that's not created by using ASCII control characters,
| that's a consequence of the "ban the control characters
| entirely" approach regardless of what the control
| characters are.
|
| It's neat when you can get away with it, but I generally
| prefer to define a robust encoding scheme instead. A
| minimal one like "replace backslash with double-
| backslash, replace control characters with backslashed
| characters" and "replace backslash sequences with their
| control characters, including backslash-backslash as a
| single backslash" can be inserted almost anywhere in just
| a few lines of string replace (or stream processing if
| you need the speed). The only tricky bit is you need to
| make sure you get the order correct or you corrupt data,
| and while I've done this enough to have it almost
| memorized now I do recall _feeling_ like the correct
| order is backwards from what I naturally wanted the first
| few times. But it is simple and robust if you get it
| right.
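One way to sidestep the ordering trap is to encode and decode in a single pass over the characters instead of chaining string replaces. This is a swapped-in technique, not necessarily the implementation described above. The sketch handles only backslash, newline and tab; the function names and the choice of escape letters are assumptions.

```c
#include <stddef.h>

/* Escape backslash, newline and tab from src into dst (capacity cap).
   Returns the encoded length, or (size_t)-1 if dst is too small. */
size_t escape_field(const char *src, char *dst, size_t cap)
{
    size_t o = 0;
    if (cap == 0)
        return (size_t)-1;
    for (; *src != '\0'; src++) {
        char code = 0;
        switch (*src) {
        case '\\': code = '\\'; break;
        case '\n': code = 'n';  break;
        case '\t': code = 't';  break;
        }
        if (o + (code ? 2 : 1) >= cap)   /* keep room for the NUL */
            return (size_t)-1;
        if (code) {
            dst[o++] = '\\';
            dst[o++] = code;
        } else {
            dst[o++] = *src;
        }
    }
    dst[o] = '\0';
    return o;
}

/* Inverse of escape_field. Returns the decoded length, or (size_t)-1
   on an unknown or truncated escape, or if dst is too small. */
size_t unescape_field(const char *src, char *dst, size_t cap)
{
    size_t o = 0;
    if (cap == 0)
        return (size_t)-1;
    for (; *src != '\0'; src++) {
        char c = *src;
        if (c == '\\') {
            src++;
            switch (*src) {
            case '\\': c = '\\'; break;
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            default:   return (size_t)-1;
            }
        }
        if (o + 1 >= cap)
            return (size_t)-1;
        dst[o++] = c;
    }
    dst[o] = '\0';
    return o;
}
```

The case that tends to go wrong with chained replaces is a literal backslash followed by the letter n: here it encodes to two backslashes plus n and decodes back to the same two characters, never to a newline.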
| keybored wrote:
| Someday I will create both formats: a control-characters
| are banned format (and never accepted) and one where they
| are escaped. That ought to be good enough for all needs!
|
| (A trivial evening project for some; not for all of us)
| AdamH12113 wrote:
| _> What do you do if you have a record that includes a record
| separator character?_
|
| You use the ASCII escape character (0x1B), which is designed
| for exactly that purpose.
| keybored wrote:
| > What do you do if you have a record that includes a record
| separator character?
|
| This comes up every time. Options:
|
| 1. You disallow it. And you might as well disallow all the
| control codes except the carriage return, line feed, and
| other "spacing" characters. Because what are they doing in
| the data proper? They are in-band signals.
|
| 2. You use the Escape character to escape them
|
| 3. Weirdest option: if you really want to nest in a limited
| way you can still use the group and file separator characters
| NoMoreNicksLeft wrote:
| Well, that's what an escape is for. Are we really having a
| serious discussion in 2024, where someone is suggesting that
| it's not the responsibility of the software engineer to
| sanitize inputs before chucking the data into some sort of
| database?
| EvanAnderson wrote:
| I did some ETL work that used the ASCII delimiter characters.
| It was very enjoyable. I didn't have to worry about escaping or
| parsing escaped strings. The control codes were guaranteed to
| be illegal in input. It was refreshing.
| theamk wrote:
| Could you do the same with TSV? A lot of datasets can either
| prohibit tabs in data, or convert it to spaces in early
| ingestion.
| EvanAnderson wrote:
| TSV is a joy compared to CSV, for sure. CLI tools that
| output TSV are what immediately spring to mind.
| red_admiral wrote:
| Yes, and as long as you remember to turn off the "TAB
| produces 4 spaces" thing in your editor (grumble makefiles
| grumble) it's really nice to work with.
| fukawi2 wrote:
| I recall working on a PICK D3 system, which was a "multivalue"
| database. Each field could have multiple values, those values
| could have sub values, and a third level beyond that.
|
| Values were separated with char(254), subvalues were separated
| with char(253), and the third level were char(252) separated.
|
| It was... unique, but worked. And to be fair, PICK originated
| in the 60's, so this method probably evolved in parallel to the
| ASCII table!
| red_admiral wrote:
| As long as your data is not binary, so does not contain record
| separators itself, this would be a thousand times better than
| CSV (because text data _does_ often contain commas and double
| quotes).
|
| The only thing you'd need is editors to support some way of
| entering and displaying the RS, and CTRL+^ is a bit of a kludge
| as it ends up CTRL+SHIFT+6.
|
| Of course, if a record itself can contain RS for subrecords,
| things become more complicated. I guess you could use `\^`.
| tracker1 wrote:
| That's my thought as well... I remember using them pre-xhr web
| in order to send data from the server to JS, which I could then
| split up pretty easily on the client side. I still don't know
| why we are so tethered to CSV.
| yencabulator wrote:
| Ah, Deborah Records. Little Debbie Records, we call her.
| th0ma5 wrote:
| I heard someone describe the ASCII table as a state machine.
| Guess I could understand that as a state machine needed to parse
| it? This is surprisingly hard to search for but I was wondering
| if anyone knows what they were talking about.
| EvanAnderson wrote:
| They might be talking about using escape sequences to map
| additional codepoints into ASCII. It was designed to be
| extensible. See:
| https://web.archive.org/web/20150810075144/http://bobbemer.c...
| kevin_thibedeau wrote:
| Bespoke hardware for text handling isn't a thing these days but
| would have been in the 60's and 70's. A table layout that can
| be easily decoded in hardware simplifies the necessary
| circuitry for responding to control characters or converting
| binary numbers to/from decimal when the microprocessor hadn't
| been invented yet.
| gumby wrote:
| It was originally implemented in actual hardware (rods and
| bars). Just look inside a teletype, like a KSR-23 (pre ascii,
| but similar)
| EvanAnderson wrote:
| I would be remiss not to post a link to the late Bob Bemer's[0]
| website.
|
| https://web.archive.org/web/20150801005415/http://bobbemer.c...
|
| He was considered the "father of ASCII". He wrote very well and
| gives clear explanations for the motivations behind the design of
| ASCII.
|
| [0] https://en.m.wikipedia.org/wiki/Bob_Bemer
| thristian wrote:
| > _That, I'm afraid, is because ASCII was based not on modern
| computer keyboards but on the shifted positions of a Remington
| No. 2 mechanical typewriter - whose shifted layout was the
| closest compromise we could find as a standard at the time, I
| imagine._
|
| According to Wikipedia[1], American typewriters were pretty
| consistent with keyboard layout until the IBM Selectric electric
| typewriter. Apparently "small" characters (like apostrophe,
| double-quote, underscore, and hyphen) should be typed with less
| pressure to avoid damaging the platen, and IBM decided the
| Selectric could be simpler if those symbols were grouped on
| dedicated keys instead of sharing keys with "high pressure"
| symbols, so they shuffled the symbols around a bit, resulting in
| a layout that would look very familiar to a modern PC user.
|
| Because IBM electric typewriters were so widely used (at least in
| English speaking countries), any computer company that wanted to
| sell to businesses wanted a Selectric-style layout, including the
| IBM PC.
|
| Meanwhile, in other countries where typewriters in general
| weren't so popular or useful, the earliest computers had ASCII-
| style punctuation layout for simplicity, and later computers
| didn't have any pressing need to change, so they stuck with it.
| Japanese keyboards, for example, are still ASCII-style to this
| day.
|
| [1]: https://en.wikipedia.org/wiki/IBM_Selectric#Keyboard_layout
| sixothree wrote:
| I never realized my first computer used ascii directly for the
| shifted number keys.
|
| https://en.wikipedia.org/wiki/TRS-80_Color_Computer#/media/F...
| Mountain_Skies wrote:
| So many things on the CoCo turned out to be the way they were
| for cost saving reasons. Tandy was good at saving pennies
| everywhere it could. When I took 'Typing' in high school, my
| muscle memory was in a constant fight between the IBM
| Selectric layout of the typewriters at school and the CoCo at
| home.
| kps wrote:
| https://en.wikipedia.org/wiki/Bit-paired_keyboard
| kragen wrote:
| unfortunately this page is based on mackenzie's book. mackenzie
| is the ibm guy who spent decades trying to kill ascii, promoting
| its brain-damaged ebcdic as a superior replacement (because it
| was more compatible, at least if you were already an ibm
| customer). he spends most of his fucking book trumpeting the
| virtues of ebcdic actually
|
| bob bemer more or less invented ascii. he was also an ibm guy
| before mackenzie's crowd pushed him out of ibm for promoting it.
| he wrote a much better book about the history of ascii which is
| also freely available online, really more a pamphlet than a book,
| called "a story of ascii":
| https://archive.org/details/ascii-bemer/page/n1/mode/2up
|
| tom jennings, who invented fido, also wrote a history of ascii,
| called 'an annotated history of some character codes or ascii:
| american standard code for information infiltration'; it's no
| longer online at his own site, but for the time being the archive
| has preserved it:
| https://web.archive.org/web/20100414012008/http://wps.com/pr...
|
| jennings's history is animated by a palpable rage at mackenzie's
| self-serving account of the history of ascii, partly because
| bemer hadn't really told his own story publicly. so jennings goes
| so far as to write punchcard codes (and mackenzie) out of ascii's
| history entirely, deriving it purely from teletypewriter codes--
| from which it does undeniably draw many features, but after all,
| bemer was a punchcard guy, and ascii's many excellent virtues for
| collation show it
|
| as dwheeler points out, the accomplished informatics archivist
| eric fischer has also written an excellent history of the
| evolution of ascii. though, unlike bemer, fischer wasn't actually
| at the standardization meetings that created ascii, he is more
| careful and digs deeper than either bemer or jennings, so it
| might be better to read him first:
| https://archive.org/details/enf-ascii/
|
| it would be a mistake to credit ascii entirely to bemer; aside
| from the relatively minor changes in 01967 (including making
| lowercase official), the draft was extensively revised by the
| standards committees in the years leading up to 01963, including
| dramatic improvements in the control-character set
|
| for the historical relationship between ascii character codes and
| keyboard layouts, see
| https://en.wikipedia.org/wiki/Bit-paired_keyboard
| gumby wrote:
| I wish the author had included the full ascii chart in 4 bits
| across / 4 bits down. You can mask a single bit to change case
| and that is super obvious that way.
|
| The charts that simply show you the assignments in hex and octal
| obscure the elegance of the design.
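|
| A minimal sketch of such a chart, assuming the common layout of 16
| rows (low 4 bits) by 8 columns (high 3 bits). The helper names
| cell and ascii_grid are illustrative, and control codes are shown
| in caret notation:
|
```python
def cell(code: int) -> str:
    """Short label for a 7-bit code; controls use caret notation."""
    if code < 0x20:
        return "^" + chr(code + 0x40)
    if code == 0x7F:
        return "^?"
    if code == 0x20:
        return "SP"
    return chr(code)


def ascii_grid() -> list[list[str]]:
    """grid[row][col] is the label for code (col << 4) | row."""
    return [[cell((col << 4) | row) for col in range(8)] for row in range(16)]


if __name__ == "__main__":
    for row in ascii_grid():
        print(" ".join(f"{label:>2}" for label in row))
```
|
| Laid out this way, A (column 4) and a (column 6) share row 1, so
| the case difference is visibly a single column bit.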
| AdamH12113 wrote:
| The third and fourth columns of the table are only a single bit
| apart from each other. If you mentally swap the first two
| columns, you get a Gray code ordering of the most significant
| bits, which is pretty close to what you're looking for.
| gumby wrote:
| Found it:
| https://en.wikipedia.org/wiki/ASCII#/media/File:USASCII_code...
| kalleboo wrote:
| It was at some point looking at a chart like that where it also
| dawned on me where the control codes like ^D, ^H, ^[ etc came
| from
| kr2 wrote:
| I was going to ask you to please explain as I didn't
| understand, but I am guessing you are talking about the same
| thing as this comment[1] right? That's super cool
|
| https://news.ycombinator.com/item?id=41042570
| gumby wrote:
| Yes, though ^m, ^[ et al aren't so much elegant as
| coincidental; but look at A and a, for example.
|
| I found the chart I was looking for:
| https://en.wikipedia.org/wiki/ASCII#/media/File:USASCII_code...
| netcraft wrote:
| I've searched off and on for a great stylistic representation of
| the ASCII table. I'd love a poster to hang on my wall, or possibly
| even something I could get as a tattoo.
| yawl wrote:
| I also wrote a chat novel about ASCII:
| https://www.lostlanguageofthemachines.com/chapter2/chat
| userbinator wrote:
| _You might be familiar with carriage return (0D) and line feed
| (10)_
|
| You mean 0D and 0A, or 13 and 10, but that mix of base really
| stood out to me in an otherwise good article. I'm one of numerous
| others who have memorised most of the base ASCII table, and quite
| a few of the symbols as well as extended ASCII (CP437), mainly
| because it comes in handy for reading programs without needing a
| disassembler. Those who do a lot of web development may find the
| sequence 3A 2F 2F familiar too, as well as 3Ds and 3Fs.
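|
| For readers who have not memorised the table, a small sketch that
| decodes the hex sequences mentioned above. The function name
| decode_hex is illustrative only:
|
```python
def decode_hex(hex_bytes: str) -> str:
    """Decode space-separated hex byte values as ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


if __name__ == "__main__":
    for sample in ("3A 2F 2F", "3D", "3F", "0D 0A"):
        # repr() keeps CR and LF visible as \r and \n
        print(sample, "->", repr(decode_hex(sample)))
```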
|
| I can see the rationale for <=> being in that order, but [\] and
| {|} are less obvious, as well as why their position is 1 column
| to the left of <=>.
| yardshop wrote:
| > You mean 0D and 0A, or 13 and 10
|
| He fixed it.
| KingOfCoders wrote:
| For everyone who doesn't need ä,ü,ö. Or software that needs to
| take ä,ü,ö. For everyone else, UTF is a blessing.
| bigstrat2003 wrote:
| Which, given the people who designed this and the time they
| were designing for, was most of them (and most of their
| audience). Don't confuse "this old standard doesn't adequately
| cover all cases today" with "this old standard sucked at the
| time".
| lmm wrote:
| > For everyone else, UTF is a blessing.
|
| Except people who want to use Japanese and not have it render
| weirdly, something that was easy in any internationalised
| software that used the traditional codepage system, but is
| practically impossible in Unicode-based software.
| Retr0id wrote:
| Where can I learn more about this issue?
| zokier wrote:
| https://en.wikipedia.org/wiki/Han_unification
| p_l wrote:
| Probably referring to so-called "Han unification" which
| tried to use same codepoints for different glyphs to reduce
| code space for ideograms derived from Chinese ones.
|
| But that only causes confusion because you need to provide
| external information which way to interpret them, just like
| a code page
| blahedo wrote:
| Another piece of elegance: by putting the uppercase letters in
| the block beginning at 0x40 (well, 0x41) it means that all the
| control codes at the start of the table line up with a letter (or
| one of a small set of other punctuation: @[\]^_), giving both a
| natural shorthand visual representation and a way to enter them
| with an early keyboard, by joining the pressing of the letter
| with... the Control key. Control-M (often written ^M) is carriage
| return because carriage return is 0x0D and M is 0x4D.
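|
| The relationship can be sketched as a single bit flip. This is an
| illustration of the arithmetic, not how any particular terminal
| implements it; the names ctrl and caret are made up here:
|
```python
def ctrl(key: str) -> int:
    """Control code paired with a key from the 0x40-0x5F block (@, A-Z, [\\]^_)."""
    if len(key) != 1:
        raise ValueError("expected a single character")
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"{key!r} has no paired control code")
    return code ^ 0x40  # clear bit 0x40: 0x4D ('M') becomes 0x0D (CR)


def caret(code: int) -> str:
    """Caret notation for a control code, e.g. 0x0D -> '^M'."""
    if not (code < 0x20 or code == 0x7F):
        raise ValueError("not a control code")
    return "^" + chr(code ^ 0x40)
```
|
| For example, ctrl("M") gives 0x0D and caret(0x1B) gives "^[".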
| WalterBright wrote:
| Too bad we now have Unicode, an elegant castle covered with ugly
| graffiti and ramshackle addons. For example:
|
| 1. normalization
|
| 2. backwards running text (hey, why not add spiral running text?)
|
| 3. fonts
|
| 4. invisible characters
|
| 5. multiple code points with the same glyph
|
| 6. glyphs defined by multiple code points (gee, I thought Unicode
| was to get away with that mess from code pages!)
|
| 7. made up languages (Elvish? Come on!)
|
| 8. you vote for my made-up emoticon, and I'll vote for yours!
| jart wrote:
| Hey at least we got the astral planes.
| https://justine.lol/dox/unicode.txt
| Retr0id wrote:
| Language itself is a pile of ugly graffiti and ramshackle
| addons. It would be weird if Unicode _didn't_ reflect this.
| p_l wrote:
| How to say you don't know what Unicode is for without saying
| it.
|
| 1, 2, 4, 5, 6, and, unfortunately, 8, all fall under "ability
| to encode written text from all human languages". And that
| includes historical. Some of the issues (5 & 6) are due to
| semantic differences even if the resulting glyph looks the same.
| Unfortunately you can't expect programmers to understand pesky
| little things like languages having different writing, so you
| end up with normalisation to handle the fact that one system
| sent "a + ogonek accent" and another (properly) sent "a with
| ogonek" (these print the same but are semantically different!),
| and now you need to figure out normalisation in order to be
| able to compare strings.
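|
| A short sketch of that comparison problem using Python's standard
| unicodedata module. The helper name same_text is illustrative, and
| NFC is just one reasonable choice of normalisation form:
|
```python
import unicodedata

decomposed = "a\u0328"  # LATIN SMALL LETTER A + COMBINING OGONEK
precomposed = "\u0105"  # LATIN SMALL LETTER A WITH OGONEK


def same_text(left: str, right: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


if __name__ == "__main__":
    print(decomposed == precomposed)           # raw code points differ
    print(same_text(decomposed, precomposed))  # equal once normalised
```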
|
| 7. just like 8 are down to proposal of specific new forms of
| writing to add to Unicode. Elvish had one since 1997 but only
| now got a tentative "we will talk about it". Klingon, which is
| IIRC a more complete language including native speakers (...weird
| things happened sometimes), does not have one outside of the
| private use area.
|
| Emojis were added because they were used with incompatible
| encodings first, even before unicode happened, and without
| including something like SIXEL into unicode they were
| unrepresentable (and with SIXEL would lose semantic
| information)
| news_to_me wrote:
| > "a + ogonek accent" and another (properly) sent "a with
| ogonek" (these print the same but are semantically
| different!)
|
| How can these possibly be semantically different? Isn't the
| point of combining characters to create semantic characters
| that are the combination of those parts?
| p_l wrote:
| There's a semantic difference between "accented letter" and
| "different letter that happens to visually look like
| _another language's_ accented letter".
|
| "Ą" in Polish is not "A" with some accent. And the idea
| behind unicode was to preserve human written text,
| including keeping track of things like "this is letter A1
| with an accent, but this is letter A2 that looks visually
| similar to A1 with accent but is different semantically".
| Of course then worries about code page size resulted in the
| stupidity of Han unification, so Unicode _is_ a bit broken.
| Dylan16807 wrote:
| Unless there's some nuance I'm missing, I think you're
| reading too much into the word "accent".
|
| Especially because the codepoint is actually called
| "Combining Ogonek".
|
| And for anyone writing in Cyrillic, it's actually more
| accurate to use the combining form, even as its own
| letter, because the only precomposed form technically
| uses a latin A.
|
| But my main point is that I do not think there is
| supposed to be any semantic difference in Unicode based
| on whether you use precomposed or decomposed code points.
| chthonicdaemon wrote:
| All languages are made up. For that matter, all glyphs are made
| up, too.
| bandie91 wrote:
| there is not only a quantitative difference between a conlang
| designed by a small group (or 1 person) and a "human"
| language developed organically in the span of centuries by
| millions of speakers, but also qualitative.
| Aardwolf wrote:
| For me it's how they inconsistently, backwards-incompatibly,
| make some existing characters outside of the emoji-plane (and
| especially when in technical/mathematical blocks) render
| colored by default, rather than keep everything colored related
| in the emoji plane (making copies if needed rather than
| affecting old character, the semantics are very different
| anyway), e.g. https://imgur.com/a/Ugi7K1i and
| https://imgur.com/a/UMppZHG
| mnau wrote:
| As someone whose native language isn't representable purely by
| ASCII, I celebrate it. Plus the first 128 codepoints are the same
| as ASCII in UTF-8.
|
| Is Unicode kind of messy? Sure, but that's just natural
| consequences of writing systems being messy. Every point you
| made was for a sensible reason that is in a scope of Unicode
| mission (representing all text in all writing systems).
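|
| The ASCII compatibility of UTF-8 is easy to check. A minimal
| sketch, with an illustrative function name:
|
```python
def utf8_matches_ascii() -> bool:
    """True if every code point below 128 encodes to the same single byte."""
    return all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))


if __name__ == "__main__":
    print(utf8_matches_ascii())
    # Characters outside ASCII take more than one byte in UTF-8.
    print(len("\u0105".encode("utf-8")))  # a with ogonek
```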
| RiverCrochet wrote:
| > covered with ugly graffiti and ramshackle addons
|
| Unfortunately there is plenty of precedent for this
| ramshacklism. Like ACK/NAK - those are protocol signals, not
| characters! ENQ? What even is Shift In/Shift Out (SI/SO)? Then
| the database characters toward the end there FS, RS, GS, US.
|
| > backwards running text (hey, why not add spiral running
| text?)
|
| You jest, but you do have cursor positioning ANSI sequences
| which are designed to let text draw anywhere on your screen.
| And make it blink! You also don't find it weird to have a
| destructive "clear-screen" sequence?
|
| > glyphs defined by multiple code points
|
| I wonder when they started putting the slash across the 0 to
| differentiate from the O.
|
| > you vote for my made-up emoticon, and I'll vote for yours!
|
| I mean you do have the private Unicode range where you can
| actually do that. But before that, SIXEL graphics.
| kps wrote:
| > Like ACK/NAK - those are protocol signals, not characters!
|
| American Standard Code _for Information Interchange_
| saagarjha wrote:
| Unicode is quite elegant in its encoding too. If you're going
| to criticize it for its content, maybe start with talking about
| how ASCII also has invisible characters and those that people
| rarely use.
| NoMoreNicksLeft wrote:
| They're all made-up languages, some were just made-up a little
| bit more transparently.
| yencabulator wrote:
| I can't wait for when the majority of Unicode codepoints/glyphs
| are emojis that are no longer fashionable! That'll be a really
| weird relic of history, later.
| shepherdjerred wrote:
| What would be the alternative? I think Unicode is pretty great.
|
| You can pretty easily imagine a world where we had a bunch of
| different encodings with none being dominant.
| Findecanor wrote:
| > 2. backwards running text (hey, why not add spiral running
| text?)
|
| Unicode encodes code points in logical order rather than visual
| order: the order in which text is supposed to be collated and
| spoken rather than the visual order.
|
| One tricky issue is when both directions exist in the same
| text. Unicode can encode nesting of text in one direction
| within another. For example, text consisting of an English word
| and a Hebrew word can be encoded as either the English embedded
| in Hebrew or the Hebrew embedded in English: both would render
| the same but collate differently.
|
| Is there a better way?
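|
| To make "logical order" concrete, here is a small sketch that
| inspects the bidirectional class of each stored character with
| Python's unicodedata module. It only shows how mixed text is
| stored, not the full bidirectional algorithm; bidi_classes is an
| illustrative name:
|
```python
import unicodedata

# "abc", a space, then the Hebrew letters alef, bet, gimel, stored in
# the order they are read, regardless of how a display arranges them.
text = "abc \u05d0\u05d1\u05d2"


def bidi_classes(s: str) -> list[str]:
    """Unicode bidirectional class of each character, in storage order."""
    return [unicodedata.bidirectional(ch) for ch in s]


if __name__ == "__main__":
    print(bidi_classes(text))
```
|
| Left-to-right letters report class L and the Hebrew letters report
| R; a renderer uses these classes to decide the visual order.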
| MisterTea wrote:
| Most of this is pretty useful for reproducing a wide gamut of
| human language. It gets completely fucked when it comes to
| fonts with png's embedded in svg's and other INSANE matryoshka
| doll nesting of bitmap/vector rendering technologies.
|
| I also half hate emoji as it pollutes human writable text with
| bitmaps that are difficult to reproduce by hand on paper with a
| writing instrument - it's not text. I say half hate as it
| allows us a standard set of icons to use that can be easily
| rendered in-line with text or on their own.
| zokier wrote:
| I think that adopting ASCII as the general purpose text encoding
| was one of the great mistakes of early computing. It originated
| as control interface for teletypes and such, and that's arguably
| where it should have remained. For storing and processing (plain)
| text ASCII doesn't really fit that well, control characters are a
| hindrance and the code space would have been useful for
| additional characters. The ASCII set of printables was definitely
| a compromise formed by the limited code space.
| kstenerud wrote:
| It's one of the greatest triumphs of early computing. Not only
| did it harmonize text representation and transmission in a
| backwards compatible manner; the fact that they deliberately
| kept it 7 bit for so long also helped for developing a sane set
| of other language character sets (ISO-8859), and paved the way
| for a smooth transition to Unicode (UTF-8) - which is now the
| dominant encoding worldwide.
| ddingus wrote:
| Yes, seconded easily
| hackit2 wrote:
| Yeah you were not around when a KB of memory took up half your
| room. Looking back it doesn't make sense, but at the time a byte
| was whatever you wanted it to be. Considering the number of
| letters in the English alphabet is 26, it is reasonable for a
| byte to be 5 bits, giving you a total of 32 possible states,
| which leaves you with 6 values that could be used as control
| characters. However, let's not forget there are 7,164 other
| languages in the world, and they all have their own unique way
| of doing things.
|
| Oh yeah, let's not forget that at the time you had other
| nationalistic countries/territories/people with their own
| superior technology all vying for the top position, all while
| trying to outdo each other. Then you also had manipulative
| monopolies, trade embargoes, and wars.
|
| It isn't perfect but people aren't perfect.
| ddingus wrote:
| No way!
|
| No amount of extra characters was going to address what Unicode
| did.
|
| ASCII was not a mistake at all. Adopting it unified what was
| surely going to be a real mess.
|
| At the time it made sense, and the control functions were
| needed. Still are.
| zokier wrote:
| > At the time it made sense, and the control functions were
| needed. Still are.
|
| Control characters were needed for terminals. They never made
| sense for text. Mixing the two matters is the problem.
| ddingus wrote:
| It isn't a problem. The text is the UX.
|
| What else would you have proposed, or would propose?
| augusto-moura wrote:
| Useful tip, on linux (not sure about other *nixes) you can view
| the ascii table by opening its manpage: man ascii
|
| It's been useful to me more than once every year, mostly to know
| about shell escape codes and when doing weird character ranges in
| regex and C.
|
| It can be a bit confusing, but the gist is that you have 2 chars
| being shown on each line. I would prefer a view where you see the
| same char with shift and/or ctrl flags, but you can only ask so
| much
| dailykoder wrote:
| Damn, thanks!
|
| Why the hell did I never try this? Maybe because typing ascii
| table into my favorite search engine and clicking one of the
| first links was fast enough
| omnicognate wrote:
| I used to do that until the experience became degraded
| enough, reflecting the general state of the web, that I took
| the time to look for a better way and found `man ascii`.
| bell-cot wrote:
| Similar in FreeBSD. It has octal, hex, decimal, and binary
| ASCII tables, along with the full names of the control
| characters.
| INTPenis wrote:
| The reason I know this is because in 2004 I was squatting in an
| apartment with no TV and no internet. So each day after work I
| would go home and just read manpages for fun.
|
| Ended up learning ipfw through the firewall manpage on FreeBSD,
| and using my skills to setup and manage an IPFW at work.
|
| It's amazing how much you get done with no TV and no internet.
| Also played a lot of nethack.
| w0m wrote:
| I learned vim proper by reading :help on an eeepc while
| flying back and forth over the Atlantic alone one year.
| layer8 wrote:
| Or even simpler use the _ascii_ command, when installed:
| https://packages.debian.org/bookworm/ascii
| bodyfour wrote:
| > not sure about other *nixes
|
| Should be available on any UNIX, it was added to V7 UNIX back
| in the 1970s:
| https://github.com/dspinellis/unix-history-repo/blob/Researc...
|
| Even before that, it existed as a standalone text file
| https://github.com/dspinellis/unix-history-repo/blob/8cf2a84...
| This still exists on many systems -- for instance as
| /usr/share/misc/ascii on MacOS
| fitsumbelay wrote:
| strange: on MacOS 14.5 I get output for `man ascii` but `ascii`
| goes "command not found"
| AnimalMuppet wrote:
| On my Linux VM, it's the same, and it's because 'man ascii'
| comes from man(7), not man(1). It's not a man page for a
| program. It's just a man page.
| irrational wrote:
| Works on mac
| snvzz wrote:
| The ASCII table is defective; it is missing a dedicated code for
| newline.
|
| CR and LF aren't dedicated, and have precise cursor movement
| meanings, rather than being a logical line ender.
|
| There was a proposal in the 80s to reassign the -otherwise
| useless- VT (vertical tab) character for the purpose.
| Unfortunately unfruitful.
| bregma wrote:
| A separate control character was not needed to indicate where
| your Hollerith string ended: it ended at the end of the
| Hollerith string. If you wanted to render a Hollerith string
| onto print media, you'd often want to feed the line and then
| return the carriage before printing the next Hollerith string.
| Of course, that wasn't strictly necessary if you were using a
| line printer, which would just print the line and advance.
|
| The filesystems I used had 5 kinds of file: random-access,
| sequential, ISAM, Fortran-carriage-control and carriage-return-
| carriage-control. The only people who used the latter were the
| eggheads that used that new-fangled C programming language
| brought over from Bell Labs' experimental Unix system.
|
| You're probably just looking for the record separator (036). If
| you are storing multiple text records in a block of memory,
| that would be the ideal ASCII code to separate them.
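|
| A minimal sketch of that idea, assuming RS (octal 036) between
| records and, as an extra assumption not made above, US (octal 037)
| between fields within a record. The names pack and unpack are
| illustrative:
|
```python
RS = "\x1e"  # record separator, octal 036
US = "\x1f"  # unit separator, octal 037


def pack(records: list[list[str]]) -> str:
    """Join fields with US and records with RS into one block of text."""
    return RS.join(US.join(fields) for fields in records)


def unpack(blob: str) -> list[list[str]]:
    """Split a packed block back into records and fields."""
    return [record.split(US) for record in blob.split(RS)]


if __name__ == "__main__":
    rows = [["line one\nline two", "a,b"], ["second record", ""]]
    assert unpack(pack(rows)) == rows  # newlines and commas need no escaping
```
|
| Because the separators are not printable text, fields may contain
| newlines or commas without any quoting scheme.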
| Gormo wrote:
| > Unfortunately unfruitful.
|
| _Fortunately_ unfruitful, since if it had gained adoption,
| there'd be a mix of _three_ different line endings (and
| combinations thereof) in widespread use, instead of two.
| 1vuio0pswjnm7 wrote:
| Mentioned in footnote 7:
|
| https://ia601808.us.archive.org/2/items/mackenzie-coded-char...
| red_admiral wrote:
| The "16 rows x 8 columns" version, with the lowercase letters
| added, seems the most elegant one to me because it makes the
| internal structure of the thing visible. For example, to
| lowercase a letter, you set bit 6; a decimal digit is the prefix
| 011 followed by the binary encoding of the digit etc.
|
| It also makes clear why ESC can be entered as `^[` or ENTER
| (technically CR) as `^M` on some terminals (still works in my
| xterm), because the effect of the control key is to unset bits 6
| and 7 in the original set-up.
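|
| The case and digit structure described above can be sketched with
| plain bit operations. Bits are numbered from 1 here, so "bit 6" is
| the mask 0x20; the function names are illustrative:
|
```python
CASE_BIT = 0x20  # "bit 6" when counting bits from 1


def to_lower(ch: str) -> str:
    """Set the case bit on an uppercase ASCII letter."""
    return chr(ord(ch) | CASE_BIT) if "A" <= ch <= "Z" else ch


def to_upper(ch: str) -> str:
    """Clear the case bit on a lowercase ASCII letter."""
    return chr(ord(ch) & ~CASE_BIT) if "a" <= ch <= "z" else ch


def digit_value(ch: str) -> int:
    """Numeric value of an ASCII digit: prefix 011, then the value in 4 bits."""
    code = ord(ch)
    if code >> 4 != 0b011 or (code & 0x0F) > 9:
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return code & 0x0F
```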
|
| Of course you can color in the fields too, if you want.
| johanneskanybal wrote:
| Kind of hard to read something where the author considers every
| non-English language as equally worthy as emojis. It was good in
| the 50s but stayed important for like 4-5 decades too long.
| georgehotelling wrote:
| Dark grey #303030 text on slightly darker grey #1B1C21 background
| is really hard to read. Maybe I'm just getting old, but I also
| assume the audience for a blog post about the ASCII table was
| born in a year that starts with 19.
| Retr0id wrote:
| The background is white on my machine, are you using some kind
| of extension to force "dark mode"?
| georgehotelling wrote:
| I'm using pi-hole, uBlock Origin, and Privacy Badger on
| Firefox. I checked my network tab before complaining and
| didn't see any resources that failed to load.
| bloak wrote:
| Vaguely related: Apart from £ and €, a typical GB keyboard has
| a couple of non-ASCII characters printed on it: ¬ and ¦. The key
| labelled ¦ is usually mapped to |, but the key labelled ¬ often
| gives you an actual ¬, though I can't remember many occasions on
| which I've wanted one of them. Apparently the characters ¬ and ¦
| are in EBCDIC.
| PaulHoule wrote:
| Beats EBCDIC
|
| https://en.wikipedia.org/wiki/EBCDIC
|
| On the 4th floor of my building the computer systems lab has a
| glass front that has what looks like a punch card etched in
| frosted glass but if you look closer it was made by sticking
| stickers on the glass.
|
| I made a "punchcard decoder" on a 4x6 card to help people decode
| the message on the wall
|
| https://mastodon.social/@UP8/112836035703067309
|
| The EBCDIC code was designed to be compatible with this encoding
| which has all sorts of weird features, for instance the "/" right
| between "R" and "Z"; letters don't form a consecutive block so
| testing to see if a char is a letter is more complex than in
| ASCII.
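|
| A sketch of that extra complexity, assuming code page 037 (one
| common EBCDIC variant, available as Python's "cp037" codec). The
| function names are illustrative:
|
```python
import string


def is_ascii_upper(byte: int) -> bool:
    """ASCII uppercase letters form one contiguous range."""
    return 0x41 <= byte <= 0x5A


def is_ebcdic_upper(byte: int) -> bool:
    """EBCDIC (cp037) uppercase letters sit in three separate ranges."""
    return (0xC1 <= byte <= 0xC9      # A-I
            or 0xD1 <= byte <= 0xD9   # J-R
            or 0xE2 <= byte <= 0xE9)  # S-Z


def ebcdic_codes(text: str) -> list[int]:
    """Byte values of text when encoded with code page 037."""
    return list(text.encode("cp037"))


if __name__ == "__main__":
    codes = ebcdic_codes(string.ascii_uppercase)
    print([hex(c) for c in codes])
    print(all(is_ebcdic_upper(c) for c in codes))
```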
|
| I am thinking of redoing that card to put the alphabet in order.
| A column in a punched card has between 0 and 3 punches, 0 is a
| space, 1 is a letter or a symbol in the first column, if one of
| the rows at the top is punched you combine that with the number
| of the other punched row on the left 3x9 grid. If three holes are
| punched one of them is an 8 (unless you've got one of the
| extended charsets) and you have one of the symbols in the right
| 3x6. Note the ¬ and ¢ which are not in ASCII but are in latin-1.
| senkora wrote:
| Fun fact: sorting ASCII numerically puts all the uppercase
| letters first, followed by all the lowercase letters (ABC...
| abc...). A more typical dictionary ordering would be more like
| AaBbCc... (or to even consider A and a at the same sort level and
| only use them to break ties if the words are otherwise
| identical).
|
| The order used by ASCII is sometimes called "ASCIIbetical", which
| I think is wonderful.
|
| https://en.wiktionary.org/wiki/ASCIIbetical
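|
| A quick sketch of the two orderings. The sample words and the
| dictionary_key helper are illustrative; the key sorts by the
| lowercased word first and uses the original spelling only to
| break ties:
|
```python
words = ["banana", "Apple", "apple", "Banana", "Cherry"]

# Plain sorting compares code points, so every uppercase word comes first.
asciibetical = sorted(words)


def dictionary_key(word: str) -> tuple[str, str]:
    """Sort case-insensitively, breaking ties with the original spelling."""
    return (word.lower(), word)


dictionary_order = sorted(words, key=dictionary_key)

if __name__ == "__main__":
    print(asciibetical)      # ['Apple', 'Banana', 'Cherry', 'apple', 'banana']
    print(dictionary_order)  # ['Apple', 'apple', 'Banana', 'banana', 'Cherry']
```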
| NoMoreNicksLeft wrote:
| I thought the point of that was that a single bitflip makes an
| uppercase lower, or vice versa...
| Dylan16807 wrote:
| It can't be "the" point, because AaBbCc would also let you
| use a single bit to control case, the bottom bit.
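|
| To illustrate the point, here is a purely hypothetical interleaved
| encoding (not ASCII, and not any real standard) in which the
| bottom bit selects case and numeric order is AaBbCc:
|
```python
def interleaved_code(ch: str) -> int:
    """Hypothetical code: 2 * alphabet index, plus 1 for lowercase."""
    if len(ch) != 1 or not ("A" <= ch <= "Z" or "a" <= ch <= "z"):
        raise ValueError("expected a single ASCII letter")
    index = ord(ch.upper()) - ord("A")
    return 2 * index + (1 if ch.islower() else 0)


if __name__ == "__main__":
    letters = list("cbaCBA")
    print(sorted(letters, key=interleaved_code))  # ['A', 'a', 'B', 'b', 'C', 'c']
    # Flipping the bottom bit switches case in this scheme.
    print(interleaved_code("a") ^ 1 == interleaved_code("A"))
```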
| DonHopkins wrote:
| The Apple ][ and TTYs and other old computers had "bit pairing
| keyboards", where the punctuation marks above the digits were
| aligned with the ASCII values of the corresponding digits,
| different by one bit.
|
|     Typewriter: !@#$%^&*()
|     Apple:      !"#$%&'()
|     Digits:     1234567890
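|
| The one-bit relationship can be checked directly. A minimal
| sketch, with illustrative names, that flips bit 0x10 on the
| digits 1 to 9:
|
```python
SHIFT_BIT = 0x10  # the single bit separating '1'..'9' from '!'..')'


def bit_paired_shift(digit: str) -> str:
    """Shifted character for a digit key on a bit-paired layout."""
    if len(digit) != 1 or digit not in "123456789":
        raise ValueError("only the digits 1-9 have a printable pair")
    return chr(ord(digit) ^ SHIFT_BIT)


def shifted_row() -> str:
    """Shifted characters for the keys 1 through 9, in order."""
    return "".join(bit_paired_shift(d) for d in "123456789")


if __name__ == "__main__":
    print(shifted_row())  # !"#$%&'()
```
|
| Flipping the same bit on 0 (0x30) gives 0x20, the space
| character, so that key has no punctuation pair.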
|
| https://en.wikipedia.org/wiki/Bit-paired_keyboard
|
| >A bit-paired keyboard is a keyboard where the layout of shifted
| keys corresponds to columns in the ASCII (1963) table,
| archetypally the Teletype Model 33 (1963) keyboard. This was
| later contrasted with a typewriter-paired keyboard, where the
| layout of shifted keys corresponds to electric typewriter
| layouts, notably the IBM Selectric (1961). The difference is most
| visible in the digits row (top row): compared with mechanical
| typewriters, bit-paired keyboards remove the _ character from 6
| and shift the remaining &'() from 7890 to 6789, while
| typewriter-paired keyboards replace 3 characters: Shift+2 from
| " to @, Shift+6 from _ to ^, and Shift+8 from ' to *. An
| important subtlety is that ASCII was based on mechanical
| typewriters, but electric typewriters became popular during the
| same period that ASCII was adopted, and made their own changes to
| layout.[1] Thus differences between bit-paired and (electric)
| typewriter-paired keyboards are due to the differences of both of
| these from earlier mechanical typewriters.
|
| >[...] Bit-paired keyboard layouts survive today only in the
| standard Japanese keyboard layout, which has all shifted values
| of digits in the bit-paired layout.
|
| >[...] For this reason, among others (such as ease of collation),
| the ASCII standard strove to organize the code points so that
| shifting could be implemented by simply toggling a bit. This is
| most conspicuous in uppercase and lowercase characters: uppercase
| characters are in columns 4 (100) and 5 (101), while the
| corresponding lowercase characters are in columns 6 (110) and 7
| (111), requiring only toggling the 6th bit (2nd high bit) to
| switch case; as there are only 26 letters, the remaining 6 points
| in each column were occupied by symbols or, in one case, a
| control character (DEL, in 127).
|
| >[...] In the US, bit-paired keyboards continued to be used into
| the 1970s, including on electronic keyboards like the HP 2640
| terminal (1975) and the first model Apple II computer (1977).
| aronhegedus wrote:
| Was a really fun article to read/podcast to listen to.
|
| Favorite fact is that 127 is the DEL because for hole punching it
| removes all the info. I love those little nuggets of history
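|
| A tiny sketch of why that works, treating each punched hole as a
| 1 bit that can be added but never removed. The name overpunch is
| illustrative:
|
```python
DEL = 0x7F  # all seven bits set: every hole punched


def overpunch(code: int) -> int:
    """Punch every remaining hole over an existing 7-bit code."""
    return code | DEL


if __name__ == "__main__":
    print(bin(DEL))                                     # 0b1111111
    print(all(overpunch(c) == DEL for c in range(128)))  # True
```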
| pixelbeat__ wrote:
| I wrote about ASCII and UTF-8 elegance at:
|
| https://www.pixelbeat.org/docs/utf8_programming.html
___________________________________________________________________
(page generated 2024-07-23 23:10 UTC)