[HN Gopher] Internet Standard #80: ASCII Format for Network Inte...
___________________________________________________________________
Internet Standard #80: ASCII Format for Network Interchange (1969)
Author : 1vuio0pswjnm7
Score : 53 points
Date : 2023-08-06 08:32 UTC (14 hours ago)
(HTM) web link (www.ietf.org)
(TXT) w3m dump (www.ietf.org)
| wlindley wrote:
| This also sets the standard for the names of characters.
| (Parentheses are round) [Brackets are square.] {And braces are
| 'curly.'} There are no 'round braces' and to say 'curly braces'
| is redundant.
| rmwaite wrote:
| I also use <angle brackets> to differentiate [square brackets].
| quickthrower2 wrote:
| It seems as though ASCII was completed in 1963 according to:
| http://edition.cnn.com/TECH/computing/9907/06/1963.idg/, but this
| Internet Standard (nee RFC) I suppose is saying to use it for
| "networks"
| bhaak wrote:
| Not exactly. The first edition from 1963 only specified a
| subset of modern ASCII.
|
| https://www.sensitiveresearch.com/Archive/CharCodeHist/X3.4-...
|
| Bits 6 and 7 were mostly unassigned.
|
| Obviously, lowercase characters didn't exist in 1963, they were
| invented later and that's why they added them later. The
| version from 1968 is basically what we consider to be ASCII. /s
|
| From what I remember, I think this was due to compatibility
| reasons to old teletype(?) hardware but those reservations went
| out of the window pretty fast though.
|
| https://en.wikipedia.org/wiki/ASCII#History
| mike_hock wrote:
| > Obviously, lowercase characters didn't exist in 1963, they
| were invented later and that's why they added them later.
|
| Lowercase letters were invented in 1967 and later backported
| to cursive writing for use in hospitals to thwart Chinese
| medical espionage.
| Someone wrote:
| https://www.sensitiveresearch.com/Archive/CharCodeHist/X3.4-...
| has the full 1963 standard (12 jpg's)
|
| I find it interesting to read the section on considerations.
|
| They, for example, mention that COBOL is supported, but ALGOL
| isn't, explain how the standard could be extended for use for
| "European alphabets" (they could use the 5 values after Z and
| the one before A for additional letters) or "base 12 numeric
| digits" (to support pre-decimalization British monetary
| values)
| p_l wrote:
| ASCII being late with final release is why IBM S/360 and
| derived mainframes use EBCDIC - IBM planned on replacing it
| with ASCII, but the standard wasn't completed and firm yet when
| they needed it.
| 1vuio0pswjnm7 wrote:
| Although I use HTML every day, most personal documentation I save
| is not in HTML format. It is in ASCII.
|
| I have some text files that can grow quite large and one trick I
| use is to keep a less history file to help me navigate. For
| example, I can save searches and marks in a history file. The
| format is simple.
|
|     .less-history-file
|     .search
|     "term1
|     "term2
|     .mark
|     m a 1 46248 1.fifo
|     m b 1 11509 1.fifo
|
| Then I use a small shell script to read the text file. Something
| like
|
|     #!/bin/sh
|     (zstd -dc /path/file.txt.zst > 1.fifo&)
|     LESSHISTSIZE=999999999999999999999999 \
|     LESSHISTFILE=/path/.lesshst.file.txt.zst \
|     exec less -G --save-marks --no-histdups 1.fifo
|
| Another approach might be to use an editor with macros like ex/vi
| as the pager.
|
| Sometimes I see people converting text files into HTML, e.g.,
| RFCs or manpages, but in these documents there are often no
| "hyperlinks" so the HTML at best only adds enhanced appearance
| and possibly the ability to jump around within the document. At a
| cost of making them significantly larger.
|
| Instead of using markup and storing navigation information in the
| file, and using an HTML reader, this stores the information in a
| small, separate, associated text file.
| Someone wrote:
| How do you guarantee that file is ascii? I would think it
| rapidly would become non-ascii for most users, even for English
| speakers. You can't write café in ascii, for example, use curly
| quotes, or write a pound, euro, or yen sign
| paulddraper wrote:
| Most people write only the characters on their QWERTY
| keyboards.
|
| cafe, resume, fiance, etc
| ElectricalUnion wrote:
| But are you gonna generate 100% of such files using only
| your typing? What about Input Method Editors or copy-
| pasting?
| 1vuio0pswjnm7 wrote:
| "How do you guarantee that the file is ascii."
|
| I use an HTML reader set to 7-bit ASCII. If necessary I
| replace or eliminate non-ASCII characters I do not want. I
| often use flex or tr for this task.
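
The cleanup step described above can be sketched in a few lines. This is only an illustration, not the commenter's actual flex or tr setup: the function names `is_ascii` and `strip_non_ascii` are hypothetical, and dropping bytes outright (rather than transliterating them) is an assumption made for brevity.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte has its high-order bit clear (value < 0x80)."""
    return all(b < 0x80 for b in data)


def strip_non_ascii(data: bytes) -> bytes:
    # Drop every byte >= 0x80; similar in spirit to: tr -cd '\000-\177'
    return bytes(b for b in data if b < 0x80)


# "é" and "£" each encode to two non-ASCII bytes in UTF-8.
sample = "café £5".encode("utf-8")
cleaned = strip_non_ascii(sample)
```

Deleting bytes loses information ("café" becomes "caf"), so a replacement table may be preferable when the text still has to read correctly afterwards.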
|
| In the situation described in the comment, I'm not sharing
| these files with anyone else. I'm reading these files myself.
| As such, other users' preferences are irrelevant.
|
| The submission is about ASCII for "network exchange". But the
| comment replied to here is not necessarily about ASCII for
| network exchange. It's about a method used by yours truly for
| reading large ASCII files.
| evandale wrote:
| I'm not the parent, but I personally can't think of a single
| instance where I would write "café" instead of "cafe". I
| can't imagine needing to use a pound, euro, or yen sign
| either. I'd probably just write pound, euro, or yen if I
| needed to talk about those currencies.
|
| It's a pain to use any of those characters on every device I
| use so I don't see how any of them could get in the file if I
| were to adopt this.
| hn_throwaway_99 wrote:
| Some clarification:
|
| ASCII is not a format, it's an encoding. You are referring to
| "plain text" vs. HTML. For the rest of what you've written,
| this feels like a million times more complicated than using
| markdown.
| 1vuio0pswjnm7 wrote:
| "You are referring to "plain text" vs. HTML."
|
| The term "ASCII format" in the submission title comes from
| the IETF Internet Standard, not me. Contact the author Vint
| Cerf or the IETF with objections. Not much I can do to change
| an Internet Standard.
|
| In this comment, I never used the term "ASCII format". I used
| the term "HTML format".
|
| What is "markdown". Word play on the term "markup".
| hn_throwaway_99 wrote:
| You are misunderstanding:
|
| 1. ASCII simply defines a byte encoding to represent
| characters. E.g. UTF-8 is another character encoding. HTML
| files are often written in ASCII.
|
| 2. Markdown is a very common standard for representing
| minimal formatting in a plain text file:
| https://en.m.wikipedia.org/wiki/Markdown
| [deleted]
| gumby wrote:
| Of course the term "Internet" wouldn't be used for another
| decade (and deployed even later) but the rfc-editor's index is
| more recent than that.
|
| There are some implications of RFC 20 right in the abstract that
| are obscure today.
|
| > use of standard 7-bit ASCII embedded in an 8 bit byte whose
| high order bit is always 0.
|
| This was very important because character sets had still not been
| standardized back then, though newer non-IBM systems tended
| to choose ASCII by default. Even then, since byte and word
| lengths varied, multiple character sets were common even on a
| single machine (e.g. both ascii and SIXBIT) so specifying the
| 8-bit byte (an IBMism IIRC) was necessary. 36-bit words were
| quite popular in research machines and back in '69 I think all
| the arpanet hosts were 36-bit PDP-10s, which supported bytes of
| width 1-36 bits. You can still see remnants of these machines in
| some older protocols.
|
| As a consequence, the FTP protocol had a "binary mode" and a
| "text mode" because you can't safely transfer binary data from
| machines with incompatible word sizes and endianness. Text mode
| guaranteed a stream of seven bit characters in the correct order.
| If you specified the wrong transfer mode you usually ended up
| with gibberish.
|
| > SRI uses "." (ASCII X'2E' or 2/14) as the end-of-line
| character, where as UCLA uses X'OD' or 0/13 (carriage return).
|
| You can see that _this_ issue also goes way back.
|
| So if you think things were better before the confusion of utf-8
| vs other representations, or were better before Windows code
| pages, well, they weren't.
| Someone wrote:
| > 36-bit PDP-10s, which supported bytes of width 1-36 bits.
|
| I thought they only supported (some) elements of bit sizes that
| divided 36, but http://www.hakmem.org/pdp-10.html#Byte taught
| me:
|
| _"In the PDP-10 a "byte" is some number of contiguous bits
| within one word. A byte pointer is a quantity (which occupies a
| whole word) which describes the location of a byte. There are
| three parts to the description of a byte: the word (i.e.,
| address) in which the byte occurs, the position of the byte
| within the word, and the length of the byte"_
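
The byte-pointer idea quoted above (a position within the word plus a length) can be modelled roughly as a shift and a mask. This is a sketch under stated assumptions, not PDP-10 code: `load_byte` and `pack_ascii5` are hypothetical names, `position` is taken to mean the number of bits to the right of the byte, and the packing shown is the common convention of five 7-bit characters left-justified in a 36-bit word with the low-order bit unused.

```python
WORD_BITS = 36


def load_byte(word: int, position: int, size: int) -> int:
    """Extract a `size`-bit byte that has `position` bits to its right."""
    mask = (1 << size) - 1
    return (word >> position) & mask


def pack_ascii5(text: str) -> int:
    """Pack five 7-bit characters into one 36-bit word, left-justified."""
    if len(text) != 5:
        raise ValueError("exactly five characters fit in one word")
    word = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit character")
        word = (word << 7) | code
    # 5 * 7 = 35 bits used; shift left once so the spare bit is the lowest.
    return word << 1


word = pack_ascii5("HELLO")
# Walk the word with byte positions 29, 22, 15, 8, 1 and size 7.
chars = "".join(chr(load_byte(word, p, 7)) for p in (29, 22, 15, 8, 1))
```

Because position and size are independent parameters, the same `load_byte` also reads six 6-bit characters (size 6) or any other width from 1 to 36 bits.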
| _notreallyme_ wrote:
| > the FTP protocol had a "binary mode" and a "text mode"
| because you can't safely transfer binary data from machines
| with incompatible word sizes and endianness
|
| I'm not sure I follow you here. The TYPE modes are ASCII,
| IMAGE, EBCDIC and Local byte size. The "binary mode" being
| Image and "text mode" being ASCII in current FTP clients.
|
| The IMAGE mode is the one that actually transfers files as they
| really are without any changes. The other 3 formats are
| actually used for character conversion, with an additional
| parameter to specify telnet or ASA conversion, and even
| changing the byte size in case of L.
|
| In the end, the "text mode" is the one that will tend to
| corrupt your file (thanks to the CR/LF, LF/CR, LF discrepancy
| between the 3 major desktop OS).
|
| Always choose TYPE IN (Image Non-print) by default!
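
A rough illustration of why text mode tends to corrupt non-text files: the sketch below models only the line-ending rewrite a sender might apply in ASCII mode (bare LF to CR LF). It is a deliberately simplified assumption, not how any particular FTP client or server is implemented, and `to_network_ascii` is a hypothetical name.

```python
def to_network_ascii(data: bytes) -> bytes:
    """Naive model of an ASCII-mode sender: rewrite every LF as CR LF."""
    return data.replace(b"\n", b"\r\n")


# The 8-byte PNG file signature contains both a CR LF pair and a bare LF.
binary_payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
mangled = to_network_ascii(binary_payload)

# Text without any line feeds passes through unchanged.
untouched = to_network_ascii(b"plain text")
```

Here `mangled` is two bytes longer than `binary_payload`, so a receiver that stores the stream as-is ends up with a file that no longer matches the original.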
| bawolff wrote:
| I think SMTP would be a more relevant example.
| gumby wrote:
| Does smtp have another mode _other_ than text?
| gumby wrote:
| Thanks for the reminder -- it's been decades since I thought
| about FTP, much less used it.
|
| While we're at it, let's not forget that it was originally
| also the transport for network mail!
| unglaublich wrote:
| I really like the formatting of the RFCs, both the text and HTML
| ones.
|
| Does anyone know what tool or formatting machinery is used for
| this?
|
| I'd like to keep personal notes in this format.
| gabrielsroka wrote:
| IETF has some tools for this. A quick search revealed
| https://github.com/ietf-tools/ietf-at
|
| See also https://www.rfc-editor.org/rse/format-faq/
| jwilk wrote:
| https://github.com/ietf-tools/rfc2html
| laurensr wrote:
| Please refer to 'rendering and converting' at
| https://authors.ietf.org/choosing-a-format-and-tools
| kragen wrote:
| this is one of the early rfcs where it would be most desirable to
| see a scan of the original document
| jwilk wrote:
| Here it is: https://www.rfc-editor.org/rfc/rfc20.pdf
| kragen wrote:
| this is wonderful, thank you
|
| an unexpected thing about this scan is that it seems to
| literally be a scan of the ansi standard for ascii
|
| like on page 4 it says 'usas x3.4-1968, revision of
| x3.4-1967' and the page headers say 'x3.4, usa standard code
| for information interchange'
|
| it seems clear that the only part by vint cerf is the first
| page
|
| so the internet was founded on piracy and copyright
| infringement from the very beginning
| Someone wrote:
| > it seems to literally be a scan of the ansi standard for
| ascii
|
| More than "seems". Page 1 says it is ("copies from USAS
| X3,4-1968")
| jwilk wrote:
| HTML version: https://www.rfc-editor.org/rfc/rfc20.html
| jwilk wrote:
| https://www.sensitiveresearch.com/Archive/CharCodeHist/#NOTE...
|
| > _From Eric Fischer: You may also be interested to know that RFC
| 20_ [...] _is, aside from minor typos and the opening paragraph,
| essentially a word-for-word copy of the X3.4-1968 standard,
| presumably without the knowledge or consent of ANSI._
___________________________________________________________________
(page generated 2023-08-06 23:02 UTC)