hngopher.com

       [HN Gopher] Internet Standard #80: ASCII Format for Network Inte...
       ___________________________________________________________________
        
       Internet Standard #80: ASCII Format for Network Interchange (1969)
        
       Author : 1vuio0pswjnm7
       Score  : 53 points
       Date   : 2023-08-06 08:32 UTC (14 hours ago)
        
 (HTM) web link (www.ietf.org)
 (TXT) w3m dump (www.ietf.org)
        
       | wlindley wrote:
       | This also sets the standard for the names of characters.
       | (Parentheses are round) [Brackets are square.] {And braces are
       | 'curly.'} There are no 'round braces' and to say 'curly braces'
       | is redundant.
        
         | rmwaite wrote:
         | I also use <angle brackets> to differentiate [square brackets].
        
       | quickthrower2 wrote:
       | It seems as though ASCII was completed in 1963 according to:
       | http://edition.cnn.com/TECH/computing/9907/06/1963.idg/, but this
       | Internet Standard (nee RFC) I suppose is saying to use it for
       | "networks"
        
         | bhaak wrote:
         | Not exactly. The first edition from 1963 only specified a
         | subset of modern ASCII.
         | 
         | https://www.sensitiveresearch.com/Archive/CharCodeHist/X3.4-...
         | 
         | Bits 6 and 7 were mostly unassigned.
         | 
         | Obviously, lowercase characters didn't exist in 1963, they were
         | invented later and that's why they added them later. The
         | version from 1968 is basically what we consider to be ASCII. /s
         | 
         | From what I remember, I think this was for compatibility
         | with old teletype(?) hardware, but those reservations went
         | out of the window pretty fast.
         | 
         | https://en.wikipedia.org/wiki/ASCII#History
        
           | mike_hock wrote:
           | > Obviously, lowercase characters didn't exist in 1963, they
           | were invented later and that's why they added them later.
           | 
           | Lowercase letters were invented in 1967 and later backported
           | to cursive writing for use in hospitals to thwart Chinese
           | medical espionage.
        
           | Someone wrote:
           | https://www.sensitiveresearch.com/Archive/CharCodeHist/X3.4-...
           | has the full 1963 standard (12 JPGs).
           | 
           | I find it interesting to read the section on considerations.
           | 
           | They, for example, mention that COBOL is supported, but ALGOL
           | isn't, explain how the standard could be extended for use for
           | "European alphabets" (they could use the 5 values after Z and
           | the one before A for additional letters) or "base 12 numeric
           | digits" (to support pre-decimalization British monetary
           | values)
        
         | p_l wrote:
         | ASCII being late with final release is why IBM S/360 and
         | derived mainframes use EBCDIC - IBM planned on replacing it
         | with ASCII, but the standard wasn't completed and firm yet when
         | they needed it.
        
       | 1vuio0pswjnm7 wrote:
       | Although I use HTML every day, most personal documentation I save
       | is not in HTML format. It is in ASCII.
       | 
       | I have some text files that can grow quite large and one trick I
       | use is to keep a less history file to help me navigate. For
       | example, I can save searches and marks in a history file. The
       | format is simple.
       | 
       |     .less-history-file
       |     .search
       |     "term1
       |     "term2
       |     .mark
       |     m a 1 46248 1.fifo
       |     m b 1 11509 1.fifo
       | 
       | Then I use a small shell script to read the text file. Something
       | like
       | 
       |     #!/bin/sh
       |     (zstd -dc /path/file.txt.zst > 1.fifo&)
       |     LESSHISTSIZE=999999999999999999999999 \
       |     LESSHISTFILE=/path/.lesshst.file.txt.zst \
       |     exec less -G --save-marks --no-histdups 1.fifo
       | 
       | Another approach might be to use an editor with macros like ex/vi
       | as the pager.
       | 
       | Sometimes I see people converting text files into HTML, e.g.,
       | RFCs or manpages, but in these documents there are often no
       | "hyperlinks" so the HTML at best only adds enhanced appearance
       | and possibly the ability to jump around within the document. At a
       | cost of making them significantly larger.
       | 
       | Instead of using markup and storing navigation information in the
       | file, and using an HTML reader, this stores the information in a
       | small, separate, associated text file.
        
         | Someone wrote:
         | How do you guarantee that file is ascii? I would think it
         | rapidly would become non-ascii for most users, even for English
         | speakers. You can't write café in ascii, for example, use curly
         | quotes, or write a pound, euro, or yen sign
        
           | paulddraper wrote:
           | Most people write only the characters on their QWERTY
           | keyboards.
           | 
           | cafe, resume, fiance, etc
        
             | ElectricalUnion wrote:
             | But are you gonna generate 100% of such files using only
             | your typing? What about Input Method Editors or copy-
             | pasting?
        
           | 1vuio0pswjnm7 wrote:
           | "How do you guarantee that the file is ascii."
           | 
           | I use an HTML reader set to 7-bit ASCII. If necessary I
           | replace or eliminate non-ASCII characters I do not want. I
           | often use flex or tr for this task.
           | 
           | In the situation described in the comment, I'm not sharing
           | these files with anyone else. I'm reading these files myself.
           | As such, other users' preferences are irrelevant.
           | 
           | The submission is about ASCII for "network exchange". But the
           | comment replied to here is not necessarily about ASCII for
           | network exchange. It's about a method used by yours truly for
           | reading large ASCII files.
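
The check-and-clean step described above can be sketched in Python
instead of flex or tr. This is an illustration only, not the
commenter's actual tooling: the function names and the "?"
replacement character are assumptions made for the example.

```python
import unicodedata


def is_ascii(data: bytes) -> bool:
    """Return True if every byte has its high-order bit clear (value < 0x80)."""
    return all(b < 0x80 for b in data)


def to_ascii(text: str, replacement: str = "?") -> str:
    """Fold accented letters to their base letter; replace other non-ASCII.

    The replacement character is an arbitrary choice for this sketch.
    """
    # NFKD splits a letter such as e-acute into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if ord(ch) < 0x80:
            out.append(ch)           # already 7-bit ASCII
        elif unicodedata.combining(ch):
            continue                 # drop the accent, keep the base letter
        else:
            out.append(replacement)  # currency signs, curly quotes, etc.
    return "".join(out)


if __name__ == "__main__":
    sample = "caf\u00e9 costs \u20ac5"
    print(is_ascii(sample.encode("utf-8")))
    print(to_ascii(sample))
```

Folding accents loses information, which is the trade-off debated in
this subthread; dropping or replacing characters outright is the
simpler alternative.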
        
           | evandale wrote:
           | I'm not the parent, but I personally can't think of a single
           | instance where I would write "café" instead of "cafe". I
           | can't imagine needing to use a pound, euro, or yen sign
           | either. I'd probably just write pound, euro, or yen if I
           | needed to talk about those currencies.
           | 
           | It's a pain to use any of those characters on every device I
           | use so I don't see how any of them could get in the file if I
           | were to adopt this.
        
         | hn_throwaway_99 wrote:
         | Some clarification:
         | 
         | ASCII is not a format, it's an encoding. You are referring to
         | "plain text" vs. HTML. For the rest of what you've written,
         | this feels like a million times more complicated than using
         | markdown.
        
           | 1vuio0pswjnm7 wrote:
           | "You are referring to "plain text" vs. HTML."
           | 
           | The term "ASCII format" in the submission title comes from
           | the IETF Internet Standard, not me. Contact the author Vint
           | Cerf or the IETF with objections. Not much I can do to change
           | an Internet Standard.
           | 
           | In this comment, I never used the term "ASCII format". I used
           | the term "HTML format".
           | 
           | What is "markdown"? Word play on the term "markup".
        
             | hn_throwaway_99 wrote:
             | You are misunderstanding:
             | 
             | 1. ASCII simply defines a byte encoding to represent
             | characters. E.g. UTF-8 is another character encoding. HTML
             | files are often written in ASCII.
             | 
             | 2. Markdown is a very common standard for representing
             | minimal formatting in a plain text file:
             | https://en.m.wikipedia.org/wiki/Markdown
        
       | [deleted]
        
       | gumby wrote:
       | Of course the term "Internet" wouldn't be used for another
       | decade (and deployed even later) but the rfc-editor's index is
       | more recent than that.
       | 
       | There are some implications of RFC 20 right in the abstract that
       | are obscure today.
       | 
       | > use of standard 7-bit ASCII embedded in an 8 bit byte whose
       | high order bit is always 0.
       | 
       | This was very important because character sets had still not been
       | standardized back then, though newer non-IBM systems tended
       | to choose ASCII by default. Even then, since byte and word
       | lengths varied, multiple character sets were common even on a
       | single machine (e.g. both ascii and SIXBIT) so specifying the
       | 8-bit byte (an IBMism IIRC) was necessary. 36-bit words were
       | quite popular in research machines and back in '69 I think all
       | the arpanet hosts were 36-bit PDP-10s, which supported bytes of
       | width 1-36 bits. You can still see remnants of these machines in
       | some older protocols.
       | 
       | As a consequence, the FTP protocol had a "binary mode" and a
       | "text mode" because you can't safely transfer binary data from
       | machines with incompatible word sizes and endianness. Text mode
       | guaranteed a stream of seven bit characters in the correct order.
       | If you specified the wrong transfer mode you usually ended up
       | with gibberish.
       | 
       | > SRI uses "." (ASCII X'2E' or 2/14) as the end-of-line
       | character, where as UCLA uses X'OD' or 0/13 (carriage return).
       | 
       | You can see that _this_ issue also goes way back.
       | 
       | So if you think things were better before the confusion of utf-8
       | vs other representations, or were better before Windows code
       | pages, well, they weren't.
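
The two notations in the RFC quotes above (hex such as X'2E' and
column/row such as 2/14) describe the same 7-bit code. A minimal
Python sketch, assuming the usual reading in which the column is the
high three bits and the row is the low four bits; the function names
are invented for this example.

```python
def from_column_row(column: int, row: int) -> int:
    """Convert column/row notation (e.g. 2/14) to a 7-bit code value."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row must be 0-15")
    return (column << 4) | row


def fits_rfc20_octet(octet: int) -> bool:
    """True if the value is an 8-bit byte whose high-order bit is 0."""
    return 0 <= octet <= 0xFF and (octet & 0x80) == 0


if __name__ == "__main__":
    # 2/14 is the "." that SRI used as end-of-line; 0/13 is carriage return.
    for column, row in [(2, 14), (0, 13)]:
        code = from_column_row(column, row)
        print(f"{column}/{row} = X'{code:02X}' fits: {fits_rfc20_octet(code)}")
```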
        
         | Someone wrote:
         | > 36-bit PDP-10s, which supported bytes of width 1-36 bits.
         | 
         | I thought they only supported (some) elements of bit sizes that
         | divided 36, but http://www.hakmem.org/pdp-10.html#Byte taught
         | me:
         | 
         |  _"In the PDP-10 a "byte" is some number of contiguous bits
         | within one word. A byte pointer is a quantity (which occupies a
         | whole word) which describes the location of a byte. There are
         | three parts to the description of a byte: the word (i.e.,
         | address) in which the byte occurs, the position of the byte
         | within the word, and the length of the byte"_
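
The position-plus-length idea in that quote can be sketched in Python
on a plain integer standing in for a 36-bit word. The assumptions here
are that "position" counts the bits to the right of the byte and that
text is stored as five 7-bit characters left-justified in a word; the
function names are hypothetical, and real byte pointers also carry the
word address, which this sketch omits.

```python
WORD_BITS = 36


def load_byte(word: int, position: int, size: int) -> int:
    """Extract `size` bits that have `position` bits to their right."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 36 bits")
    if size < 0 or position < 0 or position + size > WORD_BITS:
        raise ValueError("byte does not lie within one word")
    return (word >> position) & ((1 << size) - 1)


def pack_sevenbit(chars: str) -> int:
    """Pack up to five 7-bit characters left-justified into one word."""
    if len(chars) > 5:
        raise ValueError("at most five 7-bit characters fit in 36 bits")
    word = 0
    for i, ch in enumerate(chars):
        code = ord(ch)
        if code >= 0x80:
            raise ValueError("not a 7-bit character")
        # The first character occupies the highest 7 bits, and so on down.
        word |= code << (WORD_BITS - 7 * (i + 1))
    return word


if __name__ == "__main__":
    word = pack_sevenbit("HELLO")
    text = "".join(
        chr(load_byte(word, WORD_BITS - 7 * (i + 1), 7)) for i in range(5)
    )
    print(text)
```

Because the size is just a parameter, the same `load_byte` call works
for 6-bit, 7-bit, 9-bit, or any other width that fits in the word,
which is the point the quoted passage makes.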
        
         | _notreallyme_ wrote:
         | > the FTP protocol had a "binary mode" and a "text mode"
         | because you can't safely transfer binary data from machines
         | with incompatible word sizes and endianness
         | 
         | I'm not sure I follow you here. The TYPE modes are ASCII,
         | IMAGE, EBCDIC and Local byte size. The "binary mode" being
         | Image and "text mode" being ASCII in current FTP clients.
         | 
         | The IMAGE mode is the one that actually transfers files as they
         | really are without any changes. The other 3 formats are
         | actually used for character conversion, with an additional
         | parameter to specify telnet or ASA conversion, and even
         | changing the byte size in case of L.
         | 
         | In the end, the "text mode" is the one that will tend to
         | corrupt your file (thanks to the CR/LF, LF/CR, LF discrepancy
         | between the 3 major desktop OS).
         | 
         | Always choose TYPE IN (Image Non-print) by default!
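
Why text mode tends to corrupt binary files can be shown with a
deliberately simplified model in Python. It assumes each side only
translates between its local line ending and CR LF on the wire; real
FTP clients and servers differ in detail, and every name below is
invented for the illustration.

```python
NETWORK_EOL = b"\r\n"  # line ending used on the wire in this model


def to_network(data: bytes, local_eol: bytes) -> bytes:
    """Sender side: rewrite local line endings as CR LF."""
    return data.replace(local_eol, NETWORK_EOL)


def from_network(data: bytes, local_eol: bytes) -> bytes:
    """Receiver side: rewrite CR LF as the local line ending."""
    return data.replace(NETWORK_EOL, local_eol)


def text_mode_transfer(data: bytes, sender_eol: bytes, receiver_eol: bytes) -> bytes:
    """Simulate a text-mode transfer between two hosts."""
    return from_network(to_network(data, sender_eol), receiver_eol)


def image_mode_transfer(data: bytes) -> bytes:
    """Image mode passes the bytes through unchanged."""
    return data


if __name__ == "__main__":
    # The PNG signature contains both CR LF and a bare LF.
    png_signature = b"\x89PNG\r\n\x1a\n"
    received = text_mode_transfer(png_signature, b"\r\n", b"\n")
    print(received == png_signature)
    print(image_mode_transfer(png_signature) == png_signature)
```

The translation is what a text file wants and what a binary file
cannot survive, since bytes that happen to look like line endings get
rewritten.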
        
           | bawolff wrote:
           | I think SMTP would be a more relevant example.
        
             | gumby wrote:
             | Does smtp have another mode _other_ than text?
        
           | gumby wrote:
           | Thanks for the reminder -- it's been decades since I thought
           | about FTP, much less used it.
           | 
           | While we're at it, let's not forget that it was originally
           | also the transport for network mail!
        
       | unglaublich wrote:
       | I really like the formatting of the RFCs, both the text and HTML
       | ones.
       | 
       | Does anyone know what tool or formatting machinery is used for
       | this?
       | 
       | I'd like to keep personal notes in this format.
        
         | gabrielsroka wrote:
         | IETF has some tools for this. A quick search revealed
         | https://github.com/ietf-tools/ietf-at
         | 
         | See also https://www.rfc-editor.org/rse/format-faq/
        
         | jwilk wrote:
         | https://github.com/ietf-tools/rfc2html
        
         | laurensr wrote:
         | Please refer to 'rendering and converting' at
         | https://authors.ietf.org/choosing-a-format-and-tools
        
       | kragen wrote:
       | this is one of the early rfcs where it would be most desirable to
       | see a scan of the original document
        
         | jwilk wrote:
         | Here it is: https://www.rfc-editor.org/rfc/rfc20.pdf
        
           | kragen wrote:
           | this is wonderful, thank you
           | 
           | an unexpected thing about this scan is that it seems to
           | literally be a scan of the ansi standard for ascii
           | 
           | like on page 4 it says 'usas x3.4-1968, revision of
           | x3.4-1967' and the page headers say 'x3.4, usa standard code
           | for information interchange'
           | 
           | it seems clear that the only part by vint cerf is the first
           | page
           | 
           | so the internet was founded on piracy and copyright
           | infringement from the very beginning
        
             | Someone wrote:
             | > it seems to literally be a scan of the ansi standard for
             | ascii
             | 
             | More than "seems". Page 1 says it is ("copies from USAS
             | X3,4-1968")
        
       | jwilk wrote:
       | HTML version: https://www.rfc-editor.org/rfc/rfc20.html
        
       | jwilk wrote:
       | https://www.sensitiveresearch.com/Archive/CharCodeHist/#NOTE...
       | 
       | > _From Eric Fischer: You may also be interested to know that RFC
       | 20_ [...] _is, aside from minor typos and the opening paragraph,
       | essentially a word-for-word copy of the X3.4-1968 standard,
       | presumably without the knowledge or consent of ANSI._
        
       ___________________________________________________________________
       (page generated 2023-08-06 23:02 UTC)