[HN Gopher] Plain Text Protocols
       ___________________________________________________________________
        
       Plain Text Protocols
        
       Author : tate
       Score  : 136 points
       Date   : 2021-02-25 14:28 UTC (8 hours ago)
        
 (HTM) web link (blainsmith.com)
 (TXT) w3m dump (blainsmith.com)
        
       | guerrilla wrote:
       | I think the author would like TreeNotation [1]
       | 
       | https://faq.treenotation.org/
        
       | IshKebab wrote:
       | Please avoid plain text protocols!
       | 
        | > * Simple to implement.
        | 
        | > * Fast to parse.
        | 
        | > * Human readable.
        | 
        | I'll give him one out of three. They are human readable. They
        | are clearly not fast to parse - binary protocols are obviously
        | faster. The biggest problem is "simple to implement".
       | 
       | That's just nonsense. Using text introduces so many ambiguities
       | and tricky edge cases around quoting, whitespace, case, line
       | endings, character encodings, escaping, etc. etc. etc. Soooo much
       | more complicated than a simple binary system (e.g. Protobuf).
       | 
       | There was a link here only recently about how Go might introduce
       | security issues by lowercasing HTTP headers. Would a binary
       | protocol have had that issue? I seriously doubt it.
       | 
       | Don't use plain text protocols!!
        
       | icy wrote:
       | While HTTP is plaintext alright, it's neither simple nor easy to
       | parse. Probably wouldn't put it in the same group as
       | statsd/InfluxDB line protocols which can be parsed in a few lines
       | of code.
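The statsd line format really is a few lines of code to parse. A minimal sketch in Python, ignoring sample rates (`|@0.5`) and tags:

```python
def parse_statsd(line: str):
    """Parse one statsd metric line of the form 'name:value|type'."""
    name, rest = line.split(":", 1)
    value, mtype = rest.split("|", 1)
    return name, float(value), mtype

print(parse_statsd("gorets:1|c"))  # ('gorets', 1.0, 'c')
```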
        
         | je_bailey wrote:
         | I really don't understand what you mean by this. I've never
         | thought http was difficult. What is it that you find
         | problematic?
        
           | nly wrote:
           | Legacy HTTP/1.1 suffers a few issues, see the current RFC
           | errata:
           | 
           | https://www.rfc-
           | editor.org/errata_search.php?rfc=7230&rec_st...
           | 
           | There are issues particularly around how whitespace and
           | obsolete line folding should be handled
           | 
           | Various whitespace issues in node.js:
           | https://github.com/nodejs/http-
           | parser/issues?q=is%3Aissue+wh...
           | 
           | Spec clarification: https://github.com/httpwg/http-
           | core/issues/53
           | 
           | Node.js's parser was at one point replacing all white space
           | in headers with a single space character, even though until
           | recently this was non-conformant (you were only supposed to
           | do so with obs-fold). It did this so it didn't have to buffer
           | characters (since http-parser is a streaming parser).
           | 
           | It's not as trivial as a few string splits. Node.js's parser
           | is ~2,500 lines of C code.
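A small illustration of the obs-fold ambiguity described above (a sketch, not Node.js's actual code): a naive line-splitting parser and an RFC 7230-style unfolding parser read the same bytes differently.

```python
import re

raw = "X-Test: value\r\n continued\r\n"  # header with obs-fold continuation

# Naive parser: splits on CRLF and sees a second, malformed "header".
naive = [l for l in raw.split("\r\n") if l]

# RFC 7230-style handling: obs-fold (CRLF followed by space/tab) must be
# replaced with a single space before the field value is interpreted.
unfolded = re.sub(r"\r\n[ \t]+", " ", raw).strip()

print(naive)     # ['X-Test: value', ' continued']
print(unfolded)  # X-Test: value continued
```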
        
       | psim1 wrote:
       | SIP and its cousin SDP are wonderfully readable plaintext
       | protocols used for VoIP. If you don't think so, have a look at
       | SIP's predecessor H.323.
        
       | kstenerud wrote:
       | Plain text protocols are:
       | 
       | - human readable
       | 
        | - good for quick prototyping
       | 
       | - good for inspection while debugging
       | 
       | They are also:
       | 
       | - complicated and slow to parse
       | 
       | - more bloated than binary
       | 
       | We benefited from text protocols because we had plenty of
       | headroom: memory was cheap enough, storage was cheap enough,
       | network was cheap enough, power was cheap enough for our use
       | cases. But that's not quite so true anymore when you have to
       | scale to support many connected systems and handle lots of data.
       | The honeymoon's almost over.
       | 
        | These are some of the reasons why I'm building
        | https://concise-encoding.org
        
         | slezyr wrote:
         | > They are also:
         | 
         | - Get funny when you need to pass binary data.
        
           | bombcar wrote:
           | They also all eventually encode control data as text, which
           | then causes errors with parsing some data that coincidentally
           | has those same control characters in it.
           | 
            | Just look at the garbled URLs you see sometimes, more percent
            | signs than a Macy's sale.
        
         | oarsinsync wrote:
         | > We benefited from text protocols because we had plenty of
         | headroom: memory was cheap enough, storage was cheap enough,
         | network was cheap enough, power was cheap enough for our use
         | cases. But that's not quite so true anymore when you have to
         | scale to support many connected systems and handle lots of
         | data. The honeymoon's almost over.
         | 
         | Sorry, but memory, storage and network are all orders of
         | magnitude cheaper today than when most of these text protocols
         | were originally developed.
         | 
         | We have significantly more capacity today than we did back
          | then. That's why we waste all that headroom on reinventing
          | everything in JavaScript.
        
         | bullen wrote:
         | - complicated and slow to parse
         | 
         | - more bloated than binary
         | 
          | Not necessarily, you can write fast compact protocols with
         | text... sending integer and float data as text is not the
         | bottleneck in any system:
         | 
         | See my root comment!
        
           | kstenerud wrote:
           | Floating point numbers are INCREDIBLY complicated to scan
           | (and print).
           | 
           | https://code.woboq.org/userspace/glibc/stdio-
           | common/vfscanf-...
           | 
           | Compare that with reading 4 bytes directly into an ieee754
           | float32.
           | 
           | If your messages are short, the benefits of fast codecs are
           | outweighed by the inefficiencies in the communication system
           | (most of your time, processing power and bandwidth are taken
           | up by setting up and tearing down the communication medium).
           | If it takes 7 "administration" packets to send 1 data packet,
           | your codec won't be your bottleneck (in which case you
           | probably don't care about efficiency anyway, and this
           | discussion is not for you).
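The contrast can be seen with Python's struct module: a binary float32 is a fixed four-byte copy, while text needs a variable-length decimal scan. A sketch:

```python
import struct

# Binary: a float32 is always exactly four bytes, decoded in one
# memcpy-like operation with no scanning.
packed = struct.pack("<f", 3.14)
(value,) = struct.unpack("<f", packed)

# Text: the same value needs a decimal scanner, and a float32 can need
# up to 9 significant digits to round-trip through text at all.
parsed = float("3.14")

print(len(packed))  # 4
```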
        
             | bullen wrote:
             | It's not that bad, maybe 10x of nothing.
             | 
             | There are much bigger fish to fry when building a large
             | network solution, most prominently getting the thing to be
             | debuggable on live machines!
        
               | GoblinSlayer wrote:
               | >large network solution
               | 
               | You basically restrict networking to big monopolists,
                | like Google, but Google likes binary protocols, like
                | gRPC, HTTP/2 and QUIC. And if you have a bug in a complex
               | parser, having text won't help debuggability, because the
               | bug is not in text.
        
         | magila wrote:
         | They also tend to become interop nightmares due to having an
         | incomplete/ambiguous/non-existent grammar. Different
         | implementations end up implementing slightly different rules
         | which ends up requiring an accumulation of hacks to parse all
         | the variations that end up in the wild.
        
           | kQq9oHeAz6wLLS wrote:
           | Found the healthcare IT guy.
           | 
            | This is the HL7 spec's issue. Everyone interprets the spec
            | slightly differently. It's given rise to interface engines, a
            | type of very powerful software that sits between systems and
            | makes things work properly, which is why I love them.
        
       | blainsmith wrote:
       | Thanks for sharing this post.
        
       | jacques_chester wrote:
        | Plaintext protocols are a textbook case of a negative externality.
       | Programmers who work on them capture value, but impose higher
       | costs (vs a well-suited binary protocol) for parsing,
       | serialising, transmission, storage, computation etc onto everyone
       | else.
        
         | jandrese wrote:
         | Binary protocols are their own negative externality as well.
         | They can be much harder to debug, requiring specialized tooling
         | that may also have its own bugs. They can also suffer from
         | things like insufficiently specified byte ordering issues and
         | differences in floating point behaviour between systems.
         | 
          | I know of at least one game that was very close but not quite
          | cross-compatible between Windows and Linux due to differences in
          | the way floating point numbers are handled. People think they
         | can just sweep all of that parsing complexity under the rug by
         | just reading the bytes into their memory structures, but it
         | comes back to bite them in the end.
        
           | jacques_chester wrote:
           | > _requiring specialized tooling that may also have its own
           | bugs_
           | 
           | The common ones all have Wireshark plugins. So I'm not sure
           | what's special.
           | 
           | > _They can also suffer from things like insufficiently
           | specified byte ordering issues and differences in floating
           | point behaviour between systems ... I know of at least one
           | game that very close but not quite cross compatible between
           | Windows and Linux..._
           | 
           | I think this shows I did a poor job of explaining myself. I
           | don't mean that everyone should create their own binary
           | encoding. I'm saying that you should pick a well-known, well-
           | supported encoding like protobufs, Avro, CBOR, flatbuffers
           | ...
           | 
           | There are about a dozen strong contenders, all of which have
           | tooling, library support and the ability to ROFLstomp
           | plaintext on any measure of burden other than yours or mine.
        
         | andrewflnr wrote:
         | Interesting perspective, but considering how much everyone
         | seems to want software, I think we have to say that the commons
         | also captures a lot of the same value from plain text that the
         | programmers do. That might work out to less of a negative
         | externality than just a trade-off for everyone, especially when
         | you consider the positive effects on the commons of the network
         | effects that text makes easier.
        
           | jacques_chester wrote:
           | Interesting counterpoint. It would come down to how things
           | net out. And it won't be stable over time. I'm still of the
           | view that programmers systematically overvalue their
           | convenience, because it's a value/cost tradeoff that they
           | directly experience.
        
             | andrewflnr wrote:
             | Fair. I do think the long term goal should be a compact
             | binary format with equal or better tooling as plain text.
             | Goodness knows there are enough formats and structural
             | editors out there, so in principle we only need to
             | standardize on one, but it seems none of them are actually
             | quite good enough yet.
        
               | jacques_chester wrote:
               | I am generally of the view that Avro is Good Enough for
               | most things that plaintext is used for and is pretty
               | well-supported.
               | 
               | Arrow looks very promising for cases where fast raw data
               | shipment is the goal.
        
               | andrewflnr wrote:
               | The tooling, though, we need the ubiquitous tooling. But
               | that's not really a technical problem. :P Maybe when I
               | pitch my hat in the structural editor ring I'll try to do
               | an Avro editor.
        
       | leastsquare wrote:
       | I also wonder if _letsencrypt_ will be dropped at some point so
       | the SSL mafia can start squeezing everyone who has been shamed
       | into using SSL.
        
       | evmar wrote:
        | My favorite plain text protocol is HTML server-sent events,
        | within HTTP. It's really trivial to make a server produce these
        | -- it's just some simple newline-delimited printf()s to the
       | socket -- and they manifest client-side as simple event objects.
       | 
       | https://html.spec.whatwg.org/multipage/server-sent-events.ht...
        
         | jeffbee wrote:
         | Don't you have to deal with the problem of having a double
         | newline in the field? Any time the value has a newline you have
         | to restart with the field name, colon, and space. So it's not
         | quite trivial to produce.
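The restart-the-field rule is mechanical, though: every newline in the payload opens a fresh "data:" line, and a blank line is what ends the event. A sketch of a conforming producer:

```python
def sse_event(data, event=None):
    """Format one server-sent event frame as text.

    Every newline in the payload must start a fresh 'data:' line,
    because a blank line is what terminates the event.
    """
    lines = []
    if event is not None:
        lines.append("event: " + event)
    for chunk in data.split("\n"):
        lines.append("data: " + chunk)
    return "\n".join(lines) + "\n\n"

print(sse_event("hello\nworld", event="greeting"))
```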
        
         | jacques_chester wrote:
          | They _are_ simple, but they're very slow, in my experience.
        
       | AceJohnny2 wrote:
       | Tangentially, I'm a bit surprised that we've completely dropped
       | the ASCII separator control characters: 28/0x1C FS (File
       | Separator), 29/0x1D GS (Group Separator), 30/0x1E RS (Record
        | Separator), 31/0x1F US (Unit Separator).
       | 
       | It's a pity, because I usually see Space (32/0x20) as the record
       | separator, which I suppose is convenient because it works well in
       | a standard text editor, but it does mean we've built up decades
       | of habit/trauma-avoidance about avoiding spaces in names,
       | replacing them with underscores (_) or dashes (-)...
        
         | AceJohnny2 wrote:
          | BTW, at least in a Unix terminal you can input the separator
          | characters using Ctrl with another char (because terminals
          | inherit from the time when modifier keys just set/unset bits in
          | the input), so:
          | 
          | - 28/0x1C FS = Ctrl-\
          | 
          | - 29/0x1D GS = Ctrl-]
          | 
          | - 30/0x1E RS = Ctrl-^
          | 
          | - 31/0x1F US = Ctrl-_
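As a sketch of what was lost: with US/RS as delimiters, ordinary text fields (commas, spaces and all) round-trip without any escaping at all.

```python
# Using the ASCII unit/record separators (0x1F / 0x1E) instead of
# commas and newlines sidesteps escaping for ordinary text fields.
US, RS = "\x1f", "\x1e"

rows = [["Smith, John", "New York"], ["Doe, Jane", "San Francisco"]]
wire = RS.join(US.join(fields) for fields in rows)

decoded = [record.split(US) for record in wire.split(RS)]
print(decoded == rows)  # True
```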
        
       | u801e wrote:
       | POP, IMAP and NNTP are also plain text protocols. What's
       | interesting about SMTP as well as NNTP is that the data phase in
       | the former as well as the post phase in the latter allow for all
       | ascii characters to be transmitted as is without any issue other
       | than NUL. The period needs to be escaped in certain cases such
       | that a CRLF.CRLF doesn't prematurely end the data phase or
        | article post. Clients actually employ "dot-stuffing" to address
       | that case, meaning that any line that starts with a period is
       | modified such that it starts with 2 periods before being
       | transmitted to the server.
       | 
       | When a client receives the email or article, it will remove the
       | extra period so that the lines start with a single period.
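Dot-stuffing is a one-line transform in each direction; a sketch:

```python
def dot_stuff(body):
    """Escape a message body for SMTP DATA / NNTP POST: any line that
    starts with '.' gets a second '.' so CRLF.CRLF can't appear early."""
    return "\r\n".join(
        "." + line if line.startswith(".") else line
        for line in body.split("\r\n")
    )

def dot_unstuff(body):
    """Reverse the transform on the receiving side."""
    return "\r\n".join(
        line[1:] if line.startswith("..") else line
        for line in body.split("\r\n")
    )

msg = "Hello\r\n.hidden line\r\nBye"
assert dot_unstuff(dot_stuff(msg)) == msg
```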
        
        | unnouinceput wrote:
        | No one's mentioned JSON? Alright, then I'll do it.
        
       | nly wrote:
        | Plain text protocols, unless they can be expressed in, say, 3
        | grammar rules, are almost always more pain than they're worth.
       | 
       | These days just go and use a flexible binary option like
       | protobufs, flatbuffers, avro, etc.
        
       | Pinus wrote:
       | The obvious risk with plain-text protocols is that you don't
       | write a rigorous spec, and don't write a strict parser, but some
       | hack up least-effort thing with a few string.split() and
       | whatever. This means there is a lot of slack in what is actually
       | accepted, and unless you are in full control of both ends of the
       | protocol, that slack will be taken advantage of, and unless you
       | are more powerful than whoever is at the other end (which you
       | aren't, if they're your clients and you are not Google or
       | Facebook), you have to support it forever. So write plain-text
       | protocols if you like, but make sure to have a rigorous spec, and
        | a parser with the persnicketiness of... I don't know.
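The slack is easy to demonstrate with a hypothetical key=value line format: a split-based parser silently accepts input that a strict grammar rejects, and that acceptance becomes de facto protocol.

```python
import re

# A lenient parser quietly accepts input a strict grammar would reject,
# and clients come to depend on that slack.
def lenient(line):
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

STRICT = re.compile(r"^([a-z][a-z0-9_]*)=(\S*)$")

def strict(line):
    m = STRICT.match(line)
    if m is None:
        raise ValueError("malformed line: %r" % line)
    return m.group(1), m.group(2)

print(lenient("  Key = some value "))  # ('Key', 'some value') -- accepted
print(strict("key=value"))             # ('key', 'value')
```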
        
         | rini17 wrote:
         | That works until next time someone considers the strictness of
         | the existing protocol too overwhelming or complicated and
         | invents a new "simpler" one.
        
           | GoblinSlayer wrote:
           | TCP is strict. If you want something rich, you just run
           | another protocol on top of the strict one. There can be many
           | layers: TCP -> TLS -> HTTP -> JSON -> Base64 -> AES -> PNG ->
           | Deflate -> Picture.
        
         | mumblemumble wrote:
         | I'm thinking here of "standards" like csv and mbox that are
         | almost impossible to handle with 100% reliability if you don't
         | control all the programs that are producing them. It can get
         | even worse with some niche products. I used to work with a
         | piece of legal software that defined its own text format, and
         | had a nasty habit of exporting files that it couldn't import.
         | There was a defined spec, but it was riddled with ambiguities.
         | 
         | I'm coming to think that, when it comes to text formats, it's
         | LL(k) | GTFO.
        
           | [deleted]
        
         | wtetzner wrote:
         | Don't you have the same problem with ill-specified binary
         | protocols?
        
       | hermanradtke wrote:
       | "Plaintext" is ASCII binary that is overwhelmingly English. The
       | reason people like plaintext is that we have the tooling to see
       | bits of the protocol as it comes over the wire. If we had good
       | tooling for other protocols, then the barrier to entry would be
       | lower as well.
        
         | hamburglar wrote:
         | I disagree that tooling makes up for the lack of human
         | readability in a binary protocol. One of the reasons text-based
         | protocols are so convenient to debug is that you can generally
          | still read them when one side is screwing up the protocol.
          | 
          | tcpdump: "oh, there's my problem"
          | 
          | Custom analyzer: "protocol error"
        
           | hermanradtke wrote:
           | Pretend for a moment that HTTP used
           | https://en.wikipedia.org/wiki/Esperanto instead of English.
           | You would need tooling to translate Esperanto to English.
        
             | hamburglar wrote:
             | Yes. Please feel free to assume that everywhere I say
             | "plain text," I mean "plain text that is not intentionally
             | obfuscated." I apologize for not being clear.
        
             | wtetzner wrote:
             | Would that really cause a problem in determining if the
             | text being sent is well-formed?
             | 
             | Having GET, POST, PUT, etc. and header names be in another
             | language wouldn't prevent you from determining the well-
             | formedness of the text.
        
         | gambler wrote:
         | ASCII has a built-in markup language and a processing control
         | protocol that most people aren't even aware of and most tools
         | out there don't support. This is significant. Look at the parts
         | that are used and parts that aren't. What is the difference
         | between them?
        
           | colejohnson66 wrote:
           | I think the big reason the ASCII C0 characters never took off
           | was because you can't _see or type_ them.[a] If I'm writing a
           | spreadsheet by hand (like CSV /TSV), I have dedicated keys
           | for the separators (comma and tab keys). I don't have those
           | for the C0 ones. I don't even think there's Alt-### codes for
           | them.
           | 
           | [a]: Regarding "seeing" them, Notepad++ has a nifty feature
           | where it'll show the control characters' names in a black
           | box[0]
           | 
           | [0]: https://superuser.com/questions/942074/what-does-stx-
           | soh-and...
        
           | specialist wrote:
           | Heh. I used those control characters to embed a full text
           | editor within AutoCAD on MS-DOS. Back in the day. Mostly
           | because someone bet me it couldn't be done.
        
           | Robotbeat wrote:
           | I don't know. Can you tell me? ;)
        
             | TeMPOraL wrote:
             | The bits that aren't used don't correspond to printable
             | characters :).
        
             | tdeck wrote:
             | I assume the parent is referring to the various control
             | characters like "START OF HEADING", "START OF TEXT",
             | "RECORD SEPARATOR", etc... I haven't seen most of these
             | used for their original control purpose but they date back
             | a long way:
             | 
             | https://ascii.cl/control-characters.htm
        
             | colejohnson66 wrote:
             | The "C0" block (U+0000 through U+001F)
             | https://en.wikipedia.org/wiki/C0_and_C1_control_codes
             | 
             | They're almost never used in practice however.
        
         | danaliv wrote:
         | Yes, exactly. I love binary protocols/formats. Plain text
         | formats are wasteful, and difficult (or at least annoying) to
         | implement with any consistency. But you really do need a
         | translation layer to make binary formats reasonable to work
         | with as a developer. There are very good reasons why we prefer
         | to work with text: we have a text input device on our
         | computers, and our brains are chock full of words with firmly
         | associated meanings. We don't have a binary input device, nor
         | do we come preloaded with associations between, say, the number
         | 4 and the concepts of "end" and "headers." (0x4 is END_HEADERS
         | in HTTP/2.)
         | 
         | Once you have the tools in place, working with binary formats
         | is as easy as working with plaintext ones.
         | 
         | Of course making these tools takes work. Not much work, but
         | work. And it's the kind of work most people are allergic to:
         | up-front investment for long-term gains. With text you get the
         | instant gratification of all your tools working out of the box.
         | 
         | I don't think I'd go so far as to say that plain text is junk
         | food, but it's close. It definitely clogs arteries. :)
        
         | CivBase wrote:
         | > "Plaintext" is ASCII binary that is overwhelmingly English.
         | 
         | I don't see any reason why "plaintext" must be limited to
         | ASCII. Many "plaintext" protocols support Unicode, including
         | the ones listed in this article. Some protocols use human
         | language (as you said, overwhelmingly English), but many do
         | not. There is nothing inherent about plaintext which
         | necessitates the use of English.
         | 
         | > The reason people like plaintext is that we have the tooling
         | to see bits of the protocol as it comes over the wire. If we
         | had good tooling for other protocols, then the barrier to entry
         | would be lower as well.
         | 
         | I disagree.
         | 
         | Humans have used text as the most ubiquitous protocol for
         | storing and transferring arbitrary information since ancient
         | times. Some other protocols have been developed for specific
         | purposes (eg traffic symbols, hazard icons, charts, or whatever
          | it is IKEA does in their assembly instructions), but none have
          | matched text in terms of accessibility or practicality for
         | conveying arbitrary information.
         | 
         | I think your statement misrepresents the relationship between
         | tool quality and the ubiquity of the protocol. Text has,
         | throughout most of recorded human history, been the most useful
         | and effective mechanism for transferring arbitrary information
         | from one human to another. Text isn't so ubiquitous because our
         | tooling for it is good; _our tooling for text is good because
         | it is so ubiquitous._
         | 
         | Text is accessible to anyone who can see and is supplemented by
         | other protocols for those who can't (eg braille, spoken
         | language, morse code). It is relatively compact and precise
         | compared to other media like pictures, audio, or video. It is
         | easily extended with additional glyphs and adapted for various
         | languages. There's just nothing that holds a candle to text
         | when it comes to encoding arbitrary information.
        
           | waynesonfire wrote:
           | There is nothing inherently human readable about plain text.
           | It's still unreadable bits, just like any other binary
           | protocol. The benefits of plain text are the ubiquitous tools
           | that allow us to interact with the format.
           | 
           | It would be interesting to think about what set of tools
            | gives 80% of the plain text benefit. Is it cat? grep? wc?
            | An API? Most programming languages I know of can read a text
           | file and turn it into a string, that's nice. The benefit of
           | this analysis would be that when developing a binary
           | protocol, it will be evident the support tools that need to
           | be developed to provide plenty of value.
           | 
           | I'm not afraid of binary protocols as long as there is
           | tooling to interact with the data. And if those tools are
            | available, I prefer binary protocols for their efficiency.
        
             | divbzero wrote:
             | > _I 'm not afraid of binary protocols as long as there is
             | tooling to interact with the data._
             | 
             | I agree with this premise but would also note how long it
             | takes for such tooling to become widespread. Even UTF-8
             | took awhile to become universal -- I recall fiddling with
             | it on the command line as recently as Windows 7 (code page
             | 1252 and the like).
        
             | CivBase wrote:
             | > There is nothing inherently human readable about plain
             | text. It's still unreadable bits, just like any other
             | binary protocol. The benefits of plain text are the
             | ubiquitous tools that allow us to interact with the format.
             | 
             | You seem to have glossed over my whole point about how the
             | ubiquity of text is what drives good tooling for it, not
             | the other way around. Text is not a technology created for
             | computers. It has been a ubiquitous information protocol
             | for millennia.
             | 
             | > I'm not afraid of binary protocols as long as there is
             | tooling to interact with the data. And if those tools are
              | available, I prefer binary protocols for their efficiency.
             | 
             | I'm not afraid of binary protocols either and there are
             | good reasons to use them. The most common reason is that
             | they can be purpose-built to support much greater
             | information density. However, purpose-built protocols
             | require purpose-built tools and are, by their very nature,
             | limited in application. Therefore, purpose-built protocols
             | will never be as well supported as general-purpose
             | protocols like text.
             | 
             | That isn't to say that purpose-built protocols are never
             | supported well enough to be preferable over text. Images,
             | audio, video, databases, programs, and many other types of
             | information are usually stored in well-supported, purpose-
             | built, binary protocols.
        
         | andrewmcwatters wrote:
         | I don't think we necessarily even need good tooling for other
         | protocols, we just need good binary analysis tooling that
         | visualizes any binary buffer.
         | 
         | I don't know of a single good app that exists for that.
        
           | nick__m wrote:
            | Wireshark is somewhat useful if the protocols in the binary
            | blob are supported.
           | 
           | First, the buffer must be converted to ascii hex, then the
           | following procedure is used to import it: https://www.wiresha
           | rk.org/docs/wsug_html_chunked/ChIOImportS...
        
           | wtetzner wrote:
           | That's because you need to know the format to know how to
           | interpret it. Otherwise the best you can really do is use a
           | hex editor.
           | 
           | Or are you suggesting a tool that lets you easily specify the
           | binary format? I'm pretty sure there are some that exist.
        
       | zby wrote:
       | OK - but what is Plain Text? Is it ASCII or maybe should it also
       | include UTF8 or other Unicode encodings? What is the difference
       | between the bits that form HTTP/1 and the bits that form a binary
       | protocol like HTTP/2?
        
       | jpalomaki wrote:
       | These look easy, but I don't think they always are.
       | 
       | Think about CSV for example. Looks simple to create and parse. In
        | reality these simple implementations will give you a lot of
        | headaches when they don't handle edge cases (escaping, linefeeds,
        | etc.).
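A concrete instance of those edge cases, contrasting a naive split with Python's csv module (which handles RFC 4180-style quoting):

```python
import csv
import io

line = 'name,"Smith, John",42\n'

# Naive split breaks on the quoted comma.
print(line.strip().split(","))  # ['name', '"Smith', ' John"', '42']

# A real CSV parser handles quoting, embedded commas, and newlines.
print(next(csv.reader(io.StringIO(line))))  # ['name', 'Smith, John', '42']
```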
        
         | u801e wrote:
          | If only they had used the ASCII record separator instead of a
          | comma.
        
           | SahAssar wrote:
           | You would still need to be able to handle escaping, right?
           | Otherwise you couldn't have a string with a record separator
           | within a CSV column.
        
           | leejoramo wrote:
            | I always preferred tabs to CSV, but how I wish our industry
           | had made use of the ASCII record separator character. How
           | many hours would I and my teams have saved in the last 20
           | years?
        
           | edflsafoiewq wrote:
           | Then you couldn't type it in a text editor.
        
             | u801e wrote:
             | I guess it depends on a text editor. Mapping a key to
             | insert that character is one possible solution.
        
             | jeffbee wrote:
             | Standard unix-style input accepts ctrl-^ as ASCII RS and
             | ctrl-_ as ASCII US. If you want your terminal to accept an
             | ASCII US literally -- so that you can use it as the -t
             | argument to sort, for example -- you would use ctrl-v
              | ctrl-_ to give it the literal character.
              | 
              |     $ hd
              |     ^^^_
              |     00000000  1e 1f 0a
              |     00000003
              |     $ sort -t "^_" -k 2,2n
              |     a^_42
              |     z^_5
              |     n^_7
              |     z5
              |     n7
              |     a42
        
         | nightcracker wrote:
         | The problem with CSV isn't that it's text based, the problem is
         | that "CSV" isn't a file format with an authoritative
         | description.
        
           | jandrese wrote:
           | There is a RFC (4180) for CSV, but the truth of the matter is
           | that there are thousands of parsers written to whatever spec
           | the author thought up.
           | 
           | In the real world the entire spec is contained in its three
           | word name.
           | 
           | In the end I think the simplicity was also a weakness.
           | Because the spec is so minimal a programmer goes "Oh, I'll
           | just write my own, no problem", where a more complex protocol
            | would have necessitated finding some existing library
           | instead. Whatever the author of that library did would become
           | the defacto standard and there would be less incompatibility
           | between files.
        
             | sgtnoodle wrote:
             | That reminds me of the time I needed to parse some XML and
             | ended up writing my own parser...
        
       | rasengan wrote:
       | IRC is also a simple plaintext protocol. Its simplicity helped
       | an entire generation of programmers get their start.
        
         | stagas wrote:
         | IRC was the first protocol I implemented back in the day. It's
         | so simple you don't even need a client, telnet is enough to get
         | you chatting. I miss this directness.
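         | As a sketch of how little parsing the protocol needs, here is
         | a hedged Python take on splitting one IRC line (the message
         | below is made up, but follows the RFC 1459 line format):

```python
# A single IRC message: optional ":prefix", a command, arguments, and an
# optional ":trailing" argument that may contain spaces.
def parse_irc(line):
    prefix = None
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        line, trailing = line.split(" :", 1)
        args = line.split() + [trailing]
    else:
        args = line.split()
    return prefix, args[0], args[1:]

print(parse_irc(":alice!u@host PRIVMSG #test :hello there"))
# ('alice!u@host', 'PRIVMSG', ['#test', 'hello there'])
```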
        
       | dexen wrote:
       | HTTP/2 is a good example of how to handle textual-to-binary
       | transition.
       | 
       | The original HTTP/1 was textual (if a bit convoluted), and that
       | helped it to become a lingua franca; helped cooperation and data
       | interchange between applications proliferate; everybody was able
       | to proverbially "scratch his itch". Gradually tooling grew up
       | around it too.
       | 
       | HTTP/2 is binary, and also quite complex - however most of
       | that is hidden behind the already established tooling, and from
       | developer's perspective, the protocol is seen as "improved HTTP/1
       | with optional extras". The protocol _appears_ textual for all
       | developmental intents and purposes - because the transition was
       | handled (mostly) smoothly in the tooling. The key APIs remained
       | the same for quick, mundane use cases - and got extended for
       | advanced use cases.
       | 
       | There's a lesson in that success. Unpopular opinion
       | warning: contrast the success of HTTP/2 with the failure of IPv6
       | to maintain backward compatibility at the _API_ level - which
       | hampered its ability to be seamlessly employed in applications.
        
         | cesarb wrote:
         | > contrast the success of HTTP/2 with the failure of IPv6 to
         | maintain backward compatibility at the API level
         | 
         | Unfortunately, it was not possible for IPv6 to maintain
         | backward compatibility with IPv4 at the API level. That's
         | because the IPv4 API was not textual; it was binary, with
         | fixed-size 32-bit fields everywhere. What they did was the next
         | best thing: they made the IPv6 API able to also use IPv4
         | addresses, so that new programs can use a single API for both
         | IPv4 and IPv6.
        
           | fulafel wrote:
           | The API compatibility is pretty far down the list of
           | bottlenecks with IPv6. There was some churn related to it 20
           | years ago.
        
         | gambler wrote:
         | _> however most of that is hidden behind the already
         | established tooling_
         | 
         | ...and so everyone without Google-level funding is stuck with
         | this tooling. Thus, control over the protocol that millions of
         | people could use to communicate with one another directly (by
         | making websites and possibly servers) is ceded to a handful of
         | centralized authorities that can handle the complexity and also
         | happen to benefit from the new features.
         | 
         | I remember how Node when it was just rising in popularity was
         | usually demonstrated by writing a primitive HTTP server that
         | served a "hello world" HTTP page. There were no special
         | libraries involved, so it was super-easy to understand what's
          | going on. We're moving away from being able to do things of
          | this sort without special tooling, and almost no one seems to
          | notice or care.
        
           | SahAssar wrote:
           | Node has had a built in HTTP server since v0.1.17, are you
           | sure those examples didn't use that? Because if they did then
           | it was the same in those examples as it is now.
           | 
           | Source:
           | https://nodejs.org/api/http.html#http_class_http_server
        
           | espadrine wrote:
           | > _I remember how Node when it was just rising in popularity
           | was usually demonstrated by writing a primitive HTTP server
           | that served a "hello world" HTTP page._
           | 
           | That is still possible in the exact same way.
           | 
           | But a toy is just a toy. All websites should encrypt their
           | content with TLS. In fact, all protocols should encrypt their
           | communications. The result? Sure, it is a binary stream of
           | random-looking bits.
           | 
           | Yet to me, what matters about text protocols is not the ASCII
           | encoding. It is the ability to read and edit the raw
           | representation.
           | 
           | As long as your protocol has an unambiguous one-to-one
           | textual representation with two-way conversion, I can inspect
           | it and modify it with no headache.
           | 
           | An outstanding example of that is WASM, which converts to and
           | from WAT: https://en.wikipedia.org/wiki/WebAssembly#Code_repr
           | esentatio...
        
             | LocalH wrote:
             | >All websites should encrypt their content with TLS. In
             | fact, all protocols should encrypt their communications.
             | 
             | I reject the notion that encryption should be _mandatory_
             | for _all websites_. It should be best practice, especially
             | for a  "modern" website with millions of users, but we
             | don't need _every single website_ encrypted.
        
               | miohtama wrote:
               | While I agree with you, it is best to be on the safe
               | side. The damage from having a wrong website unencrypted
               | could be massive vs. cost of simply encrypting
               | everything. Demanding 100% encryption is an extra layer
               | to protect against human mistakes.
        
               | LocalH wrote:
               | Demanding 100% encryption also locks out some
               | retrocomputing hardware that had existing browsers in the
               | early Internet days. Not all sites _need_ encryption.
                | Where it's appropriate, most certainly. HTTPS should be
               | the overwhelming standard. But there is a place for HTTP,
               | and there should _always_ be. Same for other unencrypted
               | protocols. Unencrypted FTP still has a place.
        
               | SahAssar wrote:
               | HTTP/FTP certainly have their place, but that is not on
               | the open internet. For retro computing and otherwise
               | special cases a proxy on the local network can do
               | HTTP->HTTPS conversion.
        
               | MayeulC wrote:
                | You can always use a MITM proxy that presents an
               | unencrypted view of the web. As long as you keep to
               | HTML+CSS, that should be enough. Some simple js also, but
               | you can't generate https URLs on the client side. Which,
               | for retrocomputing, is probably fine.
               | 
               | You wouldn't want to expose these "retro" machines to the
               | Internet anyways.
        
               | pwdisswordfish0 wrote:
               | Indeed. When it comes to technology, I think resiliency
               | and robustness in general should trump almost all other
               | concerns.
               | 
               | It would be nice if HTTP were extended to accommodate the
               | inverse of the Upgrade header. Something to signal to the
               | server something like, "Please, I insist. I really need
               | you to _just_ serve me the content in clear text. I have
               | my reasons. " The server would of course be free to sign
               | the response.
        
           | tasogare wrote:
            | If you care about protocol simplicity and its attendant
            | implementation costs, then the continuously creeping Web
            | platform is a few orders of magnitude worse in this respect.
        
         | contravariant wrote:
         | After all I've read on HTTP/2 I'm still not entirely sure what
         | problem it is trying to solve.
        
           | nbm wrote:
           | The main benefit is multiplexing - being able to use the same
           | connection for multiple transactions at the same time. This
           | can have benefits in finding and keeping the congestion
           | window at its calculated maximum size, reduce connection-
           | related start-up, as well as overcome waiting for a
           | currently-used connection to be free if you have a max
           | connection per server model.
           | 
           | The other potential benefits were priorities and server-
           | initiated push, but both I'd say largely went unused and/or
           | were too much trouble to use. Priorities were redesigned in
           | HTTP 3 - more at https://blog.cloudflare.com/adopting-a-new-
           | approach-to-http-... - and Chrome recently decided push in
           | HTTP 2 wasn't worth keeping around -
           | https://www.ctrl.blog/entry/http2-push-chromium-
           | deprecation....
           | 
           | HTTP 2's main problem is head-of-line blocking in TCP -
           | basically, if you lose a packet, you wait until you get that
           | packet and acknowledge a maximum amount of packets thereafter
           | - slowing the connection down. With multiplexing, this means
           | that a bunch of in-flight transactions, as well as
           | potentially future ones, are blocked at the same time. With
           | multiple TCP connections, you don't have this problem of a
           | dropped packet affecting multiple transactions.
           | 
           | HTTP 3 has many more benefits - basically, all the benefits
           | of multiplexing without the head of line blocking (instead,
           | only that stream is affected), as well as ability to
           | negotiate alternative congestion control algorithms when
           | client TCP stacks don't support newer ones - or come with bad
           | defaults. And the future is bright for non-HTTP and non-
           | reliable streams as well over QUIC, the transport HTTP 3 is
           | built on.
        
             | contravariant wrote:
             | Right, all this kind of feels as if HTTP/2 is trying to
             | solve transport layer problems in the application layer.
             | Especially if you leave out the server initiated push. I
             | can't really pretend to know much about this but I can't
             | say I'm surprised that this causes problems when the
             | underlying transport-layer protocol is trying to solve the
             | same problem.
             | 
             | So is it correct to view HTTP/3 as basically taking a step
             | back and just running HTTP over a different transport-layer
             | protocol (QUIC)? (If so I think the name is a bit
             | confusing, HTTP over QUIC would be much clearer)
        
               | mumblemumble wrote:
               | It was originally called HTTP over QUIC, and got renamed
               | to HTTP/3 in order to avoid some other confusion.
               | 
               | https://en.wikipedia.org/wiki/HTTP/3#History
        
               | cwp wrote:
               | That's true, but the transport layer has ossified, and
               | the application layer is the only place we can still
               | innovate. RIP SCTP.
        
               | SahAssar wrote:
               | HTTP/2 is what you do if you're confined to using TCP.
               | HTTP/3 is what you get if you use UDP to solve the same
               | problems (and new problems discovered by trying it over
               | TCP).
        
         | bullen wrote:
          | HTTP/2.0 has TCP head-of-line issues; in practice that
          | nullifies its usefulness!
         | 
         | HTTP/1.1 is much more balanced and simple, and as I said all
         | over this topic the bottleneck is elsewhere!
        
         | rectang wrote:
         | > _Unpopular opinion warning: contrast the success of HTTP /2
         | with the failure of IPv6 to maintain backward compatibility at
         | the API level - which hampered its ability to be seamlessly
         | employed in applications._
         | 
         | Unpopular?
         | 
         | The gratuitous breaking of backwards compatibility by IPv6,
         | inflicting staggering inefficiencies felt directly or
         | indirectly by all internet users, should be a canonical case
         | study by now. It should be taught to all engineering students
         | as a cautionary tale: never, ever do this.
        
           | convolvatron wrote:
           | i'm fairly sympathetic here - except part of the blame should
           | really be on the socket layer and resolver interface. if they
           | had been a bit better at modelling multiprotocol networks,
           | this kind of transition would have been easier.
        
       | superkuh wrote:
       | Plain text protocols are in serious danger as the (confused)
       | desire for TLS- _only_ everywhere spreads with the best
       | intentions. The problem is that the security TLS- _only_ brings
       | to protocols like HTTP(s) also brings with it massive
       | centralization in cert authorities which provide single points of
       | technical, social, and political failure.
       | 
       | If the TLS-everywhere people succeed in their misguided cargo-
       | cult efforts to kill off HTTP and other plain text connections
       | everywhere, and if browsers make HTTPS-only the default, then
       | the web will lose even more of its distributed nature.
       | 
       | But it's not only the web (HTTP) that is under attack from the
       | centralization of TLS, even SMTP with SMTPS might fall to it
       | eventually. Right now you can self sign on your mailserver and it
       | works just fine. But I imagine "privacy" advocates will want to
       | exchange distributed security for centralized privacy there soon
       | too.
       | 
       | TLS is great. I love it and it has real, important uses. But TLS
       | _only_ is terrible for the internet. We need plain text protocols
       | too. HTTP+HTTPS for life.
        
         | elric wrote:
         | Mixing trust and encryption that resulted in centralized TLS
         | was probably a design flaw. Certificate pinning in DNS is an
         | attractive "fix", but moves the problem up a layer. But DNS is
         | already centralized, so there's that.
         | 
         | > Right now you can self sign on your mailserver and it works
         | just fine
         | 
         | Well .. sort of. Until you have to interact with google or ms
         | mail servers. After an hour of wondering why your mails are
         | getting blackholed, one starts to reconsider one's life
         | choices.
        
         | s_gourichon wrote:
         | You can talk plain text protocol through a TLS or SSL-encrypted
         | connection, even interactively.
         | 
         | Example:
         | 
         |     { echo GET / HTTP/1.0 ; echo ; sleep 1 ; } | openssl s_client -connect www.google.com:443
         | 
         | Or just:
         | 
         |     openssl s_client -connect www.google.com:443
         | 
         | then type interactively GET / HTTP/1.0 then press enter twice.
        
           | tkot wrote:
            | Using openssl s_client -ign_eof makes piping text a bit
            | easier because the connection won't be closed prematurely
            | (so you don't need the sleep 1).
        
         | minitoar wrote:
         | I sort of thought TLS everywhere was more about encryption than
         | authentication.
        
           | bombcar wrote:
           | If it was only authentication then they'd be perfectly fine
           | with unsigned certs. But they're not.
        
         | jl6 wrote:
         | Gemini uses TLS and it is common practice for Gemini clients to
         | use self-signed certificates and TOFU. No dependency on
         | centralized CAs.
        
           | elric wrote:
           | TOFU seems to work pretty well for SSH. AFAIK not many people
           | actively verify host fingerprints on first use. It doesn't
           | protect against MITM attacks on the first connection, but I
           | wonder if that's not a case of better being the enemy of good
           | to some extent?
        
             | nucleardog wrote:
             | The high value targets are much more spread about with SSH
             | than with HTTP. Finding a place where you could inject
             | yourself between, for example, a user ssh'ing into a
             | banking service and the banking service is going to be
             | difficult. Just blindly MITM'ing a bunch of users at a
             | coffee shop will probably get you little to nothing of any
             | real value.
             | 
             | And because SSH is rarely used for the public to connect in
             | to services it's a lot easier to add additional layers of
             | security on top. Most valuable targets won't even be
             | exposed in the first place or will have a VPN or some other
             | barrier that would prevent the attack anyway.
             | 
             | From the HTTP end though, it's easy to narrow down
             | "valuable" targets--there are like 5 main banks in my
             | country. They're, by design, meant to be connected to by
             | the public so there are no additional layers of security
             | implemented. If you set up in a coffee shop for a day
             | there's a pretty reasonable chance you'd find at least one
             | or two people that had just bought a new device or were
             | otherwise logging in for the first time that you could nab.
             | 
             | You'd also run into the issue of what to do when sites
             | needed to update their certificates for various reasons. If
             | the SSH host key changes it's pretty easy to communicate
             | that out-of-band within a company to let people know to
             | expect it. If a website's certificate changes what do we
             | do? We end up training users to just blindly click through
             | the "this website's certificate has changed!" warning and
             | we're back to effectively zero protection.
        
               | zozbot234 wrote:
               | > If you set up in a coffee shop for a day there's a
               | pretty reasonable chance you'd find at least one or two
               | people that had just bought a new device
               | 
               | Sure, but it's easy to protect against this - just
               | connect to the same service via a different endpoint and
               | check that both endpoints get the same certificate. AIUI
               | this is how the EFF SSL observatory detects MITM attacks
               | in the wild, and similar approaches could be used to make
               | TOFU-to-a-popular-service a lot more resilient, at least
               | wrt. most attacks.
        
         | taywrobel wrote:
         | There's a difference between the transport and the protocol.
         | For instance I've used Redis fronted by TLS in the past.
         | Initial connection did get more tricky for sure, needing to
         | have the certs in place to first connect.
         | 
         | However after the connection was established with OpenSSL I was
         | able to run all the usual commands in plain text and read all
         | the responses in plain text. Having transport layer encryption
          | on the TCP connection didn't affect the protocol itself at all.
        
         | bullen wrote:
         | Don't worry, the day HTTP is deprecated is the day civilization
         | is over.
        
       | InfiniteRand wrote:
       | It would be good to have a standard checklist of edge cases to
       | handle with plaintext protocol design. Anyone know of one?
       | 
       | I'm thinking along the lines of:
       | 
       | 1. Control characters
       | 
       | 2. Whitespace normalization
       | 
       | 3. Newline normalization
       | 
       | 4. Options for compression
       | 
       | 5. Escaping characters significant to the protocol
       | 
       | 6. Encoding characters outside of the normal character range
       | 
       | 7. Dealing with ambiguous characters (not really an issue with
       | strict ASCII)
       | 
       | 8. Internationalization (which is intertwined with the previous
       | items)
       | 
       | 9. Dealing with invalid characters
       | 
       | I'm not saying plain text doesn't have its advantages, I'm just
       | saying there are issues you need to consider.
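       | For item 5, one common answer (a sketch, not the only scheme) is
       | to backslash-escape the protocol's delimiter and the escape
       | character itself, so any payload round-trips:

```python
# Backslash-escape the field delimiter (and the escape character itself)
# so arbitrary payloads survive a round trip through the protocol.
DELIM, ESC = "|", "\\"

def escape(field):
    # Escape the escape character first, then the delimiter.
    return field.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)

def unescape(field):
    out, chars = [], iter(field)
    for ch in chars:
        # An escape character means "take the next character literally".
        out.append(next(chars) if ch == ESC else ch)
    return "".join(out)

msg = "a|b\\c"
print(escape(msg))            # a\|b\\c
print(unescape(escape(msg)))  # a|b\c
```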
        
         | chubot wrote:
         | Some comments on those issues here:
         | 
         | https://www.arp242.net/the-art-of-unix-programming/#id290742...
         | 
         | https://lobste.rs/s/rzhxyk/plain_text_protocols#c_5vx4ez
        
         | [deleted]
        
       | bullen wrote:
       | HTTP is unbeatable when you remove optional headers - not
       | because of bandwidth, but because there are robust servers that
       | can multi-thread joint-memory access with non-blocking IO and
       | atomic concurrency.
       | 
       | I use comet-stream for real-time 3D Action MMO data, so I have my
       | own text based protocol wrapped in 2x sockets HTTP:
       | 
       | F.ex. message =
       | "move|<session>|<x>,<y>,<z>|<x>,<y>,<z>,<w>|walk":
       | 
       | ### push (client -> server):
       | 
       |     "GET /push?data=" + message + " HTTP/1.1\r\nHost: my.host.name"
       | 
       | then you get back
       | 
       |     "200 OK\r\nContent-Length: <length>\r\n\r\n<content>"
       | 
       | In this case "0\r\n\r\n".
       | 
       | ### pull (server -> client):
       | 
       | Just one request pulls infinite response chunks:
       | 
       |     "GET /pull HTTP/1.1\r\nHost: my.host.name\r\nAccept: text/event-stream"
       | 
       | then you get back
       | 
       |     "200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
       | 
       |     while(true) {
       |         <hex_length> + "\r\n"
       |         "data:" + message + "\n\n\r\n\r\n"
       |     }
       | 
       | Simple and efficient!
       | 
       | Text- + Event- based protocols over HTTP way outscale Binary- +
       | Tick- based ones for compressed (as in averaged, not zipped)
       | real-time data.
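       | A hedged sketch (not the author's server code) of decoding the
       | chunked frames described above, in Python:

```python
import io

def iter_chunks(stream):
    """Decode HTTP/1.1 chunked transfer-encoding frames from a binary
    file-like object, yielding each chunk's payload."""
    while True:
        size = int(stream.readline().strip(), 16)  # chunk size is hex text
        if size == 0:            # a zero-length chunk terminates the stream
            return
        data = stream.read(size)
        stream.readline()        # consume the trailing \r\n
        yield data

# A made-up wire capture: two chunks followed by the terminator.
wire = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
print(list(iter_chunks(io.BytesIO(wire))))  # [b'hello', b' world']
```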
        
       | mattlondon wrote:
       | Don't forget Gopher!
       | 
       | Orignal: https://tools.ietf.org/html/rfc1436
       | 
       | Gopher+: https://github.com/gopher-protocol/gopher-plus
       | 
       | I feel like there is a lot of potential to "rejuvenate" Gopher
       | somewhat in today's internet. No javascript, no cookies, no ads,
       | no auto-play videos etc.
       | 
       | There are some nice new modern GUIs like https://gophie.org that
       | are cross platform and modern.
       | 
       | Fun fact: Redis (yes - _that_ Redis) just added Gopher protocol
       | support (https://redis.io/topics/gopher)
        
         | hliyan wrote:
         | I sometimes wonder if the web would have been a better place
         | had CSS and everything but the most basic HTML tags didn't
         | exist.
        
         | tdeck wrote:
         | But also no hyperlinks, no mixing text with images, and no
         | unicode support in the menus.
        
           | mattlondon wrote:
           | I think there is probably a lot of mileage that can be gained
           | serving markdown over gopher with embedded gopher://
           | hyperlinks and images and utf8 and everything else markdown
           | supports via gopher itself. Gopher0 already has sort-of
           | support for HTML file types so this would not be such a wild
           | divergence from the original design. Not serving HTML
           | provides some basic guarantees (no JavaScript, no tracking
            | pixels, etc.)
           | 
           | Gopher+ allows for quite flexible (albeit clunky) attributes
           | for menu items so I can imagine an attribute on a menu
           | directing compatible browsers to the markdown version of the
           | menu, but allowing old clients to just view the traditional
           | menu. This kinda relegates the gopher menu to a sort of old
            | school directory listing type thing we used to see in HTTP,
           | but there is room for some fanciness via gopher+ to style
           | menus themselves if browsers support that too!
           | 
           | All of this is possible in Gopher+ if clients support it
           | (...and there is an appetite for it). Perhaps we need some
           | sort of "agreement"/Python-PIP-style thing to define sets of
           | common Gopher+ attributes for all of this sort of thing.
        
         | jl6 wrote:
         | I hope you are aware of Gemini which aims to do exactly that?
        
       | andrewmcwatters wrote:
       | What's sort of interesting is that there aren't too many
       | overwhelming reasons why someone couldn't come up with a piece of
       | software that autodetected a binary format and translated it to
       | something readable in a GUI.
       | 
       | I mean we know what the binary layout is of things, so I never
       | understood (outside of the time that it would take to build such
       | a utility) why I've never been able to find something that says,
       | "Oh yeah, that binary string contains three doubles, separated by
       | a NULL character, with an int, followed by a UTF-8 compatible
       | string."
       | 
       | Such a tool would be incredibly useful for reverse engineering
       | proprietary formats, and yet I don't know of a good one, so if it
       | exists it's at least obscure enough for it to have escaped my
       | knowledge for well over a decade.
        
         | nograpes wrote:
         | There is a command-line program called "file" that attempts to
         | determine the file type (format). It uses a series of known
         | formats and returns the first matching one. I have found it
         | useful to reverse engineer proprietary formats.
        
           | andrewmcwatters wrote:
           | Yeah, but that's for known formats.
           | 
           | If I said I have a buffer of 512 bytes and piped it through
           | to some cli, that would be fine if it could tell me how many
           | ints, chars, floats, doubles, compressed bits of data,
           | CRC32s, UTF-8 strings, etc. it contained, but there's few
           | utilities out there that will do that.
        
             | magmastonealex wrote:
             | I'm curious how you'd propose doing that.
             | 
             | If I give you a buffer of 5 bytes:
             | 
             | [0x68 0x65 0x6c 0x6c 0x6f]
             | 
             | there are a ton of ways to interpret that.
             | - The ascii string "hello"         - 5 single-byte integers
             | - 2 two-byte integers and 0x6c as a delimiter         - 1
             | four-byte integer and ending in the char "o"         - 1
             | 32-bit float, and one single-byte integer
             | 
             | etc. Or are you hoping for something that will provide you
             | with all the possible combinations? That would produce
             | pages of output for any decently-sized binary blob.
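              | For concreteness, Python's struct module makes each of
              | those readings explicit (a sketch of the same five
              | bytes):

```python
import struct

# The same five bytes, read four different ways.
buf = bytes([0x68, 0x65, 0x6C, 0x6C, 0x6F])

print(buf.decode("ascii"))              # 'hello'
print(list(buf))                        # [104, 101, 108, 108, 111]
print(struct.unpack("<I", buf[:4])[0])  # first four bytes as a little-endian uint32
print(struct.unpack("<f", buf[:4])[0])  # the same four bytes as a 32-bit float
```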
        
               | andrewmcwatters wrote:
               | I'm sort of looking for something that will attempt to
               | narrow down possibilities. The way I'd do it is by
               | providing some visualizations based on the user selecting
               | what data types and lengths they're looking for.
               | 
               | So for instance, if I know I'm looking at triangle data,
               | I can guess that it's probably compressed, ask the app to
               | decompress the data based on some common compression
               | types, look at that data and guess that I'm looking at
               | some floats or doubles.
               | 
               | Maybe I'm wrong, so then I can ask the app to search for
               | other data types at that point.
               | 
               | To me, that would be a tremendous help over my experience
               | with existing hex editors.
               | 
               | Edit: It's not fair for me to say there aren't tools that
               | do exactly this, but to be more precise, a decent user
               | experience is lacking in most cases.
        
               | stack_underflow wrote:
               | Your post reminded me of the presentation on cantor.dust:
               | https://sites.google.com/site/xxcantorxdustxx/
               | https://www.youtube.com/watch?v=4bM3Gut1hIk - Christopher
               | Domas The future of RE Dynamic Binary Visualization
               | (very interesting presentation)
               | 
               | Looks like there's even been a recently open sourced
               | plugin for Ghidra released by Battelle:
               | https://github.com/Battelle/cantordust
        
       ___________________________________________________________________
       (page generated 2021-02-25 23:02 UTC)