[HN Gopher] Plain Text Protocols
___________________________________________________________________
Plain Text Protocols
Author : tate
Score : 136 points
Date : 2021-02-25 14:28 UTC (8 hours ago)
(HTM) web link (blainsmith.com)
(TXT) w3m dump (blainsmith.com)
| guerrilla wrote:
| I think the author would like TreeNotation [1]
|
| https://faq.treenotation.org/
| IshKebab wrote:
| Please avoid plain text protocols!
|
| > * Simple to implement.
| > * Fast to parse.
| > * Human readable.
|
| I'll give him one out of three. They are human readable. They are
| clearly not fast to parse - binary protocols are obviously
| faster. The biggest problem is "simple to implement".
|
| That's just nonsense. Using text introduces so many ambiguities
| and tricky edge cases around quoting, whitespace, case, line
| endings, character encodings, escaping, etc. etc. etc. Soooo much
| more complicated than a simple binary system (e.g. Protobuf).
|
| There was a link here only recently about how Go might introduce
| security issues by lowercasing HTTP headers. Would a binary
| protocol have had that issue? I seriously doubt it.
|
| Don't use plain text protocols!!
| icy wrote:
| While HTTP is plaintext alright, it's neither simple nor easy to
| parse. Probably wouldn't put it in the same group as
| statsd/InfluxDB line protocols which can be parsed in a few lines
| of code.
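|
| As a rough sketch of what "a few lines" means for statsd's
| name:value|type line format (Python, purely illustrative; the
| sample metric names are made up):
|
|     def parse_statsd(line: str):
|         # "gorets:1|c" -> name, value, metric type, optional "|@rate"
|         name, rest = line.split(":", 1)
|         parts = rest.split("|")
|         value, mtype = parts[0], parts[1]
|         rate = float(parts[2][1:]) if len(parts) > 2 else 1.0
|         return name, float(value), mtype, rate
|
|     print(parse_statsd("gorets:1|c"))        # ('gorets', 1.0, 'c', 1.0)
|     print(parse_statsd("glork:320|ms|@0.1")) # ('glork', 320.0, 'ms', 0.1)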
| je_bailey wrote:
| I really don't understand what you mean by this. I've never
| thought http was difficult. What is it that you find
| problematic?
| nly wrote:
| Legacy HTTP/1.1 suffers from a few issues, see the current RFC
| errata:
|
| https://www.rfc-
| editor.org/errata_search.php?rfc=7230&rec_st...
|
| There are issues particularly around how whitespace and
| obsolete line folding should be handled
|
| Various whitespace issues in node.js:
| https://github.com/nodejs/http-
| parser/issues?q=is%3Aissue+wh...
|
| Spec clarification: https://github.com/httpwg/http-
| core/issues/53
|
| Node.js's parser was at one point replacing all white space
| in headers with a single space character, even though until
| recently this was non-conformant (you were only supposed to
| do so with obs-fold). It did this so it didn't have to buffer
| characters (since http-parser is a streaming parser).
|
| It's not as trivial as a few string splits. Node.js's parser
| is ~2,500 lines of C code.
| psim1 wrote:
| SIP and its cousin SDP are wonderfully readable plaintext
| protocols used for VoIP. If you don't think so, have a look at
| SIP's predecessor H.323.
| kstenerud wrote:
| Plain text protocols are:
|
| - human readable
|
| - good for quick prototyping
|
| - good for inspection while debugging
|
| They are also:
|
| - complicated and slow to parse
|
| - more bloated than binary
|
| We benefited from text protocols because we had plenty of
| headroom: memory was cheap enough, storage was cheap enough,
| network was cheap enough, power was cheap enough for our use
| cases. But that's not quite so true anymore when you have to
| scale to support many connected systems and handle lots of data.
| The honeymoon's almost over.
|
| These are some of the reasons why I'm building https://concise-
| encoding.org
| slezyr wrote:
| > They are also:
|
| - Get funny when you need to pass binary data.
| bombcar wrote:
| They also all eventually encode control data as text, which
| then causes errors with parsing some data that coincidentally
| has those same control characters in it.
|
| Just look at the garbled URLs you see sometimes, more percent
| signs than a Macy's sale.
| oarsinsync wrote:
| > We benefited from text protocols because we had plenty of
| headroom: memory was cheap enough, storage was cheap enough,
| network was cheap enough, power was cheap enough for our use
| cases. But that's not quite so true anymore when you have to
| scale to support many connected systems and handle lots of
| data. The honeymoon's almost over.
|
| Sorry, but memory, storage and network are all orders of
| magnitude cheaper today than when most of these text protocols
| were originally developed.
|
| We have significantly more capacity today than we did back
| then. That's why we waste all that headroom on reinventing
| everything in javascript.
| bullen wrote:
| - complicated and slow to parse
|
| - more bloated than binary
|
| Not necessarily, you can write fast compact protocols with
| text... sending integer and float data as text is not the
| bottleneck in any system:
|
| See my root comment!
| kstenerud wrote:
| Floating point numbers are INCREDIBLY complicated to scan
| (and print).
|
| https://code.woboq.org/userspace/glibc/stdio-
| common/vfscanf-...
|
| Compare that with reading 4 bytes directly into an ieee754
| float32.
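|
| Roughly, the difference looks like this (a Python sketch; struct
| stands in for the raw 4-byte read):
|
|     import struct
|
|     payload = struct.pack("<f", 3.14159)   # the 4 bytes on the wire
|
|     # Binary: one fixed-size read, no questions about length or format
|     (value,) = struct.unpack("<f", payload)
|
|     # Text: variable length, needs a delimiter, and a full float parser
|     text_payload = "3.14159\n"
|     value_from_text = float(text_payload.strip())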
|
| If your messages are short, the benefits of fast codecs are
| outweighed by the inefficiencies in the communication system
| (most of your time, processing power and bandwidth are taken
| up by setting up and tearing down the communication medium).
| If it takes 7 "administration" packets to send 1 data packet,
| your codec won't be your bottleneck (in which case you
| probably don't care about efficiency anyway, and this
| discussion is not for you).
| bullen wrote:
| It's not that bad, maybe 10x of nothing.
|
| There are much bigger fish to fry when building a large
| network solution, most prominently getting the thing to be
| debuggable on live machines!
| GoblinSlayer wrote:
| >large network solution
|
| You basically restrict networking to big monopolists,
| like Google, but Google likes binary protocols, like
| grpc, http2 and quic. And if you have a bug in a complex
| parser, having text won't help debuggability, because the
| bug is not in text.
| magila wrote:
| They also tend to become interop nightmares due to having an
| incomplete/ambiguous/non-existent grammar. Different
| implementations end up implementing slightly different rules
| which ends up requiring an accumulation of hacks to parse all
| the variations that end up in the wild.
| kQq9oHeAz6wLLS wrote:
| Found the healthcare IT guy.
|
| This is the HL7 spec's issue. Everyone interprets the spec
| slightly differently. It's given rise to the interface
| engine, a type of very powerful software that sits
| between systems and makes things work properly, which is why
| I love them.
| blainsmith wrote:
| Thanks for sharing this post.
| jacques_chester wrote:
| Plaintext protocols are a textbook case of a negative externality.
| Programmers who work on them capture value, but impose higher
| costs (vs a well-suited binary protocol) for parsing,
| serialising, transmission, storage, computation etc onto everyone
| else.
| jandrese wrote:
| Binary protocols are their own negative externality as well.
| They can be much harder to debug, requiring specialized tooling
| that may also have its own bugs. They can also suffer from
| things like insufficiently specified byte ordering issues and
| differences in floating point behaviour between systems.
|
| I know of at least one game that is very close to but not quite
| cross-compatible between Windows and Linux due to differences in the
| way the floating point numbers are handled. People think they
| can just sweep all of that parsing complexity under the rug by
| just reading the bytes into their memory structures, but it
| comes back to bite them in the end.
| jacques_chester wrote:
| > _requiring specialized tooling that may also have its own
| bugs_
|
| The common ones all have Wireshark plugins. So I'm not sure
| what's special.
|
| > _They can also suffer from things like insufficiently
| specified byte ordering issues and differences in floating
| point behaviour between systems ... I know of at least one
| game that is very close to but not quite cross-compatible between
| Windows and Linux..._
|
| I think this shows I did a poor job of explaining myself. I
| don't mean that everyone should create their own binary
| encoding. I'm saying that you should pick a well-known, well-
| supported encoding like protobufs, Avro, CBOR, flatbuffers
| ...
|
| There are about a dozen strong contenders, all of which have
| tooling, library support and the ability to ROFLstomp
| plaintext on any measure of burden other than yours or mine.
| andrewflnr wrote:
| Interesting perspective, but considering how much everyone
| seems to want software, I think we have to say that the commons
| also captures a lot of the same value from plain text that the
| programmers do. That might work out to be not so much a negative
| externality as just a trade-off for everyone, especially when
| you consider the positive effects on the commons of the network
| effects that text makes easier.
| jacques_chester wrote:
| Interesting counterpoint. It would come down to how things
| net out. And it won't be stable over time. I'm still of the
| view that programmers systematically overvalue their
| convenience, because it's a value/cost tradeoff that they
| directly experience.
| andrewflnr wrote:
| Fair. I do think the long term goal should be a compact
| binary format with equal or better tooling than plain text.
| Goodness knows there are enough formats and structural
| editors out there, so in principle we only need to
| standardize on one, but it seems none of them are actually
| quite good enough yet.
| jacques_chester wrote:
| I am generally of the view that Avro is Good Enough for
| most things that plaintext is used for and is pretty
| well-supported.
|
| Arrow looks very promising for cases where fast raw data
| shipment is the goal.
| andrewflnr wrote:
| The tooling, though, we need the ubiquitous tooling. But
| that's not really a technical problem. :P Maybe when I
| pitch my hat in the structural editor ring I'll try to do
| an Avro editor.
| leastsquare wrote:
| I also wonder if _letsencrypt_ will be dropped at some point so
| the SSL mafia can start squeezing everyone who has been shamed
| into using SSL.
| evmar wrote:
| My favorite plain text protocol is HTML server-sent events,
| within HTTP. It's really trivial to make a server produce these
| -- it's just some simple newline-delimited printf()s to the
| socket -- and they manifest client-side as simple event objects.
|
| https://html.spec.whatwg.org/multipage/server-sent-events.ht...
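|
| A minimal sketch of the server side in Python (standard library
| only; the port and payload here are made up):
|
|     from http.server import BaseHTTPRequestHandler, HTTPServer
|     import time
|
|     class SSEHandler(BaseHTTPRequestHandler):
|         def do_GET(self):
|             self.send_response(200)
|             self.send_header("Content-Type", "text/event-stream")
|             self.send_header("Cache-Control", "no-cache")
|             self.end_headers()
|             for i in range(5):
|                 # each event: "data: ..." lines ended by a blank line
|                 self.wfile.write(f"data: tick {i}\n\n".encode())
|                 self.wfile.flush()
|                 time.sleep(1)
|
|     HTTPServer(("localhost", 8000), SSEHandler).serve_forever()
|
| On the client, new EventSource("/") fires one message event per
| blank-line-terminated block.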
| jeffbee wrote:
| Don't you have to deal with the problem of having a double
| newline in the field? Any time the value has a newline you have
| to restart with the field name, colon, and space. So it's not
| quite trivial to produce.
| jacques_chester wrote:
| They _are_ simple, but it's very slow, in my experience.
| AceJohnny2 wrote:
| Tangentially, I'm a bit surprised that we've completely dropped
| the ASCII separator control characters: 28/0x1C FS (File
| Separator), 29/0x1D GS (Group Separator), 30/0x1E RS (Record
| Separator), 31/0x1F US (Unit Separator).
|
| It's a pity, because I usually see Space (32/0x20) as the record
| separator, which I suppose is convenient because it works well in
| a standard text editor, but it does mean we've built up decades
| of habit/trauma-avoidance about avoiding spaces in names,
| replacing them with underscores (_) or dashes (-)...
| AceJohnny2 wrote:
| BTW, at least in a Unix terminal you can input the separator
| characters using Ctl with another char (because terminals
| inherit from the time when modifier keys just set/unset bits in
| the input), so:
|
| - 28/0x1C FS = Ctrl-\
|
| - 29/0x1D GS = Ctrl-]
|
| - 30/0x1E RS = Ctrl-^
|
| - 31/0x1F US = Ctrl-_
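|
| Used as delimiters they make splitting trivial, since they should
| never occur in ordinary text (a quick Python sketch with made-up
| data):
|
|     US, RS = "\x1f", "\x1e"   # Unit Separator, Record Separator
|
|     records = RS.join(US.join(fields) for fields in [
|         ["alice", "42", "engineering"],
|         ["bob, jr.", "7", "sales"],   # embedded commas need no escaping
|     ])
|
|     for record in records.split(RS):
|         print(record.split(US))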
| u801e wrote:
| POP, IMAP and NNTP are also plain text protocols. What's
| interesting about SMTP as well as NNTP is that the data phase in
| the former as well as the post phase in the latter allow for all
| ascii characters to be transmitted as is without any issue other
| than NUL. The period needs to be escaped in certain cases such
| that a CRLF.CRLF doesn't prematurely end the data phase or
| article post. Clients actually employ "dot-stuffing" to address
| that case, meaning that any line that starts with a period is
| modified such that it starts with 2 periods before being
| transmitted to the server.
|
| When a client receives the email or article, it will remove the
| extra period so that the lines start with a single period.
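|
| Dot-stuffing itself is only a couple of lines each way (a Python
| sketch, ignoring the surrounding CRLF framing):
|
|     def dot_stuff(lines):
|         # sender: prefix an extra "." to any line starting with "."
|         return [("." + l) if l.startswith(".") else l for l in lines]
|
|     def dot_unstuff(lines):
|         # receiver: strip one "." from lines that arrive starting with ".."
|         return [l[1:] if l.startswith("..") else l for l in lines]
|
|     msg = [".hidden line", "normal line", "."]
|     assert dot_unstuff(dot_stuff(msg)) == msg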
| unnouinceput wrote:
| None mention JSON? Alright, then I'll do it.
| nly wrote:
| Plain text protocols, unless they can be expressed in, say, 3
| grammar rules, are almost always more pain than they're worth.
|
| These days just go and use a flexible binary option like
| protobufs, flatbuffers, avro, etc.
| Pinus wrote:
| The obvious risk with plain-text protocols is that you don't
| write a rigorous spec, and don't write a strict parser, but some
| hacked-up least-effort thing with a few string.split() and
| whatever. This means there is a lot of slack in what is actually
| accepted, and unless you are in full control of both ends of the
| protocol, that slack will be taken advantage of, and unless you
| are more powerful than whoever is at the other end (which you
| aren't, if they're your clients and you are not Google or
| Facebook), you have to support it forever. So write plain-text
| protocols if you like, but make sure to have a rigorous spec, and
| a parser with the persnicketiness of... I don't know.
| rini17 wrote:
| That works until the next time someone considers the strictness of
| the existing protocol too overwhelming or complicated and
| invents a new "simpler" one.
| GoblinSlayer wrote:
| TCP is strict. If you want something rich, you just run
| another protocol on top of the strict one. There can be many
| layers: TCP -> TLS -> HTTP -> JSON -> Base64 -> AES -> PNG ->
| Deflate -> Picture.
| mumblemumble wrote:
| I'm thinking here of "standards" like csv and mbox that are
| almost impossible to handle with 100% reliability if you don't
| control all the programs that are producing them. It can get
| even worse with some niche products. I used to work with a
| piece of legal software that defined its own text format, and
| had a nasty habit of exporting files that it couldn't import.
| There was a defined spec, but it was riddled with ambiguities.
|
| I'm coming to think that, when it comes to text formats, it's
| LL(k) | GTFO.
| [deleted]
| wtetzner wrote:
| Don't you have the same problem with ill-specified binary
| protocols?
| hermanradtke wrote:
| "Plaintext" is ASCII binary that is overwhelmingly English. The
| reason people like plaintext is that we have the tooling to see
| bits of the protocol as it comes over the wire. If we had good
| tooling for other protocols, then the barrier to entry would be
| lower as well.
| hamburglar wrote:
| I disagree that tooling makes up for the lack of human
| readability in a binary protocol. One of the reasons text-based
| protocols are so convenient to debug is that you can generally
| still read them when one side is screwing up the protocol.
| tcpdump: "oh, there's my problem" Custom analyzer: "protocol
| error"
| hermanradtke wrote:
| Pretend for a moment that HTTP used
| https://en.wikipedia.org/wiki/Esperanto instead of English.
| You would need tooling to translate Esperanto to English.
| hamburglar wrote:
| Yes. Please feel free to assume that everywhere I say
| "plain text," I mean "plain text that is not intentionally
| obfuscated." I apologize for not being clear.
| wtetzner wrote:
| Would that really cause a problem in determining if the
| text being sent is well-formed?
|
| Having GET, POST, PUT, etc. and header names be in another
| language wouldn't prevent you from determining the well-
| formedness of the text.
| gambler wrote:
| ASCII has a built-in markup language and a processing control
| protocol that most people aren't even aware of and most tools
| out there don't support. This is significant. Look at the parts
| that are used and parts that aren't. What is the difference
| between them?
| colejohnson66 wrote:
| I think the big reason the ASCII C0 characters never took off
| was because you can't _see or type_ them.[a] If I'm writing a
| spreadsheet by hand (like CSV/TSV), I have dedicated keys
| for the separators (comma and tab keys). I don't have those
| for the C0 ones. I don't even think there's Alt-### codes for
| them.
|
| [a]: Regarding "seeing" them, Notepad++ has a nifty feature
| where it'll show the control characters' names in a black
| box[0]
|
| [0]: https://superuser.com/questions/942074/what-does-stx-
| soh-and...
| specialist wrote:
| Heh. I used those control characters to embed a full text
| editor within AutoCAD on MS-DOS. Back in the day. Mostly
| because someone bet me it couldn't be done.
| Robotbeat wrote:
| I don't know. Can you tell me? ;)
| TeMPOraL wrote:
| The bits that aren't used don't correspond to printable
| characters :).
| tdeck wrote:
| I assume the parent is referring to the various control
| characters like "START OF HEADING", "START OF TEXT",
| "RECORD SEPARATOR", etc... I haven't seen most of these
| used for their original control purpose but they date back
| a long way:
|
| https://ascii.cl/control-characters.htm
| colejohnson66 wrote:
| The "C0" block (U+0000 through U+001F)
| https://en.wikipedia.org/wiki/C0_and_C1_control_codes
|
| They're almost never used in practice however.
| danaliv wrote:
| Yes, exactly. I love binary protocols/formats. Plain text
| formats are wasteful, and difficult (or at least annoying) to
| implement with any consistency. But you really do need a
| translation layer to make binary formats reasonable to work
| with as a developer. There are very good reasons why we prefer
| to work with text: we have a text input device on our
| computers, and our brains are chock full of words with firmly
| associated meanings. We don't have a binary input device, nor
| do we come preloaded with associations between, say, the number
| 4 and the concepts of "end" and "headers." (0x4 is END_HEADERS
| in HTTP/2.)
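|
| (For the curious, decoding that much of HTTP/2 is tiny; a Python
| sketch of the 9-byte frame header from RFC 7540, section 4.1:)
|
|     def parse_frame_header(header: bytes):
|         # 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id
|         length = int.from_bytes(header[0:3], "big")
|         frame_type, flags = header[3], header[4]
|         stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
|         return length, frame_type, flags, stream_id
|
|     # a HEADERS frame (type 0x1) with END_HEADERS (0x4) on stream 1
|     _, ftype, flags, sid = parse_frame_header(
|         b"\x00\x00\x0c\x01\x04\x00\x00\x00\x01")
|     print(bool(flags & 0x4))   # True: END_HEADERS is set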
|
| Once you have the tools in place, working with binary formats
| is as easy as working with plaintext ones.
|
| Of course making these tools takes work. Not much work, but
| work. And it's the kind of work most people are allergic to:
| up-front investment for long-term gains. With text you get the
| instant gratification of all your tools working out of the box.
|
| I don't think I'd go so far as to say that plain text is junk
| food, but it's close. It definitely clogs arteries. :)
| CivBase wrote:
| > "Plaintext" is ASCII binary that is overwhelmingly English.
|
| I don't see any reason why "plaintext" must be limited to
| ASCII. Many "plaintext" protocols support Unicode, including
| the ones listed in this article. Some protocols use human
| language (as you said, overwhelmingly English), but many do
| not. There is nothing inherent about plaintext which
| necessitates the use of English.
|
| > The reason people like plaintext is that we have the tooling
| to see bits of the protocol as it comes over the wire. If we
| had good tooling for other protocols, then the barrier to entry
| would be lower as well.
|
| I disagree.
|
| Humans have used text as the most ubiquitous protocol for
| storing and transferring arbitrary information since ancient
| times. Some other protocols have been developed for specific
| purposes (eg traffic symbols, hazard icons, charts, or whatever
| it is IKEA does in their assembly instructions), but none of them
| have matched text in terms of accessibility or practicality for
| conveying arbitrary information.
|
| I think your statement misrepresents the relationship between
| tool quality and the ubiquity of the protocol. Text has,
| throughout most of recorded human history, been the most useful
| and effective mechanism for transferring arbitrary information
| from one human to another. Text isn't so ubiquitous because our
| tooling for it is good; _our tooling for text is good because
| it is so ubiquitous._
|
| Text is accessible to anyone who can see and is supplemented by
| other protocols for those who can't (eg braille, spoken
| language, morse code). It is relatively compact and precise
| compared to other media like pictures, audio, or video. It is
| easily extended with additional glyphs and adapted for various
| languages. There's just nothing that holds a candle to text
| when it comes to encoding arbitrary information.
| waynesonfire wrote:
| There is nothing inherently human readable about plain text.
| It's still unreadable bits, just like any other binary
| protocol. The benefits of plain text are the ubiquitous tools
| that allow us to interact with the format.
|
| It would be interesting to think about what set of tools
| gives 80% of the plain text benefit. Is it cat? grep? wc?
| An API? Most programming languages I know of can read a text
| file and turn it into a string, that's nice. The benefit of
| this analysis would be that when developing a binary
| protocol, it will be evident which support tools need to
| be developed to provide plenty of value.
|
| I'm not afraid of binary protocols as long as there is
| tooling to interact with the data. And if those tools are
| available, I prefer binary protocols for their efficiency.
| divbzero wrote:
| > _I'm not afraid of binary protocols as long as there is
| tooling to interact with the data._
|
| I agree with this premise but would also note how long it
| takes for such tooling to become widespread. Even UTF-8
| took a while to become universal -- I recall fiddling with
| it on the command line as recently as Windows 7 (code page
| 1252 and the like).
| CivBase wrote:
| > There is nothing inherently human readable about plain
| text. It's still unreadable bits, just like any other
| binary protocol. The benefits of plain text are the
| ubiquitous tools that allow us to interact with the format.
|
| You seem to have glossed over my whole point about how the
| ubiquity of text is what drives good tooling for it, not
| the other way around. Text is not a technology created for
| computers. It has been a ubiquitous information protocol
| for millennia.
|
| > I'm not afraid of binary protocols as long as there is
| tooling to interact with the data. And if those tools are
| available, I prefer binary protocols for their efficiency.
|
| I'm not afraid of binary protocols either and there are
| good reasons to use them. The most common reason is that
| they can be purpose-built to support much greater
| information density. However, purpose-built protocols
| require purpose-built tools and are, by their very nature,
| limited in application. Therefore, purpose-built protocols
| will never be as well supported as general-purpose
| protocols like text.
|
| That isn't to say that purpose-built protocols are never
| supported well enough to be preferable over text. Images,
| audio, video, databases, programs, and many other types of
| information are usually stored in well-supported, purpose-
| built, binary protocols.
| andrewmcwatters wrote:
| I don't think we necessarily even need good tooling for other
| protocols, we just need good binary analysis tooling that
| visualizes any binary buffer.
|
| I don't know of a single good app that exists for that.
| nick__m wrote:
| Wireshark is somewhat useful if the protocols in the binary
| blob are supported.
|
| First, the buffer must be converted to ascii hex, then the
| following procedure is used to import it: https://www.wiresha
| rk.org/docs/wsug_html_chunked/ChIOImportS...
| wtetzner wrote:
| That's because you need to know the format to know how to
| interpret it. Otherwise the best you can really do is use a
| hex editor.
|
| Or are you suggesting a tool that lets you easily specify the
| binary format? I'm pretty sure there are some that exist.
| zby wrote:
| OK - but what is Plain Text? Is it ASCII or maybe should it also
| include UTF8 or other Unicode encodings? What is the difference
| between the bits that form HTTP/1 and the bits that form a binary
| protocol like HTTP/2?
| jpalomaki wrote:
| These look easy, but I don't think they always are.
|
| Think about CSV for example. Looks simple to create and parse. In
| reality these simple implementations will give you a lot of
| headaches when they don't handle edge cases (escaping, linefeeds
| etc).
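|
| The classic trap: a naive split breaks as soon as a field contains
| the delimiter (Python; csv.reader knows the RFC 4180-style quoting
| rules that the one-liner does not):
|
|     import csv, io
|
|     row = '1,"hello, world"'
|
|     print(row.split(","))                      # ['1', '"hello', ' world"']
|     print(next(csv.reader(io.StringIO(row))))  # ['1', 'hello, world']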
| u801e wrote:
| If they only used the ASCII record separator instead of a
| comma.
| SahAssar wrote:
| You would still need to be able to handle escaping, right?
| Otherwise you couldn't have a string with a record separator
| within a CSV column.
| leejoramo wrote:
| I always preferred tabs to CSV, but how I wish our industry
| had made use of the ASCII record separator character. How
| many hours would I and my teams have saved in the last 20
| years?
| edflsafoiewq wrote:
| Then you couldn't type it in a text editor.
| u801e wrote:
| I guess it depends on a text editor. Mapping a key to
| insert that character is one possible solution.
| jeffbee wrote:
| Standard unix-style input accepts ctrl-^ as ASCII RS and
| ctrl-_ as ASCII US. If you want your terminal to accept an
| ASCII US literally -- so that you can use it as the -t
| argument to sort, for example -- you would use ctrl-v
| ctrl-_ to give it the literal character.
|
|     $ hd
|     ^^^_
|     00000000  1e 1f 0a
|     00000003
|
|     $ sort -t "^_" -k 2,2n
|     a^_42
|     z^_5
|     n^_7
|     z5
|     n7
|     a42
| nightcracker wrote:
| The problem with CSV isn't that it's text based, the problem is
| that "CSV" isn't a file format with an authoritative
| description.
| jandrese wrote:
| There is an RFC (4180) for CSV, but the truth of the matter is
| that there are thousands of parsers written to whatever spec
| the author thought up.
|
| In the real world the entire spec is contained in its three
| word name.
|
| In the end I think the simplicity was also a weakness.
| Because the spec is so minimal a programmer goes "Oh, I'll
| just write my own, no problem", where a more complex protocol
| would have necessitated finding some existing library
| instead. Whatever the author of that library did would become
| the de facto standard and there would be less incompatibility
| between files.
| sgtnoodle wrote:
| That reminds me of the time I needed to parse some XML and
| ended up writing my own parser...
| rasengan wrote:
| IRC is also a simple plaintext protocol. Its simplicity helped
| an entire generation of programmers to appear.
| stagas wrote:
| IRC was the first protocol I implemented back in the day. It's
| so simple you don't even need a client, telnet is enough to get
| you chatting. I miss this directness.
| dexen wrote:
| HTTP/2 is a good example of how to handle textual-to-binary
| transition.
|
| The original HTTP/1 was textual (if a bit convoluted), and that
| helped it to become a lingua franca; helped cooperation and data
| interchange between applications proliferate; everybody was able
| to proverbially "scratch his itch". Gradually tooling grew up
| around it too.
|
| HTTP/2 is binary, and also quite complex - however most of
| that is hidden behind the already established tooling, and from a
| developer's perspective, the protocol is seen as "improved HTTP/1
| with optional extras". The protocol _appears_ textual for all
| developmental intents and purposes - because the transition was
| handled (mostly) smoothly in the tooling. The key APIs remained
| the same for quick, mundane use cases - and got extended for
| advanced use cases.
|
| There's a lesson in that success. Unpopular opinion
| warning: contrast the success of HTTP/2 with the failure of IPv6
| to maintain backward compatibility at the _API_ level - which
| hampered its ability to be seamlessly employed in applications.
| cesarb wrote:
| > contrast the success of HTTP/2 with the failure of IPv6 to
| maintain backward compatibility at the API level
|
| Unfortunately, it was not possible for IPv6 to maintain
| backward compatibility with IPv4 at the API level. That's
| because the IPv4 API was not textual; it was binary, with
| fixed-size 32-bit fields everywhere. What they did was the next
| best thing: they made the IPv6 API able to also use IPv4
| addresses, so that new programs can use a single API for both
| IPv4 and IPv6.
| fulafel wrote:
| The API compatibility is pretty far down the list of
| bottlenecks with IPv6. There was some churn related to it 20
| years ago.
| gambler wrote:
| _> however most of that is hidden behind the already
| established tooling_
|
| ...and so everyone without Google-level funding is stuck with
| this tooling. Thus, control over the protocol that millions of
| people could use to communicate with one another directly (by
| making websites and possibly servers) is ceded to a handful of
| centralized authorities that can handle the complexity and also
| happen to benefit from the new features.
|
| I remember how Node when it was just rising in popularity was
| usually demonstrated by writing a primitive HTTP server that
| served a "hello world" HTTP page. There were no special
| libraries involved, so it was super-easy to understand what's
| going on. We're moving away from being able to do things of this sort
| without special tooling and almost no one seems to notice or
| care.
| SahAssar wrote:
| Node has had a built in HTTP server since v0.1.17, are you
| sure those examples didn't use that? Because if they did then
| it was the same in those examples as it is now.
|
| Source:
| https://nodejs.org/api/http.html#http_class_http_server
| espadrine wrote:
| > _I remember how Node when it was just rising in popularity
| was usually demonstrated by writing a primitive HTTP server
| that served a "hello world" HTTP page._
|
| That is still possible in the exact same way.
|
| But a toy is just a toy. All websites should encrypt their
| content with TLS. In fact, all protocols should encrypt their
| communications. The result? Sure, it is a binary stream of
| random-looking bits.
|
| Yet to me, what matters about text protocols is not the ASCII
| encoding. It is the ability to read and edit the raw
| representation.
|
| As long as your protocol has an unambiguous one-to-one
| textual representation with two-way conversion, I can inspect
| it and modify it with no headache.
|
| An outstanding example of that is WASM, which converts to and
| from WAT: https://en.wikipedia.org/wiki/WebAssembly#Code_repr
| esentatio...
| LocalH wrote:
| >All websites should encrypt their content with TLS. In
| fact, all protocols should encrypt their communications.
|
| I reject the notion that encryption should be _mandatory_
| for _all websites_. It should be best practice, especially
| for a "modern" website with millions of users, but we
| don't need _every single website_ encrypted.
| miohtama wrote:
| While I agree with you, it is best to be on the safe
| side. The damage from having the wrong website unencrypted
| could be massive vs. cost of simply encrypting
| everything. Demanding 100% encryption is an extra layer
| to protect against human mistakes.
| LocalH wrote:
| Demanding 100% encryption also locks out some
| retrocomputing hardware that had existing browsers in the
| early Internet days. Not all sites _need_ encryption.
| Where it's appropriate, most certainly. HTTPS should be
| the overwhelming standard. But there is a place for HTTP,
| and there should _always_ be. Same for other unencrypted
| protocols. Unencrypted FTP still has a place.
| SahAssar wrote:
| HTTP/FTP certainly have their place, but that is not on
| the open internet. For retro computing and otherwise
| special cases a proxy on the local network can do
| HTTP->HTTPS conversion.
| MayeulC wrote:
| You can always use a MITM proxy that presents an
| unencrypted view of the web. As long as you keep to
| HTML+CSS, that should be enough. Some simple js also, but
| you can't generate https URLs on the client side. Which,
| for retrocomputing, is probably fine.
|
| You wouldn't want to expose these "retro" machines to the
| Internet anyways.
| pwdisswordfish0 wrote:
| Indeed. When it comes to technology, I think resiliency
| and robustness in general should trump almost all other
| concerns.
|
| It would be nice if HTTP were extended to accommodate the
| inverse of the Upgrade header. Something to signal to the
| server something like, "Please, I insist. I really need
| you to _just_ serve me the content in clear text. I have
| my reasons. " The server would of course be free to sign
| the response.
| tasogare wrote:
| If you care about protocol simplicity and the attendant
| implementation costs, then the continuously creeping Web
| platform is a few magnitudes worse in this respect.
| contravariant wrote:
| After all I've read on HTTP/2 I'm still not entirely sure what
| problem it is trying to solve.
| nbm wrote:
| The main benefit is multiplexing - being able to use the same
| connection for multiple transactions at the same time. This
| can have benefits in finding and keeping the congestion
| window at its calculated maximum size, reduce connection-
| related start-up, as well as overcome waiting for a
| currently-used connection to be free if you have a max
| connection per server model.
|
| The other potential benefits were priorities and server-
| initiated push, but both I'd say largely went unused and/or
| were too much trouble to use. Priorities were redesigned in
| HTTP 3 - more at https://blog.cloudflare.com/adopting-a-new-
| approach-to-http-... - and Chrome recently decided push in
| HTTP 2 wasn't worth keeping around -
| https://www.ctrl.blog/entry/http2-push-chromium-
| deprecation....
|
| HTTP 2's main problem is head-of-line blocking in TCP -
| basically, if you lose a packet, you wait until you get that
| packet and acknowledge a maximum amount of packets thereafter
| - slowing the connection down. With multiplexing, this means
| that a bunch of in-flight transactions, as well as
| potentially future ones, are blocked at the same time. With
| multiple TCP connections, you don't have this problem of a
| dropped packet affecting multiple transactions.
|
| HTTP 3 has many more benefits - basically, all the benefits
| of multiplexing without the head of line blocking (instead,
| only that stream is affected), as well as ability to
| negotiate alternative congestion control algorithms when
| client TCP stacks don't support newer ones - or come with bad
| defaults. And the future is bright for non-HTTP and non-
| reliable streams as well over QUIC, the transport HTTP 3 is
| built on.
| contravariant wrote:
| Right, all this kind of feels as if HTTP/2 is trying to
| solve transport layer problems in the application layer.
| Especially if you leave out the server initiated push. I
| can't really pretend to know much about this but I can't
| say I'm surprised that this causes problems when the
| underlying transport-layer protocol is trying to solve the
| same problem.
|
| So is it correct to view HTTP/3 as basically taking a step
| back and just running HTTP over a different transport-layer
| protocol (QUIC)? (If so I think the name is a bit
| confusing, HTTP over QUIC would be much clearer)
| mumblemumble wrote:
| It was originally called HTTP over QUIC, and got renamed
| to HTTP/3 in order to avoid some other confusion.
|
| https://en.wikipedia.org/wiki/HTTP/3#History
| cwp wrote:
| That's true, but the transport layer has ossified, and
| the application layer is the only place we can still
| innovate. RIP SCTP.
| SahAssar wrote:
| HTTP/2 is what you do if you're confined to using TCP.
| HTTP/3 is what you get if you use UDP to solve the same
| problems (and new problems discovered by trying it over
| TCP).
| bullen wrote:
| HTTP/2.0 has TCP head-of-line issues, in practice that
| nullifies its usefulness!
|
| HTTP/1.1 is much more balanced and simple, and as I said all
| over this topic the bottleneck is elsewhere!
| rectang wrote:
| > _Unpopular opinion warning: contrast the success of HTTP/2
| with the failure of IPv6 to maintain backward compatibility at
| the API level - which hampered its ability to be seamlessly
| employed in applications._
|
| Unpopular?
|
| The gratuitous breaking of backwards compatibility by IPv6,
| inflicting staggering inefficiencies felt directly or
| indirectly by all internet users, should be a canonical case
| study by now. It should be taught to all engineering students
| as a cautionary tale: never, ever do this.
| convolvatron wrote:
| i'm fairly sympathetic here - except part of the blame should
| really be on the socket layer and resolver interface. if they
| had been a bit better at modelling multiprotocol networks,
| this kind of transition would have been easier.
| superkuh wrote:
| Plain text protocols are in serious danger as the (confused)
| desire for TLS- _only_ everywhere spreads with the best
| intentions. The problem is that the security TLS- _only_ brings
| to protocols like HTTP(s) also brings with it massive
| centralization in cert authorities which provide single points of
| technical, social, and political failure.
|
| If the TLS-everywhere people succeed in their misguided cargo-
| cult efforts to kill off HTTP and other plain text connections
| everywhere, if browsers make HTTPS-only the default, then the web
| will lose even more of its distributed nature.
|
| But it's not only the web (HTTP) that is under attack from the
| centralization of TLS, even SMTP with SMTPS might fall to it
| eventually. Right now you can self sign on your mailserver and it
| works just fine. But I imagine "privacy" advocates will want to
| exchange distributed security of centralized privacy there soon
| too.
|
| TLS is great. I love it and it has real, important uses. But TLS
| _only_ is terrible for the internet. We need plain text protocols
| too. HTTP+HTTPS for life.
| elric wrote:
| Mixing trust and encryption that resulted in centralized TLS
| was probably a design flaw. Certificate pinning in DNS is an
| attractive "fix", but moves the problem up a layer. But DNS is
| already centralized, so there's that.
|
| > Right now you can self sign on your mailserver and it works
| just fine
|
| Well .. sort of. Until you have to interact with google or ms
| mail servers. After an hour of wondering why your mails are
| getting blackholed, one starts to reconsider one's life
| choices.
| s_gourichon wrote:
| You can talk plain text protocol through a TLS or SSL-encrypted
| connection, even interactively.
|
| Example:
|
|     { echo GET / HTTP/1.0 ; echo ; sleep 1 ; } | openssl s_client -connect www.google.com:443
|
| Or just:
|
|     openssl s_client -connect www.google.com:443
|
| then type interactively GET / HTTP/1.0 then press enter twice.
| tkot wrote:
| Using openssl s_client -ign_eof makes piping text a bit
| easier because connection won't be closed prematurely (so you
| don't need to use sleep 1)
| minitoar wrote:
| I sort of thought TLS everywhere was more about encryption than
| authentication.
| bombcar wrote:
| If it was only authentication then they'd be perfectly fine
| with unsigned certs. But they're not.
| jl6 wrote:
| Gemini uses TLS and it is common practice for Gemini clients to
| use self-signed certificates and TOFU. No dependency on
| centralized CAs.
| elric wrote:
| TOFU seems to work pretty well for SSH. AFAIK not many people
| actively verify host fingerprints on first use. It doesn't
| protect against MITM attacks on the first connection, but I
| wonder if that's not a case of better being the enemy of good
| to some extent?
| nucleardog wrote:
| The high value targets are much more spread about with SSH
| than with HTTP. Finding a place where you could inject
| yourself between, for example, a user ssh'ing into a
| banking service and the banking service is going to be
| difficult. Just blindly MITM'ing a bunch of users at a
| coffee shop will probably get you little to nothing of any
| real value.
|
| And because SSH is rarely used for the public to connect in
| to services it's a lot easier to add additional layers of
| security on top. Most valuable targets won't even be
| exposed in the first place or will have a VPN or some other
| barrier that would prevent the attack anyway.
|
| From the HTTP end though, it's easy to narrow down
| "valuable" targets--there are like 5 main banks in my
| country. They're, by design, meant to be connected to by
| the public so there are no additional layers of security
| implemented. If you set up in a coffee shop for a day
| there's a pretty reasonable chance you'd find at least one
| or two people that had just bought a new device or were
| otherwise logging in for the first time that you could nab.
|
| You'd also run into the issue of what to do when sites
| needed to update their certificates for various reasons. If
| the SSH host key changes it's pretty easy to communicate
| that out-of-band within a company to let people know to
| expect it. If a website's certificate changes what do we
| do? We end up training users to just blindly click through
| the "this website's certificate has changed!" warning and
| we're back to effectively zero protection.
| zozbot234 wrote:
| > If you set up in a coffee shop for a day there's a
| pretty reasonable chance you'd find at least one or two
| people that had just bought a new device
|
| Sure, but it's easy to protect against this - just
| connect to the same service via a different endpoint and
| check that both endpoints get the same certificate. AIUI
| this is how the EFF SSL observatory detects MITM attacks
| in the wild, and similar approaches could be used to make
| TOFU-to-a-popular-service a lot more resilient, at least
| wrt. most attacks.
| taywrobel wrote:
| There's a difference between the transport and the protocol.
| For instance I've used Redis fronted by TLS in the past.
| Initial connection did get more tricky for sure, needing to
| have the certs in place to first connect.
|
| However after the connection was established with OpenSSL I was
| able to run all the usual commands in plain text and read all
| the responses in plain text. Having transport layer encryption
| on the TCP connection didn't affect the protocol itself at all.
| bullen wrote:
| Don't worry, the day HTTP is deprecated is the day civilization
| is over.
| InfiniteRand wrote:
| It would be good to have a standard checklist of edge cases to
| handle with plaintext protocol design. Anyone know of one?
|
| I'm thinking along the lines of:
|
| 1. Control characters
|
| 2. Whitespace normalization
|
| 3. Newline normalization
|
| 4. Options for compression
|
| 5. Escaping characters significant to the protocol
|
| 6. Encoding characters outside of the normal character range
|
| 7. Dealing with ambiguous characters (not really an issue with
| strict ASCII)
|
| 8. Internationalization (which is intertwined with the previous
| items)
|
| 9. Dealing with invalid characters
|
| I'm not saying plain text doesn't have its advantages, I'm just
| saying there are issues you need to consider.
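|
| Even item 5 on its own is easy to get subtly wrong; a toy escaping
| scheme has to round-trip cleanly (a Python sketch with a made-up
| pipe-delimited format):
|
|     def escape(field):
|         # escape the escape character first, then the delimiter
|         return field.replace("\\", "\\\\").replace("|", "\\|")
|
|     def unescape(field):
|         out, i = [], 0
|         while i < len(field):
|             if field[i] == "\\" and i + 1 < len(field):
|                 out.append(field[i + 1]); i += 2
|             else:
|                 out.append(field[i]); i += 1
|         return "".join(out)
|
|     for original in ["plain", "a|b", "trailing\\", "\\|mixed|\\"]:
|         assert unescape(escape(original)) == original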
| chubot wrote:
| Some comments on those issues here:
|
| https://www.arp242.net/the-art-of-unix-programming/#id290742...
|
| https://lobste.rs/s/rzhxyk/plain_text_protocols#c_5vx4ez
| [deleted]
| bullen wrote:
| HTTP is unbeatable when you remove optional headers, not because
| of bandwidth;
|
| but because there are robust servers that can multi-thread joint-
| memory access with non-blocking IO and atomic concurrency.
|
| I use comet-stream for real-time 3D Action MMO data, so I have my
| own text based protocol wrapped in 2x sockets HTTP:
|
| F.ex. message = "move|<session>|<x>,<y>,<z>|<x>,<y>,<z>,<w>|walk":
|
| ### push (client -> server):
|
|     "GET /push?data=" + message + " HTTP/1.1\r\nHost: my.host.name"
|
| then you get back "200 OK\r\nContent-Length: <length>\r\n\r\n<content>".
| In this case "0\r\n\r\n".
|
| ### pull (server -> client):
|
| Just one request pulls infinite response chunks:
|
|     "GET /pull HTTP/1.1\r\nHost: my.host.name\r\nAccept: text/event-stream"
|
| then you get back "200 OK\r\nTransfer-Encoding: chunked\r\n\r\n", followed by
|
|     while(true) {
|         <hex_length> + "\r\n" + "data:" + message + "\n\n\r\n\r\n"
|     }
|
| Simple and efficient!
|
| Text- + Event- based protocols over HTTP way outscale Binary- +
| Tick- based ones for compressed (as in averaged, not zipped)
| real-time data.
| mattlondon wrote:
| Don't forget Gopher!
|
| Original: https://tools.ietf.org/html/rfc1436
|
| Gopher+: https://github.com/gopher-protocol/gopher-plus
|
| I feel like there is a lot of potential to "rejuvenate" Gopher
| somewhat in today's internet. No javascript, no cookies, no ads,
| no auto-play videos etc.
|
| There are some nice new modern GUIs like https://gophie.org that
| are cross platform and modern.
|
| Fun fact: Redis (yes - _that_ Redis) just added Gopher protocol
| support (https://redis.io/topics/gopher)
| hliyan wrote:
| I sometimes wonder if the web would have been a better place
| had CSS and everything but the most basic HTML tags not
| existed.
| tdeck wrote:
| But also no hyperlinks, no mixing text with images, and no
| unicode support in the menus.
| mattlondon wrote:
| I think there is probably a lot of mileage that can be gained
| serving markdown over gopher with embedded gopher://
| hyperlinks and images and utf8 and everything else markdown
| supports via gopher itself. Gopher0 already has sort-of
| support for HTML file types so this would not be such a wild
| divergence from the original design. Not serving HTML
| provides some basic guarantees (no JavaScript, no tracking
| pixels, etc.)
|
| Gopher+ allows for quite flexible (albeit clunky) attributes
| for menu items so I can imagine an attribute on a menu
| directing compatible browsers to the markdown version of the
| menu, but allowing old clients to just view the traditional
| menu. This kinda relegates the gopher menu to a sort of old
| school directory listing type things we used to see in HTTP,
| but there is room for some fanciness via gopher+ to style
| menus themselves if browsers support that too!
|
| All of this is possible in Gopher+ if clients support it
| (...and there is an appetite for it). Perhaps we need some
| sort of "agreement"/Python-PIP-style thing to define sets of
| common Gopher+ attributes for all of this sort of thing.
| jl6 wrote:
| I hope you are aware of Gemini which aims to do exactly that?
| andrewmcwatters wrote:
| What's sort of interesting is that there aren't too many
| overwhelming reasons why someone couldn't come up with a piece of
| software that autodetected a binary format and translated it to
| something readable in a GUI.
|
| I mean we know what the binary layout is of things, so I never
| understood (outside of the time that it would take to build such
| a utility) why I've never been able to find something that says,
| "Oh yeah, that binary string contains three doubles, separated by
| a NULL character, with an int, followed by a UTF-8 compatible
| string."
|
| Such a tool would be incredibly useful for reverse engineering
| proprietary formats, and yet I don't know of a good one, so if it
| exists it's at least obscure enough for it to have escaped my
| knowledge for well over a decade.
| nograpes wrote:
| There is a command-line program called "file" that attempts to
| determine the file type (format). It uses a series of known
| formats and returns the first matching one. I have found it
| useful to reverse engineer proprietary formats.
| andrewmcwatters wrote:
| Yeah, but that's for known formats.
|
| If I said I have a buffer of 512 bytes and piped it through
| to some cli, that would be fine if it could tell me how many
| ints, chars, floats, doubles, compressed bits of data,
| CRC32s, UTF-8 strings, etc. it contained, but there's few
| utilities out there that will do that.
| magmastonealex wrote:
| I'm curious how you'd propose doing that.
|
| If I give you a buffer of 5 bytes:
|
| [0x68 0x65 0x6c 0x6c 0x6f]
|
| there are a ton of ways to interpret that:
|
| - The ascii string "hello"
|
| - 5 single-byte integers
|
| - 2 two-byte integers and 0x6c as a delimiter
|
| - 1 four-byte integer and ending in the char "o"
|
| - 1 32-bit float, and one single-byte integer
|
| etc. Or are you hoping for something that will provide you
| with all the possible combinations? That would produce
| pages of output for any decently-sized binary blob.
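|
| Each of those readings really is a one-liner against the same five
| bytes (Python's struct module, for illustration):
|
|     import struct
|
|     data = bytes([0x68, 0x65, 0x6c, 0x6c, 0x6f])
|
|     print(data.decode("ascii"))                         # "hello"
|     print(list(data))                                   # five single-byte ints
|     print(struct.unpack("<2H", data[:4]), data[4])      # two uint16s + a byte
|     print(struct.unpack("<I", data[:4]), chr(data[4]))  # uint32 + char "o"
|     print(struct.unpack("<f", data[:4]), data[4])       # float32 + a byte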
| andrewmcwatters wrote:
| I'm sort of looking for something that will attempt to
| narrow down possibilities. The way I'd do it is by
| providing some visualizations based on the user selecting
| what data types and lengths they're looking for.
|
| So for instance, if I know I'm looking at triangle data,
| I can guess that it's probably compressed, ask the app to
| decompress the data based on some common compression
| types, look at that data and guess that I'm looking at
| some floats or doubles.
|
| Maybe I'm wrong, so then I can ask the app to search for
| other data types at that point.
|
| To me, that would be a tremendous help over my experience
| with existing hex editors.
|
| Edit: It's not fair for me to say there aren't tools that
| do exactly this, but to be more precise, a decent user
| experience is lacking in most cases.
| stack_underflow wrote:
| Your post reminded me of the presentation on cantor.dust:
| https://sites.google.com/site/xxcantorxdustxx/
| https://www.youtube.com/watch?v=4bM3Gut1hIk - Christopher
| Domas, "The Future of RE: Dynamic Binary Visualization"
| (very interesting presentation)
|
| Looks like there's even been a recently open sourced
| plugin for Ghidra released by Battelle:
| https://github.com/Battelle/cantordust
___________________________________________________________________
(page generated 2021-02-25 23:02 UTC)