[HN Gopher] CRLF is obsolete and should be abolished
___________________________________________________________________
CRLF is obsolete and should be abolished
Author : km
Score : 178 points
Date : 2024-10-13 19:16 UTC (2 hours ago)
(HTM) web link (fossil-scm.org)
(TXT) w3m dump (fossil-scm.org)
| justin66 wrote:
| Nice. I think that's the most energized I've seen Richard Hipp on
| a topic.
| michaelmior wrote:
| > various protocols (HTTP, SMTP, CSV) still "require" CRLF at the
| end of each line
|
| What would be the benefit to updating legacy protocols to just
| use NL? You save a handful of bits at the expense of a lot of
| potential bugs. HTTP/1(.1) is mostly replaced by HTTP/2 and later
| by now anyway.
|
| Sure, it makes sense not to require CRLF with any _new_
| protocols, but it doesn't seem worth updating legacy things.
|
| > Even if an established protocol (HTTP, SMTP, CSV, FTP)
| technically requires CRLF as a line ending, do not comply.
|
| I'm hoping this is satire. Why intentionally introduce potential
| bugs for the sake of making a point?
| javajosh wrote:
| _> What would be the benefit..._
|
| It is interesting that you ignore the benefits the OP describes
| and instead present a vague and fearful characterization of the
| costs. Your reaction lies at the heart of cargo-culting, the
| maintenance of previous decisions out of sheer dread. One can
| do a cost-benefit analysis and decide what to do, or you can
| let your emotions decide. I suggest that the world is better
| off with the former approach. To wit, the OP notes as benefits
| "The extra CR serves no useful purpose. It is just a needless
| complication, a vexation to programmers, and a waste of
| bandwidth." and a mitigation of the costs "You need to search
| really, really hard to find a device or application that
| actually interprets U+000a as a true linefeed." You ignore both
| the benefits assertion and cost mitigating assertion entirely,
| which is strong evidence for your emotionality.
| perching_aix wrote:
| > you ignore the benefits the OP describes
|
| Funnily enough, the author doesn't actually describe any
| tangible benefits. It's all just (in my reading, semi-
| sarcastic) platitudes:
|
| - peace
|
| - simplicity
|
| - the flourishing of humanity
|
| ... so instead of "vague and fearful", the author comes on
| with a "vague and cheerful". Yay? The whole shtick about
| saving bandwidth, lessening complications, and reducing
| programmer vexations are only ever implied by the author, and
| _were_ explicitly considered by the person you were replying
| to:
|
| > You save a handful of bits at the expense of a lot of
| potential bugs.
|
| ... they just happened to be not super convinced.
|
| Is this the kind of HackerNews comment I'm supposed to feel
| impressed by? That demonstrates this forum being so much
| better than others?
| YZF wrote:
| What's your estimate for the cost of changing legacy
| protocols that use CRLF vs. the work that will be done to
| support those?
|
| My intuition (not emotion) agrees with the parent that
| investing in changing legacy code that works, and doesn't see
| a lot of churn, is likely a lot more expensive than leaving
| it be and focusing on new protocols that over time end up
| replacing the old protocols anyways.
|
| OP does not really talk about the benefit, he just opines.
| How many programmers are vexed when implementing "HTTP, SMTP,
| CSV, FTP"? I'd argue not many programmers work on
| implementations of these protocols today. How much traffic is
| wasted by a few extra characters in these protocols? I'd
| argue almost nothing. Most of the bits are (binary,
| compressed) payload anyways. There is no analysis by OP of
| the cost of not complying with the standard which potentially
| results in breakage and the difficulty of being able to
| accurately estimate the breakage/blast radius of that lack of
| compliance. That just makes software less reliable and less
| predictable.
| LegionMammal978 wrote:
| The cost is, if people start transitioning to a world where
| senders only transmit LF in opposition to current standards
| for protocols like HTTP/1.1 or SMTP (especially aggressively,
| e.g., by creating popular HTTP libraries without a CRLF
| option), then it will create the mental and procedural
| overhead of tracking which receivers accept LF alone vs.
| which still require CRLF. Switching established protocols is
| never free, even when there are definite benefits: see the
| Python 2-to-3 fiasco, caused by newer programs being
| incompatible with most older libraries.
| michaelmior wrote:
| You're right that I didn't mention the supposed benefits in
| my response. But let's incorporate those benefits into new
| protocols rather than break existing protocols. I just don't
| see the benefit in intentionally breaking existing protocols.
| chasil wrote:
| FYI, Sendmail accepts LF without CR, but Exchange doesn't.
| 9dev wrote:
| ...how very in character for each of them!
| phkahler wrote:
| >> I'm hoping this is satire. Why intentionally introduce
| potential bugs for the sake of making a point?
|
| It's not satire and it's not just trying to make a point. It's
| trying to make things simpler. As he says, a lot of software
| will accept input without the CR already, even if it's supposed
| to be there. But we should change the standard over time so
| people in 2050 can stop writing code that's more complicated
| (by needing to eat CR) or inserts extra characters. And never
| mind the 2050 part, just do it today.
| michaelmior wrote:
| Ignoring established protocols doesn't make things simpler.
| It makes things vastly more complicated.
|
| Let's absolutely fix _new_ protocols (or new versions of
| existing protocols). But intentionally breaking existing
| protocols doesn 't simplify anything.
| nsnshsuejeb wrote:
| Yes. We all know how to do this. You know that API version
| thingy. I agree to drop the carriage return when not needed
| but do it in future protocols.
|
| Obviously IPv6 shows you need to be patient. Your great
| grandkids may see a useless carriage return!
|
| Windows doesn't help here.
| Ekaros wrote:
| Thinking about it, using CR alone in protocols actually makes
| infinitely more sense, as that would allow use of LF in
| records, which would make many use cases much simpler.
|
| Just think about text protocols like HTTP, how much easier
| something like cookies would be to parse if you had CR as
| terminating character. And then each record separated by LF.
| gpvos wrote:
| That is so backwards incompatible that it is never, ever
| going to fly.
| mattmerr wrote:
| ASCII already has designated bytes for unit, group, and
| record separators. That aside, a big drawback of using
| unprintable bytes like these is they're more difficult for
| humans to read in dumps or type on a keyboard than a newline
| (provided newline has a strict definition: CRLF, LF, etc.)
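A minimal sketch of the separator idea, assuming Python and a made-up list-of-lists record layout (the `encode` and `decode` helpers are hypothetical, not from any library): fields are joined with US (0x1F) and records with RS (0x1E), so CR and LF can appear inside a field untouched.

```python
# ASCII reserves these control codes for delimiting data.
US = "\x1f"  # unit separator: between fields of a record
RS = "\x1e"  # record separator: between records


def encode(records: list[list[str]]) -> str:
    """Join fields with US and records with RS; fields may contain CR or LF."""
    return RS.join(US.join(fields) for fields in records)


def decode(data: str) -> list[list[str]]:
    """Inverse of encode(), assuming no field contains US or RS itself."""
    if not data:
        return []
    return [record.split(US) for record in data.split(RS)]


rows = [["name", "note"], ["alice", "line one\r\nline two"]]
wire = encode(rows)
assert decode(wire) == rows
print(repr(wire))
```

Printed with `repr()` the separators show up as `\x1f` and `\x1e`; written straight to a terminal they are invisible, which is the readability drawback of unprintable delimiters. Note also that this sketch cannot distinguish an empty record list from a single empty field.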
| mechanicalpulse wrote:
| > Why intentionally introduce potential bugs for the sake of
| making a point?
|
| It seems spiteful, but it strikes me as an interesting
| illustration of how the robustness principle could be hacked to
| force change. It's a descriptivist versus prescriptivist view
| of standards, which is not how we typically view standards.
| amluto wrote:
| > I'm hoping this is satire. Why intentionally introduce
| potential bugs for the sake of making a point?
|
| It's worse than satire. Postel's Law is definitively wrong, at
| least in the context of network protocols, and delimiters,
| especially, MUST be precise. See, for example:
|
| https://www.postfix.org/smtp-smuggling.html
|
| Send _exactly_ what the spec requires, and parse _exactly_ as
| the spec requires. Do not accept garbage. And LF, where CRLF is
| specified, is garbage.
| tptacek wrote:
| If two systems agree, independent of any specification
| someone somewhere else wrote, to accept a bare NL where a
| CRLF is specified, that is not "garbage". Standards documents
| are not laws; the horse drags the cart.
| DaiPlusPlus wrote:
| > Standards documents are not laws; the horse drags the
| cart.
|
| They can be: cf. legally-enforced safety-regulations.
| tptacek wrote:
| These aren't.
| perching_aix wrote:
| Laws are also just some ink on paper (and are routinely
| overruled, circumvented or unenforced in certain
| jurisdictions), so using this kind of logic in order to
| encourage standard violations is unsound.
|
| There is a method to this madness, and that's revising the
| standards.
| tptacek wrote:
| What's a "standard violation"? The original history of
| the IETF is a rejection of exactly this mode of thinking
| about the inviolability of standards, which was the ethos
| of the OSI.
| perching_aix wrote:
| When an implementation is nonconformant to a standard in
| question.
| tptacek wrote:
| IETF standards are tools to help developers get stuff
| done on the Internet. They are not the only tool, and
| they don't carry any moral force.
| perching_aix wrote:
| Setting aside that I find the colloquial notion of standards
| being not necessarily normative nonsensical (see below), to
| the best of my knowledge at the very
| least the STD subseries of IETF standards documents _are_
| normative in nature: https://datatracker.ietf.org/doc/std
|
| > They are not the only tool, and they don't carry any
| moral force.
|
| Indeed there are countless other standards bodies in the
| world also producing normative definitions for many
| things, so I'm definitely a bit confused about why the focus is
| on IETF specifically.
|
| To be even more exact, I do not know of any standards
| bodies who would publish what they and the world consider
| as standards, that would be entirely, or at least
| primarily, informational rather than normative in nature.
| Like, do I know the word "standard" incorrectly? What
| even is the point of a standard, if it doesn't aim to
| control?
| nsnshsuejeb wrote:
| Elephant in the room is the trillions of actual servers
| and user agents that would need to be tested and patched
| if you retroactively change a standard. Luckily there are
| some digits after HTTP that allow the concept of new
| versions of the standard.
| djbusby wrote:
| That's just two systems that happen to agree on garbage.
| FiloSottile wrote:
| Exactly. Please DO NOT mess with protocols, especially legacy
| critical protocols based on in-band signaling.
|
| HTTP/1.1 was regrettably but irreversibly designed with
| security-critical parser alignment requirements. If two
| implementations disagree on whether `A:B\nC:D` contains a value
| for C, you can build a request smuggling gadget, leading to
| significant attacks. We live in a post-Postel world: only ever
| generate and accept CRLF in protocols that specify it, however
| legacy and nonsensical it might be.
|
| (I am a massive, massive SQLite fan, but this is giving me
| pause about using other software by the same author, at least
| when networks are involved.)
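To make the parser-alignment point concrete, here is a toy sketch in Python (not any real server's code; `parse_strict` and `parse_lenient` are hypothetical names) of two header parsers reading the same bytes. One splits only on CRLF; the other also accepts a bare LF, which RFC 9112 permits recipients to do.

```python
def _parse(lines: list[str]) -> dict[str, str]:
    """Turn 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_strict(block: str) -> dict[str, str]:
    """Only CRLF terminates a header line."""
    return _parse(block.split("\r\n"))


def parse_lenient(block: str) -> dict[str, str]:
    """A bare LF also terminates a line (any CR before it is dropped)."""
    return _parse(block.replace("\r\n", "\n").split("\n"))


raw = "A:B\nC:D"
print(parse_strict(raw))   # one header, 'a', whose value contains the LF
print(parse_lenient(raw))  # two headers, 'a' and 'c'
```

If a front-end proxy used one policy and the back-end the other, they would disagree about whether header `C` was ever sent, which is the kind of gap request smuggling relies on.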
| tptacek wrote:
| This would be more persuasive if HTTP servers didn't already
| widely accept bare 0ah line termination. What's the first
| major public web site you can find that doesn't?
| michaelmior wrote:
| We're talking about servers and clients here. The best way
| to ensure things work is to adhere to an established
| protocol. Aside from saving a few bytes, there doesn't seem
| to be any good reason to deviate.
| tptacek wrote:
| I'm saying the consistency that Filippo says our security
| depends on doesn't really seem to exist in the world,
| which hurts the persuasiveness of that particular
| argument in favor of consistency.
| dwattttt wrote:
| But no one expects 0ah to be sufficient. Change that
| expectation, and now you have to wonder if your
| middleware and your backend agree on whether the
| middleware filtered out internal-only headers.
| tptacek wrote:
| Yeah, I'm not certain that this is a real issue. It might
| be? Certainly, I'm read in to things like TECL desync. I
| get the concern, that any disagreement in parsing
| policies is problematic for HTTP because of middleboxes.
| But I think the ship may have sailed on 0ah, and that it
| may be the case that you simply have to build HTTP
| systems to be bare-0ah-tolerant if you want your system
| to be resilient.
| Ekaros wrote:
| There are very good reasons not to deviate, as mismatches with
| other components that may or may not be on the path, like
| reverse proxies, load balancers and so on, can affect things.
| FiloSottile wrote:
| Hrm, this is what I get for logging in to HN from my phone.
| It's possible I am confusing this with one of the other
| exploitable HTTP/1.1 header parser alignment issues.
|
| Maybe this was so widespread that ~everything already
| handles it because non-malicious stuff breaks if you don't.
| In that case, my bad, but I still would like to make a
| general plea as an implementer for sticking strictly to
| specified behavior in these sorts of protocols.
| refulgentis wrote:
| I wouldn't be too worried or make personal judgements; he
| says the same thing you are (though I assume you disagree)
| mackal wrote:
| > massive SQLite fan, but this is giving me pause about using
| other software by the same author
|
| Even if I wanted to contribute code to SQLite, I can't. I
| acknowledge the fact God doesn't exist, so he doesn't want my
| contributions :P
| halter73 wrote:
| > I'm hoping this is satire.
|
| Me too. It's one thing to accept single LFs in protocols that
| expect CRLF, but sending single LFs is a bridge too far in my
| opinion. I'm really surprised most of the other replies to your
| comment currently seem to unironically support not complying
| with well-established protocol specifications under the
| misguided notion that it will somehow make things "simpler" or
| "easier" for developers.
|
| I work on Kestrel which is an HTTP server for ASP.NET Core.
| Kestrel didn't support LF without a CR in HTTP/1.1 request
| headers until .NET 7 [1]. Thankfully, I'm unaware of any widely
| used HTTP client that even supports sending HTTP/1.1 requests
| without CRLF header endings, but we did eventually get reports
| of custom clients that used only LFs to terminate headers.
|
| I admit that we should have recognized a single LF as a line
| terminator instead of just CRLF from the beginning like the
| spec suggests, but people using just LF instead of CRLF in
| their custom clients certainly did not make things any simpler
| or easier for me as an HTTP server developer. Initially, we
| wanted to be as strict as possible when parsing request headers
| to avoid possible HTTP request smuggling attacks. I don't think
| allowing LF termination really allows for smuggling, but it is
| something we had to consider.
|
| I do not support even adding the option to terminate HTTP/1.1
| request/response headers with single LFs in HttpClient/Kestrel.
| That's just asking for problems because it's so uncommon. There
| are clients and servers out there that will reject headers with
| single LFs while they all support CRLF. And if HTTP/1.1 is
| still being used in 2050 (which seems like a safe bet), I
| guarantee most clients and servers will still use CRLF header
| endings. Having multiple ways to represent the exact same thing
| does not make a protocol simpler or easier.
|
| [1]: https://github.com/dotnet/aspnetcore/pull/43202
| deltaknight wrote:
| As an implementation detail, I assume many programs simply ignore
| the CR character already? Whilst of course many windows programs
| (and protocols as mentioned) still require CRLF, surely the most
| efficient way to make something cross-platform is to simply act
| on the LF part of CRLF, that way it works for both CRLF and LF
| line ends.
|
| The fact that both CRLF and LF use the same control character is,
| in my eyes, a huge bonus for this type of action to actually work.
| Simply make everything cross-platform and start ignoring CR
| completely. I'm surprised this isn't mentioned explicitly as a
| course of action in the article; instead it focuses on making
| people change their understanding of LF into NL, which is an
| unnecessary complication that will cause inevitable bikeshedding
| around this idea.
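The "act on the LF part of CRLF" approach could look roughly like this minimal Python sketch (the `split_lines` helper is hypothetical): LF ends a line and a CR immediately before it is discarded, so CRLF and LF-only input produce the same lines.

```python
def split_lines(text: str) -> list[str]:
    """Split on LF, dropping a single CR that immediately precedes it.

    Handles both CRLF and LF-only input. A trailing terminator does not
    produce an extra empty line.
    """
    *terminated, last = text.split("\n")
    lines = [line[:-1] if line.endswith("\r") else line for line in terminated]
    if last:
        lines.append(last)  # unterminated final line, kept verbatim
    return lines


print(split_lines("a\r\nb\nc"))  # ['a', 'b', 'c']
```

Only a CR directly before an LF is dropped; a lone CR elsewhere is left in place, since deleting every CR outright would silently alter data that happens to contain one.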
| phkahler wrote:
| >> instead it focuses on making people change their
| understanding of LF into NL, which is an unnecessary
| complication that will cause inevitable bikeshedding around
| this idea.
|
| Not really. In order to ignore CR you need to treat LF as NL.
| deltaknight wrote:
| Fair point, although I'd suggest that many programs already
| treat LF as NL (e.g. unix text files), so this understanding
| of the meaning of LF already exists in the world. If you're
| writing anything generic/cross-platform, you have to be able
| to treat LF as NL. So there isn't really a change to be made
| here.
| djha-skin wrote:
| > Let's make CRLF one less thing that your grandchildren need to
| know about or worry about.
|
| The struggle is real, the problem is real. Parents, teach your
| kids to use .gitattributes files[1]. While you're at it, teach
| them to hate byte order marks[2].
|
| 1: https://stackoverflow.com/questions/73086622/is-a-
| gitattribu...
|
| 2: https://blog.djhaskin.com/blog/byte-order-marks-must-diemd/
| nsnshsuejeb wrote:
| The letters after the dot in my filename don't map 1 to 1 with
| the file format.
| fortran77 wrote:
| The article had some major gaffes. Teletypes never had a ball.
| The stationary platen models had type boxes and cylinders, but
| never balls.
| refset wrote:
| Not sure whether this changes anything about your critique, but
| note that the IBM 2741 terminal embedded a Selectric
| typewriter:
|
| _> Selectric-based mechanisms were also widely used as
| terminals for computers, replacing both Teletypes and older
| typebar-based output devices. One popular example was the IBM
| 2741 terminal_
|
| https://en.wikipedia.org/wiki/IBM_Selectric
| perching_aix wrote:
| Well, at least the title is honest. Straight up asking people to
| break standards out of sheer conviction is a new one for me
| personally, but it's definitely one of the attitudes of all time,
| so maybe it's just me being green.
|
| Can we ask for the typical *nix text editors to disobey the POSIX
| standard of a text file next, so that I don't need to use hex
| editing to get trailing newlines off the end of files?
| 201984 wrote:
| What's wrong with trailing newlines?
| perching_aix wrote:
| Other than select software being pissy about it, not much.
| Just like how there's nothing wrong with CRLF, except for
| select software being pissy about that too.
| norir wrote:
| It makes writing parsers more complicated in certain cases
| because you can't tell without lookahead if a newline
| character should be treated as a newline or an eof.
| viraptor wrote:
| What? Which crazy non-binary format makes a distinction
| between CRLF(EOF) and just (EOF)? Apart from a plain text
| file, that is.
| rkeene2 wrote:
| People don't seem to mind when Chrome does it [0]. The response
| "standards aren't a death pact" stands out in particular.
|
| [0] https://news.ycombinator.com/item?id=13860682
| perching_aix wrote:
| Might be just my personal impression, but I'm pretty sure
| Chrome is _extremely_ notorious for abusing its market leader
| position, including in this way. So gonna have to disagree
| there, from my view people do mind Chrome and its
| implementation particularities quite a lot.
| dijit wrote:
| I think the parent is equally denigrating the situation.
|
| Leaders choose the standards, especially as they approach
| monopoly.
|
| Worse still: people will come out of the woodwork to
| actively defend the monopolist de facto standard producer.
| nsnshsuejeb wrote:
| Not defending the producer, just making pragmatic
| choices!
| bigstrat2003 wrote:
| Yeah, I have no idea what the author is smoking. Deliberately
| breaking standards is simply not an acceptable solution to the
| problem, even if it were a serious problem (it's not).
| Ekaros wrote:
| If there truly is a problem with existing protocols, propose
| and properly design a new one that can replace it. Then, if it
| is a technically superior solution, it should win in the long run.
| nsnshsuejeb wrote:
| No need. Just convince the king (e.g. Google for HTTP) to
| make a tweak in the next standard version.
| fweimer wrote:
| SMTP
| <https://datatracker.ietf.org/doc/html/rfc2821#section-4.1.1....>
| is pretty clear that the message termination sequence is CR LF .
| CR LF, not LF . LF, and disagreements in this spot are known to
| cause problems (including undesirable message injection). But then
| enough alternative implementations that recognize LF . LF as well
| are out there, so maybe the original SMTP rules do not matter
| anymore.
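A toy illustration of why the two readings conflict, assuming a receiver that has buffered the bytes after DATA (the `find_end` helper and the payload are made up for this sketch): a strict receiver looks only for CR LF . CR LF, a lenient one also accepts LF-based variants, and they end the message at different offsets.

```python
STRICT = b"\r\n.\r\n"                                # end-of-data per RFC 2821/5321
LENIENT = (STRICT, b"\n.\r\n", b"\r\n.\n", b"\n.\n")  # hypothetical lenient set


def find_end(buf: bytes, sequences: tuple[bytes, ...]) -> int:
    """Offset of the earliest terminator in buf, or -1 if none is present."""
    hits = [i for i in (buf.find(seq) for seq in sequences) if i != -1]
    return min(hits) if hits else -1


# Hypothetical DATA payload: a bare "LF . LF" followed by text that looks
# like a new SMTP command, then the real CRLF . CRLF terminator.
payload = b"Hello\n.\nMAIL FROM:<x>\r\n.\r\n"

print(find_end(payload, (STRICT,)))  # 21: "MAIL FROM" stays message text
print(find_end(payload, LENIENT))    # 5: what follows looks like new commands
```

When two hops in a mail path disagree like this, text that one treats as message body can be parsed by the other as additional commands, which is the message-injection problem in question.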
| theginger wrote:
| Ridiculous! We need to develop 1 universal standard that covers
| everyone's use cases. Yeah!
| Ekaros wrote:
| I think I can offer the most reasonable compromise here. Decide
| upon a new UTF-8 code point. Have its use mandated, and ignore and ban
| all end-points that do not use this code-point instead of CRLF or
| just LF alone.
| phkahler wrote:
| So break _everything_.
| whizzter wrote:
| https://xkcd.com/927/
| kps wrote:
| You mean U+2028 LINE SEPARATOR?
| Ekaros wrote:
| Perfect. So now we just need to start filing bug reports to
| any tool that does not support it instead of CRLF or LF
| alone.
| bear8642 wrote:
| Oh, yet another option - first thought was U+0085 NEXT LINE
| as above
| bear8642 wrote:
| > Decide upon a new UTF-8 code point.
|
| Unicode have already done so - (NEL)
| https://www.compart.com/en/unicode/U+0085
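Some existing tooling already treats these code points as line breaks. For example, Python's built-in `str.splitlines()` (a real method; the sample string is made up) breaks on U+2028 and U+0085 as well as on CRLF, whereas `str.split("\n")` only recognizes LF:

```python
text = "one\u2028two\x85three\r\nfour"

print(text.splitlines())  # ['one', 'two', 'three', 'four']
print(text.split("\n"))   # only two pieces: LF is the sole separator
```

`splitlines()` also splits on a lone CR and on several ASCII separators such as RS (0x1E), so the choice of splitting function matters as much as the choice of character.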
| shadowgovt wrote:
| Define "abolish."
|
| We could certainly try to write no new software that uses them.
|
| But last I checked, there are terabytes and terabytes of stored
| data in various formats (to say nothing of living protocols
| already deployed) and they aren't gonna stop using CRLF any time
| soon.
| tedunangst wrote:
| No mention of what happened the last time we mixed and matched
| line endings? https://smtpsmuggling.com/
| deltaknight wrote:
| Doesn't this show that ignoring CR and only processing LFs is a
| good idea? If I'm understanding right (probably wrong), this
| vuln relied on some servers using CRLF only as endings, and
| others supporting both CRLF and LF.
|
| If every server updated to line-end of LF, thereby supporting
| both types, this vuln wouldn't happen?
|
| Of course, if there is a mixed bag, then I guess this is still
| possible, if your server only supports CRLF. At least in that
| scenario you have some control over the issue though.
| WesolyKubeczek wrote:
| > Even if an established protocol (HTTP, SMTP, CSV, FTP)
| technically requires CRLF as a line ending, do not comply. Send
| only NL.
|
| Now just go pound sand. Seriously. And you owe me 5 minutes of my
| life wasted on reading the whole thing.
|
| My god, I would have thought all those "simplification" ideas die
| off once you have 3 years of experience or more. Some people
| won't learn.
|
| P. S. Guess even the most brilliant people tend to have dumb
| ideas sometimes.
| rgmerk wrote:
| Of all the stupid and obsolete things in standards we use to
| interoperate, CRLF is one of the least consequential.
| moomin wrote:
| Counterpoint: Unix deciding on a non-standard line ending was
| always a mistake. It has produced decades of random
| incompatibility for no particular benefit. CRLF isn't a
| convention: it's two different pieces of the base terminal API.
| You have no idea how many programs rely on CR and LF working
| correctly.
| matheusmoreira wrote:
| Yeah. It's weird how Unix picked LF given its love of
| terminals. CRLF is the semantically correct line ending
| considering terminal semantics. It's present in the terminal
| subsystem to this day, people just don't notice because they
| have OPOST output post processing enabled which automatically
| converts LF into CRLF.
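A small model of what this describes, assuming Python; `render` and `onlcr` are hypothetical helpers, not a terminal library. In a raw terminal, LF only moves the cursor down and CR only returns it to column 0; with output post-processing on (the termios OPOST and ONLCR flags), each LF a program writes is sent as CR LF.

```python
def onlcr(output: str) -> str:
    """Model of the ONLCR output flag: every LF is sent as CR LF."""
    return output.replace("\n", "\r\n")


def render(stream: str) -> list[str]:
    """Model of a raw terminal screen: LF moves down one row and keeps the
    column, CR moves back to column 0, other characters are printed."""
    rows: list[list[str]] = [[]]
    row = col = 0
    for ch in stream:
        if ch == "\n":
            row += 1
            if row == len(rows):
                rows.append([])
        elif ch == "\r":
            col = 0
        else:
            line = rows[row]
            line.extend(" " * (col + 1 - len(line)))  # pad up to the cursor
            line[col] = ch
            col += 1
    return ["".join(r) for r in rows]


print(render("ab\ncd"))         # ['ab', '  cd']: the "staircase" in raw mode
print(render(onlcr("ab\ncd")))  # ['ab', 'cd']: what LF-only programs expect
```

The model ignores wrapping, scrolling and every other control code; it only shows why a program that writes bare LF looks correct with post-processing enabled and staircases without it.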
| fanf2 wrote:
| [delayed]
| lynx23 wrote:
| Can OP please tell me how to abolish CR while in Raw Mode? Did he
| forget about it, or am I just unimaginative?
| samatman wrote:
| Right, you don't need to search _that_ hard for a device which
| interprets 0xA as a line feed, just set your terminal to raw
| mode, done.
|
| But given the very first sentence:
|
| > _CR and NL are both useful control characters._
|
| I'm willing to conclude that he doesn't intend _A Blaste
| Against The Useless Appendage of Carriage Return Upon a New
| Line, or Line Feed As Some Style It_ , to apply to emulators of
| the old devices which make actual use of the distinction.
| forrestthewoods wrote:
| I could not possibly disagree with this more strongly or
| violently.
|
| In short - shut up and deal with it. Is it an extremely mild and
| barely inconvenient nuisance to deal with different or mixed line
| endings? Yes. Is this actually a hard or difficult problem? No.
|
| Stop trying to force everyone to break their backs so your life
| is inconsequentially easier. Deal with it and move on.
| Animats wrote:
| Now convince Microsoft. It's really the legacy of DOS that keeps
| this alive.
| A4ET8a8uTh0 wrote:
| I feel it necessary to have an obligatory 'Would someone think of
| banking?' before we 'abolish' (however we eventually arrive at
| defining it) anything.
|
| I mean, it is all cool to have this idea, but real-world
| implications, where half the stuff dangles on a text file, appear
| not to have been considered here.
|
| For clarity's sake, I am not saying don't do it. I am saying: how
| will that work?
|
| edit: spaces, tabs and one crlf
| anonymousiam wrote:
| This article seems like it was written to troll people into a
| flame war. There is no such character as NL, and the article does
| not at all address that fact that the "ENTER" key on every
| keyboard sends a CR and not a LF. Things work fine the way they
| are.
| sunk1st wrote:
| > Nobody ever wants to be in the middle of a line, then move down
| to the next line and continue writing in the next column from
| where you left off. No real-world program ever wants to do that.
|
| Is this true?
| gfody wrote:
| we should leave it for backwards compatibility and adopt U+0085
| as the standard next line codepoint. and utf8 libraries could
| unofficially support every combination of 0A 0D as escape
| sequences.
| NelsonMinar wrote:
| sqlite is a work of absolute genius. But every once in a while
| something comes along to remind us how _weird_ its software
| background is. Fossil. The build system. The TCL test harness.
| And now this, a quixotic attempt to break 50+ years of text
| formatting and network protocols.
|
| Yes CRLF is dumb. No, replacing it is not realistic.
___________________________________________________________________
(page generated 2024-10-13 22:00 UTC)