[HN Gopher] How do HTTP servers figure out Content-Length?
___________________________________________________________________
How do HTTP servers figure out Content-Length?
Author : misonic
Score : 237 points
Date : 2024-10-07 03:18 UTC (19 hours ago)
(HTM) web link (aarol.dev)
(TXT) w3m dump (aarol.dev)
| pkulak wrote:
| And if you set your own content length header, most http servers
| will respect it and not chunk. That way, you can stream a 4-gig
| file that you know the size of per the metadata. This makes
| downloading nicer because browsers and such will then show a
| progress bar and time estimate.
|
| However, you'd better be right! I just found a bug in some
| really old code that was gzipping every response when it was
| appropriate (i.e., asked for, textual, etc.). But it was
| ignoring the Content-Length header! So, if it was set manually,
| it would then be wrong after compression. That caused insidious
| bugs for years. The fix, obviously, was to just delete that
| manual header if the stream was going to be compressed.
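|
| For illustration, a minimal sketch of both halves of this in Go
| (the file path and the willCompress helper are made up; a real
| gzip middleware would make that decision itself):
|
|     package main
|
|     import (
|         "io"
|         "net/http"
|         "os"
|         "strconv"
|         "strings"
|     )
|
|     // Stand-in for "this response will be gzipped" (made up).
|     func willCompress(r *http.Request) bool {
|         ae := r.Header.Get("Accept-Encoding")
|         return strings.Contains(ae, "gzip")
|     }
|
|     func serveBigFile(w http.ResponseWriter, r *http.Request) {
|         f, err := os.Open("/data/big-4gb.img") // made-up path
|         if err != nil {
|             http.Error(w, "not found", http.StatusNotFound)
|             return
|         }
|         defer f.Close()
|
|         if info, err := f.Stat(); err == nil {
|             // Known size up front: the server won't chunk, and
|             // browsers can show a progress bar and estimate.
|             w.Header().Set("Content-Length",
|                 strconv.FormatInt(info.Size(), 10))
|         }
|
|         if willCompress(r) {
|             // The fix described above: a compressed body makes
|             // the manual length wrong, so drop the header and
|             // let the server chunk instead.
|             w.Header().Del("Content-Length")
|         }
|
|         io.Copy(w, f) // stream; nothing buffered in memory
|     }
|
|     func main() {
|         h := http.HandlerFunc(serveBigFile)
|         http.ListenAndServe(":8080", h)
|     }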
| bugtodiffer wrote:
| Did you see if you could turn this into HTTP Request Smuggling?
| Or something else with security impact?
|
| Sounds like a powerful bug you have, potentially.
| knallfrosch wrote:
| To me it sounds like the server handled the request just fine
| and reused the header (which was wrong). The client then had
| the problem of a wrong response.
| guappa wrote:
| If you say to read 1000 bytes from memory and then pass a
| 900-byte array, that's a security bug that can cause crashes,
| corrupt data, and leak things that shouldn't have
| leaked.
| jpc0 wrote:
| The size of the buffer and how many bytes are written
| have nothing intrinsically linked to what the header
| says. It's a bug sure but does not mean there's any
| security issue on the server.
| guappa wrote:
| It will likely generate corrupt files on the client as
| well.
| Aachen wrote:
| Not very. The system might allocate that length ahead of
| time (I've seen that option in torrent clients and IIRC
| FTP systems) but, at the latest by the time a FIN comes in,
| it'll know the file is finished and can truncate it. If
| finished-early downloads are not implemented despite it
| doing preallocation, that's still not a security bug.
| guappa wrote:
| If a FIN comes, the client will mark the file as
| partially downloaded.
|
| But it might not come: for decades HTTP has sent more than
| one file per connection, so the client might just get the
| beginning of the next reply, write that, and the next
| reply will be corrupt as well.
| michaelmior wrote:
| It's a bug for sure, but I think whether it's a security
| issue could depend on the language. If the callee is able
| to determine the length of the array, it can just return
| an error instead of a potential buffer overrun.
| stickfigure wrote:
| In this case, the 1000 bytes aren't being read from
| memory, they're being read from a socket. If you try to
| over-read from a socket the worst you'll get is a
| blocking call or an error (depending what mode you're
| in).
| bugtodiffer wrote:
| maybe response based trickery then? :D What happens to the
| response after that one, are the first 100 bytes cut off, or
| what?
|
| I'm pretty sure something like this can cause some form of
| HTTP desync in a loadbalancer/proxy setup.
| dotancohen wrote:
| > That caused insidious bugs for years.
|
| A lot of people here could probably benefit professionally from
| hearing about what the bugs were. Knowing what to identify in
| the future could be really helpful. Thanks.
| pkulak wrote:
| Well, for years our logs would fill up with these nasty
| warnings of "connection dropped", something like that.
| Naturally, you think that's just some badly configured
| client, a mobile connection, something. But then why would it
| be configured to log that as a warning (or it may have even
| triggered an error)? I think that was because when payloads
| are small the compression overhead makes them larger, which
| means the Content-Length is too small and clients would be
| terminating the connection early. And getting garbage or
| truncated responses. Ouch!
| hobofan wrote:
| I think the article should be called "How do Go standard library
| HTTP servers figure out Content-Length?".
|
| In most HTTP server implementations from other languages I've
| worked with, I recall having to do one of the following:
|
| - explicitly define the Content-Length up-front (clients then
| usually don't like it if you send too little and servers don't
| like it if you send too much)
|
| - have a single "write" operation with an object where the
| Content-Length can be figured out quite easily
|
| - turn on chunking myself and handle the chunk writing myself
|
| I don't recall having seen the kind of automatic chunking
| described in the article before (and I'm not too sure whether I'm
| a fan of it).
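|
| For reference, a minimal sketch of the Go behavior the article
| describes (handler names and sizes made up): if everything the
| handler writes fits in the server's output buffer before the
| handler returns, net/http computes Content-Length itself;
| otherwise it falls back to chunked encoding.
|
|     package main
|
|     import (
|         "bytes"
|         "fmt"
|         "net/http"
|     )
|
|     // Finishes within the output buffer, so net/http sets
|     // Content-Length automatically.
|     func small(w http.ResponseWriter, r *http.Request) {
|         fmt.Fprint(w, "hello")
|     }
|
|     // Overflows the buffer before the handler returns, so the
|     // server switches to Transfer-Encoding: chunked.
|     func big(w http.ResponseWriter, r *http.Request) {
|         w.Write(bytes.Repeat([]byte("x"), 1<<20))
|     }
|
|     func main() {
|         http.HandleFunc("/small", small)
|         http.HandleFunc("/big", big)
|         http.ListenAndServe(":8080", nil)
|     }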
| lifthrasiir wrote:
| I believe the closest prior art would be PHP. It buffers a
| response by default until the buffer is full or `flush()` gets
| called, and will automatically set `Transfer-Encoding: chunked`
| if `Content-Length` wasn't explicitly set. Any subsequent
| writes will be automatically chunked.
|
| This approach makes sense from the API standpoint because the
| caller generally has no idea whether the chunked encoding is
| necessary, or even its very existence. Honestly that's less
| confusing than what express.js does to the middleware function:
| `app.get("/", (req, res) => { ... })` and `app.get("/", (req,
| res, next) => { ... })` behave differently because it tries to
| infer the presence of `next` by probing
| `Function.prototype.length`.
| nolok wrote:
| Fun thing about that: core PHP used to be, and still is, very
| close to HTTP, to the point where I would say your average
| decent PHP programmer used to know more about how HTTP works
| than your average dev in a similar language where the web
| library abstracts that stuff away. E.g. a PHP dev knows a form
| has to be multipart/form-data if you send files, etc.
|
| But one of the major exceptions, if not THE major exception, is
| this: buffering and flushing work automagically, and a lot of
| PHP devs end up massively blindsided by it at some point.
|
| PS: with the rise of modern PHP and its high-quality
| object-based frameworks, this becomes less and less true.
|
| PS2: I am not in ANY way saying anything good or bad or
| superior or inferior about any dev here, just a difference in
| approach.
| JodieBenitez wrote:
| Well... not sure it was that magic. We used to play a lot
| with the ob_* functions.
| nolok wrote:
| Oh, I didn't mean it figured stuff out in a smart way, only
| that it did auto buffering/chunking/flushing for you in a way
| that abstracts that whole idea away from the dev, while other
| platforms had you care about it (cf. the messages above).
|
| But yeah, the moment you ended up wanting to do anything
| advanced, you were doing your own buffer on top of that
| anyway, or disabling it and going raw.
| JodieBenitez wrote:
| and the joy of "pure php scripts" with closing ?> tags
| messing with the output when you didn't want it... all in
| libraries you had no say in...
|
| I can't say I miss those days! Or this platform, for that
| matter.
| lifthrasiir wrote:
| Eventually people started omitting ?> at the very end,
| which is correct but also unsettling.
| gary_0 wrote:
| It's even recommended in the PHP docs:
| https://www.php.net/manual/en/language.basic-
| syntax.phptags....
| earthboundkid wrote:
| When I started my first job, I was like "Hey, these files
| don't have a closing ?>!!" and then my boss sighed.
| stefs wrote:
| so irksome having to leave the tags unbalanced!
| matsemann wrote:
| This gave me flashbacks to my youth of PHP programming. "
| headers already sent by (output started at ....", when you
| tried to modify headers but had already started to write
| the http content (hence the header part was set in stone)
| nolok wrote:
| And of course it was a space before or after the php
| delimiters in some random file.
| marcosdumay wrote:
| Good thing zero width space wasn't common in texts back
| then when PHP was the most common platform.
| rstuart4133 wrote:
| Err, see
| https://w3techs.com/technologies/overview/programming_langua...
| PHP 75.8%, Ruby 6.0%, ...
| lolinder wrote:
| > Honestly that's less confusing than what express.js does to
| the middleware function: `app.get("/", (req, res) => { ...
| })` and `app.get("/", (req, res, next) => { ... })` behave
| differently because it tries to infer the presence of `next`
| by probing `Function.prototype.length`.
|
| This feels like a completely random swipe at an unrelated
| feature of a JavaScript framework, and I'm not even sure that
| it's an _accurate_ swipe.
|
| The entire point of Function.length (slight nit:
| Function.prototype.length is different and is always zero) is
| to check the arity of the function [0]. There's no "tries
| to": if your middleware function accepts three arguments then
| it will have a length of 3.
|
| Aside from that, I've also done a bunch of digging and can't
| find any evidence that they're doing this [1]. Do you have a
| source for the claim that this is what they're doing?
|
| [0] https://developer.mozilla.org/en-
| US/docs/Web/JavaScript/Refe...
|
| [1] https://github.com/search?q=repo%3Aexpressjs%2Fexpress%20
| %22...
| lifthrasiir wrote:
| Because we were talking about HTTP server frameworks, it
| seemed not that problematic to mention one of the most
| surprising things I've seen in this space. Not necessarily
| JS bashing, but sorry for that anyway.
|
| I'm surprised to see that it's now gone too! The exact
| commit is [1], which happened before Express.js 4.7, and
| you can search for the variable name `arity` in any previous
| version to see what I was talking about. It seems that my
| memory was slightly off as well, my bad. The correct
| description would be that older versions of Express.js used to
| distinguish "error" callbacks from normal router callbacks by
| their arities, so `(req, res)` and `(req, res, next)` would
| have been thankfully okay, while any extra argument added by
| accident would effectively disable that callback without any
| indication. It was a very good reason for me to be surprised
| and annoyed at that time.
|
| [1] https://github.com/expressjs/express/commit/76e8bfa1dcb
| 7b293...
| LegionMammal978 wrote:
| Actually, it still uses <= 3 vs. = 4 arguments to
| distinguish between request callbacks and error
| callbacks. Check out the added lines to
| lib/router/layer.js in the commit you mention, or the
| equivalent functions in the current router v2.0.0 package
| [0].
|
| [0] https://github.com/pillarjs/router/blob/2e7fb67ad1b0c
| 1cd2d9e...
| everforward wrote:
| I may be alone here, but I don't find it that absurd (though
| the implementation may be, if JS doesn't support this well; no
| idea). This would be crazy in languages that
| actually enforce typing, but function type signature
| overloading to alter behavior seems semi-common in languages
| like Python and JS.
|
| Entirely unrelated, but the older I get, the more it seems
| like exposing the things under ".prototype" as parts of the
| object was probably a mistake. If I'm not mistaken, that is
| reflection, and it feels like JS reaches for reflection much
| more often than other languages. I think in part because it's
| a native part of the object rather than a reflection library,
| so it feels like less of an anti-pattern.
| lifthrasiir wrote:
| To be clear, distinguishing different types based on arity
| would have been okay if JS was statically typed or
| `Function` exposed more thorough information about its
| signature. `Function.prototype.length` is very primitive
| (it doesn't count any spread argument, partly because it
| dates back to the first edition of ECMAScript) and there is
| even no easy way to override it like Python's
| `@functools.wraps`. JS functions also don't check the
| number of given arguments at all, which is already much
| worse compared to Python but anyway, JS programmers like me
| would have reasonably expected excess arguments to be
| simply ignored.
| everforward wrote:
| > JS functions also don't check the number of given
| arguments at all
|
| I never really thought about this, but it does explain
| how optional arguments without a default value work in
| Typescript. How very strange of a language decision.
|
| > To be clear, distinguishing different types based on
| arity would have been okay if JS was statically typed or
| `Function` exposed more thorough information about its
| signature.
|
| I actually like this less in a system with better typing.
| I don't personally think it's a good tradeoff to
| dramatically increase the complexity of the types just to
| avoid having a separate method to register a chunked
| handler. It would make more sense to me to have
| "app.get()" and "app.getChunked()", or some kind of
| closure that converts a chunked handler to something
| app.get() will allow, like "app.get(chunked((req, res,
| next) => {}))".
|
| The typing effectively becomes part of the control flow
| of the application, which is something I tend to prefer
| avoiding. Data modelling should model the domain, code
| should implement business logic. Having data modelling
| impact business logic feels like some kind of recursive
| anti-pattern, but I'm not quite clever enough to figure
| out why it makes me feel that way.
| ahoka wrote:
| Some frameworks do automatic chunking when you pass a stream as
| the response body.
| dbrueck wrote:
| Chunked transfer encoding can be a pain, but it's a reasonable
| solution to several problems: when the response is too big to
| fit into memory, when the response size is unknown by the HTTP
| library, when the response size is unknown by the caller of the
| HTTP library, or when the response doesn't have a total size at
| all (never-ending data stream).
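|
| A minimal sketch of that last case in Go (the endpoint and
| timing are made up): a never-ending stream has no total size,
| so each Flush pushes a chunk onto the wire.
|
|     package main
|
|     import (
|         "fmt"
|         "net/http"
|         "time"
|     )
|
|     func ticks(w http.ResponseWriter, r *http.Request) {
|         flusher, ok := w.(http.Flusher)
|         if !ok {
|             http.Error(w, "streaming unsupported",
|                 http.StatusInternalServerError)
|             return
|         }
|         for i := 0; ; i++ {
|             fmt.Fprintf(w, "tick %d\n", i)
|             flusher.Flush() // force the chunk out now
|             select {
|             case <-r.Context().Done(): // client went away
|                 return
|             case <-time.After(time.Second):
|             }
|         }
|     }
|
|     func main() {
|         http.ListenAndServe(":8080", http.HandlerFunc(ticks))
|     }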
| danudey wrote:
| > explicitly define the Content-Length up-front (clients then
| usually don't like it if you send too little and servers don't
| like it if you send too much)
|
| We had a small router/firewall thing at a previous company that
| had a web interface, but for some reason its Content-Length
| header had an off-by-one error. IIRC Chrome handled this okay
| (once the connection was closed it would display the content)
| while Firefox would hang waiting for that one extra byte that
| never came.
| aragilar wrote:
| Note that there can be trailer fields (the phrase "trailing
| header" is both an oxymoron and a good description of it):
| https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Tr...
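|
| A minimal sketch of sending one from Go's net/http (the
| X-Body-SHA256 name is invented): declare the trailer before
| the body, set its value after the body, and it goes out at the
| end of the chunked response.
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "encoding/hex"
|         "io"
|         "net/http"
|         "strings"
|     )
|
|     func handler(w http.ResponseWriter, r *http.Request) {
|         w.Header().Set("Trailer", "X-Body-SHA256")
|
|         h := sha256.New()
|         body := strings.NewReader("streamed body bytes")
|         io.Copy(io.MultiWriter(w, h), body)
|
|         // Headers are frozen once the body starts; declared
|         // trailers are the exception.
|         w.Header().Set("X-Body-SHA256",
|             hex.EncodeToString(h.Sum(nil)))
|     }
|
|     func main() {
|         http.ListenAndServe(":8080", http.HandlerFunc(handler))
|     }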
| eastbound wrote:
| Oh yeah, put the Content-Length _after_ the content, when you
| know its size! /j
| _ache_ wrote:
| > Anyone who has implemented a simple HTTP server can tell you
| that it is a really simple protocol
|
| It's not. Like, hell no. That is so complex. Multiplexing,
| underlying TCP specifications, Server Push, Stream prioritization
| (vs priorization!), encryption (ALPN or NPN?), extensions like
| HSTS, CORS, WebDAV or HLS, ...
|
| It's a great protocol, nowhere near simple.
|
| > Basically, it's a text file that has some specific rules to
| make parsing it easier.
|
| Nope, since HTTP/2 that is just a textual representation, not
| the real "on the wire" protocol. HTTP/2 is 10 years old now.
| TickleSteve wrote:
| He was referring to HTTP 1.0 & 1.1
| lionkor wrote:
| Which is simple enough that you can build a complete 1.0
| server in C in an afternoon, and add some 1.1 stuff like
| keep-alives, content-length, etc. I did that for fun once, on
| github.com/lionkor/http
| secondcoming wrote:
| That's the easy part. The hard part is working around non-
| compliant third parties. HTTP is a real mess.
| lionkor wrote:
| true
| _ache_ wrote:
| It should say so, then (nowhere does the article say 'up to
| HTTP/1.1', and it even talks about HTTP/2 and HTTP/3).
|
| HTTP/1.0 is simple. HTTP/1.1 is undoubtedly more complex but
| manageable.
|
| The statement that HTTP is simple is just not true, even if Go
| makes it look easy.
| Cthulhu_ wrote:
| Every example in the article explicitly states HTTP 1.1,
| only at the end does it have a remark about how HTTP 2 and
| 3 don't have chunking as they have their own streaming
| mechanisms. The basic HTTP protocol _is_ simple, but 2/3 are
| no longer the basic HTTP protocols.
| _ache_ wrote:
| My point is HTTP isn't just HTTP/1.1. There is a lot under
| the hood, even with HTTP/1.1. Actually, the fact that the
| whole article explains the implementation of a single HTTP
| header argues against the claim that it's simple.
|
| So when the article says "All HTTP requests look something
| like this", that's false. That is not a big deal, but it
| spreads the idea that HTTP is easy, and it's not.
| mannyv wrote:
| Parts of 1.1 are pretty complicated if you try and implement
| them. Parts of it are simple.
|
| The whole section on cache is "reality based," and it's only
| gotten worse as the years have moved on.
|
| Anyway, back in the day Content-Length was one of the fields
| you were never supposed to trust. There's really no reason to
| trust it now, but I suppose you can use it as a hint for how
| much buffer you're supposed to allocate. But of course, the
| actual content may exceed that length, which would mean
| that if you did it incorrectly you'd copy the incoming
| request data past the end of the buffer.
|
| So even today, don't trust Content-Length.
| treflop wrote:
| You can't trust lengths in any protocol or format.
| dbrueck wrote:
| The HTTP 1.1 spec isn't 175 pages long just for the fun of
| it. :)
| lloeki wrote:
| Chunked encoding is fun: not many know that a chunk carries
| more than just its size, since chunk extensions let you
| synchronously multiplex extra information!
|
| E.g. I drafted this a long time ago, because if you generate
| something live and send it in a streaming fashion, you can't
| have progress reporting, since you don't know the final size
| in bytes, even though server-side you know how far into the
| generation you are.
|
| This was used for multiple things like generating CSV exports
| from a bunch of RDBM records, or compressed tarballs from a set
| of files, or a bunch of other silly things like generating
| sequences (Fibonacci, random integers, whatever...), that could
| take "a while" (as in, enough to be friendly and report
| progress).
|
| https://github.com/lloeki/http-chunked-progress/blob/master/...
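|
| For reference, the extension rides on the size line of each
| chunk, roughly like this (every line is CRLF-terminated; the
| progress=... name here is made up, the grammar is RFC 9112's
| chunk-ext):
|
|     HTTP/1.1 200 OK
|     Transfer-Encoding: chunked
|
|     5;progress=33
|     Hello
|     6;progress=66
|      world
|     0
|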
| oefrha wrote:
| I once wrote a terminal-style interactive web app as a single
| chunked HTML page, because I couldn't be bothered to implement
| a websocket endpoint. HTML and inline JS are interwoven and
| sent in reaction to user actions on the page. The only problem
| is browsers think the page never finishes loading.
| eastbound wrote:
| Couldn't you put the initial page in its own separate html,
| and load the rest as a long-running JS file using AJAX?
| oefrha wrote:
| Sure, but browsers don't do streaming execution of
| JavaScript (need to download the entire script), so you
| then need to manually stream the response and do HTML
| patching / eval chunked JS, as opposed to the browser
| trivially loading HTML / executing inline JS as the HTML
| page is streamed in all by itself. It does solve the
| loading spinner problem.
| pknerd wrote:
| Why would someone implement the chunk logic when websockets are
| here? Am I missing something? What are the use cases?
| blueflow wrote:
| chunked-encoding is a method of encoding an HTTP response body.
| The semantics for HTTP responses still apply: caching,
| compression, etc.
|
| Websocket is a different protocol that is started up via HTTP.
| rsynnott wrote:
| HTTP/1.1 came out in 1997. It's extremely well supported.
| Websockets were only standardised in 2011, and still have proxy
| traversal issues.
|
| You can absolutely assume that http 1.1 will work on basically
| anything; websockets are more finicky even now, and certainly
| were back in the day.
| masklinn wrote:
| Websockets are also on the far side of useless when it comes
| to streaming content the user is _downloading_. Javascript-
| filled pages are not the only clients of http.
| dylan604 wrote:
| > Javascript-filled pages are not the only clients of http.
|
| Whaaaaa??? We should eliminate these non-JS filled nonsense
| immediately!
| badmintonbaseba wrote:
| It wouldn't be so bad, if web and application APIs made
| stream processing of messages possible. The protocol itself
| could handle streaming content just fine, or at least not
| worse than HTTP.
| mannyv wrote:
| Web sockets have their own issues that can be/are
| implementation dependent.
|
| For example, some websocket servers don't pass back errors to
| the client (AWS). That makes it quite difficult to, say, retry
| on the client side.
|
| Chunked encoding is used by video players - so you can request
| X bytes of a video file. That means you don't have to download
| the whole file, and if the user closes the video you didn't
| waste bandwidth. There are likely more uses of it.
| floating-io wrote:
| > Chunked encoding is used by video players - so you can
| request X bytes of a video file. That means you don't have to
| download the whole file, and if the user closes the video you
| didn't waste bandwidth. There are likely more uses of it.
|
| Just a nitpick, but what you describe here is byte range
| requests. They can be used with or without chunked encoding,
| which is a separate thing.
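|
| A minimal sketch of such a byte range request from a Go client
| (the URL and range are made up):
|
|     package main
|
|     import (
|         "fmt"
|         "net/http"
|     )
|
|     func main() {
|         req, err := http.NewRequest("GET",
|             "https://example.com/video.mp4", nil)
|         if err != nil {
|             panic(err)
|         }
|         // Ask for only the first KiB of the resource.
|         req.Header.Set("Range", "bytes=0-1023")
|
|         resp, err := http.DefaultClient.Do(req)
|         if err != nil {
|             panic(err)
|         }
|         defer resp.Body.Close()
|
|         // A range-aware server answers 206 Partial Content
|         // with e.g. "Content-Range: bytes 0-1023/4294967296".
|         fmt.Println(resp.StatusCode,
|             resp.Header.Get("Content-Range"))
|     }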
| david422 wrote:
| I would say, as a rule of thumb: websockets are for two-way
| realtime communication, HTTP chunked is just for one-way
| streaming communication.
| pknerd wrote:
| Wow. Downvoted
| flohofwoe wrote:
| Unfortunately the article doesn't mention compression, because
| this is where it gets really ugly (especially with range
| requests): IIRC the Content-Length reported in HTTP responses
| and the ranges defined in range requests apply to the
| compressed data, but at least in browsers you only get the
| uncompressed data back and don't even have access to the
| compressed bytes.
| meindnoch wrote:
| +1
|
| This is why in 2024 you still must use XmlHttpRequest instead
| of fetch() when progress reporting is needed. fetch() cannot do
| progress reporting on compressed streams.
| shakna wrote:
| Once the header is read, you can iterate over the
| ReadableStream, though, can't you?
| meindnoch wrote:
| 1. You know the size of the _compressed_ data from the
| Content-Length header.
|
| 2. You can iterate through the _uncompressed_ response
| bytes with a ReadableStream.
|
| Please explain how you would produce a progress percentage
| from these?
| lmz wrote:
| If you had control of both ends you could embed a header
| in the uncompressed data with the number of uncompressed
| bytes.
| mananaysiempre wrote:
| Or put that length in the response headers.
| meindnoch wrote:
| You won't be able to do this if you're downloading from a
| CDN. Which is exactly where you would host large files,
| for which progress reporting really matters.
| floating-io wrote:
| Why not? Every CDN I've ever worked with preserves custom
| headers.
| meindnoch wrote:
| Right. For example S3 supports custom headers, _as long
| as that header happens to start with "x-amz-meta-..."_ -
| and now your progress reporting is tied to your CDN
| choice!
|
| Not sure about you, but to me "XmlHttpRequest" in my
| request handling code feels less dirty than "x-amz-
| meta-". But to each their own I guess.
| meindnoch wrote:
| Good idea. We could give it a new MIME type too. E.g.
| application/octet-stream-with-length-prefix-due-to-
| idiotic-fetch-api
|
| Or we can keep using XmlHttpRequest.
|
| Tough choice.
| badmintonbaseba wrote:
| "processed uncompressed bytes"/"all uncompressed bytes"
| is a distorted progress indication anyway.
| jstanley wrote:
| Recompress the ReadableStream to work out roughly how
| long the compressed version is, and use the ratio of the
| length of your recompressed stream to the Content-Length
| to work out an approximate progress percentage.
| meindnoch wrote:
| Lol! Upvoted for creative thinking.
| jaffathecake wrote:
| The results might be totally different now, but back in 2014 I
| looked at how browsers behave if the resource size differs
| from the Content-Length
| https://github.com/w3c/ServiceWorker/issues/362#issuecomment...
|
| Also in 2018, some fun where when downloading a file, browsers
| report bytes written to disk vs content-length, which is wildly
| out when you factor in gzip
| https://x.com/jaffathecake/status/996720156905820160
| Am4TIfIsER0ppos wrote:
| stat()?
| TZubiri wrote:
| len(response)
| simonjgreen wrote:
| Along this theme of knowledge, there is the lost art of tuning
| your page and content sizes such that they fit in as few packets
| as possible to speed up transmission. The front page of Google
| for example famously fitted in a single packet (I don't know if
| that's still the case). There is a brilliant book that used to be
| a bit of a bible in the world of web sysadmin from the Yahoo
| Exceptional Performance Team which is less relevant these days
| but interesting to understand the era.
|
| https://www.oreilly.com/library/view/high-performance-web/97...
| NelsonMinar wrote:
| See also the 14KB website article: https://endtimes.dev/why-
| your-website-should-be-under-14kb-i...
|
| Optimizing per-packet really improves things but has gotten
| very difficult with SSL and now QUIC. I'm not sure Google ever
| got the front page down to a single packet (would love a
| reference!) but it definitely paid very close attention to
| every byte and details of TCP performance.
| ryantownsend wrote:
| iirc, most content delivery networks have now configured
| initcwnd to be around 40, meaning ~58kb gets sent within the
| TCP slow start window and therefore 14kb is no longer
| relevant to most commercial websites (at least with H1/H2, as
| you mentioned QUIC/H3 uses UDP so it's different)
| divbzero wrote:
| When and where did you hear that _initcwnd_ is typically 40
| for most CDNs?
|
| I was curious but the most recent data I could find was
| from 2017 when there was a mix of CDNs at _initcwnd=10_ and
| _initcwnd >10_:
|
| https://www.cdnplanet.com/blog/initcwnd-settings-major-
| cdn-p...
|
| Currently Linux still follows RFC6928 and defaults to
| _initcwnd=10_ :
|
| https://github.com/torvalds/linux/blob/v6.11/include/net/tc
| p...
| eastbound wrote:
| And the good old days when IE only supported 31KB of
| JavaScript.
| recursive wrote:
| It's time to bring back that rule.
| o11c wrote:
| I recently did a deep dive into the history of JavaScript
| standards: (pre-ecmascript versions of JS
| not investigated) EcmaScript 1(1997) = JavaScript
| 1.1 - missing many ES3 features (see below), of which
| exceptions are the unrecoverable thing. EcmaScript
| 2(1998) - minimal changes, mostly deprecations and
| clarifications of intent, reserve Java keywords
| EcmaScript 3(1999) - exceptions, regexes, switch, do-
| while, instanceof, undefined, strict equality, encodeURI*
| instead of escape, JSON, several methods on
| Object/String/Array/Date EcmaScript 4(2003) - does
| not exist due to committee implosion EcmaScript
| 5(2009) - strict mode, getters/setters, remove
| reservations of many Java keywords, add reservation for
| let/yield, debugger, many static functions of Object,
| Array.isArray, many Array methods, String().trim method,
| Date.now, Date().toISOString, Date().toJSON
| EcmaScript 5.1(2011) - I did not notice any changes
| compared to ES5, likely just wording changes. This is the
| first one that's available in HTML rather than just PDF.
| EcmaScript 6(2015) - classes, let/const, symbols, modules
| (in theory; it's $CURRENTYEAR and there are still major
| problems with them in practice), and all sorts of things
| (not listed) EcmaScript 11(2020) - bigint,
| globalThis
|
| If it were up to me, I'd restrict the web to ES3 with ES5
| library features, let/const from ES6, and
| bigint/globalThis from ES2020. That gives correctness and
| convenience without tempting people to actually try to
| write complex logic in it.
|
| There _are_ still pre-ES6 implementations in the wild
| (not for the general web obviously) ... from what I 've
| seen they're mostly ES5, sometimes with a few easy ES6
| features added.
| divbzero wrote:
| The "Slow-Start" section [1] of Ilya Grigorik's _High
| Performance Browser Networking_ also has a good explanation
| for why 14 KB is typically the size of the initial congestion
| window.
|
| [1]: https://hpbn.co/building-blocks-of-tcp/#slow-start
| benmmurphy wrote:
| It is interesting how badly some of the HTTP/2 clients in
| browsers send the first request on a connection. It's often
| possible to smush it all into 1 TCP packet but browsers are
| often sending the request in 3 or 4 packets. I've even seen
| some server side bot detection systems check for this brain
| dead behaviour to verify it's really a browser making the
| request. I think this is due to the way all the abstractions
| interact and the lack of a corking option for the TLS library.
| remon wrote:
| Totally worth an article.
| skrebbel wrote:
| I thought I knew basic HTTP 1(.1), but I didn't know about
| trailers! Nice one, thanks.
| marcosdumay wrote:
| Half of the middleboxes on the way in the internet don't know
| about them either.
| klempner wrote:
| And browsers don't support them either, at least in any
| useful manner.
| jillesvangurp wrote:
| It's a nice exercise in any web framework to figure out how you
| would serve a big response without buffering it in memory. This
| can be surprisingly hard with some frameworks that just assume
| that you are buffering the entire response in memory. Usually, if
| you look hard there is a way around this.
|
| Buffering can be appropriate for small responses; or at least
| convenient. But for bigger responses this can be error prone. If
| you do this right, you serve the first byte of the response to
| the user before you read the last byte from wherever you are
| reading (database, file system, S3, etc.). If you do it wrong,
| you might run out of memory. Or your user's request times out
| before you are ready to respond.
|
| This is a thing that's gotten harder with non-blocking
| frameworks. Spring Boot in particular can be a PITA on this front
| if you use it with non-blocking IO. I had some fun figuring that
| out some years ago. Using Kotlin makes it slightly easier to deal
| with low level Spring internals (fluxes and what not).
|
| Sometimes the right answer is that it's too expensive to figure
| out the content length, or a content hash. Whatever you do, you
| need to send the headers with that information before you send
| anything else. And if you need to read everything before you can
| calculate that information and send it, your choices are
| buffering or omitting that information.
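|
| A minimal sketch of getting this right in Go (the path is made
| up): the first byte reaches the client before the last byte is
| read, and memory stays flat because io.Copy reuses one small
| buffer.
|
|     package main
|
|     import (
|         "io"
|         "log"
|         "net/http"
|         "os"
|     )
|
|     func export(w http.ResponseWriter, r *http.Request) {
|         src, err := os.Open("/var/data/huge-export.csv")
|         if err != nil {
|             http.Error(w, "unavailable",
|                 http.StatusServiceUnavailable)
|             return
|         }
|         defer src.Close()
|         // No Content-Length: the server chunks the response.
|         if _, err := io.Copy(w, src); err != nil {
|             // Too late for a 500: the body already started.
|             log.Printf("export aborted: %v", err)
|         }
|     }
|
|     func main() {
|         http.ListenAndServe(":8080", http.HandlerFunc(export))
|     }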
| jerf wrote:
| "This can be surprisingly hard with some frameworks that just
| assume that you are buffering the entire response in memory.
| Usually, if you look hard there is a way around this."
|
| This is the #1 most common mistake made by a "web framework".
|
| Before $YOU jump up with a list of exceptions, it slowly gets
| better over time, and it has been getting better for a while,
| and there are many frameworks in the world, so the list that
| get it right is quite long. But there's still a lot of
| frameworks out there that assume this, that consider streaming
| to be the "exception" rather than non-streaming being a special
| case of streaming, and I still see new people make this mistake
| with some frequency, so the list of frameworks that still
| incorporate this mistake into their very core is also quite
| long.
|
| My favorite is when I see a new framework sit on top of
| something like Go that properly streams, and it actively wrecks
| the underlying streaming capability to turn an HTTP response
| into a string.
|
| Streaming properly is harder in the short term, but writing a
| framework where all responses are strings becomes harder in the
| long term. You eventually hit the wall where that is no longer
| feasible, but then, fixing it becomes very difficult.
|
| Simply not sending a content-length is often the right answer.
| In an API situation, whatever negative consequences there are
| are fairly muted. The real problem I encounter a lot is when
| I'm streaming out some response from some DB query and I
| encounter a situation that I would have yielded a 500-type
| response for after I've already streamed out some content. It
| can be helpful to specify in your API that you may both emit
| content _and_ an error and users need to check both. For
| instance, in the common case of dumping JSON, you can spec a
| top-level { "results": [...], "error": ...} as your return
| type, stream out a "results", but if a later error occurs,
| still return an "error" later. Arguably suboptimal, but
| requiring all errors to be known up front in a streaming
| situation is impossible, so... suboptimal wins over impossible.
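|
| A minimal sketch of that envelope in Go (fetchRow and the row
| shape are made up): stream "results" as rows arrive, and keep
| an "error" slot for failures discovered after the body started.
|
|     package main
|
|     import (
|         "encoding/json"
|         "fmt"
|         "io"
|         "net/http"
|     )
|
|     // Stand-in for a DB read that can fail mid-stream.
|     func fetchRow(i int) (map[string]int, error) {
|         if i == 500 {
|             return nil, fmt.Errorf("lost DB at row %d", i)
|         }
|         return map[string]int{"n": i}, nil
|     }
|
|     func rows(w http.ResponseWriter, r *http.Request) {
|         w.Header().Set("Content-Type", "application/json")
|         io.WriteString(w, `{"results":[`)
|         var streamErr error
|         for i := 0; i < 1000; i++ {
|             row, err := fetchRow(i)
|             if err != nil {
|                 streamErr = err // a 500 is no longer possible
|                 break
|             }
|             if i > 0 {
|                 io.WriteString(w, ",")
|             }
|             b, _ := json.Marshal(row)
|             w.Write(b)
|         }
|         io.WriteString(w, `],"error":`)
|         if streamErr != nil {
|             b, _ := json.Marshal(streamErr.Error())
|             w.Write(b)
|         } else {
|             io.WriteString(w, "null")
|         }
|         io.WriteString(w, "}")
|     }
|
|     func main() {
|         http.ListenAndServe(":8080", http.HandlerFunc(rows))
|     }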
| nraynaud wrote:
| I have done crazy stuff to compute the content length of some
| payloads. For context, one of my clients works in cloud stuff,
| and I worked on converting HDD formats on the fly in a UI VM.
| The web server that accepts the files doesn't do chunked
| encoding, and there is no space to store the file. So I had to
| resort to passing over the input file once to transform it,
| computing its allocation table and transformed size, then
| throwing away everything but the file and the table, and
| restarting the scan with the correct header to re-do the
| transformation.
| dicroce wrote:
| At least in the implementation I wrote the default way to provide
| the body was a string... which has a length. For binary data I
| believe the API could accept either a std::vector<uint8_t> (which
| has a size) or a pointer and a size. If you needed chunked
| transfer encoding you had to ask for it and then make repeated
| calls to write chunks (that each have a fixed length).
|
| To me the more interesting question is how web servers receive
| an incoming request. You want to be able to read the whole
| thing into a single buffer, but you don't know how long it's
| going to be until you actually read some of it. I learned
| recently that libc has a way to "peek" at some data without
| removing it from the recv buffer... I'm curious if this is
| ever used to optimize the receive process?
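|
| That peek is recv(2) with the MSG_PEEK flag: it returns bytes
| without consuming them, so a later normal read sees the same
| data. A minimal Go sketch (Linux-flavored; in pure Go you'd
| usually reach for bufio.Reader.Peek instead):
|
|     package main
|
|     import (
|         "net"
|         "syscall"
|     )
|
|     // Peek at whatever is buffered on conn without removing
|     // it from the kernel receive buffer.
|     func peek(conn *net.TCPConn) []byte {
|         raw, err := conn.SyscallConn()
|         if err != nil {
|             return nil
|         }
|         var out []byte
|         raw.Read(func(fd uintptr) bool {
|             buf := make([]byte, 4096)
|             n, _, err := syscall.Recvfrom(int(fd), buf,
|                 syscall.MSG_PEEK)
|             if err == syscall.EAGAIN {
|                 return false // no data yet; runtime retries
|             }
|             if n > 0 {
|                 out = buf[:n]
|             }
|             return true
|         })
|         return out
|     }
|
|     func main() {}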
| mikepurvis wrote:
| Not sure about Linux, but for LwIP on embedded, the buffer
| isn't contiguous; it's a linked list of preallocated pbuf
| objects. So you can either read directly from those if you play
| by their rules or if you really do need it in contiguous memory
| you call a function to copy from LwIP's buffers to one you
| supply.
| AndrewStephens wrote:
| When I worked on a commercial HTTP proxy in the early 2000s, it
| was very common for servers to return off-by-one values for
| Content-Length - so much so that we had to implement heuristics
| to ignore and fix such errors.
|
| It may be better now but a huge number of libraries and
| frameworks would either include the terminating NULL byte in the
| count but not send it, or not include the terminator in the count
| but include it in the stream.
| matthewaveryusa wrote:
| Next up is how forms with (multiple) attachments are uploaded
| with Content-Type=multipart/form-data; boundary=$something_unique
|
| https://notes.benheater.com/books/web/page/multipart-forms-a...
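|
| A minimal sketch of consuming such a form in Go, part by part,
| so attachments stream instead of being buffered whole:
|
|     package main
|
|     import (
|         "io"
|         "log"
|         "net/http"
|     )
|
|     func upload(w http.ResponseWriter, r *http.Request) {
|         // The boundary is parsed from the Content-Type header.
|         mr, err := r.MultipartReader()
|         if err != nil {
|             http.Error(w, err.Error(), http.StatusBadRequest)
|             return
|         }
|         for {
|             part, err := mr.NextPart()
|             if err == io.EOF {
|                 break
|             }
|             if err != nil {
|                 http.Error(w, err.Error(),
|                     http.StatusBadRequest)
|                 return
|             }
|             // Each part has its own headers; the body streams.
|             n, _ := io.Copy(io.Discard, part)
|             log.Printf("field %q file %q: %d bytes",
|                 part.FormName(), part.FileName(), n)
|         }
|     }
|
|     func main() {
|         http.ListenAndServe(":8080", http.HandlerFunc(upload))
|     }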
| Sytten wrote:
| There is a whole class of attacks called HTTP Desync Attacks
| that target just that problem, since it is hard to get right,
| especially across multiple different HTTP stacks. And if you
| don't get it right, the result is that bytes are left on the
| TCP connection and read as the next request when the
| connection is reused.
___________________________________________________________________
(page generated 2024-10-07 23:01 UTC)