[HN Gopher] How do HTTP servers figure out Content-Length?
       ___________________________________________________________________
        
       How do HTTP servers figure out Content-Length?
        
       Author : misonic
       Score  : 237 points
       Date   : 2024-10-07 03:18 UTC (19 hours ago)
        
 (HTM) web link (aarol.dev)
 (TXT) w3m dump (aarol.dev)
        
       | pkulak wrote:
       | And if you set your own content length header, most http servers
       | will respect it and not chunk. That way, you can stream a 4-gig
       | file that you know the size of per the metadata. This makes
       | downloading nicer because browsers and such will then show a
       | progress bar and time estimate.
       | 
        | However, you better be right! I just found a bug in some
        | really old code that was gzipping every response when it was
        | appropriate (i.e., asked for, textual, etc.). But it was
        | ignoring the Content-Length header! So, if it was set
        | manually, it would then be wrong after compression. That
        | caused insidious bugs for years. The fix, obviously, was to
        | just delete that manual header if the stream was going to be
        | compressed.
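        | 
        | For the record, a minimal Go sketch of both halves of this
        | (the path and handler wiring are made up, not the actual
        | code):
        | 
        |   package main
        | 
        |   import (
        |       "io"
        |       "net/http"
        |       "os"
        |       "strconv"
        |   )
        | 
        |   // Stream a big file with an explicit Content-Length so
        |   // the server doesn't fall back to chunked encoding and
        |   // browsers can show a progress bar.
        |   func serveFile(w http.ResponseWriter, r *http.Request) {
        |       f, err := os.Open("/data/big.iso") // made-up path
        |       if err != nil {
        |           http.Error(w, "not found", http.StatusNotFound)
        |           return
        |       }
        |       defer f.Close()
        |       fi, err := f.Stat()
        |       if err != nil {
        |           http.Error(w, "stat failed",
        |               http.StatusInternalServerError)
        |           return
        |       }
        |       // net/http respects a manually set Content-Length.
        |       w.Header().Set("Content-Length",
        |           strconv.FormatInt(fi.Size(), 10))
        |       io.Copy(w, f)
        |   }
        | 
        |   // The fix described above: a compressing middleware must
        |   // drop a manually set Content-Length, because the value
        |   // is wrong after gzip.
        |   func dropLenBeforeGzip(w http.ResponseWriter) {
        |       w.Header().Del("Content-Length")
        |   }
        | 
        |   func main() {
        |       http.HandleFunc("/big", serveFile)
        |       http.ListenAndServe(":8080", nil)
        |   }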
        
         | bugtodiffer wrote:
         | Did you see if you could turn this into HTTP Request Smuggling?
         | Or something else with security impact?
         | 
         | Sounds like a powerful bug you have, potentially.
        
           | knallfrosch wrote:
           | To me it sounds like the server handled the request just fine
           | and reused the header (which was wrong.) The client then had
           | the problem of a wrong response.
        
             | guappa wrote:
              | If you say to read 1000 bytes from memory and then pass
              | a 900-byte array, that's a security bug that can cause a
              | crash, corrupt data, and leak stuff that shouldn't have
              | leaked.
        
               | jpc0 wrote:
                | The size of the buffer and how many bytes are written
                | are not intrinsically linked to what the header says.
                | It's a bug, sure, but that does not mean there's any
                | security issue on the server.
        
               | guappa wrote:
               | It will likely generate corrupt files on the client as
               | well.
        
               | Aachen wrote:
                | Not very. The system might allocate that length ahead
                | of time (I've seen that option in torrent clients and
                | IIRC FTP systems), but at the latest by the time a FIN
                | comes in, it'll know the file is finished and can
                | truncate it. If finished-early downloads are not
                | handled despite it doing preallocation, that's still
                | not a security bug.
        
               | guappa wrote:
                | If a FIN comes, the client will mark the file as
                | partially downloaded.
                | 
                | But it might not come: HTTP has sent more than one
                | file per connection for decades, so the client might
                | just get the beginning of the next reply, write that,
                | and the next reply will be corrupt as well.
        
               | michaelmior wrote:
               | It's a bug for sure, but I think whether it's a security
               | issue could depend on the language. If the callee is able
               | to determine the length of the array, it can just return
               | an error instead of a potential buffer overrun.
        
               | stickfigure wrote:
               | In this case, the 1000 bytes aren't being read from
               | memory, they're being read from a socket. If you try to
               | over-read from a socket the worst you'll get is a
               | blocking call or an error (depending what mode you're
               | in).
        
             | bugtodiffer wrote:
              | maybe response-based trickery then? :D What happens to
              | the response after that one, are the first 100 bytes cut
              | off, or what?
             | 
             | I'm pretty sure something like this can cause some form of
             | HTTP desync in a loadbalancer/proxy setup.
        
         | dotancohen wrote:
         | > That caused insidious bugs for years.
         | 
         | A lot of people here could probably benefit professionally from
         | hearing about what the bugs were. Knowing what to identify in
         | the future could be really helpful. Thanks.
        
           | pkulak wrote:
            | Well, for years our logs would fill up with these nasty
            | warnings of "connection dropped", something like that.
            | Naturally, you think that's just some badly configured
            | client, mobile connection, something. But then why would
            | it be configured to log that as a warning (or it may have
            | even triggered an error)? I think it was because when
            | payloads are small, the compression overhead makes them
            | larger, which means the Content-Length is too small and
            | clients would terminate the connection early, getting
            | garbage or truncated responses. Ouch!
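            | 
            | The size inversion is easy to see: gzip's header and
            | trailer alone are ~18 bytes, so tiny bodies come out
            | bigger than they went in (exact numbers vary):
            | 
            |   package main
            | 
            |   import (
            |       "bytes"
            |       "compress/gzip"
            |       "fmt"
            |   )
            | 
            |   func main() {
            |       payload := []byte(`{"ok":true}`) // 11 bytes
            |       var buf bytes.Buffer
            |       zw := gzip.NewWriter(&buf)
            |       zw.Write(payload)
            |       zw.Close()
            |       // If Content-Length was set to 11 up front, the
            |       // client stops reading early and sees garbage.
            |       fmt.Println(len(payload), "->", buf.Len()) // e.g. 11 -> 31
            |   }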
        
       | hobofan wrote:
       | I think the article should be called "How do Go standard library
       | HTTP servers figure out Content-Length?".
       | 
       | In most HTTP server implementations from other languages I've
       | worked with I recall having to either:
       | 
       | - explicitly define the Content-Length up-front (clients then
       | usually don't like it if you send too little and servers don't
       | like it if you send too much)
       | 
       | - have a single "write" operation with an object where the
       | Content-Length can be figured out quite easily
       | 
       | - turn on chunking myself and handle the chunk writing myself
       | 
       | I don't recall having seen the kind of automatic chunking
       | described in the article before (and I'm not too sure whether I'm
       | a fan of it).
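        | 
        | For anyone who hasn't seen it, the Go behavior the article
        | describes is easy to poke at: a small body that fits the
        | server's output buffer gets a computed Content-Length, while
        | flushing (or outgrowing the buffer) switches the response to
        | chunked. A minimal sketch:
        | 
        |   package main
        | 
        |   import (
        |       "fmt"
        |       "net/http"
        |   )
        | 
        |   func main() {
        |       // Handler returns with a small buffered body, so
        |       // net/http counts the bytes and emits
        |       // "Content-Length: 5" itself.
        |       http.HandleFunc("/small",
        |           func(w http.ResponseWriter, r *http.Request) {
        |               fmt.Fprint(w, "hello")
        |           })
        | 
        |       // An explicit Flush (or a body larger than the output
        |       // buffer) forces the headers out before the total size
        |       // is known, so the server switches to
        |       // "Transfer-Encoding: chunked".
        |       http.HandleFunc("/stream",
        |           func(w http.ResponseWriter, r *http.Request) {
        |               for i := 0; i < 3; i++ {
        |                   fmt.Fprintf(w, "part %d\n", i)
        |                   w.(http.Flusher).Flush()
        |               }
        |           })
        | 
        |       http.ListenAndServe(":8080", nil)
        |   }
        | 
        | `curl -v` against each endpoint shows the two different
        | response framings.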
        
         | lifthrasiir wrote:
          | I believe the closest prior art would be PHP. It buffers a
          | response by default until the buffer is full or `flush()`
          | gets called, and will automatically set `Transfer-Encoding:
          | chunked` if `Content-Length` wasn't explicitly set. Any
          | subsequent writes will be automatically chunked.
         | 
         | This approach makes sense from the API standpoint because the
         | caller generally has no idea whether the chunked encoding is
         | necessary, or even its very existence. Honestly that's less
         | confusing than what express.js does to the middleware function:
         | `app.get("/", (req, res) => { ... })` and `app.get("/", (req,
         | res, next) => { ... })` behave differently because it tries to
         | infer the presence of `next` by probing
         | `Function.prototype.length`.
        
           | nolok wrote:
            | Fun thing about that: core PHP used to be, and still is,
            | very very close to HTTP, to the point where I would say
            | your average decent PHP programmer used to know more about
            | how HTTP works than your average dev in a similar language
            | where the web library abstracts that stuff away. E.g. a
            | PHP dev knows a form has to be multipart/form-data if you
            | send files, etc.
            | 
            | But one of the major exceptions, if not THE major
            | exception, is this: buffering and flushing work
            | automagically, and a lot of PHP devs end up massively
            | blindsided by it at some point.
            | 
            | PS: with the rise of modern PHP and its high-quality
            | object-based frameworks, this becomes less and less true.
            | 
            | PS2: I am not in ANY way saying anything good or bad or
            | superior or inferior about any dev here, just a difference
            | in approach.
        
             | JodieBenitez wrote:
             | Well... not sure it was that magic. We used to play a lot
             | with the ob_* functions.
        
               | nolok wrote:
                | Oh, I didn't mean it figured stuff out in a smart way,
                | only that it did auto buffering/chunking/flushing for
                | you in a way that abstracted that whole idea from the
                | dev, while other platforms had you care about it (cf.
                | the messages above).
                | 
                | But yeah, the moment you ended up wanting to do
                | anything advanced, you were doing your own buffering
                | on top of that anyway, or disabling it and going raw.
        
               | JodieBenitez wrote:
                | and the joy of "pure PHP scripts" with closing ?> tags
                | messing with the output when you didn't want it... all
                | in libraries you had no say in...
                | 
                | I can't say I miss those days! Or this platform, for
                | that matter.
        
               | lifthrasiir wrote:
               | Eventually people started omitting ?> at the very end,
               | which is correct but also unsettling.
        
               | gary_0 wrote:
               | It's even recommended in the PHP docs:
               | https://www.php.net/manual/en/language.basic-
               | syntax.phptags....
        
               | earthboundkid wrote:
               | When I started my first job, I was like "Hey, these files
               | don't have a closing ?>!!" and then my boss sighed.
        
               | stefs wrote:
               | so irksome having to leave the tags unbalanced!
        
             | matsemann wrote:
              | This gave me flashbacks to my youth of PHP programming:
              | "headers already sent by (output started at ...)", when
              | you tried to modify headers but had already started to
              | write the HTTP content (hence the header part was set in
              | stone).
        
               | nolok wrote:
               | And of course it was a space before or after the php
               | delimiters in some random file.
        
               | marcosdumay wrote:
               | Good thing zero width space wasn't common in texts back
               | then when PHP was the most common platform.
        
               | rstuart4133 wrote:
                | Err, https://w3techs.com/technologies/overview/programming_langua... :
                | 
                |   PHP    75.8%
                |   Ruby    6.0%
                |   ....
        
           | lolinder wrote:
           | > Honestly that's less confusing than what express.js does to
           | the middleware function: `app.get("/", (req, res) => { ...
           | })` and `app.get("/", (req, res, next) => { ... })` behave
           | differently because it tries to infer the presence of `next`
           | by probing `Function.prototype.length`.
           | 
           | This feels like a completely random swipe at an unrelated
           | feature of a JavaScript framework, and I'm not even sure that
           | it's an _accurate_ swipe.
           | 
           | The entire point of Function.length (slight nit:
           | Function.prototype.length is different and is always zero) is
           | to check the arity of the function [0]. There's no "tries
           | to": if your middleware function accepts three arguments then
           | it will have a length of 3.
           | 
            | Aside from that, I've also done a bunch of digging and
            | can't find any evidence that they're doing this [1]. Do
            | you have a source for the claim that this is what they're
            | doing?
           | 
           | [0] https://developer.mozilla.org/en-
           | US/docs/Web/JavaScript/Refe...
           | 
           | [1] https://github.com/search?q=repo%3Aexpressjs%2Fexpress%20
           | %22...
        
             | lifthrasiir wrote:
             | Because we were talking about HTTP server frameworks, it
             | seemed not that problematic to mention one of the most
             | surprising things I've seen in this space. Not necessarily
             | JS bashing, but sorry for that anyway.
             | 
              | I'm surprised to see that it's now gone too! The exact
              | commit is [1], which happened before Express.js 4.7, and
              | you can search for the variable name `arity` in any
              | previous version to see what I was talking about. It
              | seems that my memory was slightly off as well, my bad.
              | The correct description would be that older versions of
              | Express.js used to distinguish "error" callbacks from
              | normal router callbacks by their arities, so `(req,
              | res)` and `(req, res, next)` would have been thankfully
              | okay, while any extra argument added by accident would
              | effectively disable that callback without any
              | indication. It was a very good reason for me to be
              | surprised and annoyed at the time.
             | 
             | [1] https://github.com/expressjs/express/commit/76e8bfa1dcb
             | 7b293...
        
               | LegionMammal978 wrote:
               | Actually, it still uses <= 3 vs. = 4 arguments to
               | distinguish between request callbacks and error
               | callbacks. Check out the added lines to
               | lib/router/layer.js in the commit you mention, or the
               | equivalent functions in the current router v2.0.0 package
               | [0].
               | 
               | [0] https://github.com/pillarjs/router/blob/2e7fb67ad1b0c
               | 1cd2d9e...
        
           | everforward wrote:
            | I may be alone here, but I don't find it that absurd
            | (though the implementation may be, if JS doesn't support
            | this well; no idea). This would be crazy in languages that
            | actually enforce typing, but overloading on the function's
            | type signature to alter behavior seems semi-common in
            | languages like Python and JS.
           | 
           | Entirely unrelated, but the older I get, the more it seems
           | like exposing the things under ".prototype" as parts of the
           | object was probably a mistake. If I'm not mistaken, that is
           | reflection, and it feels like JS reaches for reflection much
           | more often than other languages. I think in part because it's
           | a native part of the object rather than a reflection library,
           | so it feels like less of an anti-pattern.
        
             | lifthrasiir wrote:
             | To be clear, distinguishing different types based on arity
             | would have been okay if JS was statically typed or
             | `Function` exposed more thorough information about its
             | signature. `Function.prototype.length` is very primitive
             | (it doesn't count any spread argument, partly because it
             | dates back to the first edition of ECMAScript) and there is
             | even no easy way to override it like Python's
             | `@functools.wraps`. JS functions also don't check the
             | number of given arguments at all, which is already much
             | worse compared to Python but anyway, JS programmers like me
             | would have reasonably expected excess arguments to be
             | simply ignored.
        
               | everforward wrote:
               | > JS functions also don't check the number of given
               | arguments at all
               | 
               | I never really thought about this, but it does explain
               | how optional arguments without a default value work in
               | Typescript. How very strange of a language decision.
               | 
               | > To be clear, distinguishing different types based on
               | arity would have been okay if JS was statically typed or
               | `Function` exposed more thorough information about its
               | signature.
               | 
               | I actually like this less in a system with better typing.
               | I don't personally think it's a good tradeoff to
               | dramatically increase the complexity of the types just to
               | avoid having a separate method to register a chunked
               | handler. It would make more sense to me to have
               | "app.get()" and "app.getChunked()", or some kind of
               | closure that converts a chunked handler to something
               | app.get() will allow, like "app.get(chunked((req, res,
               | next) => {}))".
               | 
               | The typing effectively becomes part of the control flow
               | of the application, which is something I tend to prefer
               | avoiding. Data modelling should model the domain, code
               | should implement business logic. Having data modelling
               | impact business logic feels like some kind of recursive
               | anti-pattern, but I'm not quite clever enough to figure
               | out why it makes me feel that way.
        
         | ahoka wrote:
         | Some frameworks do automatic chunking when you pass a stream as
         | the response body.
        
         | dbrueck wrote:
         | Chunked transfer encoding can be a pain, but it's a reasonable
         | solution to several problems: when the response is too big to
         | fit into memory, when the response size is unknown by the HTTP
         | library, when the response size is unknown by the caller of the
         | HTTP library, or when the response doesn't have a total size at
         | all (never-ending data stream).
        
         | danudey wrote:
         | > explicitly define the Content-Length up-front (clients then
         | usually don't like it if you send too little and servers don't
         | like it if you send too much)
         | 
         | We had a small router/firewall thing at a previous company that
         | had a web interface, but for some reason its Content-Length
         | header had an off-by-one error. IIRC Chrome handled this okay
         | (once the connection was closed it would display the content)
         | while Firefox would hang waiting for that one extra byte that
         | never came.
        
       | aragilar wrote:
       | Note that there can be trailer fields (the phrase "trailing
       | header" is both an oxymoron and a good description of it):
       | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Tr...
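        | 
        | In Go, for instance, a trailer is announced before the body
        | and filled in after it. A sketch ("Checksum" is a made-up
        | field name):
        | 
        |   package main
        | 
        |   import (
        |       "fmt"
        |       "net/http"
        |   )
        | 
        |   func handler(w http.ResponseWriter, r *http.Request) {
        |       // Announce the trailer field before the body starts.
        |       w.Header().Set("Trailer", "Checksum")
        |       fmt.Fprint(w, "some streamed body")
        |       // Set the value after the body; it's sent after the
        |       // last chunk. This only works when the response ends
        |       // up chunked, i.e. no Content-Length was set.
        |       w.Header().Set("Checksum", "abc123")
        |   }
        | 
        |   func main() {
        |       http.HandleFunc("/", handler)
        |       http.ListenAndServe(":8080", nil)
        |   }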
        
         | eastbound wrote:
         | Oh yeah, put the Content-Length _after_ the content, when you
         | know its size!  /j
        
       | _ache_ wrote:
       | > Anyone who has implemented a simple HTTP server can tell you
       | that it is a really simple protocol
       | 
        | It's not. Like, hell no. It is so complex: multiplexing,
        | underlying TCP specifications, Server Push, stream
        | prioritization (vs. priorization!), encryption (ALPN or
        | NPN?), extensions like HSTS, CORS, WebDAV or HLS, ...
        | 
        | It's a great protocol, nowhere near simple.
        | 
        | > Basically, it's a text file that has some specific rules to
        | > make parsing it easier.
        | 
        | Nope. Since HTTP/2 that is just a textual representation, not
        | the real on-the-wire protocol. HTTP/2 is 10 years old now.
        
         | TickleSteve wrote:
         | He was referring to HTTP 1.0 & 1.1
        
           | lionkor wrote:
            | Which is simple enough that you can build a complete 1.0
            | server in C in an afternoon, and add some 1.1 stuff like
            | keep-alives, content-length, etc. I did that for fun
            | once, at github.com/lionkor/http
        
             | secondcoming wrote:
             | That's the easy part. The hard part is working around non-
             | compliant third parties. HTTP is a real mess.
        
               | lionkor wrote:
               | true
        
           | _ache_ wrote:
            | It should have said so, then (nowhere does the article
            | say "up to HTTP/1.1", and it even talks about HTTP/2 and
            | HTTP/3).
            | 
            | HTTP/1.0 is simple. HTTP/1.1 is undoubtedly more complex
            | but manageable.
            | 
            | The statement that HTTP is simple is just not true, even
            | if Go makes it look easy.
        
             | Cthulhu_ wrote:
             | Every example in the article explicitly states HTTP 1.1,
             | only at the end does it have a remark about how HTTP 2 and
             | 3 don't have chunking as they have their own streaming
              | mechanisms. The basic HTTP protocol _is_ simple, but 2/3
              | are no longer the basic HTTP protocols.
        
               | _ache_ wrote:
                | My point is that HTTP isn't just HTTP/1.1. There is a
                | lot under the hood, even with HTTP/1.1. Actually, the
                | fact that a whole article is needed to explain the
                | implementation of one HTTP header argues against the
                | claim that it's simple.
                | 
                | So when the article says "All HTTP requests look
                | something like this", that's false. That is not a big
                | deal, but it spreads the idea that HTTP is easy, and
                | it's not.
        
           | mannyv wrote:
           | Parts of 1.1 are pretty complicated if you try and implement
           | them. Parts of it are simple.
           | 
           | The whole section on cache is "reality based," and it's only
           | gotten worse as the years have moved on.
           | 
            | Anyway, back in the day Content-Length was one of the
            | fields you were never supposed to trust. There's really no
            | reason to trust it now either, but I suppose you can use
            | it as a hint for how much buffer you're supposed to
            | allocate. But of course the content may exceed that
            | length, which would mean that if you did it incorrectly
            | you'd copy the incoming request data past the end of the
            | buffer.
            | 
            | So even today, don't trust Content-Length.
        
             | treflop wrote:
             | You can't trust lengths in any protocol or format.
        
           | dbrueck wrote:
           | The HTTP 1.1 spec isn't 175 pages long just for the fun of
           | it. :)
        
       | lloeki wrote:
        | Chunked progress is fun; not many know chunked encoding
        | supports more than just sending the chunk size: it can
        | synchronously multiplex information!
        | 
        | E.g. I drafted this a long time ago, because if you generate
        | something live and send it in a streaming fashion, you can't
        | have progress reporting, since you don't know the final size
        | in bytes even though server-side you know how far into
        | generating you are.
        | 
        | This was used for multiple things, like generating CSV
        | exports from a bunch of RDBMS records, or compressed tarballs
        | from a set of files, or a bunch of other silly things like
        | generating sequences (Fibonacci, random integers, whatever...)
        | that could take "a while" (as in, enough to be friendly and
        | report progress).
       | 
       | https://github.com/lloeki/http-chunked-progress/blob/master/...
        
         | oefrha wrote:
         | I once wrote a terminal-style interactive web app as a single
         | chunked HTML page, because I couldn't be bothered to implement
          | a websocket endpoint. HTML and inline JS are interwoven and
         | sent in reaction to user actions on the page. The only problem
         | is browsers think the page never finishes loading.
        
           | eastbound wrote:
           | Couldn't you put the initial page in its own separate html,
           | and load the rest as a long-running JS file using AJAX?
        
             | oefrha wrote:
             | Sure, but browsers don't do streaming execution of
             | JavaScript (need to download the entire script), so you
             | then need to manually stream the response and do HTML
             | patching / eval chunked JS, as opposed to the browser
             | trivially loading HTML / executing inline JS as the HTML
             | page is streamed in all by itself. It does solve the
             | loading spinner problem.
        
       | pknerd wrote:
       | Why would someone implement the chunk logic when websockets are
       | here? Am I missing something? What are the use cases?
        
         | blueflow wrote:
          | chunked-encoding is a method of encoding an HTTP response
          | body. The semantics for HTTP responses still apply: caching,
          | compression, etc.
         | 
         | Websocket is a different protocol that is started up via HTTP.
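          | 
          | For reference, a chunked body on the wire is just hex chunk
          | sizes interleaved with the data, ended by a zero-length
          | chunk:
          | 
          |   HTTP/1.1 200 OK
          |   Content-Type: text/plain
          |   Transfer-Encoding: chunked
          | 
          |   5\r\n            <- chunk size in hex
          |   hello\r\n        <- chunk data
          |   6\r\n
          |    world\r\n
          |   0\r\n            <- zero-length chunk: end of body
          |   \r\n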
        
         | rsynnott wrote:
         | HTTP/1.1 came out in 1997. It's extremely well supported.
         | Websockets were only standardised in 2011, and still have proxy
         | traversal issues.
         | 
         | You can absolutely assume that http 1.1 will work on basically
         | anything; websockets are more finicky even now, and certainly
         | were back in the day.
        
           | masklinn wrote:
           | Websockets are also on the far side of useless when it comes
           | to streaming content the user is _downloading_. Javascript-
           | filled pages are not the only clients of http.
        
             | dylan604 wrote:
             | > Javascript-filled pages are not the only clients of http.
             | 
             | Whaaaaa??? We should eliminate these non-JS filled nonsense
             | immediately!
        
             | badmintonbaseba wrote:
              | It wouldn't be so bad if web and application APIs made
              | stream processing of messages possible. The protocol
              | itself could handle streaming content just fine, or at
              | least no worse than HTTP.
        
         | mannyv wrote:
         | Web sockets have their own issues that can be/are
         | implementation dependent.
         | 
         | For example, some websocket servers don't pass back errors to
         | the client (AWS). That makes it quite difficult to, say, retry
         | on the client side.
         | 
         | Chunked encoding is used by video players - so you can request
         | X bytes of a video file. That means you don't have to download
         | the whole file, and if the user closes the video you didn't
         | waste bandwidth. There are likely more uses of it.
        
           | floating-io wrote:
           | > Chunked encoding is used by video players - so you can
           | request X bytes of a video file. That means you don't have to
           | download the whole file, and if the user closes the video you
           | didn't waste bandwidth. There are likely more uses of it.
           | 
           | Just a nitpick, but what you describe here is byte range
           | requests. They can be used with or without chunked encoding,
           | which is a separate thing.
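            | 
            | And serving ranges is usually free; in Go, for example,
            | http.ServeFile answers them on its own (path made up):
            | 
            |   package main
            | 
            |   import "net/http"
            | 
            |   func main() {
            |       // ServeFile handles "Range: bytes=0-1023" itself,
            |       // replying "206 Partial Content" with a
            |       // Content-Range header; no chunked encoding needed.
            |       http.HandleFunc("/video",
            |           func(w http.ResponseWriter, r *http.Request) {
            |               http.ServeFile(w, r, "/data/movie.mp4")
            |           })
            |       http.ListenAndServe(":8080", nil)
            |   }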
        
         | david422 wrote:
          | I would say as a rule of thumb: websockets are for two-way
          | realtime communication, HTTP chunked is just for one-way
          | streaming communication.
        
         | pknerd wrote:
         | Wow. Downvoted
        
       | flohofwoe wrote:
        | Unfortunately the article doesn't mention compression,
        | because this is where it gets really ugly (especially with
        | range requests): IIRC the Content-Length reported in HTTP
        | responses and the ranges in range requests apply to the
        | compressed data, but at least in browsers you only get the
        | uncompressed data back and don't even have access to the
        | compressed data.
        
         | meindnoch wrote:
         | +1
         | 
         | This is why in 2024 you still must use XmlHttpRequest instead
         | of fetch() when progress reporting is needed. fetch() cannot do
         | progress reporting on compressed streams.
        
           | shakna wrote:
           | Once the header is read, you can iterate over the
           | ReadableStream, though, can't you?
        
             | meindnoch wrote:
             | 1. You know the size of the _compressed_ data from the
             | Content-Length header.
             | 
             | 2. You can iterate through the _uncompressed_ response
             | bytes with a ReadableStream.
             | 
             | Please explain how would you produce a progress percentage
             | from these?
        
               | lmz wrote:
               | If you had control of both ends you could embed a header
               | in the uncompressed data with the number of uncompressed
               | bytes.
        
               | mananaysiempre wrote:
               | Or put that length in the response headers.
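                | 
                | Something like this on the server side, assuming
                | net/http and strconv are imported (the header name is
                | made up, not any standard):
                | 
                |   // Record the original size before the body goes
                |   // through the gzip layer, so the client can do its
                |   // progress math against it.
                |   func setRawSize(w http.ResponseWriter, size int64) {
                |       w.Header().Set("X-Uncompressed-Length",
                |           strconv.FormatInt(size, 10))
                |   }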
        
               | meindnoch wrote:
               | You won't be able to do this if you're downloading from a
               | CDN. Which is exactly where you would host large files,
               | for which progress reporting really matters.
        
               | floating-io wrote:
               | Why not? Every CDN I've ever worked with preserves custom
               | headers.
        
               | meindnoch wrote:
               | Right. For example S3 supports custom headers, _as long
               | as that header happens to start with "x-amz-meta-..."_ -
               | and now your progress reporting is tied to your CDN
               | choice!
               | 
               | Not sure about you, but to me "XmlHttpRequest" in my
               | request handling code feels less dirty than "x-amz-
               | meta-". But to each their own I guess.
        
               | meindnoch wrote:
               | Good idea. We could give it a new MIME type too. E.g.
               | application/octet-stream-with-length-prefix-due-to-
               | idiotic-fetch-api
               | 
               | Or we can keep using XmlHttpRequest.
               | 
               | Tough choice.
        
               | badmintonbaseba wrote:
               | "processed uncompressed bytes"/"all uncompressed bytes"
               | is a distorted progress indication anyway.
        
               | jstanley wrote:
               | Recompress the ReadableStream to work out roughly how
               | long the compressed version is, and use the ratio of the
               | length of your recompressed stream to the Content-Length
               | to work out an approximate progress percentage.
        
               | meindnoch wrote:
               | Lol! Upvoted for creative thinking.
        
       | jaffathecake wrote:
       | The results might be totally different now, but back in 2014 I
       | looked at how browsers behave if the resource is different to the
       | content-length
       | https://github.com/w3c/ServiceWorker/issues/362#issuecomment...
       | 
       | Also in 2018, some fun where when downloading a file, browsers
       | report bytes written to disk vs content-length, which is wildly
       | out when you factor in gzip
       | https://x.com/jaffathecake/status/996720156905820160
        
       | Am4TIfIsER0ppos wrote:
       | stat()?
        
       | TZubiri wrote:
       | len(response)
        
       | simonjgreen wrote:
        | Along this theme of knowledge, there is the lost art of
        | tuning your page and content sizes so that they fit in as few
        | packets as possible, to speed up transmission. The front page
        | of Google, for example, famously fit in a single packet (I
        | don't know if that's still the case). There is a brilliant
        | book from the Yahoo Exceptional Performance Team that used to
        | be a bit of a bible in the world of web sysadmin; it's less
        | relevant these days but interesting for understanding the
        | era.
       | 
       | https://www.oreilly.com/library/view/high-performance-web/97...
        
         | NelsonMinar wrote:
         | See also the 14KB website article: https://endtimes.dev/why-
         | your-website-should-be-under-14kb-i...
         | 
         | Optimizing per-packet really improves things but has gotten
         | very difficult with SSL and now QUIC. I'm not sure Google ever
         | got the front page down to a single packet (would love a
         | reference!) but it definitely paid very close attention to
         | every byte and details of TCP performance.
        
           | ryantownsend wrote:
           | iirc, most content delivery networks have now configured
           | initcwnd to be around 40, meaning ~58kb gets sent within the
           | TCP slow start window and therefore 14kb is no longer
           | relevant to most commercial websites (at least with H1/H2, as
           | you mentioned QUIC/H3 uses UDP so it's different)
        
             | divbzero wrote:
              | When and where did you hear that _initcwnd_ is typically
              | 40 for most CDNs?
             | 
             | I was curious but the most recent data I could find was
             | from 2017 when there was a mix of CDNs at _initcwnd=10_ and
             | _initcwnd >10_:
             | 
             | https://www.cdnplanet.com/blog/initcwnd-settings-major-
             | cdn-p...
             | 
             | Currently Linux still follows RFC6928 and defaults to
             | _initcwnd=10_ :
             | 
             | https://github.com/torvalds/linux/blob/v6.11/include/net/tc
             | p...
        
           | eastbound wrote:
            | And the good old days when IE only supported 31KB of
            | JavaScript.
        
             | recursive wrote:
             | It's time to bring back that rule.
        
               | o11c wrote:
               | I recently did a deep dive into the history of JavaScript
               | standards:                 (pre-ecmascript versions of JS
               | not investigated)       EcmaScript 1(1997) = JavaScript
               | 1.1 - missing many ES3 features (see below), of which
               | exceptions are the unrecoverable thing.       EcmaScript
               | 2(1998) - minimal changes, mostly deprecations and
               | clarifications of intent, reserve Java keywords
               | EcmaScript 3(1999) - exceptions, regexes, switch, do-
               | while, instanceof, undefined, strict equality, encodeURI*
               | instead of escape, JSON, several methods on
               | Object/String/Array/Date       EcmaScript 4(2003) - does
               | not exist due to committee implosion       EcmaScript
               | 5(2009) - strict mode, getters/setters, remove
               | reservations of many Java keywords, add reservation for
               | let/yield, debugger, many static functions of Object,
               | Array.isArray, many Array methods, String().trim method,
               | Date.now, Date().toISOString, Date().toJSON
               | EcmaScript 5.1(2011) - I did not notice any changes
               | compared to ES5, likely just wording changes. This is the
               | first one that's available in HTML rather than just PDF.
               | EcmaScript 6(2015) - classes, let/const, symbols, modules
               | (in theory; it's $CURRENTYEAR and there are still major
               | problems with them in practice), and all sorts of things
               | (not listed)       EcmaScript 11(2020) - bigint,
               | globalThis
               | 
               | If it were up to me, I'd restrict the web to ES3 with ES5
               | library features, let/const from ES6, and
               | bigint/globalThis from ES2020. That gives correctness and
               | convenience without tempting people to actually try to
               | write complex logic in it.
               | 
               | There _are_ still pre-ES6 implementations in the wild
               | (not for the general web obviously) ... from what I 've
               | seen they're mostly ES5, sometimes with a few easy ES6
               | features added.
        
           | divbzero wrote:
           | The "Slow-Start" section [1] of Ilya Grigorik's _High
           | Performance Browser Networking_ also has a good explanation
           | for why 14 KB is typically the size of the initial congestion
           | window.
           | 
           | [1]: https://hpbn.co/building-blocks-of-tcp/#slow-start
        
         | benmmurphy wrote:
          | It is interesting how badly some of the HTTP/2 clients in
          | browsers send the first HTTP/2 request on a connection. It's
          | often possible to smush it all into 1 TCP packet, but
          | browsers are often sending the request in 3 or 4 packets.
          | I've even seen some server-side bot detection systems check
          | for this braindead behaviour to verify it's really a browser
          | making the request. I think this is due to the way all the
          | abstractions interact and the lack of a corking option for
          | the TLS library.
        
       | remon wrote:
       | Totally worth an article.
        
       | skrebbel wrote:
       | I thought I knew basic HTTP 1(.1), but I didn't know about
       | trailers! Nice one, thanks.
        
         | marcosdumay wrote:
         | Half of the middleboxes on the way in the internet don't know
         | about them either.
        
           | klempner wrote:
           | And browsers don't support them either, at least in any
           | useful manner.
        
       | jillesvangurp wrote:
       | It's a nice exercise in any web framework to figure out how you
       | would serve a big response without buffering it in memory. This
       | can be surprisingly hard with some frameworks that just assume
       | that you are buffering the entire response in memory. Usually, if
       | you look hard there is a way around this.
       | 
       | Buffering can be appropriate for small responses; or at least
       | convenient. But for bigger responses this can be error prone. If
       | you do this right, you serve the first byte of the response to
       | the user before you read the last byte from wherever you are
       | reading (database, file system, S3, etc.). If you do it wrong,
       | you might run out of memory. Or your user's request times out
       | before you are ready to respond.
       | 
       | This is a thing that's gotten harder with non-blocking
       | frameworks. Spring Boot in particular can be a PITA on this front
       | if you use it with non-blocking IO. I had some fun figuring that
       | out some years ago. Using Kotlin makes it slightly easier to deal
       | with low level Spring internals (fluxes and what not).
       | 
       | Sometimes the right answer is that it's too expensive to figure
       | out the content length, or a content hash. Whatever you do, you
       | need to send the headers with that information before you send
       | anything else. And if you need to read everything before you can
       | calculate that information and send it, your choices are
       | buffering or omitting that information.
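        | 
        | A sketch of the right shape in Go terms, with a channel
        | standing in for a DB cursor or S3 reader: the first rows hit
        | the socket while later ones are still being produced, so
        | memory stays flat and there's no Content-Length (hence
        | chunked encoding).
        | 
        |   package main
        | 
        |   import (
        |       "fmt"
        |       "net/http"
        |   )
        | 
        |   func export(w http.ResponseWriter, r *http.Request) {
        |       rows := make(chan string)
        |       go func() {
        |           defer close(rows)
        |           for i := 0; i < 1000000; i++ {
        |               rows <- fmt.Sprintf("row-%d", i)
        |           }
        |       }()
        |       for row := range rows {
        |           // Each row is written before the next is fetched;
        |           // the server's own buffering batches the writes.
        |           fmt.Fprintln(w, row)
        |       }
        |   }
        | 
        |   func main() {
        |       http.HandleFunc("/export.csv", export)
        |       http.ListenAndServe(":8080", nil)
        |   }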
        
         | jerf wrote:
         | "This can be surprisingly hard with some frameworks that just
         | assume that you are buffering the entire response in memory.
         | Usually, if you look hard there is a way around this."
         | 
         | This is the #1 most common mistake made by a "web framework".
         | 
         | Before $YOU jump up with a list of exceptions, it slowly gets
         | better over time, and it has been getting better for a while,
         | and there are many frameworks in the world, so the list that
         | get it right is quite long. But there's still a lot of
         | frameworks out there that assume this, that consider streaming
         | to be the "exception" rather than non-streaming being a special
         | case of streaming, and I still see new people make this mistake
         | with some frequency, so the list of frameworks that still
         | incorporate this mistake into their very core is also quite
         | long.
         | 
         | My favorite is when I see a new framework sit on top of
         | something like Go that properly streams, and it actively wrecks
         | the underlying streaming capability to turn an HTTP response
         | into a string.
         | 
         | Streaming properly is harder in the short term, but writing a
         | framework where all responses are strings becomes harder in the
         | long term. You eventually hit the wall where that is no longer
         | feasible, but then, fixing it becomes very difficult.
         | 
         | Simply not sending a content-length is often the right answer.
         | In an API situation, whatever negative consequences there are
         | are fairly muted. The real problem I encounter a lot is when
         | I'm streaming out some response from some DB query and I
         | encounter a situation that I would have yielded a 500-type
         | response for after I've already streamed out some content. It
         | can be helpful to specify in your API that you may both emit
         | content _and_ an error and users need to check both. For
         | instance, in the common case of dumping JSON, you can spec a
         | top-level { "results": [...], "error": ...} as your return
         | type, stream out a "results", but if a later error occurs,
         | still return an "error" later. Arguably suboptimal, but
         | requiring all errors to be known up front in a streaming
         | situation is impossible, so... suboptimal wins over impossible.
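          | 
          | A sketch of that envelope in Go, with fetchRows standing in
          | for the real query:
          | 
          |   package main
          | 
          |   import (
          |       "encoding/json"
          |       "fmt"
          |       "net/http"
          |   )
          | 
          |   // fetchRows stands in for a DB query; it may fail after
          |   // having already produced some rows.
          |   func fetchRows(emit func(any)) error {
          |       for i := 0; i < 3; i++ {
          |           emit(map[string]int{"n": i})
          |       }
          |       return nil // or an error discovered mid-stream
          |   }
          | 
          |   // Stream "results" as they arrive; either way, close the
          |   // document with an "error" member so clients can check
          |   // both.
          |   func handler(w http.ResponseWriter, r *http.Request) {
          |       w.Header().Set("Content-Type", "application/json")
          |       fmt.Fprint(w, `{"results":[`)
          |       first := true
          |       err := fetchRows(func(row any) {
          |           if !first {
          |               fmt.Fprint(w, ",")
          |           }
          |           first = false
          |           // Encode's trailing newline is legal JSON
          |           // whitespace.
          |           json.NewEncoder(w).Encode(row)
          |       })
          |       fmt.Fprint(w, `],"error":`)
          |       if err != nil {
          |           json.NewEncoder(w).Encode(err.Error())
          |       } else {
          |           fmt.Fprint(w, "null")
          |       }
          |       fmt.Fprint(w, "}")
          |   }
          | 
          |   func main() {
          |       http.HandleFunc("/rows", handler)
          |       http.ListenAndServe(":8080", nil)
          |   }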
        
       | nraynaud wrote:
        | I have done crazy stuff to compute the content length of some
        | payloads. For context, one of my clients works in cloud stuff
        | and I worked on converting HDD formats on the fly in a UI VM.
        | The webserver that accepts the files doesn't do chunked
        | encoding, and there is no space to store the converted file.
        | So I had to resort to passing over the input file once to
        | transform it, compute its allocation table and transformed
        | size, then throw away everything but the file and the table,
        | restart the scan with the correct header, and re-do the
        | transformation.
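        | 
        | The counting pass can at least be cheap: run the transform
        | into a writer that throws the bytes away and keeps only the
        | tally. A sketch of that part:
        | 
        |   package main
        | 
        |   import (
        |       "fmt"
        |       "io"
        |   )
        | 
        |   // countingWriter discards bytes but remembers how many
        |   // went by.
        |   type countingWriter struct{ n int64 }
        | 
        |   func (c *countingWriter) Write(p []byte) (int, error) {
        |       c.n += int64(len(p))
        |       return len(p), nil
        |   }
        | 
        |   // contentLength runs the transformation once, purely to
        |   // learn the exact size for the second, real pass.
        |   func contentLength(transform func(io.Writer) error) (int64, error) {
        |       var cw countingWriter
        |       err := transform(&cw)
        |       return cw.n, err
        |   }
        | 
        |   func main() {
        |       n, _ := contentLength(func(w io.Writer) error {
        |           _, err := io.WriteString(w, "transformed payload")
        |           return err
        |       })
        |       fmt.Println(n) // 19
        |   }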
        
       | dicroce wrote:
       | At least in the implementation I wrote the default way to provide
       | the body was a string... which has a length. For binary data I
       | believe the API could accept either a std::vector<uint8_t> (which
       | has a size) or a pointer and a size. If you needed chunked
       | transfer encoding you had to ask for it and then make repeated
       | calls to write chunks (that each have a fixed length).
       | 
        | To me the more interesting question is how web servers
        | receive an incoming request. You want to be able to read the
        | whole thing into a single buffer, but you don't know how long
        | it's going to be until you actually read some of it. I
        | learned recently that libc has a way to "peek" at some data
        | without removing it from the recv buffer... I'm curious if
        | this is ever used to optimize the receive process?
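        | 
        | That peek is recv(2) with the MSG_PEEK flag. A Unix-only Go
        | sketch (error handling trimmed):
        | 
        |   package main
        | 
        |   import (
        |       "fmt"
        |       "net"
        |       "syscall"
        |   )
        | 
        |   // peek reads up to len(buf) bytes without consuming them;
        |   // a later Read returns the same bytes.
        |   func peek(c *net.TCPConn, buf []byte) (int, error) {
        |       raw, err := c.SyscallConn()
        |       if err != nil {
        |           return 0, err
        |       }
        |       var n int
        |       var rerr error
        |       err = raw.Read(func(fd uintptr) bool {
        |           n, _, rerr = syscall.Recvfrom(int(fd), buf,
        |               syscall.MSG_PEEK)
        |           return rerr != syscall.EAGAIN // retry until readable
        |       })
        |       if err != nil {
        |           return 0, err
        |       }
        |       return n, rerr
        |   }
        | 
        |   func main() {
        |       ln, _ := net.Listen("tcp", "127.0.0.1:0")
        |       go func() {
        |           c, _ := net.Dial("tcp", ln.Addr().String())
        |           c.Write([]byte("GET / HTTP/1.1\r\n"))
        |       }()
        |       conn, _ := ln.Accept()
        |       buf := make([]byte, 512)
        |       n, _ := peek(conn.(*net.TCPConn), buf)
        |       fmt.Printf("peeked %d bytes: %q\n", n, buf[:n])
        |   }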
        
         | mikepurvis wrote:
          | Not sure about Linux, but for LwIP on embedded, the buffer
          | isn't contiguous; it's a linked list of preallocated pbuf
          | objects. So you can either read directly from those, if you
          | play by their rules, or if you really do need it in
          | contiguous memory you call a function to copy from LwIP's
          | buffers to one you supply.
        
       | AndrewStephens wrote:
       | When I worked on a commercial HTTP proxy in the early 2000s, it
       | was very common for servers to return off-by-one values for
       | Content-Length - so much so that we had to implement heuristics
       | to ignore and fix such errors.
       | 
       | It may be better now but a huge number of libraries and
       | frameworks would either include the terminating NULL byte in the
       | count but not send it, or not include the terminator in the count
       | but include it in the stream.
        
       | matthewaveryusa wrote:
       | Next up is how forms with (multiple) attachments are uploaded
       | with Content-Type=multipart/form-data; boundary=$something_unique
       | 
       | https://notes.benheater.com/books/web/page/multipart-forms-a...
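        | 
        | The assembly side is mechanical, e.g. with Go's
        | mime/multipart (field and file names made up); note the final
        | Content-Length is just the length of the finished body:
        | 
        |   package main
        | 
        |   import (
        |       "bytes"
        |       "fmt"
        |       "mime/multipart"
        |   )
        | 
        |   func main() {
        |       var body bytes.Buffer
        |       mw := multipart.NewWriter(&body) // picks the boundary
        | 
        |       mw.WriteField("comment", "see attached")
        |       fw, _ := mw.CreateFormFile("upload", "notes.txt")
        |       fw.Write([]byte("file contents here"))
        |       mw.Close() // writes the closing boundary
        | 
        |       fmt.Println("Content-Type:", mw.FormDataContentType())
        |       fmt.Println("Content-Length:", body.Len())
        |       fmt.Print(body.String())
        |   }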
        
       | Sytten wrote:
        | There is a whole class of attacks called HTTP Desync Attacks
        | that target just that problem, since it is hard to get right,
        | especially across multiple different HTTP stacks. And if you
        | don't get it right, the result is that bytes are left on the
        | TCP connection and read as the next request in case of reuse.
        
       ___________________________________________________________________
       (page generated 2024-10-07 23:01 UTC)