[HN Gopher] HTTPWTF
___________________________________________________________________
HTTPWTF
Author : pimterry
Score : 477 points
Date : 2021-03-04 15:27 UTC (7 hours ago)
(HTM) web link (httptoolkit.tech)
(TXT) w3m dump (httptoolkit.tech)
| [deleted]
| bombcar wrote:
| Referer being spelled wrong - I KNEW something was wrong about it
| every time I saw it but it never actually clicked.
| indentit wrote:
| I just figured it was one of those words spelt differently in
| American English, which most RFCs etc are written in. (British
| English native here.)
| sophacles wrote:
| > (British English native here.)
|
| That's why you spelled 'spelled': _spelt_ :D
| grishka wrote:
| It's a bit infuriating when English isn't your native language
| because I could never remember the right spelling.
| Zash wrote:
| Since we're sharing our own WTFs;
|
| You can include the same header multiple times in an HTTP
| message, and this is equivalent to having one such header with a
| comma-separated list of values.
|
| Then there's WWW-Authenticate (the one telling you to re-try with
| credentials). It has a comma-separated list of parameters.
|
| The combination of those two leads to brokenness, like how
| recently an API would not get Firefox to ask for a username
| and password, because it happened to have put "Bearer" before
| "Basic" in the list.
|
| https://tools.ietf.org/html/rfc7235#section-4.1
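A minimal Python sketch of the folding rule Zash describes (function names here are illustrative): joining repeated headers with commas makes the WWW-Authenticate challenge list ambiguous, because commas also separate each challenge's parameters.

```python
# Folding repeated header fields into one comma-separated value,
# as HTTP/1.1 permits.
def fold_headers(pairs):
    """Combine repeated header fields into a single comma-joined value."""
    folded = {}
    for name, value in pairs:
        key = name.lower()
        folded[key] = folded[key] + ", " + value if key in folded else value
    return folded

# Two separate WWW-Authenticate challenges...
pairs = [('WWW-Authenticate', 'Bearer realm="api"'),
         ('WWW-Authenticate', 'Basic realm="api"')]
folded = fold_headers(pairs)
# ...become one value whose commas now separate both challenges
# *and* parameters within each challenge -- good luck parsing that.
print(folded["www-authenticate"])
```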
| superhawk610 wrote:
| This article [1] is a really great read on some of the pitfalls
| you encounter due to the way duplicate headers are parsed in
| different browsers (skip to "Let's talk about HTTP headers" if
| you want to jump right into the code).
|
| [1]: https://fasterthanli.me/articles/aiming-for-correctness-
| with...
| richdougherty wrote:
| And some headers have their own exceptions to this.
|
| The Set-Cookie header (sent by the server) should always be
| sent as multiple headers, not comma-separated, as user agents
| may follow Netscape's original spec.
| https://stackoverflow.com/questions/2880047/is-it-possible-to-set-more-than-one-cookie-with-a-single-set-cookie
| https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
|
| On the other hand in HTTP/1.1 the Cookie header should always
| be sent as a single header, not multiple. In HTTP/2, they may
| be sent as separate headers to improve compression. :)
| https://stackoverflow.com/questions/16305814/are-multiple-cookie-headers-allowed-in-an-http-request
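A quick illustration of why Set-Cookie is the exception: the Expires attribute itself contains a comma, so anything that comma-splits a folded Set-Cookie value mangles it (Python sketch):

```python
# Why Set-Cookie can't be safely comma-folded: the Expires attribute
# contains a comma, so splitting the folded value on commas yields
# garbage "cookies".
header = 'id=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT, theme=dark'
naive_split = [part.strip() for part in header.split(",")]
print(len(naive_split))  # 3 pieces, but there are only 2 real cookies
```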
| wincy wrote:
| I'm guessing based on the username OP is the original author,
| caught a typo that could trip a novice up if they're reading :
|
| This becomes useful though if you send a request including a
| Except: 100-continue header. That header tells the server you
| expect a 100 response, and you're not going to send the full
| request body until you receive it.
|
| I'm guessing that should be Expect?
|
| Overall interesting article, thanks for writing it!
| plmpsu wrote:
| In the same section, there's also a reference to the 101 status
| instead of 100.
| pimterry wrote:
| Another good spot, now also fixed, thanks!
| pimterry wrote:
| Good catch! Thanks for that, now fixed.
| wrboyce wrote:
| Oh, hey Tim! Hope life is treating you well!
| pimterry wrote:
| Haha, hey Will! The internet is a small world :-)
| 0xy wrote:
| This is a little bit of a tangent but I sure love working with
| Websockets. It really feels like Websockets are what HTTP
| should've been. Asynchronous realtime communication.
|
| When things happen on the site and it's shown to the customer
| immediately via WS, it's just a delightful experience.
| Jach wrote:
| Web goes round and round. ActiveX, Java applets, and Flash all
| supported sockets. It is nice that we can have them now without
| such things, but it's not like they're only a boon with no
| tradeoffs.
| gmfawcett wrote:
| In its day, I don't think HTTP would ever have escaped orbit if
| it had been designed as a stateful protocol, like Websockets.
| luhn wrote:
| imo, Server-Sent Events are the better solution for realtime
| updates. Sometimes you need the stateful bidirectional protocol
| WebSockets offer, but most of the time HTTP for RPC and SSE for
| streaming updates gets you where you need to go with standard
| HTTP, no special protocols.
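For reference, the SSE wire format is just lines of text over a long-lived response; a hypothetical helper to frame one event might look like this (Python sketch, names illustrative):

```python
def sse_event(data, event=None, event_id=None):
    """Format one Server-Sent Events message (text/event-stream)."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    if event_id:
        lines.append(f"id: {event_id}")
    # Multi-line payloads become multiple data: fields.
    lines.extend(f"data: {chunk}" for chunk in data.splitlines())
    return "\n".join(lines) + "\n\n"  # blank line terminates the event

print(sse_event("price=42", event="tick", event_id="1"))
```

The server just keeps writing events like these to an open response; the browser's EventSource API handles reconnection and last-event-id for free.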
| jayd16 wrote:
| They screwed this all up.
|
| Private? No. Cache!
| tpetry wrote:
| Hmm, could 'no-cache' meaning "please cache it" in reality have
| been the reason why, back then, those damn Internet Explorers
| cached ajax responses and the only way to solve it was to append
| a random query parameter?
| s3cur3 wrote:
| Reading that, I finally understood so many hours of debugging
| throughout my life.
| deathanatos wrote:
| I'd add to this list:
|
| Chunk extensions. Most people know HTTP/1.1 can return a
| "chunked" response body: it breaks the body up into chunks, so
| that we can send a response whose length we don't know in
| advance, but also it allows us to keep the connection open after
| we're done. What most people _don't_ know is that chunks can
| carry key-value metadata. The spec technically requires an
| implementation to at least parse them, though I think it is
| permitted to ignore them. I've never seen anything ever use this,
| and I hope that never changes. They're gone in HTTP/2. (So, also,
| if you thought HTTP/2 was backwards compatible: not technically!)
|
| The "Authorization" header: like "Referer", this header is
| misspelled. (It should be spelled "Authentication".) Same
| applies to "401 Unauthorized", which really ought to be "401
| Unauthenticated". ("Unauthorized" is "403 Forbidden", or
| sometimes "404 Not Found".)
|
| Also, header values. They basically require implementing a
| custom string type to handle correctly; they're a baroque mix of
| characters & "opaque octets".
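The chunk extensions mentioned above ride on the chunk-size line, e.g. `1a;name=value\r\n`. A hypothetical parser (Python sketch; as noted, real implementations are permitted to just discard the extensions):

```python
def parse_chunk_header(line):
    """Parse an HTTP/1.1 chunk-size line, e.g. b'1a;name=value\\r\\n'.
    Returns (size, extensions). Most implementations discard the
    extensions, which the spec permits."""
    line = line.rstrip(b"\r\n")
    size_part, _, ext_part = line.partition(b";")
    size = int(size_part, 16)  # chunk size is hexadecimal
    extensions = {}
    for ext in ext_part.split(b";") if ext_part else []:
        name, _, value = ext.partition(b"=")
        extensions[name.strip()] = value.strip()
    return size, extensions

print(parse_chunk_header(b"1a;trace=abc123\r\n"))
```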
| varajelle wrote:
| Also the ridiculous User-Agent header which everyone spoofs.
| [deleted]
| [deleted]
| jsmith45 wrote:
| Those key value pairs in chunked encoding ("chunk extensions")
| are spec'ed to only be hop-by-hop, which makes them more or
| less completely unsuitable for actually using by end
| applications. Any proxy or reverse proxy is allowed to strip
| them. Indeed it can be argued that a conformant proxy is
| required to strip them, due to the MUST-ignore requirement for
| unknown extension values. (I suspect most do not strip them, and there
| is an argument to be made that blindly passing them through if
| not changing encoding could be considered ignoring them, but
| I'm not certain that is actually a conforming interpretation).
|
| Plus surely there are many crusty middleboxes that will break
| if anybody tried to use that feature. Remember all the hoops
| websockets had to jump through to have much of a chance working
| for most people because of those? Many break badly if anything
| they were not programmed to handle tries to pass through.
| anticristi wrote:
| I used this tons for MJPEG streams.
| bluesmoon wrote:
| Another thing to note about custom headers is that when used in
| an XHR (eg: X-Requested-With), they will force a preflight
| request (with the OPTIONS method). If your webserver isn't
| configured to handle OPTIONS and return the correct CORS headers,
| that will effectively break clients.
|
| Best to just never use custom headers.
|
| I've written more about this here:
| https://developer.akamai.com/blog/2015/08/17/solving-options...
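A sketch of what handling that OPTIONS preflight involves (Python, illustrative names only; a real server should validate the origin against an allowlist rather than echoing it back):

```python
def preflight_response(request_headers):
    """Build a minimal CORS preflight (OPTIONS) response -- a sketch;
    a real server should check the Origin against an allowlist."""
    origin = request_headers.get("Origin", "")
    return {
        "status": 204,  # no body needed for a preflight answer
        "headers": {
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
            # Echo back the custom headers (e.g. X-Requested-With)
            # that triggered the preflight in the first place.
            "Access-Control-Allow-Headers":
                request_headers.get("Access-Control-Request-Headers", ""),
            "Access-Control-Max-Age": "86400",  # cache the preflight a day
        },
    }

resp = preflight_response({"Origin": "https://example.com",
                           "Access-Control-Request-Headers": "X-Requested-With"})
print(resp["status"])
```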
| airza wrote:
| I see this a lot as an anti CSRF technique in AJAX based SPAs.
| bluesmoon wrote:
| yeah, those techniques predate CORS, but even back then,
| you'd typically add your anti-csrf token to the payload
| rather than the header. CSRF is application level logic
| rather than protocol level.
| uuidgen wrote:
| > they will force a preflight request
|
| That's why they're so great. use a custom header and never
| worry about CSRF issues.
|
| Use custom header and be sure that if request comes from the
| browser it was made by legitimate code from your origin.
| pimterry wrote:
| Yep, you've got to be careful with browser HTTP requests!
| Conveniently on this very same site I built a CORS tool that
| knows all those rules and can tell you how they work for every
| case: https://httptoolkit.tech/will-it-cors/
| xg15 wrote:
| > _HTTP 103: When the server receives a request that takes a
| little processing, it often can't fully send the response
| headers until that processing completes. HTTP 103 allows the
| server to immediately nudge the client to download other content
| in parallel, without waiting for the requested resource data to
| be ready._
|
| Could someone explain why this needs a new status code at all? At
| the point where the new status code sends "early headers", the
| client was expecting the regular status code and headers anyway.
| Why could the server not simply do:
|
| 1) Receive request
|
| 2) Send 200 OK and early headers, but only send a single trailing
| newline (i.e., terminate the status line and last early header
| field, but don't terminate the header list as a whole)
|
| 3) Do the actual request processing, heavy lifting, etc
|
| 4) Send remaining headers, double-newline and response body, if
| any.
|
| On the client side, a client could simply start to preload link
| headers as soon as it receives them, without waiting for the
| whole response.
|
| This seems like it would lead to pretty much the same latency
| characteristics without needing to extend the protocol.
|
| The only major new ability I see is to send headers before the
| (final) status code. But what would be the use-case for that?
|
| Edit:
|
| The RFC[1] sheds some light on this: The point seems to be that
| the headers sent in an 103 are only "canon" if they are repeated
| in the final response. So a server could send a link header as an
| early hint, then effectively say "whoops, disregard that, I
| changed my mind" by _not_ sending the header again in the final
| response.
|
| I still don't see a lot of ways a client could meaningfully
| respond to that, but I guess it could at least abort preloading
| to save bandwidth or purge the resource from the cache if it was
| already preloaded.
|
| [1] https://tools.ietf.org/html/rfc8297#section-2
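For reference, a 103 exchange on the wire looks roughly like this; the interim response is advisory, and only headers repeated in the final response are authoritative (Python sketch):

```python
# What a 103 Early Hints exchange looks like on the wire: an interim
# response with Link headers, followed later by the final response.
raw = (b"HTTP/1.1 103 Early Hints\r\n"
       b"Link: </style.css>; rel=preload; as=style\r\n"
       b"\r\n"
       b"HTTP/1.1 200 OK\r\n"
       b"Link: </style.css>; rel=preload; as=style\r\n"
       b"Content-Length: 0\r\n"
       b"\r\n")

# A client keeps reading past any 1xx interim responses until it
# sees a final (non-1xx) status line.
responses = [r for r in raw.split(b"\r\n\r\n") if r]
statuses = [int(r.split(b" ")[1].decode()) for r in responses]
print(statuses)  # the 103 is advisory; only the 200 is final
```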
| derefr wrote:
| As with a 100, a 103 is _tentative_ -- it doesn't guarantee
| that the final result will be 2xx. This can happen if e.g. your
| web server is responsible for sending the early hints, before
| proxying to your app server.
| staplung wrote:
| I used to encourage back-end web developers to write a web server
| from scratch as a learning exercise. With HTTP 1.1 it was
| actually pretty easy to write one in C (plus berkeley sockets);
| the idea being that you learn a lot about how things actually
| work at the lowest level without spending an inordinate amount of
| time. It's not really practical with HTTP 2 anymore but in any
| case, having done my own exercise I had no idea about many of
| these quirks.
| jventura wrote:
| I teach web development and distributed systems in a local
| university, and one of my lab exercises is building an HTTP/1.0
| Server in Python with sockets. I do have a blog post [1] that
| shows how to do it if someone's interested..
|
| [1] https://joaoventura.net/blog/2017/python-webserver/
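In the same spirit, the core of such an exercise fits in a few lines of Python with raw sockets (a learning sketch, not production code):

```python
import socket

def serve_once(host="127.0.0.1", port=8080):
    """A bare-bones HTTP/1.0 server: accept one connection, read the
    request line, answer with a fixed response. Learning sketch only."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(4096).decode("latin-1")
            # Request line is e.g. "GET /index.html HTTP/1.0"
            method, path, _ = request.split("\r\n")[0].split(" ")
            body = f"You requested {path}\n".encode()
            conn.sendall(b"HTTP/1.0 200 OK\r\n"
                         b"Content-Type: text/plain\r\n"
                         b"Content-Length: " + str(len(body)).encode() +
                         b"\r\n\r\n" + body)
```

Run it, point a browser or curl at `http://127.0.0.1:8080/`, and you have seen the whole request/response cycle with no framework in the way.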
| smoldesu wrote:
| https://doc.rust-lang.org/book/ch20-00-final-project-a-web-s...
|
| The Rust Book has an awesome "final project" where it walks you
| through building a multi-threaded web server. If you're a
| battle-hardened C/C++ dev looking for an inroad to Rust, this
| is a great place to start.
| cosmodisk wrote:
| Never touched Rust but having skimmed through this, looks
| like a fantastic tutorial.
| steveklabnik wrote:
| Thank you!
| cmehdy wrote:
| Seems like you are the author of the book. Just wanted to
| say that this book makes me want to pick up Rust even
| though I have no specific goal for it, because the book
| is appealing in writing and appearance, layout and
| illustrations, ideas and execution.. basically good job
| and thank you!
| steveklabnik wrote:
| One of two authors. I'll share this with my co-author,
| thanks a ton :)
| smoldesu wrote:
| I'm also here to worship your work! The Rust book is one
| of my favorite documentations around, and just the other
| day I sent it to a colleague who was interested in
| learning Rust. Even though he only had experience in
| Typescript and Java, he made a working chess engine less
| than three days later.
| chucke wrote:
| This reminded me of a post I wrote,a couple of years ago:
| https://honeyryderchuck.gitlab.io/httpx/2019/02/10/falacies-...
| banana_giraffe wrote:
| My current favorite is chunked encoding.
|
| Does amazon.com really make its page more performant by sending
| 25 chunks less than 2k, some less than 50 bytes, while I'm trying
| to grab 115k for a page?
|
| It's all so weird to me.
| marcosdumay wrote:
| It may make a huge difference for their servers, that can
| generate 2kB of data and send you right away, instead of
| generating the entire 115kB before they can send it.
|
| Those 2 kB are a bit too small for top network performance, so
| you may see a negative impact. But if they increase it to
| something like 10kB, it's harmless.
| banana_giraffe wrote:
| I can get it sometimes, but some sites are just bizarre.
| twitter.com sends out a few dozen 74 byte chunks. I can't
| find it now, but I've seen pages composed of chunks of 10 or
| 20 bytes big. So much overhead.
| marcosdumay wrote:
| Some frameworks make it really easy to create reusable code
| that calculates something, pushes it into the network, and
| returns to the rest of the page.
|
| You are right that it's not a great thing to do. A little
| bit of buffering on the sender can improve things a lot.
| But it's an easy thing to do, so people do it.
| halter73 wrote:
| Usually it's because the app doesn't know the length of the
| entire response body up front and wants to start sending the
| response before buffering the whole thing. The 50 byte chunks
| probably aren't that useful, but that can happen as a
| consequence. Something like nagling can prevent those small
| chunks, but then there would likely be higher latency.
| zlynx wrote:
| From my experience (not with amazon) these strange chunk sizes
| come from non-blocking IO. When a source gets some data and
| triggers select/poll/epoll/whatever, the callback (or
| equivalent) immediately writes it out as a chunk.
|
| This works even better in HTTP/2 or HTTP/3 / QUIC. A Go server
| reading from a lot of microservices can produce pretty weird
| output on HTTP/2 because now not only is it in odd sizes
| determined by network timing, it doesn't even need to be in
| order.
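The framing overhead of tiny chunks is easy to quantify: each chunk costs a hex length line plus two CRLF pairs. A Python sketch of the encoder:

```python
def chunk_encode(pieces):
    """Encode byte strings with HTTP/1.1 chunked transfer coding."""
    out = b""
    for piece in pieces:
        # hex size, CRLF, chunk data, CRLF
        out += f"{len(piece):x}".encode() + b"\r\n" + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # zero-size chunk terminates the body

# Tiny chunks carry real overhead: 20 ten-byte chunks cost over
# 100 bytes of framing for 200 bytes of payload.
body = chunk_encode([b"x" * 10] * 20)
payload = 10 * 20
print(len(body) - payload, "bytes of framing for", payload, "bytes of data")
```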
| derefr wrote:
| I've been searching for a while for a good way to know whether a
| client has disconnected in the middle of a long-running HTTP
| request. (We do heavyweight SQL queries of indeterminate length
| in response to those requests, and we'd like to be able to cancel
| them, rather than wasting DB-server CPU cycles calculating
| reports nobody's going to consume.)
|
| You can't actually know whether the outgoing side of a TCP socket
| is closed, unless you write something to it. But it's hard to
| come up with something to write to an HTTP/1.1-over-TCP socket
| _before_ you respond with anything, that would be a valid NOP
| according to all the protocol layers in play. (TCP keepalives
| _would_ be perfect for this... if routers didn't silently drop
| them.)
|
| But I guess sending an HTTP 102 every second or two could be used
| for exactly this: prodding the socket with something that
| middleboxes _will_ be sure to pass back to the client.
|
| If so, that's awesome! ...and also something I wish could be
| handled for me automatically by web frameworks, because getting
| that working sounds kind of ridiculous :)
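A rough sketch of that probing idea (Python; whether every intermediary tolerates unsolicited 1xx responses is very much an open question, as noted above):

```python
import socket

# An interim 1xx response: legal to send before the final response.
PROCESSING = b"HTTP/1.1 102 Processing\r\n\r\n"

def probe_client(conn: socket.socket) -> bool:
    """Try to detect a vanished client by writing an interim response.
    A send() only fails once the peer has reset the connection, so it
    may take a second probe after the client disconnects before the
    failure actually surfaces. Sketch only."""
    try:
        conn.sendall(PROCESSING)
        return True   # bytes accepted; client *probably* still there
    except (BrokenPipeError, ConnectionResetError):
        return False  # client is definitely gone; cancel the query
```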
| mikl wrote:
| Lampooning the Cache-Control header is all fun and games, but
| remember it was designed in a time where Internet in big
| organisations often was behind a caching proxy like Squid. With
| that in mind, the explanations at
| https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ca...
| make good sense.
| djrogers wrote:
| No, not really - those explanations are nothing more than
| what's in TFA, and they don't help it make any more sense.
| There's no reason in the world why no-cache should mean 'cache
| this please but with caveats', when we have all of the English
| language available to come up with an alternative header.
| cbsmith wrote:
| Let me try this...
|
| The "no-cache" was a hint not about caching the _content_ but
| about caching subsequent _requests_, and could optionally
| specify specific fields that would indicate that a new
| candidate request needed to be sent to the server as the
| content might be different. There's this reality that just to
| render content, the browser effectively _must_ have a cached
| copy of the content, so the notion that the response wouldn't
| be cached wasn't really even in the cards. Whether you
| used the cache or not was a decision made at the time you
| were sending a request, not when you were consuming the
| response.
|
| The "no-cache" directive meant, "hey, don't check for a
| cached copy of the content, just go fetch new content". It
| was often used by analytics pieces so that the server could
| count how often content was looked at.
|
| Back in the day you had terrible latencies (particularly over
| dialup). You also had issues with horribly asymmetric
| bandwidth that meant the data you _sent_ could become the
| bandwidth bottleneck (outbound bandwidth constraints would
| mean ACK packets would get queued up, delaying downloads even
| when you had plenty of download bandwidth), and of course
| HTTP requests weren't terribly compact, so this could really
| make a big difference.
|
| Caching requests was a big deal. Performance could be
| improved significantly by "cheating" and just not sending a
| new request, and this led to some very aggressive caching
| strategies. The "check if the content really _is_ different,
| and just use the original copy if it isn't" hack was a pretty
| common one. If nothing else, it saved the browser the
| overhead of re-rendering the page and the accompanying
| annoying user experience of seeing the re-render.
|
| The original protocol didn't have any notion of no-store, and
| specifically mentioned that "private" didn't really provide
| privacy, but more that the content should be "private" in the
| sense that only the browser itself should store the content.
| Again, there's an assumption that the browser is going to put
| everything it gets into a "cache", because it has to.
|
| You could use "max-age", but a lot of caches would still
| shove the object in their cache and only expire it on a FIFO
| basis or when a new request was to be sent (and it was
| vulnerable to clock skew problems). Sounds dumb, but it was
| the kind of dumb that kept code simple and worked pretty
| well.
|
| So now that the practices were in place, you need a _new_
| directive to say, "hold up, that old approach is NOT a good
| idea here". So they came up with "no-store" as a way to say,
| "don't even put it in the cache in the first place".
| the_duke wrote:
| Well, I can see how someone fixated on a "store" Vs "cache"
| terminology might arrive at that name.
|
| Browsers store, proxies cache, so it should be no-cache,
| obviously!
|
| Sure, it's stupid, but naming is hard and these things happen
| all the time.
| markdog12 wrote:
| I always liked this http caching article, done in a
| conversational tone: https://jakearchibald.com/2016/caching-best-practices/
| mikl wrote:
| Caching in this context means "no need to ask the server for
| a new copy of this within the cache lifetime". no-cache then
| does what it says: You can store it if you like, but you need
| to check with the server before reusing it.
|
| That might be a little counter-intuitive, but if you read the
| definitions of the words, it does make sense.
| tinyhitman wrote:
| NotOnly-cache?
| have_faith wrote:
| @author. Every time I click anything in the page the whole page
| flashes, assumedly React is re-rendering for some reason. As
| someone who highlights text as I read it was quite an interesting
| experience :)
| pimterry wrote:
| Hmm, that's very weird. I don't see it myself in the latest
| Firefox or Chrome. What browser & OS?
| have_faith wrote:
| FireFox 85 on MacOS. It happens anywhere I click on the page,
| the body text very quickly flashes off and back on again.
| pimterry wrote:
| Ok, thanks, I'll look into it.
| ensignavenger wrote:
| Happens for me too- Firefox 86 on Kubuntu 18.04
| jedberg wrote:
| I've been running public web servers for decades, and almost all
| of this was new information. Excellent article!
|
| Fun fact, reddit used to have 'X-Bender: Bite my shiny metal ass'
| on every response. Sadly they seem to have removed it.
| ketralnis wrote:
| You're thinking robots.txt https://www.reddit.com/robots.txt
| and it's still there
| jedberg wrote:
| Ah yes, bender was in the robots file. But we also had a
| funny X-header. Maybe you can find it in GitHub in the
| haproxy config.
| grishka wrote:
| Not there, but it does have "x-moose: majestic"
| nitrogen wrote:
| > User-Agent: bender
|
| > Disallow: /my_shiny_metal_ass
|
| This was there when I checked just now; was it removed and
| re-added?
| grishka wrote:
| Yes I saw that, but the parent comment said about it
| being a header ;)
| richdougherty wrote:
| Some HTTP headers support extended parameters: parameters with a
| "*" after them, which allow specifying the character encoding of
| the header value, e.g. UTF-8. Confusingly, they also support
| sending both regular and extended parameters in the same header.
| https://tools.ietf.org/html/rfc5987
|
| E.g. sending the file "naïve.txt" using the Content-Disposition
| header:
|
|     Content-Disposition: attachment; filename=naive.txt;
|     filename*=UTF-8''na%C3%AFve.txt
|
| https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Co...
| The parameters filename and filename* differ only in that
| filename* uses the encoding defined in RFC 5987. When both
| filename and filename* are present in a single header field
| value, filename* is preferred over filename when both are
| understood.
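A hypothetical helper for emitting both forms, the plain ASCII fallback and the RFC 5987 extended parameter (Python sketch using only the standard library):

```python
from urllib.parse import quote

def content_disposition(filename):
    """Build a Content-Disposition header with both a plain ASCII
    fallback (filename) and an RFC 5987 extended parameter
    (filename*) for non-ASCII names."""
    fallback = filename.encode("ascii", "replace").decode("ascii")
    # RFC 5987 ext-value: charset '' percent-encoded-octets
    extended = "UTF-8''" + quote(filename, safe="")
    return f'attachment; filename="{fallback}"; filename*={extended}'

print(content_disposition("naïve.txt"))
```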
| aparks517 wrote:
| What a delight! I implemented an HTTP server from scratch (well,
| from RFC) in Objective-C some years ago. Many of these hit pretty
| close to home. Lots of plot twists in those RFCs.
| rank0 wrote:
| > X-Requested-With: XMLHttpRequest - appended by various JS
| frameworks including jQuery, to clearly differentiate AJAX
| requests from resource requests (which can't include custom
| headers like this).
|
| What does the author mean by this? Why can't a "resource request"
| include custom headers? I am assuming that a "resource request"
| is just a non AJAX request. Any HTTP client should be able to
| include whatever headers they want no matter the source.
| abahlo wrote:
| I think they mean requests from a browser.
| achillean wrote:
| Woah I had no idea about these 100 responses. Looks like there
| are quite a few of them on the Internet:
|
| https://beta.shodan.io/search/report?query=http.status%3A%3E...
| avolcano wrote:
| This is both a great post and an effective ad - I've been looking
| for a lighter-weight Postman alternative (and HTTPie, while nice,
| is no substitute for a graphical UI for such a thing). Will check
| HTTP Toolkit out!
| pimterry wrote:
| Thanks! It's a difficult balance to strike, I've taken to just
| trying to write great HTTP articles and ignoring the
| advertising angle entirely, seems to be working OK.
|
| Do try out HTTP Toolkit and let me know what you think, but
| it's not a general purpose HTTP client like Postman or HTTPie.
| It's actually an HTTP debugger, more like
| Fiddler/Charles/mitmproxy, for debugging & testing. A
| convenient HTTP client is definitely planned as part of that
| eventually, but not today.
| avolcano wrote:
| Ah, gotcha. I actually do have a good use case for that as
| well (and do think they could go together nicely someday), so
| I'll still check it out!
| johns wrote:
| Insomnia.rest
| RMPR wrote:
| If I'm not mistaken insomnia is also using electron. I
| wouldn't really put it as a lightweight alternative to
| postman.
| grishka wrote:
| In case anyone wants a native (Cocoa) REST client for macOS,
| there's https://paw.cloud. It's paid, but they sometimes give
| away free licenses for retweets, which is how I got mine.
| notatoad wrote:
| not that effective - i've also been on the lookout for this,
| but without your comment wouldn't have realized that's what
| this website was offering.
| joeraut wrote:
| Seconded. This was a great blog post, I think a more visible
| plug at the end is more than justified.
| anaphor wrote:
| Another one is that it's technically valid to have a request
| target of '*' for the HTTP OPTIONS request type. It's supposed to
| return general information about the whole server. You can try it
| out with e.g. `curl -XOPTIONS http://google.com --request-target
| '*'`
|
| Nginx gives you a 400 Bad Request response, Apache does nothing,
| and other servers vary in whether they return a non-error code.
|
| https://curl.se/mail/lib-2016-08/0167.html
|
| https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
| bch wrote:
| I'm trying this out of curiosity, but getting (e.g.):
| HTTP/1.0 400 Invalid HTTP Request
|
| My mistake, or are there other working end-points out there (I
| tried google, yahoo, and cbc.ca)?
| PebblesRox wrote:
| "X-Clacks-Overhead: GNU Terry Pratchett - a tribute to Terry
| Pratchett, based on the message protocols within his own books."
|
| I'll enjoy knowing this next time I reread Going Postal :)
| Twirrim wrote:
| With nginx it's as easy as:
|
|     add_header X-Clacks-Overhead "GNU Terry Pratchett";
|
| in your server{} block.
| sophacles wrote:
| Here's a site about it: http://www.gnuterrypratchett.com/ with
| snippets (etc) to configure it into your servers and apps.
___________________________________________________________________
(page generated 2021-03-04 23:00 UTC)