[HN Gopher] New HTTP standards for caching on the modern web
___________________________________________________________________
New HTTP standards for caching on the modern web
Author : pimterry
Score : 38 points
Date : 2021-10-20 14:19 UTC (1 day ago)
(HTM) web link (httptoolkit.tech)
(TXT) w3m dump (httptoolkit.tech)
| simonw wrote:
| Really good article - I hadn't heard about either of these
| headers and I really appreciated the clear explanation of both.
| forgotmypw17 wrote:
| One challenge I've experienced recently is I can't figure out how
| to hint to the browser that it should refresh a particular cached
| page. (Without appending ?time=1634851491 to the URL.)
|
| For example, let's say I've already cached the page /new.html
|
| Now, I click a button which triggers a change to the page, and I
| am redirected back to it.
|
| Even though the page has changed, and the browser should see a
| new timestamp in the header if pinging the server, it just
| doesn't seem to happen.
|
| Has anyone dealt with this before? I tried to ask on
| StackOverflow, but lately my questions don't seem to get any
| attention, and I've run out of reputation to spend on bounties.
| toast0 wrote:
| There's no standard way for one page to invalidate another.
| I've seen some private patches to do it in squid, but that
| doesn't help because you want to do it for browsers.
|
| Your options are probably:
|
| a) redirect to a different URL as you've done by appending
| stuff to it
|
| b) require revalidation on each request, recipes shown by
| other posters
|
| c) POST to the URL you want refreshed; POST isn't cacheable.
| Note that you can't redirect to POST somewhere else, but you
| can do it with JavaScript.
|
| d) use XHR to force a request as another poster mentioned.
| ryanpetrich wrote:
| This is what ETags are for. Upon a user's first visit the
| server should return an ETag uniquely representing the current
| version of the page. The browser will cache both the page and
| the tag. Upon subsequent page visits the browser will send an
| If-None-Match header containing the tag for the version of the
| page it has cached. The server should compare the incoming tag
| with the tag for the current version and return a "304 Not
| Modified" response if the tags match or a full response with
| the newer tag in the ETag header if they don't.
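The ETag exchange described above can be sketched in Python roughly as follows. This is a minimal sketch, not anyone's production handler: the names `make_etag` and `respond` and the idea of deriving the tag from a SHA-256 digest of the body are assumptions for illustration, and header lookup is simplified to an exact-case dict key.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a tag from the body (hypothetical scheme: quoted SHA-256 prefix)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def strip_weak(tag: str) -> str:
    """Drop a leading W/ so that weak and strong tags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def respond(request_headers: dict, body: bytes):
    """Return (status, headers, payload) for a GET of a page whose
    current content is `body`."""
    etag = make_etag(body)
    sent = request_headers.get("If-None-Match", "")
    # If-None-Match may carry several comma-separated tags, or "*".
    # Splitting on "," is a simplification that is safe for the hex
    # tags generated above.
    candidates = [t.strip() for t in sent.split(",") if t.strip()]
    if "*" in candidates or etag in [strip_weak(t) for t in candidates]:
        # The browser's copy is current: no body, just the validator.
        return 304, {"ETag": etag}, b""
    # First visit, or the page changed: full response with the new tag.
    return 200, {"ETag": etag}, body
```

For this to help with the /new.html case, the browser has to actually send the conditional request, which depends on the Cache-Control policy the page was served with.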
| tshaddox wrote:
| Yeah, and it works the same way with If-Modified-Since and
| Last-Modified.
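The date-based variant could look like this in Python. Again a sketch under assumptions: `respond_by_date` is a made-up name, the modification time is passed in as a UTC-aware `datetime`, and an unparseable If-Modified-Since value is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_by_date(request_headers: dict, body: bytes, modified: datetime):
    """Return (status, headers, payload); `modified` must be UTC-aware."""
    # HTTP dates have one-second resolution, so drop microseconds
    # before comparing.
    modified = modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if modified <= parsedate_to_datetime(since):
                return 304, headers, b""
        except (TypeError, ValueError):
            # Unparseable or timezone-less date: ignore the header
            # and fall through to a full response.
            pass
    return 200, headers, body
```

Compared with ETags, this only distinguishes versions that differ by at least a second.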
| tyingq wrote:
| It's a combination of different headers that's hard to sum up
| in a short comment. A good article on the subject should talk
| about all these headers: Expires, Cache-Control, ETag, Pragma,
| Vary, Last-Modified.
|
| KeyCDN has an article on it. They certainly would have
| experience and expertise there. I didn't read the whole thing,
| but it seems to have it covered:
| https://www.keycdn.com/blog/http-cache-headers
|
| There are also some interesting exceptions where rules aren't
| followed. Like browsers typically have a completely separate
| cache for favicons. I suppose because they use the icons in
| funny/different ways, like bookmarks.
|
| There are also sometimes proxies (especially corporate MITM
| ones) that don't follow the rules. Hence the popularity of
| cache-busting parameters like you described.
| bandie91 wrote:
| as far as i understand the original design of HTTP, each
| HTTP resource may state in its response headers how long it
| can be cached, and the client (browser, proxy, etc.) does not
| have to re-request the resource before that expiry. this is
| the standard, so you cannot hint that a resource has to be
| revalidated - not in a standard way. obviously since then,
| several tricks have emerged, like the timestamped URL approach
| you mentioned - however i'm not sure to what extent it is
| standardized for clients to understand that "/path?query" is
| somehow related to "/path", because originally the request
| string (path and url parameters) was opaque to the http client,
| so they should be cached independently. things obviously
| changed since then. the method i use is to fire an Ajax (XHR)
| request at the URL that has to be refreshed, with a
| Cache-Control header (yes, it is a request header too), then
| display the response content or redirect to it.
| scottlamb wrote:
| > however i'm not sure to what extent it is standardized for
| clients to understand that "/path?query" is somehow related
| to "/path", because originally the request string (path and
| url parameters) was opaque to the http client, so they should
| be cached independently. things obviously changed since then.
|
| It hasn't changed. Those are still cached completely
| independently by the user agent. The ?time=... cache busting
| trick is meant to produce a cache key that's never been used
| before, thus requiring a fresh request. The new request
| doesn't clean up the cache entries for the old URLs; it just
| doesn't use them. That's one reason it's better to use ETag
| and such to make the caches work properly, rather than fight
| them with this trick.
|
| On many servers, if new.html is a static file, the same
| entity is produced regardless of parameters. But the user
| agent doesn't know this.
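A toy model of the point about cache keys, in Python. `TinyCache` and `origin` are hypothetical stand-ins for a user agent's cache and the server; real caches also consider request method, Vary, and freshness, none of which is modelled here.

```python
class TinyCache:
    """Toy cache keyed on the full URL string, query included."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable(url) -> body; stands in for the network
        self.entries = {}

    def get(self, url: str):
        if url not in self.entries:
            self.entries[url] = self.fetch(url)
        return self.entries[url]


calls = []


def origin(url):
    """Pretend server: records each request it receives."""
    calls.append(url)
    return f"page version {len(calls)}"


cache = TinyCache(origin)
cache.get("/new.html")                    # miss: goes to the origin
cache.get("/new.html")                    # hit: served from the cache
cache.get("/new.html?time=1634851491")    # never-seen key, so another miss
print(len(calls))                         # origin requests made so far
print(sorted(cache.entries))              # the old /new.html entry is still stored
```

The cache-busting request gets fresh content, but the stale entry under the plain URL is untouched and will be served again the next time that URL is requested.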
| bawolff wrote:
| Cache-Control: max-age=0, must-revalidate
|
| Sounds like what you want (presuming your server handles 304
| logic correctly)
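To see why that header forces a check on every visit, here is a rough Python sketch of the freshness decision a cache makes. The function names are made up for illustration, only max-age, no-cache and no-store are considered, and the comma split does not handle quoted directive arguments that contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def can_reuse_without_revalidation(cache_control: str, age_seconds: int) -> bool:
    """True if a stored response of the given age is still fresh."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return False
    max_age = d.get("max-age")
    if max_age is None or not max_age.isdigit():
        # Conservative choice for this sketch; real caches may fall
        # back to Expires or heuristic freshness instead.
        return False
    # Fresh only while the response's age is below its lifetime.
    return age_seconds < int(max_age)
```

With max-age=0 the stored copy is stale immediately, so the browser sends a conditional request each time and relies on a 304 to avoid re-downloading the body.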
| forgotmypw17 wrote:
| I do want caching to happen, however -- until something
| changes the page.
| scottlamb wrote:
| I appreciate that the Cache-Status: header they describe uses RFC
| 8941 structured fields and thus ";" to separate items within each
| cache and "," between caches. It's like someone put effort into
| making it easy to parse.
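A deliberately naive Python sketch of that two-level split. It is not a full RFC 8941 parser: `parse_cache_status` is a hypothetical helper, the header value in the usage note is made up, parameter values are kept as strings, and quoted strings containing "," or ";" would be split incorrectly.

```python
def parse_cache_status(value: str) -> list:
    """Return [(cache_name, {param: value}), ...] in header order."""
    caches = []
    for member in value.split(","):          # "," separates caches
        pieces = [p.strip() for p in member.split(";")]  # ";" separates params
        name, params = pieces[0].strip('"'), {}
        for piece in pieces[1:]:
            if not piece:
                continue
            key, sep, val = piece.partition("=")
            # A bare key such as "hit" is a boolean true in
            # structured fields.
            params[key] = val.strip('"') if sep else True
        if name:
            caches.append((name, params))
    return caches
```

For example, `parse_cache_status('OriginCache; hit; ttl=1100, "CDN Company Here"; fwd=miss; ttl=545')` yields one `(name, params)` pair per cache, in the order the caches were listed.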
|
| Rant time: I just finished writing a state machine parser for
| "WWW-Authenticate:" and "Proxy-Authenticate:". Those headers use
| comma both to separate challenges and to separate parameters
| within a challenge, which just seems mean-spirited. Other things
| about HTTP authentication that seem mean, dumb, annoying, or all
| of the above: both the RFC 2069 example response and the RFC 7616
| SHA-512-256 example response are calculated incorrectly; RFC
| 7616's userhash field seems to require the server to do
| O(users_in_database) hashes to know what user to operate on; RFC
| 7235's challenge grammar describes a token68 syntax that is really
| only used for the credentials in Basic, never a challenge; RFC
| 7616 drops backwards compatibility for RFC 2069 even though I
| bought a product this year that still uses RFC 2069-style
| calculations; and it's based on old standards that followed "be
| conservative in what you do, be liberal in what you accept from
| others" so RFC 7230 section 7 has separate grammars for what
| lists you must send and what lists you must accept, which further
| complicates parsing the nested lists.
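The userhash complaint can be illustrated with a short Python sketch. RFC 7616 defines the hashed username as H(username ":" realm); everything else here (the function names, SHA-256 as the algorithm, a plain list of usernames as the "database") is an assumption for illustration.

```python
import hashlib


def userhash(username: str, realm: str) -> str:
    """H(username ":" realm), using SHA-256 as the digest algorithm."""
    return hashlib.sha256(f"{username}:{realm}".encode("utf-8")).hexdigest()


def find_user(received_hash: str, realm: str, usernames):
    """Linear scan: one hash per stored user until a match is found."""
    for name in usernames:
        if userhash(name, realm) == received_hash:
            return name
    return None
```

A server could avoid the scan by storing a precomputed hash per user, realm, and algorithm, at the cost of maintaining that index whenever any of them changes.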
___________________________________________________________________
(page generated 2021-10-21 23:00 UTC)