[HN Gopher] When network is faster than browser cache (2020)
___________________________________________________________________
When network is faster than browser cache (2020)
Author : harporoeder
Score : 173 points
Date : 2022-06-29 16:55 UTC (6 hours ago)
(HTM) web link (simonhearne.com)
(TXT) w3m dump (simonhearne.com)
| hgazx wrote:
| When I had a very old and slow hard disk I ran my browsers
| without disk cache precisely for this reason.
| agumonkey wrote:
| What's the storage capacity of internet cables?
| moralestapia wrote:
| On the order of petabytes.
| arccy wrote:
| https://github.com/yarrick/pingfs
| LeonenTheDK wrote:
| Explored here: http://tom7.org/harder/
|
| There's a video and a paper linked on this page with the
| information on this absurdity.
| agumonkey wrote:
| oh of course, how could I forget this :)
| kreetx wrote:
| Much confusion in the comments.
|
| Tl;dr: the cache is "slow" because the number of ongoing requests
| -- including those to the cache! -- is throttled by the browser.
| I.e. the cache itself isn't slow, but reading from it is queued,
| and the network request might be ahead of it in the queue.
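|
| A minimal sketch of that "race the cache against the network" idea,
| written the way a service worker might approximate it (illustrative
| only, not Firefox's actual implementation):
|
|     // Race the HTTP cache against the network and serve whichever
|     // answers first; the loser is simply discarded.
|     async function raceCacheWithNetwork(request: Request): Promise<Response> {
|       const fromCache = caches.match(request).then((hit) => {
|         // A cache miss resolves to undefined; turn it into a rejection
|         // so Promise.any falls through to the network response instead.
|         if (!hit) throw new Error('cache miss');
|         return hit;
|       });
|       const fromNetwork = fetch(request);
|       return Promise.any([fromCache, fromNetwork]);
|     }
|
|     // In a service worker: respond with the winner of the race.
|     const sw = self as unknown as ServiceWorkerGlobalScope;
|     sw.addEventListener('fetch', (event) => {
|       event.respondWith(raceCacheWithNetwork(event.request));
|     });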
| neomantra wrote:
| Different anecdote, but similar vibe....
|
| In ~2010, I was benchmarking Solarflare (Xilinx/AMD now) cards
| and their OpenOnload kernel-bypass network stack. The results
| showed that two well-tuned systems could communicate faster
| (lower latency) than two CPU sockets within the same server that
| had to wait for the kernel to get involved (standard network
| stack). It was really illuminating and I started re-architecting
| based on that result.
|
| Backing out some of that history... in ~2008, we started using
| FPGAs to handle specific network loads (US equity market data).
| It was exotic and a lot of work, but it significantly benefited
| that use case, both because of DMA to user-land and its filtering
| capabilities.
|
| At that time our network was all 1 Gigabit. Soon thereafter,
| exchanges started offering 10G handoffs, so we upgraded our
| entire infrastructure to 10G cut-through switches (Arista) and
| 10G NICs (Myricom). This performed much better than the 1G FPGA
| and dramatically improved our entire infrastructure.
|
| We then ported our market data feed handlers to Myricom's user-
| space network stack, because loads were continually increasing
| and the trading world was continually more competitive... and
| again we had a narrow solution (this time in software) to a
| challenging problem.
|
| Then about a year later, Solarflare and its kernel-compatible
| OpenOnload arrived and we could then apply the power of kernel
| bypass to our entire infrastructure.
|
| After that, the industry returned to FPGAs again with 10G PHY and
| tons of space to put whole strategies... although I was never
| involved with that next generation of trading tech.
|
| I personally stayed with OpenOnload for all sorts of workloads,
| growing to use it with containerization and web stacks (Redis,
| Nginx). Nowadays you can use OpenOnload with XDP; again a narrow
| technology grows to fit broad applicability.
| throw6554 wrote:
| That reminds me of when the OpenWrt and other open source
| guys were complaining that the home gateways of the time did
| not have a big enough CPU to max out the uplink (10-100 Mbps at
| the time), and instead built in hardware accelerators. What
| they did not know was that the hw accelerator was merely an
| even smaller CPU running a proprietary network stack.
| tomcam wrote:
| That was such a cool comment that I actually went to
| http://www.neomantra.com/ and briefly considered applying for a
| job there and un-retiring
| jxy wrote:
| Does containerization have any impact on performance in your
| use cases?
| neomantra wrote:
| I've not exhaustively characterized it, but RedHat has this
| comparison [1]. It is not enough for the scope/scale of my
| needs. I do still run bare metal workloads though.
|
| That said, I have had operational issues in migrating to
| Docker, which is another sort of performance impact! I
| reference some of my core isolation issues in that Redis gist
| and this GitHub issue [2].
|
| [1] https://www.redhat.com/cms/managed-
| files/201504-onload_conta... [2] https://github.com/moby/moby
| /issues/31086#issuecomment-30374...
| oogali wrote:
| I went down a similar path (Solarflare, Myricom, and Chelsio;
| Arista, BNT, and 10G Twinax) and found we could get lower
| latency internally by confining two OpenOnload-enabled
| processes that needed to communicate with each other to the
| same NUMA domain.
|
| We architected our applications around that as well. The firm
| continued on to the FPGA path though that was after my time
| there.
|
| I do still pick up SolarFlare cards off of eBay for home lab
| purposes.
| neomantra wrote:
| Nice! We deploy like that too. I document how to do that with
| Redis here [1].
|
| [1] https://gist.github.com/neomantra/3c9b89887d19be6fa5708bf
| 401...
| oogali wrote:
| Yup, all the same techniques we used (minus Docker but with
| the addition of disabling Intel Turbo Boost and hyper
| threading).
|
| A few years ago I met with a cloud computing company who
| was following similar techniques to reduce the noisy
| neighbor problem and increase performance consistency.
|
| Frankly, it's good to see that there still are bare metal
| performance specialists out there doing their thing.
| scott_s wrote:
| I discovered something similar in the early to mid 2010s:
| processes on different Linux systems communicated over TCP
| faster than processes on the same Linux system. That is, going
| over the network was faster than on the same machine. The
| reason was simple: there is a global lock per system for the
| localhost pseudo-device.
|
| Processes communicating on different systems had actual
| parallelism, because they each had their own network device.
| Processes communicating on the same system were essentially
| serialized, because they were competing with each other for the
| same pseudo-device. At the time, Linux kernel developers
| basically said "Yeah, don't do that" when people brought up the
| performance problem.
| nine_k wrote:
| I wonder if creating more pseudo-devices for in-host
| networking would help. Doesn't Docker do that already, for
| other purposes?
| kcexn wrote:
| That makes sense, Linux has many highly performant IPC
| options that don't involve the network device. Just the time
| it takes to correctly construct an ethernet frame is not
| negligible.
| btdmaster wrote:
| The cache, for me, wins the race against the network about 100:1.
| Are there greatly different results for others in
| about:networking#rcwn?
| sattoshi wrote:
| > This seemed odd to me, surely using a cached response would
| always be faster than making the whole request again! Well it
| turns out that in some cases, the network is faster than the
| cache.
|
| Did I miss a follow-up on this, or did it remain unanswered as to
| what the benefit of racing against the network is?
|
| The post basically says that sometimes the cache is slower
| because of throttling or bugs, but mostly bugs.
|
| Why is Firefox sending an extra request instead of figuring out
| what is slowing down the cache? It seems like an overly expensive
| mitigation...
| philsnow wrote:
| > Concatenating / bundling your assets is probably still a good
| practice, even on H/2 connections. Obviously this comes on
| balance with cache eviction costs and splitting pre- and post-
| load bundles.
|
| I guess this latter part refers to the trade-off between
| compiling all your assets into a single file, and then requiring
| clients to re-download the entire bundle if you change a single
| CSS color. The other extreme is to not bundle anything (which, I
| gather from the article, is the standard practice since all major
| browsers support HTTP/2) but this leads to the described issue.
|
| What about aggressively bundling, but also keeping track at
| compile time of diffs between historical bundles and the new
| bundle? Re-connecting clients could grab a manifest that names
| the newest mega-bundle as well as a map from historical versions
| to the patches needed to bring them up to date. A lot more work
| on the server side but maybe it could be a good compromise?
|
| Of course that's the easy version, but it has a huge flaw which
| is all first-time clients have to download the entire huge mega
| bundle before the browser can render anything, so to make it
| workable it would have to compile things into a few bootstrap
| stages instead of a single mega-bundle.
|
| I am _clearly_ not a frontend dev. If you're going to throw
| tomatoes please also tell me why ;)
|
| * edit: found the repo that made me think of this idea,
| https://github.com/msolo/msolo/blob/master/vfl/vfl.py but it's
| from antiquity and probably predates node and babel/webpack. The
| idea is you name individual asset versions with a SHA or tag
| and let them be cached forever, and to update the web app you
| just change a reference in the root to make clients download a
| different resource dependency tree root (and they re-use
| unchanged ones) *
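|
| A hypothetical build-step sketch of that content-addressed naming
| idea (the function and naming scheme are made up for illustration):
|
|     // Name each asset by a hash of its contents so it can be cached
|     // "forever"; only the small root document that references it
|     // ever needs to change when the app is updated.
|     import { createHash } from 'node:crypto';
|     import { copyFileSync, readFileSync } from 'node:fs';
|
|     function emitHashedAsset(path: string): string {
|       const digest = createHash('sha256')
|         .update(readFileSync(path))
|         .digest('hex')
|         .slice(0, 12);
|       // e.g. app.js -> app.1a2b3c4d5e6f.js
|       const hashedPath = path.replace(/(\.[^.]+)$/, `.${digest}$1`);
|       copyFileSync(path, hashedPath);
|       // Serve with "Cache-Control: public, max-age=31536000, immutable";
|       // a changed file gets a new URL, so clients re-download only
|       // what actually changed.
|       return hashedPath;
|     }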
| staticassertion wrote:
| There's probably some balance here. Since there's a limit of 9
| concurrent requests before throttling occurs, you can bucket your
| objects into 9 groups, concatenating into each. So if you have a
| bunch of
| static content, concat that into 1 bucket. If you have another
| object that changes a lot, keep that separate. If you have two
| other objects that change together, bucket those, etc.
|
| Seems like a huge pain to think about tbh. Seems like part of
| the problem would be solved by compiling everything into a
| single file that supported streaming execution.
| Klathmon wrote:
| There's a good middle ground of bundling your SPA into chunks
| of related files (I prefer to name them the SHA hash of the
| content), and giving them good cache lifetimes.
|
| You can have a "vendor" chunk (or a few) which just holds all
| 3rd party dependencies, a "core components" chunk which holds
| components which are likely used on most pages, and then
| individual chunks for the rest of the app broken down by page
| or something.
|
| It speeds up compilation, gives better caching, no need for a
| stateful mapping file outside of the HTML file loaded (which is
| kind of the point of the <link> tag anyway!), and has lots of
| knobs to tune if you want to really squeeze out the best load
| times.
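|
| For reference, a minimal sketch of what that chunking strategy can
| look like with webpack's splitChunks (group names and paths are
| illustrative, not a prescription):
|
|     // webpack.config.ts
|     import type { Configuration } from 'webpack';
|
|     const config: Configuration = {
|       output: {
|         // Content-hashed names can be cached "forever": a changed
|         // chunk gets a new name, so stale copies are never reused.
|         filename: '[name].[contenthash].js',
|       },
|       optimization: {
|         splitChunks: {
|           chunks: 'all',
|           cacheGroups: {
|             // Third-party dependencies in one long-lived chunk.
|             vendor: {
|               test: /[\\/]node_modules[\\/]/,
|               name: 'vendor',
|             },
|             // Components shared across most pages; these change less
|             // often than per-page code, so they cache well.
|             core: {
|               test: /[\\/]src[\\/]components[\\/]/,
|               name: 'core-components',
|               minChunks: 2,
|             },
|           },
|         },
|       },
|     };
|
|     export default config;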
| philsnow wrote:
| Another tool that bears on this idea is courgette [0], in that
| it operates on one of the intermediate representations of the
| final bundle in order to achieve better compression.
|
| https://blog.chromium.org/2009/07/smaller-is-faster-and-safe...
| oblak wrote:
| Well, yeah. Disk cache can take hundreds of MS to retrieve, even
| on modern SSDs. I had a handful of oddly heated discussions with
| an architect about this exact thing at my previous job. Showing
| him the network tab did not help, because he had read articles
| and was well informed about these things.
| OJFord wrote:
| Why are you measuring latency in mega Siemens?
| kccqzy wrote:
| At a previous job I worked on serving data from SSDs. I wasn't
| really involved in configuring the hardware but I believe they
| were good quality enterprise-grade SSDs. My experience was that
| a random read (which could be a small number of underlying
| reads) from mmap()'ed files from those SSDs took between 100
| and 200 microseconds. That's far from your figure of hundreds
| of milliseconds.
|
| Of course 200 microseconds still isn't fast. That translates to
| serving 5000 requests per second, leaving the CPU almost
| completely idle.
|
| Another odd fact was that we in fact did have to implement our
| own semaphores and throttling to limit concurrent reads from
| SSDs.
| TheDudeMan wrote:
| According to this, it can take 10ms (not 100s of ms) and only
| for a very large read (like 32MB).
| https://i.gzn.jp/img/2021/06/07/reading-from-external-memory...
| Full article:
| https://gigazine.net/gsc_news/en/20210607-reading-from-exter...
| [deleted]
| Retr0id wrote:
| This assumes it only takes a single disk read to locate and
| retrieve an object from the cache (which is unlikely to be
| the case).
| TheDudeMan wrote:
| OK, one small read plus one huge read would top-out at
| about 10.1ms, according to that graph.
| Retr0id wrote:
| Also a big assumption (which we could verify by looking
| at the relevant implementations, but I'm not going to)
| TheDudeMan wrote:
| (Anyone who is not running their OS and temp space on NVMe
| should not expect good performance. Such a configuration has
| been very cheap for several years now.)
| zinekeller wrote:
| > Such a configuration has been very cheap for several
| years now.
|
| This is a very weird comment, considering that a) it's
| cheap _er_ than yesteryear, but SATA SSDs (or even modern
| magnetic HDDs) are still sold and in active use, and b) it
| ignores phones completely, where a large number of sites
| would have mobile-dominated visitors, who can't just switch
| to NVMe-like performance even with a large disposable income
| (because at the end of the day, even with UFS, phones are
| still slower than NVMe latency-wise).
| staticassertion wrote:
| The issue has nothing to do with disk speed. If you had read
| the article you'd see a very nice chart that shows the vast
| majority of cache hits returning in under 2 or 3ms.
| divbzero wrote:
| I wish I had a clearer memory or record of this, but I think
| I've also seen ~100ms for browser cache retrieval on an SSD. Has
| anyone else observed this and have an explanation? A sibling
| comment points out that SSD read latency should be ~10ms at
| most so the limitation must be in the software?
|
| OP mentioned specifically that "there have been bugs in
| Chromium with request prioritisation, where cached resources
| were delayed while the browser fetched higher priority requests
| over the network" and that "Chrome actively throttles requests,
| including those to cached resources, to reduce I/O contention".
| I wonder if there are also other limitations with how browsers
| retrieve from cache.
| staticassertion wrote:
| > Has anyone else observed this and have an explanation?
|
| Yes that is the subject of this post.
|
| https://simonhearne.com/2020/network-faster-than-
| cache/#the-...
| divbzero wrote:
| The graphs in OP show that cache latency is mostly ~10ms
| for desktop browsers. ~100ms would still be an outlier.
| staticassertion wrote:
| The Y axis of the chart that I linked, entitled 'Cache
| Retrieval Time by Count of Cached Assets', shows latency
| well above 100ms.
| divbzero wrote:
| Thanks. Switching the metric from _Average_ to _Max_ does
| show that the cache retrieval time can reach ~100ms even
| when cached resource count is low.
| dblohm7 wrote:
| Firefox's HTTP cache races with the network for precisely this
| reason.
| [deleted]
| cwoolfe wrote:
| So is the takeaway that data in the RAM of some server connected
| by a fast network is sometimes "closer" in retrieval time than that
| same data on a local SSD?
| philsnow wrote:
| Back in ~2003 I had bought a new motherboard + CPU (a Duron
| 800MHz IIRC), but as a poor college kid I only had enough money
| left over for 128MB of RAM... but the system I was replacing had
| ~768MB. I made a ~640MB ramdisk on the old system and mounted
| it on the new system as a network block device, and the result
| was much, much faster than local swap (this was before consumer
| SSDs though).
|
| [0] "nbd" / https://en.wikipedia.org/wiki/Network_block_device
| ; this driver is of course still in the kernel; you could do
| this today with an anemic raspberry pi if you wanted
| adamius wrote:
| Now I'm imagining a rack of raspis acting as one giant RAM
| swap drive over nbd. This could work for a given value of
| "work": cost of a Pi vs a stick of RAM. A KV store as well
| perhaps.
|
| Then again, what's a TB worth on just one Xeon server?
| Probably cheaper... or not?
| jhartwig wrote:
| Have you seen the cost of Pis lately :)
| staticassertion wrote:
| Not really. The issue is with throttling.
| r1ch wrote:
| For me this is very noticeable whenever I open a new Chrome tab.
| It takes 3+ seconds for the icons of the recently visited sites
| to appear, whatever cache is used for the favicons is extremely
| slow. Thankfully the disk cache for other resources runs at a
| more normal speed.
| TheDudeMan wrote:
| Seems like a badly-designed cache.
| charcircuit wrote:
| When you make the request, the server will have to look up the
| image from its own "cache" before sending it back to you. The
| client's cache would have to be not only slower than its ping,
| but slower than its ping + the server's "cache" lookup.
| staticassertion wrote:
| The issue has nothing to do with the cache itself. It's about
| the throttling behavior.
| gowld wrote:
| The throttling is part of the cache design. It's browser
| cache, not a multi-client cache in the OS.
| staticassertion wrote:
| That doesn't sound right based on my reading of this
| document. If I'm wrong please do correct me, I don't know a
| ton about the internals here.
|
| https://docs.google.com/document/d/1Aa7OKFRdtmn4IFzgHYfqeqk
| 5...
| gowld wrote:
| The article shows a lot of data about cache speed, but I don't
| see a comparison to cacheless network.
| dllthomas wrote:
| ... oh, _that_ cache.
| femiagbabiaka wrote:
| I had the same reaction. It's a good title.
| staticassertion wrote:
| Since apparently no one is willing to read this excellent
| article, which even comes with fun sliders and charts...
|
| > It turns out that Chrome actively throttles requests, including
| those to cached resources, to reduce I/O contention. This
| generally improves performance, but will mean that pages with a
| large number of cached resources will see a slower retrieval time
| for each resource.
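|
| A minimal sketch of what that throttling amounts to: a counting
| semaphore caps in-flight requests, and cached reads queue behind
| the same limit (the cap and scheduling here are illustrative, not
| Chrome's actual values or logic):
|
|     class RequestThrottle {
|       private active = 0;
|       private waiters: Array<() => void> = [];
|       constructor(private readonly limit: number) {}
|
|       async run<T>(task: () => Promise<T>): Promise<T> {
|         // Even a cheap cache lookup waits here if all slots are taken,
|         // which is how a "fast" cache can lose a race to the network.
|         while (this.active >= this.limit) {
|           await new Promise<void>((resolve) => this.waiters.push(resolve));
|         }
|         this.active++;
|         try {
|           return await task();
|         } finally {
|           this.active--;
|           this.waiters.shift()?.();
|         }
|       }
|     }
|
|     const throttle = new RequestThrottle(6); // hypothetical concurrency cap
|     // Cached and network work share the same queue, e.g.:
|     //   throttle.run(() => caches.match(url));
|     //   throttle.run(() => fetch(url));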
| pclmulqdq wrote:
| I am a systems engineer. I read the article title, then started
| reading the article, and realized it was a bait and switch. It
| is not about "network" vs "cache" in computer systems terms,
| which is what you might expect. It is about "network" vs "the
| (usually antiquated) file-backed database your browser calls a
| cache." The former would have been a compelling article. The
| latter is kind of self-evident: the browser cache is there to
| save bandwidth, not to be faster.
| staticassertion wrote:
| I find it odd to call it a bait and switch when the first
| thing in the article is an outline.
|
| > It is about "network" vs "the (usually antiquated) file-
| backed database your browser calls a cache."
|
| It's actually nothing to do with the design of the cache
| itself as far as I can tell. If you finish reading the
| article you'll see that it's about a throttling behavior that
| interacts poorly with the common optimization advice for HTTP
| 1.1+, exposed by caching.
|
| > The latter is kind of self-evident: the browser cache is
| there to save bandwidth, not to be faster.
|
| I don't think that's something you can just state
| definitively. I suspect most people do in fact view the cache
| as an optimization for latency. Especially since right at the
| start of the article, the first sentence, the "race the
| cache" optimization is introduced - an optimization that is
| clearly for latency and not bandwidth purposes.
| karmakaze wrote:
| It is a bait and switch. The issue is with local files and
| throttling, and neither word appears in the title or outline.
|
| Edit: I didn't need this post to tell me about the "waiting
| for cache" message I used to see with Chrome.
| dchftcs wrote:
| When I see "browser cache" I can't think of anything
| other than a local file storage. Maybe it confused you,
| but there's no deliberate misleading.
| karmakaze wrote:
| But it's not even general: it's about Chrome's specific
| implementation.
| hinkley wrote:
| Memcached exists for two reasons: popular languages hitting
| inflection points when in-memory caches exceeded a certain
| size, and network cards becoming lower latency than SATA
| hard drives.
|
| The latter is a well known and documented phenomenon. The
| moment popular drives are consistently faster than network
| again, expect to see a bunch of people writing articles
| about using local or disk cache, recycling old arguments
| from two decades ago.
| staticassertion wrote:
| OK but that's got nothing to do with the post.
| pclmulqdq wrote:
| When I have worked on distributed systems, there are often
| several layers of caches that have nothing to do with
| latency: the point of the cache is to reduce load on your
| backend. Often, these caches are designed with the
| principle that they should not hurt any metric (i.e. a well-
| designed cache in a distributed system should not have
| worse latency than the backend). This, in turn, improves
| average latency and systemic throughput, and lets you serve
| more QPS for less money.
|
| CPU caches are such a common thing to think about now that
| we have associated the word "cache" with latency
| improvements, since that is one of the most obvious
| benefits of CPU caches. It is not a required feature of
| caches in general, though.
|
| The browser cache was built for a time when bandwidth was
| expensive, often paid per byte, the WAN had high latency,
| and disk was cheap (but slow). I don't know exactly when
| the browser cache was invented, but it was exempted from
| the DMCA in 1998. Today, bandwidth is cheap and internet
| latency is a lot lower than it used to be. From first
| principles, it makes sense that the browser cache, designed
| to save you bandwidth, does not help your website's
| latency.
|
| Edit: In light of the changes in the characteristics of
| computers and the web, this article seems to mainly be an
| argument for disabling caching on high-bandwidth links on
| the browser side, rather than suggesting "performance
| optimizations" that might silently cost your customers on
| low-bandwidth links money.
| kreetx wrote:
| Did you read the article?
|
| The cache is "slow" because the number of ongoing requests --
| including those to the cache! -- is throttled by the browser.
| I.e. the cache isn't slow, but reading from it is queued, and
| the network request might be ahead of it in the queue.
| citrin_ru wrote:
| > "network" vs "the (usually antiquated) file-backed database
| your browser calls a cache."
|
| Firefox uses SQLite for its on-disk cache backend. Not the
| latest buzzword, but not exactly antiquated. I expect the cache
| backend in Chrome to be at least as fast.
|
| > cache is there to save bandwidth, not to be faster
|
| In most cases a cache saves bandwidth and reduces page load
| time at the same time. Internet connection which is faster
| than a local SSD/HDD is a rare case.
| cuteboy19 wrote:
| It is not immediately obvious why a local filesystem would be
| slower than the network
| bee_rider wrote:
| Well. I mean, get a slow enough hard drive and a nice
| enough network and we can get there, haha.
| pclmulqdq wrote:
| If you live in a rich country and are on fiber internet
| rather than a cable modem (or if your cable modem is new
| enough), you likely have better latency to your nearest
| CDN than you do to the average byte on an HDD in your
| computer. An SSD will still win, though.
|
| The browser cache is kind of like a database and also
| tends to hold a lot of cold files, so it may take
| multiple seeks to retrieve one of them. Apparently it has
| throughput problems too, thanks to some bottlenecks that
| are created by the abstractions involved.
| roywashere wrote:
| For 'recent' files your operating system will typically not
| even touch the disk, because it aggressively caches those
| files in memory.
| cogman10 wrote:
| Seems like a shortcut that shouldn't be.
|
| I can understand throttling network requests, but disk
| requests? The only reason to do that would be for power savings
| (you don't want to push the CPU into a higher state as it loads
| up the data).
| fnordpiglet wrote:
| Depends on if you value latency for the user. Saying I try
| both and choose the one coming first hurts no one but the
| servers not being protected by a client cache. But there's
| absolutely no reason to believe a client has a cache that
| masks requests. AFAIK there's no standard that says clients
| use caches for parsimony and not exclusively latency. As a
| matter of fact I think this is a good idea if it ever takes
| time to consult the cache, and the trade off is more
| bandwidth consumption which we are awash in. If you care that
| much run a caching proxy and use that and you'll get the same
| effect of the client side cache masking requests. But I would
| say it's superior because it always uses the local cache
| first and doesn't waste user time on the edge condition in
| their cache coherency. It comes from Netscape which famously
| convinced everyone that it's one of the hardest problems.
| That leads to the final benefit, the cache doesn't have to
| cohere. If it's too expensive at that moment to cohere and
| query then I can use the network source. Again the only
| downside is the network bandwidth is more consistently user.
| I would be hard pressed to believe most Firefox users already
| are grossly bandwidth over provisioned, and the amount of a
| fraction of a cable line a web browser loading from the cache
| no one could even notice that.
| cogman10 wrote:
| > Depends on if you value latency for the user.
|
| I do. Which is why it's silly to throttle the cache IO.
|
| The read latency for disks is measured in microseconds. Is
| it possible for the server to be able to respond faster?
| Sure. However, if you aren't within 20 miles of the server
| then I can't see how it could be faster (speed of light and
| everything).
|
| These design considerations will depend greatly on where
| you are at. It MIGHT be the case that eschewing a client
| cache in a server to server talk is the right move because
| your servers are likely physically close together. That'd
| mean the server can do a better job making multiple client
| requests faster through caching saving the memory/disk
| space required for the clients.
|
| There is also the power consideration. It takes a lot more
| power for a cell phone to handle a network request than it
| does to handle a disk request. Shouting into the ether
| isn't cheap.
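|
| A rough back-of-the-envelope check of that 20-mile figure, assuming
| signals in fiber travel at about two-thirds of c (roughly 2 x 10^5
| km/s) and an SSD random read takes on the order of 100 microseconds:
|
|     t_{\mathrm{RTT}} \approx \frac{2 \times 32\ \text{km}}{2 \times 10^5\ \text{km/s}}
|                      \approx 3.2 \times 10^{-4}\ \text{s} = 320\ \mu\text{s}
|
| So a server ~20 miles (~32 km) away already needs roughly 320 us
| for the round trip alone, several times a local SSD random read,
| before counting any server-side processing.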
| Spooky23 wrote:
| What happens when your 200 browser tabs are all looking
| for updates?
| citrin_ru wrote:
| It makes sense to apply some throttling (and reduce CPU
| priority) for inactive tabs.
|
| But the active tab (when the browser window is itself the
| active one) should work without any throttling, for lower
| response time.
| cogman10 wrote:
| If your 200 browser tabs are saturating your 8 Gbps link
| to your M.2 SSD, what do you think they'll do to your
| 10 Mbps connection to the internet?
| sieabahlpark wrote:
| staticassertion wrote:
| I imagine it's just that the cache is in an awkward place
| where the throttling logic is unaware of the cache and the
| cache is not able to preempt the throttling logic.
| cogman10 wrote:
| I'd guess the same thing is going on. Definitely FEELS like
| a code structure problem.
| cyounkins wrote:
| Worth noting that around the end of 2020, Chrome and Firefox
| enabled cache partitioning by eTLD+1 to prevent scripts from
| gaining info from a shared cache. This kills the expected high
| hit rate from CDN-hosted libraries.
| https://developer.chrome.com/blog/http-cache-partitioning/
| https://developer.mozilla.org/en-US/docs/Web/Privacy/State_P...
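|
| A conceptual sketch of what partitioning changes about the cache
| key (illustrative types only, not the browsers' actual internal
| structures):
|
|     type PartitionedCacheKey = {
|       topFrameSite: string; // eTLD+1 of the tab's site, e.g. "a.example"
|       resourceUrl: string;  // e.g. "https://cdn.example.net/lib.js"
|     };
|
|     const onSiteA: PartitionedCacheKey = {
|       topFrameSite: 'a.example',
|       resourceUrl: 'https://cdn.example.net/lib.js',
|     };
|     const onSiteB: PartitionedCacheKey = {
|       topFrameSite: 'b.example',
|       resourceUrl: 'https://cdn.example.net/lib.js',
|     };
|     // Different keys mean separate entries, so the same CDN-hosted
|     // library is downloaded and stored once per embedding site
|     // rather than shared across sites.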
___________________________________________________________________
(page generated 2022-06-29 23:00 UTC)