[HN Gopher] Caching is an abstraction, not an optimization
___________________________________________________________________
Caching is an abstraction, not an optimization
Author : samuel246
Score : 71 points
Date : 2025-07-01 10:42 UTC (2 days ago)
(HTM) web link (buttondown.com)
(TXT) w3m dump (buttondown.com)
| ckdot2 wrote:
| "I think now caching is probably best understood as a tool for
| making software simpler" - that's cute. Caching might be
| beneficial in many cases, but if there's one thing it doesn't
| do, it's simplify software. There's that famous quote "There
| are only two hard things in Computer Science: cache invalidation
| and naming things.", and, sure, it's a bit ironic, but there's
| some truth in there.
| bell-cot wrote:
| (You forgot off-by-1 errors.)
|
| All software has to name things, and count. Caching (including
| invalidation) is best understood as a liability. If you can
| foist it off on your CPU and OS and DB, good for you.
| Programming whatever you're actually trying to get done is
| already hard enough.
| yxhuvud wrote:
| Off-by-one errors are not part of the original quote; they're
| just a later add-on to make it funny.
|
| They also tend not to be very hard.
| TeMPOraL wrote:
| Except when they're part of some base assumptions in the
| domain or a dozen layers of abstraction below you. They
| are hard to _prevent from happening_.
| whateveracct wrote:
| caching often does simplify software though when done well
|
| and - as the OP suggests - it works best when the cache is a
| well-defined abstraction with properties and rules about how it
| works
|
| just because "caching" is mentioned in a meme doesn't mean it
| can't be true that it can simplify software
| meesles wrote:
| > caching often does simplify software though when done well
|
| I have to push back here, I think this is objectively untrue.
| By definition, adding a code path (the cache) that behaves
| differently than the uncached path to a system or piece of
| code increases its complexity.
|
| I'm not saying it's wrong to cache things or that they aren't
| useful, but I think they absolutely are an abstraction and an
| optimization at the cost of complexity. Good code bases hide
| complexity from the devs all the time, so it's not a question
| of whether you can code it away, but rather how difficult is
| it to troubleshoot the internals of the system.
| PaulHoule wrote:
| Trying some other way to explicitly manage multiple storage
| tiers could get pretty complicated.
| jameshart wrote:
| If you hide caching away as an implementation detail behind
| an abstraction, it comes back and bites you as a leaky
| abstraction later.
|
| Look at how CPU cache line behaviors radically change the
| performance of superficially similar algorithms.
|
| Look at how query performance for a database server drops off
| a cliff the moment the working cache no longer fits in
| memory.
|
| Hiding complexity can be a simplification, until you exceed
| the bounds of the simplification and the complexity you hid
| demands your attention anyway.
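|
| For a concrete sketch of the first point (illustrative only,
| assuming numpy; not from the article), two loops that compute
| the same sum can differ wildly in speed purely because one
| walks memory contiguously and the other strides across cache
| lines:
|
|     import numpy as np
|
|     m = np.random.rand(8192, 8192)  # row-major layout
|
|     def sum_by_rows(m):
|         # Contiguous access: every cache line fetched is fully used.
|         return sum(m[i, :].sum() for i in range(m.shape[0]))
|
|     def sum_by_cols(m):
|         # Strided access: same arithmetic, far more cache misses.
|         return sum(m[:, j].sum() for j in range(m.shape[1]))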
| atq2119 wrote:
| CPUs are still a great example for how caching simplifies
| things.
|
| There's a long history in computer architecture of cores
| and accelerators that don't have a cache but instead rely
| on explicitly programmed local scratchpads. They are
| universally more difficult to program than general purpose
| CPUs because of that.
| hmottestad wrote:
| I'm sure the CPU designers would love it if they didn't
| need several different layers of cache. Or no cache at
| all. Imagine if memory IOPS were as fast as L1 cache, no
| need for all that dedicated SRAM on the chip or worry
| about side channel attacks.
| aswanson wrote:
| I guess simplification needs to include "at what level" as a
| qualifier.
| ckdot2 wrote:
| That abstraction is another layer though. And additional
| layers are additional complexity. So, if you add another
| layer, the software is less simple than before. You might
| need to have caching in your software. I don't doubt that.
| But there's simply no way it makes the software simpler,
| unless you assume some unfortunate starting point where you
| could get rid of highly complex performance optimizations in
| your existing code by replacing them with a simpler cache
| solution. But then the statement should be "refactoring makes
| your code simpler".
| whateveracct wrote:
| additional layers (or software in general) are not
| inherently additional complexity
| TeMPOraL wrote:
| In some sense they are, since establishing an abstraction
| is strictly additive. Abstractions help _manage
| complexity_.
| moritzwarhier wrote:
| Getting cache keys or caching events wrong is easy and a
| nightmare.
|
| But getting them right can easily cross the boundary from
| purely optimizing performance towards simplifying the public
| API of something. I think this is true.
|
| I'd imagine an involved example where semantics and caching
| really start to offer a trade-off.
|
| Imagine that somehow querying the actual meteorological data
| is quite expensive, and consider this badly written
| pseudocode (equals sign denoting default parameters):
|
| - measureCurrentTemperature()
|
| - retrieveAccurateTemperatureForNanoSecond(momentInTime)
|
| -> cached abstractions which would access cached data:
|
| - getTemperature(moment = now(), tolerance = 1min)
|
| - getCurrentTemperature(tolerance = MIN_TOLERANCE)
|
| I know, reality is much more complicated, and using time
| (seeing it as quasi-continuous) as a caching parameter is
| already stretching it quite far.
|
| Just a stupid example that came to my mind.
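|
| A minimal sketch of that tolerance idea (function names and the
| stubbed query are hypothetical):
|
|     import time
|
|     _readings = {}  # timestamp -> temperature
|
|     def retrieve_accurate_temperature(moment):
|         ...  # stand-in for the expensive meteorological query
|
|     def get_temperature(moment=None, tolerance_s=60.0):
|         # Serve any cached reading within tolerance_s of the
|         # requested moment; only fall back to the expensive
|         # query when nothing close enough exists.
|         moment = time.time() if moment is None else moment
|         for ts, temp in _readings.items():
|             if abs(ts - moment) <= tolerance_s:
|                 return temp
|         temp = retrieve_accurate_temperature(moment)
|         _readings[moment] = temp
|         return temp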
|
| I've bitten myself in the ass with caching rasterized
| representations of images more than once, where the inputs were
| SVG images or limited formats that convert to SVG.
| EGreg wrote:
| I never understood the fuss about cache invalidation or naming
| things.
|
| Both are not that difficult, honestly.
|
| Aren't there a lot of harder things out there?
| gryfft wrote:
| It's a little bit tongue in cheek; no one is seriously
| suggesting it's harder than P=NP or the problem of
| consciousness. But there's something a bit "death and taxes"
| to the inevitability that any large enough project is going
| to have some corner cases involving these old chestnuts.
|
| Heck you can probably prove that any system for naming things
| is either inconsistent or incomplete.
| TeMPOraL wrote:
| > _no one is seriously suggesting it's harder than P=NP or
| the problem of consciousness._
|
| Well, I for one feel that "naming things" ultimately boils
| down to the latter, which may or may not be harder than the
| former.
| Valodim wrote:
| In my experience, the larger the software you write, the
| truer these become. At some point all obvious names will have
| collisions, and getting caching right is crucial to do but
| difficult to achieve because it transcends the entire stack.
|
| You could group these two things into "getting the data model
| right" as the single hard thing, perhaps that rings more true
| to you :)
| quuxplusone wrote:
| For "only two hard problems," read "two candidates for among
| the hardest problems (but we feel strongly that these are
| indeed good candidates)," or something along those lines,
| more or less.
|
| It's also possible that these _used_ to be the only two hard
| problems at the time the aphorism was first recorded, but the
| underlying state of the world has changed since then and the
| aphorism, as recorded, is no longer current.
| IshKebab wrote:
| Cache invalidation isn't hard _in theory_. It's just one of
| those things that is very easy to get subtly wrong and
| difficult to test.
|
| Think about all those times your program isn't building and
| `make clean` fixes it.
| throwaway150 wrote:
| > Think about all those times your program isn't building
| and `make clean` fixes it.
|
| I don't think that's a good example. I've worked with more
| than 20 different build tools by now, and I cannot recall a
| single instance where the problem actually came down to
| cache invalidation. Every time I dug into it, the real
| cause was something else: a mistake in the build script, an
| incorrectly declared dependency, or something similar.
|
| So when you say "think about all those times", not a single
| such time comes to my mind!
| gpderetta wrote:
| Naming things is of course a bit tongue in cheek. But cache
| invalidation is hard. For example, allegedly MESI is one of
| the hardest things to validate in processor design.
| TOGoS wrote:
| There is a secret technique, called content-addressing[1],
| which elegantly solves both of them at once.
|
| A lot of people haven't caught on, and try to cache things
| using ambiguous names, hence the struggle to invalidate their
| caches when the meaning changes.
|
| [1] This can be applied even if you don't know the content
| yet; you just have to unambiguously name the inputs to the
| function that produces it. You might not _know_ what all the
| inputs are, and then you have to start adding stuff like
| "unknown-unknown-2025-07-03T16", but it'll still basically
| work.
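|
| A rough sketch of the idea (the render callable stands in for
| whatever expensive computation is being cached):
|
|     import hashlib, json
|
|     _cache = {}
|
|     def content_key(**inputs):
|         # Name the entry by an unambiguous digest of everything
|         # that determines the result; change an input and the key
|         # changes, so stale entries are simply never addressed
|         # again and need no explicit invalidation.
|         blob = json.dumps(inputs, sort_keys=True).encode()
|         return hashlib.sha256(blob).hexdigest()
|
|     def cached(render, **inputs):
|         key = content_key(**inputs)
|         if key not in _cache:
|             _cache[key] = render(**inputs)
|         return _cache[key]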
| bloppe wrote:
| "Two programs could have similar behaviour but structured very
| differently, the difference being that one utilizes caching as
| an abstraction and one explicitly has the concept of different
| tiers of storage."
|
| The author is comparing "off-the-shelf" caching with custom
| caching. They're coming from the assumption that you must be
| caching _somehow_ and arguing that the word "caching" should
| be understood to mean only particular approaches to the general
| idea of caching. And obviously the whole point of the general
| idea is to optimize things.
|
| It's a rhetorical mess
| heikkilevanto wrote:
| Caching is simple, yes. The hard part is in the last word,
| invalidation. Even that is manageable for a single process. But
| as soon as you have multiple (threads / processes / nodes /
| data centers) updating the data, it does get quite complex,
| pretty fast.
|
| Likewise, naming things is simple as long as you are alone, or
| in a small team. But as soon as there are multiple organizations
| with all their own traditions, it gets tricky. Just witness the
| eternal flame wars about camelCase, PascalCase, snake_case,
| kebab-case, and UPPER_CASE. It is almost as hopeless a culture
| clash as Emacs vs Vi vs PowerPoint...
|
| (I leave the off-by-one errors as an exercise for the reader)
| TeMPOraL wrote:
| I'd say this is not the "naming things" that's hard. Beyond
| picking a common identifier format in the team, there are at
| least two dimensions that are _much harder_ :
|
| - The language dimension - choice of words, that are good
| enough for the purpose, and not confusing. For example,
| "Manager" is as ambiguous as it gets, it can mean many thing,
| except we've been using it long enough that there's a more
| specific shape of meaning[0] for that word in code/program
| architecture contexts - so you still would use it instead of,
| say "Coordinator", which would raise all kinds of questions
| that "Manager" no longer does.
|
| - The epistemological dimension - whether the word you chose
| correctly names the concept you meant, and whether the
| concept you meant is actually the right one to describe the
| thing you're trying to describe. Ultimately, this is _the_
| hard thing at the root of philosophy. In practice, it
| manifests like e.g. choice between digging into some obscure
| branches of mathematics to correctly name the thing
| "endofunctor" or something, or calling it "Square" and saying
| "fuck it, we'll clarify the exceptions in the comments".
|
| --
|
| [0] - I mean "more specific" in the sense it's distinct from
| the other meanings and somewhat narrow - but still it's fuzzy
| as heck and you can't describe it fully in words; it's
| basically tacit knowledge.
| Traubenfuchs wrote:
| I never understood this meme.
|
| We use caching a lot, anything that gets cached can only be
| written by one service each. The writing services emit cache
| invalidation messages via SNS that cache users must listen to
| via SQS, to clear/update their cache.
|
| Alternatively we cache stuff with just a TTL, when immediate
| cache invalidation is not important.
|
| Where's the struggle?
| porridgeraisin wrote:
| Here's one: everybody invalidating and refreshing their cache
| at the same time can cause a thundering herd problem.
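|
| One common mitigation, sketched here for a single-process
| in-memory cache (the fetch callable and the numbers are
| illustrative): let one caller refresh while everyone else keeps
| serving the old value, and jitter the TTL so a fleet of caches
| doesn't expire in lockstep.
|
|     import random, threading, time
|
|     _lock = threading.Lock()
|     _entry = {"value": None, "expires": 0.0}
|
|     def get_cached(fetch, ttl=300.0):
|         now = time.time()
|         if now < _entry["expires"]:
|             return _entry["value"]
|         # Expired: only one caller refreshes; the rest return the
|         # stale value instead of all hitting the backend at once.
|         if _lock.acquire(blocking=False):
|             try:
|                 _entry["value"] = fetch()
|                 # Jittered TTL avoids synchronized expiry.
|                 _entry["expires"] = now + ttl * random.uniform(0.9, 1.1)
|             finally:
|                 _lock.release()
|         return _entry["value"]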
| hmottestad wrote:
| Does SQS guarantee delivery to all clients? If it does then
| that's doing a lot of heavy lifting for you.
|
| If it doesn't guarantee delivery, then I believe you will at
| some point have a client that reads a cached value thinking
| it's still valid because the invalidation message got lost in
| the network.
| maccard wrote:
| Eventually. The problem is that eventually delivering that
| message will result in clients assuming that it will always
| be the same, when it's not.
| williamdclt wrote:
| You don't support read-your-own-write and your cache data
| might be stale for arbitrarily long. These relaxed
| consistency constraints make caching a lot easier. If that's
| acceptable to your use cases then you're in a great place! If
| not... well, at scale you often need to find a way for it to
| be acceptable anyway
| pton_xd wrote:
| > Where's the struggle?
|
| If there are no real consequences for reading stale data, and
| your writes are infrequent enough, then indeed you're lucky
| and have a relatively simple problem.
| hatthew wrote:
| If you have a system with "slow storage", caching is a way to
| optimize that to "storage that is sometimes fast".
|
| If you have a system with "slow storage" and "fast storage",
| caching is a way to abstract that away to just "storage".
|
| The author is arguing that the latter is the default way we
| should think about the concept of caching, which is a valid
| opinion to have.
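|
| A minimal sketch of the latter framing (class and method names
| are illustrative): callers see a single storage, and the
| fast/slow split lives behind it.
|
|     class Storage:
|         """What callers see: just storage, no tiers."""
|
|         def __init__(self, fast, slow):
|             self.fast = fast
|             self.slow = slow
|
|         def get(self, data_id):
|             data = self.fast.get(data_id)
|             if data is None:
|                 data = self.slow.get(data_id)  # miss: hit the slow tier
|                 self.fast.set(data_id, data)   # keep it fast next time
|             return data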
| Joker_vD wrote:
| There is also an important (but often overlooked) detail that
| you/your application may not be the only user of the cache. At
| which point caching, indeed, is an optimization via abstraction:
| when you fetch an X, you are in no position to predict that the
| next fifty requests, completely unrelated to you, would also
| want to fetch the same X, so it should probably be cached to be
| readily served.
|
| Which is why solving the "I want my data in fast storage as often
| as possible" problem may be counter-productive on the whole: you
| ain't the only client of the system; let it breathe and serve
| requests from others.
| eigenform wrote:
| Even more obvious if you think about the case of hardware-managed
| caches! The ISA typically exposes some simple cache control
| instructions (and I guess non-temporal loads/stores?), but apart
| from that, the actual choice of storage location is abstracted
| away from you (and your compiler).
| necovek wrote:
| On top of the other things mentioned (caching always introduces
| complexity with lifetime tracking, and thus can't make things
| simple), the article's got it the wrong way around.
|
| When code has abstract interfaces for data access, introducing
| caching can be simpler (but not simple) by localizing it in the
| abstraction implementation which has or doesn't have caching.
|
| But it is not an _abstraction_ (you can perfectly well do caching
| without any abstractions, and it's frequently done exactly that
| way).
| movpasd wrote:
| I think you and the article are referring to abstractions over
| different concerns.
|
| The concern you're talking about is about the actual access to
| the data. My understanding of the article is that it's about
| how caching algorithms can abstract the concern of minimising
| retrieval cost.
|
| So in some ways you're coming at it from opposite directions.
| You're talking about a prior of "disk by default" and saying
| that a good abstraction lets you insert cache layers above
| that, whereas for the author the base case is "manually
| managing the layers of storage".
| foldU wrote:
| This is correct, I appreciate you for putting it so
| coherently :). I think I didn't make it clear enough in the
| piece that I'm coming from a stance of fast access being
| table stakes, and the question being about how that's
| accomplished.
| necovek wrote:
| "Caching" is an _idea_ of storing a result of an expensive
| computation in storage that is faster to get from than
| doing the original computation (in very generic computer
| terms, computation can be simply fetching from the network
| or slower local storage).
|
| What you describe as "caching algorithms" are not really
| caching algorithms, but cached object lifetime management
| algorithms (LRU, LFU...).
|
| "Abstraction" is a higher level, simplified view of a set
| of concepts, yet caching is a single concept. See e.g.
| https://en.wikipedia.org/wiki/Abstraction_(computer_science)
|
| It sounds like you are both trying to redefine what
| "caching" means (tying it to _implementations_ of
| particular algorithms), but also what "abstraction" means.
|
| We should be very deliberate with the language we use, and
| our main goal should be to make it simpler to understand,
| not harder -- I believe you are doing the latter here.
| necovek wrote:
| The language used is seriously confusing here.
|
| Algorithms can't really abstract anything since they are,
| well, just algorithms (formal descriptions of how a
| computation should be done).
|
| Looking at the author's examples again, I think most
| everybody would say that caching is used in both:
|             if data_id in fast_storage:
|                 return fast_storage.get(data_id)
|             else:
|                 data = slow_storage.get(data_id)
|                 fast_storage.set(data_id, data)
|                 return data
|
|         and
|
|             # Uses fast storage or slow storage just like above,
|             # but behind the get() method.
|             return storage.get(data_id)
|
| The first one does not make an abstraction on storage, the
| second one does, but they are both "caching" data internally.
|
| While there are _generic_ _implementations_ of caching
| algorithms and we can consider those abstractions, "caching"
| is a wider term than those implementations, and is
| specifically not an abstraction (the fact that there is a
| caching implementation that abstracts something does not make
| all caching an abstraction).
|
| Edit: Let me also point out that "abstract the concern of
| minimising retrieval cost" is not caching -- I can say that
| eg. a simple interface with method FastGet(id) does the
| former, and it needs not use any caching if the underlying
| structure is fast enough and eg. directly in memory.
| gmuslera wrote:
| "fast storage" is about performance, your abstraction includes
| performance elements. If you go that down, then you are
| optimizing on your abstraction designs. What doesn't have to be
| wrong, but then don't say that is not optimization.
| LudwigNagasena wrote:
| Caching is an optimisation. Sometimes caching can be abstracted
| away, eg CPU cache or build cache are pretty much abstracted away
| for a usual web developer. But web page caching is very hard to
| abstract without any abstraction leaks and weird bugs. And even
| CPU cache is no longer an abstraction if you deal with very high
| performance code.
| k__ wrote:
| Anything can be an abstraction if designed carefully.
| jbverschoor wrote:
| Everything is caching. Almost nothing operates on the target data
| directly.
| canyp wrote:
| Did you hand-draw that graffiti? Never quite realized that
| graffiti of technical ideas looks really goated. Best part of the
| post, to be honest.
| jxjnskkzxxhx wrote:
| > looks really goated
|
| Oof you're trying so hard you could cut diamond with that line.
| timewizard wrote:
| > I've always been told that caching is a tool to make software
| faster.
|
| Who told you that?
|
| > you don't have to go all the way back to some backend database
| or API server or SSD [...] Caching is thus a tool to improve
| performance.
|
| That's called "latency." This is not at all the same as
| "performance."
|
| > My feelings now are that that perspective on caching is wrong
|
| I agree.
| taeric wrote:
| This reminds me of the use of materialized views as both a cache
| strategy and as an abstraction helper.
| neuroelectron wrote:
| This is basically a semantic argument, and I will not be
| engaging in it.
| jxjnskkzxxhx wrote:
| You're right, but caching is an optimization.
| pclmulqdq wrote:
| Use of a better abstraction is an optimization, though.
| jongjong wrote:
| I was discussing this with someone recently: caching is one of
| those things that people might do behind the scenes, thinking
| that it doesn't affect the API, but in fact it can create all
| sorts of issues/complexity.
| zmj wrote:
| This article is talking about single-writer, single-reader
| storage. I think it's correct in that context. Most of the hairy
| problems with caches don't come up until you're multi-writer,
| multi-reader.
___________________________________________________________________
(page generated 2025-07-03 23:00 UTC)