[HN Gopher] Cache made consistent: Meta's cache invalidation sol...
___________________________________________________________________
Cache made consistent: Meta's cache invalidation solution
Author : uvdn7
Score : 143 points
Date : 2022-06-08 17:46 UTC (5 hours ago)
(HTM) web link (engineering.fb.com)
(TXT) w3m dump (engineering.fb.com)
| Hnrobert42 wrote:
| You can't even visit the FB blog while using NordVPN. Amazing.
| [deleted]
| gwbas1c wrote:
| > Based on consistency tracing, we know the following happened...
|
| I'm a little confused:
|
| - Is the database updated notification going out while the
| transaction is still in-progress? Doesn't it make more sense to
| delay the notification until after the transaction commits?
|
| - If a cache tries to update itself while there's an open
| transaction against the data it's trying to read, shouldn't the
| database block the operation until the write is complete? And
| then shouldn't another notification after the transaction commits
| trigger the cache to update?
| uvdn7 wrote:
| > Is the database updated notification going out while the
| transaction is still in-progress?
|
| No, that's not what happened.
|
| > If a cache tries to update itself while there's an open
| transaction against the data it's trying to read, shouldn't the
| database block the operation until the write is complete?
|
| Yes. Generally speaking, having a read transaction here would
| be better. There are unrelated reasons why it's done this way.
| The point of the example is that it's really intricate and we
| can still identify the bug regardless.
| pdevr wrote:
| From the article:
|
| >>Polaris pretends to be a cache server
|
| De facto, it is a cache server, with the associated performance
| decrease. Isn't the main purpose of caching to improve
| performance?
|
| >>We deploy Polaris as a separate service so that it will scale
| independently from the production service and its workload.
|
| Assuming the scaling is horizontal: to synchronize among the
| service instances, what do you do? Create another meta-Polaris
| service? Not being rhetorical or sarcastic - hoping for an open
| discussion.
| [deleted]
| uvdn7 wrote:
| > De facto, it is a cache server, with the associated
| performance decrease. Isn't the main purpose of caching to
| improve performance?
|
| Polaris only receives cache invalidation events, and doesn't
| serve any client queries.
|
| > so then, to synchronize among the service instances
|
| It doesn't synchronize among the service instances. It's
| actually basically stateless. It pretends to be a cache server
| only for receiving invalidation events, and acts as a client,
| and assumes no knowledge of the cache internals.
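|
| (A minimal sketch of the shape such a checker could take, in
| Python. The function names and the cache_replicas/db interfaces
| here are made up for illustration; they are not Polaris's
| actual API.)
|
|     import time
|
|     def report_anomaly(key, version, replicas):
|         print(f"inconsistent: key={key} version={version} "
|               f"stale_replicas={replicas}")
|
|     def on_invalidation_event(key, version, cache_replicas, db,
|                               deadline_s=300):
|         # Act purely as a client: after receiving an invalidation
|         # event for (key, version), keep checking that every replica
|         # eventually agrees with the source of truth.
|         deadline = time.time() + deadline_s
|         stale = []
|         while time.time() < deadline:
|             truth = db.read(key)              # source-of-truth value
|             stale = [r for r in cache_replicas
|                      if r.get(key) not in (None, truth)]  # miss or match is OK
|             if not stale:
|                 return True                   # consistent within the window
|             time.sleep(1)
|         report_anomaly(key, version, stale)   # flag the anomaly for humans
|         return False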
| pdevr wrote:
| Thanks for the clarification. However, in that case, won't
| the querying of all the cache copies/replicas turn out to be
| the bottleneck at high volumes? Because you are going to have
| to check all the existing copies/replicas somewhere, right?
| uvdn7 wrote:
| Yep. Polaris checks can be sampled, but still cover pretty
| much all types of workloads and scenarios/races over time.
| bklaasen wrote:
| > Phil Karlton famously said, "There are only two hard things in
| computer science: cache invalidation and naming things."
|
| ...and off-by-one errors.
| [deleted]
| Trufa wrote:
| This seems to be a pure clickbait title; cache invalidation will
| always be hard since it's basically a balanced give-and-take
| issue which, simple logic dictates, can't be avoided.
| uvdn7 wrote:
| Can you elaborate?
|
| I went into details in the post about what I believe is the
| root cause of why cache invalidation is hard, which seems
| different than what you are saying.
| orf wrote:
| > Take TAO, for example. It serves more than one quadrillion
| queries a day
|
| That's 11,574,074,074 requests per second.
| irrational wrote:
| Phil Karlton famously said, "There are only two hard things in
| computer science: cache invalidation and naming things and off-
| by-one errors."
| ntoskrnl wrote:
| Correct me if I'm wrong, but the title seems a little clickbaity.
| "Cache invalidation might no longer be hard" --> "We built a
| tracing library and manually fixed the bugs it uncovered"
| kilburn wrote:
| "A little clickbaity" if you are generous. Short-sighted and
| grandiloquent if you are not.
|
| Short-sighted because it narrows the original problem (cache
| invalidation) to a specific version (_detecting_ cache
| invalidation errors _past an arbitrary time-frame_).
|
| Grandiloquent because it then claims to have "solved" the
| general problem, whereas they haven't even solved the narrowed
| version. Notice their own premise:
|
| > For any anomaly in a stateful service, it is an anomaly only
| if clients can observe it one way or the other. Otherwise, we
| argue that it doesn't matter at all.
|
| This is a practical attitude that can yield great results, and
| probably even good engineering practice. However, it will never
| lead to _solving_ a problem, because a problem isn't solved
| until you _prove_ that it is.
| uvdn7 wrote:
| > For any anomaly in a stateful service, it is an anomaly
| only if clients can observe it one way or the other.
| Otherwise, we argue that it doesn't matter at all.
|
| Thanks for the comment! I stand by this claim; I would love
| to hear more about why an anomaly (any anomaly) matters if it
| can't possibly be observed by any client of a stateful
| service.
| tofuahdude wrote:
| It is just the difference between "matters" and "solved".
| kilburn wrote:
| The claim itself is fine because it is kind of
| tautological. An error that _cannot be observed in any way_
| is probably not an error. However, you _have_ to prove that
| it can't be observed.
|
| This is quite different than what the described system does
| though. The system flags errors that _have_ been observed.
| The correct claim to go with what the article does would
| thus be:
|
| > For any anomaly in a stateful service, it is an anomaly
| only if the current clients running their current
| operations happen to observe it. Otherwise, we argue that
| it doesn't matter at all.
|
| It's much more noticeable this way, isn't it?
| uvdn7 wrote:
| This is a fair point!
|
| The exact same argument applies to normal exceptions and
| errors. It might be the case that out of quadrillions of
| queries a day, just none of them hit the error path, and
| hence we won't observe any issues.
|
| I can see that in this case, whether it matters or not
| becomes somewhat subjective. It's a good point though!
| uvdn7 wrote:
| I am the author; so I am obviously biased here.
|
| I am serious when I say cache invalidation might no longer be a
| hard thing in computer science. In the post, I explained why
| cache invalidation is hard, and how we solve/manage its unique
| challenges and complexity. By my definition, we are solving the
| cache invalidation problem.
|
| The analogy I have is Paxos. It's notoriously hard to implement
| Paxos correctly. Google published a paper on Paxos Made Live
| just on how they managed the implementation and
| productionization complexity.
| continuational wrote:
| Please be careful with such bold claims. You don't really
| address the hard part of cache invalidation, which is to
| figure out _when_ to do it.
| uvdn7 wrote:
| > which is to figure out when to do it.
|
| Can you elaborate?
| continuational wrote:
| Sure - you typically cache the result of some expensive
| query.
|
| The hard part of cache invalidation is to detect _when_
| an update somewhere in your system is going to affect the
| result of that query, such that you need to trigger an
| invalidation.
| uvdn7 wrote:
| Good point!
|
| We have memcache, which is a look-aside cache that serves
| this type of workload. What you described can be solved
| by adding one level of abstraction and letting reads and
| writes all go through it.
|
| Now on your read path, you can construct arbitrarily
| complex SQL queries or whatnot, but they must take some
| kind of input to filter on. Those become part of the
| "keys".
|
| The invariant is that as long as the "context/filter" you
| encode covers all the mutations which would impact your
| cache data, you should be good. Based on our experience,
| it has been fairly manageable.
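|
| (A rough sketch of that look-aside pattern in Python; the names
| and the message-folder example here are invented for
| illustration, not Meta's actual abstraction. The query's filter
| inputs become part of the key, and a mutation invalidates every
| key whose "context" it covers.)
|
|     class LookAsideCache:
|         def __init__(self, db):
|             self.db, self.cache = db, {}
|
|         def get_messages(self, user_id, folder):
|             # The filter inputs (user_id, folder) become part of the key.
|             key = ("messages", user_id, folder)
|             if key not in self.cache:
|                 self.cache[key] = self.db.query_messages(user_id, folder)
|             return self.cache[key]
|
|         def send_message(self, sender, recipient, body):
|             self.db.insert_message(sender, recipient, body)
|             # Invalidate every key whose "context" this mutation covers.
|             for folder in ("inbox", "all"):
|                 self.cache.pop(("messages", recipient, folder), None)
|             self.cache.pop(("messages", sender, "sent"), None)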
| irrational wrote:
| >The invariant is that as long as the "context/filter"
| you encode covers all the mutations which would impact
| your cache data, you should be good.
|
| Well, isn't this the truly hard part of cache
| invalidation?
| uvdn7 wrote:
| I can see that. One can argue that the "difficulty" is
| front-loaded. Whether or not you can identify the list of
| "contexts" is a local question. I guess you can probably come
| up with complicated dependencies and argue it's hard to capture
| the dependencies, and I would agree with you.
|
| Now getting back to the "what's really hard about cache
| invalidation" part. Even with a much simpler model. Say
| you just have a k/v store, no joins, nothing. Is cache
| invalidation simple in that case?
|
| I went into details about why it's still insanely hard.
| And the big example at the end might help make that
| point.
|
| And this is one level beneath challenges from tracking
| dependencies, and I argue that's what makes cache
| invalidation hard.
|
| Now going back to your example with complicated
| dependencies: maybe TTL is a better solution. With many
| dependencies, any change to the dependency list might
| trigger invalidation. At some point, just doing TTL
| would be simpler.
| ahahahahah wrote:
| Yeah, the entire discussion here is fucked because the
| poster doesn't understand what the original quote even
| meant.
| darig wrote:
| dang wrote:
| Yes - we changed it now and I posted
| https://news.ycombinator.com/item?id=31672562.
| londons_explore wrote:
| > In other words, 99.99999999 percent of cache writes are
| consistent within five minutes.
|
| Those are the kinds of rates that lead to very-hard-to-find bugs.
| You won't write code in downstream systems to properly
| handle that failure case, and suddenly someone will get paged at
| 3am because it has just happened, years after the code was
| deployed, and nobody really understands how this corner case
| suddenly happened!
|
| When I'm building systems, every happy codepath should either
| happen frequently or never.
|
| Having a cache whose design allows it to be stale but only very
| rarely seems like a bad idea. If I was building something on top
| of this cache I would have a layer on top to force staleness for
| 1 in 1000 requests just to force the application built on top to
| be designed to handle that properly.
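|
| (Something like this wrapper, say - a sketch only, not anything
| described in the post: serve a deliberately old copy, or a miss,
| for a small fraction of reads so callers must tolerate
| staleness from day one.)
|
|     import random
|
|     class StalenessInjectingCache:
|         def __init__(self, real_cache, stale_rate=0.001):
|             self.real_cache = real_cache
|             self.stale_rate = stale_rate
|             self.last_seen = {}   # old copies kept around for injection
|
|         def get(self, key):
|             value = self.real_cache.get(key)
|             if random.random() < self.stale_rate:
|                 # ~1 in 1000 reads returns an older value (or a miss).
|                 return self.last_seen.get(key)
|             self.last_seen[key] = value
|             return value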
| uvdn7 wrote:
| I agree with everything you said.
|
| > Having a cache whose design allows it to be stale but only
| very rarely seems like a bad idea.
|
| And that's exactly why we first brought this number from 6 9's
| to more than 10 9's; and we are not done yet. Also the cache is
| not "designed" to be inconsistent. It's like Paxos is designed
| to solve the consensus problem, but when implemented, most
| Paxos implementations do not work properly.
| VWWHFSfQ wrote:
| As naive as it is, I always kinda liked MySQL's query cache
| invalidation strategy: blow away the whole cache anytime a
| mutating statement arrives at the server.
|
| Simple. Effective. But it obviously doesn't work for everything.
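|
| (For contrast, that whole strategy fits in a few lines; a toy
| sketch with an invented db interface:)
|
|     class QueryCache:
|         def __init__(self, db):
|             self.db, self.results = db, {}
|
|         def select(self, sql, params=()):
|             key = (sql, params)
|             if key not in self.results:
|                 self.results[key] = self.db.execute(sql, params)
|             return self.results[key]
|
|         def mutate(self, sql, params=()):
|             self.db.execute(sql, params)
|             self.results.clear()  # any write blows away the whole cache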
| Trufa wrote:
| 'Cause who needs scaling anyway!? Joking aside, the strategy
| seems logical for smaller stuff.
| judofyr wrote:
| > At a high level, Polaris interacts with a stateful service as a
| client and assumes no knowledge of the service internals. This
| allows it to be generic. We have dozens of Polaris integrations
| at Meta. "Cache should eventually be consistent with the
| database" is a typical client-observable invariant that Polaris
| monitors, especially in the presence of asynchronous cache
| invalidation. In this case, Polaris pretends to be a cache server
| and receives cache invalidation events.
|
| I'm a bit confused here. Polaris behaves as a regular cache
| server which receives invalidation events, but isn't the typical
| bug related to cache invalidation that a service _forgets_ to
| contact the cache server for invalidation? So this will only
| catch cases where (1) you remember to contact Polaris, but you
| forgot to contact other cache servers [which Polaris happens to
| know about], OR (2) you're not handling errors during
| invalidation requests to the cache server [and the request to
| Polaris was successful]? Or are your cache servers "smart" and
| might they have internal logic which "ignores" an invalidation?
|
| What am I missing?
|
| EDIT: Reading through "A real bug we found and fixed this year"
| and I'm still a bit confused. It seems like a very contrived bug
| tied directly to how you deal with versioning (e.g. you allow
| the latest version to be present with stale metadata, or
| something?). My main concern with cache invalidation is _what_ to
| invalidate at what time.
| uvdn7 wrote:
| > isn't the typical bug related to cache invalidation that a
| service forgets to contact the cache server for invalidation?
|
| That's not the case based on our experience at FB. At-least-
| once delivery is basically a solved problem.
|
| But you are absolutely right that if there's an issue in the
| invalidation delivery, it's possible that Polaris won't receive
| the event either. Polaris actually supports a separate event
| stream (all the way from client-initiated writes) to cover this
| case.
| benlivengood wrote:
| It helps that TAO is a write through cache so clients can't
| really forget to invalidate, correct? If someone were to
| directly write to MySQL shards there would be stale data
| ~indefinitely.
|
| I'm assuming the versioning is what ensures proper write,
| invalidation, and fetch ordering so that e.g. slow mysql
| writes/replication don't cause remote TAO clusters to receive
| an invalidation message and then read a stale value from a
| replica?
| uvdn7 wrote:
| > It helps that TAO is a write through cache so clients
| can't really forget to invalidate, correct?
|
| That's not the case.
|
| > If someone were to directly write to MySQL shards there
| would be stale data
|
| For the sake of this discussion, you can assume this is the
| setup, and everything should still stand.
|
| > I'm assuming the versioning is what ensures proper write,
| invalidation, and fetch ordering so that e.g. slow mysql
| writes/replication don't cause remote TAO clusters to
| receive an invalidation message and then read a stale value
| from a replica?
|
| You should join us :p This is getting into the details of
| our cache invalidation protocol and our cache consistency
| model. Maybe another post on another day!
| uvdn7 wrote:
| > EDIT: Reading through "A real bug we found and fixed this
| year" and I'm still a bit confused.
|
| Yeah, that specific bug is convoluted.
|
| > It seems like a very contrived bug tied directly to how
| you deal with versioning (e.g. you allow the latest version to
| be present with stale metadata, or something?).
|
| This is mostly for performance reasons that I won't go into
| too much detail on here. Version is mostly an internal concept
| that clients don't care about. So it's OK-ish.
|
| > My main concern with cache invalidation is what to invalidate
| at what time.
|
| Can you elaborate? Our experience with cache invalidation is
| that you invalidate on mutation, and in terms of "what to
| invalidate" it depends on the computation, for which you might
| have to give me a little more detail.
| benlivengood wrote:
| Do you guarantee referential integrity in TAO? Last I heard the
| answer is no; complex queries are not supported and clients would
| have to do their own work with internal versions to achieve a
| consistent view on the whole graph (if such a consistent view
| even exists). But it seems to work fine since global referential
| integrity doesn't seem to be a big deal; there aren't a lot of
| cases where dependency chains are long enough or actions quick
| enough that it matters (e.g. a group admin adds a new user, makes
| them an admin, who adds an admin, and then everyone tries to
| remove the admin who added them [always locally appearing as if
| there is >=1 remaining admin], but causes the group to ultimately
| have no admins when transactions settle). Run into any fun issues
| like that?
|
| Contrasting that with Spanner where cache coherency is solved by
| reading from the recent past at consistent snapshots or by
| issuing global transactions that must complete without conflict
| within a brief truetime window. I am guessing the cost of Spanner
| underneath the social graph would be a bit too much for whatever
| benefits might be gained, but curious if anyone looked into using
| something similar.
| uvdn7 wrote:
| For the fun issues you described, read-modify-write (using
| either optimistic or pessimistic concurrency control) can work,
| if I understand your question correctly.
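|
| (For the group-admin race specifically, a hedged sketch of what
| optimistic read-modify-write could look like; the store interface
| is invented for illustration. Re-read the admin set with its
| version, check the invariant, and let the store reject the write
| if the version moved underneath you.)
|
|     def remove_admin(store, group_id, admin_id, max_retries=5):
|         for _ in range(max_retries):
|             admins, version = store.read_with_version(("admins", group_id))
|             if admin_id not in admins:
|                 return True           # nothing to do
|             if len(admins) <= 1:
|                 raise ValueError("would leave the group with no admins")
|             # Compare-and-swap: fails if anyone changed the set since we read.
|             if store.cas(("admins", group_id), expected_version=version,
|                          value=admins - {admin_id}):
|                 return True
|         return False                  # give up after repeated conflicts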
|
| Spanner is awesome and one of my favorite systems/papers. I
| think it would be very computationally and power expensive to
| run the social graph workload on a Spanner-like system. Do you
| have an estimate of how many megawatts (if not more) are needed
| to support one quadrillion queries a day on Spanner?
| lucideer wrote:
| This would have been a really good article in itself, without the
| frankly unbelievable hubris of the author.
|
| TL;DR this is a blogpost about some very interesting problems
| with distributed cache consistency and observability. The article
| doesn't address cache invalidation and - based on their replies
| here - the author doesn't seem to understand the cache
| invalidation problem at all.
| uvdn7 wrote:
| I am glad that you liked the content.
|
| On the definition of cache invalidation, and specifically why
| it's hard: can you send me a link to any definition of it? This
| is what's in Wikipedia, and I think it's reasonable.
|
| > Cache invalidation is a process in a computer system whereby
| entries in a cache are replaced or removed.
|
| And I think I am describing that process, and what's hard about
| it. Some comments here explicitly talk about dependencies, and
| I can see why that's hard. My point is that even without
| dependencies, cache invalidation remains a hard problem. As for
| dependency tracking, some of my thoughts are captured
| here: https://news.ycombinator.com/item?id=31674933.
| lucideer wrote:
| Numerous replies to your comments here have pointed out why
| cache invalidation is hard. You've responded to them by
| saying "Good point!" (always with an exclamation point), and
| then proceeded to demonstrate in your response that you
| didn't understand their point.
|
| This comment describes the "hard" part of cache invalidation
| best: https://news.ycombinator.com/item?id=31674251 - your
| response to them makes very little sense.
|
| First, you acknowledge that they're correct, though you use
| obtuse language to say so: you seem to like using the very
| abstract term "dependency" to represent the very simple
| concept of detecting updates.
|
| In your second paragraph you then go off on an unrelated
| tangent by saying:
|
| > _Now getting back to the "what's really hard about cache
| invalidation" part._
|
| No. You're not "getting back" to that - you're changing the
| subject back to the topic of your article, which is unrelated
| to cache invalidation.
|
| > _Say you just have a k/v store, no joins, nothing._
|
| Cache invalidation is about invalidation - it's not about
| your store architecture. It's not about the implementation of
| logic that processes the clearing/overwrite of values, it's
| about the _when_ and nothing else.
|
| > _just doing TTL would be simpler._
|
| Yes. It is simpler. If you want to avoid solving _the_ hard
| problem, you can use a TTL. Now... how long should it be?
| uvdn7 wrote:
| First of all, I do think you folks make good points. Let me
| try again.
|
| > The invariant is that as long as the "context/filter" you
| encode covers all the mutations which would impact your
| cache data, you should be good.
|
| I wrote this line, which is referred to as being exactly
| why cache invalidation is hard, in the comment you linked.
|
| > it's about the when and nothing else.
|
| Let's talk about that. Not to overgeneralize this: with a
| simpler cache model (say you just cache a single item), do
| you agree that solving the "when" problem is very
| manageable? If not, I would like to be enlightened.
|
| Now, with this very simple cache model where we have
| "magically" solved the "when" problem, do you think cache
| invalidation is solved? Or that it's simple? After knowing
| when, you still need to actually update the cache, right,
| and not leave it in an inconsistent state (against the
| source of truth). Is that simple?
|
| Let's essentially break cache invalidation into two parts:
| 1. knowing when/who to invalidate, and 2. actually
| processing the invalidation.
|
| My argument is that #1 can be very manageable with simpler
| data models. #2 can't be avoided; and #2 is very hard.
| pyrolistical wrote:
| I don't get it. If you already have the version in the cache,
| then when a client does GET X, the cache can reply X@v4.
|
| As long as the client submits transactions with "I calculated
| this based off of X@v4" then the database can reject if there is
| a newer version of X.
|
| The client can then replay the transactions against the cache and
| by then there will be a X@v5.
|
| With this scheme you can track the transaction replay rate
| instead of having to build a new system to read-back the cache
| for consistency.
|
| To get traceability on which cache node is stale, the client
| can forward a header from the cache node that identifies it.
| Then, using the same transaction rejection rate, ops has
| visibility into which cache node isn't being updated.
|
| No cache invalidation needed. Just cache X@version forever;
| it's just that the cache allows an unstable query for
| GET X.
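|
| (A sketch of how that could be wired up, with illustrative names;
| this is the scheme described in this comment, not the system in
| the article.)
|
|     class VersionConflict(Exception):
|         pass
|
|     def transfer(cache, db, account, delta):
|         for _ in range(10):                  # replay until the version is fresh
|             balance, version = cache.get_with_version(account)   # e.g. X@v4
|             try:
|                 # "I calculated this based off of X@v4" - the database
|                 # rejects the write if a newer version of X already exists.
|                 db.commit(account, new_value=balance + delta, based_on=version)
|                 return
|             except VersionConflict:
|                 continue                     # by the next read the cache has X@v5
|         raise RuntimeError("too many replays; a cache node may be stale")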
| jitl wrote:
| Tracking all data dependencies used for all writes in a large
| system seems rather challenging.
|
| Plus what about read workloads, like "pull message from queue
| and send it as a push notification to user Y"? I guess it's
| fine if the push is re-delivered due to stale cache?
| dragontamer wrote:
| Versions of cache-data aren't numbers, they're vectors-of-
| numbers (at a minimum).
|
| Here's a bunch of versions of variable "X":
|
| X@Alice-V1, X@Alice-V2, X@Bob-V1, X@Alice-V2/Bob-V2, X@Carol-V1
|
| Which version of "X" should you read?
|
| EDIT: Alice, Bob, and Carol are three different servers. What
| has happened here is that Alice and Bob are closer together, so
| their updates are faster between each other (synchronizing the
| writes). Carol is slower for some reason (bad connection?), and
| is going to update off of old data.
|
| In this case, the two valid caches are X@Alice-V2/Bob-V2, and
| X@Carol-V1. The other cached data is old and can be discarded.
|
| Things get complicated when you have not only reads (such as in
| your simplified question), but also writes that are cached
| (such as what happens in practice).
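|
| (A sketch of the comparison rule, with plain Python dicts standing
| in for the per-server counters: one version supersedes another iff
| every component is >= and at least one is greater; versions where
| neither supersedes the other are concurrent and both must be kept
| or merged.)
|
|     def dominates(a, b):
|         # a, b are vector versions, e.g. {"Alice": 2, "Bob": 2} vs {"Carol": 1}
|         keys = set(a) | set(b)
|         ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
|         gt = any(a.get(k, 0) > b.get(k, 0) for k in keys)
|         return ge and gt
|
|     def prune(versions):
|         # Keep only versions not dominated by another one: the valid heads.
|         return [v for v in versions
|                 if not any(dominates(w, v) for w in versions if w is not v)]
|
|     # prune([{"Alice": 1}, {"Alice": 2}, {"Bob": 1},
|     #        {"Alice": 2, "Bob": 2}, {"Carol": 1}])
|     # -> [{"Alice": 2, "Bob": 2}, {"Carol": 1}]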
| moralestapia wrote:
| ???
|
| We seem to have different concepts with regards to the cache
| invalidation problem.
|
| The one I know has to do with "Is there a change in the source
| data? How should I check this? How often?" and such things.
|
| Yours seems to be "In my distributed system, I cannot guarantee
| the order of the messages that get exchanged, and thus, different
| nodes may end up having different information at times".
|
| IMO you have a consensus/distributed data problem, not a cache
| invalidation problem (the inconsistency between your cache nodes
| is a consequence of the underlying design of your system).
| throwdbaaway wrote:
| Exactly. The race condition problem from the example below
| still exists even without any caching, and perhaps still exists
| even without any replica.
|
| > Imagine that after shuffling Alice's primary message store
| from region 2 to region 1, two people, Bob and Mary, both sent
| messages to Alice. When Bob sent a message to Alice, the system
| queried the TAO replica in a region close to where Bob lives
| and sent the message to region 1. When Mary sent a message to
| Alice, it queried the TAO replica in a region close to where
| Mary lives, hit the inconsistent TAO replica, and sent the
| message to region 2. Mary and Bob sent their messages to
| different regions, and neither region/store had a complete copy
| of Alice's messages.
|
| A solution to this problem is to update Alice's primary message
| store to be "region 2;region 1" for a period of time, and get
| Alice to retrieve messages from both message stores during that
| period.
|
| However, since you have a caching layer that doesn't have an
| explicit TTL, such that any cache inconsistency can last
| indefinitely, you now have a much bigger distributed data
| problem.
| tremon wrote:
| AFAIK, the original cache invalidation problem is always seen
| from the point of view of a writer in a multiple-writer
| distributed system. It's not about determining the freshness of
| data as seen from a reader/consumer, but about atomically (!)
| updating all data copies by a write action in a distributed
| system.
|
| If you assume loosely-coupled, eventually-consistent data, you
| haven't solved the problem at all -- you've just side-stepped
| it. If you assume a single-writer, multiple-reader
| architecture, you don't have the original problem at all.
| uvdn7 wrote:
| > the original cache invalidation problem
|
| Do you have any reference or links about it? Thanks.
|
| > point of view of a writer in a multiple-writer distributed
| system
|
| > If you assume a single-writer, multiple-reader
| architecture, you don't have the original problem at all.
|
| It starts to sound like you are describing Paxos?
| uvdn7 wrote:
| I think https://news.ycombinator.com/item?id=31672541 might
| address your question.
| moralestapia wrote:
| No, mine was not a question, it was a statement.
|
| You misnamed the problem you are trying to solve.
| uvdn7 wrote:
| OK. I am here to learn. Say we go with your definition of
| the cache invalidation problem. Do you think that's THE hard
| part of cache invalidation?
|
| Say, you are just caching simple k/v data (no joins,
| nothing). Are you claiming cache invalidation is simple in
| that case?
|
| Also the reason why I didn't mention the cache invalidation
| dependencies is that _I believe_ it's a solved problem (see
| the link above, we do operate memcache at scale). I am
| happy to discuss if and why it wouldn't work for your case.
| moralestapia wrote:
| >Are you claiming cache invalidation is simple in that
| case?
|
| No.
|
| >I believe it's a solved problem
|
| It's not.
|
| Suppose you have source-of-truth A (doesn't really matter
| if it's a key-value store or whatever, it could even be a
| function for all intents) and a few clients B1, B2, B3,
| ... that rely on the data from A. You have to keep them
| in sync. When should B* check if A has changed? Every
| time they need it? Every minute? Every hour? Every day?
| This is the cache invalidation problem; which IMO is not
| even a problem but a tradeoff, but whatever.
|
| Epilogue: With all due respect, you have an unbelievable
| career, IBM, Autodesk, Google, Box and now Meta. None of
| those companies would give me five minutes of their time
| because I am self-taught, yet here we are :)
| unboxingelf wrote:
| > Epilogue: With all due respect, you have an unbelievable
| career, IBM, Autodesk, Google, Box and now Meta. None of
| those companies would give me five minutes of their time
| because I am self-taught, yet here we are :)
|
| Correlation does not imply causation. I'm also self-
| taught, and some of those companies have given me a lot
| more than 5 minutes of their time.
|
| Here's some unsolicited feedback - I interpreted your
| comment above as implying you know more than $peer, and
| it comes off a little snarky with the emoji face. That
| approach doesn't demonstrate professionalism to me. On
| the other end, $peer focuses on the technical challenge.
| Most of the gig boils down to communicating ideas.
| uvdn7 wrote:
| > Suppose you have source-of-truth A (doesn't really
| matter if it's a key value store or whatever, it could be
| a function for all intents) and a few clients B1, B2, B3,
| ... that rely on the data from A. You have to keep them
| in sync. When should B* check if A has changed? Every
| time they need it? Every minute? Every hour? Every day?
| That is the cache invalidation problem.
|
| A few things to clarify here: what you are referring to as
| clients are cache hosts (as they keep data). You seem to
| imply that the cache is running on the client? I was
| referring to cache servers (think memcache, Redis, etc.),
| for which the membership can be determined. So on update
| (e.g. when you mutate A), you know all the Bs to
| invalidate.
|
| Now, continuing with your example, with the cache running
| on the client: assuming we are talking about the same
| concept when we say "client", the membership is
| nondeterministic. Clients can come and go (connect and
| disconnect as they wish). There are some attempts to do
| invalidation-based caching on clients, but they are hard
| because of the reason I just mentioned. So usually client
| cache is TTL'ed. E.g. the very browser you are using to
| see this comment has a lot of things cached. DNS is not
| going to send an invalidate event to your browser. It's
| TTL based.
|
| I guess what I am saying is that cache invalidation
| rarely applies to client-side caches, as far as I know.
| Maybe you have a different example, which we can discuss.
| moralestapia wrote:
| Caches, clients, Facebooks, hosts, Metas, Redises(?), ...
| all those things don't really matter.
|
| What matters is B* reads from A, how often should that
| happen? That's it, literally. That's the whole problem.
| teraflop wrote:
| It doesn't matter whether the cache is co-located with
| the "client" that ultimately uses the data. Say A is a
| database, and B1, B2, B3... are memcached servers. The
| exact same situation applies.
|
| > So on update (e.g. when you mutate A, you know all the
| Bs to invalidate).
|
| But "knowing" this is a big part of what people mean when
| they say cache invalidation is hard! If the value in B is
| dependent on a complicated function of A's state, then it
| may be difficult to automatically determine, for any
| given mutation to A, which parts of B's cache need to be
| invalidated.
|
| > There are some attempts to do invalidation-based cache
| on clients, but they are hard because of the reason I
| just mentioned. So usually client cache is TTL'ed.
|
| Given this statement, the original title of this
| submission is even more baffling. If you recognize that
| data can be cached in clients, and that invalidating
| those caches is so hard that most systems -- including
| yours -- just completely abandon the goal of being able
| to do it correctly/reliably, then how can you claim your
| system makes it no longer a hard problem?
| uvdn7 wrote:
| Good points. I will try to address them.
|
| > But "knowing" this is a big part of what people mean
| when they say cache invalidation is hard!
|
| I can see that. Memcache is a look-aside cache we have at
| scale. There are abstractions on top to manage this
| complexity, and it has been fairly manageable. I am sure
| you can come up with a complicated dependency tree where
| things are not obvious at all. But when you do have a
| very large dependency tree, any change in it can
| trigger cache invalidation, at which point caching with
| TTL will be a better option IMO. I can see where you are
| coming from.
|
| > If you recognize that data can be cached in clients,
| and that invalidating those caches is so hard that most
| systems
|
| My reasoning for this being hard is different from yours,
| I think. In my comment, it's due to the nondeterminism of
| the cluster membership. I think in that case, we are
| talking about different problems.
| uvdn7 wrote:
| It's a good observation that out-of-orderness and asynchrony
| contribute a lot to the challenge.
|
| > the inconsistency between your cache nodes is a consequence
| of the underlying design of your system
|
| I disagree with this. First of all, imposing order and
| synchrony is expensive, slow and unreliable. Even if we do
| that, cache fill and invalidation will never be externally
| coordinated; and will collide on the cache host.
| wildmanx wrote:
| > > the inconsistency between your cache nodes is a
| consequence of the underlying design of your system
|
| > I disagree with this. First of all, imposing order and
| synchrony is expensive, slow and unreliable. Even if we do
| that, cache fill and invalidation will never be externally
| coordinated; and will collide on the cache host.
|
| This is a weird reply. GP writes "A implied B" and you
| disagree by arguing that you had to do A. Sure. But it's
| still A that implied B. Quite a fundamental logic flaw, no?
| uvdn7 wrote:
| I don't see that. Yes I did say I had to do A. Then I said
| that it doesn't imply B. Because even if the underlying
| system is strongly ordered, we can't order external events.
| Did I miss something?
| wildmanx wrote:
| A == "Design of your system"
|
| B == "Inconsistency between your caches"
|
| GP: A -> B
|
| You: "Imposing order and synchrony is [bad things]" ==
| "If you don't do A then [bad things]", i.e., (not A) ->
| [bad things]
|
| Also you: "Even if we did, then [other issues]" == "Even
| if you don't do A then [other issues]", i.e., (not A) ->
| ..what? B?
|
| You are certainly not arguing against the A -> B
| implication.
| uvdn7 wrote:
| I really appreciate your comment. I am learning as we
| speak.
|
| I was essentially saying !A !-> !B - even if we have a
| strongly ordered system, there will still be
| inconsistencies/races/problems.
|
| And I can see why it's a bad/weird argument. Thank you
| for pointing it out. Really appreciate it!
| seqizz wrote:
| Yeah I was also intrigued by the title and couldn't find what I
| expected tbh.
| cratermoon wrote:
| Yeah it reads to me like they redefined "cache consistency" to
| mean something different.
| uvdn7 wrote:
| That's not intended. What's your definition of cache
| consistency?
| rossmohax wrote:
| How can both of these be true?
|
| > Data in cache is not durable, which means that sometimes
| version information that is important for conflict resolution can
| get evicted.
|
| > We also added a special flag to the query that Polaris sends to
| the cache server. So, in the reply, Polaris would know whether
| the target cache server has seen and processed the cache
| invalidation event.
|
| To make the special flag work, the cache server needs to track
| not only the current version state but also past versions. If it
| tracks past versions, then conflicts can be resolved at the cache
| server level, but the whole premise of the article is that cache
| servers can't resolve conflicts by themselves.
| uvdn7 wrote:
| Great question!
|
| I will try to answer this one without too many implementation
| details.
|
| First of all, cache items can be evicted and are not durable,
| which is just a fact. But that doesn't mean we can't track
| progress (in terms of "time", or "logical time"/"order" in
| distributed-systems terms). The social graph is sharded (not
| surprisingly). We can keep the progress of each cache host per
| shard, which is just a map kept in memory that doesn't get
| evicted.
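|
| (A rough sketch of that bookkeeping, with hypothetical names -
| the real protocol isn't spelled out in the post. The data items
| themselves can be evicted, but a small per-shard high-watermark
| map survives, so the host can still answer "have you processed
| invalidation event v for shard s?")
|
|     class CacheHost:
|         def __init__(self):
|             self.items = {}      # (shard, key) -> value; may be evicted
|             self.processed = {}  # shard -> highest invalidation version applied
|
|         def apply_invalidation(self, shard, key, version):
|             self.items.pop((shard, key), None)
|             if version > self.processed.get(shard, 0):
|                 self.processed[shard] = version   # tiny map, never evicted
|
|         def has_seen(self, shard, version):
|             # What a Polaris-style "special flag" query could check: whether
|             # this host has already processed the invalidation in question.
|             return self.processed.get(shard, 0) >= version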
|
| I hope this answers your question.
| isodev wrote:
| yodsanklai wrote:
| Totally off-topic, but...
|
| https://www.facebook.com/help/152637448140583
|
| > No, we don't sell your information. Instead, based on the
| information we have, advertisers and other partners pay us to
| show you personalized ads on the Facebook family of apps and
| technologies.
|
| A lot can be said about Meta but I find it counterproductive to
| make up facts. It's not correct that they "sell personal
| details".
| gtirloni wrote:
| Your comment does not contribute anything useful to this
| technical subject.
| uvdn7 wrote:
| I am the author of the blog post. I believe the methodology
| described should be applicable to most if not all invalidation-
| based caches. I am serious when I say that cache invalidation
| might no longer be a hard thing in computer science. AMA!
| robmccoll wrote:
| Saying a problem isn't hard to solve because you have tools to
| analyze the correctness of your solution seems like a stretch.
| uvdn7 wrote:
| But that's not what I am saying though ...
|
| > you have tools to analyze the correctness of your solution
|
| That's half of it. Cache invalidation is hard not only
| because of the complexity of cache coherence protocols (some
| of which are not very complicated). Cache invalidation also
| introduces countless races that manage to introduce cache
| inconsistencies in ways that are just hard to imagine ahead
| of time (in my experience). IMO, that's the harder part of
| cache invalidation - when cache inconsistencies happen,
| answering the "why" question is much harder than having a
| cache invalidation protocol (you can have TLA+ for one if you
| will). And answering "why my cache is inconsistent" is
| the problem we solved, which I think is the harder part of
| the cache invalidation problem.
| latchkey wrote:
| > answering the "why my cache is inconsistent" is the
| problem we solved
|
| That should be the title and focus of your post. Instead,
| it feels like grandiose claims about solving cache
| invalidation itself.
| uvdn7 wrote:
| > Instead, it feels like grandiose claims about solving
| cache invalidation itself.
|
| That is definitely something I worried about. But at the
| same time, I do think we solved the harder part of the
| cache invalidation problem. TAO and Memcache serve
| quadrillions of queries a day. Based on our experience,
| answering the question of "why my cache is inconsistent"
| is the hardest thing about cache invalidation.
|
| Cache invalidation protocols can be complicated, but some
| are pretty manageable. You can also verify one using TLA+
| if you will. But once the rubber hits the road, some
| cache entries tend to be inconsistent.
|
| I definitely worry about people taking this the wrong
| way, but at the same time, I stand by the claim of "cache
| invalidation might no longer be a hard thing in computer
| science".
| latchkey wrote:
| The gist of what I got from the post was that you created
| a tool which monitors caches, and that helped find bugs
| that cause inconsistent cache issues. How is that solving
| a computer science problem?
| uvdn7 wrote:
| The tool that monitors cache consistency is easy to
| build. That by itself doesn't solve anything major. The
| most important contribution is a novel approach to
| consistency tracing that helps find out "why" caches are
| inconsistent - pinpointing a bug is much harder than
| saying "there is a bug". This is based on the key insight
| that a cache inconsistency can only be introduced (for an
| invalidation-based cache) in a short time window after
| the write/mutation; this is what makes the tracing
| possible.
|
| > How is that solving a computer science problem?
|
| It depends on your definition of a computer science
| problem. I am definitely not solving P = NP. By your
| definition, does Google's Paxos Made Live paper solve a
| computer science problem?
|
| The claim is more a play on Phil Karlton's quote, as the
| work here makes cache invalidation much easier (in my
| opinion). Also Phil Karlton's quote doesn't necessarily
| _make_ a problem a computer science problem, don't you
| think? I think it's a good quote and there's a lot of
| truth in it.
| jitl wrote:
| How do you handle caching derived data assembled from multiple
| individual records? For example, how would you maintain a cache
| for a query like getFriendsOfFriends(userId)?
|
| My context is working on Notion's caches for page data. Our
| pages are made out of small units called "blocks" that store
| pointers to their child content. To serve all the blocks needed
| to render a page, we need to do a recursive traversal of the
| block tree. We have an inconsistent cache of "page chunks"
| right now, how would you think about making a consistent cache?
| uvdn7 wrote:
| This is a fantastic question! We face something very similar
| (if not identical).
|
| There are two parts to solving this problem: 1. monitor and
| measure how consistent the cache is; 2. figure out why entries
| are inconsistent.
|
| I will focus on #1 in this comment. You can build something
| very similar to Polaris (mentioned in the blog) that tails
| your database's binlog so it knows when e.g. "friendship"
| data is mutated; it can then perform the computation to
| figure out which cache entries "should have been" updated.
| E.g. if Alice just friended Bob, then Alice's friends-of-
| friends and Bob's friends-of-friends should reflect the
| change. And your monitoring service will "observe" that and
| alert on anomalies.
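|
| (A sketch of that monitor loop; the binlog/cache/db interfaces
| are made up to stand in for whatever your stack actually has.)
|
|     import time
|
|     def monitor_friend_cache(binlog, cache, db, grace_s=300):
|         # Tail mutations to the "friendship" table; for each one, compute
|         # which derived entries should have changed, then verify them as
|         # a client would.
|         for event in binlog.tail(table="friendship"):
|             for user in (event.user_a, event.user_b):
|                 expected = db.friends_of_friends(user)  # recompute from truth
|                 check_eventually(cache, ("fof", user), expected, grace_s)
|
|     def check_eventually(cache, key, expected, grace_s):
|         deadline = time.time() + grace_s
|         while time.time() < deadline:
|             if cache.get(key) in (None, expected):      # miss or match is fine
|                 return
|             time.sleep(5)
|         print(f"anomaly: {key} still inconsistent after {grace_s}s")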
| ahahahahah wrote:
| So if you just implement the cache invalidation logic in
| your monitoring system, you can tell if you got the cache
| invalidation logic correct in your caching system. That
| sounds really helpful!
| uvdn7 wrote:
| That's not the case though. Polaris acts as a client, only
| monitors client-observable effects, and assumes no
| knowledge of the server internals.
|
| I am trying to help.
| continuational wrote:
| At the risk of stating the obvious - the hard part of cache
| invalidation is to know _when_ to invalidate the cache.
| uvdn7 wrote:
| We invalidate cache upon mutations. When else would you do
| it?
| continuational wrote:
| Please see https://news.ycombinator.com/item?id=31672541
| rajesh-s wrote:
| Off topic but what tool did you use to create those cache
| hierarchy diagrams?
| simonw wrote:
| Is your argument here that cache invalidation may no longer be
| hard because you can implement systems like Polaris which
| continually test your cache implementation to try and catch
| invalidation problems so you can then go and fix them?
|
| EDIT: Already answered in this reply:
| https://news.ycombinator.com/item?id=31671794
| uvdn7 wrote:
| Polaris is actually fairly simple to build. The harder
| question to answer is "why is the cache inconsistent" and how
| you debug it. The second half of the post talks about
| consistency tracing, which tracks all cache data state
| mutations.
|
| Distributed systems are state machines. With consistency
| tracing keeping track of all the state transitions, debugging
| cache inconsistencies in an invalidation-based cache is very
| actionable and manageable, based on our experience.
| politician wrote:
| I read the article. It seems to be suggesting that cache
| invalidation might no longer be a hard thing in computer
| science because of the insights uncovered by your tracing and
| observability solution. IOW, now that you can measure and audit
| the system, you're able to find and fix bugs in the system. Is
| that the correct take-away?
| uvdn7 wrote:
| Yep. You're exactly right. And the approach is generic and I
| think it should work for everyone.
|
| The idea is that with all these observability capabilities,
| debugging cache inconsistencies is getting very actionable
| and close to how we debug an error with a message telling us
| exactly where an exception happened.
| ArrayBoundCheck wrote:
| I didn't understand the tracing part. Is tracing 100% inside of
| Polaris? If not, does it start at the database? The first cache?
|
| Does the invalidation need to be predictable before you can use
| tracing? What kind of parameters do you have so you don't have
| too much logging or too little?
| uvdn7 wrote:
| Tracing is not in Polaris. It's a separate library that runs
| in every cache host. Its main job is logging every cache
| state mutation for traced writes. So when Polaris detects
| cache inconsistencies, we know what happened and why the
| cache is inconsistent.
|
| It starts all the way from the client-initiated write (where we
| mint a unique id, and plumb it all the way through).
|
| > What kind of parameters do you have so you don't have too
| much logging or too little?
|
| This is the key question! If you go to the second half of the
| post, it talks about an insight about how we managed to log
| only when and where cache inconsistencies _can_ be
| introduced. There's only a small window after mutation/write
| where cache can become inconsistent due to invalidation. So
| it's very cheap to trace and provide the information we need.
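|
| (A sketch of that windowed tracing idea, illustrative only: mint
| an id on the client-initiated write, and have every cache host
| log its state transitions for that key, but only while the
| write's window is still open.)
|
|     import time, uuid
|
|     TRACE_WINDOW_S = 60
|     active_traces = {}           # key -> (trace_id, window_expiry)
|
|     def start_trace(key):
|         # Called on the client-initiated write path; the id is plumbed
|         # through the invalidation pipeline along with the event.
|         trace_id = uuid.uuid4().hex
|         active_traces[key] = (trace_id, time.time() + TRACE_WINDOW_S)
|         return trace_id
|
|     def log_cache_mutation(key, old, new, reason):
|         # Called from every cache state transition (fill, invalidate,
|         # evict). Cheap, because it only logs inside the short window
|         # after a write - the only time an invalidation race can
|         # corrupt the entry.
|         trace = active_traces.get(key)
|         if trace and time.time() < trace[1]:
|             print(f"[{trace[0]}] {key}: {old!r} -> {new!r} ({reason})")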
| [deleted]
| dang wrote:
| The submitted title ("Cache invalidation might no longer be a
| hard thing in Computer Science") broke the site guidelines badly.
| They say:
|
| " _Please use the original title, unless it is misleading or
| linkbait; don't editorialize._" -
| https://news.ycombinator.com/newsguidelines.html
|
| Editorializing the original title to make it _more_ linkbait is
| going the wrong way down a one-way street. Please don't.
|
| (If you're the author, then please either use the original title
| or, if it's linkbait, change the HN title to make it less so.)
| gjs278 wrote:
| bigbillheck wrote:
| As far as I'm concerned I haven't seen any change.
| uvdn7 wrote:
| Accepted. Just to be clear, it's not linkbait as far as I
| understand it, for two reasons: 1. I am dead serious about
| making cache invalidation a simpler problem; 2. "Cache
| invalidation might no longer be a hard problem in Computer
| Science" is the subtitle of the corresponding Systems @Scale
| talk I just gave.
|
| I respect the guidelines and I am OK with the rename. I just
| want to clarify that I don't think the old title is misleading.
| PKop wrote:
| It's still massively click-bait until such time as the claim
| isn't pure speculation. Also, casually optimistic claims
| about complex and hard problems are the essence of click-
| bait. There isn't much unique about _your_ instance of it
| uvdn7 wrote:
| > until such time as the claim isn't pure speculation
|
| Which I don't think it is.
|
| > casually optimistic claims about complex and hard
| problems are the essence of click-bait
|
| A group of people at Facebook worked on this for many
| years. TAO and Memcache serve quadrillions of queries a
| day. I wanted to make this claim - cache invalidation might
| no longer be a hard problem in Computer Science - years
| ago; but I didn't because I don't want to jump the gun. We
| baked the solution for years. There's nothing casual about
| this.
| pvg wrote:
| Just about all papers and write-ups that describe big
| advances on important problems aren't titled 'Big advance
| in Problem X' or 'X now an ex-problem'.
| uvdn7 wrote:
| This is a fair point. Lesson learned. Thank you :D
| o_____________o wrote:
| Small comment of appreciation for your attitude and quick
| adjustment, nice to see on the internet!
| tomcam wrote:
| Agreed. Gracefully handled. I happened to agree with the
| arguments, but treasure civil discourse.
| AtNightWeCode wrote:
| So basically, classic timestamp versioning with some consistency
| checking. Might work. Cache invalidation is hard. The only
| problem I ever faced while working in the biz that I could not
| solve even at the theory level was cache-related.
|
| What I thought the article would tackle is the dependencies
| between stored objects. Some solve it with super complicated
| graphs, others by just flushing the entire cache.
|
| At Meta, why not just use flat files or a store that acts as
| one, pub/sub the changes, listen for the changes, and update
| aggregated views as needed and store them? Then just short-term
| cache everything at the edge.
| uvdn7 wrote:
| > What I thought the article would tackle is the dependencies
| between stored objects.
|
| Like tracking dependencies, so it knows what to invalidate on
| mutations?
| AtNightWeCode wrote:
| Yes. The Boss changes his name from Mark. It will directly
| hit a lot of caches. It will need an update of all bosses
| that materialized this, then all the staff that materialized
| the boss of the boss, and so on. This is a simple example
| because it is a tree. There can be circular dependencies as
| well.
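|
| (One common way to handle that shape of problem is a reverse-
| dependency walk with a visited set, so even circular dependencies
| terminate; a sketch, not what the article proposes:)
|
|     from collections import deque
|
|     def invalidate_with_dependents(cache, depends_on_me, root_key):
|         # depends_on_me: key -> keys whose materialized views include it,
|         # e.g. "user:mark" -> {"reports-of:mark", ...}. The visited set
|         # makes the walk terminate even if the graph has cycles.
|         seen, queue = {root_key}, deque([root_key])
|         while queue:
|             key = queue.popleft()
|             cache.pop(key, None)          # drop the materialized entry
|             for dependent in depends_on_me.get(key, ()):
|                 if dependent not in seen:
|                     seen.add(dependent)
|                     queue.append(dependent)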
___________________________________________________________________
(page generated 2022-06-08 23:00 UTC)