[HN Gopher] CAP twelve years later: How the "rules" have changed...
___________________________________________________________________
CAP twelve years later: How the "rules" have changed (2012)
Author : thunderbong
Score : 76 points
Date : 2024-01-20 10:48 UTC (1 days ago)
(HTM) web link (www.infoq.com)
(TXT) w3m dump (www.infoq.com)
| dang wrote:
| Related:
|
| _CAP Twelve Years Later: How the "Rules" Have Changed (2012)_ -
| https://news.ycombinator.com/item?id=10179628 - Sept 2015 (4
| comments)
|
| _CAP Twelve Years Later: How the "Rules" Have Changed_ -
| https://news.ycombinator.com/item?id=4043260 - May 2012 (3
| comments)
| samus wrote:
| Have they changed again in the last 12 years apart from the
| higher importance of cloud solutions and privacy legislation?
| mirekrusin wrote:
| Didn't change; CAP was flawed from the very beginning, you need
| more terms [0] to classify those systems.
|
| [0] https://jepsen.io/consistency
| vinay_ys wrote:
| At the hardware level, one thing that has changed over the years
| is the exponential growth in network speeds and feeds - within a
| datacenter and across datacenters.
|
| Layer-3 IP Clos networks are pretty much a standard within a
| datacenter, making multiple redundant paths available. Across
| datacenters, optical circuit switched networks with lots of
| redundant paths are very common.
|
| Modern network fabrics also have end-to-end mechanisms for
| bandwidth engineering based on per message QoS/priority set by
| applications as well as very accurate time-keeping.
|
| Observability mechanisms and operational practices have also
| improved tremendously. A single node becoming unreachable from
| all other nodes is quite possible. But a single node being
| reachable from some subset of other nodes in the cluster and not
| others is nearly impossible. The only way a network partition can
| happen is through misconfiguration. Purely layer-3 IP Clos
| networks with multiple BGP-advertised ECMP routes are a lot
| simpler than the layer-2 networks of the past. With modern switch
| operating systems, modern devops practices, Infrastructure-as-
| Code, etc., misconfiguration is nearly impossible.
|
| At a higher layer, disaggregated storage and compute
| architectures are standard for any distributed database system.
| In this model, the storage fabric is quite simple and much more
| distributed and redundant. Today, it is a lot easier to guarantee
| a consistent and highly available transactional write to a quorum
| of simple storage nodes spread across multiple datacenters in a
| region. On top of that, building database transactions with
| higher-level isolation guarantees in a tier of ephemeral compute
| nodes is relatively simple.
|
| This kind of system is extremely resilient to failures at all
| levels - node failures are detected and lost processing capacity
| is recovered in seconds; failed transactions are retried in
| milliseconds; loss of redundancy due to failed storage nodes is
| rebuilt in seconds; loss of network capacity due to failed
| circuits or intermediate switches is routed around in
| microseconds.
|
| So, in today's context, we build databases that provide flexible
| transaction isolation guarantees, selected appropriately on a
| per-transaction basis by the application. The application also
| specifies the latency budget it has available per transaction.
| With all these improvements, thankfully, application developers
| don't have to worry about the CAP theorem like they did in the
| heyday of NoSQL databases in 2012.
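|
| As a rough sketch of what that per-transaction contract can look
| like from the application side (the client below, its Isolation
| enum and latency_budget_ms knob are hypothetical, not any
| particular product's API):
|
|     from contextlib import contextmanager
|     from enum import Enum
|
|     class Isolation(Enum):
|         READ_COMMITTED = "read_committed"
|         SNAPSHOT = "snapshot"
|         SERIALIZABLE = "serializable"
|
|     class Database:
|         """Hypothetical client for a disaggregated store; stubbed."""
|
|         @contextmanager
|         def transaction(self, isolation: Isolation, latency_budget_ms: int):
|             # A real coordinator would use both hints to pick quorum
|             # sizes and wait times that fit inside the budget.
|             print(f"begin {isolation.value}, budget={latency_budget_ms}ms")
|             yield self
|             print("commit")
|
|         def execute(self, sql: str):
|             print("  ", sql)
|
|     db = Database()
|
|     # Strong isolation, generous budget: e.g. a funds transfer.
|     with db.transaction(Isolation.SERIALIZABLE, latency_budget_ms=200) as tx:
|         tx.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
|         tx.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
|
|     # Weak isolation, tight budget: e.g. rendering a feed.
|     with db.transaction(Isolation.READ_COMMITTED, latency_budget_ms=20) as tx:
|         tx.execute("SELECT * FROM posts ORDER BY ts DESC LIMIT 50")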
| chx wrote:
| Could you provide links to these things for those who are stuck
| in the past and want to learn more?
| vinay_ys wrote:
| Well, there's no one comprehensive link I can suggest. But
| you can find plenty of research publications on these topics on
| arxiv.org. You can look at the ones from big companies like
| Google, Microsoft, and Meta/Facebook. You can also search YouTube
| for conference talks on the same topics.
| vendiddy wrote:
| Thanks for the write-up. Was interesting to read!
| belter wrote:
| This is a variation on Google's argument that "our network is
| so resilient we don't need to worry about partitions, so we can
| offer C and A and P", but it is a weak one, increasingly so in
| the face of real present-day experience.
|
| Due to the quorum requirements, the probability of failure
| increases with larger quorum sizes, as the number of
| intersections (and therefore potential points of failure)
| grows with more complex quorum arrangements.
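|
| Back-of-the-envelope, assuming independent node failures
| (correlated failures, mentioned further down, only make this
| worse): for a fixed replica set, the chance that a quorum of
| size k is reachable drops as k grows.
|
|     from math import comb
|
|     def quorum_reachable(n: int, k: int, p: float) -> float:
|         """P(at least k of n replicas are up), each independently
|         up with probability p."""
|         return sum(comb(n, i) * p**i * (1 - p)**(n - i)
|                    for i in range(k, n + 1))
|
|     p, n = 0.999, 9
|     for k in (5, 7, 9):   # majority, super-majority, all nodes
|         print(f"k={k}: {quorum_reachable(n, k, p):.9f}")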
| vinay_ys wrote:
| > Due to the quorum requirements, probability of failure
| increases with larger quorum sizes
|
| Well, that's not true due to two key design points:
|
| - The size of the read or write quorum can be chosen on a per-
| transaction basis depending on what level of availability and
| consistency you want. The design goal is to lower the
| likelihood that the minimum number of quorum members you need
| for your transaction will be unavailable.
|
| - Given that transaction-processing nodes are ephemeral and
| independent of storage nodes, if a transaction-processing node
| designated as a member of a quorum dies, it can be replaced
| almost instantly with a new one, and all in-flight operations
| will be retried by the smart clients.
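|
| The usual way to express the first point is Dynamo-style R/W
| quorums over N replicas, chosen per operation: R + W > N
| guarantees that reads overlap the latest acknowledged write,
| while smaller values trade that away for latency and
| availability. A tiny sketch (no real client, just the
| arithmetic):
|
|     N = 5  # replicas per shard
|
|     def overlaps(r: int, w: int, n: int = N) -> bool:
|         """True if every read quorum intersects every write quorum,
|         i.e. a read is guaranteed to see the latest acked write."""
|         return r + w > n
|
|     for r, w in [(1, 5), (2, 4), (3, 3), (1, 1)]:
|         print(f"R={r} W={w}: guaranteed overlap = {overlaps(r, w)}")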
| belter wrote:
| While all those design strategies are indeed valid, and
| adaptable, dynamic quorum sizes have the potential to manage
| certain trade-offs, the practical implementation of such
| systems presents real challenges due to the inherent
| complexities of distributed systems. Tackling these
| complexities with those strategies should be approached with
| a degree of humility.
|
| Overheads and latency issues in node replacement,
| particularly under conditions of unexpectedly high load or
| network difficulties, compound these challenges. These
| issues often manifest as correlated failures, which are
| more common than isolated node failures.
|
| In this landscape we are still in compromise-and-trade-off
| territory. I would refer to these two papers as insightful
| demonstrations of these challenges:
|
| "Consistency vs. Availability in Distributed Real-Time
| Systems" - https://ar5iv.labs.arxiv.org/html/2301.08906v1
|
| "Consistency models in distributed systems: A survey on
| definitions, disciplines, challenges and applications" -
| https://ar5iv.labs.arxiv.org/html/1902.03305
| dalyons wrote:
| What database systems are you aware of that take advantage of
| these modern network advances? Sounds like Spanner, but any
| others?
| SJC_Hacker wrote:
| This may not hold across geographic boundaries, which is the
| real problem. Like between Asia, Europe, and North America. It's
| not so much a complete-outage problem as it is about bandwidth
| dropping very low on occasion.
|
| When you're talking about, say, high-frequency trading systems,
| this is a big deal.
|
| And anyone who deploys a service with all their data in a
| single data center is a pretty small-time player, or they
| don't need the uptime.
| Lucasoato wrote:
| I'm still trying to understand if the lack of a PutIfAbsent API
| in S3 (in other words, the ability to put an object only if no
| object is already present at that key) is a consequence of the
| strong read-after-write consistency they offer. Is this trade-off
| a corollary of the CAP theorem?
|
| That PutIfAbsent API would solve so many problems for metadata
| formats used in the big-data field, like Delta or Iceberg!
| bostik wrote:
| Possibly a side effect of S3's eventual consistency model.
| Until a couple of years ago there was no guarantee that an
| object you had written to S3 would be visible to another process
| looking for it.
|
| About two years ago AWS made a big announcement that they could
| finally guarantee a "read your own writes" consistency model. But
| ONLY within a single account.
|
| If you know that you are not racing against eventual
| consistency, you can use the HeadObject API call to check whether
| a given key exists in the bucket.
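|
| With boto3, the check-then-put pattern looks roughly like the
| sketch below (bucket and key names are placeholders). Note that
| it is inherently racy: two writers can both pass the HEAD check
| and the second PUT silently wins, which is exactly the gap a
| server-side PutIfAbsent would close.
|
|     import boto3
|     from botocore.exceptions import ClientError
|
|     s3 = boto3.client("s3")
|     BUCKET, KEY = "my-bucket", "metadata/v00042.json"  # placeholders
|
|     def put_if_absent(body: bytes) -> bool:
|         """Best-effort emulation of PutIfAbsent: HEAD then PUT.
|         Not atomic, so concurrent writers can still clobber each
|         other between the two calls."""
|         try:
|             s3.head_object(Bucket=BUCKET, Key=KEY)
|             return False                    # key already exists
|         except ClientError as e:
|             if e.response["Error"]["Code"] != "404":
|                 raise                       # a real failure, not "missing"
|         s3.put_object(Bucket=BUCKET, Key=KEY, Body=body)
|         return True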
| hasty_pudding wrote:
| I always thought CAP was weird because you can have both
| consistency and availability if you sync faster than requests
| come in to the system.
| mcny wrote:
| Like a spherical cow, there are some assumptions we need to
| make in a 101 class. For example, I remember my database
| professor saying that I need to assume my entire database does
| not fit in memory (RAM) because if it does, all bets are off.
| naniwaduni wrote:
| If you assume you can "sync faster than requests come in to the
| system", you've necessarily chosen C not-P, i.e. you don't have
| a distributed system (and in practice you've probably chosen
| not-A too).
| j16sdiz wrote:
| CAP is a false choice. In practice, people choose (mostly
| C)+(mostly A)+(mostly P).
| woooooo wrote:
| "Mostly consistent" is not good enough for many
| applications.
|
| The thing most people pick in practice is CA, single master
| with hopefully a replicated backup.
| seneca wrote:
| Absolutely. And that's the right choice IF you can make
| it. Switching to CP or AP is something you should do
| because you're forced to.
|
| That's something lost on a lot of modern engineers, I
| think. Distributed Systems are something you do because
| you have to, not because they're the best approach
| inherently.
| dartos wrote:
| That's because for many use cases, you don't need to be
| perfect.
|
| The theorem is about being perfectly C, A, and P
| hndc wrote:
| You've misunderstood the theorem. It doesn't say you can't
| achieve any combination of mostly C/A/P; it says you can't
| have perfect C, A, and P at the same time.
| hasty_pudding wrote:
| If you choose consistency, then the only negative result of a
| distributed system is consistency-based latency, i.e. the time
| to become consistent, which adds to your response time to the
| client.
|
| So there's a linear relationship between reducing sync time
| and reducing response time, until at some point, by reducing
| sync time, you push response time below a level you give a
| shit about...
|
| and you have all 3 of consistency, availability, and
| partition tolerance
|
| and then you have beaten the CAP theorem
| flqn wrote:
| In this case you are assuming P away. The traditional
| idea of "pick 2 of 3" with respect to CAP is weirdly formulated,
| because you either have a distributed system, in which case P
| is a given and you must trade C vs A, or you don't have a
| distributed system and can therefore have both C and A.
| AtlasBarfed wrote:
| The biggest fallacy of distributed systems: the network is
| always up.
|
| You'd think that would be blindingly obvious, but based on
| every system in existence ... It's not.
|
| Primarily I think it is marketing/sales since thinking properly
| about distributed systems is hard and executives don't want to
| hear about it.
| slashdev wrote:
| CAP really should have been formulated as:
|
| When too many nodes fail or are unreachable, do you sacrifice
| consistency, or availability?
|
| And even then it's misleading, because you can still be
| consistent and partially sacrifice availability (allow local or
| mergeable updates) or be available but partially consistent (allow
| updates, but have potential conflicts to resolve, maybe manually,
| when the partition is healed.)
|
| You can even make different A <-> C tradeoffs per
| request/transaction.
|
| Distributed systems are complex, who knew?
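|
| A toy version of that per-request knob (nothing here is a real
| client, just the shape of the choice):
|
|     import random
|
|     class Replica:
|         def __init__(self):
|             self.data = {}          # possibly stale copy
|             self.reachable = True
|
|     class Cluster:
|         """One leader plus followers that may lag or be cut off."""
|         def __init__(self, n: int = 3):
|             self.leader = Replica()
|             self.followers = [Replica() for _ in range(n - 1)]
|
|         def read(self, key, consistent: bool):
|             if consistent:
|                 # C over A: refuse to answer rather than risk staleness.
|                 if not self.leader.reachable:
|                     raise RuntimeError("leader unreachable")
|                 return self.leader.data.get(key)
|             # A over C: any reachable replica will do, possibly stale.
|             up = [r for r in [self.leader, *self.followers] if r.reachable]
|             return random.choice(up).data.get(key)
|
|     c = Cluster()
|     c.leader.data["x"] = 1          # write applied at the leader only
|     c.leader.reachable = False      # simulate a partition
|     print(c.read("x", consistent=False))   # answers, but stale: None
|     # c.read("x", consistent=True) would raise instead of lying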
| eimrine wrote:
| What does Bitcoin sacrifice? It seems to be always available and
| always consistent.
| Rexxar wrote:
| Partition
| dartos wrote:
| It's not consistent.
|
| It takes time for all nodes to come to agreement, and in that
| time I am able to request _technically_ out-of-date info from
| a node.
|
| Consistency is often the one CAP property that isn't prioritized,
| because if the network isn't consistent for a few seconds
| the application probably still works, and that's the case for
| Bitcoin.
|
| The network is "eventually consistent".
| amluto wrote:
| > When too many nodes fail or are unreachable
|
| Don't forget the nasty bit about partitions: if I think nodes
| 1-5 are reachable and the rest are unreachable, I can't assume
| that the rest of the nodes are down or idle -- some other
| client may think that nodes 1-5 are unreachable but nodes 6-15
| are reachable.
| slashdev wrote:
| That's correct
| sethammons wrote:
| Related: A Critique of the CAP Theorem by Martin Kleppmann
|
| https://arxiv.org/abs/1509.05393
|
| TL;DR: terms in CAP are not adequately defined and he proposes a
| delay-sensitive framework.
| belter wrote:
| This paper takes the idea further and formalizes it better:
| https://ar5iv.labs.arxiv.org/html/2301.08906v1
| leashless wrote:
| I remember about 2017 or 2018 giving a talk about Ethereum and
| asking how many people in the audience of maybe 400 had heard of
| the CAP theorem.
|
| One hand went up in the back. That's when I knew our culture was
| doomed.
|
| CAP is (for the record) why the blockchain is and is not "just a
| database" -- it's a database which takes a different position on
| CAP and as a result can be used for different things than other
| databases.
| tomhallett wrote:
| Would love to learn more. Guess: Is the "different position"
| that consistency can be achieved when a fault occurs by having
| the longest chain, instead of a leader/voting mechanism?
| leashless wrote:
| "Eventual consistency"
|
| 1) there is a network
|
| 2) on each block, a leader is selected to make the next block
| on the basis of a lottery -- each lottery ticket is a unit of
| computing done ("proof of work")
|
| 3) if the network splits, each subnet has a different leader
| and a different sequence of blocks from that point onwards
|
| 4) when the network comes back together, a new "agreed
| history" is formed from the last shared block, selecting the
| branch with the most "work" done
|
| and resulting in the transactions which were done and
| confirmed in the smaller subnet being thrown out and history
| rewritten as if they had never existed
|
| This is the genius of Satoshi -- he figured out that
| something this trashy was good _enough_ to get something
| done. Nobody in an academic setting would ever have
| considered that a worthwhile solution, I don't think.
|
| It was madness. But it worked. It's the same kind of genius
| as Ethernet just rebroadcasting packets until something works
| rather than doing Token Ring type negotiation about who can
| talk next.
|
| Worse really is better at times.
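|
| The fork-choice rule in step 4, grossly simplified (real nodes
| compare cumulative proof-of-work difficulty; here "work" is just
| a number carried per block, and each list is one subnet's blocks
| since the last shared block):
|
|     def heaviest_fork(forks):
|         """Pick the fork with the most total work. Transactions that
|         only exist on the losing forks are simply dropped, i.e. the
|         'history rewritten' part above."""
|         return max(forks, key=lambda blocks: sum(b["work"] for b in blocks))
|
|     fork_a = [{"id": "a1", "work": 10}, {"id": "a2", "work": 10}]
|     fork_b = [{"id": "b1", "work": 10}, {"id": "b2", "work": 10},
|               {"id": "b3", "work": 10}]
|     print(heaviest_fork([fork_a, fork_b]))   # fork_b wins; a1, a2 vanish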
| SJC_Hacker wrote:
| The rules haven't changed, it's just that it's always about the
| details.
|
| Most services, short of financial ones, choose to sacrifice
| consistency for availability. Social networks are an example: it
| doesn't matter if I get "inconsistent" social media feeds
| depending on where I am, because my European friends' posts
| haven't had time to propagate across the database shards.
|
| OTOH, for financial systems, you best believe they are going to
| choose consistency 9 times out of 10, the exceptions being very
| small amounts. I don't want someone to exploit network/node
| downtime in order to steal millions of dollars.
|
| But the reality is that network outages rarely last very long.
| Then it's just a matter of making sure the nodes themselves are
| reliable.
| marcosdumay wrote:
| Financial systems have been exchanging coerence for
| availability for milenia.
|
| It's just recently that anybody started to care about
| instantaneus coerence, instead of eventual.
| erikerikson wrote:
| > coerence
|
| Did you mean coherence? As in consistency?
|
| I'd have autocorrected in reading except that you wrote it
| twice which implies intention. However a quick search didn't
| turn up a definition.
| marcosdumay wrote:
| Yep, English is not my first language.
| erikerikson wrote:
| No problem, thanks for learning it.
| SJC_Hacker wrote:
| Like I said, it depends on the amount. But I remember the days
| when you had to wait sometimes several days for a check to
| clear before the funds were available in your account.
|
| Of course you could always write bad checks - but the payee was
| on the hook for the fraud, not the bank ...
| jasonwatkinspdx wrote:
| There are situations that social networks need to worry about,
| consistency-wise. Imagine a newly battered spouse unfriends
| their partner before making a post explaining why they're
| getting a divorce. If these two events get mixed up, the
| perpetrator can end up reading the post.
|
| Also, ironically enough, most of the global financial system
| actually operates very inconsistently. It can take days for
| transactions to clear, and there are a bazillion quirks based on
| local law and bank policy. So in practice banks use double-entry
| accounting with reconciliation. If something got messed up, they
| issue a new compensating transaction.
| senorrib wrote:
| The most misunderstood theorem of all time.
| belter wrote:
| Please be more specific.
| efitz wrote:
| I don't think it should have been called "CAP" theorem. I think
| of it as "CAL" (consistency-availability-latency). It has always
| struck me that the P in CAP is different than the other two
| characteristics. C and A are continuous values, but P is binary
| (with a duration presumably).
|
| The "P" in CAP means "tolerance of partition events". But that's
| really not what most engineering teams are thinking about when
| they talk about CAP theorem. Teams that really want tolerance to
| network partitions are really talking about building parallel
| instances, for example, cloud providers implementing multiple
| regions. If they need some level of consistency across those
| instances, then really, it becomes a discussion about latency.
|
| If you think about it, what you're really talking about with the
| network partition events is how stale or latent the data on each
| side will become, and whether or not, it will become consistent
| again, particularly without human intervention.
|
| When you talk about it in terms of CAL, now all the properties
| are continuous, and you can discuss the trade-offs to each
| property with respect to network partitions.
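|
| One way to see C and A as continuous knobs: a read that takes an
| explicit staleness budget and degrades gracefully instead of
| flipping a binary switch (hypothetical replica-side cache, not
| any particular system):
|
|     import time
|
|     class ReplicaCache:
|         """Local replica that remembers when each key was last synced
|         from the primary, so callers trade freshness for latency."""
|         def __init__(self):
|             self.values = {}       # key -> value
|             self.synced_at = {}    # key -> time of last sync
|
|         def read(self, key, max_staleness_s: float, fetch_from_primary):
|             age = time.time() - self.synced_at.get(key, 0.0)
|             if key in self.values and age <= max_staleness_s:
|                 return self.values[key]   # fast, within staleness budget
|             # Too stale: pay the cross-region latency, or fail if the
|             # primary is unreachable -- that residual choice is still CAP.
|             value = fetch_from_primary(key)
|             self.values[key] = value
|             self.synced_at[key] = time.time()
|             return value
|
|     cache = ReplicaCache()
|     primary = {"balance": 42}
|     print(cache.read("balance", 0.5, primary.get))  # miss: goes to primary
|     print(cache.read("balance", 0.5, primary.get))  # hit: served locally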
| belter wrote:
| https://ar5iv.labs.arxiv.org/html/2301.08906v1 - "...We have
| extended the CAP theorem to relate quantitative measures of
| these two properties to quantitative measures of communication
| and computation latency (L), obtaining a relation called the
| CAL theorem that is linear in a max-plus algebra. This paper
| shows how to use the CAL theorem in various ways to help design
| real-time systems..."
| efitz wrote:
| Thank you for the link. I'm glad other people are thinking
| this way, and with more formalism.
___________________________________________________________________
(page generated 2024-01-21 23:02 UTC)