[HN Gopher] CAP twelve years later: How the "rules" have changed...
       ___________________________________________________________________
        
       CAP twelve years later: How the "rules" have changed (2012)
        
       Author : thunderbong
       Score  : 76 points
        Date   : 2024-01-20 10:48 UTC (1 day ago)
        
 (HTM) web link (www.infoq.com)
 (TXT) w3m dump (www.infoq.com)
        
       | dang wrote:
       | Related:
       | 
       |  _CAP Twelve Years Later: How the "Rules" Have Changed (2012)_ -
       | https://news.ycombinator.com/item?id=10179628 - Sept 2015 (4
       | comments)
       | 
       |  _CAP Twelve Years Later: How the "Rules" Have Changed_ -
       | https://news.ycombinator.com/item?id=4043260 - May 2012 (3
       | comments)
        
       | samus wrote:
       | Have they changed again in the last 12 years apart from the
       | higher importance of cloud solutions and privacy legislation?
        
         | mirekrusin wrote:
          | It didn't change; CAP was flawed from the very beginning. You
          | need a richer set of terms [0] to classify those systems.
         | 
         | [0] https://jepsen.io/consistency
        
       | vinay_ys wrote:
       | At the hardware level, one thing that has changed over the years
       | is the exponential growth in network speeds and feeds - within a
       | datacenter and across datacenters.
       | 
       | Layer-3 IP Clos networks are pretty much a standard within a
       | datacenter, making multiple redundant paths available. Across
       | datacenters, optical circuit switched networks with lots of
       | redundant paths are very common.
       | 
       | Modern network fabrics also have end-to-end mechanisms for
       | bandwidth engineering based on per message QoS/priority set by
       | applications as well as very accurate time-keeping.
       | 
        | Observability mechanisms and operational practices have also
        | improved tremendously. A single node becoming unreachable from
        | all other nodes is quite possible. But a single node being
        | reachable from some subset of other nodes in the cluster and
        | not from others is nearly impossible. The only way a network
        | partition can happen is through misconfiguration. Purely
        | layer-3 IP Clos networks with multiple BGP-advertised ECMP
        | routes are a lot simpler than the layer-2 networks of the past.
        | With modern switch operating systems, modern devops practices,
        | Infrastructure-as-Code, etc., misconfiguration is nearly
        | impossible.
       | 
       | At a higher layer, disaggregated storage and compute
       | architectures are standard for any distributed database system.
       | In this model, storage fabric is quite simple and much more
       | distributed and redundant. Today, it is a lot easier to guarantee
       | a consistent and highly available transactional write to a quorum
       | of simple storage nodes spread across multiple datacenters in a
       | region. On top of that, building higher-level isolation
       | guaranteeing database transactions in a tier of ephemeral compute
       | nodes is relatively much simpler.
       | 
        | This kind of system is extremely resilient to failures at all
        | levels - node failures are detected and lost processing capacity
        | is recovered in seconds; failed transactions are retried in
        | milliseconds; redundancy lost to failed storage nodes is rebuilt
        | in seconds; network capacity lost to failed circuits or
        | intermediate switches is routed around in microseconds.
       | 
        | So, in today's context, we build databases that provide flexible
        | transaction isolation guarantees, selected appropriately on a
        | per-transaction basis by the application. The application also
        | specifies the latency budget it has available per transaction.
        | With all these improvements, thankfully, application developers
        | don't have to worry about the CAP theorem like they had to in
        | the heyday of NoSQL databases in 2012.
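        | 
        | To make the two knobs concrete, here is the same idea in a
        | mundane single-node form, just to illustrate per-transaction
        | isolation plus a time budget (PostgreSQL via psycopg2; the DSN
        | and table are made up):
        | 
        |     import psycopg2
        | 
        |     conn = psycopg2.connect("dbname=shop")  # made-up DSN
        |     with conn, conn.cursor() as cur:
        |         # isolation level for this transaction only
        |         cur.execute(
        |             "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        |         # crude latency budget: abort statements that overrun
        |         cur.execute("SET LOCAL statement_timeout = 50")  # ms
        |         cur.execute(
        |             "UPDATE accounts SET balance = balance - 100 "
        |             "WHERE id = %s", (42,))
        |     # the 'with conn' block commits on success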
        
         | chx wrote:
         | Could you provide links to these things for those who are stuck
         | in the past and want to learn more?
        
           | vinay_ys wrote:
           | Well, there's no one comprehensive link I can suggest. But
           | you can find plenty of links on arxiv.org for research
           | publications on these topics. You can look at the ones from
            | big companies like Google, Microsoft, and Meta/Facebook. You
            | can also search YouTube for various conference talks on the
            | same topics.
        
         | vendiddy wrote:
         | Thanks for the write-up. Was interesting to read!
        
         | belter wrote:
          | This is a variation on Google's argument that "our network is
          | so resilient we don't need to worry about partitions, so we
          | can offer C and A and P", but it is a weak one, increasingly
          | so in the face of real-world operational experience.
         | 
          | Due to the quorum requirements, the probability of failure
          | increases with larger quorum sizes, as the number of
          | intersections (and therefore potential points of failure)
          | grows with more complex quorum arrangements.
        
           | vinay_ys wrote:
           | > Due to the quorum requirements, probability of failure
           | increases with larger quorum sizes
           | 
           | Well, that's not true due to two key design points:
           | 
            | - The size of the read or write quorum can be chosen on a
            | per-transaction basis depending on what level of availability
            | and consistency you want. The design goal is to lower the
            | likelihood that the minimum number of quorum members you need
            | for your transaction will be unavailable.
           | 
            | - Given that the transaction-processing nodes are ephemeral
            | and independent of the storage nodes, if a transaction-
            | processing node designated as a member of a quorum dies, it
            | can be quickly replaced with a new one, and all in-flight
            | operations will be retried by the smart clients.
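            | 
            | A rough sketch of the first point (toy numbers; real systems
            | also weight this by observed replica health and latency):
            | 
            |     N = 5  # replicas holding the data
            | 
            |     def quorum_sizes(want_strong_reads):
            |         # R + W > N guarantees read/write overlap, i.e. a
            |         # read sees the latest acknowledged write
            |         if want_strong_reads:
            |             return {"R": 3, "W": 3}   # consistency first
            |         # smaller read quorum: lower latency, higher
            |         # availability, but a read may miss a recent write
            |         return {"R": 1, "W": 3}
            | 
            |     q = quorum_sizes(want_strong_reads=True)
            |     assert q["R"] + q["W"] > N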
        
             | belter wrote:
              | While all those design strategies are indeed valid, and
              | adaptable, dynamic quorum sizes have the potential to
              | manage certain trade-offs, the practical implementation of
              | such systems presents real challenges due to the inherent
              | complexities of distributed systems. Tackling these
              | complexities with those strategies should be approached
              | with a degree of humility.
             | 
              | Overheads and latency issues in node replacement,
              | particularly under conditions of unexpectedly high load or
              | network difficulties, compound these challenges. These
              | issues often manifest as correlated failures, which are
              | more common than isolated node failures.
             | 
             | In this landscape we are still in compromise and trade-offs
             | territory. I would refer to these two papers as insightful
             | demonstrations of these challenges:
             | 
             | "Consistency vs. Availability in Distributed Real-Time
             | Systems" - https://ar5iv.labs.arxiv.org/html/2301.08906v1
             | 
             | "Consistency models in distributed systems: A survey on
             | definitions, disciplines, challenges and applications" -
             | https://ar5iv.labs.arxiv.org/html/1902.03305
        
         | dalyons wrote:
         | What database systems are you aware of that take advantage of
          | these modern network advances? Sounds like Spanner, but any
         | others?
        
         | SJC_Hacker wrote:
          | This may not hold across geographic boundaries, which is the
          | real problem. Like between Asia, Europe, and North America.
          | It's not so much a complete outage problem as it is about
          | bandwidth dropping very low on occasion.
          | 
          | When you're talking about, say, high-frequency trading
          | systems, this is a big deal.
          | 
          | And anyone who deploys a service with all their data in a
          | single data center is a pretty small-time player, or they
          | don't need the uptime.
        
       | Lucasoato wrote:
        | I'm still trying to understand if the lack of a PutIfAbsent API
        | in S3 (in other words, the ability to put an item only if no
        | other item is present) is a consequence of the strong read-
        | after-write consistency they offer. Is this trade-off a
        | corollary of the CAP theorem?
        | 
        | That PutIfAbsent API would solve so many problems for metadata
        | formats used in the big-data field, like Delta or Iceberg!
        
         | bostik wrote:
          | Possibly a side effect of S3's eventual consistency model.
          | Until a couple of years ago there was no guarantee that an
          | object you had written to S3 would be visible to another
          | process looking for it.
          | 
          | About two years ago AWS made a big announcement that they
          | could finally guarantee a "read your own writes" consistency
          | model. But ONLY within a single account.
          | 
          | If you know that you are not racing against eventual
          | consistency, you can use the HeadObject API call to check
          | whether a given key exists in the bucket or not.
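          | 
          | For example, with boto3 (a quick sketch; the bucket and key
          | are made up, and note that a separate HEAD-then-PUT is still
          | not atomic, which is the whole point of the parent's ask):
          | 
          |     import boto3
          |     from botocore.exceptions import ClientError
          | 
          |     s3 = boto3.client("s3")
          | 
          |     def key_exists(bucket, key):
          |         try:
          |             s3.head_object(Bucket=bucket, Key=key)
          |             return True
          |         except ClientError as err:
          |             if err.response["Error"]["Code"] == "404":
          |                 return False
          |             raise
          | 
          |     if not key_exists("my-bucket", "metadata/v1.json"):
          |         s3.put_object(Bucket="my-bucket",
          |                       Key="metadata/v1.json", Body=b"{}")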
        
       | hasty_pudding wrote:
       | I always thought CAP was weird because you can have both
       | consistency and availability if you sync faster than requests
       | come in to the system.
        
         | mcny wrote:
         | Like a spherical cow, there are some assumptions we need to
         | make in a 101 class. For example, I remember my database
         | professor saying that I need to assume my entire database does
         | not fit in memory (RAM) because if it does, all bets are off.
        
         | naniwaduni wrote:
         | If you assume you can "sync faster than requests come in to the
         | system", you've necessarily chosen C not-P, i.e. you don't have
         | a distributed system (and in practice you've probably chosen
         | not-A too).
        
           | j16sdiz wrote:
           | CAP is a false choice. In practice, people choose (mostly
           | C)+(mostly A)+(mostly P).
        
             | woooooo wrote:
             | "Mostly consistent" is not good enough for many
             | applications.
             | 
             | The thing most people pick in practice is CA, single master
             | with hopefully a replicated backup.
        
               | seneca wrote:
               | Absolutely. And that's the right choice IF you can make
               | it. Switching to CP or AP is something you should do
               | because you're forced to.
               | 
               | That's something lost on a lot of modern engineers, I
               | think. Distributed Systems are something you do because
               | you have to, not because they're the best approach
               | inherently.
        
             | dartos wrote:
             | That's because for many use cases, you don't need to be
             | perfect.
             | 
             | The theorem is about being perfectly C, A, and P
        
             | hndc wrote:
             | You've misunderstood the theorem. It doesn't say you can't
             | achieve any combination of mostly C/A/P; it says you can't
             | have perfect C, A, and P at the same time.
        
           | hasty_pudding wrote:
            | if you choose consistency then the only negative result of a
            | distributed system is consistency-based latency, i.e. the
            | time to become consistent, which adds to your response time
            | to the client.
            | 
            | so there's a linear relationship between reducing sync time
            | and reducing response time, until at some point you reduce
            | sync time enough that response time drops below a level you
            | give a shit about...
            | 
            | and you have all 3 of consistency, availability, and
            | partition tolerance
            | 
            | and then you have beaten the CAP theorem
        
         | flqn wrote:
          | In this case you are assuming you don't have P. The
          | traditional idea of "pick 2 of 3" with respect to CAP is
          | weirdly formulated because you either have a distributed
          | system, in which case P is a given and you must trade C vs.
          | A, or you don't have a distributed system and can therefore
          | have both C and A.
        
         | AtlasBarfed wrote:
         | The biggest fallacy of distributed systems: the network is
         | always up.
         | 
         | You'd think that would be blindingly obvious, but based on
         | every system in existence ... It's not.
         | 
         | Primarily I think it is marketing/sales since thinking properly
         | about distributed systems is hard and executives don't want to
         | hear about it.
        
       | slashdev wrote:
       | CAP really should have been formulated as:
       | 
       | When too many nodes fail or are unreachable, do you sacrifice
       | consistency, or availability?
       | 
        | And even then it's misleading, because you can still be
        | consistent and partially sacrifice availability (allow local or
        | mergeable updates), or be available but partially consistent
        | (allow updates, but have potential conflicts to resolve, maybe
        | manually, when the partition is healed).
        | 
        | You can even make different A <-> C tradeoffs per
        | request/transaction.
        | 
        | Distributed systems are complex, who knew?
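        | 
        | Cassandra is the usual example of that last point: a rough
        | sketch with the DataStax Python driver (keyspace, table, and
        | host are made up), choosing the tradeoff per statement:
        | 
        |     from cassandra import ConsistencyLevel
        |     from cassandra.cluster import Cluster
        |     from cassandra.query import SimpleStatement
        | 
        |     session = Cluster(["127.0.0.1"]).connect("shop")
        | 
        |     # consistency first: a majority of replicas must answer
        |     strong = SimpleStatement(
        |         "SELECT balance FROM accounts WHERE id = %s",
        |         consistency_level=ConsistencyLevel.QUORUM)
        | 
        |     # availability/latency first: any single replica will do
        |     fast = SimpleStatement(
        |         "SELECT balance FROM accounts WHERE id = %s",
        |         consistency_level=ConsistencyLevel.ONE)
        | 
        |     row = session.execute(strong, [42]).one()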
        
         | eimrine wrote:
          | What does Bitcoin sacrifice? It seems to be always available
          | and always consistent.
        
           | Rexxar wrote:
           | Partition
        
           | dartos wrote:
           | It's not consistent.
           | 
           | It takes time for all nodes to come to agreement and in that
           | time I am able to request _technically_ out of date info from
           | a node.
           | 
            | Consistency is often the one CAP property that isn't
            | prioritized, because if the network isn't consistent for a
            | few seconds, the application probably still works, and
            | that's the case for Bitcoin.
           | 
           | The network is "eventually consistent"
        
         | amluto wrote:
         | > When too many nodes fail or are unreachable
         | 
         | Don't forget the nasty bit about partitions: if I think nodes
         | 1-5 are reachable and the rest are unreachable, I can't assume
         | that the rest of the nodes are down or idle -- some other
         | client may think that nodes 1-5 are unreachable but nodes 6-15
         | are reachable.
        
           | slashdev wrote:
           | That's correct
        
       | sethammons wrote:
        | Related: A Critique of the CAP Theorem by Martin Kleppmann
       | 
       | https://arxiv.org/abs/1509.05393
       | 
       | TL;DR: terms in CAP are not adequately defined and he proposes a
       | delay-sensitive framework.
        
         | belter wrote:
         | This paper takes the idea further and formalizes it better:
         | https://ar5iv.labs.arxiv.org/html/2301.08906v1
        
       | leashless wrote:
       | I remember about 2017 or 2018 giving a talk about Ethereum and
       | asking how many people in the audience of maybe 400 had heard of
       | the CAP theorem.
       | 
       | One hand went up in the back. That's when I knew our culture was
       | doomed.
       | 
       | CAP is (for the record) why the blockchain is and is not "just a
       | database" -- it's a database which takes a different position on
        | CAP and as a result can be used for different things than other
       | databases.
        
         | tomhallett wrote:
         | Would love to learn more. Guess: Is the "different position"
         | that consistency can be achieved when a fault occurs by having
         | the longest chain, instead of a leader/voting mechanism?
        
           | leashless wrote:
           | "Eventual consistency"
           | 
           | 1) there is a network
           | 
           | 2) on each block, a leader is selected to make the next block
           | on the basis of a lottery -- each lottery ticket is a unit of
           | computing done ("proof of work")
           | 
           | 3) if the network splits, each subnet has a different leader
           | and a different sequence of blocks from that point onwards
           | 
           | 4) when the network comes back together, a new "agreed
           | history" is formed from the last shared block, selecting the
           | block with the most "work" done
           | 
           | and resulting in the transactions which were done and
           | confirmed in the smaller subnet being thrown out and history
           | rewritten as if they had never existed
           | 
           | This is the genius of Satoshi -- he figured out that
           | something this trashy was good _enough_ to get something
           | done. Nobody in an academic setting would ever have
            | considered that a worthwhile solution, I don't think.
           | 
           | It was madness. But it worked. It's the same kind of genius
           | as Ethernet just rebroadcasting packets until something works
           | rather than doing Token Ring type negotiation about who can
           | talk next.
           | 
           | Worse really is better at times.
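            | 
            | In code, the fork-choice rule above is roughly this (a toy
            | sketch; the block structure is made up, and real Bitcoin
            | compares cumulative difficulty, not block counts):
            | 
            |     # pick the chain with the most cumulative proof-of-work
            |     def pick_chain(chain_a, chain_b):
            |         work_a = sum(b["work"] for b in chain_a)
            |         work_b = sum(b["work"] for b in chain_b)
            |         return chain_a if work_a >= work_b else chain_b
            | 
            |     # transactions confirmed only on the losing fork are
            |     # thrown out ("history rewritten")
            |     def orphaned_txs(winner, loser, fork_height):
            |         kept = {t for b in winner[fork_height:]
            |                 for t in b["txs"]}
            |         return [t for b in loser[fork_height:]
            |                 for t in b["txs"] if t not in kept]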
        
       | SJC_Hacker wrote:
        | The rules haven't changed; it's just that it's always about the
        | details.
        | 
        | Most services, short of financial ones, choose to sacrifice
        | consistency for availability. Social networks are an example: it
        | doesn't matter if I get "inconsistent" social media feeds
        | depending on where I am because my European friends' posts
        | haven't had time to propagate across the database shards.
        | 
        | OTOH, with financial systems, you best believe they are going to
        | choose consistency 9 times out of 10. Exceptions being very
        | small amounts. I don't want someone to exploit network/node
        | downtime in order to steal millions of dollars.
        | 
        | But the reality is that network outages rarely last very long.
        | Then it's just a matter of making sure the nodes themselves are
        | reliable.
        
         | marcosdumay wrote:
          | Financial systems have been exchanging coerence for
          | availability for millennia.
          | 
          | It's just recently that anybody started to care about
          | instantaneous coerence, instead of eventual.
        
           | erikerikson wrote:
           | > coerence
           | 
           | Did you mean coherence? As in consistency?
           | 
            | I'd have autocorrected it in reading, except that you wrote
            | it twice, which implies intention. However, a quick search
            | didn't turn up a definition.
        
             | marcosdumay wrote:
             | Yep, English is not my first language.
        
               | erikerikson wrote:
               | No problem, thanks for learning it.
        
           | SJC_Hacker wrote:
           | Like I said, depends on the amount. But I remember the days
           | when you had to wait sometimes several days for a check to
           | clear before the funds were available in your account.
           | 
            | Of course you could always write bad checks - but the payee
            | was on the hook for the fraud, not the bank ...
        
         | jasonwatkinspdx wrote:
          | There are situations that social networks need to worry about,
          | consistency-wise. Imagine a newly battered spouse unfriending
          | their partner before making a post explaining why they're
          | getting a divorce. If these two events get mixed up, the
          | perpetrator can end up reading the post.
         | 
          | Also, ironically enough, most of the global financial system
          | actually operates very inconsistently. It can take days for
          | transactions to clear, and there are a bazillion quirks based
          | on local law and bank policy. So in practice banks use double-
          | entry accounting with reconciliation. If something gets messed
          | up, they issue a new compensating transaction.
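          | 
          | A rough sketch of that pattern (made-up, in-memory ledger):
          | the ledger is append-only, and mistakes are corrected by
          | appending a reversing entry rather than editing history.
          | 
          |     # append-only double-entry ledger
          |     ledger = []
          | 
          |     def post(debit, credit, amount, memo=""):
          |         ledger.append({"debit": debit, "credit": credit,
          |                        "amount": amount, "memo": memo})
          | 
          |     post("alice", "bob", 100)           # original transfer
          |     # oops, it was posted twice upstream: don't delete it,
          |     # append a compensating entry instead
          |     post("bob", "alice", 100, memo="reversal: duplicate")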
        
       | senorrib wrote:
       | The most misunderstood theorem of all time.
        
         | belter wrote:
         | Please be more specific.
        
       | efitz wrote:
        | I don't think it should have been called the "CAP" theorem. I
        | think of it as "CAL" (consistency-availability-latency). It has
        | always struck me that the P in CAP is different from the other
        | two characteristics. C and A are continuous values, but P is
        | binary (presumably with a duration).
       | 
       | The "P" in CAP means "tolerance of partition events". But that's
       | really not what most engineering teams are thinking about when
       | they talk about CAP theorem. Teams that really want tolerance to
       | network partitions are really talking about building parallel
       | instances, for example, cloud providers implementing multiple
       | regions. If they need some level of consistency across those
       | instances, then really, it becomes a discussion about latency.
       | 
       | If you think about it, what you're really talking about with the
       | network partition events is how stale or latent the data on each
       | side will become, and whether or not, it will become consistent
       | again, particularly without human intervention.
       | 
       | When you talk about it in terms of CAL, now all the properties
       | are continuous, and you can discuss the trade-offs to each
       | property with respect to network partitions.
        
         | belter wrote:
         | https://ar5iv.labs.arxiv.org/html/2301.08906v1 - "...We have
         | extended the CAP theorem to relate quantitative measures of
         | these two properties to quantitative measures of communication
         | and computation latency (L), obtaining a relation called the
         | CAL theorem that is linear in a max-plus algebra. This paper
         | shows how to use the CAL theorem in various ways to help design
         | real-time systems..."
        
           | efitz wrote:
           | Thank you for the link. I'm glad other people are thinking
           | this way, and with more formalism.
        
       ___________________________________________________________________
       (page generated 2024-01-21 23:02 UTC)