[HN Gopher] Let's consign CAP to the cabinet of curiosities
       ___________________________________________________________________
        
       Let's consign CAP to the cabinet of curiosities
        
       Author : nalgeon
       Score  : 111 points
       Date   : 2024-07-25 14:52 UTC (8 hours ago)
        
 (HTM) web link (brooker.co.za)
 (TXT) w3m dump (brooker.co.za)
        
       | api wrote:
        | This is just saying that because the cloud system hides the
        | implications of the theorem from you, it's not relevant.
       | 
       | I suppose it's kinda true in the sense that how to operate a
       | power plant is not relevant when I turn on my lights.
        
         | hinkley wrote:
         | We pay a lot of money to Amazon to not think about the 8
         | Fallacies as well.
         | 
         | You still get consequences for ignoring them, but they show up
         | as burn rate.
        
         | deathanatos wrote:
         | And if you actually use cloud systems long enough, you'll see
         | violations of the "C" in CAP in the form of results that don't
          | make any _sense_. So even in practice... it's good to
         | understand, because it can and does matter.
        
       | sir-dingleberry wrote:
       | The CAP theorem is irrelevant if your acceptable response time is
       | greater than the time it takes your partitions to sync.
       | 
        | At that point you get all three: consistency, availability,
        | and partition tolerance.
       | 
       | In my opinion it should be the CAPR theorem.
        
         | jayzalowitz wrote:
          | And considering most cloud compute has a better backbone or
          | internal routing, userspace should see this less.
          | 
          | That being said, if this is truly a problem for you, CRDB
          | (CockroachDB) is basically built with all of this in mind.
        
           | sir-dingleberry wrote:
            | Wow, I'd heard of CRDB but am just now reading what it does.
           | 
           | That's an incredible piece of engineering.
        
         | pessimizer wrote:
         | What's the difference between choosing an acceptable response
         | time that is greater than the time it takes for your partitions
         | to sync and giving up on availability?
         | 
         | I don't think it makes sense to say that CAP doesn't apply if
         | you don't need consistency, availability, or tolerance to
         | partitions. CAP is entirely about the need to relax at least
         | one of those three to shore up the others.
        
         | anikdas wrote:
          | > The CAP theorem is also irrelevant if your acceptable
          | response time is greater than the time it takes your partitions
          | to sync.
          | 
          | This is really an oversimplification. The more important metric
          | here is the delay between a write and a read of the same data.
          | Even in that case, if the system's write load is unpredictable,
          | it will definitely lead to high variance in replication lag.
          | The number of times I have had to deal with a race condition
          | caused by not accounting for replication lag is higher than I
          | would like to admit.
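          | 
          | To make the race concrete, here is a toy sketch (names made
          | up, not any particular database): the replica applies writes
          | after a delay, so a read issued right after a write returns
          | stale data.
          | 
          |   import threading, time
          | 
          |   primary = {"name": "old"}
          |   replica = {"name": "old"}
          | 
          |   def replicate(key, value, lag=0.5):
          |       time.sleep(lag)          # replication lag
          |       replica[key] = value
          | 
          |   def write(key, value):
          |       primary[key] = value
          |       threading.Thread(target=replicate,
          |                        args=(key, value)).start()
          | 
          |   write("name", "new")
          |   print(replica["name"])   # "old" -- stale read
          |   time.sleep(1)
          |   print(replica["name"])   # "new" -- eventually consistent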
        
         | alanwreath wrote:
          | I mean, you just KNOW that while that addition may make sense,
          | some 12-year-old-minded person like me would start referring
         | to it as the CRAP theorem. And I don't even dislike the
         | theorem.
        
         | adrr wrote:
         | Partition encompasses response time. If you have defined SLAs
         | on response time and your partition times exceed that, you're
         | not available.
        
         | hot_gril wrote:
         | You mean CLAP? Latency and availability are basically the same
         | thing. CAP is simple.
        
       | ivan_gammel wrote:
        | The CAP theorem is the quantum mechanics of software, with
        | C*A = O(1) in theory, similar to the uncertainty principle; but
        | in many use cases the constant is so small that "classical"
        | expectations of both C and A are fine.
        
       | justinsaccount wrote:
       | > None of the clients need to be aware that a network partition
       | exists (except a small number who may see their connection to the
       | bad side drop, and be replaced by a connection to the good side).
       | 
       | What a convenient world where the client is not affected by the
       | network partition.
        
       | kstrauser wrote:
       | So you're setting up a multi-region RDS. If region A goes down,
       | do you continue to accept writes to region B?
       | 
       | A bank: No! If region A goes down, do _not_ process updates in B
       | until A is back up! We'd rather be down than wrong!
       | 
       | A web forum: Yes! We can reconcile later when A comes back up.
       | Until then keep serving traffic!
       | 
       | CAP theorem doesn't let you treat the cloud as a magic infinite
       | availability box. You still have to design your system to pick
       | the appropriate behavior _when_ something breaks. No one without
       | deep insight into your business needs can decide for you, either.
       | You're on the hook for choosing.
        
         | candiddevmike wrote:
          | How do you set up a multi-region RDS with multi-region write?
        
           | kstrauser wrote:
           | I think that's a thing? It's been a while since I've looked.
           | Even if not, multi-AZ replication with failover is a standard
           | setup and it has the same issues. Suppose you have some
           | frontend servers and an RDS primary instance in us-west-2a
           | and other frontend servers and an RDS replica in us-west-2b.
           | The link between AZs goes down. For simplicity, pretend a
           | failover doesn't happen. (It wouldn't matter if it did. All
           | that would change here is the names of the AZs.)
           | 
           | Do you accept writes in us-west-2a knowing the ones in 2b
           | can't see them? Do you serve reads in 2b knowing they might
           | be showing old information? Or do you shut down 2b altogether
           | and limp along at half capacity in only 2a? What if the
           | problem is that 2a becomes inaccessible to the Internet so
           | that you can no longer use a load balancer to route requests
           | to it? What if the writes in 2a hadn't fully replicated to 2b
           | before the partition?
           | 
           | You can probably answer those for any given business
           | scenario, but the point is that you _have_ to decide them and
            | you can't outsource it to RDS. Some use cases prioritize
           | availability above all else, using eventual consistency to
           | work out the details once connectivity's restored. Others
           | demand consistency above all else, and would rather be down
           | than risk giving out wrong answers. No cloud host can decide
            | what's right for you. The CAP theorem is extremely freaking
           | relevant to anyone using more than one AZ or one region.
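            | 
            | To make that concrete, the choice usually ends up as an
            | explicit policy in your own code; a minimal sketch
            | (hypothetical names, not an RDS API):
            | 
            |   from enum import Enum
            | 
            |   class OnPartition(Enum):
            |       REFUSE = 1       # prefer consistency: go dark
            |       SERVE_STALE = 2  # prefer availability: maybe lie
            | 
            |   POLICY = OnPartition.REFUSE
            | 
            |   def read_balance(replica, partitioned):
            |       if partitioned and POLICY is OnPartition.REFUSE:
            |           raise RuntimeError("503: refusing stale reads")
            |       return replica.get("balance")  # may be stale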
           | 
           | (I know you weren't claiming otherwise, just taking a chance
           | to say why cross-AZ still has the same issues as cross-
           | region, as I _have_ heard well meaning people say they were
           | somehow different. AWS does a great job of keeping things
            | running well to the point that it's news when they don't.
           | Things still happen though. Same for Azure, GCP, and any
           | other cloud offering. However flawless their design and
           | execution, if a volcano erupts in the wrong place, there's
           | gonna be a partition.)
        
         | phh wrote:
          | Yes. I see the CAP theorem as a (perfectly right) answer to
          | managers demanding literally impossible requirements.
        
         | slaymaker1907 wrote:
         | Actually, banks use eventually consistent systems in many
         | cases. They'd rather let a few customers overdraw a bit than
         | stop a customer from being able to buy something on their card
         | when they have the funds for it.
        
           | kstrauser wrote:
           | _In many cases_. There are few absolutes at the edges. They
           | have to be decided on a case-by-case basis. Also, failure to
           | decide is deciding.
        
           | wongarsu wrote:
           | Good point. But I'd argue that's an artifact of history.
           | Traditional banks started out when eventual consistency was
           | the only choice. First with humans manually reconciling paper
           | ledgers based on messages from couriers and telegraphs, later
           | with the introduction of computers in batch processes that
            | ran overnight. The ability to do this live is new to them,
           | and thus their processes are designed to not require it.
           | 
            | Financial institutions set up in this century (like PayPal or
            | fintechs) usually do want consistency at all costs. Your risk
           | exposure is a lot lower if you know people don't spend or
           | withdraw more than they have.
        
             | darby_nine wrote:
             | > Good point. But I'd argue that's an artifact of history.
             | 
             | I think there's also less liability for the banks if they
             | accept and build around the risk of both eventual
             | consistency and fraud. Even now wiring is necessary to pull
             | off a digital bank theft, and even then you need to be
             | _extremely_ fast to get away with the money, and
             | _extremely_ fast to not go to prison, and liquidate it
             | _extremely_ quickly (likely into bitcoin). Even small
             | delays in the transactions would make this basically
             | impossible with modern banking, let alone a full clearing
             | house day.
        
             | dalyons wrote:
              | Fintechs have some in-house tech but typically involve all
              | kinds of third-party systems. Even if you are interacting
              | with these third parties via APIs (instead of CSV or
              | whatever), you will always need an eventually consistent
              | recon mechanism to make sure both sides agree on what was
              | done. It doesn't have to take days though, I agree.
             | 
              | The only way around this would be to replace stateful APIs
              | with some kind of function that takes a reference to a
             | verifiable ledger, that can mutate that and anyone can
             | inspect themselves to see if the operation has been
             | completed. Oh wait :)
        
             | pradn wrote:
             | Eventual consistency gives you more availability, at the
             | cost of consistency. It's fine for financial institutions
              | to prefer this trade-off, because any over/under error
              | is likely a fraction of the cost of unavailability.
             | 
             | Also, in the real world, we can solve problems at a
             | different level - like legal or product.
        
               | PaulHoule wrote:
               | Also the financial institutions are making enough profit
               | that they can afford to lose a little bit in the process
               | of reconciliation. Consider how the 3.5% fees credit
               | cards charge mean they can afford to eat a bit of fraud
               | and actually can tolerate more fraud than a system which
               | had lower fees could.
        
             | cameronh90 wrote:
              | Requiring consistency sounds like a good recipe for global
             | payment system outages. It also makes it a lot harder to
             | buy things on a plane.
        
           | dilyevsky wrote:
            | Ignoring the CAP theorem doesn't just mean eventual
            | consistency - it means you will lose transactions when
            | replicas that aren't fully replicated get corrupted. Probably
            | not something you want to be caught doing if you're a bank.
        
             | skissane wrote:
             | Real world financial systems do have transactions that
             | aren't fully replicated and hence can be lost as a result.
             | But banks just decide that both the probability and sums
             | involved are low enough that they'll wear the risk.
             | 
             | Example: some ATMs are configured that if the bank computer
             | goes down (or the connection to it), they'll still let
             | customers withdraw limited sums of money in "offline" mode,
             | keeping a local record of the transaction, that will then
             | be uploaded back to the central bank computers when the
             | connection comes back up.
             | 
             | But, consider this hypothetical scenario: bank computer has
             | a major outage. ATM goes into offline mode. Customer goes
             | to ATM to withdraw $100. ATM grants withdrawal and records
             | card number, sum and timestamp internally (potentially even
             | in two different forms-on disk and a paper-based backup).
             | Customer leaves building. 30 minutes later, bank computer
             | is still down, when a terrorist attack destroys the
             | building and the ATM inside. The customer got $100 from the
             | bank, but all record of it has been destroyed. Yet, this is
             | such an unlikely scenario, and the sums involved are small
             | enough (in the big picture), that banks don't do anything
              | to mitigate it - the mitigation costs more than the
              | expected value of the risk.
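              | 
              | The offline-mode flow is roughly this shape (a toy
              | sketch with hypothetical names, not a real ATM
              | protocol):
              | 
              |   import time
              | 
              |   OFFLINE_LIMIT = 100
              |   journal = []   # durable local record in practice
              | 
              |   def authorize_online(card, amount):
              |       ...        # ask the bank host (out of scope)
              | 
              |   def withdraw(card, amount, host_up):
              |       if host_up:
              |           return authorize_online(card, amount)
              |       if amount > OFFLINE_LIMIT:
              |           return False   # too risky offline
              |       journal.append((card, amount, time.time()))
              |       return True        # dispense, settle later
              | 
              |   def settle(post_to_bank):
              |       for txn in journal:   # replay when host is back
              |           post_to_bank(txn)
              |       journal.clear()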
        
           | hot_gril wrote:
           | _Very_ eventually consistent. Like, humans involved
           | sometimes.
        
           | mbesto wrote:
            | Actually, the CAP theorem is only relevant to a database
            | _transaction_, not a system. The system has to be available
           | to even _LET_ the card process. This is why no bank would
            | ever use anything BUT an ACID-based system, as data integrity
           | is far more important than speed (and availability for that
           | matter).
        
           | nly wrote:
           | It helps that most bank accounts come with overdraft
           | agreements and the bank can come after you for repayment with
           | the full force of credit law behind them
        
             | bguebert wrote:
             | Yeah, banks try and encourage overdraws because they make a
             | bunch of fees off them. I remember reading about some of
             | them getting in trouble for reordering same day deposits in
             | a way to maximize the overdraw fees.
             | 
             | https://cw33.com/news/heres-how-banks-rearrange-
             | transactions...
        
           | eddieroger wrote:
           | That and they love to charge overdraft fees to accounts that
           | don't otherwise make them a ton of money for the privilege.
        
           | eschneider wrote:
           | As patio11 says, "The optimum level of fraud is not zero."
        
         | cryptonector wrote:
         | You are mischaracterizing what TFA says. It says that
         | partitions either keep a client from reaching any servers (in
         | which case we don't care) or they keep some servers from
         | talking to others (or some clients from talking to _all_
          | servers), and that in the latter case, whenever a quorum
          | (really, a majority of the servers) is reachable, you can
          | continue.
         | 
         | You've completely elided the quorum thing to make it sound like
         | TFA is nuts.
        
           | kstrauser wrote:
           | Quorum doesn't let you ignore partitions. It can't. You still
           | have to decide what happens to the nodes on the wrong side of
           | the partition, or how to handle multiple partitions (you have
           | 5 nodes. You have 2 netsplits. Now you have clusters of 2
           | nodes, 2 nodes, and 1 node. Quorum say what?).
           | 
           | I don't think TFA is nuts. I do think the premise that
           | engineers using cloud can ignore CAP theorem is wrong,
           | though. It's a decision you must consider when you're
           | designing a system. As long as everything is running well, it
           | doesn't matter. You've got to make sure that the behavior
           | _when_ something breaks is what you want it to be. And you
            | can't outsource that decision. It can't be worked around.
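            | 
            | The arithmetic on that 2/2/1 split, spelled out as a trivial
            | sketch:
            | 
            |   NODES = 5
            |   QUORUM = NODES // 2 + 1        # 3
            | 
            |   def has_quorum(reachable):
            |       return reachable >= QUORUM
            | 
            |   for group in (2, 2, 1):
            |       print(group, has_quorum(group))   # all False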
        
             | cryptonector wrote:
             | > You still have to decide what happens to the nodes on the
             | wrong side of the partition
             | 
             | You don't because their clients won't also be isolated from
             | the other servers. That's TFA's point, that in a cloud
             | environment network partitions that affect clients
             | generally deny them access to all servers (something you
             | can't do anything about), and network partitions between
             | servers only isolate some servers and none of the clients.
        
               | kstrauser wrote:
               | That's fine if you value availability over consistency.
               | Which is fine if that's the appropriate choice for your
               | use case! Even then you have to decide whether to accept
                | updates in that situation, knowing that they may never be
               | reconciled if the partition is permanent.
        
               | cryptonector wrote:
               | No, you can have availability and consistency as long as
               | you have enough replicas. Network partitions that leave
               | you w/o a quorum are like the whole cloud being down.
               | Network partitions don't leave some clients only able to
               | reach non-quorums -- this is the big point of TFA.
        
               | kstrauser wrote:
               | That presumes you can only have 1 partition at a time.
               | That is _not_ the case. You can optimize for good
               | behavior when you end up with 2 disconnected graphs. It's
               | absolutely the case that a graph with N nodes can end up
               | with N separate groups. Conceptually, imagine a single
               | switch that connects all of them together and that switch
               | dies.
               | 
               | You can engineer cloud services to be resilient. AWS has
               | done a great job of that in my personal experience. _But
               | worst cases can still happen._
               | 
               | If you have an algorithm that runs in n*log(n) most of
               | the time but 2^n sometimes, it can still be super useful.
               | You have to prepare for the bad case though. If you say
               | it's been running great so far and we don't worry about
               | it, fine, but that doesn't mean it can't still blow up
               | when it's most inconvenient. It only means it hasn't yet.
        
               | cryptonector wrote:
               | Again, a partition that leaves you with no quorum
               | anywhere is like the cloud being down.
        
               | pessimizer wrote:
               | A partition that leaves you with no quorum anywhere is
               | like the cloud being down _if you value consistency over
               | availability._
        
               | cryptonector wrote:
               | Yes, but again, TFA says that you will trust the cloud
               | will be up long enough that you would indeed prefer
               | consistency.
               | 
               | My goodness. So many commenters here today are completely
               | ignoring TFA and giving lessons on CAP -- lessons that
               | are correct, but which completely ignore that TFA is
               | really arguing that clouds move the needle hard in one
               | direction when considering the CAP trade-offs, to the
               | point that "the CAP theorem is irrelevant for cloud
               | systems" (which was this post's original title, though
               | not TFA's).
        
               | robertlagrant wrote:
               | > Again, a partition that leaves you with no quorum
               | anywhere is like the cloud being down.
               | 
               | What does this mean? What is the cloud being down? The
               | cloud isn't a single thing; it's lots of computers in
               | lots of data centres.
        
               | NavinF wrote:
               | It means there will be an article on the front page
               | declaring the outage
        
               | dijit wrote:
               | > No, you can have availability and consistency as long
               | as you have enough replicas.
               | 
               | You've misunderstood the problem.
               | 
                | You cannot be consistent if you cannot resolve conflicts;
               | taking writes after a network partition means you are
               | operating on stale data.
               | 
                | Honestly the closest we've ever gotten to solving the CAP
                | theorem is Spanner[0], which trades off availability by
                | being hugely less performant for writes. I'm aware
                | Spanner is used a lot in Google Cloud (under the hood),
               | but you won't solve CAP by having hundreds of PGSQL
               | replicas, because those aren't using Spanner.
               | 
                | In fact, you can test it out: Elasticsearch orchestrates
                | a huge number of Lucene databases, a small install will
               | have a dozen or so replicas and partitions, but you're
               | welcome to crank it to as many nodes as you want then
               | split off a zone, and see what happens.
               | 
               | I'm becoming annoyed at the level of ignorance on display
               | in this thread, so I'm sorry for the curt tone, you can't
               | abstract your way to solving it, it's a fundamental
               | limitation of data.
               | 
               | [0]: https://static.googleusercontent.com/media/research.
               | google.c...
        
               | cryptonector wrote:
                | Did you read TFA? Please, the point of TFA is NOT that
                | you don't need distributed consensus algorithms, but that
                | once you have them you don't need to worry about
                | partitions, so you can have consistency. You get
                | "availability" in that the cloud "is always available".
                | If that sounds like cheating, realize that partitions in
                | the cloud don't isolate clients outside the cloud, only
                | servers inside the cloud, and if there's no quorum
                | anywhere you just treat that as "the cloud is down".
        
               | dijit wrote:
               | I did, and it doesn't provide insight that didn't exist
               | 15 years ago.
               | 
               | Cloud is always available? I have a bridge to sell you.
        
               | robertlagrant wrote:
               | > you get "availability" in that the cloud "is always
               | available",
               | 
               | The cloud isn't what's being described as available. The
               | service is available or not. The service is hosted on
               | multiple computers that talk to each other. If they stop
               | talking to each other, the service needs to either become
               | unavailable (choosing consistency) or risk having not up
               | to date data (choosing availability).
        
               | bitwalker wrote:
               | Those aren't the only failure modes - you can have two
               | sets of servers partitioned from one another (in two
               | different data centers), both of which are reachable by
               | clients. Do you allow those partitions to remain
               | available, or do you stop accepting client connections
               | until the partition heals? The "right" choice depends
               | entirely on the application.
        
               | cryptonector wrote:
               | In TFA's world-view clients will not talk to isolated
               | servers.
        
               | thayne wrote:
               | So that would be choosing consistency over availability.
        
             | jrumbut wrote:
             | It's poor timing for this article.
             | 
              | The CrowdStrike incident has demonstrated that, no matter
              | what you do, it is possible for any distributed system to
              | be partitioned in such a way that you have to choose
              | between consistency and availability - and if the system
              | is important, that choice is important.
        
               | darby_nine wrote:
               | That's true, but I think the catch is that most customers
               | likely didn't view it as an investment that could
               | _decrease_ availability.
        
       | senorrib wrote:
       | I think every couple of months there's yet another article saying
       | the CAP theorem is irrelevant. The problem with these is that
        | they ignore the fact that the CAP theorem isn't a guide, a
        | framework, or anything else.
       | 
       | It's simply the formalization of a fact, and whether or not that
       | fact is *important* (although still a fact) depends on the actual
       | use case. Hell, it applies even to services within the same
       | memory space, although obviously the probability of losing any of
       | the three is orders of magnitude less than on a network.
       | 
       | Can we please move on?
        
         | da_chicken wrote:
         | > Can we please move on?
         | 
         | Doubtful. If there's one thing that new developers have always
         | insisted upon doing it's telling the networking and data store
         | engineers that their foundational limitations are wrong. It
         | seems that everyone has to learn that computers aren't magic
         | through misunderstanding and experience.
        
           | hinkley wrote:
           | Way way back, I had a coworker who aspired to be a hacker get
           | really mad at me because he was just sure that spinning rust
           | was "digital" while I was asserting to him that there's a DAC
           | in there and that's how data recovery services work. They
           | crack open the drive and read the magnetic domains to
           | determine not just what the current bytes are but potentially
           | what the previous ones were too.
           | 
           | They're digital! It's ones and zeroes! Dude, it's a fucking
           | magnet. A bunch of them in fact.
        
             | wrs wrote:
             | Yes, any real digital device is relying on a hopefully
             | well-separated bimodal distribution of something analog.
        
       | mordae wrote:
       | You wish.
       | 
       | > DNS, multi-cast, or some other mechanism directs them towards a
       | healthy load balancer on the healthy side of the partition
       | 
        | Incidentally, that's where CAP makes its appearance and bites
       | your ass.
       | 
        | No amount of VRRP or UCARP wishful thinking can guarantee a
        | conclusion on which partition is "correct" in the presence of a
        | network partition between load balancer nodes.
       | 
       | Also, who determines where to point the DNS? A single point of
       | failure VPS? Or perhaps a group of distributed machines voting?
       | Yeah.
       | 
       | You still need to perform the analysis. It's just that some cloud
       | providers offer the distributed voting clusters as a service and
       | take care of the DNS and load balancer switchover for you.
       | 
       | And that's still not enough, because you might not want to allow
        | stragglers to write to orphaned databases before the network
        | fencing kicks in.
        
         | doctorpangloss wrote:
         | I don't know. People spend a lot of time thinking about network
         | partitions. What's the difference between a network partition,
         | and a network that is very slow? You could communicate via
          | carrier pigeon. Practically, you can easily communicate via
          | people in different regions, who already "know" which partition
          | is correct. Networks are never really partitioned, and the
         | choice of denominator when calculating their speed means
         | they're never really transferring at 0 bits per second either.
         | Eventually data will start transferring again.
         | 
         | So a pretty simple application design can deal with all of
         | this, and that's the world we live in, and people deal with
         | delays and move on with life, and they might bellyache about
         | delays for the purpose of getting discounts from their vendors
         | but they don't really want to switch vendors. If you're a bank
         | in the highly competitive business of taking 2.95% fees (that
         | should be 0.1%) on transactions, maybe this stuff matters. But
         | like many things in life, like drug prices, that opportunity
         | only exists in the US, it isn't really relevant anywhere else
         | in the world, and it's certainly not a math problem or
         | intrinsic as you're making it sound. That's just the mythology
         | the Stripe and banks people have kind of labeled on top of
         | their shit, which was Mongo at the end of the day.
        
           | 01HNNWZ0MV43FF wrote:
           | "A good driver sometimes misses a turn, a bad driver never
           | misses a turn"
           | 
           | A good engineer knows that all real-time systems are turn-
           | based, and the network is _always_ partitioned
        
             | smilliken wrote:
             | This is exactly the right way to think about it. One should
             | be happy when the network works, and not expect any packet
             | to make it somewhere on any particular schedule. Ideally.
             | In practice it's too expensive to plan this way for most
             | systems, so we compensate by pretending the failure modes
             | don't exist.
        
           | toast0 wrote:
           | You've got to at least consider what is going to happen.
           | 
           | The whole point of the CAP theorem is that you can't have a
           | one size fits all design. Network partitions exist, and it's
           | not always a full partition where you have two (or more)
           | distinct groups of interconnected nodes. Sometimes everybody
           | can talk to everybody, except node A can't reach node B. It
           | can even be unidirectional, where node A can't send to B, but
           | node B can send to A. That's just how it is --- stuff breaks.
           | 
            | There are designs that highlight consistency in the face of
            | partitions; if you must have the most recent write, you need
            | a majority read or a source of truth --- and if you can't
            | contact that source of truth or a majority of copies in a
            | reasonable time (or at all), you can't serve the read. And
            | you can't confirm a write unless you can contact the source
            | of truth or do a majority write. As a
           | consequence, you're more likely to hit situations where you
           | reach a healthy frontend, but can't do work because it can't
           | reach a healthy backend; otoh, that's something you should
           | plan for anyway.
           | 
            | There are designs that highlight availability in the face of
           | partitions; if you must have a read and you can reach _a_
           | node with the data, go for it. This can be extended to writes
           | too, if you trust a node to persist data locally, you can use
           | a journal entry system rather than a state update system, and
            | reconcile when the partition ends. You'll lose the data if
           | the node's storage fails before the partition ends, of
           | course. And you may end up reconciling into an undesirable
           | state; in banking, it's common to pick availability --- you
           | want customers to be able to use their cards even when the
           | bank systems are offline for periodic maintenance / close of
           | day / close of month processing, or unexpected network issues
           | --- but when the systems come back online, some customers
           | will have total transaction amounts that would have been
           | denied if the system was online. Or you can do a last state
           | update wins too --- sometimes that's appropriate.
           | 
           | Of course, there's the underlying horror of all distributed
           | systems. Information takes time to be delivered, so there's
           | no way for a node to know the current status of anything; all
           | information from remote nodes is delayed. This means a node
           | never knows if it can contact another node, it only knows if
           | it was recently able to or not. This also means unless you do
           | some form of locking read, it's not unreasonable for the
           | value you've read to be out of date before you receive it.
           | 
           | Then there's even further horrors in that even a single node
            | is actually a distributed system, although there's
            | significantly less likelihood of a network partition between
            | CPU cores on a single chip.
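            | 
            | The journal-entry idea is easy to sketch (the shape here is
            | hypothetical, not any particular product): each side logs
            | deltas while partitioned, and the merged balance is the
            | starting value plus all the deltas -- including some that a
            | CP design would have denied.
            | 
            |   start = 100
            |   side_a = [-40]   # withdrawals accepted on side A
            |   side_b = [-80]   # withdrawals accepted on side B
            | 
            |   merged = start + sum(side_a) + sum(side_b)
            |   print(merged)    # -20: available during the split,
            |                    # but overdrawn after reconciling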
        
             | mordae wrote:
             | There is also one more level of horror people often forget.
             | Sometimes the history unwinds itself. Some outages will
             | require restoration from a backup and some amount of
             | transactions may be lost. It does not happen often, but it
             | may happen. Such situations may lead to e.g. sequentially
             | allocated IDs being reused or originally valid pointers to
              | records now pointing nowhere, and so on.
        
             | doctorpangloss wrote:
             | Everything you're saying is true. It's also true that you
             | can ignore these things, have some problems with some
             | customers, and if you're still good, they'll still shop
             | with you or whatever, and life will go on.
             | 
             | I am really saying that I don't buy into the mythology that
             | someone at AWS or Stripe or whatever knows more about the
             | theoretical stuff than anyone else. It's a cloud _of
             | farts_. Nobody needs these really detailed explanations.
              | People do them because they're intellectually edifying and
             | self-aggrandizing, not because they're useful.
        
           | mordae wrote:
           | Suppose we all live under socialism and capitalist rules no
           | longer apply. Only physics.
           | 
           | You run a vacation home exchange/booking site, which you run
            | in 3 instances in America, Europe, and Asia because you
           | found that people are happier when the site is snappy without
           | those pesky 50ms delays.
           | 
           | Now suppose it's the middle of the night in one of those 3
            | regions, so no big deal, but the rollout of a new version
            | brings down the database that's holding the bookings for two
            | hours before it's fixed. Yeah, it was just the simplest
            | solution to have just one database with the bookings,
           | because you wouldn't have to worry about all that pesky CAP
           | stuff.
           | 
            | But people then start to ask if it would be possible to book
            | from regions that are not experiencing the outage while still
            | avoiding double-bookings. So you give it some
           | consideration and then you figure out a brilliant, easy
           | solution. You just
        
             | ndriscoll wrote:
             | Incidentally, I'm highly suspicious of the claim that those
             | pesky 50ms delays matter much. Checking DOMContentLoaded
             | timings on a couple sites:
             | 
             | Reddit: 1.31 s
             | 
             | Amazon: 1.51 s
             | 
             | Google: 577 ms
             | 
             | CNN: 671 ms
             | 
             | Facebook: 857 ms
             | 
             | Logging into Facebook: 8.37 s
             | 
             | As long as you have TLS termination close to the end user,
             | and your proxy maintains connections to the backend so that
             | extra round-trips aren't needed, the amount of time large
             | popular sites take to load suggests that people wildly
             | overstate how much anyone cares about a fraction of a blink
             | of an eye. A lot of these sites have a ~20 ms time to
             | connect for me. If it were so important, I'd expect to see
             | page loads on popular sites take <100 ms (a lot of that
             | being TCP+TLS handshake), not 800 ms or 8 seconds.
        
               | hot_gril wrote:
               | and travel sites will be a lot slower for other reasons
        
               | doctorpangloss wrote:
               | It's hard to say. It's common sense that it doesn't
               | matter that much for rational outcomes. Maybe it matters
               | a lot if you are selling something psychological - like
               | in a sense, before Amazon, people with shopping addiction
               | were poorly served, and maybe 50ms matters to them.
               | 
               | You can produce a document that says these load times
               | matter - Google famously observed that SERP load times
               | have a huge impact on a thing they were measuring. You
               | will never convince those analytics people that the pesky
               | 50ms doesn't matter, because they operate at a scale
               | where they could probably observe a way that it does.
               | 
               | The database people are usually sincere. But are the
               | retail people? If you work for AWS, you're working for a
               | retailer. Like so what if saving 50ms makes more money
               | off shopping addicts? The AWS people will never litigate
               | this. But hey, they don't want their kids using Juul
               | right? They don't want their kids buying Robux. Part of
               | the mythology I hate about these AWS and Stripe posts is,
               | "The only valid real world application of my abstract
               | math is the real world application that pays _my_
               | salary." "The only application that we should have a
               | strictly technical conversation about is my application."
               | 
               | Nobody would care about CAP at Amazon if it didn't
               | extract more money from shopping addicts! AWS doesn't
               | exist without shopping addicts!
        
               | ndriscoll wrote:
               | What I mean is that Amazon seems to think it doesn't
               | matter that much if they're taking hundreds of ms (or
               | over 1 s) to show me a basic page. The TCP+TLS handshake
               | takes like 60 ms, plus another RTT for the request is 80
               | ms total. If it takes 10 ms to generate the page (in fact
               | it seems to take over 100), that should still be under
               | 100 ms total on a fresh connection. But instead they send
               | me off to fetch scripts _from multiple other subdomains_
               | (more TCP+TLS handshakes). That alone completely ruins
               | that 50 ms savings.
               | 
               | So the observation is that apparently Amazon does not
               | think 50 ms is very important. If they did, their page
               | could be loading about 5-10x faster. Likewise with e.g.
               | reddit; I don't know if that site has _ever_ managed to
               | load a page in under 1 s. New reddit is even worse. At
               | one point new reddit was so slow that I could tap the URL
               | bar on my phone, scroll to the left, and change www to
               | old in less time than it took for the page to load. In
               | that context, I find people talking about globally
               | distributed systems /"data at the edge" to save 50 ms to
               | be rather comical.
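                | 
                | The budget above, spelled out (assuming the ~20 ms
                | RTT mentioned earlier):
                | 
                |   rtt = 20               # ms, time to connect
                |   tcp = rtt              # SYN / SYN-ACK
                |   tls = 2 * rtt          # TLS 1.2-style handshake
                |   req = rtt              # request + first byte
                |   gen = 10               # server-side generation
                |   print(tcp + tls + req + gen)   # 90 ms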
        
           | tptacek wrote:
           | I don't know what the formalisms here are, but as someone
           | intermittently (thankfully, at this point, rarely)
           | responsible for some fairly large-scale distributed systems:
           | the "network that is really slow" case is much scarier to me
           | than the "total network partition". Routine partial failure
           | is a big part of what makes distributed systems hard to
           | reason about.
        
             | rubiquity wrote:
             | Even more fun are asymmetric network degradations or
             | partitions.
        
         | bankcust08385 wrote:
          | Yep. You can't cloud-magic away Consistency by ignoring it and
          | YOLO-hoping it will Do The Right Thing(tm).
        
         | hinkley wrote:
         | The idea that someone in a partition can get around it by
         | clever routing tricks indicates the author has some "new"
         | definition of partition.
         | 
         | Partition isn't one ethernet cable going bad. It's _all_ the
         | ethernet cables going bad. Redundant network providers for your
         | data center to handle the idiot with the backhoe isn't
          | surviving a partition, it's preventing one in the first place.
        
           | kstrauser wrote:
           | Yep. CAP comes into play when all the mitigations fail.
           | However clever, diligent, and well-executed a network setup,
           | there's some scenario where 2 nodes can't communicate. It's
           | perfectly fine to say we've made it so vastly unlikely that
           | we accept the risk to just take the whole thing offline when
            | that happens. The problem _cannot_ be made to go away. It's
           | not possible.
        
           | wongarsu wrote:
           | A somewhat plausible scenario that's easy to illustrate would
           | be if you have a DC in the US and one in New Zealand, and all
           | 7 subsea cables connecting New Zealand to the internet go
           | dark.
           | 
           | Something like that happened just this year to 13 African
           | nations [1]
           | 
           | 1: https://www.techopedia.com/news/when-africa-lost-internet-
           | ex...
        
           | toast0 wrote:
           | You really do need to consider a full partition, and a
           | partial partition. Both happen.
           | 
           | It's most common that you have a full partition when nobody
           | can talk to node A, because it's actually offline. And
           | sometimes you've got a dead uplink for a switch with a couple
           | of nodes behind it that can talk to each other.
           | 
           | But partial partitions are really common too. If Node A can
           | talk (bidirectionally) to B and C, but B and C can each only
           | talk to A, you _could_ do something with clever routing
           | tricks. If you have a large enough network, you see this kind
            | of thing _all the time_, and it's tempting to consider
           | routing tricks. IMHO, it's a lot more realistic to just have
           | a fast feedback between monitoring the network and the people
           | who can go out and clean the ends on the fiber on the
            | switches and routers and whatnot. The internet as it exists is
           | the product of smart people doing routing tricks on a global
           | basis; it's not universally optimal --- you could do better
           | in specific cases all the time, but it's really easy to
           | identify these as a human when something is broken; actually
           | doing probing for host routing is a big exercise for
           | typically marginal benefits. Magic host routing won't help
           | when a backhoe (or directional drill or boat anchor) has
           | determined your redundant fiber paths are all physically
           | present in the same hole though.
        
           | hot_gril wrote:
           | Could a partition just be the machine hosting one of your DBs
           | going down? When it comes back up, it still has its data, but
           | it's missed out on what happened in the meantime.
        
             | robertlagrant wrote:
             | Not really - if a database stops you fundamentally don't
             | have availability. CAP is specific to network partitions
             | because it wants to grapple with what happens when clients
             | can talk to nodes in your service, but the nodes can't talk
             | to each other.
        
             | hinkley wrote:
             | Your question was answered by someone else but you've
             | triggered a different form of dread.
             | 
             | We lost a preprod database that held up dev and QA for half
             | a day. To bring the server room up to code they moved the
             | outlets, and when they went to move the power cords,
             | someone thought they could get away with relying on the UPS
             | power during the jiggery. Turned out the UPS that was
             | reporting all green only actually had about 8 seconds of
              | standby power in it, which was a little less than the IT
              | person needed to untangle the cables.
             | 
             | So in the end we were just fucked on a different day than
             | we would have been if we had waited for the next hiccup in
             | the city power.
             | 
             | I mention this because if you've split your cluster between
             | three racks and someone manages to power down an entire
             | rack, you're dangerously close to not having quorum and
             | definitely close to not having sufficient throughput for
             | both consumer traffic and rebuilding/resyncing/resilvering.
             | 
              | It's a slow-motion rock slide: a cluster that was
              | not sized correctly for user traffic plus recovery traffic
              | as the entire thing limps toward a full-on outage when just
              | one more machine needs to be rebooted because, for
              | instance, your team decided that restarting a machine every
              | n/48 hours to deal with a slow leak is just fine, since
             | you're using a consensus protocol that will hide the
             | reboots. Maybe rock slide is the wrong word. It's a train
             | wreck. Once it starts you can't stop it, you can only watch
             | and fret.
        
         | cryptonector wrote:
          | TFA doesn't say you don't need Paxos or other such. It says
          | that partitions that leave fewer than a quorum of servers
          | isolated do not reduce availability, and that partitions that
          | leave you w/o
         | a quorum are not really a thing on the cloud. That last part
         | might not be true, but to the extent that it is true then you
         | can have availability and strong consistency.
        
           | skywhopper wrote:
           | It's not true, but if it were true then it's true?
        
             | cryptonector wrote:
             | No, more like: when the cloud is all down, yeah, you don't
             | have availability, but so what, that's like all servers
             | being down -- that's always a potential problem, but it's
             | less likely on a cloud.
        
         | hot_gril wrote:
         | For a cloud customer, the networking side might be as simple as
         | "run a few replicas in different regions, and let the load
         | balancer deal with it." Or only different AZs. The database
         | problem is much harder to sign away as someone else's job.
        
       | bjornsing wrote:
       | I suspect the CAP theorem factored into the design of these cloud
       | architectures, in such a way that it now seems irrelevant. But it
       | probably was relevant in preventing a lot of other more complex
       | designs.
        
         | wmf wrote:
         | For example, having 3 AZs allows Paxos/Raft voting.
        
           | mrkeen wrote:
           | Paxos being CP.
        
       | xnorswap wrote:
       | There's a better rebuttal(*) of CAP in Kleppmann's DDIA, under
       | the title, "The unhelpful CAP theorem".
       | 
       | I won't plagiarize his text, instead the chapter references his
       | blogpost, "Please stop calling databases CP or AP":
       | https://martin.kleppmann.com/2015/05/11/please-stop-calling-...
       | 
       | (*): rebuttal I think is the wrong word, but I couldn't think of
          | a better one.
        
         | cowboylowrez wrote:
         | Really good read, in fact all of these discussions are really
         | great.
         | 
          | I liked the linearizability explanation, like when Alice knows
          | the tournament outcome but Bob doesn't. A great extension would
          | underscore how important this is, and the danger of Alice
          | knowing the outcome but Bob not knowing: just extend the
          | website a bit and make it a sports gambling site. A system
          | under such a partition would allow Alice to make a "sure thing"
          | bet against Bob, so the constraint should be that when Bob's
          | view is stale, it does not take bets. But how does Bob's view
          | know it's stale? It has to query Alice's view! Lots of mind
          | games to play!
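          | 
          | A toy version of that constraint (hypothetical names): Bob's
          | replica only takes a bet if it can confirm with the leader
          | that the game is still open, which is exactly the query the
          | partition blocks.
          | 
          |   leader = {"game42": "finished"}       # Alice's view
          |   replica = {"game42": "in_progress"}   # Bob's stale view
          | 
          |   def take_bet(game, can_reach_leader):
          |       if not can_reach_leader:
          |           return False   # choose consistency: refuse
          |       return leader[game] == "in_progress"
          | 
          |   print(take_bet("game42", True))    # False: game is over
          |   print(take_bet("game42", False))   # False: can't verify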
        
       | throwaway71271 wrote:
       | When I design systems I just think about tiny traitor generals
       | and their sneaky traitor messengers racing in the war, their
       | clocks are broken, and some of them are deaf, blind or both.
       | 
       | CAP or no CAP, chaos will reign.
       | 
       | I think FLP
        | (https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf) is a
       | better way to think about systems.
       | 
       | I think CAP is not as relevant in the cloud because the
       | complexity is so high that nobody even knows what is going on, so
        | just the C part, regardless of the other letters, is ridiculously
       | difficult even on a single computer. A book can be written just
       | to explain write(2)'s surprise attacks.
       | 
        | So you think you have guarantees, whatever the designers said
        | they have, AP or CP, and yet... the impossible will happen twice
        | a day (and 3 times at night when it's your on-call).
        
         | SomeHacker44 wrote:
         | Nice! A paper by Michael Fischer of Multiflow fame. I have one
         | of their computers in my garage. Thanks!
        
         | hot_gril wrote:
         | I can think of a lot of cloud systems that have C. Businesses
         | will rely on a single machine running a big Oracle DB.
         | Microservice architectures will be eventually consistent as a
         | whole, but individual services will again have a single DB on a
         | single machine.
         | 
          | The single machine is a beastly distributed system in and of
         | itself, with multiple cores, CPUs, NUMA nodes, tiered caches,
         | RAM, and disks. But when it comes down to two writers fighting
         | over a row, it's going to consult some particular physical
         | place in hardware for the lock.
        
       | killjoywashere wrote:
       | The military lives in this world and will likely encourage people
       | to continue thinking about it. Think about wearables on a
       | submarine, as an example. Does the captain want to know his crew
       | is fatigued, about to get sick, getting less exercise than they
       | did on their last deployment? Yes. Can you talk to a cloud? No.
       | Does the Admiral in Hawaii want to know those same answers about
       | that boat, and every boat in the Group, eventually? Yes. For this
       | situation, datacenter-aware databases are great. There are other
       | solutions for other problems.
        
         | leetrout wrote:
         | Yep.
         | 
         | For those unaware, military networks deal with disruptions,
          | disconnections, intermittent connectivity, and low bandwidth
         | (DDIL).
         | 
         | https://www.usni.org/magazines/proceedings/sponsored/ddil-en...
        
       | mrkeen wrote:
       | > In practice, the redundant nature of connectivity and ability
       | to use routing mechanisms to send clients to the healthy side of
       | partitions
       | 
        | IOW: You can have CAP as long as you can communicate across
       | "partitions".
        
         | crote wrote:
         | More precisely: partitions in the graph theory sense rarely (if
         | ever) happen. In practice it looks more like a complete graph
         | losing some edges.
         | 
         | Let's say you have servers in DCs A and B, and clients at ISPs
         | C and D. Normally servers at A and B can communicate, and
         | clients at both C and D can each reach servers at A and B.
         | 
         | If connectivity between DCs A and B goes down there is
         | technically no partition, because connectivity between the
         | clients and each DC is still working. The servers at one of the
         | DCs can just say "go away, I lost my quorum", and the clients
         | will connect to the other DC. This is the most likely scenario,
         | and in practice the only one you're really interested in
         | solving.
         | 
         | If connectivity between ISP C and DC A goes down, there is no
         | partition because they can just connect to DC B.
         | 
         | If all connectivity for ISP C goes down, that is Not Your
         | Problem.
         | 
         | If all connectivity for DC A goes down, it doesn't matter
         | because it no longer has clients writing to it, and they've all
         | reconnected to DC B.
         | 
         | To have a partition, you'd need to lose the link between DCs A
         | and B, _and_ lose the link from ISP C to DC B, _and_ lose the
         | link from ISP D to DC A. This leaves the clients from ISP C
         | connected only to DC A, and the clients from ISP D connected
         | only to DC B - while at the same time being unable to reconcile
         | between DCs A and B. Is this theoretically possible? Yes, of
         | course. Does this happen in practice - let alone in a close-
         | enough split that you could reasonably be expected to serve
         | both sides? No, not really.
        
           | justinsaccount wrote:
            |   DC A: North America
            |   DC B: South America
            |   ISP C: North American ISP
            |   ISP D: South American ISP
            | 
            | A major fiber link between North and South America is cut,
            | disrupting all connectivity between the two areas.
            | 
            |   "lose the link between DCs A and B": check.
            |   "lose the link from ISP D to DC A": check.
            | 
            | Replace North and South America with any two disparate
            | locations.
        
           | hinkley wrote:
           | Partition tends to be a problem between peers, where
           | observers can see they aren't talking to each other.
           | 
           | In the early days that might have been two machines we
           | walked up to that showed different results; now we talk to
           | everything over the internet. We generally mean a partition
           | when one VLAN is having trouble, one Ethernet card is dying,
           | or a cable has gone bad. If the entire thing becomes
           | unreachable we just call it an outage.
        
       | lupire wrote:
       | He's saying that you don't need Partition Tolerance because the
       | network is never actually Partitioned. This is exactly why the
       | Internet and the US Interstate Highway system were invented in
       | the first place.
       | 
       | Or he's saying you don't need Consistency because your system
       | isn't actually distributed; it's just a centralized system with
       | hot backups.
       | 
       | It's unclear what he's trying to say.
       | 
       | No idea why he wrote the blog post. It doesn't increase my
       | confidence in the engineering quality of his employer, AWS.
        
       | bunderbunder wrote:
       | I once lost an entire Christmas vacation to fixing up the damage
       | caused when an Elasticsearch cluster running in AWS responded
       | poorly to a network partition event and started producing results
       | that ruined our users' day (and business records) in a "costing
       | millions of dollars" kind of way.
       | 
       | It was a very old version of ES, and the specific behavior that
       | led to the problem has been fixed for a long time now. But still,
       | the fact that something like this can happen in a cloud
       | deployment demonstrates that this article's advice rests on an
       | egregiously simplistic perspective on the possible failure modes
       | of distributed systems.
       | 
       | In particular, the major premise that intermittent connectivity
       | is only a problem on internetworks is just plain wrong. Hubs and
       | switches flake out. Loose wires get jiggled. Subnetworks get
       | congested.
       | 
       | And if you're on the cloud, nobody even tries to pretend that
       | they'll tell you when server and equipment maintenance is going
       | to happen.
        
         | hinkley wrote:
         | Twice I've had to take an Ethernet cable from an IT lead and
         | throw it away. "It's still good!" No it fucking isn't. One of
         | them compromised by pulling it out of the trash and cutting the
         | ends off.
         | 
         | When there's maintenance going on in the server room and a
         | machine they promised not to touch starts having intermittent
         | networking problems, it's probably a shitty cable getting
         | jostled. There is an entire generation of devs now that have
         | had physical hardware abstracted away and aren't learning the
         | lessons.
         | 
          | Though my favorite stories are the ones about a cubicle
          | neighbor whose foot-tapping leads to packet loss.
        
           | kstrauser wrote:
           | I watched a colleague diagnose a flaky cable. Then he
           | immediately pulled out a pocket knife and sliced the cable
           | in two. He said he didn't ever even want to be tempted to
           | trust it again.
           | 
           | That's been my model since then. I've never worked with a
           | cable so expensive or irreplaceable that I'd want to take a
           | chance on it a second time. I'm sure they exist. I haven't
           | been around one.
        
             | hot_gril wrote:
             | Lightning cable. Gotta mess with it until the iPhone
             | charges. I'm not replacing it until it's tied into a
             | blackwall hitch and still isn't charging.
        
           | devin wrote:
           | It's been a while since I've been in a data center, but
           | fiber cables were usually the ones that seemed to have more
           | staying power even when they started to flake. Maybe because
           | some of them were long runs and they weren't cheap. You'd
           | see an absolutely wrenched cable going into a rack at a
           | horrifying angle from the tray on a 30 ft run with greater
           | regularity than I'd care to admit.
        
       | motbus3 wrote:
       | If you don't care about costs...
        
         | kstrauser wrote:
         | ...or physics.
        
       | pyrale wrote:
       | Someone else took ownership of the problem for you and sells
       | you their solution: "the theoretical issue is irrelevant to
       | me".
       | 
       | Sure. Also, there's a long list of other things that are
       | probably irrelevant to you. That is, until your provider fails
       | and you need to understand the situation in order to provide a
       | workaround.
       | 
       | And slapping "load balancers" everywhere on your diagram is not
       | really a solution, because load balancers are themselves a
       | distributed system with state, and are subject to CAP, as shown
       | in the article's own diagram.
       | 
       | > DNS, multi-cast, or some other mechanism directs them towards a
       | healthy load balancer on the healthy side of the partition.
       | 
       | "Somehow, something somewhere will fix my shit hopefully". Also,
       | as a sidenote, a few friends would angrily shake their "it's
       | always DNS" cup reading this.
       | 
       | edit: reading the rest of the blog and author's bio, I'm unsure
       | whether the author is genuinely mistaken, or whether they're
       | advertising their employer's product.
        
       | vmaurin wrote:
       | Plot twist: in the article's drawings, replicas one and two are
       | split by the network, and that link can fail.
       | 
       | The author doesn't seem to understand what the P in CAP means.
        
         | jvanderbot wrote:
         | "Non failing node" does not mean "Non partitioned node", simple
         | as that.
         | 
         | If you treat a partitioned node as "failed", then CAP does not
         | apply. You've simply left it out cold with no read / write
         | capability because you've designated a "quorum" as the in-
         | group.
        
         | hinkley wrote:
         | I missed your comment and just made the same comment with
         | different words. He literally has a picture of a partition.
        
       | kristjansson wrote:
       | Many things can be solved by the SEP Field[0]
       | 
       | [0]:
       | https://en.wikipedia.org/wiki/Somebody_else's_problem#Dougla...
        
       | ibash wrote:
       | > if a quorum of replicas is available to the client, they can
       | still get both strong consistency, and uncompromised
       | availability.
       | 
       | Then it's not a partition.
        
       | tristor wrote:
       | As someone who's worked extensively on distributed systems,
       | including at a cloud provider, after reading this I think the
       | author doesn't actually understand the CAP theorem or the Two
       | Generals problem. Their conclusions are utterly incorrect.
        
       | hinkley wrote:
       | > The formalized CAP theorem would call this system unavailable,
       | based on their definition of availability:
       | 
       | Umm, no? That's a picture of a partition. The partitioned side
       | is not able to make progress because the system is not
       | partition tolerant; if it did make progress, it wouldn't be
       | consistent. It's still available.
        
       | throw0101c wrote:
       | See also:
       | 
       | > _In database theory, the PACELC theorem is an extension to the
       | CAP theorem. It states that in case of network partitioning (P)
       | in a distributed computer system, one has to choose between
       | availability (A) and consistency (C) (as per the CAP theorem),
       | but else (E), even when the system is running normally in the
       | absence of partitions, one has to choose between latency (L) and
       | loss of consistency (C)._
       | 
       | * https://en.wikipedia.org/wiki/PACELC_theorem
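       | 
       | A minimal sketch of the "else" half (purely illustrative; the
       | replica list, the sleep, and the flag are assumptions): even
       | with no partition, the leader has to pick between acking fast
       | and waiting for replicas.
       | 
       |   import time
       | 
       |   REPLICAS = ["replica-1", "replica-2"]
       | 
       |   def ship_to(replica, record):
       |       time.sleep(0.05)   # stand-in for a cross-zone round trip
       | 
       |   def write(record, favor_latency=False):
       |       local_log = [record]       # leader applies locally first
       |       if favor_latency:
       |           # PACELC's L: ack now, replicate in the background,
       |           # and accept that readers may briefly see stale data
       |           return "acked"
       |       for r in REPLICAS:         # PACELC's C: pay the round
       |           ship_to(r, record)     # trips before acking
       |       return "acked after replication"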
        
       | jumploops wrote:
       | > The point of this post isn't merely to be the ten billionth
       | blog post on the CAP theorem. It's to issue a challenge. A
       | request. Please, if you're an experienced distributed systems
       | person who's teaching some new folks about trade-offs in your
       | space, don't start with CAP.
       | 
       | Yeah... no. Just because the cloud offers primitives that allow
       | you to skip many of the challenges that the CAP theorem outlines,
       | doesn't mean it's not a critical step to learning about and
       | building novel distributed systems.
       | 
       | I think the author is confusing systems practitioners with
       | distributed systems researchers.
       | 
       | I agree in part: the former rarely need to think about CAP for
       | the majority of B2B cloud SaaS. For the latter, it seems
       | entirely incorrect to skip CAP theorem fundamentals in one's
       | education.
       | 
       | tl;dr -- just because Kubernetes (et al.) make building
       | distributed systems easier, it doesn't mean you should avoid the
       | CAP theorem in teaching or disregard it altogether.
        
       | jorblumesea wrote:
       | CAP was never designed as an end-all template you blindly apply
       | to large-scale systems. Think of it more as a mental starting
       | point: systems have these trade-offs that you need to consider.
       | Each system you integrate has complex and nuanced requirements
       | that don't neatly fall into clean buckets.
       | 
       | As always, Kleppmann has a great and deep answer for this.
       | 
       | https://martin.kleppmann.com/2015/05/11/please-stop-calling-...
        
       | KaiserPro wrote:
       | I kinda see what the author is getting at, but I don't buy the
       | argument.
       | 
       | However, the example with the network partition relies on
       | proper monitoring to work out whether the DB it's attached to
       | is currently in a partition.
       | 
       | Managing reads is a piece of piss, mostly. It's when you need
       | to propagate writes to the rest of the DB system that stuff
       | gets hairy.
       | 
       | Now, most places can run from a single DB, especially as disks
       | are fucking fast now, so CAP is never really that much of a
       | problem. However, when you go multi-region, that's when it gets
       | interesting.
        
       | cryptonector wrote:
       | > If the partition extended to the whole big internet that
       | clients are on, this wouldn't work. But they typically don't.
       | 
       | This is the key, that network partitions either keep some clients
       | from accessing any servers, or they keep some servers from
       | talking to each other. The former case is uninteresting because
       | nothing can be done server-side about it. The latter is
       | interesting and we can fix it with load balancers.
       | 
       | This conflicts with the picture painted earlier in TFA where the
       | unhappy client is somehow stuck with the unhappy server, but
       | let's consider that just didactic.
       | 
       | We can also not use load balancers but have the clients talk to
       | all the servers they can reach, when we trust the clients to
       | behave correctly. Some architectures do this, like Lustre, which
       | is why I mention it.
       | 
       | I see several comments here that seem to take TFA as saying that
       | distributed consensus algorithms/protocols are not needed, but
       | TFA does not say that. TFA says you can have consistency,
       | availability, and partition tolerance because network partitions
       | between servers typically don't extend to clients, and you can
       | have enough servers to maintain quorum for all clients (if a
       | quorum is not available, it's as if the whole cloud is down and
       | it's not available to any clients). That is a very reasonable
       | assertion, IMO.
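       | 
       | The server-side rule in that assertion, as a tiny sketch (the
       | cluster size and the status codes are invented): a replica only
       | accepts writes while it can see a strict majority, otherwise it
       | sends clients to the healthy side.
       | 
       |   CLUSTER_SIZE = 5   # total replicas, assumed for the example
       | 
       |   def has_quorum(reachable_including_self: int) -> bool:
       |       return reachable_including_self > CLUSTER_SIZE // 2
       | 
       |   def handle_write(reachable_including_self: int):
       |       if not has_quorum(reachable_including_self):
       |           # tell the client to retry against the majority side
       |           return 503, "no quorum here"
       |       # a real system replicates to the quorum before acking
       |       return 200, "committed"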
        
         | mrkeen wrote:
         | > TFA says you can have consistency, availability, and
         | partition tolerance because network partitions between servers
         | typically don't extend to clients, and you can have enough
         | servers to maintain quorum for all clients
         | 
         | It's not "partition tolerance" if you declare that partitions
         | typically don't happen.
        
       | skywhopper wrote:
       | Weird article. Different users have different priorities and
       | that's what the CAP theorem expresses. The article also pretends
       | that there's a magic "load balancer" in the cloud that always
       | works and also knows which segment of a partitioned network is
       | the "correct" one (one of the points of CAP is that there's not
       | necessarily a "correct" side), and that no users will ever be on
       | the "wrong" side of the partition. And not only that but all
       | replicas see the exact same network partition. None of this is
       | reality.
       | 
       | But the gist, I guess, is that for most applications it's not
       | actually that important, and that's probably true. But when it is
       | important, "the cloud" is not going to save you.
        
       | rdtsc wrote:
       | > The CAP Theorem is Irrelevant
       | 
       | Just sprinkle the magic "cloud" powder on your system and ignore
       | all the theory.
       | 
       | https://ferd.ca/beating-the-cap-theorem-checklist.html
       | 
       | Let's see, let's pick some checkboxes.
       | 
       | (x) you pushed the actual problem to another layer of the system
       | 
       | (x) you're actually building an AP system
        
         | hot_gril wrote:
         | You're building a ??? system. Either CP or AP depending on what
         | random values got put into some config.
        
       | hot_gril wrote:
       | Every time someone tries to deprecate the nice and simple CAP
       | theorem, it grows stronger. It's an unstoppable freight train at
       | this point, like the concept of relational DBs after the NoSQL
       | fad.
        
       | kwillets wrote:
       | This seems like what I've noticed on MPP systems (a little before
       | cloud): data replicas give a lot more availability than the
       | number of partition events would suggest.
       | 
       | I likely need to read the paper linked, but it's common to have
       | an MPP database lose a node but maintain data availability. CAP
       | applies at various levels, but the notion of availability
       | differs:
       | 
       | 1. all nodes available
       | 2. all data available
       | 
       | Redundancy can make #2 a lot more common than #1.
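       | 
       | A back-of-the-envelope sketch of that gap (the node count,
       | per-node availability, and disjoint replica placement are all
       | assumed, not taken from any particular system):
       | 
       |   p, nodes, replicas = 0.99, 12, 3    # assumed values
       |   shards = nodes // replicas          # disjoint 3-node groups
       | 
       |   all_nodes_up = p ** nodes                   # ~0.886
       |   shard_alive = 1 - (1 - p) ** replicas       # >=1 replica up
       |   all_data_available = shard_alive ** shards  # ~0.999996
       | 
       |   print(all_nodes_up, all_data_available)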
        
       | hot_gril wrote:
       | The only concrete realization of what the article proposes that
       | I can think of: Spanner uses quorum to maintain availability
       | and consistency.
       | Your "master" is TrueTime, which is considered reliable enough.
       | You have replicated app backends. If this isn't too generous,
       | let's also say the cloud handles load balancing well enough. CAP
       | isn't violated, but you might say the user no longer worries
       | about it.
       | 
       | Most databases don't work like Spanner, and Spanner has its
       | downsides, two of them being cost and performance. So most of the
       | time, you're using a traditional DB with maybe a RW replica,
       | which will sacrifice significant consistency or availability
       | depending on whether you choose sync or async mode. And you're
       | back to worrying about CAP.
        
         | hot_gril wrote:
         | Ok, you could also have multiple synchronous Postgres
         | standbys in a quorum setup so that it's tolerable to lose
         | one. Supposedly Amazon Aurora does something similar. Comes
         | with costs and downsides too.
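         | 
         | For reference, the quorum variant is roughly this knob in
         | postgresql.conf (illustrative only; the standby names are
         | made up, and the ANY syntax needs PostgreSQL 10+):
         | 
         |   synchronous_commit = on
         |   # commit waits for any 1 of the 2 listed standbys, so a
         |   # single standby failure doesn't block writes
         |   synchronous_standby_names = 'ANY 1 (standby_a, standby_b)'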
        
       | PaulHoule wrote:
       | So glad to see that the CAP "theorem" is being recognized as a
       | harmful selfish meme like Fielding's REST paper with a deadly
       | seductive power against the overly pedantic.
        
       | thayne wrote:
       | This only addresses one kind of partition.
       | 
       | What if your servers can't talk to each other, but clients can?
       | 
       | What if clients can't connect to any of your servers?
       | 
       | What if there are multiple partitions, and none of them has a
       | quorum?
       | 
       | Also, changing the routing isn't instantaneous, so you will have
       | some period of unavailability between when the partition happens,
       | and when the client is redirected to the partition with the
       | quorum.
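       | 
       | Rough arithmetic for that window (every number here is an
       | assumption, just to show the shape of it):
       | 
       |   failure_detection = 10  # seconds until health checks trip
       |   dns_ttl = 60            # seconds clients may cache old record
       |   reconnect = 5           # seconds to re-resolve and reconnect
       | 
       |   worst_case = failure_detection + dns_ttl + reconnect
       |   print(worst_case, "seconds of unavailability")   # 75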
        
       | rubiquity wrote:
       | The point being made is that with nimble infrastructure, the
       | availability you give up can be engineered down to so little
       | that you may as well be a CP system, unless you have a really
       | good reason to go after that last 0.005% of availability. Not
       | being CP means sacrificing the wonderful benefits that being
       | consistent (linearizability, sequential consistency, strict
       | serializability) makes possible. It's hard to disagree with
       | that sentiment, and it's likely why the Local First ideology is
       | centered on data ownership rather than that extra 0.0005 ounces
       | of availability. Once availability is no longer the center of
       | attention, the design space can be focused on durability or
       | latency: how many copies to read/write before acking.
       | 
       | Unfortunately the point is lost because of the usage of the
       | word "cloud", a somewhat contrived example of solving problems
       | by reconfiguring load balancers (in the real world certain
       | outages might not let you reconfigure!), and a lack of empathy:
       | you can't tell people not to care about the semantics that
       | thinking, or not thinking, about availability imposes on the
       | correctness of their applications.
       | 
       | As for the usage of the word cloud: I don't know when a set of
       | machines becomes a cloud. Is it the APIs for management? Or when
       | you have two or more implementations of consensus running on the
       | set of machines?
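       | 
       | That last knob, as a tiny sketch (N, R, and W are generic
       | Dynamo-style parameters, assumed here rather than taken from
       | the post):
       | 
       |   N = 3  # copies of each record
       | 
       |   def quorums_overlap(r: int, w: int, n: int = N) -> bool:
       |       # a read quorum and a write quorum must share at least
       |       # one replica for reads to see the latest acked write
       |       return r + w > n
       | 
       |   print(quorums_overlap(r=2, w=2))  # True: quorum reads/writes
       |   print(quorums_overlap(r=1, w=1))  # False: fast, maybe stale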
        
       | mcbrit wrote:
       | All models are wrong, some are useful. CAP is probably at least
       | as useful as Newtonian mechanics WHEN you are explaining why you
       | just did a bunch of... extra stuff.
       | 
       | I would like to violate CAP, please. I would like to be nearish
       | to c, please.
       | 
       | Here is my passport. I have done the work.
        
       | remram wrote:
       | This article assumes P(artitions) don't happen, and then
       | concludes you can have both C and A. Congrats, that's the CAP
       | theorem.
        
       ___________________________________________________________________
       (page generated 2024-07-25 23:09 UTC)