[HN Gopher] Data Consistency Is Overrated
       ___________________________________________________________________
        
       Data Consistency Is Overrated
        
       Author : bo0tzz
       Score  : 39 points
       Date   : 2023-02-18 15:22 UTC (7 hours ago)
        
 (HTM) web link (two-wrongs.com)
 (TXT) w3m dump (two-wrongs.com)
        
       | ddulaney wrote:
       | I think there are two different kinds of consistency, and it's
       | important to not conflate them.
       | 
       | There's consistency that's _internal_ to a system. Do all of the
       | foreign keys line up correctly? Have I lost any data that was
        | provided to me? Here, we can aspire to be 100% correct. I don't
       | think the examples in this article conflict with that.
       | 
       | Then there's consistency that's _external_ to a system. This can
       | be between this system and other systems, or between this system
       | and reality. Did the operator enter the correct data for this
       | item? Did the external system change and fail to tell me? Here
       | there is no way to be 100% correct using only the tools that are
       | inside the system. You need external audits, reality checks,
       | periodic reconciliation.
       | 
       | Critically, all of the author's examples are about external
       | consistency (accounting matching reality; inter-system
       | communication), but their conclusion seems to be that because
       | external consistency can't be fully achieved, we should be OK to
       | abandon internal consistency within single systems. I think
       | that's too strong a conclusion.
        
         | josephg wrote:
         | Hm, I'd cut the cake a little differently. I think there's two
         | kinds of consistency we can strive for:
         | 
         | - _Strict_ consistency. Every pointer in my b-tree _must_ point
         | to a b-tree node, and not random data which could cause the
         | program to crash. In a financial world, a bank should never
         | print money.
         | 
         | - _Fuzzy_ "good enough" consistency. In the examples in the
         | article, all of the financial transactions should end up close
         | enough to being reconciled.
         | 
         | There's value in both kinds of consistency. When talking about
         | data entered by humans into a database, there's always going to
         | be a bit of slop involved. Someone mistyped a digit. A few
         | records weren't entered at all.
         | 
         | But when building software systems, designing with strict
         | consistency guarantees is such an unbelievably massive win. I
         | can't overstate how important it is. The entire ladder of
         | abstraction in modern computers from transistors all the way up
         | to this website is only possible because each layer
         | "underneath" the layer we're standing on is solid and
         | deterministic. If CPUs made even 1 error in every billion
         | operations, our computers wouldn't boot at all.
         | 
         | Tony Hoare talks about his invention of null pointers as his
         | "billion dollar mistake". Null pointers take something that
         | should be strictly consistent (references) and make it fuzzy.
         | Rust is exciting lots of people in the systems programming
         | space because it takes things that are fuzzy in C (aliasing,
         | memory management, thread-safe variables, etc) and makes them
          | strictly consistent. All the value in unit tests comes from how
          | they make our systems more strictly correct.
         | 
         | Every time I've relaxed consistency guarantees internally in
         | systems I've worked on (or heard coworkers doing the same)
         | we've come to regret it. At a startup several years ago, we
         | needed to build an external search index for a database. The
         | database updated live - and we had a change feed that updated
         | the browsers live as records changed. The engineer in charge
         | did a "good enough" job - he wrote a scrappy script that was
         | only mostly correct. But it sometimes left the index
         | inconsistent with the data. We got constant reports from our
         | users about items not showing up in the search results. He
         | would dutifully go back and fiddle with things to try and fix
         | the problem. Eventually one of our senior engineers went in and
         | rewrote the whole indexing script to be strictly correct. We
         | never heard a peep about it after that - it just worked, every
         | time. Even putting aside the frustration of our users, writing
         | it correctly was a big win for us in terms of maintenance. Once
         | it was correct, we didn't need to keep pulling engineering time
         | away to fix problems.
         | 
         | Maybe data consistency is overrated in databases. But I think
         | if anything, consistency internally in computing systems is
         | underrated. We take for granted how well computers work. But
         | our capacity to make computers _do anything_ depends entirely
         | on those consistency guarantees. It seems ridiculous to
         | disregard its importance.
        
         | taeric wrote:
         | I think I agree with you. That said, internal and external are
         | a touch inadequate.
         | 
         | Specifically, for a large enough system, internal consistency
         | will look more like external from a smaller system's
         | perspective.
         | 
          | To that end, it is all about costs. If the cost of keeping
          | things consistent is within the budget, do so.
        
           | gfody wrote:
           | including unknowable cost/opportunity risk
        
         | stevesimmons wrote:
         | There's a great book "Data and Reality" that discusses these
          | subtle but very crucial differences. Discussed here on HN a
         | year ago:
         | 
         | https://news.ycombinator.com/item?id=30251747
        
         | GauntletWizard wrote:
         | There's a ton of places where foreign keys are used wrong -
         | deletion is acceptable, and good error handling for that case
         | is the right thing to build anyway.
         | 
          | That's one of the key points that nonconsistency advocates are
          | arguing for. If you build a music playlist and one of the mp3
          | files it references has been deleted, the program should skip
          | that track, not crash. I've heard too many people
         | arguing that they should be allowed to emit nasal demons if
         | there's a data error, and they're wrong even in many well
         | structured and tightly integrated datasets, but especially
         | wrong in "web-scale" datasets
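          | 
          | A toy sketch of the playlist case in Python (the file names are
          | made up): look up each referenced track, skip the ones whose
          | files have gone missing, and keep playing.
          | 
          |     from pathlib import Path
          | 
          |     def play(playlist: list[str]) -> None:
          |         # Tolerate dangling references: skip missing files
          |         # instead of crashing on the first broken one.
          |         for track in playlist:
          |             if not Path(track).is_file():
          |                 print(f"skipping missing track: {track}")
          |                 continue
          |             print(f"playing {track}")
          |             # ... hand the file to the audio backend here ...
          | 
          |     play(["intro.mp3", "deleted-years-ago.mp3", "outro.mp3"])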
        
           | bcrosby95 wrote:
           | Deletion is acceptable, and if you have everything in a
           | consistent system they will both exist or not.
           | 
           | "Nonconsistency advocacy" doesn't make a lot of sense to me
           | here. Are you advocating not relying on consistency in
           | systems that guarantee consistency? That is a waste of time.
           | 
           | Are you advocating eschewing consistent systems? Well, then
           | you have more work, so only if I need to. And yes, you should
           | handle data errors here because inconsistency is consistent
           | with the system you've chosen.
        
       | dgb23 wrote:
       | I needed to read this, but I'm not sure what to make of it yet.
       | 
       | The reason is I've been struggling with the idea that (bi-)
       | temporal, consistent data is great, as it provides a ton of
       | leverage both for users and for auditing and debugging. For some
        | problems it's the cleanest, most general solution.
       | 
        | What irks me is that the code that validates, consumes, transforms
       | and displays the data lives in its own time model (git).
       | 
       | Philosophically you're not really looking into the past. You're
       | looking at an echo of the past that's displayed in a current
       | form.
       | 
       | And more practically it's simply tough to draw the line between
       | assuring that historical data is still handled and rendered
       | correctly and getting rid of overhead and complexity of a growing
       | and evolving application.
       | 
       | The article talks about eventual consistency, but I really don't
        | have these kinds of problems, and when I do, I fully agree with
       | the author as long as it's very clear to the user whether
       | something is consistent and when it will be.
        
         | refset wrote:
         | > practically it's simply tough to draw the line between
         | assuring that historical data is still handled and rendered
         | correctly and getting rid of overhead and complexity of a
         | growing and evolving application
         | 
         | Agreed, the issue of maintaining accurate data (bitemporal or
         | otherwise) in the context of evolving schema and code feels
         | like a real puzzle to solve with commonplace tools like git and
         | SQL.
        
       | gorbachev wrote:
       | I once worked on a platform producing analytics using data that,
       | at its source, was manually typed in by people.
       | 
       | My product managers would insist we do distinct counts on the
       | aggregates instead of using probabilistic algorithms, because we
       | "needed" the absolute 100% accurate output. No matter how many
       | times I would explain the data was never 100% accurate to begin
       | with and that the error rate using HyperLogLog wouldn't make an
        | ounce of difference, we were never allowed to do that. As a
       | consequence the performance of the system when doing distinct
       | counts on interactive queries was about 100x worse. This didn't
       | seem to bother anyone. I never understood why.
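        | 
        | For the curious, here is a rough, self-contained HyperLogLog
        | sketch in Python (not the system I worked on; p=14 is an
        | arbitrary choice). With 2^14 registers the estimate typically
        | lands within about 1% of the exact distinct count, while keeping
        | a few kilobytes of state instead of a set of every value:
        | 
        |     import hashlib, math, random
        | 
        |     def hll_estimate(items, p=14):
        |         m = 1 << p              # m = 2**p registers
        |         registers = [0] * m
        |         for item in items:
        |             digest = hashlib.sha1(str(item).encode()).digest()
        |             h = int.from_bytes(digest[:8], "big")  # 64-bit hash
        |             idx = h & (m - 1)   # low p bits pick a register
        |             w = h >> p
        |             # rank = 1-based position of the first 1-bit in w
        |             rank = (64 - p) - w.bit_length() + 1
        |             registers[idx] = max(registers[idx], rank)
        |         alpha = 0.7213 / (1 + 1.079 / m)
        |         raw = alpha * m * m / sum(2.0 ** -r for r in registers)
        |         zeros = registers.count(0)
        |         if raw <= 2.5 * m and zeros:  # small-range correction
        |             return m * math.log(m / zeros)
        |         return raw
        | 
        |     values = [random.randrange(600_000) for _ in range(1_000_000)]
        |     print("exact :", len(set(values)))
        |     print("approx:", round(hll_estimate(values)))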
        
         | crazygringo wrote:
         | 100x is not that big of a difference, and I've been burned by
         | "clever" algorithms before because they weren't coded _exactly_
         | right. Because if they drift into a broken state, nobody can
         | tell. While a simple sum is unlikely to fail in subtle ways.
         | 
         | There are benefits to simplicity. If there isn't a reason to
         | need 100x faster performance, then why complicate things? That
         | would be my why.
        
           | dgb23 wrote:
           | That's a great point.
           | 
           | But I want to add that anecdotally users do care about
           | performance and they will often thank you for anything that
           | is noticeable. Sometimes just because it feels nice, but it
           | can also enable a fast feedback loop, which increases
           | immersion and productivity.
           | 
           | In a broader sense, we use computers not only because they
           | can perform work autonomously but also because they are fast,
           | correct and remember details almost perfectly.
        
         | dgb23 wrote:
         | I can imagine some reasons:
         | 
         | - They don't care about performance and think their users don't
         | care about performance.
         | 
         | - They have sold something that doesn't make sense but sounds
         | good.
         | 
         | - They are afraid of their users.
         | 
         | - Their users don't understand the problem you're describing or
         | simply don't believe it.
         | 
          | - The business domain of their users actually knows about the
          | issue, but they still want or need the exact counts.
        
       | readthenotes1 wrote:
       | Wouldn't it be funny if the post changed every few hours?
        
       | LAC-Tech wrote:
       | "Network partitioning can completely destroy mutual consistency
       | in the worst case, and this fact has led to a certain amount of
       | restrictiveness, vagueness, and even nervousness in past
       | discussions, of how it may be handled. In some environments it is
       | desirable or necessary to permit users to continue modifying
       | resources such as files when the network is partitioned. A
       | network operating system would be a good example. In such
       | environments mutual inconsistency becomes a fact of life which
       | must be dealt with."[0]
       | 
       | TL;DR - you either store your data in one place, or you store it
       | in multiple places but disallow writes when there's a network
       | split and wait for consensus on all nodes... or you have
       | inconsistency and you have to deal with it.
       | 
       | [0] _Detection of Mutual Inconsistency in Distributed Systems_ ,
       | 1983. Great paper, fairly readable, and still relevant.
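        | 
        | The paper's core mechanism, version vectors, fits in a few lines
        | of Python (a rough sketch; the node names and counters are made
        | up). Each replica counts its own updates, and a conflict shows up
        | whenever both sides advanced during the partition:
        | 
        |     def compare(a: dict[str, int], b: dict[str, int]) -> str:
        |         # a and b map node name -> number of updates made there
        |         keys = set(a) | set(b)
        |         a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
        |         b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
        |         if a_ahead and b_ahead:
        |             return "conflict"   # both modified during the split
        |         if a_ahead:
        |             return "a supersedes b"
        |         if b_ahead:
        |             return "b supersedes a"
        |         return "identical"
        | 
        |     # two replicas that each accepted writes while partitioned
        |     print(compare({"n1": 3, "n2": 1}, {"n1": 2, "n2": 2}))
        |     # -> conflict
        |     print(compare({"n1": 3, "n2": 1}, {"n1": 2, "n2": 1}))
        |     # -> a supersedes b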
        
       | legulere wrote:
       | Data consistency comes at a cost, but eschewing it does as well.
        | Most of the time the performance impact of data consistency does
        | not matter, but potentially introducing heisenbugs into your
        | system can come at a huge cost.
        
         | layer8 wrote:
         | And also, eventual consistency means _eventual_ consistency,
         | not abandoning consistency.
        
       | StreamBright wrote:
        | Data consistency is overrated if your business is ok with that.
        | Many businesses are not ok with that. Example: an airline's
        | booking process.
        
         | fbdab103 wrote:
         | Airlines are notorious for over-booking available seats and
         | dealing with the fallout.
        
           | StreamBright wrote:
            | Which is done purposefully and is not caused by data
            | inconsistency at all.
        
             | macintux wrote:
             | I would wager, in complete ignorance of the real
             | implementation, that the knowledge that they _can_ overbook
             | means that they can relax some of their requirements. If
             | two servers can't talk to each other to coordinate for a
             | while, they could still each sell tickets.
        
         | taeric wrote:
         | Bad examples. Airlines are notorious for having incoherent
          | booking policies. Overselling flights and such.
        
           | crazygringo wrote:
           | Overbooking is intentional policy. It has nothing to do with
           | data consistency.
           | 
           | To the contrary, consistency is extremely important so that
           | they overbook by exactly the right amount, to compensate for
           | the statistically expected no-shows.
        
             | taeric wrote:
             | It is a form of coherence for the system, though. And in
             | the spirit of the same ideas. That is, it is an intentional
             | policy for databases, too.
             | 
             | And there is no "exactly right amount" that makes it work.
             | They keep options for forcing people off flights if they
             | planned it wrong.
             | 
              | It's also why they don't let gate agents oversell a flight.
              | They keep a stronger consistency on that, for this exact
              | reason. Overselling would fit into what someone else called
              | external consistency.
        
       | bjornsing wrote:
       | Durability is also overrated.
       | 
       | Jokes aside, I've been responsible for a system that processed ~1
       | billion monetary transactions per day. Even with a fanatical
       | focus on consistency and correctness it was still always off.
        | With the philosophy promoted in the OP the chaos would have been
       | complete... is my gut feeling at least.
        
         | mousetree wrote:
         | Out of interest, what system processed 1 billion monetary
         | transactions a day?
        
           | bjornsing wrote:
           | Bookkeeping / billing / analytics system for a high volume
           | online API.
        
           | playingalong wrote:
           | High-frequency something would be my guess;)
        
       | aljungberg wrote:
        | Within a closed system, consistency (and verifying it) is a fail-
        | fast mechanism. For example, it's better to crash on a constraint
       | failure when attaching a doodad to a non-existent user account
       | than to figure out where all these orphan doodads came from next
       | year.
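        | 
        | A minimal sqlite sketch of the fail-fast behaviour (the
        | users/doodads schema is made up to match the example): the bad
        | insert raises immediately instead of silently creating an orphan
        | to puzzle over next year.
        | 
        |     import sqlite3
        | 
        |     conn = sqlite3.connect(":memory:")
        |     # SQLite leaves foreign-key enforcement off by default
        |     conn.execute("PRAGMA foreign_keys = ON")
        |     conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
        |     conn.execute("""CREATE TABLE doodads (
        |         id INTEGER PRIMARY KEY,
        |         user_id INTEGER NOT NULL REFERENCES users(id))""")
        |     conn.execute("INSERT INTO users (id) VALUES (1)")
        |     conn.execute("INSERT INTO doodads (user_id) VALUES (1)")   # fine
        |     conn.execute("INSERT INTO doodads (user_id) VALUES (999)") # fails fast:
        |     # sqlite3.IntegrityError: FOREIGN KEY constraint failed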
        
       ___________________________________________________________________
       (page generated 2023-02-18 23:00 UTC)