[HN Gopher] Immutability Changes Everything (2016)
       ___________________________________________________________________
        
       Immutability Changes Everything (2016)
        
       Author : Garbage
       Score  : 147 points
       Date   : 2021-06-26 08:55 UTC (14 hours ago)
        
 (HTM) web link (queue.acm.org)
 (TXT) w3m dump (queue.acm.org)
        
       | mightybyte wrote:
       | This idea can be extended by observing the following analogy:
       | 
       | purity is to functions
       | 
       | as
       | 
       | immutability is to data
       | 
       | Immutability is definitely a valuable technique that can help
       | make software faster (sometimes), more reliable, and easier to
       | maintain. I would take it one step further and argue that we can
       | get a similar boost by extending the idea to functions, not just
       | data. And the way you do that is by writing pure functions (i.e.
       | no side effects...or controlling/limiting them in some way) and
       | using languages that can enforce that your functions are pure.
        
         | tsimionescu wrote:
          | Unfortunately, this just doesn't pan out in practice. There
          | are various techniques that are much less efficient to
          | express with immutability and pure functions (e.g. computing
          | a histogram). There is also a pretty big body of evidence
          | already showing no hugely significant (say, 2x) differences
          | in productivity, correctness or speed between pure, strict
          | functional languages and styles and more traditional
          | imperative, impure or even dynamic languages.
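          | 
          | To make the histogram point concrete, here is a toy Python
          | sketch (my illustration, not from the thread): the mutable
          | version updates one dict in place, while a naive "pure"
          | fold allocates a fresh dict per element.

```python
from collections import Counter
from functools import reduce

data = ["a", "b", "a", "c", "a", "b"]

# Mutable version: one dict, updated in place -- O(n) overall.
def histogram_mutable(items):
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

# Naive "pure" version: each step builds a brand-new dict, copying all
# existing entries -- O(n * k) for k distinct keys.
def histogram_pure(items):
    return reduce(
        lambda counts, item: {**counts, item: counts.get(item, 0) + 1},
        items,
        {},
    )

assert histogram_mutable(data) == histogram_pure(data) == dict(Counter(data))
```

          | (Persistent maps or locally-scoped mutation soften this in
          | real functional languages, but the naive cost is real.)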
        
           | benrbray wrote:
           | For completeness, would you mind posting a link to some of
           | this evidence so we don't have to take your word for it?
        
             | username90 wrote:
              | Sorting is an example: quicksort is fast because it
              | works in place, mutating the data instead of allocating
              | new memory.
              | 
              | Very few algorithms are not faster with mutable data.
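              | 
              | A minimal in-place quicksort sketch in Python (my
              | illustration): the array is partitioned and sorted
              | within its own storage, with no per-level allocation.

```python
import random

# In-place quicksort (Hoare-style partition): recurses within one
# array, swapping elements rather than building new lists.
def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]  # mutate in place
            i += 1
            j -= 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)

xs = [random.randint(0, 99) for _ in range(200)]
quicksort(xs)
assert xs == sorted(xs)
```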
        
               | benrbray wrote:
               | As I understand it, immutability usually refers to
               | mutable data structures hidden behind an interface that
               | allows the programmer to reason about things as if they
               | were immutable. The naive implementation is of course to
               | duplicate all data whenever a change is made, but pure /
               | immutable languages use smarter things like persistent
               | data structures behind the scenes.
               | 
               | If the compiler knows the original data is never needed,
               | it can optimize away the old copies, too. At least that's
               | the hope, I'm not sure how true it is.
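                | 
                | A tiny Python sketch of the structural-sharing idea
                | (my illustration, using the simplest persistent
                | structure - a cons list): "updating" returns a new
                | version that shares almost everything with the old
                | one, so nothing is duplicated.

```python
# Persistent singly-linked list: prepending returns a new head node
# whose tail IS the old list, so the old version stays intact and no
# data is copied.
class Node:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def prepend(lst, value):
    return Node(value, lst)  # O(1); shares all existing nodes

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

v1 = prepend(prepend(None, 2), 1)   # [1, 2]
v2 = prepend(v1, 0)                 # [0, 1, 2]

assert to_list(v1) == [1, 2]        # old version unchanged
assert to_list(v2) == [0, 1, 2]
assert v2.tail is v1                # structural sharing, not a copy
```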
        
             | tsimionescu wrote:
             | First of all, I am arguing for the base case: absent any
             | studies, the null hypothesis is naturally that there are no
             | major differences between programming paradigms.
             | 
             | On the issue of bug counts, there was one huge study, that
             | was successfully reproduced [0], which finds that there is
             | a real but small negative correlation between FP and bug
             | counts in large GitHub projects.
             | 
             | For productivity I haven't been able to find any large-
             | scale studies that compare functional vs imperative/OOP
             | languages. I did find one small scale study [1] that found
             | significantly better productivity for Haskell (and a Lisp
             | dialect) vs Ada, C++ and a few others, but it was about a
             | specific small prototype problem with just a handful of
             | programmers participating (the Haskell program had 85
             | lines, the worse, C++, had 1105; and programs took between
             | 10h and 54h to finish - hardly representative of large-
             | scale coding). Most other studies on productivity that I
             | could find [2], [3] didn't include any FP languages at all.
             | 
             | [0] https://arxiv.org/pdf/1901.10220.pdf
             | 
             | [1] https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1
             | .1.36...
             | 
             | [2] https://www.e-informatyka.pl/attach/e-Informatica_-
             | _Volume_1...
             | 
             | [3] https://www.researchgate.net/publication/4261726_Do_Pro
             | gramm...
        
             | nonameiguess wrote:
             | You can look at the R data.table package. R by default is a
             | purely functional language, and the built-in data.frame
             | type for in-memory tabular data is immutable. That means
             | computing anything that changes the data requires copying
              | it all. The data.table package achieved pretty impressive
              | speed-ups by dropping down into C and doing everything in-
              | place. It also makes it harder to run out of memory.
             | 
             | Granted, some effects like this can be mitigated with
             | sufficiently clever language design. Haskell with lazy
             | evaluation. I believe some compilers for functional
             | languages will reuse memory addresses they can prove
             | nothing else refers to, so in reality they are mutating in-
             | place even if semantically the variables are "immutable."
             | Heck, even Python does that with the built-in immutable
             | types. When you assign the changed data to a new name, it
             | doesn't actually allocate new memory and just does it in-
             | place under the hood. In cases like that, the data isn't
             | really "immutable." It just can't be mutated by the
             | programmer, only by the runtime or compiler.
             | 
             | Of course, if we want to get sufficiently pedantic, no data
             | is ever immutable. The law may say a ledger is append-only,
             | but nothing is stopping an accountant from just lying. A
             | compiler may prevent you from assigning new data to an
             | already-existing name binding, but a compiler can't stop
             | someone from using rowhammer or an x-ray machine or
             | something and flipping the bits in memory anyway.
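              | 
              | A tiny Python illustration of the copy-vs-in-place
              | distinction (mine, not the R code in question):

```python
# "Changing" one element of an immutable tuple means allocating a
# brand-new object; updating a mutable list happens in place.
t = (1, 2, 3)
t2 = t[:1] + (99,) + t[2:]   # new tuple built from pieces of the old

l = [1, 2, 3]
l2 = l
l2[1] = 99                   # the same list object, mutated in place

assert t == (1, 2, 3) and t2 == (1, 99, 3)
assert t2 is not t           # a distinct object was allocated
assert l == [1, 99, 3] and l2 is l   # no new object at all
```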
        
         | sitkack wrote:
         | Yes, and computation and data both become interchangeable
         | facts. And by becoming facts, they lessen the tech debt, and
         | start to fade into the background. I think the only way to have
         | tractable large systems is to have pervasive immutability and
         | purity.
        
         | jhgb wrote:
         | > purity is to functions as immutability is to data
         | 
         | I thought all data was just a bunch of nullary functions. ;)
        
       | [deleted]
        
       | samuell wrote:
       | On my reading list for the summer: "The Art of Immutable
       | Architecture: Theory and Practice of Data Management in
       | Distributed Systems" [1]
       | 
       | [1] https://www.goodreads.com/book/show/52653567-the-art-of-
       | immu...
        
         | throwaway81523 wrote:
         | I'll try to look at that. At a more local level you might like
         | "Purely Functional Data Structures" by Chris Okasaki.
         | Functional programming revolves around those.
        
           | BazookaMusic wrote:
           | +1 on "Purely Functional Data Structures", an excellent book
           | to learn a different programming paradigm
        
       | refset wrote:
       | Pat also wrote a pretty great follow-up article that expands on
       | the "Data on the Outside vs. Data on the Inside" section,
       | covering the intersection of immutability, temporality, schema
       | and relational querying:
       | https://queue.acm.org/detail.cfm?id=341501
        
         | news_hacker wrote:
         | Your link wasn't working for me, here's a different one:
         | https://queue.acm.org/detail.cfm?id=3415014
        
           | refset wrote:
           | I somehow messed up the copy+paste and lost the last
           | character, sorry :(
           | 
           | Thank you for your persistence though and for chiming in with
           | the right answer!
        
       | jbjbjbjb wrote:
       | I feel like the programming tools haven't kept up and so to do
       | this we also need immutable requirements and impeccable
       | programmers. Currently if I change the code in my application it
       | gives different results. I'm not sure if there is a word for it,
       | but I'm thinking of pure applications like we have pure
       | functions. I don't think the right tools exist for this.
       | 
        | Also, the "accountants don't use erasers" line feels really
        | glib to me; it's hard work.
        
         | kaba0 wrote:
         | As others in the thread mentioned, the Unison language is
         | exactly what you are getting at.
        
         | dgb23 wrote:
         | We have Git but a commit hash has no notion of whether the
         | consumers have to care about a change.
         | 
         | The artifact we produce, typically immutable in a package
         | repository, then has "semantic versioning" (possibly) to convey
         | whether they should care.
        
           | jbjbjbjb wrote:
            | Yeah, my argument is that those tools are OK but a bit
            | inadequate. Certainly not first-class solutions. It's just
            | a thought; I'm not sure it would open up a world of
            | possibilities, but there could be more rigorous ways of
            | showing how the logic has changed over time and keeping
            | that alongside the current state of the logic.
        
             | dgb23 wrote:
             | You might find Unison[0] interesting. Or the more recent
             | developments (or intents) of Clojure spec[1] and the talk
             | Speculation[2].
             | 
             | [0] https://www.unisonweb.org/
             | 
             | [1] https://github.com/clojure/spec-alpha2
             | 
             | [2] https://www.youtube.com/watch?v=oyLBGkS5ICk
        
       | belter wrote:
        | Immutability changes everything, but it does not solve
        | everything. You are just changing to a different scenario that
        | might help.
        | 
        | "Tesler's Law" [1] :-) makes sure you are looking from a
        | different perspective. Complexity is still present.
       | 
       | "Immutability is not enough"
       | 
       | https://codewords.recurse.com/issues/six/immutability-is-not...
       | 
       | [1] - "Tesler's Law"
       | https://en.wikipedia.org/wiki/Law_of_conservation_of_complex...
        
         | cuddlecake wrote:
         | The article "Immutability is not enough" is sub-par in every
         | aspect.
         | 
          | The author never defines what sort of problems immutable
          | data can help avoid, or what the benefits of immutable data
          | are. And still, after every paragraph, he wonders why
          | functional programming didn't help him avoid bugs in his
          | business logic.
          | 
          | Like, what the hell - the author produced a bug because the
          | order of his operations was wrong? Great: write a test and
          | restructure your code to make it work; at least you know
          | where to look.
         | 
         | Functional Programming does not fix your Game Loop, and it
         | never promised to.
        
           | moldavi wrote:
           | The article rang true for me. I heard the same FP promises
           | that the author did, and eventually ran into the same issues
           | in my FP programs, and left disillusioned. The article
           | captured the experience perfectly.
        
           | crdrost wrote:
           | FWIW I came to the comments because I also found the
           | "Immutability changes everything" paper nearly unreadable.
           | 
           | I just wanted to comment that I love your last paragraph from
           | a conceptual level. You want immutability? Great, let's talk
           | about what performant game design looks like in that context.
           | Games are special, in that they kind of do everything that
           | computers are good at, to varying degrees. The author's ideas
           | about snapshots of relational databases inviting further
           | schema changes in the pipeline--I guess I see the idea you're
           | trying to introduce, but what does it actually look like in a
           | chess program? Or a choose your own adventure game?
        
         | jt2190 wrote:
          | Thanks for the link to "Immutability is not enough". I've
          | resubmitted the article here:
          | https://news.ycombinator.com/item?id=27642263
        
         | kristiandupont wrote:
         | I like to say that mutable state is to software as moving parts
         | are to hardware. If you can avoid them, great! But a number of
         | systems/designs simply require them.
         | 
         | https://kristiandupont.medium.com/mutable-state-is-to-softwa...
        
       | bob1029 wrote:
       | Immutability allows for incredible performance when you are
       | working in the guts of database engines. Having assurances that a
       | specific data log file offset cannot possibly be modified allows
       | for substantial optimization throughout.
       | 
       | For users who do not need a serialized view of events, access to
       | "near-real-time" data can be made to scale as far as needed, even
       | if a single machine is driving all the writes. You could
       | theoretically have a billion readers working off a single
       | synchronous core system.
       | 
       | By utilizing careful abstractions, that single core system can
       | write something on the order of 10-100 million transactions per
       | second. We do it every day on the various financial exchanges
       | around the world. In many cases, a _single thread_ is what is
       | holding up the entire exchange. Arrays of structs, hot cache
       | lines, correct branch prediction and full pipelines are a hell of
       | a combo.
       | 
       | In some practical businesses apps, it is also feasible to
       | coalesce logical writes before they need to enter a fully-
       | synchronous context, so you can get some step-down ratio like
       | 100:1 on transactions that actually need to go across the network
       | and be handled by the core system.
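        | 
        | The coalescing step can be sketched in a few lines of Python
        | (a toy illustration, not actual exchange code): repeated
        | logical writes to the same key collapse into one physical
        | write per flush, giving the step-down ratio.

```python
# Toy write coalescer: logical writes to the same key are merged in a
# pending buffer, so only the latest value per key crosses into the
# synchronous core on flush.
class Coalescer:
    def __init__(self):
        self.pending = {}
        self.flushed = 0

    def write(self, key, value):
        self.pending[key] = value      # later writes overwrite earlier ones

    def flush(self):
        batch = self.pending
        self.pending = {}
        self.flushed += len(batch)
        return batch                   # hand off to the core system

c = Coalescer()
for i in range(100):
    for key in ("a", "b", "c"):
        c.write(key, i)
batch = c.flush()

assert len(batch) == 3 and c.flushed == 3   # 300 logical writes -> 3 physical
assert batch == {"a": 99, "b": 99, "c": 99}
```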
        
         | [deleted]
        
         | m12k wrote:
         | I'm increasingly coming to realize that in programming,
         | flexibility is something you pay for in performance, whether
         | you use it or not. Data being mutable even if you never mutate
         | it. Automatic bounds checks for data that you've already made
         | sure can't go out of bounds. vtable lookups even when you're
         | not using inheritance, because who knows, someone else might.
         | Compilers needing to do extra work in case two pointers are
         | aliases, even though that never makes sense for them to be from
         | the programmer's perspective. Objects allocated one at a time,
         | scattered around memory, even if you don't need them to be, and
         | wouldn't mind their innards being kept together in cache
         | aligned arrays.
         | 
         | They say you can solve any problem in computer science with
         | another level of indirection - except performance, it seems,
         | because we're paying the cost of all this indirection all the
         | time, even if we don't need it most of the time. There's a
         | crazy amount of unnecessary "housekeeping" done by compilers
         | and runtimes to make sure everything always works all the time,
         | which requires them to always do extra work to protect against
         | pathological cases that almost never arise.
         | 
          | Modern compilers and runtimes are getting better and better
          | at recognizing the common "happy case" and optimizing for
          | that, and that's good. But it still feels backwards: we
          | can't explicitly opt out of unneeded flexibility, so
          | instead the compiler needs to deduce that we aren't using
          | it. And a lot of this flexibility should be opt-in to begin
          | with, rather than opt-out - i.e. make things immutable by
          | default. Or let the programmer tell the compiler a little
          | bit more about what they need and don't need the code to
          | do, so the compiler can do a much better job of it.
        
           | Svetlitski wrote:
           | Definitely agree. Another example of lost performance that
           | comes to mind is (ignorance of) C struct padding. In terms of
           | modern tools, Rust has definitely made some headway here:
           | 
           | * Variables are immutable by default
           | 
           | * The layout of a struct's fields in memory is opaque to the
           | programmer, allowing the compiler to reorder them to minimize
           | padding
           | 
           | * As a natural consequence of the borrow checker, Rust has
           | _much_ more information about whether or not two references
           | could alias than most other languages
           | 
           | * The programmer is very much made to feel the cost of
           | dynamic-dispatch (via dyn), and static-dispatch is the norm
           | 
           | * The compiler can omit bounds checks when it can prove
           | they're unnecessary
           | 
           | * It's easy to safely have structs that live entirely on the
           | stack (i.e. no heap allocation)
           | 
           | Overall this is an unsolved problem though. It seems to me
           | that the trend is that the more details about the internals
           | that the language exposes to the programmer, the fewer
           | optimizations can be made automatically. Essentially the
           | ideal is _declarative_ performance, where the programmer
           | expresses the idea and the compiler figures out the optimal
           | implementation (which is in stark contrast to most high-
           | performance code being hand-tuned today). Take a look at
           | something like Halide for a successful application of this
           | philosophy: https://halide-lang.org/
        
           | bob1029 wrote:
            | > Or let the programmer tell the compiler a little bit
            | more about what they need and don't need the code to do,
            | so the compiler can do a much better job of it.
           | 
           | You can do this today in languages like C#:
           | 
           | https://docs.microsoft.com/en-us/dotnet/csharp/write-safe-
           | ef...
           | 
           | > Adding the readonly modifier to members that don't mutate
           | state provides two related benefits. First, the compiler
           | enforces your intent. That member can't mutate the struct's
           | state. Second, the compiler won't create defensive copies of
           | in parameters when accessing a readonly member. The compiler
           | can make this optimization safely because it guarantees that
           | the struct is not modified by a readonly member.
        
             | XorNot wrote:
              | This feels like some seriously missed static-analysis
              | optimization, though - why do I need to specify this?
              | There are vanishingly few cases where the answer to
              | "looks like you never modify this - make it readonly?"
              | is not yes.
        
               | bob1029 wrote:
               | Probably because reflection & compilation+activation of
               | types existing outside the original compilation context
               | is possible at runtime.
               | 
                | I much prefer having a language/runtime that can
                | handle extremely complex problem scenarios out of the
                | box (with occasional tuning), rather than the other
                | way around. It is not very often that I find myself
                | screwing around with performance-intensive code.
                | Usually, a few very important things are addressed at
                | the lowest levels, and then you put all your nice
                | abstractions back on top for daily driving.
        
           | jiggawatts wrote:
           | > flexibility is something you pay for in performance
           | 
            | A lightbulb moment I had recently was that the classic
            | interface-based OO programming paradigm utilising virtual
            | functions is essentially the following declaration by the
            | programmer to the compiler: the function that is invoked
            | can change at _any_ time, even call-to-call; assume
            | nothing. It is a black box with unknown, unknowable
            | contents. Do not inline it, do not inspect it, expect the
            | worst.
           | 
           | If you look at a typical web framework, with call stacks
           | hundreds of methods deep, you'll come to the same horrifying
           | realisation that these are all inherently un-optimisable
           | because of that statement. The compiler can do nearly
           | nothing, and the processor will also struggle to paper over
           | the inefficiency. Meanwhile, in reality, the specific
           | implementations of those interfaces _never_ change during
           | actual production execution! Dynamic dispatch is utilised
           | only during development time, and even then only at _compile
           | time_ to inject test harnesses, mocks, or whatever.
           | 
           | At runtime, 99.999999% of all function calls are essentially
           | static, and dynamic dispatch is a pure waste.
        
             | dzaima wrote:
             | Except that a smart runtime can actually work with the
             | assumption that only one implementation of a class will be
             | used, and inline what would otherwise be virtual methods.
             | And, as far as I know, the JVM definitely does this, and at
             | multiple levels too - if there's only one implementation of
             | an interface, it can just assume that everything's static
             | and inlineable (at the cost of deoptimizations if that
             | turns out to be false later). If not, it can still decide
             | to inline, and just check the type beforehand.
        
             | MaxBarraclough wrote:
             | > The compiler can do nearly nothing,
             | 
             | Optimizing JIT-compiling JVMs are often able to optimize
             | virtual calls that only ever resolve to one implementation.
             | They may even inline such calls. ( _edit_ dzaima got there
             | first.)
        
               | duped wrote:
               | It's old too, Self was doing this in the late 80s
        
               | scns wrote:
               | It was written by the same people:
               | 
               | The Java HotSpot Performance Engine was released on April
               | 27, 1999, built on technologies from an implementation of
               | the programming language Smalltalk named Strongtalk,
               | originally developed by Longview Technologies, which
               | traded as Animorphic. The Longview virtual machine was
               | based on the Self virtual machine, with an interpreter
               | replacing the fast-and-dumb first compiler. When Sun
               | cancelled the Self project, two key people, Urs Holzle
               | and Lars Bak left Sun to start Longview. In 1997, Sun
               | Microsystems purchased Animorphic.
               | 
               | Source: wikipedia
        
             | andi999 wrote:
                | Yes, and speaking of C++, the vtable pointer might
                | destroy your cache line.
        
               | kaba0 wrote:
                | Just as a note, before one gets horrified at the cost
                | of all these indirections: C code is full of function
                | pointers passed as callbacks. There is actually not
                | much difference between a vtable and typical C code
                | doing the same thing - and chances are, C++ compiler
                | writers made it more efficient. Also, C++ doesn't
                | default to virtual, so you rarely pay the price.
        
             | strictfp wrote:
              | Yup, and this is exacerbated by the fact that C++
              | doesn't really offer any convenient equivalent of
              | virtual methods with static dispatch.
              | 
              | std::variant is almost it, but its interface is quite
              | horrendous as of now.
              | 
              | Another interesting thing is that C++ also intermingles
              | dynamic allocation with polymorphism, which can lead to
              | a lot of unnecessary pointer chasing.
        
             | jasonwatkinspdx wrote:
              | The only reason vtables et al. work as well as they do
              | is that modern branch predictors are incredible.
        
               | dzaima wrote:
               | Well, the alternative to vtables is manually branching
               | (besides completely rewriting the code to avoid different
               | actions), which isn't that much better and still requires
               | branch prediction. And I think JVMs also might convert a
               | dispatch from two classes to a conditional branch, but
               | I'm not sure.
        
               | nayuki wrote:
               | Yes: https://shipilev.net/blog/2015/black-magic-method-
               | dispatch/#...
        
           | specialist wrote:
           | > _...with another level of indirection_
           | 
           | Ya. I've spent the last two decades removing levels of
           | indirection, reducing call stack depth. This has put me at
           | odds with most everyone. I've been insulted many times for
           | not understanding "software architecture". Heh.
           | 
           | I started a design pattern study group shortly after GoF's
           | book was published. I think it's a lot like jazz. 1st time
           | thru, mind blown. 2nd time, ah, I think I'm starting to get
           | it. 3rd time, well, duh.
           | 
            | Eventually students come full circle, a la "there is no
            | spoon", and reject the veneer of "design patterns". Said
            | another way, it's just another tool; learn to use it
            | wisely. Learn to reject temptation.
           | 
            | Aspects, annotations, runtime code reflection,
            | metaprogramming, frameworks, schema mappings, factories,
            | singletons, and whatnot are just monkey work. (Most of the
            | time.) Exquisite puzzles (finger traps) to suck smart
            | people in. To distract from the real work of solving
            | customers' problems and business needs.
           | 
            | I think the kids these days call it "yak shaving." Yes, I
            | am _very guilty_ of this. Past, present, and surely
            | future. I'm no better than anyone else. I recognize my
            | failings and am trying to improve.
        
           | mtVessel wrote:
           | As much as people like to talk about "footguns", that's the
           | value proposition of C++, zero cost for anything you don't
           | use.
           | 
           | Managed/interpreted langs have a different proposition: pay
           | up front for all the features, but your complexity never
           | increases whether you use none, one or all of them.
        
             | MauranKilom wrote:
             | Well, pointer aliasing for example is a very tricky thing
             | in C++ too. Even with strict aliasing, it is very
             | unintuitive from the outside when you're going to trip over
             | unexpected performance hurdles. (Does this data structure
             | have a char* inside? Who knows, but if it does, there goes
             | vectorization!).
        
               | username90 wrote:
                | If you don't dereference the char* it will get
                | treated as an integer and be vectorizable. In C++ you
                | don't pay for what you don't use. If you need the
                | char* in the loop, I'm not sure why you'd be thinking
                | about vectorization. Maybe you meant parallelism?
        
           | de_keyboard wrote:
           | Great talk: Constraints Liberate, Liberties Constrain --
           | Runar Bjarnason
           | 
           | https://www.youtube.com/watch?v=GqmsQeSzMdw
        
             | adverbly wrote:
             | Wow thanks for linking that. Great talk.
             | 
             | I'm always on the lookout for talks about interesting and
             | generally useful concepts, but where I can follow only a
             | small part of them. Makes me feel like I can learn and grow
              | into a wiser person. This certainly fits that criterion.
        
               | whoisburbansky wrote:
               | This talk fits a similar bucket in my head; do you keep a
               | playlist of other such talks you've run across in the
               | past?
        
         | DennisP wrote:
         | > single core system can write something on the order of 10-100
         | million transactions per second...a single thread is what is
         | holding up the entire exchange. Arrays of structs, hot cache
         | lines, correct branch prediction and full pipelines
         | 
         | Without going back to school, how would I learn how to do that?
         | (Books I should read, good open source examples, any Coursera
         | courses?)
        
           | rigtorp wrote:
           | I've written some on this topic. Start here
           | https://rigtorp.se/virtual-memory/ , there is more on similar
           | stuff on my website.
        
         | jasonwatkinspdx wrote:
         | I've been working occasionally on a database engine design that
         | takes this idea to heart. It's been hugely beneficial.
         | 
         | The big picture design is simply described as an append only
         | log of copy on write page states, with Kafka style compaction
         | to maintain whatever window of history you want. This is
         | supplemented with a compact in memory index that holds the LSN
         | for the latest page states to make accessing live data as fast
         | as possible.
         | 
         | The write path goes through a single batch committer thread to
         | amortize fsync overhead (technically two threads in series so I
         | can overlap prepping the next batch with waiting on fsync).
         | Single writer isn't a big limit since even on consumer grade
         | SSDs you can write at GiB/sec now. Append only log means I hit
         | the happy case on SSDs. FTLs have come a long way in equalizing
         | sequential vs random write performance, but this comes at the
         | cost of a lot more write cycles behind the scenes to shuffle
         | stuff around.
         | 
         | Here's two examples of how immutability has drastically
         | simplified the design:
         | 
         | First, the in memory index is just a hand tweaked version of a
         | Bagwell Trie. Since the trie is copy on write, once the fsync
         | finishes the write thread can publish the new index state with
         | a single atomic write. Client threads never see any
         | intermediate state. The database moves from snapshot to
         | snapshot atomically from their view. (This is the part I'm
         | coding up atm)
         | 
         | A second example: I want to have an on-heap cache, to avoid
         | overtaxing the SSD or the downfalls of a pure mmap caching
         | approach (mmap can be the miss path for this cache, however).
         | I'm planning on using a lock-free hash table. Since I'm working
         | in a language that doesn't have one as a library, I'll have to
         | roll my own. Ordinarily this would be an insane idea, as such
         | structures are infamous for being nearly impossible to get
         | correct, even when using formal modeling tools. But in my case,
         | the relationship of LSN -> Immutable Page State is stable, so
         | the cache can be quite sloppy. With a mutable cache I'd in
         | essence have to ensure the CAS always went the right place,
         | even under resizing, etc. But with immutability duplicates are
         | only a potential memory waste, not a correctness problem, so I
         | get to be quite a bit more sloppy.
         | 
         | One shouldn't treat any single thing as a panacea, but I've
         | definitely come around to the view that immutable by default is
         | a good approach. You get to just trivially dodge entire
         | categories of nasty problems when doing something like the
         | above.
         | 
         | I'd love to hear anything more you can share about the stuff
         | you work on. It sounds really interesting.
        
           | bob1029 wrote:
           | I really enjoy seeing how similar themes emerge across the
           | industry.
           | 
           | I've been working on a key-value store that is purpose built
           | for low-latency NVMe devices.
           | 
           | I have an append only log (spread across file segments for GC
           | concerns), with the only contents being the nodes of a basic
           | splay tree. The key material is randomly-derived, so I don't
           | have to worry about the sequential integer issues on inserts.
           | Writes to the database are modified roots of the splay tree,
           | so the log is a continuous stream of consistent database
           | snapshots. Readers that don't require serializability can
           | asynchronously work with the historical result set as
           | appropriate without any translation or sync required. You
           | just point the reader to a prior root node offset and
           | everything should work out.
        
             | jasonwatkinspdx wrote:
             | Yup, I think a bunch of us are running up the same trail
             | here. Zero-coordination read-only transactions are
             | definitely super appealing.
        
           | fishtockos wrote:
           | Can you explain the API a bit more? If you're batching writes
           | in a single thread, doesn't that imply that clients don't
           | 'know' when their writes are actually committed to disk? Or
           | are their requests kept open until fsync?
        
             | jasonwatkinspdx wrote:
             | So first a disclaimer. I've been thinking about this design
             | or something like it in the back of my head for a couple
             | years while keeping up to date on VLDB papers and the like.
             | It's only recently that I've gotten serious about shipping
             | a proof of concept. As it stands I'm just working bottom up
             | trying to confirm the more risky components will work. So
             | it's by no means a done deal. Obviously I think I'm on the
             | right trail or I wouldn't be doing it though.
             | 
             | I'm working in golang. Client goroutines that execute read
             | write transactions buffer up their read and write sets,
             | then submit them to the committer goroutine via a channel.
             | Part of the struct they submit has a blocking channel for
             | the committer to notify them of the result. So nothing returns
             | to the client unless it's stable on disk. Assuming I get
             | far enough, in a future version that'll also include
             | optionally waiting for raft replication.
        
           | dataangel wrote:
           | > Bagwell Trie
           | 
           | I'm getting almost no google results for "Bagwell Trie".
           | Where can I read about this?
        
             | jasonwatkinspdx wrote:
             | https://lampwww.epfl.ch/papers/idealhashtrees.pdf
             | 
             | This is Bagwell's original paper on hash array mapped
             | tries (HAMTs), which have since been implemented in a
             | bunch of languages.
        
       | davidgerard wrote:
       | (2016)
        
       | skohan wrote:
       | Cheeky headline
        
       | whiddershins wrote:
       | So I want to store historical sensor data, like room temperature,
       | for many sensors across many rooms.
       | 
       | And then events that happen less frequently. Hourly. Daily.
       | 
       | There is no need to ever change that data, only append to it.
       | 
       | Does anyone here know what I should be looking at for append-only
       | data storage technologies?
        
         | rzzzt wrote:
         | "Time series database" and "log-structured storage" might be
         | two keywords for starting points.
        
         | jasonwatkinspdx wrote:
         | For industrial scale, Kafka et al.
         | 
         | For more modest scale you could use something like LMDB, where
         | append-only writes hit a happy-case optimization.
        
       | JackMorgan wrote:
       | Anyone interested in what an immutable programming language looks
       | like should check out Unison [0] written by Paul Chiusano. He is
       | known for writing the book FP in Scala.
       | 
       | The premise is that any edit produces a new function, but the old
       | one still exists, so you need to migrate calls from the old
       | version to the new version.
       | 
       | The benefits really start to manifest in distributed computing
       | and package management: much less need to worry about transitive
       | dependencies and conflicting versions.
       | 
       | The editable code is a projection of a data structure, so
       | refactoring and static analysis tools will be very easy to write.
       | Also, the syntax is just a display issue and can be changed
       | easily; everyone could read and write code in the syntax they
       | prefer.
       | 
       | It does add some complexity, but I think it is essential
       | complexity that we currently have only poor ways to manage, and
       | having explicit knobs to turn should reduce overall effort.
       | 
       | [0] https://www.unisonweb.org/
        
         | fsloth wrote:
         | F# is immutable by default, as are all ML variants (e.g.
         | OCaml, Scala) for that matter.
        
           | DennisP wrote:
           | F# has immutable data by default. Your parent comment
           | describes immutable _code_.
        
             | CyberDildonics wrote:
             | What is that supposed to mean?
        
               | DennisP wrote:
               | I mean, it's right there in the above comment with
               | presumably more at the link. This seems like the key
               | point:
               | 
               | > any edit produces a new function, but the old one still
               | exists, so you need to migrate calls from the old version
               | to the new version
               | 
               | I'm not the guy to expand on it further, but maybe the
               | above commenter could answer questions.
        
               | CyberDildonics wrote:
               | A running program "editing" its functions is already
               | pretty exotic and mostly relegated to JIT compilation.
        
               | DennisP wrote:
               | I think they're talking about software development, not
               | running functions.
        
               | CyberDildonics wrote:
                | I don't understand why this would need a new language; it
                | seems like tools would be easier. If it is for
                | development, the whole point is to edit functions. If it
               | is to edit live programs, one obvious and good way is to
               | swap out a function pointer atomically and call functions
               | by getting their pointer atomically. This doesn't really
               | have anything to do with a different language though.
        
       | dgb23 wrote:
       | > Normalization is not necessary in an immutable data set,
       | however. The only reason to normalize immutable data sets may be
       | to reduce the storage necessary for them. On the other hand,
       | denormalized data sets may be easier and faster to process as
       | inputs to a computation.
       | 
       | I disagree here. To derive new data from an immutable data
       | source you still want normalized data. Denormalization can
       | happen at some point for processing. Sometimes normalization is
       | not feasible, because it often requires modelling/design. But in
       | the _general_ case it is best for data to be well-structured and
       | normalized.
        
         | mtVessel wrote:
         | You're being kind. The author's statement is utter nonsense. If
         | you store the same fact in two places and only change one of
         | them, it doesn't matter whether you update in place or create
         | a new version of the record -- your data is still corrupted.
        
         | [deleted]
        
       | rurban wrote:
       | SICP already taught us this back in 1985.
        
       | wruza wrote:
       |  _Accountants don't use erasers; otherwise they may go to jail.
       | All entries in a ledger remain in the ledger. Corrections can be
       | made but only by making new entries in the ledger._
       | 
       | This is not true. Accounting is usually one of the most mutable
       | data sources you will ever have. Facts arrive at random times
       | and then are entered on a schedule. Corrections wait on someone
       | to resolve analytical issues. Different accounting units have
       | different timelines, and within a unit there are many threads of
       | an unordered fact flow. It is all synced together only at
       | specific points, usually when it's time to report officially or
       | internally. If you ask your accountant and the only thing they
       | say is "busy", it's one of those weeks of the year.
       | 
       |  _When a company's quarterly results are published, they include
       | small corrections to the previous quarter. Small fixes are OK.
       | They are append-only, too._
       | 
       | This is true, because public reports are legally binding.
       | 
       | On topic: in my opinion, immutability at small scales is good,
       | because modifications do not leak accidentally into the fields of
       | your data structures (e.g. a mutable string is a bad idea in
       | general). But at a bigger scale immutability gets in the way if
       | it is not hidden properly. It requires a less common sort of
       | developer, one who can think in an immutable/functional way, and
       | even then it increases their cognitive distance from the
       | business logic. Properly hidden immutability helps a lot though,
       | and we mutable guys use it a lot, despite a widespread belief to
       | the contrary: transactions and execution-context isolation.
       | There is nothing wrong with mutating the operative area if these
       | mutations cannot escape it. But if you're short on primitives (a
       | context object and an object model vs global.store and raw
       | fields of a one-for-all object) you end up setting globally
       | visible state that _can_ be read from the outside, and trouble
       | creeps in. Moreover, real-world code rarely looks like what
       | you've seen in a tutorial, and instead of a clear
       | `assemble(this(), that())` you have to resort to exotic
       | functional tricks and concepts. As a result, you have to hire
       | (or be one of) these functional guys who may be better in
       | general, but who also spend half their cognitive abilities on
       | things that you couldn't explain to an accountant.
       | 
       | So yes and no, immutability changes everything, but it's not the
       | only way to change it. It is just the easiest manual way to feel
       | safe if you start naked in the woods.
        
         | bruce343434 wrote:
         | > Facts arrive at random times and then are entered on a
         | schedule. Corrections wait on someone to resolve analytical
         | issues. Different accounting units have different timelines,
         | and within a unit there are many threads of an unordered fact
         | flow. It is all synced together only at specific points,
         | usually when it's time to report officially or internally. If
         | you ask your accountant and the only thing they say is "busy",
         | it's one of those weeks of the year.
         | 
         | I fail to see how any of this implies the data is mutable.
        
           | wruza wrote:
           | An accountant loads a raw payment data from a bank. Every
           | payment has to be recognized and related to some analytics (a
           | project, an expense/income class, etc). If it's not obvious,
           | then corresponding analytics are left empty or
           | "_Unrecognized", which rings at top of all related reports.
           | Later the info arrives and they simply go to that day and
           | fill in fields. The change history may be useful for data-
           | debug reasons, but the model that a regular accountant uses
           | is of a mutable nature at their level.
           | 
           | Of course one could design a system that made changesets
           | append-only and explicit for them. But they would object, the
           | same way you'd object if instead of updating a working tree,
           | `git pull` simply appended a set of patches to every file in
           | it and expected you to modify source code only by appending
           | patches yourself. Developer's own work model _is_ mutable (a
           | working tree), and you can't expect an accountant to be
           | surprisingly fine with what you personally are not.
        
             | slt2021 wrote:
             | A bank is a perfect example of immutable accounting.
             | 
             | Once a debit is posted to your account (let's say it is a
             | fraudulent transaction, or a transaction entered by
             | mistake), nobody will go and erase it as if it never
             | happened. They would actually do a reversal, by creating a
             | new entry with the same amount and opposite sign. So you
             | would end up with two transactions: +100 and -100.
             | 
             | The same is true for most accounting systems. Once the
             | accounting period is closed (daily in banking systems,
             | monthly for most other companies), you can't change the
             | closed month. All changes go in as new adjustment
             | transactions.
        
               | wruza wrote:
               | That's what I'm talking about! You don't have to go all-
               | immutable, only at the "commit and set in stone" level
               | (you may have many). Obsessing over completely immutable
               | state management at all levels does more damage to your
               | workflow than the problems it solves. It's yet another
               | silver bullet that everyone tries to bite these days.
        
       | lewisjoe wrote:
       | Curious: what category does Redux's use of immutability fall
       | under?
        
       | baby wrote:
       | Can you change the title?
        
         | junon wrote:
         | The original title from the article matches the one here.
         | There's nothing to change.
        
           | baby wrote:
           | Jk
        
       | chaz6 wrote:
       | I do not think immutability is compatible with data protection
       | legislation such as GDPR.
        
         | pjc50 wrote:
         | Interestingly France _requires_ immutability (well, durable
         | records, not quite the same thing) for point of sale systems,
         | under  "NF525": https://www.ikosoft.com/en-gb/all-knowledge-
         | about-the-nf525-...
        
         | aszen wrote:
         | Immutability doesn't mean you can't delete old records.
        
           | tyingq wrote:
           | In practice, it could. Kafka, for example, doesn't provide a
           | non-hacky way to delete a specific record from a topic.
        
       ___________________________________________________________________
       (page generated 2021-06-26 23:01 UTC)