[HN Gopher] Immutability Changes Everything (2016)
___________________________________________________________________
Immutability Changes Everything (2016)
Author : Garbage
Score : 147 points
Date : 2021-06-26 08:55 UTC (14 hours ago)
(HTM) web link (queue.acm.org)
(TXT) w3m dump (queue.acm.org)
| mightybyte wrote:
| This idea can be extended by observing the following analogy:
|
| purity is to functions
|
| as
|
| immutability is to data
|
| Immutability is definitely a valuable technique that can help
| make software faster (sometimes), more reliable, and easier to
| maintain. I would take it one step further and argue that we can
| get a similar boost by extending the idea to functions, not just
| data. And the way you do that is by writing pure functions (i.e.
| no side effects...or controlling/limiting them in some way) and
| using languages that can enforce that your functions are pure.
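A minimal Python illustration of the purity side of that analogy (the function names are mine, purely for the sketch):

```python
# Impure: reads and mutates shared state, so its result depends on
# when it is called; it cannot be safely cached, reordered, or retried.
counter = 0

def impure_next() -> int:
    global counter
    counter += 1
    return counter

# Pure: output depends only on the input; same argument, same answer,
# no side effects. This is the function-level analogue of immutability.
def pure_next(n: int) -> int:
    return n + 1

print(pure_next(41))   # 42, every time
print(impure_next())   # depends on hidden state
```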
| tsimionescu wrote:
 | Unfortunately, this just doesn't pan out in practice. There are
 | various techniques that are much less efficient to express
 | with immutability and pure functions (e.g. computing a
 | histogram). There is also a fairly large body of evidence
 | showing no hugely significant (say, 2x) differences in
 | productivity, correctness or speed for pure, strict
 | functional languages and styles vs more traditional imperative,
 | impure or even dynamic languages.
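The histogram example can be made concrete. With mutation, counting is one in-place update per element; a naively pure version that rebuilds the mapping at each step does quadratic work unless the language supplies persistent maps behind the scenes (a sketch, not a claim about any particular compiler):

```python
from functools import reduce

def histogram_mutable(xs):
    # O(n): one in-place dictionary update per element.
    counts = {}
    for x in xs:
        counts[x] = counts.get(x, 0) + 1
    return counts

def histogram_pure(xs):
    # Naively immutable: each step copies the accumulator into a
    # fresh dict, so total work is O(n^2) without persistent maps.
    return reduce(lambda acc, x: {**acc, x: acc.get(x, 0) + 1}, xs, {})

data = list("mississippi")
assert histogram_mutable(data) == histogram_pure(data)
```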
| benrbray wrote:
| For completeness, would you mind posting a link to some of
| this evidence so we don't have to take your word for it?
| username90 wrote:
 | Sorting is an example: quicksort is fast since it is done
| in place and mutates the data instead of allocating new
| memory.
|
| Very few algorithms are not faster with mutable variables.
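For comparison, a sketch of both styles in Python: the in-place version partitions by swapping within one buffer, while the pure version allocates fresh lists at every recursion level:

```python
def quicksort_in_place(a, lo=0, hi=None):
    # Mutates `a` directly: Lomuto partitioning swaps elements inside
    # the same buffer, so no extra lists are allocated.
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort_in_place(a, lo, i - 1)
    quicksort_in_place(a, i + 1, hi)

def quicksort_pure(a):
    # Allocates two new lists per recursion level instead of mutating.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    return (quicksort_pure([x for x in rest if x < pivot])
            + [pivot]
            + quicksort_pure([x for x in rest if x >= pivot]))

xs = [5, 2, 9, 1, 5, 6]
quicksort_in_place(xs)
assert xs == quicksort_pure([5, 2, 9, 1, 5, 6]) == [1, 2, 5, 5, 6, 9]
```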
| benrbray wrote:
| As I understand it, immutability usually refers to
| mutable data structures hidden behind an interface that
| allows the programmer to reason about things as if they
| were immutable. The naive implementation is of course to
| duplicate all data whenever a change is made, but pure /
| immutable languages use smarter things like persistent
| data structures behind the scenes.
|
| If the compiler knows the original data is never needed,
| it can optimize away the old copies, too. At least that's
| the hope, I'm not sure how true it is.
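Structural sharing is the key trick. A minimal Python sketch of a persistent cons list (names are mine): "adding" an element allocates one new cell and shares the whole existing tail instead of copying it:

```python
from typing import NamedTuple, Optional

class Cell(NamedTuple):
    # An immutable list cell: NamedTuple fields cannot be reassigned.
    head: int
    tail: Optional["Cell"]

def cons(x, lst):
    return Cell(x, lst)

base = cons(2, cons(3, None))
extended = cons(1, base)      # O(1): only one new cell is allocated
assert extended.tail is base  # the old list is shared, not copied
```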
| tsimionescu wrote:
| First of all, I am arguing for the base case: absent any
| studies, the null hypothesis is naturally that there are no
| major differences between programming paradigms.
|
| On the issue of bug counts, there was one huge study, that
| was successfully reproduced [0], which finds that there is
| a real but small negative correlation between FP and bug
| counts in large GitHub projects.
|
| For productivity I haven't been able to find any large-
| scale studies that compare functional vs imperative/OOP
| languages. I did find one small scale study [1] that found
| significantly better productivity for Haskell (and a Lisp
| dialect) vs Ada, C++ and a few others, but it was about a
| specific small prototype problem with just a handful of
| programmers participating (the Haskell program had 85
 | lines, the worst, C++, had 1105; and programs took between
| 10h and 54h to finish - hardly representative of large-
| scale coding). Most other studies on productivity that I
| could find [2], [3] didn't include any FP languages at all.
|
| [0] https://arxiv.org/pdf/1901.10220.pdf
|
| [1] https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1
| .1.36...
|
| [2] https://www.e-informatyka.pl/attach/e-Informatica_-
| _Volume_1...
|
| [3] https://www.researchgate.net/publication/4261726_Do_Pro
| gramm...
| nonameiguess wrote:
| You can look at the R data.table package. R by default is a
| purely functional language, and the built-in data.frame
| type for in-memory tabular data is immutable. That means
| computing anything that changes the data requires copying
| it all. The data.table packages achieved pretty impressive
| speed-ups by dropping down into C and doing everything in-
| place. Also makes it harder to run out of memory.
|
| Granted, some effects like this can be mitigated with
| sufficiently clever language design. Haskell with lazy
| evaluation. I believe some compilers for functional
| languages will reuse memory addresses they can prove
| nothing else refers to, so in reality they are mutating in-
| place even if semantically the variables are "immutable."
| Heck, even Python does that with the built-in immutable
| types. When you assign the changed data to a new name, it
| doesn't actually allocate new memory and just does it in-
| place under the hood. In cases like that, the data isn't
| really "immutable." It just can't be mutated by the
| programmer, only by the runtime or compiler.
|
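In Python terms, the immutability described above is a contract enforced at the language level, while what the runtime does with the underlying memory remains an implementation detail (a minimal sketch):

```python
t = (1, 2, 3)
try:
    t[0] = 99              # the contract: tuples reject mutation
except TypeError:
    pass                   # "immutable to the programmer"

# "Changing" the data rebinds the name to a new object; whether
# CPython recycles the old allocation is up to the runtime/GC.
t = t + (4,)
assert t == (1, 2, 3, 4)
```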
| Of course, if we want to get sufficiently pedantic, no data
| is ever immutable. The law may say a ledger is append-only,
| but nothing is stopping an accountant from just lying. A
| compiler may prevent you from assigning new data to an
| already-existing name binding, but a compiler can't stop
| someone from using rowhammer or an x-ray machine or
| something and flipping the bits in memory anyway.
| sitkack wrote:
| Yes, and computation and data both become interchangeable
| facts. And by becoming facts, they lessen the tech debt, and
| start to fade into the background. I think the only way to have
| tractable large systems is to have pervasive immutability and
| purity.
| jhgb wrote:
| > purity is to functions as immutability is to data
|
| I thought all data was just a bunch of nullary functions. ;)
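The quip has a literal reading: any value can be wrapped as a zero-argument pure function (a thunk), which is how lazy languages model data:

```python
# A value represented as a nullary function: immutable data and a
# pure zero-argument function are observationally the same thing.
forty_two = lambda: 42
assert forty_two() == 42
assert forty_two() == forty_two()  # referentially transparent
```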
| [deleted]
| samuell wrote:
| On my reading list for the summer: "The Art of Immutable
| Architecture: Theory and Practice of Data Management in
| Distributed Systems" [1]
|
| [1] https://www.goodreads.com/book/show/52653567-the-art-of-
| immu...
| throwaway81523 wrote:
| I'll try to look at that. At a more local level you might like
| "Purely Functional Data Structures" by Chris Okasaki.
| Functional programming revolves around those.
| BazookaMusic wrote:
| +1 on "Purely Functional Data Structures", an excellent book
| to learn a different programming paradigm
| refset wrote:
| Pat also wrote a pretty great follow-up article that expands on
| the "Data on the Outside vs. Data on the Inside" section,
| covering the intersection of immutability, temporality, schema
| and relational querying:
| https://queue.acm.org/detail.cfm?id=341501
| news_hacker wrote:
| Your link wasn't working for me, here's a different one:
| https://queue.acm.org/detail.cfm?id=3415014
| refset wrote:
| I somehow messed up the copy+paste and lost the last
| character, sorry :(
|
| Thank you for your persistence though and for chiming in with
| the right answer!
| jbjbjbjb wrote:
| I feel like the programming tools haven't kept up and so to do
| this we also need immutable requirements and impeccable
| programmers. Currently if I change the code in my application it
| gives different results. I'm not sure if there is a word for it,
| but I'm thinking of pure applications like we have pure
| functions. I don't think the right tools exist for this.
|
| Also the "accountants don't use erasers" feels really glib to me,
| it's hard work.
| kaba0 wrote:
| As others in the thread mentioned, the Unison language is
| exactly what you are getting at.
| dgb23 wrote:
| We have Git but a commit hash has no notion of whether the
| consumers have to care about a change.
|
| The artifact we produce, typically immutable in a package
| repository, then has "semantic versioning" (possibly) to convey
| whether they should care.
| jbjbjbjb wrote:
 | Yeah, my argument is that those tools are ok but a bit
| inadequate. Certainly not first class solutions. It's just a
| thought, I'm not sure it would open up a world of
 | possibilities but there could be more rigorous ways of
| showing how the logic has changed over time and keeping that
| alongside the current state of the logic.
| dgb23 wrote:
| You might find Unison[0] interesting. Or the more recent
| developments (or intents) of Clojure spec[1] and the talk
| Speculation[2].
|
| [0] https://www.unisonweb.org/
|
| [1] https://github.com/clojure/spec-alpha2
|
| [2] https://www.youtube.com/watch?v=oyLBGkS5ICk
| belter wrote:
 | Immutability changes everything but does not solve everything.
 | You are just moving to a different scenario that might help.
|
| "Tesler's Law" [1] :-) makes sure you are looking from a
| different perspective. Complexity is still present.
|
| "Immutability is not enough"
|
| https://codewords.recurse.com/issues/six/immutability-is-not...
|
| [1] - "Tesler's Law"
| https://en.wikipedia.org/wiki/Law_of_conservation_of_complex...
| cuddlecake wrote:
| The article "Immutability is not enough" is sub-par in every
| aspect.
|
 | The author never defines what sort of problems immutable data
| can help avoid, or what the benefits of immutable data are. And
| still, after every paragraph, he wonders why functional
| programming didn't help him avoid bugs in his business logic.
|
| Like, what the hell, the author produced a bug because the
 | order of his operations was wrong? Great, write a test and
| restructure your code to make it work, at least you know where
| to look.
|
| Functional Programming does not fix your Game Loop, and it
| never promised to.
| moldavi wrote:
| The article rang true for me. I heard the same FP promises
| that the author did, and eventually ran into the same issues
| in my FP programs, and left disillusioned. The article
| captured the experience perfectly.
| crdrost wrote:
| FWIW I came to the comments because I also found the
| "Immutability changes everything" paper nearly unreadable.
|
| I just wanted to comment that I love your last paragraph from
| a conceptual level. You want immutability? Great, let's talk
| about what performant game design looks like in that context.
| Games are special, in that they kind of do everything that
| computers are good at, to varying degrees. The author's ideas
| about snapshots of relational databases inviting further
| schema changes in the pipeline--I guess I see the idea you're
| trying to introduce, but what does it actually look like in a
| chess program? Or a choose your own adventure game?
| jt2190 wrote:
| Thanks for the link to "Immutability is not enough" I've
| resubmitted the article here:
| https://news.ycombinator.com/item?id=27642263
| kristiandupont wrote:
| I like to say that mutable state is to software as moving parts
| are to hardware. If you can avoid them, great! But a number of
| systems/designs simply require them.
|
| https://kristiandupont.medium.com/mutable-state-is-to-softwa...
| bob1029 wrote:
| Immutability allows for incredible performance when you are
| working in the guts of database engines. Having assurances that a
| specific data log file offset cannot possibly be modified allows
| for substantial optimization throughout.
|
| For users who do not need a serialized view of events, access to
| "near-real-time" data can be made to scale as far as needed, even
| if a single machine is driving all the writes. You could
| theoretically have a billion readers working off a single
| synchronous core system.
|
| By utilizing careful abstractions, that single core system can
| write something on the order of 10-100 million transactions per
| second. We do it every day on the various financial exchanges
| around the world. In many cases, a _single thread_ is what is
| holding up the entire exchange. Arrays of structs, hot cache
| lines, correct branch prediction and full pipelines are a hell of
| a combo.
|
| In some practical businesses apps, it is also feasible to
| coalesce logical writes before they need to enter a fully-
| synchronous context, so you can get some step-down ratio like
| 100:1 on transactions that actually need to go across the network
| and be handled by the core system.
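That coalescing step can be sketched as a buffer that collapses repeated logical writes to the same key into one synchronous commit (a Python sketch; `CoalescingWriter` and `commit` are illustrative names, not anything from a real exchange system):

```python
class CoalescingWriter:
    """Buffers logical writes and flushes only the last value per key,
    so N updates to one key cost a single synchronous commit entry."""
    def __init__(self, commit):
        self.commit = commit   # the expensive, fully synchronous path
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value   # later writes overwrite earlier ones

    def flush(self):
        batch = self.pending
        self.pending = {}
        self.commit(batch)          # one network/disk round trip

committed = []
w = CoalescingWriter(committed.append)
for i in range(100):
    w.write("balance", i)           # 100 logical writes...
w.flush()
assert committed == [{"balance": 99}]  # ...one committed entry (100:1)
```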
| [deleted]
| m12k wrote:
| I'm increasingly coming to realize that in programming,
| flexibility is something you pay for in performance, whether
| you use it or not. Data being mutable even if you never mutate
| it. Automatic bounds checks for data that you've already made
| sure can't go out of bounds. vtable lookups even when you're
| not using inheritance, because who knows, someone else might.
| Compilers needing to do extra work in case two pointers are
| aliases, even though that never makes sense for them to be from
| the programmer's perspective. Objects allocated one at a time,
| scattered around memory, even if you don't need them to be, and
| wouldn't mind their innards being kept together in cache
| aligned arrays.
|
| They say you can solve any problem in computer science with
| another level of indirection - except performance, it seems,
| because we're paying the cost of all this indirection all the
| time, even if we don't need it most of the time. There's a
| crazy amount of unnecessary "housekeeping" done by compilers
| and runtimes to make sure everything always works all the time,
| which requires them to always do extra work to protect against
| pathological cases that almost never arise.
|
| Modern compilers and runtimes are getting better and better at
| recognizing the common "happy case" and optimizing for that,
| and that's good. But it still feels backwards - both that we
| can't explicitly opt out of unneeded flexibility, so instead
| the compiler needs to deduce that we aren't using it. But also
| that a lot of this isn't opt-in to begin with, rather than opt
| out - i.e. make things immutable by default. Or let the
| programmer tell the compiler a little bit more about what they
| need and don't need the code to do, so the compiler can do a
 | much better job of it.
| Svetlitski wrote:
| Definitely agree. Another example of lost performance that
| comes to mind is (ignorance of) C struct padding. In terms of
| modern tools, Rust has definitely made some headway here:
|
| * Variables are immutable by default
|
| * The layout of a struct's fields in memory is opaque to the
| programmer, allowing the compiler to reorder them to minimize
| padding
|
| * As a natural consequence of the borrow checker, Rust has
| _much_ more information about whether or not two references
| could alias than most other languages
|
| * The programmer is very much made to feel the cost of
| dynamic-dispatch (via dyn), and static-dispatch is the norm
|
| * The compiler can omit bounds checks when it can prove
| they're unnecessary
|
| * It's easy to safely have structs that live entirely on the
| stack (i.e. no heap allocation)
|
| Overall this is an unsolved problem though. It seems to me
| that the trend is that the more details about the internals
| that the language exposes to the programmer, the fewer
| optimizations can be made automatically. Essentially the
| ideal is _declarative_ performance, where the programmer
| expresses the idea and the compiler figures out the optimal
| implementation (which is in stark contrast to most high-
| performance code being hand-tuned today). Take a look at
| something like Halide for a successful application of this
| philosophy: https://halide-lang.org/
| bob1029 wrote:
| > Or let the programmer tell the compiler a little bit more
| about what they need and don't need the code to do, so the
| compiler can do a much better job if it.
|
| You can do this today in languages like C#:
|
| https://docs.microsoft.com/en-us/dotnet/csharp/write-safe-
| ef...
|
| > Adding the readonly modifier to members that don't mutate
| state provides two related benefits. First, the compiler
| enforces your intent. That member can't mutate the struct's
| state. Second, the compiler won't create defensive copies of
| in parameters when accessing a readonly member. The compiler
| can make this optimization safely because it guarantees that
| the struct is not modified by a readonly member.
| XorNot wrote:
| This feels like some seriously missed static analysis
| optimization though - why do I need to specify this?
| There's vanishingly few cases the answer to "looks like you
| don't modify this ever, make it readonly?" where the answer
| is not yes.
| bob1029 wrote:
| Probably because reflection & compilation+activation of
| types existing outside the original compilation context
| is possible at runtime.
|
| I much prefer to having a language/runtime that can
| handle extremely complex problem scenarios out of the box
| (with the need for occasional tuning), rather than the
| other way around. It is not very often that I find myself
| screwing around with performance-intensive code. Usually,
| a few very important things are addressed at the lowest
| levels and then you put all your nice abstractions back
| on top for daily driving.
| jiggawatts wrote:
| > flexibility is something you pay for in performance
|
 | A lightbulb moment I had recently was that the classic
| interface-based OO programming paradigm utilising virtual
| functions was essentially the following declaration by the
| programmer to the compiler: The function that is invoked can
 | change at _any_ time, even call-to-call; assume nothing. It
| is a black box with unknown, unknowable contents. Do not
| inline it, do not inspect it, expect the worst.
|
| If you look at a typical web framework, with call stacks
| hundreds of methods deep, you'll come to the same horrifying
| realisation that these are all inherently un-optimisable
| because of that statement. The compiler can do nearly
| nothing, and the processor will also struggle to paper over
| the inefficiency. Meanwhile, in reality, the specific
| implementations of those interfaces _never_ change during
| actual production execution! Dynamic dispatch is utilised
| only during development time, and even then only at _compile
| time_ to inject test harnesses, mocks, or whatever.
|
| At runtime, 99.999999% of all function calls are essentially
| static, and dynamic dispatch is a pure waste.
| dzaima wrote:
| Except that a smart runtime can actually work with the
| assumption that only one implementation of a class will be
| used, and inline what would otherwise be virtual methods.
| And, as far as I know, the JVM definitely does this, and at
| multiple levels too - if there's only one implementation of
| an interface, it can just assume that everything's static
| and inlineable (at the cost of deoptimizations if that
| turns out to be false later). If not, it can still decide
| to inline, and just check the type beforehand.
| MaxBarraclough wrote:
| > The compiler can do nearly nothing,
|
| Optimizing JIT-compiling JVMs are often able to optimize
| virtual calls that only ever resolve to one implementation.
| They may even inline such calls. ( _edit_ dzaima got there
| first.)
| duped wrote:
| It's old too, Self was doing this in the late 80s
| scns wrote:
| It was written by the same people:
|
| The Java HotSpot Performance Engine was released on April
| 27, 1999, built on technologies from an implementation of
| the programming language Smalltalk named Strongtalk,
| originally developed by Longview Technologies, which
| traded as Animorphic. The Longview virtual machine was
| based on the Self virtual machine, with an interpreter
| replacing the fast-and-dumb first compiler. When Sun
| cancelled the Self project, two key people, Urs Holzle
| and Lars Bak left Sun to start Longview. In 1997, Sun
| Microsystems purchased Animorphic.
|
| Source: wikipedia
| andi999 wrote:
| Yes and speaking of c++ the v table pointer might destroy
| your cache line.
| kaba0 wrote:
 | Just as a note, before one gets horrified at the cost of
 | all these indirections: C code is full of function pointers
 | passed as callbacks. There is actually not much difference
 | between a vtable and typical C code doing the same thing;
 | chances are, C++ compiler writers made it more efficient.
 | Also, C++ doesn't default to virtual, so you rarely pay the
 | price.
| strictfp wrote:
 | Yup, and this is exacerbated by the fact that C++ doesn't
 | really offer any convenient equivalent of virtual methods
 | with static dispatch.
 |
 | std::variant is almost it, but its interface is quite
 | horrendous as of now.
|
| Another interesting thing is that c++ also intermingles
| dynamic allocation with polymorphism, which can lead to a
| lot of unnecessary pointer chasing
| jasonwatkinspdx wrote:
 | The only reason vtables et al. work as well as they do is
| modern branch predictors are incredible.
| dzaima wrote:
| Well, the alternative to vtables is manually branching
| (besides completely rewriting the code to avoid different
| actions), which isn't that much better and still requires
| branch prediction. And I think JVMs also might convert a
| dispatch from two classes to a conditional branch, but
| I'm not sure.
| nayuki wrote:
| Yes: https://shipilev.net/blog/2015/black-magic-method-
| dispatch/#...
| specialist wrote:
| > _...with another level of indirection_
|
| Ya. I've spent the last two decades removing levels of
| indirection, reducing call stack depth. This has put me at
| odds with most everyone. I've been insulted many times for
| not understanding "software architecture". Heh.
|
| I started a design pattern study group shortly after GoF's
| book was published. I think it's a lot like jazz. 1st time
| thru, mind blown. 2nd time, ah, I think I'm starting to get
| it. 3rd time, well, duh.
|
| Eventually students get full circle, a la "there is no
| spoon", and you reject the veneer of "design patterns". Said
| another way, it's just another tool, learn to use it wisely.
| Learn to reject temptation.
|
| Aspects, annotations, runtime code reflection,
| metaprogramming, frameworks, schema mappings, factories,
| singletons, and whatnot are just monkey work. (Most of the
 | time.) Exquisite puzzles (finger traps) to suck smart
 | people in. To distract from the real work of solving
| customer's problems, business needs.
|
| I think the kids these days call it "yak shaving." Yes, I am
 | _very guilty_ of this. Past, present, and surely future. I'm
| no better than anyone else. I recognize my failings and am
| trying to improve.
| mtVessel wrote:
| As much as people like to talk about "footguns", that's the
| value proposition of C++, zero cost for anything you don't
| use.
|
| Managed/interpreted langs have a different proposition: pay
| up front for all the features, but your complexity never
| increases whether you use none, one or all of them.
| MauranKilom wrote:
| Well, pointer aliasing for example is a very tricky thing
| in C++ too. Even with strict aliasing, it is very
| unintuitive from the outside when you're going to trip over
| unexpected performance hurdles. (Does this data structure
| have a char* inside? Who knows, but if it does, there goes
| vectorization!).
| username90 wrote:
 | If you don't dereference the char* it will get treated as
| an integer and be vectorizable. In C++ you don't pay for
| what you don't use. If you need the char* in the loop I'm
| not sure why you'd be thinking about vectorization. Maybe
| you meant parallelism?
| de_keyboard wrote:
| Great talk: Constraints Liberate, Liberties Constrain --
| Runar Bjarnason
|
| https://www.youtube.com/watch?v=GqmsQeSzMdw
| adverbly wrote:
| Wow thanks for linking that. Great talk.
|
| I'm always on the lookout for talks about interesting and
| generally useful concepts, but where I can follow only a
| small part of them. Makes me feel like I can learn and grow
 | into a wiser person. This certainly fits that criterion.
| whoisburbansky wrote:
| This talk fits a similar bucket in my head; do you keep a
| playlist of other such talks you've run across in the
| past?
| DennisP wrote:
| > single core system can write something on the order of 10-100
| million transactions per second...a single thread is what is
| holding up the entire exchange. Arrays of structs, hot cache
| lines, correct branch prediction and full pipelines
|
| Without going back to school, how would I learn how to do that?
| (Books I should read, good open source examples, any Coursera
| courses?)
| rigtorp wrote:
| I've written some on this topic. Start here
| https://rigtorp.se/virtual-memory/ , there is more on similar
| stuff on my website.
| jasonwatkinspdx wrote:
| I've been working occasionally on a database engine design that
| takes this idea to heart. It's been hugely beneficial.
|
| The big picture design is simply described as an append only
| log of copy on write page states, with Kafka style compaction
| to maintain whatever window of history you want. This is
| supplemented with a compact in memory index that holds the LSN
| for the latest page states to make accessing live data as fast
| as possible.
|
| The write path goes through a single batch committer thread to
| amortize fsync overhead (technically two threads in series so I
| can overlap prepping the next batch with waiting on fsync).
| Single writer isn't a big limit since even on consumer grade
| SSDs you can write at GiB/sec now. Append only log means I hit
| the happy case on SSDs. FTLs have come a long way in equalizing
| sequential vs random write performance, but this comes at the
| cost of a lot more write cycles behind the scenes to shuffle
| stuff around.
|
| Here's two examples of how immutability has drastically
| simplified the design:
|
| First, the in memory index is just a hand tweaked version of a
| Bagwell Trie. Since the trie is copy on write, once the fsync
| finishes the write thread can publish the new index state with
| a single atomic write. Client threads never see any
| intermediate state. The database moves from snapshot to
| snapshot atomically from their view. (This is the part I'm
| coding up atm)
|
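A toy version of that snapshot-publish step (Python; a real copy-on-write Bagwell trie does path copying rather than whole-map copies, but the single atomic root swap is the same idea):

```python
import threading

class SnapshotIndex:
    """Copy-on-write index: every update builds a new mapping, then
    publishes it with one reference assignment. Readers that grabbed
    the old reference keep seeing a consistent snapshot forever."""
    def __init__(self):
        self._live = {}                 # current published snapshot
        self._lock = threading.Lock()   # single-writer discipline

    def snapshot(self):
        return self._live  # one atomic reference read, no reader locks

    def commit(self, key, lsn):
        with self._lock:
            new = dict(self._live)  # copy (a trie would copy one path)
            new[key] = lsn
            self._live = new        # the atomic publish

idx = SnapshotIndex()
idx.commit("page-7", 100)
old = idx.snapshot()
idx.commit("page-7", 200)
assert old["page-7"] == 100            # old snapshot is untouched
assert idx.snapshot()["page-7"] == 200 # new readers see the new state
```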
| A second example is want to have an on heap cache, to avoid
| overtaxing the SSD or the downfalls of a pure mmap caching
| approach (mmap can be the miss path for this cache however).
| I'm planning on using a lock free hash table. Since I'm working
| in a language that doesn't have such as a library, I'll have to
| roll my own. Ordinarily this would be an insane idea, as such
| structures are infamous for being nearly impossible to get
| correct, even when using formal modeling tools. But in my case,
| the relationship of LSN -> Immutable Page State is stable, so
| the cache can be quite sloppy. With a mutable cache I'd in
| essence have to ensure the CAS always went the right place,
| even under resizing, etc. But with immutability duplicates are
| only a potential memory waste, not a correctness problem, so I
| get to be quite a bit more sloppy.
|
| One shouldn't treat any single thing as a panacea, but I've
| definitely come around to the view that immutable by default is
| a good approach. You get to just trivially dodge entire
| categories of nasty problems when doing something like the
| above.
|
| I'd love to hear anything more you can share about the stuff
| you work on. It sounds really interesting.
| bob1029 wrote:
| I really enjoy seeing how similar themes emerge across the
| industry.
|
| I've been working on a key-value store that is purpose built
| for low-latency NVMe devices.
|
| I have an append only log (spread across file segments for GC
| concerns), with the only contents being the nodes of a basic
| splay tree. The key material is randomly-derived, so I don't
| have to worry about the sequential integer issues on inserts.
| Writes to the database are modified roots of the splay tree,
| so the log is a continuous stream of consistent database
| snapshots. For readers that don't require serialization, they
| can asynchronously work with the historical result set as
| appropriate without any translation or sync required. You
| just point the reader to a prior root node offset and
| everything should work out.
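The "point the reader at a prior root offset" idea can be sketched with an append-only log of tree nodes, where every commit appends a rewritten path plus a new root, and old root offsets remain valid snapshots (a Python sketch using a plain BST as a stand-in for the splay tree):

```python
# Append-only log of tree nodes; a "database snapshot" is just the
# offset of a root node. Old offsets stay readable forever.
log = []   # each entry: (key, value, left_offset, right_offset)

def insert(root, key, value):
    """Persistent BST insert: appends the rewritten path to the log
    and returns the offset of the new root."""
    if root is None:
        log.append((key, value, None, None))
        return len(log) - 1
    k, v, left, right = log[root]
    if key < k:
        left = insert(left, key, value)
    elif key > k:
        right = insert(right, key, value)
    else:
        v = value
    log.append((k, v, left, right))
    return len(log) - 1

def lookup(root, key):
    while root is not None:
        k, v, left, right = log[root]
        if key == k:
            return v
        root = left if key < k else right
    return None

r1 = insert(None, "a", 1)
r2 = insert(r1, "a", 2)       # new snapshot; r1 is still readable
assert lookup(r1, "a") == 1   # historical snapshot
assert lookup(r2, "a") == 2   # latest snapshot
```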
| jasonwatkinspdx wrote:
| Yup, I think a bunch of us are running up the same trail
| here. Definitely the zero coordination required read only
| transactions are super appealing.
| fishtockos wrote:
| Can you explain the API a bit more? If you're batching writes
| in a single thread, doesn't that imply that clients don't
| 'know' when their writes are actually committed to disk? Or
| are their requests kept open until fsync?
| jasonwatkinspdx wrote:
| So first a disclaimer. I've been thinking about this design
| or something like it in the back of my head for a couple
| years while keeping up to date on VLDB papers and the like.
| It's only recently that I've gotten serious about shipping
| a proof of concept. As it stands I'm just working bottom up
| trying to confirm the more risky components will work. So
| it's by no means a done deal. Obviously I think I'm on the
| right trail or I wouldn't be doing it though.
|
| I'm working in golang. Client goroutines that execute read
| write transactions buffer up their read and write sets,
| then submit them to the committer goroutine via a channel.
| Part of the struct they submit has a blocking channel for
| the committer notify them of the result. So nothing returns
| to the client unless it's stable on disk. Assuming I get
| far enough, in a future version that'll also include
| optionally waiting for raft replication.
| dataangel wrote:
| > Bagwell Trie
|
| I'm getting almost no google results for "Bagwell Trie".
| Where can I read about this?
| jasonwatkinspdx wrote:
| https://lampwww.epfl.ch/papers/idealhashtrees.pdf
|
| These are the prior work for HAMTs, which have been
| implemented in a bunch of languages.
| davidgerard wrote:
| (2016)
| skohan wrote:
| Cheeky headline
| whiddershins wrote:
| So I want to store historical sensor data, like room temperature,
| for many sensors across many rooms.
|
| And then events that happen less frequently. Hourly. Daily.
|
| There is no need to ever change that data, only append to it.
|
| Does anyone here know what I should be looking at for append-only
| data storage technologies?
| rzzzt wrote:
| "Time series database" and "log-structured storage" might be
| two keywords for starting points.
| jasonwatkinspdx wrote:
 | For industrial scale, Kafka et al.
|
| For more modest scale you could use something like lmdb, where
| append only hits a happy case optimization.
| JackMorgan wrote:
| Anyone interested in what an immutable programming language looks
| like should check out Unison [0] written by Paul Chiasano. He is
| known for writing the book FP in Scala.
|
| The premise is that any edit produces a new function, but the old
| one still exists, so you need to migrate calls from the old
| version to the new version.
|
| The benefits really start to manifest in distributed computing
| and package management: much less need to worry about transitive
| dependencies and conflicting versions.
|
| The editable code is a projection of a data structure, so
| refactoring and static analysis tools will be very easy to write.
| Also the syntax is just a display issue and can be changed
| easily, everyone could read and write code in the syntax they
| prefer.
|
 | It does increase some complexity, but I think it is essential
 | complexity we currently have only poor ways to manage, and
 | explicit knobs to turn will mean less effort overall.
|
 | [0] https://www.unisonweb.org/
| fsloth wrote:
| F# is immutable by default, so are all ML variants (e.g. OCaml,
| Scala) for that matter.
| DennisP wrote:
| F# has immutable data by default. Your parent comment
| describes immutable _code_.
| CyberDildonics wrote:
| What is that supposed to mean?
| DennisP wrote:
| I mean, it's right there in the above comment with
| presumably more at the link. This seems like the key
| point:
|
| > any edit produces a new function, but the old one still
| exists, so you need to migrate calls from the old version
| to the new version
|
| I'm not the guy to expand on it further, but maybe the
| above commenter could answer questions.
| CyberDildonics wrote:
| A running program "editing" its functions is already
| pretty exotic and mostly relegated to JIT compilation.
| DennisP wrote:
| I think they're talking about software development, not
| running functions.
| CyberDildonics wrote:
| I don't understand why this would need a new language; it
| seems like tools would be easier. If it is for development, the
| whole point is to edit functions. If it is to edit live
| programs, one obvious and good way is to swap out a function
| pointer atomically and call functions by getting their pointer
| atomically. This doesn't really have anything to do with a
| different language, though.
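| The pointer-swap approach described above can be sketched in
| Python (illustrative; rebinding one module-level reference is
| atomic in CPython thanks to the GIL):

```python
# Callers always go through a single mutable reference; "upgrading"
# the function is one atomic rebind, with no new language needed.
_handler = lambda x: x + 1        # version 1

def call_handler(x):
    return _handler(x)            # fetch the current pointer, then call

def hot_swap(new_fn):
    global _handler
    _handler = new_fn             # one reference assignment, no lock

assert call_handler(1) == 2       # old behavior
hot_swap(lambda x: x * 10)        # version 2 goes live for future calls
assert call_handler(1) == 10      # new behavior
```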
| dgb23 wrote:
| > Normalization is not necessary in an immutable data set,
| however. The only reason to normalize immutable data sets may be
| to reduce the storage necessary for them. On the other hand,
| denormalized data sets may be easier and faster to process as
| inputs to a computation.
|
| I disagree here. To derive new data from an immutable data
| source you still want normalized data. Denormalization can happen
| at some point for processing. Sometimes normalization is not
| feasible, because it often requires modelling/design. But the
| best _general_ case is for data to be well-structured and
| normalized.
| mtVessel wrote:
| You're being kind. The author's statement is utter nonsense. If
| you store the same fact in two places and only change one of
| them, it doesn't matter whether you update in place or create a
| new version of the record -- your data is still corrupted.
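| A toy illustration of that point in Python (names made up): the
| append-only "fix" leaves the denormalized copy stale, so the data
| is inconsistent either way.

```python
# The same fact (a customer's email) stored in two places:
customers = [{"id": 1, "email": "old@example.com"}]
orders = [{"customer_id": 1, "customer_email": "old@example.com"}]

# Append-only "update": a new customer version; orders never touched.
customers.append({"id": 1, "email": "new@example.com"})

latest = customers[-1]["email"]
# The two copies now disagree -- corruption, despite immutability.
assert latest != orders[0]["customer_email"]
```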
| [deleted]
| rurban wrote:
| SICP already taught us this 36 years ago (1985)
| wruza wrote:
| _Accountants don't use erasers; otherwise they may go to jail.
| All entries in a ledger remain in the ledger. Corrections can be
| made but only by making new entries in the ledger._
|
| This is not true. Accounting is usually the most mutable data
| source you will ever have. Facts arrive at random times and are
| then entered on a schedule. Corrections wait on someone to
| resolve analytical issues. Different accounting units have
| different timelines, and within a unit there are many threads of
| an unordered fact flow. It all gets synced together only at
| specific points, usually when it's time to report officially or
| internally. If you ask your accountant and the only thing they
| drop is "busy", it's one of those weeks of the year.
|
| _When a company's quarterly results are published, they include
| small corrections to the previous quarter. Small fixes are OK.
| They are append-only, too._
|
| This is true, because public reports are legally binding.
|
| On topic: in my opinion, immutability at small scales is good,
| because modifications do not leak accidentally into the fields of
| your data structures (e.g. a mutable string is a bad idea in
| general). But at a bigger scale, immutability gets in the way if
| it is not hidden properly. It requires a less common sort of
| developer, one who has to think in an immutable/functional way,
| and even when they can, this increases their cognitive distance
| from the business logic. Properly hidden immutability helps a lot
| though, and we mutable guys use it a lot, despite a widespread
| belief to the contrary: transactions and execution-context
| isolation. There is nothing wrong with mutating the operative
| area if these mutations cannot escape it. But if you're short on
| primitives (a context object and an object model, vs global.store
| and raw fields of a one-for-all object) you end up setting
| globally visible state that _can_ be read from the outside, and
| trouble creeps in. Moreover, real-world code rarely looks like
| what you've seen in a tutorial, and instead of a clear
| `assemble(this(), that())` you have to resort to non-ordinary
| functional tricks and concepts in the name of it. As a result,
| you have to hire (or be one of) these functional guys who may be
| better in general, but who also spend half their cognitive
| abilities on things you couldn't explain to an accountant.
|
| So yes and no, immutability changes everything, but it's not the
| only way to change it. It is just the easiest manual way to feel
| safe if you start naked in the woods.
| bruce343434 wrote:
| > Facts arrive at random times and are then entered on a
| schedule. Corrections wait on someone to resolve analytical
| issues. Different accounting units have different timelines, and
| within a unit there are many threads of an unordered fact flow.
| It all gets synced together only at specific points, usually when
| it's time to report officially or internally. If you ask your
| accountant and the only thing they drop is "busy", it's one of
| those weeks of the year.
|
| I fail to see how any of this implies the data is mutable.
| wruza wrote:
| An accountant loads a raw payment data from a bank. Every
| payment has to be recognized and related to some analytics (a
| project, an expense/income class, etc). If it's not obvious,
| then corresponding analytics are left empty or
| "_Unrecognized", which rings at top of all related reports.
| Later the info arrives and they simply go to that day and
| fill in fields. The change history may be useful for data-
| debug reasons, but the model that a regular accountant uses
| is of a mutable nature at their level.
|
| Of course one could design a system that made changesets
| append-only and explicit for them. But they would object, the
| same way you'd object if, instead of updating a working tree,
| `git pull` simply appended a set of patches to every file in
| it and expected you to modify source code only by appending
| patches yourself. A developer's own work model _is_ mutable (a
| working tree), and you can't expect an accountant to be
| fine with what you personally are not.
| slt2021 wrote:
| bank is a perfect example of immutable accounting.
|
| once debit is posted to your account (let's say it is
| fraudulent transaction, or transaction entered by mistake),
| nobody will go and erase it as if it never happened. They
| would actually do reversal, by creating a new entry with
| the same amount and opposite sign. So you would env up with
| two transactions: +100 and -100.
|
| same is true for most accounting systems. Once the
| accounting period is closed (daily in banking system,
| monthly for most other companies), you can't change the
| closed month. All changes go as new transaction adjustments
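| The reversal pattern can be sketched as a tiny append-only ledger
| in Python (illustrative, not any real banking system's API):

```python
# Entries are never edited or deleted; a mistake is undone by posting
# a reversal with the opposite sign, and balances are derived by folding.
ledger = []

def post(amount, memo):
    ledger.append({"amount": amount, "memo": memo})

def reverse(index):
    post(-ledger[index]["amount"], f"reversal of entry {index}")

def balance():
    return sum(e["amount"] for e in ledger)

post(100, "fraudulent debit")
reverse(0)                   # the +100 stays; a -100 is appended
assert balance() == 0 and len(ledger) == 2
```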
| wruza wrote:
| That's what I'm talking about! You don't have to go all-
| immutable, only at the "commit and set in stone" level
| (you may have many). Obsessing over completely immutable
| state management at all levels does more damage to your
| workflow than it solves problems. It's yet another silver
| bullet that everyone tries to bite these days.
| lewisjoe wrote:
| Curious: what category does Redux's use of immutability fall
| under?
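| For context, Redux's use falls under persistent, copy-on-write
| state: reducers are pure functions that return a fresh state
| value instead of mutating the old one. A rough sketch in Python
| (Redux itself is JavaScript; names here are illustrative):

```python
# Redux-style update: (state, action) -> new state, never mutation.
def reducer(state, action):
    if action["type"] == "increment":
        return {**state, "count": state["count"] + 1}
    return state

s0 = {"count": 0}
s1 = reducer(s0, {"type": "increment"})

assert s0 == {"count": 0}   # old snapshot untouched
assert s1 == {"count": 1}   # new snapshot created
```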
| baby wrote:
| Can you change the title?
| junon wrote:
| The original title from the article matches the one here.
| There's nothing to change.
| baby wrote:
| Jk
| chaz6 wrote:
| I do not think immutability is compatible with data protection
| legislation such as GDPR.
| pjc50 wrote:
| Interestingly France _requires_ immutability (well, durable
| records, not quite the same thing) for point of sale systems,
| under "NF525": https://www.ikosoft.com/en-gb/all-knowledge-
| about-the-nf525-...
| aszen wrote:
| Immutability doesn't mean you can't delete old records.
| tyingq wrote:
| In practice, it could. Kafka, for example, doesn't provide a
| non-hacky way to delete a specific record from a topic.
___________________________________________________________________
(page generated 2021-06-26 23:01 UTC)