https://cohost.org/prophet/post/7083950-functional-programming

Prophet (@prophet), 7/29/2024, 6:13 PM

---------------------------------------------------------------------

Functional programming languages should be so much better at mutation than they are

A lot of people think that functional programming is mostly about avoiding mutation at all costs. Persistent data structures are great and there is definitely some truth to that view, but it just doesn't really hold up in reality. Many data structures fundamentally require some form of mutation (e.g. union-find), and even something simple like a persistent sequential data structure that allows both fast appends and fast random access is orders of magnitude more complicated, and still slower, than a naive mutable dynamic array.

So really, functional languages need to allow mutation in some way if they don't want nearly every program to suffer from completely unnecessary overhead in terms of both time and implementation complexity. Existing languages have a few options here, but I don't think any of them are good enough.

Option 1: Give up

This is the most common and obvious option: just allow programmers to use mutable data structures like they would in an imperative language. While this has the advantage of being relatively straightforward and quite flexible, I really don't think it is the way to go.

The fundamental problem with this approach is that arguably the main reason to use a functional language in the first place is that you don't want to have to watch out for accidental side effects and accidental mutation. But if your language has unrestricted mutable data structures, the desire not to worry about mutation outweighs the benefits of using a mutable data structure, so they will typically only be used as a last resort, and mostly in really simple scenarios where it is easy to manually verify that no mutation leaks outside some interface boundary (a function, module, etc.).

For example, let's say you're iterating over some structure and collecting your results in a sequence. The most efficient data structure to use here would be a mutable dynamic array, and in an imperative language that's what pretty much everyone would use. But if you asked an OCaml programmer, they would almost certainly use a linked list instead. This isn't because they're stupid or because they don't care about performance, but because the additional mental overhead of worrying about mutation outweighs the benefits of using a more efficient data structure. And that's a language failure! A good programming language shouldn't make you write worse code just because it's too limited to express the better alternative nicely.

Option 1.1: Give up, but only in IO

If your language has some way of tracking side effects, you can give programmers full access to mutable data structures, but only in a context where they can already use unrestricted side effects.
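To make this concrete, here is a minimal Haskell sketch of the "mutation, but only in IO" style (my illustration, not code from the post; it assumes the vector package's Data.Vector.Unboxed API): results are collected into a mutable vector and frozen at the end, and the IO in the result type is exactly the visible cost discussed below.

```haskell
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- Collect results into a mutable vector instead of a list.
-- The mutation itself is unrestricted, but the IO in the type is the
-- price: every caller now sees (and must propagate) the fact that this
-- function may perform arbitrary side effects.
squares :: Int -> IO (V.Vector Int)
squares n = do
  v <- MV.new n                                   -- allocate a mutable vector
  mapM_ (\i -> MV.write v i (i * i)) [0 .. n - 1] -- fill it in place
  V.freeze v                                      -- copy out an immutable snapshot
```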
Regardless of which other option a language chooses, I think having some form of unrestricted mutation for those cases where you really do need the flexibility is important, and in my opinion this is the best way to achieve it. That said, because using mutation is now reflected in the function's type and infects all call sites, the problems from the previous section are massively exacerbated, so this option is only suitable as a last resort or for programs that already mostly run in IO.

Option 2: Locally sourced mutation only

One way to make mutation compatible with functional programming is to make sure that it is limited in scope and that no mutable state ever escapes a region. This is how Haskell's ST monad and Koka's st effect work. If no mutable state can leak, the mutation becomes an implementation detail, and from outside the region where mutation is allowed, the computation behaves exactly as if it were completely pure.

If you know me, you might know that I really like this approach in principle. But I almost never use it in practice. Part of the reason might just be the Haskell implementation, which has very little dedicated language support (as opposed to something like Flix, which presumably does a lot more inference), but in any case, annotating lifetimes is tedious enough that if you have something mutable inside a lot of structure, it's often easier to write the module that uses it in IO and make sure manually that nothing leaks. And if you only need mutation locally inside a function, using ST makes your code fundamentally more imperative in a way that really forces you to change your programming style. That isn't great either and doesn't exactly help with readability, so the mental overhead is rarely worth it.

Option 3: You can only read this section once

Another way to use mutation without compromising on purity is to make sure you never reuse a value after you have modified it. This is called linearity^1. In this case, there is no way to observe whether a data structure is persistent or mutable, because to find out you would have to look at it after you've modified it! So now all your linear data structures can have APIs that look exactly like their persistent counterparts (aside from some linearity bookkeeping), but internally they will mutate the data structure in place.

In theory this sounds amazing, and it is probably the best option presented here (also currently the least well-supported one in practice), but it still has problems. One issue is that it's not always easy to write linear code. For example, while needing exclusive access to write to a linear data structure is quite natural, you'll typically want to be able to share it for reading. There are nice solutions to this, such as Rust's "shared XOR mutable" references^2, but once you go down that path, you add some complexity and lose the ability to write code as if your data structures were fully persistent.

But what's much more problematic in my opinion is that linearity is extremely infectious. You can add linearity as an opt-in feature (like in current Haskell), such that you only need to worry about it when you need it. Unfortunately, this makes it impossible to use standard functions like map on linear values, which either makes linearity nearly useless or inevitably creates a parallel, incomplete universe of functions that also work on linear values. Alternatively, you can make linearity the default and give every standard function a type that respects it.
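To show roughly what that looks like, here is a sketch using GHC's LinearTypes extension (the Array API below is hypothetical, invented for illustration; only the %1 arrows are real syntax): the linear update reads just like a persistent one, and a linearity-respecting map has to consume its list argument exactly once.

```haskell
{-# LANGUAGE LinearTypes #-}
module LinearSketch where

import Prelude hiding (map)

-- Hypothetical linear array API (not a real library): the caller gives up
-- its only reference to the array, so the implementation is free to mutate
-- it in place while the signature still reads like a persistent update.
data Array a

set :: Int -> a -> Array a %1 -> Array a
set = error "imagine an in-place write here"

-- If linearity is the default, even ordinary combinators need types that
-- respect it, e.g. a map that uses its list argument exactly once:
map :: (a %1 -> b) -> [a] %1 -> [b]
map _ []       = []
map f (x : xs) = f x : map f xs
```

Weakening that list argument from %1 back to an unrestricted arrow later on is exactly the kind of breaking change the next paragraph worries about.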
Now linearity works great and is really easy to use! However, now everyone needs to worry about linearity all the time, since e.g. changing a function from using an argument linearly to using it non-linearly is a breaking change. It also means that, to be compatible with the rest of the ecosystem, every polymorphic function needs to be written as if its arguments were linear. The same problem occurs with tracked effects, although I think it's a lot easier to justify there, since effects are something you'll want to think about in a functional language anyway. If you go down this path, you might even run into this issue with linearity and effects at the same time.

Option 4: Functional farming

After the last section, you might have had an idea: if explicitly tracking linearity is difficult, but most programs that would benefit from it are effectively linear already, why don't we implicitly track linearity at runtime and just perform a copy if there is more than one reference? Well, that's exactly what Koka's "functional but in-place" (FBIP) and Swift's copy-on-write (CoW) data structures do!

The advantage of this is that it's pretty much invisible from the surface. This time, your program actually looks exactly as if it were just using persistent data structures, so you won't run into any of the virality issues you have with linearity. Koka can even infer a lot of this for you and perform in-place updates on seemingly immutable data structures. This might sound a little dangerous, since accidentally holding on to a reference could turn a linear-time algorithm quadratic, but it's actually a lot less problematic than it sounds: if there are two references to a data structure and one of them is modified, the modification creates its own unique copy and lets go of the previous one, so any further modification through either reference can be performed in place again.

Unfortunately, this approach has a different, much more fundamental flaw that makes it unsuitable for most programming languages: it relies on reference counting. If you want to be able to tell at runtime that a data structure is referenced more than once, you need some way to keep track of references, and a tracing garbage collector just doesn't give you that information. There are some nice efforts to make reference counting more competitive for this kind of workload, but at least for now even these struggle to keep up with tracing garbage collectors, even when factoring in automatic reuse analysis. Reference counting might work for some languages, especially ones that don't care much about parallelism, but it's just not a realistic default.

So, what do we do about this?

Honestly, I don't know. Linearity is the most promising option out of the ones presented, but unless we find an answer to the virality problem, I doubt we will ever see it in more than a few semi-popular functional programming languages. I think there is room for a fundamentally new approach here, I'm just not sure what it would look like. Alternatively, some way to integrate regions more nicely with regular functional programming would probably help a ton.

A capable solution should also be compatible with existing state management solutions: it shouldn't be easier to use a State monad or effect over a persistent data structure than to use a proper mutable data structure.
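As a point of comparison, this is the State-over-a-persistent-map pattern that last sentence alludes to (a sketch of my own, assuming mtl and containers; not code from the post): it is the path of least resistance in Haskell today, even where a genuinely mutable table would be the better tool.

```haskell
import qualified Data.Map.Strict as M
import Control.Monad.State.Strict (State, execState, modify')

type Counts = M.Map String Int

-- Record one occurrence of a word in the persistent map threaded by State.
count :: String -> State Counts ()
count w = modify' (M.insertWith (+) w 1)

-- Every update rebuilds a path in the Map; a mutable hash table would
-- usually be faster, but it is more awkward to reach for, which is the
-- asymmetry the conclusion is complaining about.
wordCounts :: [String] -> Counts
wordCounts ws = execState (mapM_ count ws) M.empty
```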
---------------------------------------------------------------------

1. There is technically a difference between linear types, which make you use every value exactly once, and affine types, which make you use values at most once (like in Rust), but the distinction doesn't really matter for our purposes.
2. This slogan has always bothered me a little because it's not actually XOR: you can have values without any references to them! It's really just "not (shared and mutable)", but I guess that doesn't roll off the tongue as nicely.

#programming languages #type systems #haskell #ocaml #PLT #mutation #functional programming #data structures

---------------------------------------------------------------------

7 comments

Nire Bryce (@NireBryce), 7/29/2024, 7:36 PM:
it feels like the artificial separation between the camps (functional vs not) means there's so much work that's just not being done on things (like this) that could be, because the hard borders have been drawn in ways where often people are not mutually intelligible until they've done a lot of both :/

noah (@noahtheduke), 7/29/2024, 8:09 PM:
Clojure ~~handles~~ deals with this in a couple different ways.
1. The persistent data structures are all versions of hash array mapped tries, so they get copy-on-write semantics by default (structural sharing) with the JVM's best-in-class garbage collector.
2. There's transients, "mutable" versions of the built-in persistent data structures (and a mechanism for defining one's own). The transient versions require you to use a "bang" version of the modification functions (conj vs conj!, etc.), which means it's very explicit and the transient version can't be leaked. The transient versions do modifications in place without any copy-on-write behavior, making them significantly faster.
3. You can always drop down to the host language. If you want a mutable hashmap, just use one directly. You can convert to a PersistentHashMap once you're done, or don't, and continue to use the seamless interop code.
I think transients are really nice.

etrepum (@etrepum), 7/29/2024, 9:32 PM:
1. This is the flip side of the "Option 1: Give Up" coin, where you sacrifice space (and probably time) by not using mutability at all
2. Transients are "Option 3: Locally sourced mutation only"
3. This is "Option 1: Give Up" :)

noah (@noahtheduke), 7/29/2024, 10:37 PM:
This reads like you're trying to refute something I've said, but I'm not sure what.
edit: i have other things to say but i want to double check your intent before I say them so we don't misunderstand each other.

Jamie (@jjjollyjim), 7/30/2024, 3:07 PM:
Personally I read "handles this" as you saying Clojure solves this problem, but then listing 2 things tantamount to giving up on the problem (not solving it), which is what the reply is trying to refute. The reply to point 2 is a helpful clarification

noah (@noahtheduke), 7/30/2024, 8:30 PM:
There's no problem to be solved, it's an open discussion of trade-offs. All of the options described by prophet have trade-offs, none of them are silver bullets. I felt it relevant to share how Clojure deals with the problem of "changing immutable data is slow, allowing mutation is risky" as a further point of discussion. Thanks for giving a possible read, that's helpful.

TwoAi (@twoai), 7/30/2024, 2:51 AM:
How analogous is this to borrow checking?
And if it is similar, is there anything to learn or borrow from Rust? (From a distance it seems very similar, but I haven't yet properly learned/used Rust so idk if it's obviously different, literally the same, or anywhere in between.)