[HN Gopher] Purely Functional Data Structures (1996) [pdf]
___________________________________________________________________
Purely Functional Data Structures (1996) [pdf]
Author : debanjan16
Score : 300 points
Date : 2023-05-30 11:43 UTC (11 hours ago)
(HTM) web link (www.cs.cmu.edu)
(TXT) w3m dump (www.cs.cmu.edu)
| dang wrote:
| Related. Others?
|
| _Purely Functional Data Structures in Elm - course lecture notes
| (2015)_ - https://news.ycombinator.com/item?id=12145741 - July
| 2016 (15 comments)
|
| _What's new in purely functional data structures since Okasaki?
| (2010)_ - https://news.ycombinator.com/item?id=11056704 - Feb
| 2016 (42 comments)
|
| _Purely Functional Data Structures (1996) [pdf]_ -
| https://news.ycombinator.com/item?id=10486481 - Nov 2015 (13
| comments)
|
| _Okasaki: Purely Functional Data Structures (1996) [pdf]_ -
| https://news.ycombinator.com/item?id=8327838 - Sept 2014 (1
| comment)
|
| _What's new in purely functional data structures since Okasaki?
| (2010)_ - https://news.ycombinator.com/item?id=7081191 - Jan 2014
| (17 comments)
|
| _Ten Years of Purely Functional Data Structures (2008)_ -
| https://news.ycombinator.com/item?id=5701396 - May 2013 (24
| comments)
|
| _What's new in purely functional data structures since
| Okasaki?_ - https://news.ycombinator.com/item?id=1983461 - Dec
| 2010 (2 comments)
|
| _What's new in purely functional data structures since Okasaki_
| - https://news.ycombinator.com/item?id=1713594 - Sept 2010 (1
| comment)
|
| _"Purely Functional Data Structures" by Chris Okasaki [pdf]_ -
| https://news.ycombinator.com/item?id=1138979 - Feb 2010 (12
| comments)
|
| _Teaching, Playing, and Programming: Ten Years of Purely
| Functional Data Structures_ -
| https://news.ycombinator.com/item?id=112270 - Feb 2008 (2
| comments)
|
| _Chris Okasaki's PhD thesis on purely functional data
| structures (pdf)_ - https://news.ycombinator.com/item?id=8221 -
| April 2007 (1 comment)
| okennedy wrote:
| This is the canonical guide to reasoning about amortized
| runtimes. The exponentially growing ArrayBuffer (O(1) amortized
| append) is the classical data structure used to teach amortized
| runtimes, but the O(1) amortized functional queue Okasaki
| presents here gives a much better intuition for what amortized
| runtimes are all about.
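The two-list queue mentioned above is small enough to sketch. Here is a hypothetical Python rendering (Okasaki's originals are in Standard ML); tuples serve as immutable cons cells, and names like `enqueue`/`dequeue` are my own:

```python
# Classic two-list functional queue: enqueue onto a "back" list, dequeue
# from a "front" list, and reverse the back into the front only when the
# front runs empty.

EMPTY = (None, None)  # queue = (front list, back list); a list is None or (head, tail)

def cons(x, xs):
    return (x, xs)

def rev(xs):
    # reverse a cons list; only local loop variables mutate, the cells do not
    acc = None
    while xs is not None:
        head, xs = xs
        acc = (head, acc)
    return acc

def check(front, back):
    # invariant: the front list is empty only if the whole queue is empty
    if front is None:
        return (rev(back), None)
    return (front, back)

def enqueue(q, x):
    front, back = q
    return check(front, cons(x, back))

def dequeue(q):
    # returns (oldest element, rest of queue); assumes q is non-empty
    (head, rest), back = q
    return head, check(rest, back)
```

Each element passes through `rev` at most once, which is where the O(1) amortized bound comes from. Note this simple "batched" variant is amortized only under single-threaded use; Okasaki's lazy banker's and physicist's queues are what extend the bound to persistent use.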
| odipar wrote:
| Okasaki got me interested in confluently persistent data-
| structures, way back in the 2000s.
|
| They seem magical! To be able to combine data from the past with
| current data, efficiently!
|
| They are almost always trees, with the exception of skip lists,
| with all operations O(log(n)).
|
| After creating my own programming language Enchilada that is
| based on immutable data structures, I started considering what I
| deemed "next level":
|
| _Uniquely represented confluently persistent data structures_
|
| Combined with a Merkle tree encoding of such uniquely represented
| data structures (they are almost always trees), you can
| efficiently and incrementally authenticate them. Think 'block
| chain' on steroids, with incremental cryptographic hashes. Or
| torrents, if you are into that kind of thing.
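The Merkle-tree encoding being described can be sketched briefly. A hypothetical Python fragment in which every immutable node carries a hash committing to its children, so structurally equal subtrees hash equally and a whole version can be authenticated (or deduplicated) incrementally:

```python
# Each node is a tuple (tag, hash, ...); the hash of an internal node
# commits to its children's hashes, so rehashing after an edit touches
# only the path to the change.

import hashlib

def h(*parts):
    digest = hashlib.sha256()
    for p in parts:
        digest.update(p)
    return digest.digest()

def leaf(value):
    return ("leaf", h(b"leaf", value.encode()), value)

def node(left, right):
    # the node's hash commits to both children's hashes
    return ("node", h(b"node", left[1], right[1]), left, right)
```

Because the structure is immutable, a subtree's hash never goes stale, which is what makes the incremental "blockchain on steroids" comparison work.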
| weeksie wrote:
| This book is near and dear to my heart. Back in the mists of time
| (~05?) when I was learning Haskell I reimplemented several of
| these in order to get my head around things. Lots of fond
| memories.
| 2-718-281-828 wrote:
| is this being upvoted onto the homepage based on upvoters
| actually understanding that this paper from 1996 is of
| contemporary relevance and interest or more due to keywords like
| "pure", "functional", "data" and "structure"?
| paulgb wrote:
| I can't speak for everyone, but I upvoted it from nostalgia,
| having read the book version over a decade ago. I happened to
| be thinking about ordering a copy for the office just
| yesterday.
| dllthomas wrote:
| I have a copy on my desk. The bit about designing data
| structures by analogy to number systems (and limiting carry
| propagation) is really fun.
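The number-system analogy is concrete enough to sketch. Below is a hypothetical Python rendering of a binary random-access list, where consing an element is literally binary increment (Okasaki's skew-binary variants exist precisely to limit how far the carry can propagate):

```python
# n elements are stored as a run of complete trees whose sizes mirror the
# binary digits of n. Trees are tuples: (size, value, left, right).

def cons(x, trees):
    t = (1, x, None, None)           # a new leaf: the incoming "1 bit"
    while trees and trees[0][0] == t[0]:
        # carry: link two equal-size trees; newer elements go on the left
        t = (2 * t[0], None, t, trees[0])
        trees = trees[1:]
    return (t,) + trees

def lookup(trees, i):
    for t in trees:
        if i < t[0]:
            return tree_lookup(t, i)
        i -= t[0]
    raise IndexError(i)

def tree_lookup(t, i):
    size, x, left, right = t
    if size == 1:
        return x
    half = size // 2
    return tree_lookup(left, i) if i < half else tree_lookup(right, i - half)
```

With four elements, the run of digits collapses into a single tree of size 4, mirroring 3 + 1 = 100 in binary.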
| bradrn wrote:
| > is this being upvoted onto the homepage based on upvoters
| actually understanding that this paper from 1996 is of
| contemporary relevance and interest ...?
|
| In my case, yes.
| jstrieb wrote:
| I've recently used a number of structures that I learned from
| this book. Though I don't know if the text represents the state
| of the art in purely functional data structures, it's a pretty
| seminal work in the area.
| UncleMeat wrote:
| Nowhere near the state of the art. Lots of improvements since
| this was published.
|
| It is a good book for learning. It is a decent book for
| reference. When you want to really fly you will want to reach
| for more recent work.
| hardlianotion wrote:
| Examples of more recent work for us non-specialists?
| nequo wrote:
| See the commenter here:
|
| https://news.ycombinator.com/item?id=36125110
| hardlianotion wrote:
| Missed it - thanks
| bluepod4 wrote:
| Your comment is giving early 2010s hipster "you probably never
| heard of it" vibes.
| ufo wrote:
| It is still the go-to textbook for immutable data
| structures. Worth the read.
| turtleyacht wrote:
| The book is on Amazon, but this submission had those keywords,
| _plus_ it 's a PDF.
|
| Of course it is of contemporary relevance; functional
| programming (FP) is all the rage.
|
| The tricky bits are questions like:
|
| How does this jibe with existing JS functional constructs like
| "fantasy land," for example?
|
| How to "recruit" more folks to FP, or even a hybrid approach of
| objects interacting functionally?
|
| Game jams using more FP-like data structures? Or more HN
| submissions like that.
|
| The harder things to evaluate are a lot of other topics, news-
| like but investigative and curious, or sites that are
| essentially selling a service (versus teaching the mechanism
| behind it).
|
| For SaaS stuff, since HN is about startups, I have to let it
| slide. But the _hacking_ piece is when one person accomplishes
| something with persistent decomposition of sequential problems,
| or does something clever using tools or ideas from a different
| context.
| solomatov wrote:
| My understanding is that the book is based on this PhD
| dissertation.
| turtleyacht wrote:
| Oh.. then, yes--this was unabashedly a (positively)
| triggered reaction (fortunately or unfortunately).
| schaefer wrote:
| This comment reads as if there is a clear, contemporary
| successor for learning pure functional data structures.
|
| Is there? If so, please do share a reference.
| eointierney wrote:
| I remember encountering an early draft when he was still doing
| his master's? Absolutely cracking paper, one for the ages. Thanks
| Chris :)
| lemper wrote:
| this literature was my "serious" introduction to fp. implemented
| some of the data structures in f#.
| mklauber1 wrote:
| Definitely read that headline as "Purely Fictional Data
| Structures". My disappointment is immense.
| 082349872349872 wrote:
| Purely Fictional Data Structures include:
|     To Queue a Mockingbird
|     The Call-stack of the Wild
|     Tesselation of the d'Urbervilles
|     One Flew Over the Cuckoo Hash
|     The Data of the Woosters
|     Brideshead Re-visitor
|     The Catcher in the Trie
|     Les Miser-tables
|     The Nat-elim of Monte Cristo
| rntz wrote:
| title typo: "Purely Functional Data Structure" -> "Purely
| Functional Data Structures" (pluralization)
|
| reads a bit weird otherwise - sounds like it's discussing a
| particular purely functional data structure when it's actually a
| survey of many (pretty much the canonical survey of them, in
| fact).
| debanjan16 wrote:
| Thanks for pointing it out. I totally missed it. Sorry.
| tmtvl wrote:
| It's weird that there are so many claims in here that the data
| structures and algorithms are perfectly performant yet there
| isn't even one look at generated assembly or any acknowledgement
| of the underlying system that is supposed to _run_ the code.
| Proving things are Big O performant is neat, but at some point
| the code has to hit hardware.
| shepherdjerred wrote:
| > at some point the code has to hit hardware
|
| Yes, but that's not a concern of a computer scientist.
| Implementation and execution of the algorithm are up to the
| reader.
|
| It's like complaining that engineers don't do enough novel
| computer science research; of course they don't! It's not their
| job.
| Gravityloss wrote:
| Which profession is expected to make progress on actual
| software performance?
| shepherdjerred wrote:
| Computer scientists (and of course engineers) both care
| about real-world performance, but some computer scientists
| just care about the theory.
| fjeifisjf wrote:
| That's a very naive view of computer science. That is the
| attitude of people who have given up and decided that
| computers are too big for science now.
| shepherdjerred wrote:
| You're right, there are plenty of papers focusing on real-
| world performance. I chose not to capture the nuance
| because I wasn't sure how to express it succinctly.
| scott_s wrote:
| How I see it: computer science as an academic field is
| roughly split between _theory_ and _systems_. The theory
| folks tend to evaluate progress through proofs. The
| systems folks tend to evaluate progress through
| experiments.
| rgbgraph wrote:
| I've vouched for this, because it brings up an interesting
| point (albeit one that has proven provocative and has been
| expressed before). Also, many of your comments are dead.
|
| Big O/algorithmic complexity is interesting in that it
| tends to abstract away the architecture of the underlying
| processor. A copy is a copy is a copy. Arithmetic is
| arithmetic is arithmetic. Regardless of what
| instructions/processing units the underlying processor has.
| Regardless of data access. Regardless of parallelization.
| Regardless of memory complexity. All we use are brutish
| units of "work" -- despite not all "workers" being equal.
|
| It reminds me a bit of scientific management: treating your
| "workers" as interchangeable units that carry out what has
| been decided to be the most efficient movements and
| processes -- completely disregarding the individual
| characteristics of the worker. For a humanist example, consider
| the individual quirks of physiology (not only the size, length,
| and composition of the muscles, bones, tendons and so on, which
| suit each worker to specific movements and workloads, but also
| psychology: the brain as its own work-producing part, uniquely
| suited to and most efficient at specific workloads) versus an
| all-encompassing abstract average "best workstyle" that makes
| no note of these characteristics and simply relies on external
| metrics like "time to pick up a box using X, Y, or Z form."
|
| The same parallel can be drawn to computers: different
| processors and collections of hardware (what is essentially
| the "body and brain") have different quirks and different
| workloads they perform best at. For example, GPUs are much
| more useful in the case of vectorized/parallel workloads
| where the operations are relatively simple (f.e matrix
| arithmetic). While you can run an iterative matrix
| multiplication algorithm on a CPU in n^3 time -- your data
| access costs will be significant. Instead, running a
| parallel algorithm on a GPU, with RAM very close by, you
| can achieve log^2 n.
|
| This is where CS research really shines: not running away
| from the realities of physical systems, but embracing them.
| jove_ wrote:
| Also, while I haven't read any further yet, based on the table
| on page 3, their big O performance is pretty bad.
| gnulinux wrote:
| > It's weird
|
| It's not weird, this is standard in academic computer science.
| It would be weird to do _otherwise_. In a theoretical
| dissertation/paper like this you can't just randomly bring up
| compiled assembly; it's _completely_ and _utterly_ off topic.
| It's not any more on topic than bringing up whether the code was
| run by an interpreter, or the JVM, or transpiled to Haskell, or
| run on a GPU, etc.
| nerdponx wrote:
| I can't speak for other fields, but in statistics and machine
| learning, it's not uncommon to see at least some practical
| experiments or simulations alongside theoretical results, or
| at least some discussion of practical considerations.
|
| There are definitely plenty of pure theory papers as well,
| and pure theory is still a contribution. However one would at
| least hope to see subsequent work that does focus on the
| practical aspect.
| amelius wrote:
| "Computer science is no more about computers than astronomy
| is about telescopes."
|
| -- Edsger W. Dijkstra
|
| However, I do believe that astronomers put references to the
| actual instruments they used in their publications.
| wk_end wrote:
| Other commenters have addressed the key point - it's _not_
| weird. CLRS doesn't look at generated assembly either.
|
| One other thing I want to add, though, is that this book was
| published in 1996. The CPU landscape at the time was far more
| diverse, both in terms of ISA (which assembly?) and also in
| terms of capabilities. Even on the x86 line, plenty of people
| were still using 486s, which weren't even superscalar, and thus
| had pretty strikingly different performance characteristics
| than, say, a Pentium Pro (a "modern", OoO CPU). Tying your
| algorithmic analysis to a particular piece of hardware would be
| pretty useless; providing a satisfactory look at the
| performance on so many different (micro-)architectures could
| risk overwhelming the content.
|
| Moreover, at the time, performance was, maybe, a little more
| straightforward, making your concerns perhaps not quite as
| important. Memory was slower than the CPU but not the way it is
| now. Caches were much smaller and less useful. Branch
| prediction was more primitive. CPUs had relatively weak
| reordering capabilities, if any at all, and simpler pipelines.
| There were few dedicated SIMD instructions. This was a world
| where linked lists still often made sense, because the hardware
| penalties you paid for them were smaller. In other words: in
| 1996 the on-paper performance wasn't at quite as big a risk of
| diverging quite so wildly from the real-world performance, so
| keeping things simple and focusing on the on-paper performance
| was less objectionable.
| Tainnor wrote:
| Yeah, Big-O notation is only part of the picture and yes,
| "galactic algorithms" that are theoretically efficient but only
| for astronomically big inputs do exist, but in _many_ cases
| (especially if your algorithm isn't doing anything
| particularly deep or clever) linear is faster than quadratic
| which is faster than exponential on real world input, too.
|
| Mathematics is the science of modeling and abstraction and that
| abstraction exists in order to make the process of knowledge
| gathering easier. It works very well for the sciences, so I
| would say the method has been proven. Of course, if the models
| turn out to be completely inappropriate, that would be
| different, but so far, the simple machine models used
| (implicitly or explicitly) in the analysis of algorithms seem
| to be rather reasonable.
|
| The alternative that you suggest, trying to benchmark _concrete
| implementations_ of algorithms has so many confounders that it
| would be very hard to execute properly. I'm sure people are
| doing this too, but the formal Big O proof has a different
| quality to it because it does not depend on those particulars.
| agumonkey wrote:
| I'd even say that maths is what appears when you stop being
| willing to pay for concretion.
| malkia wrote:
| For C++ check this one out - https://github.com/arximboldi/immer
| and this talk from the author -
| https://www.youtube.com/watch?v=_oBx_NbLghY (CppCon 2018: Juan
| Pedro Bolivar Puente "The Most Valuable Values")
| tpoacher wrote:
| I have a personal pet peeve about the misuse of terminology when
| dealing with such names, for which the only solution is to go
| read the original reference to figure out what they meant by it.
|
| E.g., in this case, to describe a data structure as "purely
| functional" makes zero sense to me intuitively at first. You need
| to go read the thesis and realise they're referring to data
| structures implemented as algebraic data types, which in the
| context of a purely functional language can themselves be thought
| of as functions, and can therefore be described as 'functional'
| ...
|
| But unless you do that, the first thought is going to be "huh?
| can arrays be further split into imperative vs functional?" "Does
| he mean immutable?" "Can I use functional arrays in c?" "Are they
| faster/safer than normal arrays?".
|
| By contrast, I think "Purely Functional Data-Structure Types"
| would have been a far more intuitive term ... but I suppose the
| author may have felt that clarifying further wasn't punchy
| enough, and could have made the thesis more wordy...
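As it happens, "functional array" does have a standard concrete reading, which may answer part of the "huh?": an immutable tree supporting get/set by index, where set returns a new version in O(log n) by copying only one root-to-leaf path. A hypothetical Python sketch (the names are mine):

```python
# A persistent "array" as a size-annotated binary tree. set_at copies
# only the path to the changed slot and shares every other subtree with
# the old version, which stays fully valid.

def build(xs):
    if len(xs) == 1:
        return ("leaf", xs[0])
    mid = len(xs) // 2
    return ("node", len(xs), build(xs[:mid]), build(xs[mid:]))

def get_at(t, i):
    if t[0] == "leaf":
        return t[1]
    _, n, left, right = t
    half = n // 2
    return get_at(left, i) if i < half else get_at(right, i - half)

def set_at(t, i, x):
    if t[0] == "leaf":
        return ("leaf", x)
    _, n, left, right = t
    half = n // 2
    if i < half:
        return ("node", n, set_at(left, i, x), right)      # right shared
    return ("node", n, left, set_at(right, i - half, x))   # left shared
```

After `a1 = set_at(a0, 2, 99)`, both versions are usable and share all untouched subtrees; that sharing is the sense in which such arrays are "functional" without being slower than O(log n) per operation.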
| Joker_vD wrote:
| A data structure _is_ a type, so "data-structure type" is a
| pleonasm. Please don't misuse terminology :)
| aeonik wrote:
| I'm reading this book right now. It's really great so far!
|
| I've been working a lot with Trees in Clojure, and have been
| hitting serious limitations of my understanding.
|
| I also found this YouTube video from a Clojure conference that
| reviews some different strategies for tree traversal in Clojure:
| https://youtu.be/YgvJqWiyMRY
|
| I thought that learning a Functional Lisp would make it really
| easy to traverse trees, since that is what the language is doing
| to actually execute its code.
|
| Turns out all the stuff I want to do is simply hard.
|
| Functional data structures are really awesome though, it just
| seems to take a bit of up front investment.
| jmkr wrote:
| This has been a topic I've wanted to get into for a few years
| now, specifically because of Clojure! So if you have any
| additional recommendations I'd appreciate it.
|
| I really enjoyed Friedman's book `Scheme and the Art of
| Programming` because it filled in some pieces missing from "The
| Little Schemer" (and "Seasoned Schemer"). Building stuff like
| `kons`, `map` and all that `letrec` stuff.
|
| But the big difference between Scheme and Clojure is that in
| Scheme, while it's "functional by concept," you get to escape
| with `(set! ...)` whenever you want to have a more traditional
| ds/algo (like say doing the n queens problem).
|
| In Clojure you can kind of do that by either escaping to Java,
| using protocols (with atoms), or using transients, but I often
| feel like there's a "more functional way to do stuff that
| hasn't been taught to me."
|
| I've opened up either the Okasaki dissertation or book or both,
| but I've always had trouble reading it, and then sticking with
| it. And some stuff like looking at Rosetta code and saying "to
| reverse a linked list in a lisp is easy... because it's a cons
| cell" seems like cheating. Almost like showing up to an
| interview, saying your "linked list" is implemented in an array
| structure and then calling `reverse()` on it.
|
| Will watch that talk from 2014, must not have seen it before.
|
| I guess, conceptually, day-to-day things in Clojure do feel
| pretty natural, even easier, and I think I have a decent
| understanding of it. But then when I look at leetcode type
| problems, or something more involved, it takes a lot of mental
| effort to translate to it. Especially things like `big O` gets
| thrown away in my mental model. I get it, persistent data
| structures and all that, but there's still a mystery there.
| smabie wrote:
| I feel like they are... not so awesome: they are grossly
| inefficient due to all the pointer chasing and are pretty much
| guaranteed to be slower than the alternative since they trash
| your cache.
| aeonik wrote:
| Most of the problems I'm trying to solve require those
| pointers anyway.
|
| Tracking diffs and versions of data over time: Version
| control systems and RDBMSs just don't cut it.
|
| Bitemporal databases are interesting to me as well.
| agentultra wrote:
| You _feel_ this way but if you do the math and benchmarks you
| will be surprised.
|
| You might not be running GHC's RTS on a resource-constrained
| target platform but you can generate the C code that could
| run on that platform from Haskell, leveraging the guarantees
| you get with using Haskell's advanced type system.
|
| _Update_ : point is that GHC can optimize code and some of
| those optimizations can avoid pointer chasing! But like most
| optimizing compilers, sometimes GHC can miss them, and you
| have to do a bit of work to tell GHC where it can avoid
| chasing unnecessary pointers (or where it should/should-not
| inline, give it some help to know where it can do fusion,
| etc).
| jfoutz wrote:
| I wish I could remember the exact book, I think it's _writing
| solid code_. There was a long example about the excel
| evaluation engine back in the day when it was shipped on CD's
| and had to be perfect. The approach was: use one big, dumb,
| slow, but understandable and correct implementation, and in
| parallel use the lightning-fast super-whiz-bang new
| implementation. Since both shared the same interface, they
| could be swapped out, or both run at the same time.
|
| I think there is real value in starting with the pure
| functional versions, then swapping out when needed. One
| problem that seems fairly common in large codebases is using
| an object as a hash key. but as the code grows, that object
| sort of gets passed around and eventually somebody updates it
| without rehashing. That's a tough bug to find. They are for me
| anyway.
|
| This is one of those rare cases where you can actually make
| it faster later without trashing the overall design.
|
| I'd encourage starting with the pure functional version
| first, every time. I'd go further and say, leave some
| opportunity to flip back to the pure functional version in
| dev builds for isolating heisenbugs.
|
| Blah blah, grain of salt, free advice, your mileage may vary,
| Games have different requirements than webpages, everything
| is contextual.
|
| This is one rare case where it's always worth having the
| capability of swapping back and forth. Just use it, and think
| hard before giving it up; that's a really good default.
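The dual-implementation trick is easy to sketch generically. A hypothetical Python wrapper (all names made up) that runs the fast path and cross-checks it against the obviously-correct one behind the same interface:

```python
# Differential checking: in debug builds, every call runs both
# implementations and asserts they agree.

def checked(fast, reference):
    """Return a callable that cross-checks fast against reference."""
    def run(*args, **kwargs):
        result = fast(*args, **kwargs)
        expected = reference(*args, **kwargs)
        assert result == expected, (
            f"implementations disagree: {result!r} != {expected!r}"
        )
        return result
    return run

# example: a "whiz-bang" sum vs. the obvious loop
def fast_sum(xs):
    return sum(xs)

def slow_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

safe_sum = checked(fast_sum, slow_sum)
```

In a release build you would bind the name to `fast_sum` directly; the point is that the slow, pure version stays available for isolating heisenbugs.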
| 1MachineElf wrote:
| I wanted to verify the book reference, and I think you
| could be correct.
|
| There is a website for it: https://writingsolidcode.com/
|
| The Amazon page it links to includes a few Index pages in
| the preview. I saw these under _E_:
|
| Excel number formatting code 163
|
| Excel's dialog handling code 113
| wtetzner wrote:
| They're only grossly inefficient if you wouldn't otherwise
| need to frequently make copies of large mutable data
| structures. It all depends on what you're trying to do.
| duped wrote:
| There are entire fields of research dedicated to studying
| cache oblivious persistent data structures, and they're
| almost always trees.
|
| One of the key points of immutable datastructures is that
| they're easy to reason about given that the invariants are
| upheld. Designing an efficient memory representation that
| upholds those invariants is an entirely separate problem
| domain. Not every pointer dereference is slow, and not every
| tree is represented with pointers.
| mtrimpe wrote:
| I agree that it's a great book; but I wouldn't recommend it as
| an entry point into understanding purely functional data
| structures.
|
| Okasaki fleshes out a couple of examples and explains his
| thought processes and heuristics for developing such data
| structures quite well; but they're very much still notes on the
| early-stage exploration of the concept.
|
| From a historical perspective it's still a fascinating read
| though and definitely recommend it if you want to read up on
| the origins of immutable data structures.
| paddw wrote:
| It would be cool if someone could translate this into Typescript
| or the like, I think it would make it a lot more readable.
| haskellandchill wrote:
| TypeScript is much less readable than Haskell and OCaml, but
| you can easily find translations to TypeScript such as
| https://github.com/skeate/lambdata.
| nequo wrote:
| > TypeScript is much less readable than Haskell and OCaml
|
| That's like saying that Norwegian is much less readable than
| Italian. It is in the eye of the beholder. They can both
| express the same concepts but which one is more readable
| depends on which one you already know.
| substation13 wrote:
| Using this to push Brainfuck at work
| consilient wrote:
| > They can both express the same concepts
|
| Only in the Turing-tarpit sense. Out of the box, they have
| very different capabilities. For example:
|
| Higher-kinded types: easy in Haskell, hard in OCaml, mostly
| impossible in Typescript.
|
| First-class modules: OCaml has them, Typescript can sort of
| simulate them with extraordinarily unsafe prototype
| mangling stuff that you should never ever use, impossible
| in Haskell
|
| Open variants: Easy in OCaml and Typescript, hard in
| Haskell
| gexahaha wrote:
| There's also a nice addendum on cstheory.stackexchange, "What's
| new in purely functional data structures since Okasaki?" -
| https://cstheory.stackexchange.com/questions/1539/whats-new-...
| dang wrote:
| A few discussions:
|
| _What's new in purely functional data structures since
| Okasaki? (2010)_ -
| https://news.ycombinator.com/item?id=11056704 - Feb 2016 (42
| comments)
|
| _What's new in purely functional data structures since
| Okasaki? (2010)_ - https://news.ycombinator.com/item?id=7081191
| - Jan 2014 (17 comments)
|
| _What's new in purely functional data structures since
| Okasaki?_ - https://news.ycombinator.com/item?id=1983461 - Dec
| 2010 (2 comments)
|
| _What's new in purely functional data structures since
| Okasaki_ - https://news.ycombinator.com/item?id=1713594 - Sept
| 2010 (1 comment)
| anfelor wrote:
| I would also recommend Koen Claessen's simplified finger trees
| (https://dl.acm.org/doi/abs/10.1145/3406088.3409026) and Zip
| trees (https://arxiv.org/pdf/1806.06726.pdf) for purely
| functional skip lists.
| mefarza123 wrote:
| Since Okasaki's work there have been several advancements and
| new developments in the field (source: MirrorThink.ai):
|
| 1. PaC-trees: Supporting Parallel and Compressed Purely-
| Functional Collections - 2022: This paper introduces PaC-trees,
| a purely functional data structure that supports parallel and
| compressed collections. PaC-trees are designed to be efficient
| in terms of space and time complexity while maintaining the
| benefits of functional data structures.
|
| 2. Proving tree algorithms for succinct data structures - 2019:
| This paper discusses the development of tree algorithms for
| succinct data structures, which are compact representations of
| data that support efficient query and update operations.
|
| There is some work in other fields too:
|
| 1. CyBy2: a strongly typed, purely functional framework for
| chemical data management - 2019-12-30: This paper presents
| CyBy2, a purely functional framework for managing chemical
| data. The framework is designed to be strongly typed and
| enforce referential transparency through the type system.
|
| 2. chemf: A purely functional chemistry toolkit - 2012-12-20:
| This paper introduces chemf, a purely functional chemistry
| toolkit that provides a set of algorithms and data structures
| for working with chemical data in a functional programming
| context.
| bafe wrote:
| [dead]
| smasher164 wrote:
| RRB trees in particular allow you to benefit from some of the
| cache efficiency of regular vectors, while supporting
| operations like immutable update. They are used in languages
| like Clojure and Scala.
| solomatov wrote:
| There's a book which looks like it's based on this dissertation,
| which probably is a better source to read if you are interested
| in this topic: https://www.amazon.com/Purely-Functional-Data-
| Structures-Oka...
| chalcolithic wrote:
| my dream language is rebol with immutable data structures only
| 1MachineElf wrote:
| What are software bugs that can be avoided by choosing data
| structures like these?
|
| I'm making a broad, high-level presentation about immutability in
| technology. At my company we have folks who have heard of it in
| the context of ransomware-resilient backups, others who have
| heard of it in the context of infrastructure as code, and very
| few who have heard of it in terms of data structures (distributed
| and non-distributed). My goal is to showcase the concept in
| various contexts so that people can better understand its role as
| a key design choice in technology.
|
| Personally I have no experience working on software that utilizes
| these, so if others here do, I would appreciate your input on how
| these make your software more reliable.
|
| The emphasis on software reliability and bugs-avoided is because
| the audience works under the company's risk-management division.
| chrisfinazzo wrote:
| Disclaimer: Not a programmer - I do not have a CS degree. One
| of my minors as an undergrad was MIS and while I took a
| database programming course as a senior, my knowledge is
| limited.
|
| The high-level explanation I recall reading years ago somewhere
| (Joel on Software, I think??) was that a) Functional programs
| have no side effects, leading to the notion that b) They can,
| without much effort, be adapted for parallel tasks.
|
| The example here was MapReduce, which was the original guts of
| the Google Search algorithm.
|
| E.g., things are fast because we can send your query to many,
| many computers to find an answer, and chances are good that
| that machine is (relatively) close to you, which means low wait
| times to get your answer.
| thethimble wrote:
| Purely functional data structures are very common in purely
| functional languages like Haskell but are also used in
| non-functional languages via libraries like immutable.js.
|
| At a high level, immutability forces you to be extremely
| deliberate about state changes in your application. This
| improves reasoning/understanding, reduces bugs, and eases
| debugging.
|
| An example of immutability that you might be familiar with
| would be react props/state. You don't modify your state. This
| makes reasoning about state much simpler.
| otto-edgar wrote:
| I think the point is that these data structures can be the
| foundation of a pure-functional style of programming.
|
| You wouldn't drop these into an ecosystem where the programmers
| are used to dealing with mutable structures.
|
| Now, whether using a purely functional language correlates with
| writing fewer bugs is a separate discussion; not sure what the
| modern consensus is.
| mrkeen wrote:
| > not sure what the modern consensus is.
|
| Too good a phrase to pass up!
|
| Modern consensus is done by constructing a replicated log -
| an append only data-structure shared across multiple workers.
| It's what you use Paxos for, and it's what Raft streamlines.
| red_admiral wrote:
| _Immutable_ data structures (not 100% the same thing, but a lot
| of overlap) avoid all kinds of concurrency problems, because
| you can safely pass them around and you'll never get a data
| race. You don't even need any locking (just to make things
| complicated, _lock-free_ data structures are another closely
| related but not identical concept).
|
| Once you're running a distributed system, this kind of stuff
| comes into its own.
| taeric wrote:
| Careful with this wording. They avoid shared-memory
| mutations. They don't necessarily eliminate data races;
| rather, they just change them, since by definition every edit
| now creates stale data.
|
| At large, in distributed software, this is a distraction.
| Since most passing of messages around from one distributed
| piece to the other was already doing a copy across mediums.
| Such that sent data was already immutable from the senders
| perspective. (Granted, the preparation step can have you step
| on your own feet.)
| Chris_Newton wrote:
| Purely functional data structures are a means, not an end in
| themselves.
|
| Programming with immutable values and the more declarative
| style they support does design out at least one infamous source
| of bugs: shared mutable state. That has become increasingly
| important in recent years, for example because modern chips
| support ever increasing numbers of concurrent threads in
| hardware and because a lot of the software we write is now part
| of distributed systems.
|
| That programming style has other advantages in terms of
| readability and reasoning about code. If you can be confident
| that a particular name in the code always refers to the same
| value once it's been defined, tracing the flow of data through
| a function or through an entire system often becomes much
| easier. Having more understandable code naturally tends to
| improve reliability, among other advantages.
|
| If we adopt that programming style then we still need to
| represent collections of values and different relationships
| within our data, but we can't always use "traditional" data
| structures to do that any more because many of them rely on
| mutability, which we have excluded. So we need other ways to
| represent structured data that _are_ compatible with
| immutability and declarative style, and that is what purely
| functional data structures give us.
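The simplest such structure is the persistent singly linked list. A rough Python sketch (names are illustrative): "updates" allocate a new cell and share the unchanged tail, so every older version stays valid.

```python
from typing import Any, NamedTuple, Optional

class Cons(NamedTuple):
    """One cell of a persistent singly linked list."""
    head: Any
    tail: Optional["Cons"]

def push(lst: Optional[Cons], x: Any) -> Cons:
    # O(1): the new list shares the entire old list as its tail.
    return Cons(x, lst)

def to_list(lst: Optional[Cons]) -> list:
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

v1 = push(push(None, 2), 1)   # [1, 2]
v2 = push(v1, 0)              # [0, 1, 2]; v1 is untouched

assert to_list(v1) == [1, 2]
assert to_list(v2) == [0, 1, 2]
assert v2.tail is v1          # structural sharing, not a copy
```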
| okennedy wrote:
| As the peer comment points out, functional data structures
| force you to be deliberate about where/when state changes take
| place. In particular, they force you to be cognizant of which
| _version_ of a data structure a particular piece of code is
| referencing.
|
| Immutable structures can help significantly in reducing race
| conditions in parallel code, as it is impossible for one thread
| to modify part of a data structure while another thread is
| accessing it. Each thread sees a consistent 'version' of the
| data structure until the code explicitly replaces it with a new
| version.
|
| Immutability can also help with memory optimizations: A node of
| one data structure can be safely re-used in another data
| structure. For example, if you need to retain older versions of
| a tree, you can share common nodes (this is how Git encodes
| version changes).
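The node-sharing idea above can be sketched with a persistent binary search tree in Python (a toy illustration, not Git's actual encoding): inserting copies only the root-to-leaf path, and every subtree off that path is shared between versions.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]

def insert(node: Optional[Node], key: int) -> Node:
    # Path copying: only nodes on the root-to-leaf path are rebuilt;
    # every subtree off that path is shared with the old version.
    if node is None:
        return Node(key, None, None)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    elif key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present

v1 = None
for k in (5, 3, 8):
    v1 = insert(v1, k)
v2 = insert(v1, 9)            # new version; v1 still intact

assert v2.right.right.key == 9
assert v1.right.right is None  # old version unchanged
assert v2.left is v1.left      # left subtree shared between versions
```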
| alex_lav wrote:
| I would love to be educated. I've seen claims about the merit and
| value of functional programming throughout my (nowadays
| relatively long) programming career. In practice I've never once
| seen those values or merits come to fruition - just the same
| cycle all software goes through.
|
| My very direct experience recently has been Scala + cats
| resulting in the same buggy, non-performant software it was
| meant to prevent. I understand that bad programmers produce bad
| programs, regardless of language, but I feel pretty strongly
| that good tools prevent some amount of the typical "bad" that
| makes bad programs bad (ignoring the obviously maliciously bad
| examples). So I don't really understand, and would like to
| understand, if, how, and when pure FP (and I suppose FP in
| general) actually improves quality of code/life outside of toy
| examples.
| mrkeen wrote:
| The bottom line is that pure FP means that the same input to a
| function gives you the same output.
|
| When you debug, you just give the program the same input which
| was problematic and you get to reproduce the error.
|
| Persistent data structures make it less wildly inefficient to
| do so.
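The "same input, same output" point can be illustrated with a tiny Python contrast (both functions are hypothetical examples): a pure function replays a bug report deterministically, while one that reads hidden mutable state may not.

```python
# Pure: output depends only on the arguments, so any failing input
# reproduces the failure deterministically.
def discount(price: float, rate: float) -> float:
    return round(price * (1 - rate), 2)

# Impure: reads hidden mutable state, so the *same call* can give
# different answers on different runs.
_current_rate = 0.1

def discount_impure(price: float) -> float:
    return round(price * (1 - _current_rate), 2)

# The pure version is always reproducible:
assert discount(100.0, 0.1) == 90.0
assert discount(100.0, 0.1) == discount(100.0, 0.1)

# The impure version silently changes behavior when the hidden
# state changes, which is exactly what makes debugging painful:
assert discount_impure(100.0) == 90.0
_current_rate = 0.2
assert discount_impure(100.0) == 80.0  # same call, different answer
```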
| user2342 wrote:
| Thanks. I have the printed book, but a PDF is much more
| comfortable!
| EddieEngineers wrote:
| _Why_ would we want to use purely functional data structures?
| When do the pros of functional data structures outweigh the
| additional complexity? Are there scenarios when a project would
| want to pivot from a regular data structure to a purely
| functional one?
| yawaramin wrote:
| Do you use git? The git commit graph is a purely functional
| data structure.
| toolslive wrote:
| They have some nice properties that you might want to benefit
| from. For example, they're persistent: every previous state of
| the data structure is still in there... aka snapshots!
| haroldl wrote:
| The most common point is that they're safe to share between
| threads making parallel algorithms easier to invent,
| understand, and implement correctly.
|
| You can also safely re-use sub-structures without performing a
| deep copy. For example, if you want to keep a sub-tree around
| for later you can do that in O(1) time because it's safe to
| keep a reference to it. If it is a mutable tree you don't know
| what's going to happen to it so you need to do a deep copy of
| the entire sub-tree you're holding on to. This can save a lot
| on memory allocation and copying depending on your use case.
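The O(1)-retention point above, sketched in Python (the tree shape is illustrative): keeping a subtree of an immutable tree "for later" is just holding a reference, with no deep copy.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

# An immutable tree, built once.
tree = Node(5, left=Node(3, Node(2), Node(4)), right=Node(8))

# "Keeping the subtree for later" is O(1): just hold the reference.
# No defensive deep copy is needed, because nothing can mutate the
# subtree underneath us.
saved = tree.left

assert saved.key == 3
assert saved.left.key == 2
assert saved is tree.left        # same object, zero copying
```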
| wtetzner wrote:
| There's a tradeoff between the complexity of the implementation
| of a data structure, and the use of one. While the complexity
| of implementing purely functional data structures is often
| (except maybe in the case of singly linked lists) higher than
| their mutable counterparts, actually using them in a program is
| simpler and less error prone.
|
| There are obviously other trade offs as well, like performance
| and memory usage.
| nimih wrote:
| As is explained in the abstract, purely functional data
| structures are most useful when you want to avoid assignment,
| either due to the restrictions of your programming environment
| (f.ex. you're programming in Haskell), or because that is a
| property you're interested in for other reasons (e.g. you want
| to ensure immutability due to concurrency concerns, or exploit
| persistence to improve performance).
| rg111 wrote:
| Deep Learning applications are one area.
|
| Traditionally, Deep Learning code is written in an OO style.
|
| But after I learned JAX/Flax, a light turned on inside my head
| and I now write functional Deep Learning code as much as I can.
| All my side projects and new code are purely functional in
| JAX/Flax. PyTorch has a functional API known as functorch, and
| I have used it in one project.
|
| Where lots and lots of data in 3-, 4-, and 5-dimensional
| tensors exist, and you need to run lots of transformations on
| them, and then multiply a huge number of them thousands of
| times each second, functional code makes much more sense and
| gives immense sanity and peace of mind.
|
| Those of you writing Deep Learning code, learn functional
| programming principles (immutable data, pure functions, leaving
| no side effect, etc.), and apply them to DL via functorch or
| JAX.
|
| Your life will never be the same.
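The functional-DL pattern described above can be sketched in plain Python so it runs without any framework installed: parameters are immutable values, and a training step is a pure function from old parameters to new ones. The hand-written gradient below stands in for what `jax.grad` would derive automatically; the model (scalar linear fit) is a toy assumption.

```python
def loss_grad(w: float, x: float, y: float) -> float:
    # Gradient of 0.5 * (w*x - y)**2 with respect to w, written by
    # hand here; in JAX, jax.grad would compute this automatically.
    return (w * x - y) * x

def step(w: float, x: float, y: float, lr: float = 0.1) -> float:
    # Pure update: returns a *new* parameter value, never mutates.
    return w - lr * loss_grad(w, x, y)

w0 = 0.0
w1 = step(w0, x=1.0, y=2.0)   # one gradient step toward w = 2
assert w0 == 0.0              # the old version is untouched
assert abs(w1 - 0.2) < 1e-9

# Iterating the pure step converges to the target weight.
w = w0
for _ in range(100):
    w = step(w, 1.0, 2.0)
assert abs(w - 2.0) < 1e-3
```

Because `step` has no hidden state, any training run can be replayed exactly from its inputs, which is the same reproducibility benefit discussed elsewhere in the thread.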
| debanjan16 wrote:
| Do JAX and functorch have the same level of built-in
| functionality (operations) as the original PyTorch library?
|
| Where to learn about it other than the documentation?
| rg111 wrote:
| JAX is very barebones and will require you to write much
| more code for the same task than you write in PyTorch.
|
| Functorch is still new, and honestly, there is little to
| learn if you already know JAX. There are some talks from
| Meta, and then there is always the docs.
| [deleted]
___________________________________________________________________
(page generated 2023-05-30 23:00 UTC)