[HN Gopher] My mental model of Clojure transducers
       ___________________________________________________________________
        
       My mental model of Clojure transducers
        
       Author : nathell
       Score  : 94 points
       Date   : 2023-09-09 13:10 UTC (1 days ago)
        
 (HTM) web link (blog.danieljanus.pl)
 (TXT) w3m dump (blog.danieljanus.pl)
        
       | bjoli wrote:
       | I wrote SRFI-171 for scheme. Ask me anything.
       | 
       | I have had many people tell me that the SRFI document [0] made
       | transducers click for them, which is always a nice thing to hear.
       | 
       | 0: https://srfi.schemers.org/srfi-171/srfi-171.html
        
         | mcbrit wrote:
         | OK. It's a better map that can keep some state. (I scanned not
         | read your doc and that was my takeaway; I have not thought
         | about what I can do with state yet, particularly since I had
         | never associated transducers with being able to keep state)
         | 
         | Can you give me an example of a classic problem (since we're
         | talking about transformations, raytracers and compilers come to
         | mind) where if you involve a transducer vs a map you get an
         | interesting difference?
         | 
         | Edit: 'How state is kept is not specified': I assume that there
         | are limitations to state keeping, particularly with the
         | composability and performance pillars of transducers, but I'm
         | just having a really hard time synthesizing everything.
        
           | bjoli wrote:
           | You could keep the state using the state monad, or by letting
           | every reducer keep a transparent state in a linked list that
           | whoever is pushing values through it has to handle. The
           | reference implementation keeps it hidden using closures.
           | 
           | This is mostly an API thing. In clojure you can pass a
           | transducer when you create a channel. That way you can make a
           | channel do just about anything. Send data in chunks of N.
           | Filter Odd numbers. Or just do arbitrary transformations.
           | 
           | It is a protocol for composable transformations of data being
           | passed in one direction. It is not fancy. Not really hard to
           | understand. A generalization of map, filter, and friends.
        
       | talkingtab wrote:
       | Sometimes sequence programming logic is easier to write and
       | understand. Network streams being an example for me. The benefit
       | I see of transducers is that it allows you to think with that
       | model. For example if I want to find all image files on my
       | computer, remove duplicates, etc. The processing is much easier
       | for me to think about if I can think of it as one file at a time.
       | Then some more sequential programming based on other
       | characteristics of the file - like batching where and when it was
       | taken.
       | 
       | I like the concept so much that I built a JavaScript version
       | using generators.
       | 
       | Unfortunately, I do am not a fan of JavaScript at all and it
       | appears that clojure now requires that I install a JVM. I realize
       | that this is a very personal issue, but I am a "just say no to
       | Java" person after using it for 10 years.
        
       | yakubin wrote:
       | How are transducers different than iterators in C++ or Rust or
       | streams in Java?
        
         | twic wrote:
         | Okay, so you know streams in Java. You know a Collector? A
         | transducer is roughly a Function<Collector, Collector> (with
         | some more type variables to make that valid). Given a
         | Collector, a transducer will make another Collector which does
         | some stuff to its input and delegates to the underlying
         | Collector.
         | 
         | The nice thing about this is that you can implement all the
         | non-terminal operations on Stream in terms of this. Mapping is
         | passing each input element through a function before passing it
         | on. Filtering is testing each element with a predicate, and
         | only passing it on if it passes. Flat mapping is passing the
         | element through a function to get a number of new elements,
         | then passing them all on one by one.
         | 
         | But you can do more! You can write a transducer which batches
         | elements, a bit like the inverse of flat mapping. You can't do
         | that with the existing Streams API at all!
         | 
         | The things transducers operate on, which Clojure calls reducing
         | functions, end up being a bit different to Collectors: their
         | accumulate operation returns a boolean to control early
         | stopping, which lets them implement take while, limit, and so
         | on, and that means they can't be parallel. But the vibes are
         | exactly the same.
        
           | tadfisher wrote:
           | A similar mechanism in another JVM language is Kotlin's
           | sequences, which can also do windowed/chunked/batched
           | operations by holding a little bit of intermediate state.
           | Combine that with coroutines and it's pretty trivial to
           | perform a parallel map/flatmap and saturate all cores. I'm
           | surprised Streams can't do this.
        
         | nextaccountic wrote:
         | My reading is that a transducer is a map, that is, something
         | that transforms iterators
         | 
         | Like something that receives myiterator and returns
         | myiterator.map(|x| x + 2) in rust
         | 
         | But other than focusing on the map rather than on the iterator,
         | I think it's the same thing
        
           | fiddlerwoaroof wrote:
           | The power it has on top of map is it can skip items and
           | change the shape of the container: that is, it's a
           | decomposition of a fold and not just a composable map.
           | 
           | So, you can consume a vector and produce a subset of the
           | vector as a map.
           | 
           | Finally, transducers separate the element-wise processing
           | from both the construction of the final sequence and the
           | details of iterating over the input.
        
       | waffletower wrote:
       | Great that the author mentioned Christophe Grand's xforms
       | library. It has transducers (and reducing functions) useful for
       | hashmap processing such as x/by-key and x/reduce. Very useful for
       | utilizing the transducer paradigm with a wider array of data
       | types and problem spaces.
        
       | ducktective wrote:
       | >For the noncognoscenti...
       | 
       | Expected nothing less from the legendary Clojure linguistics
       | aficionados!
        
       | 3cats-in-a-coat wrote:
       | I don't have experience with Clojure, but this description makes
       | transducers sound like simple pipeline of functions transforming
       | a stream along the way. This is quite common and readily
       | available in any language, it's also how Unix piping works (for
       | text/binary, but that's a stream of chars/bytes), and I'm
       | wondering why was the name "transducer" required.
        
       | rowbin wrote:
       | So like partially applied functions in Haskell?
        
         | slowmovintarget wrote:
         | Transducers are functions that return transformed reducing
         | functions.
        
         | crdrost wrote:
         | It's not, this is that Lisp thing where you can check the
         | length of your argument list and do something completely
         | different when you don't get enough arguments.
         | 
         | In this case `(map f)` notices that it was told to map a
         | function but not told what to map it over, and so it decides to
         | give you a new function. If this were a partially applied
         | function the signature would be `[x] -> [y]` where `f: x -> y`
         | would pick out the specifics.
         | 
         | But this is actually a totally different signature, isomorphic
         | to `x -> [y]`, the signature of generators. Specifically `(map
         | f)` generates what in Haskell would be `\x -> [f x]`.
         | 
         | However the type is not quite that straightforward for
         | historical and compositional reasons; it is actually
         | [?]z. (y -> z -> z) -> x -> z -> z
         | 
         | With the implementation being here                   \handle x
         | -> handle (f x)
         | 
         | This is a sort of enhanced map that can do filtering because it
         | uses concatMap (also known as >>=, "bind in the list monad") to
         | combine. So to `(filter pred)` you would have the isomorphic
         | versions,                   \x -> if pred x then [x] else []
         | \handle x rest -> if pred x then handle x rest else rest
         | 
         | This also leads to an _important nitpick_ for the article in
         | question, a strictly better mental model of a transducer is not
         | that it maps conveyor belts to conveyor belts, since that has
         | more power than transducers do. (For instance, reverse is not a
         | transducer.) But rather that it maps individual items on a
         | conveyor belt, to their own conveyor belts on the first
         | conveyor belt, then mashes them all together into one effective
         | conveyor belt.
         | 
         | So filter will either map an object to a singleton conveyor
         | belt containing that thing, or an empty conveyor belt. You can
         | implement `dupIf pred x = if pred x then [ x, x] else [x]` as a
         | transducer too, `handle x (handle x rest)`. Conveyor belt that
         | either has one or two elements on it. You can potentially put
         | an infinite conveyor belt inside your conveyor belts and make a
         | chunk of the input unreachable, although Clojure is strict so I
         | have the feeling this will just run out of memory?
        
           | rowbin wrote:
           | Thanks, that helped
        
         | lgrapenthin wrote:
         | Not even by far
        
         | bmacho wrote:
         | You probably are thinking of normal functions, and not
         | partially applied ones (which are also just normal functions
         | that we get a special way totally unrelated here). Also I don't
         | think they can reproduce                 Blammo! Our gnome is
         | now packaging together incoming items into bundles of three,
         | caching them in the interim while the bundle is not complete
         | yet. But if we close the input prematurely, it will acknowledge
         | and produce the incomplete bundle:              (>!! b 4)
         | (>!! b 5)       (close! b)       ; Value: [4 5]
        
         | mrkeen wrote:
         | Probably just plain old functions & laziness.
         | 
         | If you want to interleave IO into it then probably a library
         | like conduit.
        
           | adityaathalye wrote:
           | Plain old functions & eager evaluation & a bit more awesome
           | sauce.
           | 
           | Given transducers, we can compose mutually independent parts
           | at will:
           | 
           | - Data source (sequence, stream, channel, socket etc.)
           | 
           | - Data sink (sequence, stream, channel, socket etc.)
           | 
           | - Data transformer (function of any value -> any other value)
           | 
           | - Data transformation process (mapping, filtering, reducing
           | etc.)
           | 
           | - Some process control (we can transduce finite data (of
           | course) as well as streams, and also have optional early
           | termination in either case. I'm not sure about first-class
           | support for other methods like backpressure.)
           | 
           | e.g. read numbers off a Kafka topic, FizzBuzz them, and send
           | them to another Kafka topic, OR slurp numbers from file on
           | disk, FizzBuzz them, and push into an in-memory queue. But
           | each time, you don't have to rewrite your core fizzbuzz
           | function, nor your `(map fizzbuzz)` definition.
           | 
           | cf. https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-
           | clojur...
        
             | throwaway858 wrote:
             | I'm not sure why the parent was downvoted, this sounds
             | exactly like the Haskell conduit library (or indeed plain
             | laziness if you don't need IO).
        
               | waffletower wrote:
               | I can't downvote, but might have as the first sentence is
               | an over-simplication and misunderstanding -- particularly
               | as laziness for collections has always been available in
               | clojure.core. Clojure transducers offer an optimization
               | orthogonal to collections best summed above with:
               | "transducers allow you to define steps in collection
               | processing _per item_ rather than having collection
               | processing as a series of transformations of
               | collections". Yes, transducers can be viewed as somewhat
               | of an analog to the Haskell conduit library (as discussed
               | here several years ago:
               | https://hypirion.com/musings/haskell-transducers).
               | However, I think the detractors coming from strongly
               | typed languages are decidedly missing much of the
               | generalization of the transducer model, particularly
               | those conflating transducers exclusively with streams.
        
       | curuinor wrote:
       | It seems folks want a working example. Here's one in prod:
       | 
       | Metabase is a BI tool, backend written mostly in Clojure. Like
       | basically all BI tools they have this intermediate representation
       | language thing so you write the same thing in "MBQL (metabase
       | query language)" and it theoretically becomes same query in like,
       | Postgres and Mongo and whatever. End user does not usually write
       | MBQL, it's a service for the frontend querybuilding UI thing and
       | lots of other frontend UI stuff mainly in usage.
       | 
       | Whole processing from MBQL -> your SQL or whatever is done via a
       | buncha big-ass transducers. There's a query cache and lots of
       | other stuff, you need state, but you also need it to be basically
       | fast. Metabase is not materially faster than other BI tools
       | (because all the other BI tools do something vaguely similar in
       | their langs and because the limiting factor is still the actual
       | query running in most cases) but it's pretty comparable speed and
       | the whole thing was materially written by like 5 peeps.
       | 
       | https://github.com/metabase/metabase/blob/master/src/metabas...
       | 
       | (nb: I used to work for Metabase but currently do not. but open
       | core is open core)
        
       | jwr wrote:
       | Transducers are an under-appreciated feature in Clojure. They are
       | incredibly useful, allow for composable and reusable code, and
       | come with really nice performance benefits as well (by not
       | creating intermediate collections).
       | 
       | Once you get used to them, they become a very natural tool -- in
       | my case, almost every time I do something to a sequence, I'll
       | start with `into`. Even if I'm just applying a single
       | transformation, this lets me quickly change the resulting
       | collection and easily add more transformations if needed.
       | 
       | On a higher level, I found myself thinking about business logic
       | (model) in terms of transducer pipelines and ended up with a
       | number of reusable transformations, and clearly specified
       | pipeline logic.
       | 
       | One gripe I have with transducers is that writing stateful
       | transducers is hard. As in, well, really hard (hope you
       | remembered to flush using `unreduced` in your single-arity
       | version!). I still write them sometimes, but it's never a walk in
       | the park (it does provide satisfaction when done, though). But I
       | guess that's what you need to go through in order to get good
       | performance.
        
       | mcbrit wrote:
       | What is generally missing from this category of article is a
       | motivating statement, eg here is a problem that is easier or at
       | least different (transformed into a different category of
       | problem) given this idea. Up top. When I don't see this up top or
       | scanning forward I assume the article assumes knowledge that I do
       | not have, and I bounce.
        
         | javajosh wrote:
         | I put together a transducer in JavaScript with the intent to
         | simplify Redux/flux architected front-ends. Rather than have
         | synthetic actions that trigger reduction, I allow the caller to
         | simply add objects directly into the store, and rely on them
         | being combined in a useful way. And rather than leave it purely
         | at dead data, I allow you to add functions, too, such that once
         | added they can handle subsequent input. My `combine()` is a
         | recursive `Object.assign()` plus reification of function calls:
         | https://simpatico.io/combine2. It's that second part that makes
         | it a transducer.
         | 
         | This work predates the term "transducer" and I was happy to see
         | Rich Hickey defining it, so I use the term retroactively.
         | 
         | Note that I don't see much use for transducers as a general
         | programming technique. This one use is very specific, but I
         | have never before or since needed one.
        
           | eyelidlessness wrote:
           | FYI: I was curious and clicked your link, but it fully
           | reloads the page every 2-3 seconds so I had to bail (iOS
           | Safari if that helps)
        
             | javajosh wrote:
             | Thanks for the feedback, fixed. I like to use a simple
             | method to speed up my BTD loop when doing webapp work,
             | which is to add a `meta refresh` tag. But that's
             | inappropriate for deployment. Note that my site is very
             | strict about 3rd party content (none is allowed), and about
             | privacy (the only log is in a tmux buffer), so there's no
             | reason for this behavior other than my forgetfulness.
        
           | vanderZwan wrote:
           | > _This work predates the term "transducer" and I was happy
           | to see Rich Hickey defining it_
           | 
           | IIRC Hickey did not define this term (nor claimed to have
           | done so), he found it in existing compsci research papers
           | that approached them from a mathematical proof angle and
           | realized they solved a problem he needed to solve (basically
           | not allocating intermediate collections and doing wasteful
           | work on parts of the data that will be immediately
           | discarded).
           | 
           | I think one of the authors of the papers he cited even gave a
           | talk at a Clojure conference once.
        
         | adityaathalye wrote:
         | True, though I think this is part personal note, and part
         | intended as additional material for people already trying to
         | apply transducers effectively. The author suggests this by
         | casting it as "my" mental model, in a work context.
         | 
         | Maybe this variant of explanation will suit your context
         | better:
         | 
         | https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-clojur...
         | 
         | I wrote the post as a way to explore Clojure's standard library
         | using FizzBuzz as a device.
        
           | mcbrit wrote:
           | Most folks on HN who are interested in PLs, including me, are
           | familiar with transducers at a high level. Composable,
           | performant, yada yada ya. What we (or maybe just I) do not
           | have is a nontrivial example of advantage.
           | 
           | Your comment sharpened that for me. I don't want FizzBuzz; I
           | want someone taking a reasonable toy problem, such as a
           | trad+photon+quadtree raytracer and demonstrating advantage by
           | applying the concept.
        
         | sesm wrote:
         | Initial problem was duplication of sequence library for CSP
         | channels (core.async in Clojure). When the idea of
         | 'transducers' (transformers of reducing function) was
         | discovered, it turned out to have many additional useful
         | properties and was integrated deeply into standard library.
        
           | thom wrote:
           | I've always felt any mention of reducers is a big
           | distraction. They're not really key to the idea of
           | transducers which are almost entirely about pure sequence to
           | sequence transforms.
        
         | casion wrote:
         | To give a shallow overview, transducers allow you to define
         | steps in collection processing _per item_ rather than having
         | collection processing as a series of transformations of
         | collections.
         | 
         | So rather than processing the collection, passing it to the
         | next function that processes the collection, passing it to the
         | next... etc.. consuming all the CPU and memory that involves,
         | you can define steps that are applied for each item in the
         | collection thereby having the iteration through the collection
         | happen once.
         | 
         | These steps (transducers) are also composable and reusable.
         | 
         | I suspect you know this, consider this a basic explanation for
         | other people reading.
        
           | mrkeen wrote:
           | That seems like insufficient magic for the respect that
           | transducers seem to have.
           | 
           | What you've written is just the Haskell list monad or Java8
           | Stream.flatmap.
        
             | waffletower wrote:
             | I think this opinion is less nuanced than it could be.
             | Perhaps this will help?
             | https://hypirion.com/musings/haskell-transducers
        
               | lgas wrote:
               | Surely the example that says these two would be
               | equivalent is wrong:                    foldl (+) 0 (take
               | n myList)               reduce (takeNPlus 10) 0 myList
               | 
               | I assume it should be (takeNPlus n)? (or (take 10
               | myList))
        
             | casion wrote:
             | Yes, and like most language features it's not about the
             | feature, it's about having _that feature_ in a language
             | with other benefits.
             | 
             | Think generics in Go or concurrency (effects) in OCAML or
             | smart pointers in Rust. Not at all unique things, but
             | having them in the language with other benefits is worth
             | some discussion as it may provide extra leverage in
             | context.
        
             | BoiledCabbage wrote:
             | > That seems like insufficient magic for the respect that
             | transducers seem to have.
             | 
             | This is 100% correct. It's amazing how over hyped they are.
             | 
             | If you have ever used C#'s LINQ you are using transducers.
             | The fact that LINQ works item at a time instead of
             | collection at a time is all that's being discussed. That if
             | you say take the first two in a 1,000,000 long collection
             | LINQ will only enumerate the first 2 items and not all
             | 1,000,000 is the other behavior. And the way to do this is
             | by composing operations using a "." operator into a "query"
             | to run.
             | 
             | C# had had this since 2007. It doesn't require a fancy
             | name, it doesn't require streams, it doesn't require
             | "thought pieces" every few months for over a decade for
             | people to "master thinking in linq". Just sequence your
             | operations and get back to coding.
             | 
             | And Clojure is a really great language, but the amount of
             | mental space occupied by transducers is a bad look for the
             | language. Via analogy it's like watching people be amazed
             | by for loops for over a decade. And I know they're are a
             | lot of really smart people in the clojure community, so I
             | honestly put it on Rich. Either on hyping or up so much
             | when he released it like he just invented sliced bread and
             | it's a deep advanced topic, or for how it's presented in
             | the language that people have to understand so much beneath
             | the abstraction layer to use it correctly.
             | 
             | Your average blub enterprise programmer has been using LINQ
             | for 15 years and never needed 100 thought pieces in how to
             | use it and reason about it. Yes it's a monad, yes it let's
             | your short circuit, yes it's item at a time, but a user
             | doesn't need to know lots of detail to use it. It's like
             | watching a language community that can do calculus be
             | continuously hypong on the fact it can do long division.
             | 
             | Clojure is an amazing language, transducers are not that
             | special, figure out why they are so hyped in clojure.
             | 
             | Or maybe I have it backwards and linq/transducers really
             | are partial differential equations and C# snuck it into the
             | 4th grade curriculum and nobody noticed.
        
               | joshlemer wrote:
               | I think there's a big difference between transducers work
               | and how other lazy stream libraries work. Not too
               | familiar with LINQ but more with Java streams and Scala
               | views and iterators. Usually these libraries wherever the
               | work is forced, like in a final call of `.toList()` or
               | whatnot, they are materializing something like a stack of
               | iterators. Each iterator calls its child/children
               | iterators and the pipeline logic is in the stack of
               | `.next()` calls.
               | 
               | The difference with transducers is that the
               | transformation logic is completely decoupled from the
               | fact that this is happening in collections. That is what
               | makes them usable outside of the context of collections,
               | most famously in core.async where you can
               | map/filter/mapcat/flatten/take/drop channels just like
               | you do with collections. This only works because
               | transducers are completely decoupled from either the fact
               | that elements are coming from, or going into, a
               | collection. I think it is a really fascinating and
               | creative achievement in decoupling. Java Streams and
               | Scala view/iterator/iterable transformations can never be
               | reused in other contexts since they are fundamentally
               | about the collections. Whatever kind of Observable/Async
               | Stream/Future or other places where you might want
               | map/filter/reduce functionality has to reimplement the
               | whole set of these operations anew (see for example Akka
               | Streams, RxJava)
        
               | waffletower wrote:
               | Transducers do not require streams. You might want to
               | learn a bit more about Clojure before poorly generalizing
               | about this feature of the language. The OP did use
               | streams to explore some specific applications of
               | transducers. To say that they are overhyped is to be hung
               | up on the buzz surrounding their initial release nearly 8
               | years ago.
        
               | [deleted]
        
               | ajanuary wrote:
               | It's been a while since I did any C#. How do I write a
               | function that returns something that encapsulates the
               | operations to perform on the stream that I can then
               | compose with other operations? And can I then apply to
               | whatever sequence abstraction I have? Isn't it limited to
               | things that implement IEnumerable? I seem to remember
               | when transducers were introduced, one of the selling
               | points was it wasn't fixed to a specific sequence
               | abstraction.
        
               | thom wrote:
               | In fairness, it's hard to picture a scenario where you
               | couldn't have an IEnumerable wherever you might want one.
        
               | User23 wrote:
               | > If you have ever used C#'s LINQ you are using
               | transducers. The fact that LINQ works item at a time
               | instead of collection at a time is all that's being
               | discussed. That if you say take the first two in a
               | 1,000,000 long collection LINQ will only enumerate the
               | first 2 items and not all 1,000,000 is the other
               | behavior. And the way to do this is by composing
               | operations using a "." operator into a "query" to run.
               | 
               | Isn't that just function composition to build the map
               | function? I guess that where the magic can come in is
               | that function composition is associative, which allows
               | for some really interesting opportunities for runtime
               | optimization that, so far as I know, haven't been
               | seriously explored.
        
               | BoiledCabbage wrote:
               | Yup, it's essential function composition.
               | 
               | If IIRC technically it's a bit more. It's a monad just
               | like function composition, but it's bind has to handle
               | data threading and short-circuit behavior. If you squint
               | it's not _too_ differently than than a parser combinator
               | over Applicative matching a sequence of characters.
               | Composing operations to correct thread sequential data
               | while handing short circuiting.
               | 
               | I'm not certain about the optimizations due to
               | associativity. While yes function composition is
               | associative, that just builds the query. Running the
               | query itself must be sequential as each operation depends
               | on the data from the prior, leaving I believe little room
               | for optimization.
        
           | vaylian wrote:
           | If I understand you correctly, transduces transform something
           | like
           | 
           | (map fn1 (map fn2 (map fn3 collection)))
           | 
           | into
           | 
           | (map (fn [x] (fn1 (fn2 (fn3 x)))) collection); Is that
           | correct?
           | 
           | More idiomatic clojure:
           | 
           | (map (comp fn1 fn2 fn3) collection)
        
             | roguas wrote:
             | Generally, yes, but I also think the fn1,fn2,fn3 can happen
             | independently (which is why its powerful). So items of
             | collection may be at different steps.
        
             | sundarurfriend wrote:
             | That sort of reminds of broadcast fusion [1] in Julia.
             | 
             | Funnily enough, Julia is where I came across the term
             | transducers too, via the Transducers.jl package [2]. The
             | article and the comments here now make me wonder what the
             | difference is, between broadcast fusion and transducers.
             | 
             | [1] https://julialang.org/blog/2017/01/moredots/ [2]
             | https://github.com/JuliaFolds2/Transducers.jl/
        
             | casion wrote:
             | You have the idea, yes.
        
             | thom wrote:
             | It's the mapping itself that can be composed with
             | transducers. Where before you had pipelines of
             | map/filter/whatever, now you have a single function
             | representing the sequence operations, which can be used for
             | any kind of sequence (a list in memory, or items coming in
             | over a channel or message queue) item by item.
        
           | tyre wrote:
           | > To give a shallow overview, transducers allow you to define
           | steps in collection processing _per item_ rather than having
           | collection processing as a series of transformations of
           | collections.
           | 
           | This is a much better and clearer explanation than the entire
           | article.
        
       ___________________________________________________________________
       (page generated 2023-09-10 23:00 UTC)