[HN Gopher] My mental model of Clojure transducers
___________________________________________________________________
My mental model of Clojure transducers
Author : nathell
Score : 94 points
Date : 2023-09-09 13:10 UTC (1 days ago)
(HTM) web link (blog.danieljanus.pl)
(TXT) w3m dump (blog.danieljanus.pl)
| bjoli wrote:
| I wrote SRFI-171 for scheme. Ask me anything.
|
| I have had many people tell me that the SRFI document [0] made
| transducers click for them, which is always a nice thing to hear.
|
| 0: https://srfi.schemers.org/srfi-171/srfi-171.html
| mcbrit wrote:
| OK. It's a better map that can keep some state. (I scanned not
| read your doc and that was my takeaway; I have not thought
| about what I can do with state yet, particularly since I had
| never associated transducers with being able to keep state)
|
| Can you give me an example of a classic problem (since we're
| talking about transformations, raytracers and compilers come to
| mind) where if you involve a transducer vs a map you get an
| interesting difference?
|
| Edit: 'How state is kept is not specified': I assume that there
| are limitations to state keeping, particularly with the
| composability and performance pillars of transducers, but I'm
| just having a really hard time synthesizing everything.
| bjoli wrote:
| You could keep the state using the state monad, or by letting
| every reducer keep a transparent state in a linked list that
| whoever is pushing values through it has to handle. The
| reference implementation keeps it hidden using closures.
|
| This is mostly an API thing. In clojure you can pass a
| transducer when you create a channel. That way you can make a
| channel do just about anything. Send data in chunks of N.
| Filter Odd numbers. Or just do arbitrary transformations.
|
| It is a protocol for composable transformations of data being
| passed in one direction. It is not fancy. Not really hard to
| understand. A generalization of map, filter, and friends.
| talkingtab wrote:
| Sometimes sequence programming logic is easier to write and
| understand. Network streams being an example for me. The benefit
| I see of transducers is that it allows you to think with that
| model. For example if I want to find all image files on my
| computer, remove duplicates, etc. The processing is much easier
| for me to think about if I can think of it as one file at a time.
| Then some more sequential programming based on other
| characteristics of the file - like batching where and when it was
| taken.
|
| I like the concept so much that I built a JavaScript version
| using generators.
|
| Unfortunately, I do am not a fan of JavaScript at all and it
| appears that clojure now requires that I install a JVM. I realize
| that this is a very personal issue, but I am a "just say no to
| Java" person after using it for 10 years.
| yakubin wrote:
| How are transducers different than iterators in C++ or Rust or
| streams in Java?
| twic wrote:
| Okay, so you know streams in Java. You know a Collector? A
| transducer is roughly a Function<Collector, Collector> (with
| some more type variables to make that valid). Given a
| Collector, a transducer will make another Collector which does
| some stuff to its input and delegates to the underlying
| Collector.
|
| The nice thing about this is that you can implement all the
| non-terminal operations on Stream in terms of this. Mapping is
| passing each input element through a function before passing it
| on. Filtering is testing each element with a predicate, and
| only passing it on if it passes. Flat mapping is passing the
| element through a function to get a number of new elements,
| then passing them all on one by one.
|
| But you can do more! You can write a transducer which batches
| elements, a bit like the inverse of flat mapping. You can't do
| that with the existing Streams API at all!
|
| The things transducers operate on, which Clojure calls reducing
| functions, end up being a bit different to Collectors: their
| accumulate operation returns a boolean to control early
| stopping, which lets them implement take while, limit, and so
| on, and that means they can't be parallel. But the vibes are
| exactly the same.
| tadfisher wrote:
| A similar mechanism in another JVM language is Kotlin's
| sequences, which can also do windowed/chunked/batched
| operations by holding a little bit of intermediate state.
| Combine that with coroutines and it's pretty trivial to
| perform a parallel map/flatmap and saturate all cores. I'm
| surprised Streams can't do this.
| nextaccountic wrote:
| My reading is that a transducer is a map, that is, something
| that transforms iterators
|
| Like something that receives myiterator and returns
| myiterator.map(|x| x + 2) in rust
|
| But other than focusing on the map rather than on the iterator,
| I think it's the same thing
| fiddlerwoaroof wrote:
| The power it has on top of map is it can skip items and
| change the shape of the container: that is, it's a
| decomposition of a fold and not just a composable map.
|
| So, you can consume a vector and produce a subset of the
| vector as a map.
|
| Finally, transducers separate the element-wise processing
| from both the construction of the final sequence and the
| details of iterating over the input.
| waffletower wrote:
| Great that the author mentioned Christophe Grand's xforms
| library. It has transducers (and reducing functions) useful for
| hashmap processing such as x/by-key and x/reduce. Very useful for
| utilizing the transducer paradigm with a wider array of data
| types and problem spaces.
| ducktective wrote:
| >For the noncognoscenti...
|
| Expected nothing less from the legendary Clojure linguistics
| aficionados!
| 3cats-in-a-coat wrote:
| I don't have experience with Clojure, but this description makes
| transducers sound like simple pipeline of functions transforming
| a stream along the way. This is quite common and readily
| available in any language, it's also how Unix piping works (for
| text/binary, but that's a stream of chars/bytes), and I'm
| wondering why was the name "transducer" required.
| rowbin wrote:
| So like partially applied functions in Haskell?
| slowmovintarget wrote:
| Transducers are functions that return transformed reducing
| functions.
| crdrost wrote:
| It's not, this is that Lisp thing where you can check the
| length of your argument list and do something completely
| different when you don't get enough arguments.
|
| In this case `(map f)` notices that it was told to map a
| function but not told what to map it over, and so it decides to
| give you a new function. If this were a partially applied
| function the signature would be `[x] -> [y]` where `f: x -> y`
| would pick out the specifics.
|
| But this is actually a totally different signature, isomorphic
| to `x -> [y]`, the signature of generators. Specifically `(map
| f)` generates what in Haskell would be `\x -> [f x]`.
|
| However the type is not quite that straightforward for
| historical and compositional reasons; it is actually
| [?]z. (y -> z -> z) -> x -> z -> z
|
| With the implementation being here \handle x
| -> handle (f x)
|
| This is a sort of enhanced map that can do filtering because it
| uses concatMap (also known as >>=, "bind in the list monad") to
| combine. So to `(filter pred)` you would have the isomorphic
| versions, \x -> if pred x then [x] else []
| \handle x rest -> if pred x then handle x rest else rest
|
| This also leads to an _important nitpick_ for the article in
| question, a strictly better mental model of a transducer is not
| that it maps conveyor belts to conveyor belts, since that has
| more power than transducers do. (For instance, reverse is not a
| transducer.) But rather that it maps individual items on a
| conveyor belt, to their own conveyor belts on the first
| conveyor belt, then mashes them all together into one effective
| conveyor belt.
|
| So filter will either map an object to a singleton conveyor
| belt containing that thing, or an empty conveyor belt. You can
| implement `dupIf pred x = if pred x then [ x, x] else [x]` as a
| transducer too, `handle x (handle x rest)`. Conveyor belt that
| either has one or two elements on it. You can potentially put
| an infinite conveyor belt inside your conveyor belts and make a
| chunk of the input unreachable, although Clojure is strict so I
| have the feeling this will just run out of memory?
| rowbin wrote:
| Thanks, that helped
| lgrapenthin wrote:
| Not even by far
| bmacho wrote:
| You probably are thinking of normal functions, and not
| partially applied ones (which are also just normal functions
| that we get a special way totally unrelated here). Also I don't
| think they can reproduce Blammo! Our gnome is
| now packaging together incoming items into bundles of three,
| caching them in the interim while the bundle is not complete
| yet. But if we close the input prematurely, it will acknowledge
| and produce the incomplete bundle: (>!! b 4)
| (>!! b 5) (close! b) ; Value: [4 5]
| mrkeen wrote:
| Probably just plain old functions & laziness.
|
| If you want to interleave IO into it then probably a library
| like conduit.
| adityaathalye wrote:
| Plain old functions & eager evaluation & a bit more awesome
| sauce.
|
| Given transducers, we can compose mutually independent parts
| at will:
|
| - Data source (sequence, stream, channel, socket etc.)
|
| - Data sink (sequence, stream, channel, socket etc.)
|
| - Data transformer (function of any value -> any other value)
|
| - Data transformation process (mapping, filtering, reducing
| etc.)
|
| - Some process control (we can transduce finite data (of
| course) as well as streams, and also have optional early
| termination in either case. I'm not sure about first-class
| support for other methods like backpressure.)
|
| e.g. read numbers off a Kafka topic, FizzBuzz them, and send
| them to another Kafka topic, OR slurp numbers from file on
| disk, FizzBuzz them, and push into an in-memory queue. But
| each time, you don't have to rewrite your core fizzbuzz
| function, nor your `(map fizzbuzz)` definition.
|
| cf. https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-
| clojur...
| throwaway858 wrote:
| I'm not sure why the parent was downvoted, this sounds
| exactly like the Haskell conduit library (or indeed plain
| laziness if you don't need IO).
| waffletower wrote:
| I can't downvote, but might have as the first sentence is
| an over-simplication and misunderstanding -- particularly
| as laziness for collections has always been available in
| clojure.core. Clojure transducers offer an optimization
| orthogonal to collections best summed above with:
| "transducers allow you to define steps in collection
| processing _per item_ rather than having collection
| processing as a series of transformations of
| collections". Yes, transducers can be viewed as somewhat
| of an analog to the Haskell conduit library (as discussed
| here several years ago:
| https://hypirion.com/musings/haskell-transducers).
| However, I think the detractors coming from strongly
| typed languages are decidedly missing much of the
| generalization of the transducer model, particularly
| those conflating transducers exclusively with streams.
| curuinor wrote:
| It seems folks want a working example. Here's one in prod:
|
| Metabase is a BI tool, backend written mostly in Clojure. Like
| basically all BI tools they have this intermediate representation
| language thing so you write the same thing in "MBQL (metabase
| query language)" and it theoretically becomes same query in like,
| Postgres and Mongo and whatever. End user does not usually write
| MBQL, it's a service for the frontend querybuilding UI thing and
| lots of other frontend UI stuff mainly in usage.
|
| Whole processing from MBQL -> your SQL or whatever is done via a
| buncha big-ass transducers. There's a query cache and lots of
| other stuff, you need state, but you also need it to be basically
| fast. Metabase is not materially faster than other BI tools
| (because all the other BI tools do something vaguely similar in
| their langs and because the limiting factor is still the actual
| query running in most cases) but it's pretty comparable speed and
| the whole thing was materially written by like 5 peeps.
|
| https://github.com/metabase/metabase/blob/master/src/metabas...
|
| (nb: I used to work for Metabase but currently do not. but open
| core is open core)
| jwr wrote:
| Transducers are an under-appreciated feature in Clojure. They are
| incredibly useful, allow for composable and reusable code, and
| come with really nice performance benefits as well (by not
| creating intermediate collections).
|
| Once you get used to them, they become a very natural tool -- in
| my case, almost every time I do something to a sequence, I'll
| start with `into`. Even if I'm just applying a single
| transformation, this lets me quickly change the resulting
| collection and easily add more transformations if needed.
|
| On a higher level, I found myself thinking about business logic
| (model) in terms of transducer pipelines and ended up with a
| number of reusable transformations, and clearly specified
| pipeline logic.
|
| One gripe I have with transducers is that writing stateful
| transducers is hard. As in, well, really hard (hope you
| remembered to flush using `unreduced` in your single-arity
| version!). I still write them sometimes, but it's never a walk in
| the park (it does provide satisfaction when done, though). But I
| guess that's what you need to go through in order to get good
| performance.
| mcbrit wrote:
| What is generally missing from this category of article is a
| motivating statement, eg here is a problem that is easier or at
| least different (transformed into a different category of
| problem) given this idea. Up top. When I don't see this up top or
| scanning forward I assume the article assumes knowledge that I do
| not have, and I bounce.
| javajosh wrote:
| I put together a transducer in JavaScript with the intent to
| simplify Redux/flux architected front-ends. Rather than have
| synthetic actions that trigger reduction, I allow the caller to
| simply add objects directly into the store, and rely on them
| being combined in a useful way. And rather than leave it purely
| at dead data, I allow you to add functions, too, such that once
| added they can handle subsequent input. My `combine()` is a
| recursive `Object.assign()` plus reification of function calls:
| https://simpatico.io/combine2. It's that second part that makes
| it a transducer.
|
| This work predates the term "transducer" and I was happy to see
| Rich Hickey defining it, so I use the term retroactively.
|
| Note that I don't see much use for transducers as a general
| programming technique. This one use is very specific, but I
| have never before or since needed one.
| eyelidlessness wrote:
| FYI: I was curious and clicked your link, but it fully
| reloads the page every 2-3 seconds so I had to bail (iOS
| Safari if that helps)
| javajosh wrote:
| Thanks for the feedback, fixed. I like to use a simple
| method to speed up my BTD loop when doing webapp work,
| which is to add a `meta refresh` tag. But that's
| inappropriate for deployment. Note that my site is very
| strict about 3rd party content (none is allowed), and about
| privacy (the only log is in a tmux buffer), so there's no
| reason for this behavior other than my forgetfulness.
| vanderZwan wrote:
| > _This work predates the term "transducer" and I was happy
| to see Rich Hickey defining it_
|
| IIRC Hickey did not define this term (nor claimed to have
| done so), he found it in existing compsci research papers
| that approached them from a mathematical proof angle and
| realized they solved a problem he needed to solve (basically
| not allocating intermediate collections and doing wasteful
| work on parts of the data that will be immediately
| discarded).
|
| I think one of the authors of the papers he cited even gave a
| talk at a Clojure conference once.
| adityaathalye wrote:
| True, though I think this is part personal note, and part
| intended as additional material for people already trying to
| apply transducers effectively. The author suggests this by
| casting it as "my" mental model, in a work context.
|
| Maybe this variant of explanation will suit your context
| better:
|
| https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-clojur...
|
| I wrote the post as a way to explore Clojure's standard library
| using FizzBuzz as a device.
| mcbrit wrote:
| Most folks on HN who are interested in PLs, including me, are
| familiar with transducers at a high level. Composable,
| performant, yada yada ya. What we (or maybe just I) do not
| have is a nontrivial example of advantage.
|
| Your comment sharpened that for me. I don't want FizzBuzz; I
| want someone taking a reasonable toy problem, such as a
| trad+photon+quadtree raytracer and demonstrating advantage by
| applying the concept.
| sesm wrote:
| Initial problem was duplication of sequence library for CSP
| channels (core.async in Clojure). When the idea of
| 'transducers' (transformers of reducing function) was
| discovered, it turned out to have many additional useful
| properties and was integrated deeply into standard library.
| thom wrote:
| I've always felt any mention of reducers is a big
| distraction. They're not really key to the idea of
| transducers which are almost entirely about pure sequence to
| sequence transforms.
| casion wrote:
| To give a shallow overview, transducers allow you to define
| steps in collection processing _per item_ rather than having
| collection processing as a series of transformations of
| collections.
|
| So rather than processing the collection, passing it to the
| next function that processes the collection, passing it to the
| next... etc.. consuming all the CPU and memory that involves,
| you can define steps that are applied for each item in the
| collection thereby having the iteration through the collection
| happen once.
|
| These steps (transducers) are also composable and reusable.
|
| I suspect you know this, consider this a basic explanation for
| other people reading.
| mrkeen wrote:
| That seems like insufficient magic for the respect that
| transducers seem to have.
|
| What you've written is just the Haskell list monad or Java8
| Stream.flatmap.
| waffletower wrote:
| I think this opinion is less nuanced than it could be.
| Perhaps this will help?
| https://hypirion.com/musings/haskell-transducers
| lgas wrote:
| Surely the example that says these two would be
| equivalent is wrong: foldl (+) 0 (take
| n myList) reduce (takeNPlus 10) 0 myList
|
| I assume it should be (takeNPlus n)? (or (take 10
| myList))
| casion wrote:
| Yes, and like most language features it's not about the
| feature, it's about having _that feature_ in a language
| with other benefits.
|
| Think generics in Go or concurrency (effects) in OCAML or
| smart pointers in Rust. Not at all unique things, but
| having them in the language with other benefits is worth
| some discussion as it may provide extra leverage in
| context.
| BoiledCabbage wrote:
| > That seems like insufficient magic for the respect that
| transducers seem to have.
|
| This is 100% correct. It's amazing how over hyped they are.
|
| If you have ever used C#'s LINQ you are using transducers.
| The fact that LINQ works item at a time instead of
| collection at a time is all that's being discussed. That if
| you say take the first two in a 1,000,000 long collection
| LINQ will only enumerate the first 2 items and not all
| 1,000,000 is the other behavior. And the way to do this is
| by composing operations using a "." operator into a "query"
| to run.
|
| C# had had this since 2007. It doesn't require a fancy
| name, it doesn't require streams, it doesn't require
| "thought pieces" every few months for over a decade for
| people to "master thinking in linq". Just sequence your
| operations and get back to coding.
|
| And Clojure is a really great language, but the amount of
| mental space occupied by transducers is a bad look for the
| language. Via analogy it's like watching people be amazed
| by for loops for over a decade. And I know they're are a
| lot of really smart people in the clojure community, so I
| honestly put it on Rich. Either on hyping or up so much
| when he released it like he just invented sliced bread and
| it's a deep advanced topic, or for how it's presented in
| the language that people have to understand so much beneath
| the abstraction layer to use it correctly.
|
| Your average blub enterprise programmer has been using LINQ
| for 15 years and never needed 100 thought pieces in how to
| use it and reason about it. Yes it's a monad, yes it let's
| your short circuit, yes it's item at a time, but a user
| doesn't need to know lots of detail to use it. It's like
| watching a language community that can do calculus be
| continuously hypong on the fact it can do long division.
|
| Clojure is an amazing language, transducers are not that
| special, figure out why they are so hyped in clojure.
|
| Or maybe I have it backwards and linq/transducers really
| are partial differential equations and C# snuck it into the
| 4th grade curriculum and nobody noticed.
| joshlemer wrote:
| I think there's a big difference between transducers work
| and how other lazy stream libraries work. Not too
| familiar with LINQ but more with Java streams and Scala
| views and iterators. Usually these libraries wherever the
| work is forced, like in a final call of `.toList()` or
| whatnot, they are materializing something like a stack of
| iterators. Each iterator calls its child/children
| iterators and the pipeline logic is in the stack of
| `.next()` calls.
|
| The difference with transducers is that the
| transformation logic is completely decoupled from the
| fact that this is happening in collections. That is what
| makes them usable outside of the context of collections,
| most famously in core.async where you can
| map/filter/mapcat/flatten/take/drop channels just like
| you do with collections. This only works because
| transducers are completely decoupled from either the fact
| that elements are coming from, or going into, a
| collection. I think it is a really fascinating and
| creative achievement in decoupling. Java Streams and
| Scala view/iterator/iterable transformations can never be
| reused in other contexts since they are fundamentally
| about the collections. Whatever kind of Observable/Async
| Stream/Future or other places where you might want
| map/filter/reduce functionality has to reimplement the
| whole set of these operations anew (see for example Akka
| Streams, RxJava)
| waffletower wrote:
| Transducers do not require streams. You might want to
| learn a bit more about Clojure before poorly generalizing
| about this feature of the language. The OP did use
| streams to explore some specific applications of
| transducers. To say that they are overhyped is to be hung
| up on the buzz surrounding their initial release nearly 8
| years ago.
| [deleted]
| ajanuary wrote:
| It's been a while since I did any C#. How do I write a
| function that returns something that encapsulates the
| operations to perform on the stream that I can then
| compose with other operations? And can I then apply to
| whatever sequence abstraction I have? Isn't it limited to
| things that implement IEnumerable? I seem to remember
| when transducers were introduced, one of the selling
| points was it wasn't fixed to a specific sequence
| abstraction.
| thom wrote:
| In fairness, it's hard to picture a scenario where you
| couldn't have an IEnumerable wherever you might want one.
| User23 wrote:
| > If you have ever used C#'s LINQ you are using
| transducers. The fact that LINQ works item at a time
| instead of collection at a time is all that's being
| discussed. That if you say take the first two in a
| 1,000,000 long collection LINQ will only enumerate the
| first 2 items and not all 1,000,000 is the other
| behavior. And the way to do this is by composing
| operations using a "." operator into a "query" to run.
|
| Isn't that just function composition to build the map
| function? I guess that where the magic can come in is
| that function composition is associative, which allows
| for some really interesting opportunities for runtime
| optimization that, so far as I know, haven't been
| seriously explored.
| BoiledCabbage wrote:
| Yup, it's essential function composition.
|
| If IIRC technically it's a bit more. It's a monad just
| like function composition, but it's bind has to handle
| data threading and short-circuit behavior. If you squint
| it's not _too_ differently than than a parser combinator
| over Applicative matching a sequence of characters.
| Composing operations to correct thread sequential data
| while handing short circuiting.
|
| I'm not certain about the optimizations due to
| associativity. While yes function composition is
| associative, that just builds the query. Running the
| query itself must be sequential as each operation depends
| on the data from the prior, leaving I believe little room
| for optimization.
| vaylian wrote:
| If I understand you correctly, transduces transform something
| like
|
| (map fn1 (map fn2 (map fn3 collection)))
|
| into
|
| (map (fn [x] (fn1 (fn2 (fn3 x)))) collection); Is that
| correct?
|
| More idiomatic clojure:
|
| (map (comp fn1 fn2 fn3) collection)
| roguas wrote:
| Generally, yes, but I also think the fn1,fn2,fn3 can happen
| independently (which is why its powerful). So items of
| collection may be at different steps.
| sundarurfriend wrote:
| That sort of reminds of broadcast fusion [1] in Julia.
|
| Funnily enough, Julia is where I came across the term
| transducers too, via the Transducers.jl package [2]. The
| article and the comments here now make me wonder what the
| difference is, between broadcast fusion and transducers.
|
| [1] https://julialang.org/blog/2017/01/moredots/ [2]
| https://github.com/JuliaFolds2/Transducers.jl/
| casion wrote:
| You have the idea, yes.
| thom wrote:
| It's the mapping itself that can be composed with
| transducers. Where before you had pipelines of
| map/filter/whatever, now you have a single function
| representing the sequence operations, which can be used for
| any kind of sequence (a list in memory, or items coming in
| over a channel or message queue) item by item.
| tyre wrote:
| > To give a shallow overview, transducers allow you to define
| steps in collection processing _per item_ rather than having
| collection processing as a series of transformations of
| collections.
|
| This is a much better and clearer explanation than the entire
| article.
___________________________________________________________________
(page generated 2023-09-10 23:00 UTC)