[HN Gopher] Into CPS, Never to Return
       ___________________________________________________________________
        
       Into CPS, Never to Return
        
       Author : g0xA52A2A
       Score  : 127 points
       Date   : 2024-12-25 19:43 UTC (1 days ago)
        
 (HTM) web link (bernsteinbear.com)
 (TXT) w3m dump (bernsteinbear.com)
        
       | upghost wrote:
       | In case anyone is wondering, "when would I EVER use this (in
       | hand-written code)?", it's a trick that makes DSL (domain
       | specific language) and small language implementation much easier.
       | A great reference for this is Peter Norvig's Paradigms of
       | Artificial Intelligence Programming, when he implements a subset
       | of Prolog and bolsters the functionality using CPS[1].
       | 
       | The second, although more obscure, is that you can use it in
       | languages that do not have "non-local exits" to terminate a
       | deeply nested computation early or return to an earlier point in
       | the call stack. For example, Clojure does not have nonlocal
       | exits, as only the final form of the function is returned.
       | However, using CPS, you can terminate the expression early and
       | return to the original caller without executing the rest of the
       | function. You probably only want to use this in specialized cases
       | though or it may upset your team, they are tricky to debug.
       | 
       | Lastly and probably most controversially, you can make an
       | extensible "if" statement using CPS if you are in a dynamic
       | language and you have no other tools to do so. Admittedly I do
       | sometimes use this in ClojureScript. This allows you to write
       | "append only" code without continually growing the "cond". Again,
       | most teams don't like this, so it depends on the circumstances,
       | but might be nice to have in your back pocket if you need it.
       | 
       | [1]: https://github.com/norvig/paip-
       | lisp/blob/main/docs/chapter12...
        
         | remexre wrote:
         | It's also really helpful with defunctionalization; search
         | "defunctionalize the continuation" for stuff on that
        
         | Vetch wrote:
         | I've also used it to make a function tail recursive, which is
         | useful in languages with TCO.
        
           | codebje wrote:
           | If your continuation has no state to hold your recursion
           | should already be tail called.
           | 
           | If it does have state, it's probably a stack allocated
           | closure...
        
         | cryptonector wrote:
         | I've used something similar to CPS in async I/O programs, where
         | at each point that a function wants more I/O it queues up a
         | continuation for the event and then returns to the event loop.
         | This forces one to represent state in a compact way and to use
         | very little stack space for state representation, all of which
         | reduces per-client memory footprint a great deal.
        
           | upghost wrote:
           | hey that's really clever. was this for open source or
           | commercial? was it received well? what language? Very
           | interesting technique
        
             | cryptonector wrote:
             | > was this for open source or commercial?
             | 
             | Proprietary, though I have permission to open source it
             | (but I need to make sure that $WORK picks a license for
             | it).
             | 
             | This was specifically for a poor man's Kafka as an HTTP
             | server that implements "tail -f" semantics for GET when
             | using `Range:` with the end offset left unspecified, and
             | with `{URI, weak-ETag, offset}` resembling the Kafka
             | `{topic, partition, offset}`. Non-range requests end when
             | EOF is reached in the file being fetched. Range requests
             | with the end offset specified end as soon as the requested
             | range has been sent back. Range requests with the end
             | offset left unspecified do not end until the file is
             | unlinked or renamed away, using inotify to detect those
             | cases, and every time data is written to the file that data
             | is immediately sent to the clients that are waiting "at the
             | tail".
             | 
             | The connection acceptor in the event loop creates a client
             | record and queues reads to call the reader continuation
             | which is just `{client, reader}`, then when a request is
             | fully read the reader continuation will queue up a writer
             | continuation to write the response, and so on.
             | 
             | The fact that the underlying abstraction is "just a file"
             | makes this very easy to use because `write(2)` is how you
             | publish data and `rename(2)` and `unlink(2)` is how you
             | "commit transactions" / roll logs in your application, and
             | clients immediately find out. All that completely
             | asynchronously with how the data gets served. You can even
             | script an application, like using bash, since `echo foo >>
             | some_file` is how you publish data.
             | 
             | It's extremely useful.
             | 
             | > what language?
             | 
             | C, specifically for Linux, using epoll and inotify. The
             | server is multi-processed, with each process single-
             | threaded, and it spawns NPROC workers (or however many you
             | specify). But naturally this would work very well in Rust,
             | C++, etc., and it could be multi-threaded instead of multi-
             | processed.
             | 
             | > Very interesting technique
             | 
             | It's just C10K with a dash of CPS :)
             | 
             | In this particular program the sequence of events is pretty
             | linear. Since the progression is fairly linear all the
             | continuation functions are one next to the other and one
             | can just read them linearly. This makes the code quite
             | readable.
        
           | fweimer wrote:
           | What's the alternative? Update the state object associated
           | with the I/O event in place (and unregister/re-register it
           | with the event loop as needed)?
        
             | cryptonector wrote:
             | > What's the alternative?
             | 
             | Alternatives include:                 - thread per-client
             | - thread per-client (but         green threads)       - co-
             | routines
             | 
             | Co-routines are very similar to what I'm already doing,
             | really. For fairly linear code one can easily end up with
             | co-routine functions that are multi-thousand lines of code
             | long -- see PuTTY for example.
             | 
             | > Update the state object associated with the I/O event in
             | place (and unregister/re-register it with the event loop as
             | needed)?
             | 
             | I have a simple data structure to represent the
             | continuation, which is really more like a closure,
             | consisting of `{fd, data, handler_function}` (this being
             | C). Events are typically one-shots. When queuing I/O I just
             | update the handler_function of that struct and then
             | register the one-shot event with epoll, re-using this
             | "closure" object. When it comes time to close a connection
             | and we're done handling events we free (or, rather, put on
             | a free-list) the closure, and destruct the client data
             | structure.
        
         | joe_the_user wrote:
         | _In case anyone is wondering, "when would I EVER use this (in
         | hand-written code)?", it's a trick that makes DSL (domain
         | specific language) and small language implementation much
         | easier._
         | 
         | Sure but that's still saying "this only gets used implementing
         | a language". And that's OK because the continuation construct
         | (A half-executed function passed to another function, wtf)
         | seems like something I'd be horrified to find in the code of a
         | normal application.
        
           | codebje wrote:
           | Callbacks with closures are continuations with bad syntax.
           | Admittedly one could call those horrifying without being
           | accused widely of hyperbole, so it's not a great counter to
           | your point.
        
           | upghost wrote:
           | I'm not disagreeing with you. The three examples I mentioned
           | were specifically about overcoming the limitations presented
           | by some runtimes. If you have to reach for these, typically
           | you know you are already swimming upstream. But sometimes the
           | price of adopting a new tool is more expensive than using a
           | specialized (hopefully well documented) application of CPS on
           | the existing infrastructure.
           | 
           | In the second example, for instance, you could throw an
           | exception that gets caught higher up the stack. And in fact,
           | that's often how call/cc with lexically scoped continuations
           | are implemented in languages that do not support first class
           | continuations.
           | 
           | In that case, it really comes down to how you and your team
           | feel about "exceptions as flow control".
           | 
           | Of course, exception-based flow control doesn't help in
           | situations where you are deeply recursing and have a limited
           | stack. This is where "trampolining" using CPS is very
           | effective.
           | 
           | Typically languages implementing this go out of their way to
           | hide the CPS though, because most people react to seeing it
           | the way you did. And I don't blame them, it's pretty
           | horrifying as you said!
        
         | cma wrote:
         | CPS continuation-passing style
        
         | mattalex wrote:
         | This is essentially the principle behind algebraic effects
         | (which, in practice, do get implemented as delimited
         | continuations):
         | 
         | When you have an impure effect (e.g. check a database, generate
         | a random number, write to a file, nondeterministic
         | choices,...), instead of directly implementing the impure
         | action, you instead have a symbol e.g "read", "generate
         | number", ...
         | 
         | When executing the function, you also provide a context of
         | "interpreters" that map the symbol to whatever action you want.
         | This is very useful, since the actual business logic can be
         | analyzed in an isolated way. For instance, if you want to test
         | your application you can use a dummy interpreter for "check
         | database" that returns whatever values you need for testing,
         | but without needing to go to an actual SQL database. It also
         | allows you to switch backends rather easily: If your database
         | uses the symbols "read", "write", "delete" then you just need
         | to implement those calls in your backend. If you want to
         | formally prove properties of your code, you can also do that by
         | noting the properties of your symbols, e.g. `[?] key. read
         | (delete key) = None`.
         | 
         | Since you always capture the symbol using an interpreter, you
         | can also do fancy things like dynamically overriding the
         | interpreter: To implement a seeded random number generator, you
         | can have an interpreter that always overrides itself using the
         | new seed. The interpreter would look something like this
         | 
         | ```
         | 
         | Pseudorandom_interpreter(seed)(argument, continuation):
         | rnd, new_seed <- generate_pseudorandom(seed, argument)
         | with Pseudorandom_interpreter(new_seed):
         | continuation(rnd)
         | 
         | ```
         | 
         | You can clearly see the continuation passing style and the
         | power of self-overriding your own interpreter. In fact, this is
         | a nice way of handeling state in a pure way: Just put something
         | other than new_seed into the new interpreter.
         | 
         | If you want to debug a state machine, you can use an
         | interpreter like this
         | 
         | ``` replace_state_interpreter(state)(new_state, continuation):
         | with replace_state_interpreter(new_state ++ state):
         | continuation(head state)
         | 
         | ```
         | 
         | To trace the state. This way the "state" always holds the
         | entire history of state changes, which can be very nice for
         | debugging. During deployment, you can then replace use a
         | different interpreter
         | 
         | ```
         | 
         | replace_state_interpreter(state)(new_state, continuation):
         | with replace_state_interpreter(new_state):
         | continuation(state)
         | 
         | ```
         | 
         | which just holds the current state.
        
           | upghost wrote:
           | That's really interesting. This seems like it would be a
           | really good approach to combine something like an otherwise
           | pure finite state machine, but with state transitions that
           | rely on communicating with external systems.
           | 
           | Normally I emit tokens to a stack which are consumed by an
           | interpreter but then it's a bit awkward to feed the results
           | back into the FSM, it feels like decoupling just for the sake
           | of decoupling even though the systems need to be maintained
           | in parallel.
           | 
           | I'll have to explore this approach, thank you!
        
       | _zoltan_ wrote:
       | I thought I'm going to read a piece about US's Child Protective
       | Services, and clicked. Was a letdown :-)
        
         | TapamN wrote:
         | I was hoping, but not expecting, for it to be about the Capcom
         | arcade board.
        
         | MobiusHorizons wrote:
         | I was pleased to see something properly technical, given that
         | that is the supposed focus of this forum. I Hadn't ever heard
         | of CPS as an IR, only SSA, and it was very interesting to learn
         | about it.
        
           | tekknolagi wrote:
           | Funnily enough, both CPS and SSA are US government services
        
           | cryptonector wrote:
           | CPS is a predecessor technique to SSA, essentially.
        
       | assbuttbuttass wrote:
       | CPS is a cool technique, and makes advanced constructs like
       | call/cc trivial to implement. Using CPS techniques in code can
       | feel like magic.
       | 
       | But CPS isn't really the state of the art for functional
       | compilers anymore. It's complex for a human to read and hard to
       | optimize. Something like A-normal form is much easier to work
       | with for a compiler intermediate representation
       | 
       | https://en.wikipedia.org/wiki/A-normal_form
        
         | Mathnerd314 wrote:
         | If you think ANF is great then explain how to deal with the
         | transformation from "E (if x then a else b)" to "if x then E a
         | else E b".
        
           | codebje wrote:
           | ANF transform will rewrite that to "let a0 = if x then an
           | else b in E a0".
           | 
           | This isn't identical to floating the call to E inside the
           | "if", obviously, but would have (within predictable
           | boundaries) the same performance. If this code went through
           | LLVM to produce machine code I suspect it would wind up
           | identical as LLVM will likely rewrite the duplicated call to
           | be outside the "if".
        
           | mrkeen wrote:
           | See join points:
           | https://dl.acm.org/doi/pdf/10.1145/3062341.3062380
        
         | maplant wrote:
         | I wouldn't call either "state of the art", it all depends on
         | what you're doing and what you want to achieve. Many compilers
         | use several different IRs to achieve different things
        
       | childintime wrote:
       | The author references [scrapscript](https://scrapscript.org/),
       | hopefully we'll hear more about it.
        
         | tekknolagi wrote:
         | See my other blog posts and GitHub if you would like some more!
        
           | upghost wrote:
           | I realize this is something I should probably know but I've
           | never been able to pin down a definition nor have I used a
           | language with this feature -- could you maybe explain what a
           | "content addressable language" is and why that is important?
           | It seems like it's _really_ important but I have no idea why.
           | You explain it here[1] but it 's somewhat brief. I realize it
           | probably should be self-evident why it's important but it's
           | not clicking for me.
           | 
           | It's somewhat like when Jack Skelington is trying to explain
           | the meaning of Christmas to the residents of Halloween
           | Town[2] and somehow it hasn't stuck yet.
           | 
           | [1]: https://scrapscript.org/#ContentAddressibleEverything
           | 
           | [2]: https://youtu.be/l6QZUcRJw-A
        
             | tekknolagi wrote:
             | To be clear, I am not Taylor and I did not come up with
             | scrapscript! That website belongs to him and I'm just a
             | random guy on the internet who emailed him and built an
             | implementation.
             | 
             | The way I think about it is kind of like software versions.
             | Say you want to do some kind of reproducible software. You
             | can write that your software depends on jazzinator==3.0.0
             | but that's kind of unsatisfying because it relies either on
             | tools or people to make sure that versions are immutable.
             | It also relies on jazzinator to specify its own
             | dependencies pretty strictly. So maybe you end up inventing
             | the concept of a lockfile to partially solve that problem,
             | but you still have the issue of relying on a third party to
             | not change things up on you.
             | 
             |  _However_ , if you can address a library by a name that is
             | computed from its code (a hash), things get interesting.
             | You end up with this tree/DAG of dependencies where every
             | dependency is immutable--you can't change the code without
             | also changing the name. It's a Merkle DAG like Git. You do
             | lose the ability to say things like "above version 2.5"...
             | sort of.
             | 
             | If you also build a ref or tag system (similar to Git
             | refs), you can kinda do normal style versioning on top of
             | this content addressable world. But again, that relies on
             | someone or something (maybe your hash-based store) to keep
             | a map point-version names to hash names.
             | 
             | The other thing that's interesting with scrapscript in
             | particular is since everything is addressable, you can use
             | (sort of implicitly import/download) any value from any
             | other program in a repository readable by you. But no
             | scrapscript implementation fully does this yet, as far as
             | I'm aware. For a fully working system, check out Unison.
        
               | upghost wrote:
               | Oh that is really fascinating, thank you. That has some
               | really interesting implications for dependency management
               | that I will have to think through. It sounds like that
               | implies some really useful tree properties on dependency
               | checking, like if the part of the program you are relying
               | on matches a certain hash you don't need to check down
               | the chain because you can be sure the rest of the nodes
               | are as you expect.
               | 
               | Much appreciated!
        
         | tekknolagi wrote:
         | Now that I'm back at a computer:
         | 
         | * https://bernsteinbear.com/blog/scrapscript/
         | 
         | * https://bernsteinbear.com/blog/scrapscript-baseline/
         | 
         | * https://bernsteinbear.com/blog/scrapscript-tricks/
         | 
         | * https://bernsteinbear.com/blog/type-inference/
         | 
         | * https://bernsteinbear.com/blog/row-poly/
         | 
         | and the repo for the main implementation:
         | https://github.com/tekknolagi/scrapscript
        
         | jalict wrote:
         | Scripted Pipelines in Jenkins are also using CPS in Groovy!
         | https://plugins.jenkins.io/workflow-cps/#plugin-content-tech...
        
           | maratc wrote:
           | Declarative Pipelines do that too (i.e. not only Scripted
           | ones.)
        
       | fovc wrote:
       | It's also useful in typed languages to introduce an existentially
       | quantified type
        
       | yapyap wrote:
       | not gonna lie this title made me think this was gonna be some
       | child protective services horror story or something
        
       | purplesyringa wrote:
       | > If you don't want to do any of this trampoline stuff, you can
       | also do the One Big Switch approach which stuffs each of the funs
       | and conts into a case in a massive switch statement. Calls become
       | gotos.
       | 
       | Just wanted to add that this is very similar to how async/await
       | JavaScript code was compiled for runtimes that didn't support
       | generators back in the day:
       | http://facebook.github.io/regenerator/
        
       | Joker_vD wrote:
       | There is actually a way to produce CPS without lots of
       | administrative redexes, which uses the same main trick that
       | "higher-order transform" utilizes: tracking the whether the
       | expression being transformed is in tail context or general
       | context, and translating accordingly -- but without using meta-
       | continuations because boy are those mind-boggling.
       | 
       | Instead, the normal loops are used, and the subresults are
       | accumulated in some builder structure which is then finalized to
       | produce a complete tree of a CPS expression. The main trick is to
       | figure out just _what_ this builder structure should be, and it
       | turns out it 's a tree zipper of sorts!
       | 
       | The margins of this comment are not big enough to sketch the
       | whole thing but it's actually not as formidable as it sounds
       | ("tree zipper", wooooh): it's mostly just a stack of tree-
       | building instructions which can be easily extended/completed. An
       | alternative would be to have a _mutable_ CPS expression as the
       | result, and keep appending things into it, but it requires lots
       | of tree traversal and lot of care to not accidentally leave the
       | thing in not quite constructed state which is way too easy in my
       | experience.
       | 
       | I think I'll make and publish a gist with it when I get home
       | because I don't believe I've ever seen it done this way anywhere.
        
         | tekknolagi wrote:
         | Please show this gist. I would enjoy linking to it
        
         | upghost wrote:
         | Sounds like this is for a syntactic or metaprogramming
         | transformation over source code? I assume this implies you are
         | either working with a homoiconic language or you have already
         | converted the source to AST?
         | 
         | However it sounds like a very portable technique, would be
         | interested to see it.
        
       ___________________________________________________________________
       (page generated 2024-12-26 23:01 UTC)