[HN Gopher] Into CPS, Never to Return
___________________________________________________________________
Into CPS, Never to Return
Author : g0xA52A2A
Score : 127 points
Date : 2024-12-25 19:43 UTC (1 days ago)
(HTM) web link (bernsteinbear.com)
(TXT) w3m dump (bernsteinbear.com)
| upghost wrote:
| In case anyone is wondering, "when would I EVER use this (in
| hand-written code)?", it's a trick that makes DSL (domain
| specific language) and small language implementation much easier.
| A great reference for this is Peter Norvig's Paradigms of
| Artificial Intelligence Programming, when he implements a subset
| of Prolog and bolsters the functionality using CPS[1].
|
| The second, although more obscure, is that you can use it in
| languages that do not have "non-local exits" to terminate a
| deeply nested computation early or return to an earlier point in
| the call stack. For example, Clojure does not have nonlocal
| exits, as only the final form of the function is returned.
| However, using CPS, you can terminate the expression early and
| return to the original caller without executing the rest of the
| function. You probably only want to use this in specialized cases
| though or it may upset your team, they are tricky to debug.
|
| Lastly and probably most controversially, you can make an
| extensible "if" statement using CPS if you are in a dynamic
| language and you have no other tools to do so. Admittedly I do
| sometimes use this in ClojureScript. This allows you to write
| "append only" code without continually growing the "cond". Again,
| most teams don't like this, so it depends on the circumstances,
| but might be nice to have in your back pocket if you need it.
|
| [1]: https://github.com/norvig/paip-
| lisp/blob/main/docs/chapter12...
| remexre wrote:
| It's also really helpful with defunctionalization; search
| "defunctionalize the continuation" for stuff on that
| Vetch wrote:
| I've also used it to make a function tail recursive, which is
| useful in languages with TCO.
| codebje wrote:
| If your continuation has no state to hold your recursion
| should already be tail called.
|
| If it does have state, it's probably a stack allocated
| closure...
| cryptonector wrote:
| I've used something similar to CPS in async I/O programs, where
| at each point that a function wants more I/O it queues up a
| continuation for the event and then returns to the event loop.
| This forces one to represent state in a compact way and to use
| very little stack space for state representation, all of which
| reduces per-client memory footprint a great deal.
| upghost wrote:
| hey that's really clever. was this for open source or
| commercial? was it received well? what language? Very
| interesting technique
| cryptonector wrote:
| > was this for open source or commercial?
|
| Proprietary, though I have permission to open source it
| (but I need to make sure that $WORK picks a license for
| it).
|
| This was specifically for a poor man's Kafka as an HTTP
| server that implements "tail -f" semantics for GET when
| using `Range:` with the end offset left unspecified, and
| with `{URI, weak-ETag, offset}` resembling the Kafka
| `{topic, partition, offset}`. Non-range requests end when
| EOF is reached in the file being fetched. Range requests
| with the end offset specified end as soon as the requested
| range has been sent back. Range requests with the end
| offset left unspecified do not end until the file is
| unlinked or renamed away, using inotify to detect those
| cases, and every time data is written to the file that data
| is immediately sent to the clients that are waiting "at the
| tail".
|
| The connection acceptor in the event loop creates a client
| record and queues reads to call the reader continuation
| which is just `{client, reader}`, then when a request is
| fully read the reader continuation will queue up a writer
| continuation to write the response, and so on.
|
| The fact that the underlying abstraction is "just a file"
| makes this very easy to use because `write(2)` is how you
| publish data and `rename(2)` and `unlink(2)` is how you
| "commit transactions" / roll logs in your application, and
| clients immediately find out. All that completely
| asynchronously with how the data gets served. You can even
| script an application, like using bash, since `echo foo >>
| some_file` is how you publish data.
|
| It's extremely useful.
|
| > what language?
|
| C, specifically for Linux, using epoll and inotify. The
| server is multi-processed, with each process single-
| threaded, and it spawns NPROC workers (or however many you
| specify). But naturally this would work very well in Rust,
| C++, etc., and it could be multi-threaded instead of multi-
| processed.
|
| > Very interesting technique
|
| It's just C10K with a dash of CPS :)
|
| In this particular program the sequence of events is pretty
| linear. Since the progression is fairly linear all the
| continuation functions are one next to the other and one
| can just read them linearly. This makes the code quite
| readable.
| fweimer wrote:
| What's the alternative? Update the state object associated
| with the I/O event in place (and unregister/re-register it
| with the event loop as needed)?
| cryptonector wrote:
| > What's the alternative?
|
| Alternatives include: - thread per-client
| - thread per-client (but green threads) - co-
| routines
|
| Co-routines are very similar to what I'm already doing,
| really. For fairly linear code one can easily end up with
| co-routine functions that are multi-thousand lines of code
| long -- see PuTTY for example.
|
| > Update the state object associated with the I/O event in
| place (and unregister/re-register it with the event loop as
| needed)?
|
| I have a simple data structure to represent the
| continuation, which is really more like a closure,
| consisting of `{fd, data, handler_function}` (this being
| C). Events are typically one-shots. When queuing I/O I just
| update the handler_function of that struct and then
| register the one-shot event with epoll, re-using this
| "closure" object. When it comes time to close a connection
| and we're done handling events we free (or, rather, put on
| a free-list) the closure, and destruct the client data
| structure.
| joe_the_user wrote:
| _In case anyone is wondering, "when would I EVER use this (in
| hand-written code)?", it's a trick that makes DSL (domain
| specific language) and small language implementation much
| easier._
|
| Sure but that's still saying "this only gets used implementing
| a language". And that's OK because the continuation construct
| (A half-executed function passed to another function, wtf)
| seems like something I'd be horrified to find in the code of a
| normal application.
| codebje wrote:
| Callbacks with closures are continuations with bad syntax.
| Admittedly one could call those horrifying without being
| accused widely of hyperbole, so it's not a great counter to
| your point.
| upghost wrote:
| I'm not disagreeing with you. The three examples I mentioned
| were specifically about overcoming the limitations presented
| by some runtimes. If you have to reach for these, typically
| you know you are already swimming upstream. But sometimes the
| price of adopting a new tool is more expensive than using a
| specialized (hopefully well documented) application of CPS on
| the existing infrastructure.
|
| In the second example, for instance, you could throw an
| exception that gets caught higher up the stack. And in fact,
| that's often how call/cc with lexically scoped continuations
| are implemented in languages that do not support first class
| continuations.
|
| In that case, it really comes down to how you and your team
| feel about "exceptions as flow control".
|
| Of course, exception-based flow control doesn't help in
| situations where you are deeply recursing and have a limited
| stack. This is where "trampolining" using CPS is very
| effective.
|
| Typically languages implementing this go out of their way to
| hide the CPS though, because most people react to seeing it
| the way you did. And I don't blame them, it's pretty
| horrifying as you said!
| cma wrote:
| CPS continuation-passing style
| mattalex wrote:
| This is essentially the principle behind algebraic effects
| (which, in practice, do get implemented as delimited
| continuations):
|
| When you have an impure effect (e.g. check a database, generate
| a random number, write to a file, nondeterministic
| choices,...), instead of directly implementing the impure
| action, you instead have a symbol e.g "read", "generate
| number", ...
|
| When executing the function, you also provide a context of
| "interpreters" that map the symbol to whatever action you want.
| This is very useful, since the actual business logic can be
| analyzed in an isolated way. For instance, if you want to test
| your application you can use a dummy interpreter for "check
| database" that returns whatever values you need for testing,
| but without needing to go to an actual SQL database. It also
| allows you to switch backends rather easily: If your database
| uses the symbols "read", "write", "delete" then you just need
| to implement those calls in your backend. If you want to
| formally prove properties of your code, you can also do that by
| noting the properties of your symbols, e.g. `[?] key. read
| (delete key) = None`.
|
| Since you always capture the symbol using an interpreter, you
| can also do fancy things like dynamically overriding the
| interpreter: To implement a seeded random number generator, you
| can have an interpreter that always overrides itself using the
| new seed. The interpreter would look something like this
|
| ```
|
| Pseudorandom_interpreter(seed)(argument, continuation):
| rnd, new_seed <- generate_pseudorandom(seed, argument)
| with Pseudorandom_interpreter(new_seed):
| continuation(rnd)
|
| ```
|
| You can clearly see the continuation passing style and the
| power of self-overriding your own interpreter. In fact, this is
| a nice way of handeling state in a pure way: Just put something
| other than new_seed into the new interpreter.
|
| If you want to debug a state machine, you can use an
| interpreter like this
|
| ``` replace_state_interpreter(state)(new_state, continuation):
| with replace_state_interpreter(new_state ++ state):
| continuation(head state)
|
| ```
|
| To trace the state. This way the "state" always holds the
| entire history of state changes, which can be very nice for
| debugging. During deployment, you can then replace use a
| different interpreter
|
| ```
|
| replace_state_interpreter(state)(new_state, continuation):
| with replace_state_interpreter(new_state):
| continuation(state)
|
| ```
|
| which just holds the current state.
| upghost wrote:
| That's really interesting. This seems like it would be a
| really good approach to combine something like an otherwise
| pure finite state machine, but with state transitions that
| rely on communicating with external systems.
|
| Normally I emit tokens to a stack which are consumed by an
| interpreter but then it's a bit awkward to feed the results
| back into the FSM, it feels like decoupling just for the sake
| of decoupling even though the systems need to be maintained
| in parallel.
|
| I'll have to explore this approach, thank you!
| _zoltan_ wrote:
| I thought I'm going to read a piece about US's Child Protective
| Services, and clicked. Was a letdown :-)
| TapamN wrote:
| I was hoping, but not expecting, for it to be about the Capcom
| arcade board.
| MobiusHorizons wrote:
| I was pleased to see something properly technical, given that
| that is the supposed focus of this forum. I Hadn't ever heard
| of CPS as an IR, only SSA, and it was very interesting to learn
| about it.
| tekknolagi wrote:
| Funnily enough, both CPS and SSA are US government services
| cryptonector wrote:
| CPS is a predecessor technique to SSA, essentially.
| assbuttbuttass wrote:
| CPS is a cool technique, and makes advanced constructs like
| call/cc trivial to implement. Using CPS techniques in code can
| feel like magic.
|
| But CPS isn't really the state of the art for functional
| compilers anymore. It's complex for a human to read and hard to
| optimize. Something like A-normal form is much easier to work
| with for a compiler intermediate representation
|
| https://en.wikipedia.org/wiki/A-normal_form
| Mathnerd314 wrote:
| If you think ANF is great then explain how to deal with the
| transformation from "E (if x then a else b)" to "if x then E a
| else E b".
| codebje wrote:
| ANF transform will rewrite that to "let a0 = if x then an
| else b in E a0".
|
| This isn't identical to floating the call to E inside the
| "if", obviously, but would have (within predictable
| boundaries) the same performance. If this code went through
| LLVM to produce machine code I suspect it would wind up
| identical as LLVM will likely rewrite the duplicated call to
| be outside the "if".
| mrkeen wrote:
| See join points:
| https://dl.acm.org/doi/pdf/10.1145/3062341.3062380
| maplant wrote:
| I wouldn't call either "state of the art", it all depends on
| what you're doing and what you want to achieve. Many compilers
| use several different IRs to achieve different things
| childintime wrote:
| The author references [scrapscript](https://scrapscript.org/),
| hopefully we'll hear more about it.
| tekknolagi wrote:
| See my other blog posts and GitHub if you would like some more!
| upghost wrote:
| I realize this is something I should probably know but I've
| never been able to pin down a definition nor have I used a
| language with this feature -- could you maybe explain what a
| "content addressable language" is and why that is important?
| It seems like it's _really_ important but I have no idea why.
| You explain it here[1] but it 's somewhat brief. I realize it
| probably should be self-evident why it's important but it's
| not clicking for me.
|
| It's somewhat like when Jack Skelington is trying to explain
| the meaning of Christmas to the residents of Halloween
| Town[2] and somehow it hasn't stuck yet.
|
| [1]: https://scrapscript.org/#ContentAddressibleEverything
|
| [2]: https://youtu.be/l6QZUcRJw-A
| tekknolagi wrote:
| To be clear, I am not Taylor and I did not come up with
| scrapscript! That website belongs to him and I'm just a
| random guy on the internet who emailed him and built an
| implementation.
|
| The way I think about it is kind of like software versions.
| Say you want to do some kind of reproducible software. You
| can write that your software depends on jazzinator==3.0.0
| but that's kind of unsatisfying because it relies either on
| tools or people to make sure that versions are immutable.
| It also relies on jazzinator to specify its own
| dependencies pretty strictly. So maybe you end up inventing
| the concept of a lockfile to partially solve that problem,
| but you still have the issue of relying on a third party to
| not change things up on you.
|
| _However_ , if you can address a library by a name that is
| computed from its code (a hash), things get interesting.
| You end up with this tree/DAG of dependencies where every
| dependency is immutable--you can't change the code without
| also changing the name. It's a Merkle DAG like Git. You do
| lose the ability to say things like "above version 2.5"...
| sort of.
|
| If you also build a ref or tag system (similar to Git
| refs), you can kinda do normal style versioning on top of
| this content addressable world. But again, that relies on
| someone or something (maybe your hash-based store) to keep
| a map point-version names to hash names.
|
| The other thing that's interesting with scrapscript in
| particular is since everything is addressable, you can use
| (sort of implicitly import/download) any value from any
| other program in a repository readable by you. But no
| scrapscript implementation fully does this yet, as far as
| I'm aware. For a fully working system, check out Unison.
| upghost wrote:
| Oh that is really fascinating, thank you. That has some
| really interesting implications for dependency management
| that I will have to think through. It sounds like that
| implies some really useful tree properties on dependency
| checking, like if the part of the program you are relying
| on matches a certain hash you don't need to check down
| the chain because you can be sure the rest of the nodes
| are as you expect.
|
| Much appreciated!
| tekknolagi wrote:
| Now that I'm back at a computer:
|
| * https://bernsteinbear.com/blog/scrapscript/
|
| * https://bernsteinbear.com/blog/scrapscript-baseline/
|
| * https://bernsteinbear.com/blog/scrapscript-tricks/
|
| * https://bernsteinbear.com/blog/type-inference/
|
| * https://bernsteinbear.com/blog/row-poly/
|
| and the repo for the main implementation:
| https://github.com/tekknolagi/scrapscript
| jalict wrote:
| Scripted Pipelines in Jenkins are also using CPS in Groovy!
| https://plugins.jenkins.io/workflow-cps/#plugin-content-tech...
| maratc wrote:
| Declarative Pipelines do that too (i.e. not only Scripted
| ones.)
| fovc wrote:
| It's also useful in typed languages to introduce an existentially
| quantified type
| yapyap wrote:
| not gonna lie this title made me think this was gonna be some
| child protective services horror story or something
| purplesyringa wrote:
| > If you don't want to do any of this trampoline stuff, you can
| also do the One Big Switch approach which stuffs each of the funs
| and conts into a case in a massive switch statement. Calls become
| gotos.
|
| Just wanted to add that this is very similar to how async/await
| JavaScript code was compiled for runtimes that didn't support
| generators back in the day:
| http://facebook.github.io/regenerator/
| Joker_vD wrote:
| There is actually a way to produce CPS without lots of
| administrative redexes, which uses the same main trick that
| "higher-order transform" utilizes: tracking the whether the
| expression being transformed is in tail context or general
| context, and translating accordingly -- but without using meta-
| continuations because boy are those mind-boggling.
|
| Instead, the normal loops are used, and the subresults are
| accumulated in some builder structure which is then finalized to
| produce a complete tree of a CPS expression. The main trick is to
| figure out just _what_ this builder structure should be, and it
| turns out it 's a tree zipper of sorts!
|
| The margins of this comment are not big enough to sketch the
| whole thing but it's actually not as formidable as it sounds
| ("tree zipper", wooooh): it's mostly just a stack of tree-
| building instructions which can be easily extended/completed. An
| alternative would be to have a _mutable_ CPS expression as the
| result, and keep appending things into it, but it requires lots
| of tree traversal and lot of care to not accidentally leave the
| thing in not quite constructed state which is way too easy in my
| experience.
|
| I think I'll make and publish a gist with it when I get home
| because I don't believe I've ever seen it done this way anywhere.
| tekknolagi wrote:
| Please show this gist. I would enjoy linking to it
| upghost wrote:
| Sounds like this is for a syntactic or metaprogramming
| transformation over source code? I assume this implies you are
| either working with a homoiconic language or you have already
| converted the source to AST?
|
| However it sounds like a very portable technique, would be
| interested to see it.
___________________________________________________________________
(page generated 2024-12-26 23:01 UTC)