[HN Gopher] Compilers for free with weval
___________________________________________________________________
Compilers for free with weval
Author : todsacerdoti
Score : 153 points
Date : 2024-05-19 11:33 UTC (11 hours ago)
(HTM) web link (bernsteinbear.com)
(TXT) w3m dump (bernsteinbear.com)
| tekknolagi wrote:
| Author here! Feel free to ask questions or leave comments or
| whatever.
| burakemir wrote:
| Thanks for the write-up.
|
| Neil Jones' (free) book about partial evaluation deserves a
| mention here, since it explains how/why it works in some
| detail.
|
| Query languages are a very cool application of this stuff. I
| was planning to use wasm for the Mangle implementation (a
| language extending Datalog, implemented as a library for
| deductive databases), following Tiark Rompf and Nada Amin's "A
| SQL to C compiler in 500 lines of code"... but of course I only
| have the interpreter for now and started a Rust rewrite
| instead.
| quag wrote:
| The free book on partial evaluation:
| http://www.itu.dk/people/sestoft/pebook/
| sestep wrote:
| Is the book enlightening beyond what one can learn from
| sources like this post and the aforementioned paper by
| Rompf and Amin? I'm trying to decide whether I should spend
| the time to dig deeper into it.
| azakai wrote:
| I understand that by making the bytecode constant a lot of
| optimizations become possible. But I can't see how you can
| unroll the interpreter loop past unknown inputs to the program?
|
| That is, if the program is a pure function like power(x, y)
| (from the top of the article) then I understand if the bytecode
| is constant and the inputs x, y are constant then you can
| unroll completely. In fact, in that case, you can just do a
| full evaluation of the entire program to a constant. But, if x
| and y are unknown at compile time then you can do very little,
| I think?
|
| I guess the example at the top of the article is saying that if
| you know y but not x then partial evaluation can still help a
| lot. But in a real-world program the first unknown input that
| causes a significant branch in execution will block progress,
| won't it? (edit: in particular unrolling has to stop, unless
| you consider multiple paths)
| tekknolagi wrote:
| Weval has intrinsics that unroll the loop and specialize on
| the PC, merging paths with the same PC. So if you pick a good
| specialization context like PC, the unrolling stops
| eventually.
| n_plus_1_acc wrote:
| For a given function like `power`, you can unroll the
| interpreter loop and eliminate branches on the instruction,
| so the code has less branching overall.
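To make the known-exponent case from the article concrete, here is a toy partial evaluator (hypothetical Python; names like `PROGRAM` and `specialize` are made up for illustration, and this is not weval's actual mechanism). With `y` and the bytecode static and `x` dynamic, every opcode dispatch and every branch is decided at specialization time, so the interpreter loop unrolls into straight-line residual code:

```python
# Toy mix: specialize a tiny bytecode interpreter on a static y.
# Illustrative sketch only -- not how weval works internally.

PROGRAM = [                      # power(x, y) as a small register bytecode
    ("const", "acc", 1),
    ("label", "loop"),
    ("jz", "y", "done"),         # exit the loop when y == 0
    ("mul", "acc", "x"),
    ("dec", "y"),
    ("jmp", "loop"),
    ("label", "done"),
    ("ret", "acc"),
]

def interp(code, x, y):
    """Generic interpreter: dispatches on every instruction."""
    labels = {i[1]: n for n, i in enumerate(code) if i[0] == "label"}
    env, pc = {"x": x, "y": y}, 0
    while True:
        op, *args = code[pc]
        if op == "const":  env[args[0]] = args[1]
        elif op == "mul":  env[args[0]] *= env[args[1]]
        elif op == "dec":  env[args[0]] -= 1
        elif op == "jz" and env[args[0]] == 0:
            pc = labels[args[1]]; continue
        elif op == "jmp":
            pc = labels[args[0]]; continue
        elif op == "ret":  return env[args[0]]
        pc += 1

def specialize(code, y):
    """Partial evaluator: y and the bytecode are static, x is dynamic.
    All control flow depends only on static data, so the dispatch loop
    runs to completion now, emitting residual code for dynamic ops."""
    labels = {i[1]: n for n, i in enumerate(code) if i[0] == "label"}
    env = {"y": y}               # static store; x is absent => dynamic
    body, pc = [], 0
    while True:
        op, *args = code[pc]
        if op == "const":
            env[args[0]] = args[1]            # fold at specialize time
        elif op == "dec":
            env[args[0]] -= 1                 # y is static: fold
        elif op == "jz":
            if env[args[0]] == 0:
                pc = labels[args[1]]; continue
        elif op == "jmp":
            pc = labels[args[0]]; continue
        elif op == "mul":                     # touches dynamic x: residualize
            if args[0] in env:                # lift the static value first
                body.append(f"{args[0]} = {env.pop(args[0])}")
            body.append(f"{args[0]} = {args[0]} * {args[1]}")
        elif op == "ret":
            body.append(f"return {env.get(args[0], args[0])}")
            break
        pc += 1
    src = "def power_y(x):\n" + "".join(f"    {s}\n" for s in body)
    ns = {}
    exec(src, ns)                # the residual program: dispatch is gone
    return ns["power_y"], src
```

`specialize(PROGRAM, 3)` yields a residual `power_y` whose body is just `acc = 1` followed by three `acc = acc * x` lines: no opcode tests, no loop. This is also where azakai's concern shows up: if `y` were dynamic, the `jz` branch could not be decided statically, and the specializer would need a strategy like weval's specialize-on-PC to stop unrolling.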
| posnet wrote:
| I will never not be amazed by Futamura Projections.
| metadat wrote:
| They are pretty remarkable.
|
| https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...
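For reference, the three projections in the usual mix-equation notation, where interp is an interpreter for a language S and mix is a partial evaluator (this is the standard textbook formulation, not anything specific to weval):

```
target   = mix(interp, source)     -- 1st projection: a compiled program
compiler = mix(mix, interp)        -- 2nd projection: a compiler for S
cogen    = mix(mix, mix)           -- 3rd projection: a compiler generator
```

weval as discussed in this thread realizes the first projection.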
| rst wrote:
| The most elaborate deployed version of this idea is probably
| PyPy, an alternative Python implementation that works by giving a
| JIT-compiling partial evaluator a hint-annotated Python
| interpreter as the program, and the Python code to be executed as
| its input. Slightly abstruse overview here:
| https://doc.pypy.org/en/latest/architecture.html
| tekknolagi wrote:
| I think PyPy moved away from partial evaluation to trace
| optimization, fwiw
| alserio wrote:
| Futamura projections should still be the basis for GraalVM
| evacchi wrote:
| they are for Truffle-based language implementations:
| https://chrisseaton.com/rubytruffle/pldi17-truffle/pldi17-tr...
| chubot wrote:
| PyPy used partial evaluation at one point, but now it's "meta-
| tracing":
|
| _Why did we Abandon Partial Evaluation?_ -
| https://www.pypy.org/posts/2018/09/the-first-15-years-of-pyp...
| 082349872349872 wrote:
| Taking meta-tracing to the next level would reconstruct not
| just hot loops but even the original CFG, as speculated in:
| https://news.ycombinator.com/item?id=38778372
|
| Anyone know of non-fictional work in this area?
| cfallin wrote:
| weval author here (thanks Max for the blog post!). Also AMA!
|
| The talk about weval that Max mentions was at NEU and also CMU;
| the latter was recorded and is here: https://vimeo.com/940568191
|
| I also plan to write up a blog post on weval in depth, plus its
| application to SpiderMonkey-on-Wasm, in a few months; it's pretty
| exciting though, currently getting 4-5x speedups on some
| benchmarks on a decidedly nontrivial interpreter!
| JonChesterfield wrote:
| I remember being really excited about the derive-a-compiler-from-
| a-partial-evaluator premise when I first ran across it. You
| genuinely do get a compiler from an interpreter plus a partial
| evaluator. That trick does work.
|
| The gotcha I didn't spot at the time is that a compiler is (I
| think, could be argued either way) easier to write than a partial
| evaluator. The "get it for free" bit only really applies if
| someone else gave you the partial evaluator _and_ if you somehow
| don't need to spend ages debugging why that partial evaluator
| isn't behaving like you hoped.
|
| Now that I'm somewhat more beaten down by toolchain dev, I'm
| starting to think compilers are easier to write than
| interpreters. It's definitely not a clear win in favour of the
| interpreter. If you compile to x64, or to C, or to javascript or
| whatever, you now have a thing you can debug with whatever tools
| are native to that target. If you run in an interpreter, you get
| to debug that interpreter running the program, with whatever
| somewhat ad hoc debugging tools you put into the interpreter
| yourself.
|
| Getting useful semantic error messages out of an interpreter at
| partial evaluation time (aka the "compile time" of the aggregate
| tool) is probably solvable but not likely to work out of the box.
| Partial eval isn't really a phase separation friendly thing.
| cfallin wrote:
| This is indeed a good point and something I want to write about
| when I eventually do a blog post on weval.
|
| A few counterpoints that I'd offer (and what led me to still
| take this approach):
|
| - If the target has sub-par debugging infrastructure, it can be
| easier to debug an interpreter (which is portable) then apply
| the semantics-preserving PE. In particular when targeting Wasm
| outside the browser, there is... not really a good debug
| experience, anywhere, for that. It was way easier to get an
| interpreter right by developing on native with gdb/rr/whatever,
| and then separately ensure weval preserves semantics (which I
| tested with lockstep differential execution).
|
| - Maintenance: if one is going to have an interpreter and a
| compiler anyway (and one often wants or needs this, e.g. to
| handle eval()), it's easier for both to come from the same
| source.
|
| - Amortized cost: in the Wasm world we want AOT compilers for
| many languages eventually; there are interpreter ports with no
| Wasm backends; developing weval was a one-time cost and we can
| eventually apply it multiple times.
|
| - If the semantics of the existing interpreter are quite
| nontrivial, that can push the balance the other way. I designed
| weval as part of my work on SpiderMonkey; extremely nontrivial
| interpreter with all sorts of edge cases; replicating that in a
| from-scratch compiler seemed a far harder path. (It's since
| been done by someone else and you can find the "wasm32 codegen"
| patchset in Bugzilla but there are other phasing issues with it
| from our use-case's PoV; it's not true AOT, it requires codegen
| at runtime.)
|
| I don't think the tradeoff is always clear and if one is
| building a language from scratch, and targeting a simple ISA,
| by all means write a direct compiler! But other interesting
| use-cases do exist.
| JonChesterfield wrote:
| The divergent semantics risk between the interpreter and the
| compiler is a really big deal. It's genuinely difficult to
| get a language implementation to behave exactly as specified,
| even when the spec is do the same as some other
| implementation. Treating "compiled code" as specialising the
| interpreter with respect to the program is a great solution
| to that, since the bugs in the optimiser/partial-evaluator
| (they're kind of the same thing) are unlikely to be of the
| same class as bugs from independent implementations.
|
| Wasm is a really solid target for heroic compiler
| optimisations. It's relatively precisely specified, user
| facing semantic diagnostics are in some language front end
| out of sight, aliasing is limited and syscalls are finite
| with known semantics. Pretty much because it was designed by
| compiler people. You've picked a good target for this
| technique.
| titzer wrote:
| > The gotcha I didn't spot at the time is that a compiler is (I
| think, could be argued either way) easier to write than a
| partial evaluator.
|
| I agree that this is sometimes true, but there are several,
| IMHO bigger, issues:
|
| 1. The partial evaluation of the interpreter might degenerate
| back to getting a copy of the interpreter loop (or multiple!)
| and not achieve any speedup, just memory overhead
|
| 2. The partial evaluator might need additional annotations on
| what is and is not mutable; it might need a lot of help in
| order to constant-fold away a key piece of data. Tuning that
| and getting it wrong can result in #1.
|
| 3. Partially evaluating the interpreter is a _pretty slow_
| compiler. You need to do the second Futamura projection (apply
| the partial evaluator to an interpreter loop without the user
| code) in order to get a fast compiler. That means the partial
| evaluator needs to be written in the same IR as its own input
| language.
|
| That said, I've chatted a bit with Chris offline about this,
| and I think this work is really cool and promising. I might
| tinker with this a little in Wizard too.
| JonChesterfield wrote:
| That is better expressed than my rambling! The Futamura
| projection is satisfied by passing the "program" directly to
| the interpreter without optimising either, which is sometimes
| useful as a packaging exercise and does look somewhat like a
| non-optimising compiler.
|
| If you want this trickery to make a useful compiler, the
| partial evaluator picks up global value numbering, dead code
| elimination, loop transforms - the effective partial
| evaluator is very much the optimisation passes of a compiler.
| It can still be a win if someone else wrote said partial
| evaluator.
|
| Point 2 is the usual "my compiler is not sufficiently smart,
| I must annotate the program" problem, with the slight twist
| that you're annotating the interpreter in the hope that it
| does useful things with the end program. Interacts with
| hoping someone else built the dependency well.
|
| And yeah, generated compilers in this fashion have a
| reputation for being slow, and for not being great optimising
| compilers, where self application might dig oneself out of
| the hole. Very like adding optimisations to your compiler to
| make your compiler faster when building your compiler.
|
| All in all, the "compiler for free" tagline (not specifically
| this post; it appears in a lot of Futamura references)
| feels a bit like an inside joke. It reminds me of the sad
| discovery that a metacircular interpreter can't interpret
| itself after all since what you've written is a heavily
| obfuscated infinite loop.
| teo_zero wrote:
| If the wasi+weval version is better than the compiled one,
| doesn't it mean that the compiler did a bad job of optimizing
| out the constant parts?
|
| To be fair, in order for weval to do its magic, the code has to
| be changed and annotated. I'm wondering if a similar set of
| annotations could convince clang to generate a binary that runs
| in less than 40 ms.
| foota wrote:
| Clang doesn't know about the static code, so it can't do the
| optimizations this does.
|
| You could probably do the same thing with some hacky C++ code
| gen and templates, but I'm not sure clang et al would be able
| to optimize them.
| tekknolagi wrote:
| In the limit, that's called Copy and Patch!
| abeppu wrote:
| This is cool, but the post swaps in a tiny interpreter because it says
| that CPython and SpiderMonkey are both huge. It then goes on to
| talk about modifying the interpreter to make some specific info
| visible to weval, and I can imagine that those changes would be
| considerable for CPython or SpiderMonkey. But if we ignore those
| specialization sections, and just consider the material up
| through the "Enter weval" section ... could weval have handled
| CPython or SpiderMonkey (or some other real interpreter that
| people use regularly)? Or does the partial evaluation have some
| scaling behavior that explodes or bogs down with large or complex
| interpreters? I.e. was the tiny interpreter needed just to make
| the following sections compact, or does even the basic use of
| weval require a small input?
| tekknolagi wrote:
| You should wait for Chris's upcoming post :) the minimal
| changes are the same as the ones I have for my tiny
| interpreter. There are some more advanced changes you can make
| for more performance.
___________________________________________________________________
(page generated 2024-05-19 23:00 UTC)