[HN Gopher] Writing memory safe JIT compilers
___________________________________________________________________
Writing memory safe JIT compilers
Author : vips7L
Score : 109 points
Date : 2024-06-07 16:07 UTC (6 hours ago)
(HTM) web link (medium.com)
(TXT) w3m dump (medium.com)
| csjh wrote:
| Futamura projections are super cool, reminds me of another post
| recently https://news.ycombinator.com/item?id=40406194
| throwup238 wrote:
| This stuff is way over my head, but if I understand correctly,
| PyPy is the most famous implementation of all three Futamura
| projections to JIT a subset of Python:
| https://gist.github.com/tomykaira/3159910
| simlevesque wrote:
| There's also this classic blogpost:
| http://blog.sigfpe.com/2009/05/three-projections-of-doctor-f...
| jacobp100 wrote:
| Is GraalVM actually competitive with V8?
| papercrane wrote:
| It depends. In my experience GraalVM is generally slower to
| start, but after a few iterations it can be as fast or faster,
| and the effect is amplified if you use the JVM instead of AOT
| (i.e., it's even slower to start, but eventually it can be much
| faster).
|
| Of course this all depends on your specific code and use cases.
| chc4 wrote:
| Truffle/Graal is also able to do some insanely cool things
| related to optimizing across FFI boundaries: if you have a
| Java program that uses the Truffle JavaScript engine for
| scripting, Truffle is able to do JIT optimization
| transparently across the FFI boundary so it has zero
| overhead. IIRC they even have some special API to allow a
| Truffle->native C library to be exposed to the runtime in a
| way that allows it to optimize away a lot of FFI overhead
| or inline the native function into the trace. They were
| advertising this "Polyglot VM" functionality a lot a few
| years ago, although now their marketing mostly focuses on the
| NativeImage part (which helps a lot with the slow startup you
| mention).
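|
| A minimal sketch of that kind of embedding (the
| org.graalvm.polyglot API is real; the doubler function is a
| made-up example). On GraalVM, the guest call below is a
| candidate for inlining into the host's compiled code:
|
|     import org.graalvm.polyglot.Context;
|     import org.graalvm.polyglot.Value;
|
|     public class PolyglotDemo {
|         public static void main(String[] args) {
|             // Host Java evaluating guest JavaScript in-process.
|             try (Context context = Context.create("js")) {
|                 Value doubler = context.eval("js", "(x) => x * 2");
|                 // No process or serialization boundary here, so
|                 // Graal can optimize across the call.
|                 System.out.println(doubler.execute(21).asInt());
|             }
|         }
|     }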
|
| TruffleRuby even had the extremely big brain idea of running
| a _Truffle interpreter for C_ for native extensions instead
| of actually compiling them to native code, just so that
| Truffle can optimize transparently across FFI boundaries.
| https://chrisseaton.com/truffleruby/cext/
| MaxBarraclough wrote:
| I don't have anything to contribute to the Truffle
| discussion, but for those not familiar: Chris Seaton was an
| active participant on Hacker News, until his tragic death
| in late 2022. Wish he was still with us.
|
| https://news.ycombinator.com/threads?id=chrisseaton
|
| https://news.ycombinator.com/item?id=33893120
| chc4 wrote:
| Yes, it's extremely sad :( He was a giant in the Ruby and
| Truffle communities, and TruffleRuby was a monumental
| work for both projects.
| an-unknown wrote:
| > TruffleRuby even had the extremely big brain idea of
| running a Truffle interpreter for C for native extensions
| [...]
|
| TruffleC was a research project and the first attempt at
| running C code on Truffle that I'm aware of. It directly
| interpreted C source code, and while that works for small
| self-contained programs, you quickly run into a lot of
| problems as soon as you want to run larger real-world
| programs. You need everything, including the C library,
| available as pure C code, and you have to deal with the fact
| that a lot of C code relies on undefined or
| implementation-defined behavior (UB/IB). In addition, your C
| parser has to fully adhere to the C standard, and once you
| want to support C++ too (because a lot of code is written in
| C++), you have to start over from scratch. I don't know if
| TruffleC was ever released as open source.
|
| The next (and current) attempt is Sulong, which uses LLVM to
| compile C/C++/Rust/... to LLVM IR ("bitcode") and then
| directly interprets that bitcode. It's a lot better, because
| you don't have to write your own complete C/C++/...
| parser/compiler, but bitcode still has various limitations.
| Essentially, as soon as the program uses handwritten assembly
| somewhere, or does low-level things like setjmp/longjmp,
| things get hairy pretty quickly. Bitcode itself is also
| platform-dependent (think of constants/macros/... that get
| expanded during compilation), and you still need all code and
| libraries in bitcode. Every language uses an ever so slightly
| different set of IR nodes and requires a different runtime
| library, so you have to support each explicitly, and even
| then you can't make it fully memory safe because typical
| programs will just break. In addition, the optimization level
| you choose when compiling the source program can result in
| very different bitcode with very different IR nodes, some of
| which were not supported for a long time (e.g., everything
| related to vectorization). Sulong can load libraries and
| expose them via the Truffle FFI, and AFAIK it can be used for
| C extensions in GraalPython and TruffleRuby. It's open source
| [1] and part of GraalVM, so you can play around with it.
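|
| For illustration, a sketch of how bitcode loaded by Sulong
| can be called from Java through the polyglot API (the "llvm"
| language id is Sulong's; the file and function names here are
| hypothetical):
|
|     import java.io.File;
|     import org.graalvm.polyglot.Context;
|     import org.graalvm.polyglot.Source;
|     import org.graalvm.polyglot.Value;
|
|     public class SulongDemo {
|         public static void main(String[] args) throws Exception {
|             // lib.bc: hypothetical bitcode built with clang -emit-llvm
|             Source src =
|                 Source.newBuilder("llvm", new File("lib.bc")).build();
|             try (Context ctx =
|                     Context.newBuilder().allowAllAccess(true).build()) {
|                 Value library = ctx.eval(src);
|                 // Exported C functions appear as members.
|                 Value add = library.getMember("add");
|                 System.out.println(add.execute(2, 40).asInt());
|             }
|         }
|     }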
|
| Another research project was then to directly interpret
| AMD64 machine code and emulate a Linux userspace
| environment, because that would solve all the problems with
| inline assembly and language compatibility. Although that
| works, it has an entirely different set of problems:
| Graal/Truffle is simply not made for this type of code, and
| as a result the performance is significantly worse than
| Sulong's. You also end up re-implementing the Linux syscall
| interface in your interpreter; you have to deal with all
| the low-level memory features that are available on Linux,
| like mmap/mprotect/..., which have to behave exactly as on
| a real Linux system; and you can't easily export
| subroutines via the Truffle FFI in a way that also works
| with foreign-language objects. It does work with various
| guest languages like C/C++/Rust/Go/... without modifying
| the interpreter, as long as the program is available as a
| native Linux/AMD64 executable and doesn't use any of the
| unimplemented features. This project is also available as
| open source [2], but its focus has somewhat shifted to using
| the interpreter for execution-trace-based program analysis.
|
| AFAIK, none of these projects fully supports multithreading,
| multiprocessing, IPC, and so on. Sulong partially solves this
| by calling into the native C library loaded in the VM for
| subroutines that aren't available as bitcode and by aborting
| on certain unsupported calls like fork/clone, but then you
| obviously lose the advantage of having everything in the
| interpreter.
|
| The conclusion is: however you try to interpret C/C++/...
| code, get ready for a world of pain and incompatibilities
| if you intend to run real-world programs.
|
| [1] https://github.com/oracle/graal/tree/master/sulong
|
| [2] https://github.com/pekd/tracer/tree/master/vmx86
| jacobp100 wrote:
| I'm assuming that's a benchmark that runs a lot of hot code?
| I wonder why the startup is slower if it's based on an
| interpreter, given that all current VMs start that way too.
| hmottestad wrote:
| I haven't tried the JS stuff in GraalVM, but I have tried it
| for Java. It's often faster than the regular JVM; it's
| especially good at escape analysis.
| vips7L wrote:
| You mean with the Graal JIT instead of C2? AFAIK Graal's
| Truffle (Espresso) implementation of Java is still far
| behind HotSpot.
| hmottestad wrote:
| Yeah. The Graal JIT. I haven't tried Espresso, but now I
| got curious...
|
| For anyone else who got curious:
| https://www.graalvm.org/latest/reference-manual/java-on-
| truf...
| pizlonator wrote:
| Their perf claim is based on ancient benchmarks. Looks like the
| Octane suite. I bet they also made sure to ignore startup
| overheads.
|
| I'd only believe their perf claims if they used a more modern
| benchmark suite.
| azakai wrote:
| I am a big fan of automatically generating optimizing VMs from
| interpreters, as Graal (in this article), PyPy, and others do.
| But the other approach, of writing a custom JIT for each
| language, seems much more flexible even if it is more
| dangerous and time-consuming.
|
| On small benchmarks I can believe the performance can be very
| similar between them, but I'd be more interested in large real-
| world codebases. I'm not aware of any myself, and can't seem to
| find any. Does anyone know?
| darby_nine wrote:
| > But the other approach, of writing a custom JIT for each
| language, seems much more flexible even if it is more dangerous
| and time-consuming.
|
| This is also a good illustration of how we ended up with the
| NULL problem. I don't think it's as big of a deal in this
| case, as interpreters/VMs/compilers are designed to be
| fungible in ways that source code was not, but it's something
| worth thinking on.
| almostgotcaught wrote:
| > and also PyPy
|
| this is a perpetually repeated misconception:
|
| > Why did we Abandon Partial Evaluation?
|
| https://www.pypy.org/posts/2018/09/the-first-15-years-of-pyp...
|
| > others do
|
| to my knowledge graal is the only production project using
| futamura projections (what you're talking about)
| azakai wrote:
| PyPy has experimented with multiple approaches here, yes, but
| as far as I know they did not abandon the overall approach of
| automatically converting an interpreter to an optimizing VM,
| which is what I mentioned. That's more general than partial
| evaluation (which is just one way to do such a conversion).
| almostgotcaught wrote:
| > automatically converting an interpreter to an optimizing
| VM
|
| but that is not what PyPy does
|
| > So, how did that tracing JIT generator work? A tracing
| JIT generates code by observing and logging the execution
| of the running program. This yields a straight-line trace
| of operations, which are then optimized and compiled into
| machine code. Of course most tracing systems mostly focus
| on tracing loops.
|
| > As we discovered, it's actually quite simple to apply a
| tracing JIT to a generic interpreter, by not tracing the
| execution of the user program directly, but by instead
| tracing the execution of the interpreter while it is
| running the user program (here's the paper we wrote about
| this approach).
|
| So it's just a tracing JIT but applied to the interpreter.
| One way to put it - it's effectively the same benefit as
| just running jython to begin with.
| oasisaimlessly wrote:
| The fact that they had to make up a new term for the
| technique (a "meta-tracing JIT") should be a hint that
| what they're doing is somewhat novel/unusual.
|
| If you read the paper[1] linked in your quote, you'd see
| that it is not "just a tracing JIT"; the interpreter
| being run under the JIT has some special hooks that let
| it tell the JIT e.g. where the program counter is:
|
| > Since the tracing JIT cannot know which parts of the
| language interpreter are the program counter, the author
| of the language interpreter needs to mark the relevant
| variables of the language interpreter with the help of a
| hint. The tracing interpreter will then effectively add
| the values of these variables to the position key. This
| means that the loop will only be considered to be closed
| if these variables that are making up the program counter
| at the language interpreter level are the same a second
| time. Loops found in this way are, by definition, user
| loops.
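|
| A toy sketch of what that hint buys (this is not PyPy's API;
| in RPython the "green" variables are declared via a
| JitDriver): keying hot-loop detection on the interpreter's pc
| variable makes the tracer find user-program loops rather than
| iterations of the dispatch loop:
|
|     // Toy bytecode for: do { r0--; r1 += r0; } while (r0 != 0)
|     final class MetaTracingSketch {
|         static final int DEC = 0, ADD = 1, JNZ = 2, HALT = 3;
|         static final int HOT = 3; // trace after 3 hits
|
|         public static void main(String[] args) {
|             int[] code = { DEC, ADD, JNZ, /*target*/ 0, HALT };
|             int pc = 0, r0 = 5, r1 = 0;
|             // "Green" key: the user-level pc. Keying on the
|             // interpreter's own loop header instead would fire
|             // on every bytecode and never find the user loop.
|             var hot = new java.util.HashMap<Integer, Integer>();
|             while (code[pc] != HALT) {
|                 if (pc == 0 && hot.merge(pc, 1, Integer::sum) == HOT)
|                     System.out.println("trace user loop at pc=0");
|                 switch (code[pc]) {
|                     case DEC -> { r0--; pc++; }
|                     case ADD -> { r1 += r0; pc++; }
|                     case JNZ -> pc = (r0 != 0) ? code[pc + 1] : pc + 2;
|                     default -> throw new IllegalStateException();
|                 }
|             }
|             System.out.println("r1 = " + r1); // prints r1 = 10
|         }
|     }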
|
| This is vastly distinct from how Jython works.
|
| [1]: https://foss.heptapod.net/pypy/extradoc/-/blob/branc
| h/extrad...
| almostgotcaught wrote:
| > The fact that they had to make up a new term for the
| technique (a "meta-tracing JIT") should be a hint that
| what they're doing is somewhat novel/unusual.
|
| you say this and then quote directly what the novelty is
| and so i ask you - does that piece warrant a whole new
| term?
|
| > This is vastly distinct from how Jython works.
|
| jython isn't _doing_ anything - it's a python
| interpreter that runs on the jvm. my point was that
| that's the same thing: an interpreter for a language that
| itself is being jitted.
| oasisaimlessly wrote:
| > does that piece warrant a whole new term?
|
| In my opinion: Yes, definitely! Without meta-tracing
| techniques, a JIT'd interpreter can only hope to be on
| par with a compiled interpreter, never significantly
| faster. It will never go beyond the limitations of an
| interpreter, because the JIT can't "see" the user code
| that the interpreter is running.
| azakai wrote:
| I'm not familiar enough with Jython, but I do consider
| applying a tracing JIT to an interpreter to be a way to
| automatically convert an interpreter into an optimizing VM.
|
| I suppose if the tracing JIT were very complex and very
| tailored to a single language that might not make sense,
| but my impression of PyPy is that the opposite is true,
| and PyPy can in fact run languages other than Python.
| almostgotcaught wrote:
| > but I do consider applying a tracing JIT to an
| interpreter as a way to automatically convert an
| interpreter to an optimizing VM.
|
| there's nothing automatic about it. the user above you
| quoted from their paper - the user (person writing the
| interpreter) must annotate their code to pass hints to
| the tracing jit.
| azakai wrote:
| It is still largely automatic. Some amount of special
| annotations are needed in all these related approaches.
| For example, in this article, @CompilationFinal is used
| in Graal interpreters.
|
| (It is possible more annotations are needed in PyPy than
| Graal, but still, an annotated interpreter is far, far
| simpler than writing an optimizing JIT!)
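|
| For reference, a minimal sketch of the speculation pattern
| that annotation enables (the Truffle classes and directives
| are real; the node itself is a made-up example). The partial
| evaluator treats the @CompilationFinal field as a constant,
| so the slow path below compiles away until the speculation
| first fails:
|
|     import com.oracle.truffle.api.CompilerDirectives;
|     import com.oracle.truffle.api.CompilerDirectives.CompilationFinal;
|     import com.oracle.truffle.api.nodes.Node;
|
|     final class IntAddNode extends Node {
|         @CompilationFinal private boolean seenOverflow = false;
|
|         long executeAdd(int left, int right) {
|             if (!seenOverflow) {
|                 try {
|                     return Math.addExact(left, right);
|                 } catch (ArithmeticException e) {
|                     // Deoptimize: discard compiled code that
|                     // assumed no overflow, then record the fact.
|                     CompilerDirectives.transferToInterpreterAndInvalidate();
|                     seenOverflow = true;
|                 }
|             }
|             return (long) left + right;
|         }
|     }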
| metadat wrote:
| Futamura projection refresher:
|
| https://en.wikipedia.org/wiki/Partial_evaluation
|
| Badass future space tech.
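|
| A hand-worked illustration of the first projection (a made-up
| mini-language; a real partial evaluator would derive the
| specialized version mechanically):
|
|     import java.util.List;
|
|     public class Futamura {
|         // A tiny stack language: PUSH n, ADD, MUL_INPUT.
|         sealed interface Op permits Push, Add, MulInput {}
|         record Push(int n) implements Op {}
|         record Add() implements Op {}
|         record MulInput() implements Op {}
|
|         // The general interpreter: eval(program, input).
|         static int eval(List<Op> program, int input) {
|             var stack = new java.util.ArrayDeque<Integer>();
|             for (Op op : program) {
|                 switch (op) {
|                     case Push p -> stack.push(p.n);
|                     case Add a -> stack.push(stack.pop() + stack.pop());
|                     case MulInput m -> stack.push(stack.pop() * input);
|                 }
|             }
|             return stack.pop();
|         }
|
|         // First Futamura projection, by hand: specializing eval
|         // to the fixed program [PUSH 2, PUSH 3, ADD, MUL_INPUT]
|         // folds away the dispatch loop, leaving residual code.
|         static int evalSpecialized(int input) {
|             return (2 + 3) * input;
|         }
|
|         public static void main(String[] args) {
|             List<Op> prog = List.of(new Push(2), new Push(3),
|                                     new Add(), new MulInput());
|             System.out.println(eval(prog, 7));       // 35
|             System.out.println(evalSpecialized(7));  // 35
|         }
|     }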
| quikoa wrote:
| Is there a difference in the amount of memory used for both
| methods?
| dzaima wrote:
| Of course, this just moves the safety question from a
| JavaScript-specific JIT to the Truffle JIT compiler and the
| partial evaluator. This can have some benefits (only one JIT
| to improve/fix across many languages), but it can still have
| safety bugs.
|
| And the big tradeoff is that the general JIT may be less
| capable of doing language-specific optimizations. Such
| optimizations do have a chance of introducing bugs, as the
| linked V8 blog shows, but they can also be correct and
| significantly improve perf in cases where the general JIT
| doesn't have the necessary info to do it itself.
| krylon wrote:
| The first two times, my brain wanted to read Futamura as
| _Futurama_. Silly me.
| aardvark179 wrote:
| I may have made that verbal slip while giving a talk on
| Truffle.
| grashalm wrote:
| I have worked on Truffle for more than 10 years, and I
| recently wrote a comment on Hacker News using Futurama
| instead of Futamura. That comment had it wrong twice.
| krylon wrote:
| I am _so_ glad I'm not the only one whose brain pulls such
| pranks. Thank you!
___________________________________________________________________
(page generated 2024-06-07 23:00 UTC)