[HN Gopher] Writing memory safe JIT compilers
       ___________________________________________________________________
        
       Writing memory safe JIT compilers
        
       Author : vips7L
       Score  : 109 points
       Date   : 2024-06-07 16:07 UTC (6 hours ago)
        
 (HTM) web link (medium.com)
 (TXT) w3m dump (medium.com)
        
       | csjh wrote:
       | Futamura projections are super cool, reminds me of another post
       | recently https://news.ycombinator.com/item?id=40406194
        
         | throwup238 wrote:
         | This stuff is way over my head but if I understand correctly,
         | PyPy is the most famous implementation of all three Futamura
         | projections to JIT a subset of Python:
         | https://gist.github.com/tomykaira/3159910
        
         | simlevesque wrote:
         | There's also this classic blogpost:
         | http://blog.sigfpe.com/2009/05/three-projections-of-doctor-f...
        
       | jacobp100 wrote:
       | Is GraalVM actually competitive with V8?
        
         | papercrane wrote:
          | It depends. In my experience GraalVM is generally slower to
          | start, but after a few iterations it can be as fast or faster,
          | and the effect is amplified if you use the JVM instead of AOT
          | (i.e. it's even slower to start, but eventually it can be much
          | faster).
          | 
          | Of course this all depends on your specific code and use
          | cases.
        
           | chc4 wrote:
            | Truffle/Graal is also able to do some insanely cool things
            | related to optimizing across FFI boundaries: if you have a
            | Java program that uses the Truffle JavaScript engine for
            | scripting, Truffle is able to do JIT optimization
            | transparently across the FFI boundary so it has zero
            | overhead. IIRC they even have some special API that allows a
            | Truffle->native C library to be exposed to the runtime in a
            | way that lets it optimize away a lot of FFI overhead or
            | inline the native function into the trace. They were
            | advertising this "Polyglot VM" functionality a lot a few
            | years ago, although now their marketing mostly focuses on the
            | NativeImage part (which helps a lot with the slow startup you
            | mention).
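            | 
            | As a concrete sketch of that kind of embedding (a made-up
            | example using the org.graalvm.polyglot API, not code from
            | the article):
            | 
            |     import org.graalvm.polyglot.Context;
            |     import org.graalvm.polyglot.Value;
            | 
            |     public class PolyglotDemo {
            |         public static void main(String[] args) {
            |             try (Context context = Context.create("js")) {
            |                 // Evaluate a JS function; on GraalVM, Truffle can
            |                 // JIT-compile it and optimize calls from Java into
            |                 // it across the language boundary.
            |                 Value square = context.eval("js", "(n) => n * n");
            |                 System.out.println(square.execute(21).asInt()); // 441
            |             }
            |         }
            |     }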
           | 
           | TruffleRuby even had the extremely big brain idea of running
           | a _Truffle interpreter for C_ for native extensions instead
           | of actually compiling them to native code, just so that
           | Truffle can optimize transparently across FFI boundaries.
           | https://chrisseaton.com/truffleruby/cext/
        
             | MaxBarraclough wrote:
             | I don't have anything to contribute to the Truffle
             | discussion, but for those not familiar: Chris Seaton was an
             | active participant on Hacker News, until his tragic death
             | in late 2022. Wish he was still with us.
             | 
             | https://news.ycombinator.com/threads?id=chrisseaton
             | 
             | https://news.ycombinator.com/item?id=33893120
        
               | chc4 wrote:
               | Yes, it's extremely sad :( He was a giant in the Ruby and
               | Truffle communities, and TruffleRuby was a monumental
               | work for both projects.
        
             | an-unknown wrote:
             | > TruffleRuby even had the extremely big brain idea of
             | running a Truffle interpreter for C for native extensions
             | [...]
             | 
              | TruffleC was a research project and the first attempt at
              | running C code on Truffle that I'm aware of. It directly
              | interpreted C source code, and while that works for small
              | self-contained programs, you quickly run into a lot of
              | problems as soon as you want to run larger real world
              | programs. You need everything, including the C library,
              | available as pure C code, and you have to deal with the
              | fact that a lot of C code relies on undefined or
              | implementation-defined behavior (UB/IB). In addition, your
              | C parser has to fully adhere to the C standard, and once
              | you want to support C++ too (because a lot of code is
              | written in C++), you have to start over from scratch. I
              | don't know if TruffleC was ever released as open source.
             | 
              | The next / current attempt is Sulong, which uses LLVM to
              | compile C/C++/Rust/... to LLVM IR ("bitcode") and then
              | directly interprets that bitcode. It's a lot better,
              | because you don't have to write your own complete
              | C/C++/... parser/compiler, but bitcode still has various
              | limitations. Essentially, as soon as the program uses
              | handwritten assembly code somewhere, or does low level
              | things like setjmp/longjmp, things get hairy pretty
              | quickly. Bitcode itself is also platform dependent (think
              | of constants/macros/... that get expanded during
              | compilation), you still need all code/libraries in
              | bitcode, every language uses an ever so slightly different
              | set of IR nodes and requires a different runtime library,
              | so you have to support each of them explicitly, and even
              | then you can't make it fully memory safe because typical
              | programs will just break. In addition, the optimization
              | level you choose when compiling the source program can
              | result in very different bitcode with very different IR
              | nodes, some of which were not supported for a long time
              | (e.g., everything related to vectorization). Sulong can
              | load libraries and expose them via the Truffle FFI, and
              | AFAIK it can be used for C extensions in GraalPython and
              | TruffleRuby. It's open source [1] and part of GraalVM, so
              | you can play around with it.
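              | 
              | As a rough sketch of what embedding Sulong looks like (the
              | file name is a placeholder; the bitcode would come from
              | e.g. `clang -c -emit-llvm program.c`):
              | 
              |     import java.io.File;
              |     import java.io.IOException;
              |     import org.graalvm.polyglot.Context;
              |     import org.graalvm.polyglot.Source;
              | 
              |     public class SulongDemo {
              |         public static void main(String[] args) throws IOException {
              |             // Load LLVM bitcode as a polyglot source ("llvm" is
              |             // the language id of the GraalVM LLVM runtime).
              |             Source bitcode =
              |                 Source.newBuilder("llvm", new File("program.bc")).build();
              |             try (Context context =
              |                      Context.newBuilder().allowAllAccess(true).build()) {
              |                 context.eval(bitcode);
              |             }
              |         }
              |     }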
             | 
             | Another research project was then to directly interpret
             | AMD64 machine code and emulate a Linux userspace
             | environment, because that would solve all the problems with
             | inline assembly and language compatibility. Although that
             | works, it has an entirely different set of problems:
              | Graal/Truffle is simply not made for this type of code,
              | and as a result the performance is significantly worse
              | than Sulong's. You also end up re-implementing the Linux
              | syscall
             | interface in your interpreter, you have to deal with all
             | the low level memory features that are available on Linux
             | like mmap/mprotect/... and they have to behave exactly as
             | on a real Linux system, and you can't easily export
             | subroutines via Truffle FFI in a way that they also work
             | with foreign language objects. It does work with various
             | guest languages like C/C++/Rust/Go/... without modifying
              | the interpreter, as long as the program is available as a
              | native Linux/AMD64 executable and doesn't use any of the
             | unimplemented features. This project is also available as
             | open source [2], but its focus somewhat shifted to using
             | the interpreter for execution trace based program analysis.
             | 
              | Things that AFAIK none of these projects fully support are
              | multithreading and multiprocessing, IPC, and so on. Sulong
              | partially solves this
             | by calling into the native C library loaded in the VM for
             | subroutines that aren't available as bitcode and aborting
             | on certain unsupported calls like fork/clone, but then you
             | obviously lose the advantage of having everything in the
             | interpreter.
             | 
              | The conclusion is: however you try to interpret C/C++/...
              | code, get ready for a world of pain and incompatibilities
              | if you intend to run real world programs.
             | 
             | [1] https://github.com/oracle/graal/tree/master/sulong
             | 
             | [2] https://github.com/pekd/tracer/tree/master/vmx86
        
           | jacobp100 wrote:
           | I'm assuming that's a benchmark that runs a lot of hot code?
           | I wonder why the start up is slower, if it's based off an
           | interpreter, and all current VMs start that way too
        
         | hmottestad wrote:
         | I haven't tried the JS stuff in GraalVM, but I have tried it
          | for Java. It's often faster than the regular JVM; it's
          | especially good at escape analysis.
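          | 
          | A toy illustration (not from the article) of the kind of code
          | where escape analysis pays off: the allocation below never
          | escapes the method, so a compiler that proves this can
          | scalar-replace it and skip the heap allocation entirely.
          | 
          |     public class EscapeDemo {
          |         record Point(double x, double y) {}
          | 
          |         static double lengthSquared(double x, double y) {
          |             Point p = new Point(x, y); // never escapes
          |             return p.x() * p.x() + p.y() * p.y();
          |         }
          | 
          |         public static void main(String[] args) {
          |             double sum = 0;
          |             for (int i = 0; i < 1_000_000; i++) {
          |                 sum += lengthSquared(i, i);
          |             }
          |             System.out.println(sum);
          |         }
          |     }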
        
           | vips7L wrote:
            | You mean with the Graal JIT instead of C2? AFAIK Graal's
            | Truffle (Espresso) implementation of Java is still far
            | behind HotSpot.
        
             | hmottestad wrote:
             | Yeah. The Graal JIT. I haven't tried Espresso, but now I
             | got curious...
             | 
             | For anyone else who got curious:
             | https://www.graalvm.org/latest/reference-manual/java-on-
             | truf...
        
         | pizlonator wrote:
         | Their perf claim is based on ancient benchmarks. Looks like the
         | Octane suite. I bet they also made sure to ignore startup
         | overheads.
         | 
         | I'd only believe their perf claims if they used a more modern
         | benchmark suite.
        
       | azakai wrote:
       | I am a big fan of automatically generating optimizing VMs from
       | interpreters, like Graal (in this article) and also PyPy and
       | others do. But the other approach, of writing a custom JIT for
       | each language, seems much more flexible even if it is more
       | dangerous and time-consuming.
       | 
       | On small benchmarks I can believe the performance can be very
       | similar between them, but I'd be more interested in large real-
       | world codebases. I'm not aware of any myself, and can't seem to
       | find any. Does anyone know?
        
         | darby_nine wrote:
         | > But the other approach, of writing a custom JIT for each
         | language, seems much more flexible even if it is more dangerous
         | and time-consuming.
         | 
         | This is also a good illustration of how we ended up with the
         | NULL problem. I don't think it's as big of a deal in this case
         | as interpreters/vms/compilers are designed to be fungible in
         | ways that source code was not, but it's something worth
         | thinking on.
        
         | almostgotcaught wrote:
         | > and also PyPy
         | 
         | this is a perpetually repeated misconception:
         | 
         | > Why did we Abandon Partial Evaluation?
         | 
         | https://www.pypy.org/posts/2018/09/the-first-15-years-of-pyp...
         | 
         | > others do
         | 
         | to my knowledge graal is the only production project using
         | futamura projections (what you're talking about)
        
           | azakai wrote:
           | PyPy has experimented with multiple approaches here, yes, but
           | as far as I know they did not abandon the overall approach of
           | automatically converting an interpreter to an optimizing VM,
           | which is what I mentioned. That's more general than partial
           | evaluation (which is just one way to do such a conversion).
        
             | almostgotcaught wrote:
             | > automatically converting an interpreter to an optimizing
             | VM
             | 
             | but that is not what PyPy does
             | 
             | > So, how did that tracing JIT generator work? A tracing
             | JIT generates code by observing and logging the execution
             | of the running program. This yields a straight-line trace
             | of operations, which are then optimized and compiled into
             | machine code. Of course most tracing systems mostly focus
             | on tracing loops.
             | 
             | > As we discovered, it's actually quite simple to apply a
             | tracing JIT to a generic interpreter, by not tracing the
             | execution of the user program directly, but by instead
             | tracing the execution of the interpreter while it is
             | running the user program (here's the paper we wrote about
             | this approach).
             | 
             | So it's just a tracing JIT but applied to the interpreter.
             | One way to put it - it's effectively the same benefit as
             | just running jython to begin with.
        
               | oasisaimlessly wrote:
               | The fact that they had to make up a new term for the
               | technique (a "meta-tracing JIT") should be a hint that
               | what they're doing is somewhat novel/unusual.
               | 
               | If you read the paper[1] linked in your quote, you'd see
               | that it is not "just a tracing JIT"; the interpreter
               | being run under the JIT has some special hooks that let
               | it tell the JIT e.g. where the program counter is:
               | 
               | > Since the tracing JIT cannot know which parts of the
               | language interpreter are the program counter, the author
               | of the language interpreter needs to mark the relevant
               | variables of the language interpreter with the help of a
               | hint. The tracing interpreter will then effectively add
               | the values of these variables to the position key. This
               | means that the loop will only be considered to be closed
               | if these variables that are making up the program counter
               | at the language interpreter level are the same a second
               | time. Loops found in this way are, by definition, user
               | loops.
               | 
               | This is vastly distinct from how Jython works.
               | 
               | [1]: https://foss.heptapod.net/pypy/extradoc/-/blob/branc
               | h/extrad...
        
               | almostgotcaught wrote:
               | > The fact that they had to make up a new term for the
               | technique (a "meta-tracing JIT") should be a hint that
               | what they're doing is somewhat novel/unusual.
               | 
               | you say this and then quote directly what the novelty is
               | and so i ask you - does that piece warrant a whole new
               | term?
               | 
               | > This is vastly distinct from how Jython works.
               | 
                | jython isn't _doing_ anything - it's a python
               | interpreter that runs on the jvm. my point was that
               | that's the same thing: an interpreter for a language that
               | itself is being jitted.
        
               | oasisaimlessly wrote:
               | > does that piece warrant a whole new term?
               | 
               | In my opinion: Yes, definitely! Without meta-tracing
               | techniques, a JIT'd interpreter can only hope to be on
               | par with a compiled interpreter, never significantly
               | faster. It will never go beyond the limitations of an
               | interpreter, because the JIT can't "see" the user code
               | that the interpreter is running.
        
               | azakai wrote:
               | I'm not familiar enough with Jython, but I do consider
               | applying a tracing JIT to an interpreter as a way to
               | automatically convert an interpreter to an optimizing VM.
               | 
               | I suppose if the tracing JIT were very complex and very
               | tailored to a single language that might not make sense,
               | but my impression of PyPy is that the opposite is true,
               | and PyPy can in fact run other languages than Python.
        
               | almostgotcaught wrote:
               | > but I do consider applying a tracing JIT to an
               | interpreter as a way to automatically convert an
               | interpreter to an optimizing VM.
               | 
                | there's nothing automatic about it. the commenter above
                | quoted from their paper - the user (the person writing
                | the interpreter) must annotate their code to pass hints
                | to the tracing jit.
        
               | azakai wrote:
                | It is still largely automatic. Some special annotations
                | are needed in all of these related approaches. For
                | example, in this article, @CompilationFinal is used in
                | Graal interpreters.
               | 
               | (It is possible more annotations are needed in PyPy than
               | Graal, but still, an annotated interpreter is far, far
               | simpler than writing an optimizing JIT!)
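                | 
                | For reference, a sketch of what such an annotation looks
                | like on the Graal side (the node and field names are
                | made up; the annotation and directives are from the
                | Truffle framework):
                | 
                |     import com.oracle.truffle.api.CompilerDirectives;
                |     import com.oracle.truffle.api.CompilerDirectives.CompilationFinal;
                |     import com.oracle.truffle.api.nodes.Node;
                | 
                |     final class CachedLengthNode extends Node {
                |         // Not Java-final, but the partial evaluator may
                |         // treat the value it observes as a constant.
                |         @CompilationFinal private int cachedLength = -1;
                | 
                |         int length(int[] array) {
                |             if (cachedLength == -1) {
                |                 // Invalidate compiled code before changing
                |                 // the "constant".
                |                 CompilerDirectives.transferToInterpreterAndInvalidate();
                |                 cachedLength = array.length;
                |             }
                |             return cachedLength;
                |         }
                |     }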
        
           | metadat wrote:
           | Futamura projection refresher:
           | 
           | https://en.wikipedia.org/wiki/Partial_evaluation
           | 
           | Badass future space tech.
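            | 
            | A toy flavor of the idea in Java (a made-up example, not
            | from the article): partially evaluating pow(x, n) with n
            | known ("static") yields a residual program that contains
            | only the dynamic work.
            | 
            |     import java.util.function.LongUnaryOperator;
            | 
            |     public class PartialEvalDemo {
            |         // Ordinary pow: computes x^n by repeated multiply.
            |         static long pow(long x, int n) {
            |             long r = 1;
            |             for (int i = 0; i < n; i++) r *= x;
            |             return r;
            |         }
            | 
            |         // Specializer: n is static, x is dynamic. Unrolling
            |         // the loop over the static input leaves a residual
            |         // program (a closure here) computing x * x * ... * x.
            |         static LongUnaryOperator specializePow(int n) {
            |             LongUnaryOperator residual = x -> 1;
            |             for (int i = 0; i < n; i++) {
            |                 LongUnaryOperator prev = residual;
            |                 residual = x -> prev.applyAsLong(x) * x;
            |             }
            |             return residual;
            |         }
            | 
            |         public static void main(String[] args) {
            |             LongUnaryOperator cube = specializePow(3);
            |             System.out.println(cube.applyAsLong(5)); // 125 == pow(5, 3)
            |         }
            |     }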
        
       | quikoa wrote:
        | Is there a difference in the amount of memory used between the
        | two methods?
        
       | dzaima wrote:
       | Of course, this just moves the safety question from a javascript-
       | specific JIT to the Truffle JIT compiler and the partial
       | evaluator. This can have some benefits (only one JIT to
       | improve/fix across many languages), but can still have safety
       | bugs.
       | 
       | And the big tradeoff is that the general JIT may be less capable
       | of doing language-specific optimizations (indeed such
       | optimizations have a chance to introduce bugs as the linked V8
       | blog shows, but they also can be correct and significantly
       | improve perf in cases where the general JIT doesn't have the
       | necessary info to do it itself).
        
       | krylon wrote:
       | The first two times, my brain wanted to read Futamura as
       | _Futurama_. Silly me.
        
         | aardvark179 wrote:
         | I may have made that verbal slip while giving a talk on
         | Truffle.
        
           | grashalm wrote:
            | I have worked on Truffle for more than 10 years, and I
            | recently wrote a comment on hackernews using Futurama
            | instead of Futamura. That comment had it wrong twice.
        
             | krylon wrote:
              | I am _so_ glad I'm not the only one whose brain pulls such
             | pranks. Thank you!
        
       ___________________________________________________________________
       (page generated 2024-06-07 23:00 UTC)