[HN Gopher] Building a baseline JIT for Lua automatically (2023)
___________________________________________________________________
Building a baseline JIT for Lua automatically (2023)
Author : lawrencechen
Score : 123 points
Date : 2024-01-11 08:11 UTC (14 hours ago)
(HTM) web link (sillycross.github.io)
(TXT) w3m dump (sillycross.github.io)
| summarity wrote:
| The last (interpreter-only) version mentioned that neither GC
| nor modules were implemented. Did that change?
|
| The JIT work is exciting, but even more exciting would be a
| faster, fully featured interpreter for platforms with runtime
| code-generation constraints (e.g. iOS), for integration into
| engines like Love.
| fwsgonzo wrote:
| There is already Luau if you need a sandbox. Neither Lua nor
| LuaJIT are sandboxes. There is also my libriscv project if you
| need a low latency sandbox, without JIT.
|
| See: http://lua-users.org/lists/lua-l/2011-02/msg01582.html
|
| I'm not sure what you mean by code generation constraints
| though.
| summarity wrote:
| I haven't mentioned sandboxes and don't need them. As an
| example, Love integrates LuaJIT, but the JIT is disabled on
| iOS. As the LuaJIT documentation notes:
|
| > Note: the JIT compiler is disabled for iOS, because regular
| iOS Apps are not allowed to generate code at runtime. You'll
| only get the performance of the LuaJIT interpreter on iOS.
| This is still faster than plain Lua, but much slower than the
| JIT compiler. Please complain to Apple, not me. Or use
| Android. :-p
|
| So to return to my original comment, the improvement that I'm
| seeing here is a faster _interpreter_, which is something
| advertised on the luajit-remake repo.
| fwsgonzo wrote:
| Ah yes, it is indeed going to be faster.
| vardump wrote:
| Looks like LuaJIT is still going to be faster, because
| Deegen requires runtime code generation, and thus executable
| + writable pages, which the iOS platform does not allow.
| pansa2 wrote:
| > _Neither Lua nor LuaJIT are sandboxes._
|
| Maybe we have different definitions of "sandbox", but I
| thought the Lua interpreter was one? That is, isn't it safe
| (or can be made safe) to embed the interpreter within an
| application and use it to run untrusted Lua code?
| fwsgonzo wrote:
| http://lua-users.org/wiki/SandBoxes
|
| There is a lot of information there, but it doesn't seem to
| handle resource exhaustion or execution time limits, or even
| give any guarantees. It does indicate that it's possible to
| use Lua as a sandbox, and has a decent example of the most
| restrictive setup. But I would, for example, compare it with
| Luau's SECURITY.md.
|
| From https://github.com/luau-lang/luau/blob/master/SECURITY.md:
|
| > Luau provides a safe sandbox that scripts can not escape
| from, short of vulnerabilities in custom C functions
| exposed by the host. This includes the virtual machine and
| builtin libraries. Notably this currently does not include
| the work-in-progress native code generation facilities.
|
| > Any source code can not result in memory safety errors or
| crashes during its compilation or execution. Violations of
| memory safety are considered vulnerabilities.
|
| > Note that Luau does not provide termination guarantees -
| some code may exhaust CPU or RAM resources on the system
| during compilation or execution.
|
| So even Luau will have trouble with untrusted code, but it
| does give certain guarantees and is specific about what is
| not covered. I think that's fair. And then there's libriscv.
|
| From https://github.com/fwsGonzo/libriscv/blob/master/SECURITY.md:
|
| > libriscv provides a safe sandbox that guests can not
| escape from, short of vulnerabilities in custom system
| calls installed by the host. This includes the virtual
| machine and the native helper libraries. Do not use binary
| translation in production at this time. Do not use linux
| filesystem or socket system calls in production at this
| time.
|
| > libriscv provides termination guarantees and default
| resource limits - code should not be able to exhaust CPU or
| RAM resources on the system during initialization or
| execution. If blocking calls are used during system calls,
| use socket timeouts or timers + signals to cancel.
|
| So, it is possible to provide limits while still running
| fast. I imagine many WebAssembly emulators can give the
| same guarantees.
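| The "timers + signals" pattern mentioned above can be
| sketched host-side in plain Python (a hypothetical watchdog,
| not libriscv's actual mechanism; Unix-only, since it relies
| on SIGALRM):

```python
import signal

def run_with_timeout(fn, seconds):
    """Run fn, cancelling it via SIGALRM after `seconds` (Unix only)."""
    def on_alarm(signum, frame):
        raise TimeoutError("guest exceeded its time budget")
    old = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)           # arm the watchdog timer
    try:
        return fn()
    finally:
        signal.alarm(0)             # disarm the timer
        signal.signal(signal.SIGALRM, old)

try:
    # iter(int, 1) yields 0 forever, so this spins until the alarm fires.
    run_with_timeout(lambda: [None for _ in iter(int, 1)], 1)
except TimeoutError as e:
    print("cancelled:", e)
```

| The same idea generalizes to any blocking host-side call the
| sandboxed guest triggers, as the libriscv notes suggest.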
| eachro wrote:
| I tried reading this post and it just went way over my head.
| Anyone have any good resources on background material to even
| start?
| mariusseufzer wrote:
| Same here! Would love to understand what this is about!
| JonChesterfield wrote:
| It's a template JIT with an unusual implementation.
|
| Instead of emitting the bytes directly, it uses LLVM to
| compile functions that refer to external symbols, and then
| patches copies of those bytes at JIT time. That has the
| advantage of being loosely architecture-agnostic.
|
| Template JITs can't allocate registers across bytecodes,
| which usually hurts performance. That can be partially
| mitigated by choosing the calling convention of the
| templates/stencils carefully; in particular, you don't want
| to flush everything to/from the stack on every jump on a
| register architecture.
|
| It's not in the same league of engineering as luajit, but then
| not much is.
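| As a toy illustration of the patching step described above (a
| sketch only -- real stencils are object-file bytes emitted by
| LLVM, and the hole offsets come from relocation records): the
| bytes below encode x86-64 `mov rax, imm32`, with the
| immediate left as a placeholder to patch per bytecode.

```python
import struct

# Precompiled "stencil": x86-64 `mov rax, imm32` (48 c7 c0 + imm32),
# with the immediate filled by a placeholder, to be patched later
# the way a linker patches a relocation.
PLACEHOLDER = 0xDEADBEEF
LOAD_CONST_STENCIL = bytes.fromhex("48c7c0") + struct.pack("<I", PLACEHOLDER)
HOLE_OFFSET = 3  # where the 4-byte immediate lives inside the stencil

def copy_and_patch(stencil: bytes, hole_offset: int, operand: int) -> bytes:
    """Copy the stencil, then overwrite its hole with the real operand."""
    code = bytearray(stencil)
    code[hole_offset:hole_offset + 4] = struct.pack("<I", operand)
    return bytes(code)

# JIT-compiling `LOAD_CONST 42` is now just a copy plus a 4-byte write.
code = copy_and_patch(LOAD_CONST_STENCIL, HOLE_OFFSET, 42)
print(code.hex())  # 48c7c02a000000
```

| A real implementation would write the patched bytes into an
| executable page; on iOS that last step is exactly what is
| forbidden.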
| chombier wrote:
| This article is about "Copy-and-Patch", a just-in-time (JIT)
| compilation technique that compiles quickly, produces
| reasonably efficient machine code, and is easier to maintain
| than hand-written assembly.
|
| The section "Copy-and-Patch: the Art of Repurposing Existing
| Tools" describes the heart of the method, which is to use an
| existing compiler to compile a chunk of c/c++ code
| corresponding to some bytecode instruction, then patch the
| resulting object file in order to tweak the result (e.g. to
| specify the instruction operands) in a similar fashion to
| symbol relocation happening during link.
|
| Given a stream of bytecode instructions, JIT compilation
| reduces to copying code objects (named "stencils")
| corresponding to bytecode instructions from a library of
| precompiled stencils, then patching the stencils as needed --
| which is very fast compared to running a full-blown compiler
| like LLVM over the syntax tree.
|
| Of course, the resulting code is slower than full-blown
| ahead-of-time (AOT) compilation, but the authors describe a
| few tricks to keep execution speed within a reasonable margin
| of AOT. For instance, they leverage tail calls to replace
| function calls with jumps, compile sequences of frequently
| associated bytecode instructions together, and so on.
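| The compile loop described above can be sketched as follows
| (a hypothetical model: string templates stand in for the
| machine-code stencils, and the opcode names are made up):

```python
# Each "stencil" is a precompiled template with named holes for the
# operands; JIT compilation is a single pass that copies the stencil
# for each bytecode instruction and patches the holes.
STENCILS = {
    "LOADK": "r{dst} = K[{k}]",
    "ADD":   "r{dst} = r{a} + r{b}",
    "RET":   "return r{a}",
}

def jit_compile(bytecode):
    """Translate a bytecode stream by copy-and-patch over the library."""
    return [STENCILS[op].format(**ops) for op, ops in bytecode]

program = [
    ("LOADK", {"dst": 0, "k": 1}),
    ("ADD",   {"dst": 0, "a": 0, "b": 2}),
    ("RET",   {"a": 0}),
]
print(jit_compile(program))  # ['r0 = K[1]', 'r0 = r0 + r2', 'return r0']
```

| No parsing or optimization happens at JIT time, which is why
| compilation is so fast: all the expensive compiler work was
| done once, ahead of time, when the stencil library was built.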
| reginaldo wrote:
| My advice would be to read Piumarta's "Optimizing direct
| threaded code by selective inlining" paper [1] first, and
| then read the references from the Wikipedia article [2].
|
| If the Piumarta paper is still over your head, take a look at
| its references, but they refer to Java, Smalltalk and Forth,
| which might be a distraction.
|
| [1]
| https://groups.csail.mit.edu/pag/OLD/parg/piumarta98optimizi...
| [2] https://en.wikipedia.org/wiki/Copy-and-patch
| ngrilly wrote:
| This was a very interesting read for understanding the
| approach better, in the context of Python getting a
| copy-and-patch JIT compiler in the upcoming 3.13 release [1].
|
| [1] https://tonybaloney.github.io/posts/python-gets-a-jit.html
| dabber wrote:
| > It was a very interesting read, in the context of Python
| getting a copy-and-patch JIT compiler in the upcoming 3.13
| release
|
| Indeed! In fact, someone mentioned this article in the thread
| about the Python JIT from a few days ago:
|
| https://news.ycombinator.com/item?id=38924826
| ngrilly wrote:
| I missed that. Thanks for the link!
| stargrazer wrote:
| I am using https://luajit.org/ in my GCC C++ project.
|
| Can I use this faster Lua JIT in my project as a replacement? And
| if so, how so?
|
| The existing luajit doesn't do v5.1, so it would be nice to use
| this newer engine at the newer baseline lua version level.
| tarruda wrote:
| I'm skeptical that there's any JIT for any programming language
| that can match the raw performance and memory efficiency of
| LuaJIT
| pansa2 wrote:
| I'm sure all the modern JavaScript JITs would beat LuaJIT for
| raw performance. JS JITs were already faster when I compared
| them several years ago and have only improved since - whereas
| LuaJIT has almost been standing still for a decade.
| Rochus wrote:
| > _whereas LuaJIT has almost been standing still for a
| decade_
|
| LuaJIT 2.1 improved a bit (~7%) from 2017 to 2023:
| http://software.rochus-keller.ch/are-we-fast-yet_LuaJIT_2017...
| Rochus wrote:
| The Mono VM is about twice as fast, V8 is even a bit faster:
| https://github.com/rochus-keller/Oberon/blob/master/testcase...
| saagarjha wrote:
| This is a baseline JIT, so it will be slower than LuaJIT (which
| has a sophisticated optimizer).
| fullstop wrote:
| > The existing luajit doesn't do v5.1, so it would be nice to
| use this newer engine at the newer baseline lua version level.
|
| LuaJIT _only_ does v5.1, but does have some non-breaking
| features from v5.2 and a few optional 5.2 features which could
| break 5.1 compatibility.
| chombier wrote:
| For those interested, the ACM page for the paper has a good
| introductory video https://dl.acm.org/doi/abs/10.1145/3485513
| lambdaone wrote:
| This is a beautiful piece of work. Connecting all the semantic
| levels is hard work, and this does it elegantly. It goes to show
| that old-fashioned technology like object files and linkers is
| still useful, and can still pay off in unexpected ways as part of
| new technology.
| rfl890 wrote:
| Object files and linkers are old fashioned? What replaced them?
| khiner wrote:
| I think by "old fashioned" op means they are old
| technologies, not that they are obsolete.
| tiffanyh wrote:
| FYI - Mike Pall is back working on LuaJIT.
|
| And v3.0 is underway.
|
| https://github.com/LuaJIT/LuaJIT/issues/1092
| synergy20 wrote:
| This is the best news of the day for me.
| vardump wrote:
| Mike Pall is doing pretty amazing work. Lots of respect.
___________________________________________________________________
(page generated 2024-01-11 23:01 UTC)