[HN Gopher] Why LuaJIT's interpreter is written in assembly
       ___________________________________________________________________
        
       Why LuaJIT's interpreter is written in assembly
        
       Author : obl
       Score  : 43 points
       Date   : 2021-02-11 16:18 UTC (6 hours ago)
        
 (HTM) web link (lua-users.org)
 (TXT) w3m dump (lua-users.org)
        
       | guenthert wrote:
       | That needs a 2011 tag.
        
       | newobj wrote:
       | Is LuaJIT still under active development? I thought the developer
       | had walked away. With Torch also looking dead, that use case is
       | gone, too. Roblox has their own Lua VM now.
       | 
       | I love Lua and code in it almost every day for fun, but yeah I'm
       | pretty sure LuaJIT is just "done" now?
        
         | BugsJustFindMe wrote:
         | The git repo weirdly continues to get periodic updates. I think
         | it just doesn't get features anymore?
        
         | cardanome wrote:
         | LuaJIT is simply feature complete. There is nothing to add to
         | it.
         | 
         | The newer versions of Lua basically implemented features that
         | LuaJIT already had or that would mean a performance trade-off.
         | I am still using Lua 5.1 (first released in 2006; its last
         | release, 5.1.5, is from 2012) because there is no reason to
         | upgrade.
         | 
         | If you design something well, you don't need to push new
         | features every year. Stability is underrated.
        
           | pansa2 wrote:
           | > _There is nothing to add to it._
           | 
           | I'm not sure that's true. Maybe LuaJIT was never going to add
           | the features it's missing from Lua 5.2, 5.3 and 5.4. However,
           | when Mike Pall stepped back in 2015 [0], he had still been
           | planning to further improve the implementation - for example
           | with a new garbage collector [1] and "hyperblock scheduling"
           | [2] (which remain unimplemented), plus 64-bit pointer support
           | (which was eventually completed by other people).
           | 
           | [0] https://www.freelists.org/post/luajit/Looking-for-new-
           | LuaJIT...
           | 
           | [1] http://wiki.luajit.org/New-Garbage-Collector
           | 
           | [2] https://github.com/LuaJIT/LuaJIT/issues/37
        
         | moonchild wrote:
         | It gets maintained, but that's pretty much it. Last commit was
         | a small fix ~2 months ago.
         | 
         | I had high hopes for moonjit, but development on it has ceased.
         | There are other forks--openresty and raptorjit come to mind--
         | but they don't have feature parity.
        
         | tobylane wrote:
         | https://github.com/openresty/luajit2
         | 
         | It has a few extras, but they agree with the original LuaJIT
         | author's opinion that not every 5.2 feature can be implemented
         | in the JIT.
        
       | pansa2 wrote:
       | Relevant:
       | 
       | > > _Threaded code should have better branch prediction behavior
       | than a jump table with a single dispatch point_
       | 
       | > _This is not the case anymore, at least for modern Intel
       | processors. Starting with the Haswell micro-architecture, the
       | indirect branch predictor got much better and a plain switch
       | statement is just as fast as the "computed goto" equivalent. Be
       | wary of any references about this that are from before 2013._
       | 
       | https://news.ycombinator.com/item?id=15396761
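
The two dispatch styles being compared can be sketched roughly as
follows. This is a minimal illustration on a hypothetical toy stack
VM (the opcodes and function names are invented here and are not
LuaJIT's), and it assumes GCC or Clang, since computed goto relies on
the "labels as values" extension. It shows only the structural
difference, not which one is faster on a given CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bytecode, not LuaJIT's instruction set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Style 1: a single switch, so every opcode is dispatched through
   one shared indirect branch. */
long run_switch(const int *code) {
    long stack[16];
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: assert(sp < 16); stack[sp++] = *code++; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        default:      return -1; /* unknown opcode */
        }
    }
}

/* Style 2: threaded code. Each handler ends with its own indirect
   jump, so the predictor sees one branch site per opcode.
   Unlike the switch version, an out-of-range opcode is not checked. */
long run_threaded(const int *code) {
    static void *const table[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0;
#define DISPATCH() goto *table[*code++]
    DISPATCH();
do_push: assert(sp < 16); stack[sp++] = *code++; DISPATCH();
do_add:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; DISPATCH();
do_mul:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; DISPATCH();
do_halt: assert(sp >= 1); return stack[sp - 1];
#undef DISPATCH
}
```

Both functions should return the same result for the same program;
any speed difference would have to be measured on the target
hardware and compiler.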
        
         | gopalv wrote:
         | > Be wary of any references about this that are from before
         | 2013.
         | 
         | And now that it is 2021, ignore the pre-2018 numbers.
         | 
         | Somewhere in that old thread, I referenced the indirect jump
         | performance hit from the whole Spectre/Meltdown mitigations for
         | any Xeons you might have bought before mid-2020.
         | 
         | There's a nice paper from VMware on "JumpSwitches" from USENIX
         | '19 that is worth reading in this context.
         | 
         | That suggests that of the five different types of indirect
         | jumps, some are back to being fast again - the one we are
         | dealing with is the search jumpswitch.
         | 
         | I would say direct threaded execution (computed goto) is still
         | worth it over single-dispatch jumps, particularly if you can
         | JIT basic blocks and replace something like an "add, mul, add,
         | store" sequence with a single basic block that keeps values in
         | registers for the whole operation. You can then jump into that
         | block directly with a computed goto, as you would with a
         | compiled chunk of code, and build your micro-JIT one opcode at
         | a time.
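
A much simpler relative of that idea is a fused "superinstruction":
one handler that performs the whole "add, mul, add, store" sequence
after a single dispatch. The sketch below is an interpreter-level
approximation, not a JIT. The opcodes, the accumulator model, and the
function name are all hypothetical, and it uses a plain switch only
for brevity.

```c
#include <assert.h>

/* Hypothetical toy opcodes working on an accumulator and a memory array. */
enum { I_ADD, I_MUL, I_STORE, I_ADD_MUL_ADD_STORE, I_HALT };

/* Each instruction is an opcode followed by its operands. */
long run_fused_demo(const long *code, long acc, long *mem) {
    for (;;) {
        switch (*code++) {
        case I_ADD:   acc += *code++; break;
        case I_MUL:   acc *= *code++; break;
        case I_STORE: mem[*code++] = acc; break;
        case I_ADD_MUL_ADD_STORE: {
            /* Fused handler: four operations, one dispatch. The
               intermediate values stay in locals, which the compiler
               is free to keep in registers. */
            long a = code[0], m = code[1], b = code[2], slot = code[3];
            code += 4;
            acc = (acc + a) * m + b;
            mem[slot] = acc;
            break;
        }
        case I_HALT:  return acc;
        default:      return -1; /* unknown opcode */
        }
    }
}
```

The unfused program `{ I_ADD, 1, I_MUL, 3, I_ADD, 2, I_STORE, 0,
I_HALT }` and the fused program `{ I_ADD_MUL_ADD_STORE, 1, 3, 2, 0,
I_HALT }` compute the same value, but the second goes through the
dispatch loop twice instead of five times.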
        
       | remexre wrote:
       | I wonder if Clang can do better these days; I noticed that it
       | seems to merge computed gotos to normal-looking control-flow (I
       | suppose this is necessary for alias analysis).
        
         | nn3 wrote:
         | Also modern compilers should be able to use frequency
         | information from profile feedback to inform the register
         | allocator. So it's unclear the static register allocation
         | scheme is really that much better.
        
       ___________________________________________________________________
       (page generated 2021-02-11 23:01 UTC)