[HN Gopher] Computed goto for efficient dispatch tables (2012)
       ___________________________________________________________________
        
       Computed goto for efficient dispatch tables (2012)
        
       Author : henning
       Score  : 16 points
       Date   : 2021-01-03 08:25 UTC (14 hours ago)
        
 (HTM) web link (eli.thegreenplace.net)
 (TXT) w3m dump (eli.thegreenplace.net)
        
       | rurban wrote:
        | What Eli (and most others) is missing are two other common
        | strategies for avoiding this dispatcher:
       | 
        | 1st: implement the ops of a basic block as a linked list, not
        | an array of opcodes. Then the next op can be called directly.
        | No lookup and no indirect call through a dispatch table.
       | 
        | 2nd: inline the op insns into the basic-block array, like a
        | JIT or LLVM bitcode does. No jumps; everything stays in the
        | i-cache.
       | 
        | There are more, even faster options, like LuaJIT. Dispatch
        | tables are pretty desperate in 2020. With 20 ops and short op
        | insns (as in Lua) everything fits into the i-cache. With
        | fixed registers you can be pretty fast.
        
       | codeulike wrote:
        | You could do this in BBC BASIC to make an ultra-fast switch.
       | 20 GOTO (30 + (start%*10))
       | 
       | I used to use it in game loops to call different procedures for
       | character control based on the type of character. Just needed a
       | bit of careful tuning of the character type integer and the line
       | numbers.
        
       | henning wrote:
       | I don't do much C programming, but I set up a benchmark of the
       | code in this post and it does indeed seem to be about 25% faster
       | for the toy instruction set in the post.
       | https://github.com/fearofcode/labeled_goto_test
        
         | gopalv wrote:
         | > seem to be about 25% faster
         | 
          | So if you tested sometime between Haswell and Skylake, there
          | was a period when there was no noticeable difference between
          | cgoto and switch dispatch[1].
          | 
          | Then Intel got caught up in Meltdown, which came from how
          | aggressively it was speculating through branches, and we're
          | back to 2010 speeds for the switch dispatch, because branch
          | prediction won't warm up the instruction cache or start
          | decoding before the conditional is entirely processed.
         | 
         | [1] - https://news.ycombinator.com/item?id=15397109
        
       ___________________________________________________________________
       (page generated 2021-01-03 23:00 UTC)