[HN Gopher] MiniVM: A zero-dependency cross-language runtime on ...
___________________________________________________________________
MiniVM: A zero-dependency cross-language runtime on par with LuaJIT
and C
Author : isaacimagine
Score : 212 points
Date : 2022-01-08 11:05 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| fastkebab wrote:
| rahimiali wrote:
| The main interpreter loop looks a lot like Python's. It uses the
| same preprocessor tricks and jump tables to help both the reader
| and the compiler make sense of the flow. It's also neat that
| integers are unboxed, at the expense of their top two bits
| being reserved as a type hint.
| jng wrote:
| Not only the reader and the compiler: that kind of threaded
| code also really helps performance, since it leverages the CPU's
| branch target buffer to obtain great branch prediction of
| interpreted code, which is key to getting current CPUs to
| perform well.
| celeritascelery wrote:
| Is there any documentation on the bytecode? I would be interested
| to see their design choices.
| isaacimagine wrote:
| Check the code, there's a big list of opcodes with pretty
| descriptive names. The code is effectively the documentation
| for now.
| adontz wrote:
| Does it support any kind of debugging?
| wruza wrote:
| 4984, there is a roadmap, but I couldn't find info on the
| strategic goals of this project or where it's supposed to be the
| best fit. Can you please share your views on this?
|
| Also, is there an API for emitting VM bytecodes? Or an
| instruction set reference?
|
| I'm asking out of pure fan-like interest in this field and
| because I see a lack of simple-enough (for me) tools for language
| building. Don't take me seriously or anything like that, I'm not
| a pro. I tried to dig into LLVM a few years ago, but so much C++
| boilerplate, meh. I know I can just transpile to Lua or JS with
| sourcemaps, but it's zero fun. Projects like this remind me of
| 8080 asm times.
| 4984 wrote:
| There is no roadmap. There are no written long-term goals. This
| project came out of a need for a faster virtual machine for my
| Paka language. There is an API to emit VM bytecode, but it's
| not in C; it is from Paka.
|
| A roadmap could go like this tho: more core opcodes (bit ops,
| closures), splitting overloaded opcodes, more frontends, an
| optimizer, a dynamic compiler, and a JIT.
| childintime wrote:
| This is a nice Forth-style bytecode interpreter, that is easy
| to generate code for. Close to WASM, but much simpler.
|
| The thing that attracts me most is that the bytecode format can
| be easily translated to a more compact format, allowing it to
| serve as an intermediate format for _many_ homegrown
| instruction sets.
|
| So a GCC or LLVM backend needs to be on the roadmap. Code
| generation is a blast. Only the number of registers should be
| user-configurable to make this useful.
| rahimiali wrote:
| In what way is this forth-like? The opcodes take immediates
| and registers as arguments, whereas forth-like would be much
| more stack-based than this, right?
| jng wrote:
| The bytecodes aren't, but the interpreter is; look up
| threaded code.
| b3morales wrote:
| Have a look at https://c9x.me/compile/ and
| https://c9x.me/compile/bib/
| [deleted]
| iainmerrick wrote:
| This looks like a fun project, but claiming that it "beats
| luajit" is very premature based on the benchmarks listed.
|
| The README lists only an allocation stress test (fast allocs are
| good, to be sure) and recursive Fibonacci (always good fun to try
| but doesn't resemble real usage).
|
| Looks like it doesn't have a JIT compiler, and without that I'd
| put a lot of money against it ever being as fast as LuaJIT, V8,
| the JVM, etc etc in the general case.
|
| _Edit to add:_ on reading more closely, there's more benchmark-
| gaming going on here than I realised. On the second benchmark, it
| says:
|
| _As you can see, minivm (no JIT) is a hair slower than Node JS
| (JIT) but beats luajit --joff by a fair margin (no JIT)._
|
| They already noted that Node has a longer startup time, so that
| obviously dominates in a benchmark that's only a few milliseconds
| long; and disabling the JIT in LuaJIT doesn't tell us much.
| isaacimagine wrote:
| Personally, I find it pretty impressive that it performs as
| well as these runtimes despite _not_ having a JIT compiler. I'm
| pretty sure Shaw's written more benchmarks, but as the
| README explains, it's really hard to tell what the performance
| characteristics of a language are without writing a larger
| application. So far the largest applications written with
| MiniVM are Paka[0], a self-hosted language similar to Lua that
| targets MiniVM; os49[1], an operating system built on
| Paka/MiniVM in the spirit of lisp machines; and xori[2], an
| online playground for the language.
|
| _Edit: to address your edit, the Tree benchmark starts
| small, but grows exponentially (the graph is logarithmic). The
| final runs of the benchmark take about 15 seconds to run, and
| are run 10 times per language, at which point startup time is
| no longer a large factor. The fib test compares luajit both
| with the JIT on and off: as MiniVM is not JIT'd, I think
| that this is a fair comparison to make._
|
| [0]: https://github.com/FastVM/paka
|
| [1]: https://github.com/ShawSumma/os49
|
| [2]: https://fastvm.github.io/xori
| iainmerrick wrote:
| Sure, it's a nice piece of work!
|
| I just don't think you should claim it's very fast unless you
| can show that it actually is very fast. The current
| benchmarks look like toy benchmarks. (I don't mean that in a
| derogatory way -- they're exactly the kind of tiny programs
| you need to build and test while bringing up a new VM.)
| mst wrote:
| The disclaimer at the top of the benchmark section -
| https://github.com/FastVM/minivm#benchmarks - seemed pretty
| reasonable to me.
|
| It's very clear that microbenchmarks don't tell the whole
| story, and that whole-application benchmarks giving more
| realistic results will be desirable later.
|
| I mean, I agree these benchmarks don't mean that much
| except for being a useful way to verify that the core code
| is working and reasonably performant - but they're up front
| about that fact so calling it 'benchmark-gaming' seems a
| trifle unfair.
| 4984 wrote:
| MiniVM will JIT eventually. Right now the biggest barrier is
| the VM snapshots. Currently MiniVM has its own stack and heap and
| can snapshot them from any point in the program.
|
| One thing to note about MiniVM is that it has very strong
| types. V8 has many steps to go through when it needs to perform
| something like addition (weak typing). MiniVM supports only 4
| basic types.
|
| MiniVM's reference frontend, Paka, is fully self-hosted; it is
| currently the only large MiniVM program.
| deckarep wrote:
| Question on VM snapshotting: what's the purpose/point in even
| having such an ability? What does it allow you to do?
|
| I only know of snapshotting perhaps being necessary to
| support coroutine based context switching.
|
| Thanks and very cool project!
| slightknack wrote:
| A snapshot is the entire state of a program at a single
| moment in time. Continuations are basically exposed
| snapshots, i.e. taking a snapshot, storing it in a
| variable, doing some work, and then 'calling' the snapshot
| to return to an earlier point. Snapshots let you
| implement a naive version of single-shot delimited
| continuations - coroutines! This can be very useful for
| modeling concurrency.
|
| Aside from coroutines and continuations, snapshots are neat
| for distributed computing: spin up a vm, take a snapshot,
| and replicate it over the network. You could also send
| snapshots of different tasks to other computers to execute.
| In the context of edge computing, you could snapshot the
| program once it's 'warm' to cut back on VM startup time.
|
| Snapshots allow you to peek into your program. Imagine a
| debugger that takes snapshots on breakpoints, lets you
| inspect the stack and heap, and replays the program forward
| from a given point in a deterministic manner.
| also send a snapshot to a friend so they can run an
| application from a given point on their machine. If you do
| snapshots + live reloading there are tons of other things
| you can do (e.g. live patching and replaying of functions
| while debugging).
| samatman wrote:
| That sounds like a solid basis for adding asymmetric
| coroutines as well, is that something you're thinking about
| adding?
|
| I like the philosophy and gumption shown by this project but
| coroutines aren't a feature I'd lightly give up.
|
| Digging what I see so far!
| 4984 wrote:
| MiniVM uses coroutines already: every 1000 branches the VM
| will return to the scheduler. The build system can accept
| CFLAGS+=-DVM_BRANCH_DEFER, which makes MiniVM able to run
| part of the bytecode, up to N instructions.
|
| It is not exposed at the instruction level yet tho. Would be
| quite easy to add.
| samatman wrote:
| Ah thank you, that's very promising! That was the kind of
| thing I wanted to hear: I write Lua every day, and four
| data types didn't appear to support 'thread'.
|
| I might just kick the tires on this over the weekend.
| exikyut wrote:
| It might be interesting to be able to tell the VM to
| resume into code for exactly N instructions or branches.
| I can see instrumentation/debugging tooling using this
| with N=1 for example.
| rgrmrts wrote:
| That's cool. Have you played around with that number? I'm
| curious how you pick that constant and what the trade
| offs are
| exikyut wrote:
| Out of curiosity, are you planning on (progressively, slowly)
| rolling your own JIT, or using something like DynASM
| (https://luajit.org/dynasm_features.html), libFirm
| (https://pp.ipd.kit.edu/firm/Features.html), or some other
| preexisting thing (eg https://github.com/wdv4758h/awesome-
| jit) in the space?
|
| FWIW, I understand that LuaJIT gets some of its insane real-
| world performance from a JIT and VM design that's effectively
| shrink-wrapped around Lua semantics/intrinsics - it's not
| general-purpose. I've read (unfortunately don't remember
| exactly where) that people have tried to run off with the JIT
| and VM and use it in other projects, but never succeeded
| because of the tight coupling.
|
| In the same way, while a bespoke 64/32-bit x86+ARM JIT would
| be a reasonable undertaking, it could make for a pretty
| interesting target with a potentially wide-ranging set of
| uses.
|
| For example, it could be the VM+JIT combination that all
| those people trying to dissect LuaJIT were looking for :).
|
| I could see something like this becoming an attractive option
| for games that want an exceptionally simple runtime. Sort of
| like a scaled-down NekoVM (a la Haxe).
|
| Broadly speaking, I get the (potentially incorrect)
| impression (from a very naive/inexperienced POV) that
| MiniVM+JIT would be looking to close a more widely-scoped,
| higher-level loop out of the box than something like libFirm
| would be (despite the very close correlation in functionality
| when looking at libFirm). So it'd be closer to Cling
| (https://root.cern/cling/) than raw LLVM, perhaps (albeit
| with 0.01% of the code size :D). It is for this reason that I
| kind of pause for a minute and ponder that a fully integrated
| JIT could be a pretty good idea. It would absolutely make the
| irreducible complexity of the project balloon, with
| reasonable motivation likely necessary to maintain cohesion.
|
| On a different note, If I were to backseat-drive for a minute
| :) the first thing I'd rant about is how attractive modern
| JITs need trivial ways to verify code correctness, both
| online (how exactly was *this* specific generated code
| constructed - so, straightforward logging) but also (and
| particularly) offline (humans staring at the JIT source code
| and mentally stepping through its behavior - _and succeeding_
| (miracles!!) because the code is small and comprehensibly
| architected). If the JIT was implemented in such a
| straightforward manner, end users wanting to run potentially
| malicious user-supplied code with high performance in
| potentially security-sensitive settings might be attracted to
| the project. (Mike Pall made bank for a while while
| CloudFlare was using LuaJIT for its WAF*... ahem...)
|
| I came across this reference of how to break out of LuaJIT
| 2.1 (2015) a while back:
| https://www.corsix.org/content/malicious-luajit-bytecode -
| and every time I take a look at the code I switch away from
| the tab :) (and sometimes step away from the computer for a
| minute :D). It's solely a demonstration of "this is how it
| would work", and clarifies that LuaJIT makes no sandbox
| guarantees about the code it executes, but reading through
| it, the amount of Stuff(tm) going on represents a surface
| area that to me (naively) seems just... like LuaJIT as a
| whole is generally too large to easily reason about from a
| security standpoint (oh yeah, besides being written in
| assembly language...). This might be inexperience speaking,
| but I can't help but wonder whether a smaller, simpler
| implementation _might_ be able to implement a secure JIT; for
| all I know this might be an impossible P=NP pipe dream I
| haven't fully grasped yet. I guess what I'm trying to figure
| out is whether "small enough for non-100x programmers to
| mentally reason through" and "large enough to do a few
| practical things quickly" have practical overlap?
|
| [* CloudFlare only ran internally-generated code through
| LuaJIT, for completeness. A VM+JIT that could safely run
| untrusted code would thus be even _more_ interesting than
| LuaJIT was to CloudFlare, from a general perspective. ]
|
| ---
|
| Something completely different that I randomly discovered
| recently and which I thought I'd idly mention is that JITs
| _might_ seem to have a bit of a hard time on Android in
| certain obscure circumstances. I can't (yet) tell if this is
| LuaJIT-specific or "anything that's a JIT"-specific: KOReader
| (Android eBook reader, implemented entirely using LuaJIT -
| https://github.com/koreader/android-luajit-launcher) has a
| bunch of very scary magic Things(tm) it seems to need to do
| to make LuaJIT even work at all on Android
| (https://github.com/koreader/android-luajit-
| launcher/blob/mas...), due to an apparently-current issue
| causing issues across different domains
| (https://github.com/LuaJIT/LuaJIT/issues/285), which has been
| apparently cropping up going back years
| (https://www.freelists.org/post/luajit/Android-performance-
| dr... (2013)). KOReader has even nontrivially patched
| LuaJIT's C code in places
| (https://github.com/koreader/android-luajit-
| launcher/blob/bb0...) with purposes I am yet to fully
| understand (it might just be for debugging). I happened to be
| considering idly playing around with Lua on Android
| (currently fascinated with interpreted/JITed runtimes) and
| after stumbling on this I'm debating whether to use Lua
| instead, ha. I've been meaning to ask around on the LuaJIT
| list and wherever KOReader discusses stuff to learn more,
| after spending the time actually getting LuaJIT linked into
| an Android project and poking around. Haven't got to it yet.
| This could be an absolutely massive red herring that I'm
| going deer-in-headlights about because it just _looks_ off-
| in-the-weeds, or potentially significant. It might also be
| absolutely irrelevant to not-LuaJIT pursuits, but as I noted
| I'm not (yet) experienced enough to tell.
| CalChris wrote:
| Fibonacci is the poster child of hotspots. A JIT should be
| crushing an interpreter at Fibonacci as it is simple to
| generate good code for. Yes, it's just a microbenchmark, so you
| shouldn't make SPEC-type conclusions from it. But you shouldn't
| dismiss it either.
| steeleduncan wrote:
| The Readme mentions that they've experimented with a Lua
| frontend. I can't find any reference to this in the repo. Is this
| a private experiment, or is it available somewhere?
| kaba0 wrote:
| Wouldn't a templated interpreter be perhaps faster (eg. what is
| used by the JVM)?
| eps wrote:
| Anything beats something on some benchmarks. That's not a very
| captivating pitch point.
|
| Good stuff otherwise. Very nice project.
| isaacimagine wrote:
| Thanks for pointing this out, I'll adjust the title to make it
| more accurate. I think the benchmarks show that the runtime is
| at least on par with existing popular runtimes.
|
| Some other cool pitch points I've learned while chatting with
| Shaw:
|
| - MiniVM supports saving and loading compressed
| snapshots/images, like Smalltalk
|
| - JIT is in the works (it had a JIT in the past, which was
| removed in a refactor)
|
| - Zero dependencies, compiles to Wasm with cosmo libc.
|
| - VM supports multiple languages and sharing libraries between
| languages
|
| - The main language that targets MiniVM is self-hosted on
| MiniVM; there's a minimal OS built on MiniVM.
|
| - Fast startup times make it good for embedded scripting, like
| Lua.
|
| - The ISA itself is also really interesting; would
| recommend reading the C if that's your thing.
| 4984 wrote:
| Thank you for having a look.
|
| The reason MiniVM is so benchmark-oriented in its presentation
| is that MiniVM is very benchmark-oriented in my workflow.
|
| I have tried to find a faster interpreter without JIT. It is
| hard to find benchmarks that don't just say that MiniVM is
| 2-10x faster than the interpreted languages.
| iainmerrick wrote:
| Yeah, it's a bit disappointing how slow most non-JIT
| interpreters are -- I think JIT understandably took the wind
| out of their sails.
|
| The state of the art as I understand it (but maybe you've
| figured out something faster?) is threaded interpreters. I
| suggest looking at Forth implementations, which is where I
| think most interpreter innovation has happened.
| MaxBarraclough wrote:
| > I suggest looking at Forth implementations, which is
| where I think most interpreter innovation has happened.
|
| Fortunately the Forth community write about their
| techniques, so you don't have to just eyeball gforth's
| source-code. A handful of links for those interested:
|
| https://www.complang.tuwien.ac.at/papers/
|
| https://github.com/ForthPapersMirror/Papers
|
| https://www.complang.tuwien.ac.at/projects/forth.html
|
| https://www.complang.tuwien.ac.at/anton/euroforth/
|
| http://soton.mpeforth.com/flag/jfar/vol7.html
|
| http://soton.mpeforth.com/flag/fd/
|
| http://www.forth.org/literature.html
|
| http://www.forth.org/fd/contents.html
| isaacimagine wrote:
| So I was talking about threaded code with Shaw the other
| day, and he gave me a long explanation about the various
| tradeoffs being made and the resulting techniques used in
| MiniVM. As I understand it, MiniVM is basically threaded
| code, with cached opcodes. Essentially, it uses
| _precomputed_ gotos:
|
|         // classical computed goto
|         goto *x[y[z]]
|
|         // MiniVM precomputed goto
|         goto *a[z]
|
| This optimization makes MiniVM a tad faster than raw
| computed gotos, because it's essentially threaded code with
| some optimizations. He can probably expand on this; if you
| want to hear the full really interesting explanation
| yourself, he's pretty active in the Paka Discord[0].
|
| [0]: https://discord.gg/UyvxuC5W5q
| aardvark179 wrote:
| That feels like the sort of optimisation that wins in
| small cases and loses in larger ones. The jump array is
| going to be 4 or 8 times the size of the opcode array so
| will put a lot more pressure on the caches as your
| programs get larger.
| sitkack wrote:
| Sounds like a great test case for using a tensorflow-lite
| model to switch between both techniques.
| kgeist wrote:
| > Built on a register-based ISA that beats luajit -- with the
| JIT on -- in some benchmarks
|
| This is interesting, considering that LuaJIT in interpreter mode
| is famous for being faster than simple JITs. Worth looking into
| what the differences are.
| isaacimagine wrote:
| From a discussion with Shaw:
|
| - Simpler more consistent type system
|
| - MiniVM has more, larger instructions to take advantage of
|
| - MiniVM precomputes and caches jumps where possible
|
| - Calling functions is more efficient, less overhead
|
| - Has a custom GC/allocator which allows for some optimizations
|
| - Fixed-sized arrays vs Lua's dynamic tables
|
| - Not a JIT, so faster startup times (no tracing)
| thayne wrote:
| Looks like an interesting project, but I can't find any
| documentation on how to actually use it (or on the paka language
| either). Is it just still in early stages and documentation
| hasn't been written yet?
| slightknack wrote:
| I've written some documentation for it in the past, but had to
| scrap it because things tend to move so quickly that it's
| better to read the code and ask Shaw than use incorrect
| documentation. As things stabilize, these issues will be
| resolved, but it's very bleeding-edge right now.
| sitkack wrote:
| How does this compare to the techniques used in wasm3 (the
| name, not a description)?
|
| https://github.com/wasm3/wasm3
|
| architecture of M3 the interpreter core of wasm3
| https://github.com/wasm3/wasm3/blob/main/docs/Interpreter.md
|
| https://github.com/wasm3/wasm3/blob/main/docs/Performance.md
| 4984 wrote:
| I have looked into Meta Machines. They are not what I want.
|
| Meta Machines can be almost as fast, but I just prefer the
| cache locality of local functions.
|
| Wasm3 is slower than MiniVM when you don't count the GC. Wasm3
| is very cool!
| tonyg wrote:
| Hard to tell whether it supports proper tail calls or not. The
| readme mentions Scheme, so it's not impossible I suppose! One of
| the nice things about wasm is the formal semantics; a project
| like this could benefit from one of those, too.
| 4984 wrote:
| Author here. Tail calls can be implemented in MiniVM bytecode.
| There was once an instruction that performed a tail call; it
| bit-rotted due to not being used.
|
| MiniVM is moving very quickly currently. A lot still needs to
| be done, like JIT and optimizers.
|
| Eventually a formal specification of MiniVM should happen, just
| not yet as things are changing too fast.
| user-the-name wrote:
| Speaking of wasm, any idea how feasible it would be to
| convert wasm to minivm bytecode and use it as a wasm runner?
| ford_o wrote:
| How does this compare to MIR?
| mst wrote:
| This example seems a bit pessimised:
|
|         head_of_loop:
|           load_var x
|           push_int 1000
|           less_than
|           jump_if_false :end_of_loop
|           load_var x
|           increment
|           store_var x
|         end_of_loop:
|
| I'd more expect a stack engine to be something like -
|
|           load_var x
|         head_of_loop:
|           jump_lt_keep 1000 :end_of_loop
|           increment
|         end_of_loop:
|           store_var x
|
| in the sense that anything that can figure out how to populate
| and use r0 would likely also be able to figure out it can wrap
| the load/store calls around the outside of the loop.
|
| (I'm far from an expert on stack versus register VMs though so I
| may be missing several somethings here)
| tln wrote:
| That would be a nice optimization. Here's what Python does
| 0 LOAD_FAST 0 (x) 2 LOAD_CONST
| 1 (1000) 4 COMPARE_OP 0 (<)
| 6 POP_JUMP_IF_FALSE 18 8 LOAD_FAST
| 0 (x) 10 LOAD_CONST 2 (1)
| 12 BINARY_ADD 14 STORE_FAST 0
| (x) 16 JUMP_ABSOLUTE 0
| 18 LOAD_CONST 0 (None)
|
| Python's VM is very slow, but note only one arg per op and the
| compare + jump are not fused.
|
| BTW, if we're considering "advanced" optimization we might as
| well elide the whole loop :) e.g. https://godbolt.org/z/n3GznqKsM
|
|         test(int):
|                 push    rbp
|                 mov     rbp, rsp
|                 mov     DWORD PTR [rbp-4], edi
|                 jmp     .L2
|         .L3:
|                 add     DWORD PTR [rbp-4], 1
|         .L2:
|                 cmp     DWORD PTR [rbp-4], 999
|                 jle     .L3
|                 nop
|                 pop     rbp
|                 ret
| mst wrote:
| Well, sure, but I was thinking along the lines of "reserving
| a spot on the stack and reserving a register seem like about
| the same level of clever", especially thinking about how
| mixed register/stack argument passing works in even
| relatively stupid C compilers.
|
| Eliding the entire loop seems relatively similarly clever for
| either architecture, but would've counted as cheating in the
| frame I was pondering from.
|
| Still neat to see even if x86 asm continues to make my eyes
| cross ;)
| djwatson24 wrote:
| Binary trees is a classic GC benchmark - ideally a GC'd language
| should be able to do _better_ than a malloc/free implementation
| in C (tree.c), since it can release the whole tree in bulk (like
| ptree.c does).
|
| Also note that tree.c speeds up substantially if you link with a
| better malloc like jemalloc or tcmalloc.
___________________________________________________________________
(page generated 2022-01-08 23:00 UTC)