[HN Gopher] Emulating an emulator inside itself. Meet Blink
       ___________________________________________________________________
        
       Emulating an emulator inside itself. Meet Blink
        
       Author : 0xhiro
       Score  : 86 points
       Date   : 2023-01-04 19:42 UTC (3 hours ago)
        
 (HTM) web link (hiro.codes)
 (TXT) w3m dump (hiro.codes)
        
       | asciii wrote:
       | > Me: How small can an emulator be? Blink: Yes.
       | 
       | My favorite FAQ
        
       | mtlynch wrote:
       | Blink sounds cool, but this blog post is pretty thin. It's just
       | restating a handful of tweets about Blink by its author.
        
         | jart wrote:
         | Author of Blink here. Ask me anything :-)
        
           | monocasa wrote:
            | What do you attribute the perf win over Qemu to? A bunch of
            | micro-optimizations, fewer abstraction layers, or something
            | more systemic?
        
             | saagarjha wrote:
             | QEMU TCG is not particularly optimized for performance.
             | It's not all that hard to do better than it, especially if
             | you target only one architecture.
        
               | bonzini wrote:
                | It's also not that easy though, and blink's code
                | generation is comparable to QEMU's circa 2007.
                | 
                | I suspect the reason blink is faster is _not_ related to
                | code generation, as I mentioned in another comment.
        
             | jart wrote:
              | Blink is like a Tesla sports car whereas Qemu is like a
              | locomotive. I think what may be happening is that Qemu has
              | a lot of heavy-hitting optimizations that benefit long-
              | running compute-intensive programs. But if you just want
              | to run a big program like GCC ephemerally as part of your
              | build system, the cost of the locomotive gaining speed
              | doesn't pay off, since there's nothing to amortize over.
              | Blink's JIT also accelerates quickly because it uses a
              | printf-style DSL and it doesn't relocate. The tradeoff is
              | that JIT path construction sometimes fails and needs to be
              | retried.
             | 
              | Another great example of this tinier-is-better phenomenon
              | would be v8 vs. quickjs. Fabrice Bellard singlehandedly
              | wrote a JavaScript interpreter that runs the Test262 suite
              | something like 20x faster than Google's flagship V8
              | software, because once again, tests are ephemeral. It's
              | amazing how much quicker QuickJS is. But if you wanted to
              | do something like write a JS MPEG decoder to show
              | television advertisements without a <video> tag, then v8
              | is going to be faster, since it's a locomotive.
             | 
             | Fabrice Bellard wrote Qemu too. But I suspect his Tiny Code
             | Generator has gotten a lot heftier over the years as so
             | many people everywhere contributed to it. I really want to
             | examine his original source code, since I'd imagine what he
             | originally did probably looked a lot more like Blink than
             | it looks like modern Qemu.
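
              A rough illustration (not Blink's actual source) of what a
              printf-style JIT DSL can mean in practice: an emitter that
              walks a template of hex byte pairs and splices immediates
              directly into the code buffer, going straight from decoded
              instructions to machine bytes.

                  /* Toy sketch: hex pairs are copied verbatim, "%d" emits
                     a 32-bit immediate in host (x86 little-endian) byte
                     order. */
                  #include <stdarg.h>
                  #include <stdint.h>
                  #include <stdio.h>
                  #include <string.h>

                  static size_t jit_emit(uint8_t *buf, const char *fmt, ...) {
                    va_list ap;
                    size_t n = 0;
                    va_start(ap, fmt);
                    for (const char *p = fmt; *p; ++p) {
                      if (*p == ' ') continue;
                      if (*p == '%' && p[1] == 'd') {
                        uint32_t imm = va_arg(ap, uint32_t);
                        memcpy(buf + n, &imm, 4);
                        n += 4;
                        ++p;
                      } else {
                        unsigned byte;
                        sscanf(p, "%2x", &byte);
                        buf[n++] = (uint8_t)byte;
                        ++p;
                      }
                    }
                    va_end(ap);
                    return n;
                  }

                  int main(void) {
                    uint8_t code[32];
                    /* mov eax, 42 ; ret */
                    size_t n = jit_emit(code, "b8 %d c3", 42);
                    for (size_t i = 0; i < n; ++i) printf("%02x ", code[i]);
                    printf("\n");
                  }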
        
               | [deleted]
        
               | JoshTriplett wrote:
               | Would it be fair to describe Blink's JIT as more of a
               | "baseline JIT" to QEMU's "optimized JIT", or does that
               | analogy not accurately capture what you mean in the first
               | paragraph?
        
               | saagarjha wrote:
                | > Fabrice Bellard singlehandedly wrote a JavaScript
                | interpreter that runs the Test262 suite something like
                | 20x faster than Google's flagship V8 software
               | 
               | Something is wrong here. How did you test this? QuickJS
               | might start up faster on very small testcases but V8 is
               | not _that_ slow; it needs to have very low latency on a
               | webpage too. Did you run a debug build or something?
        
               | rcme wrote:
               | I have no knowledge of what allows QuickJS to run the
               | tests faster, or if it even does run the tests faster,
               | but QuickJS does have one big speed advantage over V8 in
               | some circumstances: QuickJS allows ahead-of-time
               | compilation of JS to byte code. This removes the need to
               | parse the JS at execution time. It's a pretty nifty
               | feature.
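
                A rough sketch (error handling omitted, names taken from
                the quickjs.h embedding interface) of that ahead-of-time
                path: compile once with JS_EVAL_FLAG_COMPILE_ONLY,
                serialize the bytecode, then later load and run it without
                reparsing any source. The qjsc tool essentially automates
                this.

                    #include <stdint.h>
                    #include <string.h>
                    #include "quickjs.h"

                    int main(void) {
                      const char *src = "1 + 2;";
                      JSRuntime *rt = JS_NewRuntime();
                      JSContext *ctx = JS_NewContext(rt);

                      /* Compile without executing. */
                      JSValue obj = JS_Eval(ctx, src, strlen(src), "demo.js",
                                            JS_EVAL_TYPE_GLOBAL |
                                            JS_EVAL_FLAG_COMPILE_ONLY);

                      /* Serialize the compiled bytecode. */
                      size_t len;
                      uint8_t *bc = JS_WriteObject(ctx, &len, obj,
                                                   JS_WRITE_OBJ_BYTECODE);
                      JS_FreeValue(ctx, obj);

                      /* Later: load and run without parsing JS text. */
                      JSValue fn = JS_ReadObject(ctx, bc, len,
                                                 JS_READ_OBJ_BYTECODE);
                      JSValue ret = JS_EvalFunction(ctx, fn); /* consumes fn */

                      JS_FreeValue(ctx, ret);
                      js_free(ctx, bc);
                      JS_FreeContext(ctx);
                      JS_FreeRuntime(rt);
                    }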
        
               | andai wrote:
               | Fascinating. Most JS code is ephemeral, i.e. rarely is
               | something as intensive as video encoding done in the
               | browser (and even then WebAssembly would usually be
               | preferred).
               | 
               | It seems to me like browsers would benefit from running
               | most code in QuickJS, and then spinning up V8 only for
               | those rare cases of long-running JS?
        
               | nightpool wrote:
               | "Ephemeral" is relative. Most JS code in the browser runs
               | for at least 30 seconds, if not longer, as the user
               | interacts with the page. That's plenty of time to spend
               | spare cycles on JITing in the background to make
               | responsiveness better without worrying about 100s of
               | milliseconds of startup / shutdown latency.
        
               | saagarjha wrote:
               | V8 is optimized for real-world use cases, not benchmarks.
               | Any modern browser will blow QuickJS out of the water for
               | anything that's non-trivial.
        
               | bonzini wrote:
               | Hi Justine, QEMU developer here. Great job on Blink! You
               | have done a lot of cool work and it's been fun to follow.
               | I enjoyed looking at different choices you made in the
               | frontend, for example flags handling is very different
               | from QEMU.
               | 
                | QEMU's code generator is actually pretty fast and
                | shouldn't really be expensive. It's a handful of passes
                | that are run on individual basic blocks; certainly not
                | optimal when a lot of code runs once, as is the case for
                | a very short compile, but it's nothing like v8.
               | 
                | I suspect an even sillier reason: startup time might
                | even be the biggest factor, because I think qemu-user's
                | startup has never been optimized. I assume both QEMU and
                | blink binaries are statically linked (or both dynamically
                | linked, alternatively)?
               | 
               | Anyhow these theories should be pretty easy to disprove
               | just by compiling something larger than hello world, so I
               | will do it in case there's some low-hanging fruit left.
        
               | pwdisswordfish9 wrote:
               | > Fabrice Bellard singlehandedly wrote a JavaScript
               | interpreter that
               | 
               | No he didn't.
        
               | [deleted]
        
               | googlryas wrote:
                | "Charlie Gordon was involved too" would, I suppose, be a
                | more constructive comment.
        
           | Y_Y wrote:
           | Why bother making this? (even if it is really cool)
        
             | trashburger wrote:
             | https://justforfunnoreally.dev/
        
               | jart wrote:
               | We do what we must because we can.
        
           | sidewndr46 wrote:
           | Where is the getting started guide for this?
        
             | jart wrote:
             | Here's a gentle introduction. https://github.com/jart/blink
             | /tree/master/third_party/sector... See also
             | https://justine.lol/sectorlisp2/
        
           | gabcoh wrote:
           | The comparison with QEMU is with KVM disabled, right?
           | Assuming this is true, how does it compare with KVM enabled?
        
             | fwsgonzo wrote:
              | KVM allows you to run guests directly on the CPU and has
              | native performance.
        
               | monocasa wrote:
               | Well, not quite 'native'. TLB refills are 4x to 5x as
               | expensive, and anything that needs a context switch tends
               | to be at a minimum twice as expensive, and it's common to
               | balloon even farther from there.
        
               | fwsgonzo wrote:
                | I guess that's mostly if you are running a full operating
                | system inside it, generally in Qemu. It doesn't have to
                | be - it could just be a program. Tiny programs running in
                | KVM can use big pages and never cause or require any
                | pagetable changes.
               | 
                | For simple workloads it can even be faster than native,
                | unless you dynamically load something that uses bigger
                | pages for your native program, e.g.
                | https://easyperf.net/blog/2022/09/01/Utilizing-Huge-
                | Pages-Fo...
        
               | monocasa wrote:
               | It's harder to force huge pages on a guest than it is to
               | just use them in regular user space where you can simply
               | mmap them in.
               | 
               | And none of that accounts for the increased context
               | switch time.
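
                A minimal sketch of what mapping huge pages in regular
                user space can look like (assumes the kernel has huge
                pages reserved, e.g. via /proc/sys/vm/nr_hugepages):

                    #define _GNU_SOURCE
                    #include <stdio.h>
                    #include <sys/mman.h>

                    int main(void) {
                      /* Ask for one 2 MB huge page directly. */
                      void *p = mmap(NULL, 2 << 20, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS |
                                     MAP_HUGETLB, -1, 0);
                      if (p == MAP_FAILED) perror("mmap");
                      else puts("got a 2 MB huge page");
                      return 0;
                    }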
        
               | fwsgonzo wrote:
                | The guest is not in control - sure, there are a few pages
                | at the beginning of each section that have to be 4k until
                | you reach the first 2MB multiple.
                | 
                | What context switch time? It takes 5 micros to enter and
                | leave the guest. The rest is just "workload".
               | 
               | The point is: KVM is native speed if you never have to
               | leave. I don't need to prove this for anyone to
               | understand it has to be true.
        
               | monocasa wrote:
               | > The guest is not in control
               | 
                | The guest has its own page tables above the nested guest
                | phys->host phys tables.
               | 
               | > What context switch time? It takes 5 micros to enter
               | and leave the guest. The rest is just "workload".
               | 
                | And then the kernel doesn't know what to do with nearly
                | every guest exit on KVM, so then you trap out to host
                | user space, which then probably can't do much without the
                | host kernel, so you transition back to kernel space to
                | actually perform whatever IO is needed, then back to host
                | user, then back to host kernel to restart the guest, then
                | back from host kernel to guest. So six total context
                | swaps on a good day:
                | guest->host_kern->host_user->host_kern->
                | host_user->host_kern->guest.
        
               | fwsgonzo wrote:
               | Right, that's very true! It's clear that you know what
               | you're talking about when it comes to KVM and maybe even
               | the internal structure in Linux. However, I/O can be
               | avoided. Imagine a guest that needs no I/O, doesn't have
               | any interrupts enabled, and simply runs a workload
               | straight on the CPU (given that it has all the bits it
               | needs). That is what I have made for $COMPANY, which is
               | in production, and serves a ... purpose. I can't really
               | elaborate more than I already have. But you get the gist
               | of it. It works great. It does the job, and it sandboxes
               | a piece of code at native speed. Lots of ifs and buts and
               | memory sharing and tricks to get it to be fast and low
               | latency. No need for JIT, which is a security and
               | complexity nightmare.
               | 
               | The topic of this thread is about Blink, which happens to
               | be a userspace emulator. Hence my comment.
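
                For concreteness, a minimal sketch of that kind of no-I/O
                KVM guest (error checks omitted; this follows the commonly
                documented KVM ioctl sequence): load a few bytes of
                real-mode code into one page of guest RAM, run it on the
                CPU until it executes HLT, and read the result back out of
                the registers. No device model, no interrupts, no JIT.

                    #include <fcntl.h>
                    #include <linux/kvm.h>
                    #include <stdio.h>
                    #include <string.h>
                    #include <sys/ioctl.h>
                    #include <sys/mman.h>

                    int main(void) {
                      /* mov ax, 42 ; hlt  (16-bit real mode) */
                      const unsigned char code[] = {0xb8, 0x2a, 0x00, 0xf4};

                      int kvm = open("/dev/kvm", O_RDWR);
                      int vm = ioctl(kvm, KVM_CREATE_VM, 0);

                      /* One page of guest RAM at guest-physical 0. */
                      void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                      memcpy(mem, code, sizeof(code));
                      struct kvm_userspace_memory_region region = {
                          .slot = 0,
                          .guest_phys_addr = 0,
                          .memory_size = 0x1000,
                          .userspace_addr = (unsigned long)mem,
                      };
                      ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

                      int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
                      int sz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
                      struct kvm_run *run = mmap(NULL, sz,
                                                 PROT_READ | PROT_WRITE,
                                                 MAP_SHARED, vcpu, 0);

                      /* Start at guest address 0 in real mode. */
                      struct kvm_sregs sregs;
                      ioctl(vcpu, KVM_GET_SREGS, &sregs);
                      sregs.cs.base = 0;
                      sregs.cs.selector = 0;
                      ioctl(vcpu, KVM_SET_SREGS, &sregs);
                      struct kvm_regs regs = {.rip = 0, .rflags = 2};
                      ioctl(vcpu, KVM_SET_REGS, &regs);

                      /* No I/O to service: run until the guest halts. */
                      do {
                        ioctl(vcpu, KVM_RUN, 0);
                      } while (run->exit_reason != KVM_EXIT_HLT);

                      ioctl(vcpu, KVM_GET_REGS, &regs);
                      printf("guest returned %llu\n",
                             (unsigned long long)regs.rax);
                      return 0;
                    }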
        
               | jart wrote:
               | I usually measure the functions I write in picoseconds
               | per byte, so 5 microseconds is an eternity.
        
               | [deleted]
        
               | bonzini wrote:
                | 10 ps/byte is equivalent to 100 GB/sec; unless you
                | routinely write functions that run in the tens of GB/sec
                | range, you probably mean nanoseconds?
        
             | monocasa wrote:
             | I think this is a user mode emulator, so qemu with kvm
             | isn't a great comparison.
        
               | jart wrote:
               | Blink is primarily a user mode emulator, but it does
               | support real mode BIOS programs. It can even bootstrap
               | Cosmopolitan Libc bare metal programs into long mode.
               | Here's a video of Blink doing just that. https://storage.
               | googleapis.com/justine/sectorlisp2/sectorlis...
        
               | [deleted]
        
               | gabcoh wrote:
               | Is this true? Why can't qemu use kvm for user mode
               | emulation?
        
               | monocasa wrote:
               | Nobody's really set it up to do that as it's easier to
               | use Linux's sandboxing features if you're looking to run
               | user code of the same cpu ISA. GVisor has an
               | (experimental last time I checked) backend that uses KVM
               | to run user mode code, but there you have the win of the
               | sandboxing code being written in a memory safe language
               | and giving you a real privilege boundary as opposed to
               | the sieve that qemu-user is. In just about every other
               | instance just running code natively in regular user space
               | (even if sandboxed with seccomp or a ptrace jail)
               | achieves the underlying goals better.
        
               | jart wrote:
               | It depends on whether you're more afraid of language bugs
               | or hardware bugs. One potentially nice thing about having
               | a tool like Blink that can fully virtualize the memory of
                | existing programs is that it's sort of like an extreme
               | version of ASLR. In order to virtualize a fixed address
               | space, you have to break apart memory into pieces and
               | shuffle them around into things like radix tries, and
               | that might provide enough obfuscation of the actual
               | memory to protect you from someone rowhammering your
               | system. I don't know if it's true but it'd be fun to
               | test.
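
                A toy sketch of that idea (not Blink's actual data
                structure): a radix trie keyed on the guest virtual
                address, where each 4 KB guest page maps to a host page
                that can live anywhere, so a fixed guest address says
                nothing about where the bytes physically sit.

                    #include <stddef.h>
                    #include <stdint.h>

                    #define FANOUT 4096  /* 12 bits per level, 4 KB pages */

                    /* Interior slots hold child nodes; the last level's
                       slots hold host page base pointers. */
                    struct Node { void *slot[FANOUT]; };

                    /* Host pointer for guest address va, or NULL. */
                    static void *Lookup(struct Node *root, uint64_t va) {
                      struct Node *l1, *l2;
                      uint8_t *page;
                      if (!(l1 = root->slot[(va >> 36) & 0xfff])) return NULL;
                      if (!(l2 = l1->slot[(va >> 24) & 0xfff])) return NULL;
                      if (!(page = l2->slot[(va >> 12) & 0xfff])) return NULL;
                      return page + (va & 0xfff);
                    }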
        
               | fathyb wrote:
               | KVM requires additional privileges. A Linux container
               | would need privileged rights and access to /dev/kvm to
                | run QEMU with KVM, for example, whereas any container
                | should be able to run it in user mode.
        
               | monocasa wrote:
               | That's not really an issue, as there's a lot of
               | infrastructure around optionally giving device file
               | access to containers. That's why
               | SECCOMP_IOCTL_NOTIF_ADDFD exists.
        
       | 0xhiro wrote:
       | Blink is a new CPU emulator written in C, made by Justine Tunney.
        | Besides having a really cool name, Blink has a lot of impressive
        | features, and some of them will blow your mind!
        
         | sitkack wrote:
         | Is it not too late to RiiR?
        
         | [deleted]
        
       | pdntspa wrote:
       | This guy got Doom running inside of Doom using a code execution
       | exploit
       | 
       | https://www.youtube.com/watch?v=c6hnQ1RKhbo
        
       | zamadatix wrote:
       | "blinkenlights" put a smile on my face.
       | 
          | Looks like it can't yet be compiled with Cosmopolitan Libc
          | itself (though it emulates programs compiled with it), but
          | it's planned - very cool!
        
         | jart wrote:
         | Author here. I'm planning to get Blink to compile with
          | Cosmopolitan Libc as soon as possible. There are just a few
          | switch statements that need to be refactored. There's a really
         | nice `cosmocc` toolchain that makes building POSIX software
         | with Cosmo easier than ever. See
         | https://github.com/jart/cosmopolitan/blob/master/tool/script...
         | and
         | https://github.com/jart/cosmopolitan/blob/master/tool/script...
        
           | jvolkman wrote:
           | Will compiling with Cosmopolitan enable it to run on Windows?
        
             | jart wrote:
             | Absolutely. If you download last year's release of
             | Blinkenlights, you can actually use this software on
             | Windows today. It works great in the Windows 10 command
             | prompt or powershell.
             | https://justine.lol/blinkenlights/download.html
        
               | jvolkman wrote:
               | Awesome. I actually started trying to get it to build
               | against mingw-w64 earlier today, but I guess I'll just
               | wait for you. :)
               | 
               | I'm not a windows user, but super interested in using
               | Blink to ship pre-compiled binaries as part of various
               | Bazel rule sets.
        
       | [deleted]
        
       | mhh__ wrote:
        | Is it actually 2x faster, or 2x faster at starting up? QEMU does
        | so much stuff; running cc1 on hello world isn't really a stress
        | test of the interpreter IMO as much as all the crap that goes
        | around it.
        
         | jart wrote:
          | Blink actually does run the GCC9 CC1 command from start to
          | finish twice as fast. Qemu takes 600ms to run it and Blink
          | takes 300ms. Both Qemu and Blink use a JIT approach. Since GCC
          | CC1 is a 33mb binary, running it stresses the JIT pretty hard
          | for much of its runtime.
         | https://twitter.com/JustineTunney/status/1610276286269722629
        
           | mhh__ wrote:
            | That's partly what I meant though: how fast is it at a
            | longer-running process? C doesn't require all that much
            | semantic analysis, so there usually isn't all that much hot
            | code in the compiler; it would suit a simple, fast JIT,
            | whereas QEMU does do some basic optimizations.
            | 
            | I've only ever really skimmed the TCG source code but it
            | wouldn't surprise me if a newer JIT could smack its arse,
            | given that with these old C codebases (it's probably one of
            | Bellard's few flaws) it's pretty hard to actually make true
            | architectural changes.
            | 
            | The Java/JavaScript (I think more JavaScript, but I'm
            | hedging my bets by including JVMs too) JITs are probably the
            | cutting edge, but I'd imagine still quite beatable for a few
            | cases.
        
       | muricula wrote:
       | At a glance, the debugger user interface looks much nicer than
       | gdb's terminal ui. How tightly coupled is the debugger interface
        | to the emulator/debugger engine? How much work would it be to
        | plug a different debugger, say lldb or gdb, into the ui
        | instead of blink?
       | 
       | I think the user experience of cli debuggers is generally
       | somewhat dreadful when compared to their gui cousins -- they seem
       | to display a much narrower view of what's going on. Could the big
       | blinkenlights debugger view be useful outside of blink itself?
        
         | saagarjha wrote:
          | Ideally Blink would just support the GDB remote serial
          | protocol (RSP), so you could directly use GDB or LLDB on the
          | emulator itself.
        
         | yurymik wrote:
          | You can go the other way around and use other TUIs for GDB:
          | 
          | * https://github.com/pwndbg/pwndbg
          | * https://github.com/longld/peda
          | * https://github.com/hugsy/gef
        
       | AaronFriel wrote:
       | This is really neat, but not to be confused with Blink, the name
       | of the browser engine underlying Google Chrome, Chromium, and
       | derivative browsers.
        
         | rcarr wrote:
         | The problem is compounded further considering the most popular
         | terminal emulator on iOS is also called Blink.
        
           | jart wrote:
           | Blink is short for Blinkenlights. See
           | https://github.com/jart/blink#blinkenlights and
           | https://justine.lol/blinkenlights/
        
           | saagarjha wrote:
           | Pretty sure there are several SSH clients that are several
           | times more popular than Blink :)
        
         | thriftwy wrote:
          | I was thinking they had compiled Blink to JavaScript and were
          | rendering web pages with it.
        
           | jart wrote:
           | We just managed to compile Blink to a 300kb javascript file
           | today. Follow https://github.com/jart/blink/issues/8 for
           | updates on our progress.
        
             | thriftwy wrote:
              | But I was thinking of an HTML renderer...
        
         | chazeon wrote:
         | There is also an iOS/iPadOS SSH Client called Blink [1], short
         | for Blink Shell, which I use almost daily.
         | 
         | [1]: https://blink.sh/
        
         | [deleted]
        
       | duxup wrote:
       | Video conferencing is mentioned.
       | 
       | Anyone know of any other use cases they have in mind?
       | 
        | I always hear tech spec rumors but never about anything I would
        | want to do with this type of thing... outside of, say, gaming?
        
         | bowmessage wrote:
         | I think you're looking for the Apple headset thread:
         | https://news.ycombinator.com/item?id=34250929
        
       ___________________________________________________________________
       (page generated 2023-01-04 23:00 UTC)