[HN Gopher] Box86/Box64 vs. QEMU vs. FEX (Vs Rosetta2)
___________________________________________________________________
Box86/Box64 vs. QEMU vs. FEX (Vs Rosetta2)
Author : pantalaimon
Score : 113 points
Date : 2022-07-19 08:51 UTC (1 day ago)
(HTM) web link (box86.org)
(TXT) w3m dump (box86.org)
| CoastalCoder wrote:
| Anyone know why qemu is so slow vs. the others?
|
| The article discusses differences in floating point handling and
| GPU passthrough, but I don't think the 7z benchmark uses either
| of those.
| yjftsjthsd-h wrote:
| Possibly because they don't care as much. Until very recently,
| the heaviest use of qemu was to run hardware accelerated
| virtual machines on the same architecture. If you're using it
| with KVM/HAXM/whatever, it is fast. I expect they would be
| happy to take performance enhancements for emulation, but that
| it simply hasn't been a priority.
| lunixbochs wrote:
| TCG has historically had more of a focus on accuracy than
| performance. It lifts a lot of guest architectures to a lot of
| host architectures, and isn't particularly specialized to any
| given host cpu type. It lifts many instructions to C helpers
| instead of bothering to jit them. Last I checked it had no
| vector -> vector jit. It's also not single address mapped -
| memory IO undergoes indirection, which is expensive. I think
| Rosetta for example has a shared address space for the guest
| and host code. Honestly on 64-bit CPUs, especially with pointer
| authentication on M1, the risk of the guest accidentally
| messing with host/jit memory is low.
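
A toy sketch of the difference between indirect guest memory and a single shared address space, as described in the comment above. This is not QEMU's or Rosetta's actual code: the class names, the 4 KiB page size, and the dict-based page table are assumptions made purely for illustration.

```python
PAGE_BITS = 12                      # assumed 4 KiB pages, for illustration only
PAGE_SIZE = 1 << PAGE_BITS
PAGE_MASK = PAGE_SIZE - 1


class IndirectMemory:
    """Hypothetical model: every guest access goes through a page lookup."""

    def __init__(self):
        self.pages = {}             # guest page number -> backing bytearray

    def map_page(self, guest_addr):
        self.pages[guest_addr >> PAGE_BITS] = bytearray(PAGE_SIZE)

    def load8(self, guest_addr):
        page = self.pages[guest_addr >> PAGE_BITS]   # extra step on each access
        return page[guest_addr & PAGE_MASK]

    def store8(self, guest_addr, value):
        page = self.pages[guest_addr >> PAGE_BITS]   # extra step on each access
        page[guest_addr & PAGE_MASK] = value & 0xFF


class DirectMemory:
    """Hypothetical model: guest addresses index one flat buffer directly."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def load8(self, guest_addr):
        return self.buf[guest_addr]                  # no translation step

    def store8(self, guest_addr, value):
        self.buf[guest_addr] = value & 0xFF


indirect = IndirectMemory()
indirect.map_page(0x4000)
indirect.store8(0x4010, 0xAB)

direct = DirectMemory(0x10000)
direct.store8(0x4010, 0xCD)

print(hex(indirect.load8(0x4010)), hex(direct.load8(0x4010)))
```

In the direct model a load is a single index operation, but nothing separates guest data from whatever else lives in the same buffer, which is the trade-off the comment alludes to.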
| simjnd wrote:
| This is a great post, ptitSeb is doing an outstanding job on
| Box86/64. I'm also keeping an eye on FEX which is evolving very
| rapidly. There have been 4 releases since the linked blog post
| which was written in late March, introducing very welcome
| features such as support for pressure-vessel or better OpenGL and
| Vulkan thunking.
| olliej wrote:
| Yeah, I liked how they explicitly distinguished the benchmark
| apps that make significant use of x87, since supporting x87
| necessarily means an all-software floating point implementation:
| it's impossible to get close to native performance for x87
| heavy code on non-x86 architectures.
| lunixbochs wrote:
| I don't know if I agree with "impossible". There's a lot of
| performance left on the table with SoftFloat. A non x86
| architecture can add an 80-bit FPU if they want. There are
| architectures with 128-bit float, and CPUs with FPGA
| coprocessors. I suspect x87 is also not the most optimized
| path in modern x86 cpus (some instructions may even be fully
| emulated in microcode).
|
| Realistically, an x87-specific JIT could do significant
| instruction reordering, lift/reoptimize the underlying code
| (much of existing x87 code was compiled a very long time ago
| on older compilers), and vectorize the underlying integer
| float emulation, or even trace and move some computation to
| another core or a coprocessor like a GPU or DSP (often idle
| in embedded cpus).
|
| Many games work fine with x87 lowered to 64-bit or even
| 32-bit floats, and depending on the workload there's a middle
| ground where you could understand (or approximate) the
| current level of precision error for a value, generally run
| at a lower precision, and trace operations / "catch up" on
| precision at batched intervals.
| olliej wrote:
| Sorry, it is obviously possible to add hardware support for
| the 80bit ieee754 format (the format itself is not great,
| and in reality the precision is necessary only in the
| most extreme cases, and those where it is are likely to
| prefer 128bit float), but it isn't something that is going
| to happen in the real world, and even if it were, we're
| talking about software for generally available systems.**
|
| You could also emulate it by arbitrarily dropping
| precision, but as a translator that means breaking
| bincompat, and more importantly breaking programs that use
| the 80bit format (a lot of Fortran).
|
| Obviously many games (especially old ones) perform fine as
| they're only using 80bit because at the time x87 was the
| only hardware fp available on x86 hardware, not because
| they needed that precision.
|
| Even lowering the precision of the x87 unit isn't
| sufficient as that only reduces the precision of the
| mantissa not the exponent.
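
A minimal sketch of the mantissa-versus-exponent point, assuming the standard format parameters: IEEE 754 binary64 has a maximum unbiased exponent of 1023, while the x87 80-bit format has 16383. The helper `fits_exponent` is hypothetical and ignores rounding at the very top of each range; exact rational arithmetic stands in for the wider exponent.

```python
import math
from fractions import Fraction

DOUBLE_MAX_EXP = 1023    # largest unbiased exponent of IEEE 754 binary64
X87_MAX_EXP = 16383      # largest unbiased exponent of the x87 80-bit format


def fits_exponent(value, max_exp):
    """True if |value| < 2**(max_exp + 1), i.e. the exponent range is not exceeded."""
    return abs(value) < Fraction(2) ** (max_exp + 1)


a = 1e200

# With 64-bit doubles the intermediate a * a overflows to infinity,
# so dividing by a again cannot recover the original value.
double_result = (a * a) / a

# The exact intermediate is roughly 1e400: too large for a double's
# exponent, but well inside the 15-bit exponent range of the x87 format.
exact_intermediate = Fraction(a) * Fraction(a)

print("double result:", double_result)
print("fits double exponent:", fits_exponent(exact_intermediate, DOUBLE_MAX_EXP))
print("fits x87 exponent:", fits_exponent(exact_intermediate, X87_MAX_EXP))
```

Code whose intermediates stray outside the double exponent range is where a lowered-precision emulation would diverge from real x87 hardware, even if the mantissa precision were set to match.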
|
| Even outside of the core arithmetic (excluding negation
| which is really easy in all ieee754 formats) there is a
| whole bunch of state that you need to keep track of to
| ensure identical behavior.
|
| Obviously if you are willing to break precision guarantees,
| etc then breaking state isn't a problem, but if you're
| trying to be something like Rosetta - eg completely general
| and running anything - you don't really have the freedom to
| do that.
|
| ** Sorry, while skim reading I missed your 128bit and x87 perf
| questions. Yes an emulator can (should?) use hw 128bit for
| the arithmetic if it's available but on the vast majority of
| hardware it isn't.
|
| You are also right about x87 perf being slow compared to
| everything else, but it's still faster than anything you
| can do in software (addition especially does not interact
| nicely) due to the GRS (guard/round/sticky bit) tracking a
| software impl needs to do through many bitwise operations.
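
A rough sketch of the guard/round/sticky bookkeeping a software float implementation has to do, using plain Python integers as significands. The function names are hypothetical and this is not taken from any particular softfloat library; it only shows round-to-nearest-even on a significand that carries extra low bits, plus the sticky-preserving right shift used when aligning operands for addition.

```python
def shift_right_sticky(sig, n):
    """Shift right by n bits, OR-ing any shifted-out 1 bits into the lowest bit."""
    if n == 0:
        return sig
    lost = sig & ((1 << n) - 1)
    return (sig >> n) | (1 if lost else 0)


def round_to_nearest_even(sig, extra_bits):
    """Drop the low extra_bits of sig, rounding to nearest with ties to even."""
    if extra_bits < 2:
        raise ValueError("need at least guard and round bits")
    kept = sig >> extra_bits
    guard = (sig >> (extra_bits - 1)) & 1        # first discarded bit
    round_bit = (sig >> (extra_bits - 2)) & 1    # second discarded bit
    sticky = int(sig & ((1 << (extra_bits - 2)) - 1) != 0)  # OR of the rest
    # Round up when the discarded part is above one half, or exactly
    # one half with an odd kept value (ties to even).
    if guard and (round_bit or sticky or (kept & 1)):
        kept += 1   # a carry out of the top bit would need renormalizing
    return kept


print(round_to_nearest_even(0b1011100, 3))  # tie, kept value odd
print(round_to_nearest_even(0b1010100, 3))  # tie, kept value even
print(round_to_nearest_even(0b1010101, 3))  # just above one half
print(shift_right_sticky(0b10001, 3))
```

Hardware does this in one step per operation; in software each step above turns into several shifts, masks, and branches, which is part of why addition in particular is costly to emulate.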
| lunixbochs wrote:
| My middle paragraph up-thread proposes that you can
| emulate it much faster than we're doing now, at full
| precision with integer SIMD and a specialized JIT. I'll
| reiterate the 80-bit softfloat stuff I've seen in use now
| is not really optimized. I suspect that beating the
| performance of a cpu on x87 from the era where x87 was
| relevant is somewhere between realistic and trivial.
| Beating a modern cpu on x87 from another architecture
| still feels possible (but it's a less useful thing to
| spend time on).
|
| > Even lowering the precision of the x87 unit isn't
| sufficient as that only reduces the precision of the
| mantissa not the exponent.
|
| I don't know what you mean by "isn't sufficient". To be
| clear, I'm speaking from experience emulating x86 games
| on low resource arm devices, where I had success
| emulating x87 in lower precision.
|
| For QEMU, IMO the bigger performance issue is that it
| doesn't natively JIT _any_ FPU or vector instructions,
| and the indirect memory mapping hurts general performance
| quite a bit too.
| lunixbochs wrote:
| I'm excited Rosetta2 can be used in Linux VMs as of macOS
| Ventura. QEMU tends to be the most accurate emulation option for
| me on Linux and is nowhere near the speed of Rosetta. (Rosetta is
| quite accurate as well, it just wasn't available for Linux). I
| only have one remaining edge case with Rosetta around the FPU
| config register not behaving quite the same way as an x86 CPU,
| everything else has been great.
|
| FEX can be quite fast at some workloads, but was slower than QEMU
| for others, and had some glitches for me. I ended up porting my
| app to arm64 Linux for Linux dev on M1 rather than continue to
| slog through the issues I had with emulation on Linux.
|
| > I couldn't include FEX in the bench as it's not compatible with
| the 16k page actually used on Asahi/M1.
|
| FEX ran fine for me in a Parallels VM on M1.
| ThatPlayer wrote:
| I've set up box86 on an ARM gaming handheld with a Qualcomm SDM845
| recently and it's pretty amazing what it can do. The SDM845 has
| pretty good Linux support (with postmarketOS supporting the
| mainline kernel [0]). The open source drivers for the Adreno GPU
| even support Vulkan and full desktop OpenGL.
|
| With box86/box64, I've been able to run Steam and even
| Wine/Proton with DXVK translating DirectX to Vulkan. I can even
| run older 3D Windows games like Skyrim! Though it did glitch on
| the infamous cart intro.
|
| [0] https://wiki.postmarketos.org/wiki/SDM845_Mainlining
| phh wrote:
| Can you share more about your setup? I also have an SDM845
| gaming handheld (Ayn Odin, currently running Android), and I
| was contemplating installing Windows on it to get Steam, but I
| much prefer your way.
|
| Do you have some clean distro that boots into something usable
| without mouse/keyboard? Some documentation for first stage
| boots? Gits?
| ThatPlayer wrote:
| Yes, that's the same one I have. It's a similar install to
| Windows, running an edk2 bootloader on another partition. The
| developer for that has released a Debian 11 install that has
| working touch controls and software keyboard, though I have
| been using a USB-C hub with actual mouse and keyboard for
| setup as the UI isn't scaled well for a 5" screen.
|
| https://github.com/ProjectValhalla/OdinMultiBootGuides
|
| I don't think it's ready yet for full time use, as the
| joystick is mapped incorrectly for most games, but something
| to keep an eye on.
| eptcyka wrote:
| What's the device you're using? Color me interested.
| ThatPlayer wrote:
| The Ayn Odin. I'm not sure I'd recommend it with all the new
| x86_64 gaming handhelds coming out soon with similar
| pricing, better GPU drivers, and not having to deal with
| box86 compatibility issues.
|
| https://liliputing.com/2022/06/compare-handheld-gaming-pc-sp...
| rnk wrote:
| I don't have an Apple ARM device, I was waiting until I could run
| x86 VMs reasonably efficiently, because I need that all the time.
| Up to now it seemed the answer was it's too slow if you want to
| use x86 basically in a normal way with a VM. This article
| suggests Rosetta2 would let you have usable performance, can
| someone provide the high level view? Rosetta2 was about 2/3 the
| speed of native exec on that last benchmark in the article.
| olliej wrote:
| You would be able to run x86 code on an arm Linux vm. There is
| no VM option better than qemu or similar.
|
| The problem, as far as I can infer, is that a binary
| translator is given a bunch of context a full VM can't have
| (which random clump of bytes is an executable, etc).
___________________________________________________________________
(page generated 2022-07-20 23:02 UTC)