[HN Gopher] QEMU Internals
___________________________________________________________________
QEMU Internals
Author : Nusyne
Score : 258 points
Date : 2021-04-26 12:21 UTC (10 hours ago)
(HTM) web link (airbus-seclab.github.io)
(TXT) w3m dump (airbus-seclab.github.io)
| pthreads wrote:
| Thank you.
|
| On the same subject can someone recommend a book or any other
| resource to learn about virtual machine internals? My goal is to
| try to build a toy clone of VirtualBox/VMWare.
|
| So far I have found one -- Virtual Machines by James E. Smith and
| Ravi Nair.
| ahefner wrote:
| "KVM host in a few lines of code"
| (https://zserge.com/posts/kvm/) is a fun article to get started
| with.
| tkhattra wrote:
| Hardware and Software Support for Virtualization Synthesis
| Lectures on Computer Architecture (2017)
|
| https://www.morganclaypool.com/doi/abs/10.2200/S00754ED1V01Y...
|
| Bringing Virtualization to the x86 Architecture with the
| Original VMware Workstation (2012)
|
| https://dl.acm.org/doi/abs/10.1145/2382553.2382554
| hag wrote:
| I've always been intrigued by virtual machines and emulation as
| well. I've always wanted to try and make an emulator of some
| kind. I don't know much about the internals of VirtualBox, but
| my suggestion would be to start "easy" with one CPU/Computer
| System/Game Console and go from there. That's what I finally
| did with the 6502 and Commodore 64.
| pizza234 wrote:
| Conventionally, one starts from the CHIP-8, which is indeed a
| virtual machine rather than a system in a strict sense.
|
| What I've found difficult is the step beyond that. NES and
| GameBoy are typical steps, however, I've been very frustrated
| by the confusing documentation of the GameBoy. There are 3 or 4
| references, but one of them has significant mistakes, while
| another is incomplete. On the other hand, the Pan Docs should
| be complete and accurate.
|
| I'm not sure if there is an easy middle ground, that, at the
| same time, is also well documented.
|
| The Atari 2600 is architecturally simpler but less
| documented, and also requires very accurate timings. I've
| read somebody suggesting systems like Channel F, Astrocade
| and Odyssey2, but I'm not sure they're well documented.
|
| I personally lost interest once I found that building an
| emulator was essentially fighting specifications rather than
| actually building something.
| toast0 wrote:
| I built about a third of a NES emulator. The nesdev wiki is
| mostly decent, although there's a fair number of things
| where it seems like the first people to figure things out
| got stuff kind of backwards, and if you flip it, it's a lot
| easier. That's the sort of fighting the specifications I
| think you're talking about.
|
| All that said, emulating the CPU was pretty fun. There's a
| CPU test rom out there you can run with tracing and compare
| to the published results. I also got the background tiling
| from the PPU done, but the foreground processing has a lot
| of steps, so I indefinitely paused for now. Also, I had
| amazingly poor performance, so I wasn't super motivated to
| continue.
|
| The 2600 has a very similar cpu, but the very limited
| Stella output chip means most games are very timing
| dependent, which means you have to be super accurate, which
| adds difficulty. I think you should try to be cycle
| accurate anyway, but it's easy to mess that up, and having
| some freedom would be nice.
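The trace-comparison workflow toast0 describes can be sketched roughly as follows. This is only an illustration: the trace line format and the function name `first_divergence` are made up here, and a real reference log will have its own layout.

```python
from itertools import zip_longest


def first_divergence(reference_lines, emulator_lines):
    """Return (line_number, expected, actual) for the first mismatch, or None.

    A missing line on either side (one trace is shorter) counts as a mismatch.
    """
    pairs = zip_longest(reference_lines, emulator_lines)
    for number, (expected, actual) in enumerate(pairs, start=1):
        if expected is None or actual is None:
            return number, expected, actual
        if expected.strip() != actual.strip():
            return number, expected, actual
    return None


# Made-up trace lines: address, mnemonic, then register state after the step.
reference = ["8000 LDA A:01 X:00", "8002 TAX A:01 X:01", "8003 INX A:01 X:02"]
mine = ["8000 LDA A:01 X:00", "8002 TAX A:01 X:00", "8003 INX A:01 X:01"]

mismatch = first_divergence(reference, mine)
if mismatch is not None:
    line_number, expected, actual = mismatch
    print(f"line {line_number}: expected {expected!r}, got {actual!r}")
```

Stopping at the first mismatch matters because every later line is usually wrong as a consequence of the first bad instruction.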
| bambataa wrote:
| I did a GameBoy and similarly found the CPU enjoyable and
| the PPU a huge pain. Perhaps if I understood graphics
| better, I would have enjoyed it more, but like you say it
| just felt like a lot of steps.
| andrewf wrote:
| A subset of CP/M calls is a pretty simple "rest of the
| system" to implement on top of an 8080/Z80 CPU emulation.
| (It's a bit of a cheat - like qemu's "Linux user mode
| emulation" or early versions of DOSBox: because you restrict
| software to interacting with a high-level software
| interface, there are no lower-level details to emulate
| faithfully.)
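A minimal sketch of what andrewf's "subset of CP/M calls" might look like. CP/M programs reach the BDOS with a call to address 0005h and the function number in register C; function 2 prints the character in E and function 9 prints the '$'-terminated string at DE. The function `handle_bdos_call`, its parameters, and the idea of trapping the call inside the CPU loop are assumptions made for illustration, not part of any particular emulator.

```python
def handle_bdos_call(c, e, de, memory, out):
    """Service one trapped BDOS call and report whether it was recognised.

    c, e, de are the emulated register values, memory is the 64 KiB address
    space, and out is a list that collects console characters.
    """
    if c == 2:  # function 2: write the character in E to the console
        out.append(chr(e))
        return True
    if c == 9:  # function 9: write the string at DE, terminated by '$'
        for offset in range(0x10000):  # bounded, so a missing '$' cannot hang
            byte = memory[(de + offset) & 0xFFFF]
            if byte == ord("$"):
                break
            out.append(chr(byte))
        return True
    return False  # every other function is left unimplemented in this sketch


# Hypothetical usage: the CPU loop would call this when PC reaches 0x0005.
memory = bytearray(0x10000)
memory[0x0200:0x0206] = b"hello$"
console = []
handle_bdos_call(9, 0, 0x0200, memory, console)
handle_bdos_call(2, ord("!"), 0, memory, console)
text = "".join(console)
```

After handling the call, the emulator would simulate a return to the caller rather than executing any real BDOS code, which is exactly the "cheat" described above.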
| teleforce wrote:
| The sibling comment's book recommendation, "Hardware and
| Software Support for Virtualization", is on point, and it's
| written by one of the co-founders of VMware.
|
| Another book on Libvirt will be handy since it is the de facto
| API for most virtualization including VMs and containers[1].
|
| [1]https://www.amazon.com/Foundations-Libvirt-Development-
| Maint...
| alert0 wrote:
| Fuzz week shows how to make a snapshot / resettable
| jitting hypervisor.
|
| https://m.youtube.com/playlist?list=PLSkhUfcCXvqHsOy2VUxuoAf...
| vitno wrote:
| I work on virtual machines at Google. I usually suggest
| "Hardware and Software Support for Virtualization" [1] to new
| team members without a virtualization background.
|
| [1] https://www.amazon.com/Hardware-Software-Virtualization-
| Synt...
| [deleted]
| DarmokJalad1701 wrote:
| For a really simple emulator project (not quite the level of
| VirtualBox), check the "IntCode" challenges from AdventOfCode
| 2019.
| sammorrowdrums wrote:
| Those were so fun! I loved my little VM as it progressed and
| played pong, and commanded robots and rendered the output
| etc.
|
| It's a really great fun way to learn the key concepts.
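For a sense of how small such a VM can start out, here is a minimal sketch covering only the three opcodes of the first Intcode puzzle (1 = add, 2 = multiply, 99 = halt, all operands being memory addresses). The later puzzles add input/output, jumps and parameter modes, none of which are shown; the function name `run_intcode` is an arbitrary choice.

```python
def run_intcode(program):
    """Run a program and return the final memory image."""
    memory = list(program)  # copy, so the caller's program is not modified
    pc = 0
    while True:
        opcode = memory[pc]
        if opcode == 99:  # halt
            return memory
        a, b, dest = memory[pc + 1 : pc + 4]  # three address operands
        if opcode == 1:
            memory[dest] = memory[a] + memory[b]
        elif opcode == 2:
            memory[dest] = memory[a] * memory[b]
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc}")
        pc += 4  # every non-halt instruction is four cells wide


final = run_intcode([1, 9, 10, 3, 2, 3, 11, 0, 99, 30, 40, 50])
print(final[0])
```

Code and data share one memory array, so a program can rewrite its own instructions, which is part of what makes the later puzzles interesting.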
| junon wrote:
| This is very well organized, wow.
| whoisburbansky wrote:
| I don't mean this to disparage Airbus in any way but after
| Boeing's issues with the 737 MAX I'd assumed a fairly poor
| culture of software at airplane manufacturers in general. Super
| glad to see work like this coming out of Airbus, really makes me
| rethink my earlier assumptions about software competence in the
| field.
| Glawen wrote:
| Is "move fast and break things" a good culture for an airplane
| manufacturer? Airbus is known for making good software; they
| earned their reputation by releasing the first fly-by-wire
| airliner (A320) in '84, which forced Boeing to go this route
| with the 777.
|
| Making safety critical software is a totally different world
| than what is seen on HN. The culture needed is safety culture
| and it is all about doing boring code, following strict coding
| rules, doing tons of documentation and analysis prior to
| coding, and doing tons of review of tests. I don't think it will
| arouse interest here.
| Veserv wrote:
| That is such a bizarre viewpoint from my perspective. The
| absolute deathtrap that is the 737 MAX had two software-related
| critical failures in 400,000 flights. That constitutes a whole
| system per-flight software reliability of 2 in ~400,000, or
| ~99.9995%, 5 9s. Obviously that is still unacceptable as that
| is far below the software standard amongst all commercial
| airplanes where software has not been implicated in a crash for
| at least the last 10 years except for the 737 MAX. Even if we
| include the two 737 MAX crashes into the statistics, the whole
| system per-flight software reliability of all commercial
| airplanes over the last decade is at least 2 in ~100,000,000 or
| ~99.999998% or 7 9s. The standard in airplane software is
| literally 5000x more reliable than AWS SLA guarantees and 500x
| the holy grail in server software of 5 9s. Even the 737 MAX is
| 20x better than the AWS guarantee and 2x more reliable than 5
| 9s. Airplane software is not bad, we just rightfully expect a
| lot from systems that lives depend on, so even systems that are
| better than best-in-class non-safety software are completely
| unacceptable which may give the impression that they are bad in
| absolute terms as they fail to live up to our expectations.
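The arithmetic in Veserv's comment can be checked with a short sketch. It takes the comment's inputs at face value (2 failures in ~400,000 flights, 2 in ~100,000,000, a 99.99% SLA and a 99.999% "five nines" target) and treats a per-flight failure probability and an uptime fraction as directly comparable, which is the comment's own assumption. The variable names are illustrative.

```python
import math
from fractions import Fraction

# Failure probabilities, kept as exact fractions to avoid rounding noise.
max_failure = Fraction(2, 400_000)            # 737 MAX, per flight
fleet_failure = Fraction(2, 100_000_000)      # all commercial flights, per flight
sla_failure = 1 - Fraction(9_999, 10_000)     # 99.99% guarantee
five_nines_failure = 1 - Fraction(99_999, 100_000)


def nines(failure_probability):
    """Number of nines of reliability, e.g. a 1e-5 failure rate gives 5.0."""
    return -math.log10(float(failure_probability))


print(f"737 MAX: {nines(max_failure):.2f} nines")
print(f"fleet:   {nines(fleet_failure):.2f} nines")
print("fleet vs SLA:", sla_failure / fleet_failure)
print("fleet vs five nines:", five_nines_failure / fleet_failure)
print("737 MAX vs SLA:", sla_failure / max_failure)
print("737 MAX vs five nines:", five_nines_failure / max_failure)
```

The four ratios come out as the 5000x, 500x, 20x and 2x quoted in the comment.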
| zaphirplane wrote:
| That's an interesting way to look at uptime, no pun intended.
|
| Though I wouldn't buy a Toyota that exploded every 400,000
| trips worldwide, or bank with a bank that lost all my money
| every 400,000 transactions worldwide.
| Glawen wrote:
| Well, Toyota had the sticking gas pedal issue 10 years ago:
| they did not implement a brake override when the gas pedal
| was stuck. This was a recommended feature by European
| manufacturers when they introduced the electronic throttle,
| apparently Toyota didn't get the memo.
|
| Although I find the GM ignition key issue way worse than
| Toyota's, which was an oversight.
| Veserv wrote:
| Indeed, a Toyota with a critical fatality-inducing safety
| defect every 200,000 trips would be rightfully viewed as a
| deathtrap. Given that the average trip is probably
| somewhere around ~30 miles that would be a fatality per 6M
| miles versus the standard of ~60M miles in the US, or about
| 10x more dangerous. However, when comparing a car versus
| airplanes, given that they both fulfill the niche of
| transportation and are to some degree substitutable, a more
| reasonable analysis would be fatalities/person-hour or
| fatalities/person-mile. For fatalities/person-hour the
| average flight is something like ~2 hours. In the same
| amount of time 200,000 cars for 2 hours at an average of 40
| mph would be ~16M miles, so the 737 MAX is ~4x more
| dangerous on a person-hour basis than cars. If we go by
| distance the average flight is ~500 miles, so the 737 MAX
| had a fatality per 100M person-miles or is ~1.6x _safer_
| than driving. That is just how high our standards are with
| planes: a plane that is viewed as an absolute death machine,
| totally unfit for use, is safer than its primary alternative
| over an equivalent distance. A plane that is 100x worse than
| any other commercial plane is still better than the non-plane
| alternative on a per-distance basis.
|
| Obviously, this does not excuse their actions as they still
| made a system at least 100x more dangerous than the
| standard, but it should give perspective on the difficulty
| of the problems actually being solved. It is not a bunch of
| amateurs or below-average engineers who need to adopt basic
| practices. It is a bunch of highly-skilled professionals
| developing systems with a level of reliability far beyond
| what most software developers even think is possible. Even
| the abysmal processes of the 737 MAX that are far below the
| standard in the airplane industry would, relative to most
| software, be very good. It is just that the problems they
| need to solve are very, very, very hard and very good does
| not cut it when lives, not data, are at stake.
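The car-versus-plane comparison in the comment above boils down to three divisions. The sketch below simply replays them with the comment's own rough inputs (one fatal defect per 200,000 trips, ~30-mile car trips, ~60M road miles per fatality in the US, ~2-hour and ~500-mile flights, 40 mph average driving speed); none of these are measured values, and the variable names are illustrative.

```python
TRIPS_PER_FATAL_DEFECT = 200_000          # from 2 failures in ~400,000 flights
US_ROAD_MILES_PER_FATALITY = 60_000_000   # the comment's rough US figure

# 1. A car with the same per-trip defect rate, at ~30 miles per trip.
car_miles_per_fatality = TRIPS_PER_FATAL_DEFECT * 30
car_times_more_dangerous = US_ROAD_MILES_PER_FATALITY / car_miles_per_fatality

# 2. Per hour: miles a car covers in the same total time (2 h at 40 mph).
miles_driven_in_flight_time = TRIPS_PER_FATAL_DEFECT * 2 * 40
per_hour_times_more_dangerous = (
    US_ROAD_MILES_PER_FATALITY / miles_driven_in_flight_time
)

# 3. Per distance: miles flown per fatal defect at ~500 miles per flight.
flight_miles_per_fatality = TRIPS_PER_FATAL_DEFECT * 500
per_mile_times_safer = flight_miles_per_fatality / US_ROAD_MILES_PER_FATALITY

print(f"hypothetical car: {car_times_more_dangerous:.1f}x more dangerous")
print(f"per hour: {per_hour_times_more_dangerous:.2f}x more dangerous")
print(f"per mile: {per_mile_times_safer:.2f}x safer")
```

The per-hour ratio is 3.75, which the comment rounds to ~4x, and the per-mile ratio is about 1.67, quoted as ~1.6x.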
| elteto wrote:
| Apples to oranges? The scale between AWS and 737s is several
| orders of magnitude different. Boeing has a critical issue
| every 200k flights, or let's say 3.8M hours of flight time
| (assuming all flights are 19h, which they are not). Assume
| AWS has 1M CPUs total (they have way more than that). If AWS
| saw a critical CPU bug every 3.8M hours of CPU time, they
| would be having a 737 MAX-level crisis every 3.8 hours.
| Veserv wrote:
| One failure per 3.8M hours would be once per 433 CPU-years,
| so they probably actually do have somewhere between 10-100x
| that failure rate for their CPUs given that expected CPU
| lifetime is probably around 20-30 years. Even using a much
| more reasonable 2 hours per flight that is still ~45 CPU-
| years so still within the likely range of expected CPU
| errors. Also that is a comparison against a system so
| dangerous that it is unfit for use instead of the actual
| standard which is once per 50,000,000 flights or ~250x
| better.
|
| Even ignoring that, I am discussing the uptime of a system
| using AWS which only guarantees 99.99% uptime for AWS
| service in any given AWS region and only a 10% refund
| (which is less than their profit margin) as long as they
| keep your system up more than 99% of the time. Downtime for
| a system due to AWS downtime in a region constitutes a
| critical failure of AWS to deliver expected service. That
| their lack of service does not result in deaths unlike an
| airplane is immaterial to a reliability analysis, it only
| tells us if their critical failures matter and what level
| of reliability we should require/demand when making
| reliability-cost tradeoffs. In other words, the probability
| and costs of failure are not actually related. It is just
| that costly failures result in more effort being spent on
| developing mitigations. In the case of airplanes, critical
| failure in the form of a crash is very costly, so they take
| great pains to minimize the whole-system risk of that
| failure mode.
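The numbers traded in this exchange can be reproduced with a few lines. The inputs are the commenters' own assumptions (one critical issue per 200,000 flights, 19-hour or 2-hour flights, a fleet of 1M CPUs, one software-implicated crash per 50,000,000 flights as the industry baseline); the variable names and the 365.25-day year are choices made here.

```python
FLIGHTS_PER_FAILURE = 200_000
HOURS_PER_YEAR = 24 * 365.25

# elteto: 19-hour flights, and a fleet of 1M CPUs running in parallel.
hours_per_failure_19h = FLIGHTS_PER_FAILURE * 19
fleet_hours_between_failures = hours_per_failure_19h / 1_000_000

# Veserv: the same failure rate expressed in CPU-years.
cpu_years_19h = hours_per_failure_19h / HOURS_PER_YEAR
cpu_years_2h = FLIGHTS_PER_FAILURE * 2 / HOURS_PER_YEAR

# Veserv: the industry baseline relative to the 737 MAX rate.
baseline_improvement = 50_000_000 / FLIGHTS_PER_FAILURE

print(f"{hours_per_failure_19h:,} flight hours per failure")
print(f"one failure every {fleet_hours_between_failures} h across 1M CPUs")
print(f"{cpu_years_19h:.0f} CPU-years (19 h flights)")
print(f"{cpu_years_2h:.1f} CPU-years (2 h flights)")
print(f"baseline is {baseline_improvement:.0f}x better")
```

The 2-hour case works out to roughly 45.6 CPU-years, which matches the ~45 quoted above.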
| pjerem wrote:
| Airbus is known to be excellent in airplane software
| development.
|
| However, this is probably not about the airplane part of
| Airbus. Like Boeing, Airbus also have huge defense and space
| divisions.
| hhh wrote:
| Airbus also has the Airbus Defense and Space group; it's not
| all airplanes :)
| [deleted]
___________________________________________________________________
(page generated 2021-04-26 23:01 UTC)