[HN Gopher] Weird things I learned while writing an x86 emulator
___________________________________________________________________
Weird things I learned while writing an x86 emulator
Author : fanf2
Score : 220 points
Date : 2024-07-10 14:42 UTC (8 hours ago)
(HTM) web link (www.timdbg.com)
(TXT) w3m dump (www.timdbg.com)
| pm2222 wrote:
| Prior discussion here
| https://news.ycombinator.com/item?id=34636699
|
| Cannot believe it's been 16 months. How time flies.
| AstroJetson wrote:
| Check out Justine Tunney and her emulator.
| https://justine.lol/blinkenlights/
|
| The docs are an amazing tour of how the cpu works.
| zarathustreal wrote:
| Astonishing... they never cease to amaze me.
| sdsd wrote:
| What a cool person. I really enjoy writing assembly; it feels
| so simple, and I enjoy its vertical aesthetic quality.
|
| The closest I've ever come to something like OP (which is to say,
| not close at all) was when I was trying to help my JS friend
| understand the stack, and we ended up writing a mini vm with its
| own little ISA:
| https://gist.github.com/darighost/2d880fe27510e0c90f75680bfe...
|
| This could have gone much deeper - I'd have enjoyed that, but
| doing so would have detracted from the original educational
| goal lol. I should contact that friend and see if he still
| wants to study with me. It's hard since he's making so much
| money doing fancy web dev, he has no time to go deep into
| stuff. Whereas my unemployed ass is basically an infinite
| ocean of time and energy.
| actionfromafar wrote:
| You should leverage that into your friend teaching you JS,
| maybe.
| dmitrygr wrote:
| I've written fast emulators for a dozen non-toy architectures
| and JIT translators for a few of them as well. x86 still
| gives me PTSD. I have never seen a messier architecture.
| There is history, and a reason for it, but still ... damn.
| Arech wrote:
| Haha, man, I feel you :DD You probably should have started with
| it from the very beginning :D
| jcranmer wrote:
| > I have never seen a messier architecture.
|
| Itanium. Pretty much every time I open up the manual, I find a
| new thing that makes me go "what the hell were you guys
| thinking!?" without even trying to.
| snazz wrote:
| What sorts of projects are you working on that use Itanium?
| jcranmer wrote:
| None, really. I just happened to get a copy of the manual
| and start idly reading it when my computer got stuck in a
| very long update-reboot cycle and I couldn't do anything
| other than read a physical book.
| WalterBright wrote:
| That's why my music is played from a separate machine.
| trealira wrote:
| Studying the x86 architecture is kind of like studying
| languages with lots of irregularities and vestigial bits, and
| with competing grammatical paradigms, e.g. French. Other
| architectures, like RISC-V and ARMv8, are much more consistent.
| x0x0 wrote:
| I think English may be a better example; we just stapled
| chunks of Vulgar Latin to an inconsistently simplified Proto-
| Germanic and then borrowed words from every language we met
| along the way. Add in 44 sounds serialized to the page with
| 26 letters and tada!
| WalterBright wrote:
| The Norman conquest of England means English is a language
| with barbarian syntax and French nouns. It's a happy mess
| of cultural appropriation!
|
| Squeezing the lot into 26 characters was simply genius -
| enabling printing with movable type, Morse code, Baudot
| code, ASCII, etc.
|
| Of course, then icons came along and ruined everything.
| aengelke wrote:
| > Other architectures, like [...] ARMv8, are much more
| consistent.
|
| From an instruction/operation perspective, AArch64 is more
| clean. However, from an instruction operand and encoding
| perspective, AArch64 is a lot less consistent than x86.
| Consider the different operand types: on x86, there are a
| dozen register types, immediate (8/16/32/64 bits), and memory
| operands (always the same layout). On AArch64, there's: GP
| regs, incremented GP reg (MOPS extension), extended GP reg
| (e.g., SXTB), shifted GP reg, stack pointer, FP reg, vector
| register, vector register element, vector register table,
| vector register table element, a dozen types of memory
| operands, conditions, and a dozen types of immediate
| encodings (including the fascinating and very useful, but
| also very non-trivial encoding of logical immediates [1]).
|
| AArch64 also has some register constraints: some vector
| operations can only encode registers 0-15 or 0-7; not to
| mention SVE with its "movprfx" prefix instruction that is
| only valid in front of a few selected instructions.
|
| [1]: https://github.com/aengelke/disarm/blob/master/encode.c#
| L19-...
| trealira wrote:
| Admittedly, I never wrote an assembler, but the encoding of
| x86-64 seems pretty convoluted [0] [1], with much of the
| information smeared across bits in the ModRM and SIB bytes
| and then extended by prefix bytes. It's more complicated
| than you would assume from having only written assembly in
| text and then having the assembler encode the instructions.
|
| One of the things that sticks out to me is that on x86-64,
| operations on 32-bit registers implicitly zero out the upper
| 32 bits of the corresponding 64-bit register (e.g. "MOV EAX,
| EBX" also zeroes out the upper 32 bits of RAX), except for
| the opcode 0x90, which, logically, would be the accumulator
| encoding of "XCHG EAX, EAX" but is special-cased to do
| nothing because it's the traditional NOP instruction for x86.
| So you have to use the other encoding, 87 C0.
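|
| To illustrate, here's a minimal sketch of how an emulator
| might model that quirk (the helper names are made up, not
| taken from any real code base):
|
|     /* 32-bit register writes zero-extend into the full
|      * 64-bit register, so opcode 0x90 (NOP) must be
|      * special-cased rather than decoded as "XCHG EAX, EAX",
|      * which would clear the top half of RAX. */
|     #include <stdint.h>
|
|     static uint64_t regs[16];          /* RAX..R15 */
|
|     static void write_gpr32(int r, uint32_t v) {
|         regs[r] = v;             /* upper 32 bits become 0 */
|     }
|
|     static void exec_xchg_r32(int a, int b) {   /* 87 /r */
|         uint32_t ta = (uint32_t)regs[a];
|         uint32_t tb = (uint32_t)regs[b];
|         write_gpr32(a, tb);
|         write_gpr32(b, ta);
|     }
|
|     static void exec_nop(void) {
|         /* 0x90: RAX is left completely untouched. */
|     }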
|
| The fact that AArch64 instructions are almost always 4 bytes,
| and the fact that they clearly thought out their instruction
| set more (e.g. instead of having CMP, they just use SUBS
| (subtract and set condition flags) and write the result to
| the zero register, which is always zero), are what made me
| say that it seemed more regular. I hadn't studied it in as
| close detail as x86-64, though. But, you've written an ARM
| disassembler and I haven't; you seem to know more about it
| than me, so I believe what you're saying.
|
| > AArch64 also has some register constraints
|
| x86-64 also has some, in that instructions can't encode the
| high-byte registers (AH, CH, DH, BH) when a REX prefix is
| used.
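|
| A small illustrative lookup (hypothetical, not taken from any
| particular decoder) of how the 8-bit register namespace
| shifts as soon as any REX prefix is present:
|
|     /* With any REX prefix, 8-bit encodings 4-7 name
|      * SPL/BPL/SIL/DIL instead of AH/CH/DH/BH, so the
|      * high-byte registers become unreachable. */
|     static const char *gpr8_name(unsigned enc, int has_rex) {
|         static const char *norex[8] =
|             { "al","cl","dl","bl","ah","ch","dh","bh" };
|         static const char *rex[16] =
|             { "al","cl","dl","bl","spl","bpl","sil","dil",
|               "r8b","r9b","r10b","r11b",
|               "r12b","r13b","r14b","r15b" };
|         return has_rex ? rex[enc & 15] : norex[enc & 7];
|     }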
|
| [0]: https://wiki.osdev.org/X86-64_Instruction_Encoding
|
| [1]: http://www.c-jump.com/CIS77/CPU/x86/lecture.html
| WalterBright wrote:
| > you've written an ARM disassembler
|
| Here's my AArch64 disassembler work in progress:
|
| https://github.com/dlang/dmd/blob/master/compiler/src/dmd
| /ba...
|
| I add to it in tandem with writing the code generator. It
| helps flush out bugs in both. I.e., generate the instruction,
| then disassemble it and compare with what I thought it
| should be.
|
| It's quite a bit more complicated than the corresponding
| x86 disassembler:
|
| https://github.com/dlang/dmd/blob/master/compiler/src/dmd
| /ba...
| WalterBright wrote:
| As I'm currently implementing an AArch64 code generator for
| the D language dmd compiler, the inconsistency of its
| instructions is equaled and worsened by the clumsiness of
| the documentation for it :-/ But I'm slowly figuring it
| out.
|
| (For example, some instructions with very different
| encodings have the same mnemonic. Arrgghh.)
| SunlitCat wrote:
| Haha! Writing an x86 emulator! I still remember writing a toy
| emulator which was able to execute something around the first
| 1000-ish lines of a real BIOS (and then it got stuck or
| looped when it started to access ports or so; I can't
| remember, it was too long ago, and I didn't continue it as I
| started to get into DirectX and modern C++ more).
| aengelke wrote:
| Bonus quirk: there's BSF/BSR, for which the Intel SDM states that
| on zero input, the destination has an undefined value. (AMD
| documents that the destination is not modified in that case.) And
| then there's glibc, which happily uses the undocumented fact that
| the destination is also unmodified on Intel [1]. It took me quite
| some time to track down the issue in my binary translator.
| (There's also TZCNT/LZCNT, which is BSF/BSR encoded with an
| F3 prefix -- a prefix that is silently ignored on older
| processors not supporting the extension. So the same code
| will behave differently on different CPUs. At least, that's
| documented.)
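|
| A minimal sketch of the workaround in an emulator or binary
| translator (illustrative names, assuming you want to match
| the de facto "destination unmodified on zero input" behavior
| that glibc relies on):
|
|     /* BSF with a zero source: set ZF and deliberately leave
|      * the destination register untouched, matching what AMD
|      * documents and what Intel hardware does in practice. */
|     #include <stdint.h>
|
|     static void emu_bsf32(uint32_t src, uint32_t *dst,
|                           int *zf) {
|         if (src == 0) {
|             *zf = 1;        /* *dst intentionally unchanged */
|             return;
|         }
|         *zf = 0;
|         *dst = (uint32_t)__builtin_ctz(src); /* lowest set bit */
|     }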
|
| Encoding: People often complain about prefixes, but IMHO
| that's far from the worst thing. It is well known and
| somewhat well documented. There are worse quirks: For
| example, REX/VEX/EVEX.RXB
| extension bits are ignored when they do not apply (e.g., MMX
| registers); except for mask registers (k0-k7), where they trigger
| #UD -- also fine -- except if the register is encoded in
| ModRM.rm, in which case the extension bit is ignored again.
|
| APX takes the number of quirks to a different level: the REX2
| prefix can encode general-purpose registers r16-r31, but not
| xmm16-xmm31; the EVEX prefix has several opcode-dependent
| layouts; and which extension bits are used for a register
| depends on the register type (XMM registers use X3:B3:rm and
| V4:X3:idx; GP registers use B4:B3:rm, X4:X3:idx). I can't
| give a complete list
| yet, I still haven't finished my APX decoder after a year...
|
| [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=31748
| CoastalCoder wrote:
| Can you imagine having to make all this logic work
| faithfully, let alone _fast_, in silicon?
|
| X86 used to be Intel's moat, but what a nightmarish burden to
| carry.
| dx4100 wrote:
| Did people just... do this by hand (in software), transistor
| by transistor, or was it laid out programmatically in some
| sense? As in, were segments created algorithmically, then
| repeated to obtain the desired outcome? CPU design baffles
| me, especially considering there are 134 BILLION transistors
| or so in the latest i7 CPU. How does the team even keep track
| of, work on, or even load the files to WORK on the CPUs?
| tristor wrote:
| They use EDA (Electronic Design Automation) software, there
| are only a handful of vendors, the largest probably being
| Mentor Graphics, now owned by Siemens. So, yes, they use
| automation to algorithmically build and track/resolve
| refactors as they design CPUs. CPUs are /generally/ block-
| type designs these days, so particular functions get
| repeated identically in different places and can be
| somewhat abstracted away in your EDA.
|
| It's still enormously complex, and way more complex than
| the last time I touched this stuff more than 15 years ago.
| rikthevik wrote:
| I love that the EDA industry still uses Tcl heavily.
| Warms my heart.
| monocasa wrote:
| It's written in an HDL; IIRC both Intel and AMD use
| verilog. A modern core is on the order of a million or so
| lines of verilog.
|
| Some of that will be hand placed, quite a bit will just be
| thrown at the synthesizer. Other parts like SRAM blocks will
| have their CAD generated directly from a macro and a
| description of the block in question.
| cogman10 wrote:
| To further expound on this: ASIC design (like AMD's CPUs) is
| a lot like software work. The engineers who create most of
| the digital logic aren't dealing with individual transistors;
| instead they are saying "give me an accumulator for this
| section of code" and the HDL provides it. The definition of
| that module exists elsewhere and is shared throughout the
| system.
|
| This is how the complexity can be wrangled.
|
| Now, MOST of the work is automated for digital logic.
| However, we live in an analog world. So, there is (As far
| as I'm aware) still quite a bit of work for analog
| engineers to bend the analog reality into digital. In the
| real world, changing current creates magnetic fields
| which means you need definitions limiting voltages and
| defining how close a signal line can be to avoid cross
| talk. Square waves are hard to come by, so there's effort
| in timing and voltage bands to make sure you aren't
| registering a "1" when it should have been a "0".
|
| Several of my professors were Intel engineers. From what
| they told me, the ratios of employment were something like
| 100 digital engineers to 10 analog engineers to 1
| physicist/materials engineer.
| bonzini wrote:
| It was entirely laid out by hand until the 286. Using
| standard cells in the 386 enabled the switch from microcode
| to a mostly hardwired core.
| gumby wrote:
| A lot of this is done in software (microcode). But even in
| that case, your statement still holds: "Can you imagine
| having to make all this logic work faithfully, let alone
| fast, in the chip itself?" Writing that microcode must be
| fiendishly difficult given all the functional units, out of
| order execution, register renaming...
| bonzini wrote:
| The crazy parts that were mentioned in the parent comment
| are all part of the hot path. Microcode handles slow paths
| related to paging and segmentation, and very rare
| instructions. Not necessarily unimportant (many common
| privileged instructions are microcoded) but still rare
| compared to the usual ALU instructions.
|
| But it's not a huge deal to program the quirky encoding in
| an HDL, it's just a waste of transistors. The really
| complicated part is the sequencing of micro operations and
| how they enter the (out of order) execution unit.
| aengelke wrote:
| > A lot of this is done in software (microcode).
|
| No, that hasn't been the case for >30 years. Microcode is
| only used for implementing some complex instructions (mostly
| system instructions). Most regular instructions (and the
| rest of the core) don't use microcode and their expansions
| into uOps are hardwired. Also the entire execution unit is
| hardwired.
|
| There are typically some undocumented registers (MSRs on
| x86) that can control how the core behaves (e.g., kill
| switches for certain optimizations). These can then be
| changed by microcode updates.
| im3w1l wrote:
| It's a lot easier to more or less accidentally create
| something quirky than it is to create a second quirk-for-
| quirk compatible system.
| bonzini wrote:
| On and off over the last year I have been rewriting QEMU's x86
| decoder. It started as a necessary task to incorporate AVX
| support, but I am now at a point where only a handful of
| opcodes are left to rewrite, after which it should not be too
| hard to add APX support. For EVEX my plan is to keep the raw
| bits until after the opcode has been read (i.e. before
| immediates and possibly before modrm) and the EVEX class
| identified.
|
| My decoder is mostly based on the tables in the manual, and the
| code is mostly okay--not too much indentation and phases mostly
| easy to separate/identify. Because the output is JITted code,
| it's ok to not be super efficient and keep the code readable;
| it's not where most of the time is spent. Nevertheless there
| are several cases in which the manual is wrong or doesn't say
| the whole story. And the tables haven't been updated for
| several years (no K register instructions, for example), so
| going forward there will be more manual work to do. :(
|
| The top comment explains a bit what's going on:
| https://github.com/qemu/qemu/blob/59084feb256c617063e0dbe7e6...
|
| (As I said above, there are still a few instructions handled by
| the old code predating the rewrite, notably BT/BTS/BTR/BTC. I
| have written the code but not merged it yet).
| aengelke wrote:
| Thanks for the pointer to QEMU's decoder! I actually never
| looked at it before.
|
| So you coded all the tables manually in C -- interesting,
| that's quite some effort. I opted to autogenerate the tables
| (and keep them as data only => smaller memory footprint)
| [1,2]. That's doable, because x86 encodings are mostly fairly
| consistent. I can also generate an encoder from it (ok, you
| don't need that). Re 'custom size "xh"': AVX-512 also has
| fourth and eighth. Also interesting that you have a separate
| row for "66+F2". I special case these two (CRC32, MOVBE)
| instructions with a flag.
|
| I think the prefix decoding is not quite right for x86-64:
| 26/2e/36/3e are ignored in 64-bit mode, except for 2e/3e as
| branch-not-taken/taken hints and 3e as notrack. (See SDM Vol.
| 1 3.3.7.1 "Other segment override prefixes (CS, DS, ES, and
| SS) are ignored.") Also, REX prefixes that don't immediately
| precede the opcode (or VEX/EVEX prefix) are ignored. Anyhow,
| I need to take a closer look at the decoder with more time.
| :-)
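|
| For reference, a rough sketch of a 64-bit-mode prefix scan
| reflecting those rules (illustrative only; LOCK/0x67 and the
| branch-hint/notrack special cases are omitted for brevity):
|
|     #include <stdint.h>
|     #include <stddef.h>
|
|     struct pfx { uint8_t rex, seg; int osz, repne, rep; };
|
|     /* Returns the offset of the first opcode byte. A REX
|      * byte only survives if it is the last prefix before
|      * the opcode; 26/2e/36/3e are consumed but have no
|      * segment effect in 64-bit mode. */
|     static size_t scan_prefixes_64(const uint8_t *p,
|                                    struct pfx *x) {
|         size_t i = 0;
|         for (;;) {
|             uint8_t b = p[i];
|             int is_prefix = 1;
|             switch (b) {
|             case 0x26: case 0x2e: case 0x36: case 0x3e:
|                 break;             /* ignored segments */
|             case 0x64: case 0x65: x->seg = b; break;
|             case 0x66: x->osz = 1; break;
|             case 0xf2: x->repne = 1; break;
|             case 0xf3: x->rep = 1; break;
|             default:
|                 if ((b & 0xf0) == 0x40) { x->rex = b; break; }
|                 is_prefix = 0;
|             }
|             if (!is_prefix)
|                 return i;          /* opcode starts here */
|             if ((b & 0xf0) != 0x40)
|                 x->rex = 0;  /* later prefix drops earlier REX */
|             i++;
|         }
|     }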
|
| > For EVEX my plan is to keep the raw bits until after the
| opcode has been read
|
| I came to the same conclusion that this is necessary with
| APX. The map+prefix+opcode combination identifies how the
| other fields are to be interpreted. For AVX-512, storing the
| last byte was sufficient, but with APX, vvvv got a second
| meaning.
|
| > Nevertheless there are several cases in which the manual is
| wrong or doesn't say the whole story.
|
| Yes... especially for corner cases, getting real hardware is
| the only reliable way to find out how the CPU behaves.
|
| [1]: https://github.com/aengelke/fadec/blob/master/instrs.txt
| [2]: https://github.com/aengelke/fadec/blob/master/decode.c
| bonzini wrote:
| > interesting that you have a separate row for "66+F2"
|
| Yeah that's only for 0F38F0 to 0F38FF.
|
| > Re 'custom size "xh"': AVX-512 also has fourth and eighth
|
| Also AVX for VPMOVSX and VPMOVZX but those are handled
| differently. I probably should check if xh is actually
| redundant... _EDIT_: it's only needed for VCVTPS2PH, which
| is the only instruction with a half-sized _destination_.
|
| > I think the prefix decoding is not quite right for
| x86-64: 26/2e/36/3e are ignored in 64-bit mode
|
| Interesting, I need to check how they interact with the
| FS/GS prefixes (64/65).
|
| > REX prefixes that don't immediately precede the opcode
| (or VEX/EVEX prefix) are ignored
|
| Oh, didn't know that!
| aengelke wrote:
| > how they interact with the FS/GS prefixes (64/65)
|
| For memory operations, they are ignored: 64-2e-65-3e
| gives 65 as segment override. (From my memory and the
| resulting implementation, I did some tests with hardware
| a few years back.)
|
| I do need to check myself how 2e/3e on branches interact
| with other segment overrides, though.
| jart wrote:
| FWIW here's all 700 lines of Blink's x86 decoder.
| https://github.com/jart/blink/blob/master/blink/x86.c
| bonzini wrote:
| I don't want to be the person that has to add an
| instruction to blink...
| mananaysiempre wrote:
| The semantics of LZCNT combined with its encoding feels like
| an own goal: it's encoded as a BSR instruction with a legacy-
| ignored prefix, _but_ for nonzero inputs its return value is
| the operand size minus one, minus the return value of the
| legacy version. Yes, clz() is a function that exists, but the
| extra subtraction in its implementation feels like a small
| cost to pay for extra compatibility when LZCNT could've just
| been BSR with different zero-input semantics.
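|
| Concretely, for 32-bit operands the relationship looks like
| this (a small illustrative snippet, using compiler builtins
| to stand in for the instructions):
|
|     #include <stdint.h>
|     #include <assert.h>
|
|     /* BSR returns the index of the highest set bit;
|      * LZCNT counts leading zeros (defined as 32 for 0,
|      * where BSR's result is undefined/unmodified). */
|     static uint32_t bsr32(uint32_t x) {
|         return 31 - __builtin_clz(x);      /* x != 0 */
|     }
|     static uint32_t lzcnt32(uint32_t x) {
|         return x ? __builtin_clz(x) : 32;
|     }
|
|     int main(void) {
|         uint32_t x = 0x00400000;           /* bit 22 set */
|         assert(bsr32(x) == 22);
|         assert(lzcnt32(x) == 31 - bsr32(x));   /* 9 */
|         return 0;
|     }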
| bonzini wrote:
| Yes, it's like someone looked at TZCNT and thought "let's
| encode LZCNT the same way", but it makes no sense.
| torusle wrote:
| Another bonus quirk, from the 486 and Pentium era...
|
| BSWAP EAX converts from little endian to big endian and vice
| versa. It was a 32-bit instruction to begin with.
|
| However, we have the 0x66 prefix that switches between 16 and
| 32 bit operand size. If you apply that to BSWAP EAX,
| undefined funky things happen.
|
| On some CPUs (Intel vs. AMD differ) the prefix was just
| ignored. On others it did something that I call an "inner
| swap": of the four bytes stored in EAX, bytes 1 and 2 are
| swapped, so 0x11223344 became 0x11332244.
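|
| A tiny illustration of that "inner swap" (as observed, not an
| architectural guarantee):
|
|     #include <stdint.h>
|     #include <stdio.h>
|
|     /* Swap only the two middle bytes of a 32-bit value. */
|     static uint32_t inner_swap(uint32_t v) {
|         return (v & 0xff0000ffu)
|              | ((v & 0x00ff0000u) >> 8)
|              | ((v & 0x0000ff00u) << 8);
|     }
|
|     int main(void) {
|         printf("%08x\n", inner_swap(0x11223344u));
|         /* prints 11332244 */
|         return 0;
|     }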
| was_a_dev wrote:
| Off topic, but I like this blog style/layout. I can imagine
| it isn't everyone's taste, but it just works for me.
| timmisiak wrote:
| Glad you like it. I used m10c, with a few tweaks:
| https://github.com/vaga/hugo-theme-m10c
| waynecochran wrote:
| Intel architecture is loaded with historical artifacts. The
| switch in how segment registers were used as you went from real
| mode to protected mode was an incredible hardware hack to keep
| older software working. I blame Intel for why so many folks avoid
| assembly language. I programmed in assembly for years using TI's
| 84010 graphics chips and the design was gorgeous -- simple RISC
| instruction set, flat address space, and bit addressable! If
| during the earlier decades folks were programming using chips
| with more elegant designs, far more folks would be programming in
| assembly language (or at least would know how to).
| hajile wrote:
| > I blame Intel for why so many folks avoid assembly language.
|
| x86 (the worst assembly of any of the top 50 most popular
| ISAs by a massive margin) and tricky MIPS branch delay slot
| trivia questions at university have done more to turn
| programmers off from learning assembly than anything else,
| and it's not even close.
|
| This is one reason I'm hoping that RISC-V kills off x86. It
| actually has a chance of once again allowing your average
| programmer to learn useful assembly.
| LegionMammal978 wrote:
| What do you find particularly problematic about x86 assembly,
| from a pedagogical standpoint? I've never noticed any glaring
| issues with it, except for the weird suffixes and sigils if
| you use AT&T syntax (which I generally avoid).
| timmisiak wrote:
| I suspect the biggest issue is that courses like to talk
| about how instructions are encoded, and that can be
| difficult with x86 considering how complex the encoding
| scheme is. Personally, I don't think x86 is all that bad as
| long as you look at a small useful subset of instructions
| and ignore legacy and encoding.
| LegionMammal978 wrote:
| True, encoding is one thing that really sets x86 apart.
| But as you say, the assembly itself doesn't seem that
| uniquely horrible (at least not since the 32-bit era),
| which is why I found the sentiment confusing as it was
| phrased.
|
| Maybe it's the haphazard SIMD instruction set, with every
| extension adding various subtly-different ways to permute
| bytes and whatnot? But that would hardly seem like a
| beginner's issue. The integer multiplication and division
| instructions can also be a bit wonky to use, but hardly
| unbearably so.
| bheadmaster wrote:
| If Terry A. Davis is to be trusted, as long as you ignore the
| legacy stuff, x64 assembly is nice to work with.
| russdill wrote:
| What's crazy is that depending on how deep you want to go, a
| lot of the information is not available in documents published
| by Intel. Fortunately, if it matters for emulators it
| typically can be (or has been) reverse engineered.
| Max-q wrote:
| Wow, is that true? And the documentation is thousands of
| pages! I can't understand how Intel keeps these processors
| consistent during development. It must be a really, really
| hard job.
|
| It was a nice period in history when proper RISC was working
| well: we could make things run faster, but getting more
| transistors was expensive. Now we can't make things run much
| faster, but we can get billions of transistors, so making
| things more complicated is the way to more performance.
|
| I wonder if we will ever again see a time when simplification
| is a valid option...
| russdill wrote:
| There's some info here:
|
| http://www.rcollins.org/secrets/
| http://www.rcollins.org/ddj/ddj.html
|
| Notably things like undocumented opcodes, processor modes,
| undocumented registers, undocumented bits within registers,
| etc. It's not uncommon to release a specific set of
| documentation to trusted partners only. Back in the day,
| Intel called these "gold books".
| jecel wrote:
| Wouldn't that be the 34010?
| waynecochran wrote:
| Yes. TMS 34010 and 34020 ... my bad.
| trollied wrote:
| > Writing a CPU emulator is, in my opinion, the best way to
| REALLY understand how a CPU works
|
| Hard disagree.
|
| The best way is to create a CPU from gate level, like you do on a
| decent CS course. (I really enjoyed making a cut down ARM from
| scratch)
| whobre wrote:
| Reading Petzold's "Code" comes pretty close, though, and is
| easier.
| commandlinefan wrote:
| OTOH, are you really going to be implementing memory segmenting
| in your gate-level CPU? I'd say actually creating a working CPU
| and _then_ emulating a real CPU (warts and all) are both
| necessary steps to real understanding.
| trollied wrote:
| I agree.
| monocasa wrote:
| > OTOH, are you really going to be implementing memory
| segmenting in your gate-level CPU?
|
| I have, but it was a PDP-8 which I'll be the first to admit
| is kind of cheating.
| quantified wrote:
| Well, I think you're both right. It's satisfying as heck to
| sling 74xx chips together and you get a feel for the electrical
| side of things and internal tradeoffs.
|
| When you get to doing that for the CPU that you want to do
| meaningful work with, you start to lose interest in that
| detail. Then the complexities of the behavior and spec become
| interesting, and the emulator approach is more tractable and
| can cover more types of behavior.
| IshKebab wrote:
| I think trollied is correct actually. I work on a CPU
| emulator professionally and while it gives you a great
| understanding of the spec there are lots of details about
| _why_ the spec is the way it is that are due to how you
| actually implement the microarchitecture. You only learn that
| stuff by actually implementing a microarchitecture.
|
| Emulators tend not to have many features that you find in
| real chips, e.g. caches, speculative execution, out-of-order
| execution, branch predictors, pipelining, etc.
|
| This isn't "the electrical side of things". When he said
| "gate level" he meant RTL (SystemVerilog/VHDL) which is
| pretty much entirely in the digital domain; you very rarely
| need to worry about actual electricity.
| trollied wrote:
| I write retro console emulators for fun, so agree with you
| 100% :)
| timmisiak wrote:
| I think both are useful, but designing a modern CPU from the
| gate level is out of reach for most folks, and I think there's
| a big gap between the sorts of CPUs we designed in college and
| the sort that run real code. I think creating an emulator of a
| modern CPU is a somewhat more accessible challenge, while still
| being very educational even if you only get something partially
| working.
| banish-m4 wrote:
| This is an illusion and a red herring. RTL synthesis is the
| typical functional-prototype stage reached, and it is
| generally sufficient for FPGA work. Burning an ASIC as part
| of an educational consortium run is doable, but uncommon.
| WalterBright wrote:
| When I was at Caltech, another student in the dorm had been
| admitted because he'd designed and implemented a CPU using
| only 7400 TTL.
|
| Woz wasn't the only supersmart young computer guy at the time
| :-)
|
| (I don't know how capable it was, even a 4 bit CPU would be
| quite a challenge with TTL.)
| brailsafe wrote:
| So far on my journey through Nand2Tetris (since I kind of
| dropped out of my real CS course) I've enjoyed the entire
| process of working my way up from the gate level, and I just
| finished the VM emulator chapter, which took an eternity. Now
| onto compilation.
| banish-m4 wrote:
| Seconded. A microcoded, pipelined, superscalar, branch-
| predicting basic processor with L1 data & instruction caches
| and write-back L2 cache controller is nontrivial. Most software
| engineers have an incomplete grasp of data hazards, cache
| invalidation, or pipeline stalls.
| fjfaase wrote:
| Interesting read. I have a lot of respect for people who
| develop emulators for x86 processors. It is a complicated
| processor, and from first-hand experience I know that
| developing and debugging emulators for CPUs can be very
| challenging. In the past year, I spent some time developing a
| very limited i386 emulator [1] including some system calls
| for executing the first steps of live-bootstrap [2],
| primarily to figure out how it works. I learned a lot about
| system calls and ELF.
|
| [1] https://github.com/FransFaase/Emulator/
|
| [2] https://github.com/fosslinux/live-bootstrap/
| banish-m4 wrote:
| Most of the complexities lie in managing the various
| configurations of total system compatibility emulation,
| especially for timing, analog oddities, and whether to
| include bugs or not (and for which steppings). If you want
| precise and accurate emulation, you have to have real
| hardware to validate
| behavior against. Then there are the cases of what not to
| emulate and offering better-than-original alternatives.
| Quekid5 wrote:
| Just an adjacent aside from a random person about learning by
| doing:
|
| Implementing a ThingDoer is a huge learning experience. I
| remember doing co-op "write-a-compiler" coursework with another
| person. We were doing great, everything was working and then we
| got to the oral exam...
|
| "Why is your Stack Pointer growing upwards"?
|
| ... I was kinda stunned. I'd never thought about that. We
| understood most of the things, but sometimes we kind of just
| bashed at things until they worked... and it turned out upward-
| growing SP did work (up to a point) on the architecture our toy
| compiler was targeting.
| boricj wrote:
| It's funny to me how much grief x86 assembly generates when
| compared to RISC here, because I have the opposite problem when
| delinking code back into object files.
|
| For this use-case, x86 is really easy to analyze whereas MIPS has
| been a nightmare to pull off. This is because all I mostly care
| about are references to code and data. x86 has pointer-sized
| immediate constants and MIPS has split HI16/LO16 relocation
| pairs, which leads to all sorts of trouble with register usage
| graphs, code flow and branch delay instructions.
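|
| A minimal sketch of the HI16/LO16 reassembly headache (the
| helper names and addresses are made up for illustration):
|
|     #include <stdint.h>
|     #include <assert.h>
|
|     /* The LO16 half is sign-extended when added (lui/addiu
|      * pair), so the HI16 half must be rounded up whenever
|      * bit 15 of the address is set. x86, by contrast,
|      * carries a whole imm32 in one instruction. */
|     static uint16_t hi16(uint32_t addr) {
|         return (uint16_t)((addr + 0x8000u) >> 16);
|     }
|     static uint16_t lo16(uint32_t addr) {
|         return (uint16_t)addr;
|     }
|     static uint32_t rebuild(uint16_t hi, uint16_t lo) {
|         return ((uint32_t)hi << 16)
|              + (uint32_t)(int32_t)(int16_t)lo;
|     }
|
|     int main(void) {
|         uint32_t a = 0x0040FFF0u;  /* bit 15 set: hi gets +1 */
|         assert(hi16(a) == 0x0041);
|         assert(rebuild(hi16(a), lo16(a)) == a);
|         return 0;
|     }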
|
| That should not be construed as praise on my end for x86.
___________________________________________________________________
(page generated 2024-07-10 23:00 UTC)