[HN Gopher] Weird things I learned while writing an x86 emulator
       ___________________________________________________________________
        
       Weird things I learned while writing an x86 emulator
        
       Author : fanf2
       Score  : 220 points
       Date   : 2024-07-10 14:42 UTC (8 hours ago)
        
 (HTM) web link (www.timdbg.com)
 (TXT) w3m dump (www.timdbg.com)
        
       | pm2222 wrote:
       | Prior discussion here
       | https://news.ycombinator.com/item?id=34636699
       | 
        | Cannot believe it's been 16 months. How time flies.
        
       | AstroJetson wrote:
       | Check out Justine Tunney and her emulator.
       | https://justine.lol/blinkenlights/
       | 
       | The docs are an amazing tour of how the cpu works.
        
         | zarathustreal wrote:
          | Astonishing... they never cease to amaze me.
        
       | sdsd wrote:
        | What a cool person. I really enjoy writing assembly; it feels so
        | simple, and I appreciate its vertical aesthetic quality.
       | 
       | The closest I've ever come to something like OP (which is to say,
       | not close at all) was when I was trying to help my JS friend
       | understand the stack, and we ended up writing a mini vm with its
       | own little ISA:
       | https://gist.github.com/darighost/2d880fe27510e0c90f75680bfe...
       | 
        | This could have gone much deeper - I'd have enjoyed that, but
        | doing so would have detracted from the original educational goal,
        | lol. I should contact that friend and see if he still wants to
        | study with me. It's hard since he's making so much money doing
        | fancy web dev that he has no time to go deep into stuff, whereas
        | my unemployed ass is basically an infinite ocean of time and
        | energy.
        
         | actionfromafar wrote:
         | You should leverage that into your friend teaching you JS,
         | maybe.
        
       | dmitrygr wrote:
       | I've written fast emulators for a dozen non-toy architectures and
       | a few JIT translators for a few as well. x86 still gives me PTSD.
       | I have never seen a messier architecture. There is history, and a
       | reason for it, but still ... damn
        
         | Arech wrote:
         | Haha, man, I feel you :DD You probably should have started with
         | it from the very beginning :D
        
         | jcranmer wrote:
         | > I have never seen a messier architecture.
         | 
         | Itanium. Pretty much every time I open up the manual, I find a
         | new thing that makes me go "what the hell were you guys
         | thinking!?" without even trying to.
        
           | snazz wrote:
           | What sorts of projects are you working on that use Itanium?
        
             | jcranmer wrote:
             | None, really. I just happened to get a copy of the manual
             | and start idly reading it when my computer got stuck in a
             | very long update-reboot cycle and I couldn't do anything
             | other than read a physical book.
        
               | WalterBright wrote:
               | That's why my music is played from a separate machine.
        
         | trealira wrote:
         | Studying the x86 architecture is kind of like studying
         | languages with lots of irregularities and vestigial bits, and
         | with competing grammatical paradigms, e.g. French. Other
         | architectures, like RISC-V and ARMv8, are much more consistent.
        
           | x0x0 wrote:
            | I think English may be a better example; we just stapled
            | chunks of Vulgar Latin to an inconsistently simplified Proto-
            | Germanic and then borrowed words from every language we met
            | along the way. Add in 44 sounds serialized to the page with
            | 26 letters and tada!
        
             | WalterBright wrote:
             | The Norman conquest of England means English is a language
             | with barbarian syntax and French nouns. It's a happy mess
             | of cultural appropriation!
             | 
             | Squeezing the lot into 26 characters was simply genius -
             | enabling printing with movable type, Morse code, Baudot
             | code, ASCII, etc.
             | 
             | Of course, then icons came along and ruined everything.
        
           | aengelke wrote:
           | > Other architectures, like [...] ARMv8, are much more
           | consistent.
           | 
           | From an instruction/operation perspective, AArch64 is more
           | clean. However, from an instruction operand and encoding
           | perspective, AArch64 is a lot less consistent than x86.
           | Consider the different operand types: on x86, there are a
           | dozen register types, immediate (8/16/32/64 bits), and memory
           | operands (always the same layout). On AArch64, there's: GP
           | regs, incremented GP reg (MOPS extension), extended GP reg
           | (e.g., SXTB), shifted GP reg, stack pointer, FP reg, vector
           | register, vector register element, vector register table,
           | vector register table element, a dozen types of memory
           | operands, conditions, and a dozen types of immediate
           | encodings (including the fascinating and very useful, but
           | also very non-trivial encoding of logical immediates [1]).
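            | 
            | For a flavor of that encoding, here's a rough, untested
            | sketch of the usual (N, immr, imms) -> bitmask expansion
            | (names and structure are mine, not lifted from [1]):
            | 
            |     #include <stdint.h>
            | 
            |     /* Decode an AArch64 logical immediate into the
            |        64-bit mask it represents. Returns 0 for the
            |        reserved encodings.                          */
            |     static int dec_logimm(unsigned n, unsigned immr,
            |                           unsigned imms, uint64_t *out)
            |     {
            |         unsigned hi = (n << 6) | (~imms & 0x3f);
            |         if (hi < 2)
            |             return 0;                /* reserved */
            |         unsigned len = 31 - __builtin_clz(hi);
            |         unsigned esz = 1u << len;    /* element size */
            |         unsigned s = imms & (esz - 1);
            |         unsigned r = immr & (esz - 1);
            |         if (s == esz - 1)
            |             return 0;                /* reserved */
            |         uint64_t w = (1ull << (s + 1)) - 1;
            |         if (r)                       /* rotate right */
            |             w = ((w >> r) | (w << (esz - r))) &
            |                 (esz == 64 ? ~0ull : (1ull << esz) - 1);
            |         for (unsigned i = esz; i < 64; i *= 2)
            |             w |= w << i;             /* replicate */
            |         *out = w;
            |         return 1;
            |     }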
           | 
           | AArch64 also has some register constraints: some vector
           | operations can only encode register 0-15 or 0-7; not to
            | mention SVE with its "movprfx" prefix instruction that is
           | only valid in front of a few selected instructions.
           | 
           | [1]: https://github.com/aengelke/disarm/blob/master/encode.c#
           | L19-...
        
             | trealira wrote:
             | Admittedly, I never wrote an assembler, but the encoding of
             | x86-64 seems pretty convoluted [0] [1], with much of the
              | information smeared across bits in the ModR/M and
             | SIB bytes and then extended by prefix bytes. It's more
             | complicated than you would assume from having only written
             | assembly in text and then having the assembler encode the
             | instructions.
             | 
             | One of the things that sticks out to me is that on x86-64,
             | operations on 32-bit registers implicitly zero out the
              | upper 32 bits of the corresponding 64-bit register (e.g.
              | "MOV EAX, EBX" also zeroes out the upper 32 bits of RAX),
             | except for the opcode 0x90, which, logically, would be the
             | accumulator encoding of "XCHG EAX, EAX" but is special-
             | cased to do nothing because it's the traditional NOP
             | instruction for x86. So you have to use the other encoding,
             | 87 C0.
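              | 
              | As a tiny illustrative sketch (mine, and ignoring the
              | legacy high-byte registers), the write rule an emulator
              | has to model looks like:
              | 
              |     #include <stdint.h>
              | 
              |     /* Write the low `bits` of v into a 64-bit GPR:
              |        32-bit writes zero-extend, 8/16-bit writes
              |        merge into the old value.                   */
              |     static void gpr_write(uint64_t *r, uint64_t v,
              |                           int bits)
              |     {
              |         if (bits == 32)
              |             *r = (uint32_t)v;     /* clears hi half */
              |         else if (bits == 64)
              |             *r = v;
              |         else {
              |             uint64_t m = (1ull << bits) - 1;
              |             *r = (*r & ~m) | (v & m);
              |         }
              |     }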
             | 
             | The fact that AArch64 has instructions that are almost
             | always 4 bytes, and the fact that they clearly thought out
             | their instruction set more (e.g. instead of having CMP,
             | they just use SUBS (subtract and store condition codes) and
             | store the result in the zero register, which is always
             | zero), is what made me say that it seemed more regular. I
             | hadn't studied it in as close detail as x86-64, though.
              | But you've written an ARM disassembler and I haven't;
             | you seem to know more about it than me, and so I believe
             | what you're saying.
             | 
             | > AArch64 also has some register constraints
             | 
             | x86-64 also has some, in that many instructions can't
             | encode the upper 8 bits of a 16-bit register (AH, BH, DH,
             | CH) if a REX prefix is used.
             | 
             | [0]: https://wiki.osdev.org/X86-64_Instruction_Encoding
             | 
             | [1]: http://www.c-jump.com/CIS77/CPU/x86/lecture.html
        
               | WalterBright wrote:
               | > you've written an an ARM disassembler
               | 
               | Here's my AArch64 disassembler work in progress:
               | 
               | https://github.com/dlang/dmd/blob/master/compiler/src/dmd
               | /ba...
               | 
               | I add to it in tandem with writing the code generator. It
                | helps flush out bugs in both by doing this, i.e. generate
                | the instruction, then disassemble it and compare with what
                | I thought it should be.
               | 
               | It's quite a bit more complicated than the corresponding
               | x86 disassembler:
               | 
               | https://github.com/dlang/dmd/blob/master/compiler/src/dmd
               | /ba...
        
             | WalterBright wrote:
             | As I'm currently implementing an AArch64 code generator for
             | the D language dmd compiler, the inconsistency of its
              | instructions is matched, and made worse, by the clumsiness of
             | the documentation for it :-/ But I'm slowly figuring it
             | out.
             | 
             | (For example, some instructions with very different
             | encodings have the same mnemonic. Arrgghh.)
        
       | SunlitCat wrote:
       | Haha! Writing an x86 emulator! I still remember writing a toy
       | emulator which was able to execute something around the first
        | 1000-ish lines of a real BIOS (and then it got stuck or looped
        | when it started to access ports or so; I can't remember, it was
        | too long ago, and I didn't continue it as I started to get into
        | DirectX and modern C++ more).
        
       | aengelke wrote:
       | Bonus quirk: there's BSF/BSR, for which the Intel SDM states that
       | on zero input, the destination has an undefined value. (AMD
       | documents that the destination is not modified in that case.) And
       | then there's glibc, which happily uses the undocumented fact that
       | the destination is also unmodified on Intel [1]. It took me quite
       | some time to track down the issue in my binary translator.
        | (There's also TZCNT/LZCNT, which is BSF/BSR encoded with an
        | F3 prefix -- which is silently ignored on older processors not
       | supporting the extension. So the same code will behave
       | differently on different CPUs. At least, that's documented.)
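        | 
        | For illustration, a minimal sketch (my own, not glibc's or the
        | SDM's wording) of how an emulator might model the Intel-observed
        | behavior for 64-bit BSF:
        | 
        |     #include <stdint.h>
        | 
        |     /* BSF r64: on zero input set ZF and leave the
        |        destination untouched; otherwise clear ZF and
        |        store the index of the lowest set bit.        */
        |     static void emu_bsf64(uint64_t *dst, uint64_t src,
        |                           uint64_t *rflags)
        |     {
        |         if (src == 0) {
        |             *rflags |= 1ull << 6;            /* ZF = 1 */
        |         } else {
        |             *rflags &= ~(1ull << 6);         /* ZF = 0 */
        |             *dst = (uint64_t)__builtin_ctzll(src);
        |         }
        |     }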
       | 
        | Encoding: People often complain about prefixes, but IMHO, that's
        | far from the worst thing. It is well known and somewhat well
       | documented. There are worse quirks: For example, REX/VEX/EVEX.RXB
       | extension bits are ignored when they do not apply (e.g., MMX
       | registers); except for mask registers (k0-k7), where they trigger
       | #UD -- also fine -- except if the register is encoded in
       | ModRM.rm, in which case the extension bit is ignored again.
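        | 
        | If I model that correctly, the decoder-side decision comes down
        | to something like this (a sketch, names are mine):
        | 
        |     /* Does a set REX/VEX/EVEX extension bit raise #UD,
        |        or is it silently ignored for a register type it
        |        cannot extend?                                   */
        |     enum cls { CLS_GPR, CLS_XMM, CLS_MMX, CLS_MASK };
        | 
        |     static int ext_bit_is_ud(enum cls c, int in_modrm_rm)
        |     {
        |         if (c != CLS_MASK)
        |             return 0;        /* e.g. MMX: bit ignored   */
        |         return !in_modrm_rm; /* mask regs: #UD unless
        |                                 encoded in ModRM.rm     */
        |     }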
       | 
       | APX takes the number of quirks to a different level: the REX2
       | prefix can encode general-purpose registers r16-r31, but not
       | xmm16-xmm31; the EVEX prefix has several opcode-dependent
       | layouts; and the extension bits for a register used depend on the
       | register type (XMM registers use X3:B3:rm and V4:X3:idx; GP
       | registers use B4:B3:rm, X4:X3:idx). I can't give a complete list
       | yet, I still haven't finished my APX decoder after a year...
       | 
       | [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=31748
        
         | CoastalCoder wrote:
         | Can you imagine having to make all this logic work faithfully,
          | let alone _fast_, in silicon?
         | 
         | X86 used to be Intel's moat, but what a nightmarish burden to
         | carry.
        
           | dx4100 wrote:
           | Did people just... do this by hand (in software), transistor
           | by transistor, or was it laid out programmatically in some
           | sense? As in, were segments created algorithmically, then
           | repeated to obtain the desired outcome? CPU design baffles
           | me, especially considering there are 134 BILLION transistors
           | or so in the latest i7 CPU. How does the team even keep track
           | of, work on, or even load the files to WORK on the CPUs?
        
             | tristor wrote:
             | They use EDA (Electronic Design Automation) software, there
             | are only a handful of vendors, the largest probably being
             | Mentor Graphics, now owned by Siemens. So, yes, they use
             | automation to algorithmically build and track/resolve
             | refactors as they design CPUs. CPUs are /generally/ block-
             | type designs these days, so particular functions get
             | repeated identically in different places and can be
             | somewhat abstracted away in your EDA.
             | 
             | It's still enormously complex, and way more complex than
             | the last time I touched this stuff more than 15 years ago.
        
               | rikthevik wrote:
               | I love that the EDA industry still uses Tcl heavily.
               | Warms my heart.
        
             | monocasa wrote:
             | It's written in an HDL; IIRC both Intel and AMD use
              | Verilog. A modern core is on the order of a million or so
             | lines of verilog.
             | 
             | Some of that will be hand placed, quite a bit will just be
             | thrown at the synthesizer. Other parts like SRAM blocks
             | will have their cad generated directly from a macro and a
             | description of the block in question.
        
               | cogman10 wrote:
                | To further expound on this: ASIC design (like AMD's CPUs)
                | is a lot like software work. The engineers that create a
                | lot of the digital logic aren't dealing with individual
                | transistors; instead, they say "give me an accumulator for
                | this section of code" and the HDL provides it. The
                | definition of that module exists elsewhere and is shared
                | throughout the system.
               | 
               | This is how the complexity can be wrangled.
               | 
               | Now, MOST of the work is automated for digital logic.
                | However, we live in an analog world. So there is (as far
                | as I'm aware) still quite a bit of work for analog
                | engineers to bend analog reality into digital. In the
                | real world, changing currents create magnetic fields,
                | which means you need rules limiting voltages and defining
                | how close signal lines can be to avoid crosstalk.
                | Square waves are hard to come by, so there's effort
               | in timing and voltage bands to make sure you aren't
               | registering a "1" when it should have been a "0".
               | 
               | Several of my professors were intel engineers. From what
               | they told me, the ratios of employment were something
               | like 100 digital engineers to 10 analog engineers to 1
               | Physicist/materials engineer.
        
             | bonzini wrote:
             | It was entirely laid out by hand until the 286. Using
             | standard cells in the 386 enabled the switch from microcode
             | to a mostly hardwired core.
        
           | gumby wrote:
           | A lot of this is done in software (microcode). But even with
           | that case, your statement still holds: "Can you imagine
           | having to make all this logic work faithfully, let alone
           | fast, in the chip itself?" Writing that microcode must be
           | fiendishly difficult given all the functional units, out of
           | order execution, register renaming...
        
             | bonzini wrote:
             | The crazy parts that were mentioned in the parent comment
             | are all part of the hot path. Microcode handles slow paths
             | related to paging and segmentation, and very rare
             | instructions. Not necessarily unimportant (many common
             | privileged instructions are microcoded) but still rare
             | compared to the usual ALU instructions.
             | 
             | But it's not a huge deal to program the quirky encoding in
             | an HDL, it's just a waste of transistors. The really
             | complicated part is the sequencing of micro operations and
             | how they enter the (out of order) execution unit.
        
             | aengelke wrote:
             | > A lot of this is done in software (microcode).
             | 
              | No, that hasn't been the case for >30 years. Microcode is only
             | used for implementing some complex instructions (mostly
             | system instructions). Most regular instructions (and the
             | rest of the core) don't use microcode and their expansions
             | into uOps are hardwired. Also the entire execution unit is
             | hardwired.
             | 
             | There are typically some undocumented registers (MSRs on
             | x86) that can control how the core behaves (e.g., kill
             | switches for certain optimizations). These can then be
             | changed by microcode updates.
        
           | im3w1l wrote:
           | It's a lot easier to more or less accidentally create
           | something quirky than it is to create a second quirk-for-
           | quirk compatible system.
        
         | bonzini wrote:
         | On and off over the last year I have been rewriting QEMU's x86
         | decoder. It started as a necessary task to incorporate AVX
         | support, but I am now at a point where only a handful of
         | opcodes are left to rewrite, after which it should not be too
         | hard to add APX support. For EVEX my plan is to keep the raw
         | bits until after the opcode has been read (i.e. before
         | immediates and possibly before modrm) and the EVEX class
         | identified.
         | 
         | My decoder is mostly based on the tables in the manual, and the
         | code is mostly okay--not too much indentation and phases mostly
         | easy to separate/identify. Because the output is JITted code,
         | it's ok to not be super efficient and keep the code readable;
         | it's not where most of the time is spent. Nevertheless there
         | are several cases in which the manual is wrong or doesn't say
         | the whole story. And the tables haven't been updated for
         | several years (no K register instructions, for example), so
         | going forward there will be more manual work to do. :(
         | 
         | The top comment explains a bit what's going on:
         | https://github.com/qemu/qemu/blob/59084feb256c617063e0dbe7e6...
         | 
         | (As I said above, there are still a few instructions handled by
         | the old code predating the rewrite, notably BT/BTS/BTR/BTC. I
         | have written the code but not merged it yet).
        
           | aengelke wrote:
           | Thanks for the pointer to QEMU's decoder! I actually never
           | looked at it before.
           | 
           | So you coded all the tables manually in C -- interesting,
           | that's quite some effort. I opted to autogenerate the tables
           | (and keep them as data only => smaller memory footprint)
           | [1,2]. That's doable, because x86 encodings are mostly fairly
           | consistent. I can also generate an encoder from it (ok, you
           | don't need that). Re 'custom size "xh"': AVX-512 also has
           | fourth and eighth. Also interesting that you have a separate
           | row for "66+F2". I special case these two (CRC32, MOVBE)
           | instructions with a flag.
           | 
           | I think the prefix decoding is not quite right for x86-64:
           | 26/2e/36/3e are ignored in 64-bit mode, except for 2e/3e as
           | branch-not-taken/taken hints and 3e as notrack. (See SDM Vol.
           | 1 3.3.7.1 "Other segment override prefixes (CS, DS, ES, and
           | SS) are ignored.") Also, REX prefixes that don't immediately
            | precede the opcode (or VEX/EVEX prefix) are ignored. Anyhow,
           | I need to take a closer look at the decoder with more time.
           | :-)
           | 
           | > For EVEX my plan is to keep the raw bits until after the
           | opcode has been read
           | 
           | I came to the same conclusion that this is necessary with
           | APX. The map+prefix+opcode combination identifies how the
           | other fields are to be interpreted. For AVX-512, storing the
           | last byte was sufficient, but with APX, vvvv got a second
           | meaning.
           | 
           | > Nevertheless there are several cases in which the manual is
           | wrong or doesn't say the whole story.
           | 
           | Yes... especially for corner cases, getting real hardware is
            | the only reliable way to find out how the CPU behaves.
           | 
           | [1]: https://github.com/aengelke/fadec/blob/master/instrs.txt
           | [2]: https://github.com/aengelke/fadec/blob/master/decode.c
        
             | bonzini wrote:
             | > interesting that you have a separate row for "66+F2"
             | 
             | Yeah that's only for 0F38F0 to 0F38FF.
             | 
             | > Re 'custom size "xh"': AVX-512 also has fourth and eighth
             | 
             | Also AVX for VPMOVSX and VPMOVZX but those are handled
             | differently. I probably should check if xh is actually
              | redundant... _EDIT_: it's only needed for VCVTPS2PH, which
             | is the only instruction with a half-sized _destination_.
             | 
             | > I think the prefix decoding is not quite right for
             | x86-64: 26/2e/36/3e are ignored in 64-bit mode
             | 
             | Interesting, I need to check how they interact with the
             | FS/GS prefixes (64/65).
             | 
              | > REX prefixes that don't immediately precede the opcode
             | (or VEX/EVEX prefix) are ignored
             | 
             | Oh, didn't know that!
        
               | aengelke wrote:
               | > how they interact with the FS/GS prefixes (64/65)
               | 
               | For memory operations, they are ignored: 64-2e-65-3e
               | gives 65 as segment override. (From my memory and the
               | resulting implementation, I did some tests with hardware
               | a few years back.)
               | 
               | I do need to check myself how 2e/3e on branches interact
               | with other segment overrides, though.
        
             | jart wrote:
             | FWIW here's all 700 lines of Blink's x86 decoder.
             | https://github.com/jart/blink/blob/master/blink/x86.c
        
               | bonzini wrote:
               | I don't want to be the person that has to add an
               | instruction to blink...
        
         | mananaysiempre wrote:
         | The semantics of LZCNT combined with its encoding feels like an
         | own goal: it's encoded as a BSR instruction with a legacy-
          | ignored prefix, _but_ for nonzero inputs its return value is
          | the operand size minus one, minus the return value of the
          | legacy version.
         | Yes, clz() is a function that exists, but the extra subtraction
         | in its implementation feels like a small cost to pay for extra
         | compatibility when LZCNT could've just been BSR with different
         | zero-input semantics.
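          | 
          | Put as code (my sketch, 32-bit operand, GCC builtins):
          | 
          |     #include <stdint.h>
          | 
          |     /* BSR: index of the highest set bit (x != 0).
          |        LZCNT: leading-zero count, 32 on zero input. */
          |     static unsigned bsr32(uint32_t x)
          |     { return 31 - __builtin_clz(x); }
          | 
          |     static unsigned lzcnt32(uint32_t x)
          |     { return x ? __builtin_clz(x) : 32; }
          | 
          |     /* so, for x != 0: lzcnt32(x) == 31 - bsr32(x) */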
        
           | bonzini wrote:
           | Yes, it's like someone looked at TZCNT and thought "let's
           | encode LZCNT the same way", but it makes no sense.
        
         | torusle wrote:
          | Another bonus quirk, from the 486 and Pentium era...
         | 
          | BSWAP EAX converts from little endian to big endian and vice
          | versa. It was a 32-bit instruction to begin with.
         | 
          | However, we have the 0x66 prefix that switches between 16- and
          | 32-bit operand size. If you apply that to BSWAP EAX, undefined,
          | funky things happen.
         | 
          | On some CPU architectures (Intel vs. AMD) the prefix was just
          | ignored. On others it did something that I call an "inner
          | swap": of the four bytes stored in EAX, bytes 1 and 2 are
          | swapped, e.g.
          | 
          |     0x11223344 became 0x11332244
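          | 
          | For illustration, that "inner swap" could be modelled like
          | this (my reconstruction of the behavior described above):
          | 
          |     #include <stdint.h>
          | 
          |     /* Swap only the two middle bytes of EAX. */
          |     static uint32_t inner_swap(uint32_t eax)
          |     {
          |         return (eax & 0xFF0000FF)
          |              | ((eax & 0x00FF0000) >> 8)
          |              | ((eax & 0x0000FF00) << 8);
          |     }
          |     /* inner_swap(0x11223344) == 0x11332244 */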
        
       | was_a_dev wrote:
       | Off topic, but I like this blog style/layout. I can imagine it
        | isn't everyone's taste, but it just works for me.
        
         | timmisiak wrote:
         | Glad you like it. I used m10c, with a few tweaks:
         | https://github.com/vaga/hugo-theme-m10c
        
       | waynecochran wrote:
       | Intel architecture is loaded with historical artifacts. The
       | switch in how segment registers were used as you went from real
       | mode to protected mode was an incredible hardware hack to keep
       | older software working. I blame Intel for why so many folks avoid
       | assembly language. I programmed in assembly for years using TI's
       | 84010 graphics chips and the design was gorgeous -- simple RISC
        | instruction set, flat address space, and bit addressable! If,
        | during those earlier decades, folks had been programming on
        | chips with more elegant designs, far more folks would be
        | programming in assembly language (or at least would know how to).
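        | 
        | For context, the real-mode addressing that hack had to keep
        | working boils down to physical = segment * 16 + offset; a tiny
        | sketch:
        | 
        |     #include <stdint.h>
        | 
        |     /* 8086 real-mode address: segment shifted left by
        |        four, plus the 16-bit offset.                   */
        |     static uint32_t real_addr(uint16_t seg, uint16_t off)
        |     {
        |         return ((uint32_t)seg << 4) + off;
        |     }
        |     /* real_addr(0xB800, 0x0000) == 0xB8000 (text VRAM) */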
        
         | hajile wrote:
         | > I blame Intel for why so many folks avoid assembly language.
         | 
          | x86 (the worst assembly of any of the top 50 most popular ISAs,
          | by a massive margin) and tricky MIPS branch-delay-slot trivia
          | questions at university have done more to put programmers off
          | learning assembly than anything else, and it's not even
          | close.
         | 
         | This is one reason I'm hoping that RISC-V kills off x86. It
         | actually has a chance of once again allowing your average
         | programmer to learn useful assembly.
        
           | LegionMammal978 wrote:
           | What do you find particularly problematic about x86 assembly,
           | from a pedagogical standpoint? I've never noticed any glaring
           | issues with it, except for the weird suffixes and sigils if
           | you use AT&T syntax (which I generally avoid).
        
             | timmisiak wrote:
             | I suspect the biggest issue is that courses like to talk
             | about how instructions are encoded, and that can be
             | difficult with x86 considering how complex the encoding
             | scheme is. Personally, I don't think x86 is all that bad as
             | long as you look at a small useful subset of instructions
             | and ignore legacy and encoding.
        
               | LegionMammal978 wrote:
               | True, encoding is one thing that really sets x86 apart.
               | But as you say, the assembly itself doesn't seem that
               | uniquely horrible (at least not since the 32-bit era),
               | which is why I found the sentiment confusing as it was
               | phrased.
               | 
               | Maybe it's the haphazard SIMD instruction set, with every
               | extension adding various subtly-different ways to permute
               | bytes and whatnot? But that would hardly seem like a
               | beginner's issue. The integer multiplication and division
               | instructions can also be a bit wonky to use, but hardly
               | unbearably so.
        
         | bheadmaster wrote:
         | If Terry A. Davis is to be trusted, as long as you ignore the
         | legacy stuff, x64 assembly is nice to work with.
        
         | russdill wrote:
         | What's crazy is that depending on how deep you want to go, a
         | lot of the information is not available in documents published
          | by Intel. Fortunately, if it matters for emulators, it typically
          | can be/has been reverse engineered.
        
           | Max-q wrote:
           | Wow, is that true? And the documentation is thousands of
            | pages! I can't understand how Intel keeps these processors
           | consistent during development. It must be a really, really
           | hard job.
           | 
           | It was a nice period in history when proper RISC was working
           | well, when we could get stuff to run faster, but getting more
            | transistors was expensive. Now we can't get it to run faster,
            | but we can get billions of transistors, so making stuff more
            | complicated is the way to more performance.
           | 
            | I wonder if we will ever again see a time where
           | simplification is a valid option...
        
             | russdill wrote:
             | There's some info here:
             | 
             | http://www.rcollins.org/secrets/
             | http://www.rcollins.org/ddj/ddj.html
             | 
             | Notably things like undocumented opcodes, processor modes,
             | undocumented registers, undocumented bits within registers,
             | etc. It's not uncommon to release a specific set of
             | documentation to trusted partners only. Back in the day
              | Intel called these "gold books".
        
         | jecel wrote:
         | Wouldn't that be the 34010?
        
           | waynecochran wrote:
           | Yes. TMS 34010 and 34020 ... my bad.
        
       | trollied wrote:
       | > Writing a CPU emulator is, in my opinion, the best way to
       | REALLY understand how a CPU works
       | 
       | Hard disagree.
       | 
       | The best way is to create a CPU from gate level, like you do on a
        | decent CS course. (I really enjoyed making a cut-down ARM from
       | scratch)
        
         | whobre wrote:
          | Reading Petzold's "Code" comes pretty close, though, and is
          | easier.
        
         | commandlinefan wrote:
         | OTOH, are you really going to be implementing memory segmenting
         | in your gate-level CPU? I'd say actually creating a working CPU
         | and _then_ emulating a real CPU (warts and all) are both
         | necessary steps to real understanding.
        
           | trollied wrote:
           | I agree.
        
           | monocasa wrote:
           | > OTOH, are you really going to be implementing memory
           | segmenting in your gate-level CPU?
           | 
           | I have, but it was a PDP-8 which I'll be the first to admit
           | is kind of cheating.
        
         | quantified wrote:
         | Well, I think you're both right. It's satisfying as heck to
         | sling 74xx chips together and you get a feel for the electrical
         | side of things and internal tradeoffs.
         | 
         | When you get to doing that for the CPU that you want to do
         | meaningful work with, you start to lose interest in that
         | detail. Then the complexities of the behavior and spec become
          | interesting, and the emulator approach is more tractable and
          | can cover more types of behavior.
        
           | IshKebab wrote:
           | I think trollied is correct actually. I work on a CPU
            | emulator professionally, and while it gives you a great
            | understanding of the spec, there are lots of details about
           | _why_ the spec is the way it is that are due to how you
           | actually implement the microarchitecture. You only learn that
           | stuff by actually implementing a microarchitecture.
           | 
           | Emulators tend not to have many features that you find in
           | real chips, e.g. caches, speculative execution, out-of-order
           | execution, branch predictors, pipelining, etc.
           | 
           | This isn't "the electrical side of things". When he said
           | "gate level" he meant RTL (SystemVerilog/VHDL) which is
           | pretty much entirely in the digital domain; you very rarely
           | need to worry about actual electricity.
        
             | trollied wrote:
             | I write retro console emulators for fun, so agree with you
             | 100% :)
        
         | timmisiak wrote:
         | I think both are useful, but designing a modern CPU from the
         | gate level is out of reach for most folks, and I think there's
         | a big gap between the sorts of CPUs we designed in college and
         | the sort that run real code. I think creating an emulator of a
         | modern CPU is a somewhat more accessible challenge, while still
         | being very educational even if you only get something partially
         | working.
        
           | banish-m4 wrote:
            | This is an illusion and a red herring. RTL synthesis is the
            | typical functional prototype stage reached, which is generally
            | sufficient for FPGA work. Burning an ASIC as part of an
            | educational consortium run is doable, but uncommon.
        
           | WalterBright wrote:
           | When I was at Caltech, another student in the dorm had been
           | admitted because he'd designed and implemented a CPU using
           | only 7400 TTL.
           | 
           | Woz wasn't the only supersmart young computer guy at the time
           | :-)
           | 
            | (I don't know how capable it was; even a 4-bit CPU would be
            | quite a challenge with TTL.)
        
         | brailsafe wrote:
          | So far on my journey through Nand2Tetris (since I kind of
          | dropped out of my real CS course), I've worked my way up from
          | the gate level and just finished the VM emulator chapter, which
          | took an eternity. Now onto compilation.
        
         | banish-m4 wrote:
         | Seconded. A microcoded, pipelined, superscalar, branch-
         | predicting basic processor with L1 data & instruction caches
         | and write-back L2 cache controller is nontrivial. Most software
         | engineers have an incomplete grasp of data hazards, cache
         | invalidation, or pipeline stalls.
        
       | fjfaase wrote:
       | Interesting read. I have a lot of respect for people who develop
        | emulators for x86 processors. It is a complicated processor, and
        | from first-hand experience I know that developing and debugging
        | emulators for CPUs can be very challenging. In the past year, I
        | spent some time developing a very limited i386 emulator [1]
        | including some system calls for executing the first steps of
        | live-bootstrap [2], primarily to figure out how it works. I
       | learned a lot about system calls and ELF.
       | 
       | [1] https://github.com/FransFaase/Emulator/
       | 
       | [2] https://github.com/fosslinux/live-bootstrap/
        
         | banish-m4 wrote:
         | Most of the complexities lie in managing the various
         | configurations of total system compatibility emulation,
          | especially for timing, analog oddities, and whether to include
          | bugs or not (and for which steppings). If you want precise and
          | accurate emulation, you have to have real hardware to validate
          | behavior against. Then there are the questions of what not to
          | emulate and of offering better-than-original alternatives.
        
       | Quekid5 wrote:
       | Just as an adjacent aside from a random about learning by doing:
       | 
       | Implementing a ThingDoer is a huge learning experience. I
       | remember doing co-op "write-a-compiler" coursework with another
       | person. We were doing great, everything was working and then we
       | got to the oral exam...
       | 
       | "Why is your Stack Pointer growing upwards"?
       | 
       | ... I was kinda stunned. I'd never thought about that. We
       | understood most of the things, but sometimes we kind of just
       | bashed at things until they worked... and it turned out upward-
       | growing SP did work (up to a point) on the architecture our toy
       | compiler was targeting.
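        | 
        | In toy-compiler terms the two conventions differ only in which
        | way the pointer moves on a push, something like:
        | 
        |     #include <stdint.h>
        | 
        |     /* Conventional descending stack vs. the ascending
        |        stack our toy compiler accidentally produced.   */
        |     static void push_down(uint32_t **sp, uint32_t v)
        |     { *--(*sp) = v; }
        | 
        |     static void push_up(uint32_t **sp, uint32_t v)
        |     { *(*sp)++ = v; }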
        
       | boricj wrote:
       | It's funny to me how much grief x86 assembly generates when
       | compared to RISC here, because I have the opposite problem when
       | delinking code back into object files.
       | 
       | For this use-case, x86 is really easy to analyze whereas MIPS has
       | been a nightmare to pull off. This is because all I mostly care
       | about are references to code and data. x86 has pointer-sized
       | immediate constants and MIPS has split HI16/LO16 relocation
       | pairs, which leads to all sorts of trouble with register usage
       | graphs, code flow and branch delay instructions.
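        | 
        | A quick sketch of why those pairs are painful: the hi half has
        | to be pre-adjusted so that adding the sign-extended lo half
        | lands back on the original address (my illustration):
        | 
        |     #include <stdint.h>
        | 
        |     /* Split an address into a MIPS-style %hi/%lo pair:
        |        lo is sign-extended, so hi needs a +0x8000 fix. */
        |     static uint16_t hi16(uint32_t a)
        |     { return (uint16_t)((a + 0x8000) >> 16); }
        | 
        |     static int16_t lo16(uint32_t a)
        |     { return (int16_t)(a & 0xffff); }
        | 
        |     /* ((uint32_t)hi16(a) << 16) + lo16(a) == a */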
       | 
        | That should not be construed as praise on my end for x86.
        
       ___________________________________________________________________
       (page generated 2024-07-10 23:00 UTC)