[HN Gopher] Inside the 8086 processor's instruction prefetch cir...
___________________________________________________________________
Inside the 8086 processor's instruction prefetch circuitry
Author : klelatti
Score : 64 points
Date : 2023-01-02 18:11 UTC (4 hours ago)
(HTM) web link (www.righto.com)
(TXT) w3m dump (www.righto.com)
| rep_lodsb wrote:
| * * *
| userbinator wrote:
| If you search for information on 8086/88-specific features and
| terminology such as BIU/EU, prefetch queue, etc., a surprisingly
| large number of results are Indian. They are apparently still
| quite popular in CS courses there. I wonder if new discrete
| clones are still being made, as Intel stopped making them long
| ago.
| ajross wrote:
| I love these. kens, nitpick on your first footnote:
|
| In fact the Apple II was designed for 4k DRAMs with a whole-cycle
| time of 500ns, which was the low end of the range in 1977. The
| idea is that the machine double pumps a 2MHz memory bus, with
| cycles alternating between the 1 MHz 6502 and the video scan
| circuitry (which doubles as the refresh controller). By the
| advent of 4164 chips, DRAM cycle rates were pushing 4MHz. But Woz
| was stuck with more limited hardware.
|
| Similarly, the 8086 bus is multiplexed between address and data,
| so a memory access takes 4 cycles. A 4.77 MHz IBM PC was therefore
| cycling the DRAM at just 1.2MHz (though with no clever tricks, as
| MDA/CGA had physically separate framebuffers; there the extra
| bandwidth really was wasted).
|
| Which isn't to say that the footnote is wrong, just that "DRAM
| was faster than CPUs" is only part of the story.
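The bus arithmetic in the comment above can be checked with a short sketch. The function name is hypothetical, and the model assumes every access takes exactly the stated number of clocks with no wait states, which the comment does not spell out.

```python
def access_rate_mhz(clock_mhz, clocks_per_access):
    """Memory accesses per microsecond when each access takes a fixed number of clocks."""
    return clock_mhz / clocks_per_access

# IBM PC: 4.77 MHz clock, 4 clocks per bus cycle (assumption: no wait states).
ibm_pc_rate = access_rate_mhz(4.77, 4)

# Apple II: a 500 ns DRAM cycle corresponds to 1000 / 500 = 2 MHz,
# split evenly between the 6502 and the video scan circuitry.
apple_ii_bus_mhz = 1000 / 500
apple_ii_cpu_share_mhz = apple_ii_bus_mhz / 2

print(f"IBM PC DRAM rate: {ibm_pc_rate:.2f} MHz")
print(f"Apple II bus: {apple_ii_bus_mhz:.0f} MHz, CPU share: {apple_ii_cpu_share_mhz:.0f} MHz")
```

The first figure lands near the 1.2MHz quoted in the comment; the second reproduces the 2MHz bus and 1 MHz CPU split.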
| mrlonglong wrote:
| I hate to nitpick but there's an error in the text where it's
| stated that the 8086 could address 4 megabytes with 20 bit
| addresses, but that's actually one megabyte. Otherwise, all
| fascinating reading. My first proper pc was an Amstrad PC1512
| with a NEC V30 (clone of the 8086 with some enhanced
| instructions). Maybe you could take a look at the V30 at some
| point?
| kens wrote:
| Thanks, I've fixed that. As for the NEC V30, the 8086 is likely
| to keep me busy for a long time, but it would be interesting to
| see how the enhanced instructions in the V30 were implemented.
| mrlonglong wrote:
| I've got a NEC V30 coprocessor plugged into my Raspberry Pi.
| Amazingly, it works well enough.
| [deleted]
| kens wrote:
| It's nice to see my post about the Intel 8086 here, even if the
| title is totally mangled. Any questions?
| toast0 wrote:
| Footnote 10 ends mid-sentence... I suspect 'instruction' is the
| missing word at the end
| kens wrote:
| Thanks, I've added the missing line.
| moosedev wrote:
| Hey Ken,
|
| Always enjoy your posts!
|
| > The much-reviled solution was to create a 4-megabyte (20-bit)
| address space consisting of 64K segments,
|
| Did you mean 1-megabyte here? (and again later in the same
| "paragraph", pun intended)
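For reference, the 1-megabyte figure follows directly from the 20-bit address width. The sketch below shows the arithmetic along with the usual real-mode segment:offset calculation; the constant and function names are illustrative only.

```python
ADDRESS_BITS = 20
ADDRESS_SPACE = 1 << ADDRESS_BITS   # 1,048,576 bytes = 1 MB
SEGMENT_SIZE = 1 << 16              # 65,536 bytes = 64K

def physical_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & (ADDRESS_SPACE - 1)

print(ADDRESS_SPACE // 1024, "KB addressable")
print(hex(physical_address(0xB800, 0x0000)))
# An address past the top of memory wraps around to 0 on the 8086.
print(hex(physical_address(0xFFFF, 0x0010)))
```

A paragraph is 16 bytes, which is why the segment value is shifted left by 4 bits.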
| userbinator wrote:
| There are 4 segments and which one is being used is presented
| on the bus interface (S4/S3) so 4MB of total memory is
| actually possible in an 8086 system, and the 8086
| documentation does explicitly mention that as an
| implementation choice, but I don't think many systems made
| use of that feature; bank-switching seems to have been more
| common.
|
| The reason being that separating the segments into their own
| address spaces would create something more like a Harvard
| architecture, which isn't really wanted in a general-purpose
| computer.
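A minimal sketch of how external logic could use S4/S3 to reach 4MB, under stated assumptions: the `extended_address` decoder is hypothetical, and the S4/S3 encoding in the table is recalled from the 8086 datasheet and should be verified before relying on it.

```python
# S4/S3 status encoding (assumption: as given in the 8086 datasheet).
SEGMENT_BY_STATUS = {
    (0, 0): "ES (alternate data)",
    (0, 1): "SS (stack)",
    (1, 0): "CS (code) or none",
    (1, 1): "DS (data)",
}

def extended_address(s4, s3, address20):
    """Hypothetical decoder: prepend S4/S3 to the 20-bit address to form 22 bits."""
    bank = (s4 << 1) | s3
    return (bank << 20) | (address20 & 0xFFFFF)

# Four separate 1 MB banks, one per segment register.
TOTAL_BYTES = len(SEGMENT_BY_STATUS) << 20

print(TOTAL_BYTES // (1 << 20), "MB total")
print(hex(extended_address(1, 1, 0x12345)), SEGMENT_BY_STATUS[(1, 1)])
```

In this scheme the same 20-bit address refers to different memory depending on which segment register produced it, which is the Harvard-like split the comment describes.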
| moosedev wrote:
| That's interesting - I didn't know that the memory
| subsystem could "know" which of the 4 segment registers was
| in use for a given request. With that detail, one could
| certainly call it a 4MB "address space", albeit even more
| of a pain to use than the usual 1MB one :-)
| kens wrote:
| Thanks, I've fixed that.
| zozbot234 wrote:
| That solution actually made quite a bit of sense. What was
| really messy is protected mode in the 80286 where the segment
| address actually indexes into an indirection table, in order
| to address 24 bits (16M) of physical address space. But the
| 8086 had none of that. And the 80386 finally got 32-bit flat
| addressing in protected mode, enabling modern OSes to be
| ported.
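The indirection described above can be sketched as a toy model. This is not a full model of 80286 protected mode: it assumes a single descriptor table, ignores the table-indicator and privilege bits in the low 3 bits of the selector, and skips access-rights checks. The `Descriptor` type and `translate` function are hypothetical names.

```python
from collections import namedtuple

# Simplified descriptor: 24-bit base and 16-bit limit only.
Descriptor = namedtuple("Descriptor", "base limit")

def translate(selector, offset, table):
    """Look up the descriptor by selector index and add the offset to its base."""
    index = selector >> 3            # low 3 bits (TI and RPL) are ignored here
    descriptor = table[index]
    if offset > descriptor.limit:
        raise ValueError("offset exceeds segment limit")
    return (descriptor.base + offset) & 0xFFFFFF   # 24 address lines

table = [None, Descriptor(base=0x123456, limit=0xFFFF)]
print(hex(translate(0x0008, 0x0010, table)))
```

Compared with the 8086, the value in the segment register no longer participates in the address arithmetic directly; it only selects a table entry.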
| klelatti wrote:
| Hi Ken, Thanks for a great post as usual. I forgot HN can mess
| up titles - hopefully fixed now!
| ajross wrote:
| This shocked me a little:
|
| > the 8088 has a 4-byte prefetch queue instead of a 6-byte
| prefetch queue
|
| My whole professional life, I've been led to believe that an 8088
| is just an 8086 with a tiny state machine bolted in front of the
| bus to split multiword accesses. That it has design differences
| is really surprising. If the prefetch queue is smaller, then...
| what are they putting in that die area? Or did they redo the
| layout in other ways?
| rep_lodsb wrote:
| Intel removed / disabled the last two bytes in the prefetch
| queue because they found that it actually made the CPU faster
| (because of fewer wasted prefetch cycles on jump instructions).
|
| The 8086 didn't have that problem, because it only started a
| new prefetch when there were two bytes free in the queue, so
| there was enough time for the jump microcode to stop it.
|
| _edit: already mentioned in the article_
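The prefetch start rule mentioned above can be written as a one-line predicate. This is a rough sketch: the function name is made up, and the 8088 rule (fetch as soon as one byte is free) is an assumption inferred from its byte-wide bus, not something stated in the comment.

```python
def starts_prefetch(bytes_in_queue, capacity, fetch_size):
    """True if the queue has room for one more bus fetch."""
    return capacity - bytes_in_queue >= fetch_size

# 8086: 6-byte queue, fetches a 2-byte word, so it waits for two free bytes.
print(starts_prefetch(5, capacity=6, fetch_size=2))
print(starts_prefetch(4, capacity=6, fetch_size=2))

# 8088 (assumption): 4-byte queue, fetches one byte at a time.
print(starts_prefetch(3, capacity=4, fetch_size=1))
```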
| adrian_b wrote:
| This difference between 8088 and 8086 was used by several MS-DOS
| programs to detect the type of the CPU on which they were run.
|
| It was easy to measure the length of the prefetch queue by a
| self-modifying program, which replaced the instruction placed 4
| bytes after the current instruction pointer.
|
| On an 8088, the new instruction was executed, while on an 8086
| the old instruction was executed.
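The detection trick can be illustrated with a toy simulation rather than real 8086 code. It assumes the prefetch queue is completely full at the moment of the store and that the queue is never refreshed afterwards; the function and its parameters are hypothetical.

```python
def executed_byte(queue_size, distance, old, new):
    """Return the byte the CPU would execute at `distance` bytes past the
    current instruction pointer after a self-modifying store to that location."""
    memory = [old] * 16
    # Assumption: the queue already holds the next `queue_size` bytes.
    queue = memory[:queue_size]
    memory[distance] = new           # the self-modifying store hits memory only
    if distance < queue_size:
        return queue[distance]       # stale copy from the prefetch queue
    return memory[distance]          # not yet prefetched, so the new byte is seen

print("8088:", executed_byte(queue_size=4, distance=4, old="old", new="new"))
print("8086:", executed_byte(queue_size=6, distance=4, old="old", new="new"))
```

With a 4-byte queue the modified byte lies just beyond what has been prefetched, while a 6-byte queue already holds a stale copy of it.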
|
| Various other such obscure differences (e.g. the value of a
| pushed stack pointer or the result of a shift operation with a
| shift count greater than the register size) were used to
| identify NEC V20 and V30, Intel 80186, 80188, 80286 and 80386,
| because architectural identification means, like the CPUID
| instruction, were introduced only with the Intel Pentium and
| the late 486 models that were launched after the Pentium.
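Two of the differences named above can be modeled in a few lines. This is a sketch under assumptions: it treats shift-count masking to 5 bits as the behavior of the 80186 and later parts, and pushing the already-decremented SP as the behavior of parts before the 80286. The function names and flags are illustrative, and the model makes no claim about the NEC chips.

```python
def shr16(value, count, masks_count):
    """16-bit logical shift right; later CPUs mask the count to 5 bits."""
    if masks_count:
        count &= 0x1F
    return (value & 0xFFFF) >> count

def pushed_sp(sp, pushes_decremented):
    """Value stored by PUSH SP: the already-decremented SP on older parts,
    the original SP on newer ones (assumption, see lead-in)."""
    return (sp - 2) & 0xFFFF if pushes_decremented else sp

# A shift count of 33 clears the register on an 8086 but acts like a
# shift by 1 where the count is masked.
print(hex(shr16(0xFFFF, 33, masks_count=False)))
print(hex(shr16(0xFFFF, 33, masks_count=True)))

print(hex(pushed_sp(0x0100, pushes_decremented=True)))
print(hex(pushed_sp(0x0100, pushes_decremented=False)))
```

A detection routine would run the real instruction and compare the result against these two possibilities.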
| userbinator wrote:
| The P6 family was when Intel decided to add circuitry to
| detect SMC and flush the queue, effectively making it appear
| to have a queue size of 0. AFAIK all x86 henceforth have been
| the same.
| kens wrote:
| I tried to track this down but I found sources claiming
| this was added to the 486, Pentium, and Pentium Pro (P6).
| So if you know for sure that it's P6, please let me know
| the details.
| robocat wrote:
| > It was easy to measure the length of the prefetch queue by
| a self-modifying program
|
| It became hard on the 386, as linked in the article:
| http://www.rcollins.org/secrets/PrefetchQueue.html
| kens wrote:
| The 8088 die is almost the same as the 8086, but there are a
| few differences. There are two prefetch queue registers instead
| of three. But the 8088 queue registers are slightly more
| complicated because the chip needs to write one byte at a time
| instead of one word. So there's just a small gap in the layout.
| There are various changes in the memory cycle circuitry. And
| there are a few changes to the microcode. Once I finish with
| the 8086, I plan to look at the changes in the 8088 more
| closely.
___________________________________________________________________
(page generated 2023-01-02 23:00 UTC)