[HN Gopher] The Intel 8088 processor's instruction prefetch circ...
___________________________________________________________________
The Intel 8088 processor's instruction prefetch circuitry: a look
inside
Author : matt_d
Score : 86 points
Date : 2024-03-23 18:17 UTC (4 hours ago)
(HTM) web link (www.righto.com)
(TXT) w3m dump (www.righto.com)
| kens wrote:
| Author here for your 8086/8088 questions...
| h2odragon wrote:
| > The 8086/8088 do not provide consistency ... self-modifying
| code can be used to determine the queue length, distinguishing
| the 8086 from the 8088 in software.
|
| I sincerely doubt I was the first to work that out; but I
| remember being so incredibly happy when _I_ figured that one out,
| when it solved a problem I had.
|
| I cannot now recall why the difference was significant; something
| about installing different routines for bashing serial ports, I
| think.
| kens wrote:
| Neat! Eventually, Intel added the CPUID instruction so you
| could determine the processor type without trickery.
| temac wrote:
| IIRC it was added on Pentium and maybe late 486. You had to
| do classic tricks to identify the model before that.
| userbinator wrote:
| I believe in the Pentium, the prefetch queue became snooped,
| which coincides nicely with the introduction of CPUID.
| convolvatron wrote:
| It may also have something to do with why CPUID is strongly
| serializing? That always really confused me. It's not like the
| CPU type is going to race with a load or a store.
| phire wrote:
| I suspect the serialising is just a side effect of the
| CPUID instruction being implemented as a massive
| microcode routine.
| adrian_b wrote:
| In the Pentium, the cache memory was split into an instruction
| cache and a data cache.
|
| This forced the introduction of the snooping workaround;
| otherwise, stores into the data cache would not have affected
| the contents of the instruction cache.
| accrual wrote:
| Indeed, I have a 486 DX-50 without CPUID, and a 486 DX2-66
| with CPUID support. The latter provides much more detail
| when viewed in CPU-Z.
| adrian_b wrote:
| CPUID was first added in the Pentium (66 MHz), in 1993.
|
| Nevertheless, some late 486 variants were introduced after the
| first Pentium, in 1994 or later, and these had CPUID, e.g. the
| Intel 486DX4 (100 MHz).
|
| AMD had 2 generations of the 486DX4 (and of the 486DX2): the
| first did not have CPUID (and had a write-through cache memory),
| while the second had CPUID (and had a write-back cache memory).
|
| Some Cyrix CPUs with properties intermediate between 486
| and Pentium had CPUID, but it was disabled by default and
| it could be enabled in the BIOS.
|
| Measuring the length of the prefetch queue was the standard
| method to distinguish the 8088 from the 8086, and it was used by
| several commercial CPU detection utilities for MS-DOS, e.g.
| Norton Utilities or the like.
|
| At the time, I discovered this by disassembling such a utility
| program.
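The queue-length trick described above can be illustrated with a toy model. The sketch below is not 8086 assembly and is not how any real detection utility was written. It is a simplified Python simulation that assumes one instruction per byte and a prefetch queue that is always topped up before the executing instruction writes to memory. The names `stale_instruction_ran`, `guess_cpu`, and `PATCH_TARGET` are invented for illustration. The only hardware facts it relies on are the queue sizes: 4 bytes on the 8088 and 6 bytes on the 8086.

```python
from collections import deque

# Toy opcodes; in this model every instruction is exactly one byte long.
PATCH, NOP, INC, HALT = "PATCH", "NOP", "INC", "HALT"

# Address of the instruction that PATCH overwrites (5 bytes after PATCH).
PATCH_TARGET = 5


def stale_instruction_ran(queue_len):
    """Return True if the overwritten INC still executed from the queue."""
    memory = [PATCH, NOP, NOP, NOP, NOP, INC, HALT]
    queue = deque()   # prefetched bytes, oldest first
    fetch_addr = 0    # next address the prefetcher will read
    counter = 0       # incremented only if the original INC executes

    def refill():
        # Assumption: the bus is always free, so the queue fills completely.
        nonlocal fetch_addr
        while len(queue) < queue_len and fetch_addr < len(memory):
            queue.append(memory[fetch_addr])
            fetch_addr += 1

    while True:
        refill()
        op = queue.popleft()  # this instruction is now executing
        refill()              # prefetch runs ahead before the write lands
        if op == PATCH:
            # Self-modifying store: replace INC with NOP in memory only.
            # Bytes already sitting in the queue are not updated.
            memory[PATCH_TARGET] = NOP
        elif op == INC:
            counter += 1
        elif op == HALT:
            return counter == 1


def guess_cpu(queue_len):
    """Classify the simulated CPU by whether the stale byte executed."""
    return "8086" if stale_instruction_ran(queue_len) else "8088"


if __name__ == "__main__":
    for n in (4, 6):
        print(n, "byte queue looks like an", guess_cpu(n))
```

A patch distance of 5 bytes falls beyond a 4-byte queue but inside a 6-byte one, which is why the two queue lengths give different results in this model.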
| Agingcoder wrote:
| There were all kinds of tricks prior to CPUID to figure out
| what kind of CPU you were running on. I had actually
| forgotten about that. Thanks for reminding me!
| transitionnel wrote:
| Half of what keeps me going is the belief that I can randomly
| wander into this forum and witness black magic being tossed
| around like spare change. ;D
| vardump wrote:
| In the era before the internet, you either figured things out on
| your own or you couldn't get anything done.
|
| If you were very lucky, some magazine might have mentioned it.
| Another way out was to run a disassembler on some other software
| package that did the same thing.
| mananaysiempre wrote:
| You still do on occasion. Some years ago I tried to write a
| precisely timed bitbanging loop on an STM8 microcontroller (a
| 650x/680x-alike with an allegedly 3-stage pipeline and a small
| prefetch buffer; cents per chip pre-Covid), and the instruction
| prefetch completely screwed me up. (At least I think it was the
| instruction prefetch? The thing depended on branch target
| alignment, whatever it was.) The one relevant question on Stack
| Overflow hasn't received any answers in years; the manufacturer
| documentation mentions that its instruction timings are
| "simplified", aka a lie, and gives a more elaborate model of the
| pipeline that it admits is also a lie and that can't reproduce
| the timings I'm seeing.
|
| Many things _are_ immeasurably easier than what I remember as
| a middle-schooler with an utterly anachronistic 286 in post-
| Soviet early 2000s Moscow, so that's nice. It doesn't make
| the blasted loop work, though.
|
| (Many others are also worse. Today's me could work out the
| motherboard design of the 286 by looking at it, even without
| the manuals; my current laptop's manufacturer's refusal to
| release the schematics annoys me enough that I've half a mind
| to ask some physicists if they have a CT machine they could
| run the board through.)
| vardump wrote:
| Oh yeah. But way back then there was a lot more you had to
| figure out on your own.
|
| I remember coding a game on the C64 in the eighties. Just
| figuring out how to print the player's _score_ sufficiently
| fast was a challenge. Dividing by 10 with modulo to convert
| numbers to digits was just way too slow.
|
| My method was not to use normal math, but to directly
| manipulate screen RAM characters when the score increased.
|
| That was a very cheap way to increase the player's score by,
| say, 1000: you didn't even have to care about the 3 lowest
| digits, just inc the thousands place by 1; if it overflowed
| past 9, increase the next position to the left, etc.
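The score trick above amounts to a ripple-carry increment on the displayed digits themselves. Here is a minimal sketch in Python rather than 6502 assembly, using a `bytearray` of ASCII digits as a stand-in for the score's characters in screen RAM. The function names `bump_digit` and `add_thousand` are hypothetical, and letting a carry out of the leftmost digit wrap silently is an assumption, not something the comment specifies.

```python
def bump_digit(screen, index):
    """Increment the digit at `index`, rippling carries to the left.

    `screen` is a bytearray of ASCII digits standing in for the score
    characters on screen. No division or modulo is needed.
    """
    while index >= 0:
        if screen[index] != ord("9"):
            screen[index] += 1       # no overflow: done
            return
        screen[index] = ord("0")     # 9 wraps to 0, carry moves left
        index -= 1
    # Assumption: a carry out of the leftmost digit is simply dropped.


def add_thousand(screen):
    """Add 1000 by touching only the thousands place and above."""
    bump_digit(screen, len(screen) - 4)


if __name__ == "__main__":
    score = bytearray(b"0009500")
    add_thousand(score)
    print(score.decode())
```

The three lowest digits are never read or written, which is what made the approach cheap compared with converting a binary score to decimal on every update.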
| jecel wrote:
| A 4116 DRAM chip with a 250 ns access time would have a 500 ns
| cycle time, so it could read or write 2 million words per second.
| The Acorn people called this a 2 MHz memory. You could have
| shorter cycle times in "column mode" where you pulsed the /CAS
| signal only to access words in the same row and the Acorn
| designers made good use of that for the Electron (reading 8 bits
| from a 4 bit wide memory) and the ARM based Archimedes (fetching
| instructions other than jump, load or store as well as loading
| the video and audio FIFOs).
|
| From one DRAM generation to the next the cycle time might be
| different for chips with the same access time. This allowed me to
| make the 512KB Macintosh faster than the 128KB one
| (http://www.merlintec.com/lsi/mac512.html).
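The "2 MHz memory" figure quoted for the 4116 follows directly from the cycle time, not the access time. A small sketch of that arithmetic, with the helper name `accesses_per_second` chosen only for illustration:

```python
def accesses_per_second(cycle_time_ns):
    """Full random-access reads or writes per second for a given cycle time."""
    return 1_000_000_000 / cycle_time_ns


if __name__ == "__main__":
    # 500 ns cycle time, as quoted for a 4116 with a 250 ns access time.
    rate = accesses_per_second(500)
    print(rate / 1_000_000, "million accesses per second")
```

The same function shows why two chips with equal access times can perform differently: a shorter cycle time gives a higher rate even when the access time is unchanged.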
___________________________________________________________________
(page generated 2024-03-23 23:00 UTC)