[HN Gopher] The Intel 8088 processor's instruction prefetch circ...
       ___________________________________________________________________
        
       The Intel 8088 processor's instruction prefetch circuitry: a look
       inside
        
       Author : matt_d
       Score  : 86 points
       Date   : 2024-03-23 18:17 UTC (4 hours ago)
        
 (HTM) web link (www.righto.com)
 (TXT) w3m dump (www.righto.com)
        
       | kens wrote:
       | Author here for your 8086/8088 questions...
        
       | h2odragon wrote:
       | > The 8086/8088 do not provide consistency ... self-modifying
       | code can be used to determine the queue length, distinguishing
       | the 8086 from the 8088 in software.
       | 
       | I sincerely doubt I was the first to work that out; but I
       | remember being so incredibly happy when _I_ figured that one out,
       | when it solved a problem I had.
       | 
       | Cannot now recall why the difference was significant, something
       | about installing different routines for bashing serial ports, I
       | think.
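
The queue-length trick described above can be sketched with a toy simulation. This is only an illustration under loud assumptions: the opcodes are invented (not real x86 encodings), bus timing is ignored, and the prefetch queue is assumed to be topped up between instructions and never during one. The only facts taken from the hardware are the queue lengths: 4 bytes on the 8088 and 6 bytes on the 8086. The program patches an instruction a few bytes ahead of itself; a CPU with the longer queue has already fetched the old byte and runs it anyway.

```python
from collections import deque

# Toy opcodes for this sketch only; they are not real x86 encodings.
NOP, STORE, INC, HALT = 0, 1, 2, 3


def run(queue_len):
    """Run the toy self-modifying program with the given queue length.

    Returns 1 if the stale INC was executed out of the prefetch queue,
    0 if the patched NOP was fetched from memory instead.
    """
    target = 5  # address of the byte that the program patches
    # STORE target, NOP ; two NOPs of padding ; INC (to be patched) ; HALT
    mem = [STORE, target, NOP, NOP, NOP, INC, HALT]
    queue = deque()
    fetch_addr = 0  # next address the prefetcher will read
    counter = 0

    def refill():
        nonlocal fetch_addr
        while len(queue) < queue_len and fetch_addr < len(mem):
            queue.append(mem[fetch_addr])
            fetch_addr += 1

    def next_byte():
        if not queue:
            refill()
        return queue.popleft()

    while True:
        refill()  # assumption: the queue is topped up between instructions
        op = next_byte()
        if op == STORE:
            addr, value = next_byte(), next_byte()
            mem[addr] = value  # memory changes; already-queued bytes do not
        elif op == INC:
            counter += 1
        elif op == HALT:
            return counter
        # NOP: nothing to do


def identify(queue_len):
    """Classify the toy CPU from the result of the self-modifying test."""
    return "8086" if run(queue_len) == 1 else "8088"
```

In this model `identify(4)` reports an 8088 and `identify(6)` reports an 8086, because the patched byte sits inside the 6-byte queue but outside the 4-byte one when the store executes. Real detection code has to account for instruction timing, which this sketch does not attempt.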
        
         | kens wrote:
         | Neat! Eventually, Intel added the CPUID instruction so you
         | could determine the processor type without trickery.
        
           | temac wrote:
           | IIRC it was added on Pentium and maybe late 486. You had to
           | do classic tricks to identify the model before that.
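
One family of "classic tricks" tested which FLAGS/EFLAGS bits software could change. The sketch below is a toy model, not detection code: `make_cpu` and `classify` are hypothetical names, and each CPU is reduced to two masks. The bit behaviour it encodes is the commonly documented one, stated here from memory: bits 12-15 always read as 1 on the 8086/8088 and as 0 on a 286 in real mode, the AC flag (bit 18) cannot be set on a 386, and the ID flag (bit 21) can only be toggled when CPUID is present.

```python
AC_FLAG = 1 << 18   # alignment check flag, introduced with the 486
ID_FLAG = 1 << 21   # can be toggled only if CPUID is supported
HIGH_BITS = 0xF000  # FLAGS bits 12-15


def make_cpu(forced_on, writable):
    """Return a toy 'write flags, then read them back' function.

    forced_on: bits that always read as 1; writable: bits software can set.
    Only the bits used by the detection steps are modelled.
    """
    def write_then_read(value):
        return (value & writable) | forced_on
    return write_then_read


# Hypothetical models of each family, limited to the bits of interest.
CPUS = {
    "8086/8088": make_cpu(HIGH_BITS, 0),
    "80286": make_cpu(0, 0),
    "80386": make_cpu(0, 0x7000),
    "486 without CPUID": make_cpu(0, 0x7000 | AC_FLAG),
    "CPUID available": make_cpu(0, 0x7000 | AC_FLAG | ID_FLAG),
}


def classify(flags_rw):
    """Apply the flag tests in order, oldest CPU first."""
    if (flags_rw(0) & HIGH_BITS) == HIGH_BITS:
        return "8086/8088"          # bits 12-15 stuck at 1
    if (flags_rw(HIGH_BITS) & HIGH_BITS) == 0:
        return "80286"              # bits 12-15 stuck at 0 (real mode)
    if (flags_rw(AC_FLAG) & AC_FLAG) == 0:
        return "80386"              # AC flag cannot be set
    if (flags_rw(ID_FLAG) & ID_FLAG) == 0:
        return "486 without CPUID"  # ID flag cannot be toggled
    return "CPUID available"
```

Each entry in `CPUS` classifies as its own key, e.g. `classify(CPUS["80386"])` returns `"80386"`. On real hardware the same tests are done with `PUSHF`/`POPF` in assembly.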
        
             | userbinator wrote:
             | I believe in the Pentium, the prefetch queue became snooped,
             | which coincides nicely with the introduction of CPUID.
        
               | convolvatron wrote:
               | It may also have something to do with why CPUID is
               | strongly serializing? That always really confused me. It's
               | not like the CPU type is going to race with a load or a
               | store.
        
               | phire wrote:
               | I suspect the serialising is just a side effect of the
               | CPUID instruction being implemented as a massive
               | microcode routine.
        
               | adrian_b wrote:
               | In the Pentium, the cache memory was split into an
               | instruction cache and a data cache.
               | 
               | This forced the introduction of the snooping workaround;
               | otherwise, stores into the data cache would not have
               | affected the contents of the instruction cache.
        
             | accrual wrote:
             | Indeed, I have a 486 DX-50 without CPUID, and a 486 DX2-66
             | with CPUID support. The latter provides much more detail
             | when viewed in CPU-Z.
        
             | adrian_b wrote:
             | CPUID was first added in the Pentium (66 MHz), in 1993.
             | 
             | Nevertheless, some late 486 variants were introduced after
             | the first Pentium, in 1994 or later, and these had CPUID,
             | e.g. the Intel 486DX4 (100 MHz).
             | 
             | AMD had 2 generations of 486DX4 (and of 486DX2), the first
             | did not have CPUID (and it had a write-through cache
             | memory), while the second had CPUID (and it had a write-
             | back cache memory).
             | 
             | Some Cyrix CPUs with properties intermediate between 486
             | and Pentium had CPUID, but it was disabled by default and
             | it could be enabled in the BIOS.
             | 
             | Measuring the length of the prefetch queue was the standard
             | method to distinguish the 8088 from the 8086, and several
             | commercial CPU detection utilities for MS-DOS used it, e.g.
             | Norton Utilities or the like.
             | 
             | At the time, I discovered this by disassembling such a
             | utility program.
        
           | Agingcoder wrote:
           | There were all kinds of tricks prior to CPUID to figure out
           | what kind of CPU you were running on. I had actually
           | forgotten about that - thanks for reminding me!
        
         | transitionnel wrote:
         | Half of what keeps me going is the belief that I can randomly
         | wander into this forum and witness black magic being tossed
         | around like spare change. ;D
        
         | vardump wrote:
         | In the era before the internet, you either figured things out
         | on your own or you couldn't get anything done.
         | 
         | If you were very lucky, some magazine might have mentioned it.
         | Another way out was to just use a disassembler if some other
         | software package performed the same thing.
        
           | mananaysiempre wrote:
           | You still do on occasion. Some years ago I tried to write
           | a precisely timed bitbanging loop on an STM8 microcontroller
           | (a 650x/680x-alike with an allegedly 3-stage pipeline and a
           | small prefetch buffer; cents per chip pre-Covid), and the
           | instruction prefetch completely screwed me up. (At least I
           | think it was the instruction prefetch? The thing depended on
           | branch target alignment, whatever it was.) The one relevant
           | question on Stack Overflow hasn't received any answers
           | in years; the manufacturer documentation mentions its
           | instruction timings are "simplified", aka a lie, and gives a
           | more elaborate model of the pipeline that it admits is also a
           | lie and can't reproduce the timings I'm seeing.
           | 
           | Many things _are_ immeasurably easier than what I remember as
           | a middle-schooler with an utterly anachronistic 286 in post-
           | Soviet early 2000s Moscow, so that's nice. It doesn't make
           | the blasted loop work, though.
           | 
           | (Many others are also worse. Today's me could work out the
           | motherboard design of the 286 by looking at it, even without
           | the manuals; my current laptop's manufacturer's refusal to
           | release the schematics annoys me enough that I've half a mind
           | to ask some physicists if they have a CT machine they could
           | run the board through.)
        
             | vardump wrote:
             | Oh yeah. But way back then there was a lot more you had to
             | figure out on your own.
             | 
             | I remember coding a game on the C64 in the eighties. Just
             | figuring out how to print the player's _score_ fast enough
             | was a challenge. Dividing by 10 with modulo to convert
             | numbers to digits was just way too slow.
             | 
             | My method was not to use normal math, but to directly
             | manipulate screen RAM characters when the score increased.
             | 
             | That was a very cheap way to increase the player's score
             | by, say, 1000 - you didn't even have to care about the 3
             | lowest digits, just inc the thousands place by 1; if it
             | overflowed past 9, increase the next position left, etc.
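
The digit-by-digit score update described above can be sketched as follows. This is a minimal illustration, not the original 6502 routine: `add_at_digit` is a hypothetical name, and a Python list of digit values stands in for the character cells in screen RAM (real screen memory would hold character codes rather than plain 0-9 values). Note that no division or modulo is needed, only a compare and a subtract per carry.

```python
def add_at_digit(screen, pos, amount=1):
    """Add `amount` (0-9) to the digit at index `pos`, carrying to the left.

    `screen` holds one decimal digit per cell, most significant first.
    Digits to the right of `pos` are never touched. A carry out of the
    leftmost cell is silently dropped, so the score wraps around.
    """
    i = pos
    while i >= 0:
        screen[i] += amount
        if screen[i] <= 9:
            return            # no overflow, nothing more to do
        screen[i] -= 10       # wrap this digit...
        amount = 1            # ...and carry one into the next cell left
        i -= 1


# Score 009950: add 1000 by bumping the thousands place (index 2 of 6).
score = [0, 0, 9, 9, 5, 0]
add_at_digit(score, 2)
print(score)  # [0, 1, 0, 9, 5, 0]
```

Most updates stop after touching a single cell, which is why the approach is so cheap compared with converting a binary score to decimal on every change.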
        
       | jecel wrote:
       | A 4116 DRAM chip with a 250 ns access time would have a 500 ns
       | cycle time, so could read or write 2 million words per second.
       | The Acorn people called this a 2 MHz memory. You could have
       | shorter cycle times in "column mode" where you pulsed the /CAS
       | signal only to access words in the same row and the Acorn
       | designers made good use of that for the Electron (reading 8 bits
       | from a 4 bit wide memory) and the ARM based Archimedes (fetching
       | instructions other than jump, load or store as well as loading
       | the video and audio FIFOs).
       | 
       | From one DRAM generation to the next the cycle time might be
       | different for chips with the same access time. This allowed me to
       | make the 512KB Macintosh faster than the 128KB one
       | (http://www.merlintec.com/lsi/mac512.html).
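
The cycle-time arithmetic for the 4116 figures above is simple enough to check directly. In this small sketch `words_per_second` is a hypothetical helper, and it assumes one word is transferred per full cycle (no column-mode accesses).

```python
def words_per_second(cycle_time_ns):
    """Accesses per second for a memory with the given full cycle time."""
    return 1e9 / cycle_time_ns


# A 500 ns cycle time gives 2 million accesses per second, i.e. "2 MHz".
print(words_per_second(500))  # 2000000.0
```

The rate depends on the cycle time rather than the access time, which is why two chips with the same access time can differ in throughput.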
        
       ___________________________________________________________________
       (page generated 2024-03-23 23:00 UTC)