[HN Gopher] The 100MHz 6502 (2022)
       ___________________________________________________________________
        
       The 100MHz 6502 (2022)
        
       Author : throwup238
       Score  : 151 points
        Date   : 2024-01-27 21:58 UTC (1 day ago)
        
 (HTM) web link (www.e-basteln.de)
 (TXT) w3m dump (www.e-basteln.de)
        
       | pvg wrote:
        | Big previous thread from 2021:
       | 
       | https://news.ycombinator.com/item?id=28852857
        
       | dmitrygr wrote:
       | There are 300MHz 8051s around in all sorts of unlikely places.
       | Many pocket mp3 players used to be based on them.
        
         | Moto7451 wrote:
          | Is there an advantage to a 300MHz 8051 vs a Cortex M0? Or did
          | they just predate the M0's existence?
         | 
         | I know there's a lot of 8051 tooling but I'm only a dabbler in
         | microcontrollers and it seems like AVRs and M0/M3s have taken
         | the place of PICs and 8051s in hobby world.
        
           | dmitrygr wrote:
           | 8051 has no cost to license and if you are mostly using an
           | accelerator to decode mp3, then the servicing of it is simple
           | enough. Why rewrite the code you already have (from before
           | cortex-m0 existed) or redesign the accelerator you already
           | have?
        
           | ddingus wrote:
           | There are a few:
           | 
           | If you need rapid, real time responses to external signals,
           | the faster clocked 8 bitters are excellent! Many chips can
           | get into an interrupt service routine in just a handful of
           | cycles. In tandem with this, these devices can pack a lot of
           | logic into a small amount of code.
           | 
           | From Dallas Semi: Our 1 clock-per-machine-cycle processors
           | reached a remarkable performance goal--1 clock-per-machine-
           | cycle, currently at 33 million instructions per second
           | (MIPS).
           | 
           | From Silicon Labs: The proven 8051 core received a welcome
           | second wind when its architecture lost patent protection in
           | 1998. [...] The original Intel 8051 core took 12 cycles to
            | execute 1 instruction; thus, at 12 MHz, it ran at 1 million
           | instructions per second (1 MIP). In contrast, a 100 MHz
           | Silicon Labs 8051 core will run at 100 MIPS or 100 times
           | faster than the classic 8051 at a frequency that is only
           | about 8x the classic 8051's frequency.
           | 
           | That's really fast, when it comes to responding to external
           | events!
           | 
           | Say one needs to read an incoming data stream, or control
           | something moving at a high rate of speed. Both of these tasks
           | depend on a device that can sense and respond in as close to
           | real time as things get.
           | 
           | Large, production proven, time tested code bodies. 6502,
           | 8051, z80, etc... all have a ton of library code that's not
           | difficult to understand and make use of.
           | 
           | Often, these 8 bit designs can run on crazy low power, or
           | operate very efficiently at full speed.
           | 
           | Licensing isn't generally an issue. Adding a well documented
           | and production proven 8 bit core to a specialized design
           | works pretty well! Often, the custom hardware on chip does
           | the heavy lifting, leaving UX and control tasks, both of
           | which are easy and lean enough for 8 bits of CPU to make
           | sense.
           | 
            | The last thing I would put here is subjective: ease of
            | development can be an advantage, though it depends on the
            | developer. Once someone has bootstrapped themselves onto 8
            | bit computing, the constraints on development limit the
            | possible application scope, and with that limit comes ease
            | of development. When used to their strengths, simple chips
            | like these are easy to develop for. It's possible for one
            | person to completely understand a device and, with that
            | understanding, fully exploit it.
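The MIPS figures quoted in the comment above reduce to simple arithmetic; a
quick sketch (the clock rates and cycle counts are the ones stated in the
quotes):

```python
# Effective MIPS = clock frequency / clocks per instruction / 1e6.
def mips(clock_hz, clocks_per_instruction):
    return clock_hz / clocks_per_instruction / 1e6

classic_8051 = mips(12e6, 12)    # original Intel core: 12 clocks/instruction
onecycle_8051 = mips(100e6, 1)   # single-cycle core, e.g. Silicon Labs

print(classic_8051)                  # 1.0 MIPS
print(onecycle_8051)                 # 100.0 MIPS
print(onecycle_8051 / classic_8051)  # 100x faster at only ~8x the clock
```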
        
             | nullc wrote:
             | I wonder what the limit of computing power per joule is
             | with current technology, assuming you were freely able to
             | change the architecture.
             | 
             | For example, perhaps you wouldn't use an integer
             | instruction pointer, because a full adder to increment it
             | is expensive. Instead you could use a LFSR where each
             | increment requires only a couple xor gates and some wires.
             | But it would mean that your code would have to be scattered
             | in memory in a funny order. No problem for a smart
             | assembler.
             | 
             | How much computing power could you drive from a device
             | powered by nothing other than ambient RF?
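A toy model of the LFSR-as-program-counter idea (a sketch: taps (8, 6, 5, 4)
are one known maximal-length choice for 8 bits, and `layout` is a
hypothetical stand-in for the "smart assembler"):

```python
def lfsr_step(s):
    # One "increment": a shift plus a couple of XORs, no carry chain.
    bit = ((s >> 7) ^ (s >> 5) ^ (s >> 4) ^ (s >> 3)) & 1
    return ((s << 1) | bit) & 0xFF

def layout(program, start=0x01):
    # The smart assembler: scatter instructions in LFSR order, so that
    # stepping the "PC" walks the program in sequence.
    mem, pc = {}, start
    for insn in program:
        mem[pc] = insn
        pc = lfsr_step(pc)
    return mem

# Maximal-length taps visit all 255 nonzero states before repeating.
seen, s = set(), 0x01
while s not in seen:
    seen.add(s)
    s = lfsr_step(s)
print(len(seen))  # 255
```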
        
               | dmitrygr wrote:
                | The thing you describe, replacing the PC with an LFSR,
                | has actually been done, to simplify the silicon. Some
                | very cheap 4-bit microcontrollers, often used in TV
                | remotes, do exactly this.
        
               | ddingus wrote:
               | I think of this too. And I break it down into ads and
               | operations. An ad is simply summing two inputs of some
               | kind. In operation, might be a change of state or an
               | input coming online or going away. Something analogous to
               | the bit operations and or not exclusive or and friends.
               | 
               | With all the physics talk of information being
               | fundamental, I suspect we will find both an upper bound
               | and a lower bound.
               | 
                | The upper bound will be something like compute power per
                | volume divided by energy, or some similar construct.
                | Basically you can only pack so much information and so
                | many operations into a given region of space, at the
                | highest energy level it is possible for that region of
                | space to contain.
               | 
                | The lower bound might be something like the Planck
                | constant for computation. What's the smallest unit of
                | space and energy that can support an add, for example?
                | It's interesting to think about.
               | 
               | Sorry for the typos I used voice dictation on this one
        
               | crotchfire wrote:
               | ... and adding an integer to a pointer would be
               | hellaciously expensive.
               | 
               | If you have the foundry SPICE models you can calculate
               | these kinds of lower-limit values. I did this for 45nm a
               | long time ago and vaguely recall getting numbers down in
               | the double-digit femtojoule range for 32-bit addition,
               | measuring only transistor R+C.
               | 
               | But the transistors don't really cost anything; all the
               | CV^2 is in charging and discharging the wires. And the
               | "C" is totally geometry-dependent. It's not like software
               | -- at least not when you're pushing all the limits --
               | everything affects everything else.
        
               | nullc wrote:
               | > ... and adding an integer to a pointer would be
               | hellaciously expensive.
               | 
               | Fair, though if it were only the PC and instruction
               | memory that were permuted that isn't much of an issue.
               | 
                | It's not _that bad_: the circuit looks more like a
                | multiplier than an adder (the search term would be LFSR
                | fast-forward or jump-ahead).
                | 
                | The PC is presumably updated on every cycle, while
                | adding an integer to it is probably a rare operation
                | (just don't use computed jumps...).
        
               | IIAOPSW wrote:
               | >I wonder what the limit of computing power per joule is
               | 
                | Today you're going to learn that the universe puts a hard
                | limit on this, known as "Landauer's principle".
               | 
               | To derive it in short, the (information) entropy on the
               | input side of a single traditional logic gate is 2 bits,
               | but on the output side it is just 1 bit. This seems to
               | imply that the (physical) entropy of the computer would
               | go down after the computation, because your computer had
               | more possible physical states it could be in at the time
               | of input than it seems to have at the time of output. But
               | this is impossible as it violates the 2nd law of
               | thermodynamics.
               | 
               | To resolve the contradiction, each logic gate must be
               | putting the missing bit of entropy on an untracked non-
               | computational degree of freedom within the physical
               | system. In other words the untracked "missing"
               | information is encoded as seemingly random waste heat,
               | and dumped into the environment at room temperature.
               | 
               | https://en.wikipedia.org/wiki/Landauer%27s_principle
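The limit itself is tiny at room temperature; a quick computation (using the
exact SI value of the Boltzmann constant and T = 300 K):

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K (exact since the 2019 SI)
T = 300.0            # room temperature, K

# Landauer bound: erasing one bit dissipates at least k_B * T * ln(2).
E_min = k_B * T * math.log(2)
print(E_min)         # ~2.87e-21 J per bit erased

# For scale: ~10 pJ per operation (a rough modern figure) sits billions
# of times above the bound, matching the Wikipedia quote downthread.
print(1e-11 / E_min)  # ~3.5e9
```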
        
               | ComputerGuru wrote:
               | From the linked Wikipedia, as a more direct answer to
               | GP's question:
               | 
               | > Modern computers use about a billion times as much
               | energy per operation [than this theoretical minimum
               | energy per bit of entropy "erased"]
        
               | nullc wrote:
               | well aware, which is why I put the technology limit on
               | it!
        
             | iamflimflam1 wrote:
             | Interesting - I've been playing with some SD Card to USB
             | interface ICs and almost all of them include an 8051 core.
        
             | guenthert wrote:
              | I'd think that if you need rapid, real-time response to
              | external signals, you don't use interrupts, since you
              | usually need the response to be deterministic as well.
              | Either use another micro, a spare core, some specialized
              | hardware (e.g. the PRUs in TI's Sitara MCUs, the PIO of
              | the RP2040, the P8X32A), or roll your own (these days
              | probably using FPGAs).
        
               | SomeoneFromCA wrote:
                | You are wrong, as polling always introduces jitter (due
                | to the indeterminacy of the event's arrival moment
                | between the last entry to the polling loop and the
                | comparison operation). If you want really precise timing
                | when reacting to an external event, the way to get it is
                | to sit in a "halt" state with the interesting interrupts
                | enabled.
        
               | ddingus wrote:
               | Another micro = big BOM cost increase
               | 
                | Another core = overall device cost likely higher than
                | necessary.
               | 
               | Both blow the power budget up.
               | 
               | Regarding determinism, relative to what?
               | 
                | An interrupt can be very consistent relative to the
                | signal. Polling jitters relative to that same signal.
                | That may not be desirable.
               | 
               | Now, I did see you mention a Propeller chip!
               | 
               | The first chip worked as you describe and it is
               | beautiful.
               | 
               | There are good reasons why interrupts were added to the
               | second generation chip. They are mostly the reasons found
               | in this discussion.
        
           | pclmulqdq wrote:
           | M0s have taken the place of many 8051s in the "pro" world as
           | well. There's still the niche that sibling comments have
           | mentioned, but a lot of "default small MCUs" for new projects
           | used to be 8051s and are now M0s.
        
           | addaon wrote:
           | Besides the other comments, you can get 8051s at much lower
           | power than M0s... think 1 picojoule per (8 bit) op vs 10
           | picojoules per (32 bit) op, give or take. It's pretty common
           | to see 8051s in the low power zone of microcontrollers that
           | also have one or more 32 bit cores on them. Generally the low
           | power zone (including the 8051) can be run off an external
           | clock (so 25 MHz - 100 MHz) in the 1 mW range, or can be run
           | off an RC oscillator at a lower speed (like 7 +- 3 MHz) in
           | the 100 uA range, both of which usefully extend the ability
           | of the system to monitor for wake events and decide when to
           | bring those Arm big boys on line. Some can even take the 8051
           | down to your 32 kHz real time clock for < 40 uA operation.
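A back-of-envelope check of those figures (a sketch; assumes one operation
per clock and ignores leakage and peripherals, which dominate real budgets):

```python
def core_power_watts(joules_per_op, ops_per_sec):
    # Dynamic power = energy per operation x operations per second.
    return joules_per_op * ops_per_sec

print(core_power_watts(1e-12, 100e6))   # 8051 at 1 pJ/op, 100 MHz: ~0.1 mW
print(core_power_watts(10e-12, 100e6))  # 32-bit core at 10 pJ/op:  ~1 mW
```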
        
         | userbinator wrote:
         | The Z80 was common in MP3/"MP4" players too.
        
       | Dwedit wrote:
       | Throwing a cache into a system that never had a cache before can
       | be quite tricky.
       | 
       | You could have these kinds of memory pages:
       | 
       | * Fixed ROM bank
       | 
       | * Bankswitchable ROM bank
       | 
       | * Fixed RAM bank
       | 
        | * Bankswitchable RAM bank
       | 
       | * IO memory
       | 
       | * RAM that's read by external devices
       | 
       | * RAM that's written to by external devices (basically just IO)
       | 
        | Caching is _trivially easy_ for a fixed ROM or RAM bank that is
        | not used by other devices. Caching a bankswitchable bank requires
        | either invalidating on bankswitch, or knowing the bank switching
        | well enough to just cache everything. Pure IO memory is simple:
        | no caching for that at all. For RAM that's read by other
        | devices, write-through caching would work.
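The page-type rules above can be sketched as a tiny cache model (illustrative
only; class and method names are made up, and the write-back/flush case from
the sibling comments is reduced to a plain invalidate):

```python
class CachedBus:
    def __init__(self, backing):
        self.backing = backing      # "real" memory, slow
        self.cache = {}             # addr -> value, fast

    def read(self, addr, cacheable=True):
        if not cacheable:           # IO memory: never cached
            return self.backing[addr]
        if addr not in self.cache:  # fixed ROM/RAM: fill on first miss
            self.cache[addr] = self.backing[addr]
        return self.cache[addr]

    def write(self, addr, value, write_through=False):
        self.cache[addr] = value
        if write_through:           # RAM read by external devices
            self.backing[addr] = value

    def bank_switch(self):
        # Bankswitchable region: simplest correct policy is to drop
        # everything cached (a write-back cache would flush first).
        self.cache.clear()
```

Usage: `bus.write(addr, v, write_through=True)` keeps external readers
coherent, while `bus.bank_switch()` models the invalidate-on-bankswitch case.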
        
         | rbanffy wrote:
          | The article mentions the bank-switching issues and that the
          | FPGA only has 64K, which limits emulation of higher-memory
          | configs - it can't emulate a //e with an 80-column display
          | (which requires 128K).
        
           | bpye wrote:
           | You could plausibly make bank switching work, but it'd take
           | some effort. You'd want your block RAM to act as a write back
           | cache, and then any bank switch must be intercepted and
           | delayed until you can flush the full contents of the cache to
           | memory.
           | 
           | Or if bank switching is fast and occurs too frequently for
           | that to be viable, you could avoid the flush across a bank
           | switch, but then you may need to perform a bank switch for an
           | eviction.
        
         | masfuerte wrote:
         | The BBC Master had an even funkier mode. The RAM bank accessed
         | could depend on the current program counter.
         | 
         | Imagine a video display taking 16K of RAM. This would be
         | situated between addresses 0x4000 and 0x8000. This same memory
         | range also included non-video RAM. The hardware transparently
         | selected the video or non-video RAM depending on which code was
         | accessing it.
         | 
         | Specifically, if the program counter was at 0xC000 or above
         | (i.e. code in the OS ROM was running) then accesses to the
         | video range would go to the video RAM. But if the program
         | counter was elsewhere (i.e. running user code or an application
         | ROM) then accesses to the video range would go to user RAM, not
         | video RAM.
         | 
         | Additionally, there was a hardware register controlling this so
         | that user code could choose to directly access the video RAM,
         | and OS code could access the user RAM.
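The selection logic described above is compact enough to state directly (a
sketch; `force_video` stands in for the hardware override register, and the
real Master's logic has more bits than this):

```python
def select_bank(pc, addr, force_video=False):
    # Accesses outside the 16K video window always hit main memory.
    if not (0x4000 <= addr < 0x8000):
        return "main"
    # Inside the window: OS code (PC >= 0xC000) or the override register
    # reaches the video RAM; everything else sees ordinary user RAM.
    if force_video or pc >= 0xC000:
        return "video"
    return "user"

print(select_bank(pc=0xC100, addr=0x5000))  # video -- OS ROM touching screen
print(select_bank(pc=0x2000, addr=0x5000))  # user  -- user code, shadow RAM
```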
        
           | forinti wrote:
           | I've been looking for a detailed explanation on this, because
           | I would like to know how they read the PC.
           | 
           | The New Advanced User's Manual describes the flag at &FE34 on
           | the Master and B+ and I've read this thing about using the PC
           | from a few places but I haven't found any specifics.
           | 
           | Could you clarify? How do you know from the outside what's in
           | the PC?
        
             | ddingus wrote:
             | Would they not just mooch the address off the address bus?
        
               | morcheeba wrote:
               | Yep. The 65C02 has a SYNC output that goes high to
               | indicate an instruction is being fetched on the current
               | cycle. Since there is no cache, it's pretty simple to use
               | this to determine the PC.
        
               | ddingus wrote:
               | Thought so. From there, it is a bit of logic to map the
               | chip selects to make the writes and reads come and go
               | from the intended resources.
        
               | forinti wrote:
               | That's it then. I was missing the sync pin. Thank you.
        
           | flohofwoe wrote:
           | There were also 8-bit home computers (like the Amstrad CPC
           | and C64) where a different memory bank would be accessed
           | depending on whether the CPU would do a read or write access,
           | e.g. a read would access a ROM bank, but a write would access
           | a RAM bank at the same address (usually called 'shadow ROM').
        
             | rjeli wrote:
             | Huh? How do you read from the ram after you write to it?
        
               | vidarh wrote:
                | You bank switch then. A common use would be to use the
                | ROM to load code into the RAM under the ROM and _then_
                | bank switch, but also, e.g. for extensions, you might
                | want to first copy the ROM into the underlying RAM, and
                | then patch whatever changes you wanted into it before
                | bank switching.
        
               | flohofwoe wrote:
               | And sometimes you also don't need to read the written
               | data back (for instance when writing to video memory).
        
         | okl wrote:
          | A minimum solution could be an instruction fetch buffer that
          | memorizes the last N instructions (maybe even after decoding?)
          | to alleviate pipeline bubbles and speed up backward jumps.
        
       | satiric wrote:
       | I'd be surprised if 6502-based computers really ran OK at 100MHz:
       | surely you'd run into EMI or timing issues when using the same
       | motherboard at 100 times the original clock speed?
        
         | rbanffy wrote:
         | It slows down to access the bus when needed. Memory access runs
         | at full speed all the time as the memory is inside the FPGA.
        
         | Someone wrote:
         | This doesn't do that. It runs an FPGA-built 6502 with 64kB of
         | RAM at 100MHz. The FPGA knows which memory addresses it has to
          | read and write to the actual memory of the system it's plugged
          | into and, when needed, accesses that at the speed the system
         | expects.
        
           | satiric wrote:
            | Oh I see, that's neat. That makes way more sense than what I
            | was thinking.
        
       | bitwize wrote:
       | People write games for the TI-99/4A in TI BASIC (or Extended
       | BASIC) that would be too slow to be any fun on the original
       | hardware, but flip on Classic99's Turbo mode and suddenly you
       | have arcade action!
       | 
       | I can see an upgrade like this enabling games, demos, and other
       | software that wouldn't be possible on the stock Apple or
       | Commodore systems.
        
         | ddingus wrote:
          | It does not take 100 MHz to do that.
          | 
          | A while back I bought a FastChip for my Apple 2e. That delivers
          | a 16 MHz 65C02 or 65C816 (I bought the latter option).
          | 
          | The trick to getting superfast Applesoft is to copy it into the
          | card's fast RAM. Otherwise, Applesoft is still faster than
          | stock, but main board RAM is still clocked at 1 MHz. Not enough
          | of a boost to really matter.
          | 
          | However, once Applesoft is on the card, the situation is
          | reversed! All accesses to main board RAM are 1 MHz, but that is
          | plenty fast to draw a ton of graphics. Applesoft programs run
          | crazy fast when the 16 MHz 6502 has the job.
        
         | leptons wrote:
          | There are demos written for the SuperCPU 20 MHz accelerator for
          | the C64. There are also demos written for the RAM Expansion
          | Unit. I think there was one recent demo released that runs in
          | an emulator with no throttling, so it's something like a 40 MHz
          | C64.
         | 
         | One interesting demo combines 4 Commodore 64s to run one demo,
         | called "Quad Core"
         | 
         | https://youtu.be/B4UBlpTucFc?si=a1irvH7CRYhETnk9
        
           | ddingus wrote:
           | The Quad one is neat!
        
       | nubinetwork wrote:
        | Just throwing it out there, but I think you can still buy eZ180s
        | that run at something like 133 MHz.
        | 
        | Edit: hmm, nope, Mouser and Digikey don't have them anymore...
        | fastest I could find was 50 MHz and it was marked as not for new
        | designs. Bummer.
        
       | Theodores wrote:
       | I wish I could go back in time to run Mandelbrot fractals on a
       | BBC Micro with this - think how impressed people would be with
       | instant zoom rather than a half hour wait!
        
         | jacquesm wrote:
         | Part of the magic was the half hour wait though. It felt as
         | though you were doing some serious computation!
        
           | ddingus wrote:
           | Same with things like sine plots.
           | 
           | The reveal is part of the fun.
           | 
            | I just had a thought about what might be really fun at
            | 100 MHz, and that is cellular automata. There are the classic
            | Game of Life rules. Going fast on those is fun.
            | 
            | But maybe a more general engine is worth writing. I may try
            | it on my 16 MHz 6502 system.
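For the curious, the classic Game of Life rules fit in a few lines (a Python
sketch of the kernel; on an 8-bit machine this would be table-driven
assembly, but the logic is the same):

```python
from collections import Counter

def life_step(cells):
    """cells: set of live (x, y) coordinates; returns the next generation."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in cells)}

blinker = {(0, 0), (1, 0), (2, 0)}
print(sorted(life_step(blinker)))  # [(1, -1), (1, 0), (1, 1)]
```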
        
       | Lio wrote:
       | This sounds like it would be really suitable for a BBC Micro
       | second processor.
       | 
        | They had designs for, amongst others, a 64K 65C02 running at a
       | different clock speed[1].
       | 
       | Back in the day I always wanted one for playing Elite[2] (but
       | then I also wished that Acorn had provided an official hardware
       | update for 16 colours instead of 8.)
       | 
       | 1. https://en.wikipedia.org/wiki/BBC_Micro_expansion_unit
       | 
       | 2. https://www.bbcelite.com/6502sp/
        
         | jacquesm wrote:
         | It'd be amazing to have it replace the primary one. It would be
         | so fast you'd not need the tube at all. Though, with the tube
         | you probably would have fewer timing issues to contend with. I
         | wonder if the Elite code would seamlessly adapt to being run
         | that much faster.
        
       | indrora wrote:
       | I'm curious how competitors in the "tiny FPGA" market are going
       | to affect things.
       | 
       | I'd love to see this rebuilt not on a Xilinx but something like
       | the Gowin GW1N FPGAs:
       | https://www.gowinsemi.com/en/product/detail/46/
        
       | jacquesm wrote:
        | That's still such an amazing accomplishment. Look at how densely
        | packed the bottom of the board is; this is nothing short of
        | jewelry.
        
       | syngrog66 wrote:
        | Hopefully someone forwards this to Woz; he might get a kick out
        | of it.
        
       ___________________________________________________________________
       (page generated 2024-01-28 23:01 UTC)