[HN Gopher] The ZedRipper: Part 2
___________________________________________________________________
The ZedRipper: Part 2
Author : zdw
Score : 26 points
Date : 2021-01-02 00:05 UTC (22 hours ago)
(HTM) web link (www.chrisfenton.com)
(TXT) w3m dump (www.chrisfenton.com)
| fentonc wrote:
| If anyone's got questions about my extensive upgrades to the
| world's least useful laptop, fire away
| jhgorrell wrote:
| Thank you for writing this up - as someone who is getting
| started with FPGAs, it is an inspiration.
| fentonc wrote:
| Thanks! The actual logic required for the Z80 core is tiny -
| you could probably make a dual-core version with something
| like this: https://www.tindie.com/products/tinyvision_ai/updu
| ino-v30-lo...
|
| FPGAs are super fun to play around with.
| jhgorrell wrote:
| I ended up getting an Arty-A7 as I wanted to have an
| ethernet port. It arrived a couple weeks ago. I'm still
| getting the build environment and toolchain set up.
|
| My goal is an array of 6502s. It was while researching that
| idea and dev boards that I found your posts.
| Lerc wrote:
| What's the minimum workload that can be transferred to another
| processor for a speed gain?
|
| For instance, can you do little things like the floating point
| expression x*y + u*v by bumping the subexpressions to separate
| units, and have the parallelism outweigh the communication cost
| for a net gain?
| fentonc wrote:
| That's an interesting question that I haven't explored much -
| the network on the ZedRipper is a unidirectional synchronous
| ring operating at the full 140 MHz, with a round-trip latency
| of ~32 clocks or so, but the interface exposed to the Z80 is
| a sort of re-targetable serial port (you write an 8-bit
| 'destination' register, and then you push bytes to that
| node). The current buffer depth on the receive side is only a
| single byte, so the sender needs to wait until the
| destination node has read the byte and the credit gets
| returned. Turbo Pascal uses the 'Real48' format for floating
| point - 6 bytes per number - and I believe floating point
| operations take several thousand clock cycles. So in a tight
| loop on both sides, you might transfer a floating point
| number to a neighboring node in ~500 cycles.
|
| Especially if I improved the network a bit - deeper receive
| buffers at a minimum, maybe a simple DMA engine - you could
| probably get it down to <100 cycles to forward a Real48 to a
| neighbor. The performance of emulated floating point on an
| 8-bit CPU is sufficiently bad, and the network performance is
| sufficiently good, that you probably could get away with some
| very fine-grained parallelism that way! When I'm back to
| commuting, I should write an n-body gravity simulator or
| something for it so that there is lots of numerical work to
| spread around, and see how much of a speedup I can get.
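The trade-off discussed above (x*y + u*v split across two nodes) can be sketched as a rough cycle-count model. This is only an illustration: the 3000-cycle cost per floating point operation is an assumed stand-in for "several thousand", the function names are hypothetical, and the model assumes the two multiplies overlap perfectly once both operands have been sent. The 500 and 100 cycle transfer costs are the estimates quoted in the comment.

```python
# Back-of-the-envelope model: is it worth shipping u*v to a neighboring node?
# Every cycle count here is an assumed placeholder, not a measurement.

def serial_cycles(mul_cycles, add_cycles):
    """One node computes x*y, then u*v, then the sum."""
    return 2 * mul_cycles + add_cycles


def offloaded_cycles(mul_cycles, add_cycles, transfer_cycles):
    """Send u and v to a neighbor, multiply on both nodes at once,
    receive the neighbor's product, then add locally."""
    send_operands = 2 * transfer_cycles   # two Real48 values go out
    overlapped_multiply = mul_cycles      # x*y and u*v run in parallel
    return_result = transfer_cycles       # one Real48 value comes back
    return send_operands + overlapped_multiply + return_result + add_cycles


def speedup(mul_cycles, add_cycles, transfer_cycles):
    """Ratio of single-node time to two-node time (>1 means a net gain)."""
    serial = serial_cycles(mul_cycles, add_cycles)
    return serial / offloaded_cycles(mul_cycles, add_cycles, transfer_cycles)


if __name__ == "__main__":
    ASSUMED_FP_OP_CYCLES = 3000  # assumption standing in for "several thousand"
    for transfer in (500, 100):  # per-Real48 estimates quoted in the thread
        ratio = speedup(ASSUMED_FP_OP_CYCLES, ASSUMED_FP_OP_CYCLES, transfer)
        print(f"transfer={transfer} cycles -> speedup {ratio:.2f}x")
```

Under this model, offloading pays off whenever three transfers cost less than one multiply, so the gain depends far more on the real cost of a Real48 multiply than on the exact network latency.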
| tomcam wrote:
| How fast does Turbo Pascal feel on this machine? I didn't even
| know it ran on a Z-80. Also, you are a straight up beast.
| fentonc wrote:
| I have a real Kaypro 2 computer with a 4MHz Z80 in it, which
| I also use Turbo Pascal on - on the Kaypro, it's perfectly
| usable, but you get used to waiting a few seconds when you're
| loading files, compiling, etc. On the ZedRipper, when things
| are executing out of RAM everything is instantaneous. I think
| the CPU core I'm using is close to cycle-accurate, so it
| probably is ~35x faster than the Kaypro when executing actual
| code.
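The ~35x figure follows from the two clock rates mentioned in the thread (140 MHz for the ZedRipper, 4 MHz for the Kaypro), assuming the core really does take the same number of cycles per instruction. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def clock_speedup(fast_mhz, slow_mhz):
    """Speedup from clock rate alone, assuming identical cycles per instruction."""
    return fast_mhz / slow_mhz


if __name__ == "__main__":
    # Clock rates as stated in the thread; I/O-bound work will not scale like this.
    print(clock_speedup(140, 4))
```

This only covers code executing out of RAM; file loading and other I/O are bounded by the storage path, not the CPU clock.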
| tomcam wrote:
| Thank you. I absolutely refuse to get into retro computing.
| I refuse. I'm not going to. So I don't think your amazing
| work has me at all interested. I have enough hobbies, and
| my wife knows it.
___________________________________________________________________
(page generated 2021-01-02 23:01 UTC)