[HN Gopher] Conway's Game of Life on FPGA
___________________________________________________________________
Conway's Game of Life on FPGA
Author : petrohi
Score : 77 points
Date : 2021-06-05 05:29 UTC (17 hours ago)
(HTM) web link (k155la3.blog)
(TXT) w3m dump (k155la3.blog)
| pastrami_panda wrote:
| I love that Conway offered a prize for a proof that patterns
| could keep growing forever, or something to that effect. And
| the proof turned out to be a pattern known as a gun (iirc). A
| gun is essentially a starting condition that continuously
| launches new projectiles (gliders) into the world, thus
| creating an infinite (and non-repeating?) game.
| tantalor wrote:
| Does this have anything to do with the topic?
| lk0nga wrote:
| I G'
| darzu wrote:
| I wonder how this would compare to a GPU accelerated GoL.
| prestonbriggs wrote:
| Golly would crush them all.
| sltkr wrote:
| Both of these approaches lose to a CPU. The state-of-the-art
| algorithm is Hashlife [1], which compresses both time and
| space, and can evaluate billions of generations on a grid of
| trillions of cells in milliseconds.
|
| The FPGA approach is really efficient at what it does, but
| ultimately it doesn't scale. For one, it needs 1 bit per cell
| in the 2D torus, but FPGAs have kilobytes to low megabytes of
| on-chip memory. That makes it hard to simulate a 10,000 x
| 10,000 grid, let alone a 1,000,000 x 1,000,000 grid. For two,
| the FPGA explicitly calculates each iteration one by one. This
| is pretty fast in the beginning, and it means you can use it to
| calculate a billion iterations in a few seconds or a trillion
| iterations in a few hours, but you can't scale past that.
|
| Hashlife can probably be sped up a bit by GPUs, but it
| processes a symbolic representation and is consequently much
| better suited to CPUs. It spends a lot of its time doing hash
| table lookups (hence the name), which is a poor fit for GPUs
| and a terrible fit for FPGAs.
|
| 1. https://en.wikipedia.org/wiki/Hashlife
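|
| To make the hash-lookup idea concrete, here is a minimal
| Haskell sketch (not the real Hashlife algorithm, which recurses
| on a quadtree and doubles the time step at each level; Block,
| stepCenter and memoStep are made-up names for this sketch):
| identical blocks are evolved once and then found by table
| lookup.
|
|   import qualified Data.Map.Strict as M
|
|   -- A 4x4 block of cells, row-major.
|   type Block = [[Bool]]
|
|   -- Conway's rule for one cell, given its live-neighbour count.
|   rule :: Bool -> Int -> Bool
|   rule alive n = n == 3 || (alive && n == 2)
|
|   -- Advance the inner 2x2 of a 4x4 block by one generation.
|   stepCenter :: Block -> [[Bool]]
|   stepCenter b =
|     [ [ rule (b !! y !! x) (neighbours y x) | x <- [1, 2] ]
|     | y <- [1, 2] ]
|     where
|       neighbours y x =
|         length [ () | dy <- [-1, 0, 1], dx <- [-1, 0, 1]
|                     , (dy, dx) /= (0, 0)
|                     , b !! (y + dy) !! (x + dx) ]
|
|   -- Identical blocks, wherever (and in whichever generation)
|   -- they occur, are looked up instead of recomputed.  Hashlife
|   -- applies this recursively to quadtree nodes of every size.
|   memoStep :: M.Map Block [[Bool]] -> Block
|            -> (M.Map Block [[Bool]], [[Bool]])
|   memoStep memo b = case M.lookup b memo of
|     Just r  -> (memo, r)
|     Nothing -> let r = stepCenter b in (M.insert b r memo, r)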
| marcodiego wrote:
| Considering that each cell of the next generation can be
| calculated individually in parallel, I don't think a GPU
| implementation would be able to beat the FPGA. A GPU can have
| many pipelines and quickly process many "pixels"
| simultaneously, but it will only be able to process all of
| them in parallel for very small grid sizes.
| daxfohl wrote:
| GPU L0 cache latency IIUC is ~20x higher than CPU. In fact in
| this case I think GPU would have to use L2 cache since the data
| is shared across so many cores, so now you're talking ~50x. So
| even if you get full parallelism of cell computation you can
| plug in the numbers and find it would be far slower than FPGA
| (but still faster than CPU).
|
| I'm not an expert though. Maybe GPUs have some way of
| mitigating the high cache latencies.
| mwest217 wrote:
| I implemented Conway's Game of Life in VHDL to run on a Xilinx
| FPGA board in my digital electronics class in college -
| interestingly, I think the hardest part was actually driving
| the HDMI output.
| ChuckMcM wrote:
| Very nice! I would quibble with this bit: _Verilog initially
| focused on describing the hardware - very close to what could
| be expressed by conventional schematic - and later added
| general-purpose programming elements to create more complex
| components._
|
| The concept here is 'inference' or 'synthesis' and it is the
| fundamental difference between an HDL and an imperative
| programming language. When you write general purpose statements
| in an HDL, the tools have to infer what sort of hardware would
| give you that behavior, and in a funny twist, the more you lean
| on high level language features the more likely you are to run
| into something that cannot be synthesized into hardware gates.
| Perfectly valid HDL with no valid solution!
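|
| The same gap exists in Clash, the Haskell-based HDL mentioned
| downthread. A rough sketch (assuming a standard Clash project
| setup with clash-prelude's recommended language extensions;
| sumFixed and sumUpTo are made-up names):
|
|   import Clash.Prelude
|
|   -- Synthesizable: the structure is fixed at compile time, so
|   -- the fold unrolls into a balanced tree of adders.
|   sumFixed :: Vec 8 (Unsigned 16) -> Unsigned 16
|   sumFixed = fold (+)
|
|   -- Valid code, but not synthesizable: the recursion depth
|   -- depends on a runtime value, so no fixed amount of hardware
|   -- can implement it.
|   sumUpTo :: Unsigned 16 -> Unsigned 16
|   sumUpTo 0 = 0
|   sumUpTo n = n + sumUpTo (n - 1)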
| anyfoo wrote:
| I've been working with FPGAs for years (as a hobby; at work I'm
| a mere "user" of them), and it has always baffled me how poorly
| matched the imperative paradigm chosen for Verilog and VHDL is
| to them.
|
| I think the idea was to make it look "familiar" to engineers by
| looking like C (Verilog) or Ada (VHDL). But FPGAs are nothing
| like CPUs, and what you end up with instead is an unfitting
| language where you use a whole lot of "common constructs",
| knowing how they will be synthesized into hardware. And worse:
| Practically no good way to do abstraction.
|
| Functional languages are a much, much better match, because
| that's what FPGAs are: Combining functions together. This works
| on higher orders as well, and it works well with polymorphism!
|
| So privately at least, for anything substantial I've since been
| using Clash, which is essentially a Haskell subset translated to
| Verilog or VHDL: https://clash-lang.org
|
| The learning curve is steep; it definitely helped that I was
| already proficient in Haskell. But then the code is so enormously
| concise and modular, and I now have a small library of
| abstractions that I can just reuse (for example, adding AXI4 to
| my designs). It's a joy.
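|
| As a flavour of that, a minimal sketch of a Life cell in Clash
| (assuming a standard clash-prelude setup; golRule and golCell
| are hypothetical names, not from the article): the rule is a
| pure function, and the register primitive turns it into a
| clocked cell.
|
|   import Clash.Prelude
|
|   -- Combinational next-state logic for one cell and its eight
|   -- neighbours.
|   golRule :: Bool -> Vec 8 Bool -> Bool
|   golRule alive nbrs = n == 3 || (alive && n == 2)
|     where
|       n :: Unsigned 4
|       n = foldl (\acc b -> if b then acc + 1 else acc) 0 nbrs
|
|   -- One clocked cell: a register fed back through the rule.
|   golCell :: HiddenClockResetEnable dom
|           => Signal dom (Vec 8 Bool) -> Signal dom Bool
|   golCell nbrs = cell
|     where
|       cell = register False (golRule <$> cell <*> nbrs)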
| remexre wrote:
| The Bluespec Language is another Haskell-like language that
| compiles to hardware, but I think it's a bit more mature, too.
| It also has
| a second non-Haskell-like syntax to avoid scaring those used to
| Verilog.
| ohazi wrote:
| There are other options as well: nMigen (Python), Chisel
| (Scala), SpinalHDL (Scala), Bluespec (Haskell).
|
| It's worth taking a quick glance at all of them (maybe making a
| toy or two) to see if any of them strike your fancy. Being able
| to do metaprogramming and parameterization in an actual
| programming language rather than a not-quite C-preprocessor
| pass makes a lot of things easier.
|
| Just about anything feels "nicer" to me than VHDL or Verilog,
| but I particularly like nMigen, and find SpinalHDL to be at
| least somewhat readable.
| anyfoo wrote:
| Thanks, I will give them a try. Right now I've done enough
| with Clash to feel very comfortable with it. Having already
| done a lot of Haskell before helps, because I tend to know
| what's available in general. You can use almost the entirety
| of Haskell's functional toolbox, including Monads and Arrows.
| But it certainly won't hurt to check out all the other
| options.
|
| I agree, metaprogramming and parameterization (both enabled
| by polymorphism) are the key, and also what enables me to just
| "plug in" a function that essentially implements e.g. an
| AXI4-enabled register, or a whole burst transfer.
|
| I shudder at the thought of doing that by hand every time in
| Verilog/VHDL now, tightly coupled to the rest of the logic.
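|
| For instance (a hypothetical sketch, again assuming a standard
| clash-prelude setup; delayN is a made-up name, not an AXI4
| component): one polymorphic definition covers any payload type
| and any pipeline depth, and is instantiated in a single line.
|
|   import Clash.Prelude
|
|   -- An n-stage pipeline delay: build n registers and compose
|   -- them.  Works for any NFDataX payload type and any depth.
|   delayN :: (HiddenClockResetEnable dom, NFDataX a)
|          => SNat n -> a -> Signal dom a -> Signal dom a
|   delayN n dflt = foldr (.) id (replicate n (register dflt))
|
|   -- e.g. a 3-deep delay of 16-bit words:
|   --   delayN d3 (0 :: Unsigned 16) input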
| progbits wrote:
| Fully agreed. As an amateur, just trying to get into the FPGA
| world, my experience was very similar. I've recently discovered
| nMigen [1] (via a very nice series on building a Motorola 6800
| [2]), which approaches this by using Python and operator
| overloading, somewhat like the TensorFlow dataflow model, and
| that feels more natural.
|
| However, even for nMigen, and doubly so for Verilog/VHDL, it
| feels very 90s when it comes to engineering practices. The
| tooling is often lacking (module management, automated testing,
| CI and so on), cryptic acronyms are used everywhere as if each
| source code byte cost a LUT, you have to keep various things
| (e.g. signal bit widths) in sync in multiple places, and there
| are many more such things that make C99 feel modern.
|
| I'll have to give Clash a try, it seems like a big step in the
| right direction - thanks for mentioning it.
|
| [1] https://m-labs.hk/gateware/nmigen/ [2]
| https://www.youtube.com/playlist?list=PLEeZWGE3PwbbjxV7_XnPS...
| ale42 wrote:
| Funny, this looks like the practical work we had to do in our
| CS master's class "advanced digital systems design", something
| like 20 years ago, on a now-archaic XC4013 FPGA... (including
| the VGA part).
___________________________________________________________________
(page generated 2021-06-05 23:00 UTC)