[HN Gopher] Accelerating Conway's Game of Life Using CUDA
___________________________________________________________________
Accelerating Conway's Game of Life Using CUDA
Author : brendanrayw
Score : 46 points
Date : 2021-06-03 16:24 UTC (1 day ago)
(HTM) web link (brendanrayw.medium.com)
(TXT) w3m dump (brendanrayw.medium.com)
| iseanstevens wrote:
| Need to re-read in detail, but seems like great learning
| material.
|
| I especially liked the line including "with less power comes
| greater simplicity" :)
| brendanrayw wrote:
| Thanks for reading :D
| brendanrayw wrote:
| After attending a few CUDA workshops at NVIDIA's latest GTC, I
| was inspired to continue learning CUDA on my own. To do so, I
| decided to build John Conway's famous "Game of Life" and use
| CUDA to accelerate the program. I explore several CUDA
| techniques, including managed memory, pinned memory, multiple
| streams, and asynchronous memory transfers.
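|
| For anyone new to these APIs, here is a minimal sketch of how
| pinned memory, streams, and async copies fit together (names
| and sizes are illustrative, not the article's actual code):
|
|     #include <cuda_runtime.h>
|
|     #define N (1 << 12)        // illustrative grid dimension
|     #define NUM_STREAMS 4
|
|     int main() {
|         size_t bytes = (size_t)N * N * sizeof(unsigned char);
|
|         // Pinned (page-locked) host memory: required for truly
|         // asynchronous host<->device copies.
|         unsigned char *h_grid;
|         cudaHostAlloc(&h_grid, bytes, cudaHostAllocDefault);
|
|         unsigned char *d_grid;
|         cudaMalloc(&d_grid, bytes);
|
|         // Give each chunk of the grid its own stream so copies
|         // can overlap with kernels running on other streams.
|         cudaStream_t streams[NUM_STREAMS];
|         for (int i = 0; i < NUM_STREAMS; ++i)
|             cudaStreamCreate(&streams[i]);
|
|         size_t chunk = bytes / NUM_STREAMS;
|         for (int i = 0; i < NUM_STREAMS; ++i) {
|             size_t off = (size_t)i * chunk;
|             cudaMemcpyAsync(d_grid + off, h_grid + off, chunk,
|                             cudaMemcpyHostToDevice, streams[i]);
|             // per-chunk kernel launches on streams[i] go here
|         }
|         cudaDeviceSynchronize();
|
|         for (int i = 0; i < NUM_STREAMS; ++i)
|             cudaStreamDestroy(streams[i]);
|         cudaFreeHost(h_grid);
|         cudaFree(d_grid);
|         return 0;
|     }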
| IdiocyInAction wrote:
| Nice. I took a CUDA course at uni where I built a neural
| network and a physics simulation. Optimizing them was very
| involved, but ultimately very cool; I learned a ton of stuff.
|
| I'd love to work with CUDA in practice, but there aren't that
| many jobs around.
| gtn42 wrote:
| Nice, thanks for sharing your experience!
| rrss wrote:
| this was a fun read, thanks for sharing.
|
| FYI, the transfers from pageable memory almost certainly do not
| go to the storage device in your system unless you are under
| high memory pressure. "Pageable" (as a CUDA-ism) does mean that
| the buffer _may_ be paged out to storage, but more importantly
| it means that even if the buffer is in RAM, the GPU cannot
| access it directly.
|
| so for pageable copies the flow is probably not:
|
|     storage -> buffer in RAM -> device
|
| but rather:
|
|     original buffer in RAM (inaccessible to the device)
|     -> intermediate buffer in RAM (accessible to the device)
|     -> device
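|
| concretely, the difference is just which allocator the host
| buffer comes from (a sketch, not code from the article):
|
|     #include <cstdlib>
|     #include <cuda_runtime.h>
|
|     void copy_demo(unsigned char *d_grid, size_t bytes) {
|         // Pageable: plain malloc. The driver must first copy
|         // this into an internal pinned staging buffer, then
|         // DMA that staging buffer to the GPU.
|         unsigned char *pageable =
|             (unsigned char *)malloc(bytes);
|         cudaMemcpy(d_grid, pageable, bytes,
|                    cudaMemcpyHostToDevice);
|         free(pageable);
|
|         // Pinned: page-locked memory the GPU can DMA from
|         // directly, so the host-to-host hop disappears.
|         unsigned char *pinned;
|         cudaHostAlloc(&pinned, bytes, cudaHostAllocDefault);
|         cudaMemcpy(d_grid, pinned, bytes,
|                    cudaMemcpyHostToDevice);
|         cudaFreeHost(pinned);
|     }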
|
| also, in several places you use the term 'stack' where I think
| it should just be 'RAM' / main memory.
| brendanrayw wrote:
| Thanks for reading! I appreciate the feedback and the info,
| I'll keep that in mind.
| joe_the_user wrote:
| Thanks for your effort! I really like the idea; it's similar
| to a more ambitious project I'm thinking of. And I do have
| some questions:
|
| Is your board a giant two-dimensional array in memory?
|
| Are your threads/kernels reading from this array and then
| writing back to it?
|
| Do you do synchronization to make sure reads happen before the
| later writes (e.g. by double buffering, as in the sketch
| below)?
|
| Do you do any verification that your transitions happen
| correctly?
|
| Do you have an estimate of the time spent in each part:
| transfers from global GPU memory to each kernel, calculations
| in the kernel, and idling on synchronization (assuming you do
| it)?
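|
| (On the synchronization question, the usual answer is double
| buffering: read generation n from one grid, write generation
| n+1 into a second, and swap pointers on the host. A minimal
| sketch of that pattern, not necessarily what the article
| does:)
|
|     // One generation: reads `in`, writes `out`. Because no
|     // thread ever writes the grid it reads, there is no
|     // read/write hazard to synchronize within the kernel.
|     __global__ void life_step(const unsigned char *in,
|                               unsigned char *out,
|                               int w, int h) {
|         int x = blockIdx.x * blockDim.x + threadIdx.x;
|         int y = blockIdx.y * blockDim.y + threadIdx.y;
|         if (x >= w || y >= h) return;
|
|         int alive = 0;
|         for (int dy = -1; dy <= 1; ++dy)
|             for (int dx = -1; dx <= 1; ++dx) {
|                 if (dx == 0 && dy == 0) continue;
|                 int nx = (x + dx + w) % w;   // toroidal wrap
|                 int ny = (y + dy + h) % h;
|                 alive += in[ny * w + nx];
|             }
|
|         unsigned char cell = in[y * w + x];
|         out[y * w + x] = (alive == 3) || (cell && alive == 2);
|     }
|
| Successive launches on the same stream are already ordered, so
| the host only needs to swap the two pointers between launches.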
| jacquesm wrote:
| Interesting. I'm assuming you are familiar with Hashlife? If
| not, check it out; it is absolutely amazing how fast it is,
| and as a study in memoization it may inspire you on how to get
| some more mileage out of your CUDA version.
|
| https://en.wikipedia.org/wiki/Hashlife
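|
| The core idea, as a rough sketch (the recursive evolution step
| is elided; see the paper or the Wikipedia article for the full
| algorithm):
|
|     #include <map>
|     #include <tuple>
|
|     // A 2^k x 2^k block is a quadtree node. Identical
|     // sub-patterns are shared ("hash-consed"), and each
|     // canonical node caches the evolved centre of its block,
|     // so a pattern recurring anywhere, at any scale, is
|     // computed only once.
|     struct Node {
|         Node *nw, *ne, *sw, *se; // four half-size quadrants
|         int level;               // block is 2^level cells wide
|         Node *result;            // memoized centre, advanced
|                                  // 2^(level-2) generations
|     };
|
|     using Key = std::tuple<Node*, Node*, Node*, Node*>;
|     std::map<Key, Node*> interned; // one node per pattern
|
|     // Return the canonical node for these quadrants, creating
|     // it only if the pattern has never been seen before.
|     Node *intern(Node *nw, Node *ne, Node *sw, Node *se,
|                  int level) {
|         Key k{nw, ne, sw, se};
|         auto it = interned.find(k);
|         if (it != interned.end()) return it->second;
|         return interned[k] = new Node{nw, ne, sw, se,
|                                       level, nullptr};
|     }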
| buescher wrote:
| Agreed! The CUDA implementation is nice, and I was going to
| say "now do Hashlife" myself. Here's the original paper:
| https://www.lri.fr/~filliatr/m1/gol/gosper-84.pdf
___________________________________________________________________
(page generated 2021-06-04 23:01 UTC)