[HN Gopher] Memory and ILP handling in 2D convolutions
___________________________________________________________________
Memory and ILP handling in 2D convolutions
Author : pdziepak
Score : 27 points
Date : 2024-07-20 12:34 UTC (10 hours ago)
(HTM) web link (riemani.ca)
(TXT) w3m dump (riemani.ca)
| epistasis wrote:
| Interesting article, thanks; IMHO mostly for the low-level
| performance analysis.
|
| When it comes to actual computation of convolutions, the fast
| Fourier transform should at least be mentioned, even if in
| passing. Early in grad school I peeked at the source for R's
| density() function, and was blown away that it was using FFT, and
| that I had not picked up that trick in my math classes (or maybe
| I had just forgotten it...)
|
| For a 2d example:
|
| https://stackoverflow.com/questions/50453981/implement-2d-co...
|
| And a recent HN thread that was very good:
|
| https://news.ycombinator.com/item?id=40840396
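|
| A minimal sketch of the FFT trick mentioned above, assuming plain
| Python lists and a power-of-two zero-padded grid. The names
| conv2d_fft and conv2d_direct are hypothetical, and this is not
| how R's density() or the linked answers implement it; it only
| illustrates that multiplying the two spectra pointwise gives the
| same result as the direct sliding sum.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unscaled inverse)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft2(m, invert=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, invert) for r in m]
    cols = [fft(list(c), invert) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def conv2d_fft(image, kernel):
    """Full 2D linear convolution via zero-padded FFTs."""
    h = len(image) + len(kernel) - 1
    w = len(image[0]) + len(kernel[0]) - 1
    H, W = next_pow2(h), next_pow2(w)

    def pad(m):
        # Zero-pad to H x W so the circular convolution equals the linear one.
        return [[complex(m[i][j]) if i < len(m) and j < len(m[0]) else 0j
                 for j in range(W)] for i in range(H)]

    fa, fb = fft2(pad(image)), fft2(pad(kernel))
    prod = [[fa[i][j] * fb[i][j] for j in range(W)] for i in range(H)]
    inv = fft2(prod, invert=True)
    scale = H * W  # the inverse transform above is unscaled
    return [[inv[i][j].real / scale for j in range(w)] for i in range(h)]


def conv2d_direct(image, kernel):
    """Reference implementation: direct sum over all overlapping products."""
    h = len(image) + len(kernel) - 1
    w = len(image[0]) + len(kernel[0]) - 1
    out = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(image):
        for j, x in enumerate(row):
            for k, krow in enumerate(kernel):
                for l, y in enumerate(krow):
                    out[i + k][j + l] += x * y
    return out
```

| The FFT route generally pays off for large kernels; for the small
| kernels typical of image filters, the direct loop is usually the
| better starting point.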
| toxik wrote:
| ILP is instruction-level parallelism, if you had a hard time
| remembering like me.
| SkiFire13 wrote:
| I was thinking of Integer Linear Programming when I saw the
| title. Just another example of why acronyms are bad.
| imtringued wrote:
| As cool as this is, I can't help but think how pointless the goal
| itself is.
|
| XDNA 2 will have 12 TFLOPs, roughly matching the 96-core
| Threadripper Pro 7995WX at a much lower price point.
| bee_rider wrote:
| These sorts of computations generally just get fed bigger inputs
| as compute gets better.
|
| Also, plenty of Threadrippers exist out there already; if you
| get access to some cluster, it might have whatever type of chip
| in it. If I have access to a cluster with many 7995s, I don't
| really care too much about what's available on the consumer
| side.
| mratsim wrote:
| Years ago I started a collection of convolution optimization
| resources:
| https://github.com/mratsim/laser/wiki/Convolution-optimisati...
|
| I also checked, and apparently NVIDIA CUTLASS now supports generic
| convolutions: https://github.com/NVIDIA/cutlass
___________________________________________________________________
(page generated 2024-07-20 23:09 UTC)