       [HN Gopher] Memory and ILP handling in 2D convolutions
       ___________________________________________________________________
        
       Author : pdziepak
       Score  : 27 points
       Date   : 2024-07-20 12:34 UTC (10 hours ago)
        
 (HTM) web link (riemani.ca)
        
       | epistasis wrote:
       | Interesting article, thanks, IMHO mostly for the low-level
       | performance analysis.
       | 
       | When it comes to actual computation of convolutions, the fast
       | Fourier transform should at least be mentioned, even if in
       | passing. Early in grad school I peeked at the source for R's
       | density() function, and was blown away that it was using FFT, and
       | that I had not picked up that trick in my math classes (or maybe
       | I had just forgotten it...)
       | 
       | For a 2d example:
       | 
       | https://stackoverflow.com/questions/50453981/implement-2d-co...
       | 
       | And a recent HN thread that was very good:
       | 
       | https://news.ycombinator.com/item?id=40840396
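
As a rough illustration of the FFT trick mentioned above, here is a minimal
pure-Python sketch of a "full" 2D linear convolution computed by zero-padding
both inputs, multiplying their 2D transforms pointwise, and transforming back.
It is not taken from the linked article or from R's density(); the names
fft, fft2, conv2d_fft and conv2d_direct are hypothetical, and the radix-2 FFT
assumes padding to power-of-two sizes. The direct quadruple loop is included
only as a reference to check against.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized). len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(matrix, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(row, inverse) for row in matrix]
    cols = [fft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def pad(matrix, height, width):
    """Copy matrix into the top-left corner of a zero-filled height x width grid."""
    padded = [[0j] * width for _ in range(height)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            padded[i][j] = complex(value)
    return padded


def conv2d_fft(image, kernel):
    """Full linear 2D convolution via the convolution theorem."""
    out_h = len(image) + len(kernel) - 1
    out_w = len(image[0]) + len(kernel[0]) - 1
    h, w = next_pow2(out_h), next_pow2(out_w)  # padding avoids circular wrap-around
    fa = fft2(pad(image, h, w))
    fb = fft2(pad(kernel, h, w))
    product = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    spatial = fft2(product, inverse=True)
    scale = h * w  # the inverse transform above is unnormalized
    return [[spatial[i][j].real / scale for j in range(out_w)] for i in range(out_h)]


def conv2d_direct(image, kernel):
    """Reference implementation: direct sum over every image/kernel pair."""
    out_h = len(image) + len(kernel) - 1
    out_w = len(image[0]) + len(kernel[0]) - 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            for ki, krow in enumerate(kernel):
                for kj, kvalue in enumerate(krow):
                    out[i + ki][j + kj] += value * kvalue
    return out
```

Both functions return the full (H + h - 1) by (W + w - 1) output, so
conv2d_fft(image, kernel) should agree with conv2d_direct(image, kernel) up to
floating-point rounding. For the small kernels typical of image filters the
direct loop is usually the faster choice; the FFT route tends to pay off only
as the kernel grows.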
        
       | toxik wrote:
       | ILP is instruction-level parallelism, if you had a hard time
       | remembering like me.
        
         | SkiFire13 wrote:
         | I was thinking of Integer Linear Programming when I saw the
         | title. Just another example of why acronyms are bad.
        
       | imtringued wrote:
       | As cool as this is, I can't help but think how pointless the goal
       | itself is.
       | 
       | XDNA 2 will have 12 TFLOPs, roughly matching the 96-core
       | Threadripper Pro 7995WX at a much lower price point.
        
         | bee_rider wrote:
         | These sorts of computations generally just get fed bigger
         | inputs as compute gets better.
         | 
         | Also, plenty of Threadrippers already exist out there. If you
         | get access to some cluster, it might have whatever type of chip
         | in it. If I have access to a cluster with many 7995s, I don't
         | really care too much about what's available on the consumer
         | side.
        
       | mratsim wrote:
       | Years ago I started a collection of convolution optimization
       | resources: https://github.com/mratsim/laser/wiki/Convolution-
       | optimisati...
       | 
       | I also checked, and apparently NVIDIA CUTLASS now supports
       | generic convolutions: https://github.com/NVIDIA/cutlass
        
       ___________________________________________________________________
       (page generated 2024-07-20 23:09 UTC)