[HN Gopher] Solving Machine Learning Performance Anti-Patterns: ...
       ___________________________________________________________________
        
       Solving Machine Learning Performance Anti-Patterns: A Systematic
       Approach
        
       Author : briggers
       Score  : 46 points
        Date   : 2021-07-19 10:57 UTC (1 day ago)
        
 (HTM) web link (paulbridger.com)
 (TXT) w3m dump (paulbridger.com)
        
       | ad404b8a372f2b9 wrote:
        | That's interesting and reflects my personal training optimization
        | workflow pretty well. Usually I'll check nvidia-smi and make sure
        | GPU utilization is high; if it isn't, I check, in order:
       | 
       | * That my batch transfers to VRAM are done in a sensible way in
       | the dataloader and don't hide CPU-bound preprocessing
       | 
       | * That my batch size is large enough
       | 
       | * That the model is adequate for the GPU (even convolutional
       | models can be better on the CPU for specific sizes)
       | 
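A toy, framework-free sketch of the first bullet above (all names and timings are made up for illustration): when per-batch CPU preprocessing runs synchronously before each device step, the GPU sits idle for that time; prefetching the next batch on a background thread hides the cost behind the current step.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PREP_S, STEP_S, N_BATCHES = 0.02, 0.03, 20

def preprocess(i):
    # Stand-in for CPU-bound decoding/augmentation in the dataloader.
    time.sleep(PREP_S)
    return i

def train_step(batch):
    # Stand-in for the GPU-side forward/backward pass.
    time.sleep(STEP_S)

def naive():
    # The device waits on every preprocess call: ~N * (PREP + STEP).
    t0 = time.perf_counter()
    for i in range(N_BATCHES):
        train_step(preprocess(i))
    return time.perf_counter() - t0

def prefetched():
    # Prepare batch i+1 on a worker thread while batch i trains:
    # ~PREP + N * STEP when preprocessing is cheaper than the step.
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        nxt = pool.submit(preprocess, 0)
        for i in range(N_BATCHES):
            batch = nxt.result()
            if i + 1 < N_BATCHES:
                nxt = pool.submit(preprocess, i + 1)
            train_step(batch)
    return time.perf_counter() - t0

if __name__ == "__main__":
    print(f"naive:      {naive():.2f}s")
    print(f"prefetched: {prefetched():.2f}s")
```

This is the same pattern a real dataloader's worker processes implement; the point is only that the transfer/preprocess path must overlap with compute rather than serialize with it.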
        | It's good enough to get from a CPU-bound pattern to a GPU-bound
        | one, but I don't really get a detailed understanding of the
        | spectrum between the two, so I'm definitely going to try this
        | tool in the future, especially since it's so easy to add.
       | 
        | On the subject of optimization tricks, I haven't really found any
        | magic bullets. You can't always increase the batch size until you
        | hit 100% utilization, because of the implications for model
        | performance. FP16 precision has never done anything for me,
        | weirdly. My preprocessing is never CPU-bound unless I do dumb shit
        | in it, so rewriting it in C++ would do nothing.
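On the FP16 point: in PyTorch the usual route is autocast plus a gradient scaler, and as far as I know the speedup only materializes on Tensor-Core GPUs (Volta and newer) with layer dimensions that are multiples of 8, which may be why it does nothing in some setups. A minimal sketch, assuming PyTorch is available; on a machine without CUDA it degrades to a plain FP32 step:

```python
import torch

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = torch.nn.Linear(64, 64).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # no-op when disabled

x = torch.randn(32, 64, device=device)
with torch.autocast(device_type=device, enabled=use_cuda):
    loss = model(x).pow(2).mean()  # forward runs in FP16 where safe

scaler.scale(loss).backward()  # scaling avoids FP16 gradient underflow
scaler.step(opt)
scaler.update()
```

The model, sizes, and loss here are hypothetical; the point is only the autocast/GradScaler structure around an ordinary training step.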
        
       ___________________________________________________________________
       (page generated 2021-07-20 23:03 UTC)