[HN Gopher] Solving Machine Learning Performance Anti-Patterns: ...
___________________________________________________________________
Solving Machine Learning Performance Anti-Patterns: A Systematic
Approach
Author : briggers
Score : 46 points
Date : 2021-07-19 10:57 UTC (1 day ago)
(HTM) web link (paulbridger.com)
(TXT) w3m dump (paulbridger.com)
| ad404b8a372f2b9 wrote:
| That's interesting, and it reflects my personal training
| optimization workflow pretty well. Usually I'll check nvidia-smi
| and make sure GPU utilization is high; if it isn't, I verify, in
| order:
|
| * That my batch transfers to VRAM are done in a sensible way in
| the dataloader and don't hide CPU-bound preprocessing
|
| * That my batch size is large enough
|
| * That the model is a good fit for the GPU (even convolutional
| models can be faster on the CPU at certain sizes)
|
| It's good enough to go from a CPU-bound pattern to a GPU-bound
| one, but I don't get a detailed understanding of the spectrum
| between the two, so I'm definitely going to try this tool in the
| future, especially since it's so easy to add.
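A rough sketch of the first two checklist items in PyTorch (the dataset and sizes here are made up for illustration): keep preprocessing in DataLoader worker processes, pin host memory so host-to-device copies can be asynchronous, and pick a batch size large enough to keep the GPU busy.

```python
# Illustrative sketch, not code from the article: synthetic data,
# arbitrary sizes. Falls back to CPU when no GPU is present.
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for a real dataset.
dataset = TensorDataset(
    torch.randn(256, 3, 32, 32),
    torch.randint(0, 10, (256,)),
)

loader = DataLoader(
    dataset,
    batch_size=64,                      # large enough to saturate the GPU
    num_workers=2,                      # preprocessing off the main process
    pin_memory=(device.type == "cuda"), # enables async host-to-device copies
)

for images, labels in loader:
    # non_blocking only overlaps copies when memory is pinned on CUDA;
    # it is harmless on CPU.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    break  # one batch is enough for the sketch
```

With pinned memory and non_blocking copies, the transfer of the next batch can overlap with GPU compute on the current one, which is what keeps the transfer from hiding (or being hidden by) CPU-bound preprocessing.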
|
| On the subject of optimization tricks, I haven't really found any
| magic bullets. You can't always increase the batch size to get
| 100% util because of the performance implications. FP16 precision
| has never done anything for me, weirdly. My preprocessing is
| never CPU-bound unless I do dumb shit in it, so rewriting it in
| C++ would do nothing.
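For reference, "FP16" in PyTorch usually means automatic mixed precision via `torch.cuda.amp`. A minimal sketch of one training step (the model and sizes are made up; both `autocast` and `GradScaler` are no-ops when CUDA is unavailable, so this also runs on CPU):

```python
# Illustrative mixed-precision training step, not code from the thread.
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(32, 128, device=device)
target = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_cuda):
    # Under autocast, matmuls run in FP16 while numerically sensitive
    # ops (e.g. the loss reduction) stay in FP32.
    loss = nn.functional.cross_entropy(model(x), target)

scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)
scaler.update()
```

Mixed precision mostly pays off on GPUs with Tensor Cores and on models dominated by large matmuls or convolutions; on small models or memory-bound workloads the speedup can indeed be negligible, which may explain the "never done anything for me" experience.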
___________________________________________________________________
(page generated 2021-07-20 23:03 UTC)