Post AWDThisTp1YFilLWfA by joosteto@mamot.fr
 (DIR) Post #AWDPmWP3EKvYf9c7RQ by lore@berserker.town
       2023-05-31T13:30:08Z
       
       0 likes, 0 repeats
       
       slowly beginning to learn that calling literally anything from math.h in your inner loops is disastrous for performance, even with the so-called -ffast-math flag enabled.
       
       and they are never ever faster than a conditional expression. they're not normal math as far as the CPU is concerned, even when the compiler replaces them with intrinsics.
       
       the safe operations are add, subtract and multiply. divide is a bit slower, but nowhere near as slow as anything from math.h.
       
       in GCC/Clang, you can help the branch predictor by defining these macros:
       
           #define unlikely(cond) \
               __builtin_expect((cond), 0)
           #define likely(cond) \
               __builtin_expect((cond), 1)
       
       they're used like this:
       
           if(likely(i == 1)) {
               // likely outcome
           } else if(unlikely(i == 2)) {
               // unlikely outcome
           }
       
       it can help the compiler to generate code that doesn't contain unnecessary jumps that could stall the CPU pipeline.
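
A minimal sketch of how the hint macros from the post might sit in an inner loop. The function `sum_clamped` and its clamping range are hypothetical, chosen only for illustration. The `!!` inside the macros is a common addition that is not in the post: it normalises any truthy value to 1 so that it matches the expected value passed to `__builtin_expect`. Whether the hints make any measurable difference depends on the compiler, the flags and the data, so this is something to benchmark rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang only: __builtin_expect is a compiler builtin, not standard C.
   The !! is an addition to the post's macros (see the note above). */
#define likely(cond)   __builtin_expect(!!(cond), 1)
#define unlikely(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical helper: sums n samples, clamping each one to [-1, 1].
   Out-of-range samples are assumed to be rare, so both clamp branches
   are marked unlikely and the fall-through path is the common one. */
float sum_clamped(const float *samples, size_t n)
{
    assert(samples != NULL || n == 0);

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float s = samples[i];
        if (unlikely(s > 1.0f)) {
            s = 1.0f;   /* rare: clamp from above */
        } else if (unlikely(s < -1.0f)) {
            s = -1.0f;  /* rare: clamp from below */
        }
        sum += s;       /* only an add on the common path */
    }
    return sum;
}
```

The hint mainly affects how the compiler lays out the generated code (which path falls through without a taken jump); it does not change what the function computes.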
       
 (DIR) Post #AWDThisTp1YFilLWfA by joosteto@mamot.fr
       2023-05-31T14:14:04Z
       
       0 likes, 0 repeats
       
       @lore Would have expected fmod to be about as slow as a divide, and surely ceil and floor are faster than divide?
       
 (DIR) Post #AWDU4aU3CGYqLcp2S8 by lore@berserker.town
       2023-05-31T14:18:12Z
       
       0 likes, 0 repeats
       
       @joosteto i haven't compared the specific pairing of fmod and divide. fmod probably isn't the slowest thing in math.h, but at best, it's only as fast as a divide.
       
       i had one place in my code where i was using a statement like
       
           t -= floorf(t);
       
       to keep t wrapping back to 0 when it became 1 or greater.
       
       turns out it was faster to write:
       
           if(t >= 1) {
               t -= 1;
           }
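
A small sketch of the conditional wrap described above, under one assumption worth stating: the single `if` only matches `t -= floorf(t);` while `t` stays below 2, which holds when `t` starts in [0, 1) and the per-step increment is in [0, 1). The names `advance_phase` and `fill_ramp` are hypothetical and not from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Advance a phase t in [0, 1) by step and wrap it back into [0, 1).
   The general form from the post is `t -= floorf(t);`. The single
   conditional below is equivalent only while 0 <= step < 1, because
   then t + step < 2 and one subtraction is enough. */
float advance_phase(float t, float step)
{
    assert(t >= 0.0f && t < 1.0f);
    assert(step >= 0.0f && step < 1.0f);

    t += step;
    if (t >= 1.0f) {
        t -= 1.0f;
    }
    return t;
}

/* Hypothetical usage: fill out[] with a sawtooth ramp that starts at
   phase t, and return the phase to resume from on the next call. */
float fill_ramp(float *out, size_t n, float t, float step)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = t;
        t = advance_phase(t, step);
    }
    return t;
}
```

If the increment can be 1 or larger, or `t` can go negative, the conditional form no longer gives the same result as the `floorf` form and would need a loop or the library call.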
       
 (DIR) Post #AWDUVLx9KUaP6ICCdU by lore@berserker.town
       2023-05-31T14:23:03Z
       
       0 likes, 0 repeats
       
       @joosteto another surprising thing i'm learning is how little it costs to just fill a buffer with a computation you're going to use later to fill other buffers. as long as these buffers are small enough to fit in L1 cache, they're basically as fast as register writes, it seems.
       
 (DIR) Post #AWDUjwxSXhZLvKIjGS by lore@berserker.town
       2023-05-31T14:25:41Z
       
       0 likes, 0 repeats
       
       @joosteto i think that sort of thing also helps the CPU to pipeline, since it's a long series of simple identical computations where only the inputs vary.
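
One way to picture the buffer-staging idea from the last two posts, as a sketch only: an intermediate value is computed once into a small scratch buffer, and two later passes read it back to fill other buffers. The block size of 64 floats (256 bytes) and the names `BLOCK` and `render_block` are assumptions made for this example. Whether the scratch buffer actually stays in L1 cache, and whether this beats recomputing the value, depends on the CPU and the surrounding code, so it needs measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block size: 64 floats = 256 bytes, small next to a typical
   L1 data cache. */
enum { BLOCK = 64 };

/* Hypothetical example: stage a linear ramp in a scratch buffer, then
   derive two outputs from it in separate, simple loops. Each loop does
   the same operation on every element and uses only add, subtract and
   multiply. */
void render_block(float out_a[BLOCK], float out_b[BLOCK],
                  float start, float step)
{
    float ramp[BLOCK];

    /* Pass 1: compute the shared intermediate once. */
    for (size_t i = 0; i < BLOCK; i++) {
        ramp[i] = start + step * (float)i;
    }

    /* Pass 2: first consumer, the square of the ramp. */
    for (size_t i = 0; i < BLOCK; i++) {
        out_a[i] = ramp[i] * ramp[i];
    }

    /* Pass 3: second consumer, the ramp rescaled by 2x - 1. */
    for (size_t i = 0; i < BLOCK; i++) {
        out_b[i] = 2.0f * ramp[i] - 1.0f;
    }
}
```

Loops of this shape have no dependency between iterations, which is also what gives a compiler the chance to vectorise them.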