fsebugoutzone.org:9999

       Post AWDThisTp1YFilLWfA by joosteto@mamot.fr
 (DIR) Post #AWDPmWP3EKvYf9c7RQ by lore@berserker.town
       2023-05-31T13:30:08Z
       
       0 likes, 0 repeats
       
       slowly beginning to learn that calling literally anything from math.h in your inner loops is disastrous for performance, even with the so-called -ffast-math flag enabled.
       
       and they are never ever faster than a conditional expression. they're not normal math as far as the CPU is concerned, even when the compiler replaces them with intrinsics.
       
       the safe operations are add, subtract and multiply. divide is a bit slower, but nowhere near as slow as anything from math.h.
       
       in GCC/Clang, you can help the branch predictor by defining these macros:
       
       #define unlikely(cond) \
           __builtin_expect((cond), 0)
       #define likely(cond) \
           __builtin_expect((cond), 1)
       
       they're used like this:
       
       if(likely(i == 1)) {
           // likely outcome
       } else if(unlikely(i == 2)) {
           // unlikely outcome
       }
       
       it can help the compiler to generate code that doesn't contain unnecessary jumps that could stall the CPU pipeline.
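       
       a self-contained sketch of how the macros from the post above might sit in an inner loop. the function sum_nonnegative and its parameters are hypothetical, made up for illustration. __builtin_expect is a GCC/Clang builtin, so this is not portable to other compilers. the hint mainly affects how the compiler lays out the branches, and whether it pays off depends on the data, so it is worth measuring.

```c
#include <stddef.h>

/* GCC/Clang-only branch hints, same definitions as in the post. */
#define unlikely(cond) __builtin_expect((cond), 0)
#define likely(cond)   __builtin_expect((cond), 1)

/* Hypothetical helper: sums a buffer, treating negative samples as a
   rare case that is skipped and counted instead of added. */
long sum_nonnegative(const int *buf, size_t n, size_t *skipped)
{
    long total = 0;
    *skipped = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(buf[i] < 0)) {
            /* Assumed to be the cold path. */
            (*skipped)++;
            continue;
        }
        total += buf[i];
    }
    return total;
}
```

       calling it on {1, -2, 3, 4} returns 8 and sets *skipped to 1. the hint does not change the result, only the generated code.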
       
 (DIR) Post #AWDThisTp1YFilLWfA by joosteto@mamot.fr
       2023-05-31T14:14:04Z
       
       0 likes, 0 repeats
       
       @lore Would have expected fmod to be about as slow as a divide, and surely ceil and floor are faster than divide?
       
 (DIR) Post #AWDU4aU3CGYqLcp2S8 by lore@berserker.town
       2023-05-31T14:18:12Z
       
       0 likes, 0 repeats
       
       @joosteto i haven't compared the specific pairing of fmod and divide. fmod probably isn't the slowest thing in math.h, but at best, it's only as fast as a divide.
       
       i had one place in my code where i was using a statement like
       
       t -= floorf(t);
       
       to keep t wrapping back to 0 when it became 1 or greater.
       
       turns out it was faster to write:
       
       if(t >= 1) {
           t -= 1;
       }
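       
       a minimal sketch of the conditional wrap in context, assuming t is a phase that advances by less than 1 per step. the names wrap_unit and advance_phase are hypothetical. note the two forms in the post are only interchangeable while 0 <= t < 2: the floorf version wraps any finite value, while the conditional only removes one period.

```c
#include <stddef.h>

/* Wrap t back into [0, 1) with a single conditional subtract.
   Only correct when 0 <= t < 2, i.e. when t overshoots by less
   than one full period. */
float wrap_unit(float t)
{
    if (t >= 1.0f) {
        t -= 1.0f;
    }
    return t;
}

/* Hypothetical phase accumulator: advances t by step, n times,
   wrapping after every step. Assumes 0 <= step < 1. */
float advance_phase(float t, float step, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        t = wrap_unit(t + step);
    }
    return t;
}
```

       with a step of 0.25 starting from 0, the phase hits 1.0 on the fourth step and wraps to 0. an input of 2.5 would come back as 1.5, which is where the floorf form would still be needed.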
       
 (DIR) Post #AWDUVLx9KUaP6ICCdU by lore@berserker.town
       2023-05-31T14:23:03Z
       
       0 likes, 0 repeats
       
       @joosteto another surprising thing i'm learning is how little it costs to just fill a buffer with a computation you're going to use later to fill other buffers. as long as these buffers are small enough to fit in L1 cache, they're basically as fast as register writes, it seems.
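       
       a rough sketch of that pattern, under assumptions not in the post: a block size of 64 floats (256 bytes, well below typical L1 data cache sizes), and a hypothetical render_block function that computes a ramp once and then derives two output buffers from it.

```c
#include <stddef.h>

/* Assumed block size: 64 floats = 256 bytes per buffer. */
#define BLOCK 64

/* Hypothetical example: compute an intermediate ramp once, then
   reuse it to fill two output buffers in separate simple loops. */
void render_block(float start, float step,
                  float gain_l, float gain_r,
                  float *left, float *right)
{
    float ramp[BLOCK];

    /* Pass 1: fill the small intermediate buffer. */
    for (size_t i = 0; i < BLOCK; i++) {
        ramp[i] = start + step * (float)i;
    }

    /* Pass 2: derive the first output from the intermediate. */
    for (size_t i = 0; i < BLOCK; i++) {
        left[i] = ramp[i] * gain_l;
    }

    /* Pass 3: derive the second output from the same intermediate. */
    for (size_t i = 0; i < BLOCK; i++) {
        right[i] = ramp[i] * gain_r;
    }
}
```

       left and right must each have room for BLOCK floats. each loop is a run of identical operations over contiguous memory, which is the shape the post describes. whether it beats a single fused loop depends on the compiler and CPU.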
       
 (DIR) Post #AWDUjwxSXhZLvKIjGS by lore@berserker.town
       2023-05-31T14:25:41Z
       
       0 likes, 0 repeats
       
       @joosteto i think that sort of thing also helps the CPU to pipeline, since it's a long series of simple identical computations where only the inputs vary.