Why Does Integer Addition Approximate Float Multiplication?

by Malte Skarupke, February 8, 2025

Here is a rough approximation of float multiplication (source):

```cpp
#include <bit>
#include <cstdint>

float rough_float_multiply(float a, float b) {
    constexpr uint32_t bias = 0x3f76d000;
    return std::bit_cast<float>(std::bit_cast<uint32_t>(a) + std::bit_cast<uint32_t>(b) - bias);
}
```

We're casting the floats to ints, adding them, adjusting the exponent, and returning as float. If you think about it for a second, you will realize that since the float contains the exponent, this won't be too wrong: you can multiply two numbers by adding their exponents. So just with the exponent addition you will be within a factor of 2 of the right result. But this actually does much better and gets within 7.5% of the right answer. Why?

[Figure: custom_multiply]

It's not the magic number. Even if you only adjust the exponent (subtract 127 to get it back into range after the addition) you get within 12.5% of the right result. There is also a mantissa offset in that constant which helps a little, but 12.5% is surprisingly good as a default.

I should also say that the above fails catastrophically when you overflow or underflow the exponent. I think the source paper doesn't handle that, even though underflowing is really easy, e.g. by doing 0.5 * 0. It's probably fine to ignore overflow, so here is a version that just handles underflow:

```cpp
float custom_multiply(float a, float b) {
    constexpr uint32_t sign_bit      = 0x8000'0000;
    constexpr uint32_t exp_offset    = 0b0'01111111'0000000'00000000'00000000;
    constexpr uint32_t mantissa_bias = 0b0'00000000'0001001'00110000'00000000;
    constexpr uint32_t offset = exp_offset - mantissa_bias;
    uint32_t bits_a = std::bit_cast<uint32_t>(a);
    uint32_t bits_b = std::bit_cast<uint32_t>(b);
    uint32_t c = (bits_a & ~sign_bit) + (bits_b & ~sign_bit);
    if (c <= offset)
        c = 0;
    else
        c -= offset;
    c |= ((bits_a ^ bits_b) & sign_bit);
    return std::bit_cast<float>(c);
}
```

Clang compiles this to a branchless version that doesn't perform too far off from float multiplication.

Is this ever worth using? The paper talks about using this to save power, but that's probably not worth it for a few reasons:

1. Most of the power consumption comes from moving bits around; the actual float multiplication is a small power drain compared to loading the floats and saving the result
2. You wouldn't be able to use tensor cores
3. I don't think you can actually be faster than float multiplication because there are so many edge cases to handle

It feels close to being worth it though, so I wouldn't be surprised if someone found a use case.

But there is still the question of why this works so well. The mantissa is not stored in log-space, it's just stored in plain old linear space, where addition does not do multiplication. But let's think about how to get the exponent from the mantissa. In general, how do you get the remaining exponent-fraction from the remaining bits? This is easier to think about for integers, where you can get the log2 by determining the highest set bit:

log2(20) = log2(0b10100) ~= highest_set_bit(0b10100) = 4

The actual correct value is log2(20) = 4.322. The question we need to answer is: how do you get the remaining exponent-fraction, 0.322, from the remaining bits, 0b0100? To make this work for any number of bits, we should normalize the remaining bits into the range 0 to 1, which in this case means doing the division 0b0100 / float(1 << 4) = 0.25. (In general you divide by the highest set bit, which you already had to calculate for the previous step.)
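To make that decomposition concrete, here is a small sketch of the log2(20) example: the integer part comes from the highest set bit, and the remaining bits get normalized into the range 0 to 1. (This snippet is just my illustration of the step above; the variable names are made up and it is not from the paper.)

```cpp
#include <bit>
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t x = 20;                                      // 0b10100
    int highest = 31 - std::countl_zero(x);               // index of the highest set bit: 4
    uint32_t remaining = x - (uint32_t(1) << highest);    // bits below it: 0b0100 = 4
    double fraction = remaining / double(1u << highest);  // normalized into [0, 1): 0.25

    std::printf("integer part %d, normalized remainder %f\n", highest, fraction);
    std::printf("exact log2(20) = %f\n", std::log2(double(x))); // ~4.322
}
```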
After we brought the numbers into the range from 0 to 1, you can get the remaining exponent fraction with log2(1+x). In this case it's log2(1+0.25) = 0.322. If you plot y = log2(1+x) for the range from 0 to 1, you will find that it doesn't deviate too far from y = x. So if you just want an approximate solution, you might as well skip this step.

[Figure: y = log2(1+x) compared to y = x on the range 0 to 1]

And then the mantissa is already interpreted as a fraction on floats, so you also don't have to divide. So the whole operation cancels out and you can just add. You still need to handle

1. The sign bit
2. Overflowing mantissa
3. Overflowing exponent
4. Underflowing exponent

Numbers 1 and 2 also work out naturally using addition because of how floats are represented:

* Since the sign bit is the highest bit, overflow is ignored, so addition is the same as xor, which is what you want
* When the mantissa overflows, you end up increasing the exponent, which is what you want (e.g. 1.5 * 1.5 = 2.25, which has a higher base-2 exponent)

Number 3 can be ignored for most floats you care about. Number 4 is the one that required me to write that more complicated version of the code. It's really easy to underflow the exponent, which will wrap around and give you large numbers instead. In neural networks, lots of activation functions like to return 0 or close to 0, and when you multiply with that you will underflow and get very wrong results. So you need to handle underflow. I have not found an elegant way of doing it, because you only have a few cycles to work with, otherwise you might as well use float multiplication.

The last open question is that mantissa adjustment: you can see in the graph above that the approximation y = x is never too big, so by default you will always bias towards 0. But you can add a little bias to the mantissa to shift the whole line up. I tried a few analytic ways to arrive at a good constant, but they all gave terrible results when I actually tried them on a bunch of floats. So I just tried many different constants and stuck with the one that gave the least error on 10,000 randomly generated test floats. (A sketch of what such a search could look like is at the end of this post.)

So even though this is probably not useful and I haven't found a really elegant way of doing it, it's still neat how the whole thing almost works out as a one-liner because so many things cancel out once you approximate y = log2(1+x) as y = x.
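Finally, here is a minimal sketch of the kind of brute-force search for the bias constant mentioned above. It is only an illustration of the idea: the helper name, the search range, the test-value range (chosen so the exponent can neither overflow nor underflow), and the use of maximum relative error as the criterion are all my own assumptions, not taken from the post or the paper.

```cpp
#include <algorithm>
#include <bit>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Rough multiply with an adjustable offset, so we can search for a good constant.
static float rough_multiply_with_offset(float a, float b, uint32_t offset) {
    return std::bit_cast<float>(std::bit_cast<uint32_t>(a) + std::bit_cast<uint32_t>(b) - offset);
}

int main() {
    // Random positive test floats in a range where the exponent can neither
    // overflow nor underflow, so the rough version is safe to use.
    std::mt19937 rng(0);
    std::uniform_real_distribution<float> dist(0.5f, 2.0f);
    std::vector<std::pair<float, float>> tests;
    for (int i = 0; i < 10000; ++i)
        tests.emplace_back(dist(rng), dist(rng));

    uint32_t best_offset = 0;
    double best_error = std::numeric_limits<double>::infinity();
    // Try mantissa biases below the plain exponent offset 0x3f800000.
    for (uint32_t mantissa_bias = 0; mantissa_bias < 0x0010'0000; mantissa_bias += 0x1000) {
        uint32_t offset = 0x3f80'0000 - mantissa_bias;
        double max_error = 0.0;
        for (auto [a, b] : tests) {
            double approx = rough_multiply_with_offset(a, b, offset);
            double exact = double(a) * double(b);
            max_error = std::max(max_error, std::abs(approx - exact) / exact);
        }
        if (max_error < best_error) {
            best_error = max_error;
            best_offset = offset;
        }
    }
    std::printf("best offset 0x%08x, max relative error %f\n", best_offset, best_error);
}
```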