[HN Gopher] Why is x & -x equal to the largest power of 2 that d...
       ___________________________________________________________________
        
       Why is x & -x equal to the largest power of 2 that divides x?
        
       Author : arun-mani-j
       Score  : 158 points
       Date   : 2024-05-25 14:44 UTC (1 days ago)
        
 (HTM) web link (arunmani.in)
 (TXT) w3m dump (arunmani.in)
        
       | dataflow wrote:
       | Short explanation:
       | 
       | 1. The largest power of 2 that divides x is just 2^(number of
       | trailing zeros in x)
       | 
       | 2. Crucial observation: -x == ~x + 1
       | 
       | 3. ~x flips all the bits of x bits, so none of the bits of ~x
       | match those of x. (i.e. (x & ~x) == 0)
       | 
       | 4. When you do +1, all the trailing 1's flip AGAIN, becoming zero
       | like they were in x. The next highest 0 (say it was the n'th)
       | also flips, becoming 1... like it was in x.
       | 
       | 5. Crucial observation: The n'th 0 did NOT match the
       | corresponding bit in x prior to the increment, therefore it MUST
       | match after the increment. All higher bits stay as-is.
       | 
       | 6. This leaves only the n'th bits matching in x and ~x + 1,
       | isolating the highest power-of-2 divisor of x when you AND them.
        
         | jb1991 wrote:
         | I thought there was just a sign bit. If not, how does a system
         | know if a number should be interpreted as positive or negative?
        
           | dataflow wrote:
           | Signedness doesn't affect anything here, as long as the
           | representation is in two's complement. The negation of x
           | (i.e. -x) in two's complement is ~x + 1. The interpretation
           | of the bits as signed or unsigned doesn't change any of the
           | following steps.
        
             | MaxBarraclough wrote:
             | I think strictly speaking signedness _does_ affect things,
             | assuming we 're talking C/C++. If x has type int and were
             | to take the value of INT_MIN, then evaluating -x would
             | result in a signed arithmetic overflow, which is undefined
             | behaviour. (Related: the C standard now insists on use of
             | 2's complement, where it used to permit other schemes.)
             | 
             | If x had type unsigned int, this wouldn't introduce
             | undefined behaviour.
        
           | elevatedastalt wrote:
           | There's still a sign bit. The sign bit tells you whether to
           | interpret the rest of the number as positive or treat as
           | negative and undo the 2s complement operation.
        
           | eslaught wrote:
           | You're thinking about a sign and magnitude representation,
           | which is not how integers are represented in a modern
           | computer.
           | 
           | The modern version is two's complement. It still has a sign
           | bit, but negating a number involves more than just changing
           | the sign bit since the representation is modular.
           | 
           | https://en.wikipedia.org/wiki/Two%27s_complement
           | 
           | https://en.wikipedia.org/wiki/Modular_arithmetic
        
             | gus_massa wrote:
             | I agree, but I just want to add that floating point numbers
             | use sign and magnitude representation, so the GP may be
             | confused by that.
        
           | cvz wrote:
           | In the "two's complement" representation, there is a sign
           | bit, but the meaning isn't "invert this number", it's
           | "subtract a large, fixed power of 2 from this number".
           | 
           | The reason this has become the most common representation for
           | signed numbers in computer hardware is that it makes the
           | difference between signed and unsigned numbers basically
           | irrelevant as far as the hardware is concerned. When the
           | difference does matter (which it does in division,
           | conversions between different word sizes, conversions to/from
           | text, and sometimes multiplication), there are two different
           | instructions for signed and unsigned, and it's up to the
           | programmer or compiler to pick the right one.
        
             | thaumasiotes wrote:
             | > n the "two's complement" representation, there is a sign
             | bit, but the meaning isn't "invert this number", it's
             | "subtract a large, fixed power of 2 from this number".
             | 
             | This is only true if the size of your modulus is fixed. In
             | fact, there is a "sign extension" command allowing you to
             | produce, for example, signed 128-bit values from signed
             | 64-bit values, and this basically requires interpreting
             | two's complement values the same way they represent
             | themselves: a three-bit -1 is just the sum of the positive
             | values +1, +2, and +4. To extend that to six bits, you add
             | the positive values +8, +16, and +32.
             | 
             | The meaning of the high bit in our six-bit scheme shouldn't
             | be viewed as "when this bit is set, subtract 32 from the
             | value instead of adding it". It is "whether this bit is set
             | or not, all higher bits in the number, the ones that don't
             | exist in the hardware, are equal to it"; under this
             | interpretation, the 8-, 16-, and 32- bits were already set
             | in the three-bit value, and that's why they continue to be
             | set when we do a sign extension.
             | 
             | Every place value is still positive, but as long as there
             | is no _highest_ set bit, the number itself will be
             | negative, and equal to the value you 'd calculate from its
             | representation as a geometric series.
        
               | kbolino wrote:
               | This is true mathematically but simple to implement in
               | practice. If the number is signed, shifts to the right
               | and widening casts fill the new bits with a copy of the
               | sign bit. If it's unsigned, shifts to the right and
               | widening casts fill the new bits with zeroes. Shifts to
               | the left and narrowing casts work the same for signed and
               | unsigned.
        
           | pests wrote:
           | 2s compliment helps with the problem of otherwise having a
           | +0/-0 and lets you subtract numbers just with adders.
        
           | umanwizard wrote:
           | Assuming 4-byte integers,
           | 
           | 0000 is 0
           | 
           | 0001 is 1
           | 
           | 0010 is 2
           | 
           | 0011 is 3
           | 
           | 0100 is 4
           | 
           | 0101 is 5
           | 
           | 0110 is 6
           | 
           | 0111 is 7
           | 
           | 1111 is -1
           | 
           | 1110 is -2
           | 
           | 1101 is -3
           | 
           | 1100 is -4
           | 
           | 1011 is -5
           | 
           | 1010 is -6
           | 
           | 1001 is -7
           | 
           | 1000 is -8
           | 
           | The highest bit is always 1 when the sign is negative.
        
             | umanwizard wrote:
             | Too late to edit, but I meant 4-bit, not 4-byte.
        
         | mabster wrote:
         | Thanks! I've used the trick and derived why it works a few
         | times, but always forget.
         | 
         | I battled to understand the post. Your explanation got me there
         | very quickly.
        
           | dataflow wrote:
           | Sure thing, happy to hear!
        
         | pbsd wrote:
         | An alternative, more arithmetic, argument:
         | 
         | - x is 2^b*k for some odd k < 2^(n-b)
         | 
         | - -x is 2^n - 2^b*k = 2^b*(2^(n-b)-k)
         | 
         | - k is odd by definition, and (2^(n-b)-k) + k = 0 (mod
         | 2^(n-b)). This means that the LSB must be 1 in both operands,
         | which results in a carry out, and in the following ith bits we
         | have that the sum is 0 mod 2^i if and only if the ith bits of
         | (2^(n-b)-k) and k are distinct. Thus x & -x = 2^b.
        
           | Droobfest wrote:
           | This immediately reads like gobbledygook to me, while the
           | parent is easy to follow.
           | 
           | I'm not trying to dunk on you, but I can't help to note that
           | the denseness of math is too much for an idiot like me.
        
             | schoen wrote:
             | It might be nicer with exponents rendered as superscripts,
             | because it could make it more apparent how the exponents
             | are being used.
        
           | chx wrote:
           | > -x is 2^n - 2^b * k
           | 
           | Erm no. -x is - 2^b * k and it's the "Two's complement"
           | notation which is 2^n - 2 * b * k
           | 
           | This is because the way we generate the "Two's complement"
           | number. Given A in binary representation, let's call A1 the
           | number you get by flipping every bit in A also called one's
           | complement. Observe how every bit in A+A1 is 1 because if the
           | bit was 0 in A then the bit is 1 in A1 and vica versa. So
           | then that's 2^N-1. Two's complement is created by adding 1 to
           | A1 so then A + A1 is 2^N or A1 = 2^N-A. But you do need to
           | prove this before you can use it.
        
             | bonzini wrote:
             | The point of two's complement is that it's the same as
             | modulo-2^n arithmetic. In two's complement -x is defined
             | such that x + -x = 0 (mod 2^n); and because x + ~x = 2^n -
             | 1 (mod 2^n), _then_ -x = ~x + 1.
             | 
             | Therefore x & -x works only in two's complement, and it's
             | fair to assume it as the parent comment did.
        
         | arun-mani-j wrote:
         | Thanks for the nice summary. I found myself difficult to
         | explain it so I took the help of visual explanation using the
         | bits structure. :)
        
         | baryphonic wrote:
         | When I read the title, my immediate thought was "because 2s
         | complement!"
         | 
         | Your explanation is more thorough.
        
         | monktastic1 wrote:
         | Another, more "visual" description:
         | 
         | Without loss of generality, let the number be xx...xx100...000,
         | where x stands for any bit and there are n trailing zeros.
         | 
         | 1. The largest power of 2 that divides it is 2^n.
         | 
         | 2. The twos-complement representation is xx...xx011...111,
         | where the x's are inverted from before, _plus one_.
         | 
         | 3. After adding one, we get xx...xx100...000. Notice that the
         | carryover stopped exactly where the zero was (which is where
         | our first 1 from the right originally was).
         | 
         | 4. All the "x" bits are now inverted from their original
         | values, so their & is zero. All the original trailing zero bits
         | are still zero, so their & is zero. The middle 1 has stayed 1,
         | so its & is 1.
         | 
         | 5. Thus, the overall & is 00...00100...000 = 2^n, as required.
        
           | _proofs wrote:
           | this is actually a fantastic comment and as someone who
           | thinks more visually, was greatly appreciated. thank you.
        
             | dgfitz wrote:
             | Seconded, this comment was hugely valuable to me.
        
         | oldgradstudent wrote:
         | > 2. Crucial observation: -x == ~x + 1
         | 
         | To be annoyingly pedantic, this is not simply an observation,
         | this is essentially the definition of 2's complement
         | arithmetic.
        
           | dataflow wrote:
           | Some people do see it that way, and I think it's fine if you
           | do, but I think I disagree. To me, the definition of N-bit
           | two's complement is that -x (i.e. 0 - x) is represented by
           | 2^N - x. The fact that that is equal to ~x + 1 is not
           | obvious, or (as I see it) part of the definition.
        
         | StevenXC wrote:
         | Algebraically, where | denotes concatenation, and z is a string
         | of 0s:
         | 
         | x = y | 1 | z
         | 
         | -x = ~x + 1
         | 
         | -x = (~y | 0 | ~z) + 1
         | 
         | -x = ~y | 1 | z
         | 
         | x & -x = (y & ~y) | (1 & 1) | (z & z)
         | 
         | x & -x = 0 | 1 | z
        
       | mistercow wrote:
       | This is a fun puzzle to try to figure out before reading the
       | article.
        
       | o11c wrote:
       | Or briefly, copied from my StackOverflow answer[1]:
       | v    000...00010100       ~v   111...11101011 (not used directly,
       | all bits opposite)       -v   111...11101100 (-v == ~v + 1; this
       | causes all low 1 bits to overflow and carry)       v&-v
       | 000...00000100 (has a single 1 bit, from the carry)
       | 
       | The linked article is wrong in only mentioning 2 signed integer
       | representations. Old versions of C allowed 3 representations for
       | integers, floats use one of those (sign-magnitude, also used
       | widedly by bignum libraries) and one other (offset binary), and
       | base negative-2 is also possible in theory (not sure if practical
       | for anything), for a total of 5.
       | 
       | [1]: https://stackoverflow.com/a/63552117/1405588
        
         | arun-mani-j wrote:
         | I tried to explain only one's complement and two's complement
         | because they were only relevant to the explanation.
         | 
         | TIL we have 5 such representations! :)
        
           | unconed wrote:
           | I find this a pretty confusing explanation because the actual
           | mechanism is buried in the middle and you can't understand
           | the first diagram without it.
           | 
           | The goal is to produce the number `0...010...0` with the same
           | number of trailing zeroes as x.
           | 
           | If you flip all the bits in x, then `...10...0` turns into
           | `...01...1`. Adding one cascades the tail into `...10...0`.
           | 
           | You can do this entirely in unsigned math and you don't need
           | to drag two's complement into it. The fact that it
           | corresponds to `x & -x` is a curiosity, and then you have to
           | explain why it works for e.g. negative numbers, where it
           | counts trailing ones instead but still produces a positive.
           | 
           | There are plenty of other tricks like this, e.g. `x & (x - 1)
           | == 0` tells you if a (non-zero) x is a power of two, but it
           | doesn't work for negatives.
        
         | lifthrasiir wrote:
         | And then we have a zig-zag bijection for too many serialization
         | formats.
        
           | IshKebab wrote:
           | Why too many? It's a neat trick.
        
       | ww520 wrote:
       | Good explanation. Always good to read up on some bit tweaking
       | once a while.
        
       | analognoise wrote:
       | Does the site actually mention the book it was in?
        
         | arun-mani-j wrote:
         | It's Competitive Programmer's Handbook -
         | https://github.com/pllk/cphb/
        
       | xjay wrote:
       | See also: Bit Twiddling Hacks
       | 
       | https://graphics.stanford.edu/~seander/bithacks.html
        
         | richrichie wrote:
         | I was looking for this link. You made my morning!
        
         | anthk wrote:
         | Similar but for general math and optimizations:
         | 
         | http://www.hakmem.org/
        
         | FabHK wrote:
         | Or the delightful _Hacker 's Delight_ by Henry S. Warren, Jr.
         | 
         | https://en.wikipedia.org/wiki/Hacker%27s_Delight
        
         | mike_hock wrote:
         | > Compute the sign of an integer
         | 
         | > On the other hand, if you prefer the result be either -1, 0,
         | or +1, then use:
         | 
         | > sign = (v != 0) | -(int)((unsigned int)((int)v) >>
         | (sizeof(int) * CHAR_BIT - 1));
         | 
         | > // Or, for more speed but less portability:
         | 
         | > sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); // -1,
         | 0, or +1
         | 
         | > // Or, for portability, brevity, and (perhaps) speed:
         | 
         | > sign = (v > 0) - (v < 0); // -1, 0, or +1
         | 
         | Or if you prefer portability, speed, and readability at the
         | same time:                   int sign(int i) {             if(i
         | > 0)                 return 1;             else if(i < 0)
         | return -1;             else                 return 0;         }
         | 
         | gets compiled to                       mov     ecx, edi
         | sar     ecx, 31             test    edi, edi             mov
         | eax, 1             cmovle  eax, ecx             ret
         | 
         | which _is_ the bit shift hack.
         | 
         | (still useful as a reference for _implementing_ an optimizer,
         | to be read as pseudo code)
        
           | jimbobthrowawy wrote:
           | I think a bunch of compilers nowadays are smart enough to see
           | these kinds of hacks and compile them to an equivalent on the
           | target that runs best. Favourite example of this is writing a
           | duff's device to unroll a loop, and then gcc completely
           | ignores it and replaces it with a vectorised loop.
        
       | omnicognate wrote:
       | Grammar nazi correction, but one I think is both interesting and
       | a useful way to remember how these work: ones' complement has the
       | apostrophe at the end (but two's complement is correct).
       | 
       | Ones' complement is the "complement of ones (plural)" in that if
       | you take an n-bit number and the representation of its additive
       | inverse in that scheme and add them as ordinary binary numbers
       | you get a result that is n ones. Eg. using 4 bits, 6 is 0110, -6
       | is 1001, adding them with ordinary binary addition gives 1111.
       | 
       | Two's complement is the "complement of two (to the n)" in that if
       | you do the same thing you get 2^n. Eg. 6 is 0110, -6 is 1010,
       | adding them with ordinary binary addition gives 10000, or 2^4.
        
         | layer8 wrote:
         | From https://en.wikipedia.org/wiki/Method_of_complements:
         | 
         | "Some people, notably Donald Knuth, recommend using the
         | placement of the apostrophe to distinguish between the radix
         | complement and the diminished radix complement. In this usage,
         | the four's complement refers to the radix complement of a
         | number in base four while fours' complement is the diminished
         | radix complement of a number in base 5. However, the
         | distinction is not important when the radix is apparent (nearly
         | always), and the subtle difference in apostrophe placement is
         | not common practice. Most writers use one's and nine's
         | complement, and many style manuals leave out the apostrophe,
         | recommending ones and nines complement."
         | 
         | I.e., the distinction isn't as well established as some claim,
         | and in the context of binary presentations it doesn't really
         | matter.
        
       | wruza wrote:
       | Because to negate you invert (so & = 0) and add one, which
       | overflows former zeroes until it meets a former one, which flips,
       | so & gives 1 there. Former zeroes are how even a number is.
       | 01100 (12, 2 bit even)       10011 (inv 12, & = 0)       10100
       | (inv 12 +1 aka -12)       00100 &, 2 bit even
        
       | amelius wrote:
       | It is a simple trick, but in practice you probably want to know
       | the position of the last nonzero bit instead of the 2^n pattern.
       | And most CPUs have instructions for that, which are more
       | efficient and as a bonus also document better what is happening.
        
       | tzs wrote:
       | Note that this gives 0 when x == O. For applications where you
       | need the largest power of 2 that divides x because you are going
       | to actually divide x by that you may want to check for 0 first.
        
       | lynx23 wrote:
       | This was a real joy to read. Especially kudos for the (I guess
       | already oldschool?) plain text figures. I was reading this with
       | lynx (as I stll frequently use it) and totally enjoyed the
       | flawless accessibility of this article. Whatever the motivation,
       | thanks for putting out content like this, it is totally
       | appreciated!
        
         | arun-mani-j wrote:
         | Aww thanks! Actually we both should thank Hugo for converting
         | the MarkDown into a very simple HTML structure. I also try to
         | follow semantic web whenever possible (article, header, footer
         | etc. instead of divsoup).
         | 
         | Very glad to know the site works well with Lynx!
        
       | JoeAltmaier wrote:
       | Carry. -x is ~x (not x) PLUS ONE
       | 
       | It's that carry that ripples through, making bits flip until it
       | gets to that 'largest power of 2'
        
       | rokicki wrote:
       | To take this to the next level: what does [(a^b) & (-(a^b)) & a]
       | compute? (Assume unsigned arithmetic.)
       | 
       | And then after that: what use can this be put to?
        
         | wruza wrote:
         | Iow, we flip some bits in `a`, then do subj, then mask it back
         | to `a`. It's unclear what it computes in general, but if `b`
         | disturbs the 2-powerness of `a` then I guess we learn that fact
         | by seeing zero. Not sure where to use it.
        
       | ChuckMcM wrote:
       | Nice article, which is putatively about the question posed, but
       | really is about how fun number theory can be. This trick and the
       | "reverse the order of bits with XOR" kinds of things are gateways
       | into the properties of numbers and their bases. Base 2 happens to
       | be "easy" because you can use simple boolean functions to move it
       | around but once you get the hang of it you can do this with other
       | bases as well.
       | 
       | I really didn't appreciate number theory until I started writing
       | cryptography code and trying to understand some of the 'tricks'
       | used to optimize the algorithms. Fun stuff!
        
       ___________________________________________________________________
       (page generated 2024-05-26 23:01 UTC)