[HN Gopher] Why is x & -x equal to the largest power of 2 that d...
___________________________________________________________________
Why is x & -x equal to the largest power of 2 that divides x?
Author : arun-mani-j
Score : 158 points
Date : 2024-05-25 14:44 UTC (1 days ago)
(HTM) web link (arunmani.in)
(TXT) w3m dump (arunmani.in)
| dataflow wrote:
| Short explanation:
|
| 1. The largest power of 2 that divides x is just 2^(number of
| trailing zeros in x)
|
| 2. Crucial observation: -x == ~x + 1
|
| 3. ~x flips all the bits of x bits, so none of the bits of ~x
| match those of x. (i.e. (x & ~x) == 0)
|
| 4. When you do +1, all the trailing 1's flip AGAIN, becoming zero
| like they were in x. The next highest 0 (say it was the n'th)
| also flips, becoming 1... like it was in x.
|
| 5. Crucial observation: The n'th 0 did NOT match the
| corresponding bit in x prior to the increment, therefore it MUST
| match after the increment. All higher bits stay as-is.
|
| 6. This leaves only the n'th bits matching in x and ~x + 1,
| isolating the highest power-of-2 divisor of x when you AND them.
| jb1991 wrote:
| I thought there was just a sign bit. If not, how does a system
| know if a number should be interpreted as positive or negative?
| dataflow wrote:
| Signedness doesn't affect anything here, as long as the
| representation is in two's complement. The negation of x
| (i.e. -x) in two's complement is ~x + 1. The interpretation
| of the bits as signed or unsigned doesn't change any of the
| following steps.
| MaxBarraclough wrote:
| I think strictly speaking signedness _does_ affect things,
| assuming we 're talking C/C++. If x has type int and were
| to take the value of INT_MIN, then evaluating -x would
| result in a signed arithmetic overflow, which is undefined
| behaviour. (Related: the C standard now insists on use of
| 2's complement, where it used to permit other schemes.)
|
| If x had type unsigned int, this wouldn't introduce
| undefined behaviour.
| elevatedastalt wrote:
| There's still a sign bit. The sign bit tells you whether to
| interpret the rest of the number as positive or treat as
| negative and undo the 2s complement operation.
| eslaught wrote:
| You're thinking about a sign and magnitude representation,
| which is not how integers are represented in a modern
| computer.
|
| The modern version is two's complement. It still has a sign
| bit, but negating a number involves more than just changing
| the sign bit since the representation is modular.
|
| https://en.wikipedia.org/wiki/Two%27s_complement
|
| https://en.wikipedia.org/wiki/Modular_arithmetic
| gus_massa wrote:
| I agree, but I just want to add that floating point numbers
| use sign and magnitude representation, so the GP may be
| confused by that.
| cvz wrote:
| In the "two's complement" representation, there is a sign
| bit, but the meaning isn't "invert this number", it's
| "subtract a large, fixed power of 2 from this number".
|
| The reason this has become the most common representation for
| signed numbers in computer hardware is that it makes the
| difference between signed and unsigned numbers basically
| irrelevant as far as the hardware is concerned. When the
| difference does matter (which it does in division,
| conversions between different word sizes, conversions to/from
| text, and sometimes multiplication), there are two different
| instructions for signed and unsigned, and it's up to the
| programmer or compiler to pick the right one.
| thaumasiotes wrote:
| > n the "two's complement" representation, there is a sign
| bit, but the meaning isn't "invert this number", it's
| "subtract a large, fixed power of 2 from this number".
|
| This is only true if the size of your modulus is fixed. In
| fact, there is a "sign extension" command allowing you to
| produce, for example, signed 128-bit values from signed
| 64-bit values, and this basically requires interpreting
| two's complement values the same way they represent
| themselves: a three-bit -1 is just the sum of the positive
| values +1, +2, and +4. To extend that to six bits, you add
| the positive values +8, +16, and +32.
|
| The meaning of the high bit in our six-bit scheme shouldn't
| be viewed as "when this bit is set, subtract 32 from the
| value instead of adding it". It is "whether this bit is set
| or not, all higher bits in the number, the ones that don't
| exist in the hardware, are equal to it"; under this
| interpretation, the 8-, 16-, and 32- bits were already set
| in the three-bit value, and that's why they continue to be
| set when we do a sign extension.
|
| Every place value is still positive, but as long as there
| is no _highest_ set bit, the number itself will be
| negative, and equal to the value you 'd calculate from its
| representation as a geometric series.
| kbolino wrote:
| This is true mathematically but simple to implement in
| practice. If the number is signed, shifts to the right
| and widening casts fill the new bits with a copy of the
| sign bit. If it's unsigned, shifts to the right and
| widening casts fill the new bits with zeroes. Shifts to
| the left and narrowing casts work the same for signed and
| unsigned.
| pests wrote:
| 2s compliment helps with the problem of otherwise having a
| +0/-0 and lets you subtract numbers just with adders.
| umanwizard wrote:
| Assuming 4-byte integers,
|
| 0000 is 0
|
| 0001 is 1
|
| 0010 is 2
|
| 0011 is 3
|
| 0100 is 4
|
| 0101 is 5
|
| 0110 is 6
|
| 0111 is 7
|
| 1111 is -1
|
| 1110 is -2
|
| 1101 is -3
|
| 1100 is -4
|
| 1011 is -5
|
| 1010 is -6
|
| 1001 is -7
|
| 1000 is -8
|
| The highest bit is always 1 when the sign is negative.
| umanwizard wrote:
| Too late to edit, but I meant 4-bit, not 4-byte.
| mabster wrote:
| Thanks! I've used the trick and derived why it works a few
| times, but always forget.
|
| I battled to understand the post. Your explanation got me there
| very quickly.
| dataflow wrote:
| Sure thing, happy to hear!
| pbsd wrote:
| An alternative, more arithmetic, argument:
|
| - x is 2^b*k for some odd k < 2^(n-b)
|
| - -x is 2^n - 2^b*k = 2^b*(2^(n-b)-k)
|
| - k is odd by definition, and (2^(n-b)-k) + k = 0 (mod
| 2^(n-b)). This means that the LSB must be 1 in both operands,
| which results in a carry out, and in the following ith bits we
| have that the sum is 0 mod 2^i if and only if the ith bits of
| (2^(n-b)-k) and k are distinct. Thus x & -x = 2^b.
| Droobfest wrote:
| This immediately reads like gobbledygook to me, while the
| parent is easy to follow.
|
| I'm not trying to dunk on you, but I can't help to note that
| the denseness of math is too much for an idiot like me.
| schoen wrote:
| It might be nicer with exponents rendered as superscripts,
| because it could make it more apparent how the exponents
| are being used.
| chx wrote:
| > -x is 2^n - 2^b * k
|
| Erm no. -x is - 2^b * k and it's the "Two's complement"
| notation which is 2^n - 2 * b * k
|
| This is because the way we generate the "Two's complement"
| number. Given A in binary representation, let's call A1 the
| number you get by flipping every bit in A also called one's
| complement. Observe how every bit in A+A1 is 1 because if the
| bit was 0 in A then the bit is 1 in A1 and vica versa. So
| then that's 2^N-1. Two's complement is created by adding 1 to
| A1 so then A + A1 is 2^N or A1 = 2^N-A. But you do need to
| prove this before you can use it.
| bonzini wrote:
| The point of two's complement is that it's the same as
| modulo-2^n arithmetic. In two's complement -x is defined
| such that x + -x = 0 (mod 2^n); and because x + ~x = 2^n -
| 1 (mod 2^n), _then_ -x = ~x + 1.
|
| Therefore x & -x works only in two's complement, and it's
| fair to assume it as the parent comment did.
| arun-mani-j wrote:
| Thanks for the nice summary. I found myself difficult to
| explain it so I took the help of visual explanation using the
| bits structure. :)
| baryphonic wrote:
| When I read the title, my immediate thought was "because 2s
| complement!"
|
| Your explanation is more thorough.
| monktastic1 wrote:
| Another, more "visual" description:
|
| Without loss of generality, let the number be xx...xx100...000,
| where x stands for any bit and there are n trailing zeros.
|
| 1. The largest power of 2 that divides it is 2^n.
|
| 2. The twos-complement representation is xx...xx011...111,
| where the x's are inverted from before, _plus one_.
|
| 3. After adding one, we get xx...xx100...000. Notice that the
| carryover stopped exactly where the zero was (which is where
| our first 1 from the right originally was).
|
| 4. All the "x" bits are now inverted from their original
| values, so their & is zero. All the original trailing zero bits
| are still zero, so their & is zero. The middle 1 has stayed 1,
| so its & is 1.
|
| 5. Thus, the overall & is 00...00100...000 = 2^n, as required.
| _proofs wrote:
| this is actually a fantastic comment and as someone who
| thinks more visually, was greatly appreciated. thank you.
| dgfitz wrote:
| Seconded, this comment was hugely valuable to me.
| oldgradstudent wrote:
| > 2. Crucial observation: -x == ~x + 1
|
| To be annoyingly pedantic, this is not simply an observation,
| this is essentially the definition of 2's complement
| arithmetic.
| dataflow wrote:
| Some people do see it that way, and I think it's fine if you
| do, but I think I disagree. To me, the definition of N-bit
| two's complement is that -x (i.e. 0 - x) is represented by
| 2^N - x. The fact that that is equal to ~x + 1 is not
| obvious, or (as I see it) part of the definition.
| StevenXC wrote:
| Algebraically, where | denotes concatenation, and z is a string
| of 0s:
|
| x = y | 1 | z
|
| -x = ~x + 1
|
| -x = (~y | 0 | ~z) + 1
|
| -x = ~y | 1 | z
|
| x & -x = (y & ~y) | (1 & 1) | (z & z)
|
| x & -x = 0 | 1 | z
| mistercow wrote:
| This is a fun puzzle to try to figure out before reading the
| article.
| o11c wrote:
| Or briefly, copied from my StackOverflow answer[1]:
| v 000...00010100 ~v 111...11101011 (not used directly,
| all bits opposite) -v 111...11101100 (-v == ~v + 1; this
| causes all low 1 bits to overflow and carry) v&-v
| 000...00000100 (has a single 1 bit, from the carry)
|
| The linked article is wrong in only mentioning 2 signed integer
| representations. Old versions of C allowed 3 representations for
| integers, floats use one of those (sign-magnitude, also used
| widedly by bignum libraries) and one other (offset binary), and
| base negative-2 is also possible in theory (not sure if practical
| for anything), for a total of 5.
|
| [1]: https://stackoverflow.com/a/63552117/1405588
| arun-mani-j wrote:
| I tried to explain only one's complement and two's complement
| because they were only relevant to the explanation.
|
| TIL we have 5 such representations! :)
| unconed wrote:
| I find this a pretty confusing explanation because the actual
| mechanism is buried in the middle and you can't understand
| the first diagram without it.
|
| The goal is to produce the number `0...010...0` with the same
| number of trailing zeroes as x.
|
| If you flip all the bits in x, then `...10...0` turns into
| `...01...1`. Adding one cascades the tail into `...10...0`.
|
| You can do this entirely in unsigned math and you don't need
| to drag two's complement into it. The fact that it
| corresponds to `x & -x` is a curiosity, and then you have to
| explain why it works for e.g. negative numbers, where it
| counts trailing ones instead but still produces a positive.
|
| There are plenty of other tricks like this, e.g. `x & (x - 1)
| == 0` tells you if a (non-zero) x is a power of two, but it
| doesn't work for negatives.
| lifthrasiir wrote:
| And then we have a zig-zag bijection for too many serialization
| formats.
| IshKebab wrote:
| Why too many? It's a neat trick.
| ww520 wrote:
| Good explanation. Always good to read up on some bit tweaking
| once a while.
| analognoise wrote:
| Does the site actually mention the book it was in?
| arun-mani-j wrote:
| It's Competitive Programmer's Handbook -
| https://github.com/pllk/cphb/
| xjay wrote:
| See also: Bit Twiddling Hacks
|
| https://graphics.stanford.edu/~seander/bithacks.html
| richrichie wrote:
| I was looking for this link. You made my morning!
| anthk wrote:
| Similar but for general math and optimizations:
|
| http://www.hakmem.org/
| FabHK wrote:
| Or the delightful _Hacker 's Delight_ by Henry S. Warren, Jr.
|
| https://en.wikipedia.org/wiki/Hacker%27s_Delight
| mike_hock wrote:
| > Compute the sign of an integer
|
| > On the other hand, if you prefer the result be either -1, 0,
| or +1, then use:
|
| > sign = (v != 0) | -(int)((unsigned int)((int)v) >>
| (sizeof(int) * CHAR_BIT - 1));
|
| > // Or, for more speed but less portability:
|
| > sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); // -1,
| 0, or +1
|
| > // Or, for portability, brevity, and (perhaps) speed:
|
| > sign = (v > 0) - (v < 0); // -1, 0, or +1
|
| Or if you prefer portability, speed, and readability at the
| same time: int sign(int i) { if(i
| > 0) return 1; else if(i < 0)
| return -1; else return 0; }
|
| gets compiled to mov ecx, edi
| sar ecx, 31 test edi, edi mov
| eax, 1 cmovle eax, ecx ret
|
| which _is_ the bit shift hack.
|
| (still useful as a reference for _implementing_ an optimizer,
| to be read as pseudo code)
| jimbobthrowawy wrote:
| I think a bunch of compilers nowadays are smart enough to see
| these kinds of hacks and compile them to an equivalent on the
| target that runs best. Favourite example of this is writing a
| duff's device to unroll a loop, and then gcc completely
| ignores it and replaces it with a vectorised loop.
| omnicognate wrote:
| Grammar nazi correction, but one I think is both interesting and
| a useful way to remember how these work: ones' complement has the
| apostrophe at the end (but two's complement is correct).
|
| Ones' complement is the "complement of ones (plural)" in that if
| you take an n-bit number and the representation of its additive
| inverse in that scheme and add them as ordinary binary numbers
| you get a result that is n ones. Eg. using 4 bits, 6 is 0110, -6
| is 1001, adding them with ordinary binary addition gives 1111.
|
| Two's complement is the "complement of two (to the n)" in that if
| you do the same thing you get 2^n. Eg. 6 is 0110, -6 is 1010,
| adding them with ordinary binary addition gives 10000, or 2^4.
| layer8 wrote:
| From https://en.wikipedia.org/wiki/Method_of_complements:
|
| "Some people, notably Donald Knuth, recommend using the
| placement of the apostrophe to distinguish between the radix
| complement and the diminished radix complement. In this usage,
| the four's complement refers to the radix complement of a
| number in base four while fours' complement is the diminished
| radix complement of a number in base 5. However, the
| distinction is not important when the radix is apparent (nearly
| always), and the subtle difference in apostrophe placement is
| not common practice. Most writers use one's and nine's
| complement, and many style manuals leave out the apostrophe,
| recommending ones and nines complement."
|
| I.e., the distinction isn't as well established as some claim,
| and in the context of binary presentations it doesn't really
| matter.
| wruza wrote:
| Because to negate you invert (so & = 0) and add one, which
| overflows former zeroes until it meets a former one, which flips,
| so & gives 1 there. Former zeroes are how even a number is.
| 01100 (12, 2 bit even) 10011 (inv 12, & = 0) 10100
| (inv 12 +1 aka -12) 00100 &, 2 bit even
| amelius wrote:
| It is a simple trick, but in practice you probably want to know
| the position of the last nonzero bit instead of the 2^n pattern.
| And most CPUs have instructions for that, which are more
| efficient and as a bonus also document better what is happening.
| tzs wrote:
| Note that this gives 0 when x == O. For applications where you
| need the largest power of 2 that divides x because you are going
| to actually divide x by that you may want to check for 0 first.
| lynx23 wrote:
| This was a real joy to read. Especially kudos for the (I guess
| already oldschool?) plain text figures. I was reading this with
| lynx (as I stll frequently use it) and totally enjoyed the
| flawless accessibility of this article. Whatever the motivation,
| thanks for putting out content like this, it is totally
| appreciated!
| arun-mani-j wrote:
| Aww thanks! Actually we both should thank Hugo for converting
| the MarkDown into a very simple HTML structure. I also try to
| follow semantic web whenever possible (article, header, footer
| etc. instead of divsoup).
|
| Very glad to know the site works well with Lynx!
| JoeAltmaier wrote:
| Carry. -x is ~x (not x) PLUS ONE
|
| It's that carry that ripples through, making bits flip until it
| gets to that 'largest power of 2'
| rokicki wrote:
| To take this to the next level: what does [(a^b) & (-(a^b)) & a]
| compute? (Assume unsigned arithmetic.)
|
| And then after that: what use can this be put to?
| wruza wrote:
| Iow, we flip some bits in `a`, then do subj, then mask it back
| to `a`. It's unclear what it computes in general, but if `b`
| disturbs the 2-powerness of `a` then I guess we learn that fact
| by seeing zero. Not sure where to use it.
| ChuckMcM wrote:
| Nice article, which is putatively about the question posed, but
| really is about how fun number theory can be. This trick and the
| "reverse the order of bits with XOR" kinds of things are gateways
| into the properties of numbers and their bases. Base 2 happens to
| be "easy" because you can use simple boolean functions to move it
| around but once you get the hang of it you can do this with other
| bases as well.
|
| I really didn't appreciate number theory until I started writing
| cryptography code and trying to understand some of the 'tricks'
| used to optimize the algorithms. Fun stuff!
___________________________________________________________________
(page generated 2024-05-26 23:01 UTC)