Subj : Re: how does this work
To   : comp.programming
From : websnarf
Date : Wed Aug 03 2005 05:01 pm

What is the count for the number of bits in the sub-set of bits from a
to b (inclusive)?  Well if b = a, then the count is equal to the value
of that bit itself.  Other than using that as the starting point, the
code adds "pairs" of these counts using a technique called "SIMD",
essentially doubling the field that its counting in each time.  Since
the count of the number of bits is always less than or equal to the
field for those bits, the algorithm simply places these counts into
exactly those fields.

Let comment the code:

> unsigned int x;
>
> x = (x & 0x55555555) + ((x>>1)&0x55555555);

Remember that the binary expansion of a 5 nibble is 0101b.  Thus the
number 0x55555555 is a mask with exactly all the even bits set, and odd
bits unset.  So (x & 0x55555555) are the even bits of x, and ((x>>1) &
0x55555555) are the odd bits, both aligned to the even bit positions.

So this whole operation takes the odd single bits (which is equal to
the count for each of those bits, remember) and adds them to the even
single bits.  These 16 simultaneous results of these additions are thus
the count for two bits at a time.  The results are stored back into the
base position of where those two bits were located in the first place.

So for example lets do a table of results for the bottom two bits:

   before     after
     00b  ->   00b (0)
     01b  ->   01b (1)
     10b  ->   01b (1)
     11b  ->   10b (2)

So the before image is what the bits start out looking like.  The after
image is what summed count of those bits are; notice that the number of
bits in the result is no more than the number of source bits.

The result is basically a sequence of 16 bit pairs:

  bit[31]+bit[30] : bit[29]+bit[28] : ... : bit[1]+bit[0]

> x = (x & 0x33333333) + ((x>>2)&0x33333333);

The nibble 3 expands to 0011b, so 0x33333333 is a mask of two bits at a
time, in alternating each pair of bits.  This thinking about the value
as sequence of 16 bit pairs, (x & 0x33333333) is the even pairs, and
((x>>2)&0x33333333) is the odd pairs aligned to the even pair position.

So just like the above, when they are added the result simply
accumulates the sum of bit pairs into bit nibbles:

  (bit[31]+bit[30])+(bit[29]+bit[28]) : ... :
(bit[3]+bit[2])+(bit[1]+bit[0])

But addition on integers is a cummitative operation, so its just:

  bit[31]+bit[30]+bit[29]+bit[28] : ... : bit[3]+bit[2]+bit[1]+bit[0]

Which is a sequence of 8 fields each of which is the sum of the bits
that were originally in that nibble.

> x = (x & 0x0F0F0F0F) + ((x>>4)&0x0F0F0F0F);
> x = (x & 0x00FF00FF) + ((x>>8)&0x00FF00FF);
> x = (x & 0x0000FFFF) + ((x>>16)&0x0000FFFF);

The pattern is the same for the rest.  You get a sequence of 4 fields
of bytes, each of which is the sum of bits in that byte in the next
line, then a pair of fields of shorts each of which is the sum of the
bits in each short, then finally just the one final result which is the
sum of all the bits in the whole 32 bit word.

-- 
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

.