_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (inofficial)
(HTM) Visit Hacker News on the Web
COMMENT PAGE FOR:
(HTM) Pop Goes the Population Count?
thefaux wrote 13 hours 53 min ago:
As impressive as this analysis is by the compiler, I shudder to think
how much time the compiler spends doing these kinds of optimizations.
In my opinion, a better architecture would be for the compiler to
provide a separate analysis tool that suggests source level changes for
these kinds of optimizations. It could alert you that the loop could be
replaced with a popcount (and optionally make the textual replacement
in the source code for you). Then you pay the cost of the optimization
once and have the benefit of clarity about what your code _actually_
runs at runtime instead of the compiler transparently pulling the rug
out from underneath you when run with optimizations.
Side note: many years ago I wrote the backend for a private global
surveillance system that has almost surely tracked the physical
location of anyone reading this. We could efficiently track how often a
device had been seen at a location in the prior 64 (days|weeks|months)
in just 192 bytes and use popcount to compute the value. I am not proud
that I built this.
Quekid5 wrote 3 hours 57 min ago:
The issue is that many optimization opportunities only appear after
monomorphization, inlining, de-virtualization, etc. etc.
Not that you couldn't do source level analysis as you suggest... it
just wouldn't be effective in many cases.
It would also be 'unstable' in the sense that it might depend on
architecture, etc.
adgjlsfhk1 wrote 10 hours 56 min ago:
Especially in languages that allow generic programming, the right
thing to do will be context dependent.
burnt-resistor wrote 16 hours 24 min ago:
I <3 [AT]BM and BMI[12], and functional-equivalency matching with cost
minimization optimization compiler passes.
I wish GCC and LLVM had compiler passes to semi/automagically
"vectorize" hot sections using SIMD, i.e., magic transformation of
UTF-8 conversion, regex matching, and string functions.
dhosek wrote 16 hours 51 min ago:
I still find it wild that Godbolt is his actual name and not some cool
term used for the tool to see what compiler output looks like.
silisili wrote 15 hours 57 min ago:
Same! I always assumed it was the name of the tool until I found out
it was a person not long ago.
Maybe a bit of a stretch, but I could see it fitting -
(HTM) [1]: https://en.wikipedia.org/wiki/Aptronym
taeric wrote 17 hours 34 min ago:
Somewhat related, "Gosper's hack" is a fun way to loop through all of
the values that have the same number of 1s.
wat10000 wrote 18 hours 4 min ago:
There's a fun approach where do the computation as a tree in parallel.
You do a little masking and shifting to add all the even-numbered bits
to all the odd-numbered bits, and come up with a set of (assuming we're
working on a 64-bit value) 32 partial sums of 2 bits each. Then you add
pairs of those to get 16 partial sums of 4 bits each, and so forth
until you get to the top. This requires six sums, plus shifts and masks
for each one.
I don't know if compilers are able to detect this and compile it down
to a single instruction, though.
mattgodbolt wrote 17 hours 42 min ago:
All that and more: [1] :)
(HTM) [1]: https://graphics.stanford.edu/~seander/bithacks.html#CountBi...
majke wrote 19 hours 3 min ago:
Hey! Popcount used to be my favorite instruction. Now I think I prefer
LOP3 though :)
bombcar wrote 19 hours 23 min ago:
Isn't this the instruction that apparently the NSA asked for?
kens wrote 18 hours 1 min ago:
I did a lot of research on this [1]. I got confirmation from Robert
Garner (architect of the SPARC processor) that the NSA did indeed ask
for the population count instruction. His story of meeting with the
NSA is pretty amusing [2]. [1]
(HTM) [1]: https://retrocomputing.stackexchange.com/a/8666/4158
(HTM) [2]: https://archive.computerhistory.org/resources/access/text/20...
pklausler wrote 18 hours 47 min ago:
It goes back to the CDC 6600 at least, and is most often seen as part
of Hamming distance computation (pop(xor(x,y))). But it turns out to
be really useful for other things (trailing zero count), and worth
having in hardware since the software sequence is a ~dozen
instructions for 64 bits.
adgjlsfhk1 wrote 18 hours 56 min ago:
yeah. They understood how useful computing on bits can be before
anyone else.
(DIR) <- back to front page