_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   Pop Goes the Population Count?
       
       
        thefaux wrote 13 hours 53 min ago:
         As impressive as this compiler analysis is, I shudder to think
         how much time the compiler spends doing these kinds of
         optimizations. In my opinion, a better architecture would be for
         the compiler to provide a separate analysis tool that suggests
         source-level changes for these kinds of optimizations. It could
         alert you that the loop can be replaced with a popcount (and
         optionally make the textual replacement in the source code for
         you). Then you pay the cost of the optimization once and gain
         clarity about what your code _actually_ does at runtime, instead
         of the compiler transparently pulling the rug out from under you
         when run with optimizations.
        
        Side note: many years ago I wrote the backend for a private global
        surveillance system that has almost surely tracked the physical
        location of anyone reading this. We could efficiently track how often a
        device had been seen at a location in the prior 64 (days|weeks|months)
        in just 192 bytes and use popcount to compute the value. I am not proud
        that I built this.
       
          Quekid5 wrote 3 hours 57 min ago:
          The issue is that many optimization opportunities only appear after
          monomorphization, inlining, de-virtualization, etc. etc.
          
          Not that you couldn't do source level analysis as you suggest... it
          just wouldn't be effective in many cases.
          
          It would also be 'unstable' in the sense that it might depend on
          architecture, etc.
       
          adgjlsfhk1 wrote 10 hours 56 min ago:
          Especially in languages that allow generic programming, the right
          thing to do will be context dependent.
       
        burnt-resistor wrote 16 hours 24 min ago:
        I <3 [AT]BM and BMI[12], and functional-equivalency matching with cost
        minimization optimization compiler passes.
        
        I wish GCC and LLVM had compiler passes to semi/automagically
        "vectorize" hot sections using SIMD, i.e., magic transformation of
        UTF-8 conversion, regex matching, and string functions.
       
        dhosek wrote 16 hours 51 min ago:
        I still find it wild that Godbolt is his actual name and not some cool
        term used for the tool to see what compiler output looks like.
       
          silisili wrote 15 hours 57 min ago:
          Same!  I always assumed it was the name of the tool until I found out
          it was a person not long ago.
          
          Maybe a bit of a stretch, but I could see it fitting -
          
 (HTM)    [1]: https://en.wikipedia.org/wiki/Aptronym
       
        taeric wrote 17 hours 34 min ago:
        Somewhat related, "Gosper's hack" is a fun way to loop through all of
        the values that have the same number of 1s.
       
        wat10000 wrote 18 hours 4 min ago:
         There's a fun approach where you do the computation as a tree in
         parallel.
        You do a little masking and shifting to add all the even-numbered bits
        to all the odd-numbered bits, and come up with a set of (assuming we're
        working on a 64-bit value) 32 partial sums of 2 bits each. Then you add
        pairs of those to get 16 partial sums of 4 bits each, and so forth
        until you get to the top. This requires six sums, plus shifts and masks
        for each one.
        
        I don't know if compilers are able to detect this and compile it down
        to a single instruction, though.
       
          mattgodbolt wrote 17 hours 42 min ago:
          All that and more: [1] :)
          
 (HTM)    [1]: https://graphics.stanford.edu/~seander/bithacks.html#CountBi...
       
        majke wrote 19 hours 3 min ago:
        Hey! Popcount used to be my favorite instruction. Now I think I prefer
        LOP3 though :)
       
        bombcar wrote 19 hours 23 min ago:
        Isn't this the instruction that apparently the NSA asked for?
       
          kens wrote 18 hours 1 min ago:
          I did a lot of research on this [1]. I got confirmation from Robert
          Garner (architect of the SPARC processor) that the NSA did indeed ask
          for the population count instruction. His story of meeting with the
           NSA is pretty amusing [2].
          
 (HTM)    [1]: https://retrocomputing.stackexchange.com/a/8666/4158
 (HTM)    [2]: https://archive.computerhistory.org/resources/access/text/20...
       
          pklausler wrote 18 hours 47 min ago:
          It goes back to the CDC 6600 at least, and is most often seen as part
          of Hamming distance computation (pop(xor(x,y))).  But it turns out to
          be really useful for other things (trailing zero count), and worth
          having in hardware since the software sequence is a ~dozen
          instructions for 64 bits.
       
          adgjlsfhk1 wrote 18 hours 56 min ago:
          yeah. They understood how useful computing on bits can be before
          anyone else.
       
       
 (DIR) <- back to front page