[HN Gopher] EABitTricks.h
       ___________________________________________________________________
        
       EABitTricks.h
        
       Author : mmastrac
       Score  : 205 points
       Date   : 2022-12-18 21:01 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | supermdguy wrote:
       | Question from a frontend dev: Why can't the compiler apply these
       | tricks automatically? Are there assumptions they're making that
       | don't necessarily hold?
        
         | SideQuark wrote:
         | The compiler cannot always detect all the uses.
        
         | nibbleshifter wrote:
         | You can't really trust the compiler to emit sensible code ever
        
       | flohofwoe wrote:
       | Seeing the project name EAStdC I thought for a moment this would
       | be the C counterpart to EASTL (e.g. a modern replacement for the
       | C standard library), but alas, it's all written in C++ and
       | intended to be used from C++ (e.g. no C interfaces).
       | 
       | For anyone interested in this sort of stuff and isn't yet aware
       | of the 'bit twiddling hacks' collection:
       | 
       | https://graphics.stanford.edu/~seander/bithacks.html
       | 
       | Also regarding the bit counting function:
       | inline int CountBits(uint32_t x) // Branchless version         {
       | #if defined(EASTDC_SSE_POPCNT)                 return
       | _mm_popcnt_u32(x);             #else                 x = x - ((x
       | >> 1) & 0x55555555);                 x = (x & 0x33333333) + ((x
       | >> 2) & 0x33333333);                 x = (x + (x >> 4)) &
       | 0x0F0F0F0F;                 return (int)((x * 0x01010101) >> 24);
       | #endif         }
       | 
       | Some compilers go as far as detecting the 'bit twiddling' version
       | and replace that with a popcnt instruction:
       | 
       | https://www.godbolt.org/z/bExMvfzEx
        
         | sreekotay wrote:
         | My contribution: http://sree.kotay.com/2007/04/shift-registers-
         | and-de-bruijn-...
        
         | maximilianburke wrote:
         | (disclaimer: I used to work on these EA* libraries)
         | 
         | The intention of EAStdC wasn't really to replace the platform
         | libc but to augment it with portable implementations that had
         | consistent behavior across the different platforms, as well as
         | have some opt-in performance improvements (like a memcpy that
         | used VMX), while also not polluting the global (C) namespace.
        
       | erksa wrote:
       | Does anyone have any reading material on introduction to where
       | bit-wise programming becomes important knowledge? Is it mostly
       | just used in game-dev?
       | 
       | They've always peaked my interest, but never personally had to
       | work with it, and just don't know the use cases where these
       | become important!
        
         | rickstanley wrote:
         | Maybe what danbruc commented [0] may be of interest for you.
         | 
         | [0]: https://news.ycombinator.com/item?id=34055831
        
           | erksa wrote:
           | Thanks!
        
         | ianlevesque wrote:
         | Interacting with hardware registers, such as in a device
         | driver, is another common use case.
        
           | woodrowbarlow wrote:
           | or implementing file formats / network protocols
        
         | turtledragonfly wrote:
         | The general use-case is "high-performance and/or embedded
         | systems." Games, and especially console games, happen to check
         | both of those boxes. Another high-perf case is financial
         | systems (eg: high-frequency trading). Other embedded systems
         | would include things like robotics, automotive software, audio
         | synthesizers and processing, etc.
         | 
         | Packing data into bits is a space optimization. On modern
         | processors, accessing memory is hideously slow compared to
         | doing logical operations, so if you can pack more info into
         | fewer bits, you get higher information density, and need to
         | access main RAM less often. Thus, it speeds things up,
         | sometimes dramatically.
         | 
         | Sorry, I don't have reading material per-se.
         | 
         | But just as an anecdotal example: I was using some of this
         | trickery to implement a densely-packed Quadtree using Morton
         | Codes. Morton Codes interleave the bits of the X and Y
         | coordinates for a cell on a grid. So, in order to break a
         | M-Code apart and combine them back together, you need to
         | "select the even bits" or "select the odd bits", which can be
         | done with some tricks like these.
         | 
         | (aside: it's " _piqued_ " :)
        
           | erksa wrote:
           | Thank you for the thorough break down. All the replies have
           | immediately framed my "understanding" much more.
           | 
           | I'm not exactly sure why, but I'm very intrigued nonetheless.
           | 
           | Aside: TIL, Thanks!
        
         | _0ffh wrote:
         | Most obvious use case I can think of is embedded programming,
         | e.g. when you're setting control registers.
        
         | riskable wrote:
         | In embedded programming it can be useful to control an
         | animation on an LED matrix since a lot of LED drivers and
         | multiplexers enable pins based on the bits passed through their
         | respective drivers. Example of how you might animate 8 LED
         | connected to 8 pins:                   10000000
         | 01000000         00100000         00010000         00001000
         | 00000100         00000010         00000001
         | 
         | You could turn that into an array to animate an LED going from
         | left-to-right _or_ you could just write a simple function that
         | bit-shifts your current state to the right.
         | 
         | It's also useful in scrolling text: Imagine that 8x8 above
         | contains a character like O. You could animate the O moving to
         | the left by shifting the bits in each row to the left by one
         | over and over again (according to whatever speed you've
         | defined).
        
           | erksa wrote:
           | Thanks man, this was a very clear explanation!
        
         | xxpor wrote:
         | When dealing with networking it's extremely common.
        
         | jesse__ wrote:
         | Not an answer to your exact question, but there's:
         | 
         | https://graphics.stanford.edu/~seander/bithacks.html
         | 
         | And if you like that, there's an entire book full of them,
         | which is delightful:
         | 
         | https://www.amazon.ca/Hackers-Delight-2nd-Henry-Warren/dp/03...
        
         | thoughtFrame wrote:
         | I don't know of any formal introductions to the topic but you
         | can see them in action in these places too
         | 
         | * some of Daniel Lemire's work in high performance data
         | structures
         | 
         | * the blog posts related to solving board games on
         | https://nullprogram.com
         | 
         | * the articles about board representations on the chess
         | programming wiki https://www.chessprogramming.org/Main_Page
         | 
         | * Any explanation of Quake III's fast inverse square root
        
           | erksa wrote:
           | Thanks for the resources, these are great! As an avid chess
           | fan I've always played around with the idea of trying making
           | a basic chess engine.. This might just be the push!!
        
         | anonymoushn wrote:
         | You end up needing this stuff to write fast parsers of text
         | formats, but you need a bunch of other stuff too, and there
         | isn't really any introductory material online.
        
         | teawrecks wrote:
         | Anywhere that a byte/word/dword is used as a first class data
         | structure, there is likely some operation on the data structure
         | that can be accomplished with a bit trick. As others have said,
         | this is most often the case where CPU cycles are tightly
         | constrained.
        
       | vodou wrote:
       | Modified BSD License (3-Clause BSD license), so it is permissive.
       | Not apparent from header file only.
        
         | AdmiralAsshat wrote:
         | Are any of these still needed, though? I kinda assumed they
         | would be like Duff's Device and be a nifty historical relic,
         | but firmly in the "Let the compiler do this for you" category
         | in modern C.
        
           | anonymoushn wrote:
           | How do you let the compiler do these things for you?
        
             | saagarjha wrote:
             | By having someone write the optimization passes to detect
             | these :)
        
           | TonyTrapp wrote:
           | Many of those are not standard functions (or were not, until
           | very recently). You have to implement it _somehow_ , so
           | having a ready-made collection that also performs well is
           | very nice to have.
        
           | jesse__ wrote:
           | When you're working on Frostbite you definitely need these.
           | 
           | Optimizers are smart, but aren't smart enough yet
        
       | turtledragonfly wrote:
       | I like this topic, so here are some more "bit hacks" links, from
       | my collection:                 * http://aggregate.org/MAGIC/
       | * http://graphics.stanford.edu/~seander/bithacks.html       *
       | https://stackoverflow.com/questions/109023/how-to-count-the-
       | number-of-set-bits-in-a-32-bit-integer/109025#109025       * http
       | s://github.com/bloomberg/bde/blob/main/groups/bdl/bdlb/bdlb_bitut
       | il.h
        
       | trap_goes_hot wrote:
       | Obviously a very very gross generalization (might get called out
       | on it too :P) - But I think the low hanging fruit for modern
       | software is optimizing data layouts for fast processing.
       | 
       | Optimizing code requires a lot of work, and requires skills that
       | people usually learn from experience - therefore a rare and
       | expensive skillset. Data-Oriented Design can be taught, and
       | requires (IMO of course) a far less technical skillset.
        
         | anonymoushn wrote:
         | This is difficult because lots of tools don't let developers
         | choose their data layouts. For example you cannot declare an
         | array of structs in Java.
        
       | nibbleshifter wrote:
       | There's this gem in EAProcess.h
       | 
       | ```/// ExecuteShellCommand("su root\nrm /* -r");```
       | 
       | https://github.com/electronicarts/EAStdC/blob/master/include...
        
       | danbruc wrote:
       | For those interested in this kind of stuff, there is the FXT
       | library [1] and the accompanying book [2] by Jorg Arndt.
       | 
       | [1] https://www.jjj.de/fxt/
       | 
       | [2] https://www.jjj.de/fxt/fxtbook.pdf
        
       | ot wrote:
       | My favorite so far                   /// GetNextWithEqualBitCount
       | /// How to get the next higher integer with the same number of
       | bits set.         /// This function is supposed (claim by
       | original author) to be able to          /// wrap around and
       | continue properly, but testing indicates otherwise.         ///
       | Do not call this with x = 0; as that causes a divide by 0.
       | /// Doesn't work for an input of all bits set.         /// Do not
       | assign the result of this to a type of less size than the input
       | argument.         /// This function has certain real-world
       | applications.         /// This function has been called 'snoob'
       | by some.
        
         | Centigonal wrote:
         | snoob - I want to know why
        
           | rparolin wrote:
           | snoob = Same Number Of One Bits
           | 
           | https://www.geeksforgeeks.org/next-higher-number-with-
           | same-n...
        
           | faturan wrote:
           | It stands for "same number of one bits".
        
         | pbsd wrote:
         | That appears to be a direct C++ translation of item 175 from
         | HAKMEM [1]. This does not require a division and can be done
         | much faster, see section 1.24 of [2].
         | 
         | [1] http://www.inwap.com/pdp10/hbaker/hakmem/hacks.html#item175
         | 
         | [2] https://jjj.de/fxt/fxtpage.html#fxtbook
        
           | tasty_freeze wrote:
           | > This does not require division...
           | 
           | From the link:
           | 
           | IDIVM D,C
        
             | pbsd wrote:
             | "This" as in finding the next combination. Gosper's trick
             | does.
        
         | danbruc wrote:
         | If the wrap around does not work anyway, you can do the
         | following and have it also work for all zeros and all ones. At
         | least I think so, this is a bit ad hoc without putting too much
         | thought into it.                 edges = x & (x ^ (x >> 1))
         | edge  = edges & -edges       mask  = edge | (edge << 1)
         | x ^= mask
         | 
         | If you are willing to accept a conditional expression, this
         | will make the wrap around work, I currently have no good idea
         | how to make it work without a condition.                 x =
         | (mask == 0) ? x <<< popcount(x) : x ^ mask
         | 
         | The idea is as follows. To increase the number, you have to
         | toggle some bit from zero to one, and to make the increase as
         | small as possible, it should be as far to the right as
         | possible. But to keep the number of ones the same, you also
         | have to toggle one bit from one to zero. This one must not be
         | to the left of the bit you toggle from zero to one or the
         | number would decrease, otherwise it should be as far to the
         | left as possible to make the number as small as possible. In
         | essence this means that you have to find the rightmost
         | occurrence of 01 and turn it into 10.
         | 
         | x ^ (x >> 1) finds all the 01 and 10 edges, edges = x & (x ^ (x
         | >> 1)) finds only the 01 edges. edge = edges & -edges clears
         | all but the rightmost set bit - the two's complement first
         | inverts all bits, then adding one turns all the ones on the
         | right into zeros until reaching the rightmost zero which gets
         | turned into a one - leaving use with one set bit indicating the
         | rightmost 01 edge. mask = edge | (edge << 1) duplicates that
         | bit and this is then used to flip the corresponding two bits
         | with x ^= mask.
         | 
         | Eventually all set bits will end up on the left and there will
         | be no 01 edge leaving x stuck in this state.
        
           | strbean wrote:
           | > I currently have no good idea how to make it work without a
           | condition.
           | 
           | Would this work? Or not worth computing both values each
           | time?                   x = ((mask == 0) * x <<< popcount(x))
           | & ((mask != 0) * x ^ mask)
        
             | danbruc wrote:
             | I have no idea if this would be worth doing. One could also
             | use a conditional move like CMOV on x86 if available on the
             | target architecture to avoid a jump. On the one hand I have
             | the gut feeling that there should be a nice trick to do
             | this, on the other hand this seems to either require a
             | population count dependent shift or rotation or reversing
             | the word, which also has no really simple trick as far as I
             | know and is usually done with a logarithmic number of
             | shifts.
        
           | danbruc wrote:
           | THE ALGORITHM IS WRONG.
           | 
           | But I can no longer edit it. 1110 should become 10011 but my
           | >>solution<< yields 10110. Will have to rethink this.
        
             | danbruc wrote:
             | And here is the fixed version, I hope.
             | zeroOneEdges         = x & (x ^ (x >> 1)) // This must be a
             | signed arithmetic shift.       rightmostZeroOneEdge =
             | zeroOneEdges & -zeroOneEdges       toggleMask           =
             | rightmostZeroOneEdge | (rightmostZeroOneEdge << 1)
             | rightmostOne = x & -x       shiftMask    =
             | rightmostZeroOneEdge - 1;       onesToShift  = x &
             | shiftMask       shiftedOnes  = onesToShift / rightmostOne
             | // This must be an unsigned division.              x = ((x
             | & ~shiftMask) ^ toggleMask) | shiftedOnes
             | 
             | Now it wraps around without a conditional expression but
             | just as the one from the library will divide by zero if
             | called with zero, so either do not do this or add a check
             | for zero and return zero in that case. It works for all
             | ones.
             | 
             | My first attempt was on the right track but also missing a
             | step. Not only must the rightmost 01 edge be turned into 10
             | but also all the ones to the right of the 01 edge must be
             | shifted back to the right, i.e. <rest>01<ones><zeros> must
             | become <rest>10<zeros><ones>. This shift is done with
             | division and it is important that it is an unsigned
             | division. It is also important that the 01 edge detection
             | shift is a signed arithmetic shift.
             | 
             | The algorithm from the library and my solution are quite
             | similar and they might actually be identical with the one
             | from the library being more optimized and shorter. I would
             | not be surprised if the one from the library actually also
             | works in principle but the implementation got a tiny detail
             | about the signedness of one of the operations wrong,
             | because that will result in the described issues, i.e. not
             | working properly once the sign bit gets involved, either
             | for all ones or during the wrap around.
        
         | [deleted]
        
         | rosywoozlechan wrote:
         | I wonder what the real-world applications for this are
        
           | thom wrote:
           | Chess engine stuff, at the very least. With an integer mask
           | representing an 8x8 board and 1s representing a single type
           | of piece, you can find all subsets of pawn positions etc.
        
           | otherjason wrote:
           | This can be used generate all bit sequences over a particular
           | range that have the same Hamming weight. I have needed that
           | before when testing error-correction code decoders to make
           | sure they properly correct all of the possible error patterns
           | that they should be able to.
        
           | mafuy wrote:
           | I've used this to enumerate grids with a certain number of
           | open and closed cells (0 and 1 bits). It is simply the next
           | permutation of a bit string. C++ offers this in its standard
           | lib, but that version is slower by factor 1000 or so.
           | 
           | I've also tested this with arbitrary bit-width numbers (LLVM
           | can do that) and it worked fine. I've not checked if the code
           | is the exact same tho.
        
           | CGamesPlay wrote:
           | It can be used to enumerate all subsets of size N from a set
           | of size M. Start with 1 << N - 1, and then repeatedly call
           | this function until the value is larger than 1 << M. Every
           | subset of size N will be visited, in order.
        
             | TeMPOraL wrote:
             | Can you explain what you mean by "set" and "subset"? What
             | do they contain? How they're represented? I can't make
             | heads or tails of your answer as-is.
        
               | rzzzt wrote:
               | Iterates through all possible N-combinations of a set
               | that has M elements?
        
               | nerdponx wrote:
               | Every bit in a bitfield represents a "thing". The
               | bitfield represents a collection of distinct things. If
               | bit "i" is 1, then the corresponding thing "i" is in the
               | collection.
        
               | doomrobo wrote:
               | Take a 64 bit number and treat it like a bit field where
               | each bit represents inclusion/exclusion of an element in
               | a set S of size 64. Thus every number with M 1s
               | represents a (distinct) M-sized subset of S
        
             | kccqzy wrote:
             | That's certainly a very smart way of doing so.
             | 
             | Years ago, I had a similar need and I consulted TAOCP to
             | find the "standard" algorithm to do so, which is done by a
             | binomial tree. The tree T is defined recursively as T(0) =
             | nil and T(n) = [Cons(0, T(0)), Cons(1, T(1)), ...,
             | Cons(n-1, T(n-1))]. There's no need to construct the tree
             | explicitly; it's just a conceptual formalism that helps
             | formulate an algorithm to traverse the tree.
             | 
             | The text did mention using bitwise operations to do so; in
             | fact it says it presents a "remarkable sequence of seven
             | bitwise operations that will convert any binary number to
             | the lexicographically next t-combination" but I haven't
             | been able to find those seven operations.
        
           | psychphysic wrote:
           | It's filtering by popcount effectively [0] and crops up quite
           | a bit.
           | 
           | [0] https://vaibhavsagar.com/blog/2019/09/08/popcount/
        
             | psychphysic wrote:
             | I misunderstood the task
        
         | adamnemecek wrote:
         | What is the correct implementation of this?
        
           | psychphysic wrote:
           | Have a look at popcount[0]. It's even a single instruction on
           | many platforms [1].
           | 
           | [0] https://en.cppreference.com/w/cpp/numeric/popcount
           | 
           | [1] https://vaibhavsagar.com/blog/2019/09/08/popcount/
        
             | danbruc wrote:
             | This gives you the number of set bits, not the next larger
             | number with the same number of set bits.
        
               | psychphysic wrote:
               | Filter larger, zip Popcount, filter with same popcount,
               | min.
               | 
               | If you need the full implementation spelled out I can do
               | that when I get in from my commute home.
        
               | anonymoushn wrote:
               | Step 1 is computing a set of ~2^64 things?
        
               | psychphysic wrote:
               | Lol just gotten in and looked at this at a desktop I
               | thought we were filtering a list. My bad. Funny though.
               | 
               | Interesting task I guess really you want to find the
               | first (from right) and move the first set bit left to
               | fill it.
        
               | danbruc wrote:
               | You could keep incrementing from the current value until
               | the population count matches again, but in the worst case
               | - probably from 2^62 to 2^63 with a single set bit or
               | wrapping around from 2^63 to 2^0 - that would still take
               | pretty much 2^64 steps.
        
           | anonymoushn wrote:
           | I think something like this would handle it
           | const std = @import("std");              fn snoob(x_:
           | anytype) @TypeOf(x_) {         const T = @TypeOf(x_);
           | const U = comptime std.meta.Int(.unsigned,
           | std.meta.bitCount(T));         const x = @bitCast(U, x_);
           | const tz = @ctz(x);         const pc = @popCount(x);
           | const L2U = comptime std.meta.Int(.unsigned,
           | std.meta.bitCount(@TypeOf(tz))-1);         if (tz + pc ==
           | std.meta.bitCount(T)) {           if (pc == 0) {
           | return 0;           }           return @bitCast(T, (x >>
           | @intCast(L2U, tz)));         }         const smallest =
           | @as(U, 1) << @intCast(L2U, tz);         const ripple = x +
           | smallest;         const ones = ((x ^ ripple) >> 2) >>
           | @intCast(L2U, tz);         return ripple | ones;       }
           | pub fn main() void { std.debug.print("Hello, {d}!",
           | .{snoob(@as(u32,5))}); }
           | 
           | In addition to the stated problems with wraparound and zero
           | (claimed fixed here) the implementation in this header file
           | seems to have some potential correctness issue if you pass in
           | a signed type and end up performing right shifts on values
           | with the sign bit set and your compiler decides this means
           | you want sign extension.
           | 
           | Edit: translation back to C++ using other functions in this
           | header and assuming that a bt_unsigned exists which is
           | similar to the already included bt_signed:
           | template <typename T>       inline T
           | GetNextWithEqualBitCount(T x_) {         bt_unsigned x,
           | smallest, ripple, ones;         int tz, pc;         x =
           | (bt_unsigned)x_;         tz = CountTrailing0Bits(x);
           | pc = CountBits(x);         if (tz + pc == sizeof(x)*8) {
           | if (pc == 0) {             return 0;           }
           | return (T)(x >> tz);         }         smallest =
           | ((bt_unsigned)1) << tz;         ripple = x + smallest;
           | ones = ((x ^ ripple) >> 2) >> tz;         return (T)(ripple |
           | ones);       }
           | 
           | runnable demo: https://www.sololearn.com/compiler-
           | playground/c6iz3kHpZ3zP
        
             | anonymoushn wrote:
             | The 2nd branch here only exists because shifting a uint32_t
             | right by 32 is undefined. You can avoid it if you do 2
             | shifts instead. The first branch is tougher to get rid of.
        
         | smcl wrote:
         | Full marks for honesty, IMO. It'd be tempting to delete this
         | before making it publicly visisble.
        
       | news_to_me wrote:
       | I just got a copy of Hacker's Delight, which seems to show the
       | same sorts of tricks. I haven't read very much of it yet, but I
       | wonder how many of these came from that book or a common source.
       | It would be fun to read more about the history of these tricks.
       | 
       | https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
        
       | ladberg wrote:
       | I wonder how many of these have actually been used vs someone
       | thought of a cool bitwise trick for a problem that doesn't exist.
       | Can anyone think of a use-case for TurnOffLowestContiguousBits?
        
         | dansalvato wrote:
         | I'm working on a game written in m68k Assembly. In cases where
         | I'm trying to make every CPU cycle count, these kinds of novel
         | bitwise operations really catch my interest. TurnOffLowestBit
         | seems useful for an effect I've had in mind for a while: A
         | graphic gets "erased" one pixel at a time from right to left,
         | but with some random variance for each horizontal line.
         | Reliably turning off the lowest bit might be a good solution.
         | Maybe throwing in TurnOffLowestContiguousBits would help to
         | introduce some variance in a visually interesting way.
         | 
         | Or maybe it wouldn't. But in order for me to consider these
         | operations as potential solutions to a problem, I have to know
         | they exist. Even if they beckon for me to invent a problem they
         | can solve, I'm totally open to that kind of creative
         | inspiration.
         | 
         | "Check out this neat bitwise thing you can do in only 12 CPU
         | cycles!" That's definitely the jam of Assembly and/or demoscene
         | coders, haha.
        
         | [deleted]
        
       | butz wrote:
       | Just imagine performance of modern software if developers took
       | time to optimize it, like they did it in the old days.
        
         | saagarjha wrote:
         | Bit hacks are but a tangent to performance optimization, of
         | course.
        
       | mywittyname wrote:
       | inline uint32_t Log2(uint32_t x)         {          const union {
       | float f; uint32_t i; } converter = { (float)x };          return
       | (converter.i >> 23) - 127;         }
       | 
       | This is interesting. It looks like it's returning the (biased
       | corrected) value of mantissa.
       | 
       | I'm surprised that there isn't a function supplied by the
       | compilers or in the processor that takes an int and returns the
       | mantissa. After all, every int -> float conversion requires this
       | step.
        
         | thechao wrote:
         | https://en.cppreference.com/w/c/numeric/math/frexp
        
         | electroly wrote:
         | Take a look at the C header <ieee754.h>. It's not on all
         | systems but it provides a union for type punning between a
         | double and its decomposed parts. IEEE754_DOUBLE_BIAS is
         | provided for bias correction. The same are available for
         | floats.
         | 
         | (Unfortunately, the "not on all systems" part is why I don't
         | use it, but instead copy the union and the constant verbatim
         | into the project if I need it.)
        
           | asveikau wrote:
           | I thought that type punning via union is against the
           | standard.
           | 
           | I personally do not think it should be ... It's a very handy
           | tool.
        
             | electroly wrote:
             | I think it may be a GNU extension to explicitly allow it,
             | but I'm no language lawyer. This header is from glibc; it
             | may explain why it's not available on some platforms.
        
               | saagarjha wrote:
               | It's legal in C.
        
         | 10000truths wrote:
         | Modern CPUs have a count-leading-zeros instruction to calculate
         | floor(log2(x)) efficiently. GCC and clang expose this with the
         | __builtin_clz* family of intrinsics. MSVC exposes this with
         | _BitScanReverse.                 static inline uint64_t
         | floor_log2(uint64_t x) {           return 63 -
         | __builtin_clzll(x);       }
        
           | vinkelhake wrote:
           | And C++ (as of C++20) exposes this and more in the header
           | <bit>.
           | 
           | https://en.cppreference.com/w/cpp/header/bit
        
         | ot wrote:
         | > It looks like it's returning the (biased corrected) value of
         | mantissa.
         | 
         | I think you mean exponent?
        
           | mywittyname wrote:
           | Yes, oops.
        
       | thomasahle wrote:
       | Funny they didn't just extend the 32 bit reverse algorithm to 64
       | bits:
       | https://github.com/electronicarts/EAStdC/blob/master/include...
        
         | SideQuark wrote:
         | On many platforms a shift by more than 31 bits often did
         | nothing, despite it should do something. Intel, for example,
         | masked the count at 5 bits, breaking a lot of 64 bit compiler
         | hacks.
         | 
         | There is also the issue of cache and speed - expanding to full
         | size would have different footprints, so perhaps they tested
         | that too.
         | 
         | This is likely a result to work around those problems.
        
         | rsaxvc wrote:
         | I haven't tested this, but on Arm32 platforms, this isn't a bad
         | way to do it n some 32-bit machines.
         | 
         | If the compiler recognizes the 32-bit approach and rbit is
         | available, the 64-bit variant is usually 2 rbits with the
         | registers swapped.
        
           | thomasahle wrote:
           | If the compiler can recognize the 32-bit version, it really
           | should be able to recognize the 64-bit version too, no?
        
             | rsaxvc wrote:
             | Ideally yes. Only last year did Clang 13 learn the 32-bit
             | one. GCC doesn't understand either. I haven't checked clang
             | for the 64-bit one.
        
         | chrchang523 wrote:
         | Yes, this library is of uneven quality, and would benefit from
         | a few hours of focused attention from a specialist. E.g. the
         | various HighestBit functions below what you've called out also
         | look inefficient relative to something using one of the
         | builtin_clz intrinsics, even though there's an earlier use of
         | builtin_clz...
        
           | asveikau wrote:
           | I think a lot of times the way a header like this comes into
           | being is that someone has a narrow need for an operation that
           | is deemed reusable and abstract. The 64 bit version of a 32
           | bit bit utility doesn't have a need at the time. Then
           | somebody comes in later trying to fill out more stuff.
        
       ___________________________________________________________________
       (page generated 2022-12-19 23:00 UTC)