[HN Gopher] Computing the number of digits of an integer even fa...
       ___________________________________________________________________
        
       Computing the number of digits of an integer even faster
        
       Author : vitaut
       Score  : 87 points
       Date   : 2021-06-04 14:50 UTC (8 hours ago)
        
 (HTM) web link (lemire.me)
 (TXT) w3m dump (lemire.me)
        
       | h2odragon wrote:
       | Digits = LEN( STR$( SomeNumber ))
       | 
       | ( _run and hide in shame_ )
        
         | foobarrio wrote:
         | you need an ABS( ) in there no or else the negative sign gets
         | counted? Also some languages will write really big numbers or
         | really small numbers in engineering notation (eg 10,000,000
         | becomes 10e6) but that's typically only for floating point
         | types.
        
           | opheliate wrote:
           | The article only presents a function operating on unsigned
           | integers, it's not unreasonable to suggest that this is
           | intended to do the same ;)
        
       | FartyMcFarter wrote:
       | It'd be interesting to see a performance benchmark of the various
       | solutions, including ones that don't use lookup tables.
       | 
       | The results of such a benchmark would probably heavily depend on
       | whether the lookup table is allowed to be in the CPU caches.
        
       | chrisseaton wrote:
       | This blog post is a response to another blog post which is a
       | response to a question about how to implement integer-to-string
       | length as quickly as possible in an optimising implementation of
       | the Ruby programming language called TruffleRuby, if people here
       | wanted a concrete example of how these kind of brain-teasers come
       | up for real in industrial programming in an everyday job.
        
       | silvestrov wrote:
       | The big table becomes more obvious when the constants are written
       | using hex:                 static uint64_t table[] = {
       | 0x100000000, 0x1FFFFFFF6, 0x1FFFFFFF6,         0x1FFFFFFF6,
       | 0x2FFFFFF9C, 0x2FFFFFF9C,         0x2FFFFFF9C, 0x3FFFFFC18,
       | 0x3FFFFFC18,         0x3FFFFFC18, 0x4FFFFD8F0, 0x4FFFFD8F0,
       | 0x4FFFFD8F0, 0x4FFFFD8F0, 0x5FFFE7960,         0x5FFFE7960,
       | 0x5FFFE7960, 0x6FFF0BDC0,         0x6FFF0BDC0, 0x6FFF0BDC0,
       | 0x7FF676980,         0x7FF676980, 0x7FF676980, 0x7FF676980,
       | 0x8FA0A1F00, 0x8FA0A1F00, 0x8FA0A1F00,         0x9C4653600,
       | 0x9C4653600, 0x9C4653600,         0xA00000000, 0xA00000000,
       | };
        
         | kwillets wrote:
         | Originally I used a macro:
         | 
         | // this increments the upper 32 bits (log10 T - 1) when >= T is
         | added
         | 
         | #define K(T) (((sizeof(#T)-1)<<32) - T)
         | 
         | (T is 10,100, etc.)
        
       | openasocket wrote:
       | Awesome find! But I wonder how easily this could be extended to
       | 64 bit integers? That sequence in the lookup table won't fit into
       | 64 bits if it's extended. I think you would have to fall back to
       | just having the lookup table storing the transition data, as
       | described in https://commaok.xyz/post/lookup_tables/ in the "A
       | Small Step" section. And probably add a conditional in the
       | beginning to handle very large integers that would overflow.
       | Alternatively you might be able to do something with two lookup
       | tables, but I think at that point you've got enough lookup data
       | that it would be less efficient.
        
         | failwhaleshark wrote:
         | You'd need 128-bit integers because of the way this works in
         | the intermediate steps before the shift. There are u128 types
         | in many modern compilers, but it's probably cheaper just to use
         | a different type (u64) of table and another operation.
        
         | kwillets wrote:
         | It's easier when there are more bits available than in the
         | original operand.
         | 
         | For 64, a variable shift may work. Powers of 10 have a lot of
         | trailing 0 bits.
        
       | Zamicol wrote:
       | I sometimes have difficulty finding efficient algorithms for
       | specific tasks like this. Any advice on becoming better at
       | finding efficient algorithms?
        
         | klodolph wrote:
         | I don't have any tips. Micro-optimization at this level often
         | requires creative leaps, and proving that they work can be
         | tricky (although for 32-bit integers, you can just iterate over
         | all of them to validate). There are web pages with "bit-
         | twiddling hacks"
         | (https://graphics.stanford.edu/~seander/bithacks.html), and
         | sometimes you can find a solution for your problem with a
         | superoptimizer (if you care enough).
        
           | knuthsat wrote:
           | One other way to prove these algorithms is by writing them in
           | z3 prover. You write a simple one that is 100% correct and
           | then you prove equality for your optimized one or find an
           | example that will show inequality.
        
             | chrisseaton wrote:
             | A great thing about a monadic operation that takes a 32-bit
             | integer, like this one, is that you can also just
             | exhaustively test it in reasonable time!
        
               | Dylan16807 wrote:
               | Today I learned "monadic" has two significantly different
               | meanings in the realm of programming.
               | 
               | I'm only familiar with using "unary" to mean single-
               | parameter.
        
               | Filligree wrote:
               | Nah, they're pretty much the same meaning.
        
               | Dylan16807 wrote:
               | A monad, derived from monoid, means multiple very
               | specific things completely unrelated to the other
               | definition of "single parameter".
               | 
               | And the etymology is basically a coincidence because
               | "mono" got used in different places, isn't it?
        
         | kevin_thibedeau wrote:
         | Buy a copy of Hacker's Delight.
        
         | legends2k wrote:
         | Here's a gem called _Matters Computational_; a storehouse of
         | many such tricks with proper categorization and index.
         | 
         | https://www.jjj.de/fxt/fxtbook.pdf
        
           | an1sotropy wrote:
           | holy moly this is awesome; thanks!
        
         | freeone3000 wrote:
         | Let's assume you've studied the well-known computer science
         | tricks like divide and conquer, greedy algorithms, and the
         | various tree searches, and that's insufficient, even after
         | specializing for your reduced domain. (These will get you 90%
         | of the way there 90% of the time, and usually that's good
         | enough. Sometimes it's not.)
         | 
         | For integer tricks like this, you'll want to look at number
         | theory. Modular equivalence is particularly useful, but lots of
         | things in that space are useful. However, there are a lot of
         | other tricks that rely on geometric sequences, or geometry
         | themselves, and other higher maths -- a lot of advances like
         | this are simply connecting random theorum a to situation b, if
         | you can envision the situation in the correct space.
        
       | detay wrote:
       | it all goes back to 650x, where we had no multiplication/division
       | and we had to create sin/cos tables.
        
         | TacticalCoder wrote:
         | And even on stuff like the 68000 on the Amiga/Atari ST we were
         | using sin/cos tables for lookups were way faster than doing
         | multiplication/divisions if I recall correctly. I have fond
         | memories of tools allowing to create these tables for "sprites"
         | movements (in either demos or games) but forgot how these
         | little tools were called: maybe "table editor"?
        
       | failwhaleshark wrote:
       | I recall needing this on a snprintf() implementation.
       | 
       | Isn't this effectively a table-optimized logBase(x) rounded-up to
       | the ceiling, which can be reduced to efficient
       | log2(x)/log2(Base)? If you know the base, then you can combine
       | many ops into table-lookups.
       | 
       | As an exercise to the reader, I suggest creating efficient code
       | constrained to 32-bit or 16-bit unsigned.
       | 
       | PS: I was going to say "look at _Hacker 's Delight_." I can
       | definitely read. LOL.
        
       | gameswithgo wrote:
       | No way to leverage popcount instruction for this?
        
         | Scaevolus wrote:
         | log2 is generally implemented with some variant of the "count
         | leading zeros" instruction, but the general popcount is
         | irrelevant
        
           | pklausler wrote:
           | int->float conversion, followed by extraction and unbiasing
           | of the exponent, also works for log2 and is more portable.
        
             | failwhaleshark wrote:
             | I don't normally convert my ints to floats, but when I do,
             | I *(float *)(&x). ;-) (or fild)
        
         | chrisseaton wrote:
         | lzcnt is the first instruction in his solution. Seems it's left
         | as implicit in this post, but if you go back through the chain
         | of posts it's there.
        
           | failwhaleshark wrote:
           | LZCNT (base 2) would get you to a range of possible decimal
           | digit (base 10) lengths that would need further processing,
           | but it's still more expensive than a table-lookup direct
           | solution directly to the base you're interested in.
        
             | chrisseaton wrote:
             | > LZCNT (base 2) would get you to a range of possible
             | decimal digit (base 10) lengths that would need further
             | processing, but it's still more expensive than a table-
             | lookup direct solution directly to the base you're
             | interested in.
             | 
             | Are you saying you've got a better solution than the blog
             | post that does a direct table lookup? Can you share it?
        
         | pbsd wrote:
         | If you insist, you could calculate the log2 as something like
         | x |= x >>  1;         x |= x >>  2;         x |= x >>  4;
         | x |= x >>  8;         x |= x >> 16;         return
         | popcount(x-1);
         | 
         | but there isn't really much of an advantage to it. lzcnt/bsr
         | are better.
        
       | spatulon wrote:
       | Here's one interesting application of a digit-count function.
       | 
       | The naive implementation of an integer-to-string conversion
       | involves writing the number backwards, starting from the least
       | significant digit, and then reversing the string at the end.
       | 
       | In a 2017 talk titled "Fastware"[1], Andrei Alexandrescu showed
       | that it's faster to start by counting the number of digits you're
       | going to print, and then you can print by working backwards from
       | the least significant digit. It comes out in the correct order
       | without any need to reverse at the end.
       | 
       | His implementation of the digit count was itself interesting,
       | since it does 4 comparisons per loop and then a divide by 10000,
       | instead of the naive single comparison and divide by 10.
       | uint32_t digits10(uint64_t v) {           uint32_t result = 1;
       | for (;;) {             if (v < 10) return result;             if
       | (v < 100) return result + 1;             if (v < 1000) return
       | result + 2;             if (v < 10000) return result + 3;
       | v /= 10000U;             result += 4;           }         }
       | 
       | IIRC, his insight was that smaller numbers tend to occur more
       | often in the real world, so most calls to this function will not
       | end up doing any divisions.
       | 
       | [1] https://www.youtube.com/watch?v=o4-CwDo2zpg
        
         | travisjungroth wrote:
         | I imagine his code letting out a sigh when it gets to v/=
         | 10000U.
         | 
         | "ok, fine..."
        
         | WalterBright wrote:
         | > Alexei Andrescu
         | 
         | Andrei Alexandrescu
        
           | spatulon wrote:
           | Thanks, corrected.
        
         | cecilpl2 wrote:
         | Why wouldn't you unroll the entire loop so no divisions are
         | ever required?
        
         | Bayart wrote:
         | That's clever. A uint64 doesn't have more that 20 digits, so
         | I'm guessing returning a uint8 over a uint32 might be better in
         | memory-bound cases.
        
           | vlovich123 wrote:
           | No, it really wouldn't matter in the grand scheme of things,
           | especially since the result will live in a register. Even if
           | spilled to the stack, this one usage isn't going to matter vs
           | all the other stuff that gets regularly spilled.
        
         | pkaye wrote:
         | Also most compilers can optimize the divide by constant to a
         | multiply and shift which are much faster than a divide.
        
       | Bayart wrote:
       | A few days ago I had to write a bit of code in C# with an int to
       | string conversion, and since I never use that language and I'm
       | not used to OOP I completely looked over the ToString() method
       | every primitive type has, I wrote a dumb conversion like I used
       | to do as a student in C. The basic modulo 10 loop thing. You know
       | the one.
       | 
       | Anyway I'm glad to know there's a cool lookup table trick out
       | there, I never really get to use anything like that but it makes
       | for a nice read.
        
       | toolslive wrote:
       | you can also base your solution on hamilton cycles. that also
       | boils down to o(1).
        
         | mcherm wrote:
         | > you can also base your solution on hamilton cycles
         | 
         | I don't know what that means. Can you illustrate (preferably
         | with a code example)?
        
           | toolslive wrote:
           | https://incubaid.wordpress.com/2012/01/24/number-of-
           | trailing...
           | 
           | is part of the solution. the other part is getting to the
           | most significant digit.
        
       | jedberg wrote:
       | This is a fun optimization, but what is a use case where I would
       | need to count the number of digits?
        
         | chrisseaton wrote:
         | Formatting numbers into strings is a pretty fundamental
         | operation to need to do in your fast-path. For example
         | formatting time into log lines. To do that we need to know how
         | many digits the milliseconds component has so we can pad it
         | with zeros correctly. That's what motivated this particular
         | question in our case.
        
           | jedberg wrote:
           | That makes sense. Never really thought about that as a
           | counting digits problem but yeah.
        
       ___________________________________________________________________
       (page generated 2021-06-04 23:01 UTC)