[HN Gopher] Computing the number of digits of an integer even fa...
___________________________________________________________________
Computing the number of digits of an integer even faster
Author : vitaut
Score : 87 points
Date : 2021-06-04 14:50 UTC (8 hours ago)
(HTM) web link (lemire.me)
(TXT) w3m dump (lemire.me)
| h2odragon wrote:
| Digits = LEN( STR$( SomeNumber ))
|
| ( _run and hide in shame_ )
| foobarrio wrote:
| you need an ABS( ) in there no or else the negative sign gets
| counted? Also some languages will write really big numbers or
| really small numbers in engineering notation (eg 10,000,000
| becomes 10e6) but that's typically only for floating point
| types.
| opheliate wrote:
| The article only presents a function operating on unsigned
| integers, it's not unreasonable to suggest that this is
| intended to do the same ;)
| FartyMcFarter wrote:
| It'd be interesting to see a performance benchmark of the various
| solutions, including ones that don't use lookup tables.
|
| The results of such a benchmark would probably heavily depend on
| whether the lookup table is allowed to be in the CPU caches.
| chrisseaton wrote:
| This blog post is a response to another blog post which is a
| response to a question about how to implement integer-to-string
| length as quickly as possible in an optimising implementation of
| the Ruby programming language called TruffleRuby, if people here
| wanted a concrete example of how these kind of brain-teasers come
| up for real in industrial programming in an everyday job.
| silvestrov wrote:
| The big table becomes more obvious when the constants are written
| using hex: static uint64_t table[] = {
| 0x100000000, 0x1FFFFFFF6, 0x1FFFFFFF6, 0x1FFFFFFF6,
| 0x2FFFFFF9C, 0x2FFFFFF9C, 0x2FFFFFF9C, 0x3FFFFFC18,
| 0x3FFFFFC18, 0x3FFFFFC18, 0x4FFFFD8F0, 0x4FFFFD8F0,
| 0x4FFFFD8F0, 0x4FFFFD8F0, 0x5FFFE7960, 0x5FFFE7960,
| 0x5FFFE7960, 0x6FFF0BDC0, 0x6FFF0BDC0, 0x6FFF0BDC0,
| 0x7FF676980, 0x7FF676980, 0x7FF676980, 0x7FF676980,
| 0x8FA0A1F00, 0x8FA0A1F00, 0x8FA0A1F00, 0x9C4653600,
| 0x9C4653600, 0x9C4653600, 0xA00000000, 0xA00000000,
| };
| kwillets wrote:
| Originally I used a macro:
|
| // this increments the upper 32 bits (log10 T - 1) when >= T is
| added
|
| #define K(T) (((sizeof(#T)-1)<<32) - T)
|
| (T is 10,100, etc.)
| openasocket wrote:
| Awesome find! But I wonder how easily this could be extended to
| 64 bit integers? That sequence in the lookup table won't fit into
| 64 bits if it's extended. I think you would have to fall back to
| just having the lookup table storing the transition data, as
| described in https://commaok.xyz/post/lookup_tables/ in the "A
| Small Step" section. And probably add a conditional in the
| beginning to handle very large integers that would overflow.
| Alternatively you might be able to do something with two lookup
| tables, but I think at that point you've got enough lookup data
| that it would be less efficient.
| failwhaleshark wrote:
| You'd need 128-bit integers because of the way this works in
| the intermediate steps before the shift. There are u128 types
| in many modern compilers, but it's probably cheaper just to use
| a different type (u64) of table and another operation.
| kwillets wrote:
| It's easier when there are more bits available than in the
| original operand.
|
| For 64, a variable shift may work. Powers of 10 have a lot of
| trailing 0 bits.
| Zamicol wrote:
| I sometimes have difficulty finding efficient algorithms for
| specific tasks like this. Any advice on becoming better at
| finding efficient algorithms?
| klodolph wrote:
| I don't have any tips. Micro-optimization at this level often
| requires creative leaps, and proving that they work can be
| tricky (although for 32-bit integers, you can just iterate over
| all of them to validate). There are web pages with "bit-
| twiddling hacks"
| (https://graphics.stanford.edu/~seander/bithacks.html), and
| sometimes you can find a solution for your problem with a
| superoptimizer (if you care enough).
| knuthsat wrote:
| One other way to prove these algorithms is by writing them in
| z3 prover. You write a simple one that is 100% correct and
| then you prove equality for your optimized one or find an
| example that will show inequality.
| chrisseaton wrote:
| A great thing about a monadic operation that takes a 32-bit
| integer, like this one, is that you can also just
| exhaustively test it in reasonable time!
| Dylan16807 wrote:
| Today I learned "monadic" has two significantly different
| meanings in the realm of programming.
|
| I'm only familiar with using "unary" to mean single-
| parameter.
| Filligree wrote:
| Nah, they're pretty much the same meaning.
| Dylan16807 wrote:
| A monad, derived from monoid, means multiple very
| specific things completely unrelated to the other
| definition of "single parameter".
|
| And the etymology is basically a coincidence because
| "mono" got used in different places, isn't it?
| kevin_thibedeau wrote:
| Buy a copy of Hacker's Delight.
| legends2k wrote:
| Here's a gem called _Matters Computational_; a storehouse of
| many such tricks with proper categorization and index.
|
| https://www.jjj.de/fxt/fxtbook.pdf
| an1sotropy wrote:
| holy moly this is awesome; thanks!
| freeone3000 wrote:
| Let's assume you've studied the well-known computer science
| tricks like divide and conquer, greedy algorithms, and the
| various tree searches, and that's insufficient, even after
| specializing for your reduced domain. (These will get you 90%
| of the way there 90% of the time, and usually that's good
| enough. Sometimes it's not.)
|
| For integer tricks like this, you'll want to look at number
| theory. Modular equivalence is particularly useful, but lots of
| things in that space are useful. However, there are a lot of
| other tricks that rely on geometric sequences, or geometry
| themselves, and other higher maths -- a lot of advances like
| this are simply connecting random theorum a to situation b, if
| you can envision the situation in the correct space.
| detay wrote:
| it all goes back to 650x, where we had no multiplication/division
| and we had to create sin/cos tables.
| TacticalCoder wrote:
| And even on stuff like the 68000 on the Amiga/Atari ST we were
| using sin/cos tables for lookups were way faster than doing
| multiplication/divisions if I recall correctly. I have fond
| memories of tools allowing to create these tables for "sprites"
| movements (in either demos or games) but forgot how these
| little tools were called: maybe "table editor"?
| failwhaleshark wrote:
| I recall needing this on a snprintf() implementation.
|
| Isn't this effectively a table-optimized logBase(x) rounded-up to
| the ceiling, which can be reduced to efficient
| log2(x)/log2(Base)? If you know the base, then you can combine
| many ops into table-lookups.
|
| As an exercise to the reader, I suggest creating efficient code
| constrained to 32-bit or 16-bit unsigned.
|
| PS: I was going to say "look at _Hacker 's Delight_." I can
| definitely read. LOL.
| gameswithgo wrote:
| No way to leverage popcount instruction for this?
| Scaevolus wrote:
| log2 is generally implemented with some variant of the "count
| leading zeros" instruction, but the general popcount is
| irrelevant
| pklausler wrote:
| int->float conversion, followed by extraction and unbiasing
| of the exponent, also works for log2 and is more portable.
| failwhaleshark wrote:
| I don't normally convert my ints to floats, but when I do,
| I *(float *)(&x). ;-) (or fild)
| chrisseaton wrote:
| lzcnt is the first instruction in his solution. Seems it's left
| as implicit in this post, but if you go back through the chain
| of posts it's there.
| failwhaleshark wrote:
| LZCNT (base 2) would get you to a range of possible decimal
| digit (base 10) lengths that would need further processing,
| but it's still more expensive than a table-lookup direct
| solution directly to the base you're interested in.
| chrisseaton wrote:
| > LZCNT (base 2) would get you to a range of possible
| decimal digit (base 10) lengths that would need further
| processing, but it's still more expensive than a table-
| lookup direct solution directly to the base you're
| interested in.
|
| Are you saying you've got a better solution than the blog
| post that does a direct table lookup? Can you share it?
| pbsd wrote:
| If you insist, you could calculate the log2 as something like
| x |= x >> 1; x |= x >> 2; x |= x >> 4;
| x |= x >> 8; x |= x >> 16; return
| popcount(x-1);
|
| but there isn't really much of an advantage to it. lzcnt/bsr
| are better.
| spatulon wrote:
| Here's one interesting application of a digit-count function.
|
| The naive implementation of an integer-to-string conversion
| involves writing the number backwards, starting from the least
| significant digit, and then reversing the string at the end.
|
| In a 2017 talk titled "Fastware"[1], Andrei Alexandrescu showed
| that it's faster to start by counting the number of digits you're
| going to print, and then you can print by working backwards from
| the least significant digit. It comes out in the correct order
| without any need to reverse at the end.
|
| His implementation of the digit count was itself interesting,
| since it does 4 comparisons per loop and then a divide by 10000,
| instead of the naive single comparison and divide by 10.
| uint32_t digits10(uint64_t v) { uint32_t result = 1;
| for (;;) { if (v < 10) return result; if
| (v < 100) return result + 1; if (v < 1000) return
| result + 2; if (v < 10000) return result + 3;
| v /= 10000U; result += 4; } }
|
| IIRC, his insight was that smaller numbers tend to occur more
| often in the real world, so most calls to this function will not
| end up doing any divisions.
|
| [1] https://www.youtube.com/watch?v=o4-CwDo2zpg
| travisjungroth wrote:
| I imagine his code letting out a sigh when it gets to v/=
| 10000U.
|
| "ok, fine..."
| WalterBright wrote:
| > Alexei Andrescu
|
| Andrei Alexandrescu
| spatulon wrote:
| Thanks, corrected.
| cecilpl2 wrote:
| Why wouldn't you unroll the entire loop so no divisions are
| ever required?
| Bayart wrote:
| That's clever. A uint64 doesn't have more that 20 digits, so
| I'm guessing returning a uint8 over a uint32 might be better in
| memory-bound cases.
| vlovich123 wrote:
| No, it really wouldn't matter in the grand scheme of things,
| especially since the result will live in a register. Even if
| spilled to the stack, this one usage isn't going to matter vs
| all the other stuff that gets regularly spilled.
| pkaye wrote:
| Also most compilers can optimize the divide by constant to a
| multiply and shift which are much faster than a divide.
| Bayart wrote:
| A few days ago I had to write a bit of code in C# with an int to
| string conversion, and since I never use that language and I'm
| not used to OOP I completely looked over the ToString() method
| every primitive type has, I wrote a dumb conversion like I used
| to do as a student in C. The basic modulo 10 loop thing. You know
| the one.
|
| Anyway I'm glad to know there's a cool lookup table trick out
| there, I never really get to use anything like that but it makes
| for a nice read.
| toolslive wrote:
| you can also base your solution on hamilton cycles. that also
| boils down to o(1).
| mcherm wrote:
| > you can also base your solution on hamilton cycles
|
| I don't know what that means. Can you illustrate (preferably
| with a code example)?
| toolslive wrote:
| https://incubaid.wordpress.com/2012/01/24/number-of-
| trailing...
|
| is part of the solution. the other part is getting to the
| most significant digit.
| jedberg wrote:
| This is a fun optimization, but what is a use case where I would
| need to count the number of digits?
| chrisseaton wrote:
| Formatting numbers into strings is a pretty fundamental
| operation to need to do in your fast-path. For example
| formatting time into log lines. To do that we need to know how
| many digits the milliseconds component has so we can pad it
| with zeros correctly. That's what motivated this particular
| question in our case.
| jedberg wrote:
| That makes sense. Never really thought about that as a
| counting digits problem but yeah.
___________________________________________________________________
(page generated 2021-06-04 23:01 UTC)