* * * * * Yet even more stupid benchmarks Yet another silly optimization problem [1]. This time, from a silly coding challenge [2] to find the number of integers expressible with unique digits (that is, no single digit repeats) in a base-10 representation up to the value 10,000,000,000 (there are 8,877,690 such numbers, by the way). The neatest and fastest solution was final program on this page [3], written in C#. It generates only such numbers; it doesn't try to test each number. Since I don't use C#, I decided to translate the code in C to play around with it. Wasn't all that hard: > #include > #include > > int total = 0; > const int pre[(1 << 10) + 1] /* = { ... } */ ; > > void generate2( > int maxlen, > int currentlen, > int availabledigits, > int currentvalue > ) > { > int last = (currentlen == maxlen - 1); > int x = availabledigits; > > while(x != 0) > { > int digit = pre[x ^ (x & (x - 1))]; > x &= (x - 1); > > if (digit == 0 && currentvalue == 0) > continue; > > if (last) > ++total; > else > generate2( > maxlen, > currentlen + 1, > availabledigits & ~(1 << digit), > (currentvalue * 10) + digit > ); > } > } > > int main(int argc,char *argv[]) > { > int len; > > for (len = 1 ; len <= 10 ; len++) > generate2(len,0,0xFFF >> 2,0); > > printf("total: %d\n",total); > return EXIT_SUCCESS; > } > I pregenerated the pre[] array since I wanted this to run as fast as possible. The code used to generate the array: > for (i = 0 ; i <= 10 ; i++) > pre[1 << i] = i; > Anyway, once written and compiled (gcc -O4 -fomit-frame-pointer f.c) it ran in about 0.2 seconds (average run) on a 2.6GHz (gigaHertz) machine. Fast, but I could go faster by running it across the two CPU (Central Processing Unit)s in the box. I was expecting about half the runtime, since this is easily parallelizable. It ran in about 0.16 seconds, a rather disappointing ¾ time. I commented out the code in generate2() just to test the overhead of threading and syncronization and that isn't a factor (program ran in 0.001 seconds). Undaunted, I decided to try one of the quad-core boxes at The Office. Reworked the code a bit to split the load between four CPUs as evenly as possible, and ran some tests. 0.13 seconds on average. Still not quite half the speed. Hmmm … [1] gopher://gopher.conman.org/0Phlog:2008/08/07.2 [2] http://beust.com/weblog/archives/000491.html [3] http://www.indiangeek.net/2008/08/29/a-case-study-in-micro-optimization/ Email Sean Conner at sean@conman.org .