suckless.org

       STATUS - libzahl - big integer library
 (HTM) git clone git://git.suckless.org/libzahl
 (DIR) Log
 (DIR) Files
 (DIR) Refs
 (DIR) README
 (DIR) LICENSE
       ---
       STATUS (7064B)
       ---
            1 Optimisation progress for libzahl. Benchmarks are done in the
            2 range 1 to 4097 bits. So far all benchmarks on libzahl are
            3 done with the following combinations of cc and libc:
            4 
            5     gcc + glibc
            6     gcc + musl
            7     clang + glibc
            8 
            9 Benchmarks on the other libraries are done with gcc and glibc.
           10 
           11 All benchmarks are done on an x86-64 (specifically an Intel
           12 Core 2 Quad CPU Q9300), without any extensions turned on
           13 during compilation, and without any use of extensions in
           14 assembly code. The benchmarks are performed with Linux as
           15 the OS's kernel with 50 µs timer slack, and the benchmarking
           16 processes are fixed to one CPU.
           17 
           18 
           19   The following functions are probably implemented optimally:
           20 
           21 zset .................... always fastest
           22 zseti(a, +) ............. tomsfastmath is faster
           23 zseti(a, -) ............. tomsfastmath is faster
           24 zsetu ................... tomsfastmath is faster
           25 zswap ................... always fastest
           26 zzero ................... always fastest (shared with gmp)
           27 zsignum ................. always fastest (shared with gmp)
           28 zeven ................... always fastest (shared with gmp)
           29 zodd .................... always fastest (shared with gmp)
           30 zeven_nonzero ........... always fastest (shared with gmp)
           31 zodd_nonzero ............ always fastest (shared with gmp)
           32 zbtest .................. always fastest
           33 zsave ................... always fastest
           34 zload ................... always fastest
           35 
           36 
           37   The following functions are probably implemented optimally, but
           38   depends on other functions or call-cases for better performance:
           39 
           40 zneg(a, b) .............. always fastest
           41 zabs(a, b) .............. always fastest
           42 ztrunc(a, b, c) ......... always fastest
           43 zbset(a, b, 1) .......... always fastest
           44 zbset(a, b, 0) .......... always fastest
           45 zbset(a, b, -1) ......... always fastest
           46 zsplit .................. alternating with gmp for fastest, but gmp is a bit faster on average
           47 
           48 
           49   The following functions are probably implemented close to
           50   optimally, further optimisation should not be a priority:
           51 
           52 zadd_unsigned ........... fastest after ~140 (depends on cc and libc) compared against zadd too
           53 ztrunc(a, a, b) ......... fastest until ~100, then 77 % (gcc) or 68 % (clang) of tomsfastmath
           54 zbset(a, a, 1) .......... always fastest
           55 zbset(a, a, 0) .......... always fastest
           56 zbset(a, a, -1) ......... always fastest (only marginally faster than gmp with clang)
           57 zlsb .................... always fastest <<suspicious>>
           58 zlsh .................... not too fast anymore
           59 zand .................... fastest after ~400, tomsfastmath before
           60 zor ..................... fastest after ~1150, tomsfastmath before
           61 zxor .................... alternative with gmp after ~700, tomsfastmath before (musl), a bit slow with glibc
           62 znot .................... always fastest
           63 
           64 
           65   The following functions require structural changes for
           66   further optimisations:
           67 
           68 zneg(a, a) .............. always fastest (shared with gmp (gcc))
           69 zabs(a, a) .............. 34 % (clang) or 8 % (gcc) of tomsfastmath
           70 
           71 
           72   The following functions are probably implemented optimally
           73   or close to optimally, except they contain some code that
           74   should not be necessary after some bugs have been fixed:
           75 
           76 zbits ................... always fastest
           77 zcmpi(a, +) ............. always fastest
           78 zcmpi(a, -) ............. always fastest
           79 zcmpu ................... always fastest
           80 
           81 
           82   It may be possible optimise the following functions
           83   further:
           84 
           85 zadd .................... fastest after ~90 (clang), ~260 (gcc+musl), or ~840 (gcc+glibc) (gcc+glibc is slow)
           86 zcmp .................... almost always fastest (musl), almost always slowest (glibc) <<suspicious (clang)>>
           87 zcmpmag ................. always fastest <<suspicious, see zcmp>>
           88 
           89 
           90   The following functions could be optimised further:
           91 
           92 zrsh .................... gmp is almost always faster; also tomsfastmath after ~3000 (gcc+glibc) or ~2800 (clang)
           93 zsub_unsigned ........... always fastest (compared against zsub too)
           94 zsub .................... always fastest (slower with gcc+glibc than gcc+musl or clang)
           95 
           96 
           97   The following functions could probably be optimised further,
           98   but their performance can be significantly improved by
           99   optimising their dependencies:
          100 
          101 zmul .................... slowest
          102 zsqr .................... slowest
          103 zstr_length(a, 10) ...... gmp is faster (clang is faster than gcc, musl is faster than glibc)
          104 zstr(a, b, n) ........... slowest
          105 
          106 
          107 musl has more stable performance than glibc. clang is better at
          108 inlining than gcc. (Which is better at optimising must be judged
          109 on a per-function basis.)
          110 
          111 
          112 
          113 {{{ [out of date legacy area, this being phased out]
          114 Optimisation progress for libzahl, compared to other big integer
          115 libraries. These comparisons are for 152-bit integers. Functions
          116 in parenthesis the right column are functions that needs
          117 optimisation to improve the peformance of the function in the
          118 left column. Double-parenthesis means there may be a better way
          119 to do it. Inside square-brackets, there are some comments on
          120 multi-bit comparisons.
          121 
          122 zgcd .................... 21 % of gmp (zcmpmag)
          123 zmodmul(big mod) ........ slowest ((zmul, zmod))
          124 zmodsqr(big mod) ........ slowest ((zmul, zmod))
          125 zmodmul(tiny mod) ....... slowest ((zmul))
          126 zmodsqr(tiny mod) ....... slowest ((zmul))
          127 zpow .................... slowest (zmul, zsqr)
          128 zpowu ................... slowest (zmul, zsqr)
          129 zmodpow ................. slowest (zmul, zsqr. zmod)
          130 zmodpowu ................ slowest (zmul, zsqr, zmod)
          131 zsets ................... 13 % of gmp
          132 zrand(default uniform) .. 51 % of gmp
          133 zptest .................. slowest (zrand, zmodpow, zsqr, zmod)
          134 zdiv(big denum) ......... tomsfastmath is faster (zdivmod)
          135 zmod(big denum) ......... fastest (zdivmod)
          136 zdivmod(big denum) ...... fastest
          137 zdiv(tiny denum) ........ slowest
          138 zmod(tiny denum) ........ slowest
          139 zdivmod(tiny denum) ..... slowest
          140 }}}
          141 
          142 
          143 
          144 Note, some corresponding functions are not implemented in
          145 some other libraries. In such cases, they have been implemented
          146 in the translation layers (found under bench/). Those
          147 implementations are often suboptimal, but probably in style
          148 with what you would write if you need that functionality.
          149 Note further, if for example, you want do perform addition
          150 and you know that your operands are non-negative, you would
          151 choose zadd_unsigned in libzahl, but if you are using a
          152 library that does not have the corrsponding function, you are
          153 better of with the regular addition (zadd). This is however
          154 reflected in the comment column.
          155 
          156 Also note, TomsFastMath does not support arbitrarily large
          157 integers, the limit is set at compile-time, which gives is
          158 a significant performance advantage. Furthermore, no failure
          159 check is done with GMP. Additionally, hebimath has been
          160 excluded from these comparison because it is not fully
          161 operational.
          162 
          163 Also note, NOT does not mean the same thing in all libraries,
          164 for example in GMP it means (-x - 1), thus, znot does not
          165 use GMP's NOT in the translations layer.
          166 
          167 
          168 The following optimisation flags have been tested:
          169 
          170 -O0 ...................... Bad
          171 -O1 ...................... Bad
          172 -O2 ...................... Not so good
          173 -O3 ...................... Good
          174 -fno-builtin ............. Bad