[HN Gopher] Finding a billion factorials in 60 ms with SIMD
___________________________________________________________________
Finding a billion factorials in 60 ms with SIMD
Author : todsacerdoti
Score : 163 points
Date : 2025-06-22 23:12 UTC (23 hours ago)
(HTM) web link (codeforces.com)
(TXT) w3m dump (codeforces.com)
| few wrote:
| It's interesting to see which codeforces blog posts get traction
| on HN.
|
| For context, in competitive programming a lot of combinatorial
| problems (find some formula to count something) require you to
| output the answer modulo some prime. This is because otherwise
| the answer would overflow an int and make the problem too tedious
| to be fun and too hard for problem setters to write good problems
| or checkers for.
|
| So to prove that you still know how to count the thing, you can
| do it a finite field. If you use integers mod prime, you still
| have all the usual arithmetic operations like addition
| subtraction multiplication. And even division is still easy since
| you can calculate multiplicative inverse with Fermat's Little
| Theorem (a^(p-2) = a^(-1) mod p). The final answer you output is
| not the real thing you're counting, but just evidence that you
| had the right formula and did all the right operations.
|
| Anyway, just wanted to give context for why competitive
| programmers care about factorial mod a prime (usually as part of
| a binomial or multinomial expression). And I'm kind of surprised
| anyone outside of competitive programming cares about it.
|
| See also:
|
| https://usaco.guide/gold/modular?lang=cpp
|
| https://usaco.guide/gold/combo?lang=cpp
| dataflow wrote:
| > a^(p-2) = a^(-1) mod p
|
| Tangent, but curious: what made you write it like this? I've
| only ever seen it written as a^(p-1) = 1 mod p
|
| or a^p = a mod p
|
| Is that form somehow more useful in some cases?
| addaon wrote:
| a^(-1) mod p is the multiplicative inverse in a finite field.
| The point of the original comment was to show how to
| transform the multiplicative inverse into an easier problem.
| cperciva wrote:
| Just what do you think the running time of modular inverse
| is?
| atq2119 wrote:
| That's a pretty snarky and unhelpful approach to the
| conversation.
|
| That said, I'm also a bit surprised to see somebody
| discuss modular inverses without mentioning the extended
| euclidean algorithm, which is a more elementary solution.
| cperciva wrote:
| Snarky I'll admit, but it was a serious question. Inverse
| is quasilinear time; the exponential is quasiquadratic.
| Maybe he didn't know about the EEA?
| tmyklebu wrote:
| Well, modular multiplication is faster than modular
| inverse, both asymptotically for large moduli and
| practically for almost all moduli I can think of. (2, 3,
| and 4 being notable exceptions!)
|
| The article computes modular inverses of a_1, ..., a_n
| by:
|
| - Computing (a_1 * ... * a_i) = (a_1 * ... * a_{i-1}) *
| a_i recursively
|
| - Computing (a_1 * ... * a_n)^(-1) by square-and-multiply
|
| - Computing (a_1 * ... * a_i)^(-1) = (a_1 * ... *
| a_{i+1})^(-1) a_{i+1} recursively.
|
| - Computing a_i^(-1) = (a_1 * ... * a_i)^(-1) * (a_1 *
| ... * a_{i-1}) for each i.
|
| The second step is a scalar operation, so its running
| time is immaterial as long as you aren't doing something
| too silly.
|
| For my caveman brain, both Fermat's little theorem and
| square-and-multiply exponentiation are pretty easy to
| understand. Moreover, the code is going to be "defect-
| evident"---if I've gotten the logic wrong or forgotten
| integer promotions or modular reductions as in qsort's
| post, it'll quickly be clear by skimming the code.
| benreesman wrote:
| I disagree on two counts:
|
| - having Colin stop by your thread is strictly an
| opportunity for useful information to flow from a
| singular source to many people
|
| - you would hear that aloud 100 times a day in any office
| where serious work was being done by professionals on a
| deadline and think nothing of it, bet your ass places in
| the world where serious hackers rise only on merit and
| have the best gear embargoed are saying stuff like that
| all the time. this nepotism capture bubble is an outlier
| in the history of serious engineering.
|
| Defining the rudeness threshold down to the point where
| cperciva clears it is one part comedy and two parts
| tragedy with the words Hacker News in bold at the top of
| the page.
| qsort wrote:
| It's easier to write code for efficiently computing the
| inverse in that form, roughly: int
| FastExp(int a, int e, int p) { if (e == 0)
| return 1; if (e == 1) return a; if (e % 2 ==
| 0) return FastExp((a*a)%p, e/2, p); else return a *
| FastExp((a*a)%p, e/2, p); }
|
| In math competitions where you only have pen and paper, you'd
| instead turn what you wrote into a Diophantine equation you
| can solve with the usual method.
| vivzkestrel wrote:
| Since we are talking factorials, i wanted to ask. What is the
| largest factorial that the biggest supercomputer known to man has
| computed? how long did it take
| adgjlsfhk1 wrote:
| probably not very big. factorials are pretty boring in the
| sense that it's relatively trivial to compute a factorial
| almost as big as your hard drive. 99% of the time will happen
| in a single multiply
| cperciva wrote:
| _99% of the time will happen in a single multiply_
|
| Far from it. Asymptotically it's a proportion 2/log(N) of the
| compute cost.
| LegionMammal978 wrote:
| Yeah, most algorithms that run in O( _n_ polylog( _n_ ))
| time, including most divide-and-conquer bigint arithmetic,
| will ultimately be dominated by how much memory you have
| available. For some experiments a while back, I would create
| a terabyte-long swapfile to extend the range of what GMP
| could do, and it still filled up within a couple CPU-weeks at
| most.
| ethan_smith wrote:
| The current record is 10^10^9 (a billion billion digits) by
| Peter Luschny in 2021 using a parallel algorithm. For exact
| factorials, Clifford Stern calculated 170! in 2012 which has
| over 300 digits.
___________________________________________________________________
(page generated 2025-06-23 23:01 UTC)