https://lemire.me/blog/2021/04/28/ideal-divisors-when-a-division-compiles-down-to-just-a-multiplication/

Skip to content

Daniel Lemire's blog

Daniel Lemire is a computer science professor at the University of
Quebec (TELUQ) in Montreal. His research is focused on software
performance and data engineering. He is a techno-optimist.

Menu and widgets

  * My home page
  * My papers
  * My software

Subscribe

Email Address [                    ]
[                    ]
[Subscribe by email]
Where to find me?

I am on Twitter and GitHub:

Follow @lemire
You can also find Daniel Lemire on

  * on Google Scholar with 4k citations and over 75 peer-reviewed
    publications,
  * on Facebook,
  * and on LinkedIn.

Before the pandemic of 2020, you could meet Daniel in person, as he
was organizing regular talks open to the public in Montreal: tribalab
and technolab .
Search for: [                    ] [Search]
Support my work!

I do not accept any advertisement. However, you can support the blog
with donations through paypal. Please consider getting in touch if
you are a supporter so that I can thank you.

Recent Posts

  * Ideal divisors: when a division compiles down to just a
    multiplication
  * Some useful regular expressions for programmers
  * A trichotomy of intellectual activity
  * Science and Technology links (April 17th 2021)
  * How fast can you sort arrays of integers in Java?

Recent Comments

  * Daniel Lemire on Ideal divisors: when a division compiles down to
    just a multiplication
  * Jeff on Ideal divisors: when a division compiles down to just a
    multiplication
  * Daniel Lemire on Ideal divisors: when a division compiles down to
    just a multiplication
  * Pete on Ideal divisors: when a division compiles down to just a
    multiplication
  * Dominic Amann on Some useful regular expressions for programmers

Pages

  * A short history of technology
  * About me
  * Book recommendations
  * Cognitive biases
  * Interviews and talks
  * My bets
  * My favorite articles
  * My favorite quotes
  * My readers
  * My sayings
  * Predictions
  * Recommended video games
  * Terms of use
  * Write good papers

Archives

Archives [Select Month        ]

Boring stuff

  * Log in
  * Entries feed
  * Comments feed
  * WordPress.org

Ideal divisors: when a division compiles down to just a
multiplication

The division instruction is one of the most expensive instruction in
your CPU. Thus optimizing compilers often compile divisions by known
constants down to a multiplication followed by a shift. However, in
some lucky cases, the compiler does not even need a shift. I call the
corresponding divisors ideal. For the math. geeks, they are related
to Fermat numbers.

For 32-bit unsigned integers, we have two such divisors (641 and
6700417). For 64-bit unsigned integers, we have two different ones
(274177 and 67280421310721). They are factors for 2^32 + 1 and 2^64 +
1 respectively. They are prime numbers.

So you have that

n/274177 = ( n * 67280421310721 ) >> 64

and

n/67280421310721 = ( n * 274177 ) >> 64.

In these expressions, the multiplication is the full multiplication
(to a 128-bit result). It looks like there is still a 'shift' by 64
bits, but the 'shift' disappears in practice after compilation.

Of course, not all compilers may be able to pull this trick, but many
do. Here is the assembly code produced by GCC when compiling n/274177
and n/67280421310721 respectively for an x64 target.

        movabs  rdx, 67280421310721
        mov     rax, rdi
        mul     rdx
        mov     rax, rdx
        ret

        mov     rax, rdi
        mov     edx, 274177
        mul     rdx
        mov     rax, rdx
        ret

You get similar results with ARM. It looks like ARM works hard to
build the constant, but it is mostly a distraction again.

        mov     x1, 53505
        movk    x1, 0xf19c, lsl 16
        movk    x1, 0x3d30, lsl 32
        umulh   x0, x0, x1
        ret

        mov     x1, 12033
        movk    x1, 0x4, lsl 16
        umulh   x0, x0, x1
        ret

What about remainders?

What a good compiler will do  is to first compute the quotient, and
then do a multiplication and a subtraction to derive the remainder.
It is the general strategy. Thus, maybe surprisingly, it is more
expensive to compute a remainder than a quotient in many cases!

You can do a bit better in some cases. There is a trick from our
Faster Remainder by Direct Computation paper that compilers do not
know about. You can compute the remainder directly, using exactly two
multiplications (and a few move instructions):

n % 274177 = (uint64_t( n * 67280421310721 ) * 274177) >> 64

and

n % 67280421310721 = (uint64_t( n * 274177 ) * 67280421310721) >> 64.

In other words, the following two C++ functions are strictly
equivalent:

// computes n % 274177
uint64_t div1(uint64_t n) {
    return n % 274177;
}

// computes n % 274177
uint64_t div2(uint64_t n) {
    return (uint64_t( n * 67280421310721 )
              * __uint128_t(274177)) >> 64;
}

Though the second function is more verbose and uglier, it will
typically compile to more efficient code involving just two
multiplications, back to back. It may seem a lot but it is likely
better than what the compiler will do.

In any case, if you are asked to pick a prime number and you expect
to have to divide by it, you might consider these ideal divisors.

Further reading. Integer Division by Constants: Optimal Bounds

Published by

[4b7361]

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).
View all posts by Daniel Lemire

Posted on April 28, 2021April 28, 2021Author Daniel LemireCategories 
 

4 thoughts on "Ideal divisors: when a division compiles down to just
a multiplication"

 1. [b2c28e] Pete says:
    April 28, 2021 at 6:26 pm

    Are there similar numbers for 128 and 256-bit unsigned integers?
    And are these useful in cryptographic contexts? Obviously not as
    a secret prime, but as something to speed up big-number
    arithmetic..

    Reply
     1. [4b7361] Daniel Lemire says:
        April 28, 2021 at 6:51 pm

        Yes.

        For 128 bits: 59649589127497217 and 5704689200685129054721

        For 256 bits: 1238926361552897 and
        93461639715357977769163558199606896584051237541638188580280321

        Reply
 2. [bce6d5] Jeff says:
    April 28, 2021 at 9:47 pm

    Over the years I've noticed that trying to write a lot of numbers
    to text log can seriously slow down the sections doing the
    logging. Seems that printf("%d",...) is quite the little delay
    statement. I figure this is down to the the necessity to divide
    by 10 (with remainder) for every digit thusly printed.

    On account of printing human readable numbers, dividing by 10
    must be one of the most common divisors encountered. So you'd
    think that CPU designers would give us dedicated optimized
    divide-by-10 instructions. I'll admit I'm pretty out of touch
    with current instruction sets, but given how doggy printf %d
    still is, I'd guess they haven't.

    Reply
     1. [4b7361] Daniel Lemire says:
        April 28, 2021 at 10:47 pm

        You are right that calls to printf are slow. I do not expect
        that the conversion from integers to strings is what slows
        you down in such a case.

        Reply

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked 
*

To create code blocks or other preformatted text, indent by four
spaces:

    This will be displayed in a monospaced font. The first four
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

        [                                             ]
        [                                             ]
        [                                             ]
        [                                             ]
        [                                             ]
        [                                             ]
        [                                             ]
Comment [                                             ]

Name * [                              ]

Email * [                              ]

Website [                              ]

[ ] Save my name, email, and website in this browser for the next
time I comment.

Receive Email Notifications? [no, do not subscribe      ] 
[instantly    ]
Or, you can subscribe without commenting.

[Post Comment] 

[                                             ]
[                                             ]
[                                             ]
[                                             ]
[                                             ]
[                                             ]
[                                             ]
[                                             ]
Post navigation

Previous Previous post: Some useful regular expressions for
programmers
Proudly powered by WordPress