[HN Gopher] Benchmarking RSA Key Generation
       ___________________________________________________________________
        
       Benchmarking RSA Key Generation
        
       Author : mfrw
       Score  : 42 points
       Date   : 2024-12-31 12:57 UTC (3 days ago)
        
 (HTM) web link (words.filippo.io)
 (TXT) w3m dump (words.filippo.io)
        
       | throw0101c wrote:
       | For new deployments/protocols, is there any reason to use RSA? Is
       | RSA (mostly?) about legacy support nowadays?
        
         | bux93 wrote:
          | For new protocols, go with X25519MLKEM768; then it's
          | quantum-resistant as well.
        
           | daneel_w wrote:
            | To my understanding, even with Shor's algorithm, RSA can
            | be kept out of reach by applying sufficiently large
            | numbers, such that the quantum computer used to recover
            | the primes must itself be "sufficiently larger" - i.e. a
            | cat-and-mouse game of sorts until some major breakthrough
            | comes along. I agree with simply moving to Ed25519 and
            | X25519/ML-KEM etc.
        
             | honzaik wrote:
              | well, it depends on the size of the quantum computer.
              | of course you can make large enough RSA keys (depends
              | what your security margin/assumptions are) but the
              | problem is that the size/computational increase is
              | exponential whereas the solving speed scales
              | polynomially.
             | 
             | https://eprint.iacr.org/2017/351
        
               | maginx wrote:
                | The size/computational complexity of usage also only
                | grows as a polynomial with RSA. But even with this in
                | mind, a quantum computer that can crack in polynomial
                | time is still problematic. While you can increase the
                | key size, the operator of the quantum computer could
                | enlarge his computer - a true cat-and-mouse game,
                | unlike the current situation where usage complexity
                | grows as a polynomial while cracking grows
                | exponentially. This difference is the reason why we
                | can pick key sizes where signing/decryption takes a
                | millisecond while cracking it takes (presumably)
                | billions of years.
        
         | wolf550e wrote:
         | Ed25519 has been recommended over RSA for many years, post-
         | quantum stuff is recent. RSA should only be used to support old
         | protocols like webpki certificates.
        
         | upofadown wrote:
          | RSA is faster than elliptic curves for signature
          | verification and encryption. It is one of the oldest
          | methods (nearly half a century old), which suggests that it
          | could be a good choice when you need the lowest possible
          | chance of some weakness being discovered. It can be used
          | for both signing and encryption (but not with the same key
          | pair).
        
         | maginx wrote:
          | I still see a lot of RSA, especially for encryption. While
          | ECC can be used for encryption, it is more convoluted and
          | much less widely supported in hardware, software etc. than
          | RSA encryption/decryption.
        
         | thayne wrote:
          | FIPS 140-2 compliance. FIPS 140-3 adds support for ECC, but
          | it is relatively new, and there aren't a lot of modules
          | that have been certified for it yet, so depending on your
          | environment and requirements you might still need to use
          | RSA.
         | 
         | Or you are doing a new deployment that needs to be compatible
         | with something that only supports RSA.
        
       | jdewerd wrote:
       | > The prime-counting function approximation tells us there are
       | Li(x) primes less than x, which works out[5] to one prime every
       | 354 odd integers of 1024 bits.
       | 
       | Rule of thumb: Want a 1024-bit prime? Try 1024 1024-bit
       | candidates and you'll probably find one. Want a 4096-bit prime?
       | Try 4096 4096-bit candidates and you'll probably find one.
       | 
        | The approximate spacing of primes around p is ln(p), so
        | ln(2^1024) = 1024*ln(2), and ln(2)=0.693, so if you are
        | willing to absorb 0.693 into your rule of thumb as a safety
        | margin you get the delightfully simple rule of thumb above.
        | Of course, you'll still want to use a sieve to quickly reject
        | numbers divisible by 2, 3, 5, 7, etc., which easily rejects
        | ~90% of candidates, then do a Fermat primality test on the
        | survivors (which if you squint is sort of like "try RSA, see
        | if it works"), and then do a Miller-Rabin test to really
        | smash down the probability that your candidate isn't prime.
        | The probabilities can be made absurdly small, but it still
        | feels a bit scandalous that the whole thing is probabilistic.
       | 
       | EDIT: updated rule of thumb to reflect random candidate choice
       | rather than sequential candidate choice.
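        | 
        | A minimal Go sketch of that pipeline, for illustration only
        | (it leans on math/big's ProbablyPrime for the Miller-Rabin
        | step; real libraries sieve far more aggressively than a
        | handful of small primes):
        | 
        |   // Find a 1024-bit probable prime by rejection sampling.
        |   package main
        | 
        |   import (
        |     "crypto/rand"
        |     "fmt"
        |     "math/big"
        |   )
        | 
        |   // candidate returns a random integer with exactly `bits`
        |   // bits, with the top and bottom bits forced to 1 (odd).
        |   func candidate(bits int) *big.Int {
        |     max := new(big.Int).Lsh(big.NewInt(1), uint(bits))
        |     p, err := rand.Int(rand.Reader, max)
        |     if err != nil {
        |       panic(err)
        |     }
        |     p.SetBit(p, bits-1, 1)
        |     p.SetBit(p, 0, 1)
        |     return p
        |   }
        | 
        |   // sievePass rejects candidates divisible by small primes.
        |   func sievePass(p *big.Int) bool {
        |     for _, q := range []int64{3, 5, 7, 11, 13, 17, 19, 23} {
        |       if new(big.Int).Mod(p, big.NewInt(q)).Sign() == 0 {
        |         return false
        |       }
        |     }
        |     return true
        |   }
        | 
        |   func main() {
        |     for tries := 1; ; tries++ {
        |       p := candidate(1024)
        |       // Cheap sieve first, then Miller-Rabin (20 rounds).
        |       if sievePass(p) && p.ProbablyPrime(20) {
        |         fmt.Println("probable prime after", tries, "tries")
        |         return
        |       }
        |     }
        |   }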
        
         | rainsford wrote:
         | It's been a while since I've looked at the literature on RSA
         | prime generation, but I seem to remember that picking a random
         | starting point and iterating until you find a prime is
         | discouraged because primes aren't evenly distributed so key
         | generation timing could reveal some information about your
         | starting point and eventual prime choice.
         | 
          | I'm not sure how realistic of an issue this is given the
          | size of the primes involved. Even if an attacker can
          | extract sensitive enough timing information to figure out
          | exactly how many iterations were required to find a
          | 1024-bit prime from a 1024-bit random starting point, I'm
          | not aware of a good way to actually find either value. You
          | do also introduce a bias, since you're more likely to
          | select primes without a close neighbor in the direction you
          | are iterating from, but again I'm not sure how practical an
          | attack on this bias would be.
         | 
         | Still, to avoid any potential risk there I seem to remember
         | best practice being to just randomly generate numbers of the
         | right size until you find a prime one. With the speed of modern
         | RNGs, generating a fresh number each time vs iterating doesn't
         | seem like a significant penalty.
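          | 
          | A toy simulation of that bias (an illustrative Go sketch,
          | not from any real implementation): 9973 is preceded by a
          | prime gap of only 6, while 9587 comes after a gap of 36,
          | so an incremental walk from a random starting point lands
          | on 9587 roughly six times as often.
          | 
          |   package main
          | 
          |   import (
          |     "fmt"
          |     "math/rand"
          |   )
          | 
          |   func isPrime(n int) bool {
          |     if n < 2 {
          |       return false
          |     }
          |     for d := 2; d*d <= n; d++ {
          |       if n%d == 0 {
          |         return false
          |       }
          |     }
          |     return true
          |   }
          | 
          |   func main() {
          |     counts := map[int]int{}
          |     for i := 0; i < 200000; i++ {
          |       n := rand.Intn(10000) // random starting point
          |       for !isPrime(n) {     // walk up to the next prime
          |         n++
          |       }
          |       counts[n]++
          |     }
          |     // gap before 9973 is 6, gap before 9587 is 36
          |     fmt.Println(counts[9973], counts[9587]) // ~120 ~720
          |   }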
        
           | honzaik wrote:
           | it might be this https://facthacks.cr.yp.to/fermat.html
           | 
            | if N=p*q and p-q < sqrt(p) then it's easy to factor
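            | 
            | A rough sketch of that method for intuition (illustrative
            | Go, not taken from the linked page): write N = a^2 - b^2
            | = (a-b)(a+b) and search upward from a = ceil(sqrt(N));
            | when p and q are close, only a few steps are needed.
            | 
            |   package main
            | 
            |   import (
            |     "fmt"
            |     "math/big"
            |   )
            | 
            |   func fermatFactor(n *big.Int) (*big.Int, *big.Int) {
            |     one := big.NewInt(1)
            |     a := new(big.Int).Sqrt(n)
            |     if new(big.Int).Mul(a, a).Cmp(n) < 0 {
            |       a.Add(a, one) // round the square root up
            |     }
            |     for {
            |       b2 := new(big.Int).Mul(a, a)
            |       b2.Sub(b2, n) // b^2 = a^2 - N
            |       b := new(big.Int).Sqrt(b2)
            |       if new(big.Int).Mul(b, b).Cmp(b2) == 0 { // square?
            |         p := new(big.Int).Sub(a, b)
            |         q := new(big.Int).Add(a, b)
            |         return p, q
            |       }
            |       a.Add(a, one)
            |     }
            |   }
            | 
            |   func main() {
            |     // toy example with two close primes, 101 and 113
            |     fmt.Println(fermatFactor(big.NewInt(101 * 113)))
            |   }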
        
             | maginx wrote:
              | Encountering this by chance is exceedingly unlikely of
              | course, if p and q are randomly generated. In
              | probability terms it amounts to the top half of the
              | bits of p and q coinciding, so roughly 2^(-n/4) where n
              | is the bit size of N. So for RSA-2048 the probability
              | of this happening is on the order of 2^-512, or in
              | base-10 terms 0.0000000...0000001, with roughly 150
              | zeros before the one!
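              | 
              | A quick check of that magnitude (a float64 comfortably
              | reaches down that far):
              | 
              |   package main
              | 
              |   import (
              |     "fmt"
              |     "math"
              |   )
              | 
              |   func main() {
              |     // 2^-512, the estimate above for RSA-2048
              |     fmt.Printf("%.3e\n", math.Pow(2, -512)) // 7.458e-155
              |   }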
        
           | jdewerd wrote:
           | Yes, excellent point! I originally omitted this detail for
           | simplicity, but on reflection I don't think it actually
           | achieved much in the way of simplifying the rule so I changed
           | it to reflect reality. Thanks for pointing that out.
           | 
           | EDIT: the rush of people offering up sieve optimizations is
           | pushing me back towards formulating the rule of thumb on a
           | consecutive block of numbers, since it makes it very clear
           | that these are not included, rather than implicitly or
           | explicitly including some subset of them (implicit is bad
           | because opacity, explicit is bad because complexity).
        
           | maginx wrote:
            | I've seen many implementations that relied on iterating
            | (or rather, they used a prime sieve, but it amounts to
            | the same thing in terms of the skewed distribution).
            | While maybe not a problem in practice, I always hated it.
            | Even for embedded systems etc. I always used pure
            | rejection sampling, with a fresh random candidate in each
            | iteration.
        
         | af3d wrote:
          | Iterating over some huge search space in an essentially
          | sequential manner is generally not going to be nearly as
          | performant as simply selecting an odd number at random. You
          | could try using a generating polynomial instead, such as
          | f(x) = x^2 + x + 41, but even that isn't going to help much
          | in the long run. (There are Diophantine equations which may
          | one day prove useful for generating random primes; however,
          | AFAICT finding efficient solutions is still currently
          | considered a hard problem.)
        
           | jdewerd wrote:
           | Yes, but the more we mix sieve rejection into candidate
           | selection the more we complicate the rule of thumb. "Reject
           | even numbers as prime candidates" is probably OK to leave as
           | an exercise for the reader, as is the equivalent "round every
           | candidate to odd" optimization. The point about random vs
           | sequential is well taken, though, and it doesn't complicate
           | the rule of thumb, so I changed it.
        
           | kevin_thibedeau wrote:
           | Incrementing a bignum is faster than another PRNG cycle.
        
             | loeg wrote:
              | Neither is a significant amount of the time required to
              | reject a candidate. The _cheapest_ rejection test is
              | "Bignum Division by 3", and something like 2/3 of
              | candidates will need more expensive further tests.
             | 
             | https://news.ycombinator.com/item?id=40093136
        
         | FiloSottile wrote:
         | > Rule of thumb: Want a 1024-bit prime? Try 1024 1024-bit
         | candidates and you'll probably find one.
         | 
         | Where probably is 76% [1], which is not that high depending on
         | what you are doing. For example, you wouldn't be ok with
         | GenerateKey failing 24% of the time.
         | 
         | To get a better than even chance, 491 [2] 1024-bit candidates
         | are enough.
         | 
         | [1]:
         | https://www.wolframalpha.com/input?i=1+-+%281+-+li%282%5E102...
         | (using li(x) as a slightly better approximation of p(x) than
         | x/ln(x), see [3])
         | 
         | [2]:
         | https://www.wolframalpha.com/input?i=1+-+%281+-+li%282%5E102...
         | 
         | [3]: https://en.wikipedia.org/wiki/Prime-counting_function
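          | 
          | Those figures are easy to approximate with the cruder
          | 1/ln(x) density in place of li(x) (an illustrative sketch,
          | not the linked computation):
          | 
          |   package main
          | 
          |   import (
          |     "fmt"
          |     "math"
          |   )
          | 
          |   func main() {
          |     // prime density near 2^1024: 1/ln(2^1024), about 1/710
          |     // (among odd integers only it doubles to one in ~355,
          |     // the article's "one prime every 354 odd integers")
          |     d := 1 / (1024 * math.Ln2)
          | 
          |     // chance that at least one of k candidates is prime
          |     hit := func(k float64) float64 {
          |       return 1 - math.Pow(1-d, k)
          |     }
          |     fmt.Printf("%.0f%% %.0f%%\n",
          |       100*hit(1024), 100*hit(491)) // 76% 50%
          |   }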
        
       | JacobiX wrote:
       | It is fascinating (to me at least) that almost all RSA
       | implementations rely on a probabilistic primality test, even
       | though the probability of picking a pseudoprime is extremely
       | small.
        
         | FiloSottile wrote:
          | There's extremely small (like 2^-20, the chance of winning
          | $50,000 with a $2 Powerball ticket [1]), and then there's
          | cryptographically negligible (like 2^-120, the chance of
          | our primality test accepting a composite). The chance of
          | something cryptographically negligible happening is about
          | the same as the chance of the attacker guessing your key on
          | the first try, or of a cosmic ray flipping a CPU flag and
          | inverting the result of "if !signature.verified { return
          | err }".
         | 
         | [1]: https://www.powerball.com/powerball-prize-chart
        
           | mras0 wrote:
            | I happened to look at this recently, and while I
            | understand the argument (but not the math) for doing
            | fewer Miller-Rabin rounds, why would you do so in
            | PRACTICAL settings? Unlike ECC, you're likely only
            | generating long-term keys, so a shorter key generation
            | time seems like a bad tradeoff. Composite candidates are
            | going to be rejected early, so you're (with high
            | probability) not doing expensive calculations for most
            | candidates. My reading of
            | [BSI B.5.2](https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publicat...)
            | confirms this.
            | 
            | Of course random bit flips could interfere, but other
            | measures should thwart this in high-stakes environments
            | (at least to some degree).
        
             | FiloSottile wrote:
             | The number of Miller-Rabin rounds has to be bounded, so if
             | you're not going to base your bound on reaching a
              | cryptographically negligible chance of false positives,
             | what are you going to base it on? Should we do 10? 15?
             | 
             | The problem with "x should be enough, but why not do more?"
             | arguments is that they can be applied recursively, and
             | never answer the question "ok so when should we stop?"
        
               | mras0 wrote:
               | The BSI recommendation I linked says 60 times here
               | (considering the worst case and the security level
               | they're targeting). Just wondering why we'd want to do
               | less for a (presumably rare) one-time operation.
        
               | FiloSottile wrote:
                | Why not do 120 then? We can show that the chance of
                | 5 rounds accepting a composite is cryptographically
                | negligible, so 5, 60, and 120 are all the same. If
                | the only argument for 60 is that it's more and this
                | is a really important rare operation, doesn't it also
                | apply to 120?
               | 
               | I'm not trying to be glib, there is no rational stopping
               | point if we reject the objective threshold.
        
               | mras0 wrote:
                | I don't pretend to understand all the involved math,
                | but what I'm trying to say is that, as far as I
                | understand, T rounds gives a 4^-T probability that
                | we've chosen a "bad" prime (really a composite) per
                | normal Miller-Rabin in the worst case. Doing ~5
                | rounds has been shown to be enough to choose a good
                | prime candidate under most (very probable!)
                | conditions when we pick it at random, and thus the
                | argument is that ~5 rounds is fine. We agree so far?
                | 
                | I'm just asking, why not run the conservative
                | 60-round test, rather than ~5, when you're doing a
                | very rare, one-time key generation? I understand that
                | it's very unlikely to reject any numbers, but at
                | least BSI thinks it's worth it for important keys.
                | 
                | If I understand the recommendation right, you
                | wouldn't do 60 for a 2048-bit key and then 120 for
                | 4096; rather, 61 rounds would be enough for 4096 if
                | 120 is alright for 2048.
        
               | maginx wrote:
               | As I understand it, the number of rounds needed (for a
               | given acceptable failure probability) goes down the
               | larger the number is. Note (very importantly) that this
               | is assuming you are testing a RANDOM integer for
                | primality. If you are given an integer from a
                | potentially malicious source, you need to do the full
                | number of rounds for the given level.
        
               | FiloSottile wrote:
               | You're asking why not apply the formula for adversarially
               | selected candidates even if we are randomly selecting
               | candidates. There is simply no reason to, except "maybe
               | we made a mistake" but then why would we not think we
               | made a mistake also in calculating the 1/4 value, or in
               | any other part of the code?
               | 
               | Phrased another way, do you have an argument for _why_
               | run the conservative 60 round test, instead of asking for
               | an argument for _why not_ run it?
               | 
               | Again, you are "very unlikely" to win the Powerball
               | jackpot. Rounds 6-60 have a cryptographically negligible
               | chance of rejecting a composite. It's different,
               | otherwise we'd have to worry about the "very unlikely"
               | chance of the attacker guessing an AES-128 key on the
               | first try.
               | 
               | (I don't follow you on the key sizes, if you apply the
               | 1/4 probability, the candidate size is irrelevant.)
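                | 
                | For a concrete sense of where the round count shows
                | up (an illustrative example; Go's math/big
                | ProbablyPrime(n) runs n Miller-Rabin rounds plus a
                | Baillie-PSW test, and 4^-T is the worst-case,
                | adversarial bound):
                | 
                |   package main
                | 
                |   import (
                |     "fmt"
                |     "math/big"
                |   )
                | 
                |   func main() {
                |     // 2^127 - 1 is a known (Mersenne) prime
                |     p := new(big.Int).Lsh(big.NewInt(1), 127)
                |     p.Sub(p, big.NewInt(1))
                | 
                |     // worst case: T rounds bound the error by 4^-T,
                |     // i.e. 2^-10 for T=5 and 2^-120 for T=60; for
                |     // random candidates the error is far smaller
                |     fmt.Println(p.ProbablyPrime(5), // true
                |       p.ProbablyPrime(60))          // true
                |   }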
        
               | maginx wrote:
                | The performance of this can matter in some scenarios.
                | In embedded systems, smart cards etc., generating the
                | primes can take a significant amount of time (15-20
                | seconds is typical) even with the 'low' number of
                | iterations. More iterations mean the user will have
                | to wait even longer. Actually, in such systems it is
                | not unusual to see occasional time-out errors and
                | such when the smart card is unlucky with finding the
                | primes. The timeout value may be a fixed, general
                | value in a part of the stack that is difficult to
                | change for the specific call.
               | 
                | Another case is short-term keys/certificates, where a
                | fresh key pair is generated, and a cert issued, for
                | each and every signature. This setup makes revocation
                | easier to handle (the certificate typically has a
                | lifetime of a few minutes, so it is 'revoked' almost
                | immediately).
               | 
                | There are also scenarios where the keys are generated
                | on a central system (HSMs etc.) to be injected into a
                | production line or similar. There, performance can
                | matter as well. I worked on a system where the HSMs
                | were used for this. They could typically generate an
                | RSA key in 1-2 seconds, so 12 HSMs were needed to
                | keep up with demand - and this was again with the
                | reduced number of rounds.
        
       ___________________________________________________________________
       (page generated 2025-01-03 23:01 UTC)