[HN Gopher] The definitive guide to "modulo bias" and how to avo...
       ___________________________________________________________________
        
       The definitive guide to "modulo bias" and how to avoid it (2020)
        
       Author : isomorph
       Score  : 121 points
       Date   : 2022-09-15 09:09 UTC (13 hours ago)
        
 (HTM) web link (research.kudelskisecurity.com)
 (TXT) w3m dump (research.kudelskisecurity.com)
        
       | [deleted]
        
       | rrobukef wrote:
        | Rejection sampling moves the information leak from bias to time.
        | As always, don't roll your own cryptography in production.
        
         | cronokirby wrote:
         | If you have a random stream of bits, and you use rejection
         | sampling to extract a value from that stream, then you don't
         | reveal any information about the value. At most, you reveal
         | information about the stream prior to the value you chose, but
         | each bit of a secure RNG should be unrelated to all prior bits,
         | so this is not an issue.
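
The argument above can be sketched in a few lines (illustrative Python, using the standard library's CSPRNG; the function name is mine):

```python
import secrets

def uniform_below(n: int) -> int:
    """Sample uniformly from [0, n) with no modulo bias."""
    k = n.bit_length()              # enough bits to cover 0..n-1
    while True:
        r = secrets.randbits(k)     # uniform over [0, 2**k)
        if r < n:                   # accept; a rejected draw reveals
            return r                # nothing about the value returned
```

Because `k` is the bit length of `n`, each draw is accepted with probability above 1/2, so the expected number of iterations is below 2.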
        
           | omegalulw wrote:
           | But you do? You expose some information on the range of the
           | value via the time it took to sample assuming for example
           | that the attacker knows the rejection sampling method in use.
        
             | cronokirby wrote:
             | That is true.
             | 
             | That said, there are few situations where the modulus being
             | used is not a public parameter of a protocol, and it is
             | very difficult to perform operations with a secret modulus
             | in constant-time, as your comment points out.
             | 
             | You'll always be able to get an approximate guess of the
             | size of the modulus too, since larger moduli will need more
             | registers to represent data.
        
               | some_furry wrote:
               | We don't consider "leaks the length of a SHA256 hash" to
               | be a valid timing attack in most protocols for similar
               | reasons (i.e. it's public knowledge).
               | 
               | When developers encounter timing attacks in their code,
               | they often invent really dumb ways to side-step the
               | length "leaking".
               | 
                | This might be understandable if it were a MAC-then-
                | encrypt protocol with PKCS padding (hello, Lucky13),
                | but instead this comes up in the context of "validate
                | this HMAC-SHA256 tag for our JWT-like protocol".
        
       | LudwigNagasena wrote:
       | A cleaner example without bit masking:
       | https://github.com/openbsd/src/blob/master/lib/libc/crypt/ar...
        
         | fanf2 wrote:
         | I was hoping for a mention of Daniel Lemire's nearly-
         | divisionless algorithm for unbiased sampling. I recently
         | replaced BIND's copy of OpenBSD arc4random_uniform() with
         | Lemire's algorithm https://gitlab.isc.org/isc-
         | projects/bind9/-/blob/main/lib/is... and I blogged on the
         | subject a couple of times https://dotat.at/@/2020-10-29-nearly-
         | divisionless-random-num...
         | https://dotat.at/@/2022-04-20-really-divisionless.html
        
           | anomalroil wrote:
           | Lemire's technique is really nice, in general a good thing to
           | learn about, since it's a bit mind bending how it's playing
           | with intervals. Sadly last time I benchmarked it in code on
           | x86-64 for cryptographic purposes, it wasn't faster than
           | rejection sampling, or just using a large value and a modulo
           | reduction: in all cases what is actually taking a lot of time
           | is the call to get good quality randomness out of a CSPRNG,
           | the rest being negligible in comparison.
        
       | anomalroil wrote:
        | Notice that nowadays, unlike 2 years ago, people usually
        | recommend using the last technique I presented there, in the
        | last paragraph before the Conclusion: generate a random value
        | that is _big enough_, with n - log2(p) > 128, so that the bias
        | is too small to be exploitable in practice. It's simpler, and
        | unlike rejection sampling it ensures your code cannot fall
        | into an infinite loop if your PRNG is broken. (I'd argue you
        | might want your code to fail in that case anyway, but YMMV.)
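
A minimal sketch of that "big enough" technique (illustrative Python; the function name and the 128-bit slack default are my choices):

```python
import secrets

def uniform_mod_p(p: int, slack_bits: int = 128) -> int:
    """Reduce an oversized random value mod p.

    With ceil(log2(p)) + slack_bits random bits, the statistical
    distance from uniform is at most about 2**-slack_bits, which is
    negligible for slack_bits = 128 -- and there is no loop that a
    broken RNG could make spin forever.
    """
    nbits = p.bit_length() + slack_bits
    return secrets.randbits(nbits) % p
```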
        
         | stephencanon wrote:
         | The other virtue of this technique is that some of the more
         | popular fast rejection sampling methods (e.g. Lemire's "nearly
         | divisionless") leak a small amount of information via timing
         | side channel, because the divisionless fast path is _not_
         | unbiased.
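
For reference, Lemire's nearly-divisionless method looks roughly like this (a Python transliteration of the published algorithm; real implementations use fixed-width machine words, and as noted above the slow path is data dependent, so this is not constant time):

```python
import secrets

def lemire_uniform(n: int, bits: int = 32) -> int:
    """Lemire's nearly-divisionless unbiased sampler for [0, n).

    The common case is a single multiply with no division; the slow
    path, taken with probability below n / 2**bits, computes one
    modulus and rejects the biased low region.
    """
    mask = (1 << bits) - 1
    x = secrets.randbits(bits)
    m = x * n                         # m in [0, n * 2**bits)
    low = m & mask
    if low < n:                       # possibly in the biased region
        threshold = (1 << bits) % n   # == (2**bits - n) % n
        while low < threshold:        # reject until outside it
            x = secrets.randbits(bits)
            m = x * n
            low = m & mask
    return m >> bits                  # high bits are uniform on [0, n)
```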
        
       | tarakat wrote:
       | > You shouldn't rely on these claims, because even 1 bit of bias
       | on a 256 bit nonce value can be enough to attack certain
       | cryptographic schemes!
       | 
       | This seems like a very high bar for a random generator to clear.
       | It also raises a question: would using a larger nonce size
       | actually _increase_ risk, if the additional bits were biased?
        
         | barsonme wrote:
         | With some schemes, like ECDSA, you can't use a larger nonce
         | since the nonce is a field element.
         | 
         | In general, you shouldn't need to worry about it unless you're
         | using a broken CSPRNG or a bad cryptography library. And some
         | libraries will try and work around bad RNGs:
         | https://cs.opensource.google/go/go/+/refs/tags/go1.19.1:src/...
        
         | jstanley wrote:
         | It _seems_ like the general case answer is  "no, using a larger
         | nonce does not increase risk".
         | 
         | Otherwise an attacker could just imagine that instead of a
         | 256-bit nonce, the nonce was actually 257 bits long but the
         | first bit is always 0.
        
       | barbegal wrote:
        | The claim "even 1 bit of bias on a 256 bit nonce value can be
        | enough to attack certain cryptographic schemes" is true, but
        | there have been no practical attacks against 256-bit secured
        | schemes. And the example of the mod 107 issue only reduces the
        | entropy from 6.74 bits to 6.71 bits (a reduction of 0.4%), so
        | it is hardly worth worrying about in the real world.
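
Those entropy figures are easy to verify by enumerating the distribution of a uniform byte reduced mod 107:

```python
import math

# Distribution of (b % 107) for a uniform random byte b in [0, 256):
# residues 0..41 have three byte preimages, 42..106 have two.
probs = [3 / 256] * 42 + [2 / 256] * 65
assert abs(sum(probs) - 1.0) < 1e-12

uniform_bits = math.log2(107)                          # ideal entropy
biased_bits = -sum(p * math.log2(p) for p in probs)    # actual entropy
print(f"{uniform_bits:.2f} -> {biased_bits:.2f} bits") # 6.74 -> 6.71 bits
```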
        
       | 10000truths wrote:
       | If you need to generate multiple such random numbers, an
       | alternative way to resolve modulo bias with minimal entropy waste
       | is to batch the random number generation. For example, you can
       | generate 6 random integers in [0, 100) by generating a random
       | integer in [0, 100^6) and performing modulo reductions on the
       | result to get your 6 integers. 100^6 is slightly less than 256^5,
       | so if your RNG works in units of bytes, then you can use 5 bytes
       | to generate 6 integers in [0, 100) instead of 6 bytes.
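
A sketch of that batching idea (illustrative Python; the function name is mine, and a single rejection test keeps the batch unbiased):

```python
import math
import secrets

def batch_randints(count: int = 6, bound: int = 100) -> list[int]:
    """Draw `count` integers in [0, bound) from one batched sample."""
    total = bound ** count                    # 100**6, just under 256**5
    nbytes = math.ceil(total.bit_length() / 8)
    while True:                               # one rejection loop for the
        r = int.from_bytes(secrets.token_bytes(nbytes), "big")
        if r < total:                         # whole batch, not per value
            break
    out = []
    for _ in range(count):
        out.append(r % bound)                 # peel off base-`bound` digits
        r //= bound
    return out
```

With the defaults this spends 5 random bytes on 6 values in [0, 100), versus 6 bytes for one byte per value.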
        
       | [deleted]
        
       | xxpor wrote:
        | Everything here is focused on cryptography, but there are other
        | common issues that can fall into this trap too. For example,
       | naive implementations of a hash function for a basic array +
       | linked list hash table. If you generate a hash, and then modulo
       | that to pick a hash bucket, you could end up with a biased
       | distribution across the buckets, and customers complaining your
       | performance varies wildly based on the input value.
       | 
       | Just more things to think about :)
        
         | kelnos wrote:
         | Isn't that why most tutorials on writing hash tables that use
         | this bucketing method recommend using table sizes that are
         | powers of two? I guess most tutorials don't exactly explain
         | _why_ , though, so someone who doesn't understand could decide
         | to not use a power of two without understanding why that's bad.
        
           | xxpor wrote:
           | There's another reason why (mathematically related) to use a
           | power of 2 as well: it makes the modulo a simple and fast
           | mask, instead of a slow division.
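
The identity being exploited is simply that, for a power-of-two `size`, the remainder is the low bits:

```python
# With a power-of-two table size, reducing a hash to a bucket index
# needs only a bitwise AND, since h % size == h & (size - 1).
size = 1 << 4                                  # 16 buckets
for h in (0, 7, 16, 12345, 2**31 - 1):
    assert h % size == h & (size - 1)
```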
        
       | nraynaud wrote:
        | I knew the concept (every post on "I want a random number in
        | a range" mentions it), but I didn't know the name, thanks.
        
       | astrobe_ wrote:
        | I remember very old versions of _man random_ [1] warning
        | against that sort of thing and recommending picking the high
        | (or low) bits of the random value. That advice was probably
        | not correct.
        
         | morelisp wrote:
         | Picking different bits doesn't solve modulo bias.
         | 
         | This is more likely a memory of the warning concerning the low
         | quality of the randomness in the low bits (e.g. in LCGs where
         | it always alternates in the lowest bit), or the fact the high
         | bits of rand(3) were often zero due to a small RAND_MAX.
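
The low-bit weakness is easy to demonstrate with the classic ANSI C LCG constants and a power-of-two modulus (illustrative; real libc implementations vary):

```python
# Multiplier and increment are both odd, so modulo 2**31 the lowest
# output bit simply alternates, regardless of the seed.
a, c, m = 1103515245, 12345, 2**31
x = 42                          # any seed shows the same pattern
low_bits = []
for _ in range(8):
    x = (a * x + c) % m
    low_bits.append(x & 1)
print(low_bits)                 # [1, 0, 1, 0, 1, 0, 1, 0]
```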
        
       | eisbaw wrote:
        | (r_dist * n) / <max value of r_dist> is better than
        | r_dist % n; the idea is that linear scaling should mitigate
        | the bias from the modulo operation.
        
         | yorwba wrote:
         | That just moves the bias to other numbers.
         | 
         | In the example of reducing a byte-sized random value modulo
         | 107, the bias is that 0 can be generated by three different
         | possible inputs (0, 107 and 214), while 42 can only be
         | generated by two (42 and 149; 256 is just out of range), so 0
         | ends up being 50% more common than 42 in the long run.
         | 
         | With your proposed scheme, 0 can be generated by three possible
         | inputs again (0, 1 and 2), while 1 can only be generated by two
         | (3 and 4), so 0 ends up being 50% more common than 1 in the
         | long run.
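
Both claims can be checked exhaustively over all 256 byte values:

```python
from collections import Counter

n = 107
mod_counts = Counter(r % n for r in range(256))         # r % 107
mul_counts = Counter((r * n) >> 8 for r in range(256))  # (r * 107) // 256

# Modulo: residue 0 has 3 preimages (0, 107, 214), 42 has only 2.
assert mod_counts[0] == 3 and mod_counts[42] == 2
# Scaling: output 0 has 3 preimages (0, 1, 2), output 1 has only 2
# (3, 4) -- the bias has just moved, not disappeared.
assert mul_counts[0] == 3 and mul_counts[1] == 2
```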
        
       | exabrial wrote:
       | Interesting. In Java, there's random.nextInt(max). Curious if
       | that takes this into account.
        
         | brazzy wrote:
         | I have zero doubt that it does. This is extremely high
         | visibility code that has been around for decades.
         | 
          | And I also have zero doubt that there are a bunch of
          | reimplementations that don't, from assclowns who don't trust
          | libraries on general principle.
        
           | [deleted]
        
           | anomalroil wrote:
            | It does, but it does not! java.util.Random is not a CSPRNG
            | at all and is terrible, so even though the nextInt() method
            | uses rejection sampling, it still produces biased values
            | and also completely fails to be "unpredictable", because
            | java.util.Random is weak and predictable.
        
             | [deleted]
        
             | the_af wrote:
             | Interesting! How come this wasn't fixed? Nobody noticed, or
             | is it that java.util.Random is not meant for serious
             | cryptographic use?
             | 
             | I know there are other parts of the Java standard lib that
             | are so terrible [1] that people for years have recommended
             | not using them, like anything with dates and timezones...
             | 
             | ---
             | 
             | [1] or used to, haven't kept up with the latest Java
             | versions. Maybe they fixed it.
        
               | brazzy wrote:
               | > is it that java.util.Random is not meant for serious
               | cryptographic use?
               | 
               | Indeed it never was. SecureRandom (a subclass of Random
               | in a different package, grouped with other cryptography
               | related functionality) was introduced in Java 1.1 in
               | 1997.
               | 
               | > I know there are other parts of the Java standard lib
               | that are so terrible [1] that people for years have
               | recommended not using them, like anything with dates and
               | timezones...
               | 
               | The original Java date classes (also from the 1.1 era)
               | were functional and correct, but badly designed. A modern
               | time API was introduced in Java 8 in 2014.
        
               | the_af wrote:
               | > _The original Java date classes (also from the 1.1 era)
               | were functional and correct, but badly designed. A modern
               | time API was introduced in Java 8 in 2014._
               | 
               | I'm pretty sure they were neither correct nor functional.
               | I've read pretty detailed examples of everything they got
               | fundamentally wrong, which was a lot. Wrong, not as "this
               | is cumbersome to use" but as in "this fundamentally
               | misunderstands what dates are at a conceptual level, and
               | produces wrong results". Unfortunately, I would have to
               | google it now.
               | 
               | All I can find right now is Jon Skeet's summary [1],
               | which leans more to the "avoid using java.util.Date
               | because it's one giant pitfall" side of the argument.
               | Though it does mention some fundamental problems with it.
               | However, this is not the article I found back in the day,
               | which was both more comprehensive and more critical.
               | 
               | > _A modern time API was introduced in Java 8 in 2014._
               | 
               | I seem to remember even at that time people were still
               | not recommending Java standard date classes and
               | functions, but I may be misremembering. Or were all the
               | replacements like joda time from a previous era?
               | 
               | ---
               | 
               | [1] https://codeblog.jonskeet.uk/2017/04/23/all-about-
               | java-util-...
        
               | kelnos wrote:
               | > _Or were all the replacements like joda time from a
               | previous era?_
               | 
               | Joda time essentially is the new standard Java date/time
               | implementation. Joda was used as an out-of-stdlib proving
               | ground to try out ideas and get feedback. The new Java
               | date/time API is based on the learnings from Joda time.
               | 
                | My understanding from using the new (I shouldn't say
                | "new"; it's been there since Java 8) date/time classes
                | is that they're pretty good at representing dates,
                | times, calendars, etc., but I wouldn't claim to be an
                | expert on such matters.
        
               | brazzy wrote:
               | The Jon Skeet article you're citing mentions only
               | java.util.Date, and it is in fact absolutely fair to say
               | that this class "fundamentally misunderstands what dates
               | are at a conceptual level" in all of its functionality
               | that goes beyond being a thin wrapper around a number of
               | milliseconds.
               | 
                | But note that java.util.Date is from Java 1.0. The
                | mistake was recognized, and in Java 1.1 we got
                | java.util.Calendar and friends, which, as far as I
                | know, does qualify as "functional and correct" but was
                | ugly and annoying to use, as well as missing many
                | useful features (like the concept of a timespan, or
                | even just a date without a time-of-day).
        
               | the_af wrote:
               | (I assure you I'm not picking on you or your answer!)
               | 
               | I seem to remember the article which I unfortunately
               | cannot conjure up right now demolished java.util.Calendar
               | as well (again, as in "wrong", not as in "difficult to
               | use"). I'm pretty confident of this because I never used
               | Java 1.x at work, and when I read the article it was
               | relevant to my day job.
               | 
               | Alas, memory fails me.
        
               | edflsafoiewq wrote:
               | The contract of Random requires determinism, so the
               | algorithm can never be changed.
               | 
               | > If two instances of Random are created with the same
               | seed, and the same sequence of method calls is made for
               | each, they will generate and return identical sequences
               | of numbers. In order to guarantee this property,
               | particular algorithms are specified for the class Random.
               | Java implementations must use all the algorithms shown
               | here for the class Random, for the sake of absolute
               | portability of Java code
        
               | googlryas wrote:
               | It isn't terrible, it is just not suitable for
               | cryptographic applications. I suspect most of the time
               | that people want a random number up to some value, it is
               | _not_ for cryptographic purposes.
        
               | the_af wrote:
               | > _It isn 't terrible_
               | 
               | Mind you, I didn't call it terrible. However, the comment
               | I was replying to -- coincidentally the author of TFA on
               | modulo bias we are discussing -- did:
               | 
               | > _java.util.Random is not a CSPRNG at all and is
               | terrible, so even tho the nextInt() method is using
               | rejection sampling, it 's still producing biased values
               | and also completely fails to be "unpredictable" because
               | java.util.Random is weak and predictable._
               | 
               | and
               | 
               | > _[...] using fast, but bad random generators such as
               | Java 's Random was shown to be an issue multiple times in
               | the past already for things such as Monte Carlo, and so
               | on, not just for things related to security._
               | 
               | Of course, "CSPRNG" means "cryptographically secure", but
               | his comment made me think he considers it a terrible
               | implementation regardless.
        
               | brazzy wrote:
               | >his comment made me think he considers it a terrible
               | implementation regardless.
               | 
               | Different people have different standards for what is
               | "terrible". There are much better PRNGs that are just as
               | fast. You should probably avoid java.util.Random if you
               | care about the quality of the randomness it produces.
               | 
               | It's good enough for games that don't involve real money.
        
               | the_af wrote:
               | Well, the person who claims it's terrible is the author
               | of the very thorough article we are discussing.
               | 
               | > _You should probably avoid java.util.Random if you care
               | about the quality of the randomness it produces._
               | 
               | That's pretty much the definition of "bad" ;) If you
               | don't care about quality, I assume you could use a string
               | of numbers without any randomness at all.
               | 
               | As for games, it depends. I'm sure one could make a guess
               | it's bad for some kinds of non-betting games and
               | simulations as well.
        
               | dwheeler wrote:
               | Historically one of the first uses for computers was
               | Monte Carlo simulations. In such simulations it's often
               | important to be able to deterministically _recreate_
               | sequences of  "random" numbers. Thus, in almost all
               | computer programming languages, "Random" means
               | deterministic random values. This is a wide convention
               | followed by practically _every_ programming language.
               | 
               | If you want cryptographically secure random numbers, you
               | typically call a function with a different name, one that
               | has "secure" or "crypto" in a name somewhere (e.g., in
               | the function or containing module/package).
               | 
               | This is a convention from the 1950s and has been
               | consistent ever since. That naming-convention ship sailed
               | before many of us were born.
        
               | anomalroil wrote:
               | Speaking of Monte Carlo, here's a paper about how using
               | bad randomness can lead to wrong simulation:
               | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2992609/ (or
               | should I call them _biased_ ones?)
        
               | turtledragonfly wrote:
               | And as a counterpoint, if you use a well-chosen
               | deterministic sequence, it can make some Monte Carlo
               | methods produce better results faster than true
               | randomness would (quasirandom sequences and all that).
               | 
               | So, interestingly, "bad randomness" =/= deterministic.
               | 
               | (not that you were claiming that, just adding this
               | subtlety to the mix)
        
               | the_af wrote:
               | > _Thus, in almost all computer programming languages,
               | "Random" means deterministic random values._
               | 
               | Yes, I know how pseudo RNG works.
               | 
               | I just didn't realize this ran counter to doing secure
               | crypto, so thanks for the explanation anyway :)
        
               | anomalroil wrote:
               | As you said: Java's Random is not meant for serious
               | cryptographic usage, it's meant to be super fast. You
               | have SecureRandom instead for cryptographic usage, and
               | it's noticeably slower. Interestingly, using fast, but
               | bad random generators such as Java's Random was shown to
               | be an issue multiple times in the past already for things
               | such as Monte Carlo, and so on, not just for things
               | related to security.
        
             | brazzy wrote:
             | That's why we have
             | 
             | public class SecureRandom extends Random
        
           | [deleted]
        
       | makobado wrote:
       | This is going to be fun at work
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-09-15 23:01 UTC)