[HN Gopher] The definitive guide to "modulo bias" and how to avo...
___________________________________________________________________
The definitive guide to "modulo bias" and how to avoid it (2020)
Author : isomorph
Score : 121 points
Date : 2022-09-15 09:09 UTC (13 hours ago)
(HTM) web link (research.kudelskisecurity.com)
(TXT) w3m dump (research.kudelskisecurity.com)
| [deleted]
| rrobukef wrote:
| Rejection sampling moves the information leak from bias to time.
| As always don't use your own cryptography in production.
| cronokirby wrote:
| If you have a random stream of bits, and you use rejection
| sampling to extract a value from that stream, then you don't
| reveal any information about the value. At most, you reveal
| information about the stream prior to the value you chose, but
| each bit of a secure RNG should be unrelated to all prior bits,
| so this is not an issue.
| omegalulw wrote:
| But you do? You expose some information on the range of the
| value via the time it took to sample assuming for example
| that the attacker knows the rejection sampling method in use.
| cronokirby wrote:
| That is true.
|
| That said, there are few situations where the modulus being
| used is not a public parameter of a protocol, and it is
| very difficult to perform operations with a secret modulus
| in constant-time, as your comment points out.
|
| You'll always be able to get an approximate guess of the
| size of the modulus too, since larger moduli will need more
| registers to represent data.
| some_furry wrote:
| We don't consider "leaks the length of a SHA256 hash" to
| be a valid timing attack in most protocols for similar
| reasons (i.e. it's public knowledge).
|
| When developers encounter timing attacks in their code,
| they often invent really dumb ways to side-step the
| length "leaking".
|
| This might be understandable if it was a MAC then Encrypt
| protocol with PKCS padding (hello lucky13), but instead
| this comes up in the context of "validate this HMAC-
| SHA256 tag for our JWT-like protocol".
| LudwigNagasena wrote:
| A cleaner example without bit masking:
| https://github.com/openbsd/src/blob/master/lib/libc/crypt/ar...
| fanf2 wrote:
| I was hoping for a mention of Daniel Lemire's nearly-
| divisionless algorithm for unbiased sampling. I recently
| replaced BIND's copy of OpenBSD arc4random_uniform() with
| Lemire's algorithm https://gitlab.isc.org/isc-
| projects/bind9/-/blob/main/lib/is... and I blogged on the
| subject a couple of times https://dotat.at/@/2020-10-29-nearly-
| divisionless-random-num...
| https://dotat.at/@/2022-04-20-really-divisionless.html
| anomalroil wrote:
| Lemire's technique is really nice, in general a good thing to
| learn about, since it's a bit mind bending how it's playing
| with intervals. Sadly last time I benchmarked it in code on
| x86-64 for cryptographic purposes, it wasn't faster than
| rejection sampling, or just using a large value and a modulo
| reduction: in all cases what is actually taking a lot of time
| is the call to get good quality randomness out of a CSPRNG,
| the rest being negligible in comparison.
| anomalroil wrote:
| Notice that nowadays, unlike 2 years ago, people usually
| recommend to use the last technique I presented there in the last
| paragraph before the Conclusion. Which is to generate a random
| value that is _big enough_ , so that n-log(p) > 128 so that the
| bias will be too small to be exploitable in practice. It's
| simpler and unlike rejection sampling ensures your code cannot
| fall into an infinite loop in case your PRNG is broken. (I'd
| argue you might want your code to fail in that case anyway, but
| YMMV.)
| stephencanon wrote:
| The other virtue of this technique is that some of the more
| popular fast rejection sampling methods (e.g. Lemire's "nearly
| divisionless") leak a small amount of information via timing
| side channel, because the divisionless fast path is _not_
| unbiased.
| tarakat wrote:
| > You shouldn't rely on these claims, because even 1 bit of bias
| on a 256 bit nonce value can be enough to attack certain
| cryptographic schemes!
|
| This seems like a very high bar for a random generator to clear.
| It also raises a question: would using a larger nonce size
| actually _increase_ risk, if the additional bits were biased?
| barsonme wrote:
| With some schemes, like ECDSA, you can't use a larger nonce
| since the nonce is a field element.
|
| In general, you shouldn't need to worry about it unless you're
| using a broken CSPRNG or a bad cryptography library. And some
| libraries will try and work around bad RNGs:
| https://cs.opensource.google/go/go/+/refs/tags/go1.19.1:src/...
| jstanley wrote:
| It _seems_ like the general case answer is "no, using a larger
| nonce does not increase risk".
|
| Otherwise an attacker could just imagine that instead of a
| 256-bit nonce, the nonce was actually 257 bits long but the
| first bit is always 0.
| barbegal wrote:
| The claim "even 1 bit of bias on a 256 bit nonce value can be
| enough to attack certain cryptographic schemes" is true but there
| have been no practical attacks against 256 bit secured schemes.
| And the example of the mod 107 issue only reduces the amount of
| bits from 6.74bits to 6.71bits (a reduction of 0.4%) so hardly
| worth worrying about in the real world.
| 10000truths wrote:
| If you need to generate multiple such random numbers, an
| alternative way to resolve modulo bias with minimal entropy waste
| is to batch the random number generation. For example, you can
| generate 6 random integers in [0, 100) by generating a random
| integer in [0, 100^6) and performing modulo reductions on the
| result to get your 6 integers. 100^6 is slightly less than 256^5,
| so if your RNG works in units of bytes, then you can use 5 bytes
| to generate 6 integers in [0, 100) instead of 6 bytes.
| [deleted]
| xxpor wrote:
| Everything here is focused on cryptography, but there's other
| common issues that can fall into this trap too. For example,
| naive implementations of a hash function for a basic array +
| linked list hash table. If you generate a hash, and then modulo
| that to pick a hash bucket, you could end up with a biased
| distribution across the buckets, and customers complaining your
| performance varies wildly based on the input value.
|
| Just more things to think about :)
| kelnos wrote:
| Isn't that why most tutorials on writing hash tables that use
| this bucketing method recommend using table sizes that are
| powers of two? I guess most tutorials don't exactly explain
| _why_ , though, so someone who doesn't understand could decide
| to not use a power of two without understanding why that's bad.
| xxpor wrote:
| There's another reason why (mathematically related) to use a
| power of 2 as well: it makes the modulo a simple and fast
| mask, instead of a slow division.
| nraynaud wrote:
| I knew the concept (every post on "I want a randome number in a
| range" mentions it), but I didn't know the name, thanks.
| astrobe_ wrote:
| I remember very old versions of _man random_ [1] warning against
| that sort of thing and recommending to pick the high (or low)
| bits of the random value. Probably the piece of advice was not
| correct.
| morelisp wrote:
| Picking different bits doesn't solve modulo bias.
|
| This is more likely a memory of the warning concerning the low
| quality of the randomness in the low bits (e.g. in LCGs where
| it always alternates in the lowest bit), or the fact the high
| bits of rand(3) were often zero due to a small RAND_MAX.
| eisbaw wrote:
| (r_dist * n) / <max value of r_dist> is better than r_dist % n
| that linear scaling should mitigate such bias from modulo bias.
| yorwba wrote:
| That just moves the bias to other numbers.
|
| In the example of reducing a byte-sized random value modulo
| 107, the bias is that 0 can be generated by three different
| possible inputs (0, 107 and 214), while 42 can only be
| generated by two (42 and 149; 256 is just out of range), so 0
| ends up being 50% more common than 42 in the long run.
|
| With your proposed scheme, 0 can be generated by three possible
| inputs again (0, 1 and 2), while 1 can only be generated by two
| (3 and 4), so 0 ends up being 50% more common than 1 in the
| long run.
| exabrial wrote:
| Interesting. In Java, there's random.nextInt(max). Curious if
| that takes this into account.
| brazzy wrote:
| I have zero doubt that it does. This is extremely high
| visibility code that has been around for decades.
|
| And I also have zero doubts that there are a bunch of
| reimplementations that don't, from assclowns who don't trust
| libraries on general principle.
| [deleted]
| anomalroil wrote:
| It does but it does not! java.util.Random is not a CSPRNG at
| all and is terrible, so even tho the nextInt() method is
| using rejection sampling, it's still producing biased values
| and also completely fails to be "unpredictable" because
| java.util.Random is weak and predictable.
| [deleted]
| the_af wrote:
| Interesting! How come this wasn't fixed? Nobody noticed, or
| is it that java.util.Random is not meant for serious
| cryptographic use?
|
| I know there are other parts of the Java standard lib that
| are so terrible [1] that people for years have recommended
| not using them, like anything with dates and timezones...
|
| ---
|
| [1] or used to, haven't kept up with the latest Java
| versions. Maybe they fixed it.
| brazzy wrote:
| > is it that java.util.Random is not meant for serious
| cryptographic use?
|
| Indeed it never was. SecureRandom (a subclass of Random
| in a different package, grouped with other cryptography
| related functionality) was introduced in Java 1.1 in
| 1997.
|
| > I know there are other parts of the Java standard lib
| that are so terrible [1] that people for years have
| recommended not using them, like anything with dates and
| timezones...
|
| The original Java date classes (also from the 1.1 era)
| were functional and correct, but badly designed. A modern
| time API was introduced in Java 8 in 2014.
| the_af wrote:
| > _The original Java date classes (also from the 1.1 era)
| were functional and correct, but badly designed. A modern
| time API was introduced in Java 8 in 2014._
|
| I'm pretty sure they were neither correct nor functional.
| I've read pretty detailed examples of everything they got
| fundamentally wrong, which was a lot. Wrong, not as "this
| is cumbersome to use" but as in "this fundamentally
| misunderstands what dates are at a conceptual level, and
| produces wrong results". Unfortunately, I would have to
| google it now.
|
| All I can find right now is Jon Skeet's summary [1],
| which leans more to the "avoid using java.util.Date
| because it's one giant pitfall" side of the argument.
| Though it does mention some fundamental problems with it.
| However, this is not the article I found back in the day,
| which was both more comprehensive and more critical.
|
| > _A modern time API was introduced in Java 8 in 2014._
|
| I seem to remember even at that time people were still
| not recommending Java standard date classes and
| functions, but I may be misremembering. Or were all the
| replacements like joda time from a previous era?
|
| ---
|
| [1] https://codeblog.jonskeet.uk/2017/04/23/all-about-
| java-util-...
| kelnos wrote:
| > _Or were all the replacements like joda time from a
| previous era?_
|
| Joda time essentially is the new standard Java date/time
| implementation. Joda was used as an out-of-stdlib proving
| ground to try out ideas and get feedback. The new Java
| date/time API is based on the learnings from Joda time.
|
| My understanding from use of the new (I shouldn't say
| "new"; it's been there since Java 8) date/time classes
| are that they're pretty good at representing dates,
| times, calendars, etc., but I wouldn't claim to be an
| expert on such matters.
| brazzy wrote:
| The Jon Skeet article you're citing mentions only
| java.util.Date, and it is in fact absolutely fair to say
| that this class "fundamentally misunderstands what dates
| are at a conceptual level" in all of its functionality
| that goes beyond being a thin wrapper around a number of
| milliseconds.
|
| But note that java.util.Date is from Java 1.0. The
| mistake was recognized and in Java 1.1 we got
| java.util.Calendar and friends, which is, as far as I
| know, does qualify as "functional and correct" but was
| ugly and annoying to use, as well as missing many useful
| features (like the concept of a timespan, or even just a
| date without a time-of-day).
| the_af wrote:
| (I assure you I'm not picking on you or your answer!)
|
| I seem to remember the article which I unfortunately
| cannot conjure up right now demolished java.util.Calendar
| as well (again, as in "wrong", not as in "difficult to
| use"). I'm pretty confident of this because I never used
| Java 1.x at work, and when I read the article it was
| relevant to my day job.
|
| Alas, memory fails me.
| edflsafoiewq wrote:
| The contract of Random requires determinism, so the
| algorithm can never be changed.
|
| > If two instances of Random are created with the same
| seed, and the same sequence of method calls is made for
| each, they will generate and return identical sequences
| of numbers. In order to guarantee this property,
| particular algorithms are specified for the class Random.
| Java implementations must use all the algorithms shown
| here for the class Random, for the sake of absolute
| portability of Java code
| googlryas wrote:
| It isn't terrible, it is just not suitable for
| cryptographic applications. I suspect most of the time
| that people want a random number up to some value, it is
| _not_ for cryptographic purposes.
| the_af wrote:
| > _It isn 't terrible_
|
| Mind you, I didn't call it terrible. However, the comment
| I was replying to -- coincidentally the author of TFA on
| modulo bias we are discussing -- did:
|
| > _java.util.Random is not a CSPRNG at all and is
| terrible, so even tho the nextInt() method is using
| rejection sampling, it 's still producing biased values
| and also completely fails to be "unpredictable" because
| java.util.Random is weak and predictable._
|
| and
|
| > _[...] using fast, but bad random generators such as
| Java 's Random was shown to be an issue multiple times in
| the past already for things such as Monte Carlo, and so
| on, not just for things related to security._
|
| Of course, "CSPRNG" means "cryptographically secure", but
| his comment made me think he considers it a terrible
| implementation regardless.
| brazzy wrote:
| >his comment made me think he considers it a terrible
| implementation regardless.
|
| Different people have different standards for what is
| "terrible". There are much better PRNGs that are just as
| fast. You should probably avoid java.util.Random if you
| care about the quality of the randomness it produces.
|
| It's good enough for games that don't involve real money.
| the_af wrote:
| Well, the person who claims it's terrible is the author
| of the very thorough article we are discussing.
|
| > _You should probably avoid java.util.Random if you care
| about the quality of the randomness it produces._
|
| That's pretty much the definition of "bad" ;) If you
| don't care about quality, I assume you could use a string
| of numbers without any randomness at all.
|
| As for games, it depends. I'm sure one could make a guess
| it's bad for some kinds of non-betting games and
| simulations as well.
| dwheeler wrote:
| Historically one of the first uses for computers was
| Monte Carlo simulations. In such simulations it's often
| important to be able to deterministically _recreate_
| sequences of "random" numbers. Thus, in almost all
| computer programming languages, "Random" means
| deterministic random values. This is a wide convention
| followed by practically _every_ programming language.
|
| If you want cryptographically secure random numbers, you
| typically call a function with a different name, one that
| has "secure" or "crypto" in a name somewhere (e.g., in
| the function or containing module/package).
|
| This is a convention from the 1950s and has been
| consistent ever since. That naming-convention ship sailed
| before many of us were born.
| anomalroil wrote:
| Speaking of Monte Carlo, here's a paper about how using
| bad randomness can lead to wrong simulation:
| https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2992609/ (or
| should I call them _biased_ ones?)
| turtledragonfly wrote:
| And as a counterpoint, if you use a well-chosen
| deterministic sequence, it can make some Monte Carlo
| methods produce better results faster than true
| randomness would (quasirandom sequences and all that).
|
| So, interestingly, "bad randomness" =/= deterministic.
|
| (not that you were claiming that, just adding this
| subtlety to the mix)
| the_af wrote:
| > _Thus, in almost all computer programming languages,
| "Random" means deterministic random values._
|
| Yes, I know how pseudo RNG works.
|
| I just didn't realize this ran counter to doing secure
| crypto, so thanks for the explanation anyway :)
| anomalroil wrote:
| As you said: Java's Random is not meant for serious
| cryptographic usage, it's meant to be super fast. You
| have SecureRandom instead for cryptographic usage, and
| it's noticeably slower. Interestingly, using fast, but
| bad random generators such as Java's Random was shown to
| be an issue multiple times in the past already for things
| such as Monte Carlo, and so on, not just for things
| related to security.
| brazzy wrote:
| That's why we have
|
| public class SecureRandom extends Random
| [deleted]
| makobado wrote:
| This is going to be fun at work
| [deleted]
___________________________________________________________________
(page generated 2022-09-15 23:01 UTC)