[HN Gopher] Evolution of Random Number Generators
       ___________________________________________________________________
        
       Evolution of Random Number Generators
        
       Author : algui91
       Score  : 125 points
       Date   : 2021-04-30 09:11 UTC (2 days ago)
        
 (HTM) web link (www.johndcook.com)
 (TXT) w3m dump (www.johndcook.com)
        
       | TekMol wrote:
       | I am looking for a simple random number generator that fills a
       | given amount of slots.
       | 
       | Say it is initialized with "size=5" then it might output:
       | 
       | 3,5,2,1,4
       | 
       | Is there something like this?
       | 
        | It does not need much statistical resemblance to randomness. It
        | just needs to look kind of random to the eye. And the code
        | should be short. A few lines of Javascript or so.
       | 
        | One approach might be to just loop through the sequence
        | (1,2,3,4,5) and xor each number with some other number? Maybe
        | with 0101010...?
        
         | Someone wrote:
         | Xoring won't work:
         | 
         | - xoring will only produce permutations that have period two,
         | and every element will be part of a 2-cycle.
         | 
         | - if your n isn't a power of two, it may produce numbers larger
         | than n
         | 
         | The set of 2-cycles will have a lot of structure, too (for
         | example, if your magic constant is even, all cycles will have
         | either two even numbers or two odd numbers; if it is odd, all
         | cycles will have an even and an odd number), but the above
         | should already be bad enough to drop that idea.
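          | 
          | As a quick illustration of the range problem (taking n = 5
          | and an arbitrary key of 3, purely for the example):
          | 
          |     [1, 2, 3, 4, 5].map(x => x ^ 3)  // -> [2, 1, 0, 7, 6]
          |     // 0, 6 and 7 all fall outside 1..5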
        
           | TekMol wrote:
            | "if your n isn't a power of two, it may produce numbers
            | larger than n"
           | 
           | This is the biggest problem.
           | 
            | The others are not such big problems, as I don't need a
            | strong resemblance to real randomness.
        
             | Someone wrote:
             | I can't look into your use case, but I'm not sure you
             | realize how spectacularly bad xoring with a fixed value is
             | as a way to generate a 'random' permutation.
             | 
             | For example, it will still alternate odd and even numbers.
             | 
             | Also, an adversary can derive the xor key from a single
             | sample, and predict the entire sequence from it.
        
         | jchook wrote:
         | The equivalence classes modulo some n form a partition of the
         | integers so you can use that to create a very efficient
         | solution with very little code.
         | 
         | Here is a neat explanation:
         | 
         | https://preshing.com/20121224/how-to-generate-a-sequence-of-...
         | 
         | If you need even better properties (eg cryptographically
         | secure) you can also look at PCG with k-dimensional
         | equidistribution:
         | 
         | http://www.pcg-random.org/index.html
        
         | amitp wrote:
         | Yes, xor can be used as part of this -- see the paper[1] and
         | code[2]. The idea is that you're generating a random _shuffle_
          | of a sequence, but you can do it in a way that _doesn't_
         | require storing the entire shuffled sequence. You can ask
         | "what's in position 4" and it will return "item 1".
         | 
         | [1]: http://pcgworkshop.com/archive/mawhorter2019anarchy.pdf
         | [2]: https://github.com/solsword/anarchy
        
         | jlouis wrote:
         | If size is a prime, repeated multiplication on a generator
         | element (any element) will work if you compute `mod p`. But
         | it's not going to have nice random properties for the most
         | part. It's just a "permutation".
         | 
          | The reason it works is that the order of the subgroup
          | generated by the element has to divide the group order, and
          | the only divisors are 1 and p. So unless it's trivial (i.e.,
          | the element 1), it's going to be p, and thus it generates the
          | whole group in p steps.
        
           | nestorD wrote:
            | From memory you don't need size to be a prime; what you
            | need is for it to be _coprime_ with your step.
            | 
            | Thus, the easiest solution is to take a step size that is
            | prime and does not divide your size (that works for any
            | size). Given the index of the current element, the next
            | index would be `(current_index + step) % size`.
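            | 
            | A minimal sketch of that walk in Javascript (function and
            | parameter names are mine, just for illustration; step is
            | assumed coprime with size):
            | 
            |     // value at a given position, no stored array needed
            |     function valueAt(position, size, step, offset) {
            |       return ((offset + position * step) % size) + 1;
            |     }
            |     // size = 5, step = 3, offset = 0 -> 1, 4, 2, 5, 3
            |     for (let i = 0; i < 5; i++) {
            |       console.log(valueAt(i, 5, 3, 0));
            |     }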
        
         | thewakalix wrote:
         | Are you talking about generating a random permutation?
        
           | TekMol wrote:
           | I don't think so. Maybe a "random sequence" could be a name
           | for it. But not sure.
        
             | thewakalix wrote:
             | What I mean is, in your example, do you need to have
             | precisely the numbers 1, 2, 3, 4, and 5, but in any order?
             | Or is the constraint something else?
        
               | TekMol wrote:
               | Ah yes, you are right. In that sense, it is a random
               | permutation of the numbers from 1 to max.
               | 
               | Yes, when initialized with "size=5" the numbers in the
               | output must be precisely 1,2,3,4,5 but in random looking
               | order.
               | 
                | And I don't want to store the whole sequence. I want to
                | call gimmeTheNumberAtPosition(x) and have it return
                | just that number.
        
               | e12e wrote:
               | I guess you could predictably enumerate the permutations,
               | so that you could lookup with parameters size/n,
               | position/i, seed/r - where r is the permutation.
               | 
                | For size=1, all parameters are 1, and the only result
                | is 1.
               | 
                | For size 2, r can be 1 or 2, naming the permutations
                | [1,2];[2,1] - and e.g. rand_at(n=2,r=2,i=2) would return
                | 1, but i=1 would return 2.
               | 
               | I'm not sure how I'd implement this - I suppose it might
               | be possible to generate a predictable "walk" based on
               | n/r/i?
               | 
                | But for sizes less than, say, a million I would think
                | that a shuffled array would be easier?
               | 
               | r would need to be in the range 1..n!
        
               | resolvent wrote:
               | So you want a mapping of [1, n] to a permutation of [1,
               | n] without having to generate the list [1, n] and
                | shuffling it. An affine transformation modulo n should
               | work. Instead of [1, n], look at [0, n). Find a value `a`
               | in [0, n) such that gcd(a, n) = 1 and pick a random
               | integer b in [0, n). Then the `random` number at each
               | position is `a * x + b mod n`.
               | 
               | This is simple and fast, but is not secure at all. You
               | can solve for a, b by solving the linear congruence. It
               | also does not generate every permutation of `n`. For n =
               | 5, only 20 sequences can be found out of 5! = 120.
        
               | johndough wrote:
                | The value `a` has to fulfill some more properties:
                | https://en.wikipedia.org/wiki/Linear_congruential_generator#...
               | 
               | Since the "randomness" of the permutation is not that
               | great, I was looking for something better, but could not
               | find anything. The closest I got was https://en.wikipedia
               | .org/wiki/Xorshift#xoshiro_and_xoroshiro which only works
               | for powers of two. A workaround would be to choose the
               | next larger power of two and reject all random values
               | which are smaller than `n`, but that introduces
               | unpredictable latency and destroys the cool jump-ahead
               | feature.
        
               | resolvent wrote:
                | The LCG is the recursive extension, which requires
                | computing the values x_0 = seed through x_n in order to
                | calculate x_{n+1}. What I am talking about is just a
                | mapping of index x -> f(x), which is what the OP seemed
                | to want. In that case, `a` only needs to be relatively
                | prime to n. This cannot account for every permutation
                | of n, since the number of values relatively prime to n,
                | the totient, has an upper bound of n, which bounds the
                | possible values for a. The number of values for b is
                | also n, so at most n^2 possible sequences are
                | generated, which is less than n! for n > 3.
               | 
               | For example, with n = 5: Let a = 3 and b = 2. x = [0, 1,
               | 2, 3, 4], a * x = [0, 3, 6, 9, 12], a * x + b = [2, 5, 8,
               | 11, 14], a * x + b mod n = [2, 0, 3, 1, 4]
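                | 
                | The same example as a few lines of Javascript (a sketch
                | of the affine mapping described above, not a secure
                | shuffle; the names are mine):
                | 
                |     // affine index -> value map; a must be coprime to n
                |     function permute(x, n, a, b) {
                |       return (a * x + b) % n;
                |     }
                |     // n = 5, a = 3, b = 2  ->  2, 0, 3, 1, 4
                |     for (let x = 0; x < 5; x++) {
                |       console.log(permute(x, 5, 3, 2));
                |     }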
        
               | CipherThrowaway wrote:
               | What you want is a cipher from 1..N to 1..N. This problem
               | is often approached as an encryption problem known as
               | Format Preserving Encryption. The Sometimes Recurse
               | Shuffle linked elsewhere in this thread is one solution
                | but the most foundational and accessible paper on this
                | is Black and Rogaway's "Ciphers with Arbitrary Finite
                | Domains". The Generalized Feistel Network (Method 3) is
                | fast and easy to implement.
               | 
               | https://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf
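                | 
                | For flavor, a rough Javascript sketch of the cycle-
                | walking idea (a toy Feistel with a made-up round
                | function and constants, not the construction from the
                | paper; assumes n fits well within 30 bits):
                | 
                |     // permutation of 1..n with random access by index
                |     function makePermutation(n, seed) {
                |       // smallest even bit width covering n
                |       let bits = 1;
                |       while ((1 << bits) < n) bits++;
                |       if (bits % 2) bits++;
                |       const half = bits / 2, mask = (1 << half) - 1;
                | 
                |       // toy round function: mix right half with seed
                |       function round(r, i) {
                |         let x = (r ^ seed ^ (i * 0x9e3779b9)) >>> 0;
                |         x = Math.imul(x ^ (x >>> 15), 0x85ebca6b);
                |         x = Math.imul(x ^ (x >>> 13), 0xc2b2ae35);
                |         return (x ^ (x >>> 16)) & mask;
                |       }
                | 
                |       // 4-round balanced Feistel: permutes [0, 2^bits)
                |       function feistel(x) {
                |         let left = (x >>> half) & mask, right = x & mask;
                |         for (let i = 0; i < 4; i++) {
                |           const next = left ^ round(right, i);
                |           left = right;
                |           right = next;
                |         }
                |         return (left << half) | right;
                |       }
                | 
                |       // cycle-walk until the value lands back in [0, n)
                |       return function valueAt(pos) {
                |         let x = pos;
                |         do { x = feistel(x); } while (x >= n);
                |         return x + 1;  // 1-based, i.e. 1..n
                |       };
                |     }
                | 
                |     // e.g. a seed-dependent permutation of 1..5
                |     const p = makePermutation(5, 12345);
                |     console.log([0, 1, 2, 3, 4].map(p));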
        
               | FabHK wrote:
               | Generating random permutations is quite a different
               | problem from creating random numbers. The standard algo
               | for random permutations is the Fisher-Yates Shuffle.
               | 
               | https://en.wikipedia.org/wiki/Fisher-Yates_shuffle
               | 
               | ETA: As you don't want to store the permutation, you
               | might want to pick a number randomly from 1 to n!, and
               | then generate the permutation on the fly up to the
               | desired element, using the techniques outlined here:
               | 
                | https://stackoverflow.com/questions/29236556/how-can-i-calcu...
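                | 
                | A sketch of that idea in Javascript (my own helper,
                | decoding the index via the factorial number system /
                | Lehmer code; each query still does O(n) work, so it
                | only makes sense for small n):
                | 
                |     // element at position i of the r-th permutation
                |     // of 1..n, with r in [0, n!)
                |     function elementAt(r, n, i) {
                |       const rest =
                |         Array.from({length: n}, (_, k) => k + 1);
                |       let f = 1;                        // f = (n-1)!
                |       for (let k = 2; k < n; k++) f *= k;
                |       for (let pos = 0; pos < n; pos++) {
                |         const idx = Math.floor(r / f);
                |         r -= idx * f;
                |         const value = rest.splice(idx, 1)[0];
                |         if (pos === i) return value;
                |         f /= (n - 1 - pos);
                |       }
                |     }
                | 
                |     // r = 4, n = 3 is the permutation [3, 1, 2]
                |     console.log(elementAt(4, 3, 0), elementAt(4, 3, 1));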
        
               | SuchAnonMuchWow wrote:
                | Storing a number of the order of magnitude of n! will
                | take the same amount of memory as storing a permutation
                | of n elements, so what's the point?
        
               | HelloNurse wrote:
               | If one really wants to do it the slow and hard way, it's
               | possible to compute a random permutation of n elements
               | one element at a time with n bits of storage (less with
               | fairly generic compression techniques): just remember the
               | set of numbers that have been produced so far and
               | sequentially pick one of the remaining ones. You'll waste
               | some combination of additional memory (for arithmetic
               | decoding state) and random bits (for rejected samples),
               | but materializing a large randomly permuted vector should
               | be slower.
        
               | FabHK wrote:
                | Good point. Storing a number of magnitude n! will take n
                | log n bits, and storing n numbers up to n will also take
                | n log n bits.
               | 
               | In other words, good old Fisher Yates, storing the
               | resulting permutation, and a simple lookup is probably
               | the way to go.
        
               | kinos wrote:
               | for JS you can probably just do
               | `[...Array(n).keys()].sort(()=>Math.random() - 0.5)` ?
        
               | matsemann wrote:
                | Having an unstable comparator might wreak havoc and
                | maybe not give a properly distributed space of possible
                | results?
               | 
               | Edit: yes, here is an example (in the comments) of how
               | biased this is:
               | https://stackoverflow.com/a/18650169/923847
        
               | anderskaseorg wrote:
               | That is exactly the algorithm that Microsoft used to bias
               | its purportedly random "browser choice" screen. Don't use
               | it.
               | 
                | https://www.robweir.com/blog/2010/02/microsoft-random-browse...
               | 
               | The Fisher-Yates shuffle is the right way to shuffle an
               | array in an unbiased way.
               | 
                | https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
        
               | isaac21259 wrote:
               | If your value of size is a power of two you could just
               | encrypt x with a block cipher where the block size is
               | equal to size and the key is your seed.
        
               | nightcracker wrote:
               | You can construct arbitrarily-sized permutations with
               | log(n) time random access using Sometimes-Recurse
               | Shuffle: https://eprint.iacr.org/2013/560.pdf.
        
         | vince14 wrote:
         | https://news.ycombinator.com/item?id=25497357
        
         | Ayesh wrote:
         | Probably looking for this:
         | https://en.m.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
        
           | TekMol wrote:
           | This seems to use an external random number generator.
           | 
            | I want the code to be complete and reproducible. So
            | gimmeTheNumberAtPosition(x) will always return the same
            | value for the same x.
           | 
           | Also, FYShuffle is much more complex than I would like the
           | algo to be.
        
             | miked85 wrote:
             | Are you saying you want a reproducible random number
             | generator?
        
             | jhgb wrote:
             | Fisher-Yates is one of the simplest things out there IMO.
             | 
             | Isn't the Lehmer code the thing that you want? Then you can
             | specify your permutation with an integer and extract the
             | individual digits from it repeatedly.
        
               | TekMol wrote:
                | "Fisher-Yates is one of the simplest things out there
                | IMO"
               | 
               | Just using "$n xor $something" would be simpler. I am
               | just not sure yet, what a good "$something" is.
        
               | jhgb wrote:
               | How does this guarantee the desired property of the
               | output being a permutation?
        
             | jimktrains2 wrote:
             | If you seed your random number generator, then it is
             | reproducible.
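              | 
              | A sketch of that combination in Javascript (a small
              | Park-Miller-style seeded generator plus Fisher-Yates; it
              | does store the array once, and all names are mine):
              | 
              |     // tiny seeded PRNG, reproducible for a given seed
              |     function makeRandom(seed) {
              |       let state = seed % 2147483647;
              |       if (state <= 0) state += 2147483646;
              |       return function () {
              |         state = (state * 16807) % 2147483647;
              |         return (state - 1) / 2147483646;
              |       };
              |     }
              | 
              |     // seeded Fisher-Yates shuffle of 1..n
              |     function shuffled(n, seed) {
              |       const rand = makeRandom(seed);
              |       const a =
              |         Array.from({length: n}, (_, i) => i + 1);
              |       for (let i = n - 1; i > 0; i--) {
              |         const j = Math.floor(rand() * (i + 1));
              |         [a[i], a[j]] = [a[j], a[i]];
              |       }
              |       return a;
              |     }
              | 
              |     // same seed -> same permutation every time
              |     const perm = shuffled(5, 42);
              |     console.log(perm[3]);  // "number at position 3"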
        
             | [deleted]
        
         | jamespwilliams wrote:
         | If you're using Go, you can use rand.Perm
         | (https://golang.org/pkg/math/rand/#Perm), which uses a Fisher-
         | Yates shuffle under the hood.
        
         | NotCamelCase wrote:
         | Have you looked up LFSRs yet?
         | https://en.wikipedia.org/wiki/Linear-feedback_shift_register
         | 
         | They are quite efficient for both HW and SW implementations.
        
         | wizeman wrote:
         | Look into linear feedback shift registers, they are very simple
         | and can produce random-looking permutations.
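          | 
          | A sketch in Javascript (a 16-bit Galois LFSR with the common
          | 0xB400 taps; skipping values above n is my own addition, and
          | note there is no cheap jump to an arbitrary position):
          | 
          |     // yields 1..n once each, in a fixed random-looking
          |     // order (works for n up to 65535)
          |     function* lfsrSequence(seed, n) {
          |       let state = (seed & 0xffff) || 1;  // must be nonzero
          |       const start = state;
          |       do {
          |         if (state <= n) yield state;     // skip > n
          |         const lsb = state & 1;
          |         state >>= 1;
          |         if (lsb) state ^= 0xb400;        // maximal-length taps
          |       } while (state !== start);
          |     }
          | 
          |     console.log([...lfsrSequence(123, 5)]);  // a permutation of 1..5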
        
         | ascar wrote:
          | If you want a simple pseudo random sequence of N that just
          | looks random (pseudocode):
          | 
          |     int start = getRandomInt(seed) % N;
          |     double k = N / 1.618;
          |     k = k % 2 == 0 ? k - 1 : k;  // k and N must be coprime
          |     int nextElement = k * start % N;
          |     start++;
          | 
          | This will jump wildly across your sequence and visit all
          | elements (as k and N are chosen coprime). No need to store a
          | full array. Is it statistically random? Of course not; e.g.
          | you will never see two values close to each other right
          | after one another.
        
           | anderskaseorg wrote:
           | Your algorithm fails to choose k and N coprime for about 19%
           | of N, such as N = 6, 10, 14, 15, 25, 27, 35, 36, 45, 54, 55,
           | 65, 66, 75, 84, 85, 90, 93, 95 ....
        
       | jedimastert wrote:
        | I'm reminded of Chris Wellons's "Prospecting for Hash Functions",
       | where he randomly generates hash functions and then runs them
       | through a couple of tests.
       | 
       | https://nullprogram.com/blog/2018/07/31/
       | 
       | Out of curiosity, is running the state through a hash a
       | reasonable rand strategy?
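        | 
        | For what it's worth, that is more or less how splitmix64 works:
        | a counter plus a strong integer mixer. A 32-bit Javascript
        | sketch (MurmurHash3-style finalizer constants; the names are
        | illustrative):
        | 
        |     function makeHashRng(seed) {
        |       let counter = seed >>> 0;
        |       return function next() {
        |         // advance the state, then hash it
        |         let x = (counter = (counter + 0x9e3779b9) >>> 0);
        |         x = Math.imul(x ^ (x >>> 16), 0x85ebca6b);
        |         x = Math.imul(x ^ (x >>> 13), 0xc2b2ae35);
        |         x = (x ^ (x >>> 16)) >>> 0;
        |         return x / 4294967296;  // float in [0, 1)
        |       };
        |     }
        | 
        |     const rng = makeHashRng(42);
        |     console.log(rng(), rng(), rng());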
        
       | midjji wrote:
        | Recently noted that mt19937 and mt19937_64 are much faster than
        | the standard C++ random generator. They are also possibly better
        | with regards to number distribution, but the performance
        | difference is gigantic on clang. The standard 32-bit
        | std::default_random_engine is almost 40x slower than the 64-bit
        | mt19937_64 one across the board, from -O1 to -O3, -march...
        | etc.
        | 
        | As all of them are also faster than old-school rand, it's worth
        | upgrading for the performance increase if nothing else.
        
         | FabHK wrote:
         | I think both the PCG family and the xoroshiro family are faster
         | still.
         | 
         | https://prng.di.unimi.it
         | 
         | https://www.pcg-random.org
         | 
         | ETA: https://github.com/lemire/testingRNG
        
           | ufo wrote:
            | Does anyone know what the current consensus is about PCG vs
            | xoroshiro? I remember that the xoroshiro author had several
            | complaints about PCG but I don't know if he is right or if
            | he was just being salty.
        
             | aDfbrtVt wrote:
             | There's a bunch of analysis written by the PCG folks
             | bashing Xoroshiro as well [1]. I think that for most normal
             | use cases with randomized state, Xoroshiro is quite good.
             | It only requires a single 64b add which is amazingly
             | efficient.
             | 
              | [1] https://www.pcg-random.org/posts/xoshiro-repeat-flaws.html
        
         | rurban wrote:
          | MT is pretty slow. There are fast variants (SFMT), but why
          | use that when you can have much better and much faster RNGs?
         | 
         | https://rurban.github.io/dieharder/QUALITY.html
        
           | ascar wrote:
           | I don't know about all the other RNGs in this benchmark, but
           | MT prepares a dense/contiguous array of random numbers. The
           | whole computation can be SIMDified (often even by the
           | compiler), but extraction is (unless changed explicitly) a
           | single copy of a double.
           | 
           | If RNG speed matters and you need a lot of random numbers in
           | succession (and I assume these two assumptions correlate
           | strongly) editing MT to directly pull 4 or even 8 random
           | numbers at once (i.e. a `__m256 rand_m256()` interface) is a
           | huge performance gain.
           | 
            | wyrand (the top spot of the linked benchmark) doesn't have
            | this benefit. The computation can't be SIMDified and the
            | extraction is always a single value.
            | 
            | So I would take these benchmarks with a grain of salt and
            | take a closer look at the specific situation for any
            | application where RNG speed really matters.
        
             | jhgb wrote:
             | Wouldn't a SIMDified implementation of other RNGs speed
             | them up by a comparable factor, making MT relatively slower
             | again?
        
               | ascar wrote:
               | If the next number depends on the previous number (like
               | with wyrand and many others) it can't be simdified.
               | 
               | You could simdify computation with different seeds tho,
               | which might be fine for many purposes. However at that
               | point it's also a custom implementation with a different
               | number sequence than the non-simd version, which isn't
               | the case with a SIMDified MT.
        
               | adgjlsfhk1 wrote:
                | Also, xoshiro generators are enough cheaper than MT that
                | even if you couldn't SIMDify a single one, you could
                | keep 4 of them and run the 4 together in SIMD to get 4
                | outputs way cheaper than 1 SIMD MT.
        
               | galangalalgol wrote:
                | It looks like xoshiro was SIMDified in a paper linked in
                | another post in this tree.
        
           | galangalalgol wrote:
           | Very useful! Thanks! Aren't there AES related intrinsics for
           | some architectures that can be really fast for rng? How do
           | they compare?
        
             | rainworld wrote:
             | https://github.com/google/randen#performance
        
       ___________________________________________________________________
       (page generated 2021-05-02 23:03 UTC)