[HN Gopher] Need a PRNG? Use a CSPRNG
___________________________________________________________________
Need a PRNG? Use a CSPRNG
Author : tczajka
Score : 32 points
Date : 2023-11-25 13:02 UTC (9 hours ago)
(HTM) web link (sortingsearching.com)
(TXT) w3m dump (sortingsearching.com)
| boulos wrote:
| It's been over a decade since I looked at this, but in real-time
| Monte Carlo path tracing the speed of your RNG is material. In
| simple enough scenes, probably something like 5-10% IIRC when
| using MT.
|
| I appreciate that you had a note about performance, but you were
| pretty quick to dismiss that there's a large quality difference
| between rand() and any modern PRNG. Plenty of these are fine
| enough, and paying a 4x or greater perf hit for a CSPRNG may
| be a mistake.
|
| I _do_ agree that if you don't know what you're going to use
| them for or don't know if you have a performance problem that you
| should just use a CSPRNG. The potential for someone to
| accidentally cause a security bug is not worth it. If they decide
| to do so intentionally, that's on them.
|
| tl;dr: maybe "Need a PRNG? Start with one that's
| cryptographically secure"?
| nullc wrote:
| There are generators with rand()-like performance but much
| better properties.
|
| for randomized search I'm a fan of xoshiro256++ periodically
| reseeded by rdrand.
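|
| A minimal sketch of that pattern (illustrative only; it uses the
| rand_xoshiro crate and reseeds from OsRng rather than calling
| rdrand directly, and the objective function is a stand-in):
|
|     use rand::rngs::OsRng;
|     use rand::{Rng, RngCore, SeedableRng};
|     use rand_xoshiro::Xoshiro256PlusPlus;
|
|     fn randomized_search(iters: u64) -> u64 {
|         let mut seed = [0u8; 32];
|         OsRng.fill_bytes(&mut seed);
|         let mut rng = Xoshiro256PlusPlus::from_seed(seed);
|         let mut best = u64::MAX;
|         for i in 0..iters {
|             if i % 1_000_000 == 0 {
|                 // periodic reseed from the OS entropy source
|                 OsRng.fill_bytes(&mut seed);
|                 rng = Xoshiro256PlusPlus::from_seed(seed);
|             }
|             // stand-in for scoring a real candidate solution
|             best = best.min(rng.gen::<u64>());
|         }
|         best
|     }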
|
| Obviously if performance isn't a concern, a CSPRNG should be
| the default. But it often is a concern.
|
| Traditional 'fast' PRNGs have simple algebraic structure that
| is absolutely known to cause wrong results; they're not worth
| it compared to modern fast PRNGs.
|
| (I wouldn't ever use MT today; it's not on the Pareto frontier
| of performance vs. quality and has a bad cache footprint.)
|
| > tl;dr: maybe "Need a PRNG? Start with one that's
| cryptographically secure"?
|
| I agree, but: I think if people do that, they'll often find
| that their whole process is now 20% slower than when they used
| rand(), and then they'll go back to rand(). So I think it's
| also important to make people aware of fast generators with
| better properties.
| fanf2 wrote:
| There are several PRNGs that are faster and more random than
| Mersenne Twister.
| tczajka wrote:
| I can just about believe that this kind of scenario _might_ be
| a very rare exception where the performance difference is
| relevant and you might want to do something else.
|
| Still, color me skeptical.
|
| "In simple enough scenes"? OK perhaps, but we don't want to
| only run ray tracing for very simple scenes. Simple scenes are
| those cases where you probably don't need much calculation, so
| does it really matter that 5-10% of that simple calculation is
| RNG? It's the complicated computation-heavy scenes that matter!
|
| Also, one would have to see the code to judge this. For
| instance, maybe that code was wasting random bits a lot?
| nullc wrote:
| In the sorts of searches I often do, like fitting integer
| solutions to fast function approximations, searching for
| error-correcting codes, randomized SAT-like problems, or
| simulating usage of data structures to benchmark them,
| ChaCha is slow enough to meaningfully influence (even
| dominate) the runtime.
|
| I wouldn't want to use rand() because it's extremely likely
| that it will produce actually bad results.
|
| I agree one should use a CSPRNG like chacha (or rdrand) if
| its runtime is insignificant, but then the question is: what's
| the best choice when it's not insignificant? Using rand() is a
| bad idea.
| nicole_express wrote:
| I'm going to keep using quick insecure PRNGs for my NES games
| rather than write a 6502 implementation of a CSPRNG... I guess
| this isn't something that's safe to assume for all game
| development by any means, but in my case if someone really can
| predict the future actions based off of visual observation they
| deserve the slight advantage they get.
| mcraiha wrote:
| I hope you have already watched "Dragon Warrior by
| NEScardinality", which shows exactly what that slight advantage
| can be: https://www.youtube.com/watch?v=Bgh30BiWG58&t=428s
| SuchAnonMuchWow wrote:
| This exploits the fact that the game uses a PRNG instead of a
| true RNG, and it works regardless of whether the PRNG is a
| CSPRNG or not.
| barathr wrote:
| Even simpler than implementing a cryptographic PRNG yourself,
| just use /dev/urandom, which will provide an infinite, non-
| blocking stream of cryptographically-strong pseudorandom bytes.
| All platforms currently have known-good implementations of
| /dev/urandom -- Linux uses ChaCha, and macOS and the BSDs use
| Ferguson and Schneier's Fortuna.
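|
| Reading from it is a couple of lines in most languages; a Rust
| sketch that just fills a buffer:
|
|     use std::fs::File;
|     use std::io::Read;
|
|     fn urandom_bytes(n: usize) -> std::io::Result<Vec<u8>> {
|         let mut buf = vec![0u8; n];
|         File::open("/dev/urandom")?.read_exact(&mut buf)?;
|         Ok(buf)
|     }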
| steponlego wrote:
| They weakened it recently in Linux again. For over a decade
| there was a badly bugged PRNG in the Linux kernel, it was
| discovered and replaced with a more costly one which worked
| great. Then, only a short time ago, they replaced that with one
| of... shady provenance. You're better off writing your own PRNG
| on that platform IMHO.
| fanf2 wrote:
| Jason Donenfeld (author of WireGuard) replaced Linux's SHA-1
| based PRNG (remember, SHA-1 is cryptographically broken) with
| BLAKE2. What is shady about it?
|
| You can't get cryptographically secure random numbers without
| platform support, so it's really bad to tell people to avoid
| the kernel CSPRNG.
| steponlego wrote:
| I simply don't trust NSA people and those who take their
| money. Why would you? We've seen nothing but shady moves
| from them in this space.
| tptacek wrote:
| What are you talking about? Jason Donenfeld is the author
| of WireGuard, the extraordinarily popular VPN protocol
| that cannot use NIST cryptography (it does no
| negotiation, and is built on a version of Noise that uses
| ChaPoly and 25519). The change that was just described to
| you was a shift from NIST cryptography to non-NIST
| cryptography.
| schoen wrote:
| > that cannot use NIST cryptography
|
| Do you mean as a matter of Donenfeld's engineering
| decisions (that those algorithms are unavailable in
| WireGuard)?
| fanf2 wrote:
| /dev/urandom is _really_ slow compared to a userland CSPRNG,
| though. And if you are doing simulations or fuzz testing, you
| need to be able to seed your PRNG to get reproducible results.
| isilofi wrote:
| For many kinds of Monte Carlo algorithms, CSPRNGs are stupidly
| slow. The author compares two handpicked examples of a fast
| CSPRNG and a very slow PRNG, arriving at a factor of 4. In
| practice, e.g. comparing to very simple stuff like multiply-add
| RNGs, it is more like a factor of 4000. Only to then claim that
| "But that would only be true if generating random bits was the
| hot spot, the bottleneck of your program. It never is in
| practice."
|
| E.g. for the usual example of Monte Carlo integration, you pick a
| point randomly, usually by running your RNG for both coordinates.
| Then you evaluate your characteristic function with that point as
| an input. Very often, the function will have a runtime that is in
| the range of a few hundred FPU instructions or less. Comparable
| to the evaluation of your CSPRNG. So in the end, you usually get
| a 40 to 50% faster runtime by just using a slow PRNG instead of a
| CSPRNG. Not to mention the few extra percent you will gain with a
| fast PRNG.
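|
| To make the shape of that loop concrete, here is a toy Monte
| Carlo estimate of pi (Rust sketch, generic over the generator;
| the per-sample "work" here is only a multiply-add and a compare,
| far cheaper than the few hundred FPU instructions above, which
| makes the RNG cost stand out even more):
|
|     use rand::Rng;
|
|     fn estimate_pi<R: Rng>(rng: &mut R, samples: u64) -> f64 {
|         let mut inside = 0u64;
|         for _ in 0..samples {
|             // two RNG calls per sample, one per coordinate
|             let (x, y): (f64, f64) = (rng.gen(), rng.gen());
|             if x * x + y * y < 1.0 {
|                 inside += 1;
|             }
|         }
|         4.0 * inside as f64 / samples as f64
|     }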
|
| And it doesn't stop there. CSPRNG libraries are often optimized
| to be side-channel free and cryptographically safe. Even if you
| were to use a CSPRNG, you are leaving performance on the table by
| using stuff with security properties you will never ever need,
| that can easily be optimized out for a 30% gain in the CSPRNG
| parts.
|
| And yes, for simulations a few percentage points are relevant.
| Those "few" percent can mean days of runtime, thousands of
| currency units in power and hardware cost, and tons of CO2 in
| pollution.
| tczajka wrote:
| It's not true that I cherry-picked a slow PRNG for my 7 GB/s
| number. In fact I selected the second-fastest PRNG on that page
| (because it's popular)! The fastest one is 8 GB/s. PCG32 is 3
| GB/s.
|
| Your 4000x factor speed up for a linear-congruential generator
| is just a completely false number.
|
| Yes I did pick ChaCha20 for its speed -- it's designed for
| speed!
|
| "A few hundred FPU instructions" in your Monte Carlo is not
| comparable to generating a number with ChaCha20. If you need a
| few hundred FPU instructions per number, you will be running a
| lot slower than 2 GB/s. ChaCha only requires a few cycles per
| byte.
|
| I agree that you can optimize out the side-channel free part
| for non-crypto-purposes. That's a good thing! I recommend doing
| that.
| josephg wrote:
| I was curious. The Rust rand library has implementations of a
| lot of RNGs - PRNGs and CSPRNGs - all high quality as far as I
| can tell.
|
| They agree with your numbers: their fast, high-quality PRNG
| implementations run at about 8 GB/s and ChaCha at about
| 2 GB/s:
|
| https://rust-random.github.io/book/guide-rngs.html
| espadrine wrote:
| There are ways to make much faster PRNGs; I designed one
| that reaches 53 GB/s[0] (0.06 cycles per byte) with better
| quality than those listed in the article.
|
| But I agree with tczajka: good engineering practice ought
| to be to start with a CSPRNG. First make it work, then make
| it fast. With truly random data, you know your algorithm
| works, and can then check if there is a quality loss when
| switching away. The performance of CSPRNGs is so optimized
| that they are seldom the bottleneck anyway.
|
| [0]: https://github.com/espadrine/shishua#comparison
| josephg wrote:
| Cool! I wonder if the rand crate would be interested in
| adding this. I love all their sampling functions -
| shuffle, choose, uniform, etc. But it looks like shishua
| is way faster than any of their rngs.
| pclmulqdq wrote:
| The state of the art in super-fast PRNGs is about 0.3 cycles
| per byte at the moment. I believe this is done with SIMD
| versions of the xoroshiro algorithms right now. 0.3 cycles
| per byte compared to "a few" cycles per byte is an order of
| magnitude difference in throughput.
|
| Here's the comparison from the maintainers of Julia:
| https://prng.di.unimi.it/#shootout
|
| Still, most crypto libraries are designed with extreme
| performance in mind, and very few PRNG libraries are. You can
| do a lot better than 0.3 cycles/byte and still beat the NIST
| test if you try for speed.
| hdevalence wrote:
| I haven't paid close attention recently but that doesn't
| seem that far off of performance available via (hardware
| accelerated) AES?
|
| Looking at https://eprint.iacr.org/2018/392.pdf , it seems
| like:
|
| - Intel CPUs can use AESNI to do AES at 0.64 cpb
|
| - AMD Zen cores have two AESNI cores and can achieve 0.31 cpb
|
| - Vectorized AES instructions (supposed to ship in Ice Lake
| five years ago, but maybe a casualty of Intel's AVX512
| mishaps) were expected to bring it down to 0.16 cpb
|
| In some sense this isn't a "fair" comparison in that it's
| fast because there's hardware acceleration, but that
| doesn't really matter, the hardware is there so it might as
| well be used.
| pclmulqdq wrote:
| Yeah, chacha was an odd choice because AES is a lot
| faster on CPUs with acceleration. Just doing a few AES
| rounds would be a pretty fast, good PRNG.
| dpwm wrote:
| For those interested in using AES with reduced rounds as
| a PRNG, it is covered in the paper "Parallel Random
| Numbers: As Easy as 1, 2, 3" by John Salmon et al.
|
| https://www.thesalmons.org/john/random123/papers/random12
| 3sc...
| isilofi wrote:
| Upon a closer look, do not trust the numbers on https://rust-
| random.github.io/book/guide-rngs.html in any way; they are
| clearly bogus and implausible. Their figure for their
| "StepRng", which is just a counter, is 51 GB/s. Their XorShift
| RNG at 5 GB/s, which is just a XOR and a shift, is slower than
| Xoshiro at 7 GB/s, which is a XOR, a shift and a rotate, one op
| more. Both XorShift and Xoshiro should actually be of
| comparable performance to a counter of the same width because
| with modern CPUs, all those trivial bit operations like
| shift, rotate and XOR are sub-cycle microops. And all three
| of those should either be memory-bound and therefore of the
| same performance, or quite a bit faster than memory-bound if
| they just measure the in-register performance.
|
| > Your 4000x factor speed up for a linear-congruential
| generator is just a completely false number.
|
| > Yes I did pick ChaCha20 for its speed -- it's designed for
| speed!
|
| One ChaCha20 block takes 20 rounds, each of which consists of 4
| QR (quarter round) operations. A QR is 4 additions, 4 XORs
| and 4 ROTLs, so 12 instructions on 32bit values. Multiply
| that together and you arrive at 960 operations per block
| (actually a handful more for the counter, maybe the round
| loop and stuff like that, but not a lot), each block gives
| you 16 uint32 values. So 60 instructions per uint32 or 15
| instructions per byte.
|
| A multiply-add generator takes only 2 instructions (you could
| use fused-multiply-add if available, but I'll leave that out
| as I left out sub-microop-rotate before, just to not
| overcomplicate things) per uint32 or half an instruction per
| byte. Yes, that is not yet a hyperbolic factor of 4000.
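|
| For reference, the multiply-add generator being counted here is
| essentially a 32-bit LCG step, roughly (the constants are the
| Numerical Recipes parameters, chosen only for illustration):
|
|     #[inline]
|     fn mul_add_next(state: &mut u32) -> u32 {
|         // one MUL and one ADD per output word
|         *state = state.wrapping_mul(1664525).wrapping_add(1013904223);
|         *state
|     }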
|
| But then you'll have to use your random values. Since your
| ChaCha20 random number stream only comes in blocks, on many
| CPU architectures, you will have all your registers full with
| your resulting block. Meaning that for the subsequent
| calculation, you have to store those random numbers somewhere
| or throw them away, do your other calc, then load the
| randomness again, etc. So you will always pay a penalty for
| cache and memory accesses and you will always have
| unnecessary register pressure. Even a L1 cache access will
| cost you about 4 cycles of access latency, other cache levels
| are far worse. Which means that it probably won't be 4000
| yet, but a lot more.
|
| Now we'll arrive at "yes, but somebody said ChaCha20 is
| roughly 1 cycle per byte!". Which isn't wrong, but you have
| to read carefully: 1 _cycle_, not 1 _instruction_. That
| benchmark relies on calculating multiple ChaCha20 blocks in
| parallel, because a modern CPU has multiple execution units
| and thus can execute multiple independent instructions within
| one cycle. There is also SIMD, where one instruction can
| operate on multiple pieces of data. But to be fair, we also
| need to do this with our multiply-add-RNG. And where I can
| have 16 registers of 32bits calculating one ChaCha20 block, I
| can also have 16 of the same 32bit registers calculating 16
| multiply-add random numbers in parallel.
|
| Thus giving us 60 cycles per ChaCha20 uint32 vs. 0.125 cycles
| (2/16) per multiply-add uint32. That is a factor of 480, not
| taking possible memory or cache penalties into account,
| because that really depends on the computation between the
| randomness steps. Still not 4k, I admit, that was hyperbole.
| glitchc wrote:
| An alternative to the post is to take any PRNG's output and
| calculate the SHA3 hash of it, then use the hash as the random
| number. If SHA3 is not available, use AES. In either case, all
| statistical weaknesses and side-channel attacks will be
| eliminated.
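|
| A minimal sketch of that whitening idea (using the RustCrypto
| sha3 crate; the raw input would come from whatever PRNG you
| already have):
|
|     use sha3::{Digest, Sha3_256};
|
|     // Hash a raw PRNG output; the 32-byte digest is used as the
|     // random value.
|     fn whiten(raw: u64) -> [u8; 32] {
|         let mut out = [0u8; 32];
|         out.copy_from_slice(&Sha3_256::digest(raw.to_le_bytes()));
|         out
|     }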
|
| The platform is far more likely to have support for either of
| those in the system libraries, than a bespoke CSPRNG, which would
| need to be packaged with your app/game.
| josephg wrote:
| That might be convenient so you don't have to go fishing for
| 3rd party libraries - which is especially a problem in C/C++.
| But I expect the result will be much slower than using an
| optimized CSPRNG like ChaCha.
| glitchc wrote:
| Well, ChaCha is closely related to a hash function: BLAKE, which
| is built around the ChaCha permutation, was a SHA-3 finalist but
| was beaten by Keccak in the final spec. Additionally, aes-ctr-128
| has the same performance as ChaCha (1.5 GB/s).
|
| All I was trying to illustrate was that in general, pairing a
| PRNG with a block cipher or hash function is sufficient to
| create a CSPRNG, and any developers worried about their PRNGs
| can couple rand() with a readily available cipher/hash in the
| system libraries. After all, the blog is explicit about not
| requiring a cryptographically secure RNG for most
| applications.
| alright2565 wrote:
| It would be better to simply use a counter and a random seed in
| that case. Who knows when your PRNG will loop around, but your
| counter will always loop at 2^32.
|
| At which point, if you use AES, you've just implemented the
| AES-CTR csprng that TFA suggests.
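|
| A bare-bones version of that counter construction might look like
| this (RustCrypto aes crate, 0.8-style API assumed; key handling
| and buffering omitted):
|
|     use aes::Aes128;
|     use aes::cipher::generic_array::GenericArray;
|     use aes::cipher::{BlockEncrypt, KeyInit};
|
|     // Encrypt an incrementing counter under a random key; each
|     // block yields 16 fresh pseudorandom bytes.
|     fn ctr_block(key: &[u8; 16], counter: u128) -> [u8; 16] {
|         let cipher = Aes128::new(GenericArray::from_slice(key));
|         let mut block =
|             GenericArray::clone_from_slice(&counter.to_le_bytes());
|         cipher.encrypt_block(&mut block);
|         let mut out = [0u8; 16];
|         out.copy_from_slice(&block);
|         out
|     }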
| zuzun wrote:
| The sponge design of SHA-3 makes it a CSPRNG, basically: you can
| keep squeezing output from it after you've fed the data in.
| glitchc wrote:
| That's a great point, even better.
| ribit wrote:
| The reason not to use CSPRNG is performance and efficiency. There
| are many applications where you only need randomness, and don't
| care about the resiliency provided by the CSPRNG. I mean, I'm not
| going to run a cryptographically secure generator on a GPU just
| to scatter a few rays for my raytracer.
|
| IMO, suggesting that a CSPRNG should be the default choice is
| bad engineering advice. This leads to software burning processor
| cycles for no reason at all. I agree with the author that the
| default PRNGs provided by many environments are of very poor
| quality, but that's hardly a reason to choose something you don't
| need.
| josephg wrote:
| My biggest use case for random numbers is for fuzz testing. I
| fuzz test all over the place - any time I have some clear
| invariants and some complex code to test, I'll generate random
| data in a loop and make sure the invariants always hold. This
| finds _so many bugs_.
|
| But for this kind of work, a good PRNG is better. The reason is
| simple: when I find a failing test, I can print out the seed that
| generated that test. Then I can rerun just that one seed (so the
| error shows up immediately) and debug. Every time I rerun my
| program, a fixed seed means it will use the same input and fail
| in the same way. Perfect.
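|
| The skeleton of such a loop might look like this (Rust sketch
| with rand_chacha; check_invariants is a hypothetical stand-in
| for the real property under test):
|
|     use rand::{Rng, SeedableRng};
|     use rand_chacha::ChaCha8Rng;
|
|     // stand-in for the real invariant check
|     fn check_invariants(input: &[u8]) -> bool {
|         !input.is_empty()
|     }
|
|     fn fuzz_round(seed: u64) -> bool {
|         let mut rng = ChaCha8Rng::seed_from_u64(seed);
|         let input: Vec<u8> = (0..64).map(|_| rng.gen()).collect();
|         check_invariants(&input)
|     }
|
|     fn main() {
|         for _ in 0..1_000 {
|             let seed: u64 = rand::random();
|             // a failure prints the seed, so the case can be replayed
|             assert!(fuzz_round(seed), "failing seed: {seed}");
|         }
|     }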
|
| High quality, modern PRNGs shouldn't have the kind of bugs that
| cause the problem listed at the start of this article. I think
| part of the problem is that C has so many ways to get random
| numbers that it's hard to tell which are high quality, and on
| which platforms.
|
| Rust's rand crate is the poster child for me of what this looks
| like done well. Here's a bunch of PRNGs and CSPRNGs, implemented
| and listed in a simple table for you to pick which one you want
| to use. For each RNG they show rough performance numbers, and the
| PRNGs have a rough quality guide:
|
| https://rust-random.github.io/book/guide-rngs.html
|
| And if in doubt, thread_rng() is always a good choice.
| ngneer wrote:
| I recommend you research CSPRNGs before arriving at a
| conclusion. They, too, are seeded, which is helpful for
| deterministically replaying a sequence, say for fuzzing.
| josephg wrote:
| Oh, TIL! And indeed ChaCha in Rust's rand crate can be
| seeded. Cool!
|
| https://rust-
| random.github.io/rand/rand_chacha/struct.ChaCha...
| worik wrote:
| This is wrong for two reasons
|
| Firstly
|
| Reproducibility
|
| Using "true" randomness sacrifices that
|
| Secondly
|
| Performance
|
| In simulations you often need billions of these
|
| There are cases for true randomness; it is a good thing it is
| available for those rare occasions.
| OskarS wrote:
| > Reproducibility
|
| That's a very silly objection: CSPRNGs can be seeded just like
| a PRNG. The "P" in both abbreviations is the key: it stands for
| "pseudo".
| pclmulqdq wrote:
| CSPRNGs are not true randomness. They are PRNGs (with a seed
| and all), but with cryptographic guarantees. That also means,
| as a corollary, that they are generally good RNGs.
|
| TRNG appliances can get very high throughput, too.
| sneak wrote:
| There are lots of cases where a PRNG is more appropriate than a
| CSPRNG. Wiping disks is one of them, as the CSPRNG becomes the
| bottleneck with large arrays. Testing i/o or network throughput
| is another; you don't want your algorithm tainting the results.
|
| I like the thrust of the article but if you're going to be
| particular, you should also be correct. CSPRNGs are not suitable
| replacements in 100% of cases.
|
| Engineering is about trade-offs.
| tczajka wrote:
| If a predictable PRNG is sufficient for wiping disks, why do we
| care that the data is random at all? How about writing 0, 1, 2,
| ..., 255, 0, 1, 2, ... to consecutive bytes? Or -- generate 1
| KB of good-quality random data, and write the same 1 KB
| repeatedly? It's not clear to me what we're trying to achieve
| in this scenario.
| OskarS wrote:
| The author is a bit too categorical; there are certainly use
| cases where both speed and code size matter a great deal. Others
| have mentioned Monte Carlo algorithms, but there are more: not
| very long ago I was doing an effect in a shader that needed a
| PRNG, and it would have been lunacy to use a CSPRNG. High-quality
| statistical properties are not important at all, but speed and
| instruction count are VERY important. My hunch is that this is
| often true in game-dev: Minecraft probably should not use a
| CSPRNG to generate its random world.
|
| However, in general, I basically agree: for MOST tasks where you
| need a random number generator, a CSPRNG is probably the right
| choice and it's a reasonable "default" choice if you don't have
| good reasons why it shouldn't be. It's a continual annoyance to
| me that PRNG libraries usually don't include any solid CSPRNGs
| (looking at you, C++ header <random>) when it's often the right
| choice.
| tromp wrote:
| I recommend siphash24 [1] for its excellent statistical
| properties and good performance. It's a lightweight, almost
| cryptographically secure, seeded hash, very popular in hashtable
| implementations.
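|
| A seeded counter run through it already behaves like a small
| PRNG; a sketch using the siphasher crate (key values here are
| placeholders):
|
|     use siphasher::sip::SipHasher24;
|     use std::hash::Hasher;
|
|     // Each counter value hashed under a fixed key yields 64
|     // pseudorandom bits.
|     fn sip_next(k0: u64, k1: u64, counter: u64) -> u64 {
|         let mut h = SipHasher24::new_with_keys(k0, k1);
|         h.write_u64(counter);
|         h.finish()
|     }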
|
| [1] https://en.wikipedia.org/wiki/SipHash
| pyjarrett wrote:
| > You can try running this code yourself and see if you get the
| same answers. This experiment is repeatable.
|
| OpenSUSE: 483 0.305 484 0.293
| 485 0.283 486 0.275 487 0.3 488 0.302
| 489 0.304 490 0.274 491 0.274 492 0
| 493 0.285 494 0 495 0.271 496 0
| 497 0.289 498 0 499 0.294 500 0
| 501 0.306 502 0 503 0.31 504 0
| 505 0.275 506 0 507 0.285 508 0
| 509 0.289 510 0 511 0.282 512 0
| 513 0.301 514 0 515 0.266 516 0
| 517 0.289 518 0 519 0.291 520 0
| 521 0.309 522 0 523 0.325 524 0
| 525 0.358 526 0 527 1 528 0 529 0
| 530 0 531 0 532 0 533 0 534 0
|
| That's wild.
|
| I don't see this with mt19937, but I did see these funny nuggets
| with unseeded `rand()` on Windows: 255 0.3
| 256 0 257 0.289 258 0.306 ... 511
| 0.262 512 0 513 0.275
| Vecr wrote:
| I'm surprised I'm not seeing mention of the reduced-round ChaCha8
| variant of ChaCha; that's what I'd recommend for Monte Carlo use.
| It's available in Rust at
| https://docs.rs/rand_chacha/0.3.1/rand_chacha/struct.ChaCha8... .
| I recommend seeding it from a high quality random source, but it
| could be fixed in your source code, generated with `openssl rand
| -base64 32` or similar.
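|
| Constructing it from a fixed 32-byte seed is only a couple of
| lines (the seed bytes below are placeholders, not a
| recommendation):
|
|     use rand::{Rng, SeedableRng};
|     use rand_chacha::ChaCha8Rng;
|
|     fn sample() -> f64 {
|         // placeholder; paste 32 real random bytes here
|         let seed: [u8; 32] = [0x42; 32];
|         let mut rng = ChaCha8Rng::from_seed(seed);
|         rng.gen()
|     }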
| spott wrote:
| This whole discussion seems to be summed up as "premature
| optimization is the root of all evil, and using anything but a
| CSPRNG is a premature performance optimization".
|
| There don't seem to be any good non-performance reasons to use
| a regular PRNG that I can think of.
| zeta0134 wrote:
| But if you need a PRNG in a _big hurry_, an LFSR is probably
| fine.
|
| https://en.m.wikipedia.org/wiki/Linear-feedback_shift_regist...
|
| I use one in my NES projects that can compute a new 8-bit PRNG
| value in something like 65 cycles.
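|
| In higher-level terms the step is tiny; a 16-bit Galois LFSR
| (taps 0xB400, the maximal-length example from the Wikipedia page
| above) looks something like this Rust sketch:
|
|     // state must be seeded to any nonzero value
|     fn lfsr_step(state: &mut u16) -> u16 {
|         let lsb = *state & 1;
|         *state >>= 1;
|         if lsb == 1 {
|             *state ^= 0xB400;
|         }
|         *state
|     }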
| e63f67dd-065b wrote:
| There's one big scenario where PRNGs are good: reproducibility. I
| have a lot of tests that are deterministic given a seed, and if
| the test fails then I can grab the seed and replay the test.
|
| A CSPRNG is designed to be _cryptographically secure_, and thus I
| somewhat agree that it should be the default choice. But there
| are many cases where reproducibility is a good thing, and that's
| where you need seeded RNGs.
| Klaus23 wrote:
| The same is true for a CSPRNG: same seed -> same output.
| atq2119 wrote:
| This is pretty reasonable, I think, but can it be applied to GPU
| code?
|
| The fundamental challenge of CSPRNGs is that they tend to have
| high latency and each thread produces a lot of randomness at
| once, while typically you only need a little bit at a time. So
| either you throw away the extra bits, effectively wasting ALU, or
| you keep them around in storage somewhere, which is expensive as
| well.
|
| I would love to see a CSPRNG that is designed to leverage
| subgroup/wave ops. (Hmm, now that I think about it, perhaps
| ChaCha can be spread reasonably across 4 lanes? That would give 4
| dwords per thread and potentially cut latency quite a bit, much
| better tradeoff overall for most GPU applications)
| slow_typist wrote:
| http://csundergrad.science.uoit.ca/courses/sim-and-modeling-...
___________________________________________________________________
(page generated 2023-11-25 23:00 UTC)