[HN Gopher] Randomness extractors: making fair coins out of bias...
___________________________________________________________________
Randomness extractors: making fair coins out of biased coins
Author : Maro
Score : 61 points
Date : 2024-09-15 14:50 UTC (8 hours ago)
(HTM) web link (bytepawn.com)
(TXT) w3m dump (bytepawn.com)
| leeoniya wrote:
| made me think of, "bend entropy towards definable outcomes"
|
| https://m.youtube.com/watch?v=2QJjMwBAORw
| ur-whale wrote:
| Does this work with a guaranteed random bit output bandwidth?
|
| These techniques (e.g. von Neumann's) all seem to suffer from
| the same problem: they crank out uniformly distributed random
| bits from biased sources, but with no guarantee of how long you
| may have to wait for a bit to come out.
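|
| (A minimal sketch of the von Neumann trick in Python, to make
| the problem concrete; the names here are illustrative:)
|
|     import random
|
|     def von_neumann(draw, n_out):
|         # Pair up draws: 01 -> 0, 10 -> 1; 00 and 11 are
|         # discarded, so the wait per output bit is unbounded.
|         out = []
|         while len(out) < n_out:
|             a, b = draw(), draw()
|             if a != b:
|                 out.append(a)
|         return out
|
|     # e.g. a coin that comes up 1 only 10% of the time
|     bits = von_neumann(lambda: int(random.random() < 0.1), 100)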
| cedilla wrote:
| If the coin is biased so that it lands on heads only 1 in a
| million times, it will take a lot of coin throws to get some
| randomness out of it. This isn't a limit of the technique, just
| a consequence of the low-entropy source.
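|
| (Concretely: a usable 01/10 pair occurs with probability
| 2p(1-p), so von Neumann extraction needs 1/(2p(1-p)) pairs per
| output bit on average. For p = 1/1,000,000 that is roughly
| 500,000 pairs, i.e. about a million throws per bit.)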
| remram wrote:
| Does it take a fixed long time or a long random potentially-
| infinite time?
| big-green-man wrote:
| The latter (except that it doesn't always have to be long;
| the number of pairs of throws per bit follows a geometric
| distribution). If it were the former then randomness would
| be predictable in some sense. It's similar to the halting
| problem.
| remram wrote:
| I'm not sure what you mean by "randomness would be
| predictable". For example, it is possible to do this if
| you know the bias.
| wakawaka28 wrote:
| If you know the bias, then you can surely estimate the bit
| rate. Of course you could get very unlucky and wait a long time
| for bits, even with an unbiased underlying source. But that's
| the nature of these methods.
| ur-whale wrote:
| Or you could pump the heavily biased bit stream into a modern
| cipher and get what is, for all intents and purposes, a super
| high quality, completely unbiased random stream with the same
| bandwidth characteristics as the input.
|
| And, as a matter of fact, much faster than the input if
| required.
| Ar-Curunir wrote:
| Things are not so simple: how do you key the cipher? With
| the same randomness as the input string? If so, that input
| distribution could be a bad one for the cipher, in that it
| might fail to generate a sufficiently scrambled output.
|
| This can be mitigated if you use an unkeyed hash function
| instead, but even for these we don't have any provable
| guarantees unless you assume the function is a random
| oracle, which is an ideal object that cannot exist.
| chongli wrote:
| Yeah, if we're starting with the "1 in a million" unfair
| coin then most of the input strings are going to be all 0
| or all 1, which means the cipher is going to give the same
| sequence every time. Not useful!
| LegionMammal978 wrote:
| For certain input distributions, a guaranteed output bandwidth
| is impossible. For instance, suppose that you had a function
| that, given a vector of _N_ integers chosen uniformly from
| (0,1,2), returns a single unbiased output bit. Since all 3^_N_
| input vectors are equally likely, exactly half of them would
| have to correspond to each output bit. But this is impossible,
| since 3^_N_ is always odd, so this (0,1,2)-function cannot
| exist for any fixed _N_.
|
| Now, suppose that you had a function that turned _N_ input bits
| with a 1:2 bias into a single bit with no bias. Then you could
| use this to implement a (0,1,2)-function as above, by
| collapsing inputs 1 and 2 into input 1. Since such a
| (0,1,2)-function is impossible, the 1:2-biased function is
| similarly impossible.
|
| This argument holds for any i.i.d. input source that can be
| generated from a uniform-integer source with an odd number of
| possible values.
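|
| (The parity argument is easy to brute-force for small _N_; a
| quick illustrative check in Python:)
|
|     from itertools import product
|
|     # Enumerate every f: {0,1,2}^N -> {0,1} as a truth table
|     # and ask whether any maps exactly half of its 3^N inputs
|     # to 1. None can, because 3^N is odd.
|     for N in (1, 2):
|         size = 3 ** N
|         exists = any(2 * sum(table) == size
|                      for table in product((0, 1), repeat=size))
|         print(N, exists)  # prints False for both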
| Animats wrote:
| That's a good question. You cannot get a non-statistical
| guarantee of an output rate from the von Neumann biased coin
| toss, because you can potentially get a long run of the same
| toss result, which stalls output bit generation. That might be
| a generic limitation of all similar de-biasing approaches. Has
| that been proven or disproven?
| LegionMammal978 wrote:
| See my sibling reply [0]: for many input distributions, you
| can never put a fixed upper bound on the number of inputs you
| need to get a single output bit.
|
| [0] https://news.ycombinator.com/item?id=41549648
| bjornsing wrote:
| > Interestingly, this is the best possible approach, whether we
| know the bias of the input bit stream or not.
|
| Would love to see a proof of that. It feels a bit unintuitive to
| me.
|
| (But on second thought that might be because I'm thinking of
| cases of correlated bits.)
| LegionMammal978 wrote:
| In general, I think you could show that an 'arithmetic coding'
| approach is optimal, if the distribution of each bit given the
| previous bits is known. From there, you'd just have to show
| that the von Neumann approach for i.i.d. bits is no worse on
| average.
| Darmani wrote:
| This is correct, and I came to the comments to say the same
| thing. It takes some work to implement without arbitrary-
| precision arithmetic, but arithmetic coding can make use
| of the full entropy in the input stream, whereas the approach
| in the article discards a lot of information.
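|
| (A float-based toy sketch of that idea in Python; a real
| implementation would use integer arithmetic and handle the
| interval-straddles-1/2 underflow case properly:)
|
|     def extract(bits, p):
|         # Arithmetic-coding view: narrow [0,1) by each biased
|         # bit; the binary digits of the surviving point are
|         # unbiased, so emit one whenever the interval falls
|         # entirely into one half.
|         lo, hi, out = 0.0, 1.0, []
|         for b in bits:
|             mid = lo + (hi - lo) * (1.0 - p)  # P(bit=0) = 1-p
|             lo, hi = (lo, mid) if b == 0 else (mid, hi)
|             if not lo < hi:  # float precision exhausted
|                 break
|             while hi <= 0.5 or lo >= 0.5:
|                 if hi <= 0.5:
|                     out.append(0); lo, hi = 2*lo, 2*hi
|                 else:
|                     out.append(1); lo, hi = 2*lo-1, 2*hi-1
|         return out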
| atq2119 wrote:
| It's obviously false. If you know that the bias is 0.5, you can
| just pass the input stream through unmodified. You can also
| construct better codes for other biases.
| ralegh wrote:
| Depends how you define 'best'. I assume best in the article
| means 'output is close to 50/50 and independent', in which
| case the 01/10 solution is already optimal given the
| assumptions.
|
| If you define best as the most efficient at converting inputs
| to outputs as well as the output being 50/50, then there are
| better ways, such as your example.
| contravariant wrote:
| It's worded quite badly, but I think it's the best you can do
| given 2 independent bits, except for the rare case where 00
| and 11 are equally likely.
| sevenoftwelve wrote:
| The article is interesting, but it misses the most practical and
| unambiguously safe way to generate streams of random data: Use
| cryptography.
|
| To generate a stream of random data, use a hash function with
| arbitrary-length output (XOF) such as blake2x[^0] or
| shake256[^1]. Make sure your key contains at least 256 bits of
| entropy. Absolutely never use a key with less than 128 bits of
| entropy.
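|
| (A minimal sketch of the XOF approach using Python's stdlib
| shake_256, with the OS entropy pool standing in as the seed
| source:)
|
|     import hashlib, os
|
|     seed = os.urandom(32)  # assumed: 256 bits of entropy
|     stream = hashlib.shake_256(seed).digest(1 << 20)  # 1 MiB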
|
| Since it's impossible to know how much entropy there is in a key,
| you probably want to use something like the Fortuna RNG[^2],
| substituting your XOF for its sha2/AES-based construction. Bruce
| Schneier designed Fortuna back when XOFs were harder to come by.
|
| If you want more performance, you can use blake2 to compress your
| input seed into 256 bits and generate the random stream using
| chacha20[^3].
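|
| (Sketch of that faster variant, using Python's hashlib plus the
| third-party pyca/cryptography package; the fixed all-zero nonce
| is fine here only because the key is used once:)
|
|     import hashlib
|     from cryptography.hazmat.primitives.ciphers import (
|         Cipher, algorithms)
|
|     raw_seed = open("/dev/urandom", "rb").read(64)  # stand-in
|     key = hashlib.blake2b(raw_seed, digest_size=32).digest()
|     chacha = Cipher(algorithms.ChaCha20(key, b"\x00" * 16),
|                     mode=None).encryptor()
|     stream = chacha.update(b"\x00" * (1 << 20))  # keystream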
|
| All of this is usually handled by the Linux kernel[^4], so it's
| best to just use the getrandom(2)[^5] system call or read
| from /dev/urandom[^6]. If you are writing a Rust program, you can
| use the rand[^7] crate, which uses a mixed approach: reading a
| seed from the operating system and expanding it in-process using
| chacha[^8]. This is a valid strategy.
|
| I am omitting some subtleties[^10] about mathematical definitions
| of randomness extractors as used by the author of the article.
| When you are using a cryptographic approach, you are dealing with
| a complexity-theory-based security notion[^9], which does not
| precisely equate to creating a stream with a specific amount of
| entropy. Everywhere except in a physics or math paper dealing
| with information theory, I would call this a technicality. For
| most intents and purposes, cryptographic security notions are the
| most robust real-world conceptions of randomness available.
|
| [^0]: https://www.blake2.net/
|
| [^1]: https://csrc.nist.gov/pubs/fips/202/final (shake256 is part
| of the SHA-3 standard)
|
| [^2]: https://www.schneier.com/academic/fortuna/
|
| [^3]: https://protonvpn.com/blog/chacha20
|
| [^4]: https://lwn.net/Articles/884875/
|
| [^5]: https://man.archlinux.org/man/getrandom.2
|
| [^6]: https://man.archlinux.org/man/urandom.4
|
| [^7]: https://crates.io/crates/rand
|
| [^8]: https://docs.rs/rand/0.8.5/rand/rngs/struct.StdRng.html
|
| [^9]:
| https://en.wikipedia.org/wiki/Security_of_cryptographic_hash...
|
| [^10]: In particular, PRFs are not guaranteed to output tokens
| with a certain amount of entropy (if I recall correctly) because
| they can map two inputs to the same output.
|
| ---
|
| I am the main author of the Rosenpass[^11] post-quantum secure
| key exchange for WireGuard. My expertise comes from developing
| this protocol, as well as a couple of years of engagement with
| the real-world cryptography community and from my own scientific
| research on cryptography and secure implementations of
| cryptography.
|
| [^11]: https://rosenpass.eu/
| travisjungroth wrote:
| It would be interesting to combine this with something that
| detects bias on a non-deterministic stream, so that in one shot
| it takes a stream of unknown bias and emits an unbiased stream.
| The closing paragraph says that's impossible, but the tradeoff
| would be that you are only sure the output is unbiased with some
| level of confidence. I think you'd also need a buffer so you can
| detect first and then output.
| coppsilgold wrote:
| If you know that a source of randomness contains entropy
| (unpredictable bits) but you don't know how much (ex. a digital
| camera's output, unless heavily processed, will contain random
| sensor noise), the safest thing to do is pipe it into a
| cryptographic construct such as a hash or a sponge.
|
| Once you believe you have piped in enough, you use the state of
| the cryptographic primitive as the seed for further random bit
| generation. The Linux kernel uses a sponge (to accumulate), a
| hash function (to consolidate) and a stream cipher (to output)
| to 'convert' events with some degree of randomness into
| 'infinite' cryptographically secure random bits.
|
| To acquire some intuition about this, imagine taking a raw 1MP
| photo with a camera sensor and feeding the lossless file to
| sha256sum. You obtain a 256-bit string, and the sensor noise in
| the photo is sufficient to secure the result. To brute-force
| your sha256 result, an attacker would need to model all the
| degrees of freedom in taking photos in the world, plus the
| sensor's noise production, to build a simulator for your camera;
| the photo will almost certainly contain far more degrees of
| freedom than 256 bits (unless the sensor is compromised or not
| really raw).
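|
| (That intuition, sketched in Python; "capture.raw" is a
| placeholder for a raw sensor dump:)
|
|     import hashlib
|
|     with open("capture.raw", "rb") as f:
|         seed = hashlib.sha256(f.read()).digest()  # 256 bits
|
|     # expand the seed into as many pseudorandom bytes as needed
|     stream = hashlib.shake_256(seed).digest(4096)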
___________________________________________________________________
(page generated 2024-09-15 23:00 UTC)