[HN Gopher] How random are TOTP codes?
___________________________________________________________________
How random are TOTP codes?
Author : ColinWright
Score : 132 points
Date : 2024-07-02 11:59 UTC (3 days ago)
(HTM) web link (shkspr.mobi)
(TXT) w3m dump (shkspr.mobi)
| SoftTalker wrote:
| We like to see patterns. I think the same thing with TOTP codes,
| I'm always noticing when they have repeated digits, or only 2
| different digits, stuff like that, but that's just the nature of
| the human brain looking at random numbers.
| GrantMoyer wrote:
| Me too. I was particularly pleased when I got 332211 as my TOTP
| once.
| BrightOne wrote:
| I remember getting a GABEN code from Steam 2FA. Was 100% sure
| it was not random
| Fr0styMatt88 wrote:
| For me it's "I swear I feel like I've gotten the exact same
| TOTP code not that long ago!".
|
| What's the actual likelihood of seeing the same TOTP code from
| different apps at different times?
|
| What about those ones that get sent out in phone messages?
| masklinn wrote:
| > What's the actual likelihood of seeing the same TOTP code
| from different apps at different times?
|
| There's only a million 6 digit TOTP codes, so I reckon you
| get to 50% odds of collision at ~1200 values.
| yulle wrote:
| For those curious on how to find that ~1200 values are
| needed before a 50% probability of collision, this is a
| generalized form of the birthday problem [0].
|
| With a calculator [1], we can find the exact amount.
| plugging in D=10^6 (# of unique codes) with P=0.5 (odds of
| collision) gives you 1178 values.
|
| [0] https://en.wikipedia.org/wiki/Birthday_problem
|
| [1] https://www.bdayprob.com/
| codeflo wrote:
| The very rough rule of thumb that I think one should
| remember is: You get a 50% chance of collision with
| sqrt(total) samples.
|
| Sqrt(1 000 000) = 1 000, close enough to the exact number
| you calculated.
| ColinWright wrote:
| More accurately, for 50% it's \sqrt(\ln(4))\sqrt(N) which
| is \sqrt( ln(4) N ).
|
| That's about 1.17741 times \sqrt(N).
|
| The reasons are explained here:
| https://news.ycombinator.com/item?id=40886363
| dickfickling wrote:
| I have long considered building a code generator that only
| serves "nice" codes.
|
| The standard implementation would use a hard coded set of
| patterns. The enhanced version would pair automatic random code
| generation with a public site where members of the public can
| decide in real time whether a given code is "nice" before it's
| sent along to the intended recipient.
|
| (note @tptacek if you're reading this, I swear it's (mostly) a
| joke)
| whalesalad wrote:
| I've always been under the impression that popular TOTP
| implementations were specifically designed to produce "nice"
| codes. They always have repeated digits, like 202838.
| wavemode wrote:
| I think that's just the birthday paradox. It's highly
| likely for a randomly generated number to have some
| repeated digits.
| cozzyd wrote:
| Hmm for 2 digits you only have one "nice" code (69). For
| three digits it's the same (069). For four digits you do get
| two (0069 and 6969), but I think you need a lot of digits
| before "nice" codes are effective.
| nayuki wrote:
| There is a small inherent bias in TOTP (and HOTP) codes. The
| algorithm extracts 31 consecutive bits from a SHA-1 hash (
| https://en.wikipedia.org/wiki/HMAC-based_one-time_password#A...
| ). Let's assume that SHA-1 produces uniformly random bits.
|
| The 31-bit number is modulo'd by 10^6 to generate a 6-digit
| base-10 number. But 2^31 isn't a multiple of 10^6, so some
| remainders will be slightly more likely than others. Namely:
|
| * 000000 to 483647: 2148/2147483648 [?] 1.000240e-6 chance.
|
| * 483648 to 999999: 2147/2147483648 [?] 0.999774e-6 chance.
|
| This kind of bias always happens when changing the range of
| random numbers and the number of possible outcomes is not a
| divisor of the number of incomes, and rejection sampling isn't
| used.
|
| This is why, for example, java.util.Random.nextInt(int n) (which
| generates an integer in the range [0, n)) carefully uses
| rejection sampling in its algorithm:
| https://docs.oracle.com/javase/8/docs/api/java/util/Random.h...
| throwaway81523 wrote:
| The effect of this on the min-entropy is basically nil.
|
| Added: to be pedantic, the difference in min-entropy from a
| uniform distribution is about 0.000347 bits if my calculation
| is right (log2(2148*1e6/2**31)). Really, this is not of
| practical significance given how TOTP is used.
| p0seidon wrote:
| I love how this response corresponds to your profile
| description, and I mean that in a positive way, just to be
| clear.
| nayuki wrote:
| The entropy (
| https://en.wikipedia.org/wiki/Entropy_(information_theory) )
| of the uniform distribution is -log2(0.000001) [?]
| 19.9315685693 bits.
|
| The entropy of the TOTP distribution is -log2(2148/2147483648
| )x483648x2148/2147483648-log2(2147/2147483648)x516352x2147/21
| 47483648 [?] 19.9315685303 bits.
|
| So yes, the difference in entropy is negligible. The TOTP
| distribution is worse by 39 nanobits (3.906e-8) per code.
| throwaway81523 wrote:
| Pedantry 2: normally in cryptography we use the min-entropy
| <https://en.wikipedia.org/wiki/Min-entropy> rather than the
| Shannon entropy that you linked, though in this case they
| are almost equal.
|
| Exercise: consider a weighted, million sided die. 50% of
| the time when you roll it, it comes up 1. The other 50% of
| the time, it comes up on one of the other 999999 results,
| with equal likelihood. What is the min-entropy of this
| distribution? What is the Shannon-entropy? This should tell
| you why the min-entropy is preferable.
|
| Added: hmm I think I made a calculation error further up.
| I'll look at it tomorrow if I can.
| Maxatar wrote:
| Not being an expert, I was unaware of this and worked
| through it after reading the article you linked.
|
| So you have 1 bit for the min-entropy and about 11 bits
| for the Shannon entropy. The Shannon-entropy pretty much
| hides the elephant in the room, which is the enormous
| bias of rolling a 1. So basically in crypto you use the
| min-entropy because that reflects the most vulnerable
| scenario in a system which is what you prioritize
| protecting against, rather than protecting against the
| average scenario.
|
| This was very insightful, thanks for sharing.
| o11c wrote:
| The most extensive _comparison_ of rejection methods I 've seen
| is on the PCG blog:
|
| https://www.pcg-random.org/posts/bounded-rands.html
| fuglede_ wrote:
| One I remember stumbling upon some years back was the .NET
| Framework equivalent of that, System.Random.Next(0,
| int.MaxValue), which has a much greater probability of
| producing odd numbers than even numbers, the probability of
| getting odd numbers being 50.34%, because of some rather
| unfortunate translations between integers and floating points.
| https://fuglede.dk/en/blog/bias-in-net-rng/
| selcuka wrote:
| > How random are TOTP codes?
|
| Nitpicking: They are not supposed to be random as that would
| defeat the purpose. We should be able to deterministically
| generate the same number on both the client and server side from
| the same 2 seeds (secret key and the timestamp).
|
| They should be ideally uniformly distributed, though.
| account42 wrote:
| > They should be ideally uniformly distributed, though.
|
| This isn't enough as just counting from 000000 to 999999 is
| uniformly distributed over the whole range. You want the
| numbers to be _pseudo_ random so that someone without the seed
| cannot guess them even if they have seen previously generated
| numbers.
| landgenoot wrote:
| I have been thinking about a useless TOTP app that works the
| other way around. Instead of giving you the current code, it
| gives you the timestamps when the code is e.g. 000000, 123456 or
| 777777.
|
| With a window of 30 seconds and 1e6 possibilities, the expected
| time it takes to get to a particular number is 347 days. Should
| be easy to brute force.
| hi-v-rocknroll wrote:
| In a deep voice: _Your wish is my command._
|
| https://gist.github.com/skull-squadron/8f806b28abbcaa1ba9c25...
|
| Unfortunately, it may take several years before a certain TOTP
| value is reached because the values are nondeterministic rather
| than ordered and so there will be hash collisions of other
| values as well.
|
| Example: JBSWY3DPEHPK3PXP 999999 TOTP will
| match 999999 between 2024-11-29 16:37:00 -0600 and 2024-11-29
| 16:37:29 -0600
| landgenoot wrote:
| The community here keeps impressing me. Nice job!
|
| Yes it could be several years, however the expected value is
| less than a year. Just like the expected dice rolls before
| rolling 6 is 6.
| hi-v-rocknroll wrote:
| Yes, it's good to disambiguate that it's expected in a
| statistical sense but it's not an upper bound or confidence
| interval.
|
| I also wrote a program to find CRC32 hash collisions that
| can be injected into a text file or script to make a text
| hash to that value. https://gist.github.com/skull-
| squadron/c85d295cf9e6124dd7e90...
|
| Doing so MD5, SHA1, or even SHA256 would be extremely slow
| and expensive, but not impossible.
| nayuki wrote:
| You don't need to brute-force CRC-32. I do a preimage
| attack directly using linear algebra:
| https://www.nayuki.io/page/forcing-a-files-crc-to-any-
| value
| robxorb wrote:
| Not so useless. A user of that could memorise the times/dates
| where a particular easy-recall code comes up. With that, they
| have effectively "transferred" 2FA for those times into their
| brain, and not need to use any 2FA app (at only those times).
| hnbad wrote:
| Good luck memorising a number of 30 second windows and
| hitting them exactly. That would require a level of
| organisation, timing and self-discipline I can not fathom.
| robxorb wrote:
| Well, the thirty second window in practice is usually a 2 -
| 3 minute window, as TOTP servers are set up to allow for
| drift, network issues, human slowness etc. For sure,
| memorising more than a handful may be hard but it's just
| "11 Nov 14:35", etc.
|
| I wonder if something could be set up to be both more
| secure, and more tailored to this use-case. Be pretty sweet
| to embed a 2FA in users brains somehow.
| 8organicbits wrote:
| I think it would be easier to use HOTP for that as the
| codes are one time use and aren't time based. The user
| just needs to memorize one of the next N codes.
| masfuerte wrote:
| If you are able to choose the seed you could brute force
| it so that it produced a memorable code at a memorable
| time.
| robxorb wrote:
| That's a very cool idea, with the caveat that if an
| attacker knew you were doing that, the search space for
| your key would shrink.
| dotancohen wrote:
| > Be pretty sweet to embed a 2FA in users brains somehow.
|
| 2FA already has a concept of "Something that you know".
|
| Still, "Something that you could calculate without
| revealing the thing that you know" is an interesting
| concept.
| svarrall wrote:
| You mean like this? You can get push notifications when your
| codes are "cool"
|
| https://open.substack.com/pub/jacobbartlett/p/building-a-2fa...
| kelseyfrog wrote:
| Can't they run a chi-squared test(n=10) and see that the results
| is not significant?
| bhawks wrote:
| Using statistics? In our moment of triumph?
|
| I think you underestimate these histograms.
| rustcleaner wrote:
| I wish they were used more, and had adjustable settings.
|
| Can we please have customizable diceware TOTP? I'd like 8-12
| words 60-90 seconds. I also wish this could be used everywhere.
| slau wrote:
| Depending on the algorithm, one or two of the digits in the TOTP
| are counters to help the server figure out clock drift on the
| client device.
|
| This was especially relevant when talking about hardware tokens
| that had relatively inaccurate clocks. In the RSA algorithm I
| seem to recall it was the second or third digit. Each clock tick
| was 2.5 seconds or something, so providing the last digit of the
| clock counter massively reduced the number of calculations the
| server had to do in case of a mismatch.
| fanf2 wrote:
| No, TOTP implementations don't compensate for clock drift, but
| they do compensate for moderate inaccuracy and races at the
| edges of the 30 second slots by typically allowing two or three
| codes around the current time +/- 1 minute or so.
| slau wrote:
| Well, considering I was part of the implementation team of
| one of those hardware tokens, I'm sorry I'm going to have to
| disagree with you :). Please note that I'm not talking about
| RFC6238 / 4226. I'm talking about proprietary implementations
| for hardware tokens.
|
| We most definitely had drift management. One of the
| differentiating features to our competitors is that our
| algorithm had both a "click counter" (number of times the
| button was pressed) and a "clock counter". The least
| significant digit of both counters was included in the OTP
| that was generated. The authentication server used the last
| digit of both counters to figure out what values to use, and
| as you say, generated codes in both time directions in order
| to try and identify the values of the counters (well, the
| click counter obviously didn't go both directions).
|
| The server then stored the last matching click counter value
| and the drift of the clock value. It wasn't uncommon to have
| tokens that drifted by multiple minutes per month.
|
| This being said, you're right: the clock only incremented
| every 28 or 32 seconds, not 2.5 seconds as I incorrectly
| remembered.
| hnbad wrote:
| When most people say TOTP they refer to the RFC kind, not
| proprietary OTP token generators.
|
| So it's less of a "depending on the algorithm they may do
| this" thing and more "some proprietary solutions do this
| instead".
| est wrote:
| offtopic, is it secure to design a login that only requires
| username+TOTP?
|
| This elimilates passwords altogether, but are there any pitfalls?
| ManuelKiessling wrote:
| Totally not my field of expertise, but maybe some like "an
| attacker knows all usernames, and can therefore try an attack
| where they attempt to log into all of the accounts in parallel
| with the TOTP '000000'" yields a high enough chance of hitting
| one or more accounts that happen to have this TOTP at this very
| moment?
| est wrote:
| it's about same chance of trying to login all usernames with
| password 123456 no?
|
| An easy counter-measure would be blocking consecutive TOTP
| logins of the same or similar codes.
| dmurray wrote:
| It doesn't really matter if the codes are the same.
|
| The attacker has a 1 in 1 million chance of guessing right,
| assuming 6-digit codes. This isn't acceptable for most
| applications.
| est wrote:
| > a 1 in 1 million chance of guessing right
|
| That's assuming there's no throttle of login requests and
| all authentication were made under one minute against one
| single user.
| croes wrote:
| Yes, could easily be limited to 3 attempts per 30
| seconds, so people could fix typos.
| jsnell wrote:
| No, it is assuming neither of those things. Each
| independent attempt pretty obviously has those odds, no
| matter how many accounts and how much time they're spread
| over. Your suggested countermeasure does not do anything
| to change those odds.
|
| I think you're misreading "1 in 1 million" as "1 million
| in 1 million".
| est wrote:
| > Your suggested countermeasure does not do anything to
| change those odds.
|
| Actually it does change. Even if you hit the exact
| correct TOTP code, the system still deny your login
| because of throttle rules and you can't tell on the
| client side.
| ksynwa wrote:
| Not TOTP but a bunch of sites are doing this thing where they
| send an OTP to the registered email and have eliminated
| passwords altogether. I find it really annoying.
| yurishimo wrote:
| It is annoying but it generates far fewer support requests
| for people who can't remember a password and think resetting
| it is too difficult.
|
| With a long enough session life and a good refresh strategy,
| it's less of a problem. If an app clears sessions after a
| week, then I would argue they are doing it wrong.
| ksynwa wrote:
| I think they also want to thwart folks like me who are
| possibly going to use a temporary email site to make an
| account.
| mdaniel wrote:
| Depending upon one's threat model, I found that the
| "public" version of throwaway emails still work great for
| that purpose, versus sites like 10minutemail which
| _destroy_ the address after a time. Sure, I am cognizant
| that if I just went surfing around (e.g.) https://www.mai
| linator.com/v4/public/inboxes.jsp?to=ksynwa long enough I
| could probably snag any hypothetical login link to
| HipsterLoginLink.co but in practice I doubt it's an issue
| account42 wrote:
| That's how I use random sites for which I can't/won't use my
| main browser - just set a random password and don't save it
| anywhere, then rely on email password resets to login.
| lelanthran wrote:
| > offtopic, is it secure to design a login that only requires
| username+TOTP?
|
| This elimilates passwords alltogether, but are there any
| pitfalls?
|
| Coincidentally, I just mentioned that I did this:
| https://news.ycombinator.com/item?id=40878150
|
| (of course, I leave it as a user preference: The user chooses
| whether to use standard passwords or to use one-time-passwords)
| definitelyauser wrote:
| "Secure" is relative.
|
| I have a system I use where you enter your email and get a one-
| time code.
|
| The goal in that system is not to securely authenticate you,
| merely to identify you. "Good enough" for the use case.
| 8organicbits wrote:
| You'll see this in CTFs sometimes. Each guess gives the
| attacker a one in a million chance of guessing the TOTP. If the
| attacker can figure out a username (usually easy) and the log
| in API isn't rate limited, then they can just guess codes in a
| loop. It take about 12 days to make one million guesses at one
| request per second. Most TOTP apps allow codes that are
| older/newer, and the attacker may be able to use a higher RPS.
| I guessed a TOTP in 30 minutes for one CTF, for example.
|
| So pretty insecure, but probably suitable for some systems with
| low security requirements or other mitigating factors.
| croes wrote:
| The pitfall is when the TOTP app is on the same device as the
| browser you want to login with.
|
| But that's a general problem. 2FA should really mean 2 separate
| devices.
| MattJ100 wrote:
| A related observation, where in many real-world data digit
| frequency _can_ be predicted, is described by Benford 's Law:
| https://en.wikipedia.org/wiki/Benfords_law
|
| However this law obviously does not apply to TOTP codes (unless
| someone did something very wrong).
| Terr_ wrote:
| IANAMathematician, but I like to visualize Benford's law as
| what happens when someone is throwing allegedly-random darts at
| log10-lined graph paper.
| hi-v-rocknroll wrote:
| TOTP is slowly on its way out compared to passkeys and FIDO2.
| It's still useful as another 2FA choice.
| aryonoco wrote:
| Passkey was a great idea. A brilliant solution which could have
| solved authentication on the web once and for all. If only
| Apple, Google and Microsoft had not completely ruined it by
| tying it to their platforms and using it as another moat
|
| https://fy.blackhats.net.au/blog/2024-04-26-passkeys-a-shatt...
| daghamm wrote:
| "If you really want passkeys, put them in a password manager
| you control. But don't use a platform controlled passkey
| store, "
|
| I think this is a good advice in general: if you value your
| freedom avoid platform controlled services even if they are
| slightly more convenient.
|
| Btw, the platform authenticator apps are a privacy nightmare.
| Some are constantly reporting your activities to multiple
| services. You can verify this easily using an on-device proxy
| VPN such as NetGuard.
| hi-v-rocknroll wrote:
| Yup. Don't do Google Authenticator or similar. There are a
| couple of FOSS ones.
| rustcleaner wrote:
| Agreed. Passkeys without QR codes to enclave devices or
| copy/paste strings to/from an enclave application made them
| DOA for true barium meal enjoyers.
| dx034 wrote:
| I don't like passkeys. I'm not sure if I'm using it wrong, but
| it feels like entering a TOTP is so much faster and easier than
| using passkeys. It was always easy to enter a 6 digit code and
| have some back-up codes printed somewhere. Passkeys might be
| superior in some ways, but feel much harder to use. But then
| again, I'm also not an average user.
| hi-v-rocknroll wrote:
| I don't like the idea of passkeys if they cannot be backed-up
| or are non-portable locked away in a walled garden and
| possibly on stored on some corporate cloud in some unknown
| manner. When they can be backed-up, are portable, and have an
| explicit security policy, then I'll consider them.
|
| For example, Bitwarden is able to act as a passkey provider
| on iOS and can store the passkey secret key gunk into a
| password record. I tried it out on a couple of minor services
| that have username & password login alternatives.
| dx034 wrote:
| My employer uses alphanumeric 2 factor codes and I'm so certain
| that they have a bias towards some letters (mostly y and z). I
| know I'm probably wrong and it's probably because they appear so
| rarely in real words, but I can't shake the feeling they aren't
| random.
|
| Only problem is that I don't have the algorithm. I started
| writing down all codes I got but since I only get 5 a week, it's
| a long process. I'll probably switch jobs before I have valid
| results.
|
| Not that it would change anything, but I'd be really curious how
| biases in those codes could appear.
| usr1106 wrote:
| Is there a standardized, public, and widely examined algorithm
| that produces letters or did they run "their own crypto"?
| motohagiography wrote:
| aside, the adage "don't roll your own crypto," has this funny
| side effect of homogenization where a weakness empowers an
| attacker against the maximum number of targets and makes mass
| interception more cost effective.
|
| I've found that interoperability across diverse
| implementations is ironically the best protection against
| schemes that weaken rngs and key entropy to facilitate mass
| interception. independent implementations become a proof of a
| protocol or algorithm implementation. if there is only one
| functional implementation of something, it's where I would
| look first.
| chowells wrote:
| Custom logic to serialize a number as a set of symbols hardly
| comes near the threshold of "rolling your own crypto". So
| long as they follow a standard to generate the number, the
| serialization is basically irrelevant as long as it's
| reversible.
| _flux wrote:
| I wondered about a related problem: how many of the codes are
| "easy"? Easy as in they are composed of "simple" patterns, such
| as going to adjacent digits in a number pad. Often it seems that
| if you are given such a sequence, there's a pattern you can use
| to recall it. Seems like it would make the randomness suspect,
| perhaps? But maybe all the sequences have easy rules, the rules
| just differ?
|
| So during a short hackfest I created this to check it out:
| https://github.com/eras/reco . Sorry, no binaries and the font
| size is hardcoded for presentation, and actually the whole UI
| stuff is just for that reason there.. By default it scans the
| whole 6-digit sequence space, but you can also give it a sequence
| and it will show the rules it finds.
|
| Given the rules it uses, it turns out 50% of 6-digit sequences
| are "easy". Because it is based on the rules I just thought would
| apply there are probably other "easy" rules that could cover a
| lot of the remaining 50%.. It also cheats in a way by trying to
| apply the codes to all* shapes and sizes of numpads (1x10, 2x5,
| 3x3+1): match in any numpad is accepted for a sequence to be
| "easy".
|
| It may also be some of the rules for the sequences it finds are
| not "easy" after all :).
| saagarjha wrote:
| Having implemented TOTP codes once I know that they're basically
| unbiased because of the cryptography involved. That said, I would
| bet money that Apple's two-factor implementation is something
| custom because it just seems far too likely to generate
| combinations that look non-random. A bet not because I have
| evidence I'm right, but because I want someone to explain to me
| how these work, if only just "oh it's literally the same
| algorithm everyone else uses" :)
| boxed wrote:
| I would take that bet :P
| gizmo wrote:
| A related issue is that TOPT security guidelines suggest using a
| 160 bit key. Some organizations use 20 chars with alphabet
| A-Za-z0-9. Easy mistake to make, a byte is 8 bit after all.
| However, 62 ^ 20 is only 120 bits (give or take). Way less than
| the recommended minimum. Does anybody know how insecure this is
| in practice?
| dchest wrote:
| 120 bits is secure enough.
| usr1106 wrote:
| I often have the feeling that my TOTP codes are somehow simple.
| Simple in the sense of containg repeated digits, some rhythm
| (e.g. 663183) or symmetry instead of being "purely random" (e.g.
| 581329).
|
| I guess the reason is the human brain can really recognize many
| kinds of patterns. Nothing weird about the entropy.
| joezydeco wrote:
| I notice a lot of double digits, but I always chalked that up
| to confirmation bias.
|
| But if some numbers _were_ doubled, that chunking makes short-
| term encoding slightly easier. Rule of 7(+ /-2) and etc etc
| etc.
|
| https://lawsofux.com/millers-law/
| pixl97 wrote:
| I mean in a 6 digit code there's over a 50 percent chance
| you'll have one double, you're using 60% of the available
| characters in any one return.
| nmstoker wrote:
| Am reassured others see this sort of thing (even if it ends up
| being chance).
|
| About six months ago our MFA system, which uses codes between 1
| and 100, persistently started to give me codes that were odd
| numbers in the top half of the range (ie 51 or above). This went
| on for well over a month (several codes per day) before I saw the
| pattern cease. The rational side of me felt it was just chance
| but I had a nagging unease all the same!
| dfox wrote:
| Since the start of using any kind of SMS 2FA I keep noticing,
| that for some systems you get "nice to remember" numbers somewhat
| often. I didn't care enough to actually write them down and
| confirm whether it is a fact or just my internal bias of finding
| patterns where there are not any.
|
| On the other hand, various decimal "random" numbers around
| payment cards (default PIN, authorization codes and what not) are
| clearly biased, because they are usually generated by taking
| hexadecimal representation modulo 10.
___________________________________________________________________
(page generated 2024-07-05 23:01 UTC)