[HN Gopher] Why It Matters Whether Hashed Passwords Are Personal...
___________________________________________________________________
Why It Matters Whether Hashed Passwords Are Personal Information
Under U.S. Law
Author : fortran77
Score : 75 points
Date : 2021-02-03 15:35 UTC (7 hours ago)
(HTM) web link (www.jdsupra.com)
(TXT) w3m dump (www.jdsupra.com)
| woliveirajr wrote:
| > As long as the salt value remains secret, this is a very
| effective method against rainbow attacks and other current
| methods of attack
|
| But salts aren't meant to be secret: they are used to make
| existing rainbow tables useless, as the salt wasn't used to
| precompute the rainbow tables.
|
| In the example they use "2@3" pre- and post-appended to the
| password. Ok, so any rainbow table that hasn't used 2@3 is
| useless. But "password" will always be a lousy password and 6
| characters of salt isn't that great.
| rileymat2 wrote:
| If they use a unique salt for each password you are right, if
| they use the same salt for each password then the article is
| correct(ish) *.
|
| * I should say correct enough.
| La1n wrote:
| >if they use the same salt for each password then the article
| is correct(ish).
|
| Wouldn't it be a pepper then?
|
| https://en.wikipedia.org/wiki/Pepper_(cryptography)
|
| edit: nvm, it seems the main difference is a pepper being
| secret.
| cschneid wrote:
| Yep - a pepper is an app-wide value that forces an attacker
| to get both a configuration value (from the source, or env
| vars) and also the hashes of the passwords (a db dump).
| Just another incremental thing for an attacker to overcome
| that is cheap & easy to implement.
|
| Salts must be unique per password as far as I know?
| Otherwise they don't do their main goal which is to make
| every guess useful against only one hash. (ie: guess
| 'hunter2', check if any user had hunter2 as a password, but
| salted, you'd only be able to guess if cschneid had hunter2
| since you'd be hashing hunter2-sssssaaaaallllltttt).
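|
| A minimal sketch of the salt-plus-pepper split described above,
| assuming PBKDF2 for the per-user salted hash and an HMAC keyed
| with the pepper on top. The names PEPPER, hash_password and
| verify_password are hypothetical, and the iteration count is
| only illustrative.

```python
import hashlib
import hmac
import secrets

# Hypothetical app-wide pepper. In practice it would come from config or an
# environment variable, never from the same database as the hashes.
PEPPER = secrets.token_bytes(32)
ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, pepper: bytes = PEPPER) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt for this password."""
    salt = secrets.token_bytes(16)
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # The pepper keys a final HMAC, so a database dump alone is not enough
    # to test guesses offline.
    digest = hmac.new(pepper, stretched, hashlib.sha256).digest()
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes,
                    pepper: bytes = PEPPER) -> bool:
    stretched = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    candidate = hmac.new(pepper, stretched, hashlib.sha256).digest()
    return hmac.compare_digest(candidate, digest)
```

| The salt is stored next to the digest; only the pepper lives
| outside the database.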
| rileymat2 wrote:
| Yeah, I was trying to imply a correctish for a general
| audience.
|
| If right or wrong is a spectrum.
| dane-pgp wrote:
| It's maybe also worth pointing out that if the same salt is
| used for each password, then an attacker who can leak the
| hashes after creating their own account (with a known
| password) can try brute-forcing the salt.
|
| If the attacker hasn't seen the code, they won't know whether
| the salt is prepended, or appended, or both to the password
| (or if there are two salts), but the amount of computation
| needed to try all possible salts is roughly equivalent to the
| amount needed to crack a single well-chosen (unsalted)
| password.
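|
| To make the attack concrete, here is a toy sketch. It assumes a
| hypothetical site that hashes salt + password + salt with a
| single SHA-256 and a 3-character salt drawn from a tiny
| alphabet, so the search finishes instantly. The attacker knows
| their own password and tries each layout in turn.

```python
import hashlib
import itertools
import string

ALPHABET = string.digits + "@#"  # tiny alphabet so the demo finishes quickly


def site_hash(password: str, salt: str) -> str:
    """Hypothetical site scheme: salt + password + salt, single SHA-256."""
    return hashlib.sha256((salt + password + salt).encode()).hexdigest()


def recover_salt(known_password: str, leaked_hash: str, length: int = 3):
    """Try every salt of the given length in each plausible layout."""
    layouts = {
        "prepend": lambda s, p: s + p,
        "append": lambda s, p: p + s,
        "both": lambda s, p: s + p + s,
    }
    for chars in itertools.product(ALPHABET, repeat=length):
        salt = "".join(chars)
        for name, layout in layouts.items():
            guess = hashlib.sha256(layout(salt, known_password).encode()).hexdigest()
            if guess == leaked_hash:
                return salt, name
    return None


# The attacker registers with a password they know, then reads their own row.
leaked = site_hash("my-own-password", "2@3")
found = recover_salt("my-own-password", leaked)
```

| With a long random salt the same loop becomes as expensive as
| cracking one well-chosen password, which is the point made
| above.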
| kstrauser wrote:
| Even then it's better than nothing. If every company used
| different salt, rainbow tables would have to be completely
| regenerated for each company.
|
| (This is still awful, because then two users with the same
| password would have the same digest. If you know that
| joe@example.com has password "badpass", and jane@example.com
| has the same hashed password, then hers is also "badpass". So
| use random per-user salts, gang! Even better, pick a tested
| implementation of PBKDF2 or Argon2 and use that
| implementation. Don't roll your own crypto!)
|
| But still, it's better than nothing.
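|
| A small sketch of the leak described above, using made-up
| users and a made-up company-wide salt. Grouping rows by digest
| reveals who shares a password when the salt is shared, and
| reveals nothing when each row gets its own random salt.

```python
import hashlib
import secrets
from collections import defaultdict

# Made-up accounts for illustration only.
users = {
    "joe@example.com": "badpass",
    "jane@example.com": "badpass",
    "ann@example.com": "correct horse",
}


def digest(password: str, salt: bytes) -> str:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000).hex()


def group_by_digest(table: dict) -> list:
    """Return the groups of users whose stored digests are identical."""
    groups = defaultdict(list)
    for user, d in table.items():
        groups[d].append(user)
    return [sorted(g) for g in groups.values() if len(g) > 1]


shared_salt = b"company-wide-salt"
shared = {u: digest(p, shared_salt) for u, p in users.items()}
per_user = {u: digest(p, secrets.token_bytes(16)) for u, p in users.items()}

print(group_by_digest(shared))    # joe and jane show up together
print(group_by_digest(per_user))  # no groups
```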
| [deleted]
| _trampeltier wrote:
| Why not just use joe@example.com as part of the salt? So
| every user has a different salt.
| rileymat2 wrote:
| Oh, yeah, I agree, it is better than nothing.
|
| Just pointing out that I have seen conditions where the
| article is not terribly wrong in its statement.
|
| Things like bcrypt are nice because they take care of that
| for you.
| ryanlol wrote:
| Does anyone actually use rainbow tables to crack password
| hashes anymore with GPUs being so fast?
| motohagiography wrote:
| In many privacy regulations, any unique identifier is treated as
| Personally Identifying Information (PII). If the hashed password
| is used as a lookup key, that makes it a unique identifier, and
| it's PII. I do work in environments where agencies are not
| allowed to share user PII with each other, and therefore you
| can't create single (SSO) identities to login to different
| services. The principle is that individuals should be able to
| compartmentalize their business and service relationships and not
| have their service providers collude or conspire against them for
| leverage.
| ensignavenger wrote:
| In typical username+password systems, the passwords are not
| unique, and thus the password hash would be an inappropriate
| lookup key.
| motohagiography wrote:
| It wouldn't be, unless you were sharing an unsalted hash
| database across organizations, where it was aggregated with
| other ones, then that hash is a unique identifier. Uniqueness
| becomes an issue in things like de-identification, which is a
| legal concept, and pseudonymization, which is basically
| tokenization, but not an information theoretic one like say,
| k-anonymity, or a cryptographic one like entropy.
|
| If these distinctions are new to people on this thread, I
| recommend https://www.oreilly.com/library/view/building-an-
| anonymizati... for some background.
| cookingmyserver wrote:
| So, correct me if I am wrong, but you are saying that
| uniqueness when it comes to data regulation is different
| than uniqueness in information theory.
|
| Even though I generate a random salt when hashing passwords
| I wouldn't use it to select a user out of a database
| because I don't check for collisions by ensuring a unique
| salt that has never been used before. The chance of a
| collision is tiny, but I still don't want to do it. But
| this would still qualify as unique under data regulations?
| motohagiography wrote:
| The fact of a collision does not generally have regulatory
| meaning unless your hashing scheme was for
| de-identification and your re-identification risk
| assessment of the method showed it was trivial to re-
| identify someone because of said collisions. Also,
| logically, the lower the collision rate, the more unique
| the identifier. I recommend reading the book I linked
| above.
|
| The policy view of what these things mean vs. what we
| talk about in terms of entropy, collisions, confusion,
| etc, are subtly different things. Of course it depends on
| the regs regime, but here's an example from what I do:
|
| A business wants to share 10 million personal profiles
| keyed on a SIN with another business or agency for some
| research. Privacy law says they can't share the data
| unless the personally identifying information is removed.
| Business tells their DBA to "encrypt" the SIN numbers,
| who says "sure" and SHA256's them all and shares the data
| set, because to him, now the SINs are not shared(!).
|
| A privacy policy analyst freaks out because hashing the
| SINs has done nothing to protect the identities of the
| people in the data profiles. The DBA can't figure out why
| because he used an NIST approved 256bit hash on them and
| he tells his boss "it's fine, they're encrypted."
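|
| A sketch of why the analyst is right: the identifier space is
| small enough to hash exhaustively. The demo below assumes
| 5-digit identifiers so it runs in well under a second; a real
| SIN has 9 digits, which is a larger but still entirely
| feasible enumeration. The sample values are invented.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


def build_reverse_index(digits: int) -> dict[str, str]:
    """Hash every possible identifier of the given length exactly once."""
    index = {}
    for n in range(10 ** digits):
        identifier = f"{n:0{digits}d}"
        index[sha256_hex(identifier)] = identifier
    return index


DIGITS = 5  # assumption for the demo; a real SIN has 9 digits

# What the DBA shipped: the "encrypted" column (invented sample values).
shared_column = [sha256_hex(i) for i in ("04821", "99310", "00007")]

# What the recipient, or anyone else, can do with it.
index = build_reverse_index(DIGITS)
recovered = [index[h] for h in shared_column]
```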
|
| To your case with the salt: if you salt the hashes, the
| agency receiving the file calls back and says they can't
| use the data because the hashes don't match the profiles
| in their database - because they have been using the
| hashes as unique identifiers and re-identifying people in
| the data set. If you salt them for your initial data
| sharing, and then re-hash them with a new salt for an
| update, that's better, but it is still a transferable
| identifier for that cut of the data.
|
| Even if you tokenize each profile with a UUID and
| transfer that UUID between agencies, you are transferring
| a unique identifier about the person. The right way to do
| this is to have a tokenization service broker that takes
| records and synthesizes _new_ record keys (MBUN) for each
| destination counterparty you are sharing the data set
| with. Hardly anyone does this, and they just take the
| privacy risk instead.
|
| The policy vs. info theory difference is that policy can
| conflate hashing and encryption depending on the purpose
| of the use, where in tech and security, they are totally
| different things.
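|
| A rough sketch of the broker idea, under the assumption that
| it keeps a private mapping and hands each destination its own
| random record key. The class name TokenBroker, the field name
| "sin" and the destination labels are all hypothetical.

```python
import secrets


class TokenBroker:
    """Hypothetical broker issuing a different random record key per destination."""

    def __init__(self):
        # (destination, source_id) -> token; this mapping never leaves the broker.
        self._tokens = {}

    def token_for(self, destination: str, source_id: str) -> str:
        key = (destination, source_id)
        if key not in self._tokens:
            self._tokens[key] = secrets.token_hex(16)
        return self._tokens[key]

    def export(self, destination: str, records: list) -> list:
        """Replace the 'sin' field with a destination-specific key."""
        out = []
        for record in records:
            row = {k: v for k, v in record.items() if k != "sin"}
            row["key"] = self.token_for(destination, record["sin"])
            out.append(row)
        return out


broker = TokenBroker()
records = [{"sin": "123456789", "age_band": "30-39"}]
for_agency_a = broker.export("agency_a", records)
for_agency_b = broker.export("agency_b", records)
```

| The two agencies receive stable keys for their own updates,
| but cannot join their data sets on the key.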
| hxtk wrote:
| In the theoretical sense what you care about is whether
| you can safely assume those hashes will always be unique.
|
| In the practical sense what matters is whether you can
| safely assume the hash for a particular user in a
| particular system will ever be unique.
|
| E.g., if I had a snapshot of a database mapping
| identities to password hashes and a log of all the hashes
| that had been computed by the auth server, I could make
| very reasonable guesses about who logged in at what time.
|
| With that said, I am not a lawyer and I have no idea what
| the legal significance of all that might be.
| jtdev wrote:
| The passwords should be salted, and therefore unique.
| EGreg wrote:
| Okay maybe not PII, but can they share PIII or PIV?
| geoduck14 wrote:
| Wow. I had never considered that. I always wondered why some
| companies don't let me SSO into different parts of their apps.
| motohagiography wrote:
| It's an interesting issue now because norms have changed.
| That specific privacy view is a very 90's-00's worldview that
| assumed silos and didn't anticipate using your webmail login
| for literally everything. Examples include legal firewalls
| between divisions of banks where they can't use your
| transaction history to make car insurance decisions. Should
| an airline know your income before quoting you a fare? (which
| is why some people use Tor and proxies to search for flights)
|
| Privacy is anti-discrimination, and the reason social media
| companies are so rich is because they sell micro-
| discrimination as a service. It's so valuable because when
| people see how it works they ask, "how is this even legal?"
| DoofusOfDeath wrote:
| Thanks for sharing! I'm curious about the exact dividing line
| between PII and non-PII. So if you know, could you clarify a
| detail?
|
| Your phrasing was: "If the hashed password *is used as* a
| lookup key, that makes it a unique identifier, and it's PII."
|
| If I take your wording literally, you're saying that I could
| store everyone's SSN in a database, but as long as _my system_
| didn't attempt to use SSN as a unique identifier, the SSN
| doesn't count as PII.
|
| But it seems unlikely that that's how you meant it.
| derekp7 wrote:
| "If A, then B" does not necessarily mean "If Not A, then Not
| B". Now if the first logical statement was "If, and only if
| A, then B", then the second logical statement would follow.
| Gunax wrote:
| Would that also mean that encrypted (not hashed) PII is also
| personal information? It should not be retrievable (in theory and
| practice) without the key.
| number6 wrote:
| I have it from different DPAs that encrypted data are not PII
| under the GDPR. The case might be different for passwords but
| you have to think around the corner a bit: You compare two
| hashes when you compare a password - you don't compare the
| password itself (the actual string in the form) - so what
| identifies you as the person to the computer is not your
| password but the hash. So the hash itself is PII. Why isn't
| this true for encryption? Well, good encryption in principle
| should not look different from random data. Will the same file
| encrypted with the same key produce the same outcome? Well -
| no. At least it shouldn't. So it's not possible to identify
| someone with it like with a hash value. I can only speculate
| about the rationale but this is the only explanation that comes
| to my mind.
| tialaramex wrote:
| Several technical mistakes in this piece (which I assume was not
| primarily for a technical audience, but mistakes they are
| nonetheless):
|
| 1. This article repeats the misunderstanding that Rainbow Tables
| are the same thing as a Dictionary:
|
| > Given the processing power available, hackers have and continue
| to generate enormous tables that contain anything from every
| possible combination of values for shorter passwords to lists
| including variants of common and known passwords. These are
| called rainbow tables, and if the password used is included in
| one of these tables, then the cleartext password is known to the
| hacker.
|
| The Rainbow Table is actually an interesting improvement on a
| pre-existing idea (by Hellman, yes as in Diffie-Hellman) for a
| Time-space trade off attack.
|
| So let's describe the most basic idea: We have a list of
| potential passwords (a "Dictionary"). It doesn't matter whether
| it's an actual written list, or a procedure to pick possible
| passwords, but it must be strictly finite. Just trying them all
| is a Dictionary attack. For an online service this is one thing
| "rate limiting" is defending against.
|
| Now, if you have the hashes, you could do your dictionary attack
| except instead of trying to log in you just calculate each hash
| and compare it. There are automated tools that can do this for
| you e.g. John The Ripper.
|
| Next cleverest idea is: Let's calculate hashes from our
| dictionary, store and index them (or we could e.g. store them in
| order) with the original word so we can just look up a hash and
| get back the password if we know one.
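|
| As a minimal sketch, the precomputed lookup is just a mapping
| from hash to word. SHA-256 and the three-word dictionary are
| stand-ins chosen for the example.

```python
import hashlib


def H(word: str) -> str:
    return hashlib.sha256(word.encode()).hexdigest()


def build_lookup(dictionary) -> dict[str, str]:
    """Hash every dictionary word once and index the words by hash."""
    return {H(word): word for word in dictionary}


table = build_lookup(["password", "hunter2", "letmein"])

# Later, given a leaked unsalted hash, recovery is a single lookup.
leaked = H("hunter2")
print(table.get(leaked))             # hunter2
print(table.get(H("not-in-list")))   # None
```

| Storage grows with one entry per dictionary word, which is the
| cost the next idea tries to reduce.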
|
| Hellman's idea was: Don't store all these hashes. Storage is
| expensive. Imagine a function that converts a hash back into a
| password from the dictionary, call it F() and the hash function
| H() now instead of calculating and storing H(password) ->
| password we recurse through H(F(H(F(.... some number of times,
| effectively chaining together many solutions in each entry
| stored, then we do more work to calculate each intermediate when
| doing a lookup of a hash later. We're trading off less space for
| more time.
|
| Hellman's idea is clever, but it's a bit wasteful, because
| sometimes there will be (at random) collisions in either H() or
| F() and this means you're wasting space/ losing recall in your
| system.
|
| So, Rainbow Tables are Philippe Oechslin's improvement, Oechslin
| deliberately perturbs F() differently for each iteration so that
| if a collision does occur it will go away again unless it was in
| the same iteration of the chains.
|
| This produces a "rainbow" picture in Oechslin's slides which
| named the resulting technique.
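|
| A toy sketch of the chain idea, keeping the names H() and F()
| from above. Everything else is an assumption made for the
| demo: a password space of three lowercase letters, SHA-256 as
| H, chains of 50 steps, and a reduction that simply adds the
| column index before mapping the digest back to a password.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
LENGTH = 3                          # toy password space: 26**3 candidates
SPACE = len(ALPHABET) ** LENGTH
CHAIN_LEN = 50


def H(password: str) -> bytes:
    return hashlib.sha256(password.encode()).digest()


def F(digest: bytes, column: int) -> str:
    """Reduction: map a digest back into the password space.
    Adding the column index perturbs it differently at each step."""
    n = (int.from_bytes(digest[:8], "big") + column) % SPACE
    chars = []
    for _ in range(LENGTH):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def walk(password: str, start_col: int = 0) -> str:
    """Follow a chain from start_col to its end and return the endpoint."""
    for col in range(start_col, CHAIN_LEN):
        password = F(H(password), col)
    return password


def build_table(starts) -> dict:
    """Store only endpoint -> start(s); the intermediates are recomputed later."""
    table = {}
    for start in starts:
        table.setdefault(walk(start), []).append(start)
    return table


def lookup(table: dict, target: bytes):
    # Guess which column the target hash sits in, from last to first.
    for col in range(CHAIN_LEN - 1, -1, -1):
        end = walk(F(target, col), col + 1)
        for start in table.get(end, []):
            # Replay the stored chain to find the password that hashes to target.
            password = start
            for c in range(CHAIN_LEN):
                if H(password) == target:
                    return password
                password = F(H(password), c)
    return None


table = build_table(["aaa", "abc", "xyz", "qrs"])
```

| Four stored entries cover up to 200 passwords here, at the
| price of recomputing chain segments on every lookup.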
|
| Rainbow Tables are useless against a properly salted hash,
| because all this up-front work must be done again for each
| different salt, or, the size of the tables is inflated by the
| size of the salt. So e.g. a 32-bit salt (many today are far
| larger) makes your Rainbow Table cost four billion times more to
| calculate and need four billion times as much space.
|
| They got famous because some key Microsoft password technologies
| from the turn of the century do not use salt, and so you can
| attack those with Rainbow Tables. In particular the LM Hash can
| 100% reliably be reversed by a suitable Rainbow Table that would
| be affordable to store in that era.
|
| 2. It gets salt wrong:
|
| > As long as the salt value remains secret, this is a very
| effective method against rainbow attacks and other current
| methods of attack.
|
| Nope. The salt doesn't need to be _secret_ it should be _random_
| for each record. Our goal isn't that an attacker doesn't know
| each salt, which in most of these cases will have been stored
| with the hashed passwords, but to prevent them amortizing work
| over a large number of hashes, see the discussion of Rainbow
| Tables earlier.
|
| Overall I think for lawyers or managers looking for legal advice,
| which I assume is the target of this article, the focus ought to
| be on _getting away from passwords_. They were already a bad
| idea, a last resort, last century and we've doubled down rather
| than seeking every other alternative.
|
| If your system is authenticated with WebAuthn you do not store
| any _secrets_, you do not store anything that would "permit
| access" to the system, because it's all driven by Public Key
| Cryptography. You store things that can be used to verify that the
| client is the same as before, but they can't be used to pretend
| to be that client, or even to unmask the client's identity. You
| could (but probably shouldn't) paint the contents of a WebAuthn
| authentication database on the side of your building and that'd
| be fine.
| upofadown wrote:
| By having the salt secret they are using it as an encryption
| key that they hope will prevent any sort of brute force attack.
| Thus the hash is no longer covered by privacy legislation. A
| separate salt per record is still brute forceable, even if it
| takes a lot longer for a lot of records.
| tptacek wrote:
| If you have a secret place that you can hide data from
| attackers in, just store the hashes there too.
| JulianMorrison wrote:
| Please do not use the SHA family of hashes to store passwords.
| _They are designed to be fast._ This makes brute forcing them
| relatively easy. There are hashes designed for passwords that are
| _not_ fast.
| johnisgood wrote:
| Yes, it is called stretching.
|
| Cryptographic hash functions are typically designed to be
| computed quickly, so it is possible to try guessed passwords at
| high rates. You could try billions of possible passwords each
| second. This is why we have password hash functions that
| perform key stretching, such as PBKDF2, scrypt, or Argon2. They
| increase the time (and sometimes even memory) required to
| perform brute-force attacks on stored password hash digests.
| Use a large (256-bit is fine) random salt value. All the salts
| are random values (note: they do not have to be a secret), so
| each user will use a different salt value, and now the attacker
| has to compute the stretching function once for each (guess,
| salt) pair, rather than once for each guess, and this is a
| lot more work for the attacker.
|
| If you only hash with a salt, it still leaves passwords exposed
| to brute-force attacks and dictionary attacks that they can
| easily run on GPUs. Key stretching is important! This is why we
| have dedicated password hashing functions that are slow enough
| to mitigate brute-force and dictionary attacks.
|
| TL;DR: salting and stretching on passwords, along with password
| hash functions (slow password hashing functions: PBKDF2,
| bcrypt, scrypt, Argon2, Balloon and some recent modes of Unix
| crypt)! Do not use fast cryptographic hash functions on
| passwords, as it defeats the purpose of stretching. Salting is
| not enough.
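|
| A minimal sketch of salting plus stretching with scrypt from
| Python's standard library. The cost parameters (n, r, p) are
| assumptions for illustration and should be tuned to your own
| hardware; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Cost parameters are illustrative assumptions, not recommendations.
N, R, P = 2**14, 8, 1


def hash_password(password: str) -> tuple[bytes, bytes]:
    # 256-bit random salt; it is not secret and is stored next to the digest.
    salt = secrets.token_bytes(32)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)
```

| In production code a maintained library that bundles the salt
| and parameters into one stored string is usually the safer
| choice.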
| JulianMorrison wrote:
| Do not, ideally, roll your own out of anything.
| [deleted]
| dehrmann wrote:
| As others have said, use scrypt, but...
|
| PBKDF2, obviously a less marketable name than scrypt, was a
| pretty standard tool for dealing with fast hash functions. What
| turned out to be a bigger problem than speed is memory use. Now
| that we have widespread GPUs, it's a lot easier to parallelize
| password cracking, and most cryptographic hash functions use
| negligible memory, so it parallelizes very well. scrypt also
| improves this.
|
| I actually saw scrypt take down a website. It got attacked with
| brute-force password attempts (<100k, so it wasn't a serious
| attack), and the scrypt calls exhausted the CPU, essentially
| doing a DoS attack. We ended up wrapping the call in a
| semaphore to prevent this. You don't normally think about it,
| but login might be the most expensive operation your webserver
| does.
| AnthonyMouse wrote:
| I've never really loved this advice.
|
| If people use weak passwords, the hash algorithm doesn't
| matter, they're going to get cracked. If you use strong
| passwords, the hash algorithm doesn't matter, they're not going
| to get cracked. This sort of thing only matters for medium
| passwords. And can be avoided by requiring strong passwords.
|
| It also does very little against precomputation attacks (e.g.
| rainbow tables), because it makes it take longer to compute the
| tables but it's still efficient to use them.
|
| Meanwhile it adds a user-perceptible amount of latency to
| logging in, because the hash algorithm is so slow. That causes
| users to take steps to avoid it, e.g. by staying logged in when
| they don't need to or preventing their machine from locking,
| which reduces security.
| [deleted]
| __s wrote:
| Not sure what you're suggesting, but I'll try to give benefit
| of the doubt & guess you advocate for giving users randomly
| generated 64 character hex strings & password resets only
| give them a newly generated password
|
| At that point it may be suitable to store the password as
| plaintext, though a layer of hashing could be suitable to
| slow down an attack from authenticating in case they get read
| access to the database
| saurik wrote:
| I mean, the "reality" is that if a user uses the same
| password across multiple sites then they are almost
| certainly screwed as the entire premise that we are going
| to guilt all the big websites into storing hashed passwords
| doesn't work for the long tail of websites that each had a
| high likelihood of being run by a scammer anyway. Users
| really just need to not use the same password across
| multiple websites, and that password needs to be high
| entropy. The recommendation of "hash your stored passwords"
| prevents you from being part of the problem, but it is a
| solution to the wrong problem and fails to protect either
| the user or the website from getting hacked: the
| recommendation--at least and certainly if assuming you
| insist on implementing login with the classic "user will
| later send a plain text copy of their secret to the server
| each time they log in, which the server will (somehow)
| verify"--absolutely should include "don't let the user
| generate their password", as anything else does a
| disservice to the user and doesn't actually help the
| website. If the user is sane, they won't care if you
| generate the password. If the user isn't sane, then either
| 1) they are using some kind of weird deterministic scheme
| to generate passwords for all websites (which is frankly
| not optimal); 2) they are intending to use their password
| on another, less secure website (and so should be prevented
| from using that same password on your website); or 3) their
| password is something stupid (like their mother's maiden
| name, concatenated with their date of birth... but they
| change the i's to exclamation points to get around the
| symbol requirement).
| dwheeler wrote:
| I completely agree. If you're directly using a SHA hash for
| password storage, even with salt, you're doing it so
| incompetently that you should probably be fined.
|
| At the very _least_ use a well-studied _iterated_ salted hash
| algorithm for storing passwords (if you must authorize an
| incoming request later). That is a _minimum_ bar. I generally
| recommend argon2. Reasonable alternatives are bcrypt, scrypt,
| and PBKDF2. PBKDF2 is weak against hardware, but it's still
| better than using a hash directly.
|
| I teach a class on developing secure software, and students
| have to do a final project. I take off points if they directly
| use SHA for password storage; that's just not acceptable.
| EGreg wrote:
| Can you say which? Blowfish?
| teddyh wrote:
| People used to say "Just use bcrypt.", but this might have
| changed since. A cursory look seems to indicate that Argon2
| is the current preferred choice.
| tptacek wrote:
| bcrypt is fine. It has held up quite well. Really, when
| picking among the mainstream password hashes, just throw a
| dart if you need to.
| __s wrote:
| pbkdf2: https://en.wikipedia.org/wiki/PBKDF2
|
| Your standard crypto library probably implements it
| (Rfc2898DeriveBytes for .NET). Under the covers you should
| tell it to use SHA512. If you can choose something better go
| ahead, but this is often good enough
| [deleted]
| valauran wrote:
| Use argon2 or scrypt for password hashing. PBKDF2 or bcrypt
| are also acceptable, but prefer the first two in new systems.
| Make sure to use appropriate complexity factors.
|
| A good starting point when trying to decide what crypto
| algorithm to use is
| https://latacora.micro.blog/2018/04/03/cryptographic-
| right-a...
| JulianMorrison wrote:
| I absolutely love that right answers document. Thank you.
| rlpb wrote:
| Further, a hash is the wrong primitive to use to store
| passwords. Just as you wouldn't attempt to invent a
| cryptographic hash function yourself.
|
| The correct primitive to use is a _key derivation function_.
| Examples are PBKDF2, bcrypt, scrypt and Argon2.
| SAI_Peregrinus wrote:
| bcrypt is NOT a key derivation function. It IS a "password
| hashing function". PBKDF2, Argon2, and scrypt are both.
| Hizonner wrote:
| A hash (especially a properly salted slowed hash) of a _good_
| password doesn't permit access to anything, because it's beyond
| cracking even with every dictionary and optimization you might
| have. A hash of a _bad_ password does permit access. The law is
| going to have to wrap its head around that distinction somehow.
|
| Unfortunately "good" passwords have other problems, like being
| hard to type and remember... so there are practically no good
| passwords out there...
| xyzzy123 wrote:
| Not a huge fan of offline-attackable password hashes in
| datastores.
|
| I want to put a vote in here for anchoring. Lots of methods:
|
| 1) Code secret. This is ok but loses all strength if disclosed.
|
| 2) KMS wrapping of hashes. This is better; attacker needs to be
| in your environment, actively unwrapping all your stuff.
|
| 3) Controlled hashing step (best). Have one step of your password
| hashing incorporate a one-way operation using a "normally non-
| extractable" hardware secret. This could be in CloudHSM if you're
| in AWS.
|
| This completely prevents offline attacks unless the attacker
| compromises your HSM.
|
| 4) It would be nice if cloud providers implemented "password
| hashing as a service". They could easily and cheaply prevent
| offline attacks.
|
| 5) Of course strong key auth instead of passwords for clients
| would be nicer.
| deckard1 wrote:
| > The question of whether a hashed password "permits access" to
| an online account is a complex question that has not been fully
| addressed from a legal standpoint.
|
| Isn't this already answered by existing law? Wasn't Kevin Mitnick
| already charged and prosecuted with laws that would cover
| unauthorized access? Obviously if you have a hashed password and
| it's not _your_ password and you're trying to access an account
| that isn't _yours_ then I'm sure that qualifies as unauthorized
| access. What am I missing here?
|
| It sounds like there is an argument being put forth to make
| hashed/salted passes PII for, I can only guess, the sole purpose
| of litigation against businesses for being hacked. Otherwise,
| does anyone think hackers care about PII?
|
| I can understand prosecuting for security lapses that are related
| to a data breach. But within reason. You're getting close to
| charging the victim for the crime here.
|
| I know it's fashionable to argue against storing _any_ PII here
| on HN. But that's not reasonable for our society. Because, much
| like salting+hashing, when you're implementing security measure
| X, all the future hacker has to do is X+1 to breach the wall.
| This game will never end. If we go down this route, you know
| what's going to happen? Only Amazon and Google and Facebook can
| control your PII. With CCPA and GDPR you're already seeing how
| this plays out. The businesses with deep pockets win. They can
| afford to deal with the legal realities of doing business all day
| long. There is an entire cottage industry that popped up just to
| put those stupid cookie banners on websites now. Does anyone
| really want to live in a world where Google gets to dictate if
| your business can exist, and Amazon is constantly stepping on your
| neck and demanding rent?
|
| CCPA/GDPR are basically laws that punish the entire world for the
| sins of three or four monopolies.
___________________________________________________________________
(page generated 2021-02-03 23:03 UTC)