[HN Gopher] Hash Functions
       ___________________________________________________________________
        
       Hash Functions
        
       Author : rdpintqogeogsaa
       Score  : 62 points
       Date   : 2023-06-03 13:20 UTC (9 hours ago)
        
 (HTM) web link (papa.bretmulvey.com)
 (TXT) w3m dump (papa.bretmulvey.com)
        
       | [deleted]
        
       | layer8 wrote:
       | I don't understand the encryption use case. Can someone
       | elaborate?
        
         | tptacek wrote:
         | It's not written super clearly (though the article itself is an
         | interesting and ambitious piece of technical writing) but the
         | impression I get is that he's referring to the role of hash
         | functions in cryptosystems: for signatures, transcript hashes,
         | key derivation, channel binding, and things like that.
         | Cryptographic hash functions are the glue that binds crypto
         | protocols together.
         | 
         | But you can also trivially turn a hash function into a cipher
         | and encrypt with it (apologies if I missed an explanation of
         | this in the article). Just hash a key and a counter together to
         | create a keystream and XOR your plaintext to it. That's how
         | Salsa20 and ChaCha20 work. (Interestingly, the reverse process
         | --- converting a block cipher into a hash function --- is where
         | we historically get our cryptographic hash functions from).
        
           | layer8 wrote:
           | I'm familiar with hash functions for signatures, however I'm
           | confused about the mention for encryption, in particular with
           | respect to the context of common schemes like AES and RSA. I
           | guess hashing a key is an option though.
        
             | commandersaki wrote:
             | Modern hashing functions pretty much offer encryption out
             | of box like gimli or BLAKE2 family (I think they call it
             | XOF mode).
             | 
             | This is pretty much thanks to the sponge construction.
        
               | tptacek wrote:
               | You can spitball a hash-based stream cipher with any hash
               | function in just a couple lines of python. Take a 128 bit
               | key string, and then hash it with an incrementing counter
               | to get successive 32 bytes of keystream data, just like
               | you would with AES in CTR mode.
        
       | adamzochowski wrote:
       | There is a fourth category of hash functions: find similar items.
       | 
       | For English words, this is usually done with soundex / metaphone
       | / doublemetaphone. Fun fact, soundex was developed over 100 years
       | ago to help immigrants find their families in USA even if they
       | didn't know how to spell their last name. Soundex is supported by
       | most DBs.
       | 
       | For images, there are hashes like ahash/phash/dhash/whash/blockme
       | at/colormoment/colorhas/marrHildreth .
       | 
       | Most popular Audio hash is acoustid. There was also EchoPrint
       | (with two incompatible versions), but once EchoNest was bought
       | out by Spotify the public support and development into echoprint
       | died. Some groups of people tried to keep echoprint alive but I
       | am unaware if it is still used by anyone publicly.
        
         | joezydeco wrote:
         | Soundex is still out there. If you live in Illinois, Florida,
         | or Wisconsin your drivers license number is a soundex hash of
         | your name combined with your birthday.
         | 
         | http://www.highprogrammer.com/alan/numbers/dl_us_shared.html
        
         | preseinger wrote:
         | hash functions are ultimately just a mapping of an input set of
         | high/infinite cardinality to an output set of fixed/finite
         | cardinality
         | 
         | you correctly point out that some hash functions exist which
         | map input words in some specific language into similarity
         | groups based on some concept of semantics
         | 
         | but this is essentially unrelated to the topics raised in the
         | linked article
        
         | jll29 wrote:
         | Robert C. Russell (and Margaret King Odell) invented SOUNDEX,
         | which is described in US Patent 1,261,167 (1918) -see also
         | Knuth (1973) TAOCP vol. 3.
         | 
         | https://patentimages.storage.googleapis.com/31/35/a1/f697a3a...
        
         | lynndotpy wrote:
         | If I remember correctly, Apple's Voxel Net (~2018 neural
         | network, used for doing machine learning on sparse lidar point-
         | cloud data) uses a simple "spatial hash" for this purpose,
         | using the coordinate of the voxel as a key.
        
         | esafak wrote:
         | Today you would use embeddings optimized for similarity search.
        
           | nighthawk454 wrote:
           | Embedding models are still essentially a hash function, just
           | very complex ones
        
         | scentoni wrote:
         | Those are examples of useful functions. They are not examples
         | of useful hash functions.
        
           | gabetax wrote:
           | Specifically, these functions to not provide "uniform
           | distribution".
        
             | jagged-chisel wrote:
             | That isn't a requirement of an algorithm called a hash
             | function
        
               | gabetax wrote:
               | From the article:
               | 
               | > For a function to be useful as a hash function, it must
               | exhibit the property of uniform distribution
               | 
               | It's also listed as the first property on the Wikipedia
               | page:
               | 
               | https://en.wikipedia.org/wiki/Hash_function#Uniformity
        
               | jagged-chisel wrote:
               | I can't find the text you quoted. The article starts off:
               | 
               | > A hash function is any function that can be used to map
               | data of arbitrary size to fixed-size values
               | 
               | And the first sentence in the section you link says
               | 
               | > A good hash function should map the expected inputs as
               | evenly as possible over its output range.
               | 
               | It doesn't have to be "good" to be a hash function.
        
               | anamexis wrote:
               | That's not a meaningful definition of a hash function,
               | then.                   def hash(val):             return
               | 0
        
               | jagged-chisel wrote:
               | Yes, this is a hash function. Whether it's "useful" or
               | "good" is the purview of the architect.
               | 
               | Tangential edit: https://xkcd.com/221/
        
               | throwmeaway1212 wrote:
               | That has a pretty uniform distribution..
        
           | derefr wrote:
           | I think you're conflating "functions used to build a hash
           | table" with "functions used to build a dictionary ADT." A
           | hash table doesn't have to serve the purpose of storing items
           | in a dictionary ADT. A hash table where everything that's
           | similar collides into the same buckets is just as useful for
           | finding "a linked list of similar items" in O(1), as a hash
           | table where everything gets its own unique bucket is useful
           | for finding items in O(1).
        
         | michaelcampbell wrote:
         | While researching Soundex in the mid/late 80's, I of course
         | soundex'd my last name and realized that it was the first part
         | of my state's driver's license #.
        
           | [deleted]
        
           | metadat wrote:
           | Intriguing, I just checked mine and it seems California
           | doesn't appear to do this.
           | 
           | If you're comfortable sharing, would like to find out which
           | state you're referring to. Cheers.
        
             | ComputerGuru wrote:
             | Another poster shared this link: http://www.highprogrammer.
             | com/alan/numbers/dl_us_shared.html
             | 
             | It has the encoding formula for Wisconsin, Illinois, and
             | Florida (which match the parent's comment) but there may be
             | others.
        
       ___________________________________________________________________
       (page generated 2023-06-03 23:01 UTC)