[HN Gopher] Show HN: A portable hash map in C
       ___________________________________________________________________
        
       Show HN: A portable hash map in C
        
       Author : e-dant
       Score  : 52 points
       Date   : 2024-12-08 20:05 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | EricRiese wrote:
       | I followed the link in the docs to the page on the hash function.
       | There's a DJB-2 algorithm on that page, but no DJB-1
        
         | e-dant wrote:
         | Huh -- I must have written djb1 in error
        
       | vintagedave wrote:
       | This is neat, and remarkably small. I personally need more
       | comments in order to follow, say, the growth logic. I see it
       | rehashes but I don't see how it doesn't potentially overwrite
       | entries on the way.
       | 
       | Algorithm for hash collisions is just to find the next empty slot
       | (linear probing)? What happens when the original entry is
       | removed, are the ones in subsequent slots cycled backwards? I
       | don't see that...
       | 
       | Also the name is new to me! TIL 'salmagundi' is an English word
       | (not even a loan-word) for a type of salad made of many
       | ingredients: an excellent name for a hash map that can contain
       | many effectively random items.
        
         | e-dant wrote:
         | I like the exercise of trying to find the simplest reasonable
         | solution to some problem.
         | 
         | Many of my toy and hobby projects are exactly that. Some make
         | allowances for the sake of performance and generality.
         | 
         | The hash map, though, is up there with sorting algorithms in
         | the breadth of wild implementations. Salmagundi is unlikely to
         | be the fastest, or the smartest. But it's cute and portable.
        
           | vintagedave wrote:
           | It's neat. I like it!
           | 
           | I edited my comment with some more questions, btw, likely as
           | you were answering - apologies if that ended up confusing.
        
             | e-dant wrote:
             | There's no slot cycling logic. When a k,v pair at some
             | index is deleted, and a different k,v pair has a matching
             | hash, there's a (small?) penalty in the hm_get logic that
             | will eventually the right k,v pair.
             | 
             | I need a bit more clarification on your first question.
             | There should be no mutation of the underlying map when it
             | grows -- we're just allocating more room for entries and
             | adjusting the old map's item "ownership" -- moving
             | pointers->items around.
        
               | e-dant wrote:
               | Nevermind -- there are ways for deleting colliding items
               | to cause other colliding items to be "forgotten", and one
               | solution would be to implement slot cycling on deletion
        
       | drfuchs wrote:
       | A bug, I believe: If you "put" three colliding strings A and then
       | B and then C, and then "delete" B, you won't be able to find C
       | anymore.
        
         | e-dant wrote:
         | Good catch! Yes, that looks like a bug :)
        
       | jll29 wrote:
       | Thanks for sharing. Simplicity is a good thing.
       | 
       | In order to speed it up by reducing the number of malloc() calls,
       | it may be worth adding a simple arena memory allocation measure:
       | by allocating one larger block (e.g. 1 MB) initially and then
       | doubling the memory size each time you run out, all
       | malloc()/calloc() calls can become local salmagundi_alloc() calls
       | that are just macro invocations that return an arena buffer
       | pointer and also increment said pointer as a side effect.
       | 
       | I also recommend you have a look at Chris Hanson's book "C:
       | Interfaces and Implementations", which has a few neat C API
       | tricks that your code could benefit from (e.g. for reducing name
       | space pollution, for avoiding null pointer argument errors, API
       | method naming etc.).
        
       | zabzonk wrote:
       | Purely out of interest, and probably my bad, but what is:
       | (void)k_sz;
       | 
       | doing? I've seen fussy people doing:                    (void)
       | printf( "hello" );
       | 
       | but not what you have written. As I say, probably ignorance on my
       | part.
        
         | ryanpetrich wrote:
         | This is to suppress a compiler warning that the k_sz argument
         | is unused.
        
         | lang4d wrote:
         | That would be one way to silence an unused variable warning
        
         | secondcoming wrote:
         | It silences 'unused variable' warnings.
        
         | tester756 wrote:
         | https://stackoverflow.com/questions/7354786/what-does-void-v...
        
         | Sesse__ wrote:
         | It's an idiom for telling the compiler "don't give a warning on
         | this argument not being used, I know about it". Not all
         | compilers will heed it. Other choices include e.g.
         | __attribute__((unused)) (GNU), [[maybe_unused]] (C++17/C23) or
         | just not giving it a name (nonstandard, but very commonly
         | supported).
        
           | tredre3 wrote:
           | > or just not giving it a name (nonstandard, but very
           | commonly supported)
           | 
           | It's only supported by gcc 11+ and clang 11+ (both from
           | ~2021) because it was a suggested C2x extension. It's been
           | accepted in C23 so other compilers will eventually support
           | it, but I wouldn't hold my breath regarding when that
           | happens.
        
             | Sesse__ wrote:
             | Ah, of course, in C mode it's different. (I nearly always
             | use C++ mode, where at least GCC 10 happily takes it.)
        
           | zabzonk wrote:
           | OK, thank you. But why do the functions have those unused
           | parameters?
        
             | Sesse__ wrote:
             | I would assume that is because the hash table takes in a
             | "hash this key" function pointer and a "compare these keys"
             | function pointer, and those must contain the size for
             | variable-length keys. So even if your user-supplied
             | functions know the length, or only care about the first
             | byte, they have to conform to that API, and then you get an
             | unused parameter.
        
               | zabzonk wrote:
               | Have functions with different names and different numbers
               | of parameters?
        
               | Sesse__ wrote:
               | And then have a union between the two different kinds of
               | pointers, and a tag to tell which one is in use, then
               | test that tag everywhere? Why? What good does it bring
               | for that extra complexity?
        
               | zabzonk wrote:
               | Doesn't the name of the function select for the types and
               | use of the parameters? But I haven't written in C for a
               | long time, and this I think could be done more clearly in
               | C++.
        
               | saagarjha wrote:
               | That would be a bad API design.
        
       | tralarpa wrote:
       | My C is quite rusty, so apologies for stupid questions.
       | 
       | In the hm_put function, when you overwrite, why do you malloc and
       | copy the key again, and what happens with the old key pointer and
       | the old value pointer? (no free for the key and the memset zeros
       | everything?)
        
       ___________________________________________________________________
       (page generated 2024-12-08 23:00 UTC)