[HN Gopher] Show HN: A portable hash map in C
___________________________________________________________________
Show HN: A portable hash map in C
Author : e-dant
Score : 52 points
Date : 2024-12-08 20:05 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| EricRiese wrote:
| I followed the link in the docs to the page on the hash function.
| There's a DJB-2 algorithm on that page, but no DJB-1
| e-dant wrote:
| Huh -- I must have written djb1 in error
| vintagedave wrote:
| This is neat, and remarkably small. I personally need more
| comments in order to follow, say, the growth logic. I see it
| rehashes but I don't see how it doesn't potentially overwrite
| entries on the way.
|
| Algorithm for hash collisions is just to find the next empty slot
| (linear probing)? What happens when the original entry is
| removed, are the ones in subsequent slots cycled backwards? I
| don't see that...
|
| Also the name is new to me! TIL 'salmagundi' is an English word
| (not even a loan-word) for a type of salad made of many
| ingredients: an excellent name for a hash map that can contain
| many effectively random items.
| e-dant wrote:
| I like the exercise of trying to find the simplest reasonable
| solution to some problem.
|
| Many of my toy and hobby projects are exactly that. Some make
| allowances for the sake of performance and generality.
|
| The hash map, though, is up there with sorting algorithms in
| the breadth of wild implementations. Salmagundi is unlikely to
| be the fastest, or the smartest. But it's cute and portable.
| vintagedave wrote:
| It's neat. I like it!
|
| I edited my comment with some more questions, btw, likely as
| you were answering - apologies if that ended up confusing.
| e-dant wrote:
| There's no slot cycling logic. When a k,v pair at some
| index is deleted, and a different k,v pair has a matching
| hash, there's a (small?) penalty in the hm_get logic that
| will eventually the right k,v pair.
|
| I need a bit more clarification on your first question.
| There should be no mutation of the underlying map when it
| grows -- we're just allocating more room for entries and
| adjusting the old map's item "ownership" -- moving
| pointers->items around.
| e-dant wrote:
| Nevermind -- there are ways for deleting colliding items
| to cause other colliding items to be "forgotten", and one
| solution would be to implement slot cycling on deletion
| drfuchs wrote:
| A bug, I believe: If you "put" three colliding strings A and then
| B and then C, and then "delete" B, you won't be able to find C
| anymore.
| e-dant wrote:
| Good catch! Yes, that looks like a bug :)
| jll29 wrote:
| Thanks for sharing. Simplicity is a good thing.
|
| In order to speed it up by reducing the number of malloc() calls,
| it may be worth adding a simple arena memory allocation measure:
| by allocating one larger block (e.g. 1 MB) initially and then
| doubling the memory size each time you run out, all
| malloc()/calloc() calls can become local salmagundi_alloc() calls
| that are just macro invocations that return an arena buffer
| pointer and also increment said pointer as a side effect.
|
| I also recommend you have a look at Chris Hanson's book "C:
| Interfaces and Implementations", which has a few neat C API
| tricks that your code could benefit from (e.g. for reducing name
| space pollution, for avoiding null pointer argument errors, API
| method naming etc.).
| zabzonk wrote:
| Purely out of interest, and probably my bad, but what is:
| (void)k_sz;
|
| doing? I've seen fussy people doing: (void)
| printf( "hello" );
|
| but not what you have written. As I say, probably ignorance on my
| part.
| ryanpetrich wrote:
| This is to suppress a compiler warning that the k_sz argument
| is unused.
| lang4d wrote:
| That would be one way to silence an unused variable warning
| secondcoming wrote:
| It silences 'unused variable' warnings.
| tester756 wrote:
| https://stackoverflow.com/questions/7354786/what-does-void-v...
| Sesse__ wrote:
| It's an idiom for telling the compiler "don't give a warning on
| this argument not being used, I know about it". Not all
| compilers will heed it. Other choices include e.g.
| __attribute__((unused)) (GNU), [[maybe_unused]] (C++17/C23) or
| just not giving it a name (nonstandard, but very commonly
| supported).
| tredre3 wrote:
| > or just not giving it a name (nonstandard, but very
| commonly supported)
|
| It's only supported by gcc 11+ and clang 11+ (both from
| ~2021) because it was a suggested C2x extension. It's been
| accepted in C23 so other compilers will eventually support
| it, but I wouldn't hold my breath regarding when that
| happens.
| Sesse__ wrote:
| Ah, of course, in C mode it's different. (I nearly always
| use C++ mode, where at least GCC 10 happily takes it.)
| zabzonk wrote:
| OK, thank you. But why do the functions have those unused
| parameters?
| Sesse__ wrote:
| I would assume that is because the hash table takes in a
| "hash this key" function pointer and a "compare these keys"
| function pointer, and those must contain the size for
| variable-length keys. So even if your user-supplied
| functions know the length, or only care about the first
| byte, they have to conform to that API, and then you get an
| unused parameter.
| zabzonk wrote:
| Have functions with different names and different numbers
| of parameters?
| Sesse__ wrote:
| And then have a union between the two different kinds of
| pointers, and a tag to tell which one is in use, then
| test that tag everywhere? Why? What good does it bring
| for that extra complexity?
| zabzonk wrote:
| Doesn't the name of the function select for the types and
| use of the parameters? But I haven't written in C for a
| long time, and this I think could be done more clearly in
| C++.
| saagarjha wrote:
| That would be a bad API design.
| tralarpa wrote:
| My C is quite rusty, so apologies for stupid questions.
|
| In the hm_put function, when you overwrite, why do you malloc and
| copy the key again, and what happens with the old key pointer and
| the old value pointer? (no free for the key and the memset zeros
| everything?)
___________________________________________________________________
(page generated 2024-12-08 23:00 UTC)