[HN Gopher] Show HN: A portable hash map in C
       ___________________________________________________________________
        
       Show HN: A portable hash map in C
        
       Author : e-dant
       Score  : 161 points
       Date   : 2024-12-08 20:05 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | EricRiese wrote:
       | I followed the link in the docs to the page on the hash function.
       | There's a DJB-2 algorithm on that page, but no DJB-1
        
         | e-dant wrote:
         | Huh -- I must have written djb1 in error
        
       | vintagedave wrote:
       | This is neat, and remarkably small. I personally need more
       | comments in order to follow, say, the growth logic. I see it
       | rehashes but I don't see how it doesn't potentially overwrite
       | entries on the way.
       | 
       | Algorithm for hash collisions is just to find the next empty slot
       | (linear probing)? What happens when the original entry is
       | removed, are the ones in subsequent slots cycled backwards? I
       | don't see that...
       | 
       | Also the name is new to me! TIL 'salmagundi' is an English word
       | (not even a loan-word) for a type of salad made of many
       | ingredients: an excellent name for a hash map that can contain
       | many effectively random items.
        
         | e-dant wrote:
         | I like the exercise of trying to find the simplest reasonable
         | solution to some problem.
         | 
         | Many of my toy and hobby projects are exactly that. Some make
         | allowances for the sake of performance and generality.
         | 
         | The hash map, though, is up there with sorting algorithms in
         | the breadth of wild implementations. Salmagundi is unlikely to
         | be the fastest, or the smartest. But it's cute and portable.
        
           | vintagedave wrote:
           | It's neat. I like it!
           | 
           | I edited my comment with some more questions, btw, likely as
           | you were answering - apologies if that ended up confusing.
        
             | e-dant wrote:
             | There's no slot cycling logic. When a k,v pair at some
             | index is deleted, and a different k,v pair has a matching
             | hash, there's a (small?) penalty in the hm_get logic that
             | will eventually the right k,v pair.
             | 
             | I need a bit more clarification on your first question.
             | There should be no mutation of the underlying map when it
             | grows -- we're just allocating more room for entries and
             | adjusting the old map's item "ownership" -- moving
             | pointers->items around.
        
               | e-dant wrote:
               | Nevermind -- there are ways for deleting colliding items
               | to cause other colliding items to be "forgotten", and one
               | solution would be to implement slot cycling on deletion
        
               | skitter wrote:
               | Another solution could be tombstones: When a slot gets
               | cleared and the next slot has an item, insert a
               | placeholder value. During linear probing, treat the
               | placeholder as a normal item (that doesn't match); during
               | insertion & deletion treat the placeholder as an empty
               | slot.
        
         | andrelaszlo wrote:
         | The etymology of 'salmagundi' seems pretty obscure. It might be
         | a loan word?
         | 
         | https://www.nytimes.com/1978/01/16/archives/the-salmagundi-d...
         | 
         | https://en.m.wiktionary.org/wiki/salmagundi
        
       | drfuchs wrote:
       | A bug, I believe: If you "put" three colliding strings A and then
       | B and then C, and then "delete" B, you won't be able to find C
       | anymore.
        
         | e-dant wrote:
         | Good catch! Yes, that looks like a bug :)
        
           | attractivechaos wrote:
           | You are implementing a closed hash table with linear probing.
           | You need tombstones to mark deleted items, or better, move
           | other items to replace deleted items [1]. Currently your
           | library doesn't have either mechanism.
           | 
           | [1] https://en.wikipedia.org/wiki/Linear_probing#Deletion
        
           | munificent wrote:
           | If it helps, my book Crafting Interpreters walks through a
           | linear probing hash table implementation in C including
           | handling deletion:
           | 
           | https://craftinginterpreters.com/hash-
           | tables.html#deleting-e...
        
       | jll29 wrote:
       | Thanks for sharing. Simplicity is a good thing.
       | 
       | In order to speed it up by reducing the number of malloc() calls,
       | it may be worth adding a simple arena memory allocation measure:
       | by allocating one larger block (e.g. 1 MB) initially and then
       | doubling the memory size each time you run out, all
       | malloc()/calloc() calls can become local salmagundi_alloc() calls
       | that are just macro invocations that return an arena buffer
       | pointer and also increment said pointer as a side effect.
       | 
       | I also recommend you have a look at Chris Hanson's book "C:
       | Interfaces and Implementations", which has a few neat C API
       | tricks that your code could benefit from (e.g. for reducing name
       | space pollution, for avoiding null pointer argument errors, API
       | method naming etc.).
        
         | attractivechaos wrote:
         | If you double a memory block with realloc, the memory may be
         | relocated to a different address (at least on MacOS). Then all
         | pointers pointing to the old block will be invalid. This can be
         | addressed by adding new blocks to a list. It will be more
         | complex than a minimal arena allocator.
         | 
         | Furthermore, a typical arena allocator only grows but doesn't
         | shrink. A deleted item from the memory block will still hold
         | the memory and will not be released. You will need a more
         | sophisticated allocator for deletion. For fixed sized items,
         | memory pool may be an option.
        
       | zabzonk wrote:
       | Purely out of interest, and probably my bad, but what is:
       | (void)k_sz;
       | 
       | doing? I've seen fussy people doing:                    (void)
       | printf( "hello" );
       | 
       | but not what you have written. As I say, probably ignorance on my
       | part.
        
         | ryanpetrich wrote:
         | This is to suppress a compiler warning that the k_sz argument
         | is unused.
        
         | lang4d wrote:
         | That would be one way to silence an unused variable warning
        
         | secondcoming wrote:
         | It silences 'unused variable' warnings.
        
         | tester756 wrote:
         | https://stackoverflow.com/questions/7354786/what-does-void-v...
        
         | Sesse__ wrote:
         | It's an idiom for telling the compiler "don't give a warning on
         | this argument not being used, I know about it". Not all
         | compilers will heed it. Other choices include e.g.
         | __attribute__((unused)) (GNU), [[maybe_unused]] (C++17/C23) or
         | just not giving it a name (nonstandard, but very commonly
         | supported).
        
           | tredre3 wrote:
           | > or just not giving it a name (nonstandard, but very
           | commonly supported)
           | 
           | It's only supported by gcc 11+ and clang 11+ (both from
           | ~2021) because it was a suggested C2x extension. It's been
           | accepted in C23 so other compilers will eventually support
           | it, but I wouldn't hold my breath regarding when that
           | happens.
        
             | Sesse__ wrote:
             | Ah, of course, in C mode it's different. (I nearly always
             | use C++ mode, where at least GCC 10 happily takes it.)
        
           | zabzonk wrote:
           | OK, thank you. But why do the functions have those unused
           | parameters?
        
             | Sesse__ wrote:
             | I would assume that is because the hash table takes in a
             | "hash this key" function pointer and a "compare these keys"
             | function pointer, and those must contain the size for
             | variable-length keys. So even if your user-supplied
             | functions know the length, or only care about the first
             | byte, they have to conform to that API, and then you get an
             | unused parameter.
        
               | zabzonk wrote:
               | Have functions with different names and different numbers
               | of parameters?
        
               | Sesse__ wrote:
               | And then have a union between the two different kinds of
               | pointers, and a tag to tell which one is in use, then
               | test that tag everywhere? Why? What good does it bring
               | for that extra complexity?
        
               | zabzonk wrote:
               | Doesn't the name of the function select for the types and
               | use of the parameters? But I haven't written in C for a
               | long time, and this I think could be done more clearly in
               | C++.
        
               | Sesse__ wrote:
               | You're saying you would write the entire hash table four
               | times, with everything duplicated and no reuse? Again:
               | Why? What do you gain compared to one line to suppress a
               | warning?
        
               | zabzonk wrote:
               | Clarity? And in C++ at least, I would re-write the
               | functions.
        
               | saagarjha wrote:
               | That would be a bad API design.
        
               | detaro wrote:
               | The point is that you can pass any of the hash functions
               | (or your own compatible) to the datastructure, so it'll
               | obviously call of them with the same set of parameters.
               | And hash functions that need all of them will ignore
               | them.
        
       | tralarpa wrote:
       | My C is quite rusty, so apologies for stupid questions.
       | 
       | In the hm_put function, when you overwrite, why do you malloc and
       | copy the key again, and what happens with the old key pointer and
       | the old value pointer? (no free for the key and the memset zeros
       | everything?)
        
         | codr7 wrote:
         | Nice catch :)
         | 
         | Looks like the key logic is missing an overwrite case, the
         | value is reallocated on overwrite.
        
       | codr7 wrote:
       | Considering there seems to be at least a one memory bug
       | (hm_put/key from another comment), I would strongly recommend
       | running the tests in valgrind or similar if you haven't already.
       | Doing C without it usually ends in some kind of disaster for me.
        
         | ivanjermakov wrote:
         | What kind of disaster? Wouldn't OS security prevent userspace
         | program do destructive things?
        
       | alanmoraes wrote:
       | In the code example, I guess the variable `k_sz` is undefined in
       | the call `hm_get(map, k, k_sz)`. Hope that helps.
        
       | teo_zero wrote:
       | Row 100 of hm_put() clears all contents of the item struct, thus
       | losing the pointer to where the previous value was stored. But
       | then row 107 needs this pointer to pass it to realloc().
       | 
       | Additionally, in case of overwrite, the memory previously
       | allocated for the key is leaked.
       | 
       | All in all, I don't think row 100 is really needed, and removing
       | it wouldn't do any harm.
       | 
       | Finally, deletes are hard in hash maps. Either you decide that
       | deletes are not allowed, or you implement them with tombstones or
       | re-insertions. A half-backed solution is not an option.
       | 
       | Hope these suggestions help you.
        
         | eqvinox wrote:
         | > Finally, deletes are hard in hash maps.
         | 
         | Deletes are hard in open hash maps like here; they're trivial
         | in chained hash maps.
         | 
         | (I'm only posting this because this seems to be an
         | exercise/learning-ish HN post and I think it's a relevant point
         | to not forget.)
        
           | teo_zero wrote:
           | Right.
           | 
           | Thinking about it, I don't grasp why the author has opted for
           | linear probing here. Given the amount of malloc(), they might
           | have used linked lists as well.
        
         | acmj wrote:
         | I agree. How to deal with deletions is a 101 of hash table
         | implementation. Many CS majors could get it right or at least
         | know about the caveat. It is amusing that many high-quality
         | posts on hash tables are buried in the "new" section of HN
         | while this half-baked toy floats to the front page, with 83
         | github stars at the moment. People don't know what they are
         | upvoting for.
        
       | gritzko wrote:
       | I recently coded a linear-probing hash map in C and I highly
       | recommend you to use fuzz tests. These evolve naturally from unit
       | tests: next step table unit tests, next step property test, next
       | step fuzzing, all steps are incremental, hence easy.
       | 
       | Having a fuzz test was in-va-lu-ab-le. I caught several bugs
       | right there. The delete function was particularly tricky.
       | 
       | I ended up with two fuzz tests as my hash table has a key
       | feature: convergence. Having same contents, it would have exactly
       | same bits in the buffer. In other words, it is independent of the
       | insertion/deletion order. For this, I added another fuzz test. I
       | would add a third one if I realize there is an important
       | invariant I did not fuzz test. That is not much work, but so much
       | useful!
       | 
       | https://github.com/gritzko/librdx/blob/master/abc/fuzz/HASH....
       | 
       | https://github.com/gritzko/librdx/blob/master/abc/fuzz/HASHd...
       | 
       | P.S. In your case, I recommend clang's libfuzzer with ASAN.
       | 
       | https://llvm.org/docs/LibFuzzer.html#fuzzer-usage
        
         | dwattttt wrote:
         | fuzz testing vs HN-popular-post testing, go!
        
         | eqvinox wrote:
         | Here's another example of unit tests for hash tables (and RB-
         | trees and skiplists and...)
         | 
         | https://github.com/FRRouting/frr/blob/master/tests/lib/test_...
         | 
         | (this file is #include'd from another, with some #define set
         | up:)
         | 
         | https://github.com/FRRouting/frr/blob/master/tests/lib/test_...
         | 
         | The "tribe" of C developers with distaste for the preprocessor
         | will absolutely hate how this is written. But it works and
         | caught real issues. It doesn't use an external fuzzing tool but
         | rather a deterministic PRNG (see prng_* calls) to get a good
         | mix of call combinations. Coverage analysis was done to make
         | sure it reaches everywhere.
         | 
         | (disclaimer: I wrote this. And it did catch bugs. Also it
         | really should have a lot of comments.)
        
           | cb321 wrote:
           | If you are in a tribe not hating on pp-macros, you might find
           | this approach for testing/debugging data structures (or even
           | the surrounding pp-based "CTL"-like C templating idea or
           | abstract ways of dealing with the BST zoo) interesting:
           | https://github.com/c-blake/bst/blob/main/test/bst-interact.c
           | 
           | Essentially, you just write a very simple (some might say
           | trivial) interactive command-shell that can include an auto-
           | checking upon edit of (in that case tree) invariants keyed-
           | off an environment variable. The only missing piece is some
           | little tool to generate random inserts/deletes/etc.
           | 
           | If the micro-shell has a pretty printer then it is easy to
           | "replay" any failed series of edits with some "p" commands
           | before & after invariant failures.
        
             | eqvinox wrote:
             | This is incredibly funny to me, because that's (almost)
             | exactly how some of our other tests work :D
             | 
             | (Almost: we have a full-featured command shell and just use
             | that for testing)
             | 
             | Shell bindings: https://github.com/FRRouting/frr/blob/maste
             | r/tests/ospf6d/te...
             | 
             | Input: https://github.com/FRRouting/frr/blob/master/tests/o
             | spf6d/te...
             | 
             | Expected output: https://github.com/FRRouting/frr/blob/mast
             | er/tests/ospf6d/te...
             | 
             | Absolutely agree this is a great option to have for some
             | kinds of tests.
        
               | cb321 wrote:
               | One reason to have a razor thin custom command shell
               | (perhaps obeying similar conventions across data
               | structures..) is that the parsing for such can be so
               | fast/consistent that you can also use it for
               | benchmarking/perf regression testing with a workload
               | generator. You might also find & record "anomalous"
               | workloads that way or write a little "translator" (some
               | might say "compiler") from a log to such a workload or
               | etc.
               | 
               | I have done this for data structures even as fast as
               | integer-keyed hash tables (though in that hyper-fast case
               | you might need to try to measure & subtract off parser-
               | loop/IO dispatch overhead and/or need statistical testing
               | for "actually even different at all", perhaps along the
               | lines of
               | https://github.com/c-blake/bu/blob/main/doc/tim.md).
        
       | TheDudeMan wrote:
       | I suggest improving the probing a little. Something like
       | idx = (idx + 1) % map->cap;
       | 
       | becomes                 iterCount++;       idx = (idx +
       | iterCount) % map->cap;
        
         | meisel wrote:
         | It could be improved even more, performance wise. The
         | potentially expensive modulo could be avoided entirely with an
         | if statement. Or, only use powers of 2 for the capacity, and
         | then you can also use bit wise ops
        
       | david2ndaccount wrote:
       | If you're interested in this, I wrote a blogpost on a simple c
       | hash table without deletion.
       | 
       | https://www.davidpriver.com/c-hash-table.html
        
       | e-dant wrote:
       | I want to thank everyone here who pointed out deficiencies in
       | this little library.
       | 
       | Good catches :)
        
         | rhelz wrote:
         | Not getting all defensive is a superpower!
        
       | anacrolix wrote:
       | Fuck sake, it's char * not char*
        
         | tobyhinloopen wrote:
         | My auto-formatter always changes it like that and I just don't
         | understand why `char *foo` is better than `char* foo`.
         | 
         | I consider `char*` to be the type of `foo`. What's the
         | rationale of having `char *foo`?
        
           | onre wrote:
           | Dunno how this is supposed to be worded, but it's because the
           | "pointerhood" is tied to the variable being declared, not the
           | type itself. This becomes obvious when you declare multiple
           | variables at once.                 char* cat, dog;  /* one
           | char pointer, one char */       char *cat, *dog; /* two char
           | pointers */
        
             | teo_zero wrote:
             | Right. It _would_ be nice if C allowed types and variables
             | to be separated, we could even defines arrays like this:
             | char[8] buffer;
             | 
             | Alas, it's not the syntax Dennis Ritchie settled upon.
        
               | cb321 wrote:
               | The trade-off is one vs. two operator sub-syntaxes or in
               | other words "sub-syntax economy". As it is, C has just
               | one operator expression syntax (and one set of operator
               | precedences/associativities). The "forward applicative"
               | expression syntax is "reused" in the "backward" or
               | "inverse" type declarations.
        
             | kevin_thibedeau wrote:
             | The minor problem is that a typedef pointer breaks this
             | pattern:                 typedef foo * FooP;       FooP a,
             | b; // Both are pointers
             | 
             | Pointer typedefs are misguided but there is no denying that
             | C is inconsistent on whether the '*' is part of the type or
             | not.
        
           | eqvinox wrote:
           | It's purely cultural. "char* foo" or "char * foo" does make
           | more sense than "char *foo". But culture is a very strong and
           | useful thing; if it "looks weird", that will create issues in
           | people's brain when thinking about the code.
           | 
           | Also, case in point: how do you format "char * const foo"?
        
             | cb321 wrote:
             | Redundant cues are also strong. So, for example, C is
             | largely whitespace-insensitive and operator precedence can
             | matter a lot. Consider "x+y*z". If you always spaced this
             | "x+y * z" it would be the same to the parser but less clear
             | to a human than spacing it "x + y*z" where you have used
             | whitespace as a redundant cue to operator precedence.
             | 
             | Similarly, what makes C type declarators tricky for many is
             | their "inverse nature" - "char *foo" means, applying the
             | operator "*" to the identifier "foo" yields a base type
             | "char". So, the "char* foo" form is literally creating a
             | cue in the opposite direction of operator precedence just
             | like "x+y * z" would. For just one indirection it's no big
             | deal, but for things like "char *foo[]" and beyond it
             | becomes more important. So, the "mis"-spacing "char* foo"
             | sets people up for later conceptual failure.
             | 
             | This is in addition to the above point by @onre about
             | multiple identifiers and @anacrolix's initial rather, ahem,
             | pithy take. There is also another "where would ()s be
             | redundant/unneeded" rationale (though this largely
             | recapitulates the 2nd paragraph here). If you are looking
             | for a catchphrase then maybe it could be "Don't space-
             | against-syntax".
        
               | uecker wrote:
               | I do not think this analysis is correct. "char *foo" is
               | not a cue in the wrong direction, it indicates that in
               | "char *foo, bar" only foo is a pointer. Whether the
               | syntax for declarations in C is a good or idea or not is
               | a different question, but it is not misleading in the
               | same way "x+y * z" is.
        
               | cb321 wrote:
               | You seem to have misinterpreted my comment since we agree
               | that the "no-space-before 'foo' way" is truer to the
               | syntax, and to repeat myself, I had said "char* foo" is
               | the bad cue and even mentioned @onre's multiple
               | identifiers supporting point (which you reiterate). It
               | can be hard to see whitespace on HN comments, though
               | unless you do the 4space indents to get a monospaced
               | font. You can in-page search, though.
               | 
               | The only difference with the spacing against syntax in
               | "x+y * z" is that there are fewer type-relevant operators
               | (e.g. '+' does not work in a type expression). One _does
               | NOT_ hear people advocating for _more_ space around  '*'
               | than '+' (though I have sometimes heard "equal space").
               | Your non-disagreement has not persuaded me that this is a
               | bad way to explain the problem, although I agree I could
               | cut down on the double negatives. :-) { Even grade school
               | kids have some sense of precedence of '+' & '*' which is
               | so broadly adopted that even hand calculators do it. }
               | 
               | "How misleading" spacing against syntax can be is
               | admittedly more subjective. I think whitespace can really
               | matter to human readers. There are many other examples.
               | gcc added -Wmisleading-indentation a while back. Off-side
               | rule languages like Python embrace things that sometimes
               | prevent that problem. The Nim compiler was once sensitive
               | to exactly this kind of operator spacing where it would
               | parse "x+y * z" { space around '*' } as "(x+y)*z" in an
               | attempt to have the compiler "Do what I mean".
        
               | uecker wrote:
               | Ah sorry, I then I have indeed misread your comment.
        
       | cozzyd wrote:
       | I'm curious about the #ifdef guard, which is of a form I've never
       | encountered.                 #ifndef
       | BD9DF82A4540BB19368E48E4747C0706       #define
       | BD9DF82A4540BB19368E48E4747C0706
       | 
       | Does some IDE do that for you? Did you just pick a random
       | sequence? Why not just a variant of #ifndef _salmagundi_h that is
       | more common?
       | 
       | Other than that, I strongly echo other's advice about avoiding so
       | many mallocs.
        
         | tobyhinloopen wrote:
         | What's wrong with `#pragma once`? (honest question, i only do C
         | and C++ as a hobby)
        
           | Gibbon1 wrote:
           | Nothing is wrong with that.
        
             | eqvinox wrote:
             | One thing is wrong with it: a large part of C developers
             | either doesn't know it exists, or doesn't know what it
             | does.
             | 
             | ... which I'd fix by using it more :D
        
               | kevin_thibedeau wrote:
               | It doesn't exist in C. It's solely a compiler extension.
               | Include guards get the same performance benefit as
               | #pragma once on compilers that support it. There's no
               | point in embracing laziness with no upside.
        
               | eqvinox wrote:
               | There is an upside: complex header files often contain a
               | bunch of #ifdef ... #endif blocks, especially when
               | dealing with cross-platform support. Since indentation is
               | generally not used there, they can be rather hard to
               | read, especially when nested.
               | 
               | Not having the include guard use the same mechanism makes
               | these a bit easier to read and removes a little bit of
               | cognitive load. (And every bit of cognitive load counts
               | since that's a very hard limit of people's brains.)
               | 
               | Personally (and for the projects I work on), I couldn't
               | care less whether it exists in ISO C or not. I have a
               | list of compilers and platforms I want/need to support.
               | But of course your project may be different.
        
               | kevin_thibedeau wrote:
               | > Since indentation is generally not used there,
               | 
               | Then indent your macros for readability. clang-format can
               | do it for you if you don't want to maintain them.
        
               | eqvinox wrote:
               | Thanks but I'm not asking for a better way to handle
               | #ifdef's, indenting them is extra work1 and there's still
               | an unnecessary outermost block for the include guard.
               | You've also not made any further argument against the use
               | of "#pragma once".
               | 
               | 1 especially across an open source project with more than
               | a hundred contributors from a multitude of companies.
        
           | emsy wrote:
           | You can't target ancient compilers.
        
           | uecker wrote:
           | It is not supported everywhere and it uses some heuristic to
           | determine that the file is actually the same one as used
           | before, and this heuristic sometimes fails in complicated
           | scenarios, e.g. with multiple shared directories, link, or
           | because a file was copied.
        
           | iainmerrick wrote:
           | Pedants don't like it.
        
         | heavensteeth wrote:
         | It's the length of an MD5 hash, which I'd wager isn't a
         | coincidence. It's not "salmagundi", "Salmagundi", or
         | "salmagundi.h" though.
        
         | quotemstr wrote:
         | Amazing how hard some people try to avoid the obviously correct
         | and portable "#pragma once".
        
           | uecker wrote:
           | It is neither portable nor obviously correct.
        
             | quotemstr wrote:
             | Yes, it's both. It works fine in real world million line
             | codebases. I have never, not once, seen a bug or
             | portability issue caused by it. I have seen multiple header
             | guard macro names collide.
             | 
             | Eschewing pragma once these days is pure ignorance and
             | superstition --- then again, so is writing new code in C at
             | all.
        
               | iainmerrick wrote:
               | Yep, header guard name collisions, and also typos in
               | header guards so they don't work as intended.
        
               | uecker wrote:
               | This good for you, other people reported problems with
               | pragma once.
               | 
               | And I think C is one of the better languages you could
               | use to write code today. Also from the programs I use
               | daily, the more stable and reliable ones tend to be
               | written in C.
        
           | cozzyd wrote:
           | for me it's mostly muscle memory that takes a few seconds at
           | the beginning of a file.
           | 
           | i#ifndef __foo_h
           | <escape>yyp:s/ifndef/define/<enter>o<enter>#endif<escape>
        
       | eqvinox wrote:
       | Here's some food-for-thought questions:
       | 
       | First, a correctness question:
       | 
       | - if a struct contains padding, the value of these padding bytes
       | is undefined. What happens if you use such a struct as a key?
       | 
       | And then some integration questions:
       | 
       | - how do you make this typesafe against mixing up 2 distinct uses
       | of hashmaps in the application?
       | 
       | - how do you make this typesafe for its keys and values, i.e.
       | remove accident-prone casts on read access?
       | 
       | - how do you not call the compare and hash functions through
       | function pointers? (Calling through function pointers is both
       | performance relevant as well as a target for exploitation by
       | overwriting the pointer)
       | 
       | (None of these have "nice" answers. But thinking about them is
       | IMHO extremely valuable since these are the most challenging in
       | organizing and engineering an actual codebase itself.)
        
         | hoten wrote:
         | > - if a struct contains padding, the value of these padding
         | bytes is undefined. What happens if you use such a struct as a
         | key?
         | 
         | It's undefined for uninitialized objects. If I understand
         | correctly, when an initializer is provided the padding bits are
         | set to 0. Additionally, setting the whole chunk of memory to 0
         | w/ memset would work.
         | 
         | https://stackoverflow.com/a/37642061/24042444
         | 
         | > - how do you make this typesafe against mixing up 2 distinct
         | uses of hashmaps in the application?
         | 
         | Lacking templates, the user of the library could create slim
         | wrapper methods for each type (and coerce to void* to call the
         | underlying hashmap lib); You could still accidentally call the
         | wrong method but it would help a bit. I suppose you could wrap
         | the real hashmap in a new type to help further (so each set of
         | wrapper methods for a given type would be forced to use the
         | same wrapper struct).
         | 
         | Alternatively, the library could provide helper macros to
         | accomplish the same.
        
           | eqvinox wrote:
           | > If I understand correctly, when an initializer is provided
           | the padding bits are set to 0.
           | 
           | https://interrupt.memfault.com/blog/c-struct-padding-
           | initial... looks at this (albeit a bit outdated; clang 13 and
           | GCC 11) and claims to see undefined values even when
           | initializers are given, at higher optimization levels
           | 
           | > Additionally, setting the whole chunk of memory to 0 w/
           | memset would work.
           | 
           | This seems to be the only thing that can be relied on to work
           | (... sadly)
           | 
           | An alternate answer is: don't treat structs as chunks of
           | bytes when the chunk crosses padding (i.e. create functions
           | to hash specific structs)
           | 
           | Another alternate answer: remove the padding and add
           | "_pad12[34]" members instead. But then you need to figure out
           | where you actually have padding, which is architecture
           | dependent...
        
       | ms9598 wrote:
       | In reviewing hm_put, a few things stand out in the area of
       | handling cases where memory allocation fails. The first problems
       | are at lines 113-116 where memory is allocated and written to,
       | then the allocations are validated in line 117 after they are
       | written to with memcpy. An allocation failure would cause a
       | crash. Secondly, more about semantics, is at line 105 where
       | item->v is free'd after it is found to be NULL, to me it is
       | counter intuitive and unnecessary. Reading further (and back to
       | the first area of concern) at lines 118-119, after an allocation
       | failure, both are free'd which, to me, is acceptable and
       | convenient. In light of that, it seems that line 105 is there for
       | the sake of the reviewer who may not like the apparent asymmetry
       | of the two sites.
        
       | quotemstr wrote:
       | This web site is amazing. We have, on one hand, people explaining
       | machine sentience and quantum breakthroughs, and on the other
       | hand, we have people refusing to use "#pragma once" in C because
       | it's still too novel and doesn't work in every compiler yet.
       | 
       | Never change, HN.
        
       ___________________________________________________________________
       (page generated 2024-12-09 23:01 UTC)