[HN Gopher] A Tale of Java Hash Tables
       ___________________________________________________________________
        
       A Tale of Java Hash Tables
        
       Author : nomemory
       Score  : 88 points
       Date   : 2021-11-23 15:46 UTC (7 hours ago)
        
 (HTM) web link (www.andreinc.net)
 (TXT) w3m dump (www.andreinc.net)
        
       | kasperni wrote:
       | As part of project Valhalla they did some performance tests with
       | open addressing as well [1]. Result: java.util.HashMap was the
       | best performing implementation most of the time.
       | 
       | [1]
       | https://cr.openjdk.java.net/~skuksenko/valhalla/hashmaps/has...
        
       | bjourne wrote:
       | So the reason HashMap isn't as fast as it could be is because it
       | converts buckets with linked lists to red-black trees when it
       | reaches some threshold. The reason it does that is because in any
       | hashing scheme, specifically crafted input can cause O(n) lookup
       | times. It is reasonably trivial to take advantage of this to
       | launch DOS attacks. Your web server parses HTTP headers with a
       | hash table? Might very well be exploitable. Same thing with
       | parsing JSON objects using hash tables. I posit that in practice
       | there are very few scenarios in which the extra performance of an
       | "optimized" hash table outweighs the benefits of having one that
       | is not vulnerable to hash-flooding attacks.
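To make the attack concrete, here is a small illustrative Java sketch (not from the article): in Java, "Aa" and "BB" have the same hashCode (2112), so an attacker can trivially mass-produce 2^n distinct keys that all collide, degrading a naive chained table to O(n) per lookup (Java 8+ HashMap caps the damage at O(log n) by treeifying large buckets).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: "Aa" and "BB" share a hashCode, so concatenating n
// such blocks yields 2^n distinct strings that all hash identically.
public class HashFlood {
    static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int b = 0; b < blocks; b++) {
            List<String> next = new ArrayList<>();
            for (String k : keys) {
                next.add(k + "Aa"); // both suffixes hash to 2112...
                next.add(k + "BB"); // ...so every combination collides
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(10); // 1024 distinct keys
        long distinctHashes = keys.stream().map(String::hashCode).distinct().count();
        System.out.println(keys.size() + " keys, " + distinctHashes + " distinct hash(es)");
    }
}
```

All 1024 keys land in the same bucket of any table that derives buckets from hashCode().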
        
         | vlovich123 wrote:
         | You can always use a cryptographic hash of the key instead of
         | an untrusted key.
         | 
         | Linked list buckets are terrible for cache locality anyway.
        
           | bjourne wrote:
           | Can you show me an implementation of a high-performance hash
           | table with a cryptographic key that is not vulnerable to
           | hash-flooding?
        
             | goldenkey wrote:
             | All you have to do is have the hashmap be
             | HashMap<K,V,CryptographicKey> and never reveal the key, or
             | seed it from randomness? One can use an HMAC or some other
             | method to use the key to derive hashes.
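A minimal sketch of the keyed-hash idea, using only the JDK's built-in HmacSHA256 (real hash tables use a much cheaper keyed hash such as SipHash; the class name and bucket-folding scheme here are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: derive the table hash from an HMAC keyed with process-local
// randomness the attacker never sees, so collisions can't be crafted.
public class KeyedHash {
    private Mac mac;

    KeyedHash() {
        try {
            byte[] seed = new byte[32];
            new SecureRandom().nextBytes(seed); // secret, per-instance
            mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(seed, "HmacSHA256"));
        } catch (GeneralSecurityException e) {
            throw new AssertionError(e);
        }
    }

    // Hash an untrusted key; Mac is not thread-safe, hence synchronized.
    synchronized int hash(String key) {
        byte[] d = mac.doFinal(key.getBytes(StandardCharsets.UTF_8));
        // fold the first 4 digest bytes into an int bucket hash
        return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
             | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
    }

    public static void main(String[] args) {
        KeyedHash kh = new KeyedHash();
        // deterministic within one seeded instance...
        System.out.println(kh.hash("id.10000") == kh.hash("id.10000"));
        // ...but a differently-seeded instance maps keys differently
        System.out.println(kh.hash("id.10000") == new KeyedHash().hash("id.10000"));
    }
}
```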
        
               | bjourne wrote:
                | Is there a HashMap overload with three type parameters?
                | Yes, you can use a cryptographic hash function like
                | SipHash does, but it wouldn't perform as well as HashMap.
        
               | vlovich123 wrote:
               | In Java? No. Rust is safe out of the box while retaining
               | O(1) properties at all times
        
             | vlmutolo wrote:
             | Rust's standard library HashMap is a good example.
        
       | mgaunard wrote:
       | Java, the language where linear probing is considered high-tech.
       | 
       | What's gonna happen once they discover robinhood hashing?
        
         | nomemory wrote:
          | The point of the article was to prove that it's hard to do
          | Open Addressing in Java and beat the performance of HashMap,
          | which uses Separate Chaining.
          | 
          | The article also contains a Robin Hood implementation that
          | overall performs worse than HashMap. If you can give me any
          | recommendation on how to improve it, I would be forever in
          | your debt.
        
       | zz865 wrote:
       | I like the link to Spotify profile, seems a much better way to
       | get to know someone.
        
       | slaymaker1907 wrote:
       | I've read that HashMap will use a tree if a single bucket gets
       | too large so long as the keys are comparable.
        
       | SloopJon wrote:
       | The post quotes a comment from the Python source code for the
       | dict object:                 This is done by initializing a
       | (unsigned) vrbl "perturb"
       | 
       | Is "vrbl" a term I should know, or just a prvrs abbrvtn for
       | variable?
        
         | raymondh wrote:
         | It's an acronym for "virtual retractable binary link", a
         | concept well known to perturbation specialists.
        
       | mnw21cam wrote:
       | > in practice, nobody will use a hash function that returns a
       | constant integer
       | 
       | How much do you want to bet on that?
        
         | nomemory wrote:
         | I consider myself an optimist.
        
         | jkaptur wrote:
         | I did this many years ago. It wasn't production code and I was
         | prioritizing expediency. The class had rather complicated
         | equals() semantics, I wanted "set" and "map" data structures,
         | and I knew there would never be very many objects.
        
         | michaelcampbell wrote:
         | Not quite that bad, but the hash algo for ... function lookup?
         | (the exact usage escapes me) in PHP was "strlen". Yup, the
         | length of the function name was the hash value.
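A toy Java sketch of what hashing by strlen does to bucket distribution (the sample names are illustrative identifiers, not PHP's actual function table):

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch: with hash(name) = strlen(name), identifier lengths cluster
// in a narrow band, so almost everything collides into a few buckets.
public class StrlenHash {
    static Map<Integer, Integer> bucketSizes(String[] names) {
        Map<Integer, Integer> sizes = new HashMap<>();
        for (String name : names) {
            sizes.merge(name.length(), 1, Integer::sum); // bucket = length
        }
        return sizes;
    }

    public static void main(String[] args) {
        String[] names = { "strlen", "strpos", "substr", "strtok", "rtrim",
                           "ltrim", "implode", "explode", "sprintf", "printf" };
        // 10 names squeeze into only three length "buckets": {5=2, 6=5, 7=3}
        System.out.println(bucketSizes(names));
    }
}
```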
        
           | nomemory wrote:
           | Maybe it was even more terrible, because it tried to be
           | "smart".
        
           | bruce343434 wrote:
           | yes, the early php stdlib functions sometimes had mangled
           | names by replacing a "to" with 2, and maybe shortening some
           | names but not others due to this reason. At least, that's
           | what the creator of php says here: https://news-
           | web.php.net/php.internals/70691
           | 
           | Also fun read: https://eev.ee/blog/2012/04/09/php-a-fractal-
           | of-bad-design/
        
       | habibur wrote:
       | Check also PHP-7 or 8's "open addressing" hash map. Beats the
       | hell out of everyone else. Like C++ red-black tree default
       | hashmap implementation taking 1.0 second -- vs PHP taking 0.5
       | second for the same dataset.
       | 
       | Here's the code I tested with.
       | 
        | php:
        | 
        |     #!/usr/bin/env php
        |     <?
        |     $ret=[];
        |     for($i=0; $i<5000000; $i++){
        |         $ret["id.".($i%500000)]="val.".$i;
        |     }
        |     print(count($ret))."\n";
        |     print $ret["id.10000"]."\n";
        | 
        | cpp:
        | 
        |     #include <iostream>
        |     #include <map>
        |     using namespace std;
        | 
        |     int main(){
        |         map<string, string> ret;
        |         for(int i=0; i<5000000; i++){
        |             ret[string("id.")+to_string(i%500000)] = string("val.")+to_string(i);
        |         }
        |         cout << ret.size() << endl;
        |         cout << ret["id.10000"] << endl;
        |         return 0;
        |     }
        | 
        | Rust:
        | 
        |     use std::collections::HashMap;
        | 
        |     fn main(){
        |         let mut ret = HashMap::new();
        |         for i in 0..5000000 {
        |             ret.insert(format!("id.{}", i%500000), format!("val.{}", i));
        |         }
        |         println!("{}", ret.len());
        |         match ret.get("id.10000") {
        |             Some(val) => println!("{}", val),
        |             None => println!("ERROR")
        |         }
        |     }
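For reference, a hypothetical Java HashMap version of the same microbenchmark (same key/value scheme as the snippets above; timings vary by machine and are not from this thread):

```java
import java.util.HashMap;
import java.util.Map;

// The same microbenchmark, using the HashMap the article is about:
// 5M inserts over 500k distinct keys (each key overwritten 10 times),
// then one lookup.
public class Bench {
    static Map<String, String> run(int n, int distinct) {
        Map<String, String> ret = new HashMap<>();
        for (int i = 0; i < n; i++) {
            ret.put("id." + (i % distinct), "val." + i);
        }
        return ret;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        Map<String, String> ret = run(5_000_000, 500_000);
        long t1 = System.nanoTime();
        System.out.println(ret.size());          // 500000
        System.out.println(ret.get("id.10000")); // last write wins: val.4510000
        System.out.printf("%.2fs%n", (t1 - t0) / 1e9);
    }
}
```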
        
         | masklinn wrote:
         | > Rust :
         | 
         | You're using the default hash function, which is SipHash 1-3,
         | which is known to be quite slow. Not horrendously so but it was
         | picked for attack resistance.
         | 
          | If you need high hashmap performance and control the input
          | (or have a rather safe type), it's pretty common knowledge
          | that you should replace the hash function with something
          | like fnv or fx.
         | 
         | You're also allocating a ton in a hot loop, also a known
         | mispattern. I'd expect C++ to also be greatly affected there.
        
           | fl0ki wrote:
           | My experience with fx has been very mixed and I don't think
           | it's a safe default recommendation. It can often outperform
           | fnv and wyhash because it does so little, but it can also
           | greatly underperform once it's used in a hash table because
           | it can cause a lot of conflicts for some types of keys,
           | especially after modulo.
           | 
           | This can be much slower than just having a slightly slower
           | hash function, because the equality function will be called
           | against many more candidates, potentially with each requiring
           | one or more uncached fetches from main memory.
           | 
           | I now recommend wyhash more often than not, and sometimes fnv
           | can beat that. wyhash has the substantial advantage that it
           | can also be made attack-resistant with a seed given to the
           | builder, with much less overhead than sip.
           | 
           | My most recent problematic use cases for fx were all to do
           | with hashing just one or two scalar integers per key. Its few
           | per-word instructions weren't nearly enough to get a good
           | distribution when there were only 1-2 words (on a usize=u64
           | target). Even hacking in fmix64 from Murmur3 helped a ton,
           | but made it no faster than wyhash in the end, and harder to
           | reason about. I ended up using wyhash for some keys and fnv
           | for others.
           | 
           | In any case, nobody should switch hash functions without
           | evaluating performance on full-sized realistic production
           | data. You can easily end up with bad overall hash table
           | performance because of poor distribution with your real keys.
           | 
           | Microbenchmarks tend to miss this, especially if they do not
           | create real-sized and real-shaped key sets. Then all you see
           | is the hash function performance, which is misleading if it
           | results in poor overall table performance, which is what
           | happened to me with fx in particular. I think a lot of people
           | choose fx based on such microbenchmarks and assume its
           | distribution is good enough to scale out, but this should not
           | be assumed, only tested.
           | 
           | You might also see surprising effects from cache locality
           | when testing tables big enough that they don't fit in cache,
           | and if this matters for your workload then you definitely
           | want to account for that in your choice of hash table data
           | structure.
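A small Java sketch of the failure mode described above (the multiplier is fxhash's real 64-bit constant and fmix64 is Murmur3's real finalizer; everything else is illustrative): a multiply-only hash preserves trailing zero bits, so keys that are all multiples of 4096 mask to bucket 0 of a power-of-two table, while a proper finalizer spreads them.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongUnaryOperator;

// Sketch of the distribution problem: an fx-style hash of a single word
// is essentially one multiplication, which preserves trailing zero bits.
public class Distribution {
    static final long FX_CONSTANT = 0x517cc1b727220a95L; // fxhash constant

    public static long fxStyle(long k) { return k * FX_CONSTANT; }

    public static long fmix64(long h) {           // Murmur3 finalizer
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Count distinct buckets used by 1000 4096-aligned keys in a
    // power-of-two table whose bucket index is the masked low bits.
    public static int spread(LongUnaryOperator hash, int buckets) {
        Set<Long> used = new HashSet<>();
        for (long i = 1; i <= 1000; i++) {
            long key = i * 4096;                  // aligned keys
            used.add(hash.applyAsLong(key) & (buckets - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("fx-style: " + spread(Distribution::fxStyle, 1024) + " bucket(s) used");
        System.out.println("fmix64:   " + spread(Distribution::fmix64, 1024) + " bucket(s) used");
    }
}
```

With the multiply-only hash every aligned key lands in bucket 0; the fmix64 version uses most of the table.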
        
         | dragontamer wrote:
         | > C++ red-black tree default hashmap
         | 
         | unordered_map is the hashmap.
         | 
         | map is a red-black tree, which keeps the data sorted. You
         | should use map<> if you need to visit the data in-order with a
         | for-each loop. Hashmaps can't visit the data in order.
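The same split exists in Java, which may make the analogy clearer: TreeMap is the red-black tree (sorted iteration, O(log n) operations), HashMap the hash table (expected O(1), unspecified iteration order). A quick sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Insert three keys out of order, then observe iteration order.
public class Ordering {
    static List<String> keysInIterationOrder(Map<String, Integer> m) {
        m.put("cherry", 3); m.put("apple", 1); m.put("banana", 2);
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // TreeMap visits keys in sorted order regardless of insertion order
        System.out.println(keysInIterationOrder(new TreeMap<>())); // [apple, banana, cherry]
        // HashMap order depends on hashes and table size; don't rely on it
        System.out.println(keysInIterationOrder(new HashMap<>()));
    }
}
```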
        
           | [deleted]
        
           | habibur wrote:
           | unordered_map performs even worse than map. Tried it.
        
             | dragontamer wrote:
              | Looking at the code more carefully, I'd actually bet it's
              | string that's slow, and not actually map.
             | 
             | unordered_map is actually known to have relatively bad
             | performance on GCC and some other open-source
             | implementations. I don't know if there's a spec-problem
              | that favors a slower hash table (rather than the modern
             | open-addressing scheme, which is faster on modern
             | computers), but I can believe unordered_map has an issue on
             | common standard libraries today.
             | 
             | ------------
             | 
             | > ret[string("id.")+to_string(i%500000)] =
             | string("val.")+to_string(i);
             | 
             | Lets count the mallocs.
             | 
             | > string("id.")
             | 
             | One
             | 
             | > to_string(i%500000)
             | 
             | Two
             | 
             | > string("id.")+to_string(i%500000)
             | 
             | Three, then two frees() as the previous two strings are
             | deallocated.
             | 
             | > ret[string("id.")+to_string(i%500000)]
             | 
             | Four. The ret[] operator searches for the string. Since it
             | doesn't exist, it now has to malloc the key and insert the
             | value in there. std::move() might solve that issue and cut
             | this one out easily?
             | 
             | > string("val.")
             | 
             | Five
             | 
             | > to_string(i)
             | 
             | Six
             | 
             | > string("val.")+to_string(i)
             | 
             | Seven + 2x frees.
             | 
             | > ret[string("id.")+to_string(i%500000)] =
             | string("val.")+to_string(i);
             | 
             | Eight, to place the value in there. Again, probably solved
             | with std::move. +1 more free.
             | 
             | So eight mallocs and 6 frees?
             | 
             | ---------
             | 
              | Yeah, C++ strings let you shoot yourself in the foot
              | performance-wise. They're easy to use and free of stupid
              | malloc/free bugs, but it means they will malloc/free a
              | LOT more than you might guess with an untrained eye.
             | 
             | I know that Perl / PHP / etc. etc. were explicitly designed
             | to be more favorable towards string manipulation. And I'd
             | bet that there's some old C++98 stuff that might lock in
             | the performance of string to this older, less performant
             | model.
             | 
             | EDIT: I see you managed to avoid the problem of excessive
             | mallocs/frees in the Rust code with "format". The C++
             | equivalent might be snprintf, though that dips down into
             | C-style strings.
        
               | maleldil wrote:
               | > The C++ equivalent might be snprintf
               | 
               | There's std::format from C++20 too[1]. One can use
               | std::ostringstream instead, I suppose, but at that point
               | I'd probably use snprintf too.
               | 
               | [1]
               | https://en.cppreference.com/w/cpp/utility/format/format
        
               | spacechild1 wrote:
               | std::ostringstream will certainly allocate its buffer on
               | the heap. When you call the str() method to retrieve the
               | final result, it will have to make another copy. So you
               | will always have an extra allocation.
               | 
               | Note that std::format still returns a std::string (which
               | might allocate memory, depending on the size of the
               | string), but that doesn't matter if it is going to be
               | moved anyways.
               | 
               | If you want to write into an existing buffer, you can use
               | std::format_to_n as a modern replacement for snprintf.
        
               | spacechild1 wrote:
               | > I don't know if there's a spec-problem that favors a
               | slower-hash table
               | 
               | Yes. std::unordered_map requires that pointers to stored
               | values will always be valid (except if the value is
                | erased). This makes open addressing effectively
               | impossible. If you need max. performance,
               | std::unordered_map is indeed not a good choice.
               | 
               | > The ret[] operator searches for the string. Since it
               | doesn't exist, it now has to malloc the key and insert
               | the value in there.
               | 
               | The optimal method would be https://en.cppreference.com/w
               | /cpp/container/unordered_map/em...
               | 
                | > So eight mallocs and 6 frees?
                | 
                | > Yeah, C++ strings shoot yourself in the foot
                | performance-wise.
               | 
               | Most implementations of std::string use short string
               | optimization. Typically, you can save up to 12 chars
               | (32-bit) or 24 chars (64-bit) inside the actual string
               | container without any memory allocation.
               | 
               | Looking at the code example, I would say that there won't
               | be any mallocs at all for std::string.
               | 
               | I think the performance difference just comes from the
               | suboptimal design of std::unordered_map, where each node
               | has to be allocated on the heap. In open addressing hash
               | tables, as used by PHP (and I guess also by Rust), the
               | nodes live in a single large array. The only memory
               | allocation you see is when the array has to be resized
               | due to a rehash - which happens very rarely!
        
               | dragontamer wrote:
               | Thanks for the SSO pointer. I'll keep that in mind in the
               | future.
               | 
               | > Yes. std::unordered_map requires that pointers to
               | stored values will always be valid (except if the value
               | is erased).
               | 
               | Hmm, that requirement also prevents Cuckoo-hashing.
               | 
               | The general pattern here (and this is problematic in a
               | lot of languages), is the idea of having a "shortcut"
               | into an object in a data-structure. For example, C++ and
               | Java's "iterator-invalidation" rules are absolutely
               | arcane, even if we removed the pointer-requirements.
               | 
               | Many data-structures want to move things around (cuckoo-
                | hashing, open-addressing, robin-hood hashing... look, 3
                | different high-performance strategies and we're still
               | just talking about hash tables!!). But those moves
               | "invalidate" the shortcuts (be it a pointer or an
               | iterator).
               | 
               | Some data-structures can have those pointers just fine
               | (Red-and-black trees), others cannot. Representing all
               | these possibilities with a fixed, easily understood
               | interface that generalizes among all of these data-
               | structures / decisions seems impossible.
               | 
               | ------
               | 
               | Well, I guess that's why I keep rolling my own data-
               | structures when I need performance guarantees...
               | standard-library just has to make decisions that may or
               | may not match my cases.
        
               | usefulcat wrote:
               | For std::unordered_map, I don't know why they decided to
               | add the no-iterator-invalidation requirement. If you
               | really need that property, you could get most of the
               | benefits by using unordered_map<unique_ptr<T>> instead of
               | unordered_map<T>. So maybe not the best example of "don't
               | pay for what you don't use".
        
               | spacechild1 wrote:
               | > Well, I guess that's why I keep rolling my own data-
               | structures when I need performance guarantees...
               | 
                | Yes, this is what many companies do. std containers are
                | good enough for the general case, but there are certainly
                | situations where you need something more fancy. However,
               | before you roll your own, first check if there's a good
               | library you can use. The world's best hashtable has
               | already been written by other people :-)
        
             | jcelerier wrote:
             | that's absolutely untrue in a lot of cases:
             | https://stackoverflow.com/questions/36392583/is-an-
             | unordered...
             | 
             | of course one can always find faster, drop-in replacement
             | libraries: https://martin.ankerl.com/2019/04/01/hashmap-
             | benchmarks-01-o...
             | 
             | e.g. replacing std::unordered_map with robin_hood::hash_map
             | makes the above C++ code go from 1.2 seconds to 0.5 seconds
             | on my machine (std::map is around 1.3/1.4 seconds for
             | libstdc++).
             | 
             | (though as others said, this test is in big part a test of
             | std::string performance... replacing strings by int make
             | times go down to ~0.1s here)
        
         | nikita2206 wrote:
         | Last time I checked, that was around PHP 7.0, it was using
         | chaining for its hash table implementation. One notable thing
         | was that its hash table was also using a more optimal array
         | impl (no hashing) if you were only using numerical consecutive
         | keys starting from zero.
        
           | masklinn wrote:
           | > that was around PHP 7.0
           | 
           | 7.0 is when php switched to open addressing:
           | https://www.npopov.com/2014/12/22/PHPs-new-hashtable-
           | impleme...
        
           | habibur wrote:
           | > Last time I checked, that was around PHP 7.0, it was using
            | chaining for its hash table implementation
           | 
           | Chaining was from php 5.
        
       | willvarfar wrote:
       | > I am yet to find an Open Addressing implementation that
       | outperforms HashMap<K,V> by a significant margin
       | 
       | I have a performance-critical framework that went from Java's Map
       | to Trove (http://trove4j.sourceforge.net/html/overview.html) and
       | now uses FastUtil (https://fastutil.di.unimi.it/).
       | 
       | For my use cases, these massively outperform the built-in java
       | Maps and Sets.
       | 
       | It would be interesting to include these in the benchmark. A
       | quick googling finds plenty of comparisons that are a few years
       | old now but I can't find a current up to date java map/set
       | benchmark.
        
         | JanecekPetr wrote:
         | That's because they used primitives, no? In that case also look
         | at some other options: https://www.reddit.com/r/java/comments/r
         | 0b9o9/a_tale_of_java.... While I still think Koloboke in
         | general is the fastest, I do agree that the difference between
         | it and fastutil will not be very big.
         | 
         | Or was this for Object/boxed primitives, too?
        
           | willvarfar wrote:
            | If you are working with primitives then the unboxed
           | collections are a big boon. My code often uses objects
           | though, so I use a lot of the fastutil Object2ObjectMap etc.
           | 
           | There are two areas that are important in my use case:
           | 
            | * the performance of normal get/computeIfAbsent/forEach
            | etc. FastUtil has, for example, fast iterators that avoid
            | allocating Map.Entry objects.
           | 
           | * the memory usage. The total footprint is important because
           | I actually dimension VMs based on stuff. But garbage
           | collection minimization is also really important.
        
             | JanecekPetr wrote:
             | Nice. Thanks. (For other viewers, all the non-standard maps
             | I'm aware of are trying to avoid allocating those pesky
             | Map.Entry objects, that's why they generally take up much
             | less memory.)
        
         | usefulcat wrote:
         | I can't really speak about Java, but for C++,
         | std::unordered_map (which uses separate chaining) gets
         | destroyed (performance-wise) by any number of alternate hash
         | map implementations that use open addressing. There's just no
         | question that open addressing is inherently faster for most use
         | cases.
         | 
         | To be fair, unlike std::unordered_map, the open addressing
         | implementations don't come with a guarantee that they won't
         | move the items in the hash table. But in practice that doesn't
         | really matter anyway, since you could always use
         | unordered_map<unique_ptr<T>> instead of unordered_map<T>. But,
         | I digress..
        
       | RcouF1uZ4gsC wrote:
       | > But, as described here, the decision to use Separate Chaining
       | vs. Open Addressing is not unanimously accepted by programming
       | languages designers. For example, in python, ruby, and rust, the
       | standard hash tables are implemented using Open Addressing, while
        | Java, go, C#, C++ are all more conservative and use Separate
       | Chaining.
       | 
       | I think with regards to C++ using separate chaining is a legacy
       | decision made in the late 1990's. I think back then, the CPU vs
       | memory speed difference was not as big, so pointer chasing wasn't
       | considered that big of a deal. In addition, this was the days of
       | 32-bit systems with MB's or RAM so having separate, smaller
       | allocations was seen as better than having one large one (see for
       | example deque).
       | 
       | Nowadays, with a much bigger relative performance penalty for
       | accessing memory as well as 64-bit systems with GB's or RAM, I
       | think people are moving to open addressing (for example Rust, as
       | well as Google and Facebook's hash table implementations) unless
       | you need pointer stability.
       | 
       | The fact that std::unordered_map more or less mandated separate
       | chaining is seen as something that holds the standard library
       | back in terms of performance, hence the competing implementations
       | from Google and Facebook.
        
         | Koffiepoeder wrote:
         | Could someone explain to me in a clear and high level fashion
          | what exactly pointer stability (or pointer instability, for
          | that matter) entails? I've watched some of the cppcon talks
          | on the
         | google hashtable implementation [0] and in that talk Matt
         | Kulukundis also mentioned it affecting the adoption rate, but I
         | always failed to really grasp the concept and its implications.
         | Google searches also ended up rather fruitless to be honest.
         | 
         | [0]: https://youtube.com/watch?v=ncHmEUmJZf4
        
           | gpderetta wrote:
           | If you allocate your objects inline in your internal table,
           | rehashing will invalidate existing references to those
           | objects as they will need to be copied into the new, larger,
           | table.
           | 
           | Allocating every object separately in its own node as used in
            | chaining implementations guarantees that the object location
            | is stable (i.e. it doesn't change), but in addition to making
           | iteration slower it requires separate allocation for each
           | object.
        
           | UncleMeat wrote:
            | An operation maintains pointer stability if, when you take
            | a pointer to an element in a data structure and then
            | execute that operation, the pointer is still valid and
            | points to the same data.
           | 
           | Think about how vector would be implemented under the hood.
           | What happens if you delete an element? Does it shift down all
           | of the elements after it? If you do that, the objects have
           | literally moved in memory. Any pointer that points into those
           | memory locations is now invalid. That's a problem if you
           | wanted to hold onto pointer to elements. But if you declare
           | that your interface has certain pointer stability guarantees
           | then suddenly your implementation is massively constrained.
           | Now your vector implementation will have big gaps that you
           | need to deal with and will waste a ton of memory if you add
           | and remove many elements.
        
             | Koffiepoeder wrote:
             | So, am I correct if I then put it as following?
             | 
             | Pointer instability means that it is undefined behaviour
             | to:
             | 
             | 1) store (both on stack and heap) the pointer address to a
             | value element outside of its respective container
             | 
             | 2) then modify the container's content
             | 
             | 3) then thereafter use the stored pointer address
             | 
             | This implies that:
             | 
             | - containers without pointer stability require pass-by-
             | value (or a pointer to a temp copy) to be used for its
             | values after accessing them OR
             | 
             | - containers without pointer stability require absolute
              | immutability of the container to guarantee runtime safety
             | OR
             | 
             | - containers without pointer stability require "relative"
             | immutability during the existence of a pointer to a value
             | element to guarantee runtime safety. Note that I think this
             | last option is a bad idea, because it requires
             | multithreaded blocking/synchronization of the container and
             | is also easy to encapsulate wrongly, leading to UB.
        
         | dpratt wrote:
         | Granted, it's a micro benchmark without a lot of context, but
          | wouldn't the finding that the default HashMap implementation
          | is, on average, faster than all the open addressing
          | implementations contradict this conclusion?
        
           | nomemory wrote:
           | For me it was faster on average and with less variance. But
            | the micro-benchmarks are flawed by default, so I believe
            | there's room to improve them.
           | 
           | Also maybe there are some tricks that can be done, to have a
           | faster Open Addressing implementation in Java.
           | 
           | PS: I've also tried Cuckoo Hashing and Hopscotch and they
           | were also slower, but haven't managed to document my
           | findings.
        
         | nomemory wrote:
         | Exactly.
         | 
         | Looking at some discussions the open-jdk team had, they decided
         | to continue with Separate Chaining. I don't have the link, but
         | then I got interested to see how some Open Addressing (nothing
         | fancy) will behave in Java, so this is the reason I've written
         | this "article".
         | 
          | Unfortunately I couldn't beat HashMap with the "classic"
          | approach, but I got close. I am curious to see a better
          | implementation that brings more engineering tricks to the
          | table.
        
           | RcouF1uZ4gsC wrote:
            | > Unfortunately I couldn't beat HashMap with the "classic"
            | approach, but I got close. I am curious to see a better
            | implementation that brings more engineering tricks to the
            | table.
           | 
           | I think one of the big advantages of using open addressing is
           | memory locality. Since it is harder to control memory layout
           | in Java (you rely on the optimizer/JIT to optimize common
           | patterns), I wonder if you are getting the full benefit of
           | open addressing in Java.
           | 
           | In addition, HashMap is standard, and the Java
           | compiler/optimizer/JIT can make certain assumptions about how
           | it is used, which may enable certain optimizations. Also, the
           | optimization heuristics were likely tuned with years of
           | profiling, and because HashMap is so widely used, common
           | usages of HashMap are likely reflected in the profile data
           | and thus in the heuristics.
        
             | nomemory wrote:
             | Exactly. In Java I couldn't even inline my functions, and
             | had zero control over the underlying memory layout.
        
               | JanecekPetr wrote:
               | > i couldn't even inline my functions
               | 
               | You could, manually :). Either way if they're hot,
               | they're inlined.
               | 
               | One common trick for open-addressing maps in Java I don't
               | see in your implementations is to have an array of keys
               | zipped with values (or two arrays, one for keys, one for
               | values) instead of an array of Entries. This improves
               | locality and reduces indirection a bit. Obviously more is
               | needed for even more speed.
               | 
               | (Mandatory "in the future it will be better": Valhalla is
               | coming in a few years, that's when we'll have much more
               | control over the memory layout.)
        
               | nomemory wrote:
               | I've assumed they were inlined initially, until I profiled
               | and found out one of the swap methods was not. I believe
               | there's also a limit on the amount of bytecode to inline,
               | so no matter how hot a method is, once its bytecode
               | exceeds a certain threshold it won't be inlined. I need
               | to double check though.
        
               | JanecekPetr wrote:
               | Indeed, max bytecode count (FreqInlineSize and
               | MaxInlineSize) and inlining depth (MaxInlineLevel). Your
               | GC choice will also slightly modify the inlining
               | decisions, and then obviously GraalVM will be completely
               | different.
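For reference, these decisions and thresholds can be inspected directly on a HotSpot JVM (the benchmark class name below is a placeholder; exact defaults vary per JDK build):

```shell
# Print C2's inlining decisions as they happen (diagnostic flag)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyBenchmark

# Look up the thresholds mentioned above for the current JDK
java -XX:+PrintFlagsFinal -version | grep -E 'FreqInlineSize|MaxInlineSize|MaxInlineLevel'
```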
        
               | kaba0 wrote:
               | But inlining will not cause better performance in every
               | case, will it?
        
               | JanecekPetr wrote:
               | Correct! Inlining obviously costs CPU, code cache space,
               | and makes the receiver method bigger so that it's less
               | likely it will be inlined itself. If there ever is a
               | decompilation occurring, the inlining efforts were pretty
               | much wasted.
        
               | masklinn wrote:
               | However it must be pointed out that inlining enables
               | further optimisations.
               | 
               | The tradeoffs are more complicated for a jit, but for an
               | aot compiler it's one of the most important optimisations
               | in the pipeline.
        
               | vips7L wrote:
               | Is there a full list of the differences between the graal
               | jit and C2? I was only aware of graal having better
               | partial escape analysis over C2's conservative EA.
        
               | JanecekPetr wrote:
               | It's simply a completely different implementation. Some
               | of the optimization passes are the same, obviously, but
               | overall it simply performs ... differently.
        
               | tsimionescu wrote:
               | Out of curiosity, weren't your objects ending up neatly
               | compacted after a GC cycle?
        
               | nomemory wrote:
               | I didn't perform any deletes, and I've tried to keep
               | everything clean without allocating intermediary objects.
               | But one the pitfalls is that I cannot play easily on the
               | memory lane with Java. For example in the entries and
               | bins implementation I had to have two separate arrays. In
               | C I would've kept a single array, with a compact zone,
               | and another sparse zone, iterating through it using an
               | offset.
        
       | raymondh wrote:
       | If the OP has time, it might be nice to experiment with the hash
       | table technique used by sets in Python. It has a variant of open
       | addressing that runs several linear probes before jumping to
       | another location in the hash table.
       | 
       | The premise is that probing consecutive memory addresses is cheap
       | because of cache locality and because the address computation is
       | a simple increment. This should be cheaper than resolving hash
       | collisions with additional random probes or by chasing linked
       | list references to random memory locations for each link.
       | 
       | https://github.com/python/cpython/blob/f840398a5fd8741653c26...
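A Java sketch of that probe sequence (the constants mirror CPython's setobject.c; the lookup itself is simplified and assumes the table always keeps at least one empty slot, as a real load factor guarantees):

```java
// Sketch of CPython's set probe sequence: check a few consecutive
// slots first (cheap, cache-friendly), then jump elsewhere in the
// table using the "perturb" trick so all hash bits eventually
// contribute to the probe order.
class ProbeSequence {
    static final int LINEAR_PROBES = 9;   // as in CPython's setobject.c
    static final int PERTURB_SHIFT = 5;

    // Returns the index of `key` if present, else the first empty slot
    // encountered along the probe sequence.
    static int probe(Object[] table, Object key) {
        int mask = table.length - 1;      // table length is a power of two
        int hash = key.hashCode();
        long perturb = hash & 0xffffffffL;
        int i = hash & mask;
        while (true) {
            if (table[i] == null || table[i].equals(key)) return i;
            // Linear probes over consecutive addresses first...
            int limit = Math.min(i + LINEAR_PROBES, mask);
            for (int j = i + 1; j <= limit; j++) {
                if (table[j] == null || table[j].equals(key)) return j;
            }
            // ...then jump to a new region of the table.
            perturb >>>= PERTURB_SHIFT;
            i = (int) ((i * 5L + 1 + perturb) & mask);
        }
    }
}
```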
        
       | i3oi3 wrote:
       | Some areas that I explored more deeply in my Earlier Days w.r.t
       | hash tables (Java hash tables, at that) that aren't covered by
       | the analysis are:
       | 
       | * Memory footprint and GC load
       | 
       | * Write performance (especially characterizing fast-path vs.
       | rebalance times)
       | 
       | * Loading vs. performance tradeoffs (can the table perform well
       | with 80% occupancy? 99%?)
       | 
       | * Small-table utilization.
       | 
       | If some of the implementations consume substantially less memory
       | and/or GC churn while having largely indistinguishable lookup
       | performance, then they're a win for value. This is one place that
       | open addressing can win.
       | 
       | For the libraries we were writing, we didn't know if consumers
       | would be stuffing in one element (the most common case) or
       | millions of elements (uncommon but important), so we ended up
       | implementing a clever abstraction that would start with a pointer
       | to the single object, then replace it with a hash table when the
       | second element was added. Since we used hash tables like country
       | doctors prescribe aspirin, it reduced the heap footprint of the
       | server by ~30%. Of course, Hansen's Law states "Clever is the
       | opposite of maintainable," and that definitely applied here.
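That single-element-then-table upgrade could be sketched like this (class and method names are hypothetical; the commenter's actual library isn't shown in the thread):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: hold a single key/value pair inline until a second distinct
// key arrives, then upgrade to a full HashMap. This avoids allocating
// a table at all for the (very common) one-element case.
class CompactMap<K, V> {
    private K soloKey;
    private V soloValue;
    private Map<K, V> map;   // stays null until a second key is added

    public void put(K key, V value) {
        if (map != null) {
            map.put(key, value);
        } else if (soloKey == null || soloKey.equals(key)) {
            soloKey = key;            // still the single-element case
            soloValue = value;
        } else {
            map = new HashMap<>();    // second distinct key: upgrade
            map.put(soloKey, soloValue);
            map.put(key, value);
            soloKey = null;
            soloValue = null;
        }
    }

    public V get(K key) {
        if (map != null) return map.get(key);
        return key.equals(soloKey) ? soloValue : null;
    }
}
```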
        
         | nomemory wrote:
          | Didn't have time to write about it, but I've used a GCProfiler
          | as well. With a few exceptions, all five implementations are
          | comparable.
        
       ___________________________________________________________________
       (page generated 2021-11-23 23:01 UTC)