[HN Gopher] JEP 450: Compact Object Headers
       ___________________________________________________________________
        
       JEP 450: Compact Object Headers
        
       Author : mfiguiere
       Score  : 165 points
       Date   : 2023-05-04 15:25 UTC (7 hours ago)
        
 (HTM) web link (openjdk.org)
 (TXT) w3m dump (openjdk.org)
        
       | davnicwil wrote:
       | As I (think I) understand it, the goal of this is to reduce the
       | memory footprint of the HotSpot JVM, with the performance
       | degradation capped at 5%, and only in the rarest worst cases.
        
         | aaa_aaa wrote:
         | Just guessing that the cache throughput increase from the memory
         | reduction may offset most performance issues.
        
       | w10-1 wrote:
       | I'm always impressed with the clarity of Java's
       | design/implementation discussions, which I think is due to Mark
       | Reinhold's leadership since 1997 (a breathtaking tenure in its
       | own right).
       | 
       | But this needs to replace stack locking with an alternate
       | lightweight locking scheme, to avoid races. Unfortunately, that
       | is opaque:
       | https://bugs.openjdk.org/browse/JDK-8291555
       | 
       | Does anyone have other pointers for the design or viability of
       | the required alternative?
        
         | zorgmonkey wrote:
         | There is a detailed description of the new stack locking scheme
         | in the PR here https://github.com/openjdk/jdk/pull/10907
        
       | akokanka wrote:
       | 5% latency overhead is huuuge. Throw more memory and focus on
       | throughput. Not sure why memory is a concern here.
        
         | kasperni wrote:
         | "in infrequent cases" not in general.
        
         | jesboat wrote:
         | Different applications can have very different requirements.
         | I've worked on systems which would kill for 5% latency and on
         | systems that would gladly pay 5% for better memory usage.
        
       | exabrial wrote:
       | Is this part of the JVM Specification, or is this an
       | implementation detail of OpenJDK itself? If the first, that's sort
       | of surprising. Seems like an implementation detail that'd be best
       | left up to the platform on how to manage objects in memory.
        
         | mike_hearn wrote:
         | No spec changes needed for this one.
        
         | papercrane wrote:
         | This is an implementation detail of the HotSpot JVM in
         | OpenJDK. Other implementations are free to lay out their objects
         | as they choose.
        
         | Kwpolska wrote:
         | This document is a proposal for the HotSpot JVM, which is the
         | default JVM implementation (the one that ships with OpenJDK),
         | but not the only JVM implementation out there (see
         | https://en.wikipedia.org/wiki/List_of_Java_virtual_machines for
         | a full list).
        
         | pdpi wrote:
         | The JEP identifies the scope as "implementation" and the
         | component as "hotspot/runtime", so I do assume it's an
         | implementation detail, yes.
        
       | marginalia_nu wrote:
       | Java made a bunch of optimistic assumptions about the future of
       | hardware when it was designed. UTF-16 strings, 64 bit pointers,
       | enormous object headers.
       | 
       | While great for future-proofing, it's been hard to deny it's had
       | overhead. Glad to see it's slowly getting undone: Compressed
       | Oops, Compact Strings, now this.
        
         | komadori wrote:
         | 64-bit longs and doubles take up two slots in the JVM's local
         | variable table, whereas Object references only take one. So, if
         | anything, the design of Java bytecode assumed 32-bit pointers.
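
The slot accounting described above can be sketched as a toy calculation. This is not JVM code: `LocalSlots`, `slotsFor`, and `slotIndices` are hypothetical names. The only fact relied on is the JVM specification's rule that `long` and `double` locals occupy two consecutive slots while every other type, references included, occupies one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of local-variable slot numbering for a static method's parameters.
class LocalSlots {
    // long and double take two slots; every other type takes one.
    static int slotsFor(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Returns the first slot index assigned to each parameter, in order.
    static List<Integer> slotIndices(Class<?>... paramTypes) {
        List<Integer> indices = new ArrayList<>();
        int next = 0; // a static method has no `this` occupying slot 0
        for (Class<?> t : paramTypes) {
            indices.add(next);
            next += slotsFor(t);
        }
        return indices;
    }

    public static void main(String[] args) {
        // Models: static void f(int a, long b, Object c, double d, String e)
        System.out.println(slotIndices(
                int.class, long.class, Object.class, double.class, String.class));
        // prints [0, 1, 3, 4, 6]
    }
}
```

Note that the `Object` and `String` references each advance the index by one, the same as an `int`, which is the asymmetry the comment points at.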
        
           | kaba0 wrote:
           | They can just use 64-bit slots for everything (it does leave
           | unused 32bit for each 32bit local variable, but there are not
           | many, and you get the benefit of the same logic no matter
           | what you store there, plus it has the same performance on
           | 64-bit systems)
           | 
           | Edit: mind explaining the downvote?
        
             | ternaryoperator wrote:
             | Not sure why you're being downvoted.
             | 
             | The Jacobin JVM [0] does exactly what you suggest: 64-bit
             | operand stack and local variables. Longs and doubles still
             | occupy two slots on the operand stack, so as to avoid
             | having to recompile Java classes that assume the two-slot
             | allocation, but the design avoids having to smash together
             | two 32-bit values every time a long or double is operated
             | on.
             | 
             | [0] http://jacobin.org/
        
           | skitter wrote:
           | Relatedly, the JVM specification section 4.4.5 says (as the
           | same holds for the cp and stack):
           | 
           |     In retrospect, making 8-byte constants take two constant
           |     pool entries was a poor choice.
        
         | masklinn wrote:
         | > 64 bit pointers
         | 
         | 64 bit pointers were not an "optimistic assumption" about
         | anything, it was just 64 bit pointers on 64 bit systems like
         | most everyone else. And compressed oops were added more than a
         | decade ago
         | (https://wiki.openjdk.org/display/HotSpot/CompressedOops).
         | 
         | For reference that's about when x32 was added to the Linux
         | kernel, and unlike x32 compressed oops have _not_ been on the
         | chopping block for 5 years.
         | 
         | > enormous object headers.
         | 
         | Hardly? They're two words: a class pointer, and a "mark word"
         | for GC integration.
        
           | marginalia_nu wrote:
           | > And compressed oops were added more than a decade ago
           | 
           | That is still recent on a Java timescale.
           | 
           | > Hardly? They're two words: a class pointer, and a "mark
           | word" for GC integration.
           | 
           | Well compare with for example C++, that can fly commando with
           | no header, just (optional) alignment padding.
           | 
           | In many Java objects, the header is more than half the size
           | of the object. That's just not good data locality. The
           | speed-up from switching from an array of objects that each
           | have n fields to a single object that has n arrays of fields
           | can be very significant.
        
             | kaba0 wrote:
             | Your array of objects is just tightly packed pointers, so
             | the data locality (or its lack) doesn't come from that
             | (especially that the order of objects in memory may be
             | different than the order in the array).
        
               | masklinn wrote:
               | I'm going out on a limb as they're going in every direction
               | and trebuchet-ing goalposts, but I'd assume they're
               | talking about using an SoA structure, so instead of
               | having a bunch of Foo objects each with a header and,
               | say, an `int` field (so two words of header for half a
               | word of payload) you have a single header at the top of
               | the array, then an int value per record in the packed
               | array.
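
A minimal sketch of the two layouts being contrasted here, with made-up class names (`PointObj`, `PointColumns`, `LayoutDemo`). It only shows the shape of the code. How much memory or time the second form saves depends on the JVM, header size, and GC, so no numbers are claimed.

```java
// Array-of-objects: each element is a separate heap object with its own header,
// and the array itself holds only references to them.
final class PointObj {
    final int x;
    final int y;
    PointObj(int x, int y) { this.x = x; this.y = y; }
}

// Structure-of-arrays: one header per int[], field values packed contiguously.
final class PointColumns {
    final int[] xs;
    final int[] ys;
    PointColumns(int n) { xs = new int[n]; ys = new int[n]; }
}

class LayoutDemo {
    static PointObj[] fillObjects(int n) {
        PointObj[] ps = new PointObj[n];
        for (int i = 0; i < n; i++) ps[i] = new PointObj(i, -i);
        return ps;
    }

    static PointColumns fillArrays(int n) {
        PointColumns ps = new PointColumns(n);
        for (int i = 0; i < n; i++) { ps.xs[i] = i; ps.ys[i] = -i; }
        return ps;
    }

    static long sumX(PointObj[] points) {
        long sum = 0;
        for (PointObj p : points) sum += p.x; // follows one reference per element
        return sum;
    }

    static long sumX(PointColumns points) {
        long sum = 0;
        for (int x : points.xs) sum += x;     // sequential scan of a single int[]
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumX(fillObjects(1000)));
        System.out.println(sumX(fillArrays(1000)));
    }
}
```

Both loops compute the same sum; the difference is only in how many headers and pointer hops sit between the values.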
        
         | iainmerrick wrote:
         | 64-bit pointers isn't really an optimistic assumption, it's
         | just the normal pointer size on 64-bit systems.
        
           | Bjartr wrote:
           | It was optimistic because Java came out long before 64 bit
           | systems were common.
        
             | zactato wrote:
             | The DEC Alpha was a 64 bit CPU architecture that came out
             | in 1992. Java supported it from Java 1.0 to ~Java 3.
             | 
             | Interestingly, I can't find any information on Google about
             | this, but ChatGPT does support this. Unfortunately all the
             | primary sources it gave me are for docs that no longer
             | exist on the web. The Wayback machine was no help. The old
             | web is dead :(
        
               | fweimer wrote:
               | Apparently, there was a Java port for Windows NT running
               | on the DEC Alpha:
               | https://archive.org/details/jdk-v1-alpha_nt
               | 
               | It's unclear whether it's actually a 64-bit build. Did
               | Windows even have a 64-bit userspace on Alpha, or was it
               | all ILP32?
               | 
               | Anyway, it's not really possible to implement the Java
               | memory model on Alpha: https://www.cs.umd.edu/~pugh/java/
               | memoryModel/AlphaReorderin... So it's not really a
               | natural target for Java code.
        
               | twoodfin wrote:
               | Wikipedia matches my recollection: Windows NT was
               | actually ported to 64-bit using Alpha workstations, so
               | there were probably some pre-release versions floating
               | around. But by the time 64-bit Windows was ready, Alpha
               | was a dying platform and the original release was only
               | for the 64-bit platform with a real future, Itanium.
        
               | cesarb wrote:
               | > Interestingly, I can't find any information on Google
               | about this, but ChatGPT does support this. Unfortunately
               | all the primary sources it gave me are for docs that no
               | longer exist on the web.
               | 
               | Or perhaps that never existed. It's well known that
               | ChatGPT often hallucinates non-existing references (see
               | for instance the discussion at
               | https://news.ycombinator.com/item?id=33841672).
        
             | formerly_proven wrote:
             | Common in desktops, yes, but Sun being a UNIX workstation
             | vendor they switched in the 90s.
        
               | oefrha wrote:
               | Well, the 3 billion devices (r) Java ran on certainly
               | weren't Sun workstations.
        
             | prpl wrote:
             | UltraSPARC says hello
        
             | iainmerrick wrote:
             | Optimistic to represent object references with native
             | pointers?
             | 
             | C and C++ usually do that too.
        
           | marginalia_nu wrote:
           | Right, but in the mid '90s when Java was designed, 64 bit
           | machines were basically an exotic architecture and its
           | practicalities weren't well understood, which is quite clear
           | when CompressedOOPs is basically the sane default.
           | 
           | The weird inconsistency is apparent as you fairly frequently
           | stub your toe on the 2 billion size limits of Java arrays,
           | which is an area where 64 bit pointers would have made much
           | more sense than in object references.
        
             | znpy wrote:
             | > Right, but in the mid '90s when Java was designed, 64 bit
             | machines were basically an exotic architecture
             | 
             | No they weren't, not in the enterprise market. At home?
             | sure.
             | 
             | SUN Microsystems released the Sparc v9 architecture in
             | 1993. SUN also made the Java language and JVM.
        
               | krzyk wrote:
               | Wasn't java targeted at embedded? AFAIR at some point it
               | was marketed that it will be in every washing machine,
               | fridge.
               | 
               | There was a HotJava browser; weren't servers just a small
               | part of its target?
        
               | jbverschoor wrote:
               | "Write Once, Run Anywhere" was their slogan.
               | 
               | There's Java Card for smartcards yes. There was also Java
               | Micro Edition (j2me) for phones.
        
               | fweimer wrote:
               | Most binaries were still 32-bit for performance reasons,
               | and I don't think that Java 1.2 had a 64-bit port yet,
               | not even on Solaris/SPARC.
               | 
               | The situation that an x86-64 build is faster than an i386
               | build of most applications (except extremely pointer-heavy
               | ones) is a bit of an exception because x86-64 added
               | additional registers and uses a register-based calling
               | convention everywhere. That happens to counteract the
               | overhead of 64-bit pointers in most cases. Other 64-bit
               | architectures with 32-bit userspace compatibility kept
               | using 32-bit userspace for quite some time.
        
       | titzer wrote:
       | Identity hashcodes and monitors are perennially tricky to
       | implement with low space overhead and I think in hindsight that
       | putting them at the base of the object model (i.e. every object
       | has them) was a mistake.
       | 
       | WebAssembly GC objects do not have identity hash codes nor
       | monitors to strive for the lowest space overhead in the base
       | object model.
        
         | mastax wrote:
         | I wondered why .NET let you lock any object, it seems like a
         | strange feature. I figured they just had a spare bit in a
         | bitset somewhere. Should've known it was copied from Java.
        
           | kevingadd wrote:
           | In practice (IIRC, I'd have to check again to be 100% sure)
           | there's a little 'extra garbage' (my name, the real name is
           | something else) pointer inside .NET object headers, and if
           | your object has an identity hashcode or has been used with
           | locking, the extra garbage pointer points to a separate heap
           | allocation containing the hashcode and the locking info. That
           | way you're not paying for it for every object. I think in
           | some cases there are optimizations to be able to store the
           | hashcode inline (tagging, etc). So it ends up being similar
           | to these 'compact object headers' described here in terms of
           | size but is done by making those optional features more
           | expensive instead of by bitpacking stuff into the object
           | header.
           | 
           | Caveat: There are multiple .NET implementations so what I've
           | said may not apply to all of them
        
             | mike_hearn wrote:
             | As the JEP explains, the JVM has done both those things for
             | a long time. In particular the object doesn't actually pay
             | the price of a lock unless it's actually locked at some
             | point. It used to be the case that the JVM was even more
             | extreme and you wouldn't pay the price of a lock unless the
             | object was actually _contended_; this was called biased
             | locking, but the code complexity to implement it was
             | eventually determined to be no longer worth it on modern
             | hardware.
        
         | hinkley wrote:
         | Even in JDK 1.3 era I recognized that anyone could grab a lock
         | on any object, not just the object itself, so that was a bit of
         | a problem. I started experimenting with member variables called
         | "lock" that were new Object(). It seemed pretty dumb at first
         | but then I discovered lock splitting and it was off to the
         | races. There's always going to be some object with multiple
         | concerns, especially if one or two functions involve both but
         | others involve one (eg, add/delete versus generate data from
         | the current state).
         | 
         | Am I misremembering that there was a time where the JRE lazily
         | added locks to objects? I thought it was part of their long
         | road to lock elision.
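
The private lock field and lock splitting described above might look roughly like this. `Inventory` and its members are hypothetical names chosen for illustration; the `stress` helper exists only to exercise the class from several threads.

```java
import java.util.ArrayList;
import java.util.List;

class Inventory {
    // Private lock objects: outside code cannot synchronize on them, unlike
    // `synchronized (this)`, where any holder of a reference can take the monitor.
    private final Object itemsLock = new Object();
    private final Object statsLock = new Object();

    private final List<String> items = new ArrayList<>();
    private long lookups;

    // Item mutations contend only on itemsLock...
    void add(String item) { synchronized (itemsLock) { items.add(item); } }
    boolean remove(String item) { synchronized (itemsLock) { return items.remove(item); } }
    int size() { synchronized (itemsLock) { return items.size(); } }

    // ...while statistics contend only on statsLock (the "split").
    void recordLookup() { synchronized (statsLock) { lookups++; } }
    long lookups() { synchronized (statsLock) { return lookups; } }

    // Runs `threads` threads doing `perThread` adds and lookups each;
    // returns { item count, lookup count }.
    static long[] stress(int threads, int perThread) {
        Inventory inv = new Inventory();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    inv.add("item");
                    inv.recordLookup();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { inv.size(), inv.lookups() };
    }

    public static void main(String[] args) {
        long[] r = stress(4, 1000);
        System.out.println(r[0] + " items, " + r[1] + " lookups");
    }
}
```

A method that needs both pieces of state would have to take both locks in a consistent order, which is where the "some functions involve both" caveat comes in.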
        
           | mike_hearn wrote:
           | It does. It's called lock inflation. The lock data structure
           | is more than just a handful of bits, so the object header can
           | point to the real lock once allocated. You don't pay a full
           | lock for every object, just the memory needed to determine if
           | it's in use or not.
        
         | Someone wrote:
         | > Identity hashcodes and monitors are perennially tricky to
         | implement with low space overhead and I think in hindsight that
         | putting them at the base of the object model (i.e. every object
         | has them) was a mistake.
         | 
         | I can understand the hashcode choice, but I never understood
         | why they chose to add the ability to lock on any object. IMO it
         | still is a bad choice now on server machines, and it certainly
         | was in a language designed for embedded devices in the early
         | 1990s.
         | 
         | Does anybody know what they were thinking? Were they afraid of
         | having to support parallel class hierarchies with a
         | _LockableFoo_ alongside _Foo_ for every class? If so, why? Or
         | did they think most programs would have very few objects, and
         | mostly use reference types?
        
           | mike_hearn wrote:
           | At the time, there was a widespread assumption that massively
           | parallel computing was the future and thus that any serious
           | language had to try and integrate concurrency as a core
           | feature.
           | 
           | It wasn't just Java. For many years THE argument for
           | functional languages was you'd better learn them because
           | these languages will soon be automatically parallelized, and
           | that's the only way you'll be able to master machines with
           | thousands of cores.
           | 
           | With hindsight we know it didn't work out like that. We got
           | multi-core CPUs but not _that_ many cores. Most code is still
           | single threaded. We got SIMD but very little code uses it. We
           | got GPUs with many cores, but they are programmed with
           | imperative fairly boring C-like languages that don't use
           | locking or message passing or anything else, they're data
           | parallel pure functions.
           | 
           | But at the time, people didn't know that. You can see how in
           | that environment of uncertainty "everything will be massively
           | multi-threaded so let's give everything a lock" might have
           | made sense
        
             | paulddraper wrote:
             | > We got SIMD but very little code uses it.
             | 
             | Because programming for SIMD is wildly different than MIMD.
             | (And MIMD is what `synchronized` is for.)
             | 
             | The theory was that there would be more machines like AMD
             | Threadripper.
        
           | paulddraper wrote:
           | > Does anybody know what they were thinking?
           | 
           | Ergonomics. (Plus an inherent assumption of mutable data.)
           | 
           | You can write `synchronized (this)` or `synchronized` class
           | method, etc. and it just works.
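
For readers who have not used it, a small sketch of the forms mentioned above. `MonitorCounter` is a made-up class. `Thread.holdsLock` is the standard API for checking whether the current thread owns a given object's monitor, used here to show which object each form locks.

```java
class MonitorCounter {
    private int value;

    // A synchronized instance method locks `this` for the duration of the call.
    synchronized void increment() { value++; }

    // Equivalent explicit form.
    void incrementExplicit() {
        synchronized (this) { value++; }
    }

    synchronized int value() { return value; }

    // Inside a synchronized instance method, the current thread owns this object's monitor.
    synchronized boolean methodHoldsThis() { return Thread.holdsLock(this); }

    // A static synchronized method locks the Class object instead.
    static synchronized boolean staticHoldsClass() {
        return Thread.holdsLock(MonitorCounter.class);
    }

    // No synchronization here, so the monitor is not held (unless the caller took it).
    boolean plainHoldsThis() { return Thread.holdsLock(this); }

    public static void main(String[] args) {
        MonitorCounter c = new MonitorCounter();
        c.increment();
        c.incrementExplicit();
        System.out.println(c.value());            // 2
        System.out.println(c.methodHoldsThis());  // true
        System.out.println(staticHoldsClass());   // true
        System.out.println(c.plainHoldsThis());   // false
    }
}
```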
        
             | Someone wrote:
             | Yes, but the number of things you want to synchronize on in
             | real-life code is limited, and, certainly at the time,
             | memory was scarce. I think only allowing 'synchronized' on
             | containers and allowing programmers to opt-in on them would
             | have been the better choice.
             | 
             | > Plus an inherent assumption of mutable data.
             | 
             | Yet, Strings and the boxed value types (Integer, Long,
             | etc.) are immutable, and you can synchronize on them (or
             | does this matter less for those because of the granularity
             | of the memory allocator?)
        
               | paulddraper wrote:
               | > Strings and the boxed value types (Integer, Long, etc.)
               | are immutable, and you can synchronize on them
               | 
               | Side note: That is a very bad idea, because their object
               | identity is iffy.
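
The "iffy identity" point can be made concrete. The sketch below only compares references and never locks on them; the class and field names are invented for illustration. It relies on two language guarantees: string literals are interned, and boxing an `int` constant in the range -128 to 127 yields a shared cached `Integer`.

```java
class SharedIdentityDemo {
    // Imagine these live in two unrelated libraries that each "just picked a lock".
    static final Integer LOCK_A = 42;     // autoboxed, so taken from the Integer cache
    static final Integer LOCK_B = 42;
    static final String NAME_A = "lock";  // literals are interned
    static final String NAME_B = "lock";

    // Two references guard the same monitor exactly when they are the same object.
    static boolean sameMonitor(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameMonitor(LOCK_A, LOCK_B));             // true
        System.out.println(sameMonitor(NAME_A, NAME_B));             // true
        System.out.println(sameMonitor(new Object(), new Object())); // false
    }
}
```

So `synchronized (LOCK_A)` in one place and `synchronized (LOCK_B)` in another would contend on a single monitor, even though the code looks independent.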
        
             | derefr wrote:
             | If I were the Java1.0 authors, I probably would have:
             | 
             | - made a "monitor slot" its own primitive type;
             | 
             | - made two overloads of `synchronized` --
             | synchronized(MonitorSlot), and synchronized(Object); where
             | for synchronized(Object), the compiler expects to find a
             | MonitorSlot-typed field with a special system name (e.g.
             | "__jvmMonitorSlot") on the passed-in Object
             | 
             | - taken the presence of `synchronized (this)` in body code
             | of a class to implicitly define such a field on the class,
             | meaning you _can_ "just write" `synchronized (this)`, since
             | it becomes `synchronized (this.__jvmMonitorSlot)` and also
             | triggers the creation of the __jvmMonitorSlot field
             | 
             | - explicitly defined the __jvmMonitorSlot field on Class and
             | other low-object-count system types
             | 
             | The only change from today would be that you can't point at
             | some arbitrary Object you didn't define the class for
             | yourself, and say "synchronize on _that_." Which... why do
             | you want to be doing that again?
        
               | ShroudedNight wrote:
               | At that point, why not just make types you want lock
               | functionality for be declared as 'synchronized' in their
               | class definition:
               | 
               |     synchronized class Foo {
               |         //...
               |     }
        
               | derefr wrote:
               | Because, for the things that do care about
               | synchronization, they might want _multiple_ explicit
               | MonitorSlot members. It makes more sense to just be able
               | to synchronize on MonitorSlots directly, and then decide
               | where they go.
               | 
               | The only reason I added the `synchronized (this)`
               | allowance was because the parent said that they think
               | that that's "good ergonomics" -- and presumably, the
               | Java1.0 authors also thought that -- and I was trying to
               | suggest an alternative that would preserve what they
               | consider "good ergonomics."
               | 
               | But personally, if I was the _sole dictator_ of Java1.0
               | language design with nobody else to please, I would just
               | have `synchronize(MonitorSlot)` + explicitly-defined
               | MonitorSlot members (that, if you declare one, must
               | always be declared final, and cannot be assigned to in
               | the constructor), and that's it. Just refer to them by
               | name when you need one.
        
               | paulddraper wrote:
               | Yes, with sufficient complexity, it's possible to achieve
               | transparently the same result.
        
               | titzer wrote:
               | For me, the main question is where this complexity lives.
               | To minimize the places where sharp knives need to be
               | used, I would prefer to move this up into a language
               | runtime and have the VM/engine underneath focus on
               | implementing a simpler object model that has the
               | mechanisms to pull off these tricks without resorting to
               | unsafe tricks.
        
         | pron wrote:
         | Neither will Java assuming the next stage of this project goes
         | as planned. The overhead for them will only be allocated on
         | demand for those objects that need them.
        
           | titzer wrote:
           | I get that monitors are inflated dynamically, but from my
           | reading of the linked JEP, Compact Object Headers have
           | something like 24 bits allocated for the hashcode in the 64
           | bit word?
        
             | nicktelford wrote:
             | Towards the bottom of the JEP, they mention that the
             | ultimate goal is 32 bit object headers, which would
             | necessitate object monitors be tracked on-demand in a side
             | table. That's what the parent was getting at.
        
             | pron wrote:
             | Right, but that's the first phase. In the next, I believe
             | the goal is to attempt to reduce the header to 32 bits, and
             | then the header overhead (for all objects) for both
             | monitors and hashcodes will be ~3 bits, basically just to
             | indicate whether or not they're used. For objects where
             | they're used, they will be stored outside the header.
        
               | titzer wrote:
               | That's pretty cool. These things are super tricky and are
               | hard to get into a high-performance production system, so
               | I respect the journey :-)
               | 
               | For Wasm GC, I think we need programmable metaobjects to
               | be able to combine language-level metadata with the
               | engine-level metadata. I have only a prototype in my head
               | for Virgil and plan to explore this in Wizard soon.
        
         | derefr wrote:
         | Given https://shipilev.net/jvm/anatomy-quarks/26-identity-hash-
         | cod...:
         | 
         | > For identity hash code, there is no guarantee there are
         | fields to compute the hash code from, and even if we have some,
         | then it is unknown how stable those fields actually are.
         | Consider java.lang.Object that does not have fields: what's its
         | hash code? Two allocated Object-s are pretty much the mirrors
         | of each other: they have the same metadata, they have the same
         | (that is, empty) contents. The only distinct thing about them
         | is their allocated address, but even then there are two
         | troubles. First, addresses have very low entropy, especially
         | coming from a bump-ptr allocator like most Java GCs employ, so
         | it is not well distributed. Second, GC moves the objects, so
         | address is not idempotent. Returning a constant value is a no-
         | go from performance standpoint.
         | 
         | How else _could_ identity hashcodes be implemented? Is it just
         | impossible to put a WebAssembly GC base-object as a key into a
         | map?
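
As a reference point for what the quoted passage has to guarantee, here is a small sketch using the standard `System.identityHashCode` API. `IdentityHashDemo` and its method names are made up. `System.gc()` is only a hint, so the object may or may not actually move; the value has to stay the same either way.

```java
class IdentityHashDemo {
    // The identity hash must not change for the lifetime of the object,
    // regardless of what the collector does with its address.
    static boolean stableAcrossGc() {
        Object o = new Object();
        int before = System.identityHashCode(o);
        System.gc(); // a request, not a guarantee of collection or relocation
        int after = System.identityHashCode(o);
        return before == after;
    }

    // For a class that does not override hashCode(), Object.hashCode()
    // and System.identityHashCode() return the same value.
    static boolean matchesDefaultHashCode() {
        Object o = new Object();
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        System.out.println(stableAcrossGc());         // true
        System.out.println(matchesDefaultHashCode()); // true
    }
}
```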
        
           | titzer wrote:
           | The idea is that if your language has an identity hash code
           | for every object, the language runtime can just add a field
           | in the Wasm struct for it. There's nothing special about that
           | field; it could be an 8-bit, 16-bit, 32-bit field, etc. For
           | the lock inflation logic and so on, you can make a "tagged
           | pointer" in Wasm GC by using the i31ref type, so you could do
           | something like have only the identity hash code by default,
           | but "inflate" to a (boxed) indirection with an additional
           | monitor dynamically. But the Wasm engine just treats it all
           | as regular fields. The overall idea is the Wasm GC gives you
           | the mechanisms by which you can implement your specific
           | language's needs, hopefully as efficiently as a native VM
           | could.
        
             | moonchild wrote:
             | One proposed strategy
             | (https://wiki.openjdk.org/display/lilliput#Main-Hashcode)
             | for dealing with hashes is to compute them lazily, taking
             | advantage of the fact that most objects are never hashed:
             | initially use the object's address, but also set a flag on
             | the object when it is first hashed; when the gc next moves
             | the object, it adds an extra field in which the original
             | address is cached.
             | 
             | Another strategy, used by sbcl, is to rehash identity hash
             | tables after each gc cycle.
             | 
             | How could I implement either of these with wasm?
        
               | derefr wrote:
               | > when the gc next moves the object, it adds an extra
               | field in which the original address is cached.
               | 
               | Under this scheme, if I allocate object A at address 0,
               | then GC-move it to address 100 (such that it caches that
               | it was "originally at" address 0); and then, with object
               | A still alive, I allocate object B at address 0... then
               | don't objects A and B both now have 0 as their identity
               | hashcode?
               | 
               | (I'm guessing the answer here is "this only works with
               | generational GC, and the GC generation sequence-number is
               | an implicit part of the stateless-address hashcodes and
               | an explicit part of the cached-in-member hashcodes")
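
A toy model of the lazy scheme as it is described in this subthread, to make the scenario above easy to step through. Everything here is an assumption for illustration: `ToyHeapObject`, its fields, and `moveTo` are invented, addresses are plain numbers, and a real collector would not look like this.

```java
// Toy simulation only: models "hash is the address until the object moves,
// then a saved copy of the address at the time of the move".
class ToyHeapObject {
    long address;     // current simulated address
    boolean hashed;   // flag set the first time the hash is requested
    Long savedHash;   // the "extra field" added on move; null until needed

    ToyHeapObject(long address) { this.address = address; }

    long identityHash() {
        if (savedHash != null) return savedHash; // moved after hashing: use the saved value
        hashed = true;                           // remember that the address was observed
        return address;
    }

    // Simulated GC relocation.
    void moveTo(long newAddress) {
        if (hashed && savedHash == null) savedHash = address; // preserve the observed hash
        address = newAddress;
    }

    public static void main(String[] args) {
        ToyHeapObject a = new ToyHeapObject(0);
        System.out.println(a.identityHash()); // 0
        a.moveTo(100);
        System.out.println(a.identityHash()); // still 0
        ToyHeapObject b = new ToyHeapObject(0);
        System.out.println(b.identityHash()); // also 0
    }
}
```

In this toy model A and B do end up with the same hash. That is a collision rather than a broken contract: hash codes are allowed to collide, and an identity-based map still tells the two apart by reference comparison. Whether a real implementation mixes in anything extra to reduce such collisions is not something this sketch can say.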
        
         | PaulHoule wrote:
         | Java is also considering making value objects that don't have
         | that overhead:
         | 
         | https://openjdk.org/jeps/8277163
         | 
         | .NET has something like that already, but it would make a huge
         | difference for things like Optional or if you want to make a
         | type for, say, complex numbers where there is just 64-bits of
         | data (for FP32) and even a 64-bit object header would be 100%
         | overhead. Value objects could be possibly allocated on the
         | stack and completely bypass the garbage collector in some
         | cases.
         | 
         | Note that with that overhead, Java is an environment in which
         | you can write multithreaded applications with good reliability
         | and scaling, something that WebAssembly definitely isn't.
        
           | bcrosby95 wrote:
           | I've been lightly following some of their work here. From the
           | outside, it seems like a lot of this has been born from
           | optimizing the JVM over the past few decades, running into
           | fundamental problems with how dynamic the JVM can be, then
           | targeting changes for making that dynamism opt-in, either
           | through JVM flags or new features.
           | 
           | I do think some of it is kinda interesting: some people might
           | say "if you never do X, we can guarantee Y". But it's nice,
           | as a programmer, to be told "if you use feature A, you will
           | never do X, which means we can guarantee Y". It's a lot more
           | comforting that I can't slip up and accidentally do X.
        
             | titzer wrote:
             | > It's a lot more comforting that I can't slip up and
             | accidentally do X.
             | 
             | Indeed, this is why value semantics (i.e. structural
             | equality) for language constructs like ADTs is so
             | wonderful. Because a program can never observe "identity",
             | which is an implementation detail, the implementation
             | doesn't have to use objects at all underneath. That opens a
             | whole host of value representation options that aren't
             | otherwise available.
        
         | kccqzy wrote:
         | Monitors seemed especially strange when I learned Java. I mean,
         | why would you put a synchronization feature into every object?
         | It's not necessary the vast majority of the time.
         | 
         | Maybe the programming style at the time had something to do
         | with it. Maybe they thought every class needs to be thread safe
         | and mutable.
        
           | layer8 wrote:
           | This was over 30 years ago when there was still a lack of
           | awareness that locks don't compose [0]. Look at the early JRE
           | classes like Vector, Hashtable, and StringBuffer, whose
           | operations are synchronized. The idea was that since you'd
           | potentially want to synchronize access to any mutable object
           | (because you'd want to be able to share any object between
           | threads), and most objects are mutable, the most
           | convenient solution would be for every object to have that
           | functionality built in.
           | 
           | [0] https://en.wikipedia.org/wiki/Lock_(computer_science)#Lac
           | k_o...
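
A small sketch of why per-method synchronization does not compose, using Vector as in the comment above. The class and method names are hypothetical. Each Vector call takes the object's monitor on its own, so a check-then-act sequence needs an outer lock on the same monitor to be atomic.

```java
import java.util.Vector;

// Hypothetical example class; only Vector itself comes from the JDK.
final class CompoundVectorOps {
    // Racy: contains() and add() are each synchronized, but the pair is not.
    static <T> boolean addIfAbsentRacy(Vector<T> v, T item) {
        if (!v.contains(item)) { // another thread may add `item` right here
            return v.add(item);
        }
        return false;
    }

    // Atomic: hold the Vector's own monitor across both calls (monitors are reentrant).
    static <T> boolean addIfAbsent(Vector<T> v, T item) {
        synchronized (v) {
            if (!v.contains(item)) {
                return v.add(item);
            }
            return false;
        }
    }

    // Runs `threads` threads that each try to add the values 0..distinct-1 once.
    static int runThreads(int threads, int distinct) {
        Vector<Integer> v = new Vector<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < distinct; i++) {
                    addIfAbsent(v, i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return v.size();
    }
}
```

With the locked variant, `runThreads(8, 200)` always ends with exactly 200 elements. Swapping in `addIfAbsentRacy` can leave duplicates, even though every individual Vector method is thread safe.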
        
         | vbezhenar wrote:
         | Yeah, I asked why there are no Hashable & Equatable interfaces
         | instead of putting things into Object, which would make more
         | sense to me. People responded that's just how things should be.
         | Apparently not. IMO object identity is almost always not needed
         | and a sign of bad design. You either should write proper hash
         | code, or you should not use data structures which use hash
         | code.
        
           | lazulicurio wrote:
           | Yeah, having equals and hashCode on the root Object class is
           | Java's biggest mistake, IMO. Although for a slightly
           | different reason: equality is usually context-dependent, but
           | having equals as an instance method ties you to one
           | implementation.
        
             | devman0 wrote:
             | That is somewhat fixed by Comparator, but the fact that
             | HashMap doesn't have a pluggable Hash override always
             | bothered me.
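
A short sketch of the contrast described above: a sorted map such as TreeMap accepts a Comparator at construction, so the notion of "same key" is chosen by the caller, while HashMap always uses the key's own equals and hashCode. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; TreeMap, HashMap and String.CASE_INSENSITIVE_ORDER are JDK APIs.
final class PluggableOrdering {
    // Key equality is supplied from outside via a Comparator.
    static Map<String, Integer> caseInsensitive() {
        Map<String, Integer> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        m.put("Header", 1);
        m.put("HEADER", 2); // same key under this comparator, so the value is replaced
        return m;
    }

    // Key equality is fixed by String.equals and String.hashCode.
    static Map<String, Integer> caseSensitive() {
        Map<String, Integer> m = new HashMap<>();
        m.put("Header", 1);
        m.put("HEADER", 2); // a different key, so a second entry is created
        return m;
    }
}
```

The first map ends up with one entry and the second with two, from the same sequence of puts.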
        
           | cesarb wrote:
           | > You either should write proper hash code, or you should not
           | use data structures which use hash code.
           | 
           | There's also the question of _which hash code_. Do you use a
           | fast but low quality hash function, or a slower but higher
           | quality one? Does your hash function need to be secure
           | against hash collision attacks? Does it have to be
           | deterministic? The correct choice of hash function can depend
           | on the data structure, and the same object might have to be
           | hashed with different hash functions (or different hash
           | function seeds) in the same program.
        
           | josefx wrote:
           | > IMO object identity is almost always not needed and a sign
           | of bad design.
           | 
           | Sometimes you have no choice. Several times I have needed to
           | store additional data for instances of a class X that I had
           | no control over, so I had to store it in a separate structure
           | and keep track of things by object identity.
           | 
           | > You either should write proper hash code
           | 
           | And object identity fulfills all the requirements of a
           | "proper" hash code.
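
A minimal sketch of the side-table approach described above, assuming a class whose equals and hashCode cannot be changed. `SideTable` and `Label` are hypothetical names; `java.util.IdentityHashMap` is the JDK map that compares keys by reference.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical example; only IdentityHashMap and Map come from the JDK.
final class SideTable {
    // Stand-in for a class we do not control; its equality is value-based.
    static final class Label {
        final String text;

        Label(String text) { this.text = text; }

        @Override public boolean equals(Object o) {
            return o instanceof Label && ((Label) o).text.equals(text);
        }

        @Override public int hashCode() { return text.hashCode(); }
    }

    // Keys are compared with ==, so equal-but-distinct Labels get separate entries.
    private final Map<Label, String> notes = new IdentityHashMap<>();

    void annotate(Label label, String note) { notes.put(label, note); }

    String noteFor(Label label) { return notes.get(label); }
}
```

Note that IdentityHashMap holds strong references to its keys, so in this sketch entries have to be removed explicitly when the tracked object is no longer needed.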
        
           | mike_hearn wrote:
           | The reason is that it'd make HashMap and HashSet a lot less
           | useful. If every type had to opt in, you'd be unable to use
           | many types as keys or set entries for no better reason than
           | the authors didn't bother to implement hash codes or
           | equality. By providing reasonable identity-based versions of
           | these, it increases the utility of the language at a tradeoff
           | in memory usage.
        
             | hesk wrote:
             | You could implement the hashing code in a helper class and
             | construct the HashMap with it.
        
               | mike_hearn wrote:
               | Without a way to give objects identity you'd get stuck
               | pretty fast as it's not guaranteed you have access to any
               | data to hash. You'd have to break encapsulation. That
               | would then hit problems with evolution, where a new
               | version of a class changes its fields and then everything
               | that uses it as a hashmap key breaks.
               | 
               | My experience with platform design has consistently been
               | that handling version evolution in the presence of
               | distant teams increases complexity by 10x, and it's not
               | just about some mechanical notion of backwards
               | compatibility. It's a particular constraint for Java
               | because it supports separate compilation. This enables
               | extremely fast edit/run cycles because you only have to
               | recompile a minimal set of files, and means that
               | downloading+installing a new library into your project
               | can be done in a few seconds, but means you have to
               | handle the case of a program in which different files
               | were compiled at different times against different
               | versions of each other.
        
               | titzer wrote:
               | JavaScript handles the "no identity hash" with WeakMap
               | and WeakSet, which are language built-ins. For Virgil, I
               | chose to leave out identity hashes and don't really
               | regret it. It keeps the language simple and the
               | separation clear. HashMap (entirely library code, not a
               | language wormhole) takes the hash function and equality
               | function as arguments to the constructor.
               | 
               | [1] https://github.com/titzer/virgil/blob/master/lib/util
               | /Map.v3
               | 
               | This is partly my style too; I try to avoid using maps
               | for things unless they are really far flung, and the
               | things that end up serving as keys in one place usually
               | end up serving as keys in lots of other places too.
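
A rough Java sketch of the design described above, where the map receives its hash function and equality function as constructor arguments. This is not Virgil's implementation and not a JDK class: `StrategyMap` is a hypothetical name, and the fixed bucket count without resizing is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

// Hypothetical class for illustration; not part of the JDK.
final class StrategyMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final ToIntFunction<K> hash;    // caller-supplied hash function
    private final BiPredicate<K, K> equal;  // caller-supplied equality function
    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    StrategyMap(ToIntFunction<K> hash, BiPredicate<K, K> equal) {
        this.hash = hash;
        this.equal = equal;
        for (int i = 0; i < 16; i++) {
            buckets.add(new ArrayList<>()); // simplification: fixed size, no resizing
        }
    }

    private List<Entry<K, V>> bucketFor(K key) {
        // floorMod keeps the index non-negative for negative hash values.
        return buckets.get(Math.floorMod(hash.applyAsInt(key), buckets.size()));
    }

    void put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (equal.test(e.key, key)) {
                e.value = value; // existing key under the supplied equality
                return;
            }
        }
        bucket.add(new Entry<>(key, value));
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (equal.test(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

The same key type can then be used with different strategies. Passing `System::identityHashCode` with `(p, q) -> p == q` gives identity semantics, and passing a case-folding hash with `String::equalsIgnoreCase` gives case-insensitive keys, without the key class providing either.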
        
               | josefx wrote:
               | Does that assume that you have only one key type and not
               | an infinite sized hierarchy of child classes to deal
               | with? If you had a map that took a Number as key, how
               | many child classes do you think your helper class would
               | cover and what extension framework would it use to be
               | compatible with user defined classes?
        
             | jzoch wrote:
             | yeah this is where traits instead of hierarchies become
             | useful - I should be able to implement the Hash interface
             | for an object I do not own and then use that object + trait
             | going forward for HashMap and HashSet.
             | 
             | Java doesn't make this very composable
        
               | nusaru wrote:
               | Should be noted that Rust (one of the most prominent
               | languages with traits) doesn't allow you to implement a
               | trait for an object you do not own. A common workaround
               | is to wrap that object in your own tuple struct and then
               | implement the trait for that struct.
        
               | Sharlin wrote:
               | (If you don't own the trait either, that is. Your own
               | traits can be implemented for foreign types.)
               | 
               | Rust's approach to the Hash and Eq problem is to make
               | them opt-in but provide a derive attribute that
               | autoimplements them with minimal boilerplate for most
               | types.
               | 
               | Also, Rust's Hash::hash implementations don't actually
               | hash anything _themselves_, they just pass the relevant
               | parts of the object to a Hasher passed as a parameter.
               | This way types aren't stuck with just a single hash
               | implementation, and normal programmers don't need to
               | worry about primes and modular arithmetic.
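
A Java sketch of the split described above, in which a type only feeds its fields to a hasher and the hasher decides the algorithm. All names here (`Hasher`, `Hashable`, `Point`, `Mul31Hasher`, `SeededFnvHasher`) are hypothetical and loosely modeled on the Rust design; none of them is a JDK or Rust API. The seeded variant simply XORs a seed into the FNV-1a offset basis to show that the same object can be hashed differently. It is not meant as a collision-resistant keyed hash.

```java
// Hypothetical interfaces for illustration; not a JDK API.
interface Hasher {
    void writeInt(int value);
    int finish();
}

interface Hashable {
    // The type only says WHAT gets hashed, not HOW.
    void hash(Hasher state);
}

final class Point implements Hashable {
    final int x;
    final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public void hash(Hasher state) {
        state.writeInt(x);
        state.writeInt(y);
    }
}

// Fast, simple multiplier hash in the style of java.util.Arrays.hashCode.
final class Mul31Hasher implements Hasher {
    private int h = 1;

    @Override public void writeInt(int value) { h = 31 * h + value; }

    @Override public int finish() { return h; }
}

// 32-bit FNV-1a over the bytes of each int, with an illustrative seed.
final class SeededFnvHasher implements Hasher {
    private int h;

    SeededFnvHasher(int seed) { h = 0x811C9DC5 ^ seed; }

    @Override public void writeInt(int value) {
        for (int shift = 0; shift < 32; shift += 8) {
            h ^= (value >>> shift) & 0xFF; // mix in one byte
            h *= 0x01000193;               // FNV prime
        }
    }

    @Override public int finish() { return h; }
}
```

A caller picks the algorithm at the use site, for example `Hasher h = new SeededFnvHasher(seed); point.hash(h); int code = h.finish();`, so `Point` itself never changes when a different hash function or seed is needed.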
        
               | Phelinofist wrote:
               | That sounds like a use case for the decorator pattern?
        
       | maherbeg wrote:
       | Java has just been on a roll with such substantial changes to
       | fundamental pieces of the runtime! Great work by the teams that
       | keep making improvements in a backwards-compatible manner.
        
         | capableweb wrote:
         | This specific change seems to impact the JVM the runtime, not
         | Java the language. Which is great for people like me, who
         | heavily rely on JVM but couldn't care less about Java.
        
           | pron wrote:
           | When some people (like me) say "Java", they mean the Java
           | Platform, and they call what you call Java "the Java
           | language". When you say JVM, I assume you mean the Java
           | Platform minus the language (the language is ~3-5% of the JDK, the
           | JVM is about 25%). People who use "the JVM" but not "Java"
           | use ~97% of Java, i.e. the Java platform.
        
           | pulse7 wrote:
           | People may not care about your "carelessness about Java"...
        
             | capableweb wrote:
             | I'm fairly sure that's fine, people get to care about
             | whatever they want :)
        
           | Scarbutt wrote:
           | Funny, those languages rely heavily on the Java ecosystem and
           | libraries, and they better hope the Java ecosystem keeps
           | thriving, improving current libraries and producing new
           | libraries, or else they become impractical. Show me your pure
           | Clojure production-grade HTTP servers or database drivers ;)
        
       | kernal wrote:
       | Summary
       | 
       | Reduce the size of object headers in the HotSpot JVM from between
       | 96 and 128 bits down to 64 bits on 64-bit architectures. This
       | will reduce heap size, improve deployment density, and increase
       | data locality.
        
       ___________________________________________________________________
       (page generated 2023-05-04 23:00 UTC)