[HN Gopher] Strings Just Got Faster
       ___________________________________________________________________
        
       Strings Just Got Faster
        
       Author : Tomte
       Score  : 106 points
       Date   : 2025-05-01 06:33 UTC (1 days ago)
        
 (HTM) web link (inside.java)
 (TXT) w3m dump (inside.java)
        
       | gavinray wrote:
       | This post makes mention of a new JEP I hadn't heard of before:
       | "Stable Values"
       | 
       | https://openjdk.org/jeps/502
       | 
       | https://cr.openjdk.org/~pminborg/stable-values2/api/java.bas...
       | 
       | I don't understand the functional difference between the
       | suggested StableValue and Records, or Value Classes.
       | 
       | They define a StableValue as:                 > "A stable value
       | is a holder of contents that can be set at most once."
       | 
       | Records were defined as:                 > "...  classes that act
       | as transparent carriers for immutable data. Records can be
       | thought of as nominal tuples."
       | 
       | And Value Objects/Classes as:                 > "... value
       | objects, class instances that have only final fields and lack
       | object identity."
       | 
       | Both Records and Value Objects are immutable, and hence can only
       | have their contents set upon creation or static initalization.
        
         | sagacity wrote:
         | Handwavy explanation: A stable value is used as a static
         | constant, with the difference being that you can initialize it
         | at runtime. Once initialized it is treated as fully constant by
         | the JVM. It's similar to something like lateinit in Kotlin,
         | except on the JVM level.
         | 
         | Records are also immutable, but you can create them and delete
         | them throughout your application like you would a regular
         | class.
        
           | gavinray wrote:
           | But this is also achievable with static init methods on
           | records and value classes, right?                 record
           | Rational(int num, int denom) {           Rational {
           | int gcd = gcd(num, denom);               num /= gcd;
           | denom /= gcd;           }       }
        
             | sagacity wrote:
             | How would you do the same thing with the late
             | initialization of, say, a HashMap where you don't control
             | the constructor?
        
               | gavinray wrote:
               | Hmm, I see what you're saying -- yeah, this seems to be
               | semantically identical to Kotlin's `lateinit` then
        
               | sagacity wrote:
               | Exactly. AFAICT this will be optimized at the JVM level
               | though, so once you've initialized the constant the JIT
               | compiler can take full advantage of that fact.
        
           | pkulak wrote:
           | So every time you take the hash of a string you leak 4 bytes
           | of memory???
           | 
           | I assume it's static in the context of it's containing
           | object. So, it will be collected when it's string is
           | collected.
        
           | leksak wrote:
           | > It's similar to something like lateinit in Kotlin, except
           | on the JVM level.
           | 
           | What level are you suggesting lateinit happens at if not on
           | the JVM?
        
           | w10-1 wrote:
           | > used as a static constant
           | 
           | Yes, but remind people it's not static in the sense of being
           | associated with the class, nor constant for compile-time
           | purposes.
           | 
           | Perhaps better to say: A stable value is lazy, set on first
           | use, resulting in pre- and post- initialization states. The
           | data being set once means you cannot observe a data change
           | (i.e., appears to be immutable), but you could observe
           | reduction in resource utilization when comparing instances
           | with pre-set or un-set values -- less memory or time or other
           | side-effects of value initialization.
           | 
           | So even if data-immutable, a class with a stable value ends
           | up with behavior combinations of two states for each stable
           | value. Immutable records or classes without stable values
           | have no such behavior changes.
           | 
           | But, writ large, we've always had this with the JVM's hotspot
           | optimizations.
           | 
           | For String, it becomes significant whether hashcode is used
           | when calculating equals (as a fast path to negative result).
           | If not, one would have two equal instances that will behave
           | differently (though producing the same data), at least for
           | one hashcode-dependent operation.
        
         | whartung wrote:
         | I don't know, these are mostly uninformed guesses, but the
         | distinction between Records and Value objects is that the
         | contents lack object identity.
         | 
         | Which, to me, means, potentially, two things.
         | 
         | One, that the JVM can de-dup "anything", like, in theory, it
         | can with Strings now. VOs that are equal are the same, rather
         | than relying on object identity.
         | 
         | But, also, two, it can copy the contents of the VO to
         | consolidate them into a single unit.
         | 
         | Typically, Java Objects and records are blobs of pointers. Each
         | field pointing to something else.
         | 
         | With Value Objects that may not be the case. Instead of acting
         | as a collection of pointers, a VO with VOs in it may more be
         | like a C struct containing structs itself -- a single,
         | continuous block of memory.
         | 
         | So, an Object is a collection of pointers. A Record is a
         | collection of immutable pointers. A Value Object is (may be) a
         | cohesive, contiguous block of memory to represent its contents.
        
         | blacklion wrote:
         | Records can be changed via reflection and thus doesn't
         | participate in constant-folding in JIT phases, as this could
         | break, for example, some serialization libraries and other
         | nasty and dirty code. It will work in interpreter mode and
         | eventually has heisenbugs after JIT. Not good.
         | 
         | `@Stable` annotation (only internal for now) and
         | `StableValue<>` (for user code in future) says JIT that
         | programmer guarantee (swear by his life!) that no dirty tricks
         | are played with these values in whole codebase and JIT can
         | constant-fold these values as soon as they are initialized.
        
           | _old_dude_ wrote:
           | I've just tried to change a field of an instance of a record
           | using reflection, and it's not possible, even with
           | setAccessible(true).
        
             | blacklion wrote:
             | ooops, my bad. I remember, that Aleksey Shipilev explained
             | why even `final static` fields are not constant-folded by
             | JIT, and thought that record classes which were introduced
             | later, is the same.
             | 
             | It means, that `StableValue<>` can be used in simple
             | classes (where `final` fields are still not constant-
             | folded) and, additionally, supports late initialization.
        
         | _old_dude_ wrote:
         | In Java, constants are declared as static final.
         | static final Complex CONSTANT = new Complex(1, 2);
         | 
         | If you want a lazy initialized constant, you want a stable
         | value                 static final StableValue<Complex>
         | STABLE_VALUE = StableValue.of();            Complex
         | getLazyConstant() {         return STABLE_VALUE.orElseGet(() ->
         | new Complex(1, 2))       }
         | 
         | If you want the fields of a constant to be constant too,
         | Complex has to be declared has a record.
        
       | Traubenfuchs wrote:
       | This all sounds very hard to grasp for me. Does this only work
       | with Map.of? Would it also work with map.put?
       | 
       | What would be the performance improvement in average java
       | services?
       | 
       | Are there specific types of applications that would benefit a
       | lot?
       | 
       | Does this make string.intern() more valueable? String caches?
        
         | amiga386 wrote:
         | > Does this only work with Map.of? Would it also work with
         | map.put?
         | 
         | It would be faster but not as blindingly fast. Combined with an
         | immutable map, what it means is that the JVM can directly
         | replace your key with its value, like the map is not even
         | there. Because the key's hashcode won't ever change, and the
         | map won't ever change.
         | 
         | > Does this make string.intern() more valueable?
         | 
         | No, String.intern() does a different job, it's there to save
         | you memory - if you know a string (e.g. an attribute name in an
         | XML document) is used billions of times, and parsed out of a
         | stream, but you know you only want one copy of it and not a
         | billion copies). The downside is that it puts the string into
         | PermGen, which means if you start interning normal strings,
         | you'll run out of memory quickly.
        
           | Traubenfuchs wrote:
           | How does the jvm know the map is immutable?
           | 
           | But interned strings can also reuse their hashcode forever.
        
             | amiga386 wrote:
             | Map.of() promises to return an immutable map. new
             | HashMap<>() does not.
             | 
             | https://docs.oracle.com/en/java/javase/17/docs/api/java.bas
             | e...
             | 
             | How it tells the JVM this? It uses the internal annotation
             | @jdk.internal.ValueBased
             | 
             | https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base
             | /...
        
           | daxfohl wrote:
           | How would it directly replace your key with its value? What
           | if there are bucket collisions? Do immutable maps expand
           | until there aren't any? Moreover, what if there are hash key
           | collisions? There needs to be some underlying mechanism to
           | deal with these, I'd think. I don't see how replace-like-the-
           | map-isn't-there could work. Or even how "@Stable" could be
           | used to affect it. Would love to understand more deeply.
        
             | amiga386 wrote:
             | > How would it directly replace your key with its value?
             | 
             | In the same way that if you wrote this C code:
             | const int x[] = {20, 100, 42};         int addten(int idx)
             | { return x[idx] + 10; }
             | 
             | the C compiler would "just know" that anywhere you wrote
             | x[2], it could substitute 42. Because you signalled with
             | the "const" that these values will never change. It could
             | even replace addten(2) with 52 and not even make the call
             | to addten(), or do the addition.
             | 
             | The same goes for Java's value-based classes: https://docs.
             | oracle.com/en/java/javase/17/docs/api/java.base...
             | 
             | But it's a bit more magical than C, because _some_ code
             | runs, to initialise the value, and then once it's
             | initialised, there can be further rounds of code
             | compilation or optimisation, where the JVM can take
             | advantage of knowing these objects are plain values and can
             | participate in things like constant-folding, constant
             | propagation, dead-code elimination, and so on. And with
             | @Stable it knows it that if a function has been called once
             | and didn't return zero, it can memoise it.
             | 
             | > What if there are bucket collisions? Do immutable maps
             | expand until there aren't any? Moreover, what if there are
             | hash key collisions?
             | 
             | I don't know the details, but you can't have an immutable
             | map until it's constructed, and if there are problems with
             | the keys or values, it can refuse to construct one by
             | throwing a runtime exception instead.
             | 
             | Immutable maps make a lot of promises -- https://docs.oracl
             | e.com/en/java/javase/17/docs/api/java.base... -- but for
             | the most part they're normal HashMaps that are just making
             | semantic promises. They make enough semantic promises
             | internally to the JVM that it can constant fold them, e.g.
             | with x = Map.of(1, "hello", 2, "world") the JVM knows
             | enough to replace x.get(1) with "hello" and x.get(2) with
             | "world" without needing to invoke _any_ of the map
             | internals more than once.
             | 
             | What wasn't working until now was strings as keys, because
             | the JVM didn't see the String.hash field as stable. Now it
             | does, and it can constant fold _all_ the steps, meaning you
             | can also have y = Map.of("hello", 1, "world", 2) and the
             | JVM can replace y.get("hello") with 1
        
         | daxfohl wrote:
         | > Does this make string.intern() more valuable?
         | 
         | Probably depends on the use case, though I'm having trouble
         | thinking of such a use case. If you were dynamically creating a
         | ton of different sets that had different instances of the same
         | strings, then, maybe? But then the overhead of calling
         | `.intern` on all of them would presumably outweigh the overhead
         | of calling `.hash` anyway. In fact, now that `.hash` is faster,
         | that could ostensibly make `.intern` _less_ valuable. I guess.
        
       | daxfohl wrote:
       | Cool that we're still seeing perf improvements after all these
       | years! I'm confused by some of the details in the example. Like,
       | would we see similar 8x improvement in a simpler example like a
       | string hashset lookup? Is there something special about
       | MethodHandle or _immutable_ maps here that accentuates the
       | improvement?
       | 
       | > Computing the hash code of the String "malloc" (which is always
       | -1081483544)
       | 
       | Makes sense. Very cool.
       | 
       | > Probing the immutable Map (i.e., compute the internal array
       | index which is always the same for the malloc hashcode)
       | 
       | How would this work? "Compute" seems like something that would be
       | unaffected by the new attribute. Unless it's stably memoizing,
       | but then I don't quite see what it would be memoizing here: it's
       | already a hash map.
       | 
       | > Retrieving the associated MethodHandle (which always resides on
       | said computed index)
       | 
       | Has this changed? Returning the value in a hash map once you've
       | identified the index has always been zero overhead, no?
       | 
       | > Resolving the actual native call (which is always the native
       | malloc() call)
       | 
       | Was this previously "lazyinit" also? If so, makes sense, though
       | would be nice if this was explained in the article.
        
         | twic wrote:
         | > How would this work? "Compute" seems like something that
         | would be unaffected by the new attribute.
         | 
         | The index is computed from the hashcode and the size of the
         | array. Now that the hash code can be treated as a constant, and
         | the size of the array is already a constant, the index can be
         | worked out at compile time. The JVM can basically inline all
         | the methods involved in creating and probing the map, and
         | eliminate it entirely.
        
           | daxfohl wrote:
           | The bucket is computed from the hash code and the size of the
           | array, but that's not necessarily the index. If there are no
           | bucket collisions, then index==bucket and this works out. But
           | if there are bucket collisions then the index will be
           | different from the bucket. So you still need some computation
           | IIUC. And there's no way to memoize that result, since
           | memoization would require a hashmap that has the exact same
           | characteristics as the original hashmap.
           | 
           | I guess a @Stable attribute on the array underlying the map
           | would allow for the elimination of one redirection: in a
           | mutable map the underlying array can get resized so its
           | pointer isn't stable. With an annotated immutable map it
           | could be (though IDK whether that'd work with GC defrag etc).
           | But that seems like relatively small potatoes? I don't see a
           | way to "pretend the map isn't even there".
        
       | esafak wrote:
       | Will Kotlin and Scala benefit?
        
         | daxfohl wrote:
         | They should. It's a JVM feature on core JDK classes.
        
           | taeric wrote:
           | This implies they have their own String implementations? I
           | wouldn't be shocked to find out that is the case, but it
           | would surprise me.
        
             | daxfohl wrote:
             | Why? IIUC it implies that they _don't_ have their own
             | string implementations. They get the benefit of Java JDK's
             | string implementation (and JVM optimization thereof) for
             | free. If they had their own string implementations, they'd
             | be unable to use this optimization until it's publicly
             | available.
        
       | _tom_ wrote:
       | Off-the-cuff thought:
       | 
       | Could you solve the empty string hashes to zero problem by just
       | adding one when computing hash codes?
        
       | Bootvis wrote:
       | Is @Stable a generalization of the final keyword or is there more
       | to it that I'm missing?
        
         | selkin wrote:
         | Final is an access modifier. It controls who can change a value
         | (in this case, no one). And like other access modifiers, you
         | can use reflection to remove or change those modifier. It is a
         | security barrier the developer asks the system to uphold.
         | 
         | The Stable annotation is an optimization mechanism: a promise
         | the developer makes to the compiler. It is on the developer to
         | uphold.
        
       | vault wrote:
       | why is a website about programming not writing code blocks in
       | ASCII? I'm referring to quotes and other undesirable symbols
        
         | VWWHFSfQ wrote:
         | Java supports non-ASCII source code since forever. A lot of
         | languages do.
        
           | dullcrisp wrote:
           | But it doesn't support using fancy unicode quotes as string
           | delimiters.
        
       | int_19h wrote:
       | I'm rather surprised that string hashes aren't randomized in JVM
       | - or at least that's what the article implies. These days, it's a
       | fairly common mitigation for DDoS attacks based on hash
       | collisions. Does Java not do it to preserve backwards
       | compatibility because too much existing code relies on a specific
       | hashing algorithm?
        
         | sedatk wrote:
         | Java uses a tree instead of a linked list for collided items,
         | so the search performance degrades more gracefully (e.g. O(N)
         | vs O(logN)).
        
           | 77 wrote:
           | To be more precise, Java initially uses a linked list for
           | nodes within a bin. If the number of items inside the bin
           | crosses TREEIFY_THRESHOLD (which is 8), then that specific
           | bin is converted into a RB tree.
           | 
           | This is detailed in implementation notes comment here: https:
           | //github.com/openjdk/jdk/blob/56468c42bef8524e53a929dc...
        
         | yxhuvud wrote:
         | Nothing stops the jvm from caching hashes even if the hashes
         | are unique per process invocation.
        
           | PaulHoule wrote:
           | They aren't and it's quite unfortunate that the empty string
           | hashes to 0 so it will have to recompute every time although
           | presumably it is quick to compute the hash of the empty
           | string.
        
         | ivan_gammel wrote:
         | Java hashCode contract is to optimize calculation of hash for
         | performance not for collision search resistance. Its sole
         | purpose is to use it in collections. It must not be used in
         | situations where you need cryptographic properties.
        
           | tialaramex wrote:
           | So, the problem here is that you're thinking of
           | "cryptographic properties" as only the "cryptographic hashes"
           | such as those built with the Merkle-Damgard construction
           | (SHA-512/256 being the one you might reasonable pick today)
           | 
           | But, it's actually desirable to have _some_ cryptographic
           | properties in a faster one way function for making hash
           | tables. Read about SipHash to see why.
           | 
           | Because Java didn't (and as others have discussed, now can't)
           | choose otherwise the hash table structures provided must
           | resist sabotage via collision, which isn't necessary if your
           | attack can't collide the hash you use.
        
           | GolDDranks wrote:
           | The thing is, even it being used just in collections can lead
           | to DOS, if the attacker can control the string contents, and
           | selectively choose strings that your hash table stops being a
           | hash table and turns into a linked list.
        
           | andrewaylett wrote:
           | Even in collections, an unstable hash is desirable -- to
           | avoid denial of service attacks caused by attacker-controlled
           | hash collisions.
           | 
           | For example, Python back in 2012:
           | https://nvd.nist.gov/vuln/detail/cve-2012-1150.
        
         | ncruces wrote:
         | The hashing algorithm is specified in the docs:
         | s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
         | 
         | Per Hyrum's law, there's no changing it now.
         | 
         | https://docs.oracle.com/javase/8/docs/api/java/lang/String.h...
        
       | stevoski wrote:
       | I find the entire history of improvements to Java's String class
       | enjoyable to read about.
       | 
       | Over the years, the implementation of Java's String class has
       | been improved again and again, offering performance improvements
       | and memory usage reduction. And us Java developers get these
       | improvements with no work required other than updating the JRE we
       | use.
       | 
       | All the low-hanging fruit was taken years ago, of course. These
       | days, I'm sure most Java apps would barely get any noticeable
       | improvement from further String improvements, such as the one in
       | the article we're discussing.
        
         | niuzeta wrote:
         | I love hearing more about this, especially the historical
         | context, but don't have a good java writeups/articles on this.
         | Would you mind sharing some suggestions/pointers? I'd very much
         | appreciate it.
        
           | stevoski wrote:
           | A good starting point is Joshua Bloch's Effective Java. He
           | shares some stories there from Java's early days, and - at
           | least in passing - mentions some aspects of the String
           | class's history.
        
             | niuzeta wrote:
             | Ah, I certainly remember these anecdotes! What other
             | resources would you recommend(even the tidbits) could there
             | be for more modern Java? The original article like this one
             | should be treasured.
        
       | spoonsort wrote:
       | Having a @Stable annotation which the standard library can use to
       | signal things to the compiler like that is very smart. I wonder
       | how many standard libraries are missing out on optimizations like
       | this.
        
       | delusional wrote:
       | The example is entirely unconvincing. Why would you store those
       | calls in a map and not just a variable?
       | 
       | Even if the map is crucial for some reason, why not have the map
       | take a simple value (like a unint64) and require the caller to
       | convert their string into a slot before looking up the function
       | pointer. That way the cost to exchange the string becomes obvious
       | to the reader of the code.
       | 
       | I struggle to find a use case where this would optimize good
       | code. I can think of plenty of bad code usecases, but are we
       | really optimizing for bad code?
        
       ___________________________________________________________________
       (page generated 2025-05-02 23:00 UTC)