hngopher.com

       [HN Gopher] Go Does Not Need a Java Style GC
       ___________________________________________________________________
        
       Author : nahuel0x
       Score  : 84 points
       Date   : 2021-11-23 15:47 UTC (7 hours ago)
        
 (HTM) web link (erik-engheim.medium.com)
 (TXT) w3m dump (erik-engheim.medium.com)
        
       | natanbc wrote:
       | > In a multithreaded program, a bump allocator requires locks.
       | That kills their performance advantage.
       | 
       | Java uses per-thread pointer bump allocators[1]
       | 
       | > While Java does it as well, it doesn't utilize this info to put
       | objects on the stack.
       | 
       | Correct, but it does scalar replacement[2] which puts them in
       | registers instead
       | 
       | > Why can Go run its GC concurrently and not Java? Because Go
       | does not fix any pointers or move any objects in memory.
       | 
       | Most java GCs are concurrent[3], if you want super low pauses you
       | can get those too[4][5]. Pointers can get fixed while the
       | application is running with GC barriers
       | 
       | [1]: https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/
       | 
       | [2]: https://shipilev.net/jvm/anatomy-quarks/18-scalar-
       | replacemen...
       | 
       | [3]: https://shipilev.net/jvm/anatomy-quarks/3-gc-design-and-
       | paus...
       | 
       | [4]: https://wiki.openjdk.java.net/display/zgc/Main
       | 
       | [5]: https://wiki.openjdk.java.net/display/shenandoah
        
         | majou wrote:
         | ZGC[4] in particular has me excited, enough so to want to pick
         | up a JVM language.
        
           | Thaxll wrote:
           | ZGC and Shenandoah can be slower than G1; those are not
           | silver bullets. The fact that there are 4-5 GCs explains the
           | situation: there is not a single GC that is better than the
           | others.
           | 
           | It really depends on the workload.
        
             | geodel wrote:
             | Indeed. It is strange that no official JDK document puts
             | pros/cons of GCs packaged with standard JDKs in some kinda
             | easy-to-read table/matrix.
        
           | the-alchemist wrote:
           | ZGC is already available, since JDK 15, September 2020. =)
           | 
           | https://wiki.openjdk.java.net/display/zgc/Main#Main-
           | ChangeLo...
        
             | majou wrote:
             | Max pause times of 0.5ms is what got me really interested.
             | 
             | It feels like a huge trade-off of GCs is almost completely
             | gone.
             | 
             | https://malloc.se/blog/zgc-jdk16
        
               | silon42 wrote:
               | There's still a memory tradeoff, some due to GC, some due
               | to Java (lots of runtime reflection...). Guessing 2-4x.
        
               | marginalia_nu wrote:
               | This is a bit of a tangent, but you can get into
               | situations where Java's memory-overhead becomes pretty
               | untenable. I was in a situation of having to keep track
               | of ~1 billion short strings of a median length of maybe 7
               | characters.
               | 
               | In terms of just data, that should clock in at about 10
               | GB; in practice it was closer to 24 GB. I tried going
               | with just byte[]-instances instead, which didn't help a
               | lot. Using long byte[]-instances and indexing those
               | doesn't help as much as you'd think because they get
               | sliced up into small objects behind the scenes.
               | 
               | I ended up memory mapping blocks of memory and basically
               | implementing my own append-only allocator.
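               | 
               | A minimal sketch of that kind of packing, written in Go
               | rather than Java and backed by an ordinary slice instead
               | of a memory-mapped block (both are assumptions for
               | illustration; PackedStrings and its methods are
               | hypothetical names):

```go
package main

import "fmt"

// PackedStrings stores many short strings back to back in one byte slice,
// so the runtime tracks two large allocations instead of one small
// object per string.
type PackedStrings struct {
	data    []byte   // all string bytes, concatenated
	offsets []uint64 // offsets[i] is where string i starts; the last entry is len(data)
}

// NewPackedStrings returns an empty, append-only store.
func NewPackedStrings() *PackedStrings {
	return &PackedStrings{offsets: []uint64{0}}
}

// Append adds s to the end of the store and returns its index.
func (p *PackedStrings) Append(s string) int {
	p.data = append(p.data, s...)
	p.offsets = append(p.offsets, uint64(len(p.data)))
	return len(p.offsets) - 2
}

// Get copies string i back out of the packed buffer.
func (p *PackedStrings) Get(i int) string {
	return string(p.data[p.offsets[i]:p.offsets[i+1]])
}

// Len reports how many strings are stored.
func (p *PackedStrings) Len() int { return len(p.offsets) - 1 }

func main() {
	ps := NewPackedStrings()
	a := ps.Append("gopher")
	b := ps.Append("jvm")
	fmt.Println(ps.Get(a), ps.Get(b), ps.Len())
}
```

               | With this layout the bookkeeping per string is one
               | 8-byte offset instead of a separate heap object.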
        
               | jeeeb wrote:
               | FWIW. This would probably present a challenge in most
               | (all?) languages.
               | 
               | For example in libc++ due to SSO an std::string has a
               | minimum size of 24 bytes.
               | 
               | For a billion strings less than 15 chars (+ the null
               | byte) that gets you to 24GB, and that's optimistically
               | assuming each string is allocated in place.
               | 
               | I doubt heap allocated char* would do much better either.
               | Just having a billion 8 byte pointers eats a lot of
               | memory. You'd really need some sort of string packing
               | scheme similar to what you did in Java.
        
               | marginalia_nu wrote:
               | It's _a lot_ easier to build custom allocators in C++
               | though.
               | 
               | For one, Java has a maximum mmap-size of 2 GB, and as a
               | cherry on top of that turd, you have no control over
               | their lifecycle. The language is very clearly not
               | designed for this type of work, and if you try to make it
               | do it anyway, it fights you every step of the way.
        
               | kasperni wrote:
               | The foreign memory API which is currently incubating
               | should help with most of these limitations:
               | https://openjdk.java.net/jeps/419
        
               | hashmash wrote:
               | The Lilliput project aims to address this:
               | https://wiki.openjdk.java.net/display/lilliput
        
               | CyberDildonics wrote:
               | I would never write something like this in java, but to
               | be fair, a program shouldn't be written like this in the
               | first place. If you "need" a billion strings in memory
               | and you didn't design for that with something that would
               | scale better, you messed up a long time ago.
        
               | marginalia_nu wrote:
               | Huh, I didn't really have a problem solving the problem
               | as it came up. Like many scaling problems, it wasn't a
               | problem until it was. Then I fixed it. Now I have a
               | solution that can deal with ten times as many strings as
               | before. If I grow out of that one, I'll come up with a
               | better design.
               | 
               | I could have gotten 10 times as much hardware instead,
               | but that would be an incredible waste of money compared
               | to just spending a few days writing more hardware-
               | efficient code.
        
               | kaba0 wrote:
               | Also, throughput. But latency and throughput are almost
               | universally opposite ends of the same axis -- that's why
               | it's great that Java allows for choosing a GC
               | implementation.
        
               | masklinn wrote:
               | The GC memory overhead affects all languages with a GC
               | more advanced than refcounting. It certainly does affect
               | Go as well.
        
               | masklinn wrote:
               | > It feels like a huge trade-off of GCs is almost
               | completely gone.
               | 
               | FWIW the tradeoff of low latency GC is usually paid in
               | throughput.
               | 
               | That is definitely the case for Go, which can lag very
               | much behind allocations (so if your allocation pattern is
               | bad enough the heap will keep growing despite the _live_
               | heap being stable, because the GC is unable to clear the
               | dead heap fast enough for the new allocations).
        
         | sam_bishop wrote:
         | I agree. The author seems to know quite a bit about Go and GCs,
         | but doesn't seem to have much experience with Java. As a Java
         | performance engineer, it sounds like he is comparing Go to how
         | he thinks Java works based on what he's read about it.
        
           | pjmlp wrote:
           | Additionally he doesn't seem to know that much about C#,
           | which also has advanced GC, while allowing for C++ like
           | memory management, if needed.
        
         | mappu wrote:
         | "Scalar replacement" explodes the object into its class member
         | variables and does not construct a class object at all. That
         | does result in the exact same `sub %esp` (that Go would do for
         | any struct), but it is restricted to only working if every
         | single usage of that class type is fully inlined and the class
         | is never passed anywhere that needs it in its object form.
         | 
         | It's worse than what Go has. Go can stack-allocate any struct
         | and still pass pointers to it to non-inlined functions.
        
           | pkolaczk wrote:
           | Scalar replacement does not work even in very trivial cases:
           | https://pkolaczk.github.io/overhead-of-optional/
           | 
           | In all those cases, Optionals were inlined, didn't escape,
           | yet they haven't been properly optimized out.
        
       | madmax108 wrote:
       | As some of the other comments in the thread allude, this is quite
       | a rudimentary (or rather outdated) understanding of how Java GC
       | operates and ends up (unfortunately) turning an otherwise good
       | comparison into a straw-man argument.
       | 
       | As someone who's worked with Java from the days where "If you
       | want superhigh performance from Java without GC pauses, then just
       | turn off GC and restart your process every X hours" was
       | considered a "valid" way to run high-performance Java systems, I
       | think the changes Java has made to GC are among the biggest
       | improvements to the framework/JVM and have contributed vastly to
       | JVM stability and growth over the last decade.
        
         | geodel wrote:
         | Fundamentals of Java about data/class layouts in memory have
         | remained the same for decades. So the author is right about
         | the big picture.
         | 
         | > I think the changes Java has made to GC are among the biggest
         | improvements to the framework/JVM and have contributed vastly
         | to JVM stability and growth over the last decade.
         | 
         | This is of course true. However, the point is that for Java it
         | is an absolute necessity; for Go it may be a nice-to-have.
        
       | esarbe wrote:
       | > Java is a language that basically outsourced memory management
       | entirely to its garbage collector. This turned out to be a big
       | mistake.
       | 
       | Given that Java is not far behind C and C++ for many kinds of
       | workload and in quite some scenarios can outperform C code, I'm
       | not sure that I buy this line of reasoning.
       | 
       | > Doing these updates requires freezing all threads.
       | 
       | Eh, what? There are multi-threaded GCs, what is this article even
       | talking about?
       | 
       | > However, this does not put C# and Java on equal footing with
       | languages like Go and C/C++ in terms of memory management
       | flexibility
       | 
       | In which world are Go and C in the same league when it comes to
       | memory management flexibility? That's like comparing a language
       | that uses a Garbage Collector to one that requires manual memory
       | management. Because - it is.
       | 
       | > Modern Languages Don't Need Compacting GCs
       | 
       | > If need be, the Pacer slows down allocation while speeding up
       | marking.
       | 
       | So, you just traded in one drawback for another?
       | 
       | Heck, I get it. The JVM is not a thin graceful fawn. It's a
       | complicated beast that requires years of experience to tame it -
       | and even then it will come back from time to time and bite you.
       | There are many good points to critique the JVM - but I don't get
       | the feeling that the author of this article has spent much time
       | with modern JVMs because he's not pointing out any of them.
        
       | Yoric wrote:
       | I'm a bit skeptical.
       | 
       | Yes, Go has value types and pointers. But whether you need a
       | modern GC will undoubtedly depend a lot on the type of algorithms
       | you need to execute. Also, it's great that you can implement
       | (some form of) allocators, and that will definitely help for many
       | algorithms, but that's definitely a case of tradeoff between
       | convenience and readability. Similarly, unless I'm mistaken,
       | TCMalloc "solves" fragmentation in two cases: either allocations
       | are small (very common) or memory allocation maps neatly to
       | threads (much less common). That's two good cases to have, but I
       | wouldn't count on it solving memory allocation on its own for,
       | say, a browser engine or a videogame.
       | 
       | Oh, and hasn't Java's GC been fully concurrent for a while now?
       | 
       | That being said, revisiting the assumptions made by Java (and
       | other languages) is a very good idea.
        
       | seunosewa wrote:
       | I'm cautious about believing this sort of claim because I
       | remember reading about why Go doesn't need generics, yet here we
       | are waiting for Go Generics to be ready. However, the article
       | convincingly explains why Java has a greater need for a
       | compacting GC - it creates more garbage. This doesn't necessarily
       | mean Go won't benefit from having a generational, compacting GC
       | at some point, for some applications.
        
         | nemo1618 wrote:
         | > I remember reading about why Go doesn't need generics, yet
         | here we are waiting for Go Generics to be ready
         | 
         | Hey, us generics-naysayers are still out here (grumbling)! :)
        
         | mohanmcgeek wrote:
         | The JVM world right now has quite a few GCs and your choice
         | essentially dictates your program's performance tradeoffs.
         | 
         | But I'm not sure if there will ever be a point where this is
         | true for Go considering how little gets into the stack compared
         | to Java.
        
         | JulianMorrison wrote:
         | Go creates less garbage _and_ also stack allocates things using
         | escape analysis, which, as the article explains, is effectively
         | a form of generational garbage collection.
        
           | kaba0 wrote:
           | As far as I know (I'm not too familiar with Go), Go mostly
           | stack allocates based on the developer's intent, eg. by using
           | structs.
           | 
           | Java doesn't (yet) have an option for value types that can be
           | reliably stack allocated, so it resorts to very complex
           | escape analysis. Calling the former escape analysis is a bit
           | misleading imo, even if technically true.
        
             | vore wrote:
             | Go is a little more elaborate than that: if a value is
             | initialized as a pointer-to-struct (e.g. foo := &Foo{...})
             | and it doesn't escape the function, Go will allocate it as
             | if it were a value type.
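             | 
             | A small sketch of the difference, assuming a hypothetical
             | Point type. Building with `go build -gcflags=-m` should
             | print the compiler's escape decisions for each function:

```go
package main

import "fmt"

// Point is a hypothetical struct used only for this illustration.
type Point struct{ X, Y int }

// sumLocal takes the address of a struct literal, but the pointer never
// leaves the function, so the compiler is free to keep it on the stack.
func sumLocal() int {
	p := &Point{X: 1, Y: 2}
	return p.X + p.Y
}

// newPoint returns the pointer, so the value escapes the function.
// Unless the call is inlined into a caller that keeps it local, the
// Point is heap allocated.
func newPoint() *Point {
	return &Point{X: 3, Y: 4}
}

func main() {
	q := newPoint()
	fmt.Println(sumLocal(), q.X+q.Y)
}
```

             | Both functions use the same &Point{...} syntax; where the
             | value lives is decided by the compiler, not by the syntax.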
        
               | JulianMorrison wrote:
               | You can stack-allocate in C, with alloca() or by taking
               | the address of a local, and use it like a pointer. So
               | long as you're extremely sure nothing is going to hang
               | onto the pointer beyond the lifetime of that stack frame,
               | it's fine.
               | 
               | Same thing with Go, except that the compiler makes the
               | decision.
        
           | masklinn wrote:
           | However it's a weak one, the stack allocation acts as a form
           | of extremely limited nursery, but lots of "escaping" objects
           | could well fit into a nursery, to say nothing of "heap"
           | objects (like strings and slices) which always trigger heap
           | allocations.
           | 
           | Furthermore AFAIK most generational GCs have 3 generations,
           | not 2 (let alone 1.5).
           | 
           | It does make the tradeoff more complicated, a generational GC
           | is not simple (especially in a language with ubiquitous
           | mutability), but the casual dismissals are... troubling.
           | 
           | And that's before mentioning the regularly problematic lack
           | of tuning knobs of the Go GC, also often dismissed as "java
           | concerns" (which Go users have to work around using ugly
           | hacks when hit, because they don't have tuning knobs).
        
             | geodel wrote:
             | > Go users have to work around using ugly hacks when hit,
             | because they don't have tuning knobs
             | 
             | Yeah, Java users just have to hire Java performance tuning
             | experts from sprawling Java perf consulting cottage
             | industry. Can't get much simpler than that.
        
         | runevault wrote:
         | Even if it doesn't specifically need a compacting GC, having a
         | swappable, more tunable one might still be a big deal for
         | people who need something other than the current GC. A simple
         | example is the Discord article about how their use case could
         | not work with Go because it always ran the GC every two
         | minutes. If they could optionally disable that feature they
         | theoretically could have kept using Go instead of rewriting in
         | Rust.
        
           | lostcolony wrote:
           | That wasn't my read.
           | 
           | The Discord team that wrote the article I believe you're
           | referencing ( https://discord.com/blog/why-discord-is-
           | switching-from-go-to... ) wanted GC, the problem was that
           | they had a huge cache which took a long time for the GC to
           | scan. They could shrink the cache and it would remain
           | performant in the face of GC, but they took latency hits due
           | to increased cache misses. After a bunch of testing and
           | tweaking they found a goldilocks zone where performance was
           | okay.
           | 
           | Rust was introduced because it was already something other
           | teams were interested in, and when they tried a quick
           | prototype, they avoided the issue entirely, and saw better
           | performance even in the prototype than their finely tuned Go
           | code, so they switched to it.
        
           | pkaye wrote:
           | You can disable the GC in go.
           | https://pkg.go.dev/runtime/debug#SetGCPercent
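           | 
           | A rough sketch of how that call might be wrapped.
           | withGCDisabled and currentGCPercent are hypothetical helpers;
           | only debug.SetGCPercent and runtime.GC come from the standard
           | library. Per the package docs, a negative percentage
           | effectively disables collection unless a memory limit is
           | reached:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// currentGCPercent reads the GC percentage. The runtime only exposes a
// setter, so this sets a value and immediately restores the old one.
// It is not safe against concurrent callers changing the setting.
func currentGCPercent() int {
	old := debug.SetGCPercent(-1)
	debug.SetGCPercent(old)
	return old
}

// withGCDisabled runs fn with automatic collection switched off and
// restores the previous setting afterwards. It returns that setting.
func withGCDisabled(fn func()) int {
	old := debug.SetGCPercent(-1) // negative percentage disables GC
	defer debug.SetGCPercent(old)
	fn()
	return old
}

func main() {
	old := withGCDisabled(func() {
		// Allocation-heavy work would go here.
		buf := make([]byte, 1<<20)
		_ = buf
	})
	fmt.Println("previous GC percent:", old)
	fmt.Println("restored GC percent:", currentGCPercent())

	runtime.GC() // an explicit collection can still be requested
}
```

           | Whether this would have helped in the Discord case is a
           | separate question; the sketch only shows the API.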
        
           | Andys wrote:
           | The go gc was greatly improved in the versions subsequent to
           | the ones used by Discord. The timing there was unfortunate.
        
             | geodel wrote:
             | Besides sometimes engineers just want to use Rust.
        
       | hopsas wrote:
       | Do not add articles behind pay wall.
        
       | avita1 wrote:
       | Cool article, I'm not sure I agree with the headline.
       | 
       | I used to write low-scale Java apps, and now I write memory
       | intensive Go apps. I've often wondered what would happen if Go
       | _did_ have a JVM style GC.
       | 
       | It's relatively common in Go to resort to idioms that let you
       | avoid hitting the GC. Some things that come to mind:
       | 
       | * all the tricks you can do with a slice that have two slice
       | headers pointing to the same block of memory [1]
       | 
       | * object pooling, something so common in Go it's part of the
       | standard library [2]
       | 
       | Both are technically possible in Java, but I've never seen them
       | used commonly (though in fairness I've never written performance
       | critical Java.) If Go had a more sophisticated GC, would these
       | techniques be necessary?
       | 
       | Also Java is supposed to be getting value types soon (tm) [3]
       | 
       | [1] https://ueokande.github.io/go-slice-tricks/
       | 
       | [2] https://pkg.go.dev/sync#Pool
       | 
       | [3] https://openjdk.java.net/jeps/169
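       | 
       | For reference, a minimal sketch of the pooling idiom from [2].
       | bufPool and greet are hypothetical names; sync.Pool and
       | bytes.Buffer are the real standard library types:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so a hot path does not allocate a
// fresh bytes.Buffer on every call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet formats a message using a pooled buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled objects keep their old contents; clear before use
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(greet("gopher"))
	fmt.Println(greet("jvm"))
}
```

       | The pool may drop idle objects at any time, so it only reduces
       | allocations; it does not guarantee reuse.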
        
         | bestinterest wrote:
         | How have you found Go in contrast to Java. Is the simplicity
         | worth it?
        
         | munificent wrote:
         | _> Both are technically possible in Java, but I've never seen
         | them used commonly (though in fairness I've never written
         | performance critical Java.)_
         | 
         | I don't know about the Java world, but in C#--especially in
         | games written in Unity--object pooling is very common.
        
         | jillesvangurp wrote:
         | Java has a pretty decent standard library with different list,
         | map and set implementations and quite a few third party
         | libraries with yet more data structures. Honestly, Go felt a
         | bit primitive and verbose to me on that front on the few times
         | I used it. Simplicity has a price and some limitations.
         | 
         | There are also other tricks you can do like for example using
         | off heap memory (e.g. Lucene does this), using array buffers,
         | or using native libraries. There obviously is a lot of very
         | memory intensive, widely used software written for the JVM and
         | no shortage of dealing with all sorts of memory related
         | challenges. I'd even go as far as to argue that quite a few of
         | those software packages might be a little out of the comfort
         | zone for Go. Maybe if it were used more for such things, there
         | would be increased demand for better GCs as well?
         | 
         | Object pooling is pretty common for things like connection
         | pools. For example apache commons pool is used for doing
         | connection pooling (database, http, redis, etc.) in Spring Boot
         | and probably a lot more products. Also there are thread pools,
         | worker pools and probably quite a few more that are pretty
         | widely used and quite a few of those come with the Java
         | standard library. Caching libraries are also pretty common and
         | well supported by popular web frameworks like Spring.
         | 
         | A typical Java based search or database software product
         | (Elasticsearch, Kafka, Cassandra, etc.) is likely to use all of
         | the above. Likewise for things like Hadoop, Spark, Neo4j, etc.
         | 
         | Of course there's a difference between Java the language and
         | the JVM, which is also targeted by quite a few other languages.
         | For example, I've been using Kotlin for the last few years.
         | There are functional languages like Scala and Clojure. And
         | people even run scripting languages on it, such as Jython,
         | JRuby, Groovy, or JavaScript.
         | 
         | There even have been some attempts to make Go run on the JVM.
         | Apparently performance, concurrency and memory management were
         | big motivators for attempting that (you know, stuff the JVM
         | does at scale): https://githubmemory.com/repo/golang-
         | jvm/golang-jvm
         | 
         | Their pitch: "You can use go-jvm simply as a faster version of
         | Golang, you can use it to run Golang on the JVM and access
         | powerful JVM libraries such as highly tuned concurrency
         | primitives, you can use it to embed Golang as a scripting
         | language in your Java program, or many other possibilities."
        
           | fl0ki wrote:
           | Not sure if you're in on the joke, but for those who didn't
           | go to the repo itself:
           | 
           | https://github.com/golang-jvm/golang-jvm
           | 
           | It's just a copy-paste of JRuby on April 1st and the readme
           | now includes a rickroll.
           | 
           | Maybe it's irresponsible of them to leave it up in a way that
           | Google still finds as a legitimate-looking search result.
        
           | geodel wrote:
           | > There even have been some attempts to make Go run on the
           | JVM. Apparently performance, concurrency and memory
           | management were big motivators for attempting that (you know,
           | stuff the JVM does at scale):
           | 
           | This seems legit. Just links to their website/Wiki are not
           | working right now.
        
         | sam_bishop wrote:
         | Pooling objects (for the purposes of minimizing GC) is
         | considered a bad practice in modern Java. The article suggests
         | that compacting, generational collectors are a bad thing, but
         | they can dramatically reduce the time it takes to deallocate
         | memory if most of the objects in a given region of memory are
         | now dead. All you have to do is move out the objects that are
         | still alive, and you're done: that region is now available for
         | use again. The result is that long-lived objects have a
         | greater overhead.
        
         | papercrane wrote:
         | Object pooling in Java used to be fairly common. I don't see it
         | much anymore in new code, but used to run into it all the time
         | when writing code for Java 1.4/5. Even Sun used pooling when
         | they wrote EJBs. Individual EJBs can be recycled instead of
         | released to the GC.
         | 
         | Nowadays the GC implementations are good enough that it's not
         | worth the effort and complexity.
         | 
         | Though now that I think about it Netty provides an object
         | pooling mechanism.
        
       | maxpert wrote:
       | Low Java knowledge, and more fan babble than true critique.
        
       | dragontamer wrote:
       | > In a multithreaded program, a bump allocator requires locks.
       | That kills their performance advantage.
       | 
       | Wait, what?
       | 
       | What's wrong with:
       | 
       |     #include <atomic>
       |     #include <cstddef>
       | 
       |     constexpr size_t HEAP_SIZE = 0x1000000;
       |     char theHeap[HEAP_SIZE];
       |     std::atomic<size_t> bumpPtr{0};
       | 
       |     void* bump_malloc(size_t size) {
       |         size_t returnCandidate =
       |             bumpPtr.fetch_add(size, std::memory_order_relaxed);
       |         if (returnCandidate + size > HEAP_SIZE) {
       |             // Garbage collect. Super complicated, let's ignore
       |             // it lol. Once garbage collect is done, try to malloc
       |             // again. If fail then panic().
       |             return nullptr;  // placeholder: no GC in this sketch
       |         }
       |         return &theHeap[returnCandidate];
       |     }
       | 
       | ------
       | 
       | You don't even need acq_release consistency here, as far as I can
       | tell. Even purely relaxed memory ordering seems to work, which
       | means you definitely don't need locks or even memory barriers.
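       | 
       | The same lock-free bump idea, sketched in Go under a few
       | assumptions: Arena and heapSize are hypothetical, exhaustion
       | returns nil instead of collecting, and overflow of the offset is
       | ignored. Go's sync/atomic has no relaxed ordering (its
       | operations are sequentially consistent), so this does not
       | demonstrate the relaxed-ordering part:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const heapSize = 1 << 20

// Arena is a fixed-size bump allocator. The only shared state is the
// bump offset, which is advanced with a single atomic add.
type Arena struct {
	next uint64 // first field, so it stays 64-bit aligned
	heap [heapSize]byte
}

// Alloc reserves size bytes and returns them, or nil when the arena is
// exhausted (where a real collector would run and then retry).
func (a *Arena) Alloc(size uint64) []byte {
	end := atomic.AddUint64(&a.next, size) // returns the new offset
	if end > heapSize {
		return nil
	}
	return a.heap[end-size : end : end]
}

func main() {
	arena := new(Arena)
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				arena.Alloc(16) // no lock on the hot path
			}
		}()
	}
	wg.Wait()
	fmt.Println("bytes handed out:", atomic.LoadUint64(&arena.next))
}
```

       | Every goroutine still contends on one counter, which is the
       | cost that per-thread allocation buffers are meant to avoid.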
       | 
       | The exception is maybe the garbage-collect routine. Locking for
       | garbage collect is probably reasonable (other programmers accept
       | that garbage collection is heavy and may incur large running +
       | synchronization costs), but keeping locks outside of the "hot
       | path" is the goal.
       | 
       | ------
       | 
       | This is what I do with my GPU test programs, where locking is a
       | very, very bad idea (but possible to do). Even atomics are kinda-
       | bad in the GPU world but relaxed-atomics are quite fast.
       | 
       | -------
       | 
       | > In Java, this requires 15 000 separate allocations, each
       | producing a separate reference that must be managed.
       | 
       | Wait, what?
       | 
       |     points[] array = new points[15000];
       | 
       | This is a singular alloc in Java. 15000 _constructors_ will be
       | called IIRC (My Java is rusty though, someone double-check me on
       | that), but that's not the same as 15000 allocs being called, not
       | by a long shot.
       | 
       | ------
       | 
       | Frankly, it seems like this poster is reasonably good at
       | understanding Go performance, but doesn't seem to know much about
       | Java performance or implementations.
        
         | GreenToad wrote:
         |     Point[] array = new Point[15000];
         | 
         | In Java this creates an array of null references; to fill it
         | you need to create each object, so the article's point is
         | valid.
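         | 
         | For comparison, a sketch of the two layouts in Go, assuming a
         | hypothetical Point struct. A slice of values is one block; a
         | slice of pointers mirrors the Java case:

```go
package main

import "fmt"

// Point is a hypothetical value type used for illustration.
type Point struct{ X, Y float64 }

// makeValues allocates one contiguous block holding n zeroed Points.
func makeValues(n int) []Point {
	return make([]Point, n)
}

// makePointers mirrors the Java layout: one block of n pointers, plus
// one separately created Point per element.
func makePointers(n int) []*Point {
	ps := make([]*Point, n) // all nil, like new Point[n] in Java
	for i := range ps {
		ps[i] = &Point{}
	}
	return ps
}

func main() {
	vals := makeValues(15000)
	ptrs := makePointers(15000)
	vals[0].X = 1.5 // usable immediately, no per-element construction
	fmt.Println(len(vals), vals[0].X, ptrs[0] != nil)
}
```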
        
           | dragontamer wrote:
           | I stand corrected on that point.
        
       | jeffbee wrote:
       | There are very few falsifiable statements in this article but the
       | extant ones are all demonstrably false, starting with this
       | whopper:
       | 
       | "This typically causes Java programs to have complete freezes of
       | several hundred milliseconds where objects get moved around"
       | 
       | Yeah, I mean, definitely not. The only language I regularly work
       | with that has this property is, surprise, Go.
       | 
       | The proof is in the tasting, as they say. Go look at the gRPC
       | daily benchmarks if you want hard data. Java beats Go latency at
       | the median and at the tail while having substantially better
       | specific throughput. In other words the Java GC has no tradeoff
       | versus the Go GC; it is better in _every way_.
        
         | feffe wrote:
         | To be fair, early Java definitely had lots of issues with GC
         | and "hanging" programs due to GC activity. It's a failure of
         | the article to bring it up if modern Java implementations are
         | better on that point. As an example of bad reputation, I think
         | the latest Eclipse IDE on the latest Java SDK is still sucky
         | and because of my ignorance I blame it on Java.
         | 
         | I was disappointed with the first Go gRPC implementation, it
         | allocated like crazy and was really slow because of it. It's
         | been rewritten since then but I don't know how much better it
         | is now. To get good performance out of Go it is important to
         | think about memory allocations (unfortunately), just as it is
         | in C and C++. Although allocating like crazy in C/C++ will
         | probably mostly result in a general loss of performance, not
         | affect tail latencies in network protocols much.
        
         | Thaxll wrote:
         | The Java gRPC SDK is heavily optimized by Google, much more
         | than the Go one. Until recently Java pauses were a real issue.
         | 
         | I worked with a lot of Java-based solutions (Elasticsearch,
         | Hadoop, etc.) and was never impressed by the GC.
        
           | jeffbee wrote:
           | Maybe, but it's still not possible that the Java test has a
           | median latency of 150us, a tail latency of 350us, and a
           | regular need to stop the world for hundreds of milliseconds.
           | That statement is simply not compatible with reality.
        
             | Thaxll wrote:
             | As I said, the Java SDK is heavily optimized to do the bare
             | minimum of allocations. I mean, if you don't allocate much
             | the GC is not really an issue.
        
               | jeffbee wrote:
               | OK but again, that is a statement totally contrary to the
               | article. The article is claiming two things: that it is
               | easier in Go to avoid the heap while Java is utterly
               | dependent on allocating everything on the heap - which is
               | not consistent with the experience of actual Java and Go
               | programmers - and that Java has a "preference for high
               | throughput and high latency" which it doesn't.
        
               | Yoric wrote:
               | FWIW, I used to write some high-performance JavaScript.
               | One of the tricks of the trade was to allocate early and
               | make sure you avoid allocating once the loading phase has
               | started, hence avoiding triggering the garbage-collector
               | (in most runtimes/languages, gc phases are typically
               | triggered by the allocator). This involved writing very
               | non-idiomatic code, to a large extent reimplementing a
               | form of high-level custom allocator on top of the
               | existing allocator/gc, but it worked.
               | 
               | I'd be very surprised if the same wasn't possible in
               | Java.
               | 
               | I don't know if this is how gRPC is written but it
               | _could_ explain an apparent contradiction.
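
A minimal sketch of the "allocate early, reuse afterwards" trick described above, written in Go rather than JavaScript. The `Message` and `Pool` types are hypothetical names invented for this illustration; nothing here is taken from gRPC or any real library. All heap allocation happens once in `NewPool`, and the steady-state loop only recycles slots, so it gives the collector nothing new to trace.

```go
package main

import "fmt"

// Message is a hypothetical payload type used only for this sketch.
type Message struct {
	ID   int
	Body [256]byte
}

// Pool is a fixed-size free list: every Message is allocated once, up front.
type Pool struct {
	free []*Message
}

// NewPool performs all heap allocation during the "loading phase".
func NewPool(n int) *Pool {
	backing := make([]Message, n) // one bulk allocation for all slots
	p := &Pool{free: make([]*Message, 0, n)}
	for i := range backing {
		p.free = append(p.free, &backing[i])
	}
	return p
}

// Get hands out a preallocated Message, or nil if the pool is exhausted.
func (p *Pool) Get() *Message {
	if len(p.free) == 0 {
		return nil
	}
	m := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return m
}

// Put resets a Message and returns its slot to the pool for reuse.
func (p *Pool) Put(m *Message) {
	*m = Message{}
	p.free = append(p.free, m)
}

func main() {
	pool := NewPool(4)
	for i := 0; i < 1000; i++ {
		m := pool.Get() // no new heap allocation here
		m.ID = i
		pool.Put(m)
	}
	fmt.Println("free slots after loop:", len(pool.free))
}
```

The cost is the non-idiomatic code the comment mentions: callers must remember to call `Put`, and must not keep using a `Message` after returning it. Go's standard library also offers `sync.Pool`, which has different semantics (pooled items may be dropped by the runtime), so it is not a drop-in replacement for this fixed pool.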
        
           | kaba0 wrote:
           | While I agree that the java gRPC lib is better maintained, I
           | don't agree with your second point. Java's GCs are the state
           | of the art, that can manage heap sizes up to 16 _tera_ bytes.
           | Other GC implementations would simply die there.
        
             | throwaway894345 wrote:
             | That's neat, but a lot of applications will never need
             | 16TB.
        
       | hashmash wrote:
       | The author doesn't really understand how Java escape analysis
       | works, and just focuses on one key aspect: "It does not replace a
       | heap allocation with a stack allocation for objects that do not
       | globally escape."
       | 
       | The author then implies that escape analysis is only used to
       | reduce lock acquisition. Java escape analysis will replace a heap
       | allocation with a stack allocation if the code is fully inlined.
       | This is known as scalar replacement.
        
       | pjmlp wrote:
       | I guess it is the same reasoning like Go not needing generics.
        
         | geodel wrote:
         | No, it is the same reasoning as Java not needing dense memory
         | layouts
        
       | dandotway wrote:
       | The binary-trees benchmark on The Debian Language Shootout[1]
       | involves allocating millions of short-lived trees and traversing
       | them. It is informative about GC performance even with the caveat
       | that there are 'lies, damned lies, and benchmarks', because many
       | real-world graph analysis and brute force tree search algorithms
       | similarly allocate zillions of short-lived nodes. For non-GC
       | languages like C/C++/Rust it gives a decent idea of the
       | performance difference between malloc'ing and freeing individual
       | objects vs doing bulk arena allocations:
       | 
       |     language         secs       GC'd language?
       |     ========         ====       ==============
       |     C++ (g++)        0.94
       |     Rust             1.09
       |     C (gcc)          1.54
       |     Free Pascal      1.99
       |     Intel Fortran    2.38
       |     Java             2.48       yes  <====
       |     Lisp (SBCL)      2.84       yes
       |     Ada (GNAT)       3.12
       |     OCaml            4.68       yes
       |     Racket           4.81       yes
       |     C# .NET          4.81       yes
       |     Haskell (GHC)    5.02       yes
       |     Erlang           5.19       yes
       |     F# .NET          6.06       yes
       |     Node.js          7.20       yes
       |     Julia            7.43       yes
       |     Chapel           7.96       yes
       |     Dart             9.90       yes
       |     Go              12.23       yes  <====
       |     Swift           16.15       * "Automatic Reference Counting"
       |     Smalltalk (VW)  16.33       yes
       |     PHP             18.64       yes
       |     Ruby            23.80       yes
       |     Python 3        48.03       yes
       |     Lua             48.15       yes
       |     Perl            53.02       yes
       | 
       | So Java has the fastest GC for this test, 2.48 secs vs 12.23 secs
       | for Golang. The Java code is also notably perfectly idiomatic for
       | multicore, it doesn't do heroic "avoid GC by writing C-like code
       | manipulating a fixed global memory array" tricks. The Java code
       | is also more concise.
       | 
       | The 'plain C' code that uses Apache Portable Runtime memory pools
       | instead of standard malloc/free and uses OpenMP #pragma's strikes
       | me as more 'heroic' than 'idiomatic', whereas C++ and Rust use
       | standard libraries/crates and idiomatic patterns. (Note that
       | OpenMP is 'standard' for high-performance C and well supported
       | across GCC/LLVM/Microsoft/Intel compilers. But still....)
       | 
       | OCaml and Haskell made impressive showings for functional
       | languages which are in practice the easiest for dealing with
       | complicated tree algorithms, which is perhaps why the formally
       | verified C compiler, CompCert, is implemented in OCaml, as is
       | Frama-C for formally verifying ISO C programs, as is the Coq
       | theorem prover, etc.
       | 
       | [1] https://benchmarksgame-
       | team.pages.debian.net/benchmarksgame/... [Edited link]
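
To make the "individual objects vs bulk arena allocations" distinction concrete, here is a small Go sketch. It is not the benchmark program from the linked site; the `Node`, `Arena`, `buildHeap` and `buildArena` names are made up for this illustration. `buildHeap` allocates each tree node as its own garbage-collected object, while `buildArena` carves nodes out of one preallocated slice, so the whole tree is a single allocation that is released as a unit.

```go
package main

import "fmt"

// Node is a bare binary-tree node, as in the binary-trees workload.
type Node struct {
	Left, Right *Node
}

// buildHeap allocates every node individually; each is a separate heap object.
func buildHeap(depth int) *Node {
	if depth == 0 {
		return &Node{}
	}
	return &Node{Left: buildHeap(depth - 1), Right: buildHeap(depth - 1)}
}

// Arena hands out nodes from one preallocated slice (a simple bump allocator).
type Arena struct {
	nodes []Node
	next  int
}

// NewArena reserves room for exactly capacity nodes in a single allocation.
func NewArena(capacity int) *Arena {
	return &Arena{nodes: make([]Node, capacity)}
}

// alloc returns the next unused node; it panics if the arena is full.
func (a *Arena) alloc() *Node {
	n := &a.nodes[a.next]
	a.next++
	return n
}

// buildArena builds the same tree shape as buildHeap, using arena nodes.
func buildArena(a *Arena, depth int) *Node {
	n := a.alloc()
	if depth > 0 {
		n.Left = buildArena(a, depth-1)
		n.Right = buildArena(a, depth-1)
	}
	return n
}

// count walks the tree and returns the number of nodes.
func count(n *Node) int {
	if n == nil {
		return 0
	}
	return 1 + count(n.Left) + count(n.Right)
}

func main() {
	const depth = 10
	total := 1<<(depth+1) - 1 // node count of a perfect binary tree
	fmt.Println(count(buildHeap(depth)), count(buildArena(NewArena(total), depth)), total)
}
```

This sketch only shows the two allocation styles; it makes no timing claim. Measuring the difference would need a proper benchmark harness, and the numbers in the table above come from the linked site, not from this code.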
        
         | geodel wrote:
         | > So Java has the fastest GC for this test, 2.48 secs vs 12.23
         | secs for Golang.
         | 
         | Further, not mentioning the memory used by the Java/Go programs
         | makes it a very fair comparison, because GC perf does not
         | depend on memory allocated.
        
           | dandotway wrote:
           | Right now in the datacenter CPU usage is considerably more
           | expensive than RAM usage. RAM consumes comparatively little
           | power, whereas burning hot CPUs+GPUs are the reason
           | datacenters are favored near cooling water and power
           | stations. 2.48 vs 12.23 seconds for Java and Go is a big deal
           | for how many solar panels or tons of coal are needed to run
           | an app on Xeon or Epyc instances, whereas 1.7GB vs 0.4GB for
           | Java and Go, a 4x difference in low-power memory usage, is
           | not so big of a deal.
           | 
           | At any rate I did link to the full table so everyone can see
           | the mem usage, source listings, etc.
        
         | [deleted]
        
       | bitwize wrote:
       | What it needs is Rust style static lifetime management.
        
         | Yoric wrote:
         | As a Rust developer, contributor and generally a big fan of
         | Rust, I'd tend to disagree. Having a garbage-collector is a
         | lifesaver for many algorithms. Static lifetime management is
         | great but isn't something you want to force developers to use
         | when they're more interested in coming up with solutions
         | quickly.
         | 
         | Also, I really don't see how it would work in Go. You require a
         | pretty strong type system to make static lifetime management
         | work. Unless something has changed radically in the recent
         | past, that's not something that the Go community had much
         | interest in (yes, I know that generics are one step in that
         | direction, but I'm not aware of further steps being discussed).
        
           | AtlasBarfed wrote:
           | If you're at the point that serious GC performance is being
           | examined, then a rewrite to Rust is something that should be
           | considered on the strategic roadmap if it is code you have
           | control over (versus third party / OSS software).
           | 
           | I think Go has a maximum expansion footprint. People
           | presumably use Go because it is faster and has a better GC
           | than Java. Rust will probably eat a lot of that territory. That
           | will leave people that like it strictly for
           | language/idiomatic reasons, and that won't be enough.
           | 
           | The entire meta-point of the post is to try to argue that Go
           | is "better" than Java GC-wise. Well, it is and it isn't in
           | reality, as benchmarks and people in the know have said. If
           | it is "better" it is a very unconvincing win.
           | 
           | As someone wise said 10 years ago, you almost need 10x the
           | performance to have a convincing improvement to get laypeople
           | to notice, and to get dev people to consider switching from
           | legacy/entrenched ways of doing things.
           | 
           | Here, there's basically no real world improvement to point
           | to.
        
           | IceWreck wrote:
           | > You require a pretty strong type system to make static
           | lifetime management work
           | 
           | Go is strongly and statically typed.
        
             | Yoric wrote:
             | It's a spectrum. Rust is further than Go in the direction
             | of "strong type system", and somewhat lateral wrt Haskell,
             | F# or OCaml, for instance (each is more strongly typed in
             | different directions). Idris or Coq are further than Rust
             | in most directions, etc.
        
             | lucian1900 wrote:
             | For some value of "strongly". It has nil and lacks sum
             | types, so it is comparatively much weaker than even
             | Rust/Swift.
        
             | masklinn wrote:
             | Go's type system is much, much weaker than what is needed
             | for safe static lifetime management to be even remotely
             | close to workable.
             | 
             | Static typing is not an on-off, there are statically typed
             | languages with extremely weak type systems (e.g. C),
             | languages with somewhat weak type systems (e.g. Java, Go),
             | languages with strong type systems (e.g. OCaml, Haskell)
             | and languages with extremely strong type systems (e.g. ATS,
             | Idris).
             | 
             | And that's a simplification because it's not linear either.
        
             | remexre wrote:
             | Rust's lifetimes need full subtyping, with covariant and
             | contravariant type constructors (though I don't think it
             | supports annotations for them, and always infers them
             | instead), which I think would make the generics
             | implementation quite a bit more complicated than it
             | currently is...
        
               | Yoric wrote:
               | Yeah, the fact that co/contravariant type constructors
               | aren't (or aren't always?) annotated has always felt a
               | bit awkward to me. Ah, well, it works :)
        
       | Ericson2314 wrote:
       | This is mostly stupid. Being able to have many GCs is a good
       | thing. The big reason for "value types" is controlling spatial
       | locality in memory, not GCs being bad per se.
       | 
       | Also, they undersell the java/c# situation. C# has "ref, out, or
       | in", but even without those, you can always make a reference
       | wrapper that has the value type as a field. So "reference types
       | suck because copying" is nonsense garbage.
        
         | throwaway894345 wrote:
         | I mean, I've dabbled in all of these languages, and I much
         | prefer Go's value types to ref, out, in, and a half dozen GCs.
         | It's nice that these VM languages have a distinct thing for
         | every eventuality, but I much prefer a single thing that works
         | 99% of the time.
        
           | Ericson2314 wrote:
           | I _usually_ am bashing Go, but I am not this time. :D
           | 
           | I am saying this blog author is confused on what the runtimes
           | are capable of. I don't mean to say ref out and in are good
           | language ergonomics or whatever, but simply that they show
           | the compilation target is capable of expressing these
           | things.
           | 
           | The only thing I know of that the go runtime can do that
           | these others can't is "interior pointers", i.e. pointers to a
           | field of a larger object. You can always just copy the field
           | into a new box, but that breaks mutation semantics. Java gets
           | away with this precisely because most fields are themselves
           | boxed...but that's exactly the no-control-over-locality
           | problem we're trying to solve.
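
The "interior pointer" behaviour and the mutation-semantics problem with copying can be sketched in a few lines of Go. The `Outer` and `Inner` types and the two helper functions are invented for this illustration. `viaInteriorPointer` takes the address of a field stored inline in a larger struct, so writes go to the original object; `viaCopy` copies the field into a fresh box, so writes only touch the copy.

```go
package main

import "fmt"

// Inner is a small value embedded directly inside Outer.
type Inner struct{ N int }

// Outer stores Inner inline, not behind a pointer.
type Outer struct {
	Name  string
	Inner Inner
}

// viaInteriorPointer mutates o.Inner through a pointer into the middle of o.
// While such a pointer is reachable, the GC must keep the whole Outer alive.
func viaInteriorPointer(o *Outer, n int) {
	p := &o.Inner
	p.N = n
}

// viaCopy copies the field into a fresh box and mutates only the copy.
func viaCopy(o *Outer, n int) *Inner {
	box := new(Inner)
	*box = o.Inner
	box.N = n
	return box
}

func main() {
	o := &Outer{Name: "outer"}

	viaInteriorPointer(o, 42)
	fmt.Println(o.Inner.N) // prints 42: the write reached o

	box := viaCopy(o, 7)
	fmt.Println(o.Inner.N, box.N) // prints 42 7: o is unchanged by the copy
}
```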
        
       ___________________________________________________________________
       (page generated 2021-11-23 23:01 UTC)