[HN Gopher] What should I know about garbage collection as a Jav...
___________________________________________________________________
What should I know about garbage collection as a Java developer?
Author : saikatsg
Score : 37 points
Date : 2023-01-10 05:38 UTC (17 hours ago)
(HTM) web link (www.azul.com)
(TXT) w3m dump (www.azul.com)
| turtledragonfly wrote:
| One thing that I hadn't fully understood until recently is
| that garbage collectors can actually allow you to write _more
| efficient_ code.
|
| Previously, I had the general understanding that you were trading
| convenience (not thinking about memory management or dealing with
| the related bugs) in exchange for performance (GC slows your
| program down).
|
| That's still true broadly, but there's an interesting class of
| algorithms where GC can give you a perf. improvement: immutable
| data structures, typically used in high-concurrency situations.
|
| Consider a concurrent hash map: when you add a new key, the old
| revision of the map is left unchanged (so other threads can keep
| reading from it), and your additions create a new revision. Each
| revision of the map is immutable, and your "changes" to it are
| really creating new, immutable copies (with tricks, to stay
| efficient).
|
| These data structures are great for concurrent performance, but
| there's a problem: how do you know when to clean up the memory?
| That is: how do you know when all users are done with the old
| revisions, and they should be freed?
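| A minimal sketch of the idea in Java (illustrative, not any
| library's actual implementation): readers take lock-free
| snapshots, writers publish a fresh immutable version with a CAS,
| and the GC reclaims old versions once the last reader drops them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Copy-on-write map: readers see an immutable snapshot, writers
// publish a new version atomically. Real persistent maps share
// structure between versions instead of copying everything.
public class CowMap<K, V> {
    private final AtomicReference<Map<K, V>> current =
            new AtomicReference<>(Map.of());

    public V get(K key) {
        return current.get().get(key); // lock-free snapshot read
    }

    public void put(K key, V value) {
        while (true) {
            Map<K, V> old = current.get();
            Map<K, V> next = new HashMap<>(old); // full copy for clarity
            next.put(key, value);
            if (current.compareAndSet(old, Map.copyOf(next))) {
                return; // the old version becomes garbage once
            }           // the last reader lets go of it
        }
    }
}
```

| Note there is no code anywhere that frees the old revision: the
| collector handles that, with no per-access bookkeeping.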
|
| Using something like a reference count adds contention to this
| high-concurrency data structure, slowing it down. Threads have to
| fight over updating that counter, so you have now introduced
| shared mutable state, which was the very thing you were trying to
| avoid.
|
| But if there's a GC, you don't have to think about it. And the GC
| can choose a "good time" to do its bookkeeping in bulk, rather
| than making all of your concurrent accesses pay a price. So, if
| done properly, it's an overall performance win.
|
| Interestingly, a performant solution without using GC is "hazard
| pointers," which are essentially like adding a teeny tiny garbage
| collector, devoted just to that datastructure (concurrent map, or
| whatever).
| tadfisher wrote:
| Well put. I find it fascinating to watch memory-safe runtimes
| converge on automatic memory management (via GC or ARC) and
| owner/borrower models. I'm just not sure which I like better,
| or if I'm thinking too imperatively.
| bob1029 wrote:
| > But if there's a GC, you don't have to think about it. And
| the GC can choose a "good time" to do its bookkeeping in bulk,
| rather than making all of your concurrent accesses pay a price.
| So, if done properly, it's an overall performance win.
|
| In many environments, you can explicitly force a GC collection
| from application code. I've got a few situations where
| explicitly running GC helps reduce latency/jitter, since I can
| decide precisely where and how often it occurs.
|
| In my environment, calling GC.Collect more frequently than the
| underlying runtime would on its own typically results in the
| runtime-induced collections taking less time (and occurring
| less frequently).
| But, there is a tradeoff in that you are stopping the overall
| world more frequently (i.e. every frame or simulation tick) and
| theoretical max throughput drops off as a result.
|
| Batching is the best way to do GC, but it is sometimes
| catastrophic for the UX.
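| (GC.Collect is the .NET spelling; the closest Java equivalent is
| System.gc(), which is only a hint to the JVM and can be ignored
| entirely under -XX:+DisableExplicitGC.) A toy sketch of the
| "collect at a chosen boundary" pattern, with a hypothetical
| simulateTick standing in for real per-frame work:

```java
// Request collections between ticks rather than letting them land
// mid-tick. This trades peak throughput for steadier latency, and
// only helps if the runtime honors the hint.
public class TickLoop {
    // Hypothetical tick: churns through short-lived allocations.
    static long simulateTick() {
        long allocated = 0;
        for (int i = 0; i < 64; i++) {
            byte[] scratch = new byte[16 * 1024]; // dies immediately
            allocated += scratch.length;
        }
        return allocated;
    }

    public static void main(String[] args) {
        for (int tick = 0; tick < 100; tick++) {
            simulateTick();
            if (tick % 10 == 0) {
                System.gc(); // hint: collect now, at a quiet point
            }
        }
    }
}
```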
| mike_hearn wrote:
| Yeah, but it's actually deeper than just adding refcounts. The
| algorithms themselves can change in some cases.
|
| The issue is, the hardware can usually only do
| atomic/interlocked operations at the word level. If you have a
| GC then you can atomically update a pointer from one thing to
| another and not think about the thing that was being pointed to
| previously: an object becomes unreachable atomically due to the
| guarantees provided by the GC (either via global pauses or
| write barriers or both). If you don't have that then you need
| to both update a pointer and a refcount atomically, which goes
| beyond what the hardware can easily do without introducing
| locks, but that in turn creates new problems like needing an
| ordering.
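| A concrete Java illustration of that single-word atomic update: a
| Treiber stack, where one compareAndSet on the head pointer is the
| entire synchronization story. (A sketch, not production code.)

```java
import java.util.concurrent.atomic.AtomicReference;

// Lock-free stack: one CAS on the head pointer per operation. The
// GC guarantees a popped node stays valid while any thread still
// references it, and since push() always allocates a fresh Node,
// a reclaimed node can never reappear at the head (no ABA problem).
// Without a GC you'd need hazard pointers, epochs, or refcounts.
public class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> n = new Node<>(value);
        do {
            n.next = head.get();
        } while (!head.compareAndSet(n.next, n));
    }

    public T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) return null; // empty stack
        } while (!head.compareAndSet(h, h.next));
        return h.value; // h becomes garbage when callers drop it
    }
}
```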
| zackangelo wrote:
| Most JVMs take advantage of a thread local "bump" allocator[1]
| as well to avoid having to cross JVM or kernel boundaries to
| allocate memory, which can result in huge speedups for memory-
| intensive use cases.
|
| [1] https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/
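| For intuition, a tight loop like this mostly turns each `new`
| into a pointer bump inside the current TLAB (no lock, no
| syscall). Comparing a run against -XX:-UseTLAB, or tracing with
| -Xlog:gc+tlab on a modern JDK, shows the effect; treat the exact
| flags as JDK-version-dependent.

```java
// Each allocation here is (usually) served by bumping the thread's
// TLAB pointer; the slow shared-heap path is taken only when the
// buffer is exhausted and a new TLAB must be requested.
public class AllocLoop {
    static long allocate(int count) {
        long checksum = 0;
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[32]; // typically a TLAB bump
            checksum += b.length;    // use b so the loop isn't dead code
        }
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println(allocate(1_000_000));
    }
}
```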
| eldenring wrote:
| Bump allocators are incredibly fast, and are super efficient
| in generational GCs where compaction is super cheap. However,
| almost all (maybe all) modern language runtimes avoid crossing
| the kernel boundary on a typical allocation, including C++'s
| malloc.
| Alifatisk wrote:
| I don't exactly know why, but I've always associated GC
| languages with slow performance. Today, I realized how wrong I
| was.
| mike_hearn wrote:
| Performance and GC is a tricky topic, partly because there's
| not so many GCd languages explicitly designed for performance
| above usability (maybe D would count? _maybe_ Go?). GC is
| normally chosen for usability reasons, and then the language
| has other usability features that reduce performance and it
| gets difficult to disentangle them. Immutability is a common
| problem: GC makes allocating lots of objects easy, so people
| make immutable types (e.g. Java's String type), and that
| forces you to allocate lots of objects, which causes lots of
| cache misses as the young-gen pointer constantly moves
| forward, and that slows everything down, whereas a C++ dev
| might shorten a string by just writing a NUL byte into the
| middle of it. Functional programming patterns are a common
| culprit because of their emphasis on immutability. You bleed
| performance in ways that don't show up on profiles because
| they're smeared out all over the program.
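| The String point in one small example: repeated concatenation
| allocates a fresh immutable String every iteration, while a
| StringBuilder mutates a single buffer in place.

```java
// Same result, very different allocation profiles.
public class Concat {
    static String naive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x"; // allocates a new String each iteration
        }
        return s;
    }

    static String buffered(int n) {
        StringBuilder sb = new StringBuilder(n); // one growing buffer
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }
}
```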
|
| Another complication is that people talk about the
| performance of languages, when often it's about the
| performance of an implementation. The most stunning example
| of this is TruffleRuby in which the GraalVM EE Ruby runtime
| often runs 50x faster or more than standard Ruby. Language
| design matters a lot, but how smart your runtime is matters a
| lot too.
|
| A final problem is that many people associate GC with
| scripting languages like Python, JavaScript, Ruby, PHP etc
| and they often have poor or non-existent support for multi-
| threading. So then it's hard to get good performance on
| modern hardware of course and that gets generalized to all GC
| languages.
| turtledragonfly wrote:
| Well, there's still truth to it in other cases, I think. One
| terrible thing GCs can do is make your performance
| _unpredictable_. In some performance-sensitive situations
| (eg: video games), your worst-case perf is more important
| than your average case. Adding a GC can mess with that worst-
| case behavior, and in unpredictable ways.
|
| That being said, modern GCs are much better (less "stop the
| world" stuff), and more configurable. But it's still a real
| concern.
___________________________________________________________________
(page generated 2023-01-10 23:00 UTC)