[HN Gopher] A Guide to the Go Garbage Collector
___________________________________________________________________
A Guide to the Go Garbage Collector
Author : ibraheemdev
Score : 149 points
Date : 2022-07-15 15:44 UTC (7 hours ago)
(HTM) web link (tip.golang.org)
(TXT) w3m dump (tip.golang.org)
| cube2222 wrote:
| This is a really great guide! Nice to have something official and
| in-depth.
|
| I have two tips I can share based on my experience optimizing
| OctoSQL[0].
|
| First, some applications might have a fairly constant live heap
| size at any given point in time, but do a lot of allocations
| (like OctoSQL, where each processed record is a new allocation,
| but they might be consumed by a very-slowly-growing group by). In
| that case the GC threshold (which is based on the last live heap
| size) can be low and result in very frequent garbage collection
| runs, even though your application is using just megabytes of
| memory. If so, using debug.SetGCPercent at startup to raise that
| threshold to roughly 10x the live heap size will yield enormous
| performance benefits, while sacrificing very little memory.
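|
| Roughly, that startup tweak looks like this (the value 1000 is
| just an illustrative stand-in for "about 10x the live heap";
| tune it to your workload):
|
|     package main
|
|     import "runtime/debug"
|
|     func main() {
|         // Default GOGC is 100: a collection triggers once the heap
|         // has grown 100% over the live set left by the previous
|         // cycle. Raising it to 1000 lets the heap grow by ~10x the
|         // live set between collections, trading a bounded amount of
|         // extra memory for far fewer GC runs.
|         debug.SetGCPercent(1000)
|
|         // ... rest of the application ...
|     }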
|
| Second, even if the CPU profiler tells you the GC is consuming a
| lot of time, that doesn't mean it's taking it away from your app,
| especially if the app itself is single-threaded (the GC can run
| on otherwise idle cores). `go tool trace` can give you a much
| better overview of how computationally intensive and problematic
| the GC really is, even though reading it takes some getting used
| to.
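|
| A minimal way to capture such a trace is with runtime/trace (the
| file name here is arbitrary):
|
|     package main
|
|     import (
|         "os"
|         "runtime/trace"
|     )
|
|     func main() {
|         f, err := os.Create("trace.out")
|         if err != nil {
|             panic(err)
|         }
|         defer f.Close()
|
|         // Record scheduler, GC and goroutine events for the whole run.
|         if err := trace.Start(f); err != nil {
|             panic(err)
|         }
|         defer trace.Stop()
|
|         // ... application work ...
|     }
|
| Then open it with `go tool trace trace.out`.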
|
| [0]: https://github.com/cube2222/octosql
| kccqzy wrote:
| > Second, even if the CPU profiler tells you the GC is
| consuming a lot of time, that doesn't mean it's taking it away
| from your app
|
| I have experienced the same issue here. Our load balancer used
| CPU usage as a proxy for deciding how much traffic to assign.
| When the app was written in Go, we consistently found that the
| GC was consuming a lot of CPU time even though all other
| metrics, like request latency, were very good, even in the
| microseconds range. This was the case even when the app was
| massively parallel with lots of goroutines. But the load
| balancer kept sloshing traffic around unnecessarily based on
| its observation that the GC was consuming a lot of CPU time.
| tdudzik wrote:
| > When the app was written in Go
|
| Did you rewrite it to something else?
| kccqzy wrote:
| Yes we did. But the rewrite was not because of this issue.
| silisili wrote:
| Out of curiosity...which language did you choose and why?
| How did it turn out?
| oorza wrote:
| They wrote it in PHP4 and it currently routes 2/3 of the
| internet.
| cube2222 wrote:
| That does actually sound like it could be scenario one too.
|
| If you have a lot of small requests, with only a few requests
| active at the same time but many requests per second overall,
| each making a few allocations, you will have a small live heap
| size while quickly reaching the threshold for another GC.
|
| This way you get a lot of GC runs. Latency isn't affected too
| much because Go is quite good at keeping stop-the-world pauses
| short. You might end up with application work and stop-the-world
| pauses interleaved in roughly a 50/50 ratio of computation time
| (that's something you can diagnose very easily with go tool
| trace, btw).
|
| Having a higher GOGC threshold might help a lot there, since
| it will make stop-the-world pauses less frequent, while keeping
| their duration mostly unchanged (as that scales proportionally
| to live heap size).
|
| That's obviously just a guess based on the limited info I
| have though.
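|
| (The same knob can also be set without a code change via the
| GOGC environment variable, e.g. `GOGC=1000`; the right value is
| workload-dependent.)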
| morelisp wrote:
| > it will make stop-the-world pauses less frequent, while
| keeping their duration mostly unchanged (as that scales
| proportionally to live heap size).
|
| Go's STW phases are mostly not proportional to live heap
| size; iirc one never is and the other is proportional to
| something variable but only weakly correlated with heap
| size (cleaning up cached spans).
|
| It's hard to figure out what GP is describing exactly, but I
| don't think GOGC alone would necessarily address that. If
| latency was still good, it was probably neither over-triggering
| STW nor dragging other threads into mark assist.
|
| I think they may have been hitting the pathological case
| described where stack roots were not being counted: a little
| allocation spread across a lot of stack-heavy goroutines could
| be mistaken for a lot of allocation and trigger a CPU-intensive
| mark phase that neither frees much real memory in absolute
| terms nor effectively counts a larger live set afterwards.
| Prior to 1.18, GOGC might have had to be set dangerously large
| to avoid that.
| eatonphil wrote:
| I'd love to read more about your experience profiling and how
| your techniques work.
| cube2222 wrote:
| Thanks, I'll try to whip up an article about it in the not-
| too-distant future.
|
| Though I can say that the biggest improvement to my
| profiling flow was adding a `--profile` flag to OctoSQL
| itself. This way I can easily create CPU/memory/trace
| profiles of whole OctoSQL command invocations, which makes
| experiments and debugging on weird inputs much quicker.
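|
| The CPU-profile half of such a flag can be as simple as this
| (just a sketch of the idea, not OctoSQL's actual code):
|
|     package main
|
|     import (
|         "flag"
|         "os"
|         "runtime/pprof"
|     )
|
|     func main() {
|         profile := flag.String("profile", "",
|             "write a CPU profile to this file")
|         flag.Parse()
|
|         if *profile != "" {
|             f, err := os.Create(*profile)
|             if err != nil {
|                 panic(err)
|             }
|             defer f.Close()
|
|             if err := pprof.StartCPUProfile(f); err != nil {
|                 panic(err)
|             }
|             // Stop and flush the profile when the command finishes.
|             defer pprof.StopCPUProfile()
|         }
|
|         // ... run the actual command ...
|     }
|
| The resulting file can then be inspected with `go tool pprof`.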
| omginternets wrote:
| Has anyone tried "gc_details": true in VSCode? I've just gone
| through the configuration steps, but I'm not seeing anything
| obvious. What should I be looking for?
|
| EDIT: found it at the top of the file.
| erik_seaberg wrote:
| Hm, I was hoping for a roadmap that would talk about supporting
| generations and more tuning options.
| morelisp wrote:
| Would generational support improve anything given a) 99% of the
| nursery is probably already on the stack, and b) using
| generations to inform any kind of compaction / relocation still
| seems out of the question?
|
| `GOMEMLIMIT`, described in the document, is a new tuning option.
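|
| For reference, a minimal sketch of that new knob set
| programmatically (assumes Go 1.19+; the 4 GiB figure is
| arbitrary, and the same limit can be set with the GOMEMLIMIT
| environment variable):
|
|     package main
|
|     import "runtime/debug"
|
|     func main() {
|         // Soft memory limit in bytes: the runtime runs the GC more
|         // aggressively as total memory use approaches this limit,
|         // instead of letting the heap grow purely by the GOGC ratio.
|         debug.SetMemoryLimit(4 << 30)
|
|         // ... rest of the application ...
|     }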
___________________________________________________________________
(page generated 2022-07-15 23:00 UTC)