[HN Gopher] Fgtrace - The Full Go Tracer
___________________________________________________________________
Fgtrace - The Full Go Tracer
Author : felixge
Score : 100 points
Date : 2022-09-19 13:23 UTC (9 hours ago)
(HTM) web link (github.com)
| hknmtt wrote:
| Looks interesting, added a star to keep track.
| jaffee wrote:
| this thing is awesome, have used it many times to quickly track
| down tricky performance issues.
| jaffee wrote:
| wait I'm thinking of fgprof.... this looks awesome too though
| felixge wrote:
| Sorry for the confusion :). I'm the author of both tools and
| was also considering building the new functionality into
| fgprof since the data capturing approach is very similar.
| Anyway, if you found fgprof useful, I think fgtrace could be
| even more useful in similar situations :)
| felixge wrote:
| I'm the author of fgtrace, happy to answer any questions :).
|
| I've also posted a few more comments in this twitter thread:
| https://twitter.com/felixge/status/1571850160358965249
| [deleted]
| lanstin wrote:
| I use the builtin pprof flame graphs all the time, and since
| each of the goroutine pools has different stack traces, I can
| tell them apart. What does this package improve on? Wall time
| instead of CPU time? It isn't immediately obvious to me what the
| extra info is.
| felixge wrote:
| The main difference is that you get a timeline (flame chart)
| rather than flame graph. This allows you to understand the
| order in which operations are taking place. You also get
| walltime (instead of CPU time), so you can debug Off-CPU
| performance bottlenecks (e.g. database calls) without the
| need for additional instrumentation. Last but not least you
| get everything broken down per-goroutine, so you can
| understand which operations are executed concurrently vs
| sequentially.
|
| The Go CPU profiler is great for reducing CPU utilization.
| But unless you're CPU-bound, it's not very useful for
| improving latency. fgtrace is trying to help with that.
| kjeetgill wrote:
| > fgtrace may cause noticeable stop-the-world pauses in your
| applications.
|
| Huh, I wonder if this is a temporary limitation or an issue with
| the approach. In my experience, if you're doing profiling you're
| probably better off with something lighter weight that you can
| get more honest numbers from.
|
| Edit: reading closer, it looks like the Go team had similar
| concerns. I wonder if this can capture how long a goroutine was
| unmounted for.
| felixge wrote:
| Capturing a consistent snapshot of all goroutines requires
| stopping the world. However, this can be very quick as the GC
| relies on the same mechanism.
|
| The bigger problem is capturing the stack traces for all
| goroutines. Rhys added a patch to Go 1.19 [1] that mostly moves
| this work outside of the critical STW section, which greatly
| reduces the overhead. Unfortunately this improvement only
| applies to the official goroutine profiling APIs, and those do
| not provide details such as goroutine ids. This means fgtrace
| has to use runtime.Stack() which returns the stack traces as
| text (yikes) and isn't optimized like the other goroutine
| profiling APIs.
|
| There are various ways the implementation details of fgtrace
| and the Go runtime could be improved for this use case
| (wallclock timeline views), and I'm hoping to work on
| contributions in the coming months.
|
| [1] https://go-review.googlesource.com/c/go/+/387415
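As a rough illustration of why text output is awkward to work with, the sketch below calls `runtime.Stack` for all goroutines and recovers the goroutine ids by parsing the `goroutine N [state]:` header lines. The helper name `goroutineIDs` is made up for this example, and the parsing is an assumption about the text format, not a description of how fgtrace itself parses it.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// goroutineIDs dumps the stacks of all goroutines as text and extracts the
// ids from header lines that look like "goroutine 7 [chan receive]:".
func goroutineIDs() []int {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			buf = buf[:n]
			break
		}
		// The dump may have been truncated; retry with a larger buffer.
		buf = make([]byte, 2*len(buf))
	}

	var ids []int
	sc := bufio.NewScanner(bytes.NewReader(buf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == "goroutine" && strings.HasPrefix(fields[2], "[") {
			if id, err := strconv.Atoi(fields[1]); err == nil {
				ids = append(ids, id)
			}
		}
	}
	return ids
}

func main() {
	block := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() { <-block }()
	}
	fmt.Println("goroutine ids:", goroutineIDs())
	close(block)
}
```

Every sample pays for formatting the stacks into text and then parsing them back out, which is part of the overhead discussed above.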
| nathias wrote:
| never go full tracer
| felixge wrote:
| haha - certainly not in production, at least not with my hacky
| code here :)
| chrsig wrote:
| The proposal[0] mentioned in the README has some good insight
| from rsc.
|
| He notes the performance & scalability issues already noted here
| by other commenters.
|
| > Probably the right thing to do is figure out more of a trace
| like the current trace profiles but perhaps less low level.
|
| This is the key takeaway for me.
|
| I think there's room for tracing support somewhere in-between
| runtime/trace and full blown distributed traces (e.g.,
| OpenTelemetry[1]) - so I'm hopeful this effort may evolve into a
| good solution in that space.
|
| From a usability point of view, my biggest gripe right now with
| the Go tracer is that its viewer is...painful. It uses the
| trace viewer that's built into Chrome, which Chrome itself is
| moving away from.
|
| I'd hacked around a bit recently to try and get the existing go
| traces into perfetto[2], with some success. As I recall, I
| couldn't get user traces functioning.
|
| The `go tool trace` server has an API to output compatible JSON,
| but it's limited in what it outputs. Unfortunately, the trace
| file itself is in some custom binary format. All the tools for
| manipulating it are in `internal/` folders, making them
| unavailable for import, so creating new tools for working with
| the traces is quite burdensome.
|
| I'd debated copying the code out into a new project and starting
| to hack on it, but at that point I'd reached the end of my
| willingness to invest time. Perhaps I should open an issue or
| message the mailing list to see what the maintainers think the
| future of runtime tracing looks like.
|
| [0]
| https://github.com/golang/go/issues/41324#issuecomment-70379...
|
| [1] https://opentelemetry.io
|
| [2] https://ui.perfetto.dev/
| felixge wrote:
| > He notes the performance & scalability issues already noted
| here by other commenters.
|
| Go 1.19 has made some improvements in this regard [1]. But yes,
| profiling all goroutines does not scale to programs that use
| more than perhaps 10k goroutines, which isn't entirely uncommon.
| To overcome this, the goroutine profile API would need to be
| extended to allow profiling a subset of goroutines. pprof
| labels could be used to specify which goroutines should be
| profiled.
|
| > Probably the right thing to do is figure out more of a trace
| like the current trace profiles but perhaps less low level.
|
| Yeah, in the long run the tracer, perhaps in combination with
| the cpu profiler [2], also offers a great way of capturing this
| data. But right now it's too much of a firehose, so it probably
| needs some way of selecting a subset of goroutines to trace as
| well. Additionally the unwinding of stack traces is a major
| bottleneck, so maybe frame pointer unwinding or similar will be
| needed to make it faster.
|
| I've heard some stuff about future plans for the tracer that
| would help with the custom binary format problem, so hopefully
| this will improve in the future.
|
| Anyway, I mostly see fgtrace as a "Do Things that Don't Scale"
| [3] kind of project. If people like the value it can provide,
| it will likely motivate me and others to figure out how to
| build a version of it that is safe for production usage :).
|
| [1] https://go-review.googlesource.com/c/go/+/387415
|
| [2] https://go-review.googlesource.com/c/go/+/400795
|
| [3] http://paulgraham.com/ds.html
___________________________________________________________________
(page generated 2022-09-19 23:01 UTC)