[HN Gopher] Flame Graphs: Making the opaque obvious (2017)
___________________________________________________________________
Flame Graphs: Making the opaque obvious (2017)
Author : davikr
Score : 22 points
Date : 2024-06-26 11:43 UTC (1 day ago)
(HTM) web link (tech.popdata.org)
(TXT) w3m dump (tech.popdata.org)
| PreInternet01 wrote:
| OK, shameful confession time here: I just _cannot_ grasp flame
| charts, no matter how hard I try. And yes: that's just me, I'm
| dumb, etc. etc. (and I freely admit all of that, including the
| et-ceteras!)
|
| I tried to follow along with things that are relevant to my job,
| like https://randomascii.wordpress.com/2016/09/05/etw-flame-graph...
| ...And totally failed? I look at the reddest part of the
| chart, I look at the peaks, and none of that matches the
| conclusion reached in the blog post?
|
| And then I tried to grok all the knowledge conveyed in
| https://www.brendangregg.com/flamegraphs.html and... came away
| similarly confused? Sure, I see patterns, but when asked to
| identify which of those patterns indicate problems, I would still
| score <50%?
|
| And, I mean, I do memory/CPU traces like all day every day, and I
| fix code all the time based on that, but that's all just based on
| hierarchical bar charts, which are NOT the same as inverted flame
| graphs, as far as I can tell?
|
| Anyway, thanks for coming to my therapy session, and feel free to
| leave your helpful-comment-that-will-finally-enlighten-me and/or
| downvote at the door.
| eterm wrote:
| I think I'm the same as you, I much prefer the style where you
| get an interactive profile and can hit "show me the hot path"
| and it'll expand down the hot path.
|
| The overall flame charts just don't provide the immediate
| "Here's what's wrong" you need from a chart.
|
| Charts should be simple enough to quickly get an idea of
| whether the thing you're looking at is "good" or "bad" and, if
| "bad", quickly see where the problem is.
| PreInternet01 wrote:
| Well, thank you for pointing out that hierarchical bar charts
| do a pretty good job at showing the hot path.
|
| So my (and I guess your) workflow is then to fix the top-X
| hot paths, and then re-profile, etc.
|
| The attraction of flame graphs, as I understand it, is that
| you can get the same information _without_ the need to
| iterate. Which is obviously very valuable if you're
| operating on single-shot bug reports, but... _how_ exactly?
| (Not a question directed at you, obviously, but more towards
| the general discussion)
| hobs wrote:
| I recommend writing your own flame graph generator, the
| syntax is super simple and it will help you understand the
| span/subspan nesting.
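|
| (To give a flavor: the collapsed/folded format consumed by
| flamegraph.pl is one "frame1;frame2;... count" line per
| unique stack, and the nesting is just a prefix tree. A toy
| Python sketch - structure and names are mine:)
|
|   # "f1;f2;...;fN count", callees to the right.
|   folded = {"main;a;b;c": 2, "main;a;d": 1, "main;a": 2}
|
|   def build_tree(folded):
|       # Nested {frame: [samples, children]}; a frame's width
|       # in the graph is samples / total at the root.
|       root = {}
|       for stack, count in folded.items():
|           node = root
|           for frame in stack.split(";"):
|               entry = node.setdefault(frame, [0, {}])
|               entry[0] += count
|               node = entry[1]
|       return root
|
|   def dump(node, total, depth=0):
|       for frame, (count, kids) in node.items():
|           pct = 100 * count / total
|           print("  " * depth + f"{frame}: {pct:.0f}%")
|           dump(kids, total, depth + 1)
|
|   tree = build_tree(folded)
|   dump(tree, sum(n for n, _ in tree.values()))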
|
| In some cases it's not that helpful if you don't have the
| right type of problem, e.g. I was troubleshooting some
| higher-level stuff (SQL) and looking at stored procedure
| invocations, and a call stack only two levels deep doesn't
| tell you very much.
|
| For me a FG is more about sounding out the problem and
| understanding the scope of what is possible, more than
| fixing an immediate and obvious issue.
| eterm wrote:
| The jetbrains profiler lets you quickly "Scope to X", so
| you can scope to the second or third highest ranked
| functions and see their breakdown too, so you don't need to
| fix the highest to get a full understanding.
| throw0101b wrote:
| > _And then I tried to grok all the knowledge conveyed in
| https://www.brendangregg.com/flamegraphs.html_
|
| Have you watched the LISA 2013 presentation video?
|
| * https://www.youtube.com/watch?v=nZfNehCzGdw
|
| * https://www.usenix.org/conference/lisa13/technical-sessions/...
|
| Some follow-on presentations from him, I found the first one
| useful:
|
| * https://www.youtube.com/watch?v=D53T1Ejig1Q&t=12m50s
|
| * https://www.youtube.com/watch?v=VMpTU15rIZY&t=8m9s
| rkallos wrote:
| > I look at the reddest part of the chart, I look at the peaks
|
| Neither of these is really where I look when examining
| flame graphs. I tend to look at the bottom and work my way up.
|
| The key thing (imo) to look for are wide pieces that are not a
| core part of what the code you're profiling is supposed to do.
|
| In the first example of your first link, you have a flame graph
| of code that seems to draw an image. There are a bunch of
| functions like 'CopyPixels' and 'RasterizeEdges'. Those seem to
| be a core part of drawing an image. In contrast, there's a
| 'KiPageFault' function that's pretty wide that does not seem to
| be important to the code's overall goal. I think a bit more
| context is needed to arrive at the author's conclusion, though.
|
| I think flame graphs, like all graphs, are more helpful when
| the reader has a lot of context about what's supposed to
| happen, or some intuition about how the chart is supposed to
| look.
| PreInternet01 wrote:
| > I think flame graphs, like all graphs, are more helpful
| when the reader has a lot of context about what's supposed to
| happen, or some intuition about how the chart is supposed to
| look.
|
| Yeah, and then your comment... just ends? So, what I get here
| is that, in a flame graph, the reddest part isn't the most
| interesting, and neither is the widest part, nor the part
| with the most peaks.
|
| So, what, _exactly_ , am I looking for? "You know what this
| graph should look like in case all was OK" is not exactly
| helpful, right, because I _might_ just lack that baseline?
| spencerflem wrote:
| As I understand it, flame graphs add to normal x-axis-as-
| time ones by merging repeated calls to the same function,
| so if one function is called a lot it shows up as one wide
| chunk and not many small chunks spread out.
|
| So yes, height doesn't matter much, you read bottom to top
| to get context, and width is what you're looking for.
|
| But like any chart, it can't tell you what to optimize, or
| what can be optimized. It just tells you what parts of the
| code are taking up the most time, in a hierarchical way.
| Asking for a complete description of what to see here is
| like asking for a complete description of what to look for
| in a histogram. Like, ofc it matters what the histogram is
| of!
| SassyBird wrote:
| _> So, what, exactly, am I looking for?_
|
| This can't be answered in general. Flamegraphs are
| measurements of what happened. But just like a ruler
| doesn't tell you whether a given human is atypically short
| or tall for their species, a flamegraph can't tell you which
| portion of the program takes too long. You need
| prior knowledge about data structures, algorithms,
| memory bandwidth etc. in order to compare your justified
| expectations with reality and be surprised by
| something. And it will all depend on the particular program
| you profile.
| simscitizen wrote:
| > I do memory/CPU traces like all day every day, and I fix code
| all the time based on that
|
| So I take it you understand one of the standard visualizations
| produced by a CPU profiling tool, e.g. it takes a call stack
| sample every millisecond. The x-axis is time (one sample per
| ms, if you have only one CPU), and the y-axis is the call
| stack.
|
| Now for a flamegraph, you basically have the same
| visualization, but you sort the samples across the x-axis so
| that callstacks that start with the same prefixes are grouped
| together.
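|
| (A minimal sketch of that transformation, with made-up
| sample data:)
|
|   from collections import Counter
|
|   # One stack per sample tick, in time order (root first).
|   samples = [("main", "a", "b"), ("main", "c"),
|              ("main", "a", "b"), ("main", "a"),
|              ("main", "c")]
|
|   # A timeline view keeps this order; a flame graph merges
|   # identical stacks and sorts them, so the two main;a;b
|   # samples become a single span of width 2.
|   for stack, n in sorted(Counter(samples).items()):
|       print(";".join(stack), n)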
|
| Incidentally, the sorting of samples across the x-axis destroys
| the time information which is often critical. I constantly see
| engineers who don't understand flamegraphs looking at the
| entire flamegraph of, say, a 3-second trace, and then trying to
| use that whole flamegraph to optimize a 100 ms critical path
| within the trace, which is totally nonsensical.
| simscitizen wrote:
| As an example, imagine you sampled a program and got 5 CPU
| call stack samples:
|
|    c    c
|    b    b    d
|    a    a    a    a    a
|   main main main main main
|
| In a flamegraph, you would see:
|
|   [c       ]
|   [b       ][d  ]
|   [a                      ]
|   [main                   ]
| PreInternet01 wrote:
| Yeah, I imagine it, and still don't see how the flame graph
| would help?
|
| Shown as a hierarchical bar chart, this would suggest 'b'
| is problematic.
|
| Where, color-wise (because peak-wise, 'c' would be the
| culprit here) do I see this issue in a flame graph? Because
| I fear that either 'main' or 'a' would have the most
| dominant shade of red here?
| simscitizen wrote:
| Peaks don't matter, they just correspond to the depth of
| the call stack.
|
| Probably the simplest way to use the flame graph is to work
| from the bottom of the flamegraph and walk upwards until
| you find something interesting to optimize. Ideally you
| find something wide to optimize that makes sense. (The
| widest thing here is "main", which is obviously
| not the interesting thing to optimize, so you would work
| upwards from there.) The basic idea is that things that
| are wide in the flamegraph are expensive and potential
| things to optimize.
|
| Where I work, we have tools that can produce diffed
| flamegraphs which can be really useful in figuring out
| why one trace uses so much more/less CPU than another.
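|
| (The core of a differential flame graph can be sketched as
| a subtraction of folded counts - toy Python, not our
| internal tooling, and the file names are made up:)
|
|   from collections import Counter
|
|   def load(path):
|       counts = Counter()
|       for line in open(path):
|           stack, n = line.rsplit(" ", 1)
|           counts[stack] += int(n)
|       return counts
|
|   before = load("before.folded")
|   after = load("after.folded")
|   # Positive delta: that stack got more expensive in "after".
|   delta = {s: after[s] - before[s]
|            for s in before.keys() | after.keys()}
|   ranked = sorted(delta.items(), key=lambda kv: -abs(kv[1]))
|   for stack, d in ranked:
|       print(f"{d:+6d}  {stack}")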
| PreInternet01 wrote:
| > Probably the simplest way to use the flame graph is
| work from the bottom of the flamegraph and walk upwards
| until you find something interesting you optimize
|
| OK, so going by what is apparently the 'simple example'
| in the linked article:
| https://tech.popdata.org/images/cps1970_before_fix_dwarf_gcc...
|
| I work my way up. First thing that is really red is
| Conversion::Process::Run, but that probably wraps a lot
| of things, so I keep going up.
|
| Next is Cps::Editor::relate_edits, or possibly
| EditingAPI::Rules::countPeopleMatching, because it's a
| darker red?
|
| And then there is another red-ish function, followed(?)
| by some yellow-colored (and thus unimportant?) stack
| entries, and then the apparent culprit:
| Record::hasVariable.
|
| So, and I'm truly not trying to be difficult or
| argumentative here: how was I supposed to pick out
| 'Record::hasVariable' right away from
| 'https://tech.popdata.org/images/cps1970_before_fix_dwarf_gcc...'?
|
| The first function that is red being called from yellow-
| colored functions with about the same duration (width)?
| And if so, why is Metadata::Cache::getVarsByName not a
| more likely optimization target?
| fwip wrote:
| Other user here: Confession - I don't actually know what,
| if anything, the colors mean in a flamegraph. They seem
| random to me.
|
| The way I'd personally home in on Record::hasVariable is
| that it's a relatively-simple sounding function (from the
| name) that is taking a large portion of the X-axis.
| Starting at the bottom, I'd go "main -> editInOrder ->
| relateEdits -> countPeopleMatching -> getSourceDataAsLong
| -> hasVariable." Then I'd be like "we really spend 47% of
| our time in this simple-sounding function? What's it
| doing?"
|
| Basically, I look for the functions that have an outsized
| complexity/time ratio. A function with a simple task is
| usually easier to optimize, and a function that only runs
| for 2% of your program isn't worth spending the time to
| optimize.
| eterm wrote:
| Indeed, the flame chart can't tell you that.
|
| The solution provided in the article seems to rip out
| `Metadata::Cache::getVarsByName` entirely. If it were
| easy to optimise `Metadata::Cache::getVarsByName`
| instead, then that would also have been a suitable
| optimisation.
|
| I guess domain knowledge and experience let them know
| which optimisation was more suitable here.
| nemetroid wrote:
| Did you read the linked article? It starts with a very simple
| example and then builds up to a real-world case, unlike the
| articles you linked, which both start off with an advanced
| case.
| PreInternet01 wrote:
| > Did you read the linked article?
|
| From https://news.ycombinator.com/newsguidelines.html:
| "Please don't comment on whether someone read an article"
|
| And yes, I did, and I truly don't grasp even the 'simple'
| explanation. I already admitted I'm dumb.
|
| Do you have _anything_ to add to the discussion?
| nemetroid wrote:
| > Do you have anything to add to the discussion?
|
| Do _you_? Can you point out where it (edit: the simple
| example) breaks down, instead of just repeating that
| you're dumb?
| PreInternet01 wrote:
| By the 'simple example', I assume you mean
| https://tech.popdata.org/images/cps1970_before_fix_dwarf_gcc...?
|
| If so, the widest, reddest, and highest-magnitude
| function I see there is __int__malloc. Right?
|
| Yet, reading on, it seems that has nothing to do with
| anything, and the actual culprit is Record::hasVariable,
| which is somewhere in the middle of the graph, and not
| any redder or less red, or wider or less wide, than
| _many_ other functions.
|
| So, looking at just the first graph, how am I supposed to
| immediately spot the culprit?
| vessenes wrote:
| I haven't read the article. However, here is how I read
| the chart:
|
| Record::hasVariable takes a long time. You can see that
| because it's wide and red. The immediate ones below it on
| the chart don't do anything but call it; you can see that
| because they are basically the same width. hasVariable
| splits into two calls (the row above). I'm going from
| memory here because I don't have the chart here, but I
| think it should be clear from the trace:
|
| A) hasVariable takes a lot of compute time
|
| B) from the name, this seems surprising / non-optimal.
|
| C) Digging into the functions above it will yield chances
| to optimize.
|
| I do agree that the malloc is sort of a surprising amount
| of the total percentage of hasVariable. Again, just from
| the flame graph and function names, I'd bet that some
| memory allocations are being done in a loop inside and
| would be much faster to allocate outside the hasVariable
| call, and get reused.
| PreInternet01 wrote:
| > Record::hasVariable takes a long time
|
| As does Metadata::Cache::getVarsByName[blah] right above
| it, and many functions below (unsurprisingly, but
| still...), and they all have pretty much the same width
| and color.
|
| The point of flame graph proponents is that "you see
| where the problem is right away."
|
| My question is: "how, exactly?" And the answers so far
| seem to be mostly... lacking, to the point that I now
| officially declare flame graphs a cargo cult that is in
| no way superior to my "hierarchical bar charts"
| religion...
| nemetroid wrote:
| By simple example, I mean this:
| https://tech.popdata.org/images/flamegraph-example.svg
|
| But looking at the one you're referencing, slightly out
| of order:
|
| > highest-magnitude function I see there is
| __int__malloc.
|
| Height shows stack depth, not magnitude.
|
| > the actual culprit is Record::hasVariable, which is
| somewhere in the middle of the graph, and not any redder
| or less red, or wider or less wide, than many other
| functions.
|
| > not any redder or less red
|
| The colors are randomized to help with contrast
| (Brendan's website mentions this practice), so they
| aren't conveying any information.
|
| > or wider or less wide
|
| That's not really true. Hovering over Record::hasVariable
| tells you that this bar covers 45.6% of the runtime. The only
| bars wider than that are the callers of
| Record::hasVariable (edit: rather, the stack through
| which Record::hasVariable is being called), i.e. the bars
| on which Record::hasVariable is resting.
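|
| (That hover number is essentially the share of samples
| whose stack passes through the bar. A rough sketch over
| folded counts - names are mine:)
|
|   def inclusive_pct(folded, func):
|       total = hits = 0
|       for stack, count in folded.items():
|           total += count
|           # Recursion: count each sample at most once.
|           if func in stack.split(";"):
|               hits += count
|       return 100 * hits / total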
|
| > somewhere in the middle of the graph
|
| Sure - being in the middle height-wise means that it's
| somewhere in the middle of the call stack. But there are
| some clues to its relevance:
|
| 1. It's close to the boundary between application code
| and standard library code. It does call
| MetaData::Cache::getVarsByName, which (going by the name)
| also is part of the application, but everything deeper in
| the stack (i.e. on top of those bars) is purely std::
| stuff.
|
| 2. Domain knowledge. The text alludes to this:
| Record::hasVariable is a conceptually simple operation
| that's not expected to be a major part of the runtime.
|
| This does not mean that Record::hasVariable must be the
| culprit. Maybe some function higher in the call stack
| (e.g. EditingAPI::Rules::getSourceDataAsLong) is calling
| Record::hasVariable way too many times? But it's a good
| place to start looking.
| PreInternet01 wrote:
| Ah, OK, so there is a graph, where magnitude is
| meaningless, colors are meaningless, runtime is relative,
| yet, with "enough domain knowledge" you can "see where
| the problem is right away"...
|
| I'm pretty sure we're done here.
| nemetroid wrote:
| > I'm pretty sure we're done here.
|
| Sorry to hear.
| starkrights wrote:
| I don't know if it helps, but this is how I read through the
| graph from the first article after reading the author's
| conclusion. (I'm an idiot)
|
| - okay, we're analyzing DvrDrawImage
|
| - Let's analyze the function on the left,
|   GpBitmap::PipeLockBits
|
| - Up through ::Decode, takes up 100% of the time from
|   PipeLockBits, so we're just racing down the call stack to
|   get to that point, and _then_ other stuff starts taking
|   up time
|
| - Lots of WindowsCodecs calls, I guess we kind of expect
|   those to be valid since image processing needs codecs and
|   stuff and things
|
| - Okay, now there's a bar taking up ~45-50% of the ::Decode
|   call, that's not a codec or drawing related thing - it's
|   an os level pagefault. That's fucky.
|
| - Since the ::Decode call is ~55-65% of the ::DvrDrawImage
|   call, that pagefault is (~40% of ~60% =) ~25% of this
|   draw image call
|
| So essentially I looked and saw a stack of bars altogether and
| went 'okay, the call at the top of this is the only reason the
| call 6 bars up the stack (ie, 6 bars down on the flamegraph) is
| taking time, why?', and start applying the same process
| recursively from there.
|
| That is, until $why == 'due to codec calls and an ntkernel
| call', and the question becomes 'is that valid?' Then the
| question becomes "are these codec calls taking the amount of
| time I would expect?"
|
| iiuc peaks and colors don't matter. Tbf I think the
| colorscheming on this chart is shit - afaik flamegraph coloring
| is just a way of marking the same (or from the same library?
| Implementation dependent?) function for easy spotting. Like why
| not make the functions wildly different colors. Redness !=
| hotness of a certain codepath.
|
| The only thing peaks indicate is calls that have deeply nested
| function calls, which I guess would be important if you're
| having recursion issues.
| zlurkerz wrote:
| How do flame graphs handle the case where most of the time is
| spent in some leaf function that is called from all over the
| program? In this case, each individual stack would not take much
| time but in aggregate, a lot of time is spent in the function at
| the top of all of the call stacks. It should not be that
| uncommon to have hotspots in things like copying routines,
| compression, encryption etc that are not associated with any
| particular stack.
|
| pprof from https://github.com/google/pprof can produce a DAG view
| of a profile where nodes are sized proportional to their
| cumulative time, e.g.,
| https://slcjordan.github.io/images/pprof/graphviz.png and such a
| view would seem to cover the case above and subsume the usual use
| cases for a flame graph, would it not?
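|
| (What the DAG view aggregates, roughly: caller->callee
| sample counts per edge, summed over all stacks. A toy
| sketch of the bookkeeping, not pprof's actual code:)
|
|   from collections import Counter
|
|   def edge_weights(folded):
|       edges = Counter()
|       for stack, n in folded.items():
|           frames = stack.split(";")
|           for caller, callee in zip(frames, frames[1:]):
|               edges[(caller, callee)] += n
|       return edges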
|
| Although I guess a flat text profile of functions sorted by time
| would also highlight these kinds of hot spots. Still, if we want
| a single graphical view as a go-to, it's not clear that flame
| graphs are all that much better than pprof DAGs.
| mandarax8 wrote:
| My flame graph tool (KDAB hotspot) has a bottom-up flame graph
| for that purpose.
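|
| (Which is essentially just reversing each stack before
| folding, so scattered hot leaves merge at the root - a
| sketch, assuming "f1;f2;... count" folded input:)
|
|   from collections import Counter
|
|   def invert(folded):
|       out = Counter()
|       for stack, n in folded.items():
|           out[";".join(reversed(stack.split(";")))] += n
|       return out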
| zlurkerz wrote:
| Right, but then you have to know to consult both top-down and
| bottom-up. Seems like a DAG is a union of these views.
| nemetroid wrote:
| > How do flame graphs handle the case where most of the time is
| spent in some leaf function that is called from all over the
| program?
|
| In my experience, they don't really. They're very good for
| finding easy wins where a high/medium-level operation takes
| much longer than you'd expect it to, not so much for finding
| low-level hotspots in the code.
|
| > such a view would [...] subsume the usual use cases for a
| flame graph, would it not?
|
| I don't think it does. A flame graph gives you a good idea of
| the hierarchical structure (and quickly). Sometimes you can
| work that out from a DAG, but not necessarily:
|
| Let's say that A and B call M, and M calls X and Y. All four
| edges have roughly the same value. A flame graph can show you
| that 90% of the time spent in X started with A and that 90% of
| the time spent in Y started with B, but the DAG can't.
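|
| Concretely, with made-up folded counts:
|
|   A;M;X 90        B;M;X 10
|   A;M;Y 10        B;M;Y 90
|
| versus the flipped correlation (A;M;X 10, A;M;Y 90, B;M;X
| 90, B;M;Y 10): both cases produce the exact same DAG edge
| weights, A->M = B->M = 100 and M->X = M->Y = 100, so only
| the flame graph can tell them apart.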
| tanelpoder wrote:
| I had the same experience/question at some point when I noticed
| (and investigated) why a new Ubuntu Linux install didn't break
| out interrupt usage in vmstat anymore. [1]
|
| Perf+flamegraphs were able to catch in-interrupt-handling
| samples as I was on a bare metal machine with PMU counters
| accessible. But the various NVMe I/O completion interrupts
| happened all over the place, even when the CPUs were in random
| userspace code sections. Neither the bottom-up nor the top-down
| FlameGraph approach made it visually clear in this case.
|
| But since interrupt time _did_ show up and since the FlameGraph
| JS tool had a text search box, I searched for "interrupt" or
| "irq" in the search box - and the JS code highlighted all
| matching sections with purple color (completely distinct from
| the classic color range of flamegraphs). And seeing that purple
| color all over the place made me smile and gave me a strong
| visual "a-ha" moment. Probably with some extra JS tinkering,
| these kinds of "events scattered all over the place" scenarios
| could be made even clearer.
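|
| (The search box is doing roughly this over the folded
| stacks - a sketch, names are mine:)
|
|   import re
|
|   # Total share of samples passing through any frame that
|   # matches the pattern, however scattered those frames are.
|   def matched_pct(folded, pattern):
|       rx = re.compile(pattern)
|       total = hits = 0
|       for stack, n in folded.items():
|           total += n
|           if any(rx.search(f) for f in stack.split(";")):
|               hits += n
|       return 100 * hits / total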
|
| [1] https://news.ycombinator.com/item?id=26139611 (2021)
| irogers wrote:
| Just to advertise: the perf tool has built-in flamegraph
| generation these days (well, leaning on D3.js). So `perf script
| report flamegraph` will convert a perf.data file into a
| flamegraph.html. Similarly, there is `perf script report gecko`
| to write out the Firefox Profiler's JSON format.
___________________________________________________________________
(page generated 2024-06-27 23:01 UTC)