[HN Gopher] Native FlameGraphViewer
       ___________________________________________________________________
        
       Native FlameGraphViewer
        
       Author : laladrik
       Score  : 92 points
       Date   : 2024-12-23 10:43 UTC (3 days ago)
        
 (HTM) web link (laladrik.xyz)
 (TXT) w3m dump (laladrik.xyz)
        
       | laladrik wrote:
       | Hello, when I was profiling Rust Analyzer, I found it difficult
       | to visualize a flamegraph built from such a huge amount of data.
       | Viewing the flamegraph in a browser (Firefox or Chrome) was
       | impossible: the page simply froze. I made this visualizer to
       | solve my problem. Maybe it will help someone else. I'm leaving
       | the link to my article about it, but you can find the link to
       | the project right in the first paragraph.
        
         | janice1999 wrote:
         | I was surprised to hear that Hotspot isn't fast. I had assumed
         | it would be, since it's written in C++.
        
           | mandarax8 wrote:
           | I've never had Hotspot not be fast enough. Even on 20 GB
           | traces, everything is instant.
           | 
           | The only thing that ever takes some time is the initial load
           | of the perf file and filtering (but still really fast).
        
         | guipsp wrote:
         | I think going for Xlib is somewhat missing the forest for the
         | trees. Does it take less memory? Yeah, but you lose out on any
         | GPU assistance you might otherwise get for free. This only
         | really matters at bigger resolutions, though, as you avoid
         | redrawing.
        
         | atq2119 wrote:
         | Props to you for making a cool little project, but as somebody
         | who's been involved in Linux graphics a bit: please just let
         | Xlib die. It's an outdated API, even if you ignore the
         | existence of Wayland. For something like fast visualizations,
         | you should really go with something that does offscreen
         | rendering and then blits the result. As long as you're just
         | drawing a bunch of rectangles, even CPU software rendering may
         | be the better solution, though obviously modern tools should
         | use GPU rendering.
         | 
         | I see your journey and _how_ you ended up with Xlib. But I
         | think that's really more of an indictment of the sorry state
         | of GUI in Rust.
         | 
         | I know that's not your job, I just couldn't let this use of
         | Xlib stand uncommented because it's really bad for the larger
         | ecosystem.
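As a rough illustration of the CPU software-rendering route described above: for flat rectangles, drawing can be a plain slice fill into an offscreen pixel buffer. The finished frame is then copied ("blitted") into whatever buffer actually gets presented. This is a minimal Rust sketch under stated assumptions. `Framebuffer`, `fill_rect`, and `blit` are hypothetical names, and pixels are assumed to be packed 32-bit values. Handing the final buffer to a window system (X11, Wayland, or a windowing crate) is deliberately left out.

```rust
/// Hypothetical offscreen pixel buffer: one packed 32-bit pixel per element, row-major.
pub struct Framebuffer {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u32>,
}

impl Framebuffer {
    pub fn new(width: usize, height: usize) -> Self {
        Framebuffer { width, height, pixels: vec![0; width * height] }
    }

    /// Fill an axis-aligned rectangle, clipped to the buffer's bounds.
    pub fn fill_rect(&mut self, x: i64, y: i64, w: i64, h: i64, color: u32) {
        // Clamp each edge into [0, width] / [0, height] so off-screen parts are dropped.
        let x0 = x.clamp(0, self.width as i64) as usize;
        let y0 = y.clamp(0, self.height as i64) as usize;
        let x1 = (x + w).clamp(0, self.width as i64) as usize;
        let y1 = (y + h).clamp(0, self.height as i64) as usize;
        if x0 >= x1 || y0 >= y1 {
            return; // Fully clipped or empty rectangle.
        }
        for row in y0..y1 {
            let start = row * self.width;
            // Each row of a rectangle is one contiguous slice fill.
            self.pixels[start + x0..start + x1].fill(color);
        }
    }
}

/// Copy `src` into `dst` with its top-left corner at (dx, dy), clipping to `dst`.
pub fn blit(src: &Framebuffer, dst: &mut Framebuffer, dx: usize, dy: usize) {
    if dx >= dst.width || dy >= dst.height {
        return; // Destination offset is entirely outside `dst`.
    }
    let w = src.width.min(dst.width - dx);
    let h = src.height.min(dst.height - dy);
    for row in 0..h {
        let s = row * src.width;
        let d = (row + dy) * dst.width + dx;
        // One memcpy-style copy per visible row.
        dst.pixels[d..d + w].copy_from_slice(&src.pixels[s..s + w]);
    }
}
```

Because every rectangle row is a contiguous slice, this stays cheap even for many thousands of flamegraph blocks. A GPU path would replace the fills, not the overall "render offscreen, then present" structure.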
        
       | IshKebab wrote:
       | Damn, I hate it when you write a whole project and someone comes
       | along and says "this already exists" and you realise how much
       | time you wasted (yeah, even if some of it counts towards learning,
       | I'd still rather not needlessly repeat other people's work).
       | 
       | Anyway, pprof has a fantastic interactive flamegraph viewer that
       | lets you narrow down to specific functions. It's really very
       | good; I would use that.
       | 
       | https://github.com/google/pprof
       | 
       | Run `pprof -http=:` on a profile and you get a web interface with
       | the flamegraph, call graph, line-based profiling, etc.
       | 
       | It's demonstrated in this video.
       | 
       | https://youtu.be/v6skRrlXsjY
       | 
       | They only show a very simple example and no zooming, but it works
       | very well with huge flamegraphs.
        
         | Gobd wrote:
         | > I tried to find something fast and native. Saying "native" I
         | > mean something which doesn't require a browser.
         | 
         | pprof uses a browser, which doesn't meet the requirements they
         | set.
        
           | josephg wrote:
           | Yep. Personally I love the Firefox profiler for interacting
           | with perf - since it can show you flame graphs and let you
           | explore a perf trace by dominators and whatnot.
           | 
           | But I applaud the effort to make small, native apps. I agree
           | with the author - not everything should live in the browser.
        
           | IshKebab wrote:
           | I think they were saying "fast and native" because web things
           | usually aren't fast. In this case it is though, so I don't
           | see why it would be a problem for it to be web based.
        
         | jasonjmcghee wrote:
         | MacOS Instruments is really quite good.
         | 
         | I have a `profile` function I use:
         | 
         |     profile() {
         |       xcrun xctrace record --template 'Time Profiler' --launch -- "$@"
         |     }
         | 
         | Then I just do:
         | 
         |     $ profile ./my-binary -a -b -c "foo bar"
         | 
         | or whatever, and when it completes (it can be a one-time run or
         | long-running / interactive) I have a great native experience
         | exploring the profile.
         | 
         | All the normal bells and whistles are there and I can double
         | click on something and see it inline in the source code with
         | per-line (cumulative) timings.
        
           | Sesse__ wrote:
           | > MacOS Instruments is really quite good.
           | 
           | It really isn't. It's probably the slowest profiler UI I've
           | ever used (it loves to beachball...), it hardly has any
           | hardware performance counters, and its actual profiling core
           | (xctrace) is... just really buggy? Then there's the fifth time
           | it told me "this function uses 5% CPU", I optimized it away,
           | and absolutely nothing happened, because it was just another
           | Instruments mirage. Or the time it told me that opening a
           | file on iOS took 1000+ ms, but that was just because its end
           | timestamps were pure fabrications.
           | 
           | Maybe it's better if you have toy examples, but for large
           | applications, it's among the worst profilers I've ever seen
           | along almost every axis. I'll give you that gprof is worse,
           | though...
        
         | orlp wrote:
         | I can really recommend samply:
         | https://github.com/mstange/samply. It just works out of the
         | box.
         | 
         | It uses the Firefox profiler to view its recorded profiles. You
         | can (don't have to, just can) even share them. For example, I
         | was looking at this profile just yesterday for my day job:
         | https://share.firefox.dev/3PxfriB
        
       | lubsch wrote:
       | A very enjoyable and inspiring read! I wonder if self-rolling a
       | native application similar to this is feasible on Wayland.
        
       | adolph wrote:
       | The article linked as "W3C specifications are bigger than POSIX."
       | is also worth reading.
       | 
       |  _The total word count of the W3C specification catalogue is 114
       | million words at the time of writing. If you added the combined
       | word counts of the C11, C++17, UEFI, USB 3.2, and POSIX
       | specifications, all 8,754 published RFCs, and the combined word
       | counts of everything on Wikipedia's list of longest novels, you
       | would be 12 million words short of the W3C specifications._
       | 
       | https://drewdevault.com/2020/03/18/Reckless-limitless-scope....
        
         | atombender wrote:
         | Sorry, but that analysis is too sloppy to allow any such
         | comparisons.
         | 
         | If you look at the scraped document list [1]:
         | 
         | * Most of these are _not normative_! They're not
         | specifications; they're guides, recommendations, terminology
         | explainers, and so on.
         | 
         | * A lot of documents are irrelevant to implementing a web
         | browser (XSLT, XPath, RDF, XHTML, ITS, etc.).
         | 
         | * A lot are obsolete (e.g. SMIL, OWL).
         | 
         | * There are tons of duplicate versions (all of CSS 1-3 are
         | included; multiple versions of HTML, MathML, and of course the
         | irrelevant XML-based standards).
         | 
         | * Many standards are scraped both as individual section files,
         | and as a single complete.html file. He didn't notice this, and
         | counted both.
         | 
         | As a particularly egregious example, he includes every version
         | of the Web Content Accessibility Guidelines (WCAG) standard,
         | going back to 1999, each of which is large.
         | 
         | I have not done any kind of analysis myself (which should be
         | thorough to actually be fair), but if you prune it down to the
         | core technologies (HTML5, CSS, ECMAScript, PNG/GIF/WebP, etc.),
         | I'll wager it's probably less than a million, or at the very
         | least less than 2 million. The ECMAScript spec is just 356,000
         | words.
         | 
         | [1]
         | https://paste.sr.ht/~sircmpwn/475ad10f9ff9f63cd0a03a3f998370...
        
       | tdullien wrote:
       | For the prodfiler.com flamegraph viewer we ended up building it
       | in PixiJS, which gave us nice GPU acceleration and let us render
       | massive flamegraphs quickly. Skipping blocks less than half a
       | pixel wide is also a good idea, as is using a monospace font.
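Here is a minimal Rust sketch of the half-pixel culling idea. It assumes a hypothetical `Node` type where `value` is a frame's inclusive sample count. A child's inclusive count can never exceed its parent's, so once a block falls below the threshold its whole subtree can be skipped without being visited. This is not prodfiler's actual code, only an illustration of the technique.

```rust
/// Hypothetical flamegraph node: `value` is the inclusive sample count.
pub struct Node {
    pub name: String,
    pub value: u64,
    pub children: Vec<Node>,
}

/// One block to draw, in pixels horizontally and in stack depth vertically.
#[derive(Debug, PartialEq)]
pub struct Rect<'a> {
    pub x: f64,
    pub width: f64,
    pub depth: usize,
    pub name: &'a str,
}

/// Lay out `node` starting at pixel `x`, using `px_per_sample` as the horizontal scale.
/// Blocks narrower than `min_width` pixels are skipped together with their entire
/// subtree, since nothing beneath a culled block can be wider than it.
pub fn layout<'a>(
    node: &'a Node,
    x: f64,
    depth: usize,
    px_per_sample: f64,
    min_width: f64,
    out: &mut Vec<Rect<'a>>,
) {
    let width = node.value as f64 * px_per_sample;
    if width < min_width {
        return; // Sub-threshold block: no rectangle, no recursion.
    }
    out.push(Rect { x, width, depth, name: &node.name });
    let mut child_x = x;
    for child in &node.children {
        layout(child, child_x, depth + 1, px_per_sample, min_width, out);
        // Children are laid out side by side within the parent's span.
        child_x += child.value as f64 * px_per_sample;
    }
}
```

Calling `layout(&root, 0.0, 0, view_width / root.value as f64, 0.5, &mut rects)` would produce only the blocks worth drawing. Because culled subtrees are never walked, layout time scales with what is visible rather than with the total size of the profile.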
        
       | josephg wrote:
       | As someone who's gone down the rust "native pointers vs pin vs
       | ..." rabbit hole many times now, I really recommend just using a
       | Vec for the data and storing indexes into the vec when you need a
       | pointer.
       | 
       | Pin adds a huge amount of weird incidental complexity to your
       | code base - since you need to pin-project your struct fields (but
       | which ones?). You can't just take an &self or &mut self in
       | functions if your value is pinned, and pin is just generally
       | confusing, hard to use and hard to reason about.
       | 
       | The article ended up with Vec<Box<T>>, but that's a huge code
       | smell in my book. It's much less performant than Vec<T> because
       | every object needs to be individually allocated & deallocated. So
       | you have orders of magnitude more calls to malloc & free, more
       | memory fragmentation, and way more cache misses while accessing
       | your data. The impact this has on performance is insane.
       | 
       | Vec & indexes is a lovely middle ground. In my experience it's
       | often (remarkably) slightly more performant than using raw
       | pointers. You don't have to worry about vec reallocations (since
       | the indexes don't change). And it's 100% safe rust. It feels
       | weird at first - indexes are just pointers with more steps. But I
       | find rust's language affordances just work better if you write
       | your code like that. Code is simple, safe, ergonomic and obvious.
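Here is a minimal Rust sketch that makes the Vec-and-indexes pattern concrete: a call tree stored in one flat `Vec`, where parent and child links are `usize` indices rather than references or `Box`es. `Frame` and `Arena` are hypothetical names. The links are positions rather than addresses, so the `Vec` can reallocate as it grows without invalidating them. No `unsafe` or `Pin` is involved.

```rust
/// Hypothetical call-tree node stored in a flat arena; links are indices, not pointers.
pub struct Frame {
    pub name: String,
    pub parent: Option<usize>,
    pub children: Vec<usize>,
}

#[derive(Default)]
pub struct Arena {
    pub frames: Vec<Frame>,
}

impl Arena {
    /// Append a frame, link it under `parent` (if any), and return its index.
    pub fn add(&mut self, name: &str, parent: Option<usize>) -> usize {
        let idx = self.frames.len();
        self.frames.push(Frame { name: name.to_string(), parent, children: Vec::new() });
        if let Some(p) = parent {
            self.frames[p].children.push(idx);
        }
        idx
    }

    /// Follow parent links back to the root, collecting names leaf-first.
    pub fn path(&self, mut idx: usize) -> Vec<&str> {
        let mut names = vec![self.frames[idx].name.as_str()];
        while let Some(p) = self.frames[idx].parent {
            names.push(self.frames[p].name.as_str());
            idx = p;
        }
        names
    }
}
```

All frames live in one contiguous allocation, so there is a single (amortized) growth path instead of one `malloc`/`free` per node. Freeing the whole tree is one deallocation of the `Vec` plus the per-frame `String`s.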
        
         | wging wrote:
         | > Code is simple, safe, ergonomic and obvious.
         | 
         | Dunno about 'safe' -- or at least not in the more general sense
         | that you seem to intend, rather than the more limited sense of
         | rust's safe/unsafe distinction. If you store an index into a
         | Vec<T> as a usize, rather than a &T, very little is stopping
         | you from invalidating that pseudo-pointer without knowing it.
         | (Or from using it as an index into the wrong vector, etc...)
         | 
         | These problems are manageable and I'm not saying 'never do
         | this' -- I've done it myself on occasion. It's just that there
         | are more pitfalls than you're indicating here, and it is
         | actually a meaningful tradeoff of bug potential for ease-of-
         | use.
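One common mitigation for the "index into the wrong vector" problem is a typed index. This technique is not proposed in this thread. The idea is a newtype that carries the element type as a `PhantomData` marker, so an index minted by one collection does not type-check against a collection with a different element type. Below is a minimal Rust sketch with hypothetical `Idx` and `TypedVec` names. It does not help with stale indices after removal; generational indices are the usual answer there. Also, two collections with the same element type still share an index type.

```rust
use std::marker::PhantomData;
use std::ops::Index;

/// An index that remembers which element type it points into.
pub struct Idx<T> {
    raw: usize,
    _marker: PhantomData<fn() -> T>,
}

// Manual impls: #[derive] would needlessly require T: Clone / T: Copy.
impl<T> Clone for Idx<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Idx<T> {}

/// A Vec wrapper that can only be indexed by `Idx<T>` of its own element type.
pub struct TypedVec<T> {
    items: Vec<T>,
}

impl<T> TypedVec<T> {
    pub fn new() -> Self {
        TypedVec { items: Vec::new() }
    }

    /// Push an item and hand back a typed index to it.
    pub fn push(&mut self, item: T) -> Idx<T> {
        self.items.push(item);
        Idx { raw: self.items.len() - 1, _marker: PhantomData }
    }
}

impl<T> Index<Idx<T>> for TypedVec<T> {
    type Output = T;
    fn index(&self, idx: Idx<T>) -> &T {
        &self.items[idx.raw]
    }
}
```

With this in place, indexing a `TypedVec<u64>` with an `Idx<String>` is a compile-time error rather than a silent logic bug. The runtime cost is identical to a bare `usize`.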
        
           | josephg wrote:
           | I mean safe in the narrow way that Rust intends. It's memory
           | safe, but as you imply, we're leaving the door open to logic
           | bugs if you misuse those array indices.
           | 
           | But honestly, I think danger from that is wildly overstated.
           | The author isn't talking about implementing an ECS or b-tree
           | here. They're just populating an array from a file when the
           | program launches, then freeing the whole thing when the
           | program terminates. It's really not rocket science.
           | 
           | The other big advantage of this approach is that you don't
           | have to deal with unsafe rust. So, no unsafe {} blocks. No
           | wrangling with rust's frankly awful syntax for following raw
           | pointers. No stressing about whether or not a future version
           | of rust will change some subtle invariant you're accidentally
           | depending on, or worrying about whether you need to use
           | MaybeUninit or something like that. I think the chance of making a
           | mistake while interacting with unsafe code is far higher than
           | the chance of misusing an array index. And the impact is
           | usually worse.
           | 
           | The author details running into exactly that problem while
           | coding - since they assumed memory allocated by vec would be
           | pinned (it isn't). And the program they ended up with still
           | doesn't use pin, even though they depend on the memory being
           | pinned. That's cause for far more concern than a simple array
           | index.
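The distinction being drawn here can be checked directly. In the Rust sketch below, with hypothetical function names, a `Vec` that starts with capacity 1 is forced to reallocate as it grows. With `Vec<Box<u64>>`, the `Box` pointers are moved into the new buffer, but the heap values they point to stay where they are. With `Vec<u64>`, the elements themselves live in the buffer and may move on reallocation, so nothing can rely on their addresses. The stability of boxed values comes from how `Box` allocates; it is not a `Pin` guarantee, and safe code can still move a value out of its box.

```rust
/// Grow a Vec<Box<u64>> from capacity 1 (forcing reallocations) and report whether
/// the first boxed value's heap address ever changed. The boxes move with the Vec's
/// buffer, but the values they point to do not.
pub fn boxed_address_is_stable(n: usize) -> bool {
    let mut v: Vec<Box<u64>> = Vec::with_capacity(1);
    v.push(Box::new(0));
    let first: *const u64 = &*v[0];
    for i in 1..n as u64 {
        v.push(Box::new(i));
        if !std::ptr::eq(&*v[0], first) {
            return false;
        }
    }
    true
}

/// The same experiment with unboxed elements. Growth may relocate every element
/// (a realloc can sometimes extend in place), so the result is not guaranteed
/// either way and code must not depend on it.
pub fn inline_address_is_stable(n: usize) -> bool {
    let mut v: Vec<u64> = Vec::with_capacity(1);
    v.push(0);
    let first: *const u64 = &v[0];
    for i in 1..n as u64 {
        v.push(i);
        if !std::ptr::eq(&v[0], first) {
            return false;
        }
    }
    true
}
```

This is exactly why the index-based approach sidesteps the question: an index stays valid across reallocation regardless of where the elements end up.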
        
       | javierhonduco wrote:
       | This is pretty cool work.
       | 
       | Something that's been on my mind recently is that there's a need
       | for a high-performance flame graph library for the web.
       | Unfortunately, the most popular flame graph libraries/components,
       | basically the React and d3 ones, work fine, but their authors no
       | longer actively maintain them and their performance with large
       | profiles is quite poor.
       | 
       | Most people that care about performance either hard-fork the
       | Firefox profiler / speedscope flame graph component or create
       | their own.
       | 
       | Would be nice to have a reusable, high performance flame graph
       | for web platforms.
        
       | tantalor wrote:
       | It's very funny they would call out poor performance of KDAB's
       | Hotspot, a performance analysis app.
        
       | Scene_Cast2 wrote:
       | I recently tried to profile some Rust code, and I realized that
       | the profiling toolchain is underdeveloped across the board:
       | "perf", the recommended profiler, isn't cross-platform (and I
       | didn't find any profilers that "just work"); visualizing traces
       | from a multi-threaded app is not fun; there isn't an IDE plugin
       | to highlight the problematic lines; etc.
        
       | bombela wrote:
       | This article reads like it was generously padded by AI.
        
         | creatonez wrote:
         | Except it very obviously was not. Is this accusation going to
         | come up in every single HN thread?
        
       | audidude wrote:
       | As someone who went down this path many years ago, I think the
       | GTK numbers in the article are a bit misleading. You wouldn't
       | create 1000 buttons to do a flamegraph properly in GTK.
       | 
       | Sysprof uses a single widget for the flamegraph, which means I
       | can browse recordings in the GB size range with less than 150 MB
       | resident. It really comes down to how much data gets symbolized
       | at load time, as the captures themselves are mmap'able. In
       | nominal cases, Sysprof even calculates those and appends them
       | after the capture phase stops so they can be mmap'd too.
       | 
       | That just leaves the augmented n-ary tree keyed by instruction
       | pointer converted to a string key, which naturally
       | deduplicates/compresses.
       | 
       | The biggest chunk of memory consumed is GPU shaders.
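Here is a minimal Rust sketch of why a call tree keyed per frame naturally deduplicates. Each sampled stack is inserted root-first, and frames that share a key under the same parent share a node. A hot path sampled thousands of times is therefore stored once, with a larger counter. For simplicity, this sketch keys nodes by function-name strings rather than by instruction pointers converted to strings as Sysprof reportedly does, and it uses a `BTreeMap` per node. `CallNode` and `add_stack` are hypothetical names, not Sysprof's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical call-tree node: `total` counts samples passing through this frame.
#[derive(Default)]
pub struct CallNode {
    pub total: u64,
    pub children: BTreeMap<String, CallNode>,
}

impl CallNode {
    /// Insert one sampled stack, ordered root-first. A frame whose key already
    /// exists under the current parent reuses that node, so repeated call paths
    /// are stored once and only their counters grow.
    pub fn add_stack(&mut self, stack: &[&str]) {
        self.total += 1;
        let mut node = self;
        for frame in stack {
            node = node.children.entry(frame.to_string()).or_default();
            node.total += 1;
        }
    }

    /// Number of nodes in this subtree, including `self`.
    pub fn node_count(&self) -> usize {
        1 + self.children.values().map(CallNode::node_count).sum::<usize>()
    }
}
```

Memory grows with the number of distinct call paths, not with the number of samples. That is why a multi-GB capture can collapse into a tree small enough to browse interactively.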
        
       ___________________________________________________________________
       (page generated 2024-12-26 23:02 UTC)