___________________________________________________________________
Tracy: A real time, nanosecond resolution frame profiler
Author : Flex247A
Score : 176 points
Date : 2024-09-24 02:57 UTC (20 hours ago)
(HTM) web link (github.com)
| Flex247A wrote:
| I am a beginner in graphics programming, and I came across this
| amazing frame profiler.
|
| Web demo of Tracy: https://tracy.nereid.pl/
|
| This blows my mind. It's faster and more responsive than I ever
| expected a WebAssembly application to be!
| vblanco wrote:
| A truly incredible profiler for the great price of free. Nothing
| comes close to this level of features and performance, even among
| paid software. Tracy could cost thousands of dollars a year and
| would still be the best profiler.
|
| Tracy requires you to add macros to your codebase to log
| functions/scopes, so it's not an automatic sampling profiler like
| Superluminal, Very Sleepy, the VS profiler, or others. Each of
| those macros has around 50 nanoseconds of overhead, so you can
| liberally use them in the millions. In the UI, a statistics
| window records the average, deviation, and min/max of those
| profiler zones, which can be used to profile functions at the
| level of single nanoseconds.
|
| It's the main tool I use for all my profiling and optimization
| work. I combine it with Superluminal (a sampling profiler) to get
| a high-level overview of the program, then I put Tracy zones in
| the important places to get detailed information.
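As a rough sketch of what an instrumentation zone does, the C++ below times a scope with RAII and keeps the running count, mean, standard deviation, and min/max that the statistics window is described as showing. It is not Tracy's implementation: Tracy's own macros (such as `ZoneScoped` from `tracy/Tracy.hpp`) send events to the separate profiler application rather than aggregating in-process. `ScopedZone`, `ZoneStats`, `zoneRegistry`, and `SKETCH_ZONE_SCOPED` are hypothetical names, the sketch is single-threaded, and `std::chrono::steady_clock` is an assumed time source.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>

// Running statistics for one named zone, in nanoseconds.
// Welford's online update keeps mean and variance numerically stable.
struct ZoneStats {
    std::uint64_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean
    std::int64_t minNs = std::numeric_limits<std::int64_t>::max();
    std::int64_t maxNs = std::numeric_limits<std::int64_t>::min();

    void add(std::int64_t ns) {
        ++count;
        const double x = static_cast<double>(ns);
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
        minNs = std::min(minNs, ns);
        maxNs = std::max(maxNs, ns);
    }

    // Sample standard deviation; 0 until there are at least two samples.
    double stddev() const {
        return count > 1 ? std::sqrt(m2 / static_cast<double>(count - 1)) : 0.0;
    }
};

// Hypothetical per-process registry of zone statistics (single-threaded sketch).
inline std::unordered_map<std::string, ZoneStats>& zoneRegistry() {
    static std::unordered_map<std::string, ZoneStats> registry;
    return registry;
}

// RAII zone: measures from construction to destruction and records the duration.
class ScopedZone {
public:
    explicit ScopedZone(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedZone() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        zoneRegistry()[name_].add(static_cast<std::int64_t>(ns));
    }

    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical convenience macro: names the zone after the enclosing function.
#define SKETCH_ZONE_SCOPED ScopedZone sketchZone_(__func__)
```

Usage is a single line at the top of a function, e.g. `void update() { SKETCH_ZONE_SCOPED; /* ... */ }`. With the ~50 ns per-zone cost quoted above, a million zones adds roughly 50 ms (10^6 × 50 ns), which gives a sense of the budget when instrumenting hot code.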
| Flex247A wrote:
| Hello! I've been going through your tutorial and it's been a great ride!
|
| Thanks for the good work.
| eagle2com wrote:
| Doesn't Tracy have sampling capability as well? I remember using
| it at some point, even if it was finicky to set up because of
| Windows.
| vblanco wrote:
| It does, but I don't use it much because it's too slow and heavy
| on memory on my Ryzen 5950X (32 threads) on Windows. A couple of
| seconds of tracing takes up tens of gigabytes of RAM.
| forrestthewoods wrote:
| Yeah I had issues with the Tracy sampler. It didn't "just
| work" the way Superluminal did.
|
| My only issue with Superluminal is that I can't get proper
| call stacks for interpreted languages like Python. It treats
| all the CPP call stacks as the same. Not sure whether Tracy
| can handle that nicely or not...
| forrestthewoods wrote:
| Tracy and Superluminal are the way. Both are so good.
| cwbaker400 wrote:
| Tracy is brilliant. @wolfpld I hope you're enjoying reading this
| and all of the other great comments in this thread. Great work
| and thank you very very much!
| drpossum wrote:
| Can someone explain how this achieves nanosecond resolution?
| That's an extremely difficult target to reach on computing
| hardware due to inherent clock resolutions and interrupt timing.
| vardump wrote:
| On x86/AMD64 it uses the CPU's TSC (time stamp counter).
|
| https://github.com/wolfpld/tracy/blob/master/public/client/T...
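Reading the TSC directly is x86-specific. A portable way to check what resolution a clock actually delivers is to read it back-to-back until the value changes and keep the smallest step. The sketch below does that for `std::chrono::steady_clock`, which may or may not be TSC-backed depending on the platform; `observedClockStepNs` and `nominalClockPeriodNs` are hypothetical helper names, not Tracy code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Smallest nonzero step seen between consecutive steady_clock readings, in ns.
// Each step includes the cost of calling now(), so this is the effective
// granularity visible from code, not necessarily the hardware counter's tick.
std::int64_t observedClockStepNs(int samples = 1000) {
    using Clock = std::chrono::steady_clock;
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < samples; ++i) {
        const auto t0 = Clock::now();
        auto t1 = Clock::now();
        while (t1 == t0) {
            t1 = Clock::now();  // spin until the clock visibly advances
        }
        const std::int64_t step =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        best = std::min(best, step);
    }
    return best;
}

// Tick period the clock type advertises, in ns. This is a property of the
// type (its std::ratio), not of how often the underlying counter updates.
double nominalClockPeriodNs() {
    using Period = std::chrono::steady_clock::period;
    return 1e9 * static_cast<double>(Period::num) / static_cast<double>(Period::den);
}
```

Comparing the two numbers is the point: a clock type can advertise nanosecond units while the smallest step you can actually observe is considerably larger.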
| simonask wrote:
| There are several sources of timing information, and I think in
| this context "nanosecond precision" just means that Tracy is
| able to accurately represent and handle input in nanoseconds.
|
| The resolution of the actual measurements depends on the kind
| of measurement:
|
| 1. If the measurement is based on high-resolution timers on the
| CPU, the resolution depends on the hardware and the OS. On
| Windows, `QueryPerformanceFrequency()` returns the counter
| frequency in ticks per second, so the resolution is its
| reciprocal, and I believe it is often on the order of tens or
| hundreds of nanoseconds.
|
| 2. If the measurement is based on GPU-side performance
| counters, it depends on the driver and the hardware. Graphics
| APIs allow you to query the "time-per-tick" value to translate
| from performance counters to nanoseconds. Performance counters
| can be as fine-grained as "number of instructions executed", and
| since a single instruction can take on the order of 1-2
| nanoseconds in some cases, translating a performance counter
| value to a time period requires nanosecond precision.
|
| 3. Modern GPUs also include their own high-precision timers for
| profiling things that are not necessarily easy to capture with
| performance counters (like barriers, contention, and
| complicated cache interactions).
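For the first case, converting raw counter ticks to nanoseconds from a frequency in ticks per second (the value `QueryPerformanceFrequency()` reports on Windows) is easy to get subtly wrong: `ticks * 1'000'000'000 / freq` overflows a 64-bit unsigned integer once `ticks` passes about 1.8 × 10^10, which for a 10 MHz counter is roughly half an hour. The sketch below splits the count into whole seconds and a remainder. It is portable C++ with hypothetical names (`ticksToNs`, `tickPeriodNs`) and does not call any Windows API.

```cpp
#include <cassert>
#include <cstdint>

// Convert a raw tick count to nanoseconds given the counter frequency (ticks per second).
// Splitting into whole seconds and a remainder avoids overflowing ticks * 1e9.
std::uint64_t ticksToNs(std::uint64_t ticks, std::uint64_t freqHz) {
    constexpr std::uint64_t kNsPerSec = 1'000'000'000ULL;
    const std::uint64_t whole = ticks / freqHz;  // complete seconds
    const std::uint64_t rem = ticks % freqHz;    // leftover ticks, always < freqHz
    // rem * kNsPerSec fits in 64 bits as long as freqHz is below ~1.8e10 (18 GHz).
    return whole * kNsPerSec + rem * kNsPerSec / freqHz;
}

// Duration of one tick in nanoseconds, i.e. the counter's nominal resolution.
double tickPeriodNs(std::uint64_t freqHz) {
    return 1e9 / static_cast<double>(freqHz);
}
```

For example, a 10 MHz counter has a 100 ns tick (`tickPeriodNs(10'000'000)`), while a 3 GHz counter ticks roughly every 0.33 ns; that is the granularity the hardware allows, independent of how precisely a tool can represent the result.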
| drpossum wrote:
| Yes, that's my understanding and why I asked. I disagree about
| "in this context", though: the title is a pitch. If I were going
| to buy hardware that claimed ns resolution for something I was
| building, I would expect 1 ns resolution, not "something around a
| few ns" and not with the qualifier "only on particular hardware".
| A product that presented itself straightforwardly for comparison
| with similar products, and respected its potential users, would
| say "resolutions down to a few ns" or something more specific but
| accurate.
|
| There was even a discussion on this not long ago about how to
| market to technical folks and what not to do (this is one of the
| things not to do).
|
| https://www.bly.com/Pages/documents/STIKFS.html
|
| https://news.ycombinator.com/item?id=41368583
| Galanwe wrote:
| It reaches nanoseconds only in the sense that its sampling
| profiler can report nanosecond resolution. I've tried the event
| profiler on microsecond-sensitive projects, though, and it blows
| up the timings and latency even at low event frequency.
| vardump wrote:
| I think it's mostly due to Tracy's poor timing calibration
| code. The TSC itself is good in terms of accuracy and latency.
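Calibration here means working out how many counter ticks correspond to a nanosecond. One simple approach, sketched below under the assumption of an invariant TSC on a single socket, samples the counter and a reference clock at the start and end of a wait and takes the ratio of the two deltas. Each endpoint read carries its own error, so short windows give a less accurate ratio, and any error in the ratio scales every converted duration. This is not Tracy's calibration code: `readCounter`, `referenceNs`, `calibrateTicksPerNs`, and `counterDeltaToNs` are hypothetical, and on non-x86 targets (or compilers that do not define `__x86_64__`/`__i386__`) the sketch falls back to `steady_clock`'s own ticks so that it still compiles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Raw counter: the TSC on x86 (GCC/Clang), otherwise steady_clock's native ticks as a stand-in.
inline std::uint64_t readCounter() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Reference clock in nanoseconds.
inline std::int64_t referenceNs() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
}

// Estimate counter ticks per nanosecond over a wait of length `window`.
// Longer windows shrink the relative error contributed by each endpoint read.
double calibrateTicksPerNs(std::chrono::milliseconds window) {
    const std::uint64_t c0 = readCounter();
    const std::int64_t r0 = referenceNs();
    std::this_thread::sleep_for(window);
    const std::uint64_t c1 = readCounter();
    const std::int64_t r1 = referenceNs();
    return static_cast<double>(c1 - c0) / static_cast<double>(r1 - r0);
}

// Convert a counter delta to nanoseconds using a previously calibrated ratio.
inline double counterDeltaToNs(std::uint64_t delta, double ticksPerNs) {
    return static_cast<double>(delta) / ticksPerNs;
}
```

`__rdtsc()` is not a serializing instruction, so a production version would also need to consider instruction reordering around the reads.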
| Galanwe wrote:
| The problem is not the timestamping, it's the queue used to
| push profiling events, which is not fast enough.
| vardump wrote:
| That queue is about as fast as it gets, <10 ns. The timestamp
| is taken before queueing.
|
| Again, due to the bad calibration code, the measured
| timestamps have quite a bit of jitter.
|
| Edit: The TSC might not be synchronized across sockets in
| multi-socket systems (multiple physical CPUs), which can
| introduce a large error.
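The ordering point (timestamp first, then enqueue) can be illustrated with a minimal bounded single-producer/single-consumer ring buffer: the time a `push` takes delays when the consumer sees the event, but it is not included in the recorded timestamp. This is a generic sketch, not Tracy's queue; `Event`, `SpscQueue`, `nowNs`, and `emitZoneBegin` are hypothetical, and it assumes exactly one producer thread and one consumer thread per queue.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>

struct Event {
    std::uint32_t zoneId;
    std::int64_t timestampNs;  // captured by the producer before enqueueing
};

// Bounded single-producer/single-consumer ring buffer. Capacity must be a power of two.
template <std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a nonzero power of two");

public:
    bool push(const Event& e) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full: caller drops or retries
        buf_[head & (Capacity - 1)] = e;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    std::optional<Event> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;  // empty
        Event e = buf_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return e;
    }

private:
    std::array<Event, Capacity> buf_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};

inline std::int64_t nowNs() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
}

// Producer side of a zone begin: take the timestamp first, then enqueue.
template <std::size_t N>
bool emitZoneBegin(SpscQueue<N>& q, std::uint32_t zoneId) {
    const std::int64_t ts = nowNs();  // the recorded time excludes the queue cost
    return q.push(Event{zoneId, ts});
}
```

A real implementation would also place `head_` and `tail_` on separate cache lines to avoid false sharing between the producer and consumer threads.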
| boywitharupee wrote:
| Can someone explain how profiling tools like this are written for
| GPU applications? Wouldn't you need access to the internal runtime
| API?
|
| For example, Apple wraps Metal buffers as "Debug" buffers to
| record allocations/deallocations.
| ossobuco wrote:
| I don't know about Tracy, but I've seen a couple of WebGPU JS
| debugging tools that simply intercept calls to the various WebGPU
| functions like writeBuffer, draw, etc., by modifying the
| prototypes of Device, Queue and so on[0].
|
| - 0: https://github.com/brendan-duncan/webgpu_inspector/blob/main...
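The same interception idea carries over to native APIs that are called through a table of function pointers: a tool saves the original entries, substitutes wrappers that record each call, and forwards to the saved originals. The C++ below is a self-contained illustration with a made-up API; `GpuApi`, `driver`, and `inspector` are hypothetical and do not correspond to WebGPU, Vulkan, or Metal types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical API exposed as a table of function pointers (names are illustrative only).
struct GpuApi {
    void (*writeBuffer)(std::uint32_t buffer, std::size_t bytes);
    void (*draw)(std::uint32_t vertexCount);
};

// Stand-in "driver" implementations that the application would normally call.
namespace driver {
inline std::size_t bytesWritten = 0;
inline void writeBuffer(std::uint32_t, std::size_t bytes) { bytesWritten += bytes; }
inline void draw(std::uint32_t) {}
}  // namespace driver

// Interception layer: remembers the original table and logs each call before forwarding.
namespace inspector {
inline GpuApi original{};
inline std::vector<std::string> callLog;

inline void writeBuffer(std::uint32_t buffer, std::size_t bytes) {
    callLog.push_back("writeBuffer(" + std::to_string(buffer) + ", " +
                      std::to_string(bytes) + ")");
    original.writeBuffer(buffer, bytes);
}

inline void draw(std::uint32_t vertexCount) {
    callLog.push_back("draw(" + std::to_string(vertexCount) + ")");
    original.draw(vertexCount);
}

// Swap the application's table entries for the logging wrappers.
inline void install(GpuApi& api) {
    original = api;
    api.writeBuffer = &inspector::writeBuffer;
    api.draw = &inspector::draw;
}
}  // namespace inspector
```

After `inspector::install(api)`, application code keeps calling `api.draw(...)` unchanged, while every call is logged and still reaches the original implementation.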
| MindSpunk wrote:
| Some graphics APIs support commands that tell the GPU to record
| a timestamp when it gets to processing the command. This is
| oversimplified, but it is essentially what you ask the GPU to do.
| There are lots of hardware gotchas that make this more difficult
| in practice, since a GPU won't always execute and complete work
| exactly as you specify at the API level if it's safe not to. And
| the timestamp domain isn't always the same as the CPU's.
|
| But in principle it's not that different from how you grab
| timestamps on the CPU. In Vulkan, the feature is called
| "timestamp queries".
|
| It's quite tricky on tile-based renderers like Arm/Qualcomm/Apple
| GPUs, as they can't provide meaningful timestamps at a much finer
| granularity than a whole render pass. I believe Metal only allows
| you to query timestamps at the encoder level, which roughly maps
| to a render pass in Vulkan (at the hardware level, anyway).
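Once two raw timestamp values have been read back (in Vulkan, via `vkGetQueryPoolResults`), turning them into a duration takes two device properties: `timestampValidBits` on the queue family, which says how many low bits are meaningful, and `timestampPeriod` in the physical-device limits, the number of nanoseconds per tick. The sketch below takes those as plain numbers so it needs no Vulkan headers; `gpuElapsedNs` is a hypothetical helper, and it assumes both timestamps are in the same domain and at most one counter wrap apart. It does nothing about the tile-based caveats above.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed GPU time in ns between two raw timestamp-query values.
// `validBits` plays the role of VkQueueFamilyProperties::timestampValidBits and
// `periodNs` the role of VkPhysicalDeviceLimits::timestampPeriod (ns per tick).
double gpuElapsedNs(std::uint64_t begin, std::uint64_t end,
                    std::uint32_t validBits, float periodNs) {
    assert(validBits > 0 && "timestamps unsupported on this queue family");
    const std::uint64_t mask =
        validBits >= 64 ? ~0ULL : ((1ULL << validBits) - 1);
    // Only the low validBits are meaningful; unsigned subtraction followed by
    // masking yields the correct tick count across a single wrap.
    const std::uint64_t ticks = ((end & mask) - (begin & mask)) & mask;
    return static_cast<double>(ticks) * static_cast<double>(periodNs);
}
```

For example, with 36 valid bits a begin value just below 2^36 and a small end value still yield a short positive duration rather than a huge or negative one.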
| gcr wrote:
| My favorite FOSS video game, Dr. Robotnik's Ring Racers
| (http://kartkrew.org), has Tracy support! It's not compiled into
| the default build. I've learned a lot reading the code.
| mastax wrote:
| I just started using this yesterday, it looks really good.
| Haven't really dug into it.
|
| Is the latest Windows build broken for anyone else? It doesn't
| start. In WinDbg it looks like it dereferences a null pointer. I
| built it myself and it works fine.
| mastax wrote:
| Of course I come into work the next day and now I can't run my
| custom built one either...
| mastax wrote:
| https://github.com/wolfpld/tracy/issues/887
|
| MSVC changed the mutex constructor to constexpr, breaking
| binary backward compatibility. They say WONTFIX: you must use
| the latest MSVCRT with the latest MSVC. But I have the latest
| MSVCRT installed? Whatever; a workaround was pushed to
| master yesterday.
| mastax wrote:
| This article is a good quick introduction to Tracy:
| https://luxeengine.com/integrating-tracy-profiler-in-cpp/
| Green-Man wrote:
| Does anybody have an opinion on, or a comparison with,
| easy_profiler?
|
| https://github.com/yse/easy_profiler
|
| I'm especially interested in opinions based on real, practical
| experience.
| throwawaymaths wrote:
| Can anyone with experience offer pointers for arguing Tracy vs.
| Perfetto? My team uses Perfetto and I strongly suspect we are
| running into artefacts because of it.
___________________________________________________________________
(page generated 2024-09-24 23:01 UTC)