[HN Gopher] AMD GPU Debugger
___________________________________________________________________
AMD GPU Debugger
Author : ibobev
Score : 186 points
Date : 2025-12-08 16:06 UTC (6 hours ago)
(HTM) web link (thegeeko.me)
(TXT) w3m dump (thegeeko.me)
| snarfy wrote:
| Is there not an official tool from AMD?
| c2h5oh wrote:
| GDB supports it
| https://sourceware.org/gdb/current/onlinedocs/gdb.html/AMD-G...
|
| You also get UMR from AMD
| https://gitlab.freedesktop.org/tomstdenis/umr
|
| There is also a bunch of other tools provided:
| https://gpuopen.com/radeon-gpu-detective/
| https://gpuopen.com/news/introducing-radeon-developer-tool-s...
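A typical rocgdb session looks much like plain gdb. This is an illustrative sketch (file, binary, and kernel names are made up, not from the thread); it assumes a ROCm install with hipcc and rocgdb on PATH:

```shell
# Build with device-side DWARF debug info, then attach rocgdb.
hipcc -g -O0 saxpy.hip -o saxpy    # -g emits DWARF for the GPU kernels too
rocgdb ./saxpy
# Inside the debugger (commands from the rocgdb docs):
#   (gdb) break saxpy_kernel       # breakpoint inside the GPU kernel
#   (gdb) run
#   (gdb) info agents              # list the GPU agents
#   (gdb) info threads             # each wave appears as a thread
```
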
| slavik81 wrote:
| It's worth noting that upstream gdb (and clang) are somewhat
| limited in GPU debugging support because they only use (and
| emit) standardized DWARF debug information. The DWARF
| standard will need updates before gdb and clang can reach
| parity with the AMD forks, rocgdb and amdclang, in terms of
| debugging support. It's nothing fundamental, but the AMD
| forks use experimental DWARF features and the upstream
| projects do not.
|
| It's a little out of date now, but Lance Six had a
| presentation about the state of AMD GPU debugging in upstream
| gdb at FOSDEM 2024.
| https://archive.fosdem.org/2024/events/attachments/fosdem-20...

| thegeeko wrote:
| amd gdb is an actual debugger but it only works with
| applications that emit DWARF and use the amdkfd KMD, aka it
| doesn't work with graphics .. all of the rest are not actual
| debuggers .. UMR does support wave stepping but it doesn't try
| to be a shader debugger, rather a tool for driver developers,
| and the AMD tools don't have any debugging capabilities.
| almostgotcaught wrote:
| > After searching for solutions, I came across rocgdb, a
| debugger for AMD's ROCm environment.
|
| It's like the 3rd sentence in the blog post.......
| djmips wrote:
| to be fair it wasn't clear that was an official AMD debugger
| and besides that's only for debugging ROCm applications.
| almostgotcaught wrote:
| this sentence doesn't make any sense: a) ROCm is an AMD
| product, b) ROCm "applications" are GPU "applications".
| fc417fc802 wrote:
| But not all GPU applications are ROCm applications (I
| would think).
|
| I can certainly understand OP's confusion. Navigating
| parts of the GPU ecosystem that are new to you can be
| incredibly confusing.
| thegeeko wrote:
| there's 2 AMD KMD(kernel mode drivers) in linux: amdkfd
| and amdgpu .. the graphics applications use the amdgpu
| which is not supported by amdgdb .. amdgdb also has the
| limitation of requiring dwarf and and mesa/amd UMDs
| doesn't generate that ..
| shetaye wrote:
| There also exists cuda-gdb[1], a first-party GDB for NVIDIA's
| CUDA. I've found it to be pretty good. Since CUDA uses a
| threading model, it works well with the GDB thread ergonomics
| (though you can only single-step at the warp granularity IIRC by
| the nature of SM execution).
|
| [1] https://docs.nvidia.com/cuda/cuda-gdb/index.html
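The warp-granularity stepping mentioned above shows up directly in a cuda-gdb session. A sketch (file and kernel names are illustrative; it assumes the CUDA toolkit is installed):

```shell
# Build with device-side debug info (-G), then debug with cuda-gdb.
nvcc -g -G vec_add.cu -o vec_add
cuda-gdb ./vec_add
# Inside the debugger (commands from the cuda-gdb docs):
#   (cuda-gdb) break vec_add_kernel
#   (cuda-gdb) run
#   (cuda-gdb) info cuda kernels      # list resident kernels
#   (cuda-gdb) cuda thread (1,0,0)    # switch focus to another thread
#   (cuda-gdb) next                   # steps the focused warp in lockstep
```
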
| danjl wrote:
| For NVIDIA cards, you can use Nsight. There's also RenderDoc,
| which works on a large number of GPUs.
| _zoltan_ wrote:
| nsys and nvtx are awesome.
|
| many don't know but you can use them without GPUs :)
| whalesalad wrote:
| Tangent: is anyone using a 7900 XTX for local
| inference/diffusion? I finally installed Linux on my gaming pc,
| and about 95% of the time it is just sitting off collecting dust.
| I would love to put this card to work in some capacity.
| qskousen wrote:
| I've done it with a 6800XT, which should be similar. It's a
| little trickier than with an Nvidia card (because everything is
| designed for CUDA) but doable.
| FuriouslyAdrift wrote:
| You'd be much better off with any decent nVidia card than the
| 7900 series.
|
| AMD doesn't have a unified architecture across GPU and compute
| like nVidia.
|
| AMD compute cards are sold under the Instinct line and are
| vastly more powerful than their GPUs.
|
| Supposedly, they are moving back to a unified architecture in
| the next generation of GPU cards.
| Joona wrote:
| I tested some image and text generation models, and generally
| things just worked after replacing the default torch libraries
| with AMD's rocm variants.
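The torch-library swap described above usually amounts to installing from PyTorch's ROCm wheel index. A sketch; the index URL pattern is real, but the ROCm version in it is an assumption, so check pytorch.org for the current one:

```shell
# Remove the default (CUDA/CPU) builds, then install the ROCm builds.
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/rocm6.2

# On a working ROCm setup, torch reports the GPU through the CUDA API:
python -c "import torch; print(torch.cuda.is_available())"
```
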
| universa1 wrote:
| try it with ramalama[1]. worked fine here with a 7840u and a
| 6900xt.
|
| [1] https://ramalama.ai/
| Gracana wrote:
| I bought one when they were pretty new and I had issues with
| rocm (iirc I was getting kernel oopses due to GPU OOMs) when
| running LLMs. It worked mostly fine with ComfyUI unless I tried
| to do especially esoteric stuff. From what I've heard lately
| though, it should work just fine.
| jjmarr wrote:
| I've been using it for a few years on Gentoo. There were
| challenges with Python 2 years ago, but over the past year it's
| stabilized and I can even do img2video which is the most
| difficult local inference task so far.
|
| Performance-wise, the 7900 xtx is still the most cost effective
| way of getting 24 gigabytes that isn't a sketchy VRAM mod. And
| VRAM is the main performance barrier since any LLM is going to
| _barely_ fit in memory.
|
| Highly suggest checking out TheRock. There's been a big
| rearchitecting of ROCm to improve the UX/quality.
| veddan wrote:
| For LLMs, I just pulled the latest llama.cpp and built it.
| Haven't had any issues with it. This was quite recently though,
| things used to be a lot worse, as I understand it.
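For reference, a llama.cpp HIP build along these lines (flag names follow the current llama.cpp build docs; the gfx1100 target assumes a 7900-series card, and the model path is a placeholder):

```shell
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build \
      -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100
cmake --build llama.cpp/build --config Release -j

# -ngl 99 offloads all layers to the GPU.
llama.cpp/build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello"
```
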
| mitchellh wrote:
| Non-AMD, but Metal actually has a [relatively] excellent debugger
| and general dev tooling. It's why I prefer to do all my GPU work
| Metal-first and then adapt/port to other systems after that:
| https://developer.apple.com/documentation/Xcode/Metal-debugg...
|
| I'm not like a AAA game developer or anything so I don't know how
| it holds up in intense 3D environments, but for my use cases it's
| been absolutely amazing. To the point where I recommend people
| who are dabbling in GPU work grab a Mac (Apple Silicon often
| required) since it's such a better learning and experimentation
| environment.
|
| I'm sure it's linked somewhere there, but in addition to
| traditional debugging, you can actually emit formatted log
| strings from your shaders and they show up interleaved with your
| app logs. Absolutely bonkers.
|
| The app I develop is GPU-powered on both Metal and OpenGL systems
| and I haven't been able to find anything that comes near the
| quality of Metal's tooling in the OpenGL world. A lot of stuff
| people claim is equivalent, but as someone who has actively
| used both, I strongly feel it doesn't hold a candle to what
| Apple has done.
| mattbee wrote:
| My initiation into shaders was porting some graphics code from
| OpenGL on Windows to PS5 and Xbox, and (for your NDA and devkit
| fees) they give you some very nice debuggers on both platforms.
|
| But yes, when you're stumbling around a black screen, tooling
| is everything. Porting bits of shader code between syntaxes is
| the easy bit.
|
| Can you get better tooling on Windows if you stick to DirectX
| rather than OpenGL?
| mitchellh wrote:
| > Can you get better tooling on Windows if you stick to
| DirectX rather than OpenGL?
|
| My app doesn't currently support Windows. My plan was to use
| the full DirectX suite when I get there and go straight to
| D3D and friends. I have no experience at all on Windows, so
| I'd love it if someone who knows both macOS and Windows could
| compare GPU debugging!
| speps wrote:
| Windows has PIX for Windows; PIX has been the name of the GPU
| debugger since the Xbox 360. The Windows version is similar,
| but it relies on debug layers that need to be GPU specific,
| which is usually handled automatically. Because of that it's
| not as deep as the console version, but it lets you get by.
| Most people use RenderDoc on supported platforms though (Linux
| and Windows). It supports most APIs you can find on these
| platforms.
| billti wrote:
| It's a full featured and beautifully designed experience, and
| when it works it's amazing. However, it regularly freezes or
| hangs for me, and I've lost count of the number of times I've
| had to 'force quit' Xcode or it's just outright crashed. Also,
| for anything non-trivial it often refuses to profile and I have
| to try to write a minimal repro to get it to capture anything.
|
| I am writing compute shaders though, where one command buffer
| can run for seconds repeatedly processing over a 1GB buffer,
| and it seems the tools are heavily geared towards graphics work
| where the workload per frame is much lighter. (With all the AI
| focus, hopefully they'll start addressing this use-case more.)
| mitchellh wrote:
| > However, it regularly freezes or hangs for me, and I've lost
| count of the number of times I've had to 'force quit' Xcode
| or it's just outright crashed.
|
| This has been my experience too. It isn't often enough to
| diminish its value for me since I have basically no
| comparable options on other platforms, but it definitely has
| some sharp (crashy!) edges.
| hoppp wrote:
| Is your code easy to transfer to other environments? Apple's
| vendor lock-in is not a great place for development if the end
| product runs on servers, unlike AMD GPUs, which can be found
| on the backend. The same goes for games, because most gamers
| have either an AMD or an Nvidia graphics card and playing on
| Mac is still rare, so the priority should be supporting those
| platforms.
|
| It's probably awesome to use Metal and everything, but the
| vendor lock-in sounds like an issue.
| mitchellh wrote:
| It has been easy. All modern GPU APIs are basically the same
| now unless you're relying on the most cutting edge features.
| I've found that converting between MSL, OpenGL (4.3+), and
| WebGPU to be trivial. Also, LLMs are pretty good at it on
| first pass.
| hoppp wrote:
| That's pretty cool then!
___________________________________________________________________
(page generated 2025-12-08 23:00 UTC)