[HN Gopher] What is NGG and shader culling on AMD RDNA GPUs?
___________________________________________________________________
What is NGG and shader culling on AMD RDNA GPUs?
Author : pantalaimon
Score : 55 points
Date : 2022-07-13 07:57 UTC (2 days ago)
(HTM) web link (timur.hu)
(TXT) w3m dump (timur.hu)
| dragontamer wrote:
| A few notes.
|
 | 1. Rasterization is, conceptually at least, the painter's
 | algorithm. You paint the "backmost" triangle first, then paint
 | the triangles "on top" afterwards. As long as you paint from
 | furthest back to front, you will get the right image. The
 | textbook algorithm sorts all triangles from back to front, but
 | an O(n*log(n)) sort is obviously too slow for the 100 million
 | triangles per frame at 60 frames per second that video games
 | demand.
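The back-to-front ordering in point 1 can be sketched in a few lines of Python (a toy illustration only; the `Triangle` type and the depth values are made up, not from the post):

```python
# Painter's algorithm sketch: sort triangles farthest-first, then draw
# them in that order so nearer triangles overwrite farther ones.
from dataclasses import dataclass

@dataclass
class Triangle:
    depth: float   # distance from the camera (larger = farther away)
    color: str

def painters_order(triangles):
    # The O(n log n) sort the comment calls out as too slow at
    # 100M triangles / 60 fps.
    return sorted(triangles, key=lambda t: t.depth, reverse=True)

scene = [Triangle(1.0, "near"), Triangle(9.0, "far"), Triangle(4.0, "mid")]
print([t.color for t in painters_order(scene)])  # ['far', 'mid', 'near']
```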
|
 | 2. Culling modifies the #1 algorithm by removing triangles from
 | consideration. Traditionally, this is done by fixed-function
 | (ASIC) parts of a GPU (originally, the whole GPU was fixed-
 | function and non-programmable, and even just a few years ago,
 | hardware culling was not yet done in shaders). If Triangle #5
 | is completely covered by Triangle #200, then you can "optimize"
 | by never drawing #5 to begin with, and instead just drawing
 | Triangle #200. The GPU's hardware can detect cases like this
 | and automatically skip the drawing of Triangle #5.
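One concrete example of the fine-grained culling a GPU performs is back-face culling: a triangle whose screen-space winding is clockwise faces away from the camera and can be skipped before rasterization. A minimal sketch (the coordinates and the OpenGL-style counter-clockwise-is-front convention are assumptions for illustration, not from the post):

```python
# Back-face culling sketch in 2D screen space.
def signed_area(a, b, c):
    # Twice the signed area of triangle abc; the sign encodes winding.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def is_front_facing(a, b, c):
    # Counter-clockwise winding (positive area) = front-facing,
    # under the usual OpenGL-style convention.
    return signed_area(a, b, c) > 0

tris = [((0, 0), (1, 0), (0, 1)),   # CCW -> kept
        ((0, 0), (0, 1), (1, 0))]   # CW  -> culled
kept = [t for t in tris if is_front_facing(*t)]
print(len(kept))  # 1
```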
|
 | 3. Primitive Shaders / NGG and other features from Vega onwards
 | on AMD allow data to be passed between stages of the rendering
 | pipeline in new ways. This seems to enable *software* culling.
|
 | 4. It's not too hard to do software culling per se. What's hard
 | is to do software culling that is worthwhile (i.e., faster than
 | the hardware ASIC culling). The claims here suggest that
 | software culling is finally worthwhile thanks to these new
 | shader stages and new ways of passing data back and forth
 | between stages of the GPU. With these new datapaths, it is
 | possible to implement a software culler that matches (or
 | exceeds) the speed of the hardware culler.
|
 | 5. That's what "shader culling" is: software culling that's
 | faster than the ASIC paths of the GPU. Fully defined in
 | software, so you can make it more flexible and better tuned for
 | your specific video game than the hardware.
|
 | 6. Video games do often have CPU-side culling before sending
 | the data to the GPU. It's also culling, but in a different
 | context. I believe that "shader culling" implies the low-level,
 | fine-grained culling that the GPU hardware was expected to do,
 | rather than the coarse-grained "character X is on the wrong
 | side of the camera, so don't draw X or the 200,000 triangles
 | associated with X" culling that CPUs have always done.
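The coarse CPU-side culling described in point 6 is commonly a bounding-volume test done before submitting an object's triangles at all. A minimal sketch, where the plane, object names, and radii are invented for illustration:

```python
# Coarse CPU-side cull: test a whole object's bounding sphere against a
# camera plane before submitting its triangles to the GPU.
# The plane is (normal, d); normal points into the visible half-space.
def sphere_in_front_of_plane(center, radius, normal, d):
    # Signed distance from the sphere's center to the plane; if the
    # entire sphere is behind it (dist < -radius), skip the object.
    dist = sum(c * n for c, n in zip(center, normal)) + d
    return dist >= -radius

# Hypothetical objects: (name, center, radius), tested against z >= 0.
objects = [("hero", (0, 0, 5), 1.0), ("behind_camera", (0, 0, -10), 2.0)]
visible = [name for name, c, r in objects
           if sphere_in_front_of_plane(c, r, (0, 0, 1), 0.0)]
print(visible)  # ['hero']
```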
|
| 7. On #6's note, there's lots of kinds of culling done at many
| stages of any video game engine today, from software/CPU side all
| the way to the low level GPU stuff. Since this seems to be a low-
| level GPU driver post, you can assume that they're talking about
| low-level GPU hardware culling specifically.
|
| 8. I'm not actually a video game programmer. I just like
| researching / learning about this field.
| faragon wrote:
| Great explanation, thank you.
 | sylware wrote:
 | Anybody wanna make a simple C-coded SPIR-V assembler in Mesa,
 | to remove that horrible GLSL radix sort?
| Jasper_ wrote:
| SPIRV-Tools has an assembler...
| sylware wrote:
 | Not simple C-coded; as far as I know it is horrible C++.
| phkahler wrote:
 | When he says the main performance benefit is when games
 | over-tessellate, I'd say this is just an invitation for
 | everyone to increase tessellation and bump mapping for more
 | detail.
| Jasper_ wrote:
 | Well, bump mapping happens in the pixel shader, so that
 | wouldn't affect tri counts. The traditional way to fight "too
 | many triangles" is with LODs, but LODs aren't a perfect
 | solution, since triangle-to-screen density changes with camera
 | settings too.
|
| I wouldn't expect massive, massive gains from the new culling,
| as suggested. The instancing demo gets gains only because it
| didn't have any CPU-side frustum culling to begin with. But
| real games have frustum culling nowadays.
| rasz wrote:
| >invitation for everyone to increase tesselation
|
| that ship sailed with the introduction of Gameworks - an Nvidia
| program that paid game studios to deoptimize games on AMD
| hardware.
|
| https://techreport.com/review/21404/crysis-2-tessellation-to...
|
| https://wccftech.com/fight-nvidias-gameworks-continues-amd-c...
|
| "Number one: Nvidia Gameworks typically damages the performance
| on Nvidia hardware as well, which is a bit tragic really. It
| certainly feels like it's about reducing the performance, even
| on high-end graphics cards, so that people have to buy
| something new.
|
| "That's the consequence of it, whether it's intended or not -
| and I guess I can't read anyone's minds so I can't tell you
| what their intention is. But the consequence of it is it brings
| PCs to their knees when it's unnecessary. And if you look at
| Crysis 2 in particular, you see that they're tessellating water
| that's not visible to millions of triangles every frame, and
| they're tessellating blocks of concrete - essentially large
| rectangular objects - and generating millions of triangles per
| frame which are useless."
| iforgotpassword wrote:
| > All of these HW stages (except GS, of course)
|
| ...Of course...
|
 | The intro was pretty rough; it expects a lot of knowledge about
 | the topic. I almost stopped reading there. The second part,
 | which explains the possibilities and talks about real-world
 | performance impact, was pretty interesting though, so I'm glad
 | I kept reading.
| tpxl wrote:
| It'd be nice if it explained the acronyms the first time they
| are used.
| cylon13 wrote:
| Even knowing what they all are it's annoying to read compared
| to having them spelled out. I'm constantly pausing to
| translate. If you're talking to someone IRL you just say
| "hardware vertex shader," not "HW VS".
| striking wrote:
| They link to a doc that has a glossary[1] but you're not
| wrong, it would be helpful.
|
| 1: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/a
| md...
| Jasper_ wrote:
| Yeah, I know all these terms, but I only learned them from
| working on the PS4 for a long time. Even those quite familiar
| with graphics might struggle to read this article.
|
 | GS = geometry shader, and its main gimmick is allowing you to
 | generate triangle topology on the GPU. So that's why it should
 | be "obvious" that a GS lane can control more than one output
 | vertex. GS was also a terrible idea, because it's slow.
| jayd16 wrote:
| Hey, geometry shaders can be awesome but they do have limited
| use. I guess it might fall out of favor if we end up with
| Nanite taking over.
 | ninepoints wrote:
 | Them falling out of favor has nothing to do with Nanite,
 | FYI, and you would not want to use GS for any of the
 | benefits a tech like Nanite gives you, because it's far too
 | slow. We've all moved to CS for that sort of thing if you
 | can't rely on AS/MS.
___________________________________________________________________
(page generated 2022-07-15 23:02 UTC)