[HN Gopher] What is NGG and shader culling on AMD RDNA GPUs?
       ___________________________________________________________________
        
       What is NGG and shader culling on AMD RDNA GPUs?
        
       Author : pantalaimon
       Score  : 55 points
       Date   : 2022-07-13 07:57 UTC (2 days ago)
        
 (HTM) web link (timur.hu)
 (TXT) w3m dump (timur.hu)
        
       | dragontamer wrote:
       | A few notes.
       | 
       | 1. Rasterization is, conceptually at least, the painter's
       | algorithm. You conceptually paint the "backmost" triangle, then
       | paint the triangles "on top" afterwards. As long as you paint
       | from furthest back, to front, you will get the right image. The
       | textbook algorithm sorts all triangles from furthest back to
       | front, but an O(n*log(n)) sort is obviously too slow for the 100
       | million triangles at 60 frames per second that gamers want.
       | 
       | 2. Culling modifies the #1 algorithm by removing triangles from
       | consideration. Traditionally, this is done by ASIC parts of a GPU
       | (I mean, traditionally, the whole GPU was ASIC and non-
       | programmable. But even just a few years ago, hardware culling was
       | not yet done in shaders). If Triangle#5 is completely covered by
       | Triangle#200, then you can "optimize" by never drawing #5 to
       | begin with, and instead just drawing Triangle#200. The GPU's
       | hardware can detect cases like this and automatically skip the
       | drawing of Triangle#5.
       | 
       | 3. Primitive Shaders / NGG and other features from AMD's Vega
       | onwards allow data to be passed between stages of the rendering
       | pipeline in new ways. This seems to enable *software* culling.
       | 
       | 4. It's not too hard to do software culling per se. What's hard is
       | to do software culling that is worthwhile (aka: faster than the
       | hardware ASIC culling). The claim here is that
       | software culling is finally worthwhile thanks to these new
       | shading units, new ways of passing data back-and-forth between
       | stages of the GPU. With these new datapaths, it is possible to
       | implement a software culler that matches (or exceeds) the speed
       | of the hardware culler.
       | 
       | 5. That's what "shader culling" is: software culling that's
       | faster than the ASIC paths of the GPU. It is fully defined in
       | software, so you can make it more flexible / tuned for your
       | specific video game than the hardware.
       | 
       | 6. Video games do often have CPU-side culling before sending the
       | data to the GPU side. It's also culling, but in a different
       | context. I believe that "shader culling" implies the low-level,
       | fine-grained culling that the GPU-hardware was expected to do,
       | rather than the coarse-grained "X character is on the wrong side
       | of the camera so don't draw X or the 200,000 triangles associated
       | with X" that CPUs have always done.
       | 
       | 7. On #6's note, there are many kinds of culling done at many
       | stages of any video game engine today, from software/CPU side all
       | the way to the low level GPU stuff. Since this seems to be a low-
       | level GPU driver post, you can assume that they're talking about
       | low-level GPU hardware culling specifically.
       | 
       | 8. I'm not actually a video game programmer. I just like
       | researching / learning about this field.
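       | 
       | A minimal sketch of points 1 and 2, not taken from the article:
       | painter's ordering is a back-to-front sort, and a back-face test
       | is one common per-triangle cull (which tests a given driver
       | actually runs is not stated in this thread). The tuple layout,
       | the counter-clockwise winding convention and the function names
       | are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: screen-space x, y plus depth z (larger z = farther away).
Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def signed_area(tri: Triangle) -> float:
    """Signed screen-space area; positive means counter-clockwise winding."""
    (x0, y0, _), (x1, y1, _), (x2, y2, _) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def cull(tris: List[Triangle]) -> List[Triangle]:
    """Drop back-facing and zero-area triangles (assumes CCW = front-facing)."""
    return [t for t in tris if signed_area(t) > 0.0]


def painter_order(tris: List[Triangle]) -> List[Triangle]:
    """Sort back to front by average depth: the O(n*log(n)) step from point 1."""
    return sorted(tris, key=lambda t: sum(v[2] for v in t) / 3.0, reverse=True)


if __name__ == "__main__":
    near = ((0, 0, 1), (1, 0, 1), (0, 1, 1))
    far = ((0, 0, 9), (1, 0, 9), (0, 1, 9))
    back = ((0, 0, 5), (0, 1, 5), (1, 0, 5))  # clockwise, so culled
    for tri in painter_order(cull([near, back, far])):
        print(tri)
```

       | Culling first means the sort (or, on a real GPU, the rest of the
       | pipeline) has fewer triangles to deal with, which is the whole
       | point of doing it early.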
        
         | faragon wrote:
         | Great explanation, thank you.
        
       | sylware wrote:
       | Anybody wanna make a simple C-coded SPIR-V assembler in Mesa, so
       | as to remove that horrible GLSL radix sort?
        
         | Jasper_ wrote:
         | SPIRV-Tools has an assembler...
        
           | sylware wrote:
           | Not simple C; as far as I know it is horrible C++.
        
       | phkahler wrote:
       | When he says the main performance benefit is when games
       | over-tessellate, I'd say this is just an invitation for everyone
       | to increase tessellation and bump mapping for more detail.
        
         | Jasper_ wrote:
         | Well, bump mapping happens in the pixel shader so that wouldn't
         | affect tricounts. The traditional way to fight "too many
         | triangles" is with LODs, but LODs aren't the perfect solution
         | since triangle-screen density changes with camera settings too.
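         | 
         | A rough sketch of why that is, assuming a perspective camera
         | and a bounding-sphere size estimate. The function names and
         | pixel thresholds are made up for illustration:

```python
import math


def projected_radius_px(radius: float, distance: float,
                        fov_y_deg: float, screen_height_px: int) -> float:
    """Approximate on-screen radius of a bounding sphere, in pixels."""
    half_fov = math.radians(fov_y_deg) / 2.0
    return (radius / (distance * math.tan(half_fov))) * (screen_height_px / 2.0)


def pick_lod(radius_px: float, thresholds=(200.0, 50.0)) -> int:
    """Return 0 for the most detailed mesh; thresholds are hypothetical."""
    for lod, min_px in enumerate(thresholds):
        if radius_px >= min_px:
            return lod
    return len(thresholds)


if __name__ == "__main__":
    # Same object, same distance, three different fields of view.
    for fov in (30.0, 90.0, 120.0):
        px = projected_radius_px(1.0, 10.0, fov, 1080)
        print(fov, round(px, 1), pick_lod(px))
```

         | The object never moves, yet the LOD that fits it changes with
         | the field of view, so a LOD picked by distance alone ends up
         | with too many or too few triangles per pixel.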
         | 
         | I wouldn't expect massive, massive gains from the new culling,
         | as suggested. The instancing demo gets gains only because it
         | didn't have any CPU-side frustum culling to begin with. But
         | real games have frustum culling nowadays.
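         | 
         | For reference, CPU-side frustum culling is usually a test per
         | object rather than per triangle. A minimal sketch, assuming
         | bounding spheres and planes stored as (nx, ny, nz, d) with unit
         | normals pointing into the frustum (names are hypothetical):

```python
from typing import Iterable, Tuple

Plane = Tuple[float, float, float, float]  # n . p + d >= 0 means "inside"
Vec3 = Tuple[float, float, float]


def sphere_in_frustum(center: Vec3, radius: float,
                      planes: Iterable[Plane]) -> bool:
    """False only when the sphere lies entirely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True


if __name__ == "__main__":
    # Two planes bounding -1 <= x <= 1; a real frustum would have six.
    slab = [(1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0)]
    print(sphere_in_frustum((0.0, 0.0, 0.0), 0.5, slab))  # inside
    print(sphere_in_frustum((3.0, 0.0, 0.0), 0.5, slab))  # fully outside
```

         | One failed test here skips every triangle of the object, which
         | is why a demo without it leaves so much for the GPU to cull.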
        
         | rasz wrote:
         | >invitation for everyone to increase tessellation
         | 
         | that ship sailed with the introduction of Gameworks - an Nvidia
         | program that paid game studios to deoptimize games on AMD
         | hardware.
         | 
         | https://techreport.com/review/21404/crysis-2-tessellation-to...
         | 
         | https://wccftech.com/fight-nvidias-gameworks-continues-amd-c...
         | 
         | "Number one: Nvidia Gameworks typically damages the performance
         | on Nvidia hardware as well, which is a bit tragic really. It
         | certainly feels like it's about reducing the performance, even
         | on high-end graphics cards, so that people have to buy
         | something new.
         | 
         | "That's the consequence of it, whether it's intended or not -
         | and I guess I can't read anyone's minds so I can't tell you
         | what their intention is. But the consequence of it is it brings
         | PCs to their knees when it's unnecessary. And if you look at
         | Crysis 2 in particular, you see that they're tessellating water
         | that's not visible to millions of triangles every frame, and
         | they're tessellating blocks of concrete - essentially large
         | rectangular objects - and generating millions of triangles per
         | frame which are useless."
        
       | iforgotpassword wrote:
       | > All of these HW stages (except GS, of course)
       | 
       | ...Of course...
       | 
       | The intro was pretty rough, it expects a lot of knowledge about
       | the topic. I almost stopped reading there. The second part that
       | explains the possibilities and talks about real world performance
       | impact was pretty interesting though, so, glad I kept reading.
        
         | tpxl wrote:
         | It'd be nice if it explained the acronyms the first time they
         | are used.
        
           | cylon13 wrote:
           | Even knowing what they all are it's annoying to read compared
           | to having them spelled out. I'm constantly pausing to
           | translate. If you're talking to someone IRL you just say
           | "hardware vertex shader," not "HW VS".
        
           | striking wrote:
           | They link to a doc that has a glossary[1] but you're not
           | wrong, it would be helpful.
           | 
           | 1: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd...
        
         | Jasper_ wrote:
         | Yeah, I know all these terms, but I only learned them from
         | working on the PS4 for a long time. Even those quite familiar
         | with graphics might struggle to read this article.
         | 
         | GS = geometry shader, and its main gimmick is allowing you to
         | generate triangle topology on the GPU. So that's why it should
         | be "obvious" that a GS lane can control more than one output
         | vertex. GS was also a terrible idea, because it's slow.
        
           | jayd16 wrote:
           | Hey, geometry shaders can be awesome but they do have limited
           | use. I guess it might fall out of favor if we end up with
           | Nanite taking over.
        
             | ninepoints wrote:
             | Them falling out of favor has nothing to do with nanite
             | FYI, and you would not want to use GS for any of the
             | benefits a tech like nanite gives you because it's far too
             | slow. We've all moved to CS (compute shaders) for that sort
             | of thing if you can't rely on AS/MS (amplification/mesh
             | shaders).
        
       ___________________________________________________________________
       (page generated 2022-07-15 23:02 UTC)