[HN Gopher] Raytracing on AMD's RDNA 2/3, and Nvidia's Turing and Pascal
       ___________________________________________________________________
        
       Raytracing on AMD's RDNA 2/3, and Nvidia's Turing and Pascal
        
       Author : treesciencebot
       Score  : 74 points
       Date   : 2023-03-22 19:00 UTC (4 hours ago)
        
 (HTM) web link (chipsandcheese.com)
 (TXT) w3m dump (chipsandcheese.com)
        
       | matthewfcarlson wrote:
        | I've often wondered why Nvidia cards are generally so much better
        | at rendering scenes in Blender's Cycles renderer (a ray tracing
        | engine). The benchmarks on Blender's website are really telling
        | (https://opendata.blender.org/benchmarks/query/?group_by=devi...):
        | the only non-Nvidia entry on the first page is the dual AMD EPYC
        | 9654 96-Core.
        | 
        | This article really lays out the decisions Nvidia made compared to
        | AMD, and how their approach tends to hide some of the shortcomings
        | of GPUs (latency and utilization).
        
         | zokier wrote:
          | That is more of a software (ecosystem) thing. Nvidia's CUDA and
          | OptiX are well beyond anything AMD has to offer. In Cycles'
          | case, I believe it takes good advantage of the RT cores on
          | Nvidia, while on AMD they are completely unused, which has a
          | predictable effect on performance. Even ignoring the RT cores, I
          | suspect the Nvidia code path is far more optimized than the AMD
          | one.
         | 
         | https://www.phoronix.com/news/AMD-HIP-RT-Blender-3.5-Plans
        
           | dotnet00 wrote:
           | Plus on AMD's side we have their inability to commit to fully
           | supporting any specific system long term, limiting open
           | source interest in doing things for them.
        
             | Melatonic wrote:
             | I thought AMD was all in on OpenCL?
        
               | dotnet00 wrote:
               | Nope, their OpenCL support has been kind of stagnant for
               | a while, especially on Windows. On top of that, part of
               | why Blender dropped its OpenCL supporting renderer was
               | that AMD's OpenCL was still pretty buggy, making the
               | renderer a pain to maintain.
               | 
                | Lately their focus is ROCm and HIP, their CUDA-equivalent
                | language, but it also has limited official hardware
                | support and AFAIK the Windows SDK for it is still not
                | public.
               | 
               | Similar commitment issues have plagued their custom
               | renderers.
        
               | my123 wrote:
                | AMD's OpenCL driver has been significantly _worse_ than
                | NVIDIA's for ages, afaik...
        
       | br1 wrote:
        | Interesting that cards/drivers customize so much of ray tracing,
        | like rasterization in the pre-Vulkan/Metal/D3D12 era, or even the
        | fixed-function GPU days.
        
       | ladberg wrote:
       | Would love to see a more in-depth article on BVH construction
       | itself! I'm decently familiar with the main concepts but have no
       | clue what the current SOTA looks like (is that even public
       | info?).
       | 
       | BVH construction is my favorite question to ask in interviews
       | because there's no single best solution and it mostly relies on
       | mathy heuristics to get a decent tree. You can also always devote
       | more time to making a more optimal tree but there's a tradeoff
       | where it'll eventually take more time than it saves in
       | raytracing.
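The build-time vs. trace-time tradeoff described above is typically steered by the surface area heuristic (SAH). Below is a toy top-down builder sketch; the names and the O(n) prefix/suffix sweep are my own illustration, not any vendor's or state-of-the-art algorithm:

```python
# Toy top-down BVH build using the surface area heuristic (SAH).
# Illustrative sketch only; real builders (LBVH, SBVH, ...) are far
# more involved. Boxes are ((minx,miny,minz), (maxx,maxy,maxz)).

def union(a, b):
    return (tuple(min(a[0][i], b[0][i]) for i in range(3)),
            tuple(max(a[1][i], b[1][i]) for i in range(3)))

def surface_area(box):
    d = [box[1][i] - box[0][i] for i in range(3)]
    return 2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])

def centroid(box):
    return tuple((box[0][i] + box[1][i]) / 2.0 for i in range(3))

def build(boxes, leaf_size=2):
    if len(boxes) <= leaf_size:
        return {"leaf": boxes}
    best = None
    for axis in range(3):
        order = sorted(boxes, key=lambda b: centroid(b)[axis])
        # Prefix/suffix bounds let us score every split point in O(n).
        prefix, suffix = [], [None] * len(order)
        acc = order[0]
        for b in order:
            acc = union(acc, b)
            prefix.append(acc)
        acc = order[-1]
        for i in range(len(order) - 1, -1, -1):
            acc = union(acc, order[i])
            suffix[i] = acc
        for i in range(1, len(order)):
            # SAH: child surface area weighted by primitive count.
            cost = (surface_area(prefix[i - 1]) * i +
                    surface_area(suffix[i]) * (len(order) - i))
            if best is None or cost < best[0]:
                best = (cost, order[:i], order[i:])
    return {"left": build(best[1], leaf_size),
            "right": build(best[2], leaf_size)}
```

Spending more on the split search (binning, spatial splits, post-build optimization) yields better trees at higher build cost, which is exactly the tradeoff the comment describes.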
        
         | frogblast wrote:
         | This is the best I've found that covers recent developments:
         | 
         | https://meistdan.github.io/publications/bvh_star/paper.pdf
        
       | shmerl wrote:
       | Ray tracing on Linux for CP2077 with 7900 XTX is still barely
       | usable, but it's getting better.
       | 
        | I'd say RDNA 3 doesn't really deliver usable ray tracing at, for
        | example, 2560x1440 unless you use upscaling to speed it up. Maybe
        | in a few GPU generations ray tracing will become usable at native
        | resolutions.
        
       | sylware wrote:
        | I have not gotten into the real details yet, but mesa radv pulls
        | in that horrible glslang due to some shaders related to
        | acceleration structures.
        | 
        | Personally, as a dev, I patch it to compile all that out (and all
        | the tracers at the same time), since ray tracing currently has a
        | ridiculous benefits-to-technical-costs ratio.
        | 
        | This defeats the very purpose of Vulkan SPIR-V: getting rid of
        | those horrible high-level shader compilers from the driver stack
        | and keeping them contained at the application level.
        | 
        | It seems beyond clumsy, but as I said, I need to get into the
        | details of why those shaders exist in the first place, and then
        | why they are not written directly in RDNA assembly or SPIR-V
        | assembly (which would only require an "assembler" coded in simple
        | and plain C).
        
         | TazeTSchnitzel wrote:
          | Generating a ray tracing acceleration structure is very
          | complex; who'd want to implement that in assembly language?
        
       | pixelesque wrote:
        | I suspect the reason the author is seeing very shallow trees for
        | Nvidia might be that the lower levels are handled fully behind
        | the scenes:
       | 
       | https://forums.developer.nvidia.com/t/extracting-bvh-from-op...
       | 
       | As someone who deals with BVHs a lot for ray intersection, I find
       | it pretty difficult to believe that leaf nodes with that number
       | of primitives will be anywhere near performant, even with fast
       | dedicated hardware like the RT cores.
       | 
        | It's true that Nvidia cards have better ray/triangle intersection
        | performance relative to ray/box tests, but I don't believe it's
        | anywhere near the 100x ratio I suspect would be needed if the
        | BVHs were that shallow and the leaf nodes that large.
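A back-of-envelope cost model makes the point. This is purely illustrative, assuming one unit of cost per box test and per primitive test (my own numbers, not measurements): a huge leaf trades a handful of box tests for thousands of primitive tests.

```python
import math

# Rough traversal cost for a single ray through a binary BVH:
# one box test per level down, then brute-force tests over one leaf.
# Back-of-envelope model only, not a measurement of any hardware.

def traversal_cost(n_prims, leaf_size, box_cost=1.0, prim_cost=1.0):
    depth = max(0, math.ceil(math.log2(n_prims / leaf_size)))
    return depth * box_cost + leaf_size * prim_cost
```

With these unit costs, a million primitives with leaves of 4 cost about 22 units per ray, versus about 10,007 with leaves of 10,000, so per-primitive tests would have to be orders of magnitude cheaper than box tests to break even.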
        
         | TinkersW wrote:
          | Isn't a wide BVH how Embree works, 1 ray vs SIMD-width boxes?
          | Maybe Nvidia is simply doing the same thing but with the wider
          | GPU SIMD (32 lanes, I believe).
        
           | berkut wrote:
            | Yes, but 4- or 8-wide is the norm: the wider you go, the
            | more sorting you have to do to traverse children in order or
            | find the nearest hit, which has an overhead (hardware may
            | help with this, but it's still an overhead).
           | 
           | Previous indications from Nvidia about their BVHs don't seem
           | to show anything about very shallow trees for any of the BVH
           | algorithms that OptiX supports (scroll to bottom for reverse
           | visualisation of a BVH hierarchy on top of the Stanford Bunny
           | model): https://drive.google.com/file/d/1B5fNRFwv2LsGlCBJ8oKY
           | RiiDUtL...
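The "one ray vs N child boxes, then sort" step discussed above is easy to sketch in scalar code. This is my own toy version with Python standing in for SIMD lanes, and it assumes all ray direction components are nonzero:

```python
# Slab test of one ray against every child box of a wide BVH node,
# followed by a near-to-far sort of the hits -- the per-node sorting
# overhead that grows with node width. Scalar stand-in for SIMD.

def ray_vs_boxes(origin, inv_dir, boxes):
    hits = []
    for idx, (lo, hi) in enumerate(boxes):
        tmin, tmax = 0.0, float("inf")  # only hits in front of the origin
        for a in range(3):
            t0 = (lo[a] - origin[a]) * inv_dir[a]
            t1 = (hi[a] - origin[a]) * inv_dir[a]
            if t0 > t1:
                t0, t1 = t1, t0
            tmin, tmax = max(tmin, t0), min(tmax, t1)
        if tmin <= tmax:
            hits.append((tmin, idx))
    hits.sort()  # traverse nearest child first
    return hits
```

For example, a ray from the origin along (1,1,1) tested against boxes at increasing distance comes back sorted by entry distance, with misses dropped; the sort is cheap at width 4 or 8 but grows with wider nodes.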
        
         | frogblast wrote:
          | I strongly suspect the reason the Nvidia trees are so shallow
          | is that NSight simply isn't showing the actual tree structure,
          | probably because Nvidia considers it proprietary. It appears to
          | just list all the leaves of a tree in one big flat list. But
          | there definitely is a tree in there.
        
           | Arrath wrote:
           | I'm very curious to see it unrolled down to its actual
           | structure.
        
           | kevingadd wrote:
           | Perhaps the rest of it isn't a tree and is some other
           | optimized data structure? Like some sort of spatial hash or
           | sort
        
       ___________________________________________________________________
       (page generated 2023-03-22 23:01 UTC)