[HN Gopher] Writing an efficient Vulkan renderer (2020)
       ___________________________________________________________________
        
       Writing an efficient Vulkan renderer (2020)
        
       Author : nabla9
       Score  : 70 points
       Date   : 2022-01-28 21:51 UTC (2 days ago)
        
 (HTM) web link (zeux.io)
 (TXT) w3m dump (zeux.io)
        
       | shmerl wrote:
       | Slightly off-topic, but I recently saw a very interesting dive
       | into comparing Vulkan to D3D12:
       | 
       | https://themaister.net/blog/2021/11/07/my-personal-hell-of-t...
        
       | oddity wrote:
        | Sadly, it's even more complicated than what's described here.
        | There are a lot of undocumented fast paths and non-
        | orthogonalities between features that you can really only
        | learn about by talking to your friendly neighborhood driver
        | engineer in a backdoor conversation (one that only a select
        | few developers have access to). Otherwise, it's easy to miss a
        | lot of spooky action at a distance. Vulkan and DX12 might have
        | been sold as "low-level" APIs, but they didn't (and can't) do
        | much to address the fundamental economic issues that push
        | hardware engineers to build patchwork architectures.
       | 
        | I'm half of the opinion that writing a truly efficient Vulkan
        | or DX12 renderer has never been done and cannot be. If you're
        | not one of the developers shipping a AAA title with enough
        | weight to get attention from the RGBs, you're probably better
        | off going as bindless and gpu-driven as you can. It's not
        | optimal (or entirely possible on some architectures), but it
        | probably has the most consistent behavior across architectures
        | when it's supported.
        
         | shmerl wrote:
          | Is Godot efficient enough? They are preparing to release
          | their Vulkan-based version.
         | 
          | Vulkan drivers for AMD and Intel are also open source (at
          | least on Linux), so developers can look into how things are
          | implemented without asking for special attention. Plus,
          | driver developers are pretty open to questions.
        
           | oddity wrote:
           | For most games that people might want to make with Godot,
           | it's probably good enough. Not everything needs to be perf-
           | optimal.
           | 
           | Also, the driver source being open doesn't mean that all
           | their secrets are too.
        
             | shmerl wrote:
             | What do you mean by secrets? Mesa especially uses open
             | development, so decisions about drivers are made in the
             | open.
        
               | oddity wrote:
                | Even if the driver were a thin layer over DMA talking
                | to the hardware, such that every button being pressed
                | was obvious, the performance characteristics of the
                | buttons the driver pushes would still not be obvious
                | unless you can see the RTL or otherwise know the right
                | combination to push to avoid the potholes.
               | 
               | Except, we're not in that situation, and even for mostly
               | transparent drivers there can be an opaque firmware blob
               | in between you and the hardware. Without changing the
               | economics, pushing the API lower only pushes the
               | implementation of the opaque behavior lower.
        
               | shmerl wrote:
                | From what I understand, some level of reverse
                | engineering went into ACO, because even with
                | documentation on AMD's ISA, some edge cases remain.
                | But I'm not sure how much of that is affected by the
                | firmware blob.
               | 
                | It would be nice to have open firmware, yeah, but is
                | that really preventing anyone from making efficient
                | drivers or using them efficiently? For the most part,
                | how the hardware operates is more or less understood.
        
               | oddity wrote:
               | Performance, at the level of a single frame, is a
               | complicated multi-system interaction where only a few
               | layers are visible, and small bubbles can cascade into
               | missed frames. None of the IHVs are incentivized to make
               | all layers visible, so I'd argue that how hardware
               | operates is not more or less understood by the
               | application developer unless they have the influence to
               | talk to the IHVs directly.
               | 
               | I'm not saying that the APIs can't be used to get good
               | enough performance, but I am saying that it's a
               | fiendishly hard problem, and even harder if you're not
               | one of the handful to get (more) complete information.
                | Perhaps much more so than in the CPU world, where
                | microbenchmarking is more mature.
        
               | shmerl wrote:
                | I'd say it would be more of a concern for driver
                | developers than for application developers, because
                | their focus is on making the layer that interacts
                | with the hardware efficient. I haven't seen them talk
                | so far about having such problems with firmware in
                | AMD's case, but maybe you've seen that (I'd be
                | interested in reading about it).
               | 
                | I've seen other kinds of issues described by Nouveau
                | developers, due to Nvidia being nasty and hiding
                | things like the ability to reclock the GPU behind the
                | firmware, making it very hard to develop open drivers
                | because they can't even make the GPU run at higher
                | clocks.
        
         | throwaway17_17 wrote:
          | Question: By gpu-driven, do you mean targeting actual GPU-
          | specific functionality and then creating a renderer for each
          | manufacturer and card generation you intend to support?
         | 
         | Also, what do you mean by bindless?
         | 
         | Sorry if those are basic questions, but I find graphics
         | programming fascinating as someone who doesn't get to do it
         | professionally.
        
           | oddity wrote:
            | Most GPUs are not self-sufficient to the degree that CPUs
            | are. Whereas on a CPU I can spawn an arbitrary child
            | process/thread running (almost) any code, with customized
            | communication between the threads/processes, on a GPU the
            | hardware (and/or driver) implements the process loading
            | and scheduling logic, usually restricted to a pattern that
            | suits graphics applications. This means there's a greater
            | surface area for the graphics APIs to have unintuitive
            | behavior.
           | 
            | GPU-driven rendering relies on some of the limited ability
            | for the application programmer to schedule more work for
            | the GPU _on_ the GPU (example:
            | https://www.advances.realtimerendering.com/s2015/aaltonenhaa...).
            | Bindless rendering refers to the ability for a GPU shader
            | to use resources on the GPU without having to announce to
            | the driver that it intends to use those resources.
            | Essentially, I'm saying that the layers between the
            | application and its shaders are so opaque that, for most
            | ordinary people, the most reasonable solution is to use
            | fewer of them wherever possible so that they can actually
            | intuit the numbers they're seeing through
            | microbenchmarking. In both cases, there's a code
            | complexity and performance penalty, so if you're trying to
            | get peak performance (like a AAA studio might), then there
            | are good reasons not to do it. This is on top of
            | portability concerns, since not all hardware supports all
            | the features that might be needed to do this.
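            | 
            | To make the gpu-driven part concrete, here's a minimal
            | sketch of the CPU side, assuming Vulkan 1.2 with the
            | drawIndirectCount feature enabled; the buffer handles are
            | hypothetical and the culling compute pass that fills them
            | is not shown:
            | 
            |   #include <vulkan/vulkan.h>
            |   
            |   /* Assumes a culling shader has already written an array
            |    * of VkDrawIndexedIndirectCommand into drawCmds and the
            |    * number of surviving draws into drawCount. */
            |   void record_gpu_driven_draws(VkCommandBuffer cmd,
            |                                VkBuffer drawCmds,
            |                                VkBuffer drawCount,
            |                                uint32_t maxDraws)
            |   {
            |     /* One CPU-side call; the GPU decides how many draws
            |      * actually execute. */
            |     vkCmdDrawIndexedIndirectCount(
            |         cmd,
            |         drawCmds, 0,   /* indirect commands + offset */
            |         drawCount, 0,  /* draw count + offset */
            |         maxDraws,      /* CPU-known upper bound */
            |         sizeof(VkDrawIndexedIndirectCommand));
            |   }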
           | 
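            | And a sketch of the bindless side: one large, partially
            | bound table of sampled images via descriptor indexing
            | (VK_EXT_descriptor_indexing, core in Vulkan 1.2, assuming
            | the update-after-bind features are enabled). The 64k table
            | size is an illustrative assumption; shaders then select
            | textures by an integer index instead of per-draw set
            | binds:
            | 
            |   #include <vulkan/vulkan.h>
            |   
            |   VkDescriptorSetLayout make_bindless_layout(VkDevice dev)
            |   {
            |     /* One huge array binding; slots may be left empty and
            |      * updated while the set is in use. */
            |     VkDescriptorSetLayoutBinding binding = {
            |       .binding         = 0,
            |       .descriptorType  = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
            |       .descriptorCount = 65536, /* assumed table size */
            |       .stageFlags      = VK_SHADER_STAGE_ALL,
            |     };
            |     VkDescriptorBindingFlags flags =
            |       VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT |
            |       VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT;
            |     VkDescriptorSetLayoutBindingFlagsCreateInfo fi = {
            |       .sType =
            |         VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO,
            |       .bindingCount  = 1,
            |       .pBindingFlags = &flags,
            |     };
            |     VkDescriptorSetLayoutCreateInfo info = {
            |       .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
            |       .pNext = &fi,
            |       .flags =
            |         VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT,
            |       .bindingCount = 1,
            |       .pBindings    = &binding,
            |     };
            |     VkDescriptorSetLayout layout = VK_NULL_HANDLE;
            |     vkCreateDescriptorSetLayout(dev, &info, NULL, &layout);
            |     return layout;
            |   }
            | 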
           | If you're just trying to get into graphics though, I'd
           | recommend starting with webgpu (where these techniques aren't
           | possible, yet). The API is relatively clean and easy to work
           | with compared to everything else, and I'm assuming
           | performance won't be the primary concern.
        
       | dang wrote:
       | A small thread at the time:
       | 
       |  _Writing an Efficient Vulkan Renderer_ -
       | https://news.ycombinator.com/item?id=24368353 - Sept 2020 (2
       | comments)
        
         | aliswe wrote:
          | How about automating this functionality that you're
          | currently doing manually?
        
       ___________________________________________________________________
       (page generated 2022-01-30 23:01 UTC)