[HN Gopher] Writing an efficient Vulkan renderer (2020)
___________________________________________________________________
Writing an efficient Vulkan renderer (2020)
Author : nabla9
Score : 70 points
Date : 2022-01-28 21:51 UTC (2 days ago)
(HTM) web link (zeux.io)
(TXT) w3m dump (zeux.io)
| shmerl wrote:
| Slightly off-topic, but I recently saw a very interesting dive
| into comparing Vulkan to D3D12:
|
| https://themaister.net/blog/2021/11/07/my-personal-hell-of-t...
| oddity wrote:
| Sadly, it's even more complicated than what's described here.
| There are a lot of undocumented fast paths and nonorthogonalities
| between features that you can really only learn about by talking
| to your friendly neighborhood driver engineer in a backdoor
| conversation (one that only a select few developers have access
| to). Otherwise, it's easy to miss a lot of spooky action at a
| distance. Vulkan and DX12 might have been sold as "low-level"
| APIs, but they didn't (and can't) do much to address the
| fundamental economic issues that push hardware engineers to build
| patchwork architectures.
|
| I'm half of the opinion that writing an efficient Vulkan or DX12
| renderer has not been and cannot be done. If you're not one of
| the developers shipping a AAA title with enough weight to get
| attention from the RGBs, you're probably better off going as
| bindless and gpu-driven as you can. It's not optimal (or entirely
| possible on some architectures), but it probably has the most
| consistent behavior across architectures when it's supported.
| shmerl wrote:
| Is Godot efficient enough? They are preparing to release their
| Vulkan-based version.
|
| Vulkan drivers for AMD and Intel are also open source (at least
| for Linux), so developers can look into how things are
| implemented without asking for some special attention. Plus
| driver developers are pretty open to questions.
| oddity wrote:
| For most games that people might want to make with Godot,
| it's probably good enough. Not everything needs to be perf-
| optimal.
|
| Also, the driver source being open doesn't mean that all
| their secrets are too.
| shmerl wrote:
| What do you mean by secrets? Mesa especially uses open
| development, so decisions about drivers are made in the
| open.
| oddity wrote:
| Even if the driver were a thin layer over DMA talking to
| the hardware such that every button being pressed was
| obvious, the performance characteristics of the buttons
| the driver pushes are not unless you can see their RTL or
| otherwise know the right combination to push to hit the
| right potholes.
|
| Except, we're not in that situation, and even for mostly
| transparent drivers there can be an opaque firmware blob
| in between you and the hardware. Without changing the
| economics, pushing the API lower only pushes the
| implementation of the opaque behavior lower.
| shmerl wrote:
| There is some level of reverse engineering that went into
| ACO from what I understood, because even with
| documentation on AMD's ISA, some edge cases remain. But
| not sure how much of that is affected by the firmware
| blob.
|
| It would be nice to have an open firmware, yeah, but is
| that really preventing making efficient drivers or using
| them efficiently? For the most part how hardware operates
| is more or less understood.
| oddity wrote:
| Performance, at the level of a single frame, is a
| complicated multi-system interaction where only a few
| layers are visible, and small bubbles can cascade into
| missed frames. None of the IHVs are incentivized to make
| all layers visible, so I'd argue that how hardware
| operates is not more or less understood by the
| application developer unless they have the influence to
| talk to the IHVs directly.
|
| I'm not saying that the APIs can't be used to get good
| enough performance, but I am saying that it's a
| fiendishly hard problem, and even harder if you're not
| one of the handful to get (more) complete information.
| Perhaps much more so than in the CPU world, where
| microbenchmarking is more mature.
| shmerl wrote:
| I'd say it would be more of a concern for driver
| developers than to application developers, because their
| focus is on making the layer that's interacting with
| hardware efficient. I haven't seen them talking so far
| about having such kind problems with firmware in case of
| AMD, but may be you've seen that (I'd be interested in
| reading about it).
|
| I've seen other kind of issues described by Nouveau
| developers, due to Nvidia being nasty and hiding stuff
| like ability to reclock the GPU behind the firmware,
| making it very hard to develop open drivers because they
| can't even make the GPU to run on higher clocks.
| throwaway17_17 wrote:
| Question: By gpu-driven do you mean targeting actual GPU
| specific functionality and then creating a renderer for each
| manufacturer and card generation you intend to support?
|
| Also, what do you mean by bindless?
|
| Sorry if those are basic questions, but I find graphics
| programming fascinating as someone who doesn't get to do it
| professionally.
| oddity wrote:
| Most GPUs are not self-sufficient to the degree that CPUs
| are. Where on a CPU, I can spawn an arbitrary child
| process/thread running (almost) any code with customized
| communication between the threads/processes, on a GPU, the
| hardware (and/or driver) implements the process loading and
| scheduling logic, usually restricted to a pattern that suits
| graphics applications. This means that there's a greater
| surface area for the graphics APIs to have unintuitive
| behavior.
|
| GPU-driven rendering relies on some of the limited ability
| for the application programmer to schedule more work for the
| GPU _on_ the GPU (example:
| https://www.advances.realtimerendering.com/s2015/aaltonenhaa...).
| Bindless rendering refers to
| the ability for the GPU shader to use resources on the GPU
| without having to announce to the driver that it intends to
| use those resources. Essentially, I'm saying that the layers
| between the application and its shaders are so opaque that
| for most ordinary people, the most reasonable solution is to
| use less of them wherever possible so that they can actually
| intuit the numbers they're seeing through microbenchmarking.
| In both cases, there's a code complexity and performance
| penalty, so if you're trying to get peak performance (like an
| AAA studio might), there are good reasons not to do it. This is
| on top of portability concerns since not all hardware
| supports all the features that might be needed to do this.
|
| If you're just trying to get into graphics though, I'd
| recommend starting with webgpu (where these techniques aren't
| possible, yet). The API is relatively clean and easy to work
| with compared to everything else, and I'm assuming
| performance won't be the primary concern.
| dang wrote:
| A small thread at the time:
|
| _Writing an Efficient Vulkan Renderer_ -
| https://news.ycombinator.com/item?id=24368353 - Sept 2020 (2
| comments)
| aliswe wrote:
| How about automating this functionality that you're currently
| doing manually?
___________________________________________________________________
(page generated 2022-01-30 23:01 UTC)