[HN Gopher] GPU Architecture Types Explained
       ___________________________________________________________________
        
       GPU Architecture Types Explained
        
       Author : asicsp
       Score  : 111 points
       Date   : 2021-07-20 13:50 UTC (9 hours ago)
        
 (HTM) web link (rastergrid.com)
 (TXT) w3m dump (rastergrid.com)
        
       | Pieman103021 wrote:
       | Archived link -
       | https://web.archive.org/web/20210720135744/https://rastergri...
        
       | nspattak wrote:
       | apparently the poor web site felt some of the infinite hacker
       | news exposure love :)
        
       | dragontamer wrote:
       | This seems like a misnomer. The post is more about rendering API
       | architectures than GPU architecture.
       | 
       | Which is still important: Immediate mode vs Tile-based is a big
       | shift in overall style. And GPU-hardware is designed for
       | particular software architectures (because the CPU will be
       | inevitably invoking calls in a certain pattern).
       | 
       | But it'd probably be more accurate to call this blogpost
       | "Rendering Architecture Types Explained" more so than "GPU
       | Architecture". A modern GPU running DirectX 9.0 or OpenGL 2.0
       | would still be immediate mode, for example.
        
         | [deleted]
        
         | oflordal wrote:
         | No, this is about HW architectures. While they are likely
         | evolving towards one another, there are tile-based ones (like
         | Imagination and ARM Mali) and immediate-mode ones (Nvidia, AMD)
         | that both implement the same APIs (OpenGL, Vulkan, etc.). All
         | these HW architectures are modern and in use.
        
           | opencl wrote:
           | Basically all modern GPU architectures implement tiled
           | rasterization. NVIDIA has been doing it since Maxwell (2014)
           | and AMD has been doing it since Vega (2017). Even Intel has
           | been doing it for a few years now starting with their Gen 11
           | (2019) GPUs.
        
             | Arelius wrote:
             | Those are going to require some serious citations. I'm
             | quite sure most desktop GPUs don't run as tiled renderers
             | at least under normal circumstances.
        
               | brigade wrote:
               | Section 5.2 of Intel's Gen11 architecture manual [1]
               | 
               | (yes, PTBR is only enabled on passes the driver thinks
               | will benefit from it)
               | 
               | [1] https://software.intel.com/content/dam/develop/extern
               | al/us/e...
        
               | ryuuchin wrote:
               | > Specifically, Maxwell and Pascal use tile-based
               | immediate-mode rasterizers that buffer pixel output,
               | instead of conventional full-screen immediate-mode
               | rasterizers.
               | 
               | https://www.realworldtech.com/tile-based-rasterization-
               | nvidi...
               | 
               | He describes it as "tile-based immediate mode" in the
               | article and the video should go into more detail about
               | it. It's been a while since I watched it.
        
               | cma wrote:
               | The parent article already discusses that article, saying
               | those GPUs don't use TBR in areas where the primitive
               | count is too high or something:
               | 
               | > Another class of hybrid architecture is one that is
               | often referred to as tile-based immediate-mode rendering.
               | As dissected in this article[1], this hybrid architecture
               | is used since NVIDIA's Maxwell GPUs. Does that mean that
               | this architecture is like a TBR one, or that it shares
               | all benefits of both worlds? Well, not really...
               | 
               | What the article and the video fails to show is what
               | happens when you increase the primitive count.
               | Guillemot's test application doesn't support large
               | primitive counts, but the effect is already visible if we
               | crank up both the primitive and attribute count. After a
               | certain threshold it can be noted that not all primitives
               | are rasterized within a tile before the GPU starts
               | rasterizing the next tile, thus we're clearly not talking
               | about a traditional TBR architecture.
               | 
               | [1] https://www.realworldtech.com/tile-based-
               | rasterization-nvidi...
        
               | monocasa wrote:
               | Classic TBDRs typically require multiple passes on tiles
               | with large primitive counts as well. Each tile's buffer
               | containing binned geometry generally has a max size, with
               | multiple passes required if that buffer size is exceeded.
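
Editor's sketch of the binning behaviour described above, in Python. It is a toy model, not any vendor's implementation: the tile size, the per-tile capacity, and the names `tiles_touched` and `render` are all assumptions made for illustration. Triangles are binned by bounding box, and a tile whose bin fills up is flushed as its own pass, so a tile with many primitives ends up rasterized in several passes.

```python
from collections import defaultdict

TILE = 32               # tile edge in pixels (assumed value)
MAX_PRIMS_PER_TILE = 4  # hypothetical per-tile bin capacity


def tiles_touched(tri, tile=TILE):
    """Yield (tx, ty) for every tile overlapped by the triangle's bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for ty in range(int(min(ys)) // tile, int(max(ys)) // tile + 1):
        for tx in range(int(min(xs)) // tile, int(max(xs)) // tile + 1):
            yield tx, ty


def render(triangles, capacity=MAX_PRIMS_PER_TILE):
    """Bin triangle indices per tile and return {tile: [pass, pass, ...]}.

    Each pass is the list of triangle indices rasterized together for that
    tile. A bin that reaches `capacity` is flushed as a pass immediately.
    """
    bins = defaultdict(list)
    passes = defaultdict(list)
    for index, tri in enumerate(triangles):
        for key in tiles_touched(tri):
            bins[key].append(index)
            if len(bins[key]) == capacity:
                passes[key].append(bins[key])  # bin is full: flush it
                bins[key] = []
    # Flush whatever is left at the end of the frame.
    for key, remaining in bins.items():
        if remaining:
            passes[key].append(remaining)
    return dict(passes)
```

With five small triangles in one tile and a capacity of four, that tile is processed in two passes, which is the multi-pass effect the comment refers to.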
        
               | Arelius wrote:
               | Yeah, please see
               | https://news.ycombinator.com/item?id=27898421
               | 
               | Having watched the video, I'm fairly certain what is
               | being observed is not really tiled.
               | 
               | I'm not, however, sure what "tile-based immediate-mode
               | rasterizers that buffer pixel output" means, but I think
               | that's enough qualifications to make it somewhat
               | meaningless. All modern GPUs dispatch thread groups that
               | could look like "tiles" and have plenty of buffers, likely
               | including buffers between fragment output and render
               | target output/color blending. But that doesn't make it a
               | tiled/deferred renderer.
        
               | monocasa wrote:
               | AMD has even talked publicly about how their rasterizer
               | can run in a TBDR mode that they call DSBR.
               | 
               | https://pcper.com/2017/01/amd-vega-gpu-architecture-
               | preview-...
        
           | monocasa wrote:
           | Interestingly, Nvidia has been using tile based rasterizers
           | for a bit too. https://www.techpowerup.com/231129/on-nvidias-
           | tile-based-ren...
        
             | Arelius wrote:
             | It's been often quoted that Nvidia has switched to tile
             | based for their Desktop renderers, but I haven't seen a
             | source that confirms this. I suspect this is speculation
             | due to changes in raster order that produce side-effects
             | that look tiled even though they aren't.
        
               | ribit wrote:
               | This has been empirically tested on multiple occasions.
               | There is an article on Real World Tech discussing this,
               | and the results have been replicated for newer AMD GPUs
               | as well. I have a little tool for macOS that tests these
               | things out, and the Navi GPU on my MacBook is definitely
               | a tiler (the Gen10 Intel GPU is not).
        
               | [deleted]
        
             | [deleted]
        
         | lmeyerov wrote:
         | Agreed. For non-movies/games people -- think ML, neural
         | networks, simulations, ETL -- this is far from how we think
         | about them. Instead, focus is much more on thread divergence,
         | NUMA memory models, consistency models, hw/sw schedulers,
         | latency hiding, growing variety of DMA modes, funny ISA stacks,
         | etc. The rendering pipeline is a tiny bit relevant for GPGPU
         | people, e.g., if you're trying to do 1990s style shoehorning of
         | it into antiquated WebGL 1/2 rendering primitives because
         | Google/Apple won't let you do the real thing.
        
       | ribit wrote:
       | I think that the article focuses too much on the academic
       | distinction between immediate renderers and tilers but falls
       | short of discussing how these techniques relate to real-world
       | GPUs. For example, the fact that all contemporary AMD and Nvidia
       | gaming GPUs are tilers with large tiles (that's one of the key
       | reasons why Maxwell and Navi got a big boost in performance). Or
       | that many mainstream mobile GPUs employ various hacks (e.g.
       | vertex shader splitting) in order to simplify the architecture,
       | which ultimately blocks their ability to scale to more advanced
       | applications. Notably missing any mention of TBDR which currently
       | powers the fastest low-power mobile and desktop GPUs on the
       | market.
        
         | phire wrote:
         | Regarding Maxwell and Navi: Actually, that's not true.
         | 
         | The micro-benchmark that suggested Maxwell (and later) was a
         | tiled deferred GPU was actually measuring something else. Each
         | GPC gets assigned different screen-space areas, and concurrency
         | rules between the areas are relaxed (unless explicitly required
         | by shader atomics).
         | 
         | The result looks somewhat like tiled deferred rendering in that
         | micro-benchmark. But it's still very much immediate mode.
         | 
         | A similar thing happened with Navi.
         | 
         | However, there are mobile GPUs (Qualcomm's Adreno) that
         | dynamically switch between tiled deferred mode and immediate
         | mode on a per renderpass basis, depending on what driver
         | heuristics suggest will be faster.
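
Editor's sketch of the screen-space partitioning idea from the comment above, in Python. The thread does not say how areas are actually mapped to GPCs, so the interleaved pattern, the region size, the unit count, and the names `owner` and `split_by_owner` are all hypothetical. The point it illustrates is only that each unit keeps submission order within its own regions, while nothing orders work across units, which can look tile-like in a timing micro-benchmark.

```python
NUM_UNITS = 4  # assumed number of raster units (GPCs)
REGION = 16    # assumed region edge in pixels


def owner(x, y, num_units=NUM_UNITS, region=REGION):
    """Return the unit that owns pixel (x, y) under a simple interleaved pattern."""
    return ((x // region) + (y // region)) % num_units


def split_by_owner(fragments, num_units=NUM_UNITS, region=REGION):
    """Distribute (x, y, draw_id) fragments into one queue per unit.

    Submission order is preserved inside each queue. No ordering is
    implied between different queues.
    """
    queues = [[] for _ in range(num_units)]
    for frag in fragments:
        x, y = frag[0], frag[1]
        queues[owner(x, y, num_units, region)].append(frag)
    return queues
```

Every draw is still processed as it arrives, so this remains immediate mode: no geometry is held back until the end of the render pass.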
        
           | Jasper_ wrote:
           | When did Adreno gain a deferred mode? Back when I was talking
           | to Rob Clark in 2014 or so, it sounded like it was all
           | immediate per-tile.
        
             | phire wrote:
             | GPU terminology is confusing at times.
             | 
             | Imgtec and Apple use the term "Tile-Based Deferred
             | Rendering" to mean a combination of tiling and deferred
             | shading. Because that's what their GPUs do.
             | 
             | Other vendors, like Qualcomm [1], still use the term
             | "Deferred" in regards to their Tile-Based Rendering, simply
             | because the draw calls are deferred. It doesn't mean
             | deferred shading.
             | 
             | Every company appears to make up the terminology as they
             | go. I found an early presentation from ARTX [2] and they
             | are using database terminology to describe what we now call
             | vertex buffers.
             | 
             | [1] https://developer.qualcomm.com/docs/adreno-
             | gpu/developer-gui...
             | 
             | [2] http://www.graphics.stanford.edu/courses/cs448a-01-fall
             | /lect...
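
Editor's sketch of the two meanings of "deferred" distinguished above, in Python. It is a toy model that assumes opaque geometry and a single tile; the function names are made up for illustration. `shade_immediate` shades fragments as they arrive with an early depth test, while `shade_deferred` resolves visibility for the whole tile first and shades each covered pixel once, which is the deferred-shading sense of the term.

```python
def shade_immediate(fragments):
    """Shade in submission order with an early depth test.

    `fragments` is a list of (pixel, depth) pairs; smaller depth is nearer.
    Returns the number of fragment shader invocations.
    """
    depth = {}
    invocations = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            invocations += 1  # shaded now, may be overwritten by a later draw
    return invocations


def shade_deferred(fragments):
    """Resolve visibility for the whole tile, then shade each covered pixel once."""
    nearest = {}
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
    return len(nearest)  # one shader invocation per covered pixel
```

For three opaque fragments drawn back to front on the same pixel, the immediate version shades three times and the deferred version once. Drawn front to back, both shade once, which is why submission order matters much more for the first approach.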
        
         | cma wrote:
         | >For example, the fact that all contemporary AMD and Nvidia
         | gaming GPUs are tilers with large tiles (that's one of the key
         | reasons why Maxwell and Navi got a big boost in performance)
         | 
         | There's a whole section on it near the end:
         | 
         | "Another class of hybrid architecture is one that is often
         | referred to as tile-based immediate-mode rendering. As
         | dissected in this article, this hybrid architecture is used
         | since NVIDIA's Maxwell GPUs."
         | 
         | >Notably missing any mention of TBDR which currently powers the
         | fastest low-power mobile and desktop GPUs on the market.
         | 
         | Another section mentions:
         | 
         | "There's a long-standing myth (that luckily slowly disappears)
         | that deferred rendering techniques are not suitable for TBR
         | GPUs. "
        
       | qd6pwu4 wrote:
       | 503 Service Unavailable
        
         | squarefoot wrote:
         | 2021: the year GPUs became unavailable, just like websites
         | about them.
        
       ___________________________________________________________________
       (page generated 2021-07-20 23:01 UTC)