[HN Gopher] GPU Caching Compared Among AMD, Intel UHD, Apple M1
       ___________________________________________________________________
        
       GPU Caching Compared Among AMD, Intel UHD, Apple M1
        
       Author : jb1991
       Score  : 43 points
       Date   : 2023-01-16 17:58 UTC (5 hours ago)
        
 (HTM) web link (chipsandcheese.com)
 (TXT) w3m dump (chipsandcheese.com)
        
       | hot_gril wrote:
       | Nice, succinct 1-2 page article going into interesting technical
       | details. As someone who's hardly touched graphics, GPUs have
       | always been magic to me, especially integrated ones, so it's nice
       | to read digestible explanations about them.
        
       | lowbloodsugar wrote:
       | >bandwidth is the same for AMD and Apple and much lower for
       | Intel.
       | 
       | Later
       | 
        | >Intel: 700, AMD: 1400, Apple: 2100
       | 
       | I wouldn't call 2x and 3x "similar".
       | 
        | Also I don't see why the author thinks desktop chips with
        | integrated graphics are meant to be paired with a discrete GPU.
        | Surely the opposite is true. I got a faster CPU by not getting
        | one with integrated graphics.
       | 
        | Finally, isn't the fact that Apple has a fundamentally different
        | rendering pipeline relevant?
        
         | Dalewyn wrote:
         | At least with regards to Intel CPUs, iGPU-less CPUs (the ones
         | with -F suffixes) are otherwise identical to the standard ones
         | with iGPUs. The main reason to buy them is the slightly lower
         | price, which could make a difference if you're on a tight
         | budget.
         | 
         | On a tangential note, it's great having an iGPU even if you are
         | almost never going to use it. If your discrete GPU borks, you
         | have a fallback ready and waiting. If you do use it alongside a
         | discrete GPU, you can offload certain lower priority tasks like
         | video encoding/decoding to it.
        
       | deagle50 wrote:
        | An Intel-based Steam Deck 2 would be very interesting. I think
        | they could make something very compelling at a sustained 15 W
        | under gaming load.
        
       | alanfranz wrote:
       | > as modern dedicated GPUs, can theoretically do zero-copy
       | transfers by mapping the appropriate memory on both the CPU and
       | GPU.
       | 
       | Is this true for dgpus? How does this work?
        
         | yaantc wrote:
          | This is not specific to dGPUs; it could apply to any PCIe
          | device. Emphasis on "theoretically" too.
         | 
          | On the device (a dGPU here), it is possible to route memory
          | accesses targeting part of the internal address space to the
          | PCIe controller. In turn, the PCIe controller can translate
          | such a memory access into a PCIe request (read or write) in
          | the separate PCIe address space, with some address
          | translation.
         | 
          | This PCIe request goes to the PCIe host (the CPU in a dGPU
          | scenario). Here too, the host PCIe controller can map the
          | PCIe request, addressed in the PCIe address space, into the
          | host address space. From there it can reach host memory
          | (usually after IOMMU filtering and address translation). And
          | all of this runs in reverse for the return trip to the device
          | in the case of a read.
         | 
          | So latency would be rather high, but it is technically
          | possible. In most applications such transfers are offloaded
          | to a DMA engine in the PCIe controller that copies between
          | the PCIe and local address spaces, but a processing core can
          | certainly do a direct access without DMA if all the address
          | mappings are suitably configured.
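          |
          | To make the host-to-device direction concrete, here is a
          | minimal C sketch assuming a Linux system and a hypothetical
          | device at PCI address 0000:01:00.0; real GPU drivers set up
          | equivalent mappings in the kernel, this only illustrates how
          | a plain CPU load/store turns into a PCIe request:
          |
          |     /* Minimal sketch: map a PCIe device BAR into a user
          |      * process on Linux via sysfs, so plain CPU loads and
          |      * stores become PCIe reads/writes to the device.  The
          |      * PCI address and BAR index are hypothetical. */
          |     #include <fcntl.h>
          |     #include <stdint.h>
          |     #include <stdio.h>
          |     #include <sys/mman.h>
          |     #include <unistd.h>
          |
          |     int main(void)
          |     {
          |         const char *bar =
          |             "/sys/bus/pci/devices/0000:01:00.0/resource0";
          |         int fd = open(bar, O_RDWR | O_SYNC);
          |         if (fd < 0) { perror("open"); return 1; }
          |
          |         size_t len = 4096;  /* one page of the BAR */
          |         volatile uint32_t *mmio = mmap(NULL, len,
          |             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          |         if (mmio == MAP_FAILED) { perror("mmap"); return 1; }
          |
          |         /* The store below is turned into a PCIe write by
          |          * the host PCIe controller; the load becomes a
          |          * (slow) PCIe read round trip. */
          |         mmio[0] = 0xdeadbeefu;
          |         printf("read back 0x%08x\n", (unsigned)mmio[1]);
          |
          |         munmap((void *)mmio, len);
          |         close(fd);
          |         return 0;
          |     }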
        
         | kevingadd wrote:
          | In theory, for a long time you've been able to "persistently
          | map" a GPU-side buffer that houses things like indices,
          | vertex data, or even textures, and then write directly* into
          | GPU memory from the CPU without a staging buffer. This was
          | referred to as 'AZDO' (Approaching Zero Driver Overhead) in
          | the OpenGL space and eventually fed into the design of Vulkan
          | and Direct3D 12 (see
          | https://www.gdcvault.com/play/1020791/Approaching-Zero-
          | Drive... if you're curious about all of this).
         | 
         | I say in theory and used an asterisk because I think it's
         | generally the case that the driver could lie and just maintain
         | an illusion for you by flushing a staging buffer at the 'right
         | time'. But in practice my understanding is that the memory
         | writes will go straight over the PCIe bus to the GPU and into
         | its memory, perhaps with a bit of write-caching/write-combining
         | locally on the CPU. It would be wise to make sure you never
         | _read_ from that mapped memory :)
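          |
          | For reference, a minimal sketch of that persistent-mapping
          | pattern in C with OpenGL 4.4 (ARB_buffer_storage); the buffer
          | size and function names here are made up for illustration,
          | and context creation plus the fencing real code needs are
          | omitted:
          |
          |     /* AZDO-style persistently mapped buffer.  Uses only
          |      * core GL 4.4 calls; everything else is
          |      * illustrative. */
          |     #include <GL/glcorearb.h>  /* or your GL loader */
          |     #include <string.h>
          |
          |     #define VTX_BYTES (1024 * 1024)
          |
          |     static GLuint buf;
          |     static void  *cpu_ptr;  /* valid for buffer lifetime */
          |
          |     void create_persistent_buffer(void)
          |     {
          |         /* PERSISTENT: mapping stays valid across draws.
          |          * COHERENT: CPU writes become GPU-visible without
          |          * an explicit flush. */
          |         GLbitfield flags = GL_MAP_WRITE_BIT
          |                            | GL_MAP_PERSISTENT_BIT
          |                            | GL_MAP_COHERENT_BIT;
          |
          |         glGenBuffers(1, &buf);
          |         glBindBuffer(GL_ARRAY_BUFFER, buf);
          |         /* Immutable storage; the driver may place it in
          |          * memory the CPU writes to directly over PCIe. */
          |         glBufferStorage(GL_ARRAY_BUFFER, VTX_BYTES, NULL,
          |                         flags);
          |         cpu_ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0,
          |                                    VTX_BYTES, flags);
          |     }
          |
          |     void upload_frame(const void *vertices, size_t bytes)
          |     {
          |         /* Write-only: reading cpu_ptr back would hit slow
          |          * uncached/write-combined memory.  Real code also
          |          * fences (glFenceSync/glClientWaitSync) so it never
          |          * overwrites data the GPU is still reading. */
          |         memcpy(cpu_ptr, vertices, bytes);
          |     }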
        
       ___________________________________________________________________
       (page generated 2023-01-16 23:00 UTC)