[HN Gopher] Brotli-G: A GPU compression/decompression standard f...
       ___________________________________________________________________
        
       Brotli-G: A GPU compression/decompression standard for digital
       assets
        
       Author : josephscott
       Score  : 75 points
       Date   : 2022-11-22 14:49 UTC (8 hours ago)
        
 (HTM) web link (gpuopen.com)
 (TXT) w3m dump (gpuopen.com)
        
       | ttoinou wrote:
        | Does anyone know if there is a way to tune this algorithm to be
        | lossy?
        
       | telendram wrote:
        | Interesting, Nvidia opted to go with zstd instead:
       | https://developer.nvidia.com/nvcomp
        
       | pstuart wrote:
        | It requires the source asset to be compressed using Brotli-G as
        | well, so it depends on the hosting servers adopting it to be of
        | any value.
        
         | the-alchemist wrote:
         | I don't think so, if I read the article correctly.
         | 
         | > Existing optimized Brotli decompression functions (CPU
         | implementations) should be able to decompress the Brotli-G
         | bitstream, while more optimal data-parallel implementations on
         | hosts or accelerators can further improve performance.
        
           | pstuart wrote:
           | Pre-coffee non-clarity on my part. I was referring to this
           | part:
           | 
           | > One thing for developers to note is that assets that have
           | already been compressed with Brotli cannot be decompressed
           | with Brotli-G decompressor implementations
        
       | peter_d_sherman wrote:
        | There's an interesting side point here: there exist (and there
        | will be more of these in the future) data compression algorithms
        | which are, in general, _too slow_ for specific software use cases.
       | (related: https://superuser.com/questions/263335/compression-
       | software-...).
       | 
       | The thing is -- they typically run too slow for their intended
        | applications on current, consumer-grade CPUs...
       | 
        | But could some of them be optimized to take advantage of GPUs
        | (as Brotli is here) -- and would that then raise their
        | performance to a level where applications which previously
        | could not use them, because the algorithm took too long, can
        | now make use of them IF the software end-user has the proper
        | GPU?
       | 
        | I think there's a huge number of possibilities here...
       | 
        | Especially when you get to compression algorithms that involve
        | somewhat esoteric stuff, like Fourier transforms, wavelet
        | transforms -- and other exotic math, both known and
        | yet-to-be-discovered...
       | 
        | In other words, we've gone far beyond Huffman and Lempel-Ziv for
       | compression when we're in this territory...
       | 
       | (In fact, there should be a field of study... the
       | confluence/intersection of all GPUs and all known compression
       | algorithms... yes, I know... something like that probably already
       | exists(!)... but I'm just thinking aloud here! <g>)
       | 
        | In conclusion, I think there are many interesting future
        | possibilities in this area...
        
         | miohtama wrote:
          | See also Blosc, with its tagline "faster than memcpy":
         | 
         | https://www.blosc.org/pages/blosc-in-depth/
         | 
          | Blosc is optimised so that its working buffers fit into the L1
          | cache, which lets it outperform memcpy for certain workloads,
          | e.g. numeric arrays, because the bottleneck is not the CPU but
          | RAM and the slower caches.
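          | 
          | Not Brotli-G related, but for a rough feel of that blocked,
          | shuffle-then-compress idea, here is a minimal sketch against
          | the C-Blosc 1.x C API (blosc.h); the compression level,
          | shuffle filter and typesize choices are just illustrative:
          | 
          |   // Round trip an array of doubles through Blosc and
          |   // report the ratio. Link with -lblosc.
          |   #include <blosc.h>
          |   #include <cstdio>
          |   #include <vector>
          | 
          |   int main() {
          |       blosc_init();
          | 
          |       std::vector<double> data(1000000);
          |       for (size_t i = 0; i < data.size(); ++i)
          |           data[i] = i * 0.5;
          | 
          |       const size_t nbytes = data.size() * sizeof(double);
          |       // Destination must hold input + BLOSC_MAX_OVERHEAD.
          |       std::vector<char> packed(nbytes + BLOSC_MAX_OVERHEAD);
          | 
          |       // clevel=5, byte shuffle on, typesize=sizeof(double);
          |       // the shuffle is what makes numeric arrays compress
          |       // well.
          |       int csize = blosc_compress(5, BLOSC_SHUFFLE,
          |                                  sizeof(double), nbytes,
          |                                  data.data(), packed.data(),
          |                                  packed.size());
          |       if (csize <= 0) return 1;  // compression error
          | 
          |       std::vector<double> restored(data.size());
          |       blosc_decompress(packed.data(), restored.data(), nbytes);
          | 
          |       std::printf("%zu -> %d bytes (%.2fx)\n", nbytes, csize,
          |                   (double)nbytes / csize);
          |       blosc_destroy();
          |       return 0;
          |   }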
        
         | the-alchemist wrote:
         | Actually, there is a huge amount of work in this area. Take a
         | look at https://encode.su/forums/2-Data-Compression
        
           | MuffinFlavored wrote:
           | How do you get around the performance hit of having to buffer
           | from disk/network -> CPU -> RAM -> GPU and back or whatever?
        
             | jessermeyer wrote:
             | There is always a minimum cost of moving data from one
             | place to another. If you're computing on the GPU, the data
             | must arrive there. The problem is that PCIE bandwidth is
             | often a bottleneck, and so if you can upload compressed
             | data then you essentially get a free multiplier of
              | bandwidth based on the compression ratio. If decompressing
              | on the GPU costs less time than sending the full
              | uncompressed dataset would have, then you win.
             | 
              | But yeah, direct IO to the GPU would be great, but that's
              | not feasible right now.
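              | 
              | As a back-of-the-envelope version of that trade-off
              | (all the numbers below are made-up assumptions, not
              | measurements of any real GPU or codec):
              | 
              |   // Does "upload compressed + decompress on GPU" beat
              |   // "upload raw"? Figures are illustrative only.
              |   #include <cstdio>
              | 
              |   int main() {
              |       const double asset_gb  = 4.0;   // raw asset size
              |       const double ratio     = 2.5;   // compression ratio
              |       const double pcie_gbps = 16.0;  // usable PCIe BW
              |       const double gpu_gbps  = 50.0;  // GPU decomp rate
              | 
              |       const double t_raw = asset_gb / pcie_gbps;
              |       const double t_gpu = (asset_gb / ratio) / pcie_gbps
              |                          + asset_gb / gpu_gbps;
              | 
              |       std::printf("raw upload:      %.3f s\n", t_raw);
              |       std::printf("compressed path: %.3f s\n", t_gpu);
              |       std::printf("GPU path wins: %s\n",
              |                   t_gpu < t_raw ? "yes" : "no");
              |       return 0;
              |   }
              | 
              | In practice the transfer and the decompression can also
              | overlap, which tilts things further toward the
              | compressed path.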
        
         | eklitzke wrote:
         | All the things you're talking about are widely used in audio
         | and video codecs, which typically do have hardware acceleration
         | support.
        
       | darrinm wrote:
       | Benchmarks?
        
       | corysama wrote:
       | There is a new interest in GPU-side general-purpose decompression
       | due to Microsoft pushing DirectStorage.
       | 
       | https://devblogs.microsoft.com/directx/directstorage-1-1-com...
        
         | nevi-me wrote:
         | To digress and talk about DirectStorage, reading their
         | documentation and announcements still leaves me with an
         | unanswered question. Maybe someone knows the answer.
         | 
         | Does DirectStorage only work for games, or can one use it for
         | compute workloads?
         | 
         | Context: I've been learning some basic GPU programming (via
          | rust-gpu though, not CUDA), and one of the things that sounds
          | easy to implement is offloading compute kernels to the GPU
          | (e.g. arrow-rs).
         | 
         | Being able to load datasets via DirectStorage could be great,
         | but as I'm still really learning the basics, I can't figure out
         | whether I could leverage this for my work/learning.
        
           | mmozeiko wrote:
            | You can use it for any workload that can take D3D12 buffers
            | or textures as input. All the API does for you is transfer
            | data from disk into an ID3D12Resource object. After that it
            | is up to you to do whatever you want - use it as fragment
            | shader or compute shader input, etc. If you use other APIs
            | like CUDA or Vulkan, then you'll need interop to create
            | their resources from the D3D12-backed resource (or do a
            | copy, whatever is possible there).
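            | 
            | Roughly, the request setup looks like the hypothetical
            | helper below. This is from memory against dstorage.h, so
            | treat the exact field and constant names as assumptions;
            | device and buffer creation, error handling and the fence
            | wait are all omitted.
            | 
            |   #include <d3d12.h>
            |   #include <dstorage.h>
            |   #include <wrl/client.h>
            |   #include <cstdint>
            |   using Microsoft::WRL::ComPtr;
            | 
            |   // Read a file straight into an existing ID3D12Resource
            |   // buffer, which a compute shader can then consume.
            |   void LoadIntoBuffer(ID3D12Device* device,
            |                       ID3D12Resource* buffer,
            |                       uint32_t sizeInBytes) {
            |       ComPtr<IDStorageFactory> factory;
            |       DStorageGetFactory(IID_PPV_ARGS(&factory));
            | 
            |       ComPtr<IDStorageFile> file;
            |       factory->OpenFile(L"dataset.bin",
            |                         IID_PPV_ARGS(&file));
            | 
            |       DSTORAGE_QUEUE_DESC queueDesc{};
            |       queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
            |       queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
            |       queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
            |       queueDesc.Device     = device;
            | 
            |       ComPtr<IDStorageQueue> queue;
            |       factory->CreateQueue(&queueDesc,
            |                            IID_PPV_ARGS(&queue));
            | 
            |       DSTORAGE_REQUEST request{};
            |       request.Options.SourceType =
            |           DSTORAGE_REQUEST_SOURCE_FILE;
            |       request.Options.DestinationType =
            |           DSTORAGE_REQUEST_DESTINATION_BUFFER;
            |       request.Source.File.Source = file.Get();
            |       request.Source.File.Offset = 0;
            |       request.Source.File.Size   = sizeInBytes;
            |       request.UncompressedSize   = sizeInBytes;
            |       request.Destination.Buffer.Resource = buffer;
            |       request.Destination.Buffer.Offset   = 0;
            |       request.Destination.Buffer.Size     = sizeInBytes;
            | 
            |       queue->EnqueueRequest(&request);
            |       queue->Submit();
            |       // Then EnqueueSignal() an ID3D12Fence and wait on
            |       // it before binding the buffer to your shader.
            |   }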
        
       ___________________________________________________________________
       (page generated 2022-11-22 23:01 UTC)