[HN Gopher] Brotli-G: A GPU compression/decompression standard f...
___________________________________________________________________
Brotli-G: A GPU compression/decompression standard for digital
assets
Author : josephscott
Score : 75 points
Date : 2022-11-22 14:49 UTC (8 hours ago)
(HTM) web link (gpuopen.com)
(TXT) w3m dump (gpuopen.com)
| ttoinou wrote:
| Does anyone know if there is a way to tune this algorithm to be
| lossy?
| telendram wrote:
| Interesting, NVIDIA chose to go with zstd instead:
| https://developer.nvidia.com/nvcomp
| pstuart wrote:
| It requires the source asset to be compressed using Brotli-G as
| well, so it depends on the hosting servers supporting it to be of
| any value.
| the-alchemist wrote:
| I don't think so, if I read the article correctly.
|
| > Existing optimized Brotli decompression functions (CPU
| implementations) should be able to decompress the Brotli-G
| bitstream, while more optimal data-parallel implementations on
| hosts or accelerators can further improve performance.
| pstuart wrote:
| Pre-coffee non-clarity on my part. I was referring to this
| part:
|
| > One thing for developers to note is that assets that have
| already been compressed with Brotli cannot be decompressed
| with Brotli-G decompressor implementations
| peter_d_sherman wrote:
| There's an interesting side point here: there exist (and there
| will be more of these in the future) data compression algorithms
| which are, in general, _too slow_ for specific software use cases.
| (related: https://superuser.com/questions/263335/compression-
| software-...).
|
| The thing is -- they typically run too slow for their intended
| applications on current, consumer-grade CPUs...
|
| But could some of them be optimized to take advantage of GPUs (as
| Brotli is here) -- and would that then raise their performance to
| a level where applications which previously could not use them,
| because the algorithm took too long, can now make use of them, IF
| the software end-user has the proper GPU?
|
| I think there's a huge amount of possibilities here...
|
| Especially when you get to compression algorithms that include
| somewhat esoteric stuff, like Fourier Transforms, Wavelet
| Transforms -- and other weird and esoteric math algorithms both
| known and yet-to-be-discovered...
|
| In other words, we've gone far beyond Huffman and Lempel-Ziv for
| compression when we're in this territory...
|
| (In fact, there should be a field of study... the
| confluence/intersection of all GPUs and all known compression
| algorithms... yes, I know... something like that probably already
| exists(!)... but I'm just thinking aloud here! <g>)
|
| In conclusion, I think there's a huge amount of interesting
| future possibilities in this area...
| miohtama wrote:
| See also Blosc, with its tagline "faster than memcpy":
|
| https://www.blosc.org/pages/blosc-in-depth/
|
| Blosc is optimized to use work buffers that fit into the L1
| cache, so it can outperform memcpy for certain workloads, e.g.
| numeric arrays, because the bottleneck is not the CPU but RAM and
| the slower caches.
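|
| A minimal sketch of using the c-blosc C API from C++ to pack a
| numeric array (compile and link against c-blosc; the sizes and
| values are just illustrative):
|
|     #include <blosc.h>
|     #include <cstdio>
|     #include <vector>
|
|     int main() {
|       blosc_init();
|
|       // ~8 MB of doubles, the kind of numeric array Blosc targets
|       std::vector<double> data(1 << 20, 3.14159);
|       size_t nbytes = data.size() * sizeof(double);
|       std::vector<char> packed(nbytes + BLOSC_MAX_OVERHEAD);
|       std::vector<double> restored(data.size());
|
|       // clevel 9, byte shuffle on, typesize = sizeof(double)
|       int csize = blosc_compress(9, BLOSC_SHUFFLE, sizeof(double),
|                                  nbytes, data.data(),
|                                  packed.data(), packed.size());
|       int dsize = blosc_decompress(packed.data(), restored.data(),
|                                    nbytes);
|
|       std::printf("compressed %zu -> %d bytes, restored %d bytes\n",
|                   nbytes, csize, dsize);
|       blosc_destroy();
|     }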
| the-alchemist wrote:
| Actually, there is a huge amount of work in this area. Take a
| look at https://encode.su/forums/2-Data-Compression
| MuffinFlavored wrote:
| How do you get around the performance hit of having to buffer
| from disk/network -> CPU -> RAM -> GPU and back or whatever?
| jessermeyer wrote:
| There is always a minimum cost of moving data from one
| place to another. If you're computing on the GPU, the data
| must arrive there. The problem is that PCIE bandwidth is
| often a bottleneck, and so if you can upload compressed
| data then you essentially get a free multiplier of
| bandwidth based on the compression ratio. If transferring the
| compressed data plus decompressing it is faster than sending the
| full uncompressed dataset, then you win.
|
| But yeah, direct IO to the GPU would be great but that's
| not feasible right now.
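|
| Back-of-the-envelope version of that trade-off (all numbers are
| made up, and it assumes the transfer and the decompression don't
| overlap):
|
|     #include <cstdio>
|
|     int main() {
|       const double pcie_gbps   = 25.0; // assumed usable PCIe b/w
|       const double asset_gb    = 4.0;  // uncompressed asset size
|       const double ratio       = 2.0;  // assumed compression ratio
|       const double decomp_gbps = 80.0; // assumed GPU decode rate
|
|       double t_raw  = asset_gb / pcie_gbps;
|       double t_comp = (asset_gb / ratio) / pcie_gbps // smaller upload
|                     + asset_gb / decomp_gbps;        // GPU decompress
|
|       std::printf("raw: %.3f s, compressed+decode: %.3f s\n",
|                   t_raw, t_comp);
|       // You win whenever t_comp < t_raw, i.e. when the GPU can
|       // decompress faster than pcie_gbps * ratio / (ratio - 1).
|     }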
| eklitzke wrote:
| All the things you're talking about are widely used in audio
| and video codecs, which typically do have hardware acceleration
| support.
| darrinm wrote:
| Benchmarks?
| corysama wrote:
| There is new interest in GPU-side general-purpose decompression
| due to Microsoft pushing DirectStorage.
|
| https://devblogs.microsoft.com/directx/directstorage-1-1-com...
| nevi-me wrote:
| To digress and talk about DirectStorage, reading their
| documentation and announcements still leaves me with an
| unanswered question. Maybe someone knows the answer.
|
| Does DirectStorage only work for games, or can one use it for
| compute workloads?
|
| Context: I've been learning some basic GPU programming (via
| rust-gpu though, not CUDA), and one of the things that sounds
| easy to implement is offloading compute kernels to the GPU
| (e.g. arrow-rs).
|
| Being able to load datasets via DirectStorage could be great,
| but as I'm still really learning the basics, I can't figure out
| whether I could leverage this for my work/learning.
| mmozeiko wrote:
| You can use it for any workload that can use D3D12 buffers or
| textures as input. All the API does for you is transfer data
| from disk to an ID3D12Resource object. After that it is up to
| you to do whatever you want - use it as fragment shader or
| compute shader input, etc. If you use other APIs like CUDA or
| Vulkan, then you'll need to use interop to create their
| resources from the D3D12-backed resource (or do a copy,
| whatever is possible there).
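|
| Rough sketch of what that looks like, going by the public
| DirectStorage samples (error handling, fence wait and resource
| creation elided; treat the struct field names as approximate
| rather than verified):
|
|     #include <d3d12.h>
|     #include <dstorage.h>
|     #include <wrl/client.h>
|     using Microsoft::WRL::ComPtr;
|
|     void LoadIntoBuffer(ID3D12Device* device,
|                         ID3D12Resource* gpuBuffer,
|                         const wchar_t* path, uint32_t sizeBytes)
|     {
|       ComPtr<IDStorageFactory> factory;
|       DStorageGetFactory(IID_PPV_ARGS(&factory));
|
|       ComPtr<IDStorageFile> file;
|       factory->OpenFile(path, IID_PPV_ARGS(&file));
|
|       DSTORAGE_QUEUE_DESC queueDesc{};
|       queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
|       queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
|       queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
|       queueDesc.Device     = device;
|
|       ComPtr<IDStorageQueue> queue;
|       factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));
|
|       // One request: file -> ID3D12Resource buffer, no CPU copy
|       DSTORAGE_REQUEST request{};
|       request.Options.SourceType      = DSTORAGE_REQUEST_SOURCE_FILE;
|       request.Options.DestinationType =
|           DSTORAGE_REQUEST_DESTINATION_BUFFER;
|       request.Source.File.Source = file.Get();
|       request.Source.File.Offset = 0;
|       request.Source.File.Size   = sizeBytes;
|       request.Destination.Buffer.Resource = gpuBuffer;
|       request.Destination.Buffer.Offset   = 0;
|       request.Destination.Buffer.Size     = sizeBytes;
|
|       queue->EnqueueRequest(&request);
|       queue->Submit();
|       // After a fence signalled via EnqueueSignal() completes, the
|       // buffer can be bound as compute shader input like any other.
|     }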
___________________________________________________________________
(page generated 2022-11-22 23:01 UTC)