[HN Gopher] Zstandard - Real-time data compression algorithm
       ___________________________________________________________________
        
       Zstandard - Real-time data compression algorithm
        
       Author : RafelMri
       Score  : 62 points
       Date   : 2022-09-04 18:12 UTC (4 hours ago)
        
 (HTM) web link (facebook.github.io)
 (TXT) w3m dump (facebook.github.io)
        
       | maxpert wrote:
        | Also try LZ4 and Snappy. Depending on your use case they can
        | save you extra CPU too.
        
         | wmf wrote:
         | Recent versions of zstd mostly obsolete LZ4 and Snappy so you
         | can use one compressor to rule them all.
        
           | morelisp wrote:
            | The chart on the zstd site doesn't really back that up for
            | lz4:
            | 
            |   compressor           ratio  compress   decompress
            |   zstd 1.4.5 -1        2.884  500 MB/s   1660 MB/s
            |   zstd 1.4.5 --fast=1  2.434  570 MB/s   2200 MB/s
            |   zstd 1.4.5 --fast=5  2.178  700 MB/s   2420 MB/s
            |   lz4 1.9.2            2.101  740 MB/s   4530 MB/s
            | 
            | 3% larger size for a nearly 2x decompression speed is a
            | no-brainer for lots of uses.
           | 
            | Anecdotally I have also gotten much better decompression
            | speed out of snappy on previous generations of CPUs while
            | staying within ~10% on compression ratio, though I haven't
            | benchmarked/tuned extensively or reconfirmed in the past
            | ~2 years. (This can probably also vary a lot by
            | implementation, which you're often stuck with based on some
            | other library choice.)
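            | 
            | If you want to reproduce the tradeoff locally, here's a
            | rough single-threaded sketch using the python-zstandard and
            | lz4 packages (the package choice, sample file, and loop
            | count are my assumptions; numbers will vary by machine):
            | 
            |   # pip install zstandard lz4
            |   import time
            |   import lz4.frame
            |   import zstandard
            |   
            |   data = open("sample.bin", "rb").read()  # any representative payload
            |   
            |   def bench(name, compress, decompress):
            |       blob = compress(data)
            |       start = time.perf_counter()
            |       for _ in range(10):
            |           decompress(blob)
            |       elapsed = time.perf_counter() - start
            |       print(f"{name}: ratio {len(data) / len(blob):.3f}, "
            |             f"decompress {10 * len(data) / elapsed / 1e6:.0f} MB/s")
            |   
            |   cctx = zstandard.ZstdCompressor(level=1)
            |   dctx = zstandard.ZstdDecompressor()
            |   bench("zstd -1", cctx.compress, dctx.decompress)
            |   bench("lz4", lz4.frame.compress, lz4.frame.decompress)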
        
             | adgjlsfhk1 wrote:
              | This can be a bit misleading, because you only get the 2x
              | decompression speed if you aren't bandwidth-limited. Over
              | a network you usually are, so the smaller zstd payload
              | means zstd will almost always be faster end to end.
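              | 
              | Back-of-envelope with the chart's numbers above (the 1
              | Gbit/s link and the serial transfer-then-decompress model
              | are my assumptions):
              | 
              |   RAW = 1_000  # MB of uncompressed data
              |   LINK = 125   # MB/s, i.e. a 1 Gbit/s link
              |   
              |   # (name, ratio, decompression MB/s) from the zstd chart
              |   for name, ratio, decomp in [("zstd -1", 2.884, 1660),
              |                               ("lz4 1.9.2", 2.101, 4530)]:
              |       total = (RAW / ratio) / LINK + RAW / decomp
              |       print(f"{name}: {total:.2f} s")
              |   
              |   # zstd -1:   ~3.38 s
              |   # lz4 1.9.2: ~4.03 s
              |   # Once the network is the bottleneck, the better ratio
              |   # wins despite the slower decompression.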
        
         | croon wrote:
         | They are in the benchmarks on the page.
        
       | mrl5 wrote:
        | Interesting fact: under the hood it's based on tANS (tabled
        | Asymmetric Numeral Systems), introduced by Jaroslaw Duda of
        | Jagiellonian University.
       | 
       | some cool references:
       | 
       | https://www.youtube.com/watch?v=uXtmN9fE01k
       | 
       | https://th.if.uj.edu.pl/~dudaj/
       | 
       | https://demonstrations.wolfram.com/DataCompressionUsingAsymm...
       | 
       | https://encode.su/threads/2078-List-of-Asymmetric-Numeral-Sy...
        
         | vanderZwan wrote:
          | Well, the entropy coding step is. That's just one of multiple
          | parts of the data compression, but entropy coding typically
          | is the bottleneck in encoding/decoding speed, yes. And Duda's
          | work is impressive (also because he took on Google when the
          | latter didn't appear to keep the no-software-patents promise
          | they made when they first started collaborating. The man
          | stands up for his principles.)
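          | 
          | For intuition, here's a toy rANS coder in Python. The tANS
          | variant in zstd's FSE precomputes these state transitions
          | into tables; the frequencies below are made up:
          | 
          |   SCALE_BITS = 12          # frequencies sum to 1 << SCALE_BITS
          |   LOW = 1 << 23            # renormalisation threshold
          |   freq = {"a": 3072, "b": 768, "c": 256}
          |   start = {"a": 0, "b": 3072, "c": 3840}  # cumulative starts
          |   
          |   def encode(symbols):
          |       x, out = LOW, []
          |       for s in reversed(symbols):      # ANS encodes in reverse
          |           while x >= freq[s] << (23 - SCALE_BITS + 8):
          |               out.append(x & 0xFF)     # flush low bytes
          |               x >>= 8
          |           x = ((x // freq[s]) << SCALE_BITS) \
          |               + x % freq[s] + start[s]
          |       return x, bytes(reversed(out))
          |   
          |   def decode(x, stream, n):
          |       pos, out = 0, []
          |       for _ in range(n):
          |           slot = x & ((1 << SCALE_BITS) - 1)
          |           s = next(k for k in freq
          |                    if start[k] <= slot < start[k] + freq[k])
          |           out.append(s)
          |           x = freq[s] * (x >> SCALE_BITS) + slot - start[s]
          |           while x < LOW and pos < len(stream):  # refill
          |               x = (x << 8) | stream[pos]
          |               pos += 1
          |       return "".join(out)
          |   
          |   x, stream = encode("abacaba")
          |   assert decode(x, stream, 7) == "abacaba"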
        
         | tester756 wrote:
         | >PhD in Theoretical Physics, PhD in Theoretical Computer
         | Science, MSc in Theoretical Mathematics
         | 
         | well, impressive
        
       | tgsovlerkhgsel wrote:
        | Is this comparing the same number of cores, or is zlib single-
        | threaded and zstd multi-threaded?
        | 
        | For some applications it matters how fast one operation
        | completes, but for others, how much CPU time it consumes is
        | much more relevant. If zstd needed 1 second on 8 cores for
        | something gzip does in 8 seconds on 1 core, that would be no
        | benefit at all.
        
         | plorkyeran wrote:
         | The benchmark numbers on the linked page are from lzbench,
         | which tests single-threaded performance.
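          | 
          | zstd's multithreading is opt-in: -T on the CLI, or e.g. the
          | threads parameter in the python-zstandard bindings (my
          | example, not from the page):
          | 
          |   import zstandard
          |   
          |   # threads=0 (the default) matches the benchmark's
          |   # single-threaded setup; threads=4 runs four compression
          |   # jobs in parallel; threads=-1 uses one per logical CPU.
          |   single = zstandard.ZstdCompressor(level=3)
          |   multi = zstandard.ZstdCompressor(level=3, threads=4)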
        
       | Cyberdog wrote:
        | I don't really know anything about compression algos, but
        | something that struck me as interesting with this one when it
        | came up on KF before is the ability to create a custom
        | dictionary - I'm not sure if that's common among other
        | compression algos or not. I often work with compressed MySQL
        | dumps, full of repetitive data from my client's site, that
        | come out to almost 50MB when compressed with xz. I wonder if I
        | could make a dictionary with all the MySQL keywords you'd see
        | in dumps, plus other repetitive strings unique to that site
        | (for example, something like two-thirds of all user email
        | addresses end with "@gmail.com"), and get those dump sizes
        | even smaller.
        | 
        | I imagine that xz might be good enough at building its own
        | dictionary that what it ends up with wouldn't be much
        | different from what I'd make manually, so there's a decent
        | chance the improvements would be minimal at best. Has anyone
        | done any experimentation along these lines?
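        | 
        | For reference, the workflow I'd be testing, using the
        | python-zstandard bindings (file names, dictionary size, and
        | level are placeholders; dictionaries mostly pay off across
        | many small inputs rather than one big dump):
        | 
        |   import zstandard
        |   
        |   # Train a shared dictionary from representative samples,
        |   # e.g. per-table chunks of past dumps.
        |   samples = [open(p, "rb").read()
        |              for p in ["chunk1.sql", "chunk2.sql", "chunk3.sql"]]
        |   dict_data = zstandard.train_dictionary(112_640, samples)
        |   
        |   cctx = zstandard.ZstdCompressor(level=19, dict_data=dict_data)
        |   compressed = cctx.compress(open("new_chunk.sql", "rb").read())
        |   
        |   # Decompression needs the same dictionary.
        |   dctx = zstandard.ZstdDecompressor(dict_data=dict_data)
        |   original = dctx.decompress(compressed)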
        
       | svnpenn wrote:
       | I wish Rust would add this, but they don't seem interested at
       | all:
       | 
       | https://github.com/rust-lang/rustup/issues/1858
        
         | JoshTriplett wrote:
          | We're interested. I'd _love_ to see this happen. But it does
          | require several different things to change, and rustup
          | wouldn't be the starting place (apart from needing to have
          | _support_ for it).
        
         | staticassertion wrote:
          | They seem very interested. There are lots of benchmarks and
          | discussion. It just looks like zstd would take more space
          | than xz - it wasn't a universally better solution. The
          | concerns noted are totally reasonable and based on real-world
          | issues that the teams maintaining this infra have. What's
          | obvious, however, is that there was a _lot_ of interest.
        
       | Svetlitski wrote:
        | Zstandard really is incredible. Anywhere I would've previously
        | compressed things with gzip I now universally prefer Zstandard
        | instead. It's a shame it's not supported as an HTTP Content-
        | Encoding in web browsers.
        
         | MarkSweep wrote:
          | There is RFC 8478 for the HTTP Content-Encoding, but I don't
          | think any web browsers currently implement it.
         | 
         | https://www.rfc-editor.org/rfc/rfc8478
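          | 
          | Per the RFC, "zstd" is the registered content coding, so the
          | negotiation would look like any other coding:
          | 
          |   GET /data HTTP/1.1
          |   Accept-Encoding: zstd, br, gzip
          |   
          |   HTTP/1.1 200 OK
          |   Content-Encoding: zstd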
        
         | BackBlast wrote:
          | Technically it is available, but it's just not widely
          | supported. It's arguably unnecessary, though: Brotli is, in
          | many ways, very comparable or a near equivalent to zstd.
        
         | btdmaster wrote:
         | According to some Firefox developers, brotli is more suitable
         | because of the standardised dictionaries for web content, like
         | JS: https://bugzilla.mozilla.org/show_bug.cgi?id=1301878
        
           | jacooper wrote:
            | I wish Caddy would support Brotli; it's either gzip or
            | zstd currently.
        
             | BackBlast wrote:
              | Caddy 2 supports Brotli static compression out of the
              | box; I ran a test on it just last week with this exact
              | configuration. As for dynamic compression, I'm not sure -
              | that use case doesn't interest me as much.
        
             | anotherevan wrote:
              | Just did some quick research and it looks like you can
              | serve pre-compressed static resources in Brotli - see the
              | precompressed option of the file_server directive[1].
              | 
              | The thinking seems to be that, because it's so CPU-
              | intensive, Brotli is not favoured for on-the-fly
              | compression[2].
              | 
              | If you're really desperate for it, though, you could try
              | this extension[3].
             | 
             | [1] https://caddyserver.com/docs/caddyfile/directives/file_
             | serve...
             | 
             | [2] https://caddy.community/t/caddy-v2-brotli/8805
             | 
             | [3] https://github.com/ueffel/caddy-brotli
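              | 
              | For reference, the pre-compressed setup is just something
              | like this (site address and root are placeholders; Caddy
              | then serves foo.css.zst or foo.css.br when the client
              | advertises support, falling back to the plain file):
              | 
              |   example.com {
              |       root * /srv/site
              |       file_server {
              |           precompressed zstd br gzip
              |       }
              |   }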
        
           | morelisp wrote:
           | Yeah, this is definitely a real advantage and not a broke-ass
           | feedback loop of SEO and confirmation bias. It's great that
           | we have to ship our compressors with massive English-biased
           | static dictionaries now.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-09-04 23:01 UTC)