[HN Gopher] Zstandard - Real-time data compression algorithm
___________________________________________________________________
Zstandard - Real-time data compression algorithm
Author : RafelMri
Score : 62 points
Date : 2022-09-04 18:12 UTC (4 hours ago)
(HTM) web link (facebook.github.io)
(TXT) w3m dump (facebook.github.io)
| maxpert wrote:
| Also try LZ4 and Snappy. Depending on your use case they can save
| you extra CPU time too.
| wmf wrote:
| Recent versions of zstd mostly obsolete LZ4 and Snappy so you
| can use one compressor to rule them all.
| morelisp wrote:
| The chart on the zstd site doesn't really back that up for
| lz4.
|
|   Compressor            Ratio   Compression   Decompression
|   zstd 1.4.5 -1         2.884   500 MB/s      1660 MB/s
|   zstd 1.4.5 --fast=1   2.434   570 MB/s      2200 MB/s
|   zstd 1.4.5 --fast=5   2.178   700 MB/s      2420 MB/s
|   lz4 1.9.2             2.101   740 MB/s      4530 MB/s
|
| Roughly 4% larger output (vs. zstd --fast=5) for nearly 2x the
| decompression speed is a no-brainer for lots of uses.
|
| Anecdotally, I have also gotten much better decompression speed
| from Snappy on previous generations of CPUs while staying within
| ~10% of the compression ratio, though I have not benchmarked or
| tuned extensively, nor reconfirmed this in the past ~2 years.
| (This can probably also vary a lot by implementation, which
| you're often stuck with based on some other library choice.)
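
The size-versus-speed trade-off in the comment above can be checked with a little arithmetic. The sketch below copies the four benchmark rows quoted in the thread; the helper names `size_penalty` and `decomp_speedup` are made up for illustration. Compressed size is taken to be proportional to 1/ratio.

```python
# Figures copied from the benchmark rows quoted in the thread
# (zstd 1.4.5 vs lz4 1.9.2). Speeds are in MB/s.
rows = {
    "zstd -1":       {"ratio": 2.884, "comp": 500, "decomp": 1660},
    "zstd --fast=1": {"ratio": 2.434, "comp": 570, "decomp": 2200},
    "zstd --fast=5": {"ratio": 2.178, "comp": 700, "decomp": 2420},
    "lz4":           {"ratio": 2.101, "comp": 740, "decomp": 4530},
}

def size_penalty(a, b):
    """Fractional growth in compressed size when switching from a to b.

    Compressed size is proportional to 1 / ratio, so
    size_b / size_a == ratio_a / ratio_b.
    """
    return rows[a]["ratio"] / rows[b]["ratio"] - 1.0

def decomp_speedup(a, b):
    """How many times faster b decompresses than a."""
    return rows[b]["decomp"] / rows[a]["decomp"]

if __name__ == "__main__":
    for name in ("zstd -1", "zstd --fast=1", "zstd --fast=5"):
        print(f"{name:14s} -> lz4: "
              f"{size_penalty(name, 'lz4'):+.1%} size, "
              f"{decomp_speedup(name, 'lz4'):.2f}x decompression")
```

Against `zstd --fast=5`, lz4 output comes out about 3.7% larger with about 1.87x the decompression speed. The gap in size is much wider against `zstd -1`.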
| adgjlsfhk1 wrote:
| this can be a bit misleading because you only get the 2x
| decompression speed if you aren't bandwidth limited. this
| means that zstd will almost always be faster over a
| network.
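
A rough way to see the bandwidth argument is a two-stage pipeline model. This is a sketch under stated assumptions, not a benchmark: transfer and decompression are assumed to overlap perfectly, the quoted decompression speeds are assumed to be in uncompressed MB/s, and the link speeds passed in are hypothetical.

```python
# Ratio and decompression speed (MB/s) from the benchmark rows quoted
# earlier in the thread.
CODECS = {
    "zstd -1": (2.884, 1660.0),
    "lz4": (2.101, 4530.0),
}

def effective_mbps(link_mbps, ratio, decomp_mbps):
    """Uncompressed MB/s delivered to the receiver.

    Assumes transfer and decompression overlap perfectly, so the slower
    stage sets the pace. The link carries compressed bytes, so it
    delivers link_mbps * ratio of original data per second.
    """
    return min(link_mbps * ratio, decomp_mbps)

def faster_codec(link_mbps):
    """Name of the codec with the higher effective throughput."""
    return max(CODECS, key=lambda name: effective_mbps(link_mbps, *CODECS[name]))

if __name__ == "__main__":
    for link in (100, 500, 1000, 5000):  # hypothetical link speeds, MB/s
        print(link, faster_codec(link))
```

Under this model lz4 only pulls ahead once the link can feed it more than about 790 MB/s of compressed data (1660 / 2.101). Below that, the better ratio wins.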
| croon wrote:
| They are in the benchmarks on the page.
| mrl5 wrote:
| Interesting fact: under the hood it's based on tANS (tabled
| asymmetric numeral systems), introduced by Jaroslaw Duda of
| Jagiellonian University.
|
| Some cool references:
|
| https://www.youtube.com/watch?v=uXtmN9fE01k
|
| https://th.if.uj.edu.pl/~dudaj/
|
| https://demonstrations.wolfram.com/DataCompressionUsingAsymm...
|
| https://encode.su/threads/2078-List-of-Asymmetric-Numeral-Sy...
| vanderZwan wrote:
| Well, the entropy coding step is, and that is just one of multiple
| parts of the compression pipeline. But entropy coding typically is
| the bottleneck in encoding/decoding speed, yes, and Duda's work
| is impressive (also because he took on Google when the latter
| didn't appear to keep the no-software-patents promise that they
| made when they first started collaborating. The man stands up
| for his principles).
| tester756 wrote:
| >PhD in Theoretical Physics, PhD in Theoretical Computer
| Science, MSc in Theoretical Mathematics
|
| well, impressive
| tgsovlerkhgsel wrote:
| Is this comparing the same number of cores, or is zlib single-
| threaded and zstd multi-threaded?
|
| For some applications it matters how fast one operation
| completes, but for others, it's much more relevant how much CPU
| time it consumes, so if zstd needs 1 second on 8 cores for
| something gzip does in 8 seconds on 1 core, it would be no
| benefit at all.
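
The distinction drawn above is between wall-clock time and total CPU time. A minimal sketch, assuming every core is fully busy for the whole run (the function name is illustrative):

```python
def cpu_seconds(wall_seconds, cores):
    """Total CPU time consumed, assuming all cores are fully busy."""
    return wall_seconds * cores

# The commenter's hypothetical: 1 s on 8 cores vs 8 s on 1 core.
parallel = cpu_seconds(wall_seconds=1, cores=8)
serial = cpu_seconds(wall_seconds=8, cores=1)
print(parallel, serial)  # same CPU cost, different latency
```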
| plorkyeran wrote:
| The benchmark numbers on the linked page are from lzbench,
| which tests single-threaded performance.
| Cyberdog wrote:
| I don't really know anything about compression algos, but
| something that struck me as interesting with this one when it
| came up on KF before is the ability to create a custom dictionary
| - not sure if that's really common among other compression algos
| or not. But since I often work with compressed MySQL dumps with a
| lot of repetitive data from my client's site that come out to
| almost 50MB when compressed with xz, I wonder if I could make a
| dictionary with all the MySQL keywords you'd see in dumps, plus
| other repetitive strings unique to that site (for example,
| something like two-thirds of all user email addresses end with
| "@gmail.com") and get those dump sizes even smaller.
|
| I imagine that xz might be good enough at making its own
| dictionary that what it ends up with wouldn't be much different
| from what I'd make manually, so I could see a decent chance that
| the improvements would be minimal at best. Has anyone done any
| experimentation along these lines?
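
The idea of a hand-made dictionary can be tried with nothing but the Python standard library. Note the swap: this sketch uses zlib's preset-dictionary parameter (`zdict`) as a stand-in, not zstd's dictionary support. With zstd itself, the CLI's `--train` mode builds a dictionary from sample files. The dictionary contents and the sample row below are hypothetical.

```python
import zlib

# Hypothetical preset dictionary: strings expected to recur in the dumps.
# Both sides must use exactly the same bytes.
PRESET = b"".join([
    b"CREATE TABLE ",
    b"ENGINE=InnoDB ",
    b"DEFAULT CHARSET=utf8mb4 ",
    b"INSERT INTO `users` VALUES (",
    b"@gmail.com",
])

def compress(data, zdict=None):
    """Deflate data, optionally primed with a preset dictionary."""
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress(blob, zdict=None):
    """Inverse of compress(); needs the same dictionary, if one was used."""
    d = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# A single short row, where a dictionary helps the most.
sample = b"INSERT INTO `users` VALUES (1,'alice@gmail.com');"
plain = compress(sample)
primed = compress(sample, PRESET)

if __name__ == "__main__":
    print(len(sample), len(plain), len(primed))
```

A preset dictionary mostly pays off on small inputs, because a compressor quickly learns the same strings from the data itself. On a dump of tens of megabytes the gain is likely to be small, which matches the commenter's suspicion.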
| svnpenn wrote:
| I wish Rust would add this, but they don't seem interested at
| all:
|
| https://github.com/rust-lang/rustup/issues/1858
| JoshTriplett wrote:
| We're interested. I'd _love_ to see this happen. But it does
| require several different things to change, and rustup wouldn't
| be the starting place (apart from needing to have _support_
| for it).
| staticassertion wrote:
| They seem very interested. There's lots of benchmarks and
| discussion. It just looks like zstd would take more space than
| xz - it wasn't a universally better solution. The concerns
| noted are totally reasonable and are based on real world issues
| that the teams that maintain this infra have. What's obvious,
| however, is that there was a _lot_ of interest.
| Svetlitski wrote:
| Zstandard really is incredible. Anywhere I would've previously
| compressed things with gzip I now universally prefer to use
| Zstandard instead. It's a shame it's not supported as an HTTP
| Content-Encoding in web browsers.
| MarkSweep wrote:
| There is RFC 8478 (since obsoleted by RFC 8878) for HTTP
| Content-Encoding, but I don't think any web browsers currently
| implement it.
|
| https://www.rfc-editor.org/rfc/rfc8478
| BackBlast wrote:
| Technically it is available, but it's just not widely
| supported. Though it's really unnecessary. Brotli is, in many
| ways, very comparable or a near equivalent to zstd.
| btdmaster wrote:
| According to some Firefox developers, brotli is more suitable
| because of the standardised dictionaries for web content, like
| JS: https://bugzilla.mozilla.org/show_bug.cgi?id=1301878
| jacooper wrote:
| I wish Caddy would support Brotli; it's either gzip or zstd
| currently.
| BackBlast wrote:
| Caddy 2 supports brotli static compression out of the box,
| I ran a test on it just last week with this exact
| configuration. Dynamic, I'm not sure. That use case doesn't
| interest me as much.
| anotherevan wrote:
| Just did some quick research and it looks like you can
| serve pre-compressed static resources in Brotli. See the
| precompressed option of the file_server directive[1].
|
| The thinking seems to be that, because of how CPU-intensive it
| is, Brotli is not favoured for on-the-fly compression[2].
|
| If you're really desperate for it though, you could try
| [this extension][3].
|
| [1] https://caddyserver.com/docs/caddyfile/directives/file_serve...
|
| [2] https://caddy.community/t/caddy-v2-brotli/8805
|
| [3] https://github.com/ueffel/caddy-brotli
| morelisp wrote:
| Yeah, this is definitely a real advantage and not a broke-ass
| feedback loop of SEO and confirmation bias. It's great that
| we have to ship our compressors with massive English-biased
| static dictionaries now.
| [deleted]
___________________________________________________________________
(page generated 2022-09-04 23:01 UTC)