[HN Gopher] Zstandard v1.5.0
___________________________________________________________________
Zstandard v1.5.0
Author : ascom
Score : 92 points
Date : 2021-05-14 16:11 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| aidenn0 wrote:
| Zstd is so much better than the commonly-used alternatives that I
| get mildly annoyed when given a .tar.{gz,xz,bz2}. It's not like
| it's a huge deal, but a much smaller file (compared to gz) or a
| similarly sized one with much faster decompression (compared to
| xz, bz2) just makes me a tiny bit happier.
| xxpor wrote:
| The only problem I have with it is that when I first heard
| about it, I thought it was another name for the old-school
| .Z/COMPRESS algorithm.
| infogulch wrote:
| Your comment made me curious what a Zstandard-compressed tar
| file's extension would be, and apparently it's .tar.zst
| apendleton wrote:
| I agree with the general premise that there's no reason to ever
| use gzip anymore (unless you're in an environment where you
| can't install stuff), but interestingly my experience with the
| tradeoffs is apparently not the same as yours. I tend to find
| that zstd and gzip give pretty similar compression ratios for
| the things I tend to work with, but that zstd is way faster,
| and that xz offers better compression ratios than either, but
| is slow. So like, my personal decision matrix is "if I really
| care about compression, use xz; if I want pretty good
| compression and great speed -- that is, if before I would have
| used gzip -- use zstd; and if I really want the fastest
| possible speed and can give up some compression, use lz4."
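|
| Expressed as shell commands, that matrix looks roughly like
| this (levels are illustrative, not a recommendation):
|
|     xz -9 data.tar      # best ratio, slow
|     zstd -3 data.tar    # good ratio, fast ("-3" is the default)
|     lz4 -1 data.tar     # fastest, weakest ratio ("-1" is the default)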
| jopsen wrote:
| Most of the time you also care about ease of use and
| compatibility.
| apendleton wrote:
| Maybe in a generic-you sense ("one also cares"), but if by
| "you" you mean me, no, most of my compression needs are in
| situations where I control both the compression and
| decompression sides of the interaction, e.g., deciding how
| to store business data at rest on S3, and debating the
| tradeoffs between cost of space, download time, and
| decompression time/CPU use. We migrated a bunch of
| workflows at my last job from gzip to either lz4 or zstd to
| take advantage of better tradeoffs there, and if I were
| building a similar pipeline from scratch now, gzip would
| not be a contender. Adding an extra dependency to my
| application is pretty trivial, in exchange for shaving ten
| minutes' worth of download and decompression time off of
| every CI run.
| aidenn0 wrote:
| A few comments:
|
| 1. There are two speeds: compression and decompression; lz4
| only beats zstd when decompressing ("zstd -1" will compress
| faster than lz4, and you can crank that up several levels and
| still beat lz4_hc on compression). bzip2 is actually fairly
| competitive at compression for the ratios it achieves but
| loses badly at decompression.
|
| 2. "zstd --ultra -22" is nearly identical compression to xz
| on a corpus I just tested (an old gentoo distfiles snapshot)
| while decompressing much faster (I didn't compare compression
| speeds because the files were already xz compressed).
|
| [edit]
|
| Arch Linux (which likely tested a larger corpus than I did)
| reported a 0.8% regression in size when switching from xz to
| zstd at compression level 20. This supports your assertion
| that xz will beat zstd in compression ratio.
|
| [edit2]
|
| bzip2 accidentally[1] handily outperforms every other
| compression algorithm I've tried on large files that are all
| zeros; for example, 1GB of zeroes with "dd if=/dev/zero
| bs=$((1024*1024)) count=1024 |bzip2 -9 > foo.bz2" generates a
| file that is only 785 bytes. zstd is 33k and xz is 153k. Of
| course my non-code-golfed script for generating 1GB of zeros
| is only 38 bytes...
|
| 1: There was a bug in the original BWT implementation that
| had degenerate performance on long strings of identical
| bytes, so bzip2 includes an RLE pass before the BWT.
| markdog12 wrote:
| Good blog post on zstd:
| https://gregoryszorc.com/blog/2017/03/07/better-compression-...
| greatgoat420 wrote:
| > Single file Libs
|
| > This move reflects a commitment on our part to support this
| tool and this pattern of using zstd going forward.
|
| I love that they are moving toward supporting an amalgamation
| build. I and many others reach for SQLite because of this
| feature, and I think this will really increase the adoption of
| Zstd.
| felixhandte wrote:
| Glad to hear it! It's a pretty hefty single file, so it
| probably won't be qualifying for
| https://github.com/nothings/single_file_libs anytime soon...
| but hopefully people find it useful nonetheless.
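|
| For the curious, consuming the amalgamation is a one-liner at
| build time; a sketch, assuming the generated file is named
| zstd.c (the actual name may differ):
|
|     # compile the single file once, then link it into your app
|     cc -O2 -c zstd.c -o zstd.o
|     cc -O2 myapp.c zstd.o -o myapp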
| rektide wrote:
| When can we bring this to the web? Zstd, aka RFC 8478[1], is so
| good. That it can continue to improve at all feels almost
| unbelievable, but @Cyan4973 et al. continue to make it faster,
| somehow.
|
| Especially on mobile, with large assets, I feel like zstd's
| lightning fast decompression time could be a huge win. It used to
| be that Brotli was the obvious choice for achieving high
| compression, but it doesn't feel so clear to me now. Here's one
| random run-off between the two[2].
|
| The other obvious use case is if there is large-ish dynamic-ish
| data, where the cost of doing Brotli compression each time might
| be too high.
|
| [1] https://datatracker.ietf.org/doc/html/rfc8478
|
| [2] https://peazip.github.io/fast-compression-benchmark-
| brotli-z...
| ac29 wrote:
| Caddy supports Zstd encoding:
| https://caddyserver.com/docs/caddyfile/directives/encode
|
| On the client end, curl does:
| https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
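|
| A quick way to try it from a shell (assumes your curl build
| has zstd enabled; check the Features line):
|
|     # "zstd" should be listed under Features if supported
|     curl -V
|
|     # advertise all supported encodings and auto-decompress
|     curl --compressed https://example.com/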
| meltedcapacitor wrote:
| Brotli is such an ugly hack (a hardcoded dictionary with a
| snapshot of the world as it looked from Mountain View on some
| random day...); the quicker it dies, the better.
| nvllsvm wrote:
| It's also impossible to identify whether an arbitrary file is
| compressed with brotli. It lacks a magic number.
| https://github.com/google/brotli/issues/298
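|
| By contrast, every zstd frame starts with the fixed magic
| number 0xFD2FB528 (stored little-endian), which is easy to see
| with something like:
|
|     printf 'hello' | zstd | xxd | head -n 1
|     # first four bytes: 28b5 2ffd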
| rubyist5eva wrote:
| Is this the same "zstd" compression used in Fedora's btrfs
| transparent block level compression? I have been thoroughly
| impressed with it in Fedora 34. If that's true, I had no idea
| that it was a Facebook project. Color me shocked.
| gliptic wrote:
| It wasn't originally. Facebook hired Yann Collet well after
| zstd was a working thing.
| terrelln wrote:
| Yeah, it is.
|
| The Linux kernel is currently using zstd-1.3.1, and I'm working
| on getting it updated to the latest zstd version.
| post-factum wrote:
| Looking forward to having modern zstd in-kernel! Thanks for
| your efforts.
| ipsum2 wrote:
| According to https://en.wikipedia.org/wiki/Btrfs the core
| developers of btrfs work at Facebook.
|
| > In June 2012, Chris Mason left Oracle for Fusion-io, which he
| left a year later with Josef Bacik to join Facebook. While at
| both companies, Mason continued his work on Btrfs.[27][17]
| greatgoat420 wrote:
| Facebook actually does a decent amount of work on Fedora, and
| was even part of the force behind using btrfs as the default.
| sudeepj wrote:
| zstandard continues to amaze me. Compared to zlib (level=4, I
| think) it seems to have the best of both worlds (good speed &
| a comparable compression ratio).
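|
| zstd's built-in benchmark mode makes that tradeoff easy to
| measure on your own data; a sketch (pick any file you like):
|
|     # benchmark zstd compression levels 1 through 6 on somefile
|     zstd -b1 -e6 somefile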
___________________________________________________________________
(page generated 2021-05-14 23:00 UTC)