[HN Gopher] Data Compression Drives the Internet. Here's How It ...
___________________________________________________________________
Data Compression Drives the Internet. Here's How It Works
Author : the-mitr
Score : 82 points
Date : 2023-06-09 00:36 UTC (22 hours ago)
(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)
| fguerraz wrote:
| Quanta magazine and all these pop science websites need to be
| stopped.
|
| (Not because popular science is bad, but because they do it
| badly, and the clickbait is insufferable)
| sgt101 wrote:
| Caches drive the internet.
|
| They are orders of magnitude more important than compression.
| 99.99% of requests hit a local cache. (1)
|
| Compression is important too.
|
| (1) I worked in a telco. Check it out for yourself!
| Solvency wrote:
| Ok how do they work then?
| parl_match wrote:
| There's lots of information about this on the open web, and
| tons of resources for varying levels of skill.
| joshsabol46 wrote:
| Please pardon my ignorance...
|
| My understanding is that there are thousands of different
| compression algorithms, each with its own pros and cons
| depending on the type and characteristics of the file. And yet
| we still try to pick the "generically best" codec for a given
| file type (e.g., PNG) and then use that everywhere.
|
| Why don't we have context-dependent compression instead?
|
| I'm imagining a system that scans objects before compression,
| selects the optimal algorithm, and then encodes the file. The
| selected compression algorithm could be prefixed for easy
| decompression.
|
| Compare a single black 1x1 image to a black 1000x1000 one: as
| PNGs they're roughly 128 bytes and 6 KB, respectively. However,
| run-length encoding would compress the latter to about the same
| size as the former.
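|
| For illustration, a toy byte-level RLE (a made-up format just
| for this sketch: a 4-byte little-endian run length plus the
| value byte, nothing you'd ship) shows why the run-length
| approach wins on that image:
|
|   import struct
|
|   # Each run of identical bytes becomes a (count, value) pair.
|   def rle_encode(data: bytes) -> bytes:
|       out = bytearray()
|       i = 0
|       while i < len(data):
|           run = 1
|           while i + run < len(data) and data[i + run] == data[i]:
|               run += 1
|           out += struct.pack("<IB", run, data[i])
|           i += run
|       return bytes(out)
|
|   def rle_decode(data: bytes) -> bytes:
|       out = bytearray()
|       for off in range(0, len(data), 5):
|           run, value = struct.unpack_from("<IB", data, off)
|           out += bytes([value]) * run
|       return bytes(out)
|
|   # An all-black 1000x1000 image at one byte per pixel.
|   pixels = bytes(1_000_000)
|   packed = rle_encode(pixels)
|   assert rle_decode(packed) == pixels
|   print(len(packed))  # 5 bytes, regardless of image dimensions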
| deepsun wrote:
| > selects the optimal algorithm
|
| Here's the catch: how does the "system" know which algorithm
| would be best? It could try encoding with multiple algorithms
| and see which output is shorter, but that's extra CPU.
|
| And the "system" can itself be called a compression algorithm.
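|
| A minimal sketch of that trade-off, using Python's real codec
| modules (the one-byte tag scheme is invented for this example):
|
|   import bz2, lzma, zlib
|
|   # Try several codecs, keep the smallest output, and prefix a
|   # tag byte so the decoder knows which algorithm to undo.
|   CODECS = {
|       0: (zlib.compress, zlib.decompress),
|       1: (bz2.compress, bz2.decompress),
|       2: (lzma.compress, lzma.decompress),
|   }
|
|   def meta_compress(data: bytes) -> bytes:
|       tag, best = min(
|           ((t, comp(data)) for t, (comp, _) in CODECS.items()),
|           key=lambda pair: len(pair[1]),
|       )
|       return bytes([tag]) + best
|
|   def meta_decompress(blob: bytes) -> bytes:
|       _, decomp = CODECS[blob[0]]
|       return decomp(blob[1:])
|
|   data = b"the quick brown fox jumps over the lazy dog " * 500
|   packed = meta_compress(data)
|   assert meta_decompress(packed) == data
|
| The "extra CPU" cost is plain here: every input gets compressed
| three times so that one result can be kept.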
| U2EF1 wrote:
| I mean, yeah, that's basically what high-compression solutions
| like PAQ do: depending on the compression level desired, they
| apply increasingly speculative and computationally intensive
| models to the block and pick whichever one works best.
| denvaar wrote:
| > The first step involves yet another clever strategy for
| identifying repetition and thereby compressing message size, but
| the second step is to take the resulting compressed message and
| run it through the Huffman process.
|
| I wonder if this "first step" is the Burrows-Wheeler Transform?
|
| Side note: In Silicon Valley (the show), I'm pretty sure that
| Richard has a picture of David Huffman by his bedside.
| Scaevolus wrote:
| No, BWT is largely unused in modern compression codecs.
|
| Lempel-Ziv is the basis for almost all modern general-purpose
| compression. It works more like having a hash table mapping 3-
| or 4-byte fragments to their positions, walking through the
| input byte by byte, checking the hash table for matches, and
| inserting the latest fragments and positions into the table.
|
| BWT has nearly identical speed compressing and decompressing,
| whereas with Lempel-Ziv, searching for matches to compress is
| _much_ slower than simply copying data according to
| instructions when decompressing.
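|
| A toy version of that loop (emitting tuples instead of a real
| bit format, and remembering only the most recent position per
| fragment) might look like:
|
|   MIN_MATCH = 4
|
|   def lz_compress(data: bytes):
|       table = {}   # 4-byte fragment -> most recent position
|       out, i = [], 0
|       while i < len(data):
|           frag = data[i:i + MIN_MATCH]
|           pos = table.get(frag) if len(frag) == MIN_MATCH else None
|           if pos is not None:
|               # Extend the match as far as it will go.
|               length = MIN_MATCH
|               while (i + length < len(data)
|                      and data[pos + length] == data[i + length]):
|                   length += 1
|               out.append(("copy", i - pos, length))
|               # Remember the fragments we just stepped over.
|               for j in range(i, i + length):
|                   if len(data) - j >= MIN_MATCH:
|                       table[data[j:j + MIN_MATCH]] = j
|               i += length
|           else:
|               if len(frag) == MIN_MATCH:
|                   table[frag] = i
|               out.append(("lit", data[i]))
|               i += 1
|       return out
|
|   def lz_decompress(ops) -> bytes:
|       out = bytearray()
|       for op in ops:
|           if op[0] == "lit":
|               out.append(op[1])
|           else:
|               _, dist, length = op
|               for _ in range(length):     # byte by byte, so
|                   out.append(out[-dist])  # overlapping copies work
|       return bytes(out)
|
|   data = b"the quick brown fox jumps over the quick brown dog" * 20
|   assert lz_decompress(lz_compress(data)) == data
|
| Note how the decoder is nothing but appends and back-references,
| which is exactly why decompression is so much faster.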
| pseudotrash wrote:
| I came here to find Silicon Valley references and wasn't
| disappointed.
| devsegal wrote:
| Both of us have been satisfied
| hinkley wrote:
| at the same time?
| ThrowawayTestr wrote:
| I don't like that show, because whenever I watch it I start
| imagining a world with middle-out compression and I get sad
| that we'll never have it.
| gfody wrote:
| Missed opportunity to explain how compression and prediction
| are related, and that the better you can predict the next
| token, the better your compression gets. Then the article
| would even get to mention GPT.
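|
| The relation fits in a few lines (idealized: this ignores the
| entropy coder's own overhead):
|
|   import math
|
|   # An ideal entropy coder spends -log2(p) bits on a symbol the
|   # model predicted with probability p, so sharper predictions
|   # mean shorter output.
|   for p in (0.5, 0.9, 0.99, 0.999):
|       print(f"p = {p}: ideal cost = {-math.log2(p):.4f} bits")
|
|   # p = 0.5:   1.0000 bits
|   # p = 0.9:   0.1520 bits
|   # p = 0.99:  0.0145 bits
|   # p = 0.999: 0.0014 bits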
| optimalsolver wrote:
| Apparently compression and intelligence are synonymous:
|
| https://mattmahoney.net/dc/rationale.html
| valenterry wrote:
| I think in a sense compression is harder, because not only do
| you want to correctly predict the next token, you also want to
| do it fast, with a minimal but efficient algorithm that doesn't
| require much space or a big dictionary.
|
| You could think of it as taking a "snapshot" of an AI and then
| optimizing the hell out of it for a specific case; you end up
| with a good compression algorithm.
| hackcasual wrote:
| Huffman trees are really cool and make a neat little example
| program to put together.
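|
| A minimal sketch in Python (it builds the code table and skips
| the actual bit packing):
|
|   import heapq
|   from collections import Counter
|   from itertools import count
|
|   # Repeatedly merge the two least frequent subtrees, then read
|   # each symbol's code off its root-to-leaf path.
|   def huffman_codes(message: str) -> dict:
|       tiebreak = count()  # keeps heap entries comparable on ties
|       heap = [(freq, next(tiebreak), sym)
|               for sym, freq in Counter(message).items()]
|       heapq.heapify(heap)
|       while len(heap) > 1:
|           f1, _, left = heapq.heappop(heap)
|           f2, _, right = heapq.heappop(heap)
|           heapq.heappush(heap,
|                          (f1 + f2, next(tiebreak), (left, right)))
|       codes = {}
|       def walk(node, prefix):
|           if isinstance(node, str):   # leaf: a single symbol
|               codes[node] = prefix or "0"
|           else:
|               walk(node[0], prefix + "0")
|               walk(node[1], prefix + "1")
|       walk(heap[0][2], "")
|       return codes
|
|   print(huffman_codes("abracadabra"))
|   # Frequent symbols ('a') get shorter codes than rare ones.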
___________________________________________________________________
(page generated 2023-06-09 23:01 UTC)