[HN Gopher] Ts_zip: Text Compression Using Large Language Models
       ___________________________________________________________________
        
       Ts_zip: Text Compression Using Large Language Models
        
       Author : Deeg9rie9usi
       Score  : 110 points
       Date   : 2023-08-16 20:08 UTC (2 hours ago)
        
 (HTM) web link (bellard.org)
 (TXT) w3m dump (bellard.org)
        
       | refulgentis wrote:
       | Intriguing promise, but I can't find technical details or source,
        | only binary executables. Anyone else have luck?
       | 
       | The "(and hopefully decompress)" would usually make me think it
       | was like when people would "compress" sequences into emoji using
        | ChatGPT... but it's Bellard.
        
       | sterlind wrote:
       | How does this work? I know that LLMs predict the log-likelihood
       | of the next token in a sequence.. so I guess you initialize the
       | LLM from scratch, say that each token of the input text is the
       | Nth most likely possibility, and record what N is for that token?
       | And then you do some combination of run-length encoding and
       | Huffman trees or something to record the Ns?
       | 
       | Since this is not open-source, but seems simple to do, perhaps
       | I'll make my own.
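        | 
        | Something like this toy sketch, maybe (the predictor here is just
        | a stand-in for the LLM, and a real version would arithmetic-code
        | against the model's probabilities rather than listing ranks):
        | 
        |     # Rank-based coding with a next-token predictor. "predict"
        |     # stands in for an LLM's next-token distribution; a real
        |     # implementation would run the model over the prefix and
        |     # entropy-code the ranks (or probabilities) it returns.
        |     def predict(prefix, vocab):
        |         # Toy predictor: rank tokens by frequency in the prefix.
        |         counts = {t: prefix.count(t) for t in vocab}
        |         return sorted(vocab, key=lambda t: (-counts[t], t))
        | 
        |     def encode(tokens, vocab):
        |         ranks = []
        |         for i, tok in enumerate(tokens):
        |             order = predict(tokens[:i], vocab)
        |             ranks.append(order.index(tok))  # N for this token
        |         return ranks  # mostly small numbers -> easy to compress
        | 
        |     def decode(ranks, vocab):
        |         tokens = []
        |         for r in ranks:
        |             order = predict(tokens, vocab)
        |             tokens.append(order[r])
        |         return tokens
        | 
        |     text = list("the cat sat on the mat")
        |     vocab = sorted(set(text))
        |     ranks = encode(text, vocab)
        |     assert decode(ranks, vocab) == text
        |     print(ranks)  # feed these to a Huffman/arithmetic coder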
        
       | AceJohnny2 wrote:
       | This is hilarious.
       | 
       | In compression contest circles, there's long been this concept of
       | "AI compression" (long before the current LLM wave), considered
       | the upper limit of compression algorithms. This is based on the
       | idea of the Dictionary... IIRC, some compression systems include
       | the Dictionary as part of the decompressor, to save space in the
       | payload, but then that Dictionary is fixed and may be suboptimal
        | for the data. Others include it in the payload, so that it's
        | optimal for the data; most do the latter.
       | 
       | But you could achieve incredible compression of, say, The
       | Complete Works Of Shakespeare if you knew your peer knew about
       | The Complete Works Of Shakespeare (had a rich enough dictionary)
        | and just send them that designator as the payload! Or, you could
       | send them a particular high-resolution picture of a wheat field
       | with cypresses under a blue sky with an appropriately specific
       | "description".
       | 
        | What Fabrice did is go ahead and put this theoretical idea into
        | practice. And that's hilarious.
        
         | [deleted]
        
         | sterlind wrote:
         | Notably, I think this would fail on the Wikipedia compression
         | contest because that counts the size of the decompression
         | program (model in this case), so that it measures Kolmogorov
         | complexity. And RWKV is surely too big. A neat thing to think
         | about though, and maybe a winning idea if a more space-
         | efficient model can be found.
        
       | fsiefken wrote:
        | Impressive! But an order of magnitude slower than non-LLM
       | compression techniques.
       | 
        | I tried to compare it with the results from the top 9 of this enwik8
       | compression test by Matt Mahoney:
       | https://www.mattmahoney.net/dc/text.html
       | 
       | The durilca compressor by Dmitry Shkarin (what happened to him?)
        | has been the fastest compressor in the top since its debut in
       | 2006.
       | 
       | The original size, enwik8, is 10^8 bytes. The compressed size is
        | the compressed enwik8 plus the size of a zip archive containing
       | the decompressor.
       | 
        | The bpb value of rwkv_430M on enwik8 is 0.948 bpb; the 7B models
        | will be lower, perhaps around 0.800. So if my calculations are
        | correct, the LLMs can perform about 50% better than the best-
        | performing conventional compressors, excluding the zipped LLM
        | decompressor code (I am unsure about its size).
       | 
        | The bpb for each existing program can be calculated as: bpb =
        | (total size (enwik8 + prog) / original size (enwik8)) × 8
        | 
        | nncp v3.1: the compressed size is not provided, so we cannot
        | calculate bpb for nncp v3.1 on enwik8.
        | 
        | cmix v19: bpb = (14,837,987 + 223,485) / 10^8 × 8 ≈ 1.205
        | 
        | tensorflow-compress v4:
        | bpb = (15,905,037 + 55,283) / 10^8 × 8 ≈ 1.272
        | 
        | cmix-hp 10 Jun 2021: the compressed size is not provided, so we
        | cannot calculate bpb for cmix-hp on enwik8.
        | 
        | fast-cmix: the compressed size is not provided, so we cannot
        | calculate bpb for fast-cmix on enwik8.
        | 
        | starlit 31 May 2021: bpb = 15,215,107 / 10^8 × 8 ≈ 1.217
        | 
        | phda9 1.8: bpb = (15,010,414 + 42,944) / 10^8 × 8 ≈ 1.205
        | 
        | paq8px_v206fix1:
        | bpb = (15,849,084 + 402,949) / 10^8 × 8 ≈ 1.265
        | 
        | durilca'kingsize:
        | bpb = (16,209,167 + 407,477) / 10^8 × 8 ≈ 1.317
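        | 
        | For reference, the arithmetic behind the rows that list both
        | sizes, as a quick sanity check:
        | 
        |     # bpb = (compressed enwik8 + zipped decompressor) / 10^8 * 8
        |     def bpb(compressed, decompressor=0, original=10**8):
        |         return (compressed + decompressor) / original * 8
        | 
        |     print(round(bpb(14_837_987, 223_485), 3))  # cmix v19 -> 1.205
        |     print(round(bpb(15_215_107), 3))           # starlit  -> 1.217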
        
         | londons_explore wrote:
         | > The bpb value of rwkv_430M on enwik8 is 0.948 bpb
         | 
         | This is an unfair test, because the rwkv_430M model was almost
         | certainly trained on wikipedia. Testing using training data is
         | a big no-no in the ML world - you get unrealistically good
         | results.
         | 
         | To test this properly, it needs to be run on a bit of newly
         | written text which is nowhere on the internet. Obviously
         | writing 100 MB of human written text which is not already on
         | the internet is rather a big task...
        
           | etiam wrote:
           | Newly declassified material maybe? Or just newly digitized?
        
       | MaximilianEmel wrote:
       | "The same exact GPU model ... must be used for compression and
       | decompression."
       | 
       | This seems like a massive issue for actual use. Are there really
       | not some workarounds to get deterministic output?
        
         | slimsag wrote:
          | Unfortunately, GPU (and even CPU SIMD) floating-point math is
          | riddled with cross-platform determinism issues; hardware
          | manufacturers intentionally trade determinism away in order to
          | get faster math operations in general, because although the
          | behavior of floating point is defined in IEEE 754, you get
          | rounding errors for each operation.
         | 
         | Compiler optimizations (and remember, GPU drivers each use
         | their own compiler behind the scenes to translate to their
         | actual hardware architecture) can alter rounding errors of each
         | operation, and parallel execution - which differs from hardware
         | to hardware - also affects it.
         | 
         | Some APIs (Cuda?) let you disable all optimizations and there
         | are ways to get cross-platform determinism, but in general it's
         | much much slower if you want bit-for-bit equality across
         | different hardware.
         | 
          | SPIR-V/Vulkan, for example[0], only define an error range in
          | ULPs for some operations - not bit-for-bit equality.
         | 
         | [0]
         | https://registry.khronos.org/vulkan/specs/1.2-extensions/htm...
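          | 
          | A tiny illustration of the root issue: floating-point addition
          | isn't associative, so any change in reduction order (which
          | parallel hardware and compilers don't pin down) can flip the
          | low bits:
          | 
          |     # Same numbers, different summation order -> different
          |     # float result. Parallel GPU reductions reorder additions,
          |     # so bit-exact outputs across hardware aren't guaranteed.
          |     import random
          | 
          |     xs = [random.uniform(-1, 1) for _ in range(100_000)]
          |     a = sum(xs)
          |     b = sum(reversed(xs))
          |     c = sum(sorted(xs))
          |     print(a == b == c)  # usually False
          |     print(a, b, c)      # values differ in the last few bits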
        
         | johndough wrote:
         | Reproducible results across GPUs are theoretically possible,
         | but it would take some effort to implement and be slower. At
         | least the primitive operations (addition, multiplication, etc.)
         | are there: https://docs.nvidia.com/cuda/floating-
         | point/index.html
        
       | tysam_and wrote:
       | The reason this works is because LLMs are trained to minimize the
       | empirical risk over a large set of data
       | (https://en.wikipedia.org/wiki/Empirical_risk_minimization), and
       | it is where the log-likelihood-based measure comes from,
       | oftentimes represented simply as the perplexity.
       | 
       | This is precisely because LLMs learn a representation of the nth-
       | order Markov Chain, which by its nature is extraordinarily
       | sparse. This allows for excellent compression, and I would assume
       | (and have for a while, I suppose) the nearly-linear nature of
       | LLMs allows for excellent interpolation between these very sparse
       | states.
       | 
        | This lets us 'break' the hard combinatorial problem into one that
        | can be solved softly, albeit with the fuzzy error that comes with
        | any estimator.
       | 
       | Since computers cannot tractably compress markov chains past a
       | certain point, this allows LLMs some opportunity to outpace
       | traditional methods for compression, at least in theory.
       | 
       | Also, this method technically cheats since you need the language
       | model to decompress, which counts towards your dictionary size.
       | In a truly unbounded case, I would be interested to see which
       | method wins. I'm assuming it's the LLM.
       | 
       | Apologies for any factual inaccuracies or such, I am still
       | learning many of these things. Many thanks and much love! <3
       | :))))
        
         | blovescoffee wrote:
         | If you want to learn something cool and related, checkout
         | Autoencoders (or VAEs). They effectively compress information
         | by forming representations of some data.
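          | 
          | A minimal (toy) autoencoder looks something like this; the
          | 32-number bottleneck is the compressed representation, and a
          | VAE would additionally make that bottleneck a probability
          | distribution:
          | 
          |     # Toy autoencoder: squeeze 784-dim inputs through a 32-dim
          |     # bottleneck and reconstruct them (random data stands in
          |     # for a real dataset here).
          |     import torch
          |     import torch.nn as nn
          | 
          |     enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
          |                         nn.Linear(128, 32))
          |     dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
          |                         nn.Linear(128, 784))
          |     opt = torch.optim.Adam(
          |         list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
          | 
          |     x = torch.rand(64, 784)
          |     for _ in range(200):
          |         recon = dec(enc(x))          # 784 -> 32 -> 784
          |         loss = ((recon - x) ** 2).mean()
          |         opt.zero_grad(); loss.backward(); opt.step()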
        
           | tysam_and wrote:
           | Indeed. They are extremely cool! <3 :) I think in a way,
           | every neural network works on compression! Though AEs & VAEs
           | are a bit more blatant in how they do it. It's just always
           | turned around a little, here and there. I have a pet theory
           | that no neural network can work without compression (even
           | generative ones that have constantly increasing layer depths!
           | :D :)))) )
        
       | jandrese wrote:
        | I guess they've worked out how to always get exactly the same text
       | back? Hopefully this won't be a repeat of the photocopiers that
       | used a similar compression technique and occasionally substituted
       | letters in the copy.
       | 
       | https://www.bbc.com/news/technology-23588202
        
         | cbhl wrote:
         | The author, Fabrice Bellard, is well-known for his work on QEMU
         | (among other things), but I'd treat the rest of the posts as
         | "idea sharing" at an early stage. For example, while BPG
         | (Better Portable Graphics) didn't take off, the core idea
         | (image formats based on video codecs) can be found in WebP
          | (VP8), AVIF (AV1), and HEIF (HEVC/H.265).
         | 
         | For this, I'd look at comparable work in the audio codec space,
         | including Lyra (https://arxiv.org/abs/2102.09660) and
         | SoundStream (https://arxiv.org/abs/2107.03312).
         | 
         | For text compression, I think it would be novel if you could
         | get a lossy but semantically equivalent decompression that was
         | resilient to the exact hardware used for inference. I don't
         | think that's what happened here, though, given the requirement
         | for "exact same GPU and model".
        
           | wood_spirit wrote:
            | Fabrice is also well known for creating FFmpeg and
            | discovering an algorithm for computing digits of pi
           | https://en.m.wikipedia.org/wiki/Fabrice_Bellard
           | 
           | He has also held the 'world record' for data compression
           | since 2019 with nncp http://mattmahoney.net/dc/text.html#1085
        
         | OGWhales wrote:
         | JBIG2 is what the NSO group exploited as part of their zero-
          | click iMessage exploit. Google Project Zero has a good write-up
          | if anyone hasn't heard about it yet -- it's super cool:
         | 
         | https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...
        
         | ks2048 wrote:
         | If you can predict text very well, it can still be a good
         | _lossless_ compressor - you just have to encode the _errors_
          | you make. Fewer errors, less data to store.
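          | 
          | Concretely, with an arithmetic coder each token costs roughly
          | -log2 of the probability the predictor assigned to it, so
          | better predictions directly mean fewer bits (rough sketch):
          | 
          |     # Ideal code length under a predictive model: sum of
          |     # -log2 P(token | context) over the text. Mistakes are
          |     # still encodable, just at a higher cost, so the scheme
          |     # stays lossless.
          |     import math
          | 
          |     def bits(probs_of_actual_tokens):
          |         return sum(-math.log2(p) for p in probs_of_actual_tokens)
          | 
          |     print(bits([0.9, 0.8, 0.95]))  # good predictor -> ~0.5 bits
          |     print(bits([0.1, 0.2, 0.05]))  # poor predictor -> ~10 bits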
        
         | wmf wrote:
         | Yeah, using a probabilistic predictor to do lossless
         | compression has been known for a few decades.
        
       | albertzeyer wrote:
       | > The same exact GPU model and program versions must be used for
       | compression and decompression.
       | 
       | This makes the whole approach really difficult to be useful in
       | practice. Basically whenever you have a method that relies on
       | 100% reproducible float numbers.
       | 
       | What are ways around that? Why do the quantized variants actually
       | have the same problem?
        
       | londons_explore wrote:
       | Bellard still does great work... But no longer releases anything
       | opensource.
       | 
       | In turn, that makes it far less interesting to me.
       | 
       | Especially since the majority of what he releases is more of a
        | cool tech demo than a product in its own right. A closed-source
       | tech demo really isn't of much use to anyone who doesn't have the
       | source code to extend it.
        
         | [deleted]
        
         | DSingularity wrote:
         | Why did he stop?
        
           | londons_explore wrote:
           | I can't speak for him...
           | 
           | But I stopped open sourcing some of my work when I got
           | aggressive users demanding I support them for free. I
           | realised I was pouring hours into triaging bugs, explaining
            | why PRs weren't good enough, fixing other people's corner
           | cases, and settling disputes, without really getting much
           | back. Sure, my projects were getting a following and there
           | were lots of happy users, but the actual benefit to me of
           | that happening was negative.
           | 
           | One middle ground I used for a while was releasing my
           | projects as a tar file only, not a git repo. Someone else can
           | then make a git repo and maintain the project. Looks like
           | Bellard did the same for some projects.
        
             | londons_explore wrote:
             | Code is a tool. Often you build a tool to complete a task.
             | 
             | Open Sourcing the tools lets others do _their_ tasks. But
             | you run the risk of becoming an unpaid toolmaker and unpaid
             | tool maintainer, and getting none of your own tasks done.
             | 
              | In a way, GitHub's biggest open-source committers are the
              | 'gig economy' workers of the tech world. Paid in
             | just the occasional few bucks of donation for work which
             | would otherwise be worth hundreds of thousands of dollars
             | done with a salary.
        
             | dakotasmith wrote:
             | I recently came across Curio
             | (https://github.com/dabeaz/curio) & really appreciated
             | their FAQ section on contributing. I plan to use it in any
             | future open source projects:
             | 
             | Q: Can I contribute?
             | 
             | A: Curio is not a community-based project seeking
             | developers or maintainers. However, having it work reliably
             | is important. If you've found a bug or have an idea for
             | making it better, please file an issue.
        
       | hn_throwaway_99 wrote:
       | My hope is that one day I'll be reincarnated and the new me will
       | produce 1/1000 of the output of Fabrice Bellard. Whenever I see
        | new posts from him I just think "_How_ can he do so much?"
        
         | dekhn wrote:
         | The SI unit for productivity is measured in millibellards
        
         | [deleted]
        
       | optimalsolver wrote:
       | Reminder that compression = intelligence:
       | 
       | https://mattmahoney.net/dc/rationale.html
       | 
       | ...which leads to the theory of mind known as "compressionism":
       | 
       | https://ceur-ws.org/Vol-1419/paper0045.pdf
        
         | burtonator wrote:
          | There must be a relationship here with Shannon's estimate that
          | English has about 1 bit of entropy per character and so is
          | highly redundant and 'easy' to predict.
         | 
         | A highly advanced AI could compress the text and predict the
         | next sequences easily.
         | 
         | This seems like a direct connection like electricity and
         | magnetism.
         | 
         | And maybe that's why English needs to be about 1 bit because
         | we're not very intelligent.
        
         | [deleted]
        
         | YeGoblynQueenne wrote:
         | >> Reminder that compression = intelligence:
         | 
         | If I conclude from this that you mean that gzip is intelligent,
         | will I be accused of bad faith?
        
         | ks2048 wrote:
         | I would say compression and intelligence are closely related,
         | not equal. For example, zstd is a pretty good compressor, but
         | does any one think it should be called a pretty good "AI"?
        
           | willis936 wrote:
           | Someone in sales with a product that uses zstd.
        
           | dragonwriter wrote:
           | Wasn't there a paper recently about a fairly simple wrapper
            | around a common compression algorithm beating LLMs on
           | classification tasks?
        
             | jmholla wrote:
             | I thought that was about LLMs being trained on compressed
             | data. But I might be thinking about a different paper.
        
               | dragonwriter wrote:
               | Gzip + kNN for text classification:
               | 
               | https://aclanthology.org/2023.findings-acl.426.pdf
        
               | junipertea wrote:
               | Not LLM, just BERT, also did not actually outperform it.
               | 
               | source: https://kenschutte.com/gzip-knn-paper/
        
           | londons_explore wrote:
           | Read the paper...
           | 
           | Zstd, while it might be better than gzip or bzip, is still a
           | very poor compressor compared to an ideal compressor (which
           | hasn't yet been discovered).
           | 
            | That is why zstd acts like a rather bad AI. Note that if you
            | wanted to use zstd as an AI, you would patch the checksum
            | checks out of the source code, and you would then feed it a
            | file to decompress ("The cat sat on the mat"), followed by a
            | few bytes of random noise.
           | 
           | A great compressor would output: The cat sat on the mat. It
           | was comfortable, so he then lay down to sleep.
           | 
           | A medium compressor would output: The cat sat on the mat. bat
           | cat cat mat sat bat.
           | 
           | A terrible compressor would output: The cat sat on the mat.
           | D7s"/r %we
           | 
            | See how each is using knowledge at a different level to
            | generate a completion. Notice also how those few bytes
            | generate different amounts of output depending on the
            | compressor's level of world understanding, and therefore its
            | compression ratio.
        
             | astrange wrote:
             | > A terrible compressor would output: The cat sat on the
             | mat. Dsr %we3 9T23 }PS{D:rg!@ !jvPSdP$
             | 
             | LLMs are sort of unable to do this because they use a fixed
             | tokenizer instead of raw bytes. That means they won't
             | output binary garbage even early on + saves a lot of
             | memory, but it may hurt learning things like
              | capitalization, rhyming, etc. that we think are obvious.
        
               | londons_explore wrote:
                | Even if you trained an LLM with a simplified tokenizer
                | that simply had 256 tokens, one for each possible byte
                | value, you would see the same result.
        
         | bbstats wrote:
         | I would say compression has some congruency with intelligence.
        
         | tysam_and wrote:
         | No. Compression == information, not intelligence. This is a
         | common mistake that people make, and is a popular expression
         | making the rounds right now, but it is incorrect.
         | 
         | Intelligence _may arise_ from information. Information
         | _necessarily arises_ from compression.
         | 
         | It is very similar, to me, to the concept of humors
         | (https://en.wikipedia.org/wiki/Humorism). Empirically, it was
         | observed that these things had some correlation, and they
         | occurred together, but they are not directly causative.
         | Similarly with the theory of spontaneous generation
         | (https://en.wikipedia.org/wiki/Spontaneous_generation), which
         | was another theory spawned (heh, as it were), from casual
         | causal correlation (again, as it were).
         | 
         | This can be shown rather easily when you show that the time-
         | dynamics of an exemplar system with high information diverge
         | from the time dynamics of a system with high intelligence, i.e.
         | there is a structural component to the information embedded in
         | the system such that the temporal aspects of applied
         | information are somehow necessary for intelligence.
         | 
         | I hope to release some work at least tangentially related to
         | this within the next few years (though of course it is a bit
         | more high-flying and depends on some other in-flight work). If
         | we are to attempt to move towards AGI, I really think we need
         | to stick with the mathematical basics. Neural networks,
         | although they've been made out to be really complicated, are
         | actually quite simple in terms of the concepts powering them, I
         | believe. That's something I'm currently working on putting
         | together and trying to distill to its essence. That will also
         | be at least 1-2 years out, and likely before any temporally-
          | related work, all going well. In my experience, this is all
          | just a personal belief from a great deal of time and thought
          | with the minutiae of neural networks. That said, I could perhaps
         | be biased with some notion of simplicity, since I've worked
         | with them long enough for some concepts to feel comfortably
         | second-nature.
        
           | derefr wrote:
           | I think what the GP meant is not that _compressed data_ has
           | intelligence, but rather that a _compressor_ is necessarily
           | "intelligent" in correlation to the degree that it compresses
           | well on arbitrary novel datasets it wasn't tuned for. At
           | least insofar as we define "intelligence" with measures like
           | IQ, that relate to "number of examples of a pattern required
           | to notice the pattern."
           | 
           | To compress data, you have to derive some interesting insight
           | about that data that allows the data to be projected into a
           | lower-dimensional, lower-entropy embedding. (In pop-
           | neurological terms, you have to "make a neuron" that
           | represents a concept, so that you can re-phrase your dataset
           | into a star-schema expressed in its connections to that
           | concept.)
           | 
           | The more _different kinds of things_ a single compressor can
           | do that for -- without the compressor itself being made
           | larger -- the more  "intelligent" the compressor is, no?
        
           | bbstats wrote:
           | OTOH true intelligence can still poorly compress things.
        
             | tysam_and wrote:
             | Any ERM-based method will have this flaw (if it is a flaw
             | -- I'd possibly consider it a tradeoff) -- it is the nature
              | of compression itself. I would argue that this disrupts the
              | 'information' layer of the hierarchy, not the
              | 'intelligence', though that is
             | just my personal 2 cents of course.
        
       | leumassuehtam wrote:
       | Interesting application for the RWKV family of models.
        
       | lwneal wrote:
       | A man goes to prison, and the first night while he's laying in
       | bed, he hears someone yell out, "44!", followed by laughter from
       | the other prisoners.
       | 
       | Puzzled, he laid back down, but then he heard someone else yell
       | out, "72!", followed by even more laughter.
       | 
       | "What's going on?" he asked his cellmate.
       | 
       | "Well, we've all heard every joke so many times, we've given them
       | each a number to make it easier."
       | 
       | "Oh," he says, "can I try?"
       | 
       | "Sure, go ahead."
       | 
       | So, he yells out "102!" and the place goes nuts. People are
       | whooping and laughing in a hysteria. He looks at his cellmate
       | rolling on the ground laughing.
       | 
       | "Wow, good joke, huh?"
       | 
       | "Yeah! We ain't never heard that one before!"
        
         | [deleted]
        
       | ducktective wrote:
       | Man fears AGI, AGI fears Fabrice Bellard.
        
         | [deleted]
        
         | optimalsolver wrote:
         | Woman inherits the Earth.
        
       | elil17 wrote:
       | Idea: lossy text compression, where the LLM replaces unlikely
       | words with likelier synonyms.
        
         | GaggiX wrote:
          | This is actually a very interesting idea; I hope someone
          | implements it and shows how. It would be fun to read the same
          | text with stronger and stronger lossy compression.
        
           | dekhn wrote:
           | up-goer five: explained using only the ten hundred words
           | people use the most often
        
       | freedmand wrote:
       | Was it run on any text that was not feasibly training data for
       | the LLMs? It wouldn't be a fair comparison otherwise
        
         | NobodyNada wrote:
         | Every lossless "compression" algorithm makes some strings
         | shorter and other strings longer, such that the _average_ space
         | savings against all possible strings is neutral at best. This
         | is a fundamental consequence of the way information works --
         | there's no way to losslessly map (say) 8 bits of input into 7
         | bits of output.
         | 
         | The goal of compression, then, is to make common strings
         | shorter while making uncommon strings longer. You can think of,
          | say, UTF-8 as a simple compression algorithm: it would take 21
          | bits per character to encode all Unicode code points with a
          | fixed-width encoding, but UTF-8 takes advantage of the fact
         | that the most commonly used characters have a lot of leading
         | zeroes (at least in English). So the characters of the English
         | alphabet require only 8 bits to encode, but uncommon characters
         | require up to 32 bits.
         | 
         | Thus, I would expect an LLM-based compression algorithm to do
         | well on strings that were common in its training data, and make
         | strings that were uncommon or absent slightly longer. If it did
         | not do that, it would not be a lossless compression algorithm.
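          | 
          | You can see that trade-off directly:
          | 
          |     # UTF-8 as a simple "compressor": low (common) code points
          |     # get short encodings, rarer ones get longer encodings.
          |     for ch in ["e", "é", "中", "𝄞"]:
          |         print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")
          |     # U+0065 -> 1, U+00E9 -> 2, U+4E2D -> 3, U+1D11E -> 4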
        
           | freedmand wrote:
           | I don't disagree with anything you said.
           | 
           | I'm saying more that if the compression algorithm is
           | benchmarking against "Alice in Wonderland" and has consumed
           | the entirety of "Alice in Wonderland" in training the LLM
           | (along with popular paragraphs and sentences quoted
           | elsewhere), then it might do particularly well at reciting
           | lines from that book and thus be able to compress it
           | extremely well. I'd be more interested in seeing the
           | compression algorithm's performance on new or unreleased
           | works that would have no way of being training data.
           | 
           | As an extreme hypothetical, I could make a compression
           | algorithm that is a table mapping an ID to an entire book and
           | fill it with all the popular works. "Alice in Wonderland"
           | would be replaced with a single short identifier string and
           | achieve a ~0.001% compression ratio. An unseen work would be
           | replaced with an <unknown> ID followed by the entire work and
           | be slightly bigger. Then, I benchmark only the popular works
           | and show insanely impressive results!
           | 
           | I have no doubt the LLM compressor would do really well on
           | unseen works based on what you said above, but it's not a
           | fair look at its performance to run it on works it may have
           | been explicitly trained on.
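            | 
            | In code, that extreme hypothetical is just a lookup table
            | (with a placeholder standing in for the actual book text):
            | 
            |     # Known works become a 2-byte ID; anything unseen is
            |     # stored verbatim behind a 1-byte escape marker.
            |     KNOWN = {b"<full text of Alice in Wonderland>": 1}
            |     BY_ID = {v: k for k, v in KNOWN.items()}
            | 
            |     def compress(data: bytes) -> bytes:
            |         if data in KNOWN:
            |             return bytes([1, KNOWN[data]])   # 2 bytes total
            |         return bytes([0]) + data             # 1 byte bigger
            | 
            |     def decompress(blob: bytes) -> bytes:
            |         return BY_ID[blob[1]] if blob[0] == 1 else blob[1:]
            | 
            |     book = b"<full text of Alice in Wonderland>"
            |     assert decompress(compress(book)) == book
            |     assert decompress(compress(b"unseen work")) == b"unseen work"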
        
         | thomasahle wrote:
          | Exactly. This is always a pitfall when benchmarking LLM-based
          | techniques. The enwik8 dataset they use, for example, is for
          | sure in the training data.
         | 
         | To know how the method performs on novel data, the authors have
         | to come up with entirely new datasets, since anything already
         | existing must be assumed probably contaminated.
        
           | makapuf wrote:
              | But doesn't the size in benchmarks include the size of the
              | binary decoder? That way the embedded trained data is
              | accounted for (preventing a plain copy of Wikipedia from
              | being included in the decoder).
        
             | loeg wrote:
             | I don't think the compressed size statistics on this
             | webpage include the size of the LLM needed to decode. Some
              | of these inputs are only a few hundred kB -- LLMs absolutely
             | dwarf that.
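              | 
              | Rough numbers, assuming fp16 weights (about 2 bytes per
              | parameter):
              | 
              |     params = 430_000_000          # rwkv_430M
              |     model_bytes = params * 2      # ~860 MB of weights
              |     input_bytes = 300 * 1024      # a ~300 kB test file
              |     print(model_bytes // input_bytes)  # ~2800x the input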
        
       | londons_explore wrote:
        | There are plenty of use cases for very slow and very good text
       | compression.
       | 
       | Imagine you run a big email server for millions of users.
       | 
       | Any email that hasn't been accessed for a month is probably a
       | good candidate for compressing with this method.
       | 
       | In the unlikely event the user wishes to read an old email again,
       | I'm sure they won't mind waiting 0.1 seconds for 10 kilobytes of
        | email (i.e. a screenful) to decompress.
        
         | otabdeveloper4 wrote:
         | Old email needs to be indexed for search, and those indexes
         | might be as large as the original text.
        
         | CharlesW wrote:
          | > _Any email that hasn't been accessed for a month is probably
         | a good candidate for compressing with this method._
         | 
         | You don't think "the same exact GPU model and program versions
         | must be used for compression and decompression" rules it out as
         | more than an interesting experiment, especially given that
         | storage is cheap and continues to get cheaper?
        
           | londons_explore wrote:
           | I think they mean the exact same model data (for the GPU). I
           | don't think it requires the exact same model of GPU.
        
             | CharlesW wrote:
             | I'm also reading it as "the same exact GPU [LLM] and
             | program versions must be used for compression and
             | decompression".
        
         | eklitzke wrote:
         | As soon as someone uses a POP or IMAP client to fetch all of
         | their mail your server is going to catch on fire. Even if you
         | had POP/IMAP disabled, the computation costs to decode data are
         | enormous with this method, which is precisely why people don't
         | use the most maximally compressing codecs for anything other
         | than true archival applications (e.g. database backups which,
         | unlike old email, really are expected to be accessed almost
         | never).
        
           | londons_explore wrote:
           | Sounds like a good reason to just ship the compressed data to
           | the customer and say "decompress it yourself - this bellard
           | guy makes a decompressor!"
        
             | etiam wrote:
              | _That_ sounds like a recipe for really poor relations with
             | the sizeable fraction of customers who are more concerned
             | with getting their data back than delighted to be
             | introduced to a cutting-edge text compression method...
        
           | londons_explore wrote:
           | A full IMAP sync of a 20 gigabyte inbox from gmail takes ~12
           | hours.
           | 
            | That's 400 kilobytes per second. And much of that is image
           | attachments and stuff - perhaps only 10 kilobytes/second of
           | text.
           | 
           | Maybe Google already does this?
        
       | ttoinou wrote:
        | Aren't the LLMs supposed to have "seen" those text corpora
        | before? I would expect that a very small prompt could generate
        | the whole requested text.
        
       ___________________________________________________________________
       (page generated 2023-08-16 23:00 UTC)