http://pepijndevos.nl/2023/07/15/chatlmza.html
Wishful Coding
Didn't you ever wish your
computer understood you?
ChatLZMA
I came across a random tweet, found a Wikipedia page, and bumped into
some smart people. Long story short: apparently compression is
equivalent to general intelligence.
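To make that claim a bit more concrete, here is a minimal sketch of a compressor acting as a crude predictor: a continuation that fits the text seen so far should add fewer compressed bytes than one that does not. This is a more conventional trick than the one this post goes on to try, and everything in it (the function names, the toy context, the candidates, the preset) is made up for illustration.

```python
import lzma

# Raw LZMA2 stream, so no container padding blurs the byte counts.
# The preset is an arbitrary choice for this sketch.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def compressed_size(data: bytes) -> int:
    """Number of bytes the raw LZMA2 stream for `data` takes."""
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS))


def extra_bytes(context: bytes, continuation: bytes) -> int:
    """How many more compressed bytes the continuation costs after the context."""
    return compressed_size(context + continuation) - compressed_size(context)


def rank_continuations(context: bytes, candidates: list) -> list:
    """Cheapest continuation first, i.e. the one the compressor 'expects' most."""
    return sorted(candidates, key=lambda c: extra_bytes(context, c))


# Toy data (hypothetical): a repetitive context and two equally long candidates.
context = b"the cat sat on the mat. the dog sat on the log. " * 20
candidates = [b"zqxv jkwp bfgh ymnd", b"the cat sat on the "]

ranked = rank_continuations(context, candidates)
for cand in ranked:
    print(extra_bytes(context, cand), cand)
```

The familiar phrase should come out cheaper than the gibberish, which is the whole "compression is prediction" idea in miniature.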
So, how do you build ChatGPT with data compression? What if you
compress a large corpus of text to build up the encoding table (for
LZMA, the dictionary and probability model), then compress your prompt,
append some random data, decompress the random data, and hope it
decompresses to something sensible?
It feels vaguely similar to diffusion, but what do I know. Look, this
is just a dumb idea, let's just see what happens ok? Well, here is my
progress so far. It's kind of whack but it's hilarious to me that it
produces something resembling words.
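import nltk
import lzma
import random

# Assumes the NLTK corpora are already available locally, e.g. after
# nltk.download("reuters"), nltk.download("brown"), nltk.download("gutenberg").

# Raw LZMA2 stream (no container format), highest preset.
my_filters = [
    {"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME},
]

# "Training": push three corpora through one compressor.
lzc = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=my_filters)
corp = nltk.corpus.reuters.raw().encode()
out1 = lzc.compress(corp)
corp = ' '.join(nltk.corpus.brown.words()).encode()
out2 = lzc.compress(corp)
corp = nltk.corpus.gutenberg.raw().encode()
out3 = lzc.compress(corp)
out_end = lzc.flush()

# Replay the compressed stream so the decompressor has seen the whole corpus.
lzd = lzma.LZMADecompressor(lzma.FORMAT_RAW, filters=my_filters)
lzd.decompress(out1)
lzd.decompress(out2)
lzd.decompress(out3)
# mess around to avoid LZMAError: Corrupt input data
# (feed everything except the last 344 bytes of the stream)
lzd.decompress(out_end[:-344])

# insert prompt????
# "Generation": decode 50 random bytes (random.randbytes needs Python 3.9+).
print(lzd.decompress(random.randbytes(50)).decode(errors="ignore"))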
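As for the "insert prompt????" comment, one guess at what that could look like is to append the prompt to the corpus before compressing, so it is the last thing the decoder sees before the random bytes. The sketch below tries that on a tiny in-memory corpus with only the standard library. The corpus, the prompt, the function name `prime_and_babble`, and the `cut` and `n_random` values are all hypothetical stand-ins, not anything from the script above. Because the tail of the stream is cut off, the end of the prompt may not have been decoded yet when the random bytes arrive, so treat this as an experiment rather than working prompting.

```python
import lzma
import random

# Hypothetical stand-ins for the real corpora and prompt.
TINY_CORPUS = (
    b"The quick brown fox jumps over the lazy dog. "
    b"A stitch in time saves nine. "
    b"All that glitters is not gold. "
    b"The early bird catches the worm. "
) * 3
PROMPT = b"The quick brown "

# Smaller preset than the post uses, just to keep the sketch fast.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def prime_and_babble(corpus, prompt, cut=16, n_random=8, seed=0):
    """Return (bytes decoded before the noise, text decoded from the noise)."""
    lzc = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=FILTERS)
    stream = lzc.compress(corpus + prompt) + lzc.flush()

    # Drop the end of the stream so the decoder is left waiting for input.
    cut = min(cut, len(stream) // 2)
    head = stream[:len(stream) - cut]

    lzd = lzma.LZMADecompressor(lzma.FORMAT_RAW, filters=FILTERS)
    primed = lzd.decompress(head)

    noise = random.Random(seed).randbytes(n_random)
    try:
        tail = lzd.decompress(noise)
    except (lzma.LZMAError, EOFError):
        # Random bytes are often rejected as corrupt; give up quietly.
        tail = b""
    return primed, tail.decode(errors="ignore")


primed, continuation = prime_and_babble(TINY_CORPUS, PROMPT)
print(repr(primed[-40:]), "->", repr(continuation))
```

On a corpus this small the random bytes will often be rejected outright, so an empty continuation is a normal outcome here.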
Here are a few runs. Note how the output always starts with ", and tri",
usually completing it into some word. Are we doing some primitive accidental
"prompting" or just flushing the buffer? Either way, not bad for mere
seconds of "training"!
$ python train.py
, and triof billioerse,
But
ht and see th,
Thy smile, in to be happy,
Wmson,
Over tout as aThy smile;t as aThyrged in
ent, foldehe snoion since how long,
my roomr? Is ic books
$ python train.py
, and triompact, sca,
Take deepcky fouy vitaliz bodiehow there i,
Nor drummiwisibly wile of the-ations, dutway?
Yet ld woman'okesmanall whoy slow bekesmanalle me
$ python train.py
, and tri billions of the boftier, faie no acqutory's dazzd haOr that thpages:
(Sometimeseathe ihern, Sounte, fld Turkey n one,
Worlseathe Border Minstrelsy,ine, New-Ene Queen.
'Thelicate l
$ python train.py
, and tri, sleepinlke babes bent,
Abird;
Forhis fair n!
By thea mystic strangehe gifts ofhe body aering
t: I haue a lugs whipageantuperb-fnz
$ python train.py
, and triions of b--n the gra, with
e open the countless buAnd bid theng;
Billi toward you.i ally undya Songs
To f Death, istas of was mar to be UY 9,30
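One way to poke at the "prompting or flushing the buffer" question is to prime a fresh decompressor with the same truncated stream several times, feed each one different random bytes, and look at what the outputs have in common. The sketch below does that on a tiny stand-in corpus. The helper names (`decode_tail`, `common_prefix`, `shared_start`) and all parameters are hypothetical. To try it on the real thing, `head` would presumably be `out1 + out2 + out3 + out_end[:-344]` and `FILTERS` would be `my_filters` from the script.

```python
import lzma
import random

FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]  # assumption for this sketch


def decode_tail(head, noise):
    """Prime a fresh decompressor with `head`, then decode `noise`.

    Returns None when liblzma rejects the noise as corrupt input.
    """
    lzd = lzma.LZMADecompressor(lzma.FORMAT_RAW, filters=FILTERS)
    lzd.decompress(head)
    try:
        return lzd.decompress(noise)
    except (lzma.LZMAError, EOFError):
        return None


def common_prefix(chunks):
    """Longest byte string that every element of `chunks` starts with."""
    if not chunks:
        return b""
    prefix = chunks[0]
    for chunk in chunks[1:]:
        n = 0
        for a, b in zip(prefix, chunk):
            if a != b:
                break
            n += 1
        prefix = prefix[:n]
    return prefix


def shared_start(head, runs=20, n_random=8):
    """Decode `runs` different random tails and return (shared prefix, outputs)."""
    outputs = []
    for seed in range(runs):
        noise = random.Random(seed).randbytes(n_random)
        out = decode_tail(head, noise)
        if out is not None:
            outputs.append(out)
    return common_prefix(outputs), outputs


# Tiny stand-in corpus (hypothetical), compressed and truncated like in the post.
text = (
    b"The quick brown fox jumps over the lazy dog. "
    b"A stitch in time saves nine. "
    b"All that glitters is not gold. "
    b"The early bird catches the worm. "
) * 3
lzc = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=FILTERS)
stream = lzc.compress(text) + lzc.flush()
head = stream[:-16]

prefix, outputs = shared_start(head)
print(len(outputs), "runs decoded; shared start:", prefix)
```

If the shared start stays long even though the random bytes differ, that text was probably already pinned down by the bytes fed before the cut, which would point toward buffer flushing rather than anything like prompting.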
Pepijn de Vos
15 July 2023
python machinelearning