[HN Gopher] 78% MNIST accuracy using GZIP in under 10 lines of code
___________________________________________________________________
78% MNIST accuracy using GZIP in under 10 lines of code
Author : js98
Score : 268 points
Date : 2023-09-20 13:00 UTC (9 hours ago)
(HTM) web link (jakobs.dev)
(TXT) w3m dump (jakobs.dev)
| tysam_and wrote:
| If you'd like to play around with MNIST yourself, I wrote a
| PyTorch training implementation that gets ~99.45%+ val accuracy
| in <13.6 seconds on a V100, est. < 6.5 seconds on an A100. Made
| to be edited/run in Colab: https://github.com/tysam-code/hlb-
| CIFAR10
|
| It's originally kitted for CIFAR10, but I've found the parameters
| to be quite general. The code is very easy to read and well-
| commented, and is a great starting place for exploration.
|
 | Min-cut deltas to run MNIST:
 |     .datasets.CIFAR10('  ->  .datasets.MNIST('  (both occurrences)
 |     'whiten': Conv(3,    ->  'whiten': Conv(1,
 |     crop_size = 28
|
| Compute for the project funded by Carter Brown and Daniel Gross,
| my appreciation to them both for helping make this possible.
| Their support via encouragement has been very helpful as well. <3
| :)
| hinkley wrote:
| [flagged]
| morelisp wrote:
| > Such as OCR.
|
| Such as what?
| hinkley wrote:
| If you can't hear my eyes rolling from here I'll do it again,
| harder... how bout now?
|
| Optical Character Recognition. Very old, and very infamous
| attempts at replacing humans with software. Secretaries and
| executives know what OCR means.
| tasty_freeze wrote:
| I believe you missed the point, though I could be wrong.
|
 | Person A complains about MNIST not being defined. Person B
 | was sending them up for complaining about not defining the
 | meaning of acronyms and then using one they don't define.
| [deleted]
| hinkley wrote:
| No I got it. I just don't want it.
| vren wrote:
| Optical character recognition... Has been pretty widespread
| tech since ~2016 or so.
| hinkley wrote:
| Oh my ex and some acquaintances were bitching about it in
| 1999.
|
| I recall a guy with good transcription skills starting a
| long thread on... must have been slashdot? About how it was
| faster for him to transcribe the numbers again than to
| verify the OCR was correct, but he couldn't tell his boss
| that and what should he do?
|
| There's also a famous case where the compression algorithm
| in a copy/fax machine had a bug in a predefined dictionary
| that resulted in changing digits to other digits when
| making copies or faxes. Because they weren't just
| compressing, they were trying to use a specialized OCR to
| do aggressive compression. I believe it was spitting out
| perfectly formed zeroes where another digit was in the
| original. Yikes.
| ElectricalUnion wrote:
| > I believe it was spitting out perfectly formed zeroes
| where another digit was in the original. Yikes.
|
| JBIG2 lossy compression. Covered in another hn story:
| https://news.ycombinator.com/item?id=29223815
|
| From the story and the comments:
|
| > This is not an OCR problem (as we switched off OCR on
| purpose), it is a lot worse - patches of the pixel data
| are randomly replaced in a very subtle and dangerous way:
| The scanned images look correct at first glance, even
| though numbers may actually be incorrect.
|
| It's lossy compression turned to 11 that looks
| convincingly non-lossy.
| hinkley wrote:
 | I know technically you are right, but I feel like if you
 | sat down and described the high-level features and goals
 | of that image compression (to compress text to a
 | ridiculous degree) and OCR, you'd be hard pressed to say
 | which list is which unless they gave it away with jargon
 | words. My brain compressed it to "OCR" and filed it
 | there.
|
| It's like Jim Gaffigan's joke about working in a TexMex
| restaurant in the northern Midwest. It's a tortilla,
| cheese, meat and beans... I tell you what I'll just bring
| you something.
| ipython wrote:
| you're referring to the Xerox Workcenter:
| http://www.dkriesel.com/en/blog/2013/0802_xerox-
| workcentres_...
| hinkley wrote:
| YES. God what a nightmare. Can you imagine all of the
| arguments and threats of lawsuits that bug caused for
| small and medium businesses? You changed the contract
| after we had a verbal agreement and tricked me into
| signing!
|
| I wonder if any executive assistants got fired over that.
| Surely.
| telotortium wrote:
| MNIST (and OCR) are acronyms that are so well-known to anyone
| who has taken any kind of intro to ML class that there's no
| more need to define them in a short blog post than there would
 be for us to define HTML. _I_ learned about MNIST in 2008,
 when I was just taking a Matlab class on numerical methods
 as a physics major, and MNIST was just one project.
| hinkley wrote:
| This isn't a forum about machine learning, though. It's a
| general forum of geek news. I could talk to you all day about
| compression. Gzip plus MNIST rings no bells.
| telotortium wrote:
| Fortunately MNIST is a pretty distinctive Google search
| term. At most you would need to search for (MNIST machine
| learning).
| hinkley wrote:
| Are you running google searches on
| https://news.ycombinator.com/news before clicking on
| stuff? Or expecting an executive summary. I think you can
| guess which one I'm doing. And which one I believe is
| normal human behavior.
| itishappy wrote:
| No need to guess, you requested a summary:
|
| > you've missed the most important hyperlink: what the
| fuck is a MNIST?
| hinkley wrote:
| Two questions and you answered the more rhetorical one.
| Is that a no?
|
| Anyway my main beef isn't with this GitHub repo or the
| author, it's that nobody remembers how to write a
| goddamned overview anymore. The point of tree structured
| data is making it cheap to backtrack when doing a semi
| random search. The overview is an important part of
| making hypertext work, particularly when it feeds through
| catch-all lists like a news site or a landing page for a
| wiki.
|
| When a hyperlink is organically embedded into a
| paragraph, you can usually guess what it's for, from the
| sentence and the context. When it's just a title that is
| lost.
| telotortium wrote:
| No, but if the article headline interested me enough to
| click into the article, I would Google MNIST if I didn't
| know it already (ML is full of acronyms for datasets,
| methods, etc. - can't expect blog posts to define all of
| them. If it were a paper that didn't define the term, I
| would be annoyed.)
|
| Also, just because an acronym isn't used doesn't mean
| you'll understand the jargon. For example, also on the
| front page is a paper titled "Neurons in Large Language
| Models: Dead, N-Gram, Positional". No acronyms, but I
 | would certainly need to Google (or actually read the
 | paper, not just the abstract) to know what the 3 terms at
 | the end mean (well, I do know what a dead neuron is, but
 | not the other two).
| jjtheblunt wrote:
 | i think the idea is that compression exploits recognized
 | similarities, which raises the question of whether a
 | compressor is actually recognizing things usefully.
| itishappy wrote:
| The fuck is a GZIP? ;)
| hinkley wrote:
| Goddamnit where is the spray bottle?
|
| Psst psst psst. NO. Bad kitty.
| RecycledEle wrote:
| [flagged]
| w23j wrote:
| > I suggest all HN articles must define all abbreviations and
| acronyms or be removed.
|
| I can't decide if you are joking?
|
| HR = Human Resources
|
| PC = Personal Computer
|
| HN = Hacker News
|
| OP = Original Poster (?)
|
| M.S. = Master of Science
|
| certs = certificates
| jp57 wrote:
| Leaving aside whether this problem is a good application of this
| compression trick, I want to say that everyone experimenting with
| this should stop using `gzip` and start using `zlib`.
|
| If you change the first line from `gzip.compress` to
| `zlib.compress` you should get the same classifier performance
| with a 3x speedup.
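 |
 | A minimal sketch of the swap (the `compute_ncd` name comes from
 | the post's notebook; the rest is my assumption about the setup):
 |     import zlib
 |
 |     def C(data: bytes) -> int:
 |         # zlib.compress runs the same DEFLATE algorithm as gzip but
 |         # skips the gzip header/trailer, so it's faster while the
 |         # length comparisons come out the same.
 |         return len(zlib.compress(data))
 |
 |     def compute_ncd(x: bytes, y: bytes) -> float:
 |         # Normalized Compression Distance
 |         cx, cy, cxy = C(x), C(y), C(x + y)
 |         return (cxy - min(cx, cy)) / max(cx, cy)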
| montebicyclelo wrote:
 | I tried replacing the distance function in the code with some
 | simpler distance measures:
 |     Gzip distance:         ~3 minutes,   78% accuracy
 |     Euclidean distance:    ~0.5 seconds, 93% accuracy
 |     Jaccard distance *:    ~0.7 seconds, 94% accuracy
 |     Dice dissimilarity *:  ~0.8 seconds, 94% accuracy
 |
 |     * after binarising the images
|
| So, as a distance measure for classifying MNIST digits, GZIP has
| lower accuracy, and is much more computationally demanding than
| other measures.
|
 | I'm not that familiar with how the GZIP algorithm works;
 | it's kind of interesting that its accuracy is so much
 | lower. I wonder whether image-focused compression
 | algorithms might do better?
|
 | (Edit: Btw, I enjoyed the post; it's a creative idea, the writing
 | and code were great, and it's sparked some good discussion. But
 | after having a closer look, I think the baselines above provide
 | some context for the gzip scores.)
| w-m wrote:
 | Best one I found is Normalized Mutual Information at 95%. It's
 | a little more complex, but you could still compute it
 | relatively quickly on binarized images.
 |     NMI skimage: ~30 seconds,  95% accuracy
 |     NMI numba *: ~0.6 seconds, 95% accuracy
|
 | Here's the code, courtesy of ChatGPT:
 |     import numpy as np
 |     from numba import njit
 |
 |     @njit(cache=True)
 |     def compute_joint(img1, img2):
 |         # Joint 2x2 histogram of two binarized (0/1) images
 |         stacked = 2 * img1.ravel() + img2.ravel()
 |         return np.bincount(stacked, minlength=4).reshape(2, 2)
 |
 |     @njit(cache=True)
 |     def entropy(counts, total):
 |         # Convert counts to probabilities
 |         p = counts / total
 |         # Only calculate for non-zero entries to avoid log(0)
 |         return -np.sum(p[p > 0] * np.log2(p[p > 0]))
 |
 |     @njit(cache=True)
 |     def normalized_mutual_information(img1, img2):
 |         joint_counts = compute_joint(img1, img2)
 |         total_pixels = 28 * 28
 |         # Marginal counts
 |         c1 = np.sum(joint_counts, axis=1)
 |         c2 = np.sum(joint_counts, axis=0)
 |         # Compute entropies directly from counts
 |         h1 = entropy(c1, total_pixels)
 |         h2 = entropy(c2, total_pixels)
 |         joint_entropy = entropy(joint_counts.flatten(), total_pixels)
 |         mutual_info = h1 + h2 - joint_entropy
 |         return 2 * mutual_info / (h1 + h2)
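 |
 | Hypothetical usage (my assumption: 28x28 images binarized to
 | 0/1 int arrays, which the 2*img1+img2 trick in compute_joint
 | requires):
 |     a = (np.random.rand(28, 28) > 0.5).astype(np.int64)
 |     b = (np.random.rand(28, 28) > 0.5).astype(np.int64)
 |     # ~0 for independent noise, 1.0 for identical images
 |     print(normalized_mutual_information(a, b))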
| tysam_and wrote:
| Holy cow, I knew that MNIST was simple, but not that simple.
|
| Could you post a snippet of the code that you used to achieve
| this? It would be really, really nice to have a baseline to
| work from I'm sure, and I feel like this could be really useful
| to a few other areas (my personal obsession is speed-training
| on CIFAR10 :')))) )
|
| Holy cow, that's insane. :O
| montebicyclelo wrote:
 | I used the notebook linked in the original post [1]. It
 | evaluates using 100 samples from the test set (I'm guessing
 | because the gzip method is slow - it would take ~7 hours on
 | the full test set, on my machine).
|
 | I plugged in the distance measures, for the `compute_ncd`
 | function. (Jaccard/Dice have been negated and the -1
 | removed.)
 |     def euclidean(x1, x2):
 |         return np.sum(np.square(x1 - x2))
 |
 |     def jaccard(x1, x2):
 |         x1_binary = x1 > 0.5
 |         x2_binary = x2 > 0.5
 |         return (np.logical_or(x1_binary, x2_binary).sum()
 |                 / np.logical_and(x1_binary, x2_binary).sum())
 |
 |     def dice(x1, x2):
 |         x1_binary = x1 > 0.5
 |         x2_binary = x2 > 0.5
 |         return (2 * np.logical_or(x1_binary, x2_binary).sum()
 |                 / (x1_binary.sum() + x2_binary.sum()))
|
| [1] https://github.com/Jakob-98/mono/blob/73168bc0ea904e75865
| 815...
| jdthedisciple wrote:
| That's pretty amazing.
|
| So the whole gzip thing - while fancy - really is extra steps
| for less bang.
| yannccc2 wrote:
| All these ideas date back to at least 2010, probably earlier. It
| was called "information distance".
| https://arxiv.org/abs/1006.3520
| petters wrote:
| I wonder why you started with a code-golfed version? That seems
| original to the main point of the post
| seeknotfind wrote:
| original -> orthogonal?
|
| I like short code! It's like a joke "this is so easy, 10 loc"
| :)
| hinkley wrote:
| The internet has turned into a game of "is it him or is it
| autocorrect?"
|
| I can't count how many times I've looked up at a line of text
| to spot check it, type in a couple more letters, hit save
| without looking, and find later that I sound like I'm having
| a stroke. Never mind the times I've simply forgotten to look.
| Yes that's a word, stop trying to replace it.
|
| (Five minutes ago I had to type "laissez-faire" three times.
| Yesterday it turned "understanding" into god knows what. And
| if iOS doesn't stop adding apostrophes to "its" I'm gonna
| fuckin lose my shit)
| burnished wrote:
| I ended up turning auto correct off due to iOS being more
| aggressive with its corrections than my previous android
| phones (both of which frequently fail to recognize common
| words, and maybe both but certainly the android was
| absurdly brand aware).
| hinkley wrote:
| > brand aware
|
| I got so tired of Mathematica italicizing its own name
| when I was in college so I figured out how to type it in
| two phases so it wouldn't. It was my tiny little petty
| victory over Stephen Wolfram ruining math for me.
|
| I know that struggle.
| great_psy wrote:
 | Having an artificial intelligence algorithm that is completely
 | understood and tunable, like gzip is, would be great for many
 | uses.
 |
 | I think it's pretty hard to just improve a NN without more data
 | or some non-trivial amount of effort.
| Lerc wrote:
| I'm sure there was something like this mentioned in "Managing
| Gigabytes" or thereabouts. It might have been using bitwise
| arithmetic compression which makes sense for the problem at hand.
| tdr2d wrote:
 | The code may be elegant and compact, but 78% accuracy is
 | considered very, very bad for MNIST.
 |
 | A dummy model written with Tensorflow easily reaches 90%
 | accuracy. The best models rank at 99.87%; see the benchmark:
 | https://paperswithcode.com/sota/image-classification-on-mnis...
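 |
 | For context, a rough sketch of such a "dummy" model (my own,
 | assuming tensorflow is installed; a single softmax layer is
 | essentially multinomial logistic regression):
 |     import tensorflow as tf
 |
 |     (x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
 |     model = tf.keras.Sequential([
 |         tf.keras.layers.Flatten(input_shape=(28, 28)),
 |         tf.keras.layers.Dense(10, activation="softmax"),
 |     ])
 |     model.compile(optimizer="adam",
 |                   loss="sparse_categorical_crossentropy",
 |                   metrics=["accuracy"])
 |     model.fit(x_tr / 255.0, y_tr, epochs=5, verbose=0)
 |     print(model.evaluate(x_te / 255.0, y_te, verbose=0)[1])  # ~0.92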
| m00x wrote:
| It's not trying to break records, it just shows a neat aspect
| of compression. It's still 8 times better than baseline, which
| showcases that compression can learn representation.
| esafak wrote:
| The article emphasizes the wrong thing, in my view. The
| interesting part is that compression -- without learning a
| model -- can be used for classification. This raises the
| question of what other information-theoretic measures can be
| used; cheaper, lossy ones.
|
| _To Compress or Not to Compress- Self-Supervised Learning and
| Information Theory: A Review_
 https://arxiv.org/abs/2304.09355
| Quenty wrote:
| I mean, they're using knn underneath. You can apparently get
| 97% accuracy with normal knn, at n=4 if you compare pixel
| distance.
|
| So another way to frame this might be that gzip costs a lot
| of accuracy but may lead to better performance.
|
| https://newpblog.netlify.app/2018-01-24-knn-analysis-on-
| mnis...
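 |
 | A rough sketch of that pixel-distance kNN baseline (my own,
 | scikit-learn assumed, not from the linked post):
 |     from sklearn.datasets import fetch_openml
 |     from sklearn.neighbors import KNeighborsClassifier
 |
 |     # 70,000 flattened 28x28 images; the first 60k are the train split
 |     X, y = fetch_openml("mnist_784", version=1,
 |                         return_X_y=True, as_frame=False)
 |     knn = KNeighborsClassifier(n_neighbors=4)  # n=4 as above
 |     knn.fit(X[:60000], y[:60000])
 |     print(knn.score(X[60000:], y[60000:]))     # reportedly ~0.97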
| a_wild_dandan wrote:
| An increasingly common refrain in machine learning is
| "intelligence is compression." Folks who believe that might
| bristle at the distinction between learning and compression.
| esafak wrote:
| I doubt it because learned features already constitute
| lossy compression. The question is what kind of
| compression; lossy vs. lossless, learned vs. unlearned.
| sundarurfriend wrote:
| The point is not to have "elegant and compact" code, this is
| meant to be a fun curiosity, and doing it in 10 lines is just
| an additional layer of challenge for the heck of it.
|
| The interesting thing is not in whether GZip can achieve SOTA,
| it's that it can do a decent job at all. (The interesting thing
| is not in whether the bear can recreate Mozart exactly, it's
| that it can play the piano at all.)
| 3abiton wrote:
| Didn't it turn out that authors of that paper have made mistakes
| that catapulted their results to the top of the benchmark charts?
| I thought the theory was inconsistent after that incident. 78%
| accuracy from just GZIP is impressive.
| 0xfffafaCrash wrote:
| haven't looked through this post yet but I think this is what
| you have in mind: https://kenschutte.com/gzip-knn-paper/
| 0xDEF wrote:
| Has anyone tried how different compression algorithms compare
| when doing NCD classification?
|
| gzip first does LZ77 and then Huffman coding. ANS is a more
| modern alternative to Huffman coding that can achieve higher
| compression ratios than Huffman coding.
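 |
 | A minimal sketch of such a comparison (my own; bz2 and lzma
 | stand in for "other algorithms" since Python ships them, while
 | an ANS-based codec like zstd needs a third-party package):
 |     import bz2, lzma, zlib
 |
 |     def ncd(x: bytes, y: bytes, C) -> float:
 |         # Normalized Compression Distance with compressed-size function C
 |         cx, cy, cxy = C(x), C(y), C(x + y)
 |         return (cxy - min(cx, cy)) / max(cx, cy)
 |
 |     compressors = {
 |         "zlib (LZ77 + Huffman)":      lambda b: len(zlib.compress(b)),
 |         "bz2 (BWT-based)":            lambda b: len(bz2.compress(b)),
 |         "lzma (LZ77 + range coding)": lambda b: len(lzma.compress(b)),
 |     }
 |     for name, C in compressors.items():
 |         print(name, ncd(b"7" * 100, b"7" * 90 + b"3" * 10, C))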
| tjungblut wrote:
 | I don't immediately find it, but a couple of years back there was
 | a "meta-feature" which was the size of the MNIST image. I think
 | that scored about 90-ish% accuracy on its own - without
 | even looking at the image.
| 13of40 wrote:
| A few years back I worked on a project that involved
| fingerprinting screenshots of web pages, and compressed image
| size was pretty much as good as any fingerprinting method we
| could come up with for comparing the similarity between them.
| tysam_and wrote:
| Makes sense, considering that's not too terribly far off from
| what the KL divergence does
| momojo wrote:
| The off-the-beaten-path nature of this reminds me of banks
| sending $<arbitrary decimal amount> as a PIN for auth.
| p1esk wrote:
 | What do you mean by "size"? Gzipped size? If you simply look at
 | how dark a Mnist image is (count the percentage of dark pixels)
 | you'll get about 20% accuracy, which is twice as good as a random
 | guess but a long way from 90-ish%.
| yvdriess wrote:
 | What do you mean by accuracy here? Usually 50% accuracy
 | means a coin toss, meaning 20% accuracy is equal to 80%
 | accuracy, which is better than the article's 78% and not that
 | far from 90%.
| Frotag wrote:
| > meaning 20% accuracy is equal to 80% accuracy,
|
| Only if your model is outputting a yes/no answer right? And
| that your definition of accuracy is "class with highest
| probability" (and not "N classes with highest prob")
|
| If your dataset has more than 2 classes like MNIST, a super
| low accuracy only tells you to ignore the class the model
| guesses. It doesn't tell you which of the remaining classes
| is correct
| rjh29 wrote:
| There are ten choices, so getting the answer right 20% of
| the time is very plausible.
| p1esk wrote:
 | There are 10 classes in Mnist. A random guess would be 10%.
| tysam_and wrote:
| "One simple trick to beat the statistical odds...."
| tobeyey wrote:
 | Replacing the NCD
 |     distances = [(compute_ncd(x1, x), label)
 |                  for x, _, label in compressed_lengths]
 |
 | with the Euclidean distance
 |     distances = [(np.sqrt(np.sum(np.square(x1 - x))), label)
 |                  for x, _, label in compressed_lengths]
|
| gives you +15% test accuracy and saves you a lot of compute.
| benreesman wrote:
 | My favorite book about the deep connections between information
 | theory, compression, and learning algorithms is MacKay's
 | "Information Theory, Inference, and Learning Algorithms" (most
 | probably know about it, but I didn't for a long time, so maybe
 | some will benefit from the mention).
 |
 | I gather this is common knowledge (if not sufficiently emphasized
 | at times) among those with serious educations, but as a self-
 | taught, practical application-type ML person, this profound
 | thread running through all these topics (and seemingly into
 | heavy-ass particle physics and cosmology and stuff like that) was
 | a blinding "Aha!" moment, so I'll venture this comment in the
 | hope that even one other person has that unforgettable moment.
| GaggiX wrote:
| For a comparison with others techniques:
|
| Linear SVC (best performance): 92 %
|
| SVC rbf (best performance): 96.4 %
|
| SVC poly (best performance): 94.5 %
|
| Logistic regression (prev assignment): 89 %
|
| Naive Bayes (prev assignment): 81 %
|
| From this blog page: https://dmkothari.github.io/Machine-
| Learning-Projects/SVM_wi...
|
| Also it seems from reading online articles that people are able
| to obtain much better results just by using K-NN, so I imagine
| that the author just made his job harder by using gzip but I
| could be wrong about this.
| grandma_tea wrote:
| While it's cool that this works at all, I wish we would stop
| using MNIST as a benchmark given how trivial it is.
| unoti wrote:
| MNIST is pretty good as a proof of concept. While I agree
 it's trivial, I see MNIST as being more like how, when you're
 making a toy programming language, the first thing you do
 with it is write a recursive Fibonacci function.
| dmazzoni wrote:
| It's still a great data set because it's real-world, useful,
| high-quality, and an excellent size (~60k examples), and the
| "correct" classification isn't very subjective at all -
| humans can easily get 99% accuracy.
| IKantRead wrote:
 | It's a good benchmark _because_ it's so trivial.
|
| Sure it's not great at differentiating between SotA
| techniques, but it's very useful for sanity checks like this
| one.
|
| Even for SotA models, it's still useful to verify that you
| can get greater than 98% accuracy on MNIST, before exploring
| larger, more complex bench marks.
|
 | It certainly shouldn't be the _only_ benchmark but it's a
| great place to start iterating on ideas.
| TimPC wrote:
| It's a bad benchmark because it's artificially clean. It's
| effectively a 2d dataset with no occlusions. So nearly
| everything you try on it will work, and many things you try
| on it won't scale to typical image problems. There are good
| 3d datasets with more realistic examples that are still
| fairly simplistic compared to the state of the art large
| datasets, but at least give you signal that your technique
| is robust to common problems in vision. MNIST is so
| simplistic that you encounter none of the typical problems
| in computer vision settings so it doesn't give you a good
| prediction of how good your technique is.
| ghshephard wrote:
| Can you imagine a model that performs very poorly on
| MNIST that would perform well in real-world computer
| visions problems? If you can't, then MNIST is a nice
| simple smoke test when assessing models.
| d3w4s9 wrote:
| What's wrong with "artificially clean"? The goal of
| benchmarks is to compare and know whether one model is
| better than the other. There is never a "perfect" or
| "objective" benchmark. Different benchmarks may highlight
| advantages in certain models, which is a good thing, but
| there is absolutely nothing wrong with using MNIST as a
| dataset to give you a basic idea of how models perform.
| antiquark wrote:
| Take a look at the EMNIST - Extended MNIST dataset. It has
| both digits and letters of the alphabet.
| dmazzoni wrote:
| That blog page is not showing state-of-the-art results, it's
| just taking relatively naive SVM implementations and comparing
| them.
|
| The original research paper that introduced the MNIST data set
| was getting around 98% accuracy, and today neural nets are
| getting 99.87% accuracy.
|
| https://paperswithcode.com/sota/image-classification-on-mnis...
| GaggiX wrote:
 | Yep, I was showing other simple techniques; of course a neural
 | network would be much better at it.
| TimPC wrote:
| The whole point isn't to do better. It's to show that enough
| information survives compression that you can still get very
| large signal. Compression is intended to make things harder and
| still does.
| amelius wrote:
| Why? I mean it's trivial to add a layer of encryption ...
|
| (Also the terminology seems off here, as 100% of information
| survives gzip.)
| GaggiX wrote:
 | The article OP linked is inspired by the Gzip+Knn paper that was
 | released a few months ago; compression is not intended to
 | make things harder but to extract useful features that can be
 | used by simple algorithms like Knn or SVM etc...
 |
 | For example it would be almost impossible to distinguish cat
 | and dog images by using SVM or Knn directly on the data but
 | it would be much easier if the images were first passed into
 | an image encoder and a small embedding vector were used
 | for each one instead. In the article OP linked it doesn't
 | seem that Gzip is very good at extracting useful patterns for
 | the MNIST dataset.
| IKantRead wrote:
| Thanks for posting this.
|
| Most people don't realize that Logistic regression can get ~90%
| accuracy on MNIST.
|
 | As a big fan of starting with simple models first and adding
 | complexity later, I've frequently been told that "logistic
 | regression won't work!" for problems where it can in fact
 | perform excellently.
|
| When faced with this resistance to logistic regression I'll
| often ask what they think the baseline performance of it would
| be on MNIST. The guesses I hear most often are 20-30%.
|
| People, even machine learning people, often don't realize the
| rapidly diminishing returns you get for adding _a lot_ of
 | complexity in your models. Sometimes it's worth it, but it's
| always good to start simple first.
|
| It's also been my experience that if you don't get good
| performance with a simple model, you are very unlikely to get
| great performance from a more complex one.
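 |
 | For reference, a minimal sketch of that baseline (my own,
 | scikit-learn assumed; the exact score varies with solver and
 | regularization):
 |     from sklearn.datasets import fetch_openml
 |     from sklearn.linear_model import LogisticRegression
 |
 |     X, y = fetch_openml("mnist_784", version=1,
 |                         return_X_y=True, as_frame=False)
 |     clf = LogisticRegression(max_iter=1000)
 |     clf.fit(X[:60000] / 255.0, y[:60000])       # scale pixels to [0, 1]
 |     print(clf.score(X[60000:] / 255.0, y[60000:]))  # typically ~0.92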
| short_sells_poo wrote:
| To a large degree this happens because logistic regression is
| not a sexy approach that one can add to their CV. Everyone
| wants to solve problems with big, complicated and buzzwordy
| models, because that sells (and is perhaps more interesting
| as well).
|
| It's really a tragedy, because so many classic models would
| work fine for real world applications.
| make3 wrote:
| this is true unless you can use large pretrained models,
| which are very simple to use, are very resistant to ask sorts
| of noise, & would get ultra high accuracy with just logistic
| regression on the output of the model
| mardifoufs wrote:
 | When would 90% accuracy on a dataset like MNIST ever be
 | useful? And I mean useful as in usable for actual products or
 | software, especially considering MNIST is more of a toy
 | dataset.
|
| I think that's why machine learning is the way to go for this
| type of detection, why go with anything else than a CNN (in
| this case) when it is now trivial to set up and train? Again,
| unless it's just to mess around with, 90% mnist accuracy is
| not useful in the real world
| vannevar wrote:
| I don't think the point was that you should use logistic
| regression on MNIST. In lesser-known problems, say a custom
| in-house model, if you don't try the simpler approach
| first, you'll never know that your more complex solution is
| not worth the extra expense, or is actually worse than a
| simpler, cheaper model. MNIST is well-known to have nearly
| perfect solutions at this point, but for most novel
| problems, the data scientist has no idea what is
| theoretically possible.
|
| Now, you can say that CNNs or other techniques are easily
| accessible these days, and almost trivial to set up. But
| they may not be trivial to train and run in terms of
| compute in the real world.
| KRAKRISMOTT wrote:
| If you stack logistic regression in layers, you get a neural
| network.
| nextos wrote:
 | Came here to say the same thing; actually, NCD can probably do
 | much better than 78%. Li & Vitanyi's book about Kolmogorov
| complexity has some interesting unsupervised examples.
|
| A simple CNN as implemented in Keras tutorial can easily
| exceed 98%. 78% is very poor performance for MNIST even if
| model complexity is penalized.
| vjerancrnjak wrote:
| Extending existing features with their non-linear mappings
| would improve logistic regression too, probably to the svc
| level (rbf or poly kernel is that but implicit).
|
| Linear models are really well researched, and today with the
| compute and with proper training and data preparation they
| can easily get to satisfying levels of performance for a
| variety of tasks.
| westurner wrote:
| So there is a more optimal compression algorithm for the
| relation between the MNIST inputs and outputs!
|
| Other models tend to add noise somewhere in there; what about
| feature engineering before the gzip? Maybe a gaussian blur and
| a convolution first and then deep learning for feature
| selection
| jszymborski wrote:
 | In fairness you can run MNIST through UMAP and get near-perfect
 | separation. I'm of the belief that you have to try pretty hard
 | not to do well on MNIST these days.
|
| https://github.com/lmcinnes/umap_paper_notebooks/blob/master...
|
| EDIT: I should add, unless it isn't clear, that we really should
| retire the dataset. Something like the QuickDraw dataset makes a
| lot more sense to me.
| sundarurfriend wrote:
| Y'all are making the (rude) person below complaining about
| acronyms look reasonable. The repository doesn't define UMAP
| either, but if you believe ChatGPT it is:
|
| > UMAP, which stands for Uniform Manifold Approximation and
| Projection, is a dimensionality reduction technique and data
| visualization method commonly used in machine learning and data
| analysis.
| tomrod wrote:
| UMAP is as ubiquitous as T-SNE these days, though to be fair
| those on HN that focus on JVM, LLVM, NPM, or TLS may not know
| about UMAP/T-SNE.
| jszymborski wrote:
| It's[0] a non-linear dimensionality reduction technique in
| the vein of t-SNE[1] that is very well known in the field.
|
| My two cents is that if your problem can be solved with
| something like UMAP + kNN[2], then you really shouldn't be
| using Deep Learning to solve it.
|
| [0] https://en.wikipedia.org/wiki/Nonlinear_dimensionality_re
| duc...
|
| [1] https://en.wikipedia.org/wiki/T-distributed_stochastic_ne
| igh...
|
| [2]
| https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
| wenc wrote:
| We work in our own ecological niches which sometimes entails
| having a specialized language that allows us to be precise
| when referring to things.
|
| I sometimes see HN comments complaining that acronyms and
| terms are dropped with no explanation. These comments come
| across as entitled to me -- instead of complaining, why not
| be curious and ask "what does that mean in this context?"
|
| Imagine someone dropping in here and complaining, "the
| article sucks because it uses the word 'compiler' and assumes
| we all know what a 'compiler' is." This is what it sounds
| like to those of us to work in specialized fields.
|
| Instead if someone said, "I'm new to this, could someone help
| me understand what a 'compiler' is?", it demonstrates
| curiosity and people are more inclined to explain.
| krick wrote:
| Yeah, but the thing is, gzip isn't "these days". It was a thing
 | long before UMAP and even MNIST itself. And the approach isn't
 | particularly novel either; this is a very simple idea, if you
 | understand compression. It could've been written the first day
 | MNIST was published, and it would still get 78% accuracy. This
 | is kinda amazing to me.
| js98 wrote:
| Author here: completely agree. I think it is not much of an
| achievement by itself, but it is interesting to see that it
| works!
|
| I will add a comment/edit to the post once I am home to clarify
| the relative ease of solving MNIST
| jszymborski wrote:
| Sorry if my comment came off as dismissive of the post! I was
| just trying to frame the results for those unfamiliar with
| the properties of MNIST.
|
| I think it is indeed interesting as it shows that Gzip can
| capture the same things that e.g. UMAP et co. are capturing
 | when they achieve good scores on MNIST.
|
| Also, I'll add, that even despite some of the suspicions
| people have cast on the Gzip results in other experiments,
| I'm bullish on the utility of reducing entropy for
| classification problems.
| ekidd wrote:
 | > _I should add, unless it isn't clear, that we really should
| retire the dataset._
|
| From a research perspective, MNIST is basically solved. I think
| current performance on MNIST is better than humans, correct?
|
| But it's still valuable as a teaching tool or a "Hello world"
| data set, because it's so simple and because most reasonable
| algorithms will reach 97% accuracy. Even if you build your
| tools from scratch, it makes a nice homework-sized problem. And
| it's an obviously useful task that anyone can understand. "Oh,
| digit recognition for mail!"
| jszymborski wrote:
| I was thinking about this just now... the real utility of
| MNIST is that, since it's basically impossible to get a bad
| result, it's a good sanity check.
| wayeq wrote:
| I'm merely a hobbyist in this domain but isn't highly compressed
| data (like encrypted data) also high entropy? If this is finding
| patterns in the compressed data to figure out which digit the
| uncompressed data represents, shouldn't those patterns be
| exploitable for better compression?
| fritzo wrote:
| The demonstration isn't classifying based on compressed data,
| rather it's classifying on the _compressibility_ of data. The
| idea is that "7 7" should be more compressible than "7 3", and
| similarly raster images of "7 7" should be more compressible
| than raster images of "7 3".
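 |
 | A toy illustration (stand-in arrays of mine, not MNIST itself):
 |     import gzip
 |     import numpy as np
 |
 |     seven = np.zeros((28, 28), dtype=np.uint8); seven[5, 5:20] = 255
 |     three = np.zeros((28, 28), dtype=np.uint8); three[5:20, 5] = 255
 |
 |     same = len(gzip.compress(seven.tobytes() + seven.tobytes()))
 |     diff = len(gzip.compress(seven.tobytes() + three.tobytes()))
 |     print(same < diff)  # True: repeated content compresses better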
| wayeq wrote:
| Ah ok, that makes sense. Thank you for the explanation.
| Zamicol wrote:
| Encrypted data ideally is incompressible. Incompressibility is
| a hallmark of efficient cryptographic operations.
|
| See the Wikipedia article on Kolmogorov complexity, which has a
| short section about compression.
| https://en.wikipedia.org/wiki/Kolmogorov_complexity#Compress...
|
 | Edit: One of my favorite concepts in the domain of compression
 | is the pigeonhole principle, which states that for all lossless
 | compression algorithms, some outputs will be _larger_ than the
 | inputs.
|
 | Well-designed encrypted payloads may be compressed, but the
 | outputs should on average be larger than the inputs, rendering
 | compression useless; thus the data is said to be "incompressible".
|
| https://en.wikipedia.org/wiki/Pigeonhole_principle#Uses_and_...
| benob wrote:
| What explains the huge difference in perf with the approach from
| 2019 mentioned at the end?
|
 | For image classification, couldn't one use the residual from
 | applying a jpeg codebook learned on a training example as a
 | metric?
| Karellen wrote:
| You solved some problem, _including implementing the GZIP
| algorithm_ , in less than 10 lines of code?
|
| ...
|
| Oh, ok, not that.
| _a_a_a_ wrote:
| "MNIST"?
|
| accuracy - but of what?
|
| what's this about?
| simonw wrote:
| MNIST is a classic image classification exercise - a dataset of
| 60,000 training images and 10,000 testing images where each
| image is a handwritten numeral as a 28x28 pixel grayscale
| image.
|
| The challenge is to build a computer vision model that can tell
| which numeral each handwritten digit represents.
|
| https://en.wikipedia.org/wiki/MNIST_database
|
| 78% accuracy on a solution is pretty bad, but achieving it just
| using GZIP is a very neat hack.
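 |
 | A quick way to peek at the dataset (torchvision assumed here;
 | many libraries ship a loader):
 |     from torchvision import datasets
 |
 |     mnist = datasets.MNIST(root="./data", train=True, download=True)
 |     img, label = mnist[0]     # img is a 28x28 grayscale PIL image
 |     print(label, img.size)    # e.g. 5 (28, 28)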
| _a_a_a_ wrote:
| thank you
| Detrytus wrote:
| What's the average human performance for this task?
| burnished wrote:
| I don't know off hand, but go take a look at the images - I
| would expect near 100%.
| tysam_and wrote:
| Additionally, try flipping your images and averaging the size
| before comparing the distance between them. I'd expect a boost of
| about 78% -> 84% or so, based on how this typically works as TTA.
| jdthedisciple wrote:
| Can you elaborate more on that technique?
|
| Why would it improve the result so much? Sounds very
| interesting so I'm curious.
| tysam_and wrote:
| _facepalm_ I just realized I was talking about this in the
| context of MNIST -- my apologies.
|
| It would be better in a problem with horizontal symmetry like
| CIFAR, esp if the zipping is serialized by unwinding the 2d
| pixel array into 1d.
|
| One trick that _should_ work is by comparing the distances of
| starting the compression at the 4 different corners of the
| image, in the 2 separate directions for each corner. That
| should provide way more than enough information for k-means
| clustering.
|
| My apologies again for my mistake, thank you for asking, I
| wouldn't have really seen that otherwise :')))).
| m00x wrote:
| It goes along the same lines as the recent paper from Deepmind
| stating that DL (language modeling in their case) is compression.
|
| https://arxiv.org/pdf/2309.10668.pdf
| bob1029 wrote:
| General purpose compressors and information distance measures
| have become super interesting to me while I've been investigating
| alternative language models.
|
| I've been playing around with an attention mechanism that
| combines the idea of using normalized compression distance (gzip)
| with discrete convolution between candidate sequences (sliding
| window of N bytes over each). Another round of normalization over
 | the convolution outputs - accommodating varying lengths - allows
 | us to compare candidate sequences for relevant information on
 | equal grounds.
|
 | The NCD formula I am using right now:
 |     NCD(x,y) = (C(xy) - MIN(C(x),C(y))) / MAX(C(x),C(y))
|
| No weird parameters or any other things to tune. The only
| parameters are the source documents and the input context/query.
___________________________________________________________________
(page generated 2023-09-20 23:00 UTC)