[HN Gopher] Linear Hammer: PCA is not a panacea (2013)
___________________________________________________________________
Linear Hammer: PCA is not a panacea (2013)
Author : PaulHoule
Score : 32 points
Date : 2022-03-23 19:17 UTC (3 hours ago)
(HTM) web link (danluu.com)
(TXT) w3m dump (danluu.com)
| jmalicki wrote:
| (2013 - had to check, there was no way talking about LSI could be
| recent)
| sydthrowaway wrote:
| So what do people do nowadays
| leecarraher wrote:
| In general these are referred to as embedding techniques.
| PCA/SVD is still common, along with other linear methods like
| ICA, CCA, and PLS. t-SNE is pretty popular: statistically
| inspired, with some manifold learning. Non-negative matrix
| factorization is useful for recommender systems. Topologically
| inspired techniques are gaining traction, like UMAP and LLE.
| The author notes neural embeddings, like autoencoders.
| prionassembly wrote:
| UMAP.
| spekcular wrote:
| Where does the idea that PCA is optimal come from?
|
| I understand the second example (girlfriend taking a phone
| screen): some people never took statistics past a linear models
| course, so they use what they know. Understandable, if
| inefficient. But they don't seem to be claiming linear models are
| optimal all the time; they're just stuck in a local maximum of
| trying to speed up the system they have.
|
| But the PCA guy I really don't understand.
| prionassembly wrote:
| PCA is pretty close to optimal in situations where you want to
| preserve _global structure_ and variance-covariance matrices
| are highly informative. This comes down to the min-max theorem
| for Rayleigh quotients; I won't grandstand going over the math
| here. It won't work well for gene sequences or text embeddings
| because the desired structure there is _local_ (UMAP seems to
| reign supreme these days for that).
|
| An older explanation for PCA is that it's a sort of default
| factor model prior to rotations that introduce structural
| assumptions. (Varimax etc. factor analysis is really underrated
| in exploratory statistics; but then, these days data science
| training never introduces people to the FWL theorem,
| identification, etc. With large enough deep learning pretty
| much anything is possible -- also because using deep learning
| implies having truckloads of data -- but the median xgboost guy
| is way out of his element.)
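| A minimal numpy sketch of that Rayleigh-quotient view (synthetic
| data, purely illustrative): the first principal direction is the
| top eigenvector of the covariance matrix, and no other direction
| achieves a larger quotient.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))  # correlated cols
|     Xc = X - X.mean(axis=0)
|     S = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance
|
|     evals, evecs = np.linalg.eigh(S)       # eigenvalues in ascending order
|     rayleigh = lambda w: (w @ S @ w) / (w @ w)
|
|     print(rayleigh(evecs[:, -1]), evals[-1])    # equal: the top eigenvalue
|     for _ in range(1000):
|         v = rng.normal(size=5)
|         assert rayleigh(v) <= evals[-1] + 1e-9  # no direction does better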
| taylorius wrote:
| "and variance-covariance matrices are highly informative"
|
| So - a distribution which is linear?
| prionassembly wrote:
| You keep using that word.
| kurthr wrote:
| I don't think that the distribution needs to be linear...
| simply the correlation between N elements needs to be
| linear (or 1st-order dominant). It works great for
| estimating multiple small deviations (e.g. from different
| causes, over multiple elements) from a global response. If
| you're looking at those larger eigenvalues, then you're
| looking at the dominant mode shapes of the eigenvectors of
| the correlation.
|
| As others have said, it's better for estimation (not for
| classification).
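| A small synthetic sketch of that reading (numpy only; the mixing
| weights here are made up): several elements driven by one shared
| global response plus smaller independent deviations, so the
| correlation matrix has one dominant eigenvalue whose eigenvector
| is the global mode shape.
|
|     import numpy as np
|
|     rng = np.random.default_rng(1)
|     n = 10_000
|     global_cause = rng.normal(size=n)
|     small_causes = 0.3 * rng.normal(size=(n, 6))
|     X = global_cause[:, None] * np.linspace(0.5, 1.5, 6) + small_causes
|
|     R = np.corrcoef(X, rowvar=False)   # 6x6 correlation matrix
|     evals, evecs = np.linalg.eigh(R)
|     print(evals[::-1])    # one dominant eigenvalue, the rest are small
|     print(evecs[:, -1])   # dominant mode shape: how each element loads
|                           # on the global response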
| wenc wrote:
| PCA is optimal in that it returns orthogonal axes in the
| directions of maximum variance. It can be cast as an
| optimization problem.
|
| It is definitely not optimal in a classification sense.
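| One way to see the optimization framing (a sketch with synthetic
| data; scikit-learn is used only as a cross-check): the leading
| principal direction solves max_w w^T S w subject to ||w|| = 1,
| and plain power iteration climbs that objective to the same
| answer PCA returns.
|
|     import numpy as np
|     from sklearn.decomposition import PCA
|
|     rng = np.random.default_rng(2)
|     X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
|     Xc = X - X.mean(axis=0)
|     S = Xc.T @ Xc / (len(Xc) - 1)
|
|     w = rng.normal(size=4)
|     for _ in range(200):          # power iteration on the covariance
|         w = S @ w
|         w /= np.linalg.norm(w)
|
|     pc1 = PCA(n_components=1).fit(Xc).components_[0]
|     print(abs(w @ pc1))           # ~1.0: same direction up to sign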
| [deleted]
| epgui wrote:
| While PCA can be useful for some classification problems, it is
| intended to be a dimensionality reduction technique, not a
| classification technique.
___________________________________________________________________
(page generated 2022-03-23 23:01 UTC)