[HN Gopher] Linear Hammer: PCA is not a panacea (2013)
       ___________________________________________________________________
        
       Linear Hammer: PCA is not a panacea (2013)
        
       Author : PaulHoule
       Score  : 32 points
       Date   : 2022-03-23 19:17 UTC (3 hours ago)
        
 (HTM) web link (danluu.com)
 (TXT) w3m dump (danluu.com)
        
       | jmalicki wrote:
       | (2013 - had to check, there was no way talking about LSI could be
       | recent)
        
         | sydthrowaway wrote:
          | So what do people do nowadays?
        
           | leecarraher wrote:
            | In general these are referred to as embedding techniques.
            | PCA/SVD is still common, along with other linear methods like
            | ICA, CCA, and PLS. t-SNE is pretty popular; it's statistically
            | inspired, with some manifold learning. Non-negative matrix
            | factorization is useful in recommender systems. Topologically
            | inspired techniques like UMAP and LLE are gaining traction,
            | and the author notes neural embeddings, like autoencoders.
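            | 
            | For a rough sense of a few of these side by side, here's a
            | minimal sketch using only scikit-learn (PCA, t-SNE, and NMF;
            | UMAP would need the separate umap-learn package). The blob
            | data is made up purely for illustration:
            | 
            |     import numpy as np
            |     from sklearn.datasets import make_blobs
            |     from sklearn.decomposition import NMF, PCA
            |     from sklearn.manifold import TSNE
            | 
            |     # Toy data: 3 clusters in 50 dimensions.
            |     X, y = make_blobs(n_samples=300, n_features=50, centers=3,
            |                       random_state=0)
            | 
            |     # Linear: PCA projects onto directions of maximum variance.
            |     X_pca = PCA(n_components=2).fit_transform(X)
            | 
            |     # Manifold/statistical: t-SNE preserves local neighborhoods.
            |     X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
            | 
            |     # NMF needs non-negative input, so shift the data first.
            |     X_nmf = NMF(n_components=2, max_iter=500).fit_transform(X - X.min())
            | 
            |     print(X_pca.shape, X_tsne.shape, X_nmf.shape)  # (300, 2) each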
        
           | prionassembly wrote:
           | UMAP.
        
       | spekcular wrote:
        | Where does the idea that PCA is optimal come from?
       | 
       | I understand the second example (girlfriend taking a phone
       | screen): some people never took statistics past a linear models
       | course, so they use what they know. Understandable, if
       | inefficient. But they don't seem to be claiming linear models are
       | optimal all the time, they're just stuck in a local maximum of
       | trying to speed up the system they have.
       | 
       | But the PCA guy I really don't understand.
        
         | prionassembly wrote:
         | PCA is pretty close to optimal in situations where you want to
         | preserve _global structure_ and variance-covariance matrices
         | are highly informative. This comes down to the min-max theorem
          | for Rayleigh quotients; I won't grandstand going over the math
         | here. It won't work well for gene sequences or text embeddings
         | because the desired structure there is _local_ (UMAP seems to
         | reign supreme these days for that).
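          | 
          | (The one-line version of the math: the first PC maximizes the
          | Rayleigh quotient w'Cw / w'w over directions w, and by
          | Courant-Fischer the maximizer is the top eigenvector of the
          | covariance matrix C. A quick numerical check, with made-up
          | data:)
          | 
          |     import numpy as np
          |     from sklearn.decomposition import PCA
          | 
          |     rng = np.random.default_rng(0)
          |     X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
          | 
          |     # Top eigenvector of the covariance matrix...
          |     eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
          |     top = eigvecs[:, -1]  # eigh sorts eigenvalues ascending
          | 
          |     # ...matches PCA's first component (up to sign).
          |     pc1 = PCA(n_components=1).fit(X).components_[0]
          |     print(np.isclose(abs(pc1 @ top), 1.0))  # True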
         | 
         | An older explanation for PCA is that it's a sort of default
         | factor model prior to rotations that introduce structural
         | assumptions. (Varimax etc. factor analysis is really underrated
          | in exploratory statistics; but then, data science training these
          | days never introduces people to the FWL theorem, identification,
          | etc. With large enough deep learning models pretty much anything
          | is possible -- also because using deep learning implies having
          | truckloads of data -- but the median xgboost guy is way out of
          | his element.)
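          | 
          | (If you want to play with that: scikit-learn's FactorAnalysis
          | takes a varimax rotation directly; a toy sketch on made-up
          | correlated data:)
          | 
          |     import numpy as np
          |     from sklearn.decomposition import FactorAnalysis
          | 
          |     rng = np.random.default_rng(0)
          |     X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
          | 
          |     # Unrotated vs. varimax-rotated loadings; the rotation tends
          |     # to push each variable onto a single factor.
          |     fa = FactorAnalysis(n_components=2, rotation=None).fit(X)
          |     fa_vm = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
          |     print(np.round(fa.components_, 2))
          |     print(np.round(fa_vm.components_, 2))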
        
           | taylorius wrote:
           | "and variance-covariance matrices are highly informative"
           | 
           | So - a distribution which is linear?
        
             | prionassembly wrote:
             | You keep using that word.
        
             | kurthr wrote:
              | I don't think that the distribution needs to be linear...
              | simply the correlation between N elements needs to be linear
              | (or 1st order dominant). It works great for estimating
              | multiple small deviations (e.g. different causes, over
              | multiple elements) from a global response. If you're looking
              | at the larger eigenvalues, then the corresponding
              | eigenvectors of the correlation matrix give you the dominant
              | mode shapes.
              | 
              | As others have said, it's better for estimation (not for
              | classification).
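              | 
              | As a sketch of the mode-shape idea (all made up: two fixed
              | "mode" vectors, random amplitudes, small noise), the leading
              | eigenvectors recover the modes. This uses the covariance
              | matrix; with standardized variables the correlation matrix
              | gives the same picture:
              | 
              |     import numpy as np
              | 
              |     rng = np.random.default_rng(1)
              |     n, p = 2000, 10
              |     modes, _ = np.linalg.qr(rng.normal(size=(p, 2)))
              | 
              |     # Each observation = random amplitudes on the two modes
              |     # plus small independent noise.
              |     amps = rng.normal(size=(n, 2)) * [3.0, 1.5]
              |     X = amps @ modes.T + 0.1 * rng.normal(size=(n, p))
              | 
              |     # Eigenvectors for the two largest eigenvalues...
              |     eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
              |     recovered = eigvecs[:, -2:]
              | 
              |     # ...span (nearly) the same subspace as the true modes:
              |     # both singular values of the overlap are ~1.
              |     print(np.linalg.svd(modes.T @ recovered, compute_uv=False))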
        
         | wenc wrote:
          | PCA is optimal in that it returns orthogonal axes in directions
          | of maximum variance. It can be cast as an optimization problem.
         | 
         | It is definitely not optimal in a classification sense.
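          | 
          | A contrived sketch of that last point: put the class signal
          | along a low-variance direction, and a 1-D PCA projection throws
          | it away while LDA keeps it (scikit-learn, made-up data):
          | 
          |     import numpy as np
          |     from sklearn.decomposition import PCA
          |     from sklearn.discriminant_analysis import \
          |         LinearDiscriminantAnalysis
          |     from sklearn.linear_model import LogisticRegression
          |     from sklearn.model_selection import cross_val_score
          | 
          |     rng = np.random.default_rng(0)
          |     n = 1000
          |     y = rng.integers(0, 2, n)
          | 
          |     # Feature 0: high variance, no class information.
          |     # Feature 1: low variance, carries all the class signal.
          |     X = np.c_[rng.normal(scale=10.0, size=n),
          |               rng.normal(scale=0.5, size=n) + 2.0 * y]
          | 
          |     clf = LogisticRegression()
          |     Z_pca = PCA(n_components=1).fit_transform(X)
          |     Z_lda = LinearDiscriminantAnalysis(n_components=1) \
          |         .fit_transform(X, y)
          |     for name, Z in [("PCA", Z_pca), ("LDA", Z_lda)]:
          |         acc = cross_val_score(clf, Z, y, cv=5).mean()
          |         print(name, round(acc, 2))  # PCA ~0.5, LDA ~0.98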
        
       | [deleted]
        
       | epgui wrote:
       | While PCA can be useful for some classification problems, it is
       | intended to be a dimensionality reduction technique, not a
       | classification technique.
        
       ___________________________________________________________________
       (page generated 2022-03-23 23:01 UTC)