[HN Gopher] Principal Component Analysis Explained Visually
___________________________________________________________________
Principal Component Analysis Explained Visually
Author : xk3
Score : 83 points
Date : 2021-05-02 18:37 UTC (4 hours ago)
(HTM) web link (setosa.io)
(TXT) w3m dump (setosa.io)
| quantstats wrote:
| This article is relatively popular here (considering the topic).
| Two previous discussions about it:
|
| From 2015: https://news.ycombinator.com/item?id=9040266.
|
| From 2017: https://news.ycombinator.com/item?id=14405665.
|
| A more recent approach to visualizing high-dimensional data is
| the t-SNE algorithm, which I normally use together with PCA when
| exploring big data sets. If you're interested in the differences
| between the two methods, here's a really good answer:
| https://stats.stackexchange.com/a/249520.
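|
| A minimal sketch of that workflow in Python (sklearn assumed;
| the data below is just a random stand-in for a real data set):
|
|     import numpy as np
|     from sklearn.decomposition import PCA
|     from sklearn.manifold import TSNE
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(2000, 200))  # random stand-in data
|
|     # PCA first to denoise and shrink the space, then t-SNE
|     X50 = PCA(n_components=50).fit_transform(X)
|     X2 = TSNE(n_components=2, perplexity=30).fit_transform(X50)
|     print(X2.shape)  # 2-D coordinates, ready to scatter-plot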
| rcar wrote:
| PCA is a cool technique mathematically, but in my many years of
| building models, I've never seen it result in a more accurate
| model. I could see it potentially being useful in situations
| where you're forced to use a linear/logistic model since you're
| going to have to do a lot of feature preprocessing, but tree
| ensembles, NNs, etc. are all able to tease out pretty complicated
| relationships among features on their own. Considering that PCA
| also complicates things from a model interpretability point of
| view, it feels to me like a method whose time has largely passed.
| mcguire wrote:
| By definition, it's going to result in a less accurate model,
| unless you keep all of the dimensions or your data is very
| weird, right? And NNs are going to complicate your
| interpretability more?
| ivalm wrote:
| It is still a nice tool for projecting things (at least to
| visualize) where you expect the data to be on a lower
| dimensional hyperplane. I do agree in most cases t-SNE or UMAP
| are better (esp if you don't care about distances).
| a-dub wrote:
| i can think of a few places where it's useful:
|
| if you know that your data comes from a stationary
| distribution, you can use it as a compression technique which
| reduces the computational demands on your model. sure,
| computing the initial svd or covariance matrix is expensive,
| but once you have it, the projection is just a matrix multiply
| and a vector subtraction. (with the reverse being the same)
|
| if you have some high dimensional data and you just want to
| look at it, it's a pretty good start. not only does it give you
| a sense for whether higher dimensions are just noise (by
| looking at the eigenspectrum), it also makes low dimensional
| plots possible.
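|
| a rough numpy sketch of both points (toy data, made-up shapes):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(1000, 50)) @ rng.normal(size=(50, 50))
|
|     # "fit" once: center, take the top-k right singular vectors
|     mu = X.mean(axis=0)
|     U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
|     k = 5
|     W = Vt[:k].T                # 50 x 5 projection matrix
|
|     # compressing a new point is then a subtraction + matmul
|     x_new = rng.normal(size=50)
|     z = (x_new - mu) @ W        # 5-d code
|     x_back = z @ W.T + mu       # the reverse: reconstruction
|
|     # eigenspectrum: share of variance in each component
|     print((s**2 / (s**2).sum())[:10])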
|
| pca, cca and ica have been around for a very long time. i doubt
| "their time has passed."
|
| but who knows, maybe i'm wrong.
| baron_harkonnen wrote:
| > Considering that PCA also complicates things from a model
| interpretability point of view
|
| This is a strange comment since my primary use of PCA/SVD is
| as a first step in understanding the latent factors that are
| driving the data. Latent factors typically cover the important
| things that anyone running a business or deciding policy cares
| about: customer engagement, patient well-being, employee
| happiness, etc. all represent latent factors.
|
| If you have ever wanted to perform data analysis and gain some
| exciting insight into explaining user behavior, PCA/SVD will
| get you there pretty quickly. It is one of the most powerful
| tools in my arsenal when I'm working on a project that requires
| interpretability.
|
| The "loadings" in PC and the V matrix in SVD both contain
| information about how the original feature space correlates
| with the new projection. This can easily show thing things like
| "User's who do X,Y and NOT Z are more likely to purchase".
|
| Likewise, running LSA (Latent Semantic Analysis/indexing) on a
| term-frequency matrix gives you a first pass at a semantic
| embedding. You'll notice, for example, that "dog" and "cat"
| project onto a common component in the new space, which can be
| interpreted as "pets".
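|
| A toy illustration of that LSA point (the term-document counts
| below are invented for the example):
|
|     import numpy as np
|
|     # invented term-document counts; rows = terms, cols = docs
|     terms = ["dog", "cat", "leash", "stock", "bond", "market"]
|     A = np.array([[2, 3, 0, 0],    # dog
|                   [3, 2, 0, 0],    # cat
|                   [1, 1, 0, 0],    # leash
|                   [0, 0, 3, 2],    # stock
|                   [0, 0, 2, 3],    # bond
|                   [0, 0, 1, 2]],   # market
|                  dtype=float)
|
|     # plain SVD; columns of U hold the term "loadings"
|     U, s, Vt = np.linalg.svd(A, full_matrices=False)
|     for t, row in zip(terms, U[:, :2].round(2)):
|         print(f"{t:>7}: {row}")
|     # dog/cat/leash load on one component ("pets"),
|     # stock/bond/market on the other ("finance")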
|
| > I've never seen it result in a more accurate model. I could
| see it potentially being useful in situations where you're
| forced to use a linear/logistic model
|
| PCA/SVD are a linear transformation of the data and shouldn't
| give you any performance increase on a linear model. However
| they can be very helpful in transforming extremely high
| dimensional, sparse vectors into lower dimensional, dense
| representations. This can provide a lot of storage/performance
| benefits.
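|
| For example, scikit-learn's TruncatedSVD works directly on
| sparse input. A sketch with made-up sizes:
|
|     from scipy import sparse
|     from sklearn.decomposition import TruncatedSVD
|
|     # made-up sparse, high-dimensional features (e.g. counts)
|     X = sparse.random(5000, 20000, density=0.01,
|                       format="csr", random_state=0)
|
|     svd = TruncatedSVD(n_components=100, random_state=0)
|     Z = svd.fit_transform(X)        # dense 5000 x 100 codes
|     print(X.data.nbytes, Z.nbytes)  # dense codes smaller here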
|
| > NNs, etc. are all able to tease out pretty complicated
| relationships among features on their own.
|
| PCA is literally identical to an autoencoder minimizing the MSE
| with no non-linear layers. It is a very good first step towards
| understanding what your NN will eventually do. After all, all
| NNs perform a non-linear matrix transformation so that your
| final vector space is ultimately linearly separable.
| rcar wrote:
| Sure, everyone wants to get to the latent factors that really
| drive the outcome of interest, but I've never seen a
| situation in which principal components _really_ represent
| latent factors unless you squint hard at them and want to
| believe. As for gaining insight and explaining user behavior,
| I'd much rather just fit a decent model and share some SHAP
| plots for understanding how your features relate to the
| target and to each other.
|
| If you like PCA and find it works in your particular domains,
| all the more power to you. I just don't find it practically
| useful for fitting better models and am generally suspicious
| of the insights drawn from that and other unsupervised
| techniques, especially given how much of the meaning of the
| results gets imparted by the observer who often has a
| particular story they'd like to tell.
| fredophile wrote:
| I've used PCA with good results in the past. My problem
| essentially simplified down to trying to find nearest
| neighbours in high dimensional spaces. Distance metrics in
| high dimensional spaces don't behave nicely. Using PCA to
| reduce the number of dimensions to something more
| manageable made the problem much more tractable.
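|
| Roughly this pattern (a sklearn sketch with made-up shapes):
|
|     import numpy as np
|     from sklearn.decomposition import PCA
|     from sklearn.neighbors import NearestNeighbors
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(5000, 300))   # made-up high-dim points
|
|     # reduce to a manageable number of dimensions first
|     Z = PCA(n_components=20).fit_transform(X)
|
|     nn = NearestNeighbors(n_neighbors=5).fit(Z)
|     dist, idx = nn.kneighbors(Z[:3])   # neighbours of 3 queries
|     print(idx)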
| strontian wrote:
| What was used to make the visualizations?
| alexcnwy wrote:
| https://d3js.org/ & https://threejs.org/
| gentleman11 wrote:
| I bet you could use three.js as well
| onurcel wrote:
| Here is the best PCA explanation I ever read on the web:
| https://stats.stackexchange.com/questions/2691/making-sense-...
| saeranv wrote:
| Seconded. This is exactly the same stackexchange post I thought
| of as well.
| rsj_hn wrote:
| I put the four dots on the corners of a square and the fifth in
| the center. This results in the same square in the PCA pane but
| rotated about 45 degrees. Then, if you take one of the dots on
| the square corner and move it ever so slightly in and out, you
| see the PCA square wildly rotating. Pretty cool to demonstrate
| sensitivity to small changes in the inputs.
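|
| You can reproduce the effect numerically: with the four corners
| plus the center, the covariance is isotropic, so the leading
| direction is degenerate and a tiny nudge flips it. A numpy
| sketch with my own coordinates:
|
|     import numpy as np
|
|     def pc1_angle(pts):
|         X = pts - pts.mean(axis=0)
|         w, V = np.linalg.eigh(np.cov(X.T))
|         v = V[:, np.argmax(w)]      # leading eigenvector
|         return np.degrees(np.arctan2(v[1], v[0])) % 180
|
|     square = np.array([[0, 0], [0, 1], [1, 1], [1, 0],
|                        [0.5, 0.5]], dtype=float)
|     for eps in (-0.02, 0.02):
|         pts = square.copy()
|         pts[2] += eps               # nudge one corner in/out
|         print(eps, round(pc1_angle(pts), 1))
|     # the leading PC jumps between ~135 and ~45 degrees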
| kwhitefoot wrote:
| Very interesting. It would have been even better if there had
| been a link to an explanation of how PCA is performed.
| baron_harkonnen wrote:
| If you know a bit of linear algebra the transformation is
| surprisingly intuitive.
|
| Your goal is to create a set of orthogonal vectors, each
| capturing as much of the remaining variance in the original
| data as possible (the assumption is that variance is where the
| most information is).
|
| This is achieved by performing an eigendecomposition of the
| covariance matrix of the original data. Essentially you are
| learning the eigenvectors of the covariance matrix, ordered by
| their eigenvalues.
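|
| In numpy the whole recipe is only a few lines (toy data just
| for illustration):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
|
|     Xc = X - X.mean(axis=0)              # center the data
|     C = np.cov(Xc, rowvar=False)         # covariance matrix
|     evals, evecs = np.linalg.eigh(C)     # eigendecomposition
|     order = np.argsort(evals)[::-1]      # largest first
|     components = evecs[:, order]         # orthogonal directions
|     scores = Xc @ components             # data in the new basis
|     print(evals[order])                  # the eigenspectrum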
| bigdict wrote:
| Or the singular vectors of the zero-centered data, ordered by
| singular values.
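|
| A quick check that the two routes agree, up to sign (toy data):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
|     Xc = X - X.mean(axis=0)
|
|     # route 1: top eigenvector of the covariance matrix
|     evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
|     pc_cov = evecs[:, np.argmax(evals)]
|
|     # route 2: first right singular vector of the centered data
|     U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
|     pc_svd = Vt[0]
|
|     print(np.allclose(np.abs(pc_cov), np.abs(pc_svd)))  # True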
| gentleman11 wrote:
| > a transformation no different than finding a camera angle
|
| I've used PCA a bit in the past and it's so abstract that one
| forgets how to conceptualize it shortly after finishing the task.
| This is an interesting and memorable way to put it, I like that.
| Sinidir wrote:
| Question: is there any difference between the highest variance
| dimension pca finds and a line that linear regression would find?
| hsiang_jih_kueh wrote:
| if I recall correctly, yeah, there probably will be. linear
| regression minimises the vertical distance of a point to the
| regression line whereas PCA minimises the orthogonal distance
| of the point to the line.
| osipov wrote:
| Linear regression uses a measure of an "error" for every data
| point. Visually, the error is the vertical difference between a
| data point and the line/plane of linear regression. In
| contrast, PCA measures the distance from the data point to the
| axis along the perpendicular; dropping the point onto the axis
| this way is its "projection".
|
| There is something known as orthogonal regression (total least
| squares) which uses the same error measure as PCA.
| Unfortunately it doesn't work well when the variables are
| measured on incompatible scales.
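|
| A small numeric illustration of the difference (made-up data):
|
|     import numpy as np
|
|     rng = np.random.default_rng(1)
|     x = rng.normal(size=300)
|     y = 0.5 * x + rng.normal(scale=0.8, size=300)  # noisy line
|
|     # OLS: minimizes vertical errors in y
|     ols_slope = np.polyfit(x, y, 1)[0]
|
|     # first PC: minimizes perpendicular (projection) distances
|     Xc = np.column_stack([x, y])
|     Xc = Xc - Xc.mean(axis=0)
|     w, V = np.linalg.eigh(np.cov(Xc.T))
|     pc1 = V[:, np.argmax(w)]
|     pca_slope = pc1[1] / pc1[0]
|
|     print(ols_slope, pca_slope)  # the two slopes differ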
___________________________________________________________________
(page generated 2021-05-02 23:00 UTC)