[HN Gopher] Gradient Descent Models Are Kernel Machines
___________________________________________________________________
Gradient Descent Models Are Kernel Machines
Author : dilap
Score : 77 points
Date : 2021-02-08 19:41 UTC (3 hours ago)
(HTM) web link (infoproc.blogspot.com)
(TXT) w3m dump (infoproc.blogspot.com)
| scythmic_waves wrote:
| The paper discussed here showed up on reddit a few months back
| [1]. Another paper showed up shortly after claiming the exact
| opposite [2]. Some discussion of this contradiction can be found
| in this part of the thread: [3].
|
| I myself am very interested in [2]. It's fairly dense, but I've
| been meaning to go through it and the larger Tensor Programs
| framework ever since.
|
| [1]
| https://www.reddit.com/r/MachineLearning/comments/k7wj5s/r_e...
|
| [2]
| https://www.reddit.com/r/MachineLearning/comments/k8h01q/r_w...
|
| [3]
| https://www.reddit.com/r/MachineLearning/comments/k8h01q/r_w...
| Grimm1 wrote:
| Thank you! Number 2 was exactly what I was looking for and I
| just couldn't find the link.
| ur-whale wrote:
| > Gradient Descent Models Are Kernel Machines
|
| ... that also happen to actually work.
| tlb wrote:
| If you're surprised by this result because you've used kernel
| machines and didn't find them very good at generalization, keep
| in mind that this assumes a kernel function that accurately
| reflects similarity of input samples. Most work with kernel
| machines just uses Euclidean distance. For instance, in an image
| recognition model it would have to identify 2 images with dogs as
| more similar than an image with a dog and an image with a cat.
|
| With a sufficiently magical kernel function, indeed you can get
| great results with a kernel machine. But it's not so easy to
| write a kernel function for a domain like image processing, where
| shifts, scales, and small rotations shouldn't affect similarity
| much. Let alone for text processing, where it should recognize 2
| sentences with similar meaning as similar.
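|
| A rough sketch of what I mean (my own toy illustration, not from
| the paper; the RBF kernel and the feature hook are placeholder
| choices): the kernel-machine prediction is the same either way,
| and only the notion of similarity changes.
|
|     import numpy as np
|
|     def rbf_kernel(u, v, gamma=1e-3):
|         # Similarity is purely a function of Euclidean distance
|         # in whatever space u and v live in.
|         return np.exp(-gamma * np.sum((u - v) ** 2))
|
|     def kernel_machine(x, train_xs, alphas, b, feature=lambda z: z):
|         # Classic form: y(x) = sum_i a_i * k(phi(x), phi(x_i)) + b.
|         # With feature=identity the kernel compares raw pixels;
|         # swapping in a learned embedding changes what "similar"
|         # means without changing the machinery.
|         phi_x = feature(x)
|         return b + sum(a * rbf_kernel(phi_x, feature(xi))
|                        for a, xi in zip(alphas, train_xs))
|
|     # Toy usage with random "images" (flattened pixel vectors).
|     rng = np.random.default_rng(0)
|     train_xs = rng.normal(size=(5, 64))   # 5 training examples
|     alphas = rng.normal(size=5)           # coefficients from some fit
|     print(kernel_machine(rng.normal(size=64), train_xs, alphas, b=0.0))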
| yudlejoza wrote:
| I may or may not be surprised by the result, but I'm definitely
| not surprised by yet another 'thing A is thing B' claim in
| machine learning.
|
| Every Tom, machine-learner, and Harry is an expert at proving
| to the whole world that thing-A is thing-B. The only problem is
| that nobody hires them at a million dollars a year in total
| compensation.
| memming wrote:
| But the converse is not true in this case, so still
| interesting.
| mywittyname wrote:
| >With a sufficiently magical kernel function, indeed you can
| get great results with a kernel machine. But it's not so easy
| to write a kernel function for a domain like image processing,
| where shifts, scales, and small rotations shouldn't affect
| similarity much. Let alone for text processing, where it should
| recognize 2 sentences with similar meaning as similar.
|
| I think the key issue at hand is that a model trained by
| gradient descent is easier to fit than one built on hand-crafted
| kernel functions. Someone could absolutely devise a mechanism
| for backpropagation of errors through kernel functions, but at
| that point it is basically a neural network.
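|
| As a minimal sketch of that last point (my own illustration,
| with an arbitrary RBF kernel and toy data): learning the
| coefficients of a fixed kernel expansion by gradient descent is
| already a one-layer network whose features are kernel
| evaluations.
|
|     import numpy as np
|
|     rng = np.random.default_rng(1)
|     X = rng.uniform(-3, 3, size=(50, 1))        # training inputs
|     y = np.sin(X[:, 0])                         # targets
|
|     def gram(A, B, gamma=0.5):
|         # Pairwise RBF kernel matrix between rows of A and B.
|         d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
|         return np.exp(-gamma * d2)
|
|     K = gram(X, X)                              # 50x50 kernel matrix
|     alpha = np.zeros(len(X))                    # coefficients to learn
|     lr = 0.01
|     for _ in range(2000):
|         pred = K @ alpha                        # y_hat = sum_i a_i k(., x_i)
|         alpha -= lr * 2 * K.T @ (pred - y) / len(X)   # d(MSE)/d(alpha)
|
|     print(np.mean((K @ alpha - y) ** 2))        # training error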
| throwawaysea wrote:
| > This result makes it very clear that without regularity imposed
| by the ground truth mechanism which generates the actual data
| (e.g., some natural process), a neural net is unlikely to perform
| well on an example which deviates strongly (as defined by the
| kernel) from all training examples.
|
| Is this another way of saying that neural networks are just
| another statistical estimation method and not a path to general
| artificial intelligence? Or is it saying that problems like self-
| driving cars are not suitable to the current state of the art for
| AI since we have to ensure that reality doesn't deviate from
| training examples? Or both?
|
| I'd love to understand the "real life" implications of this
| finding better.
| phreeza wrote:
| Is it not possible to achieve AGI with a good statistical
| estimation method?
| viraptor wrote:
| Depends on whether you think "a human is unlikely to perform
| well on an example which deviates strongly (as defined by
| experience) from all training examples" is true.
| Grimm1 wrote:
| When this was discussed about two months ago, the conclusion I
| took away was that there aren't very many at the moment, beyond
| a somewhat formal equivalence.
|
| https://news.ycombinator.com/item?id=25314830
| 6gvONxR4sf7o wrote:
| Neither. It's more akin to the fundamental theorem of calculus.
| If you follow your model from point A to point B along some
| differentiable path, you can sum up/integrate the steps to make
| the jump from point A to point B in one go (with the jump
| expressed in terms of those integrals). It's a super
| interesting viewpoint on gradient descent and the models that
| use it, one that could be really useful for looking at and
| understanding those models abstractly, but it isn't saying
| anything about suitability for different tasks.
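|
| Roughly, the "integrate the steps" argument as I understand the
| paper (a sketch under idealized gradient flow, not exact SGD;
| L'_i(t) denotes the loss derivative w.r.t. the model's output on
| training point x_i at time t):
|
|     % Model f_{w(t)}(x) trained by gradient flow on
|     % \sum_i L(y_i^*, f_w(x_i)), with tangent kernel
|     % K^g_t(x, x') = \nabla_w f_{w(t)}(x) \cdot \nabla_w f_{w(t)}(x'):
|     \frac{d}{dt} f_{w(t)}(x)
|       = \nabla_w f_{w(t)}(x) \cdot \frac{dw}{dt}
|       = -\sum_i L'_i(t)\, K^g_t(x, x_i)
|
|     % Integrating from t = 0 to T (the FTC step):
|     f_{w(T)}(x) = f_{w(0)}(x) - \int_0^T \sum_i L'_i(t)\, K^g_t(x, x_i)\, dt
|
|     % Grouping per training point recovers the kernel-machine form
|     % y(x) = \sum_i a_i K^p(x, x_i) + b, with path kernel
|     % K^p(x, x_i) = \int_0^T K^g_t(x, x_i)\, dt, offset b = f_{w(0)}(x),
|     % and a_i a path-weighted average of -L'_i.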
| SubiculumCode wrote:
| Some days I feel that, in the final analysis, everything is
| just linear regression.
| jtmcmc wrote:
| given the properly transformed space that may very well be
| true...
| wenc wrote:
| As someone who has studied nonlinear nonconvex optimization, I
| don't think linear regression is the final word here. In the
| universe of optimization problems for curve-fitting, the linear
| case is only one case (albeit a very useful one).
|
| Often, though, it is insufficient. The next step up is
| piecewise linearity, and then convexity. It is said that
| convexity is a way to weaken the linearity requirement while
| keeping the problem tractable.
|
| Many real world systems are nonlinear (think physics models),
| and often nonconvex. You can approximate them using locally
| linear functions to be sure, but you lose a lot of fidelity in
| the process. Sometimes this is ok, sometimes this is not, so it
| depends on the final application.
|
| It happens that linear regression is good enough for a lot of
| stuff out there, but there are many places where it doesn't
| work.
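|
| A tiny illustration of the fidelity point (my own toy example,
| not tied to any particular physical system): fit a damped
| oscillation with a single global linear model and compare the
| residual to the signal itself.
|
|     import numpy as np
|
|     x = np.linspace(0, 4 * np.pi, 200)
|     y = np.exp(-0.2 * x) * np.sin(x)            # damped oscillation
|
|     # Ordinary least squares for y ~ a*x + b.
|     A = np.column_stack([x, np.ones_like(x)])
|     coef, *_ = np.linalg.lstsq(A, y, rcond=None)
|     resid = y - A @ coef
|
|     print("linear fit RMSE:", np.sqrt(np.mean(resid ** 2)))
|     print("signal RMS:     ", np.sqrt(np.mean(y ** 2)))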
| nightcracker wrote:
| > Many real world systems are nonlinear (think physics
| models)
|
| Technically, only if you don't zoom in too far: quantum
| mechanics is linear.
| SubiculumCode wrote:
| Yes, there are non-linear functions... but often these result
| from compositions of linear functions (with simple
| nonlinearities in between).
| SubiculumCode wrote:
| Or rather, that all statistical methods are dressed up
| regression.
| IdiocyInAction wrote:
| FC neural nets are iterated linear regression, in some sense.
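|
| A minimal sketch of that reading (my own illustration; sizes
| and weights are arbitrary): each layer is an affine map, the
| same functional form as a linear-regression predictor, and the
| elementwise nonlinearity is the only thing keeping the stack
| from collapsing into a single linear map.
|
|     import numpy as np
|
|     rng = np.random.default_rng(3)
|     W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
|     W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)
|
|     def forward(x):
|         h = np.maximum(0.0, W1 @ x + b1)   # "regression" 1, then ReLU
|         return W2 @ h + b2                 # "regression" 2
|
|     print(forward(rng.normal(size=8)))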
| tobmlt wrote:
| Hey, hey! We can over-fit higher-order approximations too. The
| nerve of some of ya.
|
| (Absolutely just kidding around here.)
| tqi wrote:
| https://twitter.com/theotheredmund/status/134945323076219699...
| 6gvONxR4sf7o wrote:
| Discussion of the actual paper here:
|
| https://news.ycombinator.com/item?id=25314830
|
| It's a really neat one for people who care about what's going on
| under the hood, but not immediately applicable to the more
| applied folks. I saw some good quotes at the time to the tune of
| "I can't wait to see the papers citing this one in a year or
| two."
| kdisorte wrote:
| Quoting an expert from the last time this was posted:
|
| So in the end, it rephrases a statement from "Neural Tangent
| Kernel: Convergence and Generalization in Neural Networks"
| [https://arxiv.org/abs/1806.07572], but in a way which is kind
| of misleading.
|
| The assertion has been known to the community at least since
| 2018, if not well before.
|
| I find this article and the buzz around it a little awkward.
| robrenaud wrote:
| Yannic Kilcher's Paper Explained series covers this paper pretty
| well. I feel like I have a decent understanding of it just from
| watching the video with a few pauses/rewinds.
|
| https://www.youtube.com/watch?v=ahRPdiCop3E
___________________________________________________________________
(page generated 2021-02-08 23:00 UTC)