https://infoproc.blogspot.com/2021/02/gradient-descent-models-are-kernel.html

Information Processing


Sunday, February 07, 2021

Gradient Descent Models Are Kernel Machines (Deep Learning)

This paper shows that models which result from gradient descent
training (e.g., deep neural nets) can be expressed as a weighted sum
of similarity functions (kernels) which measure the similarity of a
given instance to the examples used in training. The kernels are
defined by the inner product of model gradients in the parameter
space, integrated over the descent (learning) path.
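
In symbols, the definition above can be sketched as follows. The notation
is an assumption made for this note and may differ in detail from the
paper's: w(t) is the parameter vector at training time t, f_w is the
model, and T is the end of training. The first expression is the inner
product of model gradients at fixed parameters; the second integrates it
along the learning path to give K(x,x').

```latex
K^{g}_{w}(x, x') = \nabla_w f_w(x) \cdot \nabla_w f_w(x'),
\qquad
K(x, x') = \int_0^T K^{g}_{w(t)}(x, x') \, dt
```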

Roughly speaking, two data points x and x' are similar, i.e., have a
large kernel value K(x,x'), if they have similar effects on the
model parameters in the gradient descent. With respect to the
learning algorithm, x and x' have similar information content. The
learned model y = f(x) matches x to similar data points x_i: the
resulting value y is simply a weighted (linear) sum of kernel values
K(x,x_i), plus an offset given by the initial (untrained) model.
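
The statement above can be checked numerically on a toy problem. The
sketch below is not from the paper: the tiny tanh network, the squared
loss, the data, and every name in it (init_params, model_and_grad,
train_and_expand) are assumptions chosen for illustration. It trains by
plain gradient descent with a small step size, and in parallel
accumulates the discrete analogue of the kernel expansion: at each step,
the loss derivative at each training point times the inner product of
the model gradients at the query and at that training point. The
accumulated sum, added to the initial model output, should track the
trained model's output up to an error that shrinks with the step size.

```python
import numpy as np

def init_params(hidden, rng):
    # Flat parameter vector [W1, b1, w2] for a scalar-input, scalar-output net.
    W1 = rng.normal(size=hidden)
    b1 = rng.normal(size=hidden)
    w2 = rng.normal(size=hidden) / np.sqrt(hidden)
    return np.concatenate([W1, b1, w2])

def model_and_grad(w, x):
    """Return f_w(x) = w2 . tanh(W1*x + b1) and its gradient with respect to w."""
    W1, b1, w2 = np.split(w, 3)
    h = np.tanh(W1 * x + b1)
    dh = w2 * (1.0 - h ** 2)                  # derivative w.r.t. the pre-activation
    grad = np.concatenate([dh * x, dh, h])    # same layout as w: [dW1, db1, dw2]
    return float(w2 @ h), grad

def train_and_expand(x_train, y_train, x_query, lr=5e-4, steps=4000,
                     hidden=16, seed=0):
    rng = np.random.default_rng(seed)
    w = init_params(hidden, rng)
    f_init, _ = model_and_grad(w, x_query)
    kernel_sum = 0.0
    for _ in range(steps):
        outs, grads = zip(*(model_and_grad(w, xi) for xi in x_train))
        outs, grads = np.array(outs), np.array(grads)   # shapes (n,), (n, P)
        loss_deriv = outs - y_train           # L'_i for the loss 0.5 * (f - y)^2
        _, g_query = model_and_grad(w, x_query)
        tangent_k = grads @ g_query           # gradient inner products with each x_i
        kernel_sum += -lr * float(loss_deriv @ tangent_k)
        w = w - lr * (loss_deriv @ grads)     # gradient step on the summed loss
    f_final, _ = model_and_grad(w, x_query)
    return f_init, f_final, f_init + kernel_sum

x_train = np.linspace(-1.0, 1.0, 8)
y_train = np.sin(3.0 * x_train)
f_init, f_final, f_kernel = train_and_expand(x_train, y_train, x_query=0.3)
print("initial model:   ", f_init)
print("trained model:   ", f_final)
print("kernel expansion:", f_kernel)
```

The two last printed values agree only to first order in the step size;
the exact equality is a statement about the continuous-time limit of
gradient descent.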

This result makes it very clear that without regularity imposed by
the ground truth mechanism which generates the actual data (e.g.,
some natural process), a neural net is unlikely to perform well on an
example which deviates strongly (as defined by the kernel) from all
training examples. Given the complexity (e.g., dimensionality) of the
ground truth model, one can place bounds on the amount of data
required for successful training.

This formulation locates the nonlinearity of deep learning models in
the kernel function. The superposition of kernels is entirely linear
as long as the loss function is additive over training data.
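
A short derivation sketch shows where the linearity comes from. The
notation is an assumption made for this note: the total loss is a sum
over training examples of a per-example loss, l'_i(t) is the derivative
of that per-example loss with respect to the model output at x_i,
K^g_w(x,x') is the inner product of model gradients at fixed parameters
w, and K(x,x') is its integral along the learning path. The derivation
assumes the continuous-time limit of gradient descent.

```latex
\begin{aligned}
\frac{dw}{dt} &= -\nabla_w L = -\sum_i \ell'_i(t)\, \nabla_w f_w(x_i), \\
\frac{d f_w(x)}{dt} &= \nabla_w f_w(x) \cdot \frac{dw}{dt}
  = -\sum_i \ell'_i(t)\, K^{g}_{w(t)}(x, x_i), \\
f_{w(T)}(x) &= f_{w(0)}(x) - \sum_i \int_0^T \ell'_i(t)\, K^{g}_{w(t)}(x, x_i)\, dt
  = \sum_i a_i\, K(x, x_i) + b, \\
a_i &= -\frac{\int_0^T \ell'_i(t)\, K^{g}_{w(t)}(x, x_i)\, dt}{\int_0^T K^{g}_{w(t)}(x, x_i)\, dt},
\qquad b = f_{w(0)}(x).
\end{aligned}
```

The sum over i appears in the first line only because the loss is
additive over training data, and it carries through unchanged to the
final expression. In this form the coefficients a_i are path averages of
the loss derivative (defined where K(x,x_i) is nonzero) rather than
fixed constants.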
 

    Every Model Learned by Gradient Descent Is Approximately a Kernel
    Machine  

    P. Domingos      

    https://arxiv.org/pdf/2012.00152.pdf

    Deep learning's successes are often attributed to its ability to
    automatically discover new representations of the data, rather
    than relying on handcrafted features like other learning methods.
    We show, however, that deep networks learned by the standard
    gradient descent algorithm are in fact mathematically
    approximately equivalent to kernel machines, a learning method
    that simply memorizes the data and uses it directly for
    prediction via a similarity function (the kernel). This greatly
    enhances the interpretability of deep network weights, by
    elucidating that they are effectively a superposition of the
    training examples. The network architecture incorporates
    knowledge of the target function into the kernel. This improved
    understanding should lead to better learning algorithms.

From the paper:

    ... Here we show that every model learned by this method,
    regardless of architecture, is approximately equivalent to a
    kernel machine with a particular type of kernel. This kernel
    measures the similarity of the model at two data points in the
    neighborhood of the path taken by the model parameters during
    learning. Kernel machines store a subset of the training data
    points and match them to the query using the kernel. Deep network
    weights can thus be seen as a superposition of the training data
    points in the kernel's feature space, enabling their efficient
    storage and matching. This contrasts with the standard view of
    deep learning as a method for discovering representations from
    data. ... 

    ... the weights of a deep network have a straightforward
    interpretation as a superposition of the training examples in
    gradient space, where each example is represented by the
    corresponding gradient of the model. Fig. 2 illustrates this. One
    well-studied approach to interpreting the output of deep networks
    involves looking for training instances that are close to the
    query in Euclidean or some other simple space (Ribeiro et al.,
    2016). Path kernels tell us what the exact space for these
    comparisons should be, and how it relates to the model's
    predictions. ...

See also this video which discusses the paper. 

You can almost grasp the result from the figure and definitions
below.
[Screenshot]
[Screenshot]
Posted by Steve Hsu at 4:43 PM
Labels: ai, machine learning


