https://infoproc.blogspot.com/2021/02/gradient-descent-models-are-kernel.html

Information Processing
Pessimism of the Intellect, Optimism of the Will

# Sunday, February 07, 2021

Gradient Descent Models Are Kernel Machines (Deep Learning)

This paper shows that models which result from gradient descent training (e.g., deep neural nets) can be expressed as a weighted sum of similarity functions (kernels) which measure the similarity of a given instance to the examples used in training. The kernels are defined by the inner product of model gradients in the parameter space, integrated over the descent (learning) path.

Roughly speaking, two data points x and x' are similar, i.e., have large kernel function K(x,x'), if they have similar effects on the model parameters in the gradient descent. With respect to the learning algorithm, x and x' have similar information content. The learned model y = f(x) matches x to similar data points x_i: the resulting value y is simply a weighted (linear) sum of kernel values K(x,x_i).

This result makes it very clear that without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples. Given the complexity (e.g., dimensionality) of the ground truth model, one can place bounds on the amount of data required for successful training.

This formulation locates the nonlinearity of deep learning models in the kernel function. The superposition of kernels is entirely linear as long as the loss function is additive over training data.

Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
P. Domingos
https://arxiv.org/pdf/2012.00152.pdf

Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.

From the paper:

... Here we show that every model learned by this method, regardless of architecture, is approximately equivalent to a kernel machine with a particular type of kernel. This kernel measures the similarity of the model at two data points in the neighborhood of the path taken by the model parameters during learning. Kernel machines store a subset of the training data points and match them to the query using the kernel. Deep network weights can thus be seen as a superposition of the training data points in the kernel's feature space, enabling their efficient storage and matching. This contrasts with the standard view of deep learning as a method for discovering representations from data. ...

... the weights of a deep network have a straightforward interpretation as a superposition of the training examples in gradient space, where each example is represented by the corresponding gradient of the model. Fig. 2 illustrates this. One well-studied approach to interpreting the output of deep networks involves looking for training instances that are close to the query in Euclidean or some other simple space (Ribeiro et al., 2016). Path kernels tell us what the exact space for these comparisons should be, and how it relates to the model's predictions. ...

See also this video which discusses the paper. You can almost grasp the result from the figure and definitions below.

[Screenshot] [Screenshot]

Posted by Steve Hsu at 4:43 PM
Labels: ai, machine learning
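The claim that the learned model is a linear sum of kernel values over training examples can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions, not code from the paper: the tiny tanh network, the squared-error loss, the learning rate, and the names `model`, `grad_params`, and `train_and_expand` are all hypothetical choices made here. At each gradient descent step it computes the tangent kernel K_t(x, x_i) = grad_w f(x) . grad_w f(x_i) and accumulates -lr * L'_i * K_t(x, x_i) for each training example. Summed over steps, this is a discretized version of the path-integrated kernel expansion, which should match the trained model's actual prediction up to an error that shrinks with the learning rate.

```python
import numpy as np


def model(params, x):
    """Tiny one-hidden-layer tanh network, scalar input and output (an assumed toy model)."""
    a, w, b = params
    return float(a @ np.tanh(w * x + b))


def grad_params(params, x):
    """Gradient of the model output with respect to all parameters, flattened as [a, w, b]."""
    a, w, b = params
    t = np.tanh(w * x + b)
    s = a * (1.0 - t ** 2)  # chain rule through tanh
    return np.concatenate([t, s * x, s])


def train_and_expand(x_train, y_train, x_query, lr=1e-4, steps=5000, hidden=8, seed=0):
    """Run full-batch gradient descent and accumulate the kernel expansion along the path."""
    rng = np.random.default_rng(seed)
    params = [0.5 * rng.normal(size=hidden),
              rng.normal(size=hidden),
              rng.normal(size=hidden)]
    f_initial = model(params, x_query)
    contrib = np.zeros(len(x_train))  # per-example accumulated kernel term
    loss_initial = None

    for _ in range(steps):
        preds = np.array([model(params, xi) for xi in x_train])
        resid = preds - y_train  # dL/df for the loss 0.5 * sum (f - y)^2
        if loss_initial is None:
            loss_initial = 0.5 * float(resid @ resid)

        G = np.stack([grad_params(params, xi) for xi in x_train])  # shape (n, P)
        K = G @ grad_params(params, x_query)  # tangent kernel K_t(x_query, x_i)

        # First-order change in f(x_query) caused by this step, split by training example.
        contrib += -lr * resid * K

        # The gradient descent step itself; the loss is additive over training data.
        update = -lr * (resid @ G)
        params = [p + d for p, d in zip(params, np.split(update, 3))]

    final_resid = np.array([model(params, xi) for xi in x_train]) - y_train
    return {
        "f_initial": f_initial,
        "f_final": model(params, x_query),
        "kernel_prediction": f_initial + float(contrib.sum()),
        "contrib": contrib,
        "loss_initial": loss_initial,
        "loss_final": 0.5 * float(final_resid @ final_resid),
    }


if __name__ == "__main__":
    xs = np.linspace(-2.0, 2.0, 6)
    result = train_and_expand(xs, np.sin(xs), x_query=0.7)
    print("trained model output :", result["f_final"])
    print("kernel expansion     :", result["kernel_prediction"])
    print("per-example terms    :", result["contrib"])
```

The entries of `contrib` show how much each training point x_i moved the prediction at the query, which is the sense in which the output is a weighted linear sum of kernel values plus the initial model. The agreement is approximate because the steps are finite; a smaller learning rate with proportionally more steps should tighten it.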