[HN Gopher] Kolmogorov-Arnold networks may make neural networks ...
___________________________________________________________________
Kolmogorov-Arnold networks may make neural networks more
understandable
Author : isaacfrond
Score : 207 points
Date : 2024-09-12 10:14 UTC (12 hours ago)
(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)
| itsthecourier wrote:
| TL;DR: they are talking about KAN (Kolmogorov-Arnold networks)
| weberer wrote:
| Yeah. Thankfully, HN updated the title to be more descriptive.
| (Old title was "Novel Architecture Makes Neural Networks More
| Understandable")
| RustySpottedCat wrote:
| Can someone explain exactly what the "unknown" of neural
| networks is? We built them; we know what they're made of and how
| they work. Yes, we can't map out every single connection between
| nodes in this "multilayer perceptron", but don't we know how these
| connections are formed?
| taneq wrote:
| There's a ton of research going into analysing and reverse
| engineering NNs, this "they're mysterious black boxes and
| forever inscrutable" narrative is outdated.
| lupire wrote:
| We don't know what each connection means, what information is
| encoded in each weight. We don't know how it would behave
| differently if each of the million or trillion weights was
| changed.
|
| Compare this to a dictionary, where it's obvious what information
| is on each page and each line.
| wslh wrote:
| The brain serves as a useful analogy, even though LLMs are not
| brains. Just as we can't fully understand how we think by
| merely examining all of our neurons, understanding LLMs
| requires more than analyzing their individual components,
| though decoding LLMs is most likely easier, which doesn't mean
| easy.
| og_kalu wrote:
| SOTA LLMs like GPT-4o can natively understand b64-encoded text.
| Now, we have algorithms that can decode and encode b64 text. Is
| that what GPT-4o is doing? Did training learn that algorithm?
| Clearly not, or at least not completely, because typos in b64
| that would destroy any chance of our algorithms extracting
| meaning from the original text are barely an inconvenience for
| 4o.
|
| So how is it decoding b64 then? We have no idea.
|
| We don't build neural networks. Not really. We build
| architectures and then train them. Whatever they learn is
| outside the scope of human action beyond supplying the training
| data.
|
| What they learn is largely unknown beyond trivial toy examples.
|
| We know connections form, we can see the weights, we can even
| see the matrices multiplying. We don't know what any of those
| calculations are doing. We don't know what they mean.
|
| Would an alien understand C code just because he could see it
| executing?
| HarHarVeryFunny wrote:
| Base64 encoding is very simple - it just takes each 6 bits
| of the input and encodes (replaces) them as one of the 64
| (2^6) characters A-Za-z0-9+/. If the input is 8-bit ASCII
| text, then each 3 input characters will be encoded as 4
| Base64 characters (3 * 8 = 24 bits = 4 * 6-bit Base64
| chunks).
|
| So, this is very similar to an LLM having to deal with
| tokenized input, but instead of sequences of tokens
| representing words you've got sequences of Base64 characters
| representing words.
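|
| A minimal sketch of that 3-bytes-to-4-characters mapping, using
| Python's standard base64 module (toy input of my own):
|
|     import base64
|
|     plain = b"Hi!"                      # 3 bytes = 24 bits
|     encoded = base64.b64encode(plain)   # 4 characters, 6 bits each
|     print(encoded)                      # b'SGkh'
|     print(base64.b64decode(encoded))    # b'Hi!'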
| og_kalu wrote:
| It's not about how simple b64 is or isn't. In fact, I chose a
| simple problem we've already solved algorithmically on purpose.
| It's that everything you've just said, reasonable as it may
| sound, is entirely speculation.
|
| Maybe "no idea" was a bit much for this example, but any idea
| certainly didn't come from watching the matrices themselves fly.
| HarHarVeryFunny wrote:
| Huh? I just pointed out what Base64 encoding actually is
| - not some complex algorithm, but effectively just a
| tokenization scheme.
|
| This isn't speculation - I've implemented Base64
| decode/encode myself, and you can google for the
| definition if you don't believe I've accurately described
| it!
| og_kalu wrote:
| The speculation here is not about what b64 text is. It's
| about how the LLM has learnt to process it.
|
| Edit: Basically, for all anyone knows, it treats b64 as another
| language entirely, and decoding it is, inside the network, more
| akin to translating French than to the very simple swapping
| you've just described.
| HarHarVeryFunny wrote:
| LLMs, just like all modern neural nets, are trained via
| gradient descent which means following the most direct
| path (steepest gradient on the error surface) to reduce
| the error, with no more changes to weights once the error
| gradient is zero.
|
| Complexity builds upon simplicity, and the LLM will begin
| by noticing the direct (and repeated without variation)
| predictive relationship between Base64 encoded text and
| corresponding plain text in the training set. Having
| learnt this simple way to predict Base64
| decoding/encoding, there is simply no mechanism whereby
| it could change to a more complex "like translating
| French" way of doing it. Once the training process has
| discovered that Base64 text decoding can be PERFECTLY
| predicted by a simple mapping, then the training error
| will be zero and no more changes (unnecessary
| complexification) will take place.
| og_kalu wrote:
| Modern neural networks are by no means guaranteed to converge on
| _the_ simplest solution, and examples abound of NNs discovered to
| have learned weird, esoteric algorithms when simpler ones exist.
| The reason why is kind of obvious: the "simplest" solution (the
| one you're alluding to) is, from the perspective of training,
| simply whatever works best first.
|
| It's no secret that the order of data has an impact on what
| the network learns and how quickly; it's just not feasible to
| police that for these giant trillion-token datasets.
|
| If an NN learns a more complex solution that works
| perfectly for a less complex subset it meets later on,
| there is little pressure to switch to the simpler solution.
| Especially when the more complex solution might be more
| robust to any weird permutations it might meet on the
| internet. E.g. there is probably a simpler way to translate
| text that never has typos, and an LLM will never converge
| on it.
|
| Decoding/encoding b64 is not the first thing it will
| learn. It will first learn to predict it as it predicts
| any other language-carrying sequence. Then it will learn
| to translate it, most likely long after learning how to
| translate other languages. All of that will have some
| impact on the exact process it carries out with b64.
|
| And like I said, we already know for a fact it's not just
| doing naive substitution, because it can recover corrupted
| b64 text wholesale in a way our substitution algorithms
| cannot.
| drdeca wrote:
| Isn't the gradient descent used, stochastic gradient
| descent? I think that could matter a little bit.
|
| Also, when the base model is responding to base64 text, most
| of the time the next token is also part of the base64
| text, right? So presumably the first thing to learn would
| be, like, predicting how some base64 text continues,
| which, when the base64 text is an encoding of some ASCII
| text, seems like it would involve picking up on the
| patterns for that?
|
| I would think that there would be both those cases, and
| cases where the plaintext is present before or after.
| kevindamm wrote:
| That's not entirely true in the case of base64 because of
| how statistical patterns within natural languages work.
| For example, you can use frequency analysis to decrypt a
| monoalphabetic substitution cipher on pretty much any
| language if you have a frequency table for character
| n-grams of the language, even with small numbers for n.
| This is much shallower statistical processing than
| what's going on within an LLM, so I don't think many were
| surprised that a transformer stack and attention heads
| could decode base64. Especially if there were also
| examples of base64-encoding in the training data (even
| without parallel corpora for their encodings).
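|
| A minimal sketch of that kind of shallow statistic in Python
| (toy ciphertext of my own, a Caesar shift of 3):
|
|     from collections import Counter
|
|     ciphertext = "wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj"
|     counts = Counter(c for c in ciphertext if c.isalpha())
|     print(counts.most_common(5))
|     # On enough English text, the top cipher symbols line up with
|     # e/t/a/o..., which is all you need to invert the substitution.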
|
| It doesn't explain higher level generalizations like
| being a transpiler between different programming
| languages that didn't have any side-by-side examples in
| the training data. Or giving an answer in the voice of
| some celebrity. Or being able to find entire rhyming word
| sequences across languages. These are probably more like
| the kind of unexplainable generalizations that you were
| referring to.
|
| I think it may be better to frame it in terms of accuracy
| vs precision. Many people can explain accurately what an
| LLM is doing under all those matrix multiplies, both
| during training and inference. But, precisely why an
| input leads to the resulting output is not explainable.
| Being able to do that would involve "seeing" the shape of
| the hypersurface of the entire language model, which as
| sibling commenters have mentioned is quite difficult even
| when aided by probing tools.
| mapt wrote:
| Our DNA didn't build our brain. Not really. Our DNA coded for
| a loose trainable architecture with a lot of features that
| result from emergent design, constraints of congenital
| development, et cetera. Even if you include our full exome, a
| bunch of environmental factors in your simulation, and are
| examining a human with obscenely detailed tools at autopsy,
| you're never going to be able to tell me with any certainty
| whether a given subject possesses the skill 'skateboarding'.
| drdeca wrote:
| I find this analogy kind of confusing? Wouldn't the
| analogous thing be to say that our DNA doesn't understand,
| uh, how we are able to skateboard? But like, we generally
| don't regard DNA as understanding anything, so that's not
| unexpected.
|
| Where does "we can't tell whether a person possesses the
| skill of 'skateboarding'?" fit in with DNA not encoding
| anything specific to skateboarding? It isn't as if we
| designed our genome and therefore if our genome did hard-
| code skateboarding skill that we would therefore (as
| designers of our genome) have full understanding of how
| skateboarding skill works at the neuron level.
|
| I recognize that a metaphor/analogy/whatever does not have
| to extend to all parts of something, and indeed most
| metaphors/analogies/whatever fail at some point if pushed
| too far. But, I don't understand how the commonalities you
| are pointing to between [NN architecture : full NN network
| with the specific weights] and [human genome : the whole
| behavior of a person's brain including all the facts,
| behaviors, etc. that they've learned throughout their life]
| are supposed to apply to the example of _knowing_that_ a
| person knows how to skateboard?
|
| It is quite possible that I'm being dense.
|
| Could you please elaborate on the analogy / the point you
| are making with the analogy?
| spencerchubb wrote:
| We know the process to train a model, but when a model makes a
| prediction we don't know exactly "how" it predicts the way it
| does.
|
| We can use the economy as an analogy. No single person really
| understands the whole supply chain. But we know that each
| person in the supply chain is trying to maximize their own
| profit, and that ultimately delivers goods and services to a
| consumer.
| Lerc wrote:
| We know how they are formed (and how to form them); we don't
| know why forming in that particular way solves the problem at
| hand.
|
| Even this characterization is not strictly valid anymore, there
| is a great deal of research into what's going on inside the
| black box. The problem was never that it was a black box (we can
| look inside at any time), but that it was hard to understand.
| KANs help some of that be placed into mathematical formulation.
| Generating mappings of activations over data similarly grants
| insight.
| mjburgess wrote:
| * Given the training data, and the architecture of the network,
| why does SGD with backprop find the given f? vs. any other of
| an infinite set.
|
| * Why is there a set of f, each with 0-loss, that work?
|
| * Given the weight space, and an f within it, why/when is a
| task/skill defined as a subset of that space covered by f?
|
| I think a major reason why these are hard to answer is that it's
| assumed NNs are operating within an inferential statistical
| context (i.e., reversing some latent structure in the data). But
| they're really bad at that. In my view, they are just
| representation-builders that find proxy representations in a
| proxy "task" space (def, approx., proxy = "shadow of some real
| structure, as captured in an unrelated space").
| _navierstokes wrote:
| Skipping some detail: the model applies many high-dimensional
| functions to the input, and we don't know the reasoning for why
| these functions solve the problem. Reducing the dimension of
| the weights to human-readable values is non-trivial, and
| multiple neurons interact in unpredictable ways.
|
| Interpretability research has resulted in many useful results
| and pretty visualizations[1][2], and there are many efforts to
| understand Transformers[3][4] but we're far from being able to
| completely explain the large models currently in use.
|
| [1] - https://distill.pub/2018/building-blocks/
|
| [2] - https://distill.pub/2019/activation-atlas/
|
| [3] - https://transformer-circuits.pub/
|
| [4] - https://arxiv.org/pdf/2407.02646
| xiaodai wrote:
| It doesn't; that's the problem.
| mansoor_ wrote:
| Not really. For a trivial function-fitting problem, a KAN will
| let you visualise the contribution of each basis function to the
| next layer of your network. Still, these trivial shallow networks
| are the ones nobody needs to introspect. A deep NN will not be
| explainable using this approach.
| Taikonerd wrote:
| Yeah. I'm not sure if anything with millions or billions of
| parameters will ever be "explainable" in the way we want.
|
| I mean, imagine a regular multivariable function with billions
| of terms, written out on a (very big) whiteboard. Are we ever
| really going to understand why it produces the numbers it does?
|
| KANs may have an order of magnitude fewer parameters, but the
| basic problem is still the same.
| afiori wrote:
| I found these articles very interesting in the context of
| future ways to understand LLM/AIs
|
| https://www.astralcodexten.com/p/the-road-to-honest-ai
|
| https://www.astralcodexten.com/p/god-help-us-lets-try-to-
| und...
| etiam wrote:
| Good points.
|
| Personally I'm still basically with Geoff Hinton's early
| conjecture that people will have to choose whether they want
| a model that's easy to explain or one that actually works as
| well as it could.
|
| I'd imagine the really big whiteboard would often be
| understandable in principle, but most people wouldn't be very
| satisfied at having the model go "Jolly good. Set aside the
| next 25 years in your calendar then, and tell me when you're
| ready to start on practicing the prerequisites!".
|
| On the other hand, one might question how often we really
| understand something complex ostensibly "explained" to us,
| rather than just gloss over real understanding. A lot of the
| time people seem to act as if they don't care about really
| knowing it, and just (hopefully!) want to get an inkling
| what's involved and make sure that the process could be
| demonstrated not to be seriously flawed.
|
| The models are being held to standards that are typically not
| applied to people nor to most traditional software. But sure,
| there are also some real issues about reliability, trust and
| bureaucratic certifications.
| scarmig wrote:
| I came across "Learning XOR: exploring the space of a classic
| problem" other day:
| https://www.maths.stir.ac.uk/~kjt/techreps/pdf/TR148.pdf
|
| Even something with three units and two inputs is nontrivial
| to understand on a deep level.
| crazygringo wrote:
| > _Are we ever really going to understand why it produces the
| numbers it does?_
|
| I would expect so, because we can categorize things
| hierarchically.
|
| A medium-sized library contains many billions of words, but
| even with just a Dewey decimal system and a card catalog you
| could find information relatively quickly.
|
| There's no inherent difficulty in understanding what a
| billion terms do, if you're able to just drill down using
| some basic hierarchies. It's just about finding the right
| algorithms to identify and describe the best set of
| hierarchies. Which is difficult, but there's no reason to
| think it won't be solvable in the near term.
| thesz wrote:
| KANs have an O(N^(-4)) scaling law, where N is the number of
| parameters. MLPs have O(N^(-1)) scaling or worse.
|
| Where you would need an MLP with tens of billions of parameters,
| you may need a KAN with only thousands.
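|
| As a rough reading of those exponents (ignoring the constants,
| which differ in practice): under loss ~ N^(-4), cutting the loss
| by 16x costs a 2x larger KAN (2^4 = 16), while under loss ~
| N^(-1) the same reduction costs a 16x larger MLP.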
| stefanpie wrote:
| The main author of KANs gave a tutorial session yesterday at
| MLCAD, an academic conference focused on the intersection of
| hardware / semiconductor design and ML / deep learning. It was
| super fascinating, and KANs seem really good for what they are
| advertised for: gaining insight and interpretability for physical
| systems (symbolic expressions, conserved quantities, symmetries).
| For science and mathematics this can be useful, but for
| engineering this might not be the main priority of an ML / deep
| learning approach (to some extent).
|
| There are still unknowns around learning hard tasks and learning
| capacity on harder problems. Even choices like the basis function
| used for the KAN "activations", and which other architectures
| these layers can be plugged into with some gain, are still
| unexplored. I think as people mess around with KANs we'll get
| better answers to these questions.
| notpublic wrote:
| Presentation by the same author made 2 months back:
|
| https://www.youtube.com/watch?v=FYYZZVV5vlY
| abhgh wrote:
| Is there a publicly available version of the session?
| light_hue_1 wrote:
| They cannot.
|
| Just because one internal operation is understandable, doesn't
| imply that the whole network is understandable.
|
| Take even something much simpler: decision trees. Textbooks give
| these as an example of understandable systems: a tree where you
| make one decision based on one feature at a time, then at the
| leaves you output something. Like a bunch of if statements. And
| in the 90s, when computers were slow and trees were small, this
| was true.
|
| Today massive decision trees and approaches like random forests
| can create trees with millions of nodes. Nothing is interpretable
| about them.
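|
| A minimal scikit-learn sketch (toy numbers of my own, assuming
| sklearn is installed) of how quickly the node counts blow up:
|
|     from sklearn.datasets import make_classification
|     from sklearn.ensemble import RandomForestClassifier
|
|     X, y = make_classification(n_samples=50_000, n_features=40,
|                                random_state=0)
|     forest = RandomForestClassifier(n_estimators=100,
|                                     random_state=0).fit(X, y)
|     # Total number of if-statement nodes across the ensemble:
|     print(sum(t.tree_.node_count for t in forest.estimators_))
|     # typically runs to hundreds of thousands of nodes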
|
| We have a basic math gap when it comes to understanding complex
| systems. Yet another network type solves nothing.
| ImHereToVote wrote:
| A formula or equation that enables you to reason about complex
| systems might simply not exist. It could very well be that
| reasoning about complexity forces you to actually do the
| complexity.
| empath75 wrote:
| Even extremely complicated decision trees are interpretable to
| some extent because you can just walk through the tree and
| answer questions like: "If this had not been true, would the
| result have been different?". It may not be possible to hold
| the entire tree in your head at once, but it's certainly
| possible to investigate the tree as needed to understand the
| path that was taken through it.
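|
| A minimal sketch (mine, using scikit-learn) of "walking the tree
| as needed" for a single prediction:
|
|     from sklearn.datasets import load_iris
|     from sklearn.tree import DecisionTreeClassifier
|
|     X, y = load_iris(return_X_y=True)
|     clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
|
|     sample = X[:1]
|     for node in clf.decision_path(sample).indices:
|         feat = clf.tree_.feature[node]
|         if feat >= 0:  # internal node; leaves are marked with -2
|             thresh = clf.tree_.threshold[node]
|             went_left = sample[0, feat] <= thresh
|             print(f"x[{feat}]={sample[0, feat]:.2f} "
|                   f"{'<=' if went_left else '>'} {thresh:.2f}")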
| svboese wrote:
| But couldn't the same be said about standard MLPs or NNs in
| general?
| empath75 wrote:
| _Sometimes_, and people do find features in neural networks
| by tweaking stuff and seeing how the neurons activate, but
| in general, no. Any given weight or layer or perceptron or
| whatever can be reused for multiple purposes and it's
| extremely difficult to say "this is responsible for that",
| and if you do find parts of the network responsible for a
| particular task, you don't know if it's _also_ responsible
| for something else. Whereas with a decision tree it's
| pretty simple to trace causality and tweak things without
| changing unrelated parts of the tree. Changing weights in a
| neural network leads to unpredictable results.
| tomhallett wrote:
| If a KAN has multiple layers, would tweaking the
| equations of a KAN be more similar to tweaking the
| weights in an MLP/NN, or more similar to tweaking a
| decision tree?
|
| EDIT: I gave the above thread (light_hue_1 > empath75 >
| svboese > empath75) to chatgpt and had it write a
| question to learn more, and it gave me "How do KAN
| networks compare to decision trees or neural networks
| when it comes to tracing causality and making
| interpretability more accessible, especially in large,
| complex models?". Either shows me and ai are on the right
| track, or i'm as dumb as a statistical token guessing
| machine....
|
| https://imgur.com/3dSNZrG
| Scene_Cast2 wrote:
| LIME (local linear approximation basically) is one popular
| technique to do so. Still has flaws (such as not being
| close to a decision boundary).
| pkage wrote:
| LIME and other post-hoc explanatory techniques (deepshap,
| etc.) only give an explanation for a singular inference,
| but aren't helpful for the model as a whole. In other
| words, you can make a reasonable guess as to why a
| specific prediction was made but you have no idea how the
| model will behave in the general case, even on similar
| inputs.
| Narhem wrote:
| The purpose of post-prediction explanations would be to
| increase a practitioner's confidence in using said
| inference.
|
| It's a disconnect between finding a real-life "AI" and
| trying to find something which works and which you can
| place a degree of trust in.
| ljosifov wrote:
| You are right and IDK why you are downvoted. Few units of
| perceptrons, few nodes in a decision tree, few of anything
| - they are "interpretable". Billions of the sames - are not
| interpretable any more. This b/c our understanding of
| "interpretable" is "an array of symbols that can fit a page
| or a white board". But there is no reason to think that all
| the rules of our world would be such that they can be
| expressed that way. Some maybe, others maybe not.
| Interpretable is another platitudinous term that seems
| appealing at 1st sight, only to be found to not be that
| great after all. We humans are not interpretable, we can't
| explain how we come up with the actions we take, yet we
| don't say "now don't move, do nothing, until you are
| interpretable". So - much ado about little.
| t_mann wrote:
| I think of it as "Could Newton have used this to find the
| expressions for the forces he was analyzing (eg gravitational
| force = G m_1 m_2 / d^2)?". I once asked a physics prof whether
| that was conceivable in principle, and he said yes. It seems to
| me like KANs should be able to find expressions like these
| given experimental data. If that was true, then I don't see how
| that wouldn't deserve being called interpretability.
| fjkdlsjflkds wrote:
| > It seems to me like KANs should be able to find expressions
| like these given experimental data.
|
| Perhaps, but this is not something unique to KANs: any
| symbolic regression method can (at least in theory) find such
| simple expressions. Here is an example of such type of work
| (using non-KAN neural networks):
| https://www.science.org/doi/10.1126/sciadv.aay2631
|
| Rephrasing: just because you can reach simple expressions
| with symbolic regression methods based on neural networks (or
| KANs) does not necessarily imply that neural networks (or
| KANs) are inherently interpretable (particularly once you
| start stacking multiple layers).
| nathan_compton wrote:
| Just giving the force law hardly counts as interpretability.
| You probably know that the 1/r^2 in the force law comes from
| the dimensionality of space. That is the interpretation.
| baq wrote:
| yeah. you can run SHAP[0] on your xgboosted trees, results are
| kinda interesting, but it doesn't actually explain anything
| IME.
|
| [0] https://shap.readthedocs.io/en/latest/index.html
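|
| For context, the kind of call being described is roughly (a
| sketch, assuming shap and xgboost are installed; not a claim
| about any particular model):
|
|     import shap, xgboost
|     from sklearn.datasets import make_classification
|
|     X, y = make_classification(n_samples=2000, n_features=10,
|                                random_state=0)
|     model = xgboost.XGBClassifier(n_estimators=200).fit(X, y)
|     explainer = shap.TreeExplainer(model)
|     shap_values = explainer.shap_values(X)  # per-feature attributions
|     shap.summary_plot(shap_values, X)       # the usual overview plot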
| cubefox wrote:
| No wonder. "Shapley values" have the problem that they assume
| all necessary conditions are equally important. Say a
| successful surgery needs both a surgeon and a nurse,
| otherwise the patient dies. Shapley values will then assume
| that both have contributed equally to the successful surgery.
| Which isn't true, because surgeons are much less available
| (less replaceable) than nurses. If the nurse gets ill, a
| different nurse could probably do the task, while if the
| surgeon gets ill, the surgery may well have to be postponed.
| So the surgeons are more important for (contribute more to) a
| successful surgery.
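|
| The surgeon/nurse game worked out explicitly (a toy computation,
| my numbers): v({}) = 0, v({surgeon}) = 0, v({nurse}) = 0,
| v({surgeon, nurse}) = 1.
|
|     from itertools import permutations
|
|     players = ("surgeon", "nurse")
|     def v(coalition):  # surgery succeeds only if both are present
|         return 1.0 if set(coalition) == set(players) else 0.0
|
|     shapley = {p: 0.0 for p in players}
|     orders = list(permutations(players))
|     for order in orders:
|         seen = []
|         for p in order:
|             before = v(seen)
|             seen.append(p)
|             shapley[p] += (v(seen) - before) / len(orders)
|     print(shapley)  # {'surgeon': 0.5, 'nurse': 0.5}
|     # Strict joint necessity splits credit evenly, regardless of
|     # how replaceable each player is -- which is the objection above.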
| adammarples wrote:
| Clearly both are equally important, 100% necessary. This
| doesn't account for rarity, nor does it account for wages,
| agreeability, smell or any of the other things it isn't
| trying to measure. You'll need a different metric for that
| and if you want to take both into account you should.
| cubefox wrote:
| Shapley values try to measure importance of
| contributions, and for this, bare necessity isn't a
| sufficient indicator. I think it comes down to
| probability. The task of the surgeon is, from a prior
| perspective, less likely to be fulfilled because it is
| harder to get hold of a surgeon.
|
| Similarly: What was the main cause of the match
| getting lit? The match being struck? Or the atmosphere
| containing oxygen? Both are necessary in the sense that
| if either hadn't occurred the match wouldn't be lit. But
| it seems clear that the main cause was the match being
| struck, because matches being struck is relatively rare,
| and hence unlikely, while the atmosphere contains oxygen
| pretty much always.
|
| So I think the contributions calculated for Shapley
| values should be weighted by the inverse of their prior
| probabilities. Though it is possible that such
| probabilities are not typically available in the machine
| learning context in which SHAP operates.
| empath75 wrote:
| I have a question, which might not even be related to this -- one
| of the keys to the power of neural networks is exploiting the
| massive parallelism enabled by GPUs, but are we leaving some
| compute on the table by using just scalar weights? What if,
| instead of matrices of weights, they were matrices of functions?
| mglz wrote:
| GPUs are optimized for matrices of floating point values, so
| current neural networks use this as a basis (with matrices
| containing the scalar weights).
| immibis wrote:
| Each row/column (I always forget which way around matrices go)
| of weights followed by a nonlinearity is a learnable function.
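|
| A one-liner version of that (toy numpy sketch, shapes of my own):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
|     f = lambda x: np.maximum(W @ x + b, 0.0)  # one learnable function R^3 -> R^4
|     print(f(np.array([1.0, -0.5, 2.0])))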
| dahart wrote:
| The way to think about NNs is that they are already made of
| functions; groups of layered nodes become complex nonlinear
| functions. For example, a small 3-layer network can learn to
| model a cubic spline function. The internals of the function
| are learned at every step of the way, every addition and
| multiplication. You can assume the number of functions in a
| network is a fraction of the number of weights. This makes the
| NN theoretically more flexible and powerful than modeling it
| using more complex functions, because it learns and adapts each
| and every function during training.
|
| I would assume it's possible that using certain functions to,
| say, stand in for a small fixed-function MLP could result in more
| efficient training, if we know the right functions to use. But
| you could end up losing perf too if not careful. I'd guess the
| main problems are that we don't know what functions to use, and
| adding nonlinear functions might come with added difficulty
| wrt performance and precision and new modes of initialization
| and normalization. Linear math is easy and powerful and already
| capable of modeling complex functions, but nonlinear math might
| be useful I'd guess... needs more study! ;)
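|
| A minimal PyTorch sketch (toy of my own, assuming torch is
| installed) of that cubic example -- a small stack of linear+tanh
| layers fitting y = x^3:
|
|     import torch
|     import torch.nn as nn
|
|     x = torch.linspace(-2, 2, 256).unsqueeze(1)
|     y = x ** 3
|     net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
|                         nn.Linear(32, 32), nn.Tanh(),
|                         nn.Linear(32, 1))
|     opt = torch.optim.Adam(net.parameters(), lr=1e-2)
|     for _ in range(2000):
|         opt.zero_grad()
|         loss = ((net(x) - y) ** 2).mean()
|         loss.backward()
|         opt.step()
|     print(loss.item())  # small: the learned sub-functions compose into ~x^3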
| ocular-rockular wrote:
| What you're describing is very similar to deep Gaussian
| processes.
| esafak wrote:
| Recently discussed in
| https://news.ycombinator.com/item?id=40219205
| IWeldMelons wrote:
| Fad.
| CamperBob2 wrote:
| What evidence would change your mind?
| throwaway2562 wrote:
| The point of interpretability in scientific applications is
| symbolic regression - MLPs cannot always spit out an equation for
| some data set; KANs can.
| buildbot wrote:
| I thought that MLPs are universal function approximators?
| https://en.wikipedia.org/wiki/Universal_approximation_theore...
| js8 wrote:
| I don't know what KANs are, but from the informal description in
| the article ("turn a function of many variables into many
| functions of a single variable"), it sounds reminiscent of lambda
| calculus.
| samus wrote:
| Nope, that's just currying and/or partial application.
| triclops200 wrote:
| The (semi-)automatic simplification algorithm provided in the
| paper for KANs seems, to me, like it's solving a similar problem
| to https://arxiv.org/pdf/2112.04035, but with the additional
| constraint of forward functional interpretability as the goal
| instead of just a generalized abstraction compressor.
___________________________________________________________________
(page generated 2024-09-12 23:00 UTC)