[HN Gopher] My Python code is a neural network
___________________________________________________________________
My Python code is a neural network
Author : gnyeki
Score : 235 points
Date : 2024-07-01 12:47 UTC (10 hours ago)
(HTM) web link (blog.gabornyeki.com)
(TXT) w3m dump (blog.gabornyeki.com)
| lawlessone wrote:
| Edit: ok, I see, it detects code.
|
| I thought it was replacing bits of the ANN with custom Python
| functions.
| sdwr wrote:
| This is new to me, and therefore bad and scary.
|
| It's great that you know NNs well enough to fold them into
| regular work. But think of all us poor regular developers! Who
| now have to grapple with:
|
| - an unfamiliar architecture
|
| - uncertainty / effectively non-deterministic results in program
| flow
| sva_ wrote:
| NN are in principle deterministic (unless you add randomness to
| it such as is the case with LLM top p/k temperature).
|
| Uncertainty is probably the better word of the two, but I feel
| like there should be a different term.
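|
| To make this concrete, a minimal sampling sketch (NumPy; the
| function and its defaults are illustrative, not from any
| particular library): greedy argmax decoding is a pure function
| of the logits, and randomness only enters at the sampling step.
|
|       import numpy as np
|
|       rng = np.random.default_rng()
|
|       def sample_token(logits, temperature=1.0, top_k=None):
|           # Greedy decoding (argmax) would be deterministic; the
|           # randomness comes from the final rng.choice draw.
|           logits = np.asarray(logits, dtype=float)
|           if top_k is not None:
|               # top-k: mask everything below the k-th largest logit
|               cutoff = np.sort(logits)[-top_k]
|               logits = np.where(logits < cutoff, -np.inf, logits)
|           # temperature rescales the distribution before sampling
|           p = np.exp((logits - logits.max()) / temperature)
|           p /= p.sum()
|           return rng.choice(len(p), p=p)
|
|       print(sample_token([2.0, 1.0, 0.5], temperature=0.8, top_k=2))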
| alsxnt wrote:
| Recurrent neural networks can be used for arbitrary computations;
| the equivalence to Turing machines has been proven. However, they
| are utterly impractical for the task.
|
| This seems to be a state machine that is somehow learned. The
| article could benefit from a longer synopsis, and "Python" does
| not appear to be relevant at all. Learning real Python semantics
| would prove quite difficult due to the nature of the language (no
| standard, just do as CPython does).
| danans wrote:
| > Recurrent neural networks can be used for arbitrary
| computations; the equivalence to Turing machines has been
| proven. However, they are utterly impractical for the task.
|
| Karpathy's 2015 RNN article [1] demonstrated that RNNs trained
| character-wise on Shakespeare's works could produce
| Shakespeare-esque text (albeit without the narrative coherence
| of LLMs). Given that, why wouldn't they be able to handle
| natural language as formulaic as code review comments?
|
| In that case inference was run with randomized inputs in order
| to generate random "Shakespeare", but the structure of the
| language and style was still learned by the RNN. Perhaps it
| could be used for classification also.
|
| 1. https://karpathy.github.io/2015/05/21/rnn-effectiveness/
| vidarh wrote:
| For RNN abilities, RWKV is worth a look[1]
|
| It's billed as "an RNN with GPT-level LLM performance".
|
| [1] https://www.rwkv.com/
| danans wrote:
| I'd like to see a cost vs. precision/recall comparison of using
| an RNN vs. an LLM (local or API) for a problem like this.
| danans wrote:
| Hah... Ask, and HN (sorta) delivers. Not RNN, but self-
| finetuned LLM cost vs performance compared to GPT-4.
|
| https://openpipe.ai/blog/mixture-of-agents
| Fripplebubby wrote:
| Love this post! Gets into the details of what it _really_ means
| to take some function and turn it into an RNN, and compares that
| to the "batteries included" RNNs that ship with PyTorch, as a
| learning experience.
|
| Question:
|
| > To model the state, we need to add three hidden layers to the
| network.
|
| How did you determine that it would be three hidden layers? Is it
| a consequence of the particular rule you were implementing, or is
| that generally how many layers you would use to implement a rule
| of this shape (using your architecture rather than Elman's -
| could we use fewer layers with Elman's?)?
| gnyeki wrote:
| I'm glad you found it valuable! Both are good questions and I
| haven't gone far enough mapping the code to Elman's
| architecture to know the answer to the second.
|
| For your first question, using three hidden layers makes it a
| little clearer what the network does. Each layer performs one
| step of the calculation. The first layer collects what is known
| from the current token and what we knew after the calculation
| for the previous token. The second layer decides whether the
| current token looks like program code, by checking if it
| satisfies the decision rule. The third layer compares the
| decision with what we decided for previous tokens.
|
| I think that this could be compressed into a single hidden
| layer, too. A ReLU should be good enough at capturing non-
| linearities so this should work.
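|
| In PyTorch-ish code, one step of that recurrence might look
| roughly like this (a sketch of the description above, not the
| article's actual code; names and sizes are illustrative):
|
|       import torch
|       import torch.nn as nn
|
|       class ThreeLayerCell(nn.Module):
|           def __init__(self, n_token, n_hidden, n_state):
|               super().__init__()
|               # h1: combine the current token with the carried state
|               self.h1 = nn.Linear(n_token + n_state, n_hidden)
|               # h2: decide whether this token looks like program code
|               self.h2 = nn.Linear(n_hidden, n_hidden)
|               # h3: reconcile that with decisions for earlier tokens
|               self.h3 = nn.Linear(n_hidden, n_state)
|
|           def forward(self, token, state):
|               x = torch.cat([token, state], dim=-1)
|               x = torch.relu(self.h1(x))
|               x = torch.relu(self.h2(x))
|               return torch.relu(self.h3(x))  # state for next token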
| Fripplebubby wrote:
| Ah, that makes sense. So, we consider two hidden layers more
| as "memory" or "buffers", and actually the rule is
| implemented in just one layer, at least for a single token.
| ryjo wrote:
| Really awesome. Thanks for this thorough write-up. I don't
| totally understand the deeper math concepts mentioned in this
| article around RNNs, but it's sparked some of my own thoughts. It
| feels similar to things I've been exploring lately, that is:
| building your app interwoven with forward-chaining algorithms. In
| your case, you're using RNNs, and in mine, I'm building on the
| Rete algorithm.
|
| You also touch on something in this article that I've found quite
| powerful: putting things in terms of digesting an input string
| character-by-character. Then, we offload all of the reasoning
| logic to our algorithm. We write very thin i/o logic, and then
| the algorithm does the rest.
| pakl wrote:
| There exists the Universal (Function) Approximation Theorem for
| neural networks -- which states that they can represent/encode
| any function to a desired level of accuracy[0].
|
| However there does not exist a theorem stating that those
| approximations can be learned (or how).
|
| [0]
| https://en.m.wikipedia.org/wiki/Universal_approximation_theo...
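|
| For reference, one standard statement: for a fixed non-polynomial
| activation sigma, any continuous f on a compact K, a subset of
| R^n, can be approximated uniformly by a one-hidden-layer network:
|
|       \forall \varepsilon > 0\ \exists N, c_i, w_i, b_i :\quad
|       \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} c_i\,
|       \sigma(w_i^\top x + b_i) \Big| < \varepsilon
|
| Nothing in the statement bounds N or says that gradient descent
| will find such weights, which is exactly the gap noted above.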
| arketyp wrote:
| Makes you wonder what is meant by learning...
| dekhn wrote:
| Learning is using observations to create/update a model that
| makes predictions which are more accurate than chance. At some
| point the model ends up generalizing beyond its training domain.
| jb1991 wrote:
| FYI, there are actually many algorithms, some much older than
| neural networks, that have been proven to be universal function
| approximators. Neural networks are certainly not the only ones,
| nor the first. Quite a few are actually much more appropriate
| than a neural network for many cases.
| derangedHorse wrote:
| What other algorithms can do this, and in which situations would
| they be more useful than neural networks?
| gnyeki wrote:
| This area is covered by non-parametric statistics more
| generally. There are many other methods to non-
| parametrically estimate functions (that satisfy some
| regularity conditions). Tree-based methods are one family
| of such methods, and the consensus still seems to be that
| they perform better than neural networks on tabular data.
| For example:
|
| https://arxiv.org/abs/2106.03253
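|
| (A quick way to poke at this yourself, sketched with
| scikit-learn; the dataset is an arbitrary built-in one, not the
| paper's benchmark:)
|
|       from sklearn.datasets import load_breast_cancer
|       from sklearn.ensemble import GradientBoostingClassifier
|       from sklearn.model_selection import cross_val_score
|       from sklearn.neural_network import MLPClassifier
|
|       X, y = load_breast_cancer(return_X_y=True)
|       # compare a tree ensemble with an untuned MLP on tabular data
|       for model in (GradientBoostingClassifier(),
|                     MLPClassifier(max_iter=2000)):
|           score = cross_val_score(model, X, y, cv=5).mean()
|           print(type(model).__name__, round(score, 3))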
| someoneontenet wrote:
| Newton's method approximates square roots. It's useful if you
| want to approximate something like that without pulling in the
| computational power required by a NN.
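|
| E.g., in a few lines of Python (the standard textbook iteration,
| not from the article):
|
|       def newton_sqrt(a, tol=1e-12):
|           # Newton's method on f(x) = x^2 - a: x -> (x + a/x) / 2
|           x = a if a > 1 else 1.0
|           while abs(x * x - a) > tol:
|               x = (x + a / x) / 2
|           return x
|
|       print(newton_sqrt(2.0))  # 1.4142135623730951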
| kristjansson wrote:
| Newton's method is related to universal function approximation
| in the same way a natural oil seep is related to a modern IC
| engine...
| astrobe_ wrote:
| I think the problem to solve is more like: given a set of inputs
| and outputs, find a function that gives the expected output for
| each input [1]. This is like Newton's method at a higher order
| ;-). One can find such a tool in Squeak or Pharo Smalltalk,
| IIRC.
|
| [1] https://stackoverflow.com/questions/1539286/create-a-
| functio...
| jb1991 wrote:
| https://www.quora.com/Is-there-any-universal-function-
| approx...
| richrichie wrote:
| Not any function, though. There are restrictions on the type of
| functions the "universal" approximation theorem is applicable
| to. Interestingly, the theorem is about a single-hidden-layer
| network. In practice, that does not work as well as having many
| layers.
| visarga wrote:
| They can model only continuous functions, more specifically any
| continuous function on a compact subset of R^n. They can
| approximate such functions to an arbitrary level of accuracy,
| given sufficiently many neurons.
| montebicyclelo wrote:
| People throw that proof around all the time, but all it does is
| show that a neural net is equivalent to a lookup table, and a
| lookup table with enough memory can approximate any function.
| It's miles away from explaining how real-world, useful neural
| nets, like conv-nets, transformers, LSTMs, etc., actually work.
| scotchmi_st wrote:
| This is an interesting article if you read it as a how-to for
| constructing a neural network to perform a practical task. But
| if you take it at face value, and follow a similar method the
| next time you need to parse some input, then, well, I don't know
| what to say really.
|
| The author takes a hard problem (parsing arbitrary input for
| loosely-defined patterns), and correctly argues that this is
| likely to produce hard-to-read 'spaghetti' code.
|
| They then suggest replacing that with code that is so hard to
| read that there is still active research into how it works
| (i.e., a neural net).
|
| Don't over-index on something that's inscrutable versus something
| that you can understand but is 'ugly'. Sometimes, _maybe_, an ML
| model is what you want for a task. But a lot of the time,
| something that you can read and see why it's doing what it's
| doing, even if that takes some effort, is better than something
| where that's impossible.
| thoughtlede wrote:
| I think the mention of 'spaghetti code' is a red herring from
| the author. If the output of an algorithm cannot be defined
| precisely as a function of its input, but you have some
| examples to show, that's where machine learning (ML) is useful.
|
| In the end, ML provides one more option to choose from. Whether
| it works for you depends on evaluations and on how much
| determinism and explainability you need from the chosen
| algorithm/option.
|
| The thing that struck me is whether an RNN is the right choice,
| given that it would need to be trained and we would need more
| examples than we might have. That said, maybe based on known
| 'rules' we can produce synthetic data for both +ve and -ve
| cases.
| jlturner wrote:
| If this interests you, it's worth taking a look at Genetic
| Programming. I find it to be a simpler approach to the same
| problem, no math required. It simply recombines programs by
| their AST and, given some heuristic, optimizes the program for
| it. The magic is in your heuristic function, where you choose
| what you want to optimize for (e.g. speed, program length,
| minimizing complex constructs or function calls, network
| efficiency, some combination thereof, etc.); see the toy sketch
| after the link.
|
| https://youtu.be/tTMpKrKkYXo
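|
| A toy version to make the idea concrete (everything here is
| illustrative: the target function, the operator set, and the
| mutation scheme; real GP systems also do subtree crossover
| between parents):
|
|       import operator, random
|
|       OPS = [(operator.add, "+"), (operator.sub, "-"),
|              (operator.mul, "*")]
|       TERMINALS = ["x", 1.0, 2.0]
|
|       def random_tree(depth=3):
|           if depth == 0 or random.random() < 0.3:
|               return random.choice(TERMINALS)
|           return (random.choice(OPS), random_tree(depth - 1),
|                   random_tree(depth - 1))
|
|       def evaluate(tree, x):
|           if tree == "x":
|               return x
|           if isinstance(tree, float):
|               return tree
|           (fn, _), left, right = tree
|           return fn(evaluate(left, x), evaluate(right, x))
|
|       def fitness(tree):
|           # the heuristic: squared error against the target x^2 + x
|           return sum((evaluate(tree, x) - (x * x + x)) ** 2
|                      for x in range(-5, 6))
|
|       def mutate(tree):
|           if random.random() < 0.1:
|               return random_tree(2)  # replace a random subtree
|           if not isinstance(tree, tuple):
|               return tree
|           return (tree[0], mutate(tree[1]), mutate(tree[2]))
|
|       pop = [random_tree() for _ in range(200)]
|       for _ in range(30):
|           pop.sort(key=fitness)  # keep the fittest, mutate them
|           pop = pop[:50] + [mutate(random.choice(pop[:50]))
|                             for _ in range(150)]
|       print(min(fitness(t) for t in pop))  # best program's error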
| PixelN0va wrote:
| hmm thanks for the link
| nickpsecurity wrote:
| I'll add the Humies Awards, which highlight human-competitive
| results. One can learn a lot about what can and can't be done in
| this field just by skimming the submitted papers.
|
| https://www.human-competitive.org/
| dekhn wrote:
| Are RNNs completely subsumed by transformers? I.e., can I forget
| about learning how to work with RNNs, and instead focus on
| transformers?
| Voloskaya wrote:
| Not if you want to be a PhD/researcher in ML; yes otherwise.
|
| Source: working on ML/LLMs as a research engineer for the past 7
| years, including for one of the FAANG research labs. I always
| wanted to take time to learn about RNNs but never did and never
| needed to.
| rolisz wrote:
| Oh, I'm sure plenty of recent PhDs don't know about RNNs.
| They've been dropped like a hot potato in the last 4-5 years.
| Voloskaya wrote:
| I think to do pure research it's definitely worth knowing about
| the big ideas of the past, why we moved on from them, what we
| learned, etc.
| derangedHorse wrote:
| I haven't read it in a while, but I remember this post giving a
| good rundown of RNNs:
|
| https://dennybritz.com/posts/wildml/recurrent-neural-
| network...
| toxik wrote:
| Transformers have finite context; RNNs don't. In practice the
| RNN gradient signal is limited by backpropagation through time:
| it decays. This is in fact the whole selling point of
| transformers: association is no harder or easier at short or
| long range. But in theory an RNN can remember infinitely far
| back.
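|
| A toy illustration of the decay (tanh RNN with a small random
| weight matrix; the numbers are arbitrary):
|
|       import torch
|
|       W = torch.randn(8, 8) * 0.3
|       h0 = torch.randn(8, requires_grad=True)
|       h = h0
|       for _ in range(100):  # 100 recurrent steps
|           h = torch.tanh(W @ h)
|       h.sum().backward()
|       print(h0.grad.norm())  # gradient through time is ~0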
| Fripplebubby wrote:
| To further problematize this question (which I don't feel like
| I can actually answer), consider this paper: "Transformers are
| RNNs: Fast Autoregressive Transformers with Linear Attention" -
| https://arxiv.org/pdf/2006.16236
|
| What this shows is that a specific, narrow definition of
| transformer (one with "causal masking"; see the paper) is
| equivalent to an RNN, and vice versa.
|
| Similarly, Mamba (https://arxiv.org/abs/2312.00752), the other
| hot architecture at the moment, has a unit equivalent to a gated
| RNN. For performance reasons, I believe they use an equivalent
| CNN during training and an RNN during inference!
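|
| The recurrent form is short enough to sketch (following the
| paper's construction with its elu(x)+1 feature map; the shapes
| and names here are mine):
|
|       import torch
|       import torch.nn.functional as F
|
|       def phi(x):
|           return F.elu(x) + 1  # the paper's feature map
|
|       def linear_attention_rnn(q, k, v):
|           # q, k: (T, d); v: (T, d_v); causal, one step per token
|           T, d = q.shape
|           S = torch.zeros(d, v.shape[1])  # sum of phi(k_i) v_i^T
|           z = torch.zeros(d)              # sum of phi(k_i)
|           out = []
|           for t in range(T):
|               S = S + torch.outer(phi(k[t]), v[t])
|               z = z + phi(k[t])
|               out.append(phi(q[t]) @ S / (phi(q[t]) @ z))
|           return torch.stack(out)
|
| Note the state (S, z) has a fixed size, unlike the growing KV
| cache of a softmax transformer.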
| visarga wrote:
| There still are important distinctions. RNNs have constant
| memory while transformers expand their memory with each new
| token. They are related, but one could in theory process an
| unbounded sequence while the other cannot because of growing
| memory usage.
| Fripplebubby wrote:
| To be more concrete: you might decide not to learn about
| RNNs, but still find them lurking in the things you did learn
| about!
| thih9 wrote:
| > Of course, we should try and avoid writing spaghetti code if we
| can. But there are problems that are so ill-specified that any
| serious attempt to solve them results in just that.
|
| Can you elaborate or do you have an example?
|
| Based on just the above, I disagree - I'd say it's the job of the
| programmer to make sure that the problem is well-specified and
| that they can write maintainable code.
| fnord77 wrote:
| > To model the state, we need to add three hidden layers to the
| network
|
| Why 3?
|
| And why use "h" for layer names?
| muricula wrote:
| `h` is for "hidden" layers.
| skybrian wrote:
| This article doesn't talk much about testing or getting training
| data. It seems like that part is key.
|
| For code that you think you understand, it's because you've
| informally proven to yourself that it has some properties that
| generalize to all inputs. For example, a sort algorithm will sort
| any list, not just the ones you tested.
|
| The thing we're uncertain about for a neural network is that we
| don't know how it will generalize; there are no properties that
| we think are guaranteed for unseen input, even if it's slightly
| different input. It might be because we have an ill-specified
| problem and we don't know how to mathematically specify what
| properties we want.
|
| If you can actually specify a property well enough to write a
| property-based test (like QuickCheck), then you can generate
| large amounts of tests / training data through randomization.
| Start with one example of what you want, then write tests that
| generate every possible version of both positive and negative
| examples.
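|
| For example, with Hypothesis (the property and the classifier
| here are invented for illustration):
|
|       from hypothesis import given, strategies as st
|
|       def looks_like_code(line: str) -> bool:
|           # stand-in for the real classifier (hand-written or
|           # neural)
|           return (line.startswith("    ")
|                   and line.rstrip().endswith(":"))
|
|       # Property: an indented Python def line always counts as
|       # code. Each generated example doubles as a labeled
|       # training pair.
|       @given(st.from_regex(r"[a-z_]{1,10}", fullmatch=True))
|       def test_def_line_is_code(name):
|           assert looks_like_code(f"    def {name}(self):")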
|
| It's not a proof, but it's a start. At least you know what you
| would prove, if you could.
|
| If you have such a thing, relying on spaghetti code or a neural
| network seems kind of similar? If you want another property to
| hold, you can write another property-based test for it. I
| suppose with the neural network you can train it instead of
| making the edits yourself, but then again we have AI assistance
| for code fixes.
|
| I think I'd still trust code more. At least you can debug it.
| dinobones wrote:
| This article was going decently and then it just falls off a
| cliff.
|
| The article basically says:
|
| 1) Here's this complex problem
|
| 2) Here's some hand-written heuristics
|
| 3) Here's a shitty neural net
|
| 4) Here's another neural net with some guy's last name, from the
| PyTorch library
|
| 5) Here are the constraints of adopting neural nets
|
| You can see why this is so unsatisfying; the leaps in logic
| become more and more generous.
|
| What I would have loved to see is a comparison of a spaghetti-
| code implementation vs. a neural-net implementation on a large
| dataset/codebase, then examples in the validation set that the
| neural net _generalizes to_ but the heuristic fails at, or vice
| versa, and so on.
|
| This would demonstrate the value of neural nets: if, for
| example, there's a novel example that the neural net finds but
| the spaghetti heuristic can't.
|
| Show tangible results, show some comparison, show something;
| giving some rough numbers on the performance of each in
| aggregate would be really useful.
| ultra_nick wrote:
| I feel like neural networks are increasingly going to look like
| code.
|
| The next big innovation will come from whoever figures out how
| to convert MoE-style models into something like function calls.
| godelski wrote:
| > Humans are bad at managing spaghetti code. Of course, we should
| try and avoid writing spaghetti code if we can. But there are
| problems that are so ill-specified that any serious attempt to
| solve them results in just that.
|
| Sounds like a skill issue.
|
| But seriously, how many programmers do you know who reach for
| the docs or help pages (man pages?) instead of just looking for
| the first SO post with a similar question? That's how you start
| programming, because you're just trying to figure out how to do
| anything in the first place, but it's not where you should be
| years later. If you've been programming in a language for years,
| you should have read a good portion of the docs in that time (in
| addition to SO posts), blogs, and so much more. Things change,
| too, so you have to keep up, and the truth is that this will
| never happen if you just read SO posts to answer your one
| question (and the next, and the next), because SO will always
| lag behind what tools exist, and will likely lag significantly,
| because more recent posts have had less time to gain upvotes.
|
| It kinda reminds me of the meme "how to exit vim" and how people
| claim that it is so hard to learn. Not only does just typing
| `vim` into the terminal literally tell you how to quit, but
| there's a built-in `vimtutor` that'll teach you how to use it
| and doesn't take very long to work through. I've seen people go
| through it and come out better than people who have "used" vim
| for years. And even then, how many people type `:help
| someFunction` into vim itself? It is FAR better than googling
| your question, and you'll actually end up learning how the whole
| thing fits together because it gives you context. The same is
| true for literally any programming language.
|
| You should also be writing docs for your code, because if you
| have spaghetti code, there's a puzzle you haven't solved yet.
| And guess what, documenting is not too different from the rubber
| ducky method. Here's the procedure: write code to make shit
| work; write docs, editing your code as you realize you can make
| things better; then repeat, revisiting functions as you fuck
| them up with other functions. It's not nearly as much work as it
| sounds, and the investments compound. But quality takes time,
| and nothing worth doing is easy. It takes time to learn any
| habit and skill. If you always look for the quickest solution to
| "just get it done" and never come back, then you probably
| haven't learned anything; you've just parroted someone else.
| Moving fast and breaking things is great, but once you have done
| that you've got to clean up your mess. You don't clean your
| kitchen by breaking your dining room table. And your house isn't
| clean if all your dishes are on the table! You might have to
| temporarily move stuff around, but eventually you need to clean
| shit up. And code is exactly the same way. If you regularly
| clean your house, it stays clean and is easy to keep clean. But
| if you do it once a year, it is a herculean effort that you'll
| dread.
| suzukigsx1100g wrote:
| That's pretty light work for a snake, in general. Send me a
| direct message if you come up with something better.
___________________________________________________________________
(page generated 2024-07-01 23:00 UTC)