[HN Gopher] Reverse engineering a neural network's clever soluti...
       ___________________________________________________________________
        
       Reverse engineering a neural network's clever solution to binary
       addition (2023)
        
       Author : Ameo
       Score  : 70 points
       Date   : 2025-11-04 07:22 UTC (4 days ago)
        
 (HTM) web link (cprimozic.net)
 (TXT) w3m dump (cprimozic.net)
        
       | IlikeKitties wrote:
        | >As I mentioned before, I had imagined the network learning some
       | fancy combination of logic gates to perform the whole addition
       | process digitally, similarly to how a binary adder operates. This
       | trick is yet another example of neural networks finding
       | unexpected ways to solve problems.
       | 
        | My intuition is that this kind of solution can be approached
        | gradually via gradient descent, which is why it's unintuitive
        | to us: we tend to think of solutions as all-or-nothing and
        | look for complete ones.
        
         | arjvik wrote:
          | The more interesting question is: is it even possible to
          | learn the logic-gate solution through gradient descent?
        
           | scarmig wrote:
           | You could riff off an approach similar to https://google-
           | research.github.io/self-organising-systems/di...
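            | 
            | A toy sketch of the idea behind that link (my own
            | illustration, not the library's actual API): each "gate"
            | is a softmax mixture over soft boolean functions, so
            | gradient descent can tune which gate it becomes.
            | 
            |     import numpy as np
            | 
            |     # Soft (probabilistic) versions of a few 2-input
            |     # gates, valid for a, b in [0, 1].
            |     def soft_gates(a, b):
            |         return np.array([
            |             a * b,              # AND
            |             a + b - a * b,      # OR
            |             a + b - 2 * a * b,  # XOR
            |             1.0 - a * b,        # NAND
            |         ])
            | 
            |     # A "gate" is a softmax mixture over the candidates;
            |     # the logits are what gradient descent would learn,
            |     # and the argmax afterwards is the discrete gate.
            |     def diff_gate(a, b, logits):
            |         w = np.exp(logits - logits.max())
            |         w /= w.sum()
            |         return w @ soft_gates(a, b)
            | 
            |     logits = np.array([0.0, 0.0, 5.0, 0.0])  # ~XOR
            |     print(diff_gate(1.0, 0.0, logits))  # ~1
            |     print(diff_gate(1.0, 1.0, logits))  # ~0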
        
         | elteto wrote:
         | Right, binary gates are discrete elements but neural networks
         | operate on a continuous domain.
         | 
         | I'm reminded of the Feynman anecdote when he went to work for
         | Thinking Machines and they gave him some task related to
         | figuring out routing in the CPU network of the machine, which
         | is a discrete problem. He came back with a solution that used
         | partial differential equations, which surprised everyone.
        
       | rnhmjoj wrote:
       | Original submission:
       | https://news.ycombinator.com/item?id=34399142
        
       | drougge wrote:
       | This seems interesting, but I got stuck fairly early on when I
       | read "all 32,385 possible input combinations". There are two 8
       | bit numbers, 16 totally independent bits. That's 65_536
        | combinations. 32_385 is close to half that, but not quite.
       | Looking at it in binary it's 01111110_10000001, i.e. two 8 bit
       | words that are the inverse of each other. How was this number
       | arrived at, and why?
       | 
        | Looking further on, there's also a strange DAC that gives the
        | lowest resistance to the least significant bit, thus making it
        | the biggest contributor to the output. Very confusing.
        
         | dahart wrote:
         | Is that the number of adds that don't overflow an 8-bit result?
         | 
         | On that hunch, I just checked and I get 32896.
         | 
         | Edit: if I exclude either input being zero, I get 32385.
         | 
         | You also get the same number when including input zeros but
          | excluding results above 253. But I'd bet the author's reason
          | was filtering out zero inputs. Maybe the NN does something bad
          | with zeros, or maybe it can't learn them for some reason.
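          | 
          | A quick brute-force check of those counts (plain Python;
          | nothing here is taken from the author's setup):
          | 
          |     pairs = [(a, b) for a in range(256)
          |                     for b in range(256)]
          | 
          |     no_overflow = sum(a + b <= 255 for a, b in pairs)
          |     no_zeros = sum(a + b <= 255 and a > 0 and b > 0
          |                    for a, b in pairs)
          |     capped_253 = sum(a + b <= 253 for a, b in pairs)
          | 
          |     print(no_overflow)  # 32896
          |     print(no_zeros)     # 32385
          |     print(capped_253)   # 32385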
        
         | jtsiskin wrote:
         | Interesting puzzle. 32385 is 255 pick 2. My guess would be, to
         | hopefully make interpretation easier, they always had the
         | larger number on one side. So (1,2) but not (2,1). And also 0
          | wasn't included. So perhaps their generation loop looked
          | something like: [(i, j) for i in range(255, 0, -1)
          | for j in range(i - 1, 0, -1)]
        
         | joshribakoff wrote:
         | You are potentially conflating combinations with permutations.
        
       | bob1029 wrote:
       | > While playing around with this setup, I tried re-training the
       | network with the activation function for the first layer replaced
       | with sin(x) and it ends up working pretty much the same way.
       | 
       | There is some evidence that the activation functions and weights
       | can be arbitrarily selected assuming you have a way to evolve the
       | topology of the network.
       | 
       | https://arxiv.org/abs/1906.04358
        
       | anon291 wrote:
       | Very nice. I think people don't appreciate enough the
       | correspondence between linear algebra, differential equations,
       | and wave behavior.
       | 
        | Roughly speaking, it seems the network is essentially converting
        | binary digits to orthogonal basis functions, manipulating those
        | basis functions, and finally applying a linear transformation
        | back into the binary-digit space.
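        | 
        | One way to make that concrete (my own toy, not the author's
        | network): sum the inputs as a single analog value, then read
        | each output bit off a sinusoid whose period matches that bit.
        | 
        |     import numpy as np
        | 
        |     def bits(n):  # LSB-first bit vector of an 8-bit int
        |         return np.array([(n >> k) & 1 for k in range(8)])
        | 
        |     def add_8bit(a_bits, b_bits):
        |         # "DAC": binary-weighted sum into one analog value.
        |         w = 2.0 ** np.arange(8)
        |         total = w @ a_bits + w @ b_bits
        |         # Bit k of n toggles with period 2**(k+1), so the
        |         # sign of a sine with that period (shifted by half
        |         # a step) recovers it; wrapping mod 256 is free.
        |         k = np.arange(8)
        |         phase = np.pi * (total + 0.5) / 2.0 ** k
        |         return (np.sin(phase) < 0).astype(int)
        | 
        |     a, b = 183, 99
        |     print(add_8bit(bits(a), bits(b)))  # bits of 26
        |     print(bits((a + b) % 256))         # identical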
        
       | YeGoblynQueenne wrote:
       | >> I created training data by generating random 8-bit unsigned
       | integers and adding them together with wrapping.
       | 
        | So, binary addition mod 256, with operands in [0, 255] (base
        | 10). Did the author try the trained network on numbers outside
        | the training range?
       | 
       | It's one thing to find that your neural net discovered this one
       | neat trick for binary addition with 8-bit numbers, and something
       | completely different to find that it figured out binary addition
       | in the general case.
       | 
       | How hard the latter would be... depends. What were the activation
       | functions? E.g. it is quite possible to learn how to add two
       | (arbitrary, base-10) integers with a simple regression for no
       | other reason than regression being itself based on addition (ok,
       | summation).
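        | 
        | As a tiny illustration of that last point (my own example): a
        | plain linear regression trained on sums learns weights of ~1
        | and ~1 and extrapolates to any integers, simply because the
        | model itself is a weighted sum. (No wrapping here, unlike the
        | article's task.)
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     X = rng.integers(0, 256, size=(1000, 2)).astype(float)
        |     y = X.sum(axis=1)  # plain addition, no wrapping
        | 
        |     # Least-squares fit of y ~ w0*a + w1*b + bias.
        |     A = np.hstack([X, np.ones((1000, 1))])
        |     coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        |     print(np.round(coef, 6))              # ~[1, 1, 0]
        |     print(coef[0] * 1e6 + coef[1] * 2e6)  # ~3e6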
        
       | xg15 wrote:
       | This is really cool and I hope there will be more experiments
       | like this.
       | 
        | My takeaway is also that we don't really have a good intuition
        | yet for how the internal representations of neural networks
        | "work", or what kinds of internal representations can even be
        | learned through SGD+backpropagation. (And also how those
        | representations depend on the architecture.)
       | 
       | Like in this case, where the author first imagined the network
       | would learn a logic network, but the end result was more like an
       | analog circuit.
       | 
        | It's possible to construct the "binary adder" network the author
        | imagined "from scratch" by hand-picking the weights. But the
        | interesting question is whether it could also be learned, or
        | whether SGD would always produce an "analog" solution like this
        | one.
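        | 
        | For reference, a hand-weighted version of that imagined adder
        | (my own construction, not from the article): steep sigmoids
        | act as threshold gates wired into a ripple-carry adder.
        | Whether SGD could ever land on weights like these is exactly
        | the open question.
        | 
        |     import numpy as np
        | 
        |     def unit(x, w, bias, gain=20.0):
        |         # One neuron: steep sigmoid ~ a threshold gate.
        |         z = np.dot(w, x) + bias
        |         return 1.0 / (1.0 + np.exp(-gain * z))
        | 
        |     def ripple_adder(a_bits, b_bits):  # LSB-first
        |         carry, out = 0.0, []
        |         for a, b in zip(a_bits, b_bits):
        |             # carry-out = majority(a, b, carry-in)
        |             c = unit([a, b, carry], [1, 1, 1], -1.5)
        |             # sum bit: a + b + carry-in - 2*carry-out
        |             s = unit([a, b, carry, c],
        |                      [1, 1, 1, -2], -0.5)
        |             out.append(int(s > 0.5))
        |             carry = c
        |         return out  # final carry dropped -> wrapping
        | 
        |     a, b = 183, 99
        |     abits = [(a >> k) & 1 for k in range(8)]
        |     bbits = [(b >> k) & 1 for k in range(8)]
        |     print(ripple_adder(abits, bbits))  # bits of 26
        |     print([((a + b) % 256 >> k) & 1 for k in range(8)])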
        
       | bgnn wrote:
        | The second step, passing the analog output through shifted tanh
        | functions, is implementing an analog-to-digital converter (ADC).
        | This type of ADC was common back in the BJT days.
        | 
        | So: DAC + sum in the analog domain + ADC is what the NN is doing.
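        | 
        | Roughly what that pipeline looks like as a toy (my own sketch,
        | not the actual learned weights): a binary-weighted DAC per
        | input, an analog sum, then an ADC peeling off bits MSB-first
        | with steep shifted-tanh comparators.
        | 
        |     import numpy as np
        | 
        |     def dac(bits):  # LSB-first bit list -> analog value
        |         return float(2.0 ** np.arange(len(bits)) @ bits)
        | 
        |     def tanh_adc(x, n_bits=8, gain=50.0):
        |         out = []
        |         for k in reversed(range(n_bits)):
        |             t = 2.0 ** k
        |             # Soft comparator: ~1 if x >= t, else ~0.
        |             soft = 0.5 * (1 + np.tanh(gain * (x - t + 0.5)))
        |             bit = int(soft > 0.5)
        |             out.append(bit)
        |             x -= bit * t
        |         return out[::-1]  # back to LSB-first
        | 
        |     a, b = 183, 99
        |     abits = [(a >> k) & 1 for k in range(8)]
        |     bbits = [(b >> k) & 1 for k in range(8)]
        |     x = (dac(abits) + dac(bbits)) % 256  # wrap as trained
        |     print(tanh_adc(x))  # == bits of (183 + 99) % 256 = 26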
        
       | krbaccord94f wrote:
        | Binary layer functions, whether for DACs which convert 4-bit or
        | 8-bit inputs to a unitary neuron, _"allows the network to both
        | sum the inputs as well as convert the sum to analog all within
        | a single layer ... [to] do it all before any [Ameo] activation
        | functions even come into play."_ This is sin^-1(tan x) in the
        | absence of an asymptote.
        
       ___________________________________________________________________
       (page generated 2025-11-08 23:01 UTC)