[HN Gopher] Softmax Activation Function, Explained
       ___________________________________________________________________
        
       Softmax Activation Function, Explained
        
       Author : math_is_cool
       Score  : 77 points
       Date   : 2022-06-04 09:17 UTC (13 hours ago)
        
 (HTM) web link (www.pinecone.io)
 (TXT) w3m dump (www.pinecone.io)
        
       | xiphias2 wrote:
       | A simple summary of the article is that raw outputs are relative
       | log probabilities.
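
       [Editor's note: a minimal sketch of the claim above. Differences
       of logits equal log-odds between classes, and shifting all logits
       by a constant leaves the softmax output unchanged. The `softmax`
       helper below is an illustrative implementation, not from the
       article.]

```python
import numpy as np

def softmax(z):
    # Shift by max(z) for numerical stability; result is unchanged
    # because softmax is invariant to adding a constant to all logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([1.0, 2.0, 3.0])
p1 = softmax(logits)
p2 = softmax(logits + 5.0)  # same distribution: only differences matter

# Logits are *relative* log probabilities:
# log(p_i / p_j) == z_i - z_j, e.g. log(p1[2] / p1[0]) == 3.0 - 1.0
```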
        
         | kordlessagain wrote:
         | If a tree sigmoid functions in the woods and nobody is around,
         | does it still output a 1?
         | 
         | (Relax! It's a log joke, people.)
        
       | wmwmwm wrote:
       | Having just implemented a softmax() function for an online ML
       | course, I think the python implementation here suffers from
       | overflow if any of the elements of z get big(ish) - e.g. e^10000
       | is a big number! A spot of searching online suggests that
       | subtracting max(z) from all entries in z makes it a lot more
       | robust without changing the result e.g.
       | https://www.tutorialexample.com/implement-softmax-function-w...
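
       [Editor's note: a sketch of the max-subtraction trick described
       above. A naive `np.exp(z)` overflows for large entries, while
       subtracting `max(z)` first keeps every exponent at or below zero
       without changing the result. This is an assumed NumPy
       implementation, not the one from the linked course.]

```python
import numpy as np

def softmax(z):
    # exp(z - max(z)) is at most exp(0) = 1, so it never overflows,
    # and the shift cancels out in the normalization.
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Works even for big(ish) logits where naive exp(z) would overflow:
p = softmax(np.array([10000.0, 10001.0, 10002.0]))
# p is approximately [0.090, 0.245, 0.665]
```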
        
         | sillysaurusx wrote:
         | Correct (horse battery staple). This is how it's done in all
         | production implementations.
        
       | mhh__ wrote:
       | Every time I see a softmax written down I think "I should be
       | doing physics rather than gluing layers together..." (i.e. it
       | looks like a partition function)
       | 
       | https://physics.stackexchange.com/questions/669314/softmax-f...
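
       [Editor's note: a small sketch of the partition-function analogy
       mentioned above. Under the correspondence E_i = -z_i, softmax at
       temperature T is exactly the Boltzmann distribution
       p_i = exp(-E_i/T) / Z, where Z is the partition function. The
       function names below are illustrative.]

```python
import numpy as np

def boltzmann(energies, T=1.0):
    # Boltzmann weights exp(-E_i / T), shifted by min(E) for stability;
    # dividing by their sum is dividing by the partition function Z.
    w = np.exp(-(energies - energies.min()) / T)
    return w / w.sum()

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

E = np.array([0.5, 1.0, 2.0])
# softmax of negated energies reproduces the Boltzmann distribution:
# boltzmann(E, T) == softmax(-E, T)
```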
        
       ___________________________________________________________________
       (page generated 2022-06-04 23:01 UTC)