[HN Gopher] Softmax Activation Function, Explained
___________________________________________________________________
Softmax Activation Function, Explained
Author : math_is_cool
Score : 77 points
Date : 2022-06-04 09:17 UTC (13 hours ago)
(HTM) web link (www.pinecone.io)
(TXT) w3m dump (www.pinecone.io)
| xiphias2 wrote:
 | A simple summary of the article is that raw outputs are relative
 | log probabilities.
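
 [A quick sketch of the "relative log probabilities" point: softmax
 is invariant to adding a constant to every logit, so only the
 differences between logits matter, and those differences equal the
 differences of the output log probabilities. The function and
 values below are illustrative, not from the article.]

```python
import numpy as np

def softmax(z):
    # Shift by max(z) for numerical stability; the shift cancels out.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)

# Adding the same constant to all logits leaves the output unchanged:
print(np.allclose(p, softmax(z + 100.0)))  # True

# Differences of output log probabilities equal differences of logits:
print(np.log(p[2]) - np.log(p[0]))  # approx 2.0 == z[2] - z[0]
```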
| kordlessagain wrote:
| If a tree sigmoid functions in the woods and nobody is around,
| does it still output a 1?
|
| (Relax! It's a log joke, people.)
| wmwmwm wrote:
| Having just implemented a softmax() function for an online ML
| course, I think the python implementation here suffers from
| overflow if any of the elements of z get big(ish) - e.g. e^10000
| is a big number! A spot of searching online suggests that
| subtracting max(z) from all entries in z makes it a lot more
| robust without changing the result e.g.
| https://www.tutorialexample.com/implement-softmax-function-w...
| sillysaurusx wrote:
| Correct (horse battery staple). This is how it's done in all
| production implementations.
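
 [The overflow fix described above, sketched in NumPy: subtracting
 max(z) before exponentiating keeps every exponent at or below zero,
 so exp never overflows, and the shared factor exp(-max(z)) cancels
 in the ratio, leaving the result unchanged. This is an illustrative
 implementation, not the article's code.]

```python
import numpy as np

def stable_softmax(z):
    # exp(z - max(z)) <= 1 for every entry, so even huge logits
    # (e.g. z = 10000) cannot overflow; the shift cancels in the ratio.
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / exps.sum()

z = np.array([10000.0, 9999.0, 9998.0])
print(stable_softmax(z))  # finite probabilities; naive exp(z) overflows here
```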
| mhh__ wrote:
| Every time I see a softmax written down I think "I should be
| doing physics rather than gluing layers together..." (i.e. it
| looks like a partition function)
|
| https://physics.stackexchange.com/questions/669314/softmax-f...
___________________________________________________________________
(page generated 2022-06-04 23:01 UTC)