[HN Gopher] Interactive Gradient Descent Demo
       ___________________________________________________________________
        
       Interactive Gradient Descent Demo
        
       Author : skzv
       Score  : 68 points
       Date   : 2021-11-15 17:55 UTC (1 day ago)
        
 (HTM) web link (blog.skz.dev)
 (TXT) w3m dump (blog.skz.dev)
        
       | raghavbali wrote:
       | Kudos. This is quite impressive @skzv
        
       | motohagiography wrote:
       | Plain-language descriptions of algorithms and theorems are the
       | best writing of all. Thank you for this! The naive question I
       | have after reading it is: what makes gradient descent better
       | than 1/e best-choice sampling (e.g. the secretary problem)?
       | 
       | The "step size" parameter seems analogous to choosing a sample
       | size for what appears to be a random walk. I lack the background,
       | but the example shows that you already know the function, so
       | instead of picking an arbitrary starting point and descent rate,
       | you could pick a number of random samples and then apply the
       | secretary problem to them
       | (https://en.wikipedia.org/wiki/Secretary_problem#1/e-law_of_b...)
       | 
       | Top down, from an outsider's perspective (think product manager
       | or customer for an ML solution), what advantage do I get for the
       | compute cost of gradient descent over random sampling?
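       | 
       | For concreteness, here's a rough Python sketch of what I mean by
       | the 1/e rule (illustrative only; the objective function and
       | parameters here are made up, not taken from the demo): skip the
       | first ~N/e random samples, then take the first later sample that
       | beats everything seen so far.
       | 
       |   import numpy as np
       | 
       |   # Toy objective with several local minima (made up for
       |   # illustration, not the function from the linked demo).
       |   def f(x):
       |       return x**2 + 3 * np.sin(3 * x)
       | 
       |   def one_over_e_pick(n_samples=1000, low=-5.0, high=5.0, seed=0):
       |       rng = np.random.default_rng(seed)
       |       xs = rng.uniform(low, high, n_samples)
       |       ys = f(xs)
       |       cutoff = int(n_samples / np.e)   # observation-only phase
       |       best_seen = ys[:cutoff].min()
       |       # Accept the first later sample that beats the observed best.
       |       for x, y in zip(xs[cutoff:], ys[cutoff:]):
       |           if y < best_seen:
       |               return x, y
       |       return xs[-1], ys[-1]            # fallback: last sample
       | 
       |   print(one_over_e_pick())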
        
         | omegalulw wrote:
         | > Plain language descriptions of algorithms and theorems are
         | the best writing of all
         | 
         | Be careful with this. Understanding an algorithm's "edge cases"
         | is just as important as, if not more important than, the core
         | idea. With "plain language descriptions" you tend to get the
         | latter but not the former, which is _very_ dangerous.
        
         | clix11 wrote:
         | > what advantage do I get from the compute cost of gradient
         | descent over random sampling?
         | 
         | Random sampling becomes prohibitive in higher dimensions due to
         | the curse of dimensionality [0]. Gradient descent doesn't have
         | this problem and, with a suitable step size, converges to a
         | local (but, as can be seen here, not necessarily a global)
         | minimum.
         | 
         | The step size effectively controls how close to the "real"
         | local minimum you can get: too big a step size and you end up
         | repeatedly "jumping over" the minimum (sketched below).
         | 
         | [0] - https://en.wikipedia.org/wiki/Curse_of_dimensionality
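         | 
         | For illustration, a minimal sketch of plain gradient descent in
         | Python (the 1-D objective is made up, not the demo's function),
         | showing a step size that settles into a nearby local minimum
         | and one that keeps jumping over it:
         | 
         |   import numpy as np
         | 
         |   # Toy 1-D objective with several local minima (illustrative,
         |   # not the function from the linked demo).
         |   def f(x):
         |       return x**2 + 3 * np.sin(3 * x)
         | 
         |   def grad_f(x):
         |       return 2 * x + 9 * np.cos(3 * x)
         | 
         |   def gradient_descent(x0, step_size, n_steps=200):
         |       x = x0
         |       for _ in range(n_steps):
         |           x = x - step_size * grad_f(x)  # move against the gradient
         |       return x
         | 
         |   # Small step size: settles into a nearby local (not global) minimum.
         |   print(gradient_descent(x0=2.0, step_size=0.01))
         |   # Step size too large: the iterate keeps "jumping over" the
         |   # minimum and never settles.
         |   print(gradient_descent(x0=2.0, step_size=0.4))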
        
       | episode0x01 wrote:
       | Another cool gradient descent visualization tool:
       | https://distill.pub/2017/momentum/
        
       ___________________________________________________________________
       (page generated 2021-11-16 23:02 UTC)