[HN Gopher] Introduction to Thompson Sampling: The Bernoulli Ban...
       ___________________________________________________________________
        
       Introduction to Thompson Sampling: The Bernoulli Bandit (2017)
        
       Author : pncnmnp
       Score  : 26 points
       Date   : 2024-02-04 19:28 UTC (3 hours ago)
        
 (HTM) web link (gdmarmerola.github.io)
 (TXT) w3m dump (gdmarmerola.github.io)
        
       | rphln wrote:
        | My favorite resource on Thompson Sampling is
        | <https://everyday-data-science.tigyog.app/a-b-testing>.
       | 
       | After learning about it, I went on to replace the UCT formula in
       | MCTS with it and the results were... not much better, actually.
       | But it made me understand both a little better.
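        The Bernoulli-bandit version of Thompson Sampling discussed in
        the article can be sketched in a few lines: keep a Beta posterior
        per arm, sample from each posterior, and pull the arm whose
        sample is largest. The arm probabilities below are made up for
        illustration.

```python
# Thompson Sampling for a Bernoulli bandit: a minimal sketch.
# Beta(1, 1) priors; the true success rates are illustrative only.
import random

random.seed(0)
true_probs = [0.3, 0.5, 0.7]       # hidden per-arm success rates
alpha = [1] * len(true_probs)      # Beta posterior: 1 + successes
beta = [1] * len(true_probs)       # Beta posterior: 1 + failures
pulls = [0] * len(true_probs)

for _ in range(2000):
    # Draw one plausible success rate per arm from its posterior ...
    samples = [random.betavariate(alpha[i], beta[i])
               for i in range(len(true_probs))]
    # ... and act greedily with respect to that sample.
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_probs[arm] else 0
    # Conjugate Beta update from the observed Bernoulli reward.
    alpha[arm] += reward
    beta[arm] += 1 - reward
    pulls[arm] += 1

# Over time, the best arm should receive the bulk of the pulls.
print(pulls)
```

        With enough rounds the posterior for the best arm concentrates
        and exploration of the weaker arms tapers off on its own; no
        explicit exploration schedule is needed.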
        
         | jarym wrote:
         | Love it! Thanks for sharing
        
         | vintermann wrote:
         | My favorite is this series from 2015 by Ian Osband:
         | 
         | https://iosband.github.io/2015/07/19/Efficient-experimentati...
        
        | eggie5 wrote:
        | If you have an NN that is probabilistic, how do you update the
        | prior after sampling from the posterior?
        
         | gwern wrote:
         | You take the action which you computed to be optimal under the
         | hypothetical of your posterior sample; this then yields a new
         | observation. You add that to the dataset, and train a new NN.
        
           | eggie5 wrote:
           | ah, so observe the reward and then take a gradient step
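          The loop gwern describes (sample from the posterior, act
          optimally under that sample, observe, retrain) can be
          sketched with a bootstrapped estimate standing in for a
          probabilistic NN. This is only an approximation of posterior
          sampling, and the per-action mean-reward "model" here is a
          deliberately tiny stand-in for a network; the reward rates
          are made up.

```python
# Sketch of the act/observe/retrain loop, with bootstrap resampling
# as a crude stand-in for sampling a model from the posterior.
import random

random.seed(1)
actions = [0, 1]
true_means = {0: 0.2, 1: 0.8}  # hidden reward rates (illustrative)
data = []                      # growing dataset of (action, reward)

def fit_on_bootstrap(data):
    """'Train a new NN': fit per-action means on a bootstrap resample."""
    sample = [random.choice(data) for _ in data] if data else []
    est = {}
    for a in actions:
        rs = [r for (act, r) in sample if act == a]
        # Unseen actions get a random estimate, which drives exploration.
        est[a] = sum(rs) / len(rs) if rs else random.random()
    return est

for _ in range(500):
    model = fit_on_bootstrap(data)            # one "posterior sample"
    a = max(actions, key=lambda x: model[x])  # act optimally under it
    r = 1 if random.random() < true_means[a] else 0  # new observation
    data.append((a, r))                       # add to dataset; refit next round

counts = {a: sum(1 for (act, _) in data if act == a) for a in actions}
```

          Refitting from scratch each round matches the description
          above; in practice one would instead take incremental
          gradient steps on the updated dataset, as noted in the reply.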
        
        | orasis wrote:
        | I built a contextual bandit combining XGBoost with Thompson
        | Sampling; you can check it out at https://improve.ai
        
       ___________________________________________________________________
       (page generated 2024-02-04 23:00 UTC)