[HN Gopher] Introduction to Thompson Sampling: The Bernoulli Ban...
___________________________________________________________________
Introduction to Thompson Sampling: The Bernoulli Bandit (2017)
Author : pncnmnp
Score : 26 points
Date : 2024-02-04 19:28 UTC (3 hours ago)
(HTM) web link (gdmarmerola.github.io)
(TXT) w3m dump (gdmarmerola.github.io)
| rphln wrote:
  | My favorite resource on Thompson Sampling is
  | <https://everyday-data-science.tigyog.app/a-b-testing>.
|
| After learning about it, I went on to replace the UCT formula in
| MCTS with it and the results were... not much better, actually.
| But it made me understand both a little better.
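  [The Beta-Bernoulli setting the article covers can be sketched in a few
  lines; the arm probabilities and round count below are illustrative, not
  from the article:]

```python
import random

def thompson_bernoulli(true_probs, n_rounds=10000, seed=0):
    """Beta-Bernoulli Thompson Sampling: keep a Beta(alpha, beta)
    posterior per arm, draw one sample from each posterior, and pull
    the arm whose sample is highest."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # successes + 1 (uniform Beta(1, 1) prior)
    beta = [1] * k   # failures + 1
    pulls = [0] * k
    for _ in range(n_rounds):
        # One plausible success rate per arm, sampled from its posterior.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        # Bernoulli reward from the (unknown to the agent) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.7])
```

  [After enough rounds the pull counts concentrate on the best arm;
  exploration falls off naturally as the posteriors sharpen, which is the
  property UCB-style formulas like UCT engineer by hand.]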
| jarym wrote:
| Love it! Thanks for sharing
| vintermann wrote:
| My favorite is this series from 2015 by Ian Osband:
|
| https://iosband.github.io/2015/07/19/Efficient-experimentati...
| eggie5 wrote:
| if you have an NN that is probabilistic, how do you update the
| prior after sampling from the posterior?
| gwern wrote:
| You take the action which you computed to be optimal under the
| hypothetical of your posterior sample; this then yields a new
| observation. You add that to the dataset, and train a new NN.
| eggie5 wrote:
| ah, so observe the reward and then take a gradient step
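  [The loop gwern describes — sample from the posterior, act as if that
  sample were the truth, observe, refit — can be sketched with a toy
  conjugate Gaussian model standing in for the NN; the model, prior, and
  parameters here are illustrative assumptions, not anything from the
  thread:]

```python
import random

def posterior_sample_loop(true_means, n_rounds=5000, noise_sd=1.0, seed=1):
    """Sample-act-observe-refit loop with a Gaussian model per arm.
    With a conjugate model the 'train a new model' step is a closed-form
    posterior update; with an NN it would be a refit / gradient step."""
    rng = random.Random(seed)
    k = len(true_means)
    # Normal posterior per arm: N(mu, var), starting from a wide N(0, 100) prior.
    mu = [0.0] * k
    var = [100.0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # 1. Draw one posterior sample per arm (a full hypothetical world).
        draws = [rng.gauss(mu[i], var[i] ** 0.5) for i in range(k)]
        # 2. Take the action that is optimal under that hypothetical.
        arm = max(range(k), key=lambda i: draws[i])
        # 3. Observe a new (noisy) reward from the environment.
        reward = rng.gauss(true_means[arm], noise_sd)
        # 4. Fold the observation back into the model (the "retrain" step);
        #    here it is the exact Normal-Normal posterior update.
        precision = 1.0 / var[arm] + 1.0 / noise_sd ** 2
        mu[arm] = (mu[arm] / var[arm] + reward / noise_sd ** 2) / precision
        var[arm] = 1.0 / precision
        pulls[arm] += 1
    return pulls

pulls = posterior_sample_loop([0.0, 0.5, 1.0])
```

  [Note there is no explicit "update the prior after sampling": sampling
  from the posterior only chooses the action, and it is the new observation
  that changes the posterior.]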
| orasis wrote:
| I built a contextual bandit combining XGBoost with Thompson
| Sampling you can check out at https://improve.ai
___________________________________________________________________
(page generated 2024-02-04 23:00 UTC)