[HN Gopher] Reinforcement Learning at Facebook
___________________________________________________________________
Reinforcement Learning at Facebook
Author : agbell
Score : 92 points
Date : 2021-02-01 15:39 UTC (7 hours ago)
(HTM) web link (corecursive.com)
(TXT) w3m dump (corecursive.com)
| rayuela wrote:
| Lol, almost thought the URL was COERCION.com!
| agbell wrote:
| Interviewer here. Happy to answer any questions or take any
| feedback about the episode.
|
| Jason Gauci joined Facebook to try to solve some problems with
| the newsfeed using reinforcement learning. He originally got
| into ML via training bots to play capture the flag. What he
| ended up creating is open source [1].
|
| [1]: https://reagent.ai/
| Ozzie_osman wrote:
| This was a great read. It looks like the objective function
| (which seems to be some measure of "did we increase value" vs.
| "did the user tap the notification") is really important here.
| Any idea how that was actually measured?
| agbell wrote:
| Thanks!
|
| My understanding is that they look at page-management activity
| and whether it rises above what they would expect had they not
| sent the notification.
|
| Some of the details are covered in the paper [1]:
|
| > The Markov Decision Process (MDP) is based on a sequence of
| notification candidates for a particular person. The actions
| here are sending and dropping the notification, and the state
| describes a set of features about the person and the
| notification candidate. There are rewards for interactions
| and activity on Facebook, with a penalty for sending the
| notification to control the volume of notifications sent. The
| policy optimizes for the long term value and is able to
| capture incremental effects of sending the notification by
| comparing the Q-values of the send and drop action
|
| > The training data spans multiple weeks to enable the RL
| model to capture page admins' responses and interactions to
| the notifications with their managed pages over a long term
| horizon. The accumulated discounted rewards collected in the
| training allow the model to identify page admins with
| longterm intent to stay active with the help of being
| notified.
|
| [1] https://arxiv.org/pdf/1811.00260.pdf
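|
| Very roughly, as I understand it, the decision compares the
| Q-values of the two actions; a toy sketch in Python (the names
| here are illustrative, not ReAgent's actual API) would be
| something like:
|
|     def should_send(q_model, state):
|         # Q(state, "send") already includes the per-notification
|         # penalty in its reward, so comparing the two Q-values
|         # captures the incremental effect of sending.
|         q_send = q_model(state, action="send")
|         q_drop = q_model(state, action="drop")
|         return q_send > q_drop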
| mlthoughts2018 wrote:
| The "This is a reinforcement learning problem" section is very
| unconvincing. It presupposes that approaching the problem like
| a game or "capture the flag" is good or somehow better than
| supervised learning based on attributes that are known to
| correlate quite strongly with user preferences.
|
| Given the very widespread complaints about this type of
| recommender system - e.g. modern YouTube and FB Newsfeed
| rankings are widely panned, by ML experts and by ordinary
| users who find the experience manipulative, for reinforcing
| biases and for optimizing for pure engagement, which amplifies
| outrage - what is your take on how we can steer the
| conversation in the other direction? That is, that this is NOT
| a reinforcement learning problem, and that we shouldn't reward
| people looking to pad their ML resumes with solutions in
| search of a problem, solutions that let them brag about scale
| and complexity while very demonstrably serving users poorly?
| agbell wrote:
| I know very little about ML, being just the host, but I think
| the 'explore vs. exploit' trade-off that Jason mentions sounds
| like an improvement on pure 'exploit'. Exploring means finding
| new interests rather than just exploiting existing ones.
|
| I think you are correct to have concerns around optimizing
| for pure engagement though. These algos are giant optimizing
| machines. What should we be asking them to optimize? That
| seems like an important question that was only vaguely
| touched on in this discussion.
| mlthoughts2018 wrote:
| That's still no reason to reach for a bazooka like
| reinforcement learning, though. You could use simple
| Thompson Sampling or many other multi-armed bandit methods.
|
| Balancing learning against serving the optimal result is fine -
| many companies have approached ranking and recommendation
| that way for decades - but reinforcement learning would
| still not be justified unless you can present some extremely
| compelling evidence.
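|
| For concreteness, a bare-bones Thompson Sampling loop for a toy
| three-armed Bernoulli bandit takes only a few lines of Python
| (the numbers are made up, purely to show how little machinery
| it needs):
|
|     import random
|
|     true_rates = [0.05, 0.10, 0.02]  # hidden per-arm reward rates
|     successes = [0, 0, 0]
|     failures = [0, 0, 0]
|
|     for _ in range(10000):
|         # Sample a plausible reward rate for each arm from its
|         # Beta(1 + successes, 1 + failures) posterior, then play
|         # the arm with the highest sample.
|         samples = [random.betavariate(1 + s, 1 + f)
|                    for s, f in zip(successes, failures)]
|         arm = max(range(3), key=lambda i: samples[i])
|         reward = random.random() < true_rates[arm]  # simulated user
|         if reward:
|             successes[arm] += 1
|         else:
|             failures[arm] += 1
|
|     print(successes, failures)  # the best arm gets played the most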
| mlthoughts2018 wrote:
| Also to be clear - I think it's great to hear this
| perspective, and the interview is well put together. I'm not
| trying to criticize you or the merit of talking about this
| topic - it is well worth it.
|
| I was just asking, since you develop these kinds of
| interviews, what do you think would be a good way to get
| the other side of the story and talk to big tech
| practitioners who do not agree with the leap to
| reinforcement learning?
| agbell wrote:
| Isn't a multi-armed bandit a simple reinforcement learning
| algo? It is used in ReAgent's introductory tutorial:
| https://reagent.ai/rasp_tutorial.html
|
| Replying here to sibling: Thank you for the feedback. I
| think it is fair to say this interview does not explore
| in depth the issues that these techniques can cause and
| it certainly only presents one side. I think the
| recommendation to get more than one perspective is a good
| one.
|
| Let me know if there is anyone specific you recommend I
| talk to.
| mlthoughts2018 wrote:
| That's a lot of semantic hairsplitting. They are both
| "reinforcement learning" in the same way a Honda Civic
| and an aircraft carrier are both "vehicles."
| srean wrote:
| > Isn't multi-armed bandit a simple reinforcement
| learning algo
|
| It is indeed.
|
| It's a restricted form of it. In RL, an action can trigger a
| state change, which in turn can make the same 'arms'/'actions'
| behave differently, because their behavior is tied to the
| state. The state one lands in can also exert control over
| which state you end up in next. It's for this reason that
| some extra bookkeeping is necessary for full-fledged RL. But
| you are absolutely right that bandits are considered a
| simplified version of RL. By controlling the size of the
| state space one can control how bandit-like the solution will
| behave.
|
| There is also something called a contextual bandit (CB) that
| sits between pure bandits and RL. CBs do not have state
| changes, but they do have access to side information that can
| affect the 'arms'. In RL one needs to think not only about the
| reward but also about the possibility of ending up in a
| 'dead-end'/'hard-to-recover-from' state because the immediate
| reward was high. CBs do not have such 'traps', but they have
| more modeling power than plain bandits because the reward of
| an arm can depend on this side information, usually called
| the context.
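|
| To make the distinction concrete, here is a toy sketch in
| Python (all state names and numbers are made up): in a CB the
| next round's context does not depend on the arm you pulled,
| while in full RL a greedy action can push you into a state you
| then have to live with.
|
|     import random
|
|     # Contextual bandit: a fresh context each round; the arm
|     # affects the reward but NOT which context comes next.
|     def cb_round():
|         context = random.choice(["new_admin", "active_admin"])
|         arm = random.choice(["send", "drop"])  # stand-in policy
|         return 1.0 if (context, arm) == ("new_admin", "send") else 0.0
|
|     # Full RL: the action also determines the next state, so a
|     # greedy choice now can land you in a trap that is hard to
|     # escape later.
|     def rl_episode():
|         state, total = "engaged", 0.0
|         for _ in range(10):
|             action = random.choice(["send", "drop"])  # stand-in policy
|             if state == "annoyed":
|                 reward = 0.0  # trap state: nothing pays off any more
|             else:
|                 reward = 1.0 if action == "send" else 0.2
|                 if action == "send" and random.random() < 0.3:
|                     state = "annoyed"  # the action changed the state
|             total += reward
|         return total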
|
| The heat that you are getting from some comments is
| unwarranted.
|
| EDIT: Holy mother of monkey milk, you have a ton of super
| interesting interviews! Glad I ran into your content.
| Better late than never.
| PartiallyTyped wrote:
| I'd say that in CBs the action does not affect the
| distribution of future states.
| agbell wrote:
| Bandits being reinforcement learning makes sense! Thank you.
| If you are looking for a recommendation, "Software That
| Doesn't Suck" is a personal fav:
| https://corecursive.com/software-that-doesnt-suck-with-
| jim-b...
| srean wrote:
| You had me at Brian Kernighan's interview. I don't think
| I have met a more modest man.
|
| Once upon a time I had my open cube just behind his open
| cube. I had no idea who he was, and his modesty certainly
| did not make it any easier to find out. Once he got locked
| out of the floor and I had to let him in. It's only after
| that that I came to notice his name tag.
| glutamate wrote:
| Is reinforcement learning being used to stop the newsfeed
| from promoting genocide?
| https://www.nytimes.com/2018/10/15/technology/myanmar-facebo...
| Jugurtha wrote:
| How is Facebook doing machine learning? I know they have their
| internal platform (FBLearner Flow, "equivalent" to Uber's
| Michelangelo), but I have talked with people who have worked
| there and they didn't use it.
|
| I spoke with them to test our own machine learning platform
| (https://iko.ai). The workflow they described was really odd:
| SSHing into boxes to use a cluster, etc. That is what we had
| been doing as a tiny, immature company a few years ago, until
| it became so frustrating that we built our platform. I'm
| talking about a tiny team, so I'm wondering how they get away
| with it, or whether the people I talked with simply did not
| adopt it.
|
| Someone went as far as saying that "experiment tracking" was "I
| told my manager which hyperparameters worked best".
| anon_tor_12345 wrote:
| >they didn't use it ... The workflow they described was
| really odd. SSHing into boxes to use a cluster, etc.
|
| No clue what you're talking about - most everyone on a
| product team uses FBLearner (the platform you're alluding to),
| which is a job-queue type tool, i.e. you submit FBLearner jobs
| and watch them run, along with metrics tracking.
|
| >Someone went as far as saying that "experiment tracking" was
| "I told my manager which hyperparameters worked best"
|
| Hyperparameters are rarely fiddled with because of how much
| data there is to train on, but like I said, FBLearner has
| plenty of views to help with "experiment tracking" when it
| comes to HPO.
| Jugurtha wrote:
| This is why I found it odd. I wondered why they didn't use
| FBLearner Flow and figured that not all teams were using it
| even though they did machine learning.
|
| We like these conversations where people share problems
| they may be having, in order to get a bigger picture. We
| built our ML platform to solve our own problems that we
| faced over the years, but it's always nice to be exposed
| to problems we have not seen before and to solve a
| _slightly_ more general problem.
| anon_tor_12345 wrote:
| >I wondered why they didn't use FBLearner Flow
|
| Were they in FAIR? Conceivably FAIR might need more
| flexibility (because they're trying to "innovate"), and so
| they fall back on lower-level tools. But I know people at
| FAIR and they too use FBLearner. Regardless, FAIR (or
| whatever other org you spoke to) is very small relative
| to the total number of people using/doing ML at FB, so
| extrapolating from their needs is unwise (if you're
| trying to build a business around some typical process).
| Jugurtha wrote:
| It makes sense. As I said, I'm building for our own needs
| as we help organizations with machine learning and needed
| to deliver faster, _but_ I appreciate talking with people
| in the field to cluster families of problems and see a
| slightly bigger picture, and I generally like talking with
| people like this. They remind me of my colleagues, and I
| really like my colleagues.
| snicksnak wrote:
| What would be the actual goal RL should aim for when applied
| to the newsfeed? I understand that RL for Amazon aims at
| suggesting things you're looking for or are likely to buy.
| One might think this correlates directly with minimizing the
| time spent browsing Amazon. For the newsfeed it would
| probably be the complete opposite, right - maximizing the
| scroll of doom?
|
| Also, from the transcript:
|
| > when you go to Facebook, [...], you see all these posts from
| your friends
|
| Maybe I'm an outlier, but my newsfeed probably contains ~5%
| posts related to my friends, birthday wishes included. I use
| Facebook primarily as a news aggregator nowadays.
| agbell wrote:
| Thanks for reading or listening to the episode!
|
| I think that is the hardest question: what to optimize for.
| Jason mentions that Facebook employs social scientists who
| help define the value they are optimizing for.
|
| > I don't work on the social science part of it. We try to
| optimize and we do it on good faith that the goals we're
| optimizing for are good faith goals. But I've been in enough
| of the meetings to see the intent is really a good intent.
| It's just a thing that's very difficult to quantify.
|
| > But I do think that the intent is to provide that value.
| And I do think that they would trade some of the margin for
| the value in a heartbeat.
| dbtc wrote:
| Not profit? I thought it was for profit.
| alexbeloi wrote:
| Ads optimizes for profit; all other content is broadly
| optimized for _meaningful social interaction_ and against
| _problematic content_.
|
| https://www.facebook.com/business/news/news-feed-fyi-
| bringin...
|
| https://about.fb.com/news/2019/04/remove-reduce-inform-
| new-s...
| buitreVirtual wrote:
| > I use facebook primarily as a news aggregator nowadays.
|
| Isn't this how people end up trapped in alternative-facts
| bubbles?
| goguy wrote:
| Depends on which news he's talking about.
|
| Sports news is usually pretty factual.
___________________________________________________________________
(page generated 2021-02-01 23:02 UTC)