Post AVupSSqc7QsvlIS7Lk by jeremybmerrill@journa.host
2023-05-22T13:50:38Z
0 likes, 1 repeats
Why are LLMs trained by having human beings rate proposed answers (RLHF), instead of just with regular supervised learning? @yoavgo@twitter.com's answers: it lets responses be more diversely worded, and it lets the model learn how to say it doesn't know. https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81