Post AVupSSqc7QsvlIS7Lk by jeremybmerrill@journa.host
       2023-05-22T13:50:38Z
       
       0 likes, 1 repeats
       
       Why are LLMs trained by having human beings rate proposed answers (RLHF), instead of just with regular supervised learning? @yoavgo@twitter.com's answers: it lets responses be more diversely worded, and it lets the model learn how to say it doesn't know. https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81