[HN Gopher] Lessons from the trenches on reproducible evaluation...
       ___________________________________________________________________
        
       Lessons from the trenches on reproducible evaluation of language
       models
        
       Author : veryluckyxyz
       Score  : 25 points
       Date   : 2024-05-25 11:42 UTC (11 hours ago)
        
        
       | jerpint wrote:
       | One point they don't seem to spend much time on is the
       | difficulty of reproducing outputs from closed-source models.
       | Setting temperature to 0 and fixing the seed doesn't always seem
       | to be enough to get exactly the same result for a given prompt.
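
One way to check the point above empirically is to send the same prompt several times and count how many distinct outputs come back. The sketch below is only an illustration under stated assumptions: `generate` is a hypothetical callable standing in for whatever client call you use (with temperature 0 and a fixed seed already applied), and `make_flaky_stub` is a made-up stand-in so the example runs without any API access. Nothing here comes from the paper or from a specific vendor API.

```python
import hashlib
from collections import Counter
from typing import Callable


def fingerprint(text: str) -> str:
    """Short, stable hash so long outputs can be compared and tallied."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def repeatability_report(
    generate: Callable[[str], str], prompt: str, runs: int = 5
) -> Counter:
    """Call `generate` on the same prompt `runs` times and tally the
    fingerprints of the outputs. `generate` is a hypothetical wrapper
    around a model call; it is an assumption of this sketch."""
    return Counter(fingerprint(generate(prompt)) for _ in range(runs))


def is_reproducible(report: Counter) -> bool:
    """True only if every run produced byte-identical output."""
    return len(report) == 1


def make_flaky_stub() -> Callable[[str], str]:
    """Hypothetical stand-in for a model endpoint that returns a
    different answer on every third call, to imitate nondeterminism."""
    calls = {"n": 0}

    def generate(prompt: str) -> str:
        calls["n"] += 1
        return "answer A" if calls["n"] % 3 else "answer B"

    return generate


if __name__ == "__main__":
    report = repeatability_report(make_flaky_stub(), "What is 2 + 2?", runs=6)
    print("distinct outputs:", len(report))
    print("reproducible:", is_reproducible(report))
```

Comparing exact bytes is deliberately strict. If only the final answer matters for your eval, you could fingerprint a normalized or parsed answer instead, but that measures something weaker than exact reproducibility.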
        
       ___________________________________________________________________
       (page generated 2024-05-25 23:01 UTC)