[HN Gopher] Lessons from the trenches on reproducible evaluation...
___________________________________________________________________
Lessons from the trenches on reproducible evaluation of language
models
Author : veryluckyxyz
Score : 25 points
Date : 2024-05-25 11:42 UTC (11 hours ago)
(HTM) web link (arxiv.org)
| jerpint wrote:
| One point they don't seem to spend much time on is the difficulty
| of reproducing outputs from closed-source models. Setting the
| temperature to 0 and fixing the seed isn't always enough to get
| exactly the same output for a given prompt.
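
One rough way to check the claim above for a given model is to send the same prompt several times and count the distinct outputs. This is a minimal sketch, not anything taken from the paper or the comment: `generate` is a hypothetical callable that you would wrap around your own API client (with temperature 0 and a fixed seed already applied), and the function names and `n_trials` parameter are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable


def count_distinct_outputs(
    generate: Callable[[str], str], prompt: str, n_trials: int = 5
) -> Counter:
    """Call `generate` n_trials times on the same prompt and tally each output.

    `generate` is a hypothetical wrapper around a model call; it is assumed
    to already use temperature 0 and a fixed seed where the API allows it.
    """
    return Counter(generate(prompt) for _ in range(n_trials))


def is_reproducible(
    generate: Callable[[str], str], prompt: str, n_trials: int = 5
) -> bool:
    """Return True only if every trial produced exactly the same string."""
    return len(count_distinct_outputs(generate, prompt, n_trials)) == 1


if __name__ == "__main__":
    # Stand-in for a real model call, used here only so the sketch runs offline.
    def fake_model(prompt: str) -> str:
        return prompt.upper()

    print(is_reproducible(fake_model, "same prompt every time"))
```

Exact string equality is the strictest possible test. A passing result over a handful of trials only shows that no divergence was observed, not that the model is deterministic.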
___________________________________________________________________
(page generated 2024-05-25 23:01 UTC)