Post Alri7Fu312mdGGlzFY by CyberneticForests@assemblag.es
(DIR) Post #Alri7Bf0n0Na6ThvG4 by CyberneticForests@assemblag.es
2024-09-10T11:42:29Z
0 likes, 0 repeats
Here’s a paper that shows the complexities of evaluating AI research without questioning AI myths. In this case, the paper (link at the end) suggests a conclusion — and many are circulating this conclusion — that “LLMs are able to generate novel research ideas beyond the level of human experts in a similar field.” Dig into the experiment, and you see a different story.
(DIR) Post #Alri7CYfS4tmt5kN0a by CyberneticForests@assemblag.es
2024-09-10T11:42:30Z
0 likes, 0 repeats
1️⃣ The human experts who put forth ideas to compete with the LLM were PhD students or PostDocs with at least a single publication in NLP. Great! That’s an expert. They were paid to give away ideas to these researchers — not so great, if you’ve ever talked to anyone doing early career research.
(DIR) Post #Alri7DRGB6ZFcPHy6K by CyberneticForests@assemblag.es
2024-09-10T11:42:30Z
0 likes, 0 repeats
In the survey, the PhDs described these as ideas they thought were pretty good but not worth pursuing: they rated their own contributions as fairly mediocre right from the start. Why? Because generating high-quality research is hard enough, and protecting your most novel ideas in their early stages means a lot. Handing away high-quality, distinct, novel research proposals is not really worth it to early-career academics. They mostly made them up on the spot, according to the paper.
(DIR) Post #Alri7EDpGXPq320kng by CyberneticForests@assemblag.es
2024-09-10T11:42:30Z
0 likes, 0 repeats
2️⃣ The LLM created 4000 ideas. The results were evaluated by experts and narrowed down, meaning that the ideas generated by the LLM were already pre-selected for the criteria under which they competed. Of the 4000, only 200 showed enough variety from one another to be useful. Those 200 competed for novelty against themselves — so it’s good at generating novel ideas, if you strip away the 3800 that were not! (Of course these are different applications of the term novelty — but it still matters).
(DIR) Post #Alri7Ej1OYMxbmbfQO by CyberneticForests@assemblag.es
2024-09-10T11:42:30Z
0 likes, 0 repeats
So what you see in the “novel research” study are the ideas of PhD students, prefiltered by their own biases (protecting their most valuable research ideas), pitted against an expert-filtered collection of 4000 ideas.
(DIR) Post #Alri7FS2hANjrPfcbA by CyberneticForests@assemblag.es
2024-09-10T11:42:30Z
0 likes, 0 repeats
True Headline: “LLMs can create research proposals equivalent to a human expert’s throw-away ideas, but only if you hire other experts to whittle away 95% of what it produces.”
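For anyone checking the 95% figure, here is a quick sketch of the arithmetic. It uses only the two numbers quoted earlier in this thread (4000 ideas generated, 200 kept); the variable names are just illustrative.

```python
# Figures quoted in the thread: 4000 LLM-generated ideas, 200 kept after filtering.
generated = 4000
kept = 200

# Everything that was filtered out before the comparison with human ideas.
discarded = generated - kept
discard_rate = discarded / generated

print(f"discarded: {discarded} of {generated} ({discard_rate:.0%})")
```

That works out to 3800 ideas stripped away, which is where the 95% comes from.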
(DIR) Post #Alri7Fu312mdGGlzFY by CyberneticForests@assemblag.es
2024-09-10T11:42:31Z
0 likes, 0 repeats
This one fits into a “productivity myth”: anyone who suggests that the LLMs in this study saved anybody time or energy probably didn’t read the study.
(DIR) Post #Alri7GaEUCWlN6Vg0G by CyberneticForests@assemblag.es
2024-09-10T11:42:31Z
1 likes, 0 repeats
Here’s the link. To the credit of the researchers, they acknowledge all of this. A key finding of this research is that LLMs alone were quite poor at evaluating ideas. This is good research, and I don’t blame them for anything other than a technically accurate conclusion that the public is bound to misinterpret. It’s the hype around it that’s bad. ➡️https://arxiv.org/pdf/2409.04109