[HN Gopher] AI tools are spotting errors in research papers
___________________________________________________________________
AI tools are spotting errors in research papers
Author : kgwgk
Score : 355 points
Date : 2025-03-07 22:54 UTC (1 day ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| more_corn wrote:
| This is great to hear. A good use of AI, if the false positives
| can be controlled.
| rokkamokka wrote:
| False positives don't seem overly harmful here either, since
| the main use would be bringing it to human attention for
| further thought
| wizzwizz4 wrote:
| Is there advantage over just reviewing papers with a critical
| eye?
| jeffbee wrote:
| If reviewer effort is limited and the model has at least a
| bias in the right direction.
| wizzwizz4 wrote:
| So I just need to make sure my fraud goes under the radar
| of these AI tools, and then the limited reviewer effort
| will be spent elsewhere.
| estarkio wrote:
| I think some people will find an advantage in flagging
| untold numbers of research papers as frivolous or
| fraudulent with minimal effort, while putting the burden of
| re-proving the work on everyone else.
|
| In other words, I fear this is a leap in Gish Gallop
| technology.
| Zigurd wrote:
| There are probably 10x more problematic academic
| publications than currently get flagged. Automating the
| search for the likeliest candidates is going to be very
| helpful by focusing the "critical eye" where it can make
| the biggest difference.
| epidemiology wrote:
| The largest problem with most publications (in epi at
| least, in my opinion) is study design. Unfortunately,
| faulty study design, and things like data cleaning, are
| qualitative, nuanced, and difficult to catch with AI
| unless it has access to the source data.
| AlienRobot wrote:
| Hopefully, one would use this to try to find errors in a
| massive number of papers, and then go through the effort of
| reviewing these papers themselves before bringing up the
| issue. It makes no sense to push the effort onto others just
| because the AI said so.
| nyrikki wrote:
| Walking through their interface, it seems like when you click
| through on the relatively few findings that aren't just tiny
| spelling/formatting errors,
|
| Like this style:
|
| > Methodology check: The paper lacks a quantitative
| evaluation or comparison to ground truth data, relying on a
| purely qu...
|
| they always seem to have been edited down to simple formatting
| errors.
|
| https://yesnoerror.com/doc/eb99aec0-a72a-45f7-bf2c-8cf2cbab1.
| ..
|
| If they can't improve that, the signal-to-noise ratio will be
| too low and people will shut it off/ignore it.
|
| Time is not free; cost people lots of time without them
| seeing value and almost any project will fail.
| topaz0 wrote:
| This is such a bad idea. Skip the first section and read the
| "false positives" section.
| afarah1 wrote:
| I can see its usefulness as a screening tool, though I can also
| see downsides similar to what maintainers face with AI
| vulnerability reporting. It's an imperfect tool attempting to
| tackle a difficult and important problem. I suppose its value
| will be determined by how well it's used and how well it
| evolves.
| camdenreslink wrote:
| Aren't false positives acceptable in this situation? I'm
| assuming a human (paper author, journal editor, peer reviewer,
| etc) is reviewing the errors these tools are identifying. If
| there is a 10% false positive rate, then the only cost is the
| wasted time of whoever needs to identify it's a false positive.
|
| I guess this is a bad idea if these tools replace peer
| reviewers altogether, and papers get published if they can get
| past the error checker. But I haven't seen that proposed.
| xeonmc wrote:
| Let me tell you about this thing called Turnitin and how it
| was a purely advisory screening tool...
| csa wrote:
| > I'm assuming a human (paper author, journal editor, peer
| reviewer, etc) is reviewing the errors these tools are
| identifying.
|
| This made me laugh so hard that I was almost crying.
|
| For a specific journal, editor, or reviewer, _maybe_. For
| most journals, editors, or reviewers... I would bet money
| against it.
| karaterobot wrote:
| You'd win that bet. Most journal reviewers don't do more
| than check that data _exists_ as part of the peer review
| process--the equivalent of typing `ls` and looking at the
| directory metadata. They pretty much never run their own
| analyses to double check the paper. When I say "pretty
| much never", I mean that when I interviewed reviewers and
| asked them if they had ever done it, none of them said yes,
| and when I interviewed journal editors--from significant
| journals--only one of them said their policy was to even
| ask reviewers to do it, and that it was still optional. He
| said he couldn't remember if anyone had ever claimed to do
| it during his tenure. So yeah, if you get good odds on it,
| take that bet!
| RainyDayTmrw wrote:
| That screams "moral hazard"[1] to me. See also the incident
| with curl and AI confabulated bug reports[2].
|
| [1]: Maybe not in the strict original sense of the phrase.
| More like, an incentive to misbehave and cause downstream
| harm to others. [2]:
| https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-
| stands-f...
| nxobject wrote:
| > is reviewing the errors these tools are identifying.
|
| Unfortunately, no one has the incentives or the resources to
| do doubly, triply thorough fine-tooth combing: no reviewer or
| editor is getting paid; tenure-track researchers who need the
| service-to-the-discipline check mark in their tenure
| portfolios also need to churn out research...
| topaz0 wrote:
| Note that the section with that heading also discusses
| several other negative features.
|
| The only false positive rate mentioned in the article is more
| like 30%, and the true positives in that sample were mostly
| trivial mistakes (as in, having no effect on the validity of
| the message), and that is in preprints that have not been peer
| reviewed, so one would expect the false-positive rate to be
| much worse after peer review (the true positives would
| decrease, the false positives would remain).
|
| And every indication both from the rhetoric of the people
| developing this and from recent history is that it would
| almost never be applied in good faith, and instead would
| empower ideologically motivated bad actors to claim that
| facts they disapprove of are inadequately supported. That
| kind of user does not care if the "errors" are false
| positives or trivial.
|
| Other comments have made good points about some of the other
| downsides.
| whatever1 wrote:
| Just consider it an additional mean reviewer who is most
| likely wrong. There is still value in debunking their false
| claims.
| rs186 wrote:
| I don't see this as a worse idea than an AI code reviewer. If it
| spits out irrelevant advice and only gets 1 out of 10 points
| right, I consider it a win, since the cost is so low and many
| humans can't catch subtle issues in code.
| dartos wrote:
| You're missing the bit where humans can be held responsible
| and improve over time with specific feedback.
|
| AI models only improve through training and good luck
| convincing any given LLM provider to improve their models for
| your specific use case unless you have deep pockets...
| roywiggins wrote:
| And people's willingness to outsource their judgement to a
| computer. If a computer says it, for some people, it's the
| end of the matter.
| LasEspuelas wrote:
| Deploying this on already published work is probably a bad
| idea. But what is wrong with working with such tools on
| submission and review?
| aeturnum wrote:
| Being able to have a machine double check your work for
| problems that you fix or dismiss as false seems great? If the
| bad part is "AI knows best" - I agree with that! Properly
| deployed, this would be another tool in line with peer review
| that helps the scientific community judge the value of new
| work.
| zulban wrote:
| There's also a ton of false positives with spellcheck on
| scientific papers, but it's obviously a useful tool. Humans
| review the results.
| crazygringo wrote:
| This actually feels like an amazing step in the right direction.
|
| If AI can help spot obvious errors in published papers, it can do
| it as part of the review process. And if it can do it as part of
| the review process, authors can run it on their own work before
| submitting. It could massively raise the quality level of a lot
| of papers.
|
| What's important here is that it's part of a process involving
| experts themselves -- the authors, the peer reviewers. They can
| easily dismiss false positives, and, more importantly, get
| warnings about statistical mistakes or other aspects of the
| paper that aren't their primary area of expertise but can
| contain gotchas.
| yojo wrote:
| Relatedly: unethical researchers could run it on their own work
| before submitting. It could massively raise the plausibility of
| fraudulent papers.
|
| I hope your version of the world wins out. I'm still trying to
| figure out what a post-trust future looks like.
| rererereferred wrote:
| Eventually the unethical researchers will have to make actual
| research to make their papers pass. Mission fucking
| accomplished https://xkcd.com/810/
| brookst wrote:
| Both will happen. But the world has been post-trust for
| millennia.
| GuestFAUniverse wrote:
| Maybe raise the "accountability" part?
|
| It baffles me that somebody can be a professor, director,
| whatever (meaning: taking the place of somebody _really_
| qualified) and not get dragged through court after
| falsifying a publication until nothing is left of that
| betrayer.
|
| It's not only the damage to society due to false,
| misleading claims. If those publications decide who gets
| tenure, a research grant, etc., there are others whose
| careers were massively damaged.
| StableAlkyne wrote:
| A retraction due to fraud already torches your career.
| It's a black mark that makes it harder to get funding,
| and it's one of the few reasons a university might revoke
| tenure. And you will be explaining it to every future
| employer in an interview.
|
| There generally aren't penalties beyond that in the West
| because - outside of libel - lying is usually protected
| as free speech
| shkkmo wrote:
| > unethical researchers could run it on their own work before
| submitting. It could massively raise the plausibility of
| fraudulent papers
|
| The real low hanging fruit that this helps with is detecting
| accidental errors and preventing researchers with legitimate
| intent from making mistakes.
|
| Research fraud and its detection is always going to be an
| adversarial process between those trying to commit it and
| those trying to detect it. Where I see tools like this making
| a difference against fraud is that it may also make fraud
| harder to plausibly pass off as errors if the fraudster gets
| caught. Since the tools can improve over time, I think this
| increases the risk that research fraud will be detected by
| tools that didn't exist when the fraud was perpetrated and
| which will ideally lead to consequences for the fraudster.
| This risk will hopefully dissuade some researchers from
| committing fraud.
| SubiculumCode wrote:
| I already ask AI to be a harsh reviewer on a manuscript
| before submitting it. Sometimes blunders are there because of
| how close you are to the work. It hadn't occurred to me that
| bad "scientists" could use it to avoid detection
| SubiculumCode wrote:
| I would add that I've never gotten anything particularly
| insightful in return... but it has pointed out some things
| that could be written more clearly, or where I forgot to
| cite a particular standardized measure, etc.
| 7speter wrote:
| Peer review will still involve human experts, though?
| rs186 wrote:
| Students and researchers send their own paper to plagiarism
| checker to look for "real" and unintended flags before
| actually submitting the papers, and make revisions
| accordingly. This is a known, standard practice that is
| widely accepted.
|
| And let's say someone modifies their faked lab results so
| that no AI can detect any evidence of photoshopping images.
| Their results get published. Well, nobody will be able to
| reproduce their work (unless other people also publish
| fraudulent work from there), and fellow researchers will
| raise questions, like, a lot of them. Also, guess what, even
| today, badly photoshopped results often don't get caught for
| a few years, and in hindsight it's just some low-effort
| image manipulation -- copying a part of an image and pasting
| it elsewhere.
|
| I doubt any of this changes anything. There is a lot of
| competition in academia, and depending on the field, things
| may move very fast. Getting fraudulent work past AI detection
| likely doesn't give anyone enough of an advantage to survive
| in a competitive field.
| dccsillag wrote:
| I've never seen this done in a research setting. Not sure
| about how much of a standard practice it is.
| StableAlkyne wrote:
| It may be field specific, but I've also never heard of
| anyone running a manuscript through a plagiarism checker
| in chemistry.
| abirch wrote:
| You're right that this won't change the incentives for the
| dishonest researchers. Unfortunately there's not an
| equivalent of "short sellers" in research, people who are
| incentivized to find fraud.
|
| AI is definitely a good thing (TM) for those honest
| researchers.
| owl_vision wrote:
| Unless documented and reproducible, it does not exist. This
| was the minimum guide when I worked with researchers.
|
| I plus 1 your doubt in the last paragraph.
| BurningFrog wrote:
| I'm not in academia, but what I hear is that very few
| results ever see a reproduction attempt.
|
| So if you publish an unreproducible paper, you can probably
| have a full career without anyone noticing.
| jfengel wrote:
| Papers that can't be reproduced sound like they're not
| very useful, either.
|
| I know it's not as simple as that, and "useful" can
| simply mean "cited" (a sadly overrated metric). But
| surely it's easier to get hired if your work actually
| results in something somebody uses.
| dgfitz wrote:
| > Papers that can't be reproduced sound like they're not
| very useful, either.
|
| They're not useful at all. Reproduction of results isn't
| sexy, nobody does it. Almost feels like science is built
| on a web on funding trying to buy the desired results.
| qpiox wrote:
| Reproduction is rarely done because it is not "new
| science". Everyone is funding only "new science".
| jfengel wrote:
| Reproduction is boring, but it would often happen
| incidentally to building off someone else's results.
|
| You tell me that this reaction creates X, and I need X to
| make Y. If I can't make my Y, sooner or later it's going
| to occur to me that X is the cause.
|
| Like I said, I know it's never that easy. Bench work is
| hard and there are a million reasons why your idea
| failed, and you may not take the time to figure out why.
| You won't report such failures. And complicated results,
| like in sociology, are rarely attributable to anything.
| mike_hearn wrote:
| That's true for some kinds of research but a lot of
| academic output isn't as firm as "X creates Y".
|
| Replicability is overrated anyway. Loads of bad papers
| will replicate just fine if you try. They're still making
| false claims.
|
| https://blog.plan99.net/replication-studies-cant-fix-
| science...
| air7 wrote:
| I've had this idea that reproduction studies on one's CV
| should become a sort of virtue signal, akin to
| philanthropy among the rich. This way, some percentage of
| one's work would need to be reproduction work, or
| otherwise one would be looked down upon, and this would
| create the right incentive to do so.
| qpiox wrote:
| The reality is a bit different.
|
| The "better" journals are listed in JCR. Nearly 40% of
| them have impact factor less than 1, it means that on
| average papers in them are cited less than 1 times.
|
| Conclusion: even in better journals, the average paper is
| rarely cited at all, which means that definitely the
| public has rarely heard of it or found it useful.
| gopher_space wrote:
| Papers are reproducible in exactly the same way that
| github projects are buildable, and in both cases anything
| that comes fully assembled for you is already a product.
|
| If your academic research results in immediately useful
| output, all of the people waiting for that to happen step
| in and you no longer worry about employment.
| azan_ wrote:
| >Their results get published. Well, nobody will be able to
| reproduce their work (unless other people also publish
| fraudulent work from there), and fellow researchers will
| raise questions, like, a lot of them.
|
| Sadly you seem to underestimate how widespread fraud is in
| academia and overestimate how big the punishment is. In the
| worst case, when someone finds you guilty of fraud, you
| will get a slap on the wrist. In the usual case absolutely
| nothing will happen and you will be free to keep publishing
| fraud.
| matthewdgreen wrote:
| I don't actually believe that this is true if "academia"
| is defined as the set of reputable researchers from R1
| schools and similar. If you define Academia as "anyone
| anywhere in the world who submits research papers" then
| yes, _it has vast amounts of fraud_ in the same way that
| most email is spam.
|
| Within the reputable set, as someone convinced that fraud
| is out of control, have you ever tried to calculate the
| fraud rate as a percentage with a numerator and denominator
| (either number of papers published or number of reputable
| researchers)? I would be very interested, and stunned, if it
| was over 0.1% or even 0.01%.
| azan_ wrote:
| There is lots of evidence that p-hacking is widespread
| (some estimate that up to 20% of papers are p-hacked). This
| problem also exists in top institutions; in fact, in some
| fields it appears to be WORSE in higher-ranking unis
| - https://mitsloan.mit.edu/sites/default/files/inline-
| files/P-...
| cycomanic wrote:
| Where is that evidence? The paper you cite suggests that
| p-hacking is done in experimental accounting studies but
| not archival.
|
| Generally speaking, evidence suggests that fraud rates
| are low (lower than in most other human endeavours).
| This study cites 2% [1]. This is similar to numbers that
| Elisabeth Bik reports. For comparison, self-reported
| doping rates were between 6 and 9% here [2].
|
| [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC5723807/ [2]
| https://pmc.ncbi.nlm.nih.gov/articles/PMC11102888/
| mike_hearn wrote:
| The 2% figure isn't a study of the fraud rate, it's just
| a survey asking academics if they've committed fraud
| themselves. Ask them to estimate how many other academics
| commit fraud and they say more like 10%-15%.
| signatoremo wrote:
| So which figure is more accurate in your opinion?
| mike_hearn wrote:
| See my other reply to Matthew. It's very dependent on how
| you define fraud, which field you look at, which country
| you look at, and a few other things.
|
| Depending on what you choose for those variables it can
| range from a few percent up to 100%.
| mike_hearn wrote:
| There's an article that explores the metrics here:
|
| https://fantasticanachronism.com/2020/08/11/how-many-
| undetec...
|
| _> 0.04% of papers are retracted. At least 1.9% of
| papers have duplicate images "suggestive of deliberate
| manipulation". About 2.5% of scientists admit to fraud,
| and they estimate that 10% of other scientists have
| committed fraud. 27% of postdocs said they were willing
| to select or omit data to improve their results. More
| than 50% of published findings in psychology are false.
| The ORI, which makes about 13 misconduct findings per
| year, gives a conservative estimate of over 2000
| misconduct incidents per year._
|
| Although publishing untrue claims isn't the same thing as
| fraud, editors of well known journals like The Lancet or
| the New England Journal of Medicine have estimated that
| maybe half or more of the claims they publish are wrong.
| Statistical consistency detectors run over psych papers
| find that ~50% fail such checks (e.g. that computed means
| are possible given the input data). The authors don't
| care: when asked to share their data so the causes of the
| check failures can be explored, they just refuse or ignore
| the request, even if they signed a document saying they'd
| share.
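|
| (To give a concrete flavor of how mechanical those consistency
| checks are, here is a minimal sketch of a GRIM-style test in
| Python. This is my own illustration, not any particular tool's
| code, and it assumes the underlying values are integers, e.g.
| Likert responses.)
|
|     def grim_consistent(reported_mean, n, decimals=2):
|         """A reported mean of n integer-valued responses must
|         equal (some integer sum) / n; check whether any nearby
|         integer sum rounds to the reported value."""
|         target = round(reported_mean, decimals)
|         guess = round(reported_mean * n)
|         for s in (guess - 1, guess, guess + 1):
|             if round(s / n, decimals) == target:
|                 return True
|         return False
|
|     # A mean of 3.48 over 19 integer responses is impossible:
|     # no integer divided by 19 rounds to 3.48.
|     print(grim_consistent(3.48, 19))   # False
|     print(grim_consistent(3.47, 19))   # True (66/19 = 3.4736...)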
|
| You don't have these sorts of problems in cryptography
| but a lot of fields are rife with it, especially if you
| use a definition of fraud that includes pseudoscientific
| practices. The article goes into some of the issues and
| arguments with how to define and measure it.
| matthewdgreen wrote:
| 0.04% is an extremely small number and (it needs to be
| said) also includes papers retracted due to errors and
| other good-faith corrections. Remember that _we want
| people to retract flawed papers_! Treating it as evidence
| of fraud is not only a mischaracterization of the result
| but also a choice that is bad for a society that wants
| quality scientific results.
|
| The other two metrics seem pretty weak. 1.9% of papers in
| a vast database containing 40 journals show signs of
| duplication. But then dig into the details: apparently a
| huge fraction of those are in one journal and in two
| specific years. Look at Figure 1 and it just screams
| "something very weird is going on here, let's look
| closely at this methodology before we accept the top line
| results."
|
| The final result is a meta-survey based on surveys done
| across scientists all over the world, including surveys
| that are written in other languages, presumably based on
| scientists also publishing in smaller local journals.
| Presumably this covers a vast range of scientists with
| different reputations. As I said before, if you cast a
| wide net that includes everyone doing science in the
| entire world, I bet you'll find tons of fraud. This study
| just seems to do that.
| mike_hearn wrote:
| The point about 0.04% is not that it's low, it's that it
| should be much higher. Getting even obviously fraudulent
| papers retracted is difficult and the image duplications
| are being found by unpaid volunteers, not via some
| comprehensive process so the numbers are lower bounds,
| not upper. You can find academic fraud in bulk with a
| tool as simple as grep and yet papers found that way are
| typically not retracted.
|
| Example, select the tortured phrases section of this
| database. It's literally nothing fancier than a big
| regex:
|
| https://dbrech.irit.fr/pls/apex/f?p=9999:24::::::
|
| Randomly chosen paper: https://link.springer.com/article/
| 10.1007/s11042-025-20660-1
|
| "A novel approach on heart disease prediction using
| optimized hybrid deep learning approach", published in
| Multimedia Tools and Applications.
|
| This paper has been run through a thesaurus spinner
| yielding garbage text like "To advance the expectation
| exactness of the anticipated heart malady location show"
| (heart disease -> heart malady). It also has nothing to
| do with the journal it's published in.
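|
| (A rough sketch of how little machinery that takes, in Python.
| This is my own illustration, not the screener's actual code;
| "heart malady" is the phrase from the paper above, and the
| other entries are typical examples from published tortured-
| phrase lists, included only for illustration.)
|
|     import re
|
|     TORTURED = [
|         r"heart malady",               # heart disease
|         r"counterfeit consciousness",  # artificial intelligence
|         r"irregular (?:forest|woodland|timberland)",  # random forest
|         r"bosom peril",                # breast cancer
|     ]
|     PATTERN = re.compile("|".join(TORTURED), re.IGNORECASE)
|
|     def flag_tortured_phrases(text):
|         """Return every suspicious phrase found in a paper."""
|         return PATTERN.findall(text)
|
|     print(flag_tortured_phrases(
|         "To advance the expectation exactness of the "
|         "anticipated heart malady location show"))
|     # ['heart malady']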
|
| Now you might object that the paper in question comes
| from India and not an R1 American university, which is
| how you're defining reputable. The journal itself does,
| though. It's edited by an academic in the Dept. of
| Computer Science and Engineering, Florida Atlantic
| University, which is an R1. It also has many dozens of
| people with the title of editor at other presumably
| reputable western universities like Brunel in the UK, the
| University of Salerno, etc:
|
| https://link.springer.com/journal/11042/editorial-board
|
| Clearly, none of the so-called editors of the journal can
| be reading what's submitted to it. Zombie journals run by
| well-known publishers like Springer Nature are common. They
| auto-publish blatant spam yet always have a gazillion
| editors at well known universities. This stuff is so
| basic both generation and detection predate LLMs
| entirely, but it doesn't get fixed.
|
| Then you get into all the papers that aren't trivially
| fake but fake in advanced undetectable ways, or which are
| merely using questionable research practices... the true
| rate of retraction if standards were at the level laymen
| imagine would be orders of magnitude higher.
| refulgentis wrote:
| I agree and am disappointed to see you in gray text. I'm
| old enough to have seen too many pendulum swings from new
| truth to thought-terminating cliche, and am increasingly
| frustrated by a game of telephone, over years, leading to
| it being common wisdom that research fraud is done all
| the time and it's shrugged off.
|
| There's some real irony in that, as we wouldn't have
| _gotten_ to this point without a ton of self-policing over
| years where it was exposed with great consequence.
| Onawa wrote:
| It depends, independent organizations that track this
| stuff are able to call out unethical research and make
| sure there is more than a slap on the wrist. I also
| suspect that things may get better as the NIH has forced
| all research to be in electronic lab notebooks and
| published in open access journals.
| https://x.com/RetractionWatch
| throwaway4220 wrote:
| On bsky: https://bsky.app/profile/retractionwatch.com
| cycomanic wrote:
| Which researchers are using plagiarism detectors? I'm not
| aware that this is a known and widely accepted practice.
| They are used by students and teachers for student papers
| (in courses etc.), but nobody I know would use them for
| submitting research. I also don't see why even unethical
| researchers would use one; it wouldn't increase your
| acceptance chances dramatically.
| pinko wrote:
| Normally I'm an AI skeptic, but in this case there's a good
| analogy to post-quantum crypto: even if the current state of
| the art allows fraudulent researchers to evade detection by
| today's AI by using today's AI, their results, once
| published, will remain unchanged as the AI improves, and
| tomorrow's AI will catch them...
| tmpz22 wrote:
| I think it's not always a world scale problem as scientific
| niches tend to be small communities. The challenge is to
| get these small communities to police themselves.
|
| For the rarer world-scale papers we can dedicate more
| resources to vetting them.
| atrettel wrote:
| Based on my own experience as a peer reviewer and
| scientist, the issue is not necessarily in detecting
| plagiarism or fraud. It is in getting editors to care after
| a paper is already published.
|
| During peer review, this could be great. It could stop a
| fraudulent paper before it causes any damage. But in my
| experience, I have never gotten a journal editor to retract
| an already-published paper that had obvious plagiarism in
| it (_very_ obvious plagiarism in one case!). They have no
| incentive to do extra work after the fact with no obvious
| benefit to themselves. They choose to ignore it instead. I
| wish it wasn't true, but that has been my experience.
| mike_hearn wrote:
| Doesn't matter. Lots of bad papers get caught the moment
| they're published and read by someone, but there's no
| followup. The institutions don't care if they publish auto-
| generated spam that can be detected on literally a single
| read through, they aren't going to deploy advanced AI on
| their archives of papers to create consequences a decade
| later:
|
| https://www.nature.com/articles/d41586-021-02134-0
| kkylin wrote:
| Every tool cuts both ways. This won't remove the need for
| people to be good, but hopefully reduces the scale of the
| problems to the point where good people (and better systems)
| can manage.
|
| FWIW while fraud gets headlines, unintentional errors and
| simply crappy writing are much more common and bigger
| problems I think. As reviewer and editor I often feel I'm the
| first one (counting the authors) to ever read the paper
| beginning to end: inconsistent notation & terminology,
| unnecessary repetitions, unexplained background material,
| etc.
| t_mann wrote:
| AI is fundamentally much more of a danger to the fraudsters.
| Because they can only calibrate their obfuscation to today's
| tools. But the publications are set in stone and can be
| analyzed by tomorrow's tools. There are already startups
| going through old papers with modern tools to detect
| manipulation [0].
|
| [0] https://imagetwin.ai/
| dsabanin wrote:
| Maybe at least in some cases these checkers will help them
| actually find and fix their mistakes and they will end up
| publishing something useful.
| callc wrote:
| Humans are already capable of "post-truth". This is enabled
| by instant global communication and social media (not
| dismissing the massive benefits these can bring), and led by
| dictators who want fealty over independent rational thinking.
|
| The limitations of slow news cycles and slow information
| transmission lend themselves to slow, careful thinking,
| especially compared to social media.
|
| No AI needed.
| hunter2_ wrote:
| The communication enabled by the internet is incredible,
| but this aspect of it is so frustrating. The cat is out of
| the bag, and I struggle to identify a solution.
|
| The other day I saw a Facebook post of a national park
| announcing they'd be closed until further notice. Thousands
| of comments, 99% of which were divisive political banter
| assuming this was the result of a top-down order. A very
| easy-to-miss 1% of the comments were people explaining that
| the closure was due to a burst pipe or something to that
| effect. It's reminiscent of the "tragedy of the commons"
| concept. We are overusing our right to spew nonsense to the
| point that it's masking the truth.
|
| How do we fix this? Guiding people away from the writings
| of random nobodies in favor of mainstream authorities
| doesn't feel entirely proper.
| Salgat wrote:
| My hope is that ML can be used to point out real-world things
| you can't fake or work around, such as whether an idea is
| actually novel, whether the methodology is just gaming
| results, or whether the statistics were done wrong.
| blueboo wrote:
| Just as plagiarism checkers harden the output of plagiarists.
|
| This goes back to a principle of safety engineering: the
| safer, more reliable, and more trustworthy you make the
| system, the more catastrophic the failures when they happen.
| jstummbillig wrote:
| We are "upgrading" from making errors to committing fraud. I
| think that difference will still be important to most people.
| In addition, I don't really see why an unethical, but not
| idiotic, researcher would assume that the same tool they
| could use to correct errors would not allow others to
| check for and spot the fraud they are thinking of committing
| instead.
| kubb wrote:
| As always, it depends on the precision.
|
| If the LLM spots a mistake with 90% precision, it's pretty
| good. If it's 10% precision, people might still take a look
| if they publish a paper once per year. If it's 1%, forget it.
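|
| (Back-of-the-envelope, with made-up numbers, for how much triage
| time each precision level costs per real error found:)
|
|     minutes_per_check = 15   # assumed time to triage one flag
|     for precision in (0.9, 0.1, 0.01):
|         # at precision p, you review 1/p flags per true error
|         wasted = (1 / precision - 1) * minutes_per_check
|         print(f"precision {precision}: ~{wasted:.0f} wasted "
|               f"minutes per real error")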
| throwoutway wrote:
| There needs to be some careful human-in-the-loop analysis in
| general, and a feedback loop for false positives.
| Groxx wrote:
| I very much suspect this will fall into the same behaviors as
| AI-submitted bug reports in software.
|
| Obviously it's useful when desired, they can find real issues.
| But it's also absolutely riddled with unchecked "CVE 11 fix
| now!!!" spam that isn't even correct, exhausting maintainers.
| Some of those are legitimate accidents, but many are just
| karma-farming for some other purpose, to appear like a
| legitimate effort by throwing plausible-looking work onto other
| people.
| asdf6969 wrote:
| There's no such thing as an obvious error in most fields. What
| would the AI say to someone who claimed the earth orbited the
| sun 1000 years ago? I don't know how it could ever know the
| truth unless it starts collecting its own information. It could
| be useful for a field that operates from first principles like
| math but more likely is that it just blocks everyone from
| publishing things that go against the orthodoxy.
| TylerE wrote:
| This is exactly the kind of task we need to be using AI for -
| not content generation, but these sort of long running behind
| the scenes things that are difficult for humans, and where
| false positives have minimal cost.
| flenserboy wrote:
| So long as they don't build the models to rely on earlier
| papers, it might work. Fraudulent or mistaken earlier work,
| taken as correct, could easily lead to newer papers which
| disagree with it, or don't use the older data, being flagged
| as wrong/mistaken. This sort of checking needs to drill down
| as far as possible.
| epidemiology wrote:
| AI tools are hopefully going to eat lots of manual scientific
| research. This article looks at error spotting, but follow
| the path of getting better and better at error spotting to its
| conclusion and you essentially reproduce the work entirely from
| scratch. So in fact AI study generation is really where this is
| going.
|
| All my work could honestly be done instantaneously with better
| data harmonization & collection along with better engineering
| practices. Instead, it requires a lot of manual effort. I
| remember my professors talking about how they used to calculate
| linear regressions by hand back in the old days. Hopefully a lot
| of the data cleaning and study setup that is done now sounds
| similar to a set of future scientists who use AI tools to run
| and check these basic programmatic and statistical tasks.
| zozbot234 wrote:
| I really really hope it doesn't. The last thing I ever want is
| to be living in a world where all the scientific studies are
| written by hallucinating stochastic parrots.
| latexr wrote:
| https://archive.ph/20250307115346/https://www.nature.com/art...
| sega_sai wrote:
| As a researcher I say it is a good thing. Provided it gives a
| small number of errors that are easy to check, it is a no-
| brainer. I would say it is more valuable for authors, though, to
| spot obvious issues. I don't think it will drastically change
| research, but it is an improvement over a spell check or running
| Grammarly.
| simonw wrote:
| "YesNoError is planning to let holders of its cryptocurrency
| dictate which papers get scrutinized first."
|
| Sigh.
| brookst wrote:
| Why sigh? This sounds like shareholders setting corporate
| direction.
| weebull wrote:
| Exactly. That's why sigh.
| jancsika wrote:
| Oh wow, you've got 10,000 HN points and you are asking why
| someone would sigh upon seeing that some technical tool has a
| close association with a cryptocurrency.
|
| Even people working reputable mom-and-pops retail jobs know
| the reputation of retail due to very real high-pressure sales
| techniques (esp. at car dealerships). Those techniques are
| undeniably "sigh-able," and reputable retail shops spend a
| lot of time and energy to distinguish themselves to their
| potential customers and distance themselves from that ick.
|
| Crypto also has an ick from its rich history of scams. I feel
| silly even explicitly writing that they have a history rich
| in scams because everyone on HN knows this.
|
| I could at least understand (though not agree) if you raised
| a question due to your knowledge of a _specific_
| cryptocurrency. But "Why sigh" for general crypto tie-in?
|
| I feel compelled to quote Tim and Eric: "Do you live in a
| hole, or boat?"
|
| Edit: clarification
| loufe wrote:
| Apart from the actual meat of the discussion, which is
| whether the GP's sigh is actually warranted, it's just
| frustrating to see everyone engage in such shallow
| expression. The one word comment could charitably be
| interpreted as thoughtful, in the sense that a lot of
| readers would take the time to understand their view-point,
| but I still think it should be discouraged as they could
| take some time to explain their thoughts more clearly.
| There shouldn't need to be a discussion on what they
| intended to convey.
|
| That said, your "you're _that_ experienced here and you
| didn't understand _that_" line really cheapens the
| quality of discourse here, too. It certainly doesn't live
| up to the HN guidelines
| (https://news.ycombinator.com/newsguidelines.html). You
| don't have to demean parent's question to deconstruct and
| disagree with it.
| multjoy wrote:
| You do when it is clearly nonsense.
|
| They're either entirely detached from reality in which
| case they deserve to be gently mocked, or they're
| trolling.
| jacobolus wrote:
| Let me quote Carl T. Bergstrom, evolutionary biologist
| and expert on research quality and misinformation:
|
| _" Is everyone huffing paint?"_
|
| _" Crypto guy claims to have built an LLM-based tool to
| detect errors in research papers; funded using its own
| cryptocurrency; will let coin holders choose what papers
| to go after; it's unvetted and a total black box--and
| Nature reports it as if it's a new protein structure."_
|
| https://bsky.app/profile/carlbergstrom.com/post/3ljsyoju3
| s22...
| cgriswald wrote:
| Other than "it's unvetted and a total black box", which
| is certainly a fair criticism, the rest of the quote
| seems to be an expression of emotion roughly equivalent
| to "sigh". We know Bergstrom doesn't like it, but the
| reasons are left as an exercise to the reader. If
| Bergstrom had posted that same post here, GP's comments
| about post quality would still largely apply.
| roywiggins wrote:
| yeah, but without all those pesky "securities laws" and so
| on.
| ForTheKidz wrote:
| Yes, exactly.
| delusional wrote:
| The nice thing about crypto plays is that you know they won't
| get anywhere, so you can safely ignore them. It's all going to
| collapse soon enough.
| yosito wrote:
| While I don't doubt that AI tools can spot some errors that would
| be tedious for humans to look for, they are also responsible for
| far more errors. That's why proper understanding and application
| of AI is important.
| tomrod wrote:
| I know academics that use it to make sure their arguments are
| grounded, after a meaningful draft. This helps them lay out
| their arguments more clearly, and IMO is no worse than the
| companies that used motivated graduate students to review the
| grammar and coherency of papers written by non-native
| speakers.
| webdoodle wrote:
| The push for AI is about controlling the narrative. By giving AI
| the editorial review process, it can control the direction of
| science, media and policy. Effectively controlling the course of
| human evolution.
|
| On the other hand, I'm fully supportive of going through ALL of
| the rejected scientific papers to look for editorial bias,
| censorship, propaganda, etc.
| TZubiri wrote:
| It's fine, since it's not really just AI, it's the crypto
| hackers in charge of the AI.
|
| As it stands there's always a company (juristic person) behind
| AIs, I haven't yet seen an independent AI.
| nathan_compton wrote:
| One thing about this is that these kinds of power
| struggles/jostlings are part of every single thing humans do at
| almost all times. There is no silver bullet that will extricate
| human beings from the condition of being human, only constant
| vigilance against the ever changing landscape of who is
| manipulating and how.
| bookofjoe wrote:
| https://archive.ph/fqAig
| TZubiri wrote:
| Didn't this YesNoError thing start as a memecoin?
| huijzer wrote:
| I think improving incentives is the real challenge in science.
| Tools aren't gonna fix it.
| gusgus01 wrote:
| I'm extremely skeptical of the value in this. I've already seen
| wasted hours responding to baseless claims that are lent credence
| by AI "reviews" of open source codebases. The claims would have
| happened before but these text generators know how to hallucinate
| in the correct verbiage to convince lay people and amateurs and
| are more annoying to deal with.
| RainyDayTmrw wrote:
| Perhaps our collective memories are too short? Did we forget what
| curl just went through with AI confabulated bug reports[1]?
|
| [1]: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-
| stands-f...
| InkCanon wrote:
| This sounds way, way outside how LLMs work. They can't count the
| R's in strarwberrrrrry, but they can cross reference multiple
| tables of data? Is there something else going on here?
| Groxx wrote:
| Accurately check: lol no chance at all, completely agreed.
|
| Detect deviations from common patterns, which are often pointed
| out via common patterns of review feedback on things, which
| might indicate a mistake: actually I think that fits moderately
| well.
|
| Are they accurate enough to use in bulk? .... given their
| accuracy with code bugs, I'm inclined to say "probably not",
| except by people already knowledgeable in the content. They can
| generally reject false positives without a lot of effort.
| lfsh wrote:
| I am using Jetbrain's AI to do code analysis (find errors).
|
| While it sometimes spots something I missed, it also gives a lot
| of confident 'advice' that is just wrong or not useful.
|
| Current AI tools are still sophisticated search engines. They
| cannot reason or think.
|
| So while I think it could spot some errors in research papers, I
| am still very sceptical that it is useful as a trusted source.
| _tom_ wrote:
| This basically turns research papers as a whole into a big
| generative adversarial network.
| sfink wrote:
| Don't forget that this is driven by present-day AI. Which means
| people will assume that it's checking for fraud and incorrect
| logic, when actually it's checking for self-consistency and
| consistency with training data. So it should be great for typos,
| misleading phrasing, and cross-checking facts and diagrams, but I
| would expect it to do little for manufactured data, plausible but
| incorrect conclusions, and garden variety bullshit (claiming X
| because Y, when Y only implies X because you have a reasonable-
| sounding argument that it ought to).
|
| Not all of that is out of reach. Making the AI evaluate a paper
| in the context of a cluster of related papers might enable
| spotting some "too good to be true" things.
|
| Hey, here's an idea: use AI for mapping out the influence of
| papers that were later retracted (whether for fraud or error, it
| doesn't matter). Not just via citation, but have it try to
| identify the no longer supported conclusions from a retracted
| paper, and see where they show up in downstream papers. (Cheap
| "downstream" is when a paper or a paper in a family of papers by
| the same team ever cited the upstream paper, even in preprints.
| More expensive downstream is doing it without citations.)
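|
| (The cheap "downstream" version is basically graph reachability.
| A toy sketch in Python with networkx and made-up paper IDs; the
| hard part, tracing no-longer-supported conclusions without a
| citation edge, is where the AI would come in.)
|
|     import networkx as nx
|
|     citations = nx.DiGraph()   # edge A -> B means "B cites A"
|     citations.add_edges_from([
|         ("retracted_2019", "followup_a"),
|         ("followup_a", "followup_b"),
|         ("unrelated", "followup_c"),
|     ])
|
|     tainted = set()
|     for paper in {"retracted_2019"}:
|         # everything reachable from a retracted paper
|         tainted |= nx.descendants(citations, paper)
|
|     print(sorted(tainted))   # ['followup_a', 'followup_b']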
| ForTheKidz wrote:
| > people will assume that it's checking for fraud and incorrect
| logic, when actually it's checking for self-consistency and
| consistency with training data.
|
| TBF, this also applies to all humans.
| Groxx wrote:
| There is a clear difference in _capability_ even though they
| share many failures
| nxobject wrote:
| To be fair, at least humans get to have collaborators from
| multiple perspectives and skillsets; a lot of the discussion
| about AI in research has assumed that a research team is one
| hive mind, when the best collaborations aren't.
| lucianbr wrote:
| No, no it does not. Are you actually claiming with a straight
| face that not a single human can check for fraud or incorrect
| logic?
|
| Let's just claim any absurd thing in defense of the AI hype
| now.
| ForTheKidz wrote:
| > Are you actually claiming with a straight face that not a
| single human can check for fraud or incorrect logic?
|
| No of course not, I was pointing out that we largely check
| "for self-consistency and consistency with training data"
| as well. Our checking of the coherency of other peoples
| work is presumably an extension of this.
|
| Regardless, computers _already_ check for fraud and
| incorrect logic as well, albeit in different contexts.
| Neither humans or computers can do this with general
| competency, i.e. without specific training to do so.
| timewizard wrote:
| They spent trillions of dollars to create a lame spell check.
| raincole wrote:
| If you can check for manufactured data, it means you know more
| about what the real data looks like than the author.
|
| If there were an AI that can check manufactured data, science
| would be a solved problem.
| BurningFrog wrote:
| In the not so far future we should have AIs that have read all
| the papers and other info in a field. They can then review any
| new paper as well as answer any questions in the field.
|
| This then becomes the first sanity check for any paper author.
|
| This should save a lot of time and effort, improve the quality of
| papers, and root out at least some fraud.
|
| Don't worry, many problems will remain :)
| surferbayarea wrote:
| Here are 2 examples from the Black Spatula project where we were
| able to detect major errors: - https://github.com/The-Black-
| Spatula-Project/black-spatula-p... - https://github.com/The-
| Black-Spatula-Project/black-spatula-p...
|
| Some things to note: this didn't even require a complex multi-
| agent pipeline. Single-shot prompting was able to detect these
| errors.
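|
| (For flavor, a minimal single-shot check can be as simple as the
| sketch below. This is my own illustration against an OpenAI-style
| chat API, not the project's actual pipeline; the model name and
| prompt wording are placeholders.)
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     PROMPT = (
|         "You are checking a research paper for errors. List any "
|         "arithmetic mistakes, unit inconsistencies, or claims "
|         "that contradict the paper's own tables. Quote the exact "
|         "passage for each issue; say 'none found' if there are "
|         "none."
|     )
|
|     def check_paper(paper_text):
|         resp = client.chat.completions.create(
|             model="gpt-4o",   # placeholder model name
|             messages=[
|                 {"role": "system", "content": PROMPT},
|                 {"role": "user", "content": paper_text},
|             ],
|         )
|         return resp.choices[0].message.content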
| systemstops wrote:
| When people start building tools like this to analyze media
| coverage of historic events, it will be a game changer.
| jongjong wrote:
| I expect that for truly innovative research, it might flag the
| innovative parts of the paper as a mistake if they're not fully
| elaborated upon... E.g. if the author assumed that the reader
| possesses certain niche knowledge.
|
| With software design, I find many mistakes in AI where it says
| things that are incorrect because it parrots common blanket
| statements and ideologies without actually checking if the
| statement applies in this case by looking at it from first
| principles... Once you take the discussion down to first
| principles, it quickly acknowledges its mistake but you had to
| have this deep insight in order to take it there... Some person
| who is trying to learn from AI would not get this insight from
| AI; instead they would be taught a dumbed-down, cartoonish,
| wordcel version of reality.
| delusional wrote:
| Reality check: yesnoerror, the only part of the article that
| actually seems to involve any published AI reviewer comments, is
| just checking arxiv papers. Their website claims that they
| "uncover errors, inconsistencies, and flawed methods that human
| reviewers missed." but arxiv is of course famously NOT a peer-
| reviewed journal. At best they are finding "errors,
| inconsistencies, and flawed methods" in papers that human
| reviewers haven't looked at.
|
| Let's then try and see if we can uncover any "errors,
| inconsistencies, and flawed methods" on their website. The
| "status" is pure madeup garbage. There's no network traffic
| related to it that would actually allow it to show a real status.
| The "RECENT ERROR DETECTIONS" lists a single paper from today,
| but looking at the queue when you click "submit a paper" lists
| the last completed paper as the 21st of February. The front page
| tells us that it found some math issue in a paper titled "Waste
| tea as absorbent for removal of heavy metal present in
| contaminated water" but if we navigate to that paper[1] the math
| error suddenly disappears. Most of the comments are also
| worthless, talking about minor typographical issues or
| misspellings that do not matter, but of course they still
| categorize that as an "error".
|
| It's the same garbage as every time with crypto people.
|
| [1]:
| https://yesnoerror.com/doc/82cd4ea5-4e33-48e1-b517-5ea3e2c5f...
| EigenLord wrote:
| The role of LLMs in research is an ongoing, well, research topic
| of interest of mine. I think it's fine so long as 1. a pair of
| human eyes has validated any of the generated outputs, and 2. the
| "ownership rule": the human researcher is prepared to defend and
| own anything the AI model does on their behalf, implying that
| they have digested and understood it as well as anything else
| they may have read or produced in the course of conducting their
| research. Rule #2 avoids this notion of crypto-plagiarism. If you
| prompted for a certain output, your thought in a manner of
| speaking was the cause of that output. If you agree with it, you
| should be able to use it. In this case, using AI to fact check is
| kind of ironic, considering their hallucination issues. However
| infallibility is the mark of omniscience; it's pretty
| unreasonable to expect these models to be flawless. They can
| still play a supplementary role to the review process, a second
| line of defense for peer-reviewers.
| robwwilliams wrote:
| Great start but definitely will require supervision by experts in
| the fields. I routinely use Claude 3.7 to flag errors in my
| submissions. Here is a prompt I used yesterday:
|
| "This is a paper we are planning to submit to Nature
| Neuroscience. Please generate a numbered list of significant
| errors with text tags I can use to find the errors and make
| corrections."
|
| It gave me a list of 12 errors, of which Claude labeled three as
| "inconsistencies", "methods discrepancies", and "contradictions".
| When I requested that Claude reconsider, it said "You are right, I
| apologize" in each of these three instances. Nonetheless it was
| still a big win for me and caught a lot of my Dummheiten.
|
| Claude 3.7 running in standard mode does not use its context
| window very effectively. I suppose I could have demanded that
| Claude "internally review (wait: think again)" for each serious
| error it initially thought it had encountered. I'll try that next
| time. Exposure of chain of thought would help.
| lifeisstillgood wrote:
| It's a nice idea, and I would love to be able to use it for my
| own company reports (spotting my obvious errors before sending
| them to my bosses boss)
|
| But the first thing I noticed was the two approaches highlighted
| - one a small scale approach that does not publish first but
| approaches the authors privately - and the other publishes first,
| does not have human review and has _its own cryptocurrency_
|
| I don't think anything quite speaks more about the current state
| of the world and the choices in our political space
| ysofunny wrote:
| top two links at this moment are:
|
| > AI tools are spotting errors in research papers: inside a
| growing movement (nature.com)
|
| and
|
| > Kill your Feeds - Stop letting algorithms dictate what you
| think (usher.dev)
|
| so we shouldn't let the feed algorithms influence our thoughts,
| but also, AI tools need to tell us when we're wrong
| TheRealPomax wrote:
| Oh look, an _actual_ use case for AI. Very nice.
| mac-mc wrote:
| Now they need to do it for their own outputs to spot their own
| hallucination errors.
| rosstex wrote:
| Why not just skip the human and have AI write, evaluate and
| submit the papers?
| gosub100 wrote:
| Why not skip the AI and remove "referees" that allow papers to
| be published that contain egregious errors?
| YeGoblynQueenne wrote:
| Needs more work.
|
| >> Right now, the YesNoError website contains many false
| positives, says Nick Brown, a researcher in scientific integrity
| at Linnaeus University. Among 40 papers flagged as having issues,
| he found 14 false positives (for example, the model stating that
| a figure referred to in the text did not appear in the paper,
| when it did). "The vast majority of the problems they're finding
| appear to be writing issues," and a lot of the detections are
| wrong, he says.
|
| >> Brown is wary that the effort will create a flood for the
| scientific community to clear up, as well as fuss about minor
| errors such as typos, many of which should be spotted during peer
| review
| (both projects largely look at papers in preprint repositories).
| Unless the technology drastically improves, "this is going to
| generate huge amounts of work for no obvious benefit", says
| Brown. "It strikes me as extraordinarily naive."
___________________________________________________________________
(page generated 2025-03-08 23:00 UTC)