[HN Gopher] AI tools are spotting errors in research papers
       ___________________________________________________________________
        
       AI tools are spotting errors in research papers
        
       Author : kgwgk
       Score  : 355 points
       Date   : 2025-03-07 22:54 UTC (1 day ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | more_corn wrote:
        | This is great to hear. A good use of AI if the false positives
        | can be controlled.
        
         | rokkamokka wrote:
         | False positives don't seem overly harmful here either, since
         | the main use would be bringing it to human attention for
         | further thought
        
           | wizzwizz4 wrote:
            | Is there an advantage over just reviewing papers with a
            | critical eye?
        
             | jeffbee wrote:
             | If reviewer effort is limited and the model has at least a
             | bias in the right direction.
        
               | wizzwizz4 wrote:
               | So I just need to make sure my fraud goes under the radar
               | of these AI tools, and then the limited reviewer effort
               | will be spent elsewhere.
        
             | estarkio wrote:
             | I think some people will find an advantage in flagging
             | untold numbers of research papers as frivolous or
             | fraudulent with minimal effort, while putting the burden of
             | re-proving the work on everyone else.
             | 
             | In other words, I fear this is a leap in Gish Gallop
             | technology.
        
             | Zigurd wrote:
              | There are probably 10x more problematic academic
              | publications than currently get flagged. Automating the
             | search for the likeliest candidates is going to be very
             | helpful by focusing the "critical eye" where it can make
             | the biggest difference.
        
               | epidemiology wrote:
                | The largest problem with most publications (in epi, and
                | in my opinion at least) is study design. Unfortunately,
                | faulty study design, and things like data cleaning, are
                | qualitative, nuanced, and difficult to catch with AI
               | unless it has access to the source data.
        
             | AlienRobot wrote:
             | Hopefully, one would use this to try to find errors in a
             | massive number of papers, and then go through the effort of
             | reviewing these papers themselves before bringing up the
              | issue. It makes no sense to push effort onto others just
             | because the AI said so.
        
           | nyrikki wrote:
           | Walking through their interface, it seems like when you click
            | through on the relatively few that aren't just tiny
           | spelling/formatting errors,
           | 
           | Like this style:
           | 
           | > Methodology check: The paper lacks a quantitative
           | evaluation or comparison to ground truth data, relying on a
           | purely qu...
           | 
           | They always seem to be edited to be simple formatting errors.
           | 
           | https://yesnoerror.com/doc/eb99aec0-a72a-45f7-bf2c-8cf2cbab1.
           | ..
           | 
            | If they can't improve that, the signal-to-noise ratio will be
            | too low and people will shut it off/ignore it.
            | 
            | Time is not free; cost people lots of time without them
            | seeing value, and almost any project will fail.
        
       | topaz0 wrote:
       | This is such a bad idea. Skip the first section and read the
       | "false positives" section.
        
         | afarah1 wrote:
         | I can see its usefulness as a screening tool, though I can also
         | see downsides similar to what maintainers face with AI
         | vulnerability reporting. It's an imperfect tool attempting to
         | tackle a difficult and important problem. I suppose its value
         | will be determined by how well it's used and how well it
         | evolves.
        
         | camdenreslink wrote:
         | Aren't false positives acceptable in this situation? I'm
         | assuming a human (paper author, journal editor, peer reviewer,
         | etc) is reviewing the errors these tools are identifying. If
          | there is a 10% false positive rate, then the only cost is the
          | wasted time of whoever needs to identify that it's a false
          | positive.
         | 
         | I guess this is a bad idea if these tools replace peer
         | reviewers altogether, and papers get published if they can get
         | past the error checker. But I haven't seen that proposed.
        
           | xeonmc wrote:
           | Let me tell you about this thing called Turnitin and how it
           | was a purely advisory screening tool...
        
           | csa wrote:
           | > I'm assuming a human (paper author, journal editor, peer
           | reviewer, etc) is reviewing the errors these tools are
           | identifying.
           | 
           | This made me laugh so hard that I was almost crying.
           | 
           | For a specific journal, editor, or reviewer, _maybe_. For
           | most journals, editors, or reviewers... I would bet money
           | against it.
        
             | karaterobot wrote:
             | You'd win that bet. Most journal reviewers don't do more
             | than check that data _exists_ as part of the peer review
             | process--the equivalent of typing `ls` and looking at the
             | directory metadata. They pretty much never run their own
             | analyses to double check the paper. When I say  "pretty
             | much never", I mean that when I interviewed reviewers and
             | asked them if they had ever done it, none of them said yes,
             | and when I interviewed journal editors--from significant
             | journals--only one of them said their policy was to even
             | ask reviewers to do it, and that it was still optional. He
             | said he couldn't remember if anyone had ever claimed to do
             | it during his tenure. So yeah, if you get good odds on it,
             | take that bet!
        
           | RainyDayTmrw wrote:
           | That screams "moral hazard"[1] to me. See also the incident
           | with curl and AI confabulated bug reports[2].
           | 
           | [1]: Maybe not in the strict original sense of the phrase.
           | More like, an incentive to misbehave and cause downstream
           | harm to others. [2]:
           | https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-
           | stands-f...
        
           | nxobject wrote:
           | > is reviewing the errors these tools are identifying.
           | 
           | Unfortunately, no one has the incentives or the resources to
            | do doubly or triply thorough fine-tooth combing: no reviewer or
           | editor's getting paid; tenure-track researchers who need the
           | service to the discipline check mark in their tenure
           | portfolios also need to churn out research...
        
           | topaz0 wrote:
           | Note that the section with that heading also discusses
           | several other negative features.
           | 
            | The only false positive rate mentioned in the article is more
            | like 30%, and the true positives in that sample were mostly
            | trivial mistakes (as in, having no effect on the validity of
            | the message). And that is in preprints that have not been peer
            | reviewed, so one would expect the false positive rate to be
            | much worse after peer review (the true positives would
            | decrease while the false positives remain).
           | 
           | And every indication both from the rhetoric of the people
           | developing this and from recent history is that it would
           | almost never be applied in good faith, and instead would
           | empower ideologically motivated bad actors to claim that
           | facts they disapprove of are inadequately supported. That
           | kind of user does not care if the "errors" are false
           | positives or trivial.
           | 
           | Other comments have made good points about some of the other
           | downsides.
        
         | whatever1 wrote:
          | Just consider it an additional mean reviewer who is most
          | likely wrong. There is still value in debunking their false
         | claims.
        
         | rs186 wrote:
          | I don't see this as a worse idea than an AI code reviewer. If it
         | spits out irrelevant advice and only gets 1 out of 10 points
         | right, I consider it a win, since the cost is so low and many
         | humans can't catch subtle issues in code.
        
           | dartos wrote:
           | You're missing the bit where humans can be held responsible
           | and improve over time with specific feedback.
           | 
           | AI models only improve through training and good luck
           | convincing any given LLM provider to improve their models for
           | your specific use case unless you have deep pockets...
        
             | roywiggins wrote:
             | And people's willingness to outsource their judgement to a
             | computer. If a computer says it, for some people, it's the
             | end of the matter.
        
         | LasEspuelas wrote:
         | Deploying this on already published work is probably a bad
         | idea. But what is wrong with working with such tools on
         | submission and review?
        
         | aeturnum wrote:
         | Being able to have a machine double check your work for
         | problems that you fix or dismiss as false seems great? If the
         | bad part is "AI knows best" - I agree with that! Properly
         | deployed, this would be another tool in line with peer review
         | that helps the scientific community judge the value of new
         | work.
        
         | zulban wrote:
         | There's also a ton of false positives with spellcheck on
         | scientific papers, but it's obviously a useful tool. Humans
         | review the results.
        
       | crazygringo wrote:
       | This actually feels like an amazing step in the right direction.
       | 
       | If AI can help spot obvious errors in published papers, it can do
       | it as part of the review process. And if it can do it as part of
       | the review process, authors can run it on their own work before
       | submitting. It could massively raise the quality level of a lot
       | of papers.
       | 
       | What's important here is that it's part of a process involving
       | experts themselves -- the authors, the peer reviewers. They can
        | easily dismiss false positives, and crucially they get warnings
        | about statistical mistakes or other aspects of the paper that
        | aren't their primary area of expertise but can contain gotchas.
        
         | yojo wrote:
         | Relatedly: unethical researchers could run it on their own work
         | before submitting. It could massively raise the plausibility of
         | fraudulent papers.
         | 
         | I hope your version of the world wins out. I'm still trying to
         | figure out what a post-trust future looks like.
        
           | rererereferred wrote:
            | Eventually the unethical researchers will have to do actual
           | research to make their papers pass. Mission fucking
           | accomplished https://xkcd.com/810/
        
           | brookst wrote:
           | Both will happen. But the world has been post-trust for
           | millennia.
        
             | GuestFAUniverse wrote:
             | Maybe raise the "accountability" part?
             | 
             | Baffles me that somebody can be professor, director,
             | whatever, meaning: taking the place of somebody _really_
             | qualified and not get dragged through court after
             | falsifying a publication until nothing is left of that
             | betrayer.
             | 
             | It's not only the damage to society due to false,
             | misleading claims. If those publications decide who gets
              | tenure, a research grant, etc., the careers of others are
              | massively damaged.
        
               | StableAlkyne wrote:
               | A retraction due to fraud already torches your career.
               | It's a black mark that makes it harder to get funding,
               | and it's one of the few reasons a university might revoke
               | tenure. And you will be explaining it to every future
               | employer in an interview.
               | 
               | There generally aren't penalties beyond that in the West
               | because - outside of libel - lying is usually protected
               | as free speech
        
           | shkkmo wrote:
           | > unethical researchers could run it on their own work before
           | submitting. It could massively raise the plausibility of
           | fraudulent papers
           | 
           | The real low hanging fruit that this helps with is detecting
           | accidental errors and preventing researchers with legitimate
           | intent from making mistakes.
           | 
           | Research fraud and its detection is always going to be an
           | adversarial process between those trying to commit it and
           | those trying to detect it. Where I see tools like this making
           | a difference against fraud is that it may also make fraud
           | harder to plausibly pass off as errors if the fraudster gets
           | caught. Since the tools can improve over time, I think this
           | increases the risk that research fraud will be detected by
           | tools that didn't exist when the fraud was perpetrated and
           | which will ideally lead to consequences for the fraudster.
           | This risk will hopefully dissuade some researchers from
           | committing fraud.
        
           | SubiculumCode wrote:
            | I already ask AI to be a harsh reviewer on a manuscript
           | before submitting it. Sometimes blunders are there because of
           | how close you are to the work. It hadn't occurred to me that
           | bad "scientists" could use it to avoid detection
        
             | SubiculumCode wrote:
             | I would add that I've never gotten anything particularly
              | insightful in return... but it has pointed out some things
             | that could be written more clearly, or where I forgot to
             | cite a particular standardized measure, etc.
        
           | 7speter wrote:
           | Peer review will still involve human experts, though?
        
           | rs186 wrote:
            | Students and researchers send their own papers to a plagiarism
            | checker to look for "real" and unintended flags before
           | actually submitting the papers, and make revisions
           | accordingly. This is a known, standard practice that is
           | widely accepted.
           | 
           | And let's say someone modifies their faked lab results so
           | that no AI can detect any evidence of photoshopping images.
           | Their results get published. Well, nobody will be able to
           | reproduce their work (unless other people also publish
           | fraudulent work from there), and fellow researchers will
           | raise questions, like, a lot of them. Also, guess what, even
           | today, badly photoshopped results often don't get caught for
           | a few years, and in hindsight that's just some low effort
            | image manipulation -- copying part of an image and pasting it
           | elsewhere.
           | 
           | I doubt any of this changes anything. There is a lot of
           | competition in academia, and depending on the field, things
            | may move very fast. Getting fraudulent work past AI detection
            | likely doesn't give anyone enough of an advantage
           | to survive in a competitive field.
        
             | dccsillag wrote:
             | I've never seen this done in a research setting. Not sure
             | about how much of a standard practice it is.
        
               | StableAlkyne wrote:
               | It may be field specific, but I've also never heard of
               | anyone running a manuscript through a plagiarism checker
               | in chemistry.
        
             | abirch wrote:
             | You're right that this won't change the incentives for the
             | dishonest researchers. Unfortunately there's not an
             | equivalent of "short sellers" in research, people who are
              | incentivized to find fraud.
             | 
             | AI is definitely a good thing (TM) for those honest
             | researchers.
        
             | owl_vision wrote:
             | Unless documented and reproducible, it does not exist. This
             | was the minimum guide when I worked with researchers.
             | 
             | I plus 1 your doubt in the last paragraph.
        
             | BurningFrog wrote:
              | I'm not in academia, but what I hear is that attempts to
              | reproduce results are very rare.
             | 
             | So if you publish an unreproducible paper, you can probably
             | have a full career without anyone noticing.
        
               | jfengel wrote:
               | Papers that can't be reproduced sound like they're not
               | very useful, either.
               | 
               | I know it's not as simple as that, and "useful" can
               | simply mean "cited" (a sadly overrated metric). But
               | surely it's easier to get hired if your work actually
               | results in something somebody uses.
        
               | dgfitz wrote:
               | > Papers that can't be reproduced sound like they're not
               | very useful, either.
               | 
               | They're not useful at all. Reproduction of results isn't
               | sexy, nobody does it. Almost feels like science is built
                | on a web of funding trying to buy the desired results.
        
               | qpiox wrote:
               | Reproduction is rarely done because it is not "new
               | science". Everyone is funding only "new science".
        
               | jfengel wrote:
               | Reproduction is boring, but it would often happen
               | incidentally to building off someone else's results.
               | 
               | You tell me that this reaction creates X, and I need X to
               | make Y. If I can't make my Y, sooner or later it's going
               | to occur to me that X is the cause.
               | 
               | Like I said, I know it's never that easy. Bench work is
               | hard and there are a million reasons why your idea
               | failed, and you may not take the time to figure out why.
               | You won't report such failures. And complicated results,
               | like in sociology, are rarely attributable to anything.
        
               | mike_hearn wrote:
               | That's true for some kinds of research but a lot of
               | academic output isn't as firm as "X creates Y".
               | 
               | Replicability is overrated anyway. Loads of bad papers
               | will replicate just fine if you try. They're still making
               | false claims.
               | 
               | https://blog.plan99.net/replication-studies-cant-fix-
               | science...
        
               | air7 wrote:
                | I've had this idea that reproduction studies on one's CV
               | should become a sort of virtue signal, akin to
               | philanthropy among the rich. This way, some percentage of
               | one's work would need to be reproduction work or
               | otherwise they would be looked down upon, and this would
                | create the right incentive to do so.
        
               | qpiox wrote:
               | The reality is a bit different.
               | 
               | The "better" journals are listed in JCR. Nearly 40% of
                | them have an impact factor below 1, which means that on
                | average papers in them are cited less than once.
               | 
               | Conclusion: even in better journals, the average paper is
                | rarely cited at all, which means the public has almost
                | certainly never heard of it or found it useful.
        
               | gopher_space wrote:
               | Papers are reproducible in exactly the same way that
               | github projects are buildable, and in both cases anything
               | that comes fully assembled for you is already a product.
               | 
               | If your academic research results in immediately useful
               | output all of the people waiting for that to happen step
               | in and you no longer worry about employment.
        
             | azan_ wrote:
             | >Their results get published. Well, nobody will be able to
             | reproduce their work (unless other people also publish
             | fraudulent work from there), and fellow researchers will
             | raise questions, like, a lot of them.
             | 
             | Sadly you seem to underestimate how widespread fraud is in
             | academia and overestimate how big the punishment is. In the
             | worst case when someone finds you are guilty of fraud, you
                | will get a slap on the wrist. In the usual case absolutely
             | nothing will happen and you will be free to keep publishing
             | fraud.
        
               | matthewdgreen wrote:
               | I don't actually believe that this is true if "academia"
               | is defined as the set of reputable researchers from R1
               | schools and similar. If you define Academia as "anyone
               | anywhere in the world who submits research papers" then
               | yes, _it has vast amounts of fraud_ in the same way that
               | most email is spam.
               | 
               | Within the reputable set, as someone convinced that fraud
               | is out of control, have you ever tried to calculate the
               | fraud rate as a percentage with numerator and denominator
               | (either number of papers published or number of reputable
                | researchers)? I would be very interested and stunned if it
               | was over .1% or even .01%.
        
               | azan_ wrote:
               | There is lots of evidence that p-hacking is widespread
               | (some estimate that up to 20% are p-hacked). This problem
                | also exists in top institutions; in fact in some fields it
               | appears that this problem is WORSE in higher ranking unis
               | - https://mitsloan.mit.edu/sites/default/files/inline-
               | files/P-...
        
               | cycomanic wrote:
               | Where is that evidence? The paper you cite suggests that
               | p hacking is done in experimental accounting studies but
               | not archival.
               | 
               | Generally speaking, evidence suggests that fraud rates
                | are low (lower than in most other human endeavours).
               | This study cites 2% [1]. This is similar to numbers that
                | Elizabeth Bik reports. For comparison, self-reported
                | doping rates were between 6% and 9% here [2].
               | 
               | [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC5723807/ [2]
               | https://pmc.ncbi.nlm.nih.gov/articles/PMC11102888/
        
               | mike_hearn wrote:
               | The 2% figure isn't a study of the fraud rate, it's just
               | a survey asking academics if they've committed fraud
               | themselves. Ask them to estimate how many other academics
               | commit fraud and they say more like 10%-15%.
        
               | signatoremo wrote:
               | So which figure is more accurate in your opinion?
        
               | mike_hearn wrote:
               | See my other reply to Matthew. It's very dependent on how
               | you define fraud, which field you look at, which country
               | you look at, and a few other things.
               | 
               | Depending on what you choose for those variables it can
               | range from a few percent up to 100%.
        
               | mike_hearn wrote:
               | There's an article that explores the metrics here:
               | 
               | https://fantasticanachronism.com/2020/08/11/how-many-
               | undetec...
               | 
               |  _> 0.04% of papers are retracted. At least 1.9% of
               | papers have duplicate images  "suggestive of deliberate
               | manipulation". About 2.5% of scientists admit to fraud,
               | and they estimate that 10% of other scientists have
               | committed fraud. 27% of postdocs said they were willing
               | to select or omit data to improve their results. More
               | than 50% of published findings in psychology are false.
               | The ORI, which makes about 13 misconduct findings per
               | year, gives a conservative estimate of over 2000
               | misconduct incidents per year._
               | 
               | Although publishing untrue claims isn't the same thing as
               | fraud, editors of well known journals like The Lancet or
               | the New England Journal of Medicine have estimated that
               | maybe half or more of the claims they publish are wrong.
               | Statistical consistency detectors run over psych papers
               | find that ~50% fail such checks (e.g. that computed means
               | are possible given the input data). The authors don't
                | care: when asked to share their data so the causes of the
                | check failures can be explored, they just refuse or ignore
               | the request, even if they signed a document saying they'd
               | share.
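                | 
                | Those checks are simple enough to sketch. A rough
                | illustration of the GRIM-style idea (a mean reported to
                | two decimals has to be reachable as an integer sum divided
                | by N; the function name and example numbers are mine, not
                | taken from any particular tool):
                | 
                |     def grim_consistent(mean, n, decimals=2):
                |         # The reported mean must equal some integer sum s
                |         # divided by n after rounding; only the integers
                |         # nearest to mean*n can possibly work.
                |         candidate = round(mean * n)
                |         target = round(mean, decimals)
                |         for s in (candidate - 1, candidate, candidate + 1):
                |             if s >= 0 and round(s / n, decimals) == target:
                |                 return True
                |         return False
                | 
                |     grim_consistent(3.50, 28)  # True: 98/28 == 3.50
                |     grim_consistent(5.19, 28)  # False: no integer sum works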
               | 
               | You don't have these sorts of problems in cryptography
               | but a lot of fields are rife with it, especially if you
               | use a definition of fraud that includes pseudoscientific
               | practices. The article goes into some of the issues and
               | arguments with how to define and measure it.
        
               | matthewdgreen wrote:
               | 0.04% is an extremely small number and (it needs to be
               | said) also includes papers retracted due to errors and
               | other good-faith corrections. Remember that _we want
               | people to retract flawed papers_! treating it as evidence
               | of fraud is not only a mischaracterization of the result
               | but also a choice that is bad for a society that wants
               | quality scientific results.
               | 
               | The other two metrics seem pretty weak. 1.9% of papers in
               | a vast database containing 40 journals show signs of
               | duplication. But then dig into the details: apparently a
               | huge fraction of those are in one journal and in two
               | specific years. Look at Figure 1 and it just screams
               | "something very weird is going on here, let's look
               | closely at this methodology before we accept the top line
               | results."
               | 
               | The final result is a meta-survey based on surveys done
               | across scientists all over the world, including surveys
               | that are written in other languages, presumably based on
               | scientists also publishing in smaller local journals.
               | Presumably this covers a vast range of scientists with
               | different reputations. As I said before, if you cast a
               | wide net that includes everyone doing science in the
               | entire world, I bet you'll find tons of fraud. This study
               | just seems to do that.
        
               | mike_hearn wrote:
               | The point about 0.04% is not that it's low, it's that it
               | should be much higher. Getting even obviously fraudulent
               | papers retracted is difficult and the image duplications
               | are being found by unpaid volunteers, not via some
               | comprehensive process so the numbers are lower bounds,
               | not upper. You can find academic fraud in bulk with a
               | tool as simple as grep and yet papers found that way are
               | typically not retracted.
               | 
               | Example, select the tortured phrases section of this
               | database. It's literally nothing fancier than a big
               | regex:
               | 
               | https://dbrech.irit.fr/pls/apex/f?p=9999:24::::::
               | 
               | Randomly chosen paper: https://link.springer.com/article/
               | 10.1007/s11042-025-20660-1
               | 
               | "A novel approach on heart disease prediction using
               | optimized hybrid deep learning approach", published in
               | Multimedia Tools and Applications.
               | 
               | This paper has been run through a thesaurus spinner
               | yielding garbage text like "To advance the expectation
               | exactness of the anticipated heart malady location show"
               | (heart disease -> heart malady). It also has nothing to
               | do with the journal it's published in.
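                | 
                | To make concrete how shallow that kind of detection can
                | be, here's a rough sketch of the idea (the phrase list and
                | code are illustrative, not the actual screener's rules):
                | 
                |     import re
                | 
                |     # Known "tortured phrase" substitutions produced by
                |     # thesaurus spinners, mapped to the standard term.
                |     TORTURED = {
                |         r"\bheart malady\b": "heart disease",
                |         r"\bcounterfeit consciousness\b": "artificial intelligence",
                |         r"\bprofound learning\b": "deep learning",
                |         r"\bflag to commotion\b": "signal to noise",
                |     }
                | 
                |     def flag_tortured_phrases(text):
                |         return [(p, std) for p, std in TORTURED.items()
                |                 if re.search(p, text, flags=re.IGNORECASE)]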
               | 
               | Now you might object that the paper in question comes
               | from India and not an R1 American university, which is
               | how you're defining reputable. The journal itself does,
               | though. It's edited by an academic in the Dept. of
               | Computer Science and Engineering, Florida Atlantic
               | University, which is an R1. It also has many dozens of
               | people with the title of editor at other presumably
               | reputable western universities like Brunel in the UK, the
               | University of Salerno, etc:
               | 
               | https://link.springer.com/journal/11042/editorial-board
               | 
               | Clearly, none of the so-called editors of the journal can
               | be reading what's submitted to it. Zombie journals run by
                | well known publishers like Springer Nature are common. They
               | auto-publish blatant spam yet always have a gazillion
               | editors at well known universities. This stuff is so
               | basic both generation and detection predate LLMs
               | entirely, but it doesn't get fixed.
               | 
               | Then you get into all the papers that aren't trivially
               | fake but fake in advanced undetectable ways, or which are
               | merely using questionable research practices... the true
               | rate of retraction if standards were at the level laymen
               | imagine would be orders of magnitude higher.
        
               | refulgentis wrote:
               | I agree and am disappointed to see you in gray text. I'm
               | old enough to have seen too many pendulum swings from new
               | truth to thought-terminating cliche, and am increasingly
               | frustrated by a game of telephone, over years, leading to
               | it being common wisdom that research fraud is done all
                | the time and it's shrugged off.
               | 
               | There's some real irony in that, as we wouldn't have
                | _gotten_ to this point without a ton of self-policing over
                | years where it was exposed with great consequence.
        
               | Onawa wrote:
               | It depends, independent organizations that track this
               | stuff are able to call out unethical research and make
               | sure there is more than a slap on the wrist. I also
               | suspect that things may get better as the NIH has forced
               | all research to be in electronic lab notebooks and
               | published in open access journals.
               | https://x.com/RetractionWatch
        
               | throwaway4220 wrote:
               | On bsky: https://bsky.app/profile/retractionwatch.com
        
             | cycomanic wrote:
             | Which researchers are using plagiarism detectors? I'm not
             | aware that this is a known and widely accepted practice.
             | They are used by students and teachers for student papers
             | (in courses etc), but nobody i know would use them for
              | submitting research. I also don't see why even unethical
              | researchers would use it; it wouldn't increase your
              | acceptance chances dramatically.
        
           | pinko wrote:
           | Normally I'm an AI skeptic, but in this case there's a good
           | analogy to post-quantum crypto: even if the current state of
           | the art allows fraudulent researchers to evade detection by
           | today's AI by using today's AI, their results, once
           | published, will remain unchanged as the AI improves, and
           | tomorrow's AI will catch them...
        
             | tmpz22 wrote:
             | I think it's not always a world scale problem as scientific
             | niches tend to be small communities. The challenge is to
             | get these small communities to police themselves.
             | 
             | For the rarer world scale papers we can dedicate more
              | resources to vetting them.
        
             | atrettel wrote:
             | Based on my own experience as a peer reviewer and
             | scientist, the issue is not necessarily in detecting
             | plagiarism or fraud. It is in getting editors to care after
             | a paper is already published.
             | 
             | During peer review, this could be great. It could stop a
             | fraudulent paper before it causes any damage. But in my
             | experience, I have never gotten a journal editor to retract
             | an already-published paper that had obvious plagiarism in
             | it ( _very_ obvious plagiarism in one case!). They have no
             | incentive to do extra work after the fact with no obvious
             | benefit to themselves. They choose to ignore it instead. I
              | wish it wasn't true, but that has been my experience.
        
             | mike_hearn wrote:
             | Doesn't matter. Lots of bad papers get caught the moment
             | they're published and read by someone, but there's no
             | followup. The institutions don't care if they publish auto-
             | generated spam that can be detected on literally a single
              | read through; they aren't going to deploy advanced AI on
             | their archives of papers to create consequences a decade
             | later:
             | 
             | https://www.nature.com/articles/d41586-021-02134-0
        
           | kkylin wrote:
           | Every tool cuts both ways. This won't remove the need for
           | people to be good, but hopefully reduces the scale of the
           | problems to the point where good people (and better systems)
           | can manage.
           | 
           | FWIW while fraud gets headlines, unintentional errors and
           | simply crappy writing are much more common and bigger
           | problems I think. As reviewer and editor I often feel I'm the
           | first one (counting the authors) to ever read the paper
           | beginning to end: inconsistent notation & terminology,
           | unnecessary repetitions, unexplained background material,
           | etc.
        
           | t_mann wrote:
           | AI is fundamentally much more of a danger to the fraudsters.
           | Because they can only calibrate their obfuscation to today's
           | tools. But the publications are set in stone and can be
           | analyzed by tomorrow's tools. There are already startups
           | going through old papers with modern tools to detect
           | manipulation [0].
           | 
           | [0] https://imagetwin.ai/
        
           | dsabanin wrote:
           | Maybe at least in some cases these checkers will help them
           | actually find and fix their mistakes and they will end up
           | publishing something useful.
        
           | callc wrote:
           | Humans are already capable of "post-truth". This is enabled
           | by instant global communication and social media (not
           | dismissing the massive benefits these can bring), and led by
           | dictators who want fealty over independent rational thinking.
           | 
           | The limitations of slow news cycles and slow information
            | transmission lend themselves to slow, careful thinking,
            | especially compared to social media.
           | 
           | No AI needed.
        
             | hunter2_ wrote:
             | The communication enabled by the internet is incredible,
             | but this aspect of it is so frustrating. The cat is out of
             | the bag, and I struggle to identify a solution.
             | 
             | The other day I saw a Facebook post of a national park
             | announcing they'd be closed until further notice. Thousands
             | of comments, 99% of which were divisive political banter
             | assuming this was the result of a top-down order. A very
             | easy-to-miss 1% of the comments were people explaining that
             | the closure was due to a burst pipe or something to that
             | effect. It's reminiscent of the "tragedy of the commons"
             | concept. We are overusing our right to spew nonsense to the
             | point that it's masking the truth.
             | 
             | How do we fix this? Guiding people away from the writings
             | of random nobodies in favor of mainstream authorities
             | doesn't feel entirely proper.
        
           | Salgat wrote:
           | My hope is that ML can be used to point out real world things
           | you can't fake or work around, such as why an idea is
           | considered novel or why the methodology isn't just gaming
            | results or why the statistics were done wrong.
        
           | blueboo wrote:
           | Just as plagiarism checkers harden the output of plagiarists.
           | 
           | This goes back to a principle of safety engineering: the
            | safer, more reliable, and more trustworthy you make the
            | system, the more catastrophic the failures when they happen.
        
           | jstummbillig wrote:
           | We are "upgrading" from making errors to committing fraud. I
           | think that difference will still be important to most people.
           | In addition I don't really see why an unethical, but not
            | idiotic, researcher would assume that the same tool that
            | they could use to correct errors would not allow others to
           | check for and spot the fraud they are thinking of committing
           | instead.
        
         | kubb wrote:
         | As always, it depends on the precision.
         | 
         | If the LLM spots a mistake with 90% precision, it's pretty
          | good. At 10% precision, people still might take a look
         | if they publish a paper once per year. If it's 1% - forget it.
        
         | throwoutway wrote:
         | There needs to be some careful human-in-the-loop analysis in
         | general, and a feedback loop for false positives.
        
         | Groxx wrote:
         | I very much suspect this will fall into the same behaviors as
         | AI-submitted bug reports in software.
         | 
         | Obviously it's useful when desired, they can find real issues.
         | But it's also absolutely riddled with unchecked "CVE 11 fix
         | now!!!" spam that isn't even correct, exhausting maintainers.
         | Some of those are legitimate accidents, but many are just
         | karma-farming for some other purpose, to appear like a
         | legitimate effort by throwing plausible-looking work onto other
         | people.
        
         | asdf6969 wrote:
         | There's no such thing as an obvious error in most fields. What
         | would the AI say to someone who claimed the earth orbited the
         | sun 1000 years ago? I don't know how it could ever know the
         | truth unless it starts collecting its own information. It could
         | be useful for a field that operates from first principles like
         | math but more likely is that it just blocks everyone from
         | publishing things that go against the orthodoxy.
        
         | TylerE wrote:
         | This is exactly the kind of task we need to be using AI for -
         | not content generation, but these sort of long running behind
         | the scenes things that are difficult for humans, and where
         | false positives have minimal cost.
        
         | flenserboy wrote:
         | So long as they don't build the models to rely on earlier
         | papers, it might work. Fraudulent or mistaken earlier work,
          | taken as correct, could easily lead to newer papers that
          | disagree with, or don't use, the older data being flagged as
          | wrong/mistaken. This
         | sort of checking needs to drill down as far as possible.
        
       | epidemiology wrote:
       | AI tools are hopefully going to eat lots of manual scientific
        | research. This article looks at error spotting, but if you follow
        | the path of getting better and better at error spotting to its
        | conclusion, you essentially reproduce the work entirely from
       | scratch. So in fact AI study generation is really where this is
       | going.
       | 
       | All my work could honestly be done instantaneously with better
       | data harmonization & collection along with better engineering
       | practices. Instead, it requires a lot of manual effort. I
       | remember my professors talking about how they used to calculate
       | linear regressions by hand back in the old days. Hopefully a lot
        | of the data cleaning and study setup that is done now will sound
        | similarly antiquated to future scientists who use AI tools to run
        | and check these basic programmatic and statistical tasks.
        
         | zozbot234 wrote:
         | I really really hope it doesn't. The last thing I ever want is
         | to be living in a world where all the scientific studies are
         | written by hallucinating stochastic parrots.
        
       | latexr wrote:
       | https://archive.ph/20250307115346/https://www.nature.com/art...
        
       | sega_sai wrote:
       | As a researcher I say it is a good thing. Provided it gives a
       | small number of errors that are easy to check, it is a no-
       | brainer. I would say it is more valuable for authors though to
       | spot obvious issues. I don't think it will drastically change the
       | research, but is an improvement over a spell check or running
       | grammarly.
        
       | simonw wrote:
       | "YesNoError is planning to let holders of its cryptocurrency
       | dictate which papers get scrutinized first."
       | 
       | Sigh.
        
         | brookst wrote:
         | Why sigh? This sounds like shareholders setting corporate
         | direction.
        
           | weebull wrote:
           | Exactly. That's why sigh.
        
           | jancsika wrote:
           | Oh wow, you've got 10,000 HN points and you are asking why
           | someone would sigh upon seeing that some technical tool has a
           | close association with a cryptocurrency.
           | 
            | Even people working reputable mom-and-pop retail jobs know
           | the reputation of retail due to very real high-pressure sales
           | techniques (esp. at car dealerships). Those techniques are
           | undeniably "sigh-able," and reputable retail shops spend a
           | lot of time and energy to distinguish themselves to their
           | potential customers and distance themselves from that ick.
           | 
           | Crypto also has an ick from its rich history of scams. I feel
           | silly even explicitly writing that they have a history rich
           | in scams because everyone on HN knows this.
           | 
           | I could at least understand (though not agree) if you raised
           | a question due to your knowledge of a _specific_
           | cryptocurrency. But  "Why sigh" for general crypto tie-in?
           | 
           | I feel compelled to quote Tim and Eric: "Do you live in a
           | hole, or boat?"
           | 
           | Edit: clarification
        
             | loufe wrote:
             | Apart from the actual meat of the discussion, which is
             | whether the GP's sigh is actually warranted, it's just
             | frustrating to see everyone engage in such shallow
             | expression. The one word comment could charitably be
             | interpreted as thoughtful, in the sense that a lot of
              | readers would take the time to understand their viewpoint,
             | but I still think it should be discouraged as they could
             | take some time to explain their thoughts more clearly.
             | There shouldn't need to be a discussion on what they
             | intended to convey.
             | 
             | That said, your "you're _that_ experienced here and you
              | didn't understand _that_" line really cheapens the
             | quality of discourse here, too. It certainly doesn't live
             | up to the HN guidelines
             | (https://news.ycombinator.com/newsguidelines.html). You
             | don't have to demean parent's question to deconstruct and
             | disagree with it.
        
               | multjoy wrote:
               | You do when it is clearly nonsense.
               | 
               | They're either entirely detached from reality in which
               | case they deserve to be gently mocked, or they're
               | trolling.
        
               | jacobolus wrote:
               | Let me quote Carl T. Bergstrom, evolutionary biologist
               | and expert on research quality and misinformation:
               | 
               |  _" Is everyone huffing paint?"_
               | 
               |  _" Crypto guy claims to have built an LLM-based tool to
               | detect errors in research papers; funded using its own
               | cryptocurrency; will let coin holders choose what papers
               | to go after; it's unvetted and a total black box--and
               | Nature reports it as if it's a new protein structure."_
               | 
               | https://bsky.app/profile/carlbergstrom.com/post/3ljsyoju3
               | s22...
        
               | cgriswald wrote:
               | Other than "it's unvetted and a total black box", which
               | is certainly a fair criticism, the rest of the quote
               | seems to be an expression of emotion roughly equivalent
               | to "sigh". We know Bergstrom doesn't like it, but the
               | reasons are left as an exercise to the reader. If
               | Bergstrom had posted that same post here, GP's comments
               | about post quality would still largely apply.
        
           | roywiggins wrote:
           | yeah, but without all those pesky "securities laws" and so
           | on.
        
           | ForTheKidz wrote:
           | Yes, exactly.
        
         | delusional wrote:
         | The nice thing about crypto plays is that you know they won't
          | get anywhere so you can safely ignore them. It's all going to
         | collapse soon enough.
        
       | yosito wrote:
       | While I don't doubt that AI tools can spot some errors that would
       | be tedious for humans to look for, they are also responsible for
       | far more errors. That's why proper understanding and application
       | of AI is important.
        
       | tomrod wrote:
       | I know academics that use it to make sure their arguments are
       | grounded, after a meaningful draft. This helps them in more
       | clearly laying out their arguments, and IMO is no worse than the
        | companies that used motivated graduate students to review the
       | grammar and coherency of papers written by non-native language
       | speakers.
        
       | webdoodle wrote:
       | The push for AI is about controlling the narrative. By giving AI
       | the editorial review process, it can control the direction of
       | science, media and policy. Effectively controlling the course of
       | human evolution.
       | 
       | On the other hand, I'm fully supportive of going through ALL of
       | the rejected scientific papers to look for editorial bias,
       | censorship, propaganda, etc.
        
         | TZubiri wrote:
         | It's fine, since it's not really just AI, it's the crypto
         | hackers in charge of the AI.
         | 
          | As it stands there's always a company (a juristic person) behind
          | AIs; I haven't yet seen an independent AI.
        
         | nathan_compton wrote:
         | One thing about this is that these kinds of power
         | struggles/jostlings are part of every single thing humans do at
         | almost all times. There is no silver bullet that will extricate
         | human beings from the condition of being human, only constant
         | vigilance against the ever changing landscape of who is
         | manipulating and how.
        
       | bookofjoe wrote:
       | https://archive.ph/fqAig
        
       | TZubiri wrote:
       | Didn't this YesNoError thing start as a memecoin?
        
       | huijzer wrote:
        | I think incentives are the real problem in science, and improving
        | them is what matters. Tools aren't gonna fix it.
        
       | gusgus01 wrote:
        | I'm extremely skeptical of the value in this. I've already seen
       | wasted hours responding to baseless claims that are lent credence
       | by AI "reviews" of open source codebases. The claims would have
       | happened before but these text generators know how to hallucinate
       | in the correct verbiage to convince lay people and amateurs and
       | are more annoying to deal with.
        
       | RainyDayTmrw wrote:
       | Perhaps our collective memories are too short? Did we forget what
       | curl just went through with AI confabulated bug reports[1]?
       | 
       | [1]: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-
       | stands-f...
        
       | InkCanon wrote:
        | This sounds way, way beyond how LLMs work. They can't count the
       | R's in strarwberrrrrry, but they can cross reference multiple
       | tables of data? Is there something else going on here?
        
         | Groxx wrote:
         | Accurately check: lol no chance at all, completely agreed.
         | 
         | Detect deviations from common patterns, which are often pointed
         | out via common patterns of review feedback on things, which
         | might indicate a mistake: actually I think that fits moderately
         | well.
         | 
         | Are they accurate enough to use in bulk? .... given their
         | accuracy with code bugs, I'm inclined to say "probably not",
         | except by people already knowledgeable in the content. They can
         | generally reject false positives without a lot of effort.
        
       | lfsh wrote:
       | I am using Jetbrain's AI to do code analysis (find errors).
       | 
        | While it sometimes spots something I missed, it also gives a lot
        | of confident 'advice' that is just wrong or not useful.
       | 
       | Current AI tools are still sophisticated search engines. They
       | cannot reason or think.
       | 
       | So while I think it could spot some errors in research papers I
        | am still very sceptical that it is useful as a trusted source.
        
       | _tom_ wrote:
       | This basically turns research papers as a whole into a big
       | generative adversarial network.
        
       | sfink wrote:
       | Don't forget that this is driven by present-day AI. Which means
       | people will assume that it's checking for fraud and incorrect
       | logic, when actually it's checking for self-consistency and
       | consistency with training data. So it should be great for typos,
       | misleading phrasing, and cross-checking facts and diagrams, but I
       | would expect it to do little for manufactured data, plausible but
       | incorrect conclusions, and garden variety bullshit (claiming X
       | because Y, when Y only implies X because you have a reasonable-
       | sounding argument that it ought to).
       | 
       | Not all of that is out of reach. Making the AI evaluate a paper
       | in the context of a cluster of related papers might enable
       | spotting some "too good to be true" things.
       | 
       | Hey, here's an idea: use AI for mapping out the influence of
       | papers that were later retracted (whether for fraud or error, it
       | doesn't matter). Not just via citation, but have it try to
       | identify the no longer supported conclusions from a retracted
       | paper, and see where they show up in downstream papers. (Cheap
       | "downstream" is when a paper or a paper in a family of papers by
       | the same team ever cited the upstream paper, even in preprints.
       | More expensive downstream is doing it without citations.)
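        | 
        | A minimal sketch of the cheap "downstream via citation" version
        | (the data structure here is made up for illustration; a real run
        | would need citation edges from something like OpenAlex or
        | Crossref):
        | 
        |     from collections import deque
        | 
        |     # citations: paper id -> list of ids of papers that cite it
        |     def downstream_of_retracted(retracted_ids, citations):
        |         tainted = set(retracted_ids)
        |         queue = deque(retracted_ids)
        |         while queue:
        |             paper = queue.popleft()
        |             for citer in citations.get(paper, []):
        |                 if citer not in tainted:
        |                     # cites (or builds on) a tainted paper: flag it
        |                     tainted.add(citer)
        |                     queue.append(citer)
        |         return tainted - set(retracted_ids)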
        
         | ForTheKidz wrote:
         | > people will assume that it's checking for fraud and incorrect
         | logic, when actually it's checking for self-consistency and
         | consistency with training data.
         | 
         | TBF, this also applies to all humans.
        
           | Groxx wrote:
           | There is a clear difference in _capability_ even though they
           | share many failures
        
           | nxobject wrote:
           | To be fair, at least humans get to have collaborators from
           | multiple perspectives and skillsets; a lot of the discussion
           | about AI in research has assumed that a research team is one
           | hive mind, when the best collaborations aren't.
        
           | lucianbr wrote:
           | No, no it does not. Are you actually claiming with a straight
           | face that not a single human can check for fraud or incorrect
           | logic?
           | 
           | Let's just claim any absurd thing in defense of the AI hype
           | now.
        
             | ForTheKidz wrote:
             | > Are you actually claiming with a straight face that not a
             | single human can check for fraud or incorrect logic?
             | 
             | No of course not, I was pointing out that we largely check
             | "for self-consistency and consistency with training data"
              | as well. Our checking of the coherency of other people's
             | work is presumably an extension of this.
             | 
             | Regardless, computers _already_ check for fraud and
             | incorrect logic as well, albeit in different contexts.
              | Neither humans nor computers can do this with general
              | competency, i.e. without specific training to do so.
        
         | timewizard wrote:
         | They spent trillions of dollars to create a lame spell check.
        
         | raincole wrote:
         | If you can check for manufactured data, it means you know more
         | about what the real data looks like than the author.
         | 
          | If there were an AI that could check for manufactured data,
          | science would be a solved problem.
        
       | BurningFrog wrote:
       | In the not-so-far future we should have AIs that have read all
       | the papers and other information in a field. They can then review
       | any new paper as well as answer any questions in the field.
       | 
       | This then becomes the first sanity check for any paper author.
       | 
       | This should save a lot of time and effort, improve the quality of
       | papers, and root out at least some fraud.
       | 
       | Don't worry, many problems will remain :)
        
       | surferbayarea wrote:
       | Here are 2 examples from the Black Spatula project where we were
       | able to detect major errors: - https://github.com/The-Black-
       | Spatula-Project/black-spatula-p... - https://github.com/The-
       | Black-Spatula-Project/black-spatula-p...
       | 
       | Some things to note: this didn't even require a complex multi-
       | agent pipeline. Single-shot prompting was able to detect these
       | errors.
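       | 
       | Roughly what the single-shot prompting amounts to -- not the
       | exact Black Spatula prompt or model, just an illustrative sketch
       | using the OpenAI client:
       | 
       |     from openai import OpenAI
       | 
       |     client = OpenAI()  # expects OPENAI_API_KEY in the environment
       | 
       |     paper_text = open("paper.txt").read()  # plain-text dump
       | 
       |     resp = client.chat.completions.create(
       |         model="gpt-4o",  # any capable model; placeholder here
       |         messages=[{
       |             "role": "user",
       |             "content": (
       |                 "List any numerical, statistical or logical "
       |                 "errors in the following paper, quoting the "
       |                 "offending text for each one.\n\n" + paper_text
       |             ),
       |         }],
       |     )
       |     print(resp.choices[0].message.content)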
        
       | systemstops wrote:
       | When people start building tools like this to analyze media
       | coverage of historic events, it will be a game changer.
        
       | jongjong wrote:
       | I expect that for truly innovative research, it might flag the
       | innovative parts of the paper as mistakes if they're not fully
       | elaborated upon... e.g. if the author assumed that the reader
       | possesses certain niche knowledge.
       | 
       | With software design, I find the AI makes many mistakes: it says
       | things that are incorrect because it parrots common blanket
       | statements and ideologies without actually checking, from first
       | principles, whether the statement applies in this case... Once
       | you take the discussion down to first principles, it quickly
       | acknowledges its mistake, but you had to have that deep insight
       | in order to take it there... Someone trying to learn from AI
       | would not get this insight from it; instead they would be taught
       | a dumbed-down, cartoonish, wordcel version of reality.
        
       | delusional wrote:
       | Reality check: yesnoerror, the only part of the article that
       | actually seems to involve any published AI reviewer comments, is
       | just checking arxiv papers. Their website claims that they
       | "uncover errors, inconsistencies, and flawed methods that human
       | reviewers missed." but arxiv is of course famously NOT a peer-
       | reviewed journal. At best they are finding "errors,
       | inconsistencies, and flawed methods" in papers that human
       | reviewers haven't looked at.
       | 
       | Let's then try to see if we can uncover any "errors,
       | inconsistencies, and flawed methods" on their website. The
       | "status" is pure made-up garbage. There's no network traffic
       | related to it that would actually allow it to show a real status.
       | The "RECENT ERROR DETECTIONS" lists a single paper from today,
       | but looking at the queue when you click "submit a paper" lists
       | the last completed paper as the 21st of February. The front page
       | tells us that it found some math issue in a paper titled "Waste
       | tea as absorbent for removal of heavy metal present in
       | contaminated water" but if we navigate to that paper[1] the math
       | error suddenly disappears. Most of the comments are also
       | worthless, talking about minor typographical issues or
       | misspellings that do not matter, but of course they still
       | categorize that as an "error".
       | 
       | It's the same garbage as every time with crypto people.
       | 
       | [1]:
       | https://yesnoerror.com/doc/82cd4ea5-4e33-48e1-b517-5ea3e2c5f...
        
       | EigenLord wrote:
       | The role of LLMs in research is an ongoing, well, research topic
       | of interest of mine. I think it's fine so long as 1. a pair of
       | human eyes has validated any of the generated outputs, and 2. the
       | "ownership rule": the human researcher is prepared to defend and
       | own anything the AI model does on their behalf, implying that
       | they have digested and understood it as well as anything else
       | they may have read or produced in the course of conducting their
       | research. Rule #2 avoids this notion of crypto-plagiarism. If you
       | prompted for a certain output, your thought, in a manner of
       | speaking, was the cause of that output. If you agree with it, you
       | should be able to use it. In this case, using AI to fact-check is
       | kind of ironic, considering these models' hallucination issues.
       | However, infallibility is the mark of omniscience; it's pretty
       | unreasonable to expect these models to be flawless. They can
       | still play a supplementary role to the review process, a second
       | line of defense for peer-reviewers.
        
       | robwwilliams wrote:
       | Great start but definitely will require supervision by experts in
       | the fields. I routinely use Claude 3.7 to flag errors in my
       | submissions. Here is a prompt I used yesterday:
       | 
       | "This is a paper we are planning to submit to Nature
       | Neuroscience. Please generate a numbered list of significant
       | errors with text tags I can use to find the errors and make
       | corrections."
       | 
       | It gave me a list of 12 errors, of which Claude labeled three as
       | "inconsistencies", "methods discrepancies", and "contradictions".
       | When I requested that Claude reconsider, it said "You are right, I
       | apologize" in each of these three instances. Nonetheless it was
       | still a big win for me and caught a lot of my dummheits.
       | 
       | Claude 3.7 running in standard mode does not use its context
       | window very effectively. I suppose I could have demanded that
       | Claude "internally review (wait: think again)" for each serious
       | error it initially thought it had encountered. I'll try that next
       | time. Exposure of chain of thought would help.
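       | 
       | For anyone who wants to script the same check rather than paste
       | into the chat UI, a minimal sketch with the Anthropic Python
       | client (model ID, token limit and file name are placeholders):
       | 
       |     import anthropic
       | 
       |     client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY
       | 
       |     paper = open("manuscript.txt").read()
       | 
       |     msg = client.messages.create(
       |         model="claude-3-7-sonnet-20250219",  # placeholder ID
       |         max_tokens=2000,
       |         messages=[{
       |             "role": "user",
       |             "content": (
       |                 "This is a paper we are planning to submit to "
       |                 "Nature Neuroscience. Please generate a numbered "
       |                 "list of significant errors with text tags I can "
       |                 "use to find the errors and make corrections.\n\n"
       |                 + paper
       |             ),
       |         }],
       |     )
       |     print(msg.content[0].text)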
        
       | lifeisstillgood wrote:
       | It's a nice idea, and I would love to be able to use it for my
       | own company reports (spotting my obvious errors before sending
       | them to my boss's boss).
       | 
       | But the first thing I noticed was the two approaches highlighted
       | - one a small-scale approach that does not publish first but
       | approaches the authors privately - and the other, which publishes
       | first, does not have human review, and has _its own
       | cryptocurrency_.
       | 
       | I don't think anything speaks more clearly about the current
       | state of the world and the choices in our political space.
        
       | ysofunny wrote:
       | top two links at this moment are:
       | 
       | > AI tools are spotting errors in research papers: inside a
       | growing movement (nature.com)
       | 
       | and
       | 
       | > Kill your Feeds - Stop letting algorithms dictate what you
       | think (usher.dev)
       | 
       | so we shouldn't let feed algorithms influence our thoughts, but
       | also, AI tools need to tell us when we're wrong.
        
       | TheRealPomax wrote:
       | Oh look, an _actual_ use case for AI. Very nice.
        
       | mac-mc wrote:
       | Now they need to do it for their own outputs to spot their own
       | hallucination errors.
        
       | rosstex wrote:
       | Why not just skip the human and have AI write, evaluate and
       | submit the papers?
        
         | gosub100 wrote:
         | Why not skip the AI and remove "referees" that allow papers to
         | be published that contain egregious errors?
        
       | YeGoblynQueenne wrote:
       | Needs more work.
       | 
       | >> Right now, the YesNoError website contains many false
       | positives, says Nick Brown, a researcher in scientific integrity
       | at Linnaeus University. Among 40 papers flagged as having issues,
       | he found 14 false positives (for example, the model stating that
       | a figure referred to in the text did not appear in the paper,
       | when it did). "The vast majority of the problems they're finding
       | appear to be writing issues," and a lot of the detections are
       | wrong, he says.
       | 
       | >> Brown is wary that the effort will create a flood for the
       | scientific community to clear up, as well as fuss about minor
       | errors such as typos, many of which should be spotted during peer
       | review (both projects largely look at papers in preprint
       | repositories).
       | Unless the technology drastically improves, "this is going to
       | generate huge amounts of work for no obvious benefit", says
       | Brown. "It strikes me as extraordinarily naive."
        
       ___________________________________________________________________
       (page generated 2025-03-08 23:00 UTC)