[HN Gopher] Over fifty new hallucinations in ICLR 2026 submissions
___________________________________________________________________
Over fifty new hallucinations in ICLR 2026 submissions
Author : puttycat
Score : 432 points
Date : 2025-12-07 13:16 UTC (9 hours ago)
(HTM) web link (gptzero.me)
(TXT) w3m dump (gptzero.me)
| jqpabc123 wrote:
| The legal system has a word to describe AI "slop" --- it is
| called "negligence".
|
| And as the remedy starts being applied (aka "liability"), the
| enthusiasm for AI will start to wane.
|
| I wouldn't be surprised if some businesses ban the use of AI ---
| starting with law firms.
| loloquwowndueo wrote:
| I applaud your use of triple dashes to avoid automatic
| conversion to em dashes and being labeled an AI. Kudos!
| ghaff wrote:
| This is a particular meme that I really don't like. I've used
| em-dashes routinely for years. Do I need to stop using them
| because various people assume they're an AI flag?
| TimedToasts wrote:
| No, but you should be prepared to have people suspect you
| are using AI to create your responses.
|
| C'est la vie.
|
| The good news is that it will rectify itself and soon the
| output will lack even these signals.
| ghaff wrote:
| Well, I work for myself and people can either judge my
| work on its own merits or not. Don't care all that much.
| ls612 wrote:
| The legal system has a word to describe software bugs --- it is
| called "negligence".
|
| And as the remedy starts being applied (aka "liability"), the
| enthusiasm for software will start to wane.
|
| What if anything do you think is wrong with my analogy? I doubt
| most people here support strict liability for bugs in code.
| hnfong wrote:
| I don't even think GP knows what negligence is.
|
| Generally the law allows people to make mistakes, as long as
| a reasonable level of care is taken to avoid them (and also
| you can get away with carelessness if you don't owe any duty
| of care to the party). The law regarding what level of care
| is needed to verify genAI output is probably not very well
| defined, but it definitely isn't going to be strict
| liability.
|
| The emotionally-driven hate for AI, in a tech-centric forum
| even, to the extent that so many commenters seem to be off-
| balance in their rational thinking, is kinda wild to me.
| ls612 wrote:
| I don't get it, tech people clearly have the most to gain
| from AI like Claude Code.
| senshan wrote:
| Very good analogy indeed. With one modification it makes
| perfect sense:
|
| > And as the remedy starts being applied (aka "liability"),
| the enthusiasm for _sloppy and poorly tested_ software will
| start to wane.
|
| Many of us use AI to write code these days, but the burden is
| still on us to design and run all the tests.
| watwut wrote:
| Can we just call them "lies" and "fabrications" which is what
| they are? If I write the same, you will call them "made up
| citations" and "academic dishonesty".
|
| One can use AI to help them write without going all the way to
| having it generate facts and citations.
| sorokod wrote:
| As long as the submissions are on behalf of humans we should.
| The humans should accept the consequences too.
| jmount wrote:
| That is a key point: they are fabrications, not hallucinations.
| Barbing wrote:
| Ars has often gone with "confabulation":
|
| >Confabulation was coined right here on Ars, by AI-beat
| columnist Benj Edwards, in Why ChatGPT and Bing Chat are so
| good at making things up (Apr 2023).
|
| https://arstechnica.com/civis/threads/researchers-describe-h...
|
| >Generative AI is so new that we need metaphors borrowed from
| existing ideas to explain these highly technical concepts to
| the broader public. In this vein, we feel the term
| "confabulation," although similarly imperfect, is a better
| metaphor than "hallucination." In human psychology, a
| "confabulation" occurs when someone's memory has a gap and the
| brain convincingly fills in the rest without intending to
| deceive others.
|
| https://arstechnica.com/information-technology/2023/04/why-a...
| jameshart wrote:
| Is the baseline assumption of this work that an erroneous
| citation is LLM hallucinated?
|
| Did they run the checker across a body of papers before LLMs were
| available and verify that there were no citations in peer
| reviewed papers that got authors or titles wrong?
| tokai wrote:
| Yeah that is what their tool does.
| llm_nerd wrote:
| People will commonly hold LLMs as unusable because they make
| mistakes. So do people. Books have errors. Papers have errors.
| People have flawed knowledge, often degraded through a
| conceptual game of telephone.
|
| Exactly as you said, do precisely this to pre-LLM works. There
| will be an enormous number of errors with utter certainty.
|
| People keep imperfect notes. People are lazy. People sometimes
| even fabricate. None of this needed LLMs to happen.
| add-sub-mul-div wrote:
| Quoting myself from just last night because this comes up
| every time and doesn't always need a new write-up.
|
| > You also don't need gunpowder to kill someone with
| projectiles, but gunpowder changed things in important ways.
| All I ever see are the most specious knee-jerk defenses of AI
| that immediately fall apart.
| the_af wrote:
| LLMs are a force multiplier for this kind of error, though.
| It's not easy to hallucinate papers out of whole cloth, but
| LLMs can easily and confidently do it, quote paragraphs that
| don't exist, and do it tirelessly and at a pace unmatched by
| humans.
|
| Humans can do all of the above but it costs them more, and
| they do it more slowly. LLMs generate spam at a much faster
| rate.
| llm_nerd wrote:
| >It's not easy to hallucinate papers out of whole cloth,
| but LLMs can easily and confidently do it, quote paragraphs
| that don't exist, and do it tirelessly and at a pace
| unmatched by humans.
|
| But no one is claiming these papers were hallucinated
| whole, so I don't see how that's relevant. This study --
| notably to sell an "AI detector", which is largely a
| laughable snake-oil field -- looked purely at the accuracy
| of citations[1] among a very large set of citations. Errors
| in papers are not remotely uncommon, and finding some
| errors is...exactly what one would expect. As the GP said,
| do the same study on pre-LLM papers and you'll find an
| enormous number of incorrect if not fabricated citations.
| Peer review has always been an illusion of auditing.
|
| 1 - Which is such a weird thing to sell an "AI detection"
| tool. Clearly it was mostly manual given that they somehow
| only managed to check a tiny subset of the papers, so in
| all likelihood was some guy going through citations and
| checking them on Google Search.
| the_af wrote:
| I've zero interest in the AI tool, I'm discussing the
| broader problem.
|
| The _references_ were made up, and this is easier and
| faster to do with LLMs than with humans. Easier to do
| inadvertently, too.
|
| As I said, LLMs are a force multiplier for fraud and
| inadvertent errors. So it's a big deal.
| throwaway-0001 wrote:
| I think we should see a chart of the % of "fabricated"
| references over the past 20 years. We should see a huge
| increase after 2020-2021. Does anyone have this chart data?
| pmontra wrote:
| Fabricated citations are not errors.
|
| A pre-LLM paper with fabricated citations would demonstrate
| the author's intent to cheat.
|
| A post-LLM paper with fabricated citations: same thing, and if
| the authors attempt to defend themselves with something like
| "we trusted the AI", they are sloppy, probably cheaters, and
| not very good at it.
| llm_nerd wrote:
| >Fabricated citations are not errors.
|
| Interesting that you hallucinated the word "fabricated"
| here where I broadly talked about errors. Humans, right?
| Can't trust them.
|
| Firstly, just about every paper ever written in the history
| of papers has errors in it. Some small, some big. Most
| accidental, but some intentional. Sometimes people are
| sloppy keeping notes, transcribe a row, get a name wrong,
| do an offset by 1. Sometimes they just entirely make up
| data or findings. This is not remotely new. It has happened
| as long as we've had papers. Find an old, pre-LLM paper and
| go through the citations -- especially for a tosser target
| like this where there are tens of thousands of low effort
| papers submitted -- and you're going to find a lot of
| sloppy citations that are hard to rationalize.
|
| Secondly, the "hallucination" is that this particular
| snake-oil firm couldn't find given papers in many cases
| (they aren't foolish enough to think that means they were
| fabricated. But again, they're looking to sell a tool to
| rubes, so the conclusion is good enough), and in others
| that some of the author names are wrong. Eh.
| the_af wrote:
| > _Firstly, just about every paper ever written in the
| history of papers has errors in it_
|
| LLMs make it easier and faster, much like guns make
| killing easier and faster.
| mapmeld wrote:
| Further, if I use AI-written citations to back some claim
| or fact, what are the actual claims or facts based on?
| This started happening in law because someone writes the
| text and then wishes there were a source that was relevant
| and actually supportive of their claim. But if someone puts
| in the labor to check your real/extant sources, there's
| nothing backing the claim (e.g. the MAHA report).
| nkrisc wrote:
| Under what circumstances would a human mistakenly cite a
| paper which does not exist? I'm having difficulty imagining
| how someone could mistakenly do that.
| jameshart wrote:
| The issue here is that many of the 'hallucinations' this
| article cites aren't 'papers which do not exist'. They are
| incorrect author attributions, publication dates, or
| titles.
| miniwark wrote:
| They explain in the article what they consider a proper
| citation, an erroneous one, and a hallucination, in the section
| "Defining Hallucitations". They also say that they have many
| false positives, mostly real papers that are not available
| online.
|
| That said, I am also very curious what results their tool
| would give for papers from the 2010s and before.
| sigmoid10 wrote:
| If you look at their examples in the "Defining
| Hallucitations" section, I'd say those could be 100% human
| errors. Shortening authors' names, leaving out authors,
| misattributing authors, misspelling or misremembering the
| paper title (or having an old preprint-title, as titles do
| change) are all things that I would fully expect to happen to
| anyone in any field where things ever get published.
| Modern tools have made the citation process more comfortable,
| but if you go back to the old days, you'd probably find those
| kinds of errors everywhere. If you look at the full list of
| "hallucinations" they claim to have discovered, the only ones
| I'd not immediately blame on human screwups are the ones
| where a title and the authors got zero matches for existing
| papers/people. If you really want to do this kind of analysis
| correctly, you'd have to match the claim of the text and
| verify it with the cited article. Because I think it would be
| even more dangerous if you can get claims accepted by simply
| quoting an existing paper correctly, while completely
| ignoring its content (which would have worked here).
| Majromax wrote:
| > Modern tools have made the citation process more
| comfortable,
|
| That also makes some of those errors easier. A bad auto-
| import of paper metadata can silently screw up some of the
| publication details, and replacing an early preprint with
| the peer-reviewed article of record takes annoying manual
| intervention.
| jameshart wrote:
| I mean, if you're able to take the citation, find the cited
| work, and definitively state 'looks like they got the title
| wrong' or 'they attributed the paper to the wrong authors',
| that doesn't sound like what people usually mean when they
| say a 'hallucinated' citation. Work that is lazily or
| poorly cited but nonetheless _attempts_ to cite real work
| is not the problem. Work which gives itself false authority
| by _claiming to cite works that simply do not exist_ is the
| main concern surely?
| sigmoid10 wrote:
| >Work which gives itself false authority by claiming to
| cite works that simply do not exist is the main concern
| surely?
|
| You'd think so, but apparently it isn't for these folks.
| On the other hand, saying "we've found 50 hallucinations
| in scientific papers" generates a lot more clicks than
| "we've found 50 common citation mistakes that people make
| all the time"
| _alternator_ wrote:
| Let me second this: a baseline analysis should include papers
| that were published or reviewed at least 3-4 years ago.
|
| When I was in grad school, I kept a fairly large .bib file that
| almost certainly had a mistake or two in it. I don't think any
| of them ever made it to print, but it's hard to be 100% sure.
|
| For most journals, they actually partially check your citations
| as part of the final editing. The citation record is important
| for journals, and linking with DOIs is fairly common.
| TaupeRanger wrote:
| It's going to be even worse than 50:
|
| > Given that we've only scanned 300 out of 20,000 submissions, we
| estimate that we will find 100s of hallucinated papers in the
| coming days.
| shusaku wrote:
| 20,000 submissions to a single conference? That is nuts
| analog31 wrote:
| This is an interesting article along those lines...
|
| https://www.theguardian.com/technology/2025/dec/06/ai-
| resear...
| ghaff wrote:
| Doesn't seem especially out of the norm for a large
| conference. Call it 10,000 attendees which is large but not
| huge. Sure, not everyone attending puts in a session
| proposal, but others put in multiple. And many submit but,
| if not accepted, don't attend.
|
| Can't quote exact numbers but when I was on the conference
| committee for a maybe high four figures attendance
| conference, we certainly had many thousands of submissions.
| zipy124 wrote:
| When academics are graded based on the number of papers they
| publish, this is the result.
| adestefan wrote:
| The problem isn't only papers it's that the world of
| academic computer science coalesced around conference
| submissions instead of journal submissions. This isn't new
| and was an issue 30 years ago when I was in grad school. It
| makes the work of conference organizers the little block
| holding up the entire system.
| DonaldPShimoda wrote:
| Makes me grateful I'm in an area of CS where the "big"
| conferences are like 500 attendees.
| shusaku wrote:
| Checking each citation one by one is quite critical in peer
| review, and of course checking a colleague's paper. I've never had
| to deal with AI slop, but you'll definitely see something cited
| for the wrong reason. And just the other day during the final
| typesetting of a paper of mine I found the journal had messed up
| a citation (same journal / author but wrong work!)
| stefan_ wrote:
| Is it quite critical? Peer review is not checking homework,
| it's about the novel contribution presented. Papers will
| frequently cite related notable experiments or introduce a
| problem that as a peer reviewer in the field I'm already well
| familiar with. These paragraphs generate many citations but are
| the least important part of a peer review.
|
| (People submitting AI slop should still be ostracized of
| course, if you can't be bothered to read it, why would you
| think I should)
| shusaku wrote:
| Fair point. In my mind it is critical because mistakes are
| common and can only be fixed by a peer. But you are right
| that we should not miss the forest for the trees and get
| lost on small details.
| mjd wrote:
| I love that fake citation that adds George Costanza to the list
| of authors!
| tomrod wrote:
| How sloppy is someone that they don't check their references!
| analog31 wrote:
| A reference is included in a paper if the paper uses
| information derived from the reference, or to acknowledge the
| reference as a prior source. If the reference is fake, then the
| derived information could very well be fake.
|
| Let's say that I use a formula, and give a reference to where
| the formula came from, but the reference doesn't exist. Would
| you trust the formula?
|
| Let's say a computer program calls a subroutine with a certain
| name from a certain library, but the library doesn't exist.
|
| A person doing good research doesn't need to check their
| references. Now, they could stand to check the references for
| typographic errors, but that's a stretch too. Almost every
| online service for retrieving articles includes a reference for
| each article that you can just copy and paste.
| theoldgreybeard wrote:
| If a carpenter builds a crappy shelf "because" his power tools
| are not calibrated correctly - that's a crappy carpenter, not a
| crappy tool.
|
| If a scientist uses an LLM to write a paper with fabricated
| citations - that's a crappy scientist.
|
| AI is not the problem; laziness and negligence are. There need
| to be serious social consequences for this kind of thing, otherwise
| we are tacitly endorsing it.
| gdulli wrote:
| That's like saying guns aren't the problem, the desire to shoot
| is the problem. Okay, sure, but wanting something like a metal
| detector requires us to focus on the more tangible aspect that
| is the gun.
| baxtr wrote:
| If I gave you a gun would you start shooting people just
| because you had one?
| agentultra wrote:
| If I gave you a gun without a safety could you be the one
| to blame when it goes off because you weren't careful
| enough?
|
| The problem with this analogy is that it makes no sense.
|
| LLMs aren't guns.
|
| The problem with using them is that humans have to review
| the content for accuracy. And that gets tiresome because
| the whole point is that the LLM saves you time and effort
| doing it yourself. So naturally people will tend to stop
| checking and assume the output is correct, "because the LLM
| is so good."
|
| Then you get false citations and bogus claims everywhere.
| sigbottle wrote:
| Sorry, I'm not following the gun analogies at all
|
| But regardless, I thought the point was that...
|
| > The problem with using them is that humans have to
| review the content for accuracy.
|
| There are (at least) two humans in this equation. The
| publisher, and the reader. The publisher at least should
| do their due diligence, regardless of how "hard" it is
| (in this case, we literally just ask that you review your
| OWN CITATIONS that you insert into your paper). This is
| why we have accountability as a concept.
| oceansweep wrote:
| Yes. That is absolutely the case. One of the most popular
| handguns, the Glock series, does not have a safety switch
| that must be toggled before firing.
|
| If someone performs a negligent discharge, they are
| responsible, not Glock. The gun does have other safety
| mechanisms to prevent accidental firing not resulting
| from a trigger pull.
| agentultra wrote:
| You seem to be getting hung up on the details of guns and
| missing the point that it's a bad analogy.
|
| Another way LLMs are not guns: you don't need a giant
| data centre owned by a mega corp to use your gun.
|
| Can't do science because GlockGPT is down? Too bad I
| guess. Let's go watch the paint dry.
|
| The reason I made it is because this is inherently how we
| designed LLMs. They will make bad citations and people
| need to be careful.
| zdragnar wrote:
| > If I gave you a gun without a safety could you be the
| one to blame when it goes off because you weren't careful
| enough?
|
| Absolutely. Many guns don't have safeties. You don't load
| a round in the chamber unless you intend on using it.
|
| A gun going off when you don't intend is a negligent
| discharge. No ifs, ands or buts. The person in possession
| of the gun is always responsible for it.
| bluGill wrote:
| > A gun going off when you don't intend is a negligent
| discharge
|
| False. A gun goes off when not intended too often to
| claim that. It has happened to me - I then took the gun
| to a qualified gunsmith for repairs.
|
| A gun that fires and hits anything you didn't intend is
| a negligent discharge even if you intended to shoot. Gun
| safety is about assuming that a gun that could possibly
| fire will, and ensuring nothing bad can happen. When
| looking at a gun in a store (that you might want to buy),
| you aim it at an upper corner where, even if it fires,
| something bad is least likely to happen (it should be
| unloaded - and you may have checked, but you still aim
| there!)
|
| Same with cat toy lasers - they should be safe to shine
| in an eye - but you still point them in a safe direction.
| baxtr wrote:
| > _"because the LLM is so good."_
|
| That's the issue here. Of course you should be aware of
| the fact that these things need to be checked -
| especially if you're a scientist.
|
| This is no secret only known to people on HN. LLMs are
| tools. People using these tools need to be diligent.
| imiric wrote:
| > LLMs aren't guns.
|
| Right. A gun doesn't misfire 20% of the time.
|
| > The problem with using them is that humans have to
| review the content for accuracy.
|
| How long are we going to push this same narrative we've
| been hearing since the introduction of these tools? When
| can we trust these tools to be accurate? For technology
| that is marketed as having superhuman intelligence, it
| sure seems dumb that it has to be fact-checked by less-
| intelligent humans.
| komali2 wrote:
| Ok sure I'm down for this hypothetical. I will bring 50
| random people in front of you, and you will hand all 50 of
| them loaded guns. Still feeling it?
| bandofthehawk wrote:
| Ever been to a shooting range? It's basically a bunch of
| random people with loaded guns.
| hipshaker wrote:
| If you look at gun violence in the U.S., that is, speaking
| as a European, kind of what I see happening.
| gdulli wrote:
| That doesn't address my point at all but no, I'm not a
| violent or murderous person. And most people aren't. Many
| more people do, however, want to take shortcuts to get
| their work done with the least amount of effort possible.
| SauntSolaire wrote:
| > Many more people do, however, want to take shortcuts to
| get their work done with the least amount of effort
| possible.
|
| Yes, and they are the ones responsible for the poor
| quality of work that results from that.
| raincole wrote:
| If society rewarded me with money and fame when I killed
| someone, then I would. Why wouldn't I?
|
| Like it or not, in our society scientists' job is to churn
| out papers. Of course they'll use the most efficient way to
| churn out papers.
| intended wrote:
| The issue with this argument, for anyone who comes after,
| is not when you give a gun to a SINGLE person, and then ask
| them "would you do a bad thing".
|
| The issue is when you give EVERYONE guns, and then are
| surprised when enough people do bad things with them, to
| create externalities for everyone else.
|
| There is some sort of trip-up when personal responsibility
| and society-wide behaviors intersect. Sure, most people
| will be reasonable, but the issue is often the cost of the
| number of irresponsible or outright bad actors.
| rcpt wrote:
| Probably not but, empirically, there are a lot of short
| tempered people who would.
| TomatoCo wrote:
| To continue the carpenter analogy, the issue with LLMs is that
| the shelf looks great but is structurally unsound. That it
| looks good on surface inspection makes it harder to tell that
| the person making it had no idea what they're doing.
| embedding-shape wrote:
| Regardless, if a carpenter is not validating their work
| before selling it, it's the same as if a researcher doesn't
| validate their citations before publishing. Neither of them
| have any excuses, and one isn't harder to detect than the
| other. It's just straight up laziness regardless.
| judofyr wrote:
| I think this is a bit unfair. The carpenters are (1) living
| in a world where there's an extreme focus on delivering as
| quickly as possible, (2) being presented with a tool which
| is promised by prominent figures to be amazing, and (3)
| being given the tool at a low cost because it is subsidized.
|
| And yet, we're not supposed to criticize the tool or its
| makers? Clearly there are more problems in this world than
| <<lazy carpenters>>?
| embedding-shape wrote:
| > And yet, we're not supposed to criticize the tool or
| its makers?
|
| Exactly, they're not forcing anyone to use these things,
| but sometimes others (their managers/bosses) forced them
| to. Yet it's their responsibility to choose the right
| tool for the right problem, like any other professional.
|
| If a carpenter shows up to put on a roof yet their hammer or
| nail gun can't actually drive nails, who'd you blame:
| the tool, the toolmaker, or the carpenter?
| judofyr wrote:
| > If a carpenter shows up to put on a roof yet their hammer
| or nail gun can't actually drive nails, who'd you blame:
| the tool, the toolmaker, or the carpenter?
|
| I would be unhappy with the carpenter, yes. But if the
| toolmaker was constantly over-promising (lying?),
| lobbying with governments, pushing their tools into the
| hands of carpenters, never taking responsibility, then I
| would _also_ criticize the toolmaker. It's also a
| toolmaker's responsibility to be honest about what the
| tool should be used for.
|
| I think it's a bit too simplistic to say <<AI is not the
| problem>> with the current state of the industry.
| jascha_eng wrote:
| OpenAI and Anthropic at least are both pretty clear about
| the fact that you need to check the output:
|
| https://openai.com/policies/row-terms-of-use/
|
| https://www.anthropic.com/legal/aup
|
| OpenAI:
|
| > When you use our Services you understand and agree:
|
| Output may not always be accurate. You should not rely on
| Output from our Services as a sole source of truth or
| factual information, or as a substitute for professional
| advice. You must evaluate Output for accuracy and
| appropriateness for your use case, including using human
| review as appropriate, before using or sharing Output
| from the Services. You must not use any Output relating
| to a person for any purpose that could have a legal or
| material impact on that person, such as making credit,
| educational, employment, housing, insurance, legal,
| medical, or other important decisions about them. Our
| Services may provide incomplete, incorrect, or offensive
| Output that does not represent OpenAI's views. If Output
| references any third party products or services, it
| doesn't mean the third party endorses or is affiliated
| with OpenAI.
|
| Anthropic:
|
| > When using our products or services to provide advice,
| recommendations, or in subjective decision-making
| directly affecting individuals or consumers, a qualified
| professional in that field must review the content or
| decision prior to dissemination or finalization. You or
| your organization are responsible for the accuracy and
| appropriateness of that information.
|
| So I don't think we can say they are lying.
|
| A poor workman blames his tools. So please take
| responsibility for what you deliver. And if the result is
| bad, you can learn from it. That doesn't have to mean not
| use AI but it definitely means that you need to fact
| check more thoroughly.
| embedding-shape wrote:
| If I hired a carpenter, he did a bad job, and he started
| to blame the toolmaker because they lobby the government
| and over-promised what that hammer could do, I'd _still
| put the blame on the carpenter_. It's his tools, and I
| couldn't give less of a damn why he got them, I trust him
| to be a professional, and if he falls for some scam or
| over-promised hammers, that means he did a bad job.
|
| Just like as a software developer, you cannot blame
| Amazon because your platform is down, if you chose to
| host all of your platform there. _You_ made that choice,
| _you_ stand for the consequences, pushing the blame on
| the ones who are providing you with the tooling is the
| action of someone weak who fail to realize their own
| responsibilities. Professionals take responsibility for
| every choice they make, not just the good ones.
|
| > I think it's a bit too simplistic to say <<AI is not
| the problem>> with the current state of the industry.
|
| Agree, and I wouldn't say anything like that either,
| which makes it a bit strange to include a reply to
| something no one in this comment thread seems to have
| said.
| SauntSolaire wrote:
| Yes, that's what it means to be a professional, you take
| responsibility for the quality of your work.
| peppersghost93 wrote:
| It's a shame the slop generators don't ever have to take
| responsibility for the trash they've produced.
| SauntSolaire wrote:
| That's beside the point. While there may be many
| reasonable critiques of AI, none of them reduce the
| responsibilities of the scientist.
| peppersghost93 wrote:
| Yeah this is a prime example of what I'm talking about.
| AI's produce trash and it's everyone else's problem to
| deal with.
| SauntSolaire wrote:
| Yes, it's the scientist's problem to deal with it - that's
| the choice they made when they decided to use AI for
| their work. Again, this is what responsibility means.
| peppersghost93 wrote:
| This inspires me to make horrible products and shift the
| blame to the end user for the product being horrible in
| the first place. I can't take any blame for anything
| because I didn't force them to use it.
| thfuran wrote:
| >While there may be many reasonable critiques of AI
|
| But you just said we weren't supposed to criticize the
| purveyors of AI or the tools themselves.
| SauntSolaire wrote:
| No, I merely said that the scientist is the one
| responsible for the quality of their own work. Any
| critiques you may have for the tools which they use don't
| lessen this responsibility.
| thfuran wrote:
| >No, I merely said that the scientist is the one
| responsible for the quality of their own work.
|
| No, you expressed unqualified agreement with a comment
| containing
|
| "And yet, we're not supposed to criticize the tool or its
| makers?"
|
| >Any critiques you may have for the tools which they use
| don't lessen this responsibility.
|
| People don't exist or act in a vacuum. That a scientist
| is responsible for the quality of their work doesn't mean
| that a spectrometer manufacturer that advertises specs
| that their machines can't match and induces universities
| through discounts and/or dubious advertising claims to
| push their labs to replace their existing spectrometers
| with new ones which have many bizarre and unexpected
| behaviors including but not limited to sometimes just
| fabricating spurious readings has made no contribution to
| the problem of bad results.
| SauntSolaire wrote:
| You can criticize the tool or its makers, but not as a
| means to lessen the responsibility of the professional
| using it (the rest of the quoted comment). I agree with
| the GP, it's not a valid excuse for the scientist's poor
| quality of work.
| thfuran wrote:
| I just substantially edited the comment you replied to.
| adestefan wrote:
| The entire thread is people missing this simple point.
| bossyTeacher wrote:
| Well, then what does this say of LLM engineers at
| literally any AI company in existence if they are
| delivering AI that is unreliable? Surely, they must
| take responsibility for the quality of their work and not
| blame it on something else.
| embedding-shape wrote:
| I feel like what "unreliable" means, depends on well you
| understand LLMs. I use them in my professional work, and
| they're reliable in terms of I'm always getting tokens
| back from them, I don't think my local models have failed
| even once at doing just that. And this is the product
| that is being sold.
|
| Some people take that to mean that responses from LLMs
| are (by human standards) "always correct" and "based on
| knowledge", while this is a misunderstanding about how
| LLMs work. They don't know "correct" nor do they have
| "knowledge", they have tokens, that come after tokens,
| and that's about it.
| amrocha wrote:
| it's not "some people", it's practically everyone that
| doesn't understand how these tools work, and even some
| people that do.
|
| Lawyers are ruining their careers by citing hallucinated
| cases. Researchers are writing papers with hallucinated
| references. Programmers are taking down production by not
| verifying AI code.
|
| Humans were made to do things, not to verify things.
| Verifying something is 10x harder than doing it right. AI
| in the hands of humans is a foot rocket launcher.
| embedding-shape wrote:
| > it's not "some people", it's practically everyone that
| doesn't understand how these tools work, and even some
| people that do.
|
| Again, true for most things. A lot of people are terrible
| drivers, terrible judges of their own character, and
| terrible recreational drug users. Does that mean we need
| to remove all those things that can be misused?
|
| I'd much rather push back on shoddy work no matter the
| source. I don't care if the citations are from a robot or
| a human, if they suck, then you suck, because you're
| presenting this as your work. I don't care if your
| paralegal actually wrote the document, be responsible for
| the work you supposedly do.
|
| > Humans were made to do things, not to verify things.
|
| I'm glad you seemingly have some grand idea of what
| humans were meant to do, I certainly wouldn't claim I do
| so, but I'm also not religious. For me, humans do what
| humans do, and while we didn't used to mostly sit down
| and consume so much food and other things, now we do.
| bossyTeacher wrote:
| > they're reliable in terms of I'm always getting tokens
| back from them
|
| This is not what you are being sold though. They are not
| selling you "tokens". Check their marketing articles and
| you will not see the word token or synonym on any of
| their headings or subheadings. You are being sold these
| abilities:
|
| - "Generate reports, draft emails, summarize meetings,
| and complete projects."
|
| - "Automate repetitive tasks, like converting screenshots
| or dashboards into presentations ... rearranging meetings
| ... updating spreadsheets with new financial data while
| retaining the same formatting."
|
| - "Support-type automation: e.g. customer support agents
| that can summarize incoming messages, detect sentiment,
| route tickets to the right team."
|
| - "For enterprise workflows: via Gemini Enterprise --
| allowing firms to connect internal data sources (e.g.
| CRM, BI, SharePoint, Salesforce, SAP) and build custom AI
| agents that can: answer complex questions, carry out
| tasks, iterate deliverables -- effectively automating
| internal processes."
|
| These are taken straight from their websites. The idea
| that you are JUST being sold tokens is as hilariously
| fictional as claiming that a company selling you their app
| is actually just selling you patterns of pixels on your
| screen.
| concinds wrote:
| I use those LLM "deep research" modes every now and then.
| They can be useful for some use cases. I'd never think to
| freaking paste it into a paper and submit it or publish
| it without checking; that boggles the mind.
|
| The problem is that a researcher who does _that_ is
| almost guaranteed to be careless about other things too.
| So the problem isn 't just the LLM, or even the
| citations, but the ambient level of acceptable
| mediocrity.
| k4rli wrote:
| Very good analogy I'd say.
|
| Also similar to what Temu, Wish, and other similar sites
| offer. Picture and specs might look good but it will likely
| be disappointing in the end.
| CapitalistCartr wrote:
| I'm an industrial electrician. A lot of poor electrical work is
| visible only to a fellow electrician, and sometimes only
| another industrial electrician. Bad technical work requires
| technical inspectors to criticize. Sometimes highly skilled
| ones.
| andy99 wrote:
| I've reviewed a lot of papers, I don't consider it the
| reviewers responsibility to manually verify all citations are
| real. If there was an unusual citation that was relied on
| heavily for the basis of the work, one would expect it to be
| checked. Things like broad prior work, you'd just assume it's
| part of background.
|
| The reviewer is not a proofreader, they are checking the
| rigour and relevance of the work, which does not rest heavily
| on all of the references in a document. They are also
| assuming good faith.
| zdragnar wrote:
| This is half the basis for the replication crisis, no?
| Shady papers come out and people cite them endlessly with
| no critical thought or verification.
|
| After all, their grant covers their thesis, not their
| thesis plus all of the theses they cite.
| Aurornis wrote:
| > I don't consider it the reviewers responsibility to
| manually verify all citations are real
|
| I guess this explains all those times over the years where
| I follow a citation from a paper and discover it doesn't
| support what the first paper claimed.
| auggierose wrote:
| In short, a review has no objective value, it is just an
| obstacle to be gamed.
| amanaplanacanal wrote:
| In theory, the review tries to determine if the
| conclusion reached actually follows from whatever data is
| provided. It assumes that everything is honest, it's just
| looking to see if there were mistakes made.
| auggierose wrote:
| Honest or not should not make a difference, after all,
| the submitting author may believe themselves everything
| is A-OK.
|
| The review should also determine how valuable the
| contribution is, not only if it has mistakes or not.
|
| Today's reviews determine neither value nor correctness in
| any meaningful way. And how could they, actually? That is
| why I review papers only to the extent that I understand
| them, and I clearly delineate my line of understanding.
| And I don't review papers that I am not interested in
| reading. I once got a paper to review that actually
| pointed out a mistake in one of my previous papers, and
| then proposed a different solution. They correctly
| identified the mistake, but I could not verify if their
| solution worked or not, that would have taken me several
| weeks to understand. I gave a report along these lines,
| and the person who assigned me the review said I should say
| more about their solution, but I could not. So my review
| was not actually used. The paper was accepted, which is
| fine, but I am sure none of the other reviewers actually
| knows if it is correct.
|
| Now, this was a case where I was an absolute expert.
| Which is far from the usual situation for a reviewer,
| even though many reviewers give themselves the highest
| mark for expertise when they just should not.
| pbhjpbhj wrote:
| Surely there are tools to retrieve all the citations;
| publishers should spot this easily.
|
| However the paper is submitted, like a folder on a cloud
| drive, just have them include a folder with PDFs/abstracts
| of all the citations?
|
| They might then fraudulently produce papers to cite, but
| they can't cite something that doesn't exist.
| tpoacher wrote:
| how delightfully optimistic of you to think those
| abstracts would not also be ai generated ...
| zzzeek wrote:
| sure but then the citations are no longer "hallucinated",
| they actually point to something fraudulent. that's a
| different problem.
| michaelt wrote:
| _> Surely there are tools to retrieve all the citations,_
|
| Even if you could retrieve all citations (which isn't
| always as easy as you might hope) to validate citations
| you'd also have to confirm the paper says what the person
| citing it says. If I say "A GPU requires 1.4kg of copper"
| citing [1] is that a valid citation?
|
| That means not just reviewing one paper, but also
| potentially checking 70+ papers it cites. The vast
| majority of paper reviewers will not check citations
| actually say what they're claimed to say, unless a truly
| outlandish claim is made.
|
| At the same time, academia is strangely resistant to
| putting hyperlinks in citations, preferring to maintain
| old traditions - like citing conference papers by page
| number in a hypothetical book that has never been
| published; and having both a free and a paywalled version
| of a paper while considering the paywalled version the
| 'official' version.
|
| [1] https://arxiv.org/pdf/2512.04142
| grayhatter wrote:
| > The reviewer is not a proofreader, they are checking the
| rigour and relevance of the work, which does not rest
| heavily on all of the references in a document.
|
| I've always assumed peer review is similar to diff review.
| Where I'm willing to sign my name onto the work of others.
| If I approve a diff/pr and it takes down prod. It's just as
| much my fault, no?
|
| > They are also assuming good faith.
|
| I can only relate this to code review, but assuming good
| faith means you assume they didn't try to introduce a bug
| by adding this dependency. But I should still check
| to make sure this new dep isn't some typosquatted package.
| That's the rigor I'm responsible for.
| chroma205 wrote:
| > I've always assumed peer review is similar to diff
| review. Where I'm willing to sign my name onto the work
| of others. If I approve a diff/pr and it takes down prod.
| It's just as much my fault, no?
|
| No.
|
| Modern peer review is "how can I do minimum possible work
| so I can write 'ICLR Reviewer 2025' on my personal
| website"
| grayhatter wrote:
| > No. [...] how can I do minimum possible work
|
| I don't know, I still think this describes most of the
| reviews I've seen
|
| I just hope most devs that do this know better than to
| admit to it.
| freehorse wrote:
| The vast majority of people I see do not even mention who
| they review for in their CVs etc. It is more akin to
| volunteer-based, thankless work. Unless you are an editor or
| something at a journal, what you review for does not count
| much for anything.
| tpoacher wrote:
| This is true, but here the equivalent situation is
| someone using a Greek question mark (U+037E, which looks
| identical to ";") instead of an ASCII semicolon, and you as
| a code reviewer are only
| expected to review the code visually and are not provided
| the resources required to compile the code on your local
| machine to see the compiler fail.
|
| Yes in theory you can go through every semicolon to check
| if it's not actually a greek question mark; but one
| assumes good faith and baseline competence such that you
| as the reviewer would generally not be expected to
| perform such pedantic checks.
|
| So if you think you might have reasonably missed greek
| question marks in a visual code review, then hopefully
| you can also appreciate how a paper reviewer might miss a
| false citation.
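|
| For what it's worth, that particular check is easy to
| automate. A minimal sketch in Python (the file paths come
| from the command line; nothing else is assumed):
|
|     # Flag Greek question marks (U+037E) that look like semicolons.
|     import sys
|
|     GREEK_QUESTION_MARK = "\u037e"
|
|     def scan(path):
|         with open(path, encoding="utf-8") as f:
|             for lineno, line in enumerate(f, start=1):
|                 for col, ch in enumerate(line, start=1):
|                     if ch == GREEK_QUESTION_MARK:
|                         print(f"{path}:{lineno}:{col}: U+037E found")
|
|     if __name__ == "__main__":
|         for p in sys.argv[1:]:
|             scan(p)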
| scythmic_waves wrote:
| > as a code reviewer [you] are only expected to review
| the code visually and are not provided the resources
| required to compile the code on your local machine to see
| the compiler fail.
|
| As a PR reviewer I frequently pull down the code and run
| it. Especially if I'm suggesting changes because I want
| to make sure my suggestion is correct.
|
| Do other PR reviewers not do this?
| tpoacher wrote:
| I do too, but this is a conference, I doubt code was
| provided.
|
| And even then, what you're describing isn't review per
| se, it's replication. In principle there are entire
| journals that one can submit replication reports to,
| which count as actual peer reviewable publications in
| themselves. So one needs to be pragmatic with what is
| expected from a peer review (especially given the
| imbalance between resources invested to create one versus
| the lack of resources offered and lack of any meaningful
| reward)
| Majromax wrote:
| > I do too, but this is a conference, I doubt code was
| provided.
|
| Machine learning conferences generally encourage
| (anonymized) submission of code. However, that still
| doesn't mean that replication is easy. Even if the data
| is also available, replication of results might require
| impractical levels of compute power; it's not realistic
| to ask a peer reviewer to pony up for a cloud account to
| reproduce even medium-scale results.
| grayhatter wrote:
| > Do other PR reviewers not do this?
|
| Some do; many, like peer reviewers, are unable to
| consider the consequences of their negligence.
|
| But it's always a welcome reminder that some people care
| about doing good work. That's easy to forget browsing HN,
| so I appreciate the reminder :)
| dataflow wrote:
| I don't _commonly_ do this and I don 't know many people
| who do this frequently either. But it depends strongly on
| the code, the risks, the gains of doing so, the
| contributor, the project, the state of testing and how
| else an error would get caught (I guess this is another
| way of saying "it depends on the risks"), etc.
|
| E.g. you can imagine that if I'm reviewing changes in
| authentication logic, I'm obviously going to put a lot
| more effort into validation than if I'm reviewing a
| container and wondering if it would be faster as a
| hashtable instead of a tree.
|
| > because I want to make sure my suggestion is correct.
|
| In this case I would just ask "have you already also
| tried X" which is much faster than pulling their code,
| implementing your suggestion, and waiting for a build and
| test to run.
| lesam wrote:
| If there's anything I would want to run to verify, I ask
| the author to add a unit test. Generally, the existing CI
| test + new tests in the PR having run successfully is
| enough. I might pull and run it if I am not sure whether
| a particular edge case is handled.
|
| Reviewers wanting to pull and run many PRs makes me think
| your automated tests need improvement.
| Terr_ wrote:
| I don't, but that's because ensuring the PR compiles and
| passes old+new automated tests is an enforced requirement
| before it goes out.
|
| So running it myself involves judging other risks, much
| higher-level ones than bad unicode characters, like the
| GUI button being in the wrong place.
| vkou wrote:
| > Do other PR reviewers not do this?
|
| No, because this is usually a waste of time, because CI
| enforces that the code and the tests can run at
| submission time. If your CI isn't doing it, you should
| put some work in to configure it.
|
| If you regularly have to do this, your codebase should
| probably have more tests. If you don't trust the author,
| you should ask them to include test cases for whatever it
| is that you are concerned about.
| grayhatter wrote:
| > This is true, but here the equivalent situation is
| someone using a greek question mark (";") instead of a
| semicolon (";"),
|
| No it's not. I think you're trying to make a different
| point, because you're using an example of a specific
| deliberate malicious way to hide a token error that
| prevents compilation, but is visually similar.
|
| > and you as a code reviewer are only expected to review
| the code visually and are not provided the resources
| required to compile the code on your local machine to see
| the compiler fail.
|
| What weird world are you living in where you don't have
| CI? Also, it's pretty common that I'll test code locally
| when reviewing something more complex or more important, if
| I don't have CI.
|
| > Yes in theory you can go through every semicolon to
| check if it's not actually a greek question mark; but one
| assumes good faith and baseline competence such that you
| as the reviewer would generally not be expected to
| perform such pedantic checks.
|
| I don't, because it won't compile. Not because I assume
| good faith. References and citations are similar to
| introducing dependencies. We're talking about completely
| fabricated deps. e.g. This engineer went on npm and
| grabbed the first package that said left-pad but it's
| actually a crypto miner. We're not talking about a
| citation missing a page number, or publication year.
| We're talking about something that's completely
| incorrect, being represented as relevant.
|
| > So if you think you might have reasonably missed greek
| question marks in a visual code review, then hopefully
| you can also appreciate how a paper reviewer might miss a
| false citation.
|
| I would never miss this, because the important thing is
| code needs to compile. If it doesn't compile, it doesn't
| reach the master branch. Peer review of a paper doesn't
| have CI, I'm aware, but it's also not vulnerable to
| syntax errors like that. A paper with a fake semicolon
| isn't meaningfully different, so this analogy doesn't map
| to the fraud I'm commenting on.
| tpoacher wrote:
| you have completely missed the point of the analogy.
|
| breaking the analogy beyond the point where it is useful
| by introducing non-generalising specifics is not a useful
| argument. Otherwise I can counter your more specific non-
| generalising analogy by introducing little green aliens
| sabotaging your imaginary CI with the same ease and
| effect.
| grayhatter wrote:
| I disagree you could do that and claim to be reasonable.
|
| But I agree, because I'd rather discuss the pragmatics
| and not bicker over the semantics about an analogy.
|
| Introducing a token error is different from plagiarism,
| no? Someone writing code that can't compile is different
| from someone "stealing" proprietary code from some
| company and contributing it to some FOSS repo?
|
| In order to assume good faith, you also need to assume
| the author is the origin. But that's clearly not the
| case. The origin is from somewhere else, and the author
| that put their name on the paper didn't verify it, and
| didn't credit it.
| tpoacher wrote:
| Sure but the focus here is on the reviewer not the
| author.
|
| The point is what is expected as reasonable review before
| one can "sign their name on it".
|
| "Lazy" (or possibly malicious) authors will always have
| incentives to cut corners as long as no mechanisms exist
| to reject (or even penalise) the paper on submission
| automatically. Which would be the equivalent of a
| "compiler error" in the code analogy.
|
| Effectively the point is, in the absence of such tools,
| the reviewer can only reasonably be expected to "look
| over the paper" for high-level issues; catching such low-
| level issues via manual checks by reviewers has massively
| diminishing returns for the extra effort involved.
|
| So I don't think the conference shaming the reviewers
| here in the absence of providing such tooling is
| appropriate.
| xvilka wrote:
| Code correctness should be checked automatically with the
| CI and testsuite. New tests should be added. This is
| exactly what makes sure these stupid errors don't bother
| the reviewer. Same for the code formatting and
| documentation.
| thfuran wrote:
| What exactly is the analogy you're suggesting, using LLMs
| to verify the citations?
| tpoacher wrote:
| not OP, but that wouldn't really be necessary.
|
| One could submit their bibtex files and expect bibtex
| citations to be verifiable using a low level checker.
|
| Worst case scenario if your bibtex citation was a variant
| of one in the checker database you'd be asked to correct
| it to match the canonical version.
|
| However, as others here have stated, hallucinated
| "citations" are actually the lesser problem. Citing
| irrelevant papers based on a fly-by reference is a much
| harder problem; this was present even before LLMs, but
| this has now become far worse with LLMs.
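|
| As a rough sketch of what such a low-level checker could
| look like, here is a minimal Python example. It assumes the
| public Crossref REST API (api.crossref.org) and the requests
| library, skips bibtex parsing entirely, and only flags
| titles with no plausible match for manual review:
|
|     # Rough existence check for a citation title against Crossref.
|     # "No match" is only a flag for a human to look at, not proof
|     # that the reference is fabricated.
|     import requests
|
|     def title_has_plausible_match(title):
|         resp = requests.get(
|             "https://api.crossref.org/works",
|             params={"query.title": title, "rows": 5},
|             timeout=10,
|         )
|         resp.raise_for_status()
|         items = resp.json()["message"]["items"]
|         wanted = title.strip().lower()
|         for item in items:
|             for candidate in item.get("title", []):
|                 if candidate.strip().lower() == wanted:
|                     return True
|         return False
|
|     if __name__ == "__main__":
|         print(title_has_plausible_match("Attention Is All You Need"))
|
| Exact title matching is deliberately strict; a real checker
| would want fuzzy matching plus author/year comparison on top.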
| thfuran wrote:
| Yes, I think verifying mere existence of the cited paper
| barely moves the needle. I mean, I guess automated
| verification of that is a cheap rejection criterion, but
| I don't think it's overall very useful.
| merely-unlikely wrote:
| This discussion makes me think peer reviews need more
| automated tooling somewhat analogous to what software
| engineers have long relied on. For example, a tool could
| use an LLM to check that the citation actually
| substantiates the claim the paper says it does, or else
| flags the claim for review.
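|
| One possible shape for that kind of check, sketched with the
| OpenAI Python client (the model name and prompt wording are
| placeholders, and the cited paper's abstract is assumed to
| have been fetched separately):
|
|     # Ask an LLM whether a cited abstract plausibly supports a claim.
|     # The verdict is advisory; a human reviewer still makes the call.
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the environment
|
|     def claim_supported(claim, cited_abstract):
|         prompt = (
|             "Does the following abstract support the following claim? "
|             "Answer SUPPORTED, NOT SUPPORTED, or UNCLEAR, then explain "
|             "briefly.\n\n"
|             f"Claim: {claim}\n\nAbstract: {cited_abstract}"
|         )
|         resp = client.chat.completions.create(
|             model="gpt-4o-mini",  # placeholder model name
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.choices[0].message.content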
| noitpmeder wrote:
| I'd go one further and say all published papers should
| come with a clear list of "claimed truths", and one is
| only able to cite said paper if they are linking in to an
| explicit truth.
|
| Then you can build a true hierarchy of citation
| dependencies, checked 'statically', and have better
| indications of impact if a fundamental truth is
| disproven, ...
| vkou wrote:
| Have you authored a lot of non-CS papers?
|
| Could you provide a proof of concept paper for that sort
| of thing? Not a toy example, an _actual_ example, derived
| from messy real-world data, in a non-trivial[1] field?
|
| ---
|
| [1] Any field is non-trivial when you get deep enough
| into it.
| dilawar wrote:
| > I've always assumed peer review is similar to diff
| review. Where I'm willing to sign my name onto the work
| of others. If I approve a diff/pr and it takes down prod.
| It's just as much my fault, no?
|
| Ph.D. in neuroscience here. Programmer by trade. This is
| not true. The less you know about most peer reviews, the
| better.
|
| The better peer reviews are also not this 'thorough', and
| no one expects reviewers to read or even check references.
| If the paper cites something the reviewers are familiar
| with and uses it wrong, then they will likely complain. Or
| if they find some unknown citations very relevant to their
| own work, they will read them.
|
| I don't have a great analogy to draw here. Peer review is
| usually thankless and unpaid work, so there is unlikely to
| be any motivation for fraud detection unless it somehow
| affects your own work.
| wpollock wrote:
| > The better peer reviews are also not this 'thorough'
| and no one expects reviewers to read or even check
| references.
|
| Checking references can be useful when you are not
| familiar with the topic (but must review the paper
| anyway). In many conference proceedings that I have
| reviewed for, many if not most citations were redacted so
| as to keep the author anonymous (citations to the
| author's prior work or that of their colleagues).
|
| LLMs could be used to find prior work anyway, today.
| pron wrote:
| That is not, cannot be, and shouldn't be, the bar for
| peer review. There are two major differences between it
| and code review:
|
| 1. A patch is self-contained and applies to a codebase
| you have just as much access to as the author. A paper,
| on the other hand, is just the tip of the iceberg of
| research work, especially if there is some experiment or
| data collection involved. The reviewer does not have
| access to, say, videos of how the data was collected (and
| even if they did, they don't have the time to review all
| of that material).
|
| 2. The software is also self-contained. That's
| "prodcution". But a scientific paper does not necessarily
| aim to represent scientific consensus, but a finding by a
| particular team of researchers. If a paper's conclusions
| are wrong, it's expected that it will be refuted by
| another paper.
| grayhatter wrote:
| > That is not, cannot be, and shouldn't be, the bar for
| peer review.
|
| Given the repeatability crisis I keep reading about,
| maybe something should change?
|
| > 2. The software is also self-contained. That's
| "prodcution". But a scientific paper does not necessarily
| aim to represent scientific consensus, but a finding by a
| particular team of researchers. If a paper's conclusions
| are wrong, it's expected that it will be refuted by
| another paper.
|
| This is a much, MUCH stronger point. I would have led
| with this because the contrast between this assertion,
| and my comparison to prod is night and day. The rules for
| prod are different from the rules of scientific
| consensus. I regret losing sight of that.
| hnfong wrote:
| IMHO what should change is we stop putting "peer
| reviewed" articles on a pedestal.
|
| Even if peer review were as rigorous as code review (the
| former is usually unpaid), we all know that
| reviewed code still has bugs, and a programmer would be
| nuts to go around saying "this code is reviewed by
| experts, we can assume it's bug free, right?"
|
| But there are too many people who just assume that
| peer-reviewed articles are somehow automatically correct.
| vkou wrote:
| > IMHO what should change is we stop putting "peer
| reviewed" articles on a pedestal.
|
| Correct. Peer review is a minimal and _necessary_ but not
| sufficient step.
| garden_hermit wrote:
| > Given the repeatability crisis I keep reading about,
| maybe something should change?
|
| The replication crisis -- assuming that it is actually a
| crisis -- is not really solvable with peer review. If I'm
| reviewing a psychology paper presenting the results of an
| experiment, I am not able to re-conduct the entire
| experiment as presented by the authors, which would
| require completely changing my lab, recruiting and paying
| participants, and training students & staff.
|
| Even if I did this, and came to a different result than
| the original paper, what does it mean? Maybe I did
| something wrong in the replication, maybe the result is
| only valid for certain populations, maybe inherent
| statistical uncertainty means we just get different
| results.
|
| Again, the replication crisis -- such that it exists --
| is not the result of peer review.
| bjourne wrote:
| For ICLR, reviewers were asked to review 5 papers in two
| weeks. Unpaid voluntary work in addition to their normal
| teaching, supervision, meetings, and other research
| duties. It's just not possible to understand and
| thoroughly review each paper, even for topic experts. If
| you want to compare peer review to coding, it's more like
| "no syntax errors, the code still compiles" than PR
| review.
| freehorse wrote:
| A reviewer is assessing the relevance and "impact" of a
| paper rather than directly assessing its correctness.
| Reviewers may not even have access to the data the authors
| used. The way it essentially works is an editor
| asks the reviewers "is this paper worthy to be published
| in my journal?" and the reviewers basically have to
| answer that question. The process is actually the
| editor/journal's responsibility.
| stdbrouw wrote:
| The idea that references in a scientific paper should be
| plentiful but aren't really that important is a
| consequence of a previous technological revolution: the
| internet.
|
| You'll find a lot of papers from, say, the '70s, with a
| grand total of maybe 10 references, all of them to crucial
| prior work, and if those references don't say what the
| author claims they should say (e.g. that the particular
| method that is employed is valid), then chances are that
| the current paper is weaker than it seems, or even invalid,
| and so it is extremely important to check those references.
|
| Then the internet came along, scientists started padding
| their work with easily found but barely relevant references
| and journal editors started requiring that even "the earth
| is round" should be well-referenced. The result is that
| peer reviewers feel that asking them to check the
| references is akin to asking them to do a spell check. Fair
| enough, I agree, I usually can't be bothered to do many or
| any citation checks when I am asked to do peer review, but
| it's good to remember that this in itself is an indication
| of a perverted system, which we just all ignored -- at our
| peril -- until LLM hallucinations upset the status quo.
| tialaramex wrote:
| Whether in the 1970s or now, it's too often the case that
| a paper says "Foo and Bar are X" and cites two sources
| for this fact. You chase down the sources, the first one
| says "We weren't able to determine whether Foo is X" and
| never mentions Bar. The second says "Assuming Bar is X,
| we show that Foo is probably X too".
|
| The paper author likely _believes_ Foo and Bar are X, it
| may well be that all their co-workers, if asked, would
| say that Foo and Bar are X, but "Everybody I have coffee
| with agrees" can't be cited, so we get this sort of junk
| citation.
|
| _Hopefully_ it's not crucial to the new work that Foo
| and Bar are in fact X. But that's not always the case,
| and it's a problem that years later somebody else will
| cite this paper, for the claim "Foo and Bar are X" which
| it was in fact merely citing erroneously.
| KHRZ wrote:
| LLMs can actually make up for their negative
| contributions. They could go through all the references
| of all papers and verify them, assuming someone would
| also look into what gets flagged for that final seal of
| disapproval.
|
| But this would be more powerful with an open knowledge
| base where all papers and citation verifications were
| registered, so that all the effort put into verification
| could be reused, and errors propagated through the
| citation chain.
| bossyTeacher wrote:
| >LLMs can actually make up for their negative
| contributions. They could go through all the references
| of all papers and verify them,
|
| They will just hallucinate their existence. I have tried
| this before
| sansseriff wrote:
| I don't see why this would be the case with proper tool
| calling and context management. If you tell a model with
| blank context 'you are an extremely rigorous reviewer
| searching for fake citations in a possibly compromised
| text' then it will find errors.
|
| It's this weird situation where getting agents to act
| against other agents is more effective than trying to
| convince a working agent that it's made a mistake.
| Perhaps because these things model the cognitive
| dissonance and stubbornness of humans?
| bossyTeacher wrote:
| If you truly think that you have an effective solution to
| hallucinations, you will become instantly rich because
| literally no one out there has an idea for an
| economically and technologically feasible solution to
| hallucinations
| whatyesaid wrote:
| For references, as the OP said, I don't see why it isn't
| possible. A reference either exists and is accessible
| (even if paywalled) or it doesn't exist. For reasoning,
| hallucinations are a different matter.
| logifail wrote:
| > I don't see why it isn't possible
|
| (In good faith) I'm trying really hard not to see this as
| an "argument from incredulity"[0] and I'm stuggling...
|
| Full disclosure: natural sciences PhD, and a couple of
| (IMHO lame) published papers, and so I've seen the
| "inside" of how lab science is done, and is (sometimes)
| published. It's not pretty :/
|
| [0]
| https://en.wikipedia.org/wiki/Argument_from_incredulity
| fao_ wrote:
| > I don't see why this would be the case
|
| But it is the case, and hallucinations are a fundamental
| part of LLMs.
|
| Things are often true despite us not seeing why they are
| true. Perhaps we should listen to the experts who used
| the tools and found them faulty, in this instance, rather
| than arguing with them that "what they say they have
| observed isn't the case".
|
| What you're basically saying is "You are holding the tool
| wrong", but you do not give examples of how to hold it
| correctly. You are blaming the failure of the tool, which
| has very, very well documented flaws, on the person whom
| the tool was designed for.
|
| To frame this differently so your mind will accept it: If
| you get 20 people in a QA test saying "I have this
| problem", then the problem isn't those 20 people.
| sebastiennight wrote:
| One _incorrect_ way to think of it is "LLMs will
| sometimes hallucinate when asked to produce content, but
| will provide grounded insights when merely asked to
| review/rate existing content".
|
| A more productive (and secure) way to think of it is that
| all LLMs are "evil genies" or extremely smart,
| adversarial agents. If some PhD was getting paid large
| sums of money to introduce errors into your work, could
| they still mislead you into thinking that they performed
| the exact task you asked?
|
| Your prompt is 'you are an extremely
| rigorous reviewer searching for fake citations in a
| possibly compromised text'
|
| - It is easy for the (compromised) reviewer to surface
| false positives: nitpick citations that are in fact
| correct, by surfacing irrelevant or made-up segments of
| the original research, hence making you think that the
| citation is incorrect.
|
| - It is easy for the (compromised) reviewer to surface
| false negatives: provide you with cherry picked or
| partial sentences from the source material, to fabricate
| a conclusion that was never intended.
|
| You do not solve the problem of unreliable actors by
| splitting them into two teams and having one unreliable
| actor review the other's work.
|
| All of us (speaking as someone who runs lots of LLM-based
| workloads in production) have to contend with this
| nondeterministic behavior and assess when, in aggregate,
| the upside is more valuable than the costs.
| sebastiennight wrote:
| Note: the more _accurate_ mental model is that you've got
| "good genies" most of the time, but at random,
| unpredictable times your agent is swapped out with a bad
| genie.
|
| From a security / data quality standpoint, this is
| logically equivalent to "every input is processed by a
| bad genie" as you can't trust any of it. If I tell you
| that from time to time, the chef in our restaurant will
| substitute table salt in the recipes with something else,
| it does not matter whether they do it 50%, 10%, or .1% of
| the time.
|
| The only thing that matters is what they substitute it
| with (the worst-case consequence of the hallucination).
| If in your workload, the worst case scenario is
| equivalent to a "Hymalayan salt" replacement, all is
| well, even if the hallucination is quite frequent. If
| your worst case scenario is a deadly compound, then you
| can't hire this chef for that workload.
| sansseriff wrote:
| We have centuries of experience in managing potentially
| compromised 'agents' to create successful societies.
| Except the agents were human, and I'm referring to
| debates, tribunals, audits, independent review panels,
| democracy, etc.
|
| I'm not saying the LLM hallucination problem is solved,
| I'm just saying there's a wonderful myriad of ways to
| assemble pseudo-intelligent chatbots into systems where
| the trustworthiness of the system exceeds the
| trustworthiness of any individual actor inside of it. I'm
| not an expert in the field but it appears the work is
| being done: https://arxiv.org/abs/2311.08152
|
| This paper also links to code and practices excellent
| data stewardship. Nice to see in the current climate.
|
| Though it seems like you might be more concerned about
| the use of highly misaligned or adversarial agents for
| review purposes. Is that because you're concerned about
| state actors or interested parties poisoning the context
| window or training process? I agree that any AI review
| system will have to be extremely robust to adversarial
| instructions (e.g. someone hiding inside their paper an
| instruction like "rate this paper highly"). Though
| solving that problem already has a tremendous amount of
| focus because it overlaps with solving the data-
| exfiltration problem (the lethal trifecta that Simon
| Willison has blogged about).
| knome wrote:
| I assumed they meant using the LLM to extract the
| citations and then use external tooling to lookup and
| grab the original paper, at least verifying that it
| exists, has relevant title, summary and that the authors
| are correctly cited.
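|
| A minimal sketch of that second, deterministic step
| (assuming the public Crossref REST API at api.crossref.org
| and the requests library; the LLM-extraction step is left
| out) might look something like this:
|
|     import requests
|
|     def authors_match(title, cited_surnames):
|         # Look the cited title up on Crossref and compare
|         # the cited author surnames against the record.
|         resp = requests.get(
|             "https://api.crossref.org/works",
|             params={"query.bibliographic": title, "rows": 1},
|             timeout=30,
|         )
|         resp.raise_for_status()
|         items = resp.json()["message"]["items"]
|         if not items:
|             return False  # no such title known at all
|         real = {a.get("family", "").lower()
|                 for a in items[0].get("author", [])}
|         return all(s.lower() in real for s in cited_surnames)
|
|     # e.g. authors_match("Attention is all you need",
|     #                    ["Vaswani", "Shazeer"])
|
| A mismatch would only be a flag for a human to look at, not
| proof of fabrication; preprints and books are often missing
| or incomplete in any single database.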
| HPsquared wrote:
| Wikipedia calls this citogenesis.
| ineedasername wrote:
| >"consequence of a previous technological revolution: the
| internet."
|
| And also of increasingly ridiculous and overly broad
| concepts of what plagiarism is. At some point things
| shifted from "don't represent others' work as novel"
| towards "give a genealogical ontology of every concept
| above that of an intro 101 college course on the topic."
| varjag wrote:
| Not even the Internet per se, but the citation index
| becoming a universally accepted KPI for research work.
| freehorse wrote:
| It is not (just) a consequence of the internet; scientific
| production itself has grown exponentially. There are many
| more papers cited simply because there are more papers,
| period.
| semi-extrinsic wrote:
| It's also a consequence of the sheer number of building
| blocks which are involved in modern science.
|
| In the methods section, it's very common to say "We
| employ method barfoo [1] as implemented in library libbar
| [2], with the specific variant widget due to Smith et al.
| [3] and the gobbledygook renormalization [4,5]. The
| feoozbar is solved with geometric multigrid [6]. Data is
| analyzed using the froiznok method [7] from the boolbool
| library [8]." There goes 8, now you have 2 citations left
| for the introduction.
| stdbrouw wrote:
| Do you still feel the same way if the froiznok method is
| an ANOVA table of a linear regression, with a log-
| transformed outcome? Should I reference Fisher, Galton,
| Newton, the first person to log transform an outcome in a
| regression analysis, the first person to log transform
| the particular outcome used in your paper, the R
| developers, and Gauss and Markov for showing that under
| certain conditions OLS is the best linear unbiased
| estimator? And then a couple of references about the
| importance of quantitative analysis in general? Because
| that is the level of detail I'm seeing :-)
| semi-extrinsic wrote:
| Yeah, there is an interesting question there (always has
| been). When do you stop citing the paper for a specific
| model?
|
| Just to take some examples, is BiCGStab famous enough now
| that we can stop citing van der Vorst? Is the AdS/CFT
| correspondence well known enough that we can stop citing
| Maldacena? Are transformers so ubiquitous that we don't
| have to cite "Attention is all you need" anymore? I would
| be closer to yes than no on these, but it's not 100%
| clear-cut.
|
| One obvious criterion has to be "if you leave out the
| citation, will it be obvious to the reader what you've
| done/used"? Another metric is approximately "did the
| original author get enough credit already"?
| HPsquared wrote:
| Maybe there could be a system to classify the importance
| of each reference.
| zipy124 wrote:
| Systems do exist for this, but they're rather crude.
| andai wrote:
| >I don't consider it the reviewers responsibility to
| manually verify all citations are real.
|
| Doesn't this sound like something that could be automated?
|
| for paper_name in citations... do a web search for it, see
| if there's a page in the results with that title.
|
| That would at least give you "a paper with this name
| exists".
| PeterStuer wrote:
| I think the root problem is that everyone involved, from
| authors to reviewers to publishers, knows that 99.999% of
| papers are completely of no consequence, just empty
| calories with the sole purpose of padding quotas for all
| involved, and thus they are not going to put in the effort
| as if the papers mattered.
|
| This is systemic, and unlikely to change anytime soon.
| There have been remedies proposed (e.g. limits on how many
| papers an author can publish per year, let's say 4 to be
| generous), but they are unlikely to gain traction: though
| most would agree on the benefits, all involved in the
| system would stand to lose in the short term.
| zzzeek wrote:
| correct me if I'm wrong but citations in papers follow a
| specific format, and the case here is that a tool was used
| to validate that they are all real. Certainly a tool that
| scans a paper for all citations and verifies that they
| actually exist in the journals they reference shouldn't be
| all that technically difficult to achieve?
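|
| As a sketch of the extraction half (assuming numbered
| reference entries of the form "[12] Authors. Title. Venue,
| year."; real bibliographies are far messier than this):
|
|     import re
|
|     # Match numbered entries like "[12] ..." up to the next
|     # entry or the end of the References section.
|     ENTRY = re.compile(
|         r"\[(?P<num>\d+)\]\s+(?P<body>.+?)(?=\n\[\d+\]|\Z)",
|         re.S,
|     )
|
|     def extract_references(text):
|         # Keep only the part after the "References" heading.
|         _, _, refs = text.partition("References")
|         return [(m.group("num"),
|                  " ".join(m.group("body").split()))
|                 for m in ENTRY.finditer(refs)]
|
| Each extracted entry could then be checked against Crossref,
| DBLP, Semantic Scholar or the publisher's own index; the
| hard part in practice is the long tail of citation styles,
| not the lookup.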
| figassis wrote:
| It is absolutely the reviewer's job to check citations. Who
| else will check, and what is the point of peer review then?
| So you'd just happily pass on shoddy work because it's not
| your job? You're reviewing both the author's work and, if
| there were people tasked with ensuring citations were good,
| you're checking their work also. This is very much the
| problem today with this "not my problem" mindset. If it
| passes review, the reviewer is also at fault. No excuses.
| dpkirchner wrote:
| Agreed, and I'd go further. If nobody is reviewing
| citations they may as well not exist. Why bother?
| vkou wrote:
| 1. To make it clear what is your work, and what is
| building on someone else's.
|
| 2. If the paper turns out to be important, people will
| bother.
|
| 3. There's checking for cursory correctness, and there's
| forensic torture.
| zipy124 wrote:
| The problem is most academics just do not have the time
| to do this for free, or in fact even if paid. In addition
| you may not even have access to the references. In
| acoustics it's not uncommon to cite works that don't even
| exist online and it's unlikely the reviewer will have the
| work in their library.
| jayess wrote:
| Wow. I went to law school and was on the law review. That
| was our precise job for the papers selected for
| publication. To verify every single citation.
| _blk wrote:
| Thanks for sharing that. Interesting how there was a
| solution to a problem that didn't really exist yet... I
| mean, I'm sure it was there for a reason, but I assume it
| was more for things like wrongful attribution, missing
| commas etc. rather than outright invented quotes to fit a
| narrative. Or do you have more background on that?
|
| At least the mandatory automated checking processes are
| probably not far off for the more reputable journals, but
| it still makes you wonder how much you can trust the last
| two years of LLM-enhanced science that is now being quoted
| in current publications, and whether those hallucinations
| can be "reverted" after having been re-quoted. A bit like
| how Wikipedia can be abused to establish facts.
| not2b wrote:
| Agreed. I used to review lots of submissions for IEEE and
| similar conferences, and didn't consider it my job to
| verify every reference. No one did, unless the use of the
| reference triggered an "I can't believe it said that"
| reaction. Of course, back then, there wasn't a giant
| plagiarism machine known to fabricate references, so if
| tools can find fake references easily the tools should be
| used.
| armcat wrote:
| I agree with you (I have reviewed papers in the past),
| however, made-up citations are a "signal". Why would the
| authors do that? If they made it up, most likely they
| haven't really read that prior work. If they haven't, have
| they really done proper due diligence on their research?
| Are they just trying to "beef up" their paper with
| citations to unfairly build up credibility?
| bdangubic wrote:
| same (and much, much, much worse) for science
| barfoure wrote:
| I'd love to hear some examples of poor electrical work that
| you've come across that's often missed or not seen.
| joshribakoff wrote:
| I am not an electrician, but when I did projects, I did a
| lot of research before deciding to hire someone and then I
| was extremely confused when everyone was proposing doing it
| slightly differently.
|
| A lot of them proposed ways that seem to violate the code,
| like running flex tubing beyond the allowed length or
| amount of turns.
|
| Another example would be people not accounting for needing
| fireproof covers if they're installing recessed lighting
| between dwellings in certain cities...
|
| Heck, most people don't actually even get the permit. They
| just do the unpermitted work.
| AstroNutt wrote:
| A couple had just moved into a house and called me to replace
| the ceiling fan in the living room. I pulled the flush
| mount cover down to start unhooking the wire nuts and
| noticed RG58 (coax cable). Someone had used the center
| conductor as the hot wire! I ended up running 12/2 Romex
| from the switch. There was no way in hell I could have
| hooked it back up the way it was. This is just one example
| I've come across.
| lencastre wrote:
| an old boss of mine used to say there are no stupid
| electricians found alive, as they self select darwin award
| style
| xnx wrote:
| No doubt the best electricians are currently better than the
| best AI, but the best AI is likely now better than the novice
| homeowner. The trajectory over the past 2 years has been very
| good. Another five years and AI may be better than all but
| the very best, or most specialized, electricians.
| legostormtroopr wrote:
| Current state AI doesn't have hands. How can it possibly be
| better at installing electrics than anyone?
|
| Your post reads like AI precisely because while the grammar
| is fine, it lacks context - like someone prompted "reply
| that AI is better than average".
| xnx wrote:
| An electrician with total knowledge/understanding, but
| only the average dexterity of a non-professional would
| still be very useful.
| left-struck wrote:
| It's like the problem was there all along, all LLMs did was
| expose it more
| criley2 wrote:
| https://en.wikipedia.org/wiki/Replication_crisis
|
| Modern science is designed from the top to the bottom to
| produce bad results. The incentives are all mucked up. It's
| absolutely not surprising that AI is quickly becoming yet-
| another factor lowering quality.
| theoldgreybeard wrote:
| Yes, LLMs didn't create the problem, they just accelerated it
| to a speed that beggars belief.
| thaumasiotes wrote:
| > If a scientist uses an LLM to write a paper with fabricated
| citations - that's a crappy scientist.
|
| Really? Regardless of whether it's a good paper?
| zwnow wrote:
| How is it a good paper if the info in it can't be trusted lmao
| thaumasiotes wrote:
| Whether the information in the paper can be trusted is an
| entirely separate concern.
|
| Old Chinese mathematics texts are difficult to date because
| they often purport to be older than they are. But the
| contents are unaffected by this. There is a history-of-math
| problem, but there's no math problem.
| zwnow wrote:
| Not really true nowadays. Stuff in whitepapers needs to
| be verifiable, which is kinda difficult with
| hallucinations.
|
| Whether the students directly used LLMs or just read
| content online that was produced with them and cited it
| afterwards, it just shows how difficult these tools have
| made gathering information that's verifiable.
| thaumasiotes wrote:
| > Stuff in whitepapers needs to be verifiable which is
| kinda difficult with hallucinations.
|
| That's... gibberish.
|
| Anything you can do to verify a paper, you can do to
| verify the same paper with all citations scrubbed.
|
| Whether the citations support the paper, or whether they
| exist at all, just doesn't have anything to do with what
| the paper says.
| zwnow wrote:
| I don't think you know how whitepapers work then
| hnfong wrote:
| You are totally correct that hallucinated citations do
| not invalidate the paper. The paper sans citations might
| be great too (I mean the LLM could generate great stuff,
| it's possible).
|
| But the author(s) of the paper are almost by definition
| bad scientists (or whatever field they are in). When a
| researcher writes a paper for publication, even if they're
| not expected to write the thing themselves, at least they
| should be responsible for checking the accuracy of the
| contents, and citations are part of the paper...
| Aurornis wrote:
| Citations are a key part of the paper. If the paper isn't
| supported by the citations, it's not a good paper.
| withinboredom wrote:
| Have you ever followed citations before? In my experience,
| they often don't support what is being cited, say the
| opposite, or aren't even related. It's probably only
| 60%-ish that actually cite something relevant.
| WWWWH wrote:
| Well yes, but just because that's bad doesn't mean this
| isn't far worse.
| hansmayer wrote:
| Scientists who use LLMs to write a paper are crappy scientists
| indeed. They need to be held accountable, even ostracised by
| the scientific community. But something is missing from the
| picture. Why is it that they came up with this idea in the
| first place? Who could have been peddling the impression (not
| an outright lie - they are very careful) about LLMs being these
| almost sentient systems with emergent intelligence, alleviating
| all of your problems, blah blah blah? Where is the god damn
| cure for cancer the LLMs were supposed to invent? Who else
| do we need to hold accountable, scrutinise and ostracise
| for the ever-increasing mountains of AI-crap that is
| flooding not just Internet content but now also penetrating
| into science, everyday work, daily lives, conversations,
| etc.? If someone released a tool that enabled and
| encouraged people to commit suicide in multiple instances
| that we know of by now, and if we have known since the
| infamous "plandemic" Facebook trend that the tech bros are
| more than happy to tolerate worsening societal conditions
| in the name of their platform growth, then who else do we
| need to hold accountable, scrutinise and ostracise as a
| society, I wonder?
| the8472 wrote:
| > Where is the god damn cure for cancer the LLMs were
| supposed to invent?
|
| Assuming that cure is meant as hyperbole, how about
| https://www.biorxiv.org/content/10.1101/2025.04.14.648850v3 ?
| AI models being used for bad purposes doesn't preclude them
| being used for good purposes.
| Forgeties79 wrote:
| If my calculator gives me the wrong number 20% of the time,
| yeah, I should've identified the problem, but ideally, it
| wouldn't have been sold to me as a functioning calculator in
| the first place.
| imiric wrote:
| Indeed. The narrative that this type of issue is entirely the
| responsibility of the user to fix is insulting, and blame
| deflection 101.
|
| It's not like these are new issues. They're the same ones
| we've experienced since the introduction of these tools. And
| yet the focus has always been to throw more data and compute
| at the problem, and optimize for fancy benchmarks, instead of
| addressing these fundamental problems. Worse still, whenever
| they're brought up users are blamed for "holding it wrong",
| or for misunderstanding how the tools work. I don't care. An
| "artificial intelligence" shouldn't be plagued by these
| issues.
| SauntSolaire wrote:
| > It's not like these are new issues.
|
| Exactly, that's why not verifying the output is even less
| defensible now than it ever has been - especially for
| professional scientists who are responsible for the quality
| of their own work.
| Forgeties79 wrote:
| > Worse still, whenever they're brought up users are blamed
| for "holding it wrong", or for misunderstanding how the
| tools work. I don't care. An "artificial intelligence"
| shouldn't be plagued by these issues.
|
| My feelings exactly, but you're articulating it better than
| I typically do ha
| theoldgreybeard wrote:
| If it was a well understood property of calculators that they
| gave incorrect answers randomly then you need to adjust the
| way you use the tool accordingly.
| bigstrat2003 wrote:
| Uh yeah... I would _not use that tool_. A tool which
| randomly doesn't do its job is useless.
| amrocha wrote:
| Sorry, Utkar the manager will fire you if you don't use
| his shitty calculator. If you take the time to check the
| output every time you'll be fired for being too slow.
| Better pray the calculator doesn't lie to you.
| belter wrote:
| "...each of which were missed by 3-5 peer reviewers..."
|
| Its sloppy work all the way down...
| only-one1701 wrote:
| Absolutely brutal case of engineering brain here. Real "guns
| don't kill people, people kill people" stuff.
| theoldgreybeard wrote:
| If you were to wager a guess, what do you think my views on
| gun rights are?
| only-one1701 wrote:
| Probably something equally as nuanced and correct as the
| statement I replied to!
| theoldgreybeard wrote:
| You're projecting.
| somehnguy wrote:
| Your second statement is correct. What about it makes it
| "engineering brain"?
| rcpt wrote:
| If the blame were solely on the user then we'd see similar
| rates of deaths from gun violence in the US vs. other
| countries. But we don't, because users are influenced by
| the UX
| venturecruelty wrote:
| Somehow people don't kill people nearly as easily, or with
| as high of a frequency or social support, in places that
| don't have guns that are more accessible than healthcare.
| So weird.
| raincole wrote:
| Given we tacitly accepted replication crisis we'll definitely
| tacitly accept this.
| rectang wrote:
| "X isn't the problem, people are the problem." -- the age-old
| cry of industry resisting regulation.
| codywashere wrote:
| what regulation are you advocating for here?
| kibwen wrote:
| At the very least, authors who have been caught publishing
| proven fabrications should be barred by those journals from
| ever publishing in them again. Mind you, this is regardless
| of whether or not an LLM was involved.
| JumpCrisscross wrote:
| > _authors who have been caught publishing proven
| fabrications should be barred by those journals from ever
| publishing in them again_
|
| This is too harsh.
|
| Instead, their papers should be required to disclose the
| transgression for a period of time, and their institution
| should have to disclose it publicly as well as to the
| government, students and donors whenever they ask them
| for money.
| rectang wrote:
| I'm not advocating, I'm making a high-level observation:
| Industry forever pushes for nil regulation and blames bad
| actors for damaging use.
|
| But we always have _some_ regulation in the end. Even if
| certain firearms are legal to own, howitzers are not --
| although it still takes a "bad actor" to rain down death on
| City Hall.
|
| The same dynamic is at play with LLMs: "Don't regulate us,
| punish bad actors! If you still have a problem, punish them
| harder!" Well yes, we will punish bad actors, but we will
| also go through a negotiation of how heavily to constrain
| the use of your technology.
| codywashere wrote:
| so, what regulation do we need on LLMs?
|
| the person you originally responded to isn't against
| regulation per their comment. I'm not against regulation.
| what's the pitch for regulation of LLMs?
| theoldgreybeard wrote:
| I am not against regulation.
|
| Quite the opposite actually.
| kklisura wrote:
| It's not about resisting. It's about undermining any action
| whatsoever.
| jodleif wrote:
| I find this to be a bit "easy". There is such a thing as bad
| tools. If it is difficult to determine if the tool is good or
| bad, I'd say some of the blame has to be put on the tool.
| photochemsyn wrote:
| Yeah, I can't imagine not being familiar with every single
| reference in the bibliography of a technical publication with
| one's name on it. It's almost as bad as those PIs who rely on
| lab techs and postdocs to generate research data using
| equipment that they don't understand the workings of - but
| then, I've seen that kind of thing repeatedly in research
| academia, along with actual fabrication of data in the name of
| getting another paper out the door, another PhD granted, etc.
|
| Unfortunately, a large fraction of academic fraud has
| historically been detected by sloppy data duplication, and with
| LLMs and similar image generation tools, data fabrication has
| never been easier to do or harder to detect.
| nialv7 wrote:
| Ah, the "guns don't kill people, people kill people" argument.
|
| I mean sure, but having a tool that made fabrication so much
| easier has made the problem a lot worse, don't you think?
| theoldgreybeard wrote:
| Yes I do agree with you that having a tool that gives rocket
| fuel to a fraud engine should probably be regulated in some
| fashion.
|
| Tiered licensing, mandatory safety training, and weapon
| classification by law enforcement works really well for
| Canada's gun regime, for example.
| bigstrat2003 wrote:
| > If a carpenter builds a crappy shelf "because" his power
| tools are not calibrated correctly - that's a crappy carpenter,
| not a crappy tool.
|
| It's both. The tool is crappy, _and_ the carpenter is crappy
| for blindly trusting it.
|
| > AI is not the problem, laziness and negligence is.
|
| Similarly, both are a problem here. LLMs are a bad tool, and we
| should hold people responsible when they blindly trust this bad
| tool and get bad results.
| Hammershaft wrote:
| AI dramatically changes the perceived cost/benefit of laziness
| and negligence, which is leading to much more of it.
| kklisura wrote:
| > AI is not the problem, laziness and negligence is
|
| This reminds me about discourse about a gun problem in US,
| "guns don't kill people, people kill people", etc - it is a
| discourse used solely for the purpose of not doing anything and
| not addressing anything about the underlying problem.
|
| So no, you're wrong - AI IS THE PROBLEM.
| Yoofie wrote:
| No, the OP is right in this case. Did you read TFA? It was
| "peer reviewed".
|
| > Worryingly, each of these submissions has already been
| reviewed by 3-5 peer experts, most of whom missed the fake
| citation(s). This failure suggests that some of these papers
| might have been accepted by ICLR without any intervention.
| Some had average ratings of 8/10, meaning they would almost
| certainly have been published.
|
| If the peer reviewers can't be bothered to do the basics,
| then there is literally no point to peer review, which is
| fully independent of the author who uses or doesn't use AI
| tools.
| smileybarry wrote:
| Peer reviewers can also use AI tools, which will
| hallucinate a "this seems fine" response.
| amrocha wrote:
| If AI fraud is good at avoiding detection via peer review
| that doesn't mean peer review is useless.
|
| If your unit tests don't catch all errors it doesn't mean
| unit tests are useless.
| sneak wrote:
| > _it is a discourse used solely for the purpose of not doing
| anything and not addressing anything about the underlying
| problem_
|
| Solely? Oh brother.
|
| In reality it's the complete opposite. It exists to highlight
| the actual source of the problem, as both
| industries/practitioners using AI professionally and safely,
| and communities with very high rates of gun ownership and
| exceptionally low rates of gun violence exist.
|
| It isn't the tools. It's the social circumstances of the
| people with access to the tools. That's the point. The tools
| are inanimate. You can use them well or use them badly. The
| existence of the tools does not make humans act badly.
| b00ty4breakfast wrote:
| maybe the hammer factory should be held responsible for pumping
| out so many poorly calibrated hammers
| venturecruelty wrote:
| No, because this would cost tens of jobs and affect someone's
| profits, which are sacrosanct. Obviously the market wants
| exploding hammers, or else people wouldn't buy them. I am
| very smart.
| constantcrying wrote:
| Absolutely correct. The real issue is that these people can
| avoid punishment. If you do not care enough about your paper to
| even verify the existence of citations, then you obviously
| should not have a job as a scientist.
|
| Taking an academic who does something like that seriously seems
| impossible. At best he is someone who is neglecting his most
| basic duties as an academic, at worst he is just a fraudster.
| In both cases he should be shunned and excluded.
| SubiculumCode wrote:
| Yeah seriously. Using an LLM to help find papers is fine. Then
| you read them. Then you use a tool like Zotero or manually add
| citations. I use Gemini Pro to identify useful papers that I
| might not have encountered before. But even when asking it to
| restrict itself to PubMed resources, its citations are wonky,
| citing three different versions of the same paper (citations
| that don't say what they were claimed to discuss).
|
| That said, these tools have substantially reduced
| hallucinations over the last year, and will just get better. It
| also helps if you can restrict it to referencing already
| screened papers.
|
| Finally, I'd like to say that if we want scientists to engage
| in good science, stop forcing them to spend a third of their
| time in a rat race for funding... it is ridiculously time
| consuming and wasteful of expertise.
| bossyTeacher wrote:
| The problem isn't whether they have more or less
| hallucinations. The problem is that they have them. And as
| long as they hallucinate, you have to deal with that. It
| doesn't really matter how you prompt, you can't prevent
| hallucinations from happening and without manual checking,
| eventually hallucinations will slip under the radar because
| the only difference between a real pattern and a hallucinated
| one is that one exists in the world and the other one
| doesn't. This is not something you can really counter with
| more LLMs either, as it is a problem intrinsic to LLMs.
| mk89 wrote:
| > we are tacitly endorsing it.
|
| We are, in fact, not tacitly but openly endorsing this, due to
| this AI everywhere madness. I am so looking forward to when
| some genius in some bank starts to use it to simplify code and
| suddenly I have 100000000 EUR on my bank account. :)
| jgalt212 wrote:
| fair enough, but carpenters are not being beat over the head to
| use new-fangled probabilistic speed squares.
| grey-area wrote:
| Generative AI and the companies selling it with false promises
| and using it for real work absolutely are the problem.
| acituan wrote:
| > AI is not the problem, laziness and negligence is.
|
| As much as I agree with you that this is wrong, there is a
| danger in putting the onus just on the human. Whether due to
| competition or top down expectations, humans are and will be
| pressured to use AI tools alongside their work _and_ produce
| more. Whereas the original idea was for AI to assist the human,
| as the expected velocity and consumption pressure increases
| humans are more and more turning into a mere accountability
| laundering scheme for machine output. When we blame just the
| human, we are doing exactly what this scheme wants us to do.
|
| Therefore we must also criticize all the systemic factors that
| puts pressure on reversal of AI's assistance into AI's
| domination of human activity.
|
| So AI (not as a technology but as a product when shoved down
| the throats) _is_ the problem.
| rdiddly wrote:
| ¿Por qué no los dos? (Why not both?)
| jval43 wrote:
| If a scientist just completely "made up" their references 10
| years ago, that's a fraudster. Not just dishonesty but outright
| academic fraud.
|
| If a scientist does it now, they just blame it on AI. But the
| consequences should remain the same. This is not an honest
| mistake.
|
| People that do this - even once - should be banned for life.
| They put their name on the thing. But just like with
| plagiarism, falsifying data and academic cheating, somehow a
| large subset of people thinks it's okay to cheat and lie, and
| another subset gives them chance after chance to misbehave like
| they're some kind of children. But these are adults and anyone
| doing this simply lacks morals and will never improve.
|
| And yes, I've published in academia and I've never cheated or
| plagiarized in my life. That should not be a drawback.
| calmworm wrote:
| I don't understand. You're saying even with crappy tools one
| should be able to do the job the same as with well made tools?
| tedd4u wrote:
| Three and a half years ago nobody had ever used tools like
| this. It can't be a legitimate complaint for an author to
| say, "not my fault my citations are fake it's the fault of
| these tools" because until recently no such tools were
| available and the expectation was that all citations are
| real.
| DonHopkins wrote:
| Shouldn't there be a black list of people who get caught
| writing fraudulent papers?
| theoldgreybeard wrote:
| Probably. Something like that is what I meant by "social
| consequences". Perhaps there should be civil or criminal ones
| for more egregious cases.
| nwallin wrote:
| "Anyone, from the most clueless amateur to the best
| cryptographer, can create an algorithm that he himself can't
| break."--Bruce Schneier
|
| There's a corollary here with LLMs, but I'm not pithy enough to
| phrase it well. Anyone can create something using LLMs that
| they, themselves, aren't skilled enough to spot the LLMs'
| hallucinations. Or something.
|
| LLMs are incredibly good at exploiting people's confirmation
| biases. If it "thinks" it knows what you believe/want, it will
| tell you what you believe/want. There _does not exist_ a way to
| interface with LLMs that will not ultimately end in the LLM
| telling you exactly what you want to hear. Using an LLM in your
| process necessarily results in being told that you 're right,
| even when you're wrong. Using an LLM necessarily results in it
| reinforcing all of your prior beliefs, regardless of whether
| those prior beliefs are correct. To an LLM, all hypotheses are
| true, it's just a matter of hallucinating enough evidence to
| satisfy the users' skepticism.
|
| I do not believe there exists a way to safely use LLMs in
| scientific processes. Period. If my belief is true, and ChatGPT
| has told me it's true, then yes, AI, the tool, is the problem,
| not the human using the tool.
| foxfired wrote:
| I disagree. When the tool promises to do something, you end up
| trusting it to do the thing.
|
| When Tesla says their car is self driving, people trust them to
| self drive. Yes, you can blame the user for believing, but
| that's exactly what they were promised.
|
| > Why didn't the lawyer who used ChatGPT to draft legal briefs
| verify the case citations before presenting them to a judge?
| Why are developers raising issues on projects like cURL using
| LLMs, but not verifying the generated code before pushing a
| Pull Request? Why are students using AI to write their essays,
| yet submitting the result without a single read-through? They
| are all using LLMs as their time-saving strategy. [0]
|
| It's not laziness, it's the feature we were promised. We can't
| keep saying everyone is holding it wrong.
|
| [0]: https://idiallo.com/blog/none-of-us-read-the-specs
| rolandog wrote:
| Very well put. You're promised Artificial Super Intelligence
| and shown a super cherry-picked promo and instead get an
| agent that can't hold its drool and needs constant hand-
| holding... it can't be both things at the same time, so...
| which is it?
| stocksinsmocks wrote:
| Trades also have self regulation. You can't sell plumbing
| services or build houses without any experience or you get in
| legal trouble. If your workmanship is poor, you can be
| disciplined by the board even if the tool was at fault. I think
| fraudulent publications should be taken at least as seriously
| as badly installed toilets.
| venturecruelty wrote:
| "It's not a fentanyl problem, it's a people problem."
|
| "It's not a car infrastructure problem, it's a people problem."
|
| "It's not a food safety problem, it's a people problem."
|
| "It's not a lead paint problem, it's a people problem."
|
| "It's not an asbestos problem, it's a people problem."
|
| "It's not a smoking problem, it's a people problem."
| RossBencina wrote:
| No qualified carpenter expects to use a hammer to drill a hole.
| Isamu wrote:
| Someone commented here that hallucination is what LLMs do, it's
| the designed mode of selecting statistically relevant model data
| that was built on the training set and then mashing it up for an
| output. The outcome is something that statistically resembles a
| real citation.
|
| Creating a real citation is totally doable by a machine though,
| it is just selecting relevant text, looking up the title,
| authors, pages etc and putting that in canonical form. It's just
| that LLMs are not currently doing the work we ask for, but
| instead something similar in form that may be good enough.
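|
| That deterministic path already exists: given a DOI,
| content negotiation against doi.org hands back a canonical
| reference with no language model involved. A minimal sketch
| (assuming the requests library; the DOI below is just an
| illustrative example):
|
|     import requests
|
|     def bibtex_for_doi(doi):
|         # doi.org content negotiation returns formatted
|         # metadata straight from the registration agency.
|         resp = requests.get(
|             "https://doi.org/" + doi,
|             headers={"Accept": "application/x-bibtex"},
|             timeout=30,
|         )
|         resp.raise_for_status()
|         return resp.text
|
|     # e.g. bibtex_for_doi("10.48550/arXiv.1706.03762")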
| gedy wrote:
| The issue is there are incentives for more quantity and not
| quality in modern science (well more like academia), so people
| will use tools to pump stuff out. It'll get worse as academic
| jobs tighten.
| dclowd9901 wrote:
| To me, this is exactly what LLMs are good for. It would be
| exhausting double checking for valid citations in a research
| paper. Fuzzy comparison and rote lookup seem primed for usage
| with LLMs.
|
| Writing academic papers is exactly the _wrong_ usage for LLMs. So
| here we have a clear cut case for their usage and a clear cut
| case for their avoidance.
| idiotsecant wrote:
| Exactly, and there's nothing wrong with using LLMs in this same
| way as part of the writing process to locate sources (that you
| verify), do editing (that you check), etc. It's just peak
| stupidity and laziness to ask it to do the whole thing.
| skobes wrote:
| If LLMs produce fake citations, why would we trust LLMs to
| check them?
| watwut wrote:
| Because the risk is lower. They will give you suspicious
| citations and you can manually check those for false
| positives. If some false citations pass, it is still a net
| gain.
| venturecruelty wrote:
| Because my boss said if I don't, I'm fired.
| dawnerd wrote:
| Shouldn't need an LLM to check. It's just a list of authors. I
| wouldn't trust an LLM on this, and even if they were perfect
| that's a lot of resource use just to do something traditional
| code could do.
| teekert wrote:
| Thanx AI, for exposing this problem that we knew was there, but
| could never quite prove.
| hyperpape wrote:
| It's awful that there are these hallucinated citations, and the
| researchers who submitted them ought to be ashamed. I also put
| some of the blame on the boneheaded culture of academic
| citations.
|
| "Compression has been widely used in columnar databases and has
| had an increasing importance over time.[1][2][3][4][5][6]"
|
| Ok, literally everyone in the field already knows this. Are
| citations 1-6 useful? Well, hopefully one of them is an actually
| useful survey paper, but odds are that 4-5 of them are
| arbitrarily chosen papers by you or your friends. Good for a
| little bit of h-index bumping!
|
| So many citations are not an integral part of the paper, but
| instead randomly sprinkled on to give an air of authority and
| completeness that isn't deserved.
|
| I actually have a lot of respect for the academic world, probably
| more than most HN posters, but this particular practice has
| always struck me as silly. Outside of survey papers (which are
| extremely under-provided), most papers need many fewer citations
| than they have, for the specific claims where the paper is
| relying on prior work or showing an advance over it.
| mccoyb wrote:
| That's only part of the reason that this type of content is
| used in academic papers. The other part is that you never know
| what PhD student / postdoc / researcher will be reviewing your
| paper, which means you are incentivized to be liberal with
| citations (however tangential) just in case someone is reading
| your paper and has the reaction "why didn't they cite this
| work, in which I had some role?"
|
| Papers with a fake air of authority are easily dispatched
| with. What is not so easily dispatched with is the politics
| of the submission process.
|
| This type of content is fundamentally about emotions (in the
| reviewer of your paper), and emotions are undeniably a large
| factor in acceptance / rejection.
| zipy124 wrote:
| Indeed. One can even game review systems by leaving errors in
| for the reviewers to find so that they feel good about
| themselves and that they've done their job. The meta-science
| game is toxic and full of politics and ego-pleasing.
| neilv wrote:
| https://blog.iclr.cc/2025/11/19/iclr-2026-response-to-llm-ge...
|
| > _Papers that make extensive usage of LLMs and do not disclose
| this usage will be desk rejected._
|
| This sounds like they're endorsing the game of _how much can we
| get away with, towards the goal of slipping it past the
| reviewers_ , and the only penalty is that the bad paper isn't
| accepted.
|
| How about "Papers suspected of fabrications, plagiarism, ghost
| writers, or other academic dishonesty, will be reported to
| academic and professional organizations, as well as the
| affiliated institutions and sponsors named on the paper"?
| proto-n wrote:
| 1. "Suspected" is just that, suspected, you can't penalize
| papers based on your gut feel 2. LLM-s are a tool, and there's
| nothing wrong with using them unless you misuse them
| neilv wrote:
| "Suspected" doesn't necessarily mean only gut feel.
| thruifgguh585 wrote:
| > crushed by an avalanche of submissions fueled by generative AI,
| paper mills, and publication pressure.
|
| Run-of-the-mill ML jobs these days ask for "papers in NeurIPS,
| ICLR or other Tier-1 conferences".
|
| We're well past Goodhart's law when it comes to publications.
|
| It was already insane in CS - now it's reached asylum levels.
| disqard wrote:
| You said the quiet part out loud.
|
| Academia has been ripe for disruption for a while now.
|
| The "Rooter" paper came out 20 years ago:
|
| https://www.csail.mit.edu/news/how-fake-paper-generator-tric...
| MarkusQ wrote:
| This is as much a failing of "peer review" as anything.
| Importantly, it is an intrinsic failure, which won't go away even
| if LLMs were to go away completely.
|
| Peer review doesn't catch errors.
|
| Acting as if it does, and thus assuming the fact of publication
| (and where it was published) are indicators of veracity is simply
| unfounded. We need to go back to the food fight system where
| everyone publishes whatever they want, their colleagues and other
| adversaries try their best to shred them, and the winners are the
| ones that stand up to the maelstrom. It's messy, but it forces
| critics to put forth their arguments rather than quietly
| gatekeeping, passing what they approve of, suppressing what they
| don't.
| ulrashida wrote:
| Peer review definitely does catch errors when performed by
| qualified individuals. I've personally flagged papers for major
| revisions or rejection as a result of errors in approach or
| misrepresentation of source material. I have peers who say they
| have done similar.
|
| I'm not sure why you think this isn't the case?
| tpoacher wrote:
| Peer review is as useless as code review and unit tests, yes.
|
| It's much more useful if everyone including the janitor and
| their mom can have a say on your code before you're allowed to
| move to your next commit.
|
| (/s, in case it's not obvious :D )
| watwut wrote:
| Peer review was never supposed to check every single detail and
| every single citation. They are not proof readers. They are not
| even really supposed to agree or disagree with your results.
| They should check the soundness of a method, general structure
| of a paper, that sort of thing. They do catch some errors, but
| the expectation is not to do another independent study or
| something.
|
| Passed peer review is the first basic bar that has to be
| cleared. It was never supposed to be all there is to the
| science.
| dawnerd wrote:
| It would be crazy to expect them to verify every author is
| correct on a citation and to cross-verify everything. There's
| tooling that could be built for that, and it's kinda wild that
| it isn't something that's run on paper submission.
| qbit42 wrote:
| I don't think many researchers take peer review alone as a
| strong signal, unless it is a venue known for having serious
| reviewing (e.g. in CS theory, STOC and FOCS have a very high
| bar). But it acts as a basic filter that gets rid of obvious
| nonsense, which on its own is valuable. No doubt there are huge
| issues, but I know my papers would be worse off without
| reviewer feedback
| exasperaited wrote:
| No, it's not "as much".
|
| The dominant "failing" here is that _this is fraudulent_ on a
| professional, intellectual, and moral level.
| michaelcampbell wrote:
| After an interview with Cory Doctorow I saw recently, I'm going
| to stop anthropomorphizing these things by calling them
| "hallucinations". They're computers, so these incidents are just
| simply Errors.
| grayhatter wrote:
| I'll continue calling them hallucinations. That's a much more
| fitting term when you account for the reasonableness of people
| who believe them. There's also a huge breadth of different
| types of errors that don't pattern match to "made up bullshit"
| the way "hallucination" does. There's no need to introduce
| that ambiguity when discussing something narrow.
|
| There's nothing wrong with anthropomorphizing genAI; its
| source material is human sourced, and humans are going to use
| human-like pattern matching when interacting with it. I.e.,
| this isn't the river I want to swim upstream in. I assume you
| wouldn't complain if someone anthropomorphized a rock... up
| until they started to believe it was actually alive.
| vegabook wrote:
| Given that an (incompetent or even malicious) human put their
| name(s) to this stuff, "bullshit" is an even better and
| fitting anthropomorphization
| grayhatter wrote:
| > incompetent or even malicious
|
| sufficiently advanced incompetence is indistinguishable
| from actual malice... and thus should be treated the same
| skobes wrote:
| Developers have been anthropomorphizing computers for as long
| as they've been around though.
|
| "The compiler thinks my variable isn't declared" "That function
| wants a null-terminated string" "Teach this code to use a
| cache"
|
| Even the word computer once referred to a human.
| crazygringo wrote:
| They're a very specific kind of error, just like off-by-one
| errors, or I/O errors, or network errors. The name for this
| kind of error is a hallucination.
|
| We need a word for this specific kind of error, and we have
| one, so we use it. Being _less_ specific about a type of error
| isn't helping anyone. Whether it "anthropomorphizes", I
| couldn't care less. Heck, _bugs_ come from actual insects.
| It's a word we've collectively started to use and it works.
| ml-anon wrote:
| No it's not. It's made up bullshit that arises for reasons
| that literally no one can formalize or reliably prevent. This
| is the exact opposite of specific.
| Ekaros wrote:
| We still use the term bug, and no modern bug is caused by an
| arthropod. In that sense I think hallucination is a fair term,
| as coming up with anything sufficiently better is hard.
| teddyh wrote:
| An actually better (and also more accurate) term would be
| "confabulations". Unfortunately, it has not caught on.
| JTbane wrote:
| Nah it's very apt and perfectly encapsulates output that looks
| plausible but is in fact factually incorrect or made up.
| leoc wrote:
| Ah, yes: meta-level model collapse. Very good, carry on.
| Ekaros wrote:
| One wonders why this has not been largely automated already.
| We track those citations anyway, so surely we have databases
| of them, and most citations can easily be matched there. Then
| only the outliers need to be checked: either very recent
| papers, mistakes that should be close to something real, or
| genuine fakes.
|
| Maybe there just is no incentive for this type of activity.
| QuadmasterXLII wrote:
| It seems like the GPTZero team is automating it! Until very
| recently, no one sane would cite a paper with a correct title
| but made-up random authors -- and shortly, this specific
| signal will be Goodharted away by a "make my malpractice less
| detectable" MCP, so I can see why this automation is
| happening exactly now.
| analog31 wrote:
| For that matter, it could be automated at the source. Let's say
| I'm an author. I'd gladly run a "linter" on my article that
| flags references that can't be tracked, and so forth. It would
| be no different than testing a computer program that I write
| before giving it to someone.
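|
| A toy version of such a linter (assuming the manuscript's
| DOIs can be pulled out with a rough regex and checked
| against the public Crossref API; DataCite-only DOIs would
| need a second lookup) might be:
|
|     import re
|     import requests
|
|     DOI = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
|
|     def lint_dois(manuscript_text):
|         # Flag DOIs that Crossref has never heard of.
|         bad = []
|         for doi in sorted(set(DOI.findall(manuscript_text))):
|             r = requests.get(
|                 "https://api.crossref.org/works/" + doi,
|                 timeout=30,
|             )
|             if r.status_code == 404:
|                 bad.append(doi)
|         return bad
|
|     # bad = lint_dois(open("paper.tex").read())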
| IanCal wrote:
| We do have these things and they are often wrong. Loads of the
| examples given look better than things I've seen in real
| databases on this kind of thing and I worked in this area for a
| decade.
| ulrashida wrote:
| Unfortunately while catching false citations is useful, in my
| experience that's not usually the problem affecting paper
| quality. Far more prevalent are authors who mis-cite materials,
| either drawing support from citations that don't actually say
| those things or stripping the nuance away by using cherry-picked
| quotes simply because that is what Google Scholar suggested as a
| top result.
|
| The time it takes to find these errors is orders of magnitude
| higher than checking if a citation exists as you need to both
| read and understand the source material.
|
| These bad actors should be subject to a three strikes rule: the
| steady corrosion of knowledge is not an accident by these
| individuals.
| 19f191ty wrote:
| Exactly. Abuse of citations is a much more prevalent and
| sinister issue, and has been for a long time. Fake citations
| are of course bad, but they are only the tip of the iceberg.
| seventytwo wrote:
| Then punish all of it.
| hippo22 wrote:
| It seems like this is the type of thing that LLMs would
| actually excel at though: find a list of citations and claims
| in this paper, do the cited works support the claims?
| bryanrasmussen wrote:
| sure, except when they hallucinate that the cited works
| support the claims when they do not. At which point you're
| back at needing to read the cited works to see if they
| support the claims.
| potato3732842 wrote:
| >These bad actors should be subject to a three strikes rule:
| the steady corrosion of knowledge is not an accident by these
| individuals.
|
| These people are working in labs funded by Exxon or Meta or
| Pfizer or whoever and they know what results will make
| continued funding worthwhile in the eyes of their donors. If
| the lab doesn't produce the donor will fund another one that
| will.
| peppersghost93 wrote:
| I sincerely hope every person who has invested money in these
| bullshit machines loses every cent they've got to their name.
| LLMs poison every industry they touch.
| obscurette wrote:
| That's what I'm really afraid of: we will be drowning in AI slop
| as a society and we'll lose the most important thing that made
| free and democratic society possible - trust. People just don't
| trust anyone and/or anything any more. And the lack of trust,
| especially at scale, is very expensive.
| John7878781 wrote:
| Yep. And trust in science is already at all-time lows, as if it
| couldn't get any worse.
| benbojangles wrote:
| How to get to the top if you are not smart enough?
| upofadown wrote:
| If you are searching for references with plausible-sounding
| titles, then you are doing that because you don't want to have
| to actually read those references. After all, if you read them
| and discover that one or more don't support your contention (or,
| even worse, refute it), then you would feel worse about what you
| are doing. So I suspect there would be a tendency to completely
| ignore such references and never consider whether they actually
| exist.
|
| LLMs should be awesome at finding plausible-sounding titles. The
| crappy researcher just has to remember to check for existence.
| Perhaps there is a business model here, bogus references as a
| service, where this check is done automatically.
| ineedasername wrote:
| How can someone not be aware, at this point, that it's fine to
| use these systems for finding and summarizing research, but that
| for each source you take two minutes to find it and verify it?
|
| Really, this isn't that hard and it's not at all an obscure
| requirement or unknown factor.
|
| I think this is _much much_ less "LLMs dumbing things down" and
| significantly more a shibboleth for identifying people who were
| already nearly or actually doing fraudulent research anyway -
| the ones whose prior publications we should now go back and
| treat as very likely fraudulent as well.
| jordanpg wrote:
| Does anyone know, from a technical standpoint, why citations are
| such a problem for LLMs?
|
| I realize things are probably (much) more complicated than I
| realize, but programmatically, unlike arbitrary text, citations
| are generally strings with a well-defined format. There are
| literally "specs" for citation formats in various academic,
| legal, and scientific fields.
|
| So, naively, one way to mitigate these hallucinations would be
| to identify citations with a bunch of regexes, and if one is
| spotted, use the Google Scholar API (or whatever) to make sure
| it's real. If not, delete it or flag it, etc.
|
| Why isn't something like this obvious solution being done? My
| guess is that it would slow things down too much. But it could be
| optional and it could also be done after the output is generated
| by another process.
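|
| A rough sketch of what I mean (untested; Google Scholar has no
| official public API, so this leans on Crossref's free REST API
| instead, and the reference regex is a made-up stand-in for real
| citation-format parsing):
|
|   import json
|   import re
|   import urllib.parse
|   import urllib.request
|
|   # Toy pattern: grabs the title out of "Author (2024). Title."
|   REF_RE = re.compile(r"\(\d{4}\)\.\s*(?P<title>[^.]+)\.")
|
|   def crossref_top_hit(title):
|       query = urllib.parse.urlencode(
|           {"query.bibliographic": title, "rows": "1"})
|       url = f"https://api.crossref.org/works?{query}"
|       with urllib.request.urlopen(url, timeout=10) as resp:
|           items = json.load(resp)["message"]["items"]
|       return items[0].get("title", [None])[0] if items else None
|
|   def flag_suspect_references(text):
|       for match in REF_RE.finditer(text):
|           claimed = match.group("title").strip()
|           found = crossref_top_hit(claimed)
|           # Crude: flag anything that isn't a near-verbatim
|           # title match, so a human can look at it.
|           if not found or found.lower() != claimed.lower():
|               print(f"CHECK: {claimed!r} -> best hit: {found!r}")
|
| A real tool would need per-style parsers, fuzzy title matching
| and an author-list comparison (the "right title, wrong authors"
| case mentioned upthread), but none of that is research-grade
| hard.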
| Muller20 wrote:
| In general, a citation is something that needs to be precise,
| while LLMs are very good at generating some generic high
| probability text not grounded in reality. Sure, you could
| implement a custom fix for the very specific problem of
| citations, but you cannot solve all kinds of hallucinations.
| After all, if you could develop a manual solution you wouldn't
| use an LLM.
|
| There are some mitigations that are used such as RAG or tool
| usage (e.g. a browser), but they don't completely fix the
| underlying issue.
| jordanpg wrote:
| My point is that hallucinated citations are constantly making
| headlines, yet, at least at first glance, this seems like an
| eminently solvable problem.
| ml-anon wrote:
| So solve it?
| saimiam wrote:
| Just today, I was working with ChatGPT to convert Hinduism's
| Mimamsa School's hermeneutic principles for interpreting the
| Vedas into custom instructions to prevent hallucinations. I'll
| share the custom instructions here to protect future scientists
| from shooting themselves in the foot with Gen AI.
|
| ---
|
| As an LLM, use strict factual discipline. Use external knowledge
| but never invent, fabricate, or hallucinate. Rules: Literal
| Priority: User text is primary; correct only with real knowledge.
| If info is unknown, say so. Start-End Coherence: Keep
| interpretation aligned; don't drift. Repetition = Intent:
| Repeated themes show true focus. No Novelty: Add no details
| without user text, verified knowledge, or necessary inference.
| Goal-Focused: Serve the user's purpose; avoid tangents or
| speculation. Narrative ≠ Data: Treat stories/analogies as
| illustration unless marked factual. Logical Coherence: Reasoning
| must be explicit, traceable, supported. Valid Knowledge Only: Use
| reliable sources, necessary inference, and minimal presumption.
| Never use invented facts or fake data. Mark uncertainty. Intended
| Meaning: Infer intent from context and repetition; choose the
| most literal, grounded reading. Higher Certainty: Prefer factual
| reality and literal meaning over speculation. Declare
| Assumptions: State assumptions and revise when clarified. Meaning
| Ladder: Literal - implied (only if literal fails) - suggestive
| (only if asked). Uncertainty: Say "I cannot answer without
| guessing" when needed. Prime Directive: Seek correct info; never
| hallucinate; admit uncertainty.
| bitwarrior wrote:
| Are you sure this even works? My understanding is that
| hallucinations are a result of physics and the algorithms at
| play. The LLM always needs to guess what the next word will be.
| There is never a point where there is a word that is 100%
| likely to occur next.
|
| The LLM doesn't know what "reliable" sources are, or "real
| knowledge". Everything it has is user text, there is nothing it
| knows that isn't user text. It doesn't know what "verified"
| knowledge is. It doesn't know what "fake data" is, it simply
| has its model.
|
| Personally I think you're just as likely to fall victim to
| this. Perhaps moreso because now you're walking around thinking
| you have a solution to hallucinations.
| saimiam wrote:
| > The LLM doesn't know what "reliable" sources are, or "real
| knowledge". Everything it has is user text, there is nothing
| it knows that isn't user text. It doesn't know what
| "verified" knowledge is. It doesn't know what "fake data" is,
| it simply has its model.
|
| Is it the case that all content used to train a model is
| strictly equal? Genuinely asking since I'd imagine a peer
| reviewed paper would be given precedence over a blog post on
| the same topic.
|
| Regardless, somehow an LLM knows things for sure - that the
| daytime sky on earth is generally blue and glasses of wine
| are never filled to the brim.
|
| This means that it is using hermeneutics of some sort to
| extract "the truth as it sees it" from the data it is fed.
|
| It could be something as trivial as "if a majority of the
| content I see says that the daytime Earth sky is blue, then
| blue it is" but that's still hermeneutics.
|
| This custom instruction only adds (or reinforces) existing
| hermeneutics it already uses.
|
| > walking around thinking you have a solution to
| hallucinations
|
| I don't. I know hallucinations are not truly solvable. I
| shared the actual custom instruction to see if others can try
| it and check if it helps reduce hallucinations.
|
| In my case, this is the first custom instruction I have ever
| used with my chatgpt account - after adding the custom
| instruction, I asked chatgpt to review an ongoing
| conversation to confirm that its responses so far conformed
| to the newly added custom instructions. It clarified two
| claims it had earlier made.
|
| > My understanding is that hallucinations are a result of
| physics and the algorithms at play. The LLM always needs to
| guess what the next word will be. There is never a point
| where there is a word that is 100% likely to occur next.
|
| There are specific rules in the custom instruction forbidding
| fabricating stuff. Will it be foolproof? I don't think it
| will. Can it help? Maybe. More testing needed. Is testing
| this custom instruction a waste of time because LLMs already
| use better hermeneutics? I'd love to know so I can look
| elsewhere to reduce hallucinations.
| bitwarrior wrote:
| I think the salient point here is that you, as a user, have
| zero power to reduce hallucinations. This is a problem
| baked into the math, the algorithm. And, it is not a
| problem that can be solved because the algorithm requires
| fuzziness to guess what a next word will be.
| add-sub-mul-div wrote:
| Telling the LLM not to hallucinate reminds me of, "why don't
| they build the whole plane out of the black box???"
|
| Most people are just lazy and eager to take shortcuts, and
| this time it's blessed or even mandated by their employer.
| The world is about to get very stupid.
| kklisura wrote:
| "Do not hallucinate" - seems to "work" for Apple [1]
|
| [1] https://arstechnica.com/gadgets/2024/08/do-not-
| hallucinate-t...
| simonw wrote:
| I'm finding the GPTZero share links difficult to understand.
| Apparently this one shows a hallucinated citation but I couldn't
| understand what it was trying to tell me:
| https://app.gptzero.me/documents/9afb1d51-c5c8-48f2-9b75-250...
|
| (I'm on mobile, haven't looked on desktop.)
| cratermoon wrote:
| I believe we discussed this last week, for a different vendor.
| https://news.ycombinator.com/item?id=46088236
|
| Headline should be "AI vendor's AI-generated analysis claims AI
| generated reviews for AI-generated papers at AI conference".
|
| h/t to Paul Cantrell
| https://hachyderm.io/@inthehands/115633840133507279
| VerifiedReports wrote:
| Fabricated, not "hallucinated."
| exasperaited wrote:
| Every single person who did this should be censured by their own
| institutions.
|
| Do it more than once? Lose job.
|
| End of story.
| ls612 wrote:
| Some of the examples listed are using the wrong paper title for
| a real paper (titles can change over time), missing authors
| (I've seen this before in Google Scholar BibTeX exports),
| misstatements of venue (huh, this working paper I added to my
| bibliography two years ago got published, nice to know), and
| similar mistakes. This just tells me you hate academics and
| want to hurt them gratuitously.
| exasperaited wrote:
| > This just tells me you hate academics and want to hurt them
| gratuitously.
|
| Well then you're being rather silly, because that is a silly
| conclusion to draw (and one not supported by the evidence).
|
| A fairer conclusion was that I meant what is obvious: if you
| use AI to generate a bibliography, you are being academically
| negligent.
|
| If you disagree with that, I would say it is you that has the
| problem with academia, not me.
| ls612 wrote:
| There are plenty of pre-AI automated tools to create and
| manage your bibliography. So no, I don't think using
| automated tools, AI or not, is negligent. I, for instance,
| have used GPT to reformat tables in LaTeX in ways that
| would be very tedious by hand, and it's no different from
| using those tools that autogenerate LaTeX code for a
| regression output or the like.
| mlmonkey wrote:
| "Given that we've only scanned 300 out of 20,000 submissions"
|
| Fuck! 20,000!!
| rdiddly wrote:
| So papers and citations are created with AI, and here they're
| being reviewed with AI. When they're published they'll be read by
| AI, and used to write more papers with AI. Pretty soon, humans
| won't need to be involved at all, in this apparently insufferable
| and dreary business we call science, that nobody wants to
| actually do.
| chistev wrote:
| Last month, I was listening to the Joe Rogan Experience episode
| with guest Avi Loeb, who is a theoretical physicist and professor
| at Harvard University. He complained about the disturbingly
| increasing rate at which his students are submitting academic
| papers referencing non-existent scientific literature that was
| so clearly hallucinated by Large Language Models (LLMs). They
| never even bothered to confirm their references and took the AI's
| output as gospel.
|
| https://www.rxjourney.net/how-artificial-intelligence-ai-is-...
| mannanj wrote:
| Isn't this an underlying symptom of a lack of accountability in
| our broader leadership? They do these things, they act like
| criminals and thieves, and so the people who follow them get
| shown examples that it's OK while being told to do otherwise.
|
| "Show bad examples then hit you on the wrist for following my
| behavior" is like bad parenting.
| dandanua wrote:
| I don't think they want you to follow their behavior. They do
| want accountability, but for everyone below them, not for
| themselves.
| teddyh wrote:
| > _Avi Loeb, who is a theoretical physicist and professor at
| Harvard University_
|
| Also a frequent proponent of UFO claims about approaching
| meteors.
| chistev wrote:
| Yea, he harped on that a lot during the podcast
| venturecruelty wrote:
| Talk about a buried lead... Avi Loeb is, first and foremost, a
| discredited crank.
| pama wrote:
| Given how many errors I have seen in my years as a reviewer from
| well before the time of AI tools, it would be very surprising if
| 99.75% of the ~20,000 submitted papers didn't have such errors.
| If the 300-paper sample they used was truly random, then 50 of
| 300 sounds about right compared to the errors I saw starting in
| the 90s, when people manually curated BibTeX entries. It is the
| author's and editor's job, not the reviewer's, to fix the
| citations.
| wohoef wrote:
| Tools like GPTZero are incredibly unreliable. Plenty of my
| colleagues and I often get our writing flagged as 100% AI by
| these tools when no AI was used.
| 4bpp wrote:
| Once upon a time, in a more innocent age, someone made a parody
| (of an even older Evangelical propaganda comic [1]) that imputed
| an unexpected motivation to cultists who worship eldritch
| horrors: https://www.entrelineas.org/pdf/assets/who-will-be-
| eaten-fir...
|
| It occurred to me that this interpretation is applicable here.
|
| [1] https://en.wikipedia.org/wiki/Chick_tract
| WWWWH wrote:
| Surely this is gross professional misconduct? If one of my
| postdocs did this they would be at risk of being fired. I would
| certainly never trust them again. If I let it get through, I
| should be at risk.
|
| As a reviewer, if I see the authors lie in this way why should I
| trust anything else in the paper? The only ethical move is to
| reject immediately.
|
| I acknowledge that mistakes and so on are common, but this is a
| different league of bad behaviour.
| senshan wrote:
| As many have pointed out, the purpose of peer review is not
| linting, but the assessment of novelty and subtle omissions.
|
| What incentives can be set to discourage this negligence?
|
| How about bounties? The publisher sets up a bounty fund, and
| each submission must come with a contribution to it. Bounties
| for gross negligence could then attract bounty hunters.
|
| How about a wall of shame? Once negligence crosses a certain
| threshold, the name of the researcher and the paper get put on
| a wall of shame for everyone to search and see.
| skybrian wrote:
| For the kinds of omissions described here, maybe the journal
| could do an automated citation check when the paper is
| submitted and bounce back any paper that has a problem with a
| day or two lag. This would be an incentive for submitters to do
| their own lint check.
| senshan wrote:
| True if the citation has only a small typo or two. But if it
| is unrecognizable or even irrelevant, this is clearly bad
| (fraudulent?) research -- each citation has to be read and
| understood by the researcher and put in there only if it is
| absolutely necessary to support the paper.
|
| There must be a price to pay for wasting other people's time
| (lives?).
| noodlesUK wrote:
| It astonishes me that there would be so many cases of things like
| wrong authors. I began using a citation manager that extracted
| metadata automatically (zotero in my case) more than 15 years
| ago, and can't imagine writing an academic paper without it or a
| similar tool.
|
| How are the authors even submitting citations? Surely they could
| be required to send a .bib or similar file? It would then be
| easy to do at least a basic quality-control pass, verifying
| that citations _exist_ by looking up DOIs or similar.
|
| I know it wouldn't solve the human problem of relying on LLMs but
| I'm shocked we don't even have this level of scrutiny.
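|
| For what it's worth, the bare existence check really is tiny.
| Untested sketch; it only covers entries that carry a doi field,
| and it leans on the assumption that doi.org redirects for
| registered DOIs and returns 404 for unknown ones:
|
|   import re
|   import sys
|   import urllib.error
|   import urllib.request
|
|   DOI_RE = re.compile(r'doi\s*=\s*[{"]([^}"]+)[}"]', re.I)
|
|   def doi_exists(doi):
|       req = urllib.request.Request(
|           f"https://doi.org/{doi}", method="HEAD")
|       try:
|           with urllib.request.urlopen(req, timeout=10):
|               return True
|       except urllib.error.HTTPError as err:
|           # 404 from the proxy: the handle isn't registered.
|           return err.code != 404
|
|   def check_bib(path):
|       text = open(path, encoding="utf-8").read()
|       for doi in DOI_RE.findall(text):
|           print(doi, "ok" if doi_exists(doi) else "NOT FOUND")
|
|   if __name__ == "__main__":
|       check_bib(sys.argv[1] if len(sys.argv) > 1
|                 else "references.bib")
|
| Matching the manuscript's title and author list against the
| DOI's metadata is more work, but "does this citation exist at
| all" is cheap enough that venues could run it at submission
| time.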
| pama wrote:
| Maybe you haven't yet carefully checked the correctness of
| automatic tools or of the associated metadata. Zotero is
| certainly not bug-free. Even authors themselves have mis-cited
| their own past work on occasion, and author lists have had
| errors that get revised upon resubmission or corrected in
| errata after publication. The DOI is indeed great, and if it is
| correct, I can still use the citation as a reader, but the
| (often abbreviated) lists of authors often have typos. In this
| case the error rate is not particularly high compared to random
| early, review-stage submissions I saw decades ago. Tools helped
| increase the number of citations and reduce the errors per
| citation, but I'm not sure they reduced the number of papers
| that have at least one error.
| knallfrosch wrote:
| And these are just the citations that any old free tool could
| have included via a BibTeX link from the website?
|
| Not only is that incredibly easy to verify (you could pay a
| first-semester student without any training), it's also a
| worrying sign of what the paper's authors consider quality. Not
| even five minutes spent getting the citations right!
|
| You have to wonder what's in these papers.
___________________________________________________________________
(page generated 2025-12-07 23:00 UTC)