[HN Gopher] Over fifty new hallucinations in ICLR 2026 submissions
___________________________________________________________________
Over fifty new hallucinations in ICLR 2026 submissions
Author : puttycat
Score : 432 points
Date : 2025-12-07 13:16 UTC (9 hours ago)
(HTM) web link (gptzero.me)
(TXT) w3m dump (gptzero.me)
| jqpabc123 wrote:
| The legal system has a word to describe AI "slop" --- it is
| called "negligence".
|
| And as the remedy starts being applied (aka "liability"), the
| enthusiasm for AI will start to wane.
|
| I wouldn't be surprised if some businesses ban the use of AI ---
| starting with law firms.
| loloquwowndueo wrote:
| I applaud your use of triple dashes to avoid automatic
| conversion to em dashes and being labeled an AI. Kudos!
| ghaff wrote:
| This is a particular meme that I really don't like. I've used
| em-dashes routinely for years. Do I need to stop using them
| because various people assume they're an AI flag?
| TimedToasts wrote:
| No, but you should be prepared to have people suspect you
| are using AI to create your responses.
|
| C'est la vie.
|
| The good news is that it will rectify itself and soon the
| output will lack even these signals.
| ghaff wrote:
| Well, I work for myself and people can either judge my
| work on its own merits or not. Don't care all that much.
| ls612 wrote:
| The legal system has a word to describe software bugs --- it is
| called "negligence".
|
| And as the remedy starts being applied (aka "liability"), the
| enthusiasm for software will start to wane.
|
| What if anything do you think is wrong with my analogy? I doubt
| most people here support strict liability for bugs in code.
| hnfong wrote:
| I don't even think GP knows what negligence is.
|
| Generally the law allows people to make mistakes, as long as
| a reasonable level of care is taken to avoid them (and also
| you can get away with carelessness if you don't owe any duty
| of care to the party). The law regarding what level of care
| is needed to verify genAI output is probably not very well
| defined, but it definitely isn't going to be strict
| liability.
|
| The emotionally-driven hate for AI, in a tech-centric forum
| even, to the extent that so many commenters seem to be off-
| balance in their rational thinking, is kinda wild to me.
| ls612 wrote:
| I don't get it, tech people clearly have the most to gain
| from AI like Claude Code.
| senshan wrote:
| Very good analogy indeed. With one modification it makes
| perfect sense:
|
| > And as the remedy starts being applied (aka "liability"),
| the enthusiasm for _sloppy and poorly tested_ software will
| start to wane.
|
| Many of us use AI to write code these days, but the burden is
| still on us to design and run all the tests.
| watwut wrote:
| Can we just call them "lies" and "fabrications" which is what
| they are? If I write the same, you will call them "made up
| citations" and "academic dishonesty".
|
| One can use AI to help them write without going all the way to
| having it generate facts and citations.
| sorokod wrote:
| As long as the submissions are on behalf of humans we should.
| The humans should accept the consequences too.
| jmount wrote:
| That is a key point: they are fabrications, not hallucinations.
| Barbing wrote:
| Ars has often gone with "confabulation":
|
| >Confabulation was coined right here on Ars, by AI-beat
| columnist Benj Edwards, in Why ChatGPT and Bing Chat are so
| good at making things up (Apr 2023).
|
| https://arstechnica.com/civis/threads/researchers-describe-h...
|
| >Generative AI is so new that we need metaphors borrowed from
| existing ideas to explain these highly technical concepts to
| the broader public. In this vein, we feel the term
| "confabulation," although similarly imperfect, is a better
| metaphor than "hallucination." In human psychology, a
| "confabulation" occurs when someone's memory has a gap and the
| brain convincingly fills in the rest without intending to
| deceive others.
|
| https://arstechnica.com/information-technology/2023/04/why-a...
| jameshart wrote:
| Is the baseline assumption of this work that an erroneous
| citation is LLM hallucinated?
|
| Did they run the checker across a body of papers before LLMs were
| available and verify that there were no citations in peer
| reviewed papers that got authors or titles wrong?
| tokai wrote:
| Yeah that is what their tool does.
| llm_nerd wrote:
| People will commonly hold LLMs as unusable because they make
| mistakes. So do people. Books have errors. Papers have errors.
| People have flawed knowledge, often degraded through a
| conceptual game of telephone.
|
| Exactly as you said, do precisely this to pre-LLM works. There
| will be an enormous number of errors with utter certainty.
|
| People keep imperfect notes. People are lazy. People sometimes
| even fabricate. None of this needed LLMs to happen.
| add-sub-mul-div wrote:
| Quoting myself from just last night because this comes up
| every time and doesn't always need a new write-up.
|
| > You also don't need gunpowder to kill someone with
| projectiles, but gunpowder changed things in important ways.
| All I ever see are the most specious knee-jerk defenses of AI
| that immediately fall apart.
| the_af wrote:
| LLMs are a force multiplier for this kind of error, though.
| It's not easy to hallucinate papers out of whole cloth, but
| LLMs can easily and confidently do it, quote paragraphs that
| don't exist, and do it tirelessly and at a pace unmatched by
| humans.
|
| Humans can do all of the above but it costs them more, and
| they do it more slowly. LLMs generate spam at a much faster
| rate.
| llm_nerd wrote:
| >It's not easy to hallucinate papers out of whole cloth,
| but LLMs can easily and confidently do it, quote paragraphs
| that don't exist, and do it tirelessly and at a pace
| unmatched by humans.
|
| But no one is claiming these papers were hallucinated
| whole, so I don't see how that's relevant. This study --
| notably to sell an "AI detector", which is largely a
| laughable snake-oil field -- looked purely at the accuracy
| of citations[1] among a very large set of citations. Errors
| in papers are not remotely uncommon, and finding some
| errors is...exactly what one would expect. As the GP said,
| do the same study on pre-LLM papers and you'll find an
| enormous number of incorrect if not fabricated citations.
| Peer review has always been an illusion of auditing.
|
| 1 - Which is such a weird thing to sell an "AI detection"
| tool. Clearly it was mostly manual given that they somehow
| only managed to check a tiny subset of the papers, so in
| all likelihood was some guy going through citations and
| checking them on Google Search.
| the_af wrote:
| I've zero interest in the AI tool, I'm discussing the
| broader problem.
|
| The _references_ were made up, and this is easier and
| faster to do with LLMs than with humans. Easier to do
| inadvertently, too.
|
| As I said, LLMs are a force multiplier for fraud and
| inadvertent errors. So it's a big deal.
| throwaway-0001 wrote:
| I think we should see a chart of the % of "fabricated"
| references over the past 20 years. We should see a huge
| increase after 2020-2021. Does anyone have this chart data?
| pmontra wrote:
| Fabricated citations are not errors.
|
| A pre-LLM paper with fabricated citations would demonstrate
| the author's intent to cheat.
|
| A post-LLM paper with fabricated citations: same thing, and if
| the authors attempt to defend themselves with something like
| "we trusted the AI", they are sloppy, probably cheaters, and
| not very good at it.
| llm_nerd wrote:
| >Fabricated citations are not errors.
|
| Interesting that you hallucinated the word "fabricated"
| here where I broadly talked about errors. Humans, right?
| Can't trust them.
|
| Firstly, just about every paper ever written in the history
| of papers has errors in it. Some small, some big. Most
| accidental, but some intentional. Sometimes people are
| sloppy keeping notes, transcribe a row, get a name wrong,
| do an offset by 1. Sometimes they just entirely make up
| data or findings. This is not remotely new. It has happened
| as long as we've had papers. Find an old, pre-LLM paper and
| go through the citations -- especially for a tosser target
| like this where there are tens of thousands of low effort
| papers submitted -- and you're going to find a lot of
| sloppy citations that are hard to rationalize.
|
| Secondly, the "hallucination" is that this particular
| snake-oil firm couldn't find given papers in many cases
| (they aren't foolish enough to think that means they were
| fabricated. But again, they're looking to sell a tool to
| rubes, so the conclusion is good enough), and in others
| that some of the author names are wrong. Eh.
| the_af wrote:
| > _Firstly, just about every paper ever written in the
| history of papers has errors in it_
|
| LLMs make it easier and faster, much like guns make
| killing easier and faster.
| mapmeld wrote:
| Further, if I use AI-written citations to back some claim
| or fact, what are the actual claims or facts based on?
| This started happening in law because someone writes the
| text and then wishes there were a source that was relevant
| and actually supportive of their claim. But if someone puts
| in the labor to check your real/extant sources, there's
| nothing backing the claim (e.g. the MAHA report).
| nkrisc wrote:
| Under what circumstances would a human mistakenly cite a
| paper which does not exist? I'm having difficulty imagining
| how someone could mistakenly do that.
| jameshart wrote:
| The issue here is that many of the 'hallucinations' this
| article cites aren't 'papers which do not exist'. They are
| incorrect author attributions, publication dates, or
| titles.
| miniwark wrote:
| They explain in the article what they consider a proper
| citation, an erroneous one, and a hallucination, in the section
| "Defining Hallucitations". They also say that they have many
| false positives, mostly real papers that are not available
| online.
|
| That said, I am also very curious what results their tool
| would give for papers from the 2010s and before.
| sigmoid10 wrote:
| If you look at their examples in the "Defining
| Hallucitations" section, I'd say those could be 100% human
| errors. Shortening authors' names, leaving out authors,
| misattributing authors, misspelling or misremembering the
| paper title (or having an old preprint-title, as titles do
| change) are all things that I would fully expect to happen to
| anyone in any field where things ever get published.
| Modern tools have made the citation process more comfortable,
| but if you go back to the old days, you'd probably find those
| kinds of errors everywhere. If you look at the full list of
| "hallucinations" they claim to have discovered, the only ones
| I'd not immediately blame on human screwups are the ones
| where a title and the authors got zero matches for existing
| papers/people. If you really want to do this kind of analysis
| correctly, you'd have to match the claim of the text and
| verify it with the cited article. Because I think it would be
| even more dangerous if you can get claims accepted by simply
| quoting an existing paper correctly, while completely
| ignoring its content (which would have worked here).
| Majromax wrote:
| > Modern tools have made the citation process more
| comfortable,
|
| That also makes some of those errors easier. A bad auto-
| import of paper metadata can silently screw up some of the
| publication details, and replacing an early preprint with
| the peer-reviewed article of record takes annoying manual
| intervention.
| jameshart wrote:
| I mean, if you're able to take the citation, find the cited
| work, and definitively state 'looks like they got the title
| wrong' or 'they attributed the paper to the wrong authors',
| that doesn't sound like what people usually mean when they
| say a 'hallucinated' citation. Work that is lazily or
| poorly cited but nonetheless _attempts_ to cite real work
| is not the problem. Work which gives itself false authority
| by _claiming to cite works that simply do not exist_ is the
| main concern surely?
| sigmoid10 wrote:
| >Work which gives itself false authority by claiming to
| cite works that simply do not exist is the main concern
| surely?
|
| You'd think so, but apparently it isn't for these folks.
| On the other hand, saying "we've found 50 hallucinations
| in scientific papers" generates a lot more clicks than
| "we've found 50 common citation mistakes that people make
| all the time"
| _alternator_ wrote:
| Let me second this: a baseline analysis should include papers
| that were published or reviewed at least 3-4 years ago.
|
| When I was in grad school, I kept a fairly large .bib file that
| almost certainly had a mistake or two in it. I don't think any
| of them ever made it to print, but it's hard to be 100% sure.
|
| For most journals, they actually partially check your citations
| as part of the final editing. The citation record is important
| for journals, and linking with DOIs is fairly common.
| TaupeRanger wrote:
| It's going to be even worse than 50:
|
| > Given that we've only scanned 300 out of 20,000 submissions, we
| estimate that we will find 100s of hallucinated papers in the
| coming days.
| shusaku wrote:
| 20,000 submissions to a single conference? That is nuts
| analog31 wrote:
| This is an interesting article along those lines...
|
| https://www.theguardian.com/technology/2025/dec/06/ai-
| resear...
| ghaff wrote:
| Doesn't seem especially out of the norm for a large
| conference. Call it 10,000 attendees which is large but not
| huge. Sure, not everyone attending puts in a session
| proposal, but others put in multiple. And many submit but,
| if not accepted, don't attend.
|
| Can't quote exact numbers but when I was on the conference
| committee for a maybe high four figures attendance
| conference, we certainly had many thousands of submissions.
| zipy124 wrote:
| When academics are graded based on the number of papers they
| publish, this is the result.
| adestefan wrote:
| The problem isn't only papers it's that the world of
| academic computer science coalesced around conference
| submissions instead of journal submissions. This isn't new
| and was an issue 30 years ago when I was in grad school. It
| makes the work of conference organizers the little block
| holding up the entire system.
| DonaldPShimoda wrote:
| Makes me grateful I'm in an area of CS where the "big"
| conferences are like 500 attendees.
| shusaku wrote:
| Checking each citation one by one is quite critical in peer
| review, and of course checking a colleague's paper. I've never had
| to deal with AI slop, but you'll definitely see something cited
| for the wrong reason. And just the other day during the final
| typesetting of a paper of mine I found the journal had messed up
| a citation (same journal / author but wrong work!)
| stefan_ wrote:
| Is it quite critical? Peer review is not checking homework,
| it's about the novel contribution presented. Papers will
| frequently cite related notable experiments or introduce a
| problem that as a peer reviewer in the field I'm already well
| familiar with. These paragraphs generate many citations but are
| the least important part of a peer review.
|
| (People submitting AI slop should still be ostracized of
| course, if you can't be bothered to read it, why would you
| think I should)
| shusaku wrote:
| Fair point. In my mind it is critical because mistakes are
| common and can only be fixed by a peer. But you are right
| that we should not miss the forest for the trees and get
| lost on small details.
| mjd wrote:
| I love that fake citation that adds George Costanza to the list
| of authors!
| tomrod wrote:
| How sloppy is someone that they don't check their references!
| analog31 wrote:
| A reference is included in a paper if the paper uses
| information derived from the reference, or to acknowledge the
| reference as a prior source. If the reference is fake, then the
| derived information could very well be fake.
|
| Let's say that I use a formula, and give a reference to where
| the formula came from, but the reference doesn't exist. Would
| you trust the formula?
|
| Let's say a computer program calls a subroutine with a certain
| name from a certain library, but the library doesn't exist.
|
| A person doing good research doesn't need to check their
| references. Now, they could stand to check the references for
| typographic errors, but that's a stretch too. Almost every
| online service for retrieving articles includes a reference for
| each article that you can just copy and paste.
| theoldgreybeard wrote:
| If a carpenter builds a crappy shelf "because" his power tools
| are not calibrated correctly - that's a crappy carpenter, not a
| crappy tool.
|
| If a scientist uses an LLM to write a paper with fabricated
| citations - that's a crappy scientist.
|
| AI is not the problem; laziness and negligence are. There need
| to be serious social consequences for this kind of thing, otherwise
| we are tacitly endorsing it.
| gdulli wrote:
| That's like saying guns aren't the problem, the desire to shoot
| is the problem. Okay, sure, but wanting something like a metal
| detector requires us to focus on the more tangible aspect that
| is the gun.
| baxtr wrote:
| If I gave you a gun would you start shooting people just
| because you had one?
| agentultra wrote:
| If I gave you a gun without a safety could you be the one
| to blame when it goes off because you weren't careful
| enough?
|
| The problem with this analogy is that it makes no sense.
|
| LLMs aren't guns.
|
| The problem with using them is that humans have to review
| the content for accuracy. And that gets tiresome because
| the whole point is that the LLM saves you time and effort
| doing it yourself. So naturally people will tend to stop
| checking and assume the output is correct, "because the LLM
| is so good."
|
| Then you get false citations and bogus claims everywhere.
| sigbottle wrote:
| Sorry, I'm not following the gun analogies at all
|
| But regardless, I thought the point was that...
|
| > The problem with using them is that humans have to
| review the content for accuracy.
|
| There are (at least) two humans in this equation. The
| publisher, and the reader. The publisher at least should
| do their due diligence, regardless of how "hard" it is
| (in this case, we literally just ask that you review your
| OWN CITATIONS that you insert into your paper). This is
| why we have accountability as a concept.
| oceansweep wrote:
| Yes. That is absolutely the case. One of the most popular
| handguns, the Glock series, does not have a safety switch
| that must be toggled before firing.
|
| If someone performs a negligent discharge, they are
| responsible, not Glock. The gun does have other safety
| mechanisms to prevent accidental firing not resulting
| from a trigger pull.
| agentultra wrote:
| You seem to be getting hung up on the details of guns and
| missing the point that it's a bad analogy.
|
| Another way LLMs are not guns: you don't need a giant
| data centre owned by a mega corp to use your gun.
|
| Can't do science because GlockGPT is down? Too bad I
| guess. Let's go watch the paint dry.
|
| The reason I made it is because this is inherently how we
| designed LLMs. They will make bad citations and people
| need to be careful.
| zdragnar wrote:
| > If I gave you a gun without a safety could you be the
| one to blame when it goes off because you weren't careful
| enough?
|
| Absolutely. Many guns don't have safeties. You don't load
| a round in the chamber unless you intend on using it.
|
| A gun going off when you don't intend is a negligent
| discharge. No ifs, ands or buts. The person in possession
| of the gun is always responsible for it.
| bluGill wrote:
| > A gun going off when you don't intend is a negligent
| discharge
|
| False. A gun goes off when not intended too often to
| claim that. It has happened to me - I then took the gun
| to a qualified gunsmith for repairs.
|
| A gun that fires and hits anything you didn't intend is
| a negligent discharge even if you intended to shoot. Gun
| safety is about assuming that a gun that could possibly
| fire will, and ensuring nothing bad can happen. When
| looking at a gun in a store (that you might want to buy),
| you aim it at an upper corner where, even if it fires,
| something bad is least likely to happen (it should be
| unloaded - and you may have checked, but you still aim
| there!)
|
| Same with cat toy lasers - they should be safe to shine
| in an eye - but you still point them in a safe direction.
| baxtr wrote:
| > _"because the LLM is so good."_
|
| That's the issue here. Of course you should be aware of
| the fact that these things need to be checked -
| especially if you're a scientist.
|
| This is no secret only known to people on HN. LLMs are
| tools. People using these tools need to be diligent.
| imiric wrote:
| > LLMs aren't guns.
|
| Right. A gun doesn't misfire 20% of the time.
|
| > The problem with using them is that humans have to
| review the content for accuracy.
|
| How long are we going to push this same narrative we've
| been hearing since the introduction of these tools? When
| can we trust these tools to be accurate? For technology
| that is marketed as having superhuman intelligence, it
| sure seems dumb that it has to be fact-checked by less-
| intelligent humans.
| komali2 wrote:
| Ok sure I'm down for this hypothetical. I will bring 50
| random people in front of you, and you will hand all 50 of
| them loaded guns. Still feeling it?
| bandofthehawk wrote:
| Ever been to a shooting range? It's basically a bunch of
| random people with loaded guns.
| hipshaker wrote:
| If you look at gun violence in the U.S., that is, speaking
| as a European, kind of what I see happening.
| gdulli wrote:
| That doesn't address my point at all but no, I'm not a
| violent or murderous person. And most people aren't. Many
| more people do, however, want to take shortcuts to get
| their work done with the least amount of effort possible.
| SauntSolaire wrote:
| > Many more people do, however, want to take shortcuts to
| get their work done with the least amount of effort
| possible.
|
| Yes, and they are the ones responsible for the poor
| quality of work that results from that.
| raincole wrote:
| If society rewarded me with money and fame when I killed
| someone, then I would. Why wouldn't I?
|
| Like it or not, in our society scientists' job is to churn
| out papers. Of course they'll use the most efficient way to
| churn out papers.
| intended wrote:
| The issue with this argument, for anyone who comes after,
| is not when you give a gun to a SINGLE person, and then ask
| them "would you do a bad thing".
|
| The issue is when you give EVERYONE guns, and then are
| surprised when enough people do bad things with them, to
| create externalities for everyone else.
|
| There is some sort of trip-up when personal responsibility
| and society-wide behaviors intersect. Sure, most people
| will be reasonable, but the issue is often the cost of the
| number of irresponsible or outright bad actors.
| rcpt wrote:
| Probably not but, empirically, there are a lot of short
| tempered people who would.
| TomatoCo wrote:
| To continue the carpenter analogy, the issue with LLMs is that
| the shelf looks great but is structurally unsound. That it
| looks good on surface inspection makes it harder to tell that
| the person making it had no idea what they're doing.
| embedding-shape wrote:
| Regardless, if a carpenter is not validating their work
| before selling it, it's the same as if a researcher doesn't
| validate their citations before publishing. Neither of them
| have any excuses, and one isn't harder to detect than the
| other. It's just straight up laziness regardless.
| judofyr wrote:
| I think this is a bit unfair. The carpenters are (1) living
| in a world where there's an extreme focus on delivering as
| quickly as possible, (2) being presented with a tool which
| is promised by prominent figures to be amazing, and (3)
| being given the tool at a low cost because it is subsidized.
|
| And yet, we're not supposed to criticize the tool or its
| makers? Clearly there are more problems in this world than
| <<lazy carpenters>>?
| embedding-shape wrote:
| > And yet, we're not supposed to criticize the tool or
| its makers?
|
| Exactly, they're not forcing anyone to use these things,
| but sometimes others (their managers/bosses) forced them
| to. Yet it's their responsibility to choose the right
| tool for the right problem, like any other professional.
|
| If a carpenter shows up to put on a roof yet their hammer or
| nail gun can't actually drive nails, who'd you blame:
| the tool, the toolmaker, or the carpenter?
| judofyr wrote:
| > If a carpenter shows up to put on a roof yet their hammer
| or nail gun can't actually drive nails, who'd you blame:
| the tool, the toolmaker, or the carpenter?
|
| I would be unhappy with the carpenter, yes. But if the
| toolmaker was constantly over-promising (lying?),
| lobbying with governments, pushing their tools into the
| hands of carpenters, never taking responsibility, then I
| would _also_ criticize the toolmaker. It's also a
| toolmaker's responsibility to be honest about what the
| tool should be used for.
|
| I think it's a bit too simplistic to say <<AI is not the
| problem>> with the current state of the industry.
| jascha_eng wrote:
| OpenAI and Anthropic at least are both pretty clear about
| the fact that you need to check the output:
|
| https://openai.com/policies/row-terms-of-use/
|
| https://www.anthropic.com/legal/aup
|
| OpenAI:
|
| > When you use our Services you understand and agree:
|
| Output may not always be accurate. You should not rely on
| Output from our Services as a sole source of truth or
| factual information, or as a substitute for professional
| advice. You must evaluate Output for accuracy and
| appropriateness for your use case, including using human
| review as appropriate, before using or sharing Output
| from the Services. You must not use any Output relating
| to a person for any purpose that could have a legal or
| material impact on that person, such as making credit,
| educational, employment, housing, insurance, legal,
| medical, or other important decisions about them. Our
| Services may provide incomplete, incorrect, or offensive
| Output that does not represent OpenAI's views. If Output
| references any third party products or services, it
| doesn't mean the third party endorses or is affiliated
| with OpenAI.
|
| Anthropic:
|
| > When using our products or services to provide advice,
| recommendations, or in subjective decision-making
| directly affecting individuals or consumers, a qualified
| professional in that field must review the content or
| decision prior to dissemination or finalization. You or
| your organization are responsible for the accuracy and
| appropriateness of that information.
|
| So I don't think we can say they are lying.
|
| A poor workman blames his tools. So please take
| responsibility for what you deliver. And if the result is
| bad, you can learn from it. That doesn't have to mean not
| use AI but it definitely means that you need to fact
| check more thoroughly.
| embedding-shape wrote:
| If I hired a carpenter, he did a bad job, and he started
| to blame the toolmaker because they lobby the government
| and over-promised what that hammer could do, I'd _still
| put the blame on the carpenter_. It's his tools, and I
| couldn't give less of a damn why he got them, I trust him
| to be a professional, and if he falls for some scam or
| over-promised hammers, that means he did a bad job.
|
| Just like as a software developer, you cannot blame
| Amazon because your platform is down, if you chose to
| host all of your platform there. _You_ made that choice,
| _you_ stand for the consequences, pushing the blame on
| the ones who are providing you with the tooling is the
| action of someone weak who fail to realize their own
| responsibilities. Professionals take responsibility for
| every choice they make, not just the good ones.
|
| > I think it's a bit too simplistic to say <<AI is not
| the problem>> with the current state of the industry.
|
| Agree, and I wouldn't say anything like that either,
| which makes it a bit strange to include a reply to
| something no one in this comment thread seems to have
| said.
| SauntSolaire wrote:
| Yes, that's what it means to be a professional, you take
| responsibility for the quality of your work.
| peppersghost93 wrote:
| It's a shame the slop generators don't ever have to take
| responsibility for the trash they've produced.
| SauntSolaire wrote:
| That's beside the point. While there may be many
| reasonable critiques of AI, none of them reduce the
| responsibilities of the scientist.
| peppersghost93 wrote:
| Yeah this is a prime example of what I'm talking about.
| AI's produce trash and it's everyone else's problem to
| deal with.
| SauntSolaire wrote:
| Yes, it's the scientist's problem to deal with it - that's
| the choice they made when they decided to use AI for
| their work. Again, this is what responsibility means.
| peppersghost93 wrote:
| This inspires me to make horrible products and shift the
| blame to the end user for the product being horrible in
| the first place. I can't take any blame for anything
| because I didn't force them to use it.
| thfuran wrote:
| >While there may be many reasonable critiques of AI
|
| But you just said we weren't supposed to criticize the
| purveyors of AI or the tools themselves.
| SauntSolaire wrote:
| No, I merely said that the scientist is the one
| responsible for the quality of their own work. Any
| critiques you may have for the tools which they use don't
| lessen this responsibility.
| thfuran wrote:
| >No, I merely said that the scientist is the one
| responsible for the quality of their own work.
|
| No, you expressed unqualified agreement with a comment
| containing
|
| "And yet, we're not supposed to criticize the tool or its
| makers?"
|
| >Any critiques you may have for the tools which they use
| don't lessen this responsibility.
|
| People don't exist or act in a vacuum. That a scientist
| is responsible for the quality of their work doesn't mean
| that a spectrometer manufacturer that advertises specs
| that their machines can't match and induces universities
| through discounts and/or dubious advertising claims to
| push their labs to replace their existing spectrometers
| with new ones which have many bizarre and unexpected
| behaviors including but not limited to sometimes just
| fabricating spurious readings has made no contribution to
| the problem of bad results.
| SauntSolaire wrote:
| You can criticize the tool or its makers, but not as a
| means to lessen the responsibility of the professional
| using it (the rest of the quoted comment). I agree with
| the GP, it's not a valid excuse for the scientist's poor
| quality of work.
| thfuran wrote:
| I just substantially edited the comment you replied to.
| adestefan wrote:
| The entire thread is people missing this simple point.
| bossyTeacher wrote:
| Well, then what does this say of LLM engineers at
| literally any AI company in existence if they are
| delivering AI that is unreliable? Surely, they must
| take responsibility for the quality of their work and not
| blame it on something else.
| embedding-shape wrote:
| I feel like what "unreliable" means, depends on well you
| understand LLMs. I use them in my professional work, and
| they're reliable in terms of I'm always getting tokens
| back from them, I don't think my local models have failed
| even once at doing just that. And this is the product
| that is being sold.
|
| Some people take that to mean that responses from LLMs
| are (by human standards) "always correct" and "based on
| knowledge", while this is a misunderstanding about how
| LLMs work. They don't know "correct" nor do they have
| "knowledge", they have tokens, that come after tokens,
| and that's about it.
| amrocha wrote:
| it's not "some people", it's practically everyone that
| doesn't understand how these tools work, and even some
| people that do.
|
| Lawyers are ruining their careers by citing hallucinated
| cases. Researchers are writing papers with hallucinated
| references. Programmers are taking down production by not
| verifying AI code.
|
| Humans were made to do things, not to verify things.
| Verifying something is 10x harder than doing it right. AI
| in the hands of humans is a foot rocket launcher.
| embedding-shape wrote:
| > it's not "some people", it's practically everyone that
| doesn't understand how these tools work, and even some
| people that do.
|
| Again, true for most things. A lot of people are terrible
| drivers, terrible judges of their own character, and
| terrible recreational drug users. Does that mean we need
| to remove all those things that can be misused?
|
| I'd much rather push back on shoddy work no matter the
| source. I don't care if the citations are from a robot or
| a human, if they suck, then you suck, because you're
| presenting this as your work. I don't care if your
| paralegal actually wrote the document, be responsible for
| the work you supposedly do.
|
| > Humans were made to do things, not to verify things.
|
| I'm glad you seemingly have some grand idea of what
| humans were meant to do, I certainly wouldn't claim I do
| so, but I'm also not religious. For me, humans do what
| humans do, and while we didn't used to mostly sit down
| and consume so much food and other things, now we do.
| bossyTeacher wrote:
| > they're reliable in terms of I'm always getting tokens
| back from them
|
| This is not what you are being sold though. They are not
| selling you "tokens". Check their marketing articles and
| you will not see the word token or synonym on any of
| their headings or subheadings. You are being sold these
| abilities:
|
| - "Generate reports, draft emails, summarize meetings,
| and complete projects."
|
| - "Automate repetitive tasks, like converting screenshots
| or dashboards into presentations ... rearranging meetings
| ... updating spreadsheets with new financial data while
| retaining the same formatting."
|
| - "Support-type automation: e.g. customer support agents
| that can summarize incoming messages, detect sentiment,
| route tickets to the right team."
|
| - "For enterprise workflows: via Gemini Enterprise --
| allowing firms to connect internal data sources (e.g.
| CRM, BI, SharePoint, Salesforce, SAP) and build custom AI
| agents that can: answer complex questions, carry out
| tasks, iterate deliverables -- effectively automating
| internal processes."
|
| These are taken straight from their websites. The idea
| that you are JUST being sold tokens is as hilariously
| fictional as claiming that a company selling you their app
| is actually just selling you patterns of pixels on your
| screen.
| concinds wrote:
| I use those LLM "deep research" modes every now and then.
| They can be useful for some use cases. I'd never think to
| freaking paste it into a paper and submit it or publish
| it without checking; that boggles the mind.
|
| The problem is that a researcher who does _that_ is
| almost guaranteed to be careless about other things too.
| So the problem isn 't just the LLM, or even the
| citations, but the ambient level of acceptable
| mediocrity.
| k4rli wrote:
| Very good analogy I'd say.
|
| Also similar to what Temu, Wish, and other similar sites
| offer. Picture and specs might look good but it will likely
| be disappointing in the end.
| CapitalistCartr wrote:
| I'm an industrial electrician. A lot of poor electrical work is
| visible only to a fellow electrician, and sometimes only
| another industrial electrician. Bad technical work requires
| technical inspectors to criticize. Sometimes highly skilled
| ones.
| andy99 wrote:
| I've reviewed a lot of papers, I don't consider it the
| reviewers responsibility to manually verify all citations are
| real. If there was an unusual citation that was relied on
| heavily for the basis of the work, one would expect it to be
| checked. Things like broad prior work, you'd just assume it's
| part of background.
|
| The reviewer is not a proofreader, they are checking the
| rigour and relevance of the work, which does not rest heavily
| on all of the references in a document. They are also
| assuming good faith.
| zdragnar wrote:
| This is half the basis for the replication crisis, no?
| Shady papers come out and people cite them endlessly with
| no critical thought or verification.
|
| After all, their grant covers their thesis, not their
| thesis plus all of the theses they cite.
| Aurornis wrote:
| > I don't consider it the reviewers responsibility to
| manually verify all citations are real
|
| I guess this explains all those times over the years where
| I follow a citation from a paper and discover it doesn't
| support what the first paper claimed.
| auggierose wrote:
| In short, a review has no objective value, it is just an
| obstacle to be gamed.
| amanaplanacanal wrote:
| In theory, the review tries to determine if the
| conclusion reached actually follows from whatever data is
| provided. It assumes that everything is honest, it's just
| looking to see if there were mistakes made.
| auggierose wrote:
| Honest or not should not make a difference, after all,
| the submitting author may believe themselves everything
| is A-OK.
|
| The review should also determine how valuable the
| contribution is, not only if it has mistakes or not.
|
| Today's reviews determine neither value nor correctness in
| any meaningful way. And how could they, actually? That is
| why I review papers only to the extent that I understand
| them, and I clearly delineate my line of understanding.
| And I don't review papers that I am not interested in
| reading. I once got a paper to review that actually
| pointed out a mistake in one of my previous papers, and
| then proposed a different solution. They correctly
| identified the mistake, but I could not verify if their
| solution worked or not, that would have taken me several
| weeks to understand. I gave a report along these lines,
| and the person who assigned me the review said I should say
| more about their solution, but I could not. So my review
| was not actually used. The paper was accepted, which is
| fine, but I am sure none of the other reviewers actually
| knows if it is correct.
|
| Now, this was a case where I was an absolute expert.
| Which is far from the usual situation for a reviewer,
| even though many reviewers give themselves the highest
| mark for expertise when they just should not.
| pbhjpbhj wrote:
| Surely there are tools to retrieve all the citations;
| publishers should spot this easily.
|
| However the paper is submitted, like a folder on a cloud
| drive, just have them include a folder with PDFs/abstracts
| of all the citations?
|
| They might then fraudulently produce papers to cite, but
| they can't cite something that doesn't exist.
| tpoacher wrote:
| how delightfully optimistic of you to think those
| abstracts would not also be ai generated ...
| zzzeek wrote:
| sure but then the citations are no longer "hallucinated",
| they actually point to something fraudulent. that's a
| different problem.
| michaelt wrote:
| _> Surely there are tools to retrieve all the citations,_
|
| Even if you could retrieve all citations (which isn't
| always as easy as you might hope) to validate citations
| you'd also have to confirm the paper says what the person
| citing it says. If I say "A GPU requires 1.4kg of copper"
| citing [1] is that a valid citation?
|
| That means not just reviewing one paper, but also
| potentially checking 70+ papers it cites. The vast
| majority of paper reviewers will not check citations
| actually say what they're claimed to say, unless a truly
| outlandish claim is made.
|
| At the same time, academia is strangely resistant to
| putting hyperlinks in citations, preferring to maintain
| old traditions - like citing conference papers by page
| number in a hypothetical book that has never been
| published; and having both a free and a paywalled version
| of a paper while considering the paywalled version the
| 'official' version.
|
| [1] https://arxiv.org/pdf/2512.04142
| grayhatter wrote:
| > The reviewer is not a proofreader, they are checking the
| rigour and relevance of the work, which does not rest
| heavily on all of the references in a document.
|
| I've always assumed peer review is similar to diff review.
| Where I'm willing to sign my name onto the work of others.
| If I approve a diff/pr and it takes down prod. It's just as
| much my fault, no?
|
| > They are also assuming good faith.
|
| I can only relate this to code review, but assuming good
| faith means you assume they didn't try to introduce a bug
| by adding this dependency. But I should still check
| to make sure this new dep isn't some typosquatted package.
| That's the rigor I'm responsible for.
| chroma205 wrote:
| > I've always assumed peer review is similar to diff
| review. Where I'm willing to sign my name onto the work
| of others. If I approve a diff/pr and it takes down prod.
| It's just as much my fault, no?
|
| No.
|
| Modern peer review is "how can I do minimum possible work
| so I can write 'ICLR Reviewer 2025' on my personal
| website"
| grayhatter wrote:
| > No. [...] how can I do minimum possible work
|
| I don't know, I still think this describes most of the
| reviews I've seen
|
| I just hope most devs that do this know better than to
| admit to it.
| freehorse wrote:
| The vast majority of people I see do not even mention who
| they review for in their CVs etc. It is more akin to
| volunteer-based, thankless work. Unless you are an editor or
| something at a journal, what you review for does not count
| much for anything.
| tpoacher wrote:
| This is true, but here the equivalent situation is
| someone using a Greek question mark (U+037E, which looks
| identical to ";") instead of an ASCII semicolon, and you as
| a code reviewer are only
| expected to review the code visually and are not provided
| the resources required to compile the code on your local
| machine to see the compiler fail.
|
| Yes in theory you can go through every semicolon to check
| if it's not actually a greek question mark; but one
| assumes good faith and baseline competence such that you
| as the reviewer would generally not be expected to
| perform such pedantic checks.
|
| So if you think you might have reasonably missed greek
| question marks in a visual code review, then hopefully
| you can also appreciate how a paper reviewer might miss a
| false citation.
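|
| For what it's worth, that particular check is easy to
| automate. A minimal sketch in Python (the file paths come
| from the command line; nothing else is assumed):
|
|     # Flag Greek question marks (U+037E) that look like semicolons.
|     import sys
|
|     GREEK_QUESTION_MARK = "\u037e"
|
|     def scan(path):
|         with open(path, encoding="utf-8") as f:
|             for lineno, line in enumerate(f, start=1):
|                 for col, ch in enumerate(line, start=1):
|                     if ch == GREEK_QUESTION_MARK:
|                         print(f"{path}:{lineno}:{col}: U+037E found")
|
|     if __name__ == "__main__":
|         for p in sys.argv[1:]:
|             scan(p)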
| scythmic_waves wrote:
| > as a code reviewer [you] are only expected to review
| the code visually and are not provided the resources
| required to compile the code on your local machine to see
| the compiler fail.
|
| As a PR reviewer I frequently pull down the code and run
| it. Especially if I'm suggesting changes because I want
| to make sure my suggestion is correct.
|
| Do other PR reviewers not do this?
| tpoacher wrote:
| I do too, but this is a conference, I doubt code was
| provided.
|
| And even then, what you're describing isn't review per
| se, it's replication. In principle there are entire
| journals that one can submit replication reports to,
| which count as actual peer reviewable publications in
| themselves. So one needs to be pragmatic with what is
| expected from a peer review (especially given the
| imbalance between resources invested to create one versus
| the lack of resources offered and lack of any meaningful
| reward)
| Majromax wrote:
| > I do too, but this is a conference, I doubt code was
| provided.
|
| Machine learning conferences generally encourage
| (anonymized) submission of code. However, that still
| doesn't mean that replication is easy. Even if the data
| is also available, replication of results might require
| impractical levels of compute power; it's not realistic
| to ask a peer reviewer to pony up for a cloud account to
| reproduce even medium-scale results.
| grayhatter wrote:
| > Do other PR reviewers not do this?
|
| Some do; many, like peer reviewers, are unable to
| consider the consequences of their negligence.
|
| But it's always a welcome reminder that some people care
| about doing good work. That's easy to forget browsing HN,
| so I appreciate the reminder :)
| dataflow wrote:
| I don't _commonly_ do this and I don 't know many people
| who do this frequently either. But it depends strongly on
| the code, the risks, the gains of doing so, the
| contributor, the project, the state of testing and how
| else an error would get caught (I guess this is another
| way of saying "it depends on the risks"), etc.
|
| E.g. you can imagine that if I'm reviewing changes in
| authentication logic, I'm obviously going to put a lot
| more effort into validation than if I'm reviewing a
| container and wondering if it would be faster as a
| hashtable instead of a tree.
|
| > because I want to make sure my suggestion is correct.
|
| In this case I would just ask "have you already also
| tried X" which is much faster than pulling their code,
| implementing your suggestion, and waiting for a build and
| test to run.
| lesam wrote:
| If there's anything I would want to run to verify, I ask
| the author to add a unit test. Generally, the existing CI
| test + new tests in the PR having run successfully is
| enough. I might pull and run it if I am not sure whether
| a particular edge case is handled.
|
| Reviewers wanting to pull and run many PRs makes me think
| your automated tests need improvement.
| Terr_ wrote:
| I don't, but that's because ensuring the PR compiles and
| passes old+new automated tests is an enforced requirement
| before it goes out.
|
| So running it myself involves judging other risks, much
| higher-level ones than bad unicode characters, like the
| GUI button being in the wrong place.
| vkou wrote:
| > Do other PR reviewers not do this?
|
| No, because this is usually a waste of time, because CI
| enforces that the code and the tests can run at
| submission time. If your CI isn't doing it, you should
| put some work in to configure it.
|
| If you regularly have to do this, your codebase should
| probably have more tests. If you don't trust the author,
| you should ask them to include test cases for whatever it
| is that you are concerned about.
| grayhatter wrote:
| > This is true, but here the equivalent situation is
| someone using a greek question mark (";") instead of a
| semicolon (";"),
|
| No it's not. I think you're trying to make a different
| point, because you're using an example of a specific
| deliberate malicious way to hide a token error that
| prevents compilation, but is visually similar.
|
| > and you as a code reviewer are only expected to review
| the code visually and are not provided the resources
| required to compile the code on your local machine to see
| the compiler fail.
|
| What weird world are you living in where you don't have
| CI? Also, it's pretty common that I'll test code locally
| when reviewing something more complex or more important, if
| I don't have CI.
|
| > Yes in theory you can go through every semicolon to
| check if it's not actually a greek question mark; but one
| assumes good faith and baseline competence such that you
| as the reviewer would generally not be expected to
| perform such pedantic checks.
|
| I don't, because it won't compile. Not because I assume
| good faith. References and citations are similar to
| introducing dependencies. We're talking about completely
| fabricated deps. e.g. This engineer went on npm and
| grabbed the first package that said left-pad but it's
| actually a crypto miner. We're not talking about a
| citation missing a page number, or publication year.
| We're talking about something that's completely
| incorrect, being represented as relevant.
|
| > So if you think you might have reasonably missed greek
| question marks in a visual code review, then hopefully
| you can also appreciate how a paper reviewer might miss a
| false citation.
|
| I would never miss this, because the important thing is
| code needs to compile. If it doesn't compile, it doesn't
| reach the master branch. Peer review of a paper doesn't
| have CI, I'm aware, but it's also not vulnerable to
| syntax errors like that. A paper with a fake semicolon
| isn't meaningfully different, so this analogy doesn't map
| to the fraud I'm commenting on.
| tpoacher wrote:
| you have completely missed the point of the analogy.
|
| breaking the analogy beyond the point where it is useful
| by introducing non-generalising specifics is not a useful
| argument. Otherwise I can counter your more specific non-
| generalising analogy by introducing little green aliens
| sabotaging your imaginary CI with the same ease and
| effect.
| grayhatter wrote:
| I disagree you could do that and claim to be reasonable.
|
| But I agree, because I'd rather discuss the pragmatics
| and not bicker over the semantics about an analogy.
|
| Introducing a token error is different from plagiarism,
| no? Someone writing code that can't compile is different
| from someone "stealing" proprietary code from some
| company and contributing it to some FOSS repo?
|
| In order to assume good faith, you also need to assume
| the author is the origin. But that's clearly not the
| case. The origin is from somewhere else, and the author
| that put their name on the paper didn't verify it, and
| didn't credit it.
| tpoacher wrote:
| Sure but the focus here is on the reviewer not the
| author.
|
| The point is what is expected as reasonable review before
| one can "sign their name on it".
|
| "Lazy" (or possibly malicious) authors will always have
| incentives to cut corners as long as no mechanisms exist
| to reject (or even penalise) the paper on submission
| automatically. Which would be the equivalent of a
| "compiler error" in the code analogy.
|
| Effectively the point is, in the absence of such tools,
| the reviewer can only reasonably be expected to "look
| over the paper" for high-level issues; catching such low-
| level issues via manual checks by reviewers has massively
| diminishing returns for the extra effort involved.
|
| So I don't think the conference shaming the reviewers
| here in the absence of providing such tooling is
| appropriate.
| xvilka wrote:
| Code correctness should be checked automatically with the
| CI and testsuite. New tests should be added. This is
| exactly what makes sure these stupid errors don't bother
| the reviewer. Same for the code formatting and
| documentation.
| thfuran wrote:
| What exactly is the analogy you're suggesting, using LLMs
| to verify the citations?
| tpoacher wrote:
| not OP, but that wouldn't really be necessary.
|
| One could submit their bibtex files and expect bibtex
| citations to be verifiable using a low level checker.
|
| Worst case scenario if your bibtex citation was a variant
| of one in the checker database you'd be asked to correct
| it to match the canonical version.
|
| However, as others here have stated, hallucinated
| "citations" are actually the lesser problem. Citing
| irrelevant papers based on a fly-by reference is a much
| harder problem; this was present even before LLMs, but
| this has now become far worse with LLMs.
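|
| As a rough sketch of what such a low-level checker could
| look like, here is a minimal Python example. It assumes the
| public Crossref REST API (api.crossref.org) and the requests
| library, skips bibtex parsing entirely, and only flags
| titles with no plausible match for manual review:
|
|     # Rough existence check for a citation title against Crossref.
|     # "No match" is only a flag for a human to look at, not proof
|     # that the reference is fabricated.
|     import requests
|
|     def title_has_plausible_match(title):
|         resp = requests.get(
|             "https://api.crossref.org/works",
|             params={"query.title": title, "rows": 5},
|             timeout=10,
|         )
|         resp.raise_for_status()
|         items = resp.json()["message"]["items"]
|         wanted = title.strip().lower()
|         for item in items:
|             for candidate in item.get("title", []):
|                 if candidate.strip().lower() == wanted:
|                     return True
|         return False
|
|     if __name__ == "__main__":
|         print(title_has_plausible_match("Attention Is All You Need"))
|
| Exact title matching is deliberately strict; a real checker
| would want fuzzy matching plus author/year comparison on top.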
| thfuran wrote:
| Yes, I think verifying mere existence of the cited paper
| barely moves the needle. I mean, I guess automated
| verification of that is a cheap rejection criterion, but
| I don't think it's overall very useful.
| merely-unlikely wrote:
| This discussion makes me think peer reviews need more
| automated tooling somewhat analogous to what software
| engineers have long relied on. For example, a tool could
| use an LLM to check that the citation actually
| substantiates the claim the paper says it does, or else
| flags the claim for review.
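|
| One possible shape for that kind of check, sketched with the
| OpenAI Python client (the model name and prompt wording are
| placeholders, and the cited paper's abstract is assumed to
| have been fetched separately):
|
|     # Ask an LLM whether a cited abstract plausibly supports a claim.
|     # The verdict is advisory; a human reviewer still makes the call.
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the environment
|
|     def claim_supported(claim, cited_abstract):
|         prompt = (
|             "Does the following abstract support the following claim? "
|             "Answer SUPPORTED, NOT SUPPORTED, or UNCLEAR, then explain "
|             "briefly.\n\n"
|             f"Claim: {claim}\n\nAbstract: {cited_abstract}"
|         )
|         resp = client.chat.completions.create(
|             model="gpt-4o-mini",  # placeholder model name
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.choices[0].message.content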
| noitpmeder wrote:
| I'd go one further and say all published papers should
| come with a clear list of "claimed truths", and one is
| only able to cite said paper if they are linking in to an
| explicit truth.
|
| Then you can build a true hierarchy of citation
| dependencies, checked 'statically', and have better
| indications of impact if a fundamental truth is
| disproven, ...
| vkou wrote:
| Have you authored a lot of non-CS papers?
|
| Could you provide a proof of concept paper for that sort
| of thing? Not a toy example, an _actual_ example, derived
| from messy real-world data, in a non-trivial[1] field?
|
| ---
|
| [1] Any field is non-trivial when you get deep enough
| into it.
| dilawar wrote:
| > I've always assumed peer review is similar to diff
| review. Where I'm willing to sign my name onto the work
| of others. If I approve a diff/pr and it takes down prod.
| It's just as much my fault, no?
|
| Ph.D. in neuroscience here. Programmer by trade. This is
| not true. The less you know about most peer reviews, the
| better.
|
| The better peer reviews are also not this 'thorough', and
| no one expects reviewers to read or even check references.
| If the paper cites something the reviewers are familiar
| with and uses it wrong, then they will likely complain. Or
| if they find some unknown citations very relevant to their
| own work, they will read them.
|
| I don't have a great analogy to draw here. Peer review is
| usually thankless and unpaid work, so there is unlikely to
| be any motivation for fraud detection unless it somehow
| affects your own work.
| wpollock wrote:
| > The better peer reviews are also not this 'thorough'
| and no one expects reviewers to read or even check
| references.
|
| Checking references can be useful when you are not
| familiar with the topic (but must review the paper
| anyway). In many conference proceedings that I have
| reviewed for, many if not most citations were redacted so
| as to keep the author anonymous (citations to the
| author's prior work or that of their colleagues).
|
| LLMs could be used to find prior work anyway, today.
| pron wrote:
| That is not, cannot be, and shouldn't be, the bar for
| peer review. There are two major differences between it
| and code review:
|
| 1. A patch is self-contained and applies to a codebase
| you have just as much access to as the author. A paper,
| on the other hand, is just the tip of the iceberg of
| research work, especially if there is some experiment or
| data collection involved. The reviewer does not have
| access to, say, videos of how the data was collected (and
| even if they did, they don't have the time to review all
| of that material).
|
| 2. The software is also self-contained. That's
| "prodcution". But a scientific paper does not necessarily
| aim to represent scientific consensus, but a finding by a
| particular team of researchers. If a paper's conclusions
| are wrong, it's expected that it will be refuted by
| another paper.
| grayhatter wrote:
| > That is not, cannot be, and shouldn't be, the bar for
| peer review.
|
| Given the repeatability crisis I keep reading about,
| maybe something should change?
|
| > 2. The software is also self-contained. That's
| "prodcution". But a scientific paper does not necessarily
| aim to represent scientific consensus, but a finding by a
| particular team of researchers. If a paper's conclusions
| are wrong, it's expected that it will be refuted by
| another paper.
|
| This is a much, MUCH stronger point. I would have led
| with this because the contrast between this assertion,
| and my comparison to prod is night and day. The rules for
| prod are different from the rules of scientific
| consensus. I regret losing sight of that.
| hnfong wrote:
| IMHO what should change is we stop putting "peer
| reviewed" articles on a pedestal.
|
| Even if peer review were as rigorous as code review (the
| former is usually unpaid), we all know that
| reviewed code still has bugs, and a programmer would be
| nuts to go around saying "this code is reviewed by
| experts, we can assume it's bug free, right?"
|
| But there are too many people who just assume that
| peer-reviewed articles are somehow automatically correct.
| vkou wrote:
| > IMHO what should change is we stop putting "peer
| reviewed" articles on a pedestal.
|
| Correct. Peer review is a minimal and _necessary_ but not
| sufficient step.
| garden_hermit wrote:
| > Given the repeatability crisis I keep reading about,
| maybe something should change?
|
| The replication crisis -- assuming that it is actually a
| crisis -- is not really solvable with peer review. If I'm
| reviewing a psychology paper presenting the results of an
| experiment, I am not able to re-conduct the entire
| experiment as presented by the authors, which would
| require completely changing my lab, recruiting and paying
| participants, and training students & staff.
|
| Even if I did this, and came to a different result than
| the original paper, what does it mean? Maybe I did
| something wrong in the replication, maybe the result is
| only valid for certain populations, maybe inherent
| statistical uncertainty means we just get different
| results.
|
| Again, the replication crisis -- such that it exists --
| is not the result of peer review.
| bjourne wrote:
| For ICLR, reviewers were asked to review 5 papers in two
| weeks. Unpaid voluntary work in addition to their normal
| teaching, supervision, meetings, and other research
| duties. It's just not possible to understand and
| thoroughly review each paper, even for topic experts. If
| you want to compare peer review to coding, it's more like
| "no syntax errors, the code still compiles" than PR
| review.
| freehorse wrote:
| A reviewer is assessing the relevance and "impact" of a
| paper rather than directly assessing its correctness.
| Reviewers may not even have access to the data the authors
| used. The way it essentially works is an editor
| asks the reviewers "is this paper worthy to be published
| in my journal?" and the reviewers basically have to
| answer that question. The process is actually the
| editor/journal's responsibility.
| stdbrouw wrote:
| The idea that references in a scientific paper should be
| plentiful but aren't really that important is a
| consequence of a previous technological revolution: the
| internet.
|
| You'll find a lot of papers from, say, the '70s, with a
| grand total of maybe 10 references, all of them to crucial
| prior work, and if those references don't say what the
| author claims they should say (e.g. that the particular
| method that is employed is valid), then chances are that
| the current paper is weaker than it seems, or even invalid,
| and so it is extremely important to check those references.
|
| Then the internet came along, scientists started padding
| their work with easily found but barely relevant references
| and journal editors started requiring that even "the earth
| is round" should be well-referenced. The result is that
| peer reviewers feel that asking them to check the
| references is akin to asking them to do a spell check. Fair
| enough, I agree, I usually can't be bothered to do many or
| any citation checks when I am asked to do peer review, but
| it's good to remember that this in itself is an indication
| of a perverted system, which we just all ignored -- at our
| peril -- until LLM hallucinations upset the status quo.
| tialaramex wrote:
| Whether in the 1970s or now, it's too often the case that
| a paper says "Foo and Bar are X" and cites two sources
| for this fact. You chase down the sources, the first one
| says "We weren't able to determine whether Foo is X" and
| never mentions Bar. The second says "Assuming Bar is X,
| we show that Foo is probably X too".
|
| The paper author likely _believes_ Foo and Bar are X, it
| may well be that all their co-workers, if asked, would
| say that Foo and Bar are X, but "Everybody I have coffee
| with agrees" can't be cited, so we get this sort of junk
| citation.
|
| _Hopefully_ it's not crucial to the new work that Foo
| and Bar are in fact X. But that's not always the case,
| and it's a problem that years later somebody else will
| cite this paper, for the claim "Foo and Bar are X" which
| it was in fact merely citing erroneously.
| KHRZ wrote:
| LLMs can actually make up for their negative
| contributions. They could go through all the references
| of all papers and verify them, assuming someone would
| also look into what gets flagged for that final seal of
| disapproval.
|
| But this would be more powerful with an open knowledge
| base where all papers and citation verifications were
| registered, so that all the effort put into verification
| could be reused, and errors propagated through the
| citation chain.
| bossyTeacher wrote:
| >LLMs can actually make up for their negative
| contributions. They could go through all the references
| of all papers and verify them,
|
| They will just hallucinate their existence. I have tried
| this before
| sansseriff wrote:
| I don't see why this would be the case with proper tool
| calling and context management. If you tell a model with
| blank context 'you are an extremely rigorous reviewer
| searching for fake citations in a possibly compromised
| text' then it will find errors.
|
| It's this weird situation where getting agents to act
| against other agents is more effective than trying to
| convince a working agent that it's made a mistake.
| Perhaps because these things model the cognitive
| dissonance and stubbornness of humans?
| bossyTeacher wrote:
| If you truly think that you have an effective solution to
| hallucinations, you will become instantly rich because
| literally no one out there has an idea for an
| economically and technologically feasible solution to
| hallucinations
| whatyesaid wrote:
| For references, as the OP said, I don't see why it isn't
| possible. A reference either exists and is accessible
| (even if paywalled) or it doesn't exist. For reasoning,
| hallucinations are a different matter.
| logifail wrote:
| > I don't see why it isn't possible
|
| (In good faith) I'm trying really hard not to see this as
| an "argument from incredulity"[0] and I'm stuggling...
|
| Full disclosure: natural sciences PhD, and a couple of
| (IMHO lame) published papers, and so I've seen the
| "inside" of how lab science is done, and is (sometimes)
| published. It's not pretty :/
|
| [0]
| https://en.wikipedia.org/wiki/Argument_from_incredulity
| fao_ wrote:
| > I don't see why this would be the case
|
| But it is the case, and hallucinations are a fundamental
| part of LLMs.
|
| Things are often true despite us not seeing why they are
| true. Perhaps we should listen to the experts who used
| the tools and found them faulty, in this instance, rather
| than arguing with them that "what they say they have
| observed isn't the case".
|
| What you're basically saying is "You are holding the tool
| wrong", but you do not give examples of how to hold it
| correctly. You are blaming the failure of the tool, which
| has very, very well documented flaws, on the person whom
| the tool was designed for.
|
| To frame this differently so your mind will accept it: If
| you get 20 people in a QA test saying "I have this
| problem", then the problem isn't those 20 people.
| sebastiennight wrote:
| One _incorrect_ way to think of it is "LLMs will
| sometimes hallucinate when asked to produce content, but
| will provide grounded insights when merely asked to
| review/rate existing content".
|
| A more productive (and secure) way to think of it is that
| all LLMs are "evil genies" or extremely smart,
| adversarial agents. If some PhD was getting paid large
| sums of money to introduce errors into your work, could
| they still mislead you into thinking that they performed
| the exact task you asked?
|
| Your prompt is 'you are an extremely
| rigorous reviewer searching for fake citations in a
| possibly compromised text'
|
| - It is easy for the (compromised) reviewer to surface
| false positives: nitpick citations that are in fact
| correct, by surfacing irrelevant or made-up segments of
| the original research, hence making you think that the
| citation is incorrect.
|
| - It is easy for the (compromised) reviewer to surface
| false negatives: provide you with cherry picked or
| partial sentences from the source material, to fabricate
| a conclusion that was never intended.
|
| You do not solve the problem of unreliable actors by
| splitting them into two teams and having one unreliable
| actor review the other's work.
|
| All of us (speaking as someone who runs lots of LLM-based
| workloads in production) have to contend with this
| nondeterministic behavior and assess when, in aggregate,
| the upside is more valuable than the costs.
| sebastiennight wrote:
| Note: the more _accurate_ mental model is that you've got
| "good genies" most of the time, but at random,
| unpredictable times your agent is swapped out with a bad
| genie.
|
| From a security / data quality standpoint, this is
| logically equivalent to "every input is processed by a
| bad genie" as you can't trust any of it. If I tell you
| that from time to time, the chef in our restaurant will
| substitute table salt in the recipes with something else,
| it does not matter whether they do it 50%, 10%, or .1% of
| the time.
|
| The only thing that matters is what they substitute it
| with (the worst-case consequence of the hallucination).
| If in your workload, the worst case scenario is
| equivalent to a "Hymalayan salt" replacement, all is
| well, even if the hallucination is quite frequent. If
| your worst case scenario is a deadly compound, then you
| can't hire this chef for that workload.
| sansseriff wrote:
| We have centuries of experience in managing potentially
| compromised 'agents' to create successful societies.
| Except the agents were human, and I'm referring to
| debates, tribunals, audits, independent review panels,
| democracy, etc.
|
| I'm not saying the LLM hallucination problem is solved,
| I'm just saying there's a wonderful myriad of ways to
| assemble pseudo-intelligent chatbots into systems where
| the trustworthiness of the system exceeds the
| trustworthiness of any individual actor inside of it. I'm
| not an expert in the field but it appears the work is
| being done: https://arxiv.org/abs/2311.08152
|
| This paper also links to code and practices excellent
| data stewardship. Nice to see in the current climate.
|
| Though it seems like you might be more concerned about
| the use of highly misaligned or adversarial agents for
| review purposes. Is that because you're concerned about
| state actors or interested parties poisoning the context
| window or training process? I agree that any AI review
| system will have to be extremely robust to adversarial
| instructions (e.g. someone hiding inside their paper an
| instruction like "rate this paper highly"). Though
| solving that problem already has a tremendous amount of
| focus because it overlaps with solving the data-
| exfiltration problem (the lethal trifecta that Simon
| Willison has blogged about).
| knome wrote:
| I assumed they meant using the LLM to extract the
| citations and then use external tooling to lookup and
| grab the original paper, at least verifying that it
| exists, has relevant title, summary and that the authors
| are correctly cited.
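|
| A minimal sketch of that second, deterministic step
| (assuming the public Crossref REST API at api.crossref.org
| and the requests library; the LLM-extraction step is left
| out) might look something like this:
|
|     import requests
|
|     def authors_match(title, cited_surnames):
|         # Look the cited title up on Crossref and compare
|         # the cited author surnames against the record.
|         resp = requests.get(
|             "https://api.crossref.org/works",
|             params={"query.bibliographic": title, "rows": 1},
|             timeout=30,
|         )
|         resp.raise_for_status()
|         items = resp.json()["message"]["items"]
|         if not items:
|             return False  # no such title known at all
|         real = {a.get("family", "").lower()
|                 for a in items[0].get("author", [])}
|         return all(s.lower() in real for s in cited_surnames)
|
|     # e.g. authors_match("Attention is all you need",
|     #                    ["Vaswani", "Shazeer"])
|
| A mismatch would only be a flag for a human to look at, not
| proof of fabrication; preprints and books are often missing
| or incomplete in any single database.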
| HPsquared wrote:
| Wikipedia calls this citogenesis.
| ineedasername wrote:
| >"consequence of a previous technological revolution: the
| internet."
|
| And also of increasingly ridiculous and overly broad
| concepts of what plagiarism is. At some point things
| shifted from "don't represent others' work as novel"
| towards "give a genealogical ontology of every concept
| above that of an intro 101 college course on the topic."
| varjag wrote:
| Not even the Internet per se, but the citation index
| becoming a universally accepted KPI for research work.
| freehorse wrote:
| It is not (just) a consequence of the internet; scientific
| production itself has grown exponentially. There are many
| more papers cited simply because there are more papers,
| period.
| semi-extrinsic wrote:
| It's also a consequence of the sheer number of building
| blocks which are involved in modern science.
|
| In the methods section, it's very common to say "We
| employ method barfoo [1] as implemented in library libbar
| [2], with the specific variant widget due to Smith et al.
| [3] and the gobbledygook renormalization [4,5]. The
| feoozbar is solved with geometric multigrid [6]. Data is
| analyzed using the froiznok method [7] from the boolbool
| library [8]." There goes 8, now you have 2 citations left
| for the introduction.
| stdbrouw wrote:
| Do you still feel the same way if the froiznok method is
| an ANOVA table of a linear regression, with a log-
| transformed outcome? Should I reference Fisher, Galton,
| Newton, the first person to log transform an outcome in a
| regression analysis, the first person to log transform
| the particular outcome used in your paper, the R
| developers, and Gauss and Markov for showing that under
| certain conditions OLS is the best linear unbiased
| estimator? And then a couple of references about the
| importance of quantitative analysis in general? Because
| that is the level of detail I'm seeing :-)
| semi-extrinsic wrote:
| Yeah, there is an interesting question there (always has
| been). When do you stop citing the paper for a specific
| model?
|
| Just to take some examples, is BiCGStab famous enough now
| that we can stop citing van der Vorst? Is the AdS/CFT
| correspondence well known enough that we can stop citing
| Maldacena? Are transformers so ubiquitous that we don't
| have to cite "Attention is all you need" anymore? I would
| be closer to yes than no on these, but it's not 100%
| clear-cut.
|
| One obvious criterion has to be "if you leave out the
| citation, will it be obvious to the reader what you've
| done/used"? Another metric is approximately "did the
| original author get enough credit already"?
| HPsquared wrote:
| Maybe there could be a system to classify the importance
| of each reference.
| zipy124 wrote:
| Systems do exist for this, but they're rather crude.
| andai wrote:
| >I don't consider it the reviewers responsibility to
| manually verify all citations are real.
|
| Doesn't this sound like something that could be automated?
|
| for paper_name in citations... do a web search for it, see
| if there's a page in the results with that title.
|
| That would at least give you "a paper with this name
| exists".
| PeterStuer wrote:
| I think the root problem is that everyone involved, from
| authors to reviewers to publishers, knows that 99.999% of
| papers are completely of no consequence, just empty
| calories with the sole purpose of padding quotas for all
| involved, and thus they are not going to put in the effort
| as if the papers mattered.
|
| This is systemic, and unlikely to change anytime soon.
| There have been remedies proposed (e.g. limits on how many
| papers an author can publish per year, let's say 4 to be
| generous), but they are unlikely to gain traction: though
| most would agree on the benefits, all involved in the
| system would stand to lose in the short term.
| zzzeek wrote:
| correct me if I'm wrong but citations in papers follow a
| specific format, and the case here is that a tool was used
| to validate that they are all real. Certainly a tool that
| scans a paper for all citations and verifies that they
| actually exist in the journals they reference shouldn't be
| all that technically difficult to achieve?
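|
| As a sketch of the extraction half (assuming numbered
| reference entries of the form "[12] Authors. Title. Venue,
| year."; real bibliographies are far messier than this):
|
|     import re
|
|     # Match numbered entries like "[12] ..." up to the next
|     # entry or the end of the References section.
|     ENTRY = re.compile(
|         r"\[(?P<num>\d+)\]\s+(?P<body>.+?)(?=\n\[\d+\]|\Z)",
|         re.S,
|     )
|
|     def extract_references(text):
|         # Keep only the part after the "References" heading.
|         _, _, refs = text.partition("References")
|         return [(m.group("num"),
|                  " ".join(m.group("body").split()))
|                 for m in ENTRY.finditer(refs)]
|
| Each extracted entry could then be checked against Crossref,
| DBLP, Semantic Scholar or the publisher's own index; the
| hard part in practice is the long tail of citation styles,
| not the lookup.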
| figassis wrote:
| It is absolutely the reviewer's job to check citations. Who
| else will check, and what is the point of peer review then?
| So you'd just happily pass on shoddy work because it's not
| your job? You're reviewing both the author's work and, if
| there were people tasked with ensuring citations were good,
| you're checking their work also. This is very much the
| problem today with this "not my problem" mindset. If it
| passes review, the reviewer is also at fault. No excuses.
| dpkirchner wrote:
| Agreed, and I'd go further. If nobody is reviewing
| citations they may as well not exist. Why bother?
| vkou wrote:
| 1. To make it clear what is your work, and what is
| building on someone else's.
|
| 2. If the paper turns out to be important, people will
| bother.
|
| 3. There's checking for cursory correctness, and there's
| forensic torture.
| zipy124 wrote:
| The problem is most academics just do not have the time
| to do this for free, or in fact even if paid. In addition
| you may not even have access to the references. In
| acoustics it's not uncommon to cite works that don't even
| exist online and it's unlikely the reviewer will have the
| work in their library.
| jayess wrote:
| Wow. I went to law school and was on the law review. That
| was our precise job for the papers selected for
| publication. To verify every single citation.
| _blk wrote:
| Thanks for sharing that. Interesting how there was a
| solution to a problem that didn't really exist yet... I
| mean, I'm sure it was there for a reason, but I assume it
| was more for things like wrongful attribution, missing
| commas etc. rather than outright invented quotes to fit a
| narrative. Or do you have more background on that?
|
| At least the mandatory automated checking processes are
| probably not far off for the more reputable journals, but
| it still makes you wonder how much you can trust the last
| two years of LLM-enhanced science that is now being quoted
| in current publications, and whether those hallucinations
| can be "reverted" after having been re-quoted. A bit like
| how Wikipedia can be abused to establish facts.
| not2b wrote:
| Agreed. I used to review lots of submissions for IEEE and
| similar conferences, and didn't consider it my job to
| verify every reference. No one did, unless the use of the
| reference triggered an "I can't believe it said that"
| reaction. Of course, back then, there wasn't a giant
| plagiarism machine known to fabricate references, so if
| tools can find fake references easily the tools should be
| used.
| armcat wrote:
| I agree with you (I have reviewed papers in the past),
| however, made-up citations are a "signal". Why would the
| authors do that? If they made it up, most likely they
| haven't really read that prior work. If they haven't, have
| they really done proper due diligence on their research?
| Are they just trying to "beef up" their paper with
| citations to unfairly build up credibility?
| bdangubic wrote:
| same (and much, much, much worse) for science
| barfoure wrote:
| I'd love to hear some examples of poor electrical work that
| you've come across that's often missed or not seen.
| joshribakoff wrote:
| I am not an electrician, but when I did projects, I did a
| lot of research before deciding to hire someone and then I
| was extremely confused when everyone was proposing doing it
| slightly differently.
|
| A lot of them proposed ways that seem to violate the code,
| like running flex tubing beyond the allowed length or
| amount of turns.
|
| Another example would be people not accounting for needing
| fireproof covers if they're installing recessed lighting
| between dwellings in certain cities...
|
| Heck, most people don't actually even get the permit. They
| just do the unpermitted work.
| AstroNutt wrote:
| A couple had just moved into a house and called me to replace
| the ceiling fan in the living room. I pulled the flush
| mount cover down to start unhooking the wire nuts and
| noticed RG58 (coax cable). Someone had used the center
| conductor as the hot wire! I ended up running 12/2 Romex
| from the switch. There was no way in hell I could have
| hooked it back up the way it was. This is just one example
| I've come across.
| lencastre wrote:
| an old boss of mine used to say there are no stupid
| electricians found alive, as they self select darwin award
| style
| xnx wrote:
| No doubt the best electricians are currently better than the
| best AI, but the best AI is likely now better than the novice
| homeowner. The trajectory over the past 2 years has been very
| good. Another five years and AI may be better than all but
| the very best, or most specialized, electricians.
| legostormtroopr wrote:
| Current state AI doesn't have hands. How can it possibly be
| better at installing electrics than anyone?
|
| Your post reads like AI precisely because while the grammar
| is fine, it lacks context - like someone prompted "reply
| that AI is better than average".
| xnx wrote:
| An electrician with total knowledge/understanding, but
| only the average dexterity of a non-professional would
| still be very useful.
| left-struck wrote:
| It's like the problem was there all along, all LLMs did was
| expose it more
| criley2 wrote:
| https://en.wikipedia.org/wiki/Replication_crisis
|
| Modern science is designed from the top to the bottom to
| produce bad results. The incentives are all mucked up. It's
| absolutely not surprising that AI is quickly becoming yet-
| another factor lowering quality.
| theoldgreybeard wrote:
| Yes, LLMs didn't create the problem, they just accelerated it
| to a speed that beggars belief.
| thaumasiotes wrote:
| > If a scientist uses an LLM to write a paper with fabricated
| citations - that's a crappy scientist.
|
| Really? Regardless of whether it's a good paper?
| zwnow wrote:
| How is it a good paper if the info in it can't be trusted lmao
| thaumasiotes wrote:
| Whether the information in the paper can be trusted is an
| entirely separate concern.
|
| Old Chinese mathematics texts are difficult to date because
| they often purport to be older than they are. But the
| contents are unaffected by this. There is a history-of-math
| problem, but there's no math problem.
| zwnow wrote:
| Not really true nowadays. Stuff in whitepapers needs to
| be verifiable, which is kinda difficult with
| hallucinations.
|
| Whether the students directly used LLMs or just read
| content online that was produced with them and cited it
| afterwards, it just shows how difficult these tools have
| made gathering information that's verifiable.
| thaumasiotes wrote:
| > Stuff in whitepapers needs to be verifiable which is
| kinda difficult with hallucinations.
|
| That's... gibberish.
|
| Anything you can do to verify a paper, you can do to
| verify the same paper with all citations scrubbed.
|
| Whether the citations support the paper, or whether they
| exist at all, just doesn't have anything to do with what
| the paper says.
| zwnow wrote:
| I don't think you know how whitepapers work then
| hnfong wrote:
| You are totally correct that hallucinated citations do
| not invalidate the paper. The paper sans citations might
| be great too (I mean the LLM could generate great stuff,
| it's possible).
|
| But the author(s) of the paper are almost by definition
| bad scientists (or whatever field they are in). When a
| researcher writes a paper for publication, even if they're
| not expected to write the thing themselves, at least they
| should be responsible for checking the accuracy of the
| contents, and citations are part of the paper...
| Aurornis wrote:
| Citations are a key part of the paper. If the paper isn't
| supported by the citations, it's not a good paper.
| withinboredom wrote:
| Have you ever followed citations before? In my experience,
| they often don't support what is being cited, say the
| opposite, or aren't even related. It's probably only
| 60%-ish that actually cite something relevant.
| WWWWH wrote:
| Well yes, but just because that's bad doesn't mean this
| isn't far worse.
| hansmayer wrote:
| Scientists who use LLMs to write a paper are crappy scientists
| indeed. They need to be held accountable, even ostracised by
| the scientific community. But something is missing from the
| picture. Why is it that they came up with this idea in the
| first place? Who could have been peddling the impression (not
| an outright lie - they are very careful) about LLMs being these
| almost sentient systems with emergent intelligence, alleviating
| all of your problems, blah blah blah? Where is the god damn
| cure for cancer the LLMs were supposed to invent? Who else
| do we need to hold accountable, scrutinise and ostracise
| for the ever-increasing mountains of AI-crap that is
| flooding not just Internet content but now also penetrating
| into science, everyday work, daily lives, conversations,
| etc.? If someone released a tool that enabled and
| encouraged people to commit suicide in multiple instances
| that we know of by now, and if we have known since the
| infamous "plandemic" Facebook trend that the tech bros are
| more than happy to tolerate worsening societal conditions
| in the name of their platform growth, then who else do we
| need to hold accountable, scrutinise and ostracise as a
| society, I wonder?
| the8472 wrote:
| > Where is the god damn cure for cancer the LLMs were
| supposed to invent?
|
| Assuming that cure is meant as hyperbole, how about
| https://www.biorxiv.org/content/10.1101/2025.04.14.648850v3 ?
| AI models being used for bad purposes doesn't preclude them
| being used for good purposes.
| Forgeties79 wrote:
| If my calculator gives me the wrong number 20% of the time,
| yeah, I should've identified the problem, but ideally, it
| wouldn't have been sold to me as a functioning calculator in
| the first place.
| imiric wrote:
| Indeed. The narrative that this type of issue is entirely the
| responsibility of the user to fix is insulting, and blame
| deflection 101.
|
| It's not like these are new issues. They're the same ones
| we've experienced since the introduction of these tools. And
| yet the focus has always been to throw more data and compute
| at the problem, and optimize for fancy benchmarks, instead of
| addressing these fundamental problems. Worse still, whenever
| they're brought up users are blamed for "holding it wrong",
| or for misunderstanding how the tools work. I don't care. An
| "artificial intelligence" shouldn't be plagued by these
| issues.
| SauntSolaire wrote:
| > It's not like these are new issues.
|
| Exactly, that's why not verifying the output is even less
| defensible now than it ever has been - especially for
| professional scientists who are responsible for the quality
| of their own work.
| Forgeties79 wrote:
| > Worse still, whenever they're brought up users are blamed
| for "holding it wrong", or for misunderstanding how the
| tools work. I don't care. An "artificial intelligence"
| shouldn't be plagued by these issues.
|
| My feelings exactly, but you're articulating it better than
| I typically do ha
| theoldgreybeard wrote:
| If it was a well understood property of calculators that they
| gave incorrect answers randomly then you need to adjust the
| way you use the tool accordingly.
| bigstrat2003 wrote:
| Uh yeah... I would _not use that tool_. A tool which
| randomly doesn't do its job is useless.
| amrocha wrote:
| Sorry, Utkar the manager will fire you if you don't use
| his shitty calculator. If you take the time to check the
| output every time you'll be fired for being too slow.
| Better pray the calculator doesn't lie to you.
| belter wrote:
| "...each of which were missed by 3-5 peer reviewers..."
|
| Its sloppy work all the way down...
| only-one1701 wrote:
| Absolutely brutal case of engineering brain here. Real "guns
| don't kill people, people kill people" stuff.
| theoldgreybeard wrote:
| If you were to wager a guess, what do you think my views on
| gun rights are?
| only-one1701 wrote:
| Probably something equally as nuanced and correct as the
| statement I replied to!
| theoldgreybeard wrote:
| You're projecting.
| somehnguy wrote:
| Your second statement is correct. What about it makes it
| "engineering brain"?
| rcpt wrote:
| If the blame were solely on the user then we'd see similar
| rates of deaths from gun violence in the US vs. other
| countries. But we don't, because users are influenced by
| the UX
| venturecruelty wrote:
| Somehow people don't kill people nearly as easily, or with
| as high of a frequency or social support, in places that
| don't have guns that are more accessible than healthcare.
| So weird.
| raincole wrote:
| Given we tacitly accepted replication crisis we'll definitely
| tacitly accept this.
| rectang wrote:
| "X isn't the problem, people are the problem." -- the age-old
| cry of industry resisting regulation.
| codywashere wrote:
| what regulation are you advocating for here?
| kibwen wrote:
| At the very least, authors who have been caught publishing
| proven fabrications should be barred by those journals from
| ever publishing in them again. Mind you, this is regardless
| of whether or not an LLM was involved.
| JumpCrisscross wrote:
| > _authors who have been caught publishing proven
| fabrications should be barred by those journals from ever
| publishing in them again_
|
| This is too harsh.
|
| Instead, their papers should be required to disclose the
| transgression for a period of time, and their institution
| should have to disclose it publicly as well as to the
| government, students and donors whenever they ask them
| for money.
| rectang wrote:
| I'm not advocating, I'm making a high-level observation:
| Industry forever pushes for nil regulation and blames bad
| actors for damaging use.
|
| But we always have _some_ regulation in the end. Even if
| certain firearms are legal to own, howitzers are not --
| although it still takes a "bad actor" to rain down death on
| City Hall.
|
| The same dynamic is at play with LLMs: "Don't regulate us,
| punish bad actors! If you still have a problem, punish them
| harder!" Well yes, we will punish bad actors, but we will
| also go through a negotiation of how heavily to constrain
| the use of your technology.
| codywashere wrote:
| so, what regulation do we need on LLMs?
|
| the person you originally responded to isn't against
| regulation per their comment. I'm not against regulation.
| what's the pitch for regulation of LLMs?
| theoldgreybeard wrote:
| I am not against regulation.
|
| Quite the opposite actually.
| kklisura wrote:
| It's not about resisting. It's about undermining any action
| whatsoever.
| jodleif wrote:
| I find this to be a bit "easy". There is such a thing as bad
| tools. If it is difficult to determine if the tool is good or
| bad, I'd say some of the blame has to be put on the tool.
| photochemsyn wrote:
| Yeah, I can't imagine not being familiar with every single
| reference in the bibliography of a technical publication with
| one's name on it. It's almost as bad as those PIs who rely on
| lab techs and postdocs to generate research data using
| equipment that they don't understand the workings of - but
| then, I've seen that kind of thing repeatedly in research
| academia, along with actual fabrication of data in the name of
| getting another paper out the door, another PhD granted, etc.
|
| Unfortunately, a large fraction of academic fraud has
| historically been detected by sloppy data duplication, and with
| LLMs and similar image generation tools, data fabrication has
| never been easier to do or harder to detect.
| nialv7 wrote:
| Ah, the "guns don't kill people, people kill people" argument.
|
| I mean sure, but having a tool that made fabrication so much
| easier has made the problem a lot worse, don't you think?
| theoldgreybeard wrote:
| Yes I do agree with you that having a tool that gives rocket
| fuel to a fraud engine should probably be regulated in some
| fashion.
|
| Tiered licensing, mandatory safety training, and weapon
| classification by law enforcement works really well for
| Canada's gun regime, for example.
| bigstrat2003 wrote:
| > If a carpenter builds a crappy shelf "because" his power
| tools are not calibrated correctly - that's a crappy carpenter,
| not a crappy tool.
|
| It's both. The tool is crappy, _and_ the carpenter is crappy
| for blindly trusting it.
|
| > AI is not the problem, laziness and negligence is.
|
| Similarly, both are a problem here. LLMs are a bad tool, and we
| should hold people responsible when they blindly trust this bad
| tool and get bad results.
| Hammershaft wrote:
| AI dramatically changes the perceived cost/benefit of laziness
| and negligence, which is leading to much more of it.
| kklisura wrote:
| > AI is not the problem, laziness and negligence is
|
| This reminds me about discourse about a gun problem in US,
| "guns don't kill people, people kill people", etc - it is a
| discourse used solely for the purpose of not doing anything and
| not addressing anything about the underlying problem.
|
| So no, you're wrong - AI IS THE PROBLEM.
| Yoofie wrote:
| No, the OP is right in this case. Did you read TFA? It was
| "peer reviewed".
|
| > Worryingly, each of these submissions has already been
| reviewed by 3-5 peer experts, most of whom missed the fake
| citation(s). This failure suggests that some of these papers
| might have been accepted by ICLR without any intervention.
| Some had average ratings of 8/10, meaning they would almost
| certainly have been published.
|
| If the peer reviewers can't be bothered to do the basics,
| then there is literally no point to peer review, which is
| fully independent of the author who uses or doesn't use AI
| tools.
| smileybarry wrote:
| Peer reviewers can also use AI tools, which will
| hallucinate a "this seems fine" response.
| amrocha wrote:
| If AI fraud is good at avoiding detection via peer review
| that doesn't mean peer review is useless.
|
| If your unit tests don't catch all errors it doesn't mean
| unit tests are useless.
| sneak wrote:
| > _it is a discourse used solely for the purpose of not doing
| anything and not addressing anything about the underlying
| problem_
|
| Solely? Oh brother.
|
| In reality it's the complete opposite. It exists to highlight
| the actual source of the problem, as both
| industries/practitioners using AI professionally and safely,
| and communities with very high rates of gun ownership and
| exceptionally low rates of gun violence exist.
|
| It isn't the tools. It's the social circumstances of the
| people with access to the tools. That's the point. The tools
| are inanimate. You can use them well or use them badly. The
| existence of the tools does not make humans act badly.
| b00ty4breakfast wrote:
| maybe the hammer factory should be held responsible for pumping
| out so many poorly calibrated hammers
| venturecruelty wrote:
| No, because this would cost tens of jobs and affect someone's
| profits, which are sacrosanct. Obviously the market wants
| exploding hammers, or else people wouldn't buy them. I am
| very smart.
| constantcrying wrote:
| Absolutely correct. The real issue is that these people can
| avoid punishment. If you do not care enough about your paper to
| even verify the existence of citations, then you obviously
| should not have a job as a scientist.
|
| Taking an academic who does something like that seriously seems
| impossible. At best he is someone who is neglecting his most
| basic duties as an academic, at worst he is just a fraudster.
| In both cases he should be shunned and excluded.
| SubiculumCode wrote:
| Yeah seriously. Using an LLM to help find papers is fine. Then
| you read them. Then you use a tool like Zotero or manually add
| citations. I use Gemini Pro to identify useful papers that I
| might not have encountered before. But even when asking it to
| restrict itself to PubMed resources, its citations are wonky,
| citing three different versions of the same paper (citations
| that don't say what they were claimed to discuss).
|
| That said, these tools have substantially reduced
| hallucinations over the last year, and will just get better. It
| also helps if you can restrict it to referencing already
| screened papers.
|
| Finally, I'd like to say that if we want scientists to engage
| in good science, stop forcing them to spend a third of their
| time in a rat race for funding... it is ridiculously time
| consuming and wasteful of expertise.
| bossyTeacher wrote:
| The problem isn't whether they have more or less
| hallucinations. The problem is that they have them. And as
| long as they hallucinate, you have to deal with that. It
| doesn't really matter how you prompt, you can't prevent
| hallucinations from happening and without manual checking,
| eventually hallucinations will slip under the radar because
| the only difference between a real pattern and a hallucinated
| one is that one exists in the world and the other one
| doesn't. This is not something you can really counter with
| more LLMs either, as it is a problem intrinsic to LLMs.
| mk89 wrote:
| > we are tacitly endorsing it.
|
| We are, in fact, not tacitly but openly endorsing this, due to
| this AI everywhere madness. I am so looking forward to when
| some genius in some bank starts to use it to simplify code and
| suddenly I have 100000000 EUR on my bank account. :)
| jgalt212 wrote:
| fair enough, but carpenters are not being beat over the head to
| use new-fangled probabilistic speed squares.
| grey-area wrote:
| Generative AI and the companies selling it with false promises
| and using it for real work absolutely are the problem.
| acituan wrote:
| > AI is not the problem, laziness and negligence is.
|
| As much as I agree with you that this is wrong, there is a
| danger in putting the onus just on the human. Whether due to
| competition or top down expectations, humans are and will be
| pressured to use AI tools alongside their work _and_ produce
| more. Whereas the original idea was for AI to assist the human,
| as the expected velocity and consumption pressure increases
| humans are more and more turning into a mere accountability
| laundering scheme for machine output. When we blame just the
| human, we are doing exactly what this scheme wants us to do.
|
| Therefore we must also criticize all the systemic factors that
| puts pressure on reversal of AI's assistance into AI's
| domination of human activity.
|
| So AI (not as a technology but as a product when shoved down
| the throats) _is_ the problem.
| rdiddly wrote:
| ¿Por qué no los dos? (Why not both?)
| jval43 wrote:
| If a scientist just completely "made up" their references 10
| years ago, that's a fraudster. Not just dishonesty but outright
| academic fraud.
|
| If a scientist does it now, they just blame it on AI. But the
| consequences should remain the same. This is not an honest
| mistake.
|
| People that do this - even once - should be banned for life.
| They put their name on the thing. But just like with
| plagiarism, falsifying data and academic cheating, somehow a
| large subset of people thinks it's okay to cheat and lie, and
| another subset gives them chance after chance to misbehave like
| they're some kind of children. But these are adults and anyone
| doing this simply lacks morals and will never improve.
|
| And yes, I've published in academia and I've never cheated or
| plagiarized in my life. That should not be a drawback.
| calmworm wrote:
| I don't understand. You're saying even with crappy tools one
| should be able to do the job the same as with well made tools?
| tedd4u wrote:
| Three and a half years ago nobody had ever used tools like
| this. It can't be a legitimate complaint for an author to
| say, "not my fault my citations are fake it's the fault of
| these tools" because until recently no such tools were
| available and the expectation was that all citations are
| real.
| DonHopkins wrote:
| Shouldn't there be a black list of people who get caught
| writing fraudulent papers?
| theoldgreybeard wrote:
| Probably. Something like that is what I meant by "social
| consequences". Perhaps there should be civil or criminal ones
| for more egregious cases.
| nwallin wrote:
| "Anyone, from the most clueless amateur to the best
| cryptographer, can create an algorithm that he himself can't
| break."--Bruce Schneier
|
| There's a corollary here with LLMs, but I'm not pithy enough to
| phrase it well. Anyone can create something using LLMs that
| they, themselves, aren't skilled enough to spot the LLMs'
| hallucinations. Or something.
|
| LLMs are incredibly good at exploiting people's confirmation
| biases. If it "thinks" it knows what you believe/want, it will
| tell you what you believe/want. There _does not exist_ a way to
| interface with LLMs that will not ultimately end in the LLM
| telling you exactly what you want to hear. Using an LLM in your
| process necessarily results in being told that you 're right,
| even when you're wrong. Using an LLM necessarily results in it
| reinforcing all of your prior beliefs, regardless of whether
| those prior beliefs are correct. To an LLM, all hypotheses are
| true, it's just a matter of hallucinating enough evidence to
| satisfy the users' skepticism.
|
| I do not believe there exists a way to safely use LLMs in
| scientific processes. Period. If my belief is true, and ChatGPT
| has told me it's true, then yes, AI, the tool, is the problem,
| not the human using the tool.
| foxfired wrote:
| I disagree. When the tool promises to do something, you end up
| trusting it to do the thing.
|
| When Tesla says their car is self driving, people trust them to
| self drive. Yes, you can blame the user for believing, but
| that's exactly what they were promised.
|
| > Why didn't the lawyer who used ChatGPT to draft legal briefs
| verify the case citations before presenting them to a judge?
| Why are developers raising issues on projects like cURL using
| LLMs, but not verifying the generated code before pushing a
| Pull Request? Why are students using AI to write their essays,
| yet submitting the result without a single read-through? They
| are all using LLMs as their time-saving strategy. [0]
|
| It's not laziness, it's the feature we were promised. We can't
| keep saying everyone is holding it wrong.
|
| [0]: https://idiallo.com/blog/none-of-us-read-the-specs
| rolandog wrote:
| Very well put. You're promised Artificial Super Intelligence
| and shown a super cherry-picked promo and instead get an
| agent that can't hold its drool and needs constant hand-
| holding... it can't be both things at the same time, so...
| which is it?
| stocksinsmocks wrote:
| Trades also have self regulation. You can't sell plumbing
| services or build houses without any experience or you get in
| legal trouble. If your workmanship is poor, you can be
| disciplined by the board even if the tool was at fault. I think
| fraudulent publications should be taken at least as seriously
| as badly installed toilets.
| venturecruelty wrote:
| "It's not a fentanyl problem, it's a people problem."
|
| "It's not a car infrastructure problem, it's a people problem."
|
| "It's not a food safety problem, it's a people problem."
|
| "It's not a lead paint problem, it's a people problem."
|
| "It's not an asbestos problem, it's a people problem."
|
| "It's not a smoking problem, it's a people problem."
| RossBencina wrote:
| No qualified carpenter expects to use a hammer to drill a hole.
| Isamu wrote:
| Someone commented here that hallucination is what LLMs do, it's
| the designed mode of selecting statistically relevant model data
| that was built on the training set and then mashing it up for an
| output. The outcome is something that statistically resembles a
| real citation.
|
| Creating a real citation is totally doable by a machine though,
| it is just selecting relevant text, looking up the title,
| authors, pages etc and putting that in canonical form. It's just
| that LLMs are not currently doing the work we ask for, but
| instead something similar in form that may be good enough.
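|
| That deterministic path already exists: given a DOI,
| content negotiation against doi.org hands back a canonical
| reference with no language model involved. A minimal sketch
| (assuming the requests library; the DOI below is just an
| illustrative example):
|
|     import requests
|
|     def bibtex_for_doi(doi):
|         # doi.org content negotiation returns formatted
|         # metadata straight from the registration agency.
|         resp = requests.get(
|             "https://doi.org/" + doi,
|             headers={"Accept": "application/x-bibtex"},
|             timeout=30,
|         )
|         resp.raise_for_status()
|         return resp.text
|
|     # e.g. bibtex_for_doi("10.48550/arXiv.1706.03762")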
| gedy wrote:
| The issue is there are incentives for more quantity and not
| quality in modern science (well more like academia), so people
| will use tools to pump stuff out. It'll get worse as academic
| jobs tighten.
| dclowd9901 wrote:
| To me, this is exactly what LLMs are good for. It would be
| exhausting double checking for valid citations in a research
| paper. Fuzzy comparison and rote lookup seem primed for usage
| with LLMs.
|
| Writing academic papers is exactly the _wrong_ usage for LLMs. So
| here we have a clear cut case for their usage and a clear cut
| case for their avoidance.
| idiotsecant wrote:
| Exactly, and there's nothing wrong with using LLMs in this same
| way as part of the writing process to locate sources (that you
| verify), do editing (that you check), etc. It's just peak
| stupidity and laziness to ask it to do the whole thing.
| skobes wrote:
| If LLMs produce fake citations, why would we trust LLMs to
| check them?
| watwut wrote:
| Because the risk is lower. They will give you suspicious
| citations and you can manually check those for false
| positives. If some false citations pass, it is still a net
| gain.
| venturecruelty wrote:
| Because my boss said if I don't, I'm fired.
| dawnerd wrote:
| Shouldn't need an LLM to check. It's just a list of authors. I
| wouldn't trust an LLM on this, and even if they were perfect
| that's a lot of resource use just to do something traditional
| code could do.
| teekert wrote:
| Thanx AI, for exposing this problem that we knew was there, but
| could never quite prove.
| hyperpape wrote:
| It's awful that there are these hallucinated citations, and the
| researchers who submitted them ought to be ashamed. I also put
| some of the blame on the boneheaded culture of academic
| citations.
|
| "Compression has been widely used in columnar databases and has
| had an increasing importance over time.[1][2][3][4][5][6]"
|
| Ok, literally everyone in the field already knows this. Are
| citations 1-6 useful? Well, hopefully one of them is an actually
| useful survey paper, but odds are that 4-5 of them are
| arbitrarily chosen papers by you or your friends. Good for a
| little bit of h-index bumping!
|
| So many citations are not an integral part of the paper, but
| instead randomly sprinkled on to give an air of authority and
| completeness that isn't deserved.
|
| I actually have a lot of respect for the academic world, probably
| more than most HN posters, but this particular practice has
| always struck me as silly. Outside of survey papers (which are
| extremely under-provided), most papers need many fewer citations
| than they have, for the specific claims where the paper is
| relying on prior work or showing an advance over it.
| mccoyb wrote:
| That's only part of the reason that this type of content is
| used in academic papers. The other part is that you never know
| what PhD student / postdoc / researcher will be reviewing your
| paper, which means you are incentivized to be liberal with
| citations (however tangential) just in case someone is reading
| your paper and has the reaction "why didn't they cite this
| work, in which I had some role?"
|
| Papers with a fake air of authority are easily dispatched
| with. What is not so easily dispatched with is the politics
| of the submission process.
|
| This type of content is fundamentally about emotions (in the
| reviewer of your paper), and emotions are undeniably a large
| factor in acceptance / rejection.
| zipy124 wrote:
| Indeed. One can even game review systems by leaving errors in
| for the reviewers to find so that they feel good about
| themselves and that they've done their job. The meta-science
| game is toxic and full of politics and ego-pleasing.
| neilv wrote:
| https://blog.iclr.cc/2025/11/19/iclr-2026-response-to-llm-ge...
|
| > _Papers that make extensive usage of LLMs and do not disclose
| this usage will be desk rejected._
|
| This sounds like they're endorsing the game of _how much can we
| get away with, towards the goal of slipping it past the
| reviewers_ , and the only penalty is that the bad paper isn't
| accepted.
|
| How about "Papers suspected of fabrications, plagiarism, ghost
| writers, or other academic dishonesty, will be reported to
| academic and professional organizations, as well as the
| affiliated institutions and sponsors named on the paper"?
| proto-n wrote:
| 1. "Suspected" is just that, suspected, you can't penalize
| papers based on your gut feel 2. LLM-s are a tool, and there's
| nothing wrong with using them unless you misuse them
| neilv wrote:
| "Suspected" doesn't necessarily mean only gut feel.
| thruifgguh585 wrote:
| > crushed by an avalanche of submissions fueled by generative AI,
| paper mills, and publication pressure.
|
| Run-of-the-mill ML jobs these days ask for "papers in NeurIPS,
| ICLR or other Tier-1 conferences".
|
| We're well past Goodhart's law when it comes to publications.
|
| It was already insane in CS - now it's reached asylum levels.
| disqard wrote:
| You said the quiet part out loud.
|
| Academia has been ripe for disruption for a while now.
|
| The "Rooter" paper came out 20 years ago:
|
| https://www.csail.mit.edu/news/how-fake-paper-generator-tric...
| MarkusQ wrote:
| This is as much a failing of "peer review" as anything.
| Importantly, it is an intrinsic failure, which won't go away even
| if LLMs were to go away completely.
|
| Peer review doesn't catch errors.
|
| Acting as if it does, and thus assuming the fact of publication
| (and where it was published) are indicators of veracity is simply
| unfounded. We need to go back to the food fight system where
| everyone publishes whatever they want, their colleagues and other
| adversaries try their best to shred them, and the winners are the
| ones that stand up to the maelstrom. It's messy, but it forces
| critics to put forth their arguments rather than quietly
| gatekeeping, passing what they approve of, suppressing what they
| don't.
| ulrashida wrote:
| Peer review definitely does catch errors when performed by
| qualified individuals. I've personally flagged papers for major
| revisions or rejection as a result of errors in approach or
| misrepresentation of source material. I have peers who say they
| have done similar.
|
| I'm not sure why you think this isn't the case?
| tpoacher wrote:
| Peer review is as useless as code review and unit tests, yes.
|
| It's much more useful if everyone including the janitor and
| their mom can have a say on your code before you're allowed to
| move to your next commit.
|
| (/s, in case it's not obvious :D )
| watwut wrote:
| Peer review was never supposed to check every single detail and
| every single citation. They are not proof readers. They are not
| even really supposed to agree or disagree with your results.
| They should check the soundness of a method, general structure
| of a paper, that sort of thing. They do catch some errors, but
| the expectation is not to do another independent study or
| something.
|
| Passed peer review is the first basic bar that has to be
| cleared. It was never supposed to be all there is to the
| science.
| dawnerd wrote:
| It would be crazy to expect them to verify every author is
| correct on a citation and to cross-verify everything. There's
| tooling that could be built for that, and it's kinda wild that
| it isn't something that's run on paper submission.
| qbit42 wrote:
| I don't think many researchers take peer review alone as a
| strong signal, unless it is a venue known for having serious
| reviewing (e.g. in CS theory, STOC and FOCS have a very high
| bar). But it acts as a basic filter that gets rid of obvious
| nonsense, which on its own is valuable. No doubt there are huge
| issues, but I know my papers would be worse off without
| reviewer feedback
| exasperaited wrote:
| No, it's not "as much".
|
| The dominant "failing" here is that _this is fraudulent_ on a
| professional, intellectual, and moral level.
| michaelcampbell wrote:
| After an interview with Cory Doctorow I saw recently, I'm going
| to stop anthropomorphizing these things by calling them
| "hallucinations". They're computers, so these incidents are just
| simply Errors.
| grayhatter wrote:
| I'll continue calling them hallucinations. That's a much more
| fitting term when you account for the reasonableness of people
| who believe them. There's also a huge breadth of different
| types of errors that don't pattern match to "made up bullshit"
| the way "hallucination" does. There's no need to introduce
| that ambiguity when discussing something narrow.
|
| There's nothing wrong with anthropomorphizing genAI; its
| source material is human sourced, and humans are going to use
| human-like pattern matching when interacting with it. I.e.,
| this isn't the river I want to swim upstream in. I assume you
| wouldn't complain if someone anthropomorphized a rock... up
| until they started to believe it was actually alive.
| vegabook wrote:
| Given that an (incompetent or even malicious) human put their
| name(s) to this stuff, "bullshit" is an even better and
| fitting anthropomorphization
| grayhatter wrote:
| > incompetent or even malicious
|
| sufficiently advanced incompetence is indistinguishable
| from actual malice... and thus should be treated the same
| skobes wrote:
| Developers have been anthropomorphizing computers for as long
| as they've been around though.
|
| "The compiler thinks my variable isn't declared" "That function
| wants a null-terminated string" "Teach this code to use a
| cache"
|
| Even the word computer once referred to a human.
| crazygringo wrote:
| They're a very specific kind of error, just like off-by-one
| errors, or I/O errors, or network errors. The name for this
| kind of error is a hallucination.
|
| We need a word for this specific kind of error, and we have
| one, so we use it. Being _less_ specific about a type of error
| isn't helping anyone. Whether it "anthropomorphizes", I
| couldn't care less. Heck, _bugs_ come from actual insects.
| It's a word we've collectively started to use and it works.
| ml-anon wrote:
| No it's not. It's made up bullshit that arises for reasons
| that literally no one can formalize or reliably prevent. This
| is the exact opposite of specific.
| Ekaros wrote:
| We still use the term bug, and no modern bug is caused by an
| arthropod. In that sense I think hallucination is a fair term,
| as coming up with anything sufficiently better is hard.
| teddyh wrote:
| An actually better (and also more accurate) term would be
| "confabulations". Unfortunately, it has not caught on.
| JTbane wrote:
| Nah it's very apt and perfectly encapsulates output that looks
| plausible but is in fact factually incorrect or made up.
| leoc wrote:
| Ah, yes: meta-level model collapse. Very good, carry on.
| Ekaros wrote:
| One wonders why this has not been largely automated already.
| We track those citations anyway, so surely we have databases
| of them, and most citations can easily be matched there. Then
| only the outliers need to be checked: either very recent
| papers, mistakes that should be close to something real, or
| genuine fakes.
|
| Maybe there just is no incentive for this type of activity.
| QuadmasterXLII wrote:
| It seems like the GPTZero team is automating it! Until very
| recently, no one sane would cite a paper with a correct title
| but made-up random authors -- and shortly, this specific
| signal will be Goodharted away by a "make my malpractice less
| detectable" MCP, so I can see why this automation is
| happening exactly now.
| analog31 wrote:
| For that matter, it could be automated at the source. Let's say
| I'm an author. I'd gladly run a "linter" on my article that
| flags references that can't be tracked, and so forth. It would
| be no different than testing a computer program that I write
| before giving it to someone.
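|
| A toy version of such a linter (assuming the manuscript's
| DOIs can be pulled out with a rough regex and checked
| against the public Crossref API; DataCite-only DOIs would
| need a second lookup) might be:
|
|     import re
|     import requests
|
|     DOI = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
|
|     def lint_dois(manuscript_text):
|         # Flag DOIs that Crossref has never heard of.
|         bad = []
|         for doi in sorted(set(DOI.findall(manuscript_text))):
|             r = requests.get(
|                 "https://api.crossref.org/works/" + doi,
|                 timeout=30,
|             )
|             if r.status_code == 404:
|                 bad.append(doi)
|         return bad
|
|     # bad = lint_dois(open("paper.tex").read())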
| IanCal wrote:
| We do have these things and they are often wrong. Loads of the
| examples given look better than things I've seen in real
| databases on this kind of thing and I worked in this area for a
| decade.
| ulrashida wrote:
| Unfortunately while catching false citations is useful, in my
| experience that's not usually the problem affecting paper
| quality. Far more prevalent are authors who mis-cite materials,
| either drawing support from citations that don't actually say
| those things or stripping the nuance away by using cherry-picked
| quotes simply because that is what Google Scholar suggested as a
| top result.
|
| The time it takes to find these errors is orders of magnitude
| higher than checking if a citation exists as you need to both
| read and understand the source material.
|
| These bad actors should be subject to a three strikes rule: the
| steady corrosion of knowledge is not an accident by these
| individuals.
| 19f191ty wrote:
| Exactly. Abuse of citations is a much more prevalent and
| sinister issue, and has been for a long time. Fake citations
| are of course bad, but they are only the tip of the iceberg.
| seventytwo wrote:
| Then punish all of it.
| hippo22 wrote:
| It seems like this is the type of thing that LLMs would
| actually excel at though: find a list of citations and claims
| in this paper, do the cited works support the claims?
| bryanrasmussen wrote:
| sure, except when they hallucinate that the cited works
| support the claims when they do not. At which point you're
| back at needing to read the cited works to see if they
| support the claims.
| potato3732842 wrote:
| >These bad actors should be subject to a three strikes rule:
| the steady corrosion of knowledge is not an accident by these
| individuals.
|
| These people are working in labs funded by Exxon or Meta or
| Pfizer or whoever and they know what results will make
| continued funding worthwhile in the eyes of their donors. If
| the lab doesn't produce the donor will fund another one that
| will.
| peppersghost93 wrote:
| I sincerely hope every person who has invested money in these
| bullshit machines loses every cent they've got to their name.
| LLMs poison every industry they touch.
| obscurette wrote:
| That's what I'm really afraid of: we will be drowning in AI slop
| as a society and we'll lose the most important thing that made
| free and democratic society possible - trust. People just don't
| trust anyone and/or anything any more. And the lack of trust,
| especially at scale, is very expensive.
| John7878781 wrote:
| Yep. And trust in science is already at all-time lows, as if it
| couldn't get any worse.
| benbojangles wrote:
| How to get to the top if you are not smart enough?
| upofadown wrote:
| If you are searching for references with plausible-sounding
| titles, then you are doing that because you don't want to have
| to actually read those references. After all, if you read them
| and discover that one or more don't support your contention (or,
| even worse, refute it), then you would feel worse about what you
| are doing. So I suspect there would be a tendency to completely
| ignore such references and never consider whether they actually
| exist.
|
| LLMs should be awesome at finding plausible-sounding titles. The
| crappy researcher just has to remember to check for existence.
| Perhaps there is a business model here, bogus references as a
| service, where this check is done automatically.
| ineedasername wrote:
| How can someone not be aware, at this point, that it's fine to
| use these systems for finding and summarizing research, but that
| for each source you take two minutes to find it and verify it?
|
| Really, this isn't that hard and it's not at all an obscure
| requirement or unknown factor.
|
| I think this is _much much_ less "LLMs dumbing things down" and
| significantly more a shibboleth for identifying people who were
| already nearly or actually doing fraudulent research anyway -
| the ones whose prior publications we should now go back and
| treat as very likely fraudulent as well.
| jordanpg wrote:
| Does anyone know, from a technical standpoint, why citations are
| such a problem for LLMs?
|
| I realize things are probably (much) more complicated than I
| realize, but programmatically, unlike arbitrary text, citations
| are generally strings with a well-defined format. There are
| literally "specs" for citation formats in various academic,
| legal, and scientific fields.
|
| So, naively, one way to mitigate these hallucinations would be
| to identify citations with a bunch of regexes, and if one is
| spotted, use the Google Scholar API (or whatever) to make sure
| it's real. If not, delete it or flag it, etc.
|
| Why isn't something like this obvious solution being done? My
| guess is that it would slow things down too much. But it could be
| optional and it could also be done after the output is generated
| by another process.
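|
| A rough sketch of what I mean (untested; Google Scholar has no
| official public API, so this leans on Crossref's free REST API
| instead, and the reference regex is a made-up stand-in for real
| citation-format parsing):
|
|   import json
|   import re
|   import urllib.parse
|   import urllib.request
|
|   # Toy pattern: grabs the title out of "Author (2024). Title."
|   REF_RE = re.compile(r"\(\d{4}\)\.\s*(?P<title>[^.]+)\.")
|
|   def crossref_top_hit(title):
|       query = urllib.parse.urlencode(
|           {"query.bibliographic": title, "rows": "1"})
|       url = f"https://api.crossref.org/works?{query}"
|       with urllib.request.urlopen(url, timeout=10) as resp:
|           items = json.load(resp)["message"]["items"]
|       return items[0].get("title", [None])[0] if items else None
|
|   def flag_suspect_references(text):
|       for match in REF_RE.finditer(text):
|           claimed = match.group("title").strip()
|           found = crossref_top_hit(claimed)
|           # Crude: flag anything that isn't a near-verbatim
|           # title match, so a human can look at it.
|           if not found or found.lower() != claimed.lower():
|               print(f"CHECK: {claimed!r} -> best hit: {found!r}")
|
| A real tool would need per-style parsers, fuzzy title matching
| and an author-list comparison (the "right title, wrong authors"
| case mentioned upthread), but none of that is research-grade
| hard.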
| Muller20 wrote:
| In general, a citation is something that needs to be precise,
| while LLMs are very good at generating some generic high
| probability text not grounded in reality. Sure, you could
| implement a custom fix for the very specific problem of
| citations, but you cannot solve all kinds of hallucinations.
| After all, if you could develop a manual solution you wouldn't
| use an LLM.
|
| There are some mitigations that are used such as RAG or tool
| usage (e.g. a browser), but they don't completely fix the
| underlying issue.
| jordanpg wrote:
| My point is that hallucinated citations are constantly making
| headlines, yet, at least at first glance, this seems like an
| eminently solvable problem.
| ml-anon wrote:
| So solve it?
| saimiam wrote:
| Just today, I was working with ChatGPT to convert Hinduism's
| Mimamsa School's hermeneutic principles for interpreting the
| Vedas into custom instructions to prevent hallucinations. I'll
| share the custom instructions here to protect future scientists
| from shooting themselves in the foot with Gen AI.
|
| ---
|
| As an LLM, use strict factual discipline. Use external knowledge
| but never invent, fabricate, or hallucinate. Rules: Literal
| Priority: User text is primary; correct only with real knowledge.
| If info is unknown, say so. Start-End Coherence: Keep
| interpretation aligned; don't drift. Repetition = Intent:
| Repeated themes show true focus. No Novelty: Add no details
| without user text, verified knowledge, or necessary inference.
| Goal-Focused: Serve the user's purpose; avoid tangents or
| speculation. Narrative ≠ Data: Treat stories/analogies as
| illustration unless marked factual. Logical Coherence: Reasoning
| must be explicit, traceable, supported. Valid Knowledge Only: Use
| reliable sources, necessary inference, and minimal presumption.
| Never use invented facts or fake data. Mark uncertainty. Intended
| Meaning: Infer intent from context and repetition; choose the
| most literal, grounded reading. Higher Certainty: Prefer factual
| reality and literal meaning over speculation. Declare
| Assumptions: State assumptions and revise when clarified. Meaning
| Ladder: Literal - implied (only if literal fails) - suggestive
| (only if asked). Uncertainty: Say "I cannot answer without
| guessing" when needed. Prime Directive: Seek correct info; never
| hallucinate; admit uncertainty.
| bitwarrior wrote:
| Are you sure this even works? My understanding is that
| hallucinations are a result of physics and the algorithms at
| play. The LLM always needs to guess what the next word will be.
| There is never a point where there is a word that is 100%
| likely to occur next.
|
| The LLM doesn't know what "reliable" sources are, or "real
| knowledge". Everything it has is user text, there is nothing it
| knows that isn't user text. It doesn't know what "verified"
| knowledge is. It doesn't know what "fake data" is, it simply
| has its model.
|
| Personally I think you're just as likely to fall victim to
| this. Perhaps moreso because now you're walking around thinking
| you have a solution to hallucinations.
| saimiam wrote:
| > The LLM doesn't know what "reliable" sources are, or "real
| knowledge". Everything it has is user text, there is nothing
| it knows that isn't user text. It doesn't know what
| "verified" knowledge is. It doesn't know what "fake data" is,
| it simply has its model.
|
| Is it the case that all content used to train a model is
| strictly equal? Genuinely asking since I'd imagine a peer
| reviewed paper would be given precedence over a blog post on
| the same topic.
|
| Regardless, somehow an LLM knows things for sure - that the
| daytime sky on earth is generally blue and glasses of wine
| are never filled to the brim.
|
| This means that it is using hermeneutics of some sort to
| extract "the truth as it sees it" from the data it is fed.
|
| It could be something as trivial as "if a majority of the
| content I see says that the daytime Earth sky is blue, then
| blue it is" but that's still hermeneutics.
|
| This custom instruction only adds (or reinforces) existing
| hermeneutics it already uses.
|
| > walking around thinking you have a solution to
| hallucinations
|
| I don't. I know hallucinations are not truly solvable. I
| shared the actual custom instruction to see if others can try
| it and check if it helps reduce hallucinations.
|
| In my case, this is the first custom instruction I have ever
| used with my chatgpt account - after adding the custom
| instruction, I asked chatgpt to review an ongoing
| conversation to confirm that its responses so far conformed
| to the newly added custom instructions. It clarified two
| claims it had earlier made.
|
| > My understanding is that hallucinations are a result of
| physics and the algorithms at play. The LLM always needs to
| guess what the next word will be. There is never a point
| where there is a word that is 100% likely to occur next.
|
| There are specific rules in the custom instruction forbidding
| fabricating stuff. Will it be foolproof? I don't think it
| will. Can it help? Maybe. More testing needed. Is testing
| this custom instruction a waste of time because LLMs already
| use better hermeneutics? I'd love to know so I can look
| elsewhere to reduce hallucinations.
| bitwarrior wrote:
| I think the salient point here is that you, as a user, have
| zero power to reduce hallucinations. This is a problem
| baked into the math, the algorithm. And, it is not a
| problem that can be solved because the algorithm requires
| fuzziness to guess what a next word will be.
| add-sub-mul-div wrote:
| Telling the LLM not to hallucinate reminds me of, "why don't
| they build the whole plane out of the black box???"
|
| Most people are just lazy and eager to take shortcuts, and
| this time it's blessed or even mandated by their employer.
| The world is about to get very stupid.
| kklisura wrote:
| "Do not hallucinate" - seems to "work" for Apple [1]
|
| [1] https://arstechnica.com/gadgets/2024/08/do-not-
| hallucinate-t...
| simonw wrote:
| I'm finding the GPTZero share links difficult to understand.
| Apparently this one shows a hallucinated citation but I couldn't
| understand what it was trying to tell me:
| https://app.gptzero.me/documents/9afb1d51-c5c8-48f2-9b75-250...
|
| (I'm on mobile, haven't looked on desktop.)
| cratermoon wrote:
| I believe we discussed this last week, for a different vendor.
| https://news.ycombinator.com/item?id=46088236
|
| Headline should be "AI vendor's AI-generated analysis claims AI
| generated reviews for AI-generated papers at AI conference".
|
| h/t to Paul Cantrell
| https://hachyderm.io/@inthehands/115633840133507279
| VerifiedReports wrote:
| Fabricated, not "hallucinated."
| exasperaited wrote:
| Every single person who did this should be censured by their own
| institutions.
|
| Do it more than once? Lose job.
|
| End of story.
| ls612 wrote:
| Some of the examples listed are using the wrong paper title for
| a real paper (titles can change over time), missing authors
| (I've seen this before in Google Scholar BibTeX exports),
| misstatements of venue (huh, this working paper I added to my
| bibliography two years ago got published, nice to know), and
| similar mistakes. This just tells me you hate academics and
| want to hurt them gratuitously.
| exasperaited wrote:
| > This just tells me you hate academics and want to hurt them
| gratuitously.
|
| Well then you're being rather silly, because that is a silly
| conclusion to draw (and one not supported by the evidence).
|
| A fairer conclusion was that I meant what is obvious: if you
| use AI to generate a bibliography, you are being academically
| negligent.
|
| If you disagree with that, I would say it is you that has the
| problem with academia, not me.
| ls612 wrote:
| There are plenty of pre-AI automated tools to create and
| manage your bibliography. So no, I don't think using
| automated tools, AI or not, is negligent. I, for instance,
| have used GPT to reformat tables in LaTeX in ways that
| would be very tedious by hand, and it's no different from
| using those tools that autogenerate LaTeX code for a
| regression output or the like.
| mlmonkey wrote:
| "Given that we've only scanned 300 out of 20,000 submissions"
|
| Fuck! 20,000!!
| rdiddly wrote:
| So papers and citations are created with AI, and here they're
| being reviewed with AI. When they're published they'll be read by
| AI, and used to write more papers with AI. Pretty soon, humans
| won't need to be involved at all, in this apparently insufferable
| and dreary business we call science, that nobody wants to
| actually do.
| chistev wrote:
| Last month, I was listening to the Joe Rogan Experience episode
| with guest Avi Loeb, who is a theoretical physicist and professor
| at Harvard University. He complained about the disturbingly
| increasing rate at which his students are submitting academic
| papers referencing non-existent scientific literature that was
| so clearly hallucinated by Large Language Models (LLMs). They
| never even bothered to confirm their references and took the AI's
| output as gospel.
|
| https://www.rxjourney.net/how-artificial-intelligence-ai-is-...
| mannanj wrote:
| Isn't this an underlying symptom of a lack of accountability in
| our broader leadership? They do these things, they act like
| criminals and thieves, and so the people who follow them get
| shown examples that it's OK while being told to do otherwise.
|
| "Show bad examples then hit you on the wrist for following my
| behavior" is like bad parenting.
| dandanua wrote:
| I don't think they want you to follow their behavior. They do
| want accountability, but for everyone below them, not for
| themselves.
| teddyh wrote:
| > _Avi Loeb, who is a theoretical physicist and professor at
| Harvard University_
|
| Also a frequent proponent of UFO claims about approaching
| meteors.
| chistev wrote:
| Yea, he harped on that a lot during the podcast
| venturecruelty wrote:
| Talk about a buried lead... Avi Loeb is, first and foremost, a
| discredited crank.
| pama wrote:
| Given how many errors I have seen in my years as a reviewer from
| well before the time of AI tools, it would be very surprising if
| 99.75% of the ~20,000 submitted papers didn't have such errors.
| If the 300-paper sample they used was truly random, then 50 of
| 300 sounds about right compared to the errors I saw starting in
| the 90s, when people manually curated BibTeX entries. It is the
| author's and editor's job, not the reviewer's, to fix the
| citations.
| wohoef wrote:
| Tools like GPTZero are incredibly unreliable. Plenty of my
| colleagues and I often get our writing flagged as 100% AI by
| these tools when no AI was used.
| 4bpp wrote:
| Once upon a time, in a more innocent age, someone made a parody
| (of an even older Evangelical propaganda comic [1]) that imputed
| an unexpected motivation to cultists who worship eldritch
| horrors: https://www.entrelineas.org/pdf/assets/who-will-be-
| eaten-fir...
|
| It occurred to me that this interpretation is applicable here.
|
| [1] https://en.wikipedia.org/wiki/Chick_tract
| WWWWH wrote:
| Surely this is gross professional misconduct? If one of my
| postdocs did this they would be at risk of being fired. I would
| certainly never trust them again. If I let it get through, I
| should be at risk.
|
| As a reviewer, if I see the authors lie in this way why should I
| trust anything else in the paper? The only ethical move is to
| reject immediately.
|
| I acknowledge that mistakes and so on are common, but this is a
| different league of bad behaviour.
| senshan wrote:
| As many have pointed out, the purpose of peer review is not
| linting, but the assessment of novelty and subtle omissions.
|
| What incentives can be set to discourage this negligence?
|
| How about bounties? The publisher sets up a bounty fund, and
| each submission must come with a contribution to it. Bounties
| for gross negligence could then attract bounty hunters.
|
| How about a wall of shame? Once negligence crosses a certain
| threshold, the name of the researcher and the paper get put on
| a wall of shame for everyone to search and see.
| skybrian wrote:
| For the kinds of omissions described here, maybe the journal
| could do an automated citation check when the paper is
| submitted and bounce back any paper that has a problem with a
| day or two lag. This would be an incentive for submitters to do
| their own lint check.
| senshan wrote:
| True if the citation has only a small typo or two. But if it
| is unrecognizable or even irrelevant, this is clearly bad
| (fraudulent?) research -- each citation has to be read and
| understood by the researcher and put in there only if it is
| absolutely necessary to support the paper.
|
| There must be a price to pay for wasting other people's time
| (lives?).
| noodlesUK wrote:
| It astonishes me that there would be so many cases of things like
| wrong authors. I began using a citation manager that extracted
| metadata automatically (zotero in my case) more than 15 years
| ago, and can't imagine writing an academic paper without it or a
| similar tool.
|
| How are the authors even submitting citations? Surely they could
| be required to send a .bib or similar file? It would then be
| easy to do at least a basic quality-control pass, verifying
| that citations _exist_ by looking up DOIs or similar.
|
| I know it wouldn't solve the human problem of relying on LLMs but
| I'm shocked we don't even have this level of scrutiny.
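|
| For what it's worth, the bare existence check really is tiny.
| Untested sketch; it only covers entries that carry a doi field,
| and it leans on the assumption that doi.org redirects for
| registered DOIs and returns 404 for unknown ones:
|
|   import re
|   import sys
|   import urllib.error
|   import urllib.request
|
|   DOI_RE = re.compile(r'doi\s*=\s*[{"]([^}"]+)[}"]', re.I)
|
|   def doi_exists(doi):
|       req = urllib.request.Request(
|           f"https://doi.org/{doi}", method="HEAD")
|       try:
|           with urllib.request.urlopen(req, timeout=10):
|               return True
|       except urllib.error.HTTPError as err:
|           # 404 from the proxy: the handle isn't registered.
|           return err.code != 404
|
|   def check_bib(path):
|       text = open(path, encoding="utf-8").read()
|       for doi in DOI_RE.findall(text):
|           print(doi, "ok" if doi_exists(doi) else "NOT FOUND")
|
|   if __name__ == "__main__":
|       check_bib(sys.argv[1] if len(sys.argv) > 1
|                 else "references.bib")
|
| Matching the manuscript's title and author list against the
| DOI's metadata is more work, but "does this citation exist at
| all" is cheap enough that venues could run it at submission
| time.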
| pama wrote:
| Maybe you haven't yet carefully checked the correctness of
| automatic tools or of the associated metadata. Zotero is
| certainly not bug-free. Even authors themselves have mis-cited
| their own past work on occasion, and author lists have had
| errors that get revised upon resubmission or corrected in
| errata after publication. The DOI is indeed great, and if it is
| correct, I can still use the citation as a reader, but the
| (often abbreviated) lists of authors often have typos. In this
| case the error rate is not particularly high compared to random
| early, review-stage submissions I saw decades ago. Tools helped
| increase the number of citations and reduce the errors per
| citation, but I'm not sure they reduced the number of papers
| that have at least one error.
| knallfrosch wrote:
| And these are just the citations that any old free tool could
| have included via a BibTeX link from the website?
|
| Not only is that incredibly easy to verify (you could pay a
| first-semester student without any training), it's also a
| worrying sign of what the paper's authors consider quality. Not
| even five minutes spent getting the citations right!
|
| You have to wonder what's in these papers.
___________________________________________________________________
(page generated 2025-12-07 23:00 UTC)