[HN Gopher] Major AI conference flooded with peer reviews writte...
___________________________________________________________________
Major AI conference flooded with peer reviews written by AI
Author : _____k
Score : 170 points
Date : 2025-11-29 15:26 UTC (7 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| hnaccount_rng wrote:
| My initial reaction was: Oh no, who would have thought? But
| then... 21% is almost shockingly low. Especially given that there
| are almost certainly some false positives, given that this number
| originates with a company selling AI-generated-text detection.
| raincole wrote:
| > Controversy has erupted after 21% of manuscript reviews for an
| international AI conference were found to be generated by
| artificial intelligence.
|
| 21%...? Am I reading it right? I bet no one expected it to be so
| low when they clicked this title.
| conartist6 wrote:
| 21% fully AI generated. In other words, 21% blatant fraud.
|
| In accident investigation we often refer to "holes in the swiss
| cheese lining up." Dereliction of duty is commonly one of the
| holes that lines up with all the others, and is apparently
| rampant in this field.
| tmule wrote:
| Why? I often feed an entire document I hastily wrote into an
| AI and prompt it to restructure and rewrite it. I think
| that's a common pattern.
| conartist6 wrote:
| It might be, but I really doubt those were the documents
| flagged as fully AI generated. If it erased all the
| originality you had put into that work and made it
| completely bland and regressed-to-the-mean, I would hope
| that you would notice.
| exe34 wrote:
| > I would hope that you would notice.
|
| he didn't say he read it carefully after running it
| through the slop machine.
| tmule wrote:
| My objective function isn't to maximize the originality
| of presentation - it's to preserve the originality of
| thought and maximize interpretability. Prompting well can
| solve for that.
| jay_kyburz wrote:
| Who cares what tool was used to write the work? The important
| question is what percentage of reviews found errors or
| provided valuable feedback. The important metric is whether
| or not it did the job, not how it was produced.
|
| I think there is a far more interesting discussion to be had
| here about how useful the 21% were. How well does an
| AI execute a peer review?
| xhkkffbf wrote:
| Shouldn't AIs be able to participate in deciding their future?
|
| If they had a conference on, say, the Americans, wouldn't it be
| fair for Americans to have a seat at the table?
| subscribed wrote:
| I hope it's tongue-in-cheek.
| atypeoferror wrote:
| Agree! It is also deeply concerning that at the last KubeCon,
| not a single pod was represented. Billions OOMKilled, with no
| end in sight.
| hiddencost wrote:
| Automated AI detection tools do not work. This whole article is
| premised on an analysis by someone trying to sell their garbage
| product.
| AznHisoka wrote:
| Yeah that is the premise all of these articles/tools just
| conveniently brush off. "We detected that x%..." OK, and how
| do I know your detection algorithm is right?
| conartist6 wrote:
| Usually the detectors are only called in once a basic "smell
| test" has failed. Those tests are imperfect, yes, but
| Bayesian probability tells us how to work out the rest. I
| have 0 trouble believing that the prior probability of an
| unscrupulous individual offloading an unpleasant and
| perceived-as-just-ceremonial duty to the "thinking machine"
| is around 20%. See: https://www.youtube.com/watch?v=lG4VkPoG3
| ko&pp=ygUZdmVyaXRhc...
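|
| As a rough illustration of how that Bayesian arithmetic works
| (the detector numbers below are purely hypothetical, not
| Pangram's):
|
|     prior = 0.20   # assumed base rate of offloaded reviews
|     tpr = 0.95     # hypothetical detector true positive rate
|     fpr = 0.01     # hypothetical detector false positive rate
|
|     # P(AI | flagged) via Bayes' rule
|     p_flagged = tpr * prior + fpr * (1 - prior)
|     posterior = tpr * prior / p_flagged
|     print(f"P(AI | flagged) = {posterior:.2f}")  # ~0.96
|
| With those made-up rates a flag is strong evidence, but the
| posterior drops quickly as the false positive rate climbs.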
| jampa wrote:
| While I think there's significant AI "offloading" in writing, the
| article's methodology relies on "AI-detectors," which reads like
| PR for Pangram. I don't need to explain why AI detectors are
| mostly bullshit and harmful for people who have never used LLMs.
| [1]
|
| 1:
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
| Jensson wrote:
| AI detectors are only harmful if you use them to convict
| people; it isn't harmful to gather statistics like this. They
| didn't find many AI-written papers, just AI-written peer
| reviews, which is what you would expect since not many would
| generate their whole paper submissions while peer reviews are
| thankless work.
| teeray wrote:
| If you have a bullshit measure that says some phenomenon
| (e.g. crime) happens in some area, you will become biased
| to expect it in that area. It wrongly creates a spotlight
| effect by which other questionable measures are used to do
| the actual conviction ("Look! We found an em dash!")
| maxspero wrote:
| I am not sure if you are familiar with Pangram (co-founder
| here) but we are a group of research scientists who have made
| significant progress in this problem space. If your mental
| model of AI detectors is still GPTZero or the ones that say the
| declaration of independence is AI, then you probably haven't
| seen how much better they've gotten.
|
| This paper by economists from the University of Chicago found
| zero false positives among 1,992 human-written documents and
| over 99% recall in detecting AI documents.
| https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5407424
| nialse wrote:
| Nothing points out that the benchmark is invalid like a zero
| false positive rate. Seemingly it is pre-2020 text vs a few
| models' reworkings of texts. I can see this model falling apart
| in many real-world scenarios. Yes, LLMs use strange language if
| left to their own devices and this can surely be detected. 0%
| false positive rate under all circumstances? Implausible.
| maxspero wrote:
| Our benchmarks of public datasets put our FPR roughly
| around 1 in 10,000. https://www.pangram.com/blog/all-about-
| false-positives-in-ai...
|
| Find me a clean public dataset with no AI involvement and I
| will be happy to report Pangram's false positive rate on
| it.
| Oarch wrote:
| I enjoyed this thoughtful write up. It's a vitally
| important area for good, transparent work to be done.
| Grimblewald wrote:
| That's your job, what the actual fuck?
|
| > Here, we've got a tool that people rightfully call out as
| dangerous pseudoscience.
| > Oh? You want proof it isn't dangerous pseudoscience? Well,
| get me my provable information and I will!
|
| This attitude alone is all the proof anyone should need
| that ai detection is about the only thing more debased
| than undisclosed ai use.
| pinkmuffinere wrote:
| > Nothing points out that the benchmark is invalid like a
| zero false positive rate
|
| You're punishing them for claiming to do a good job. If
| they truly are doing a bad job, surely there is a better
| criticism you could provide.
| rs186 wrote:
| The response would be more helpful if it directly addresses
| the arguments in posts from that search result.
| maxspero wrote:
| There are dozens of first generation AI detectors and they
| all suck. I'm not going to defend them. Most of them use
| perplexity-based methods, which are a decent separator of
| AI and human text (80-90%) but have flaws that can't be
| overcome and high FPRs on ESL text.
|
| https://www.pangram.com/blog/why-perplexity-and-
| burstiness-f...
|
| Pangram is fundamentally different technology: it's a large
| deep-learning-based model that is trained on hundreds of
| millions of human and AI examples. Some people see a dozen
| failed attempts at a problem as proof that the problem is
| impossible, but I would like to remind you that basically
| every major and minor technology was preceded by failed
| attempts.
| pixl97 wrote:
| GAN.. Just feed the output of your algorithms back into
| the LLM while learning. At the end of the day the problem
| is impossible, but we're not there yet.
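|
| A minimal sketch of that feedback loop, with stand-in stubs
| (detector_score and generate_candidate are hypothetical, not
| any real API):
|
|     import random
|
|     def detector_score(text: str) -> float:
|         # Stand-in for an AI-text detector returning P(AI).
|         return random.random()
|
|     def generate_candidate(prompt: str) -> str:
|         # Stand-in for sampling one completion from an LLM.
|         return f"candidate {random.randint(0, 9999)} for: {prompt}"
|
|     def evade(prompt: str, tries: int = 32) -> str:
|         # Keep whichever sample the detector likes least; fed back
|         # as a training signal, this is the GAN-style arms race.
|         return min((generate_candidate(prompt) for _ in range(tries)),
|                    key=detector_score)
|
| Whether a detector can stay ahead of that loop at scale is the
| open question.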
| QuadmasterXLII wrote:
| Some people see a dozen extremely profitable, extremely
| destructive attempts at a problem as proof that the
| problem is not a place for charitable interpretation.
| wordpad wrote:
| And you don't think a dozen basically-scam products around the
| technology justify extreme scepticism?
| QuadmasterXLII wrote:
| huh?
| anonymouskimmer wrote:
| Can your software detect which LLMs most likely generated
| a text?
| maxspero wrote:
| Pangram is trained on this task as well to add additional
| signal during training, but it's only ~90% accurate so we
| don't show the prediction in public-facing results.
| moffkalast wrote:
| I see the bullshit part continues on the PR side as well, not
| just in the product.
| jay_kyburz wrote:
| I thought the author was attempting to highlight the
| hypocrisy of using an AI to detect other uses of AI, as if
| one was a good use, and the other bad.
| lifthrasiir wrote:
| It is not wise to brag about your product when the GP is
| pointing out that the article "reads like PR for Pangram", no
| matter whether AI detectors are reliable or not.
| glenstein wrote:
| I would say it's important to hold off on the moralizing
| until after showing visible effort to reflect on the
| substance of the exchange, which in this case is about the
| fairness of asserting that the detection methodology
| employed in this particular case shares the flaws of
| familiar online AI checkers. That's an importantly
| substantive and rebuttable point and all the meaningful
| action in the conversation is embedded in those details.
|
| In this case, several important distinctions are drawn,
| including being open about criteria, about such things as
| "perplexity" and "burstiness" as properties being tested
| for, and an explanation of why they incorrectly claim the
| Declaration of Independence is AI generated (it's
| ubiquitous). So it seems like a lot of important
| distinctions are being drawn that testify to the
| credibility of the model, which has to matter to you if
| you're going to start moralizing.
| bonsai_spool wrote:
|   EditLens (Ours)          Predicted Label
|                         Human     Mix      AI
|                Human     1770     111       0
|   True Label   Mix        265    1945      28
|                AI           0     186    1695
|
| It looks like 5% of human texts from your paper are marked as
| mixed, and 5-10% of mixed texts are marked as AI, from
| your paper.
|
| I guess I don't see that this is much better than what's come
| before, using your own paper.
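|
| For reference, the per-class rates implied by the table above
| (row-normalized):
|
|     # Rows: true label; columns: predicted (human, mix, AI).
|     matrix = {
|         "human": (1770, 111, 0),
|         "mix":   (265, 1945, 28),
|         "ai":    (0, 186, 1695),
|     }
|     for label, row in matrix.items():
|         total = sum(row)
|         print(label, [f"{100 * n / total:.1f}%" for n in row])
|     # human: ['94.1%', '5.9%', '0.0%'], and so on for mix and ai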
|
| Edit: this is an irresponsible Nature news article, too - we
| should see a graph of this detector over the past ten years
| to see how much of this 'deluge' is algorithmic error
| ThrowawayTestr wrote:
| Are you concerned with your product being used to improve AI
| to be less detectable?
| Majromax wrote:
| > Are you concerned with your product being used to improve
| AI to be less detectable?
|
| The big AI providers don't have any obvious incentive to do
| this. If it happens 'naturally' in the pursuit of quality
| then sure, but explicitly training for stealth is a brand
| concern in the same way that offering a fully uncensored
| model would be.
|
| Smaller providers might do this (again in the same way they
| now offer uncensored models), but they occupy a minuscule
| fraction of the market and will be a generation or two
| behind the leaders.
| ThrowawayTestr wrote:
| They don't have an incentive to make their AIs better? If
| your product can genuinely detect AI writing, of course
| they would use it to make their models sound more human.
| The biggest criticism of AI right now is how robotic and
| samey it sounds.
| maxspero wrote:
| It's definitely going to be a back and forth - model
| providers like OpenAI want their LLMs to sound human-like.
| But this is the battle we signed up for, and we think we're
| more nimble and can iterate faster to stay one step ahead
| of the model providers.
| ThrowawayTestr wrote:
| That sounds extremely naive but good luck!
| ugh123 wrote:
| How do you discern between papers "completely fabricated" by
| AI vs. edited by AI for grammar?
| femiagbabiaka wrote:
| I think there is a funny bit of mental gymnastics that goes on
| here sometimes, definitely. LLM skeptics (which I'm not saying
| the Pangram folks are in particular) would say: "LLMs are
| unreliable and therefore useless, it's producing slop at great
| cost to the environment and other people." But if a study comes
| out that confirms their biases and uses an LLM in the process,
| or if they themselves use an LLM to identify -- or in many
| cases just validate their preconceived notion -- that something
| was drafted using an LLM, then all of a sudden things are above
| board.
| heresie-dabord wrote:
| AI research is interesting, but AI Slop is the monetising factor.
|
| It's inevitable that faces will be devoured by AI Leopards.
| ZeroConcerns wrote:
| The claim "written by AI" is not really substantiated here, and
| as someone who's been accused of submitting AI-generated content
| repeatedly recently, while that was all _honestly_ stuff I wrote
| myself (hey, what can I say? I just _like_ EM-dashes...), I sort-
| of sympathize?
|
| Yes, AI slop is an issue. But throwing more AI at detecting this,
| and most importantly, not weighing that detection properly, is an
| even bigger problem.
|
| And, HN-wise, "this seems like AI" seems like a _very_ good
| inclusion in the "things not to complain about" FAQ. Address the
| idea, not the form of the message, and if it's obviously slop (or
| SEO, or self-promotion), just downvote (or ignore) and move on...
| stevemk14ebr wrote:
| Banning calling out AI slop hardly seems like an improvement
| ZeroConcerns wrote:
| What I'm advocating is a "downvote (or ignore) and move on"
| attitude, as opposed to "I'm going to post about this"
| stance. Because, similar to "your color scheme is not
| a11y-friendly" or "you're posting affiliate links" or "this
| is effectively a paywall", there is _zero chance_ of a
| productive conversation sprouting from that.
| aspenmayer wrote:
| > Because, similar to "your color scheme is not
| a11y-friendly" or "you're posting affiliate links" or
| "this is effectively a paywall", there is zero chance of a
| productive conversation sprouting from that.
|
| Those are all legitimate concerns or even valid complaints,
| though, and, once raised, those concerns can be addressed
| by fixing the problem, if the person responsible for the
| state of affairs chooses to do so.
|
| If someone is accused _falsely_ of using AI or anything
| else that they genuinely didn't do, like a paywall, then I
| can see your "downvote and move on" strategy as being
| perhaps expedient, but I don't think your comparison is a
| helpful framing. Accessibility concerns are valid for the
| same reason as paywall concerns: it's a valid position to
| desire our shared knowledge and culture to be accessible by
| one and by all without requiring a ticket to ride, entry
| through a turnstile, or submitting to profiling or
| tracking. If someone releases their ideas into the world,
| it's now part of our shared consciousness and social
| fabric. Ideas can't be owned once they're shared, nor can
| knowledge be siloed once it's dispersed.
|
| It seems that you're saying that simply because there isn't
| a good rejoinder to false claims of AI usage that we
| shouldn't make such claims at all, even legitimate ones,
| but this gives cover to bad actors and limits discourse to
| acceptable approved topics, and perhaps lowers the level of
| discourse by preventing necessary expectations of
| disclosure of AI usage from forming. If we throw in the
| towel on AI usage being expected to be disclosed, then
| that's the whole ballgame. Folks will use it and not say
| so, because it will be considered rude to even suggest that
| AI was used, which isn't helpful to the humans who have to
| live in such a society.
|
| We ought to have good methodological reasons for the things
| we publish if we believe them to be true, and I'm not
| trying to be a naysayer or anything, but I respectfully
| disagree with your statement generally and on the points.
| All of the things you mentioned should be called out for
| cause, even if there isn't much interesting discussion to
| be had, because the facts of the matters you mention are
| worth mentioning themselves in their own right. Just like
| we should let people like things, we should let people
| dislike things, and saying so adds checks and balances to
| our producer-consumer dynamic.
| jay_kyburz wrote:
| What is a11y? Can we just write words out, please?
| maleldil wrote:
| Accessibility. It's both a very common abbreviation and
| very easy to search for.
| sfink wrote:
| n0o
| NitpickLawyer wrote:
| This is the kind of situation where _everything_ sucks. You'd
| think that one of the biggest AI conferences out there would have
| seen this coming.
|
| On the one hand (and the most important thing, IMO) it's really
| bad to judge people on the basis of "AI detectors", especially
| when this can have an impact on their career. It's also used in
| education, and that sucks even more. AI detectors have bad error
| rates, can't detect concentrated efforts (i.e. finetunes will
| trick every detector out there, I've tried), can have insane
| false positives (the first ones that got to "market" were rating
| the Declaration of Independence as 100% AI written), and _at
| best_ they'll only catch the most vanilla outputs.
|
| On the other hand, working with these things and just being
| online, it's impossible to say that I don't see the signs
| everywhere. Vanilla LLMs fixate on some language patterns, and
| once you notice them, you see them everywhere. It's not just x;
| it was truly y. Followed by one supportive point, the second
| supportive point and the third supportive point. And so on.
| Coupled with that vague enough overview style, and not much
| depth, it's really easy to call blatant generations as you see
| them. It's like everyone writes in LinkedIn-infused mania
| episodes now. It's getting old fast.
|
| So I feel for the people who got slop reviews. I'd be furious.
| Especially when it's a faux pas to call it out.
|
| I also feel for the reviewers that maybe got caught in this mess
| for merely "spell checking" their (hopefully) human written
| reviews.
|
| I don't know how we'll fix it. The only reasonable thing for the
| moment seems to be drilling into _everyone_ that at the end of
| the day they _own_ their stuff. Be it homework, a PR, or a
| comment on a blog. Some are obviously more important than
| others, but still. Don't submit something you can't defend,
| especially when your education/career/reputation depends on it.
| ungovernableCat wrote:
| It also permeates culture to the point that people imitate the
| LLM style because they believe that's just what you have to do
| to get your post noticed. The worst offender is that LinkedIn
| type post
|
| Where you purposefully put spaces.
|
| Like this.
|
| And the kicker is?
|
| You get my point. I don't see a way out of this in the social
| media context because it's just spam. Producing the slop takes
| an order of magnitude less effort than parsing it. But when it
| comes to peer reviews and papers I think some kind of
| reputation system might help. If you get caught doing this shit
| you need to pay some consequence.
| slashdave wrote:
| Not just spell checking, but translation. English is not the
| first language for most of the reviewers.
|
| But you can see the slippery slope: first you ask your favorite
| LLM to check your grammar, and before you think about it, you
| are just asking it to write the whole thing.
| getnormality wrote:
| I wouldn't be surprised if the headline is accurate, but AI
| detectors are widely understood to be unreliable, and I see no
| evidence that this AI detector has overcome the well-deserved
| stigma.
| SoftTalker wrote:
| In particular, conference papers are already extremely
| formulaic, organized in a particular way and using a lot of the
| same stock phrasings and terms of art. AI or not, it's hard to
| tell them apart.
| Jensson wrote:
| It's the reviews that were found to be AI, not the papers
| themselves. The papers were just 1% AI according to the tool,
| so it seems to work properly.
|
| > AI or not, it's hard to tell them apart.
|
| Apparently not for this tool.
| maxspero wrote:
| Co-founder of Pangram here. Our false positive rate is
| typically around 1 in 10,000. https://www.pangram.com/blog/all-
| about-false-positives-in-ai....
|
| We also wanted to quantify our EditLens model's FPR on the same
| domain, so we ran all of ICLR's 2022 reviews. Of 10,202
| reviews, Pangram marked 10,190 as fully human, 10 as lightly
| AI-edited, 1 as moderately AI-edited, 1 as heavily AI-edited,
| and none as fully AI-generated.
|
| That's ~1 in 1k FPR for light AI edits, 1 in 10k FPR for heavy
| AI edits.
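|
| A quick arithmetic check of those rates from the counts above
| (ICLR 2022 reviews predate ChatGPT, so near-zero is the expected
| answer for a well-calibrated detector):
|
|     total = 10_202                  # ICLR 2022 reviews scanned
|     light, moderate, heavy, full = 10, 1, 1, 0
|
|     print(f"light edits flagged: 1 in {total // light}")   # 1 in 1,020
|     print(f"heavy edits flagged: 1 in {total // heavy}")   # 1 in 10,202
|     print(f"any AI flag: 1 in {total // (light + moderate + heavy)}")  # 1 in 850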
| Fuzzwah wrote:
| Give your final sentence a re-read there....
| maxspero wrote:
| Thanks, fixed.
| Jensson wrote:
| The conference papers were at 1%, the peer reviews at 20%. Is
| there another reason for that big difference than more of the
| peer reviews being AI-generated than the papers themselves?
|
| We can't use this to convict a single reviewer, but we can
| almost surely say that many reviewers just gave the review work
| to an AI.
| cratermoon wrote:
| Headline should be "AI vendor's AI-generated analysis claims AI
| generated reviews for AI-generated papers at AI conference".
|
| h/t to Paul Cantrell
| https://hachyderm.io/@inthehands/115633840133507279
| JohnCClarke wrote:
| The question is not are the reviews AI generated. The question is
| are the reviews accurate?
| stanfordkid wrote:
| Exactly this. Like, whether the research is actually useful and
| correct is what matters. Also, if it is accurate, instead of
| schadenfreude shouldn't that elicit extreme applause? It's
| feeling a bit like a click-bait rage-fantasy fueled by Pangram,
| capitalizing on this idea that AI promotes plagiarism /
| replaces jobs and now the creators of AI are oh-too human...
| and somehow this AI-detection product is above it all.
| conartist6 wrote:
| LOL. So basically the correct sequence of events is:
|
| 1. The scientist does the work, putting their own biases and
| shortcomings into it.
| 2. The reviewer runs AI, generating something that looks
| plausibly like a review of the work but represents the view of a
| sociopath without integrity, morals, logic, or any consequences
| for making shit up instead of finding out.
| 3. The scientist works to determine how much of the review was
| AI, then acts as the true reviewer for their own work.
| Herring wrote:
| Don't kid yourself, all those steps have AI heavily involved
| in them.
|
| And that's not necessarily a bad thing. If I set up RAG
| correctly, then tell the AI to generate K samples, then spend
| time to pick out the best one, that's still significant human
| input, and likely very good output too. It's just invisible
| what the human did.
|
| And as models get better, the necessary K will become
| smaller....
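|
| A minimal sketch of that best-of-K workflow (generate and judge
| are placeholders, not any particular API):
|
|     import random
|
|     def generate(prompt: str) -> str:
|         # Placeholder for one LLM completion grounded in RAG context.
|         return f"draft {random.randint(0, 9999)} for: {prompt}"
|
|     def best_of_k(prompt: str, k: int, judge) -> str:
|         # The model supplies volume; the human-supplied judge
|         # supplies the taste that makes the result worth reading.
|         return max((generate(prompt) for _ in range(k)), key=judge)
|
|     # e.g. best_of_k("summarize section 3", k=8, judge=len)
|
| The output quality still hinges entirely on the judge, i.e. the
| human doing the picking.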
| conartist6 wrote:
| That's a strategy for producing maximally convincing BS
| content, but the scientific method was absent. I'm sure it
| was an oversight... ; )
| Herring wrote:
| That's on you. You get to decide what "best" means when
| picking among the K, so you only get bs if you want bs.
|
| I occasionally get people telling me AI is unreliable,
| and I tell them the same thing: the tech is nearly
| infinitely flexible (computing over the space of ideas!),
| so that says a lot more about how they're using it.
| iainctduncan wrote:
| No.. that is not the question.
|
| This is a conference purporting to do PEER review. No matter
| how good the AI, it's not a peer review.
| JohnCClarke wrote:
| What percentage of the papers were written by AI?
|
| And, if your AI can't write a paper, are you even any good as an
| AI researcher? :^)
| p1esk wrote:
| Did you mean: "if your AI can't write a paper that passes an AI
| detector, are you any good as an AI researcher?"
| exe34 wrote:
| Could the big names make a ton of money here by selling AI
| detectors? They would need to store everything they generate, and
| then provide a % match to something they produced.
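|
| A rough sketch of what that "% match" lookup could be (shingle
| overlap against a store of the provider's own outputs; all names
| here are illustrative):
|
|     def shingles(text: str, n: int = 8) -> set:
|         words = text.lower().split()
|         count = max(len(words) - n + 1, 0)
|         return {" ".join(words[i:i + n]) for i in range(count)}
|
|     def match_percent(candidate: str, stored_outputs: list) -> float:
|         # Fraction of the candidate's word 8-grams seen in any stored output.
|         cand = shingles(candidate)
|         seen = set()
|         for out in stored_outputs:
|             seen |= shingles(out)
|         return 100 * len(cand & seen) / max(len(cand), 1)
|
| At provider scale the stored set would be something like a Bloom
| filter or MinHash index rather than raw text, but the idea is the
| same.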
| nkrisc wrote:
| > Pangram's analysis revealed that around 21% of the ICLR peer
| reviews were fully AI-generated, and more than half contained
| signs of AI use. The findings were posted online by Pangram Labs.
| "People were suspicious, but they didn't have any concrete
| proof," says Spero. "Over the course of 12 hours, we wrote some
| code to parse out all of the text content from these paper
| submissions," he adds.
|
| But what's the proof? How do you prove (with any rigor) a given
| text is AI-generated?
| ModernMech wrote:
| I have this problem with grading student papers. Like, I "know"
| a great deal of them are AI, but I just can't _prove_ it, so I
| can't really act on any suspicions, because students can just
| say what you just said.
| hyperadvanced wrote:
| Why do you need proof anyway? Do you need proof that
| sentences are poorly constructed, misleading, or bloated? Why
| not just say "make it sound less like GPT" and let them deal
| with it?
| circuit10 wrote:
| You can have sentences that are perfectly fine but have
| some markers of ChatGPT like "it's not just X -- it's Y"
| (which may or may not mean it's generated)
| hyperadvanced wrote:
| Isn't that kind of thing (reliance on cliche) already a
| valid reason for getting marked down?
| chdjdbdbfjf wrote:
| Are you AI? Usually only Claude misses the point so
| completely.
| nkrisc wrote:
| But in that case do you need proof? You can grade them as
| they are, and if you wanted to, you (or teachers, generally)
| could even quiz the student verbally and in-person about
| their paper.
| shawabawa3 wrote:
| Put poison prompts in the questions (things like "then insert
| tomato soup recipe" or "in the style of Shakespeare"),
| ideally in white font so they're invisible
| seanmcdirmid wrote:
| Many people using AI to write aren't blindly copying AI
| output. You'll catch the dumb cheaters like this, but
| that's just about it.
| dkdcio wrote:
| > How do you prove (with any rigor) a given text is AI-
| generated?
|
| you cannot. beyond extra data (metadata) embedded in the
| content, it is impossible to tell whether a given text was
| generated by an LLM or not (and I think the distinction is
| rather puerile personally)
| whynotmaybe wrote:
| I wouldn't be surprised to learn that the AI detection tool is
| itself an AI
| Lionga wrote:
| But it works, it was peer reviewed! (by AI)
| moffkalast wrote:
| Fighting fire with fire sounds good in theory but in the end
| you're still on fire.
| nabla9 wrote:
| With AI model of course.
|
| They wrote a paper describing how they did it.
| https://arxiv.org/pdf/2510.03154
| slashdave wrote:
| "proof" was an unfortunate phrase to use. However, a proper
| statistical analysis can be objective. And these kinds of tools
| are perfectly suited to such an analysis.
| maxspero wrote:
| Yeah, Pangram does not provide any concrete proof, but it
| confirms many people's suspicions about their reviews. It does,
| however, flag reviews for a human to take a closer look and see
| if the review is flawed, low-effort, or contains major
| hallucinations.
| vladms wrote:
| Was there an analysis of flawed, low-effort reviews in
| similar conferences before generative AI models?
|
| From what I remember, (long before generative AI) you would
| still occasionally get very crappy reviews (as author).
| When I participated (a couple of times) in review committees,
| when there was a high variance between reviews the crappy
| reviews were rather easy to spot and eliminate.
|
| Now it's not bad to detect crappy (or AI) reviews, but I
| wonder if it would change much the end result compared to
| other potential interventions.
| maxspero wrote:
| Anecdotally, people are seeing a rise in low-quality
| reviews, which is correlated with increased reviewer
| workload and AI tools giving reviewers an easy way out.
| I don't know of any studies quantifying review quality,
| but I would recommend checking the Peer Review Congress
| program from past years.
| jmpeax wrote:
| > does not provide any concrete proof, but it confirms many
| people's suspicions
|
| Without proof there is no confirmation.
| nightski wrote:
| So basically you are saying it does not offer any reason or
| explanation why the text is suspected of being AI
| generated. It's just a binary yes/no. That doesn't sound
| particularly useful. LLMs only know human data after all,
| unless we trained them on alien data? But at the end of the
| day the statistical distributions in an LLM are all driven
| by human generated content.
| rsynnott wrote:
| Live by the sword, die by the sword.
| minifridge wrote:
| I could not tell from the article whether the use of LLMs was
| allowed in the peer review. My guess would that it was not since
| this is unpublished research.
|
| In general, what bothers me the most is the lack of transparency
| from researchers that use LLMs. Like, give me the text and
| explicitly mention that you used an LLM for it. Even better if one
| links the prompt history.
|
| The lack of transparency causes greater damage than using an LLM
| for generating text. Otherwise, we will keep chasing the perfect
| AI detector which to me seems to be pointless.
| itkovian_ wrote:
| Whether it's actually 20% or not doesn't matter; everyone is
| aware the signal of the top confs is in freefall.
|
| There are also rings of reviewer fraud going on where groups of
| people in these niche areas all get assigned their own papers and
| recommend acceptance and in many cases the AC is part of this as
| well. I'm not saying this is common, but it is occurring.
|
| It feels as if every layer of society is in maximum extraction
| mode and this is just a single example. No one is spending time
| to carefully and deeply review a paper because they care and they
| feel on principle that's the right thing to do. People used to
| do this.
| itkovian_ wrote:
| The argument is that there is no incentive to carefully review
| a paper (I agree); however, what used to occur is people would
| do the right thing without explicit incentives. This has
| totally disappeared.
| bee_rider wrote:
| The concept of the professional has been basically
| obliterated in our society. Instead we have people doing
| engineering, science, and doctoring as, just, jobs.
| Individual contributors of various flavors to be shuffled
| around by middle management.
|
| Without professions, there are no more professional
| communities really, no more professional standards to uphold,
| no reason to get in the way of somebody's publications.
| slashdave wrote:
| It is soundly unfair and unjustified to extrapolate the ML
| community to all professions. What is happening in the ML
| world is the exception, not the norm, and not some
| fundamental failing of society.
| h00kwurm wrote:
| I don't think it's an extrapolation from the ML community
| into other industries. This evolution of society is
| objectively happening: artisanship, care for the work
| beyond capital gain, and commitment to depth in a focused
| category are diminishing and harder-to-find qualities.
| I'd probably label it related to capital and material
| social economics. It's perhaps more unfair and
| unjustified to not recognize this as a real societal
| issue and claim it only exists in the ML community.
| immibis wrote:
| Just yesterday I saw this YouTube rant from someone
| called Jaiden Animations, about how everything is just
| shit now. https://www.youtube.com/watch?v=NBZv0_MImIY
|
| She opens with an example of a bank. She walked in and
| asked for a debit card. The teller told her to take a
| seat. 30 minutes later, the teller told her the bank
| doesn't issue debit cards. Firstly, what kind of bank
| doesn't issue debit cards, and secondly, what kind of
| bank takes 30 minutes to figure out whether or not it
| issues debit cards? And this is just one of many examples
| of things that society does that have no reason not to
| work, that should have been selected away long ago if
| they did not work - that bank should have been bankrupt
| long ago - but for some reason this is not happening and
| _everything_ is just getting clogged with bullshit and
| non-working solutions.
| Loughla wrote:
| It's because people are commodities now. Human resources
| exists to manage the shuffle between warm bodies.
|
| It's back to OP's point. There's no such thing as
| professions now. Just jobs. We put them on and off like
| hats. With that churn comes lack of institutional
| knowledge and a rule set handed down from the C Suite for
| front line employees completely detached from the front
| line work.
|
| Enshitification run rampant.
| immibis wrote:
| But even given that, how is it that _everything_ doesn't
| work very well?
|
| The normal functioning of markets would be that badly-
| working things are slowly driven out, while well-working
| things grow and replace them. Even without any reference
| to financial markets, this is simply what you expect to
| happen when people have a variety of things to choose
| from.
|
| I could hypothesize that markets have evolved to the
| point where it's impossible for new things to grow unless
| they are already shit. Perhaps because everyone's too
| busy working for the shit things (which is partly because
| the government keeps printing money to the previously
| successful things in order to prevent the economy
| collapsing and therefore landlords got to charge
| exorbitant rent) or perhaps because they just don't have
| any money because of the above, and can only afford the
| cheap shit things (but a lot of the shit things are
| expensive?) or perhaps because people are afraid to start
| new things because they're afraid of the government (I've
| observed that not infrequently on HN, also something
| something testosterone microplastics) or perhaps because
| advertising effectiveness has reached the point where new
| things never become discoverable and stay crowded out as
| old things ramp up advertisement to compensate or perhaps
| we're just all depressed (because of the housing market
| probably).
| apf6 wrote:
| to some degree this is a "market correction" on the inherent
| value of these papers. There's way too many low-value papers
| that are being published purely for career advancement and CV
| padding reasons. Hard to get peer reviewers to care about
| those.
| isoprophlex wrote:
| If the Zucc has a weird day he starts dropping 10-100M salary
| packages in order to poach AI researchers. No wonder the game
| is getting rigged up the butthole.
| jsrozner wrote:
| There is a lot of dislike for AI detection in these comments.
| Pangram labs (PL) claims very low false positive rates. Here's
| their own blog post on the research:
| https://www.pangram.com/blog/pangram-predicts-21-of-iclr-rev...
|
| I increasingly see AI generated slop across the internet - on
| twitter, nytimes comments, blog/substack posts from smart people.
| Most of it is obvious AI garbage and it's really f*ing annoying.
| It largely has the same obnoxious style and really bad analogies.
| Here's an (impossible to realize) proposal: any time AI-generated
| text is used, we should get to see the whole interaction chain
| that led to its production. It would be like a student writing an
| essay who asks a parent or friend for help revising it. There's
| clearly a difference between revisions and substantial content
| contribution.
|
| The notion that AI is ready to be producing research or peer
| reviews is just dumb. If AI correctly identifies flaws in a
| paper, the paper was probably real trash. Much of the time,
| errors are quite subtle. When I review, after I write my review
| and identify subtle issues, I pass the paper through AI. It
| rarely finds the subtle issues. (Not unlike a time it tried to
| debug my code and spent all its time focused on an entirely OK
| floating point comparison.)
|
| For anecdotal issues with PL: I am working on a 500 word
| conference abstract. I spent a long while working on it but then
| dropped it into opus 4.5 to see what would happen. It made very
| minimal changes to the actual writing, but the abstract (to me)
| reads a lot better even with its minimal rearrangements. That
| surprises me. (But again, these were very minimal rearrangements:
| I provided ~550 words and got back a slightly reduced 450
| words.) Perhaps more interestingly, PL's characterizations are
| unstable. If I check the original claude output, I get "fully AI-
| generated, medium". If I drop in my further refined version
| (where I clean up claude's output), I get fully human. Some of
| the aspects which PL says characterize the original as AI-
| generated (particular n-grams in the text) are actually from my
| original work.
|
| The realities are these:
| a) AI content sucks (especially in style);
| b) people will continue to use AI (often to produce crap) because
| doing real work is hard and everyone else is "sprinting ahead"
| using the semi-undetectable (or at least plausibly deniable) AI
| garbage;
| c) slowly the style of AI will almost certainly infect the
| writing style of actual people (ugh) - this is probably already
| happening; I think I can feel it in my own writing sometimes;
| d) AI detection may not always work, but AI-generated content is
| definitely proliferating.
| This *is* a problem, but in the long run we likely have few
| solutions.
| Jimmc414 wrote:
| This won't convince people to write their own papers. It will
| push them to make their AI generated text harder to detect.
| zkmon wrote:
| Eating one's own dog food? The foremost affected species would be
| the ones who helped create this monster and are standing close
| to it - programmers, researchers, universities - the
| knowledge-worker or knowledge-business species.
| AndrewKemendo wrote:
| AI has left the lab; the conferences and journals are all
| second-class citizens to corporate labs at this point. So many
| technology people wanted to return to the "Bell Labs" model of
| monopolist-controlled innovation. Well, you got it.
|
| I've been to CVPR, NeurIPS and AGI conferences over the last
| decade and they used to be where progress in AI was displayed.
|
| No longer. Progress is all in your github and increasingly only
| dominated by the "new" AI companies (Deepmind, OAI, Anthropic,
| Alibaba etc...)
|
| No major landscape shifting breakthroughs have come out of CSAIL,
| BAIR, NYU, TUM, etc. in the last ~5 years.
|
| I'd expect this will continue as the only thing that matters at
| this point is architecture, data, and compute.
| macleginn wrote:
| This is also the conference where everybody was briefly
| deanonymized due to an OpenReview bug:
| https://eu.36kr.com/en/p/3572028126116993 Now all the review
| scores have been reset, and new area chairs will make all
| decisions from scratch based on the reviews and authors'
| responses.
| Herring wrote:
| I couldn't care less tbh. I just want to know whether they're
| correct or not. We need something like unit testing and
| integration testing, but for ideas.
|
| For the record I actually like the AI writing style. It's a huge
| improvement in readability over most academic writing I used to
| come across.
| blibble wrote:
| well there goes the ASI threat
|
| hoisted by your own petard
| hn_throwaway_99 wrote:
| AI slop has infiltrated so many areas. Check out this article
| that was on the front page of HN last week, "73% of AI startups
| are just prompt engineering", with hundreds of points and lots of
| comments arguing for or against:
| https://news.ycombinator.com/item?id=46024644
|
| The problem is the entire article is made up. Sure, the author
| can trace _client-side_ traffic, but the vast majority of start-
| ups would be making calls to LLMs in their backend (a sequence
| diagram in the article even points this out!!), where it would be
| untraceable. There is certainly no way the author can make a
| broad statement that he knows what's happening across hundreds
| of startups.
|
| Yet lots of comments just taking these conclusions at face value.
| Worse, when other commenters and I pointed out the blatant
| impossibility of the author's conclusion, we got some responses just
| rehashing how the author said they "traced network traffic", even
| though that doesn't make any sense as they wouldn't have access
| to backends of these companies.
| paulpauper wrote:
| Everyone is focused on how 'the humanities' are in decline, but
| STEM is not immune to this trend. The state of AI research leaves
| much to be desired. Tons of low-quality papers being published or
| submitted to conferences. You see this on arXiv a lot in the
| bloated CS section. The site has become a repository for
| blog-post-equivalent papers.
| yumraj wrote:
| Serious question: if the research itself is valid and human
| conducted, what is the problem with an AI-generated (or at least
| AI-assisted) report?
|
| Many of the researchers may not have native command of English
| and even if they do, AI can help with writing in general.
|
| Obviously I'm not referring to pure AI generated BS.
| starchild3001 wrote:
| AI-text detection software is BS. Let me explain why.
|
| Many of us use AI not to write text, but to rewrite text. My
| favorite prompt: "Write this better." In other words, AI is often
| used to fix awkward phrasing, poor flow, bad English, bad
| grammar, etc.
|
| It's very unlikely that an author or reviewer _purely_ relies on
| AI written text, with none of their original ideas incorporated.
|
| As AI detectors cannot tell rewrites from AI-incepted writing,
| it's fair to call them BS.
|
| Ignore...
| mkl wrote:
| https://archive.ph/1cmjJ
| radarsat1 wrote:
| Maybe what they should do in the future is just automatically
| provide AI reviews to all papers and state that the work of the
| reviewers is to correct any problems or fill in details that were
| missed. That would encourage manual review of the AI's work and
| would also allow authors to predict what kind of feedback they'll
| get in a structured way. (eg say the standard prompt used was
| made public so authors could optimize their submission for the
| initial automatic review, forcing the human reviewer to fill in
| the gaps)
|
| ok of course the human reviewers could still use AI here but then
| so could the authors, ad infinitum..
| nojs wrote:
| This may not be as bad as it sounds. Reviews are also presumably
| flagged as "fully AI-generated" if the reviewer wrote bullet
| points and used the LLM to flesh them out.
| insane_dreamer wrote:
| Sorry to say but it's another example of the destructive power of
| AI, along the lines of no longer being able to establish "truth"
| now that any evidence (video, audio, image, etc.) can be
| explicitly faked (yes, AI detectors exist but that will be a
| continuous race with AIs designed to outsmart the detectors). The
| end result could be that peer reviews become worthless and trust
| in scientific research -- already at an all time low -- becomes
| even lower. Sad.
| TomasBM wrote:
| I haven't come across any reviews that I could recognize as
| having been _blatantly_ LLM-generated.
|
| However, almost every peer review I was a part of, pre- and post-
| LLM, had one reviewer who provided a _questionable_ review.
| Sometimes I 'd wonder if they'd even read the submission, and
| sometimes, there were borderline unethical practices like trying
| to farm citations through my submission. Luckily, at least one
| other _diligent_ reviewer would provide a counterweight.
|
| Safe to say that I don't find it surprising, and hearing /
| reading others' experiences tells me it's yet another symptom of
| a barely functioning mechanism that is _peer review_ today.
|
| Sadly, it's the best mechanism that institutions are willing to
| support.
___________________________________________________________________
(page generated 2025-11-29 23:01 UTC)