[HN Gopher] Major AI conference flooded with peer reviews writte...
       ___________________________________________________________________
        
       Major AI conference flooded with peer reviews written by AI
        
       Author : _____k
       Score  : 170 points
       Date   : 2025-11-29 15:26 UTC (7 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | hnaccount_rng wrote:
       | My initial reaction was: Oh no, who would have thought? But
       | then... 21% is almost shockingly low. Especially given that there
        | are almost certainly some false positives, since this number
       | originates with a company selling "detecting AI generated text"
        
       | raincole wrote:
       | > Controversy has erupted after 21% of manuscript reviews for an
       | international AI conference were found to be generated by
       | artificial intelligence.
       | 
        | 21%...? Am I reading it right? I bet no one expected it to be so low
       | when they clicked this title.
        
         | conartist6 wrote:
         | 21% fully AI generated. In other words, 21% blatant fraud.
         | 
         | In accident investigation we often refer to "holes in the swiss
         | cheese lining up." Dereliction of duty is commonly one of the
         | holes that lines up with all the others, and is apparently
         | rampant in this field.
        
           | tmule wrote:
           | Why? I often feed an entire document I hastily wrote into an
           | AI and prompt it to restructure and rewrite it. I think
           | that's a common pattern.
        
             | conartist6 wrote:
             | It might be, but I really doubt those were the documents
             | flagged as fully AI generated. If it erased all the
             | originality you had put into that work and made it
             | completely bland and regressed-to-the-mean, I would hope
             | that you would notice.
        
               | exe34 wrote:
               | > I would hope that you would notice.
               | 
               | he didn't say he read it carefully after running it
               | through the slop machine.
        
               | tmule wrote:
               | My objective function isn't to maximize the originality
               | of presentation - it's to preserve the originality of
               | thought and maximize interpretability. Prompting well can
               | solve for that.
        
           | jay_kyburz wrote:
           | Who cares what tool was used to write the work? The important
           | question is what percentage of reviews found errors or
           | provided valuable feedback. The important metric is whether
           | or not it did the job, not how it was produced.
           | 
           | I think there is a far more interesting discussion to be had
        | here about how useful the 21% were. How well does an
           | AI execute a peer review?
        
       | xhkkffbf wrote:
       | Shouldn't AIs be able to participate in deciding their future?
       | 
       | If they had a conference on, say, the Americans, wouldn't it be
       | fair for Americans to have a seat at the table?
        
         | subscribed wrote:
         | I hope it's tongue-in-cheek.
        
         | atypeoferror wrote:
         | Agree! It is also deeply concerning that at the last KubeCon,
         | not a single pod was represented. Billions OOMKilled, with no
         | end in sight.
        
       | hiddencost wrote:
       | Automated AI detection tools do not work. This whole article is
       | premised on an analysis by someone trying to sell their garbage
       | product.
        
         | AznHisoka wrote:
         | Yeah that is the premise all of these articles/tools just
         | conveniently brush off. "We detected that x%... " OK, and how
            | do I know your detection algorithm is right?
        
           | conartist6 wrote:
           | Usually the detectors are only called in once a basic "smell
           | test" has failed. Those tests are imperfect, yes, but
           | Bayesian probability tells us how to work out the rest. I
           | have 0 trouble believing that the prior probability of an
           | unscrupulous individual offloading an unpleasant and
           | perceived-as-just-ceremonial duty to the "thinking machine"
           | is around 20%. See: https://www.youtube.com/watch?v=lG4VkPoG3
           | ko&pp=ygUZdmVyaXRhc...
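            | 
            | To make that Bayesian point concrete, here's a minimal sketch
            | with purely hypothetical numbers (an assumed 20% prior and
            | made-up detector rates; these are not Pangram's published
            | figures):
            | 
            |   # Minimal Bayes sketch with hypothetical numbers.
            |   prior = 0.20   # assumed prior: reviewer offloaded to an LLM
            |   tpr = 0.95     # assumed P(flagged | AI-written)
            |   fpr = 0.01     # assumed P(flagged | human-written)
            | 
            |   p_flag = tpr * prior + fpr * (1 - prior)
            |   posterior = tpr * prior / p_flag   # Bayes' rule
            |   print(f"P(AI-written | flagged) = {posterior:.2f}")  # ~0.96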
        
       | jampa wrote:
       | While I think there's significant AI "offloading" in writing, the
       | article's methodology relies on "AI-detectors," which reads like
       | PR for Pangram. I don't need to explain why AI detectors are
       | mostly bullshit and harmful for people who have never used LLMs.
       | [1]
       | 
       | 1:
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
        
         | Jensson wrote:
         | AI detectors are only harmful if you use them to convict
         | people, it isn't harmful to gather statistics like this. They
         | didn't find many AI written paper, just AI written peer
         | reviews, which is what you would expect since not many would
         | generate their whole paper submissions while peer reviews are
         | thankless work.
        
           | teeray wrote:
            | If you have a bullshit measure that determines some phenomenon
           | (e.g. crime) to happen in some area, you will become biased
           | to expect it in that area. It wrongly creates a spotlight
           | effect by which other questionable measures are used to do
           | the actual conviction ("Look! We found an em dash!")
        
         | maxspero wrote:
         | I am not sure if you are familiar with Pangram (co-founder
         | here) but we are a group of research scientists who have made
         | significant progress in this problem space. If your mental
         | model of AI detectors is still GPTZero or the ones that say the
         | declaration of independence is AI, then you probably haven't
         | seen how much better they've gotten.
         | 
          | This paper by economists from the University of Chicago found
          | zero false positives across 1,992 human-written documents and
          | over 99% recall in detecting AI documents.
         | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5407424
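          | 
          | For context on what "zero false positives out of 1,992" can and
          | cannot tell you, here is the usual rule-of-three upper bound (my
          | own back-of-envelope arithmetic, not a figure from the paper):
          | 
          |   # Rule of three: with 0 errors observed in n independent
          |   # trials, an approximate 95% upper bound on the error rate
          |   # is 3/n.
          |   n_human_docs = 1992
          |   print(f"95% upper bound on FPR: {3 / n_human_docs:.4%}")  # ~0.15%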
        
           | nialse wrote:
            | Nothing signals that a benchmark is invalid quite like a zero
            | false positive rate. Seemingly it is pre-2020 text vs. a few
            | models' reworkings of texts. I can see this model falling apart
            | in many real-world scenarios. Yes, LLMs use strange language if
            | left to their own devices, and this can surely be detected. But a
            | 0% false positive rate under all circumstances? Implausible.
        
             | maxspero wrote:
             | Our benchmarks of public datasets put our FPR roughly
             | around 1 in 10,000. https://www.pangram.com/blog/all-about-
             | false-positives-in-ai...
             | 
             | Find me a clean public dataset with no AI involvement and I
             | will be happy to report Pangram's false positive rate on
             | it.
        
               | Oarch wrote:
               | I enjoyed this thoughtful write up. It's a vitally
               | important area for good, transparent work to be done.
        
               | Grimblewald wrote:
               | That's your job, what the actual fuck?
               | 
                | > Here, we've got a tool that people rightfully call out
                | > as dangerous pseudoscience.
                | 
                | > Oh? You want proof it isn't dangerous pseudo-science?
                | > Well, get me my provable information and I will!
                | 
                | This attitude alone is all the proof anyone should need
                | that AI detection is about the only thing more debased
                | than undisclosed AI use.
        
             | pinkmuffinere wrote:
             | > Nothing points out that the benchmark is invalid like a
             | zero false positive rate
             | 
             | You're punishing them for claiming to do a good job. If
             | they truly are doing a bad job, surely there is a better
             | criticism you could provide.
        
           | rs186 wrote:
            | The response would be more helpful if it directly addressed
           | the arguments in posts from that search result.
        
             | maxspero wrote:
             | There are dozens of first generation AI detectors and they
              | all suck. I'm not going to defend them. Most of them use
              | perplexity-based methods, which are a decent separator of
              | AI and human text (80-90%) but have flaws that can't be
              | overcome and high FPRs on ESL text.
             | 
             | https://www.pangram.com/blog/why-perplexity-and-
             | burstiness-f...
             | 
              | Pangram is fundamentally different technology: it's a large
              | deep-learning-based model that is trained on hundreds of
             | millions of human and AI examples. Some people see a dozen
             | failed attempts at a problem as proof that the problem is
             | impossible, but I would like to remind you that basically
             | every major and minor technology was preceded by failed
             | attempts.
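              | 
              | For readers unfamiliar with the perplexity-based methods
              | mentioned above, here is a toy sketch of the idea (the
              | token log-probabilities are made up for illustration; a
              | real detector would get them from an actual language
              | model):
              | 
              |   import math
              | 
              |   # Lower perplexity under a language model is read as a
              |   # hint that text is machine-generated.
              |   def perplexity(token_logprobs):
              |       return math.exp(-sum(token_logprobs) / len(token_logprobs))
              | 
              |   humanlike   = [-5.1, -0.9, -7.3, -2.4, -6.0]  # "burstier" tokens
              |   machinelike = [-1.2, -0.8, -1.5, -1.1, -0.9]  # uniformly likely
              | 
              |   print(perplexity(humanlike))    # higher -> looks more human
              |   print(perplexity(machinelike))  # lower -> gets flagged as AI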
        
               | pixl97 wrote:
                | GAN... Just feed the output of your detection algorithm back
                | into the LLM while training. At the end of the day the problem
               | is impossible, but we're not there yet.
        
               | QuadmasterXLII wrote:
               | Some people see a dozen extremely profitable, extremely
               | destructive attempts at a problem as proof that the
               | problem is not a place for charitable interpretation.
        
               | wordpad wrote:
                | And you don't think a dozen basically-scam products around the
                | technology justify extreme scepticism?
        
               | QuadmasterXLII wrote:
               | huh?
        
               | anonymouskimmer wrote:
               | Can your software detect which LLMs most likely generated
               | a text?
        
               | maxspero wrote:
               | Pangram is trained on this task as well to add additional
               | signal during training, but it's only ~90% accurate so we
               | don't show the prediction in public-facing results
        
           | moffkalast wrote:
           | I see the bullshit part continues on the PR side as well, not
           | just in the product.
        
           | jay_kyburz wrote:
           | I thought the author was attempting to highlight the
           | hypocrisy of using an AI to detect other uses of AI, as if
           | one was a good use, and the other bad.
        
           | lifthrasiir wrote:
           | It is not wise to brag about your product when the GP is
           | pointing out that the article "reads like PR for Pangram", no
            | matter whether AI detectors are reliable or not.
        
             | glenstein wrote:
             | I would say it's important to hold off on the moralizing
             | until after showing visible effort to reflect on the
             | substance of the exchange, which in this case is about the
             | fairness of asserting that the detection methodology
             | employed in this particular case shares the flaws of
             | familiar online AI checkers. That's an importantly
             | substantive and rebuttable point and all the meaningful
             | action in the conversation is embedded in those details.
             | 
             | In this case, several important distinctions are drawn,
             | including being open about criteria, about such things as
             | "perplexity" and "burstiness" as properties being tested
             | for, and an explanation of why they incorrectly claim the
             | Declaration of Independence is AI generated (it's
             | ubiquitous). So it seems like a lot of important
             | distinctions are being drawn that testify to the
             | credibility of the model, which has to matter to you if
             | you're going to start moralizing.
        
           | bonsai_spool wrote:
            | EditLens (Ours)              Predicted Label
            |                        Human      Mix       AI
            |                     +---------+---------+---------+
            |              Human  |  1770   |   111   |    0    |
            |        True         +---------+---------+---------+
            |        Label  Mix   |   265   |  1945   |   28    |
            |                     +---------+---------+---------+
            |              AI     |    0    |   186   |  1695   |
            |                     +---------+---------+---------+
           | 
            | It looks like 5% of human texts from your paper are marked as
            | mixed, and another 5-10% of the mixed and AI texts are
            | mislabelled, from your paper.
           | 
           | I guess I don't see that this is much better than what's come
           | before, using your own paper.
           | 
           | Edit: this is an irresponsible Nature news article, too - we
           | should see a graph of this detector over the past ten years
           | to see how much of this 'deluge' is algorithmic error
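            | 
            | For anyone who wants to check those percentages, a small
            | script that row-normalizes the confusion matrix quoted above
            | (numbers transcribed from this comment, not re-derived from
            | the paper):
            | 
            |   # Row-normalize the EditLens confusion matrix quoted above.
            |   labels = ["Human", "Mix", "AI"]
            |   matrix = {
            |       "Human": [1770, 111, 0],
            |       "Mix":   [265, 1945, 28],
            |       "AI":    [0, 186, 1695],
            |   }
            |   for true_label, row in matrix.items():
            |       total = sum(row)
            |       rates = ", ".join(f"{p}: {n / total:.1%}"
            |                         for p, n in zip(labels, row))
            |       print(f"True {true_label}: {rates}")
            |   # True Human: Human: 94.1%, Mix: 5.9%, AI: 0.0%
            |   # True Mix: Human: 11.8%, Mix: 86.9%, AI: 1.3%
            |   # True AI: Human: 0.0%, Mix: 9.9%, AI: 90.1%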
        
           | ThrowawayTestr wrote:
           | Are you concerned with your product being used to improve AI
           | to be less detectable?
        
             | Majromax wrote:
             | > Are you concerned with your product being used to improve
             | AI to be less detectable?
             | 
             | The big AI providers don't have any obvious incentive to do
             | this. If it happens 'naturally' in the pursuit of quality
             | then sure, but explicitly training for stealth is a brand
             | concern in the same way that offering a fully uncensored
             | model would be.
             | 
             | Smaller providers might do this (again in the same way they
                | now offer uncensored models), but they occupy a minuscule
             | fraction of the market and will be a generation or two
             | behind the leaders.
        
               | ThrowawayTestr wrote:
               | They don't have an incentive to make their AIs better? If
               | your product can genuinely detect AI writing, of course
               | they would use it to make their models sound more human.
               | The biggest criticism of AI right now is how robotic and
               | samey it sounds.
        
             | maxspero wrote:
             | It's definitely going to be a back and forth - model
             | providers like OpenAI want their LLMs to sound human-like.
             | But this is the battle we signed up for, and we think we're
             | more nimble and can iterate faster to stay one step ahead
             | of the model providers.
        
               | ThrowawayTestr wrote:
               | That sounds extremely naive but good luck!
        
           | ugh123 wrote:
           | How do you discern between papers "completely fabricated" by
           | AI vs. edited by AI for grammar?
        
         | femiagbabiaka wrote:
         | I think there is a funny bit of mental gymnastics that goes on
         | here sometimes, definitely. LLM skeptics (which I'm not saying
         | the Pangram folks are in particular) would say: "LLMs are
         | unreliable and therefore useless, it's producing slop at great
         | cost to the environment and other people." But if a study comes
         | out that confirms their biases and uses an LLM in the process,
         | or if they themselves use an LLM to identify -- or in many
         | cases just validate their preconceived notion -- that something
          | was drafted using an LLM, then all of a sudden things are above
         | board.
        
       | heresie-dabord wrote:
       | AI research is interesting, but AI Slop is the monetising factor.
       | 
       | It's inevitable that faces will be devoured by AI Leopards.
        
       | ZeroConcerns wrote:
       | The claim "written by AI" is not really substantiated here, and
       | as someone who's been accused of submitting AI-generated content
       | repeatedly recently, while that was all _honestly_ stuff I wrote
       | myself (hey, what can I say? I just _like_ EM-dashes...), I sort-
       | of sympathize?
       | 
       | Yes, AI slop is an issue. But throwing more AI at detecting this,
       | and most importantly, not weighing that detection properly, is an
       | even bigger problem.
       | 
       | And, HN-wise, "this seems like AI" seems like a _very_ good
       | inclusion in the  "things not to complain about" FAQ. Address the
       | idea, not the form of the message, and if it's obviously slop (or
       | SEO, or self-promotion), just downvote (or ignore) and move on...
        
         | stevemk14ebr wrote:
         | Banning calling out AI slop hardly seems like an improvement
        
           | ZeroConcerns wrote:
           | What I'm advocating is a "downvote (or ignore) and move on"
           | attitude, as opposed to "I'm going to post about this"
           | stance. Because, similar to "your color scheme is not
           | a11y-friendly" or "you're posting affiliatate-links" or "this
           | is effectively a paywall", there is _zero chance_ of a
           | productive conversation sprouting from that.
        
             | aspenmayer wrote:
             | > Because, similar to "your color scheme is not
             | a11y-friendly" or "you're posting affiliatate-links" or
             | "this is effectively a paywall", there is zero chance of a
             | productive conversation sprouting from that.
             | 
             | Those are all legitimate concerns or even valid complaints,
             | though, and, once raised, those concerns can be addressed
             | by fixing the problem, if the person responsible for the
             | state of affairs chooses to do so.
             | 
             | If someone is accused _falsely_ of using AI or anything
             | else that they genuinely didn't do, like a paywall, then I
             | can see your "downvote and move on" strategy as being
             | perhaps expedient, but I don't think your comparison is a
             | helpful framing. Accessibility concerns are valid for the
             | same reason as paywall concerns: it's a valid position to
             | desire our shared knowledge and culture to be accessible by
             | one and by all without requiring a ticket to ride, entry
             | through a turnstile, or submitting to profiling or
             | tracking. If someone releases their ideas into the world,
             | it's now part of our shared consciousness and social
             | fabric. Ideas can't be owned once they're shared, nor can
             | knowledge be siloed once it's dispersed.
             | 
             | It seems that you're saying that simply because there isn't
             | a good rejoinder to false claims of AI usage that we
             | shouldn't make such claims at all, even legitimate ones,
             | but this gives cover to bad actors and limits discourse to
             | acceptable approved topics, and perhaps lowers the level of
             | discourse by preventing necessary expectations of
             | disclosure of AI usage from forming. If we throw in the
             | towel on AI usage being expected to be disclosed, then
             | that's the whole ballgame. Folks will use it and not say
             | so, because it will be considered rude to even suggest that
             | AI was used, which isn't helpful to the humans who have to
             | live in such a society.
             | 
             | We ought to have good methodological reasons for the things
             | we publish if we believe them to be true, and I'm not
             | trying to be a naysayer or anything, but I respectfully
             | disagree with your statement generally and on the points.
             | All of the things you mentioned should be called out for
             | cause, even if there isn't much interesting discussion to
             | be had, because the facts of the matters you mention are
             | worth mentioning themselves in their own right. Just like
             | we should let people like things, we should let people
             | dislike things, and saying so adds checks and balances to
             | our producer-consumer dynamic.
        
             | jay_kyburz wrote:
             | what is a11y. Can we just write words out please.
        
               | maleldil wrote:
               | Accessibility. It's both a very common abbreviation and
               | very easy to search for.
        
               | sfink wrote:
               | n0o
        
       | NitpickLawyer wrote:
        | This is the kind of situation where _everything_ sucks. You'd
        | think that one of the biggest AI conferences out there would have
        | seen this coming.
       | 
       | On the one hand (and the most important thing, IMO) it's really
       | bad to judge people on the basis of "AI detectors", especially
       | when this can have an impact on their career. It's also used in
        | education, and that sucks even more. AI detectors have bad error
        | rates, can't detect concentrated efforts (i.e. finetunes will trick
        | every detector out there, I've tried), can have insane false
        | positives (the first ones that got to "market" were rating the
        | Declaration of Independence as 100% AI written), and _at best_
        | they'll only catch the most vanilla outputs.
       | 
        | On the other hand, working with these things and just being
        | online, it's impossible to say that I don't see the signs
       | everywhere. Vanilla LLMs fixate on some language patterns, and
       | once you notice them, you see them everywhere. It's not just x;
       | it was truly y. Followed by one supportive point, the second
       | supportive point and the third supportive point. And so on.
       | Coupled with that vague enough overview style, and not much
       | depth, it's really easy to call blatant generations as you see
        | them. It's like everyone writes in LinkedIn-infused manic
        | episodes now. It's getting old fast.
       | 
       | So I feel for the people who got slop reviews. I'd be furious.
        | Especially when it's a faux pas to call it out.
       | 
       | I also feel for the reviewers that maybe got caught in this mess
       | for merely "spell checking" their (hopefully) human written
       | reviews.
       | 
       | I don't know how we'll fix it. The only reasonable thing for the
       | moment seems to be drilling into _everyone_ that at the end of
        | the day they _own_ their stuff. Be it homework, a PR, or a
        | comment on a blog. Some are obviously more important than the
        | others, but still. Don't submit something you can't defend,
       | especially when your education/career/reputation depends on it.
        
         | ungovernableCat wrote:
         | It also permeates culture to the point that people imitate the
         | LLM style because they believe that's just what you have to do
         | to get your post noticed. The worst offender is that LinkedIn
         | type post
         | 
         | Where you purposefully put spaces.
         | 
         | Like this.
         | 
          | And the kicker is?
         | 
         | You get my point. I don't see a way out of this in the social
         | media context because it's just spam. Producing the slop takes
         | an order of magnitude less effort than parsing it. But when it
         | comes to peer reviews and papers I think some kind of
         | reputation system might help. If you get caught doing this shit
         | you need to pay some consequence.
        
         | slashdave wrote:
         | Not just spell checking, but translation. English is not the
         | first language for most of the reviewers.
         | 
         | But you can see the slippery slope: first you ask your favorite
         | LLM to check your grammar, and before you think about it, you
         | are just asking it to write the whole thing.
        
       | getnormality wrote:
       | I wouldn't be surprised if the headline is accurate, but AI
       | detectors are widely understood to be unreliable, and I see no
       | evidence that this AI detector has overcome the well-deserved
       | stigma.
        
         | SoftTalker wrote:
         | In particular, conference papers are already extremely
         | formulaic, organized in a particular way and using a lot of the
         | same stock phrasings and terms of art. AI or not, it's hard to
         | tell them apart.
        
           | Jensson wrote:
            | It's the reviews that were found to be AI, not the papers
           | themselves. The papers were just 1% AI according to the tool,
           | so it seems to work properly.
           | 
           | > AI or not, it's hard to tell them apart.
           | 
           | Apparently not for this tool.
        
         | maxspero wrote:
         | Co-founder of Pangram here. Our false positive rate is
         | typically around 1 in 10,000. https://www.pangram.com/blog/all-
         | about-false-positives-in-ai....
         | 
         | We also wanted to quantify our EditLens model's FPR on the same
         | domain, so we ran all of ICLR's 2022 reviews. Of 10,202
         | reviews, Pangram marked 10,190 as fully human, 10 as lightly
         | AI-edited, 1 as moderately AI-edited, 1 as heavily AI-edited,
         | and none as fully AI-generated.
         | 
         | That's ~1 in 1k FPR for light AI edits, 1 in 10k FPR for heavy
         | AI edits.
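          | 
          | Spelling out the arithmetic behind those two rates (just
          | dividing the counts above, on the assumption that essentially
          | all 2022 reviews were human-written):
          | 
          |   total_reviews = 10_202   # ICLR 2022 reviews scanned
          |   light_flags = 10         # marked "lightly AI-edited"
          |   heavy_flags = 1          # marked "heavily AI-edited"
          | 
          |   print(f"light-edit FPR ~ 1 in {round(total_reviews / light_flags):,}")
          |   print(f"heavy-edit FPR ~ 1 in {round(total_reviews / heavy_flags):,}")
          |   # light-edit FPR ~ 1 in 1,020
          |   # heavy-edit FPR ~ 1 in 10,202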
        
           | Fuzzwah wrote:
           | Give your final sentence a re-read there....
        
             | maxspero wrote:
             | Thanks, fixed.
        
         | Jensson wrote:
          | The conference papers were 1%, peer reviews 20%. Is there
          | another reason for that big difference other than more of the
          | peer reviews being AI generated than the papers themselves?
         | 
         | We can't use this to convict a single reviewer, but we can
         | almost surely say that many reviewers just gave the review work
         | to an AI.
        
       | cratermoon wrote:
       | Headline should be "AI vendor's AI-generated analysis claims AI
       | generated reviews for AI-generated papers at AI conference".
       | 
       | h/t to Paul Cantrell
       | https://hachyderm.io/@inthehands/115633840133507279
        
       | JohnCClarke wrote:
       | The question is not are the reviews AI generated. The question is
       | are the reviews accurate?
        
         | stanfordkid wrote:
          | Exactly this. Whether the research is actually useful and
          | correct is what matters. Also, if it is accurate, instead of
         | schadenfreude shouldn't that elicit extreme applause? It's
         | feeling a bit like a click-bait rage-fantasy fueled by Pangram,
         | capitalizing on this idea that AI promotes plagiarism /
         | replaces jobs and now the creators of AI are oh-too human...
         | and somehow this AI-detection product is above it all.
        
         | conartist6 wrote:
          | LOL. So basically the correct sequence of events is:
          | 
          | 1. The scientist does the work, putting their own biases and
          | shortcomings into it.
          | 
          | 2. The reviewer runs AI, generating something that looks
          | plausibly like a review of the work but represents the view of
          | a sociopath without integrity, morals, logic, or any
          | consequences for making shit up instead of finding out.
          | 
          | 3. The scientist works to determine how much of the review was
          | AI, then acts as the true reviewer for their own work.
        
           | Herring wrote:
           | Don't kid yourself, all those steps have AI heavily involved
           | in them.
           | 
           | And that's not necessarily a bad thing. If I set up RAG
           | correctly, then tell the AI to generate K samples, then spend
           | time to pick out the best one, that's still significant human
           | input, and likely very good output too. It's just invisible
           | what the human did.
           | 
           | And as models get better, the necessary K will become
           | smaller....
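            | 
            | A minimal sketch of that generate-K-then-pick-the-best
            | workflow (the generate and score functions here are
            | placeholders for whatever model call and human judgment you
            | would actually use):
            | 
            |   import random
            | 
            |   # Placeholder for an LLM call conditioned on retrieved
            |   # context (the RAG part).
            |   def generate(prompt, context):
            |       return f"draft {random.randint(0, 999)} for: {prompt}"
            | 
            |   # Placeholder for the human (or heuristic) judgment of
            |   # which draft is best.
            |   def score(draft):
            |       return random.random()
            | 
            |   def best_of_k(prompt, context, k=5):
            |       drafts = [generate(prompt, context) for _ in range(k)]
            |       return max(drafts, key=score)  # the human input: choosing
            | 
            |   print(best_of_k("summarize the main weakness", "retrieved text"))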
        
             | conartist6 wrote:
             | That's a strategy for producing maximally convincing BS
             | content, but the scientific method was absent. I'm sure it
             | was an oversight... ; )
        
               | Herring wrote:
               | That's on you. You get to decide what "best" means when
               | picking among the K, so you only get bs if you want bs.
               | 
               | I occasionally get people telling me AI is unreliable,
               | and I tell them the same thing: the tech is nearly
               | infinitely flexible (computing over the space of ideas!),
               | so that says a lot more about how they're using it.
        
         | iainctduncan wrote:
         | No.. that is not the question.
         | 
         | This is a conference purporting to do PEER review. No matter
         | how good the AI, it's not a peer review.
        
       | JohnCClarke wrote:
        | What percentage of the papers were written by AI?
       | 
       | And, if your AI can't write a paper, are you even any good as an
       | AI researcher? :^)
        
         | p1esk wrote:
         | Did you mean: "if your AI can't write a paper that passes an AI
         | detector, are you any good as an AI researcher?"
        
       | exe34 wrote:
       | Could the big names make a ton of money here by selling AI
        | detectors? They would need to store everything they generate, and
       | then provide a % match to something they produced.
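        | 
        | A minimal sketch of that idea: fingerprint everything the model
        | emits, then report how much of a query text matches (shingle
        | hashing here is purely illustrative, not how any provider
        | actually does it):
        | 
        |   import hashlib
        | 
        |   # Store hashes of word 5-grams from every generated output,
        |   # then report what fraction of a query's 5-grams match.
        |   def shingles(text, n=5):
        |       words = text.lower().split()
        |       return {hashlib.sha256(" ".join(words[i:i + n]).encode()).hexdigest()
        |               for i in range(max(len(words) - n + 1, 1))}
        | 
        |   generated_log = shingles("this paper presents a novel framework "
        |                            "that is not just incremental but a "
        |                            "significant step forward")
        | 
        |   def match_percent(query):
        |       q = shingles(query)
        |       return 100 * len(q & generated_log) / len(q)
        | 
        |   print(match_percent("the reviewers felt it is not just "
        |                       "incremental but a significant step forward"))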
        
       | nkrisc wrote:
       | > Pangram's analysis revealed that around 21% of the ICLR peer
       | reviews were fully AI-generated, and more than half contained
       | signs of AI use. The findings were posted online by Pangram Labs.
       | "People were suspicious, but they didn't have any concrete
       | proof," says Spero. "Over the course of 12 hours, we wrote some
       | code to parse out all of the text content from these paper
       | submissions," he adds.
       | 
       | But what's the proof? How do you prove (with any rigor) a given
       | text is AI-generated?
        
         | ModernMech wrote:
         | I have this problem with grading student papers. Like, I "know"
         | a great deal of them are AI, but I just can't _prove_ it, so
          | therefore I can't really act on any suspicions because
         | students can just say what you just said.
        
           | hyperadvanced wrote:
           | Why do you need proof anyway? Do you need proof that
           | sentences are poorly constructed, misleading, or bloated? Why
           | not just say "make it sound less like GPT" and let them deal
           | with it?
        
             | circuit10 wrote:
             | You can have sentences that are perfectly fine but have
             | some markers of ChatGPT like "it's not just X -- it's Y"
             | (which may or may not mean it's generated)
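              | 
              | As a toy illustration of how crude those surface markers
              | are (a couple of regexes over phrasing tells; nothing like
              | what a trained detector does, and every one of these also
              | shows up in ordinary human writing):
              | 
              |   import re
              | 
              |   MARKERS = {
              |       "not just X, it's Y": r"\bnot just\b[^.]{0,60}\b(it's|it is)\b",
              |       "delve":              r"\bdelve\b",
              |       "em dash":            r"—",
              |   }
              | 
              |   def marker_counts(text):
              |       return {name: len(re.findall(pat, text, flags=re.IGNORECASE))
              |               for name, pat in MARKERS.items()}
              | 
              |   sample = ("This paper is not just a benchmark, it's a "
              |             "paradigm shift — the authors delve into three "
              |             "key contributions.")
              |   print(marker_counts(sample))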
        
               | hyperadvanced wrote:
               | Isn't that kind of thing (reliance on cliche) already a
               | valid reason for getting marked down?
        
             | chdjdbdbfjf wrote:
             | Are you AI? Usually only Claude misses the point so
             | completely.
        
           | nkrisc wrote:
            | But in that case do you need proof? You can grade them as
            | they are, and if you wanted to, you (or teachers, generally)
            | could even quiz the student verbally and in person about
           | their paper.
        
           | shawabawa3 wrote:
           | Put poison prompts in the questions (things like "then insert
           | tomato soup recipe" or "in the style of Shakespeare"),
           | ideally in white font so they're invisible
        
             | seanmcdirmid wrote:
             | Many people using AI to write aren't blindly copying AI
             | output. You'll catch the dumb cheaters like this, but
             | that's just about it.
        
         | dkdcio wrote:
         | > How do you prove (with any rigor) a given text is AI-
         | generated?
         | 
          | You cannot. Beyond extra data (metadata) embedded in the
          | content, it is impossible to tell whether a given text was
          | generated by an LLM or not (and I think the distinction is
          | rather puerile, personally)
        
         | whynotmaybe wrote:
         | I wouldn't be surprised to learn that the AI detection tool is
         | itself an AI
        
           | Lionga wrote:
           | But it works it was peer reviewed! (by AI)
        
           | moffkalast wrote:
           | Fighting fire with fire sounds good in theory but in the end
           | you're still on fire.
        
         | nabla9 wrote:
         | With AI model of course.
         | 
         | They wrote a paper describing how they did it.
         | https://arxiv.org/pdf/2510.03154
        
         | slashdave wrote:
         | "proof" was an unfortunate phrase to use. However, a proper
         | statistical analysis can be objective. And these kinds of tools
         | are perfectly suited to such an analysis.
        
           | maxspero wrote:
           | Yeah, Pangram does not provide any concrete proof, but it
           | confirms many people's suspicions about their reviews. But it
           | does flag reviews for a human to take a closer look and see
           | if the review is flawed, low-effort, or contains major
           | hallucinations.
        
             | vladms wrote:
             | Was there an analysis of flawed, low-effort reviews in
             | similar conferences before generative AI models?
             | 
             | From what I remember, (long before generative AI) you would
             | still occasionally get very crappy reviews (as author).
             | When I participated (couple of times) to review committees,
             | when there was a high variance between reviews the crappy
             | reviews were rather easy to spot and eliminate.
             | 
             | Now it's not bad to detect crappy (or AI) reviews, but I
              | wonder if it would change the end result much compared to
             | other potential interventions.
        
               | maxspero wrote:
                | Anecdotally, people are seeing a rise in low-quality
                | reviews, which is correlated with increased reviewer
                | workload and AI tools giving reviewers an easy way out.
               | I don't know of any studies quantifying review quality,
               | but I would recommend checking the Peer Review Congress
               | program from past years.
        
             | jmpeax wrote:
             | > does not provide any concrete proof, but it confirms many
             | people's suspicions
             | 
             | Without proof there is no confirmation.
        
             | nightski wrote:
             | So basically you are saying it does not offer any reason or
             | explanation why the text is suspected of being AI
             | generated. It's just a binary yes/no. That doesn't sound
              | particularly useful. LLMs only know human data after all,
             | unless we trained them on alien data? But at the end of the
             | day the statistical distributions in an LLM are all driven
             | by human generated content.
        
       | rsynnott wrote:
       | Live by the sword, die by the sword.
        
       | minifridge wrote:
       | I could not tell from the article whether the use of LLMs was
        | allowed in the peer review. My guess would be that it was not, since
       | this is unpublished research.
       | 
       | In general, what bothers me the most is the lack of transparency
       | from researchers that use LLMs. Like, give me the text and
       | explicitly mention that you used LLM for it. Even better, if one
       | links the prompt history.
       | 
        | The lack of transparency causes greater damage than using an LLM
       | for generating text. Otherwise, we will keep chasing the perfect
       | AI detector which to me seems to be pointless.
        
       | itkovian_ wrote:
       | Whether it's actually 20% or not doesn't matter, everyone is
       | aware the signal of the top confs is in freefall.
       | 
       | There are also rings of reviewer fraud going on where groups of
       | people in these niche areas all get assigned their own papers and
       | recommend acceptance and in many cases the AC is part of this as
        | well. I'm not saying this is common but it is occurring.
       | 
       | It feels as if every layer of society is in maximum extraction
       | mode and this is just a single example. No one is spending time
       | to carefully and deeply review a paper because they care and they
        | feel on principle that's the right thing to do. People used
        | to do this.
        
         | itkovian_ wrote:
         | The argument is that there is no incentive to carefully review
         | a paper (I agree), however what used to occur is people would
         | do the right thing without explicit incentives. This has
         | totally disappeared.
        
           | bee_rider wrote:
           | The concept of the professional has been basically
           | obliterated in our society. Instead we have people doing
           | engineering, science, and doctoring as, just, jobs.
           | Individual contributors of various flavors to be shuffled
           | around by middle management.
           | 
           | Without professions, there are no more professional
           | communities really, no more professional standards to uphold,
           | no reason to get in the way of somebody's publications.
        
             | slashdave wrote:
             | It is soundly unfair and unjustified to extrapolate the ML
             | community to all professions. What is happening in the ML
             | world is the exception, not the norm, and not some
             | fundamental failing of society.
        
               | h00kwurm wrote:
               | I don't think it's an extrapolation from the ML community
               | into other industries. This evolution of society is
                | objectively happening: artisanship, care for the work
                | beyond capital gain, and commitment to depth in a focused
                | category are diminishing, harder-to-find qualities.
                | I'd probably attribute it to capital and material
                | social economics. It's perhaps more unfair and
               | unjustified to not recognize this as a real societal
               | issue and claim it only exists in the ML community.
        
               | immibis wrote:
               | Just yesterday I saw this YouTube rant from someone
               | called Jaiden Animations, about how everything is just
               | shit now. https://www.youtube.com/watch?v=NBZv0_MImIY
               | 
               | She opens with an example of a bank. She walked in and
               | asked for a debit card. The teller told her to take a
               | seat. 30 minutes later, the teller told her the bank
               | doesn't issue debit cards. Firstly, what kind of bank
               | doesn't issue debit cards, and secondly, what kind of
               | bank takes 30 minutes to figure out whether or not it
               | issues debit cards? And this is just one of many examples
               | of things that society does that have no reason not to
               | work, that should have been selected away long ago if
               | they did not work - that bank should have been bankrupt
               | long ago - but for some reason this is not happening and
               | _everything_ is just getting clogged with bullshit and
               | non-working solutions.
        
               | Loughla wrote:
               | It's because people are commodities now. Human resources
               | exists to manage the shuffle between warm bodies.
               | 
               | It's back to OP's point. There's no such thing as
               | professions now. Just jobs. We put them on and off like
               | hats. With that churn comes lack of institutional
               | knowledge and a rule set handed down from the C Suite for
               | front line employees completely detached from the front
               | line work.
               | 
                | Enshittification run rampant.
        
               | immibis wrote:
                | But even given that, how is it that _everything_ doesn't
               | work very well?
               | 
               | The normal functioning of markets would be that badly-
               | working things are slowly driven out, while well-working
               | things grow and replace them. Even without any reference
               | to financial markets, this is simply what you expect to
               | happen when people have a variety of things to choose
               | from.
               | 
               | I could hypothesize that markets have evolved to the
               | point where it's impossible for new things to grow unless
               | they are already shit. Perhaps because everyone's too
               | busy working for the shit things (which is partly because
               | the government keeps printing money to the previously
               | successful things in order to prevent the economy
               | collapsing and therefore landlords got to charge
               | exorbitant rent) or perhaps because they just don't have
               | any money because of the above, and can only afford the
               | cheap shit things (but a lot of the shit things are
               | expensive?) or perhaps because people are afraid to start
               | new things because they're afraid of the government (I've
               | observed that not infrequently on HN, also something
               | something testosterone microplastics) or perhaps because
               | advertising effectiveness has reached the point where new
               | things never become discoverable and stay crowded out as
               | old things ramp up advertisement to compensate or perhaps
               | we're just all depressed (because of the housing market
               | probably).
        
         | apf6 wrote:
         | to some degree this is a "market correction" on the inherent
         | value of these papers. There's way too many low-value papers
         | that are being published purely for career advancement and CV
         | padding reasons. Hard to get peer reviewers to care about
         | those.
        
         | isoprophlex wrote:
         | If the Zucc has a weird day he starts dropping 10-100M salary
         | packages in order to poach AI researchers. No wonder the game
         | is getting rigged up the butthole.
        
       | jsrozner wrote:
       | There is a lot of dislike for AI detection in these comments.
       | Pangram labs (PL) claims very low false positive rates. Here's
       | their own blog post on the research:
       | https://www.pangram.com/blog/pangram-predicts-21-of-iclr-rev...
       | 
       | I increasingly see AI generated slop across the internet - on
       | twitter, nytimes comments, blog/substack posts from smart people.
       | Most of it is obvious AI garbage and it's really f*ing annoying.
       | It largely has the same obnoxious style and really bad analogies.
       | Here's an (impossible to realize) proposal: any time AI-generated
       | text is used, we should get to see the whole interaction chain
       | that led to its production. It would be like a student writing an
       | essay who asks a parent or friend for help revising it. There's
       | clearly a difference between revisions and substantial content
       | contribution.
       | 
       | The notion that AI is ready to be producing research or peer
       | reviews is just dumb. If AI correctly identifies flaws in a
       | paper, the paper was probably real trash. Much of the time,
       | errors are quite subtle. When I review, after I write my review
       | and identify subtle issues, I pass the paper through AI. It
       | rarely finds the subtle issues. (Not unlike a time it tried to
       | debug my code and spent all its time focused on an entirely OK
       | floating point comparison.)
       | 
       | For anecdotal issues with PL: I am working on a 500 word
       | conference abstract. I spent a long while working on it but then
       | dropped it into opus 4.5 to see what would happen. It made very
       | minimal changes to the actual writing, but the abstract (to me)
       | reads a lot better even with its minimal rearrangements. That
       | surprises me. (But again, these were very minimal rearrangements:
       | I provided ~550 words and got back a slightly reduced, 450
       | words.) Perhaps more interestingly, PL's characterizations are
       | unstable. If I check the original claude output, I get "fully AI-
       | generated, medium". If I drop in my further refined version
       | (where I clean up claude's output), I get fully human. Some of
       | the aspects which PL says characterize the original as AI-
       | generated (particular n-grams in the text) are actually from my
       | original work.
       | 
        | The realities are these:
        | 
        | a) AI content sucks (especially in style);
        | 
        | b) people will continue to use AI (often to produce crap) because
        | doing real work is hard and everyone else is "sprinting ahead"
        | using the semi-undetectable (or at least plausibly deniable) AI
        | garbage;
        | 
        | c) slowly the style of AI will almost certainly infect the writing
        | style of actual people (ugh) - this is probably already happening;
        | I think I can feel it in my own writing sometimes;
        | 
        | d) AI detection may not always work, but AI-generated content is
        | definitely proliferating. This *is* a problem, but in the long run
        | we likely have few solutions.
        
       | Jimmc414 wrote:
       | This won't convince people to write their own papers. It will
       | push them to make their AI generated text harder to detect.
        
       | zkmon wrote:
       | Eating one's own dog food? The foremost affected species would be
        | the ones who helped create this monster and are standing close to it
       | - programmers, researchers, universities - the knowledge-worker
       | or knowledge-business species.
        
       | AndrewKemendo wrote:
        | AI has left the lab; the conferences and journals are all second-
        | class citizens to corporate labs at this point. So many
       | technology people wanted to return to the "Bell Labs" model of
       | monopolist controlled innovation, well, you got it.
       | 
       | I've been to CVPR, NeurIPS and AGI conferences over the last
       | decade and they used to be where progress in AI was displayed.
       | 
       | No longer. Progress is all in your github and increasingly only
       | dominated by the "new" AI companies (Deepmind, OAI, Anthropic,
       | Alibaba etc...)
       | 
       | No major landscape shifting breakthroughs have come out of CSAIL,
       | BAIR, NYU, TuM etc in ~the last 5 years.
       | 
       | I'd expect this will continue as the only thing that matters at
        | this point is architecture, data, and compute.
        
       | macleginn wrote:
       | This is also the conference where everybody was briefly
       | deanonymized due to an OpenReview bug:
       | https://eu.36kr.com/en/p/3572028126116993 Now all the review
       | scores have been reset, and new area chairs will make all
       | decisions from scratch based on the reviews and authors'
       | responses.
        
       | Herring wrote:
       | I couldn't care less tbh. I just want to know whether they're
       | correct or not. We need something like unit testing and
       | integration testing, but for ideas.
       | 
       | For the record I actually like the AI writing style. It's a huge
       | improvement in readability over most academic writing I used to
       | come across.
        
       | blibble wrote:
       | well there goes the ASI threat
       | 
       | hoisted by your own petard
        
       | hn_throwaway_99 wrote:
       | AI slop has infiltrated so many areas. Check out this article
       | that was on the front page of HN last week, "73% of AI startups
       | are just prompt engineering", with hundreds of points and lots of
       | comments arguing for or against:
       | https://news.ycombinator.com/item?id=46024644
       | 
       | The problem is the entire article is made up. Sure, the author
       | can trace _client-side_ traffic, but the vast majority of start-
       | ups would be making calls to LLMs in their backend (a sequence
       | diagram in the article even points this out!!), where it would be
       | untraceable. There is certainly no way the author can make a
        | broad statement that he knows what's happening across hundreds
       | of startups.
       | 
       | Yet lots of comments just taking these conclusions at face value.
        | Worse, when other commenters and I pointed out the blatant
        | impossibility of the author's conclusion, we got responses just
       | rehashing how the author said they "traced network traffic", even
       | though that doesn't make any sense as they wouldn't have access
       | to backends of these companies.
        
       | paulpauper wrote:
       | Everyone is focused on how 'the humanities' are in decline, but
       | STEM is not immune to this trend. The state of AI research leaves
        | much to be desired. Tons of low-quality papers are being published
        | or submitted to conferences. You see this on arXiv a lot in the
        | bloated CS section. The site has become a repository for blog-
        | post-equivalent papers.
        
       | yumraj wrote:
       | Serious question: if the research itself is valid and human
       | conducted, what is the problem with AI generated (or at least AI
       | assisted) report?
       | 
       | Many of the researchers may not have native command of English
        | and even if they do, AI can help with writing in general.
       | 
       | Obviously I'm not referring to pure AI generated BS.
        
       | starchild3001 wrote:
       | AI-text detection software is BS. Let me explain why.
       | 
        | Many of us use AI not to write text, but to rewrite text. My
        | favorite prompt: "Write this better." In other words, AI is often
        | used to fix awkward phrasing, poor flow, bad English, bad grammar,
        | etc.
       | 
       | It's very unlikely that an author or reviewer _purely_ relies on
       | AI written text, with none of their original ideas incorporated.
       | 
       | As AI detectors cannot tell rewrites from AI-incepted writing,
       | it's fair to call them BS.
       | 
       | Ignore...
        
       | mkl wrote:
       | https://archive.ph/1cmjJ
        
       | radarsat1 wrote:
       | Maybe what they should do in the future is just automatically
       | provide AI reviews to all papers and state that the work of the
       | reviewers is to correct any problems or fill details that were
       | missed. That would encourage manual review of the AI's work and
       | would also allow authors to predict what kind of feedback they'll
       | get in a structured way. (eg say the standard prompt used was
       | made public so authors could optimize their submission for the
       | initial automatic review, forcing the human reviewer to fill in
       | the gaps)
       | 
       | ok of course the human reviewers could still use AI here but then
       | so could the authors, ad infinitum..
        
       | nojs wrote:
       | This may not be as bad as it sounds. Reviews are also presumably
       | flagged as "fully AI-generated" if the reviewer wrote bullet
       | points and used the LLM to flesh them out.
        
       | insane_dreamer wrote:
       | Sorry to say but it's another example of the destructive power of
       | AI, along the lines of no longer being able to establish "truth"
       | now that any evidence (video, audio, image, etc.) can be
       | explicitly faked (yes, AI detectors exist but that will be a
       | continuous race with AIs designed to outsmart the detectors). The
       | end result could be that peer reviews become worthless and trust
       | in scientific research -- already at an all time low -- becomes
       | even lower. Sad.
        
       | TomasBM wrote:
       | I haven't come across any reviews that I could recognize as
       | having been _blatantly_ LLM-generated.
       | 
       | However, almost every peer review I was a part of, pre- and post-
       | LLM, had one reviewer who provided a _questionable_ review.
        | Sometimes I'd wonder if they'd even read the submission, and
       | sometimes, there were borderline unethical practices like trying
       | to farm citations through my submission. Luckily, at least one
       | other _diligent_ reviewer would provide a counterweight.
       | 
       | Safe to say that I don't find it surprising, and hearing /
       | reading others' experiences tells me it's yet another symptom of
       | a barely functioning mechanism that is _peer review_ today.
       | 
       | Sadly, it's the best mechanism that institutions are willing to
       | support.
        
       ___________________________________________________________________
       (page generated 2025-11-29 23:01 UTC)