[HN Gopher] Spotting LLMs with Binoculars: Zero-Shot Detection of
Machine-Generated Text
___________________________________________________________________
Spotting LLMs with Binoculars: Zero-Shot Detection of Machine-
Generated Text
Author : victormustar
Score : 54 points
Date : 2024-01-23 20:29 UTC (2 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ISL wrote:
| Is a layman's interpretation of this to say: LLMs tend to
| perform like aggregated humanity, but any given human will
| differ. Since almost all the volume of a high-dimensional
| sphere is near its surface, almost nobody is close to the
| mean, so the false positive rate is low?
|
| It's a clever plan, until the LLMs do some adversarial
| training....
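|
| A quick numeric check of the "volume at the edge" intuition (an
| illustrative sketch, not from the paper): the fraction of a
| d-dimensional ball's volume lying within 1% of its surface is
| 1 - 0.99^d, which rushes toward 1 as d grows.
|
|   # Fraction of a d-ball's volume within 1% of the surface:
|   # the inner ball of radius 0.99 holds 0.99**d of the volume.
|   for d in (2, 10, 100, 1000):
|       print(d, 1 - 0.99 ** d)
|   # 2 -> 0.02, 10 -> 0.10, 100 -> 0.63, 1000 -> 0.99996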
| adamgordonbell wrote:
| This is a super clear explanation!
|
| Perhaps this measurement approximates a human reaction to
| ChatGPT: 'This writing is distinctly indistinct.'
| actionfromafar wrote:
| Made me think of how bomber-plane cockpits were designed for
| the average human, which meant they fit no human.
| sdsaga12 wrote:
| Sounds like the bed of Procrustes:
| https://en.wikipedia.org/wiki/Procrustes#Mythology
| __loam wrote:
| Yeah, it kind of sucks for people who don't like these systems:
| any effort to resist them is essentially the same adversarial
| process used to train GANs.
| shagie wrote:
| https://www.smithsonianmag.com/history/what-the-luddites-rea...
|
| > They did not invent a machine to destroy technology, but
| they knew how to use one. In Yorkshire, they attacked frames
| with massive sledgehammers they called "Great Enoch," after a
| local blacksmith who had manufactured both the hammers and
| many of the machines they intended to destroy. "Enoch made
| them," they declared, "Enoch shall break them."
|
| ... And another reference for the phrase...
|
| https://www.nigeltyas.co.uk/nigel-tyas-news/post/enoch-the-p...
|
| > And here's the funny thing. The weapons they reached for to
| wield and smash the machines were sledge hammers made by ...
| the Taylor brothers of Marsden. This irony was not lost on
| the Luddites and as they swung 'Enoch's hammers' to damage
| his hated machines they cried: "Enoch made them, and Enoch
| shall break them".
| lawlessone wrote:
| >It's a clever plan, until the LLMs do some adversarial
| training....
|
| It's an unwinnable war.
| jjackson5324 wrote:
| > false positive rate of 0.01%
|
| What would be an acceptable false positive rate for something
| like this to be used at schools and universities?
|
| Like, obviously 0.01% is not acceptable, but what would be?
| Etheryte wrote:
| Why do you think this rate is not acceptable? I would say it's
| more than acceptable, even as a single data point. If someone's
| submission comes up as positive on two separate occasions
| you've pretty much eliminated the chance of a false positive.
| declaredapple wrote:
| I would agree with that it's usable as a single datapoint,
| used with others (other submissions, general performance,
| etc).
|
| However given that we already have professors literally
| failing people by just pasting and asking chatgpt, I'm not
| sure I'm comfortable with that.
| FLT8 wrote:
| Unless that person's writing style just happens to be very
| close to how the average human writes? If it's 0.01% for "any
| given human", I wonder what the numbers would look like for a
| human closer to average than usual?
| wrs wrote:
| Only if the measures for an individual are uncorrelated,
| which there's no reason to assume.
|
| It needs to be <= 0.01% false positive _for each individual
| author_. If it's just that 0.01% of _all_ tests are false
| positive, that leaves the possibility than a given individual
| might have anywhere up to 100% false positives.
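|
| A toy simulation (hypothetical numbers) of the difference: both
| scenarios below produce roughly the same number of flags
| overall, but in the correlated one every flag lands on the same
| handful of "average-sounding" authors.
|
|   import random
|   random.seed(0)
|   AUTHORS, TESTS, FPR = 100_000, 10, 1e-4
|
|   # Independent errors: every author risks 0.01% on every test.
|   ind_flags = sum(random.random() < FPR
|                   for _ in range(AUTHORS * TESTS))
|
|   # Correlated errors: 0.01% of authors have "average-sounding"
|   # prose and are flagged 90% of the time; everyone else, never.
|   unlucky = int(AUTHORS * FPR)          # 10 authors
|   cor_flags = sum(random.random() < 0.9
|                   for _ in range(unlucky * TESTS))
|
|   print(ind_flags, cor_flags)  # ~100 vs ~90 flags overall, but
|                                # the ~90 all hit the same 10 people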
| Lerc wrote:
| I feel like 0.01% is one of the more dangerous levels of
| false positives. Widespread use would result in far more than
| 10,000 tests resulting in the near certainty that innocent
| people would be accused. The moderately low false positive
| rate would then be leaned on to imply guilt.
|
| You might find for every 10,000 tests you get around 151
| targets. For each one individually you can say they probably
| did it, but cumulatively you can say that one of them is
| probably innocent.
|
| Consider using two positives. Do schools and universities
| generate 100,000,000 essays per year? Sure would suck for
| that innocent person tagged as having a one in a hundred
| million chance of not being guilty.
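|
| Back-of-envelope arithmetic for those figures (the reading of
| "151" below is an assumption: roughly 150 true positives from
| ~1.5% of essays being machine-written, plus ~1 false positive):
|
|   fpr = 1e-4                    # 0.01% false positive rate
|   tests = 10_000
|   print(tests * fpr)            # ~1 innocent per 10,000 tests
|   print(tests * 0.015 + tests * fpr)   # ~151 "targets" in total
|
|   essays = 100_000_000          # hypothetical yearly essay volume
|   print(essays * fpr ** 2)      # ~1 innocent per year even if two
|                                 # independent positives are required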
| yreg wrote:
| 0.01% sounds far better than what I would expect!
|
| If the tech froze at its current state, this would be useful
| for schools. You don't need to expel a student right away after
| finding a match, but it is a strong indication that something
| is worth looking into.
|
| (If the goal is to make students write essays, theses, etc.
| without an LLM writing it for them.)
| shagie wrote:
| > (If the goal is to make students write essays, theses, etc.
| without an LLM writing it for them.)
|
| This is one of those "I don't think this is the path we should
| be taking" situations.
|
| When I was in school, my parents would proofread my essays to
| catch the spelling and grammatical errors in what I wrote
| (Bank Street Writer had a rudimentary spelling checker, but
| that was it -
| https://en.wikipedia.org/wiki/Bank_Street_Writer ).
|
| While my parents are both native English speakers and college
| educated, some of my classmates had less involved parents, or
| parents without the same degree of proficiency with writing.
| Did their essays suffer from a lack of parental proofreading?
|
| In the past few months I wrote two short works of fiction as
| lore for a game that I play. I used ChatGPT as an editor for
| those works, having it look the text over and occasionally
| prompting it to help refine a passage.
|
| https://chat.openai.com/share/204de7f7-9cd7-4c45-aa2b-556791...
| for part of the editor session with it.
|
| Having ChatGPT act as an editor (not a text editor, but a
| critic of the text) helped refine the text that I wrote.
|
| Working _with_ ChatGPT as a tool (one that goes far beyond the
| red squiggles in a word processor) to help people work with
| the written word is a good and useful endeavor. This isn't
| trying to have ChatGPT supplant human creativity, but rather
| to help the person communicate more clearly.
|
| ---
|
| I am leaving this post in unedited form, but here is _this_
| post with ChatGPT as an editor, as an example of how I believe
| students should try to interact with it.
| https://chat.openai.com/share/d891f9ac-923b-47a8-8de9-ab7301...
| yreg wrote:
| Yeah, at some point it's sensible to learn using all the
| available tools. Use calculators in math class, the web
| while programming, etc.
|
| Still, there is a reason why calculators are not used from the
| very first grade: it makes sense to learn how to do basic
| calculations without them.
| vicgalle_ wrote:
| > https://twitter.com/minimaxir/status/1749893683137454194
|
| Not very promising, though.
| yreg wrote:
| Well that is not a novel text, is it?
| comex wrote:
| The paper itself goes into detail about why the US Constitution
| and other memorized texts are misclassified. It's surprising
| but not a killer flaw, since in most contexts it would only
| apply to direct quotes of famous texts.
| TuringNYC wrote:
| The false positive rate would kill most use cases here. Even 1
| in 10,000 false accusations of an academic integrity violation
| would be too many.
| berkes wrote:
| Is it bad if it's an accusation?
|
| Wouldn't it need additional data, such as actual proof, to
| become an allegation or even a claim or charge?
| lawlessone wrote:
| >Wouldn't it need additional data, such as actual proof, to
| become an allegation or even a claim or charge?
|
| Yes but that hasn't stopped people before.
| JoeJonathan wrote:
| Do you really think professors flag plagiarism only when it's
| cut and dry? Absolutely not. Plenty, if not most, flagged cases
| of plagiarism are ambiguous. The process at most colleges and
| universities typically accounts for this via something like
| review by an academic integrity committee.
| binsquare wrote:
| I'm not convinced that we're on the right path in detecting
| AI-generated content.
|
| We've been looking at the end result and drawing conclusions
| about the journey, and that will always come with degrees of
| uncertainty. A false positive rate of 0.01% now probably will
| not hold up as people adapt and grow alongside AI content.
|
| I wonder if anyone's working on software that documents the
| journey of the output, similar to git commits, so that we can
| analyze both the metadata (journey) and the output (end
| result) to determine human authenticity.
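|
| A minimal sketch of that idea (a purely hypothetical design,
| not an existing tool): snapshot a draft at intervals, like git
| commits, so a reviewer can inspect the edit history alongside
| the final text.
|
|   import hashlib, json, time
|
|   class DraftLog:
|       """Append-only log of draft snapshots."""
|       def __init__(self):
|           self.commits = []
|
|       def commit(self, text, note=""):
|           # Hash the draft so later tampering is detectable.
|           digest = hashlib.sha256(text.encode()).hexdigest()[:12]
|           self.commits.append({"id": digest, "time": time.time(),
|                                "chars": len(text), "note": note})
|           return digest
|
|   log = DraftLog()
|   log.commit("Outline: intro, method, results", "first pass")
|   log.commit("Intro drafted; method half done", "session 2")
|   print(json.dumps(log.commits, indent=2))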
| tomaskafka wrote:
| Edit history is awesome training data (as it shows the train
| of thought), but I can imagine that models will easily learn
| to generate it.
| Imnimo wrote:
| According to their demo, the Limitations section of their
| GitHub repo is AI-generated.
|
| >All AI-generated text detectors aim for accuracy, but none are
| perfect and can have multiple failure modes (e.g., Binoculars is
| more proficient in detecting English language text compared to
| other languages). This implementation is for academic purposes
| only and should not be considered as a consumer product. We also
| strongly caution against using Binoculars (or any detector)
| without human supervision.
| etwigg wrote:
| If the text is good, and someday it will be, I don't care if an
| LLM wrote it. If it's bad, I don't care if a person wrote it.
|
| The only reason to care is that the implicit proof-of-work signal
| has broken because LLM text is so cheap. Open forums might need
| to be pay-per-submission someday...
| Jedd wrote:
| The problem as I see it is that 'text is good' has two
| distinct meanings. First, it doesn't sound like an AI wrote
| it; it's interesting, entertaining, in a word 'readable'. The
| second is that it's accurate / true.
|
| It feels like we're happy to take the first as a surrogate for
| the second, or at least being good at the first drops our guard
| on questioning the second.
| Jedd wrote:
| https://huggingface.co/spaces/tomg-group-umd/Binoculars
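|
| The linked demo scores text as, roughly, the ratio of a
| model's log-perplexity on the text to a cross-perplexity
| between two closely related models. A minimal sketch of that
| idea with HuggingFace transformers (an illustration of the
| paper's approach, not the official implementation; the role
| assignment of the two models is simplified here):
|
|   import torch
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   # The paper pairs Falcon-7B with Falcon-7B-Instruct, which
|   # share a tokenizer.
|   tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
|   observer = AutoModelForCausalLM.from_pretrained(
|       "tiiuae/falcon-7b")
|   performer = AutoModelForCausalLM.from_pretrained(
|       "tiiuae/falcon-7b-instruct")
|
|   def binoculars_score(text):
|       ids = tok(text, return_tensors="pt").input_ids
|       with torch.no_grad():
|           obs = observer(ids).logits[:, :-1]   # predict token i+1
|           perf = performer(ids).logits[:, :-1]
|       targets = ids[:, 1:]
|       log_obs = torch.log_softmax(obs, dim=-1)
|       # Log-perplexity of the text under the observer.
|       log_ppl = -log_obs.gather(-1, targets.unsqueeze(-1)).mean()
|       # Cross-perplexity: expected observer surprisal under the
|       # performer's predicted next-token distribution.
|       x_log_ppl = -(torch.softmax(perf, dim=-1)
|                     * log_obs).sum(-1).mean()
|       # Low scores mean "unsurprising given the model", i.e.
|       # likely machine-generated.
|       return (log_ppl / x_log_ppl).item()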
___________________________________________________________________
(page generated 2024-01-23 23:00 UTC)