[HN Gopher] Scalable watermarking for identifying large language...
___________________________________________________________________
Scalable watermarking for identifying large language model outputs
Author : ghshephard
Score : 55 points
Date : 2024-10-31 18:00 UTC (4 days ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| viraptor wrote:
| What they didn't put in the limitations or other sections (unless
| I missed it) is that it can only apply to longer, free-form text,
| not to structured or repetitive output. For example, if you want
| to watermark generated code, you can't produce it as a diff
| against the existing file - the sampling changes will cause
| unwanted modifications.
|
| Similarly, a request like "fix grammar in this long text" will
| have to tweak random words for no reason, because the existing
| text can't be reproduced exactly while injecting SynthID.
| jsenn wrote:
| This is discussed in the "Watermarking with Synth-ID Text"
| section right after they define the Score function:
|
| > There are two primary factors that affect the detection
| performance of the scoring function. The first is the length of
| the text x: longer texts contain more watermarking evidence,
| and so we have more statistical certainty when making a
| decision. The second is the amount of entropy in the LLM
| distribution when it generates the watermarked text x. For
| example, if the LLM distribution is very low entropy, meaning
| it almost always returns the exact same response to the given
| prompt, then Tournament sampling cannot choose tokens that
| score more highly under the g functions. In short, like other
| generative watermarks, Tournament sampling performs better when
| there is more entropy in the LLM distribution, and is less
| effective when there is less entropy.
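|
| (For intuition, here is a toy, single-layer version of that
| tournament idea in Python - not the paper's actual multi-layer
| scheme, and with a made-up g function:
|
|   import random
|
|   def g(token, key):
|       # Made-up keyed score in [0, 1); a stand-in for the
|       # paper's g functions, not the real thing.
|       return random.Random(hash((key, token))).random()
|
|   def tournament_sample(probs, key):
|       # Toy single-layer tournament: draw two candidates from
|       # the model's distribution and keep the one g prefers.
|       # With a near-deterministic distribution both draws are
|       # almost always the same token, so g has nothing to
|       # choose between and little watermark evidence is added.
|       tokens, weights = zip(*probs.items())
|       a, b = random.choices(tokens, weights=weights, k=2)
|       return a if g(a, key) >= g(b, key) else b
|
| With probs = {"foo": 0.98, "bar": 0.01, "baz": 0.01} the result
| is almost always "foo" whatever the key; with a flatter
| distribution the keyed preference starts to show.)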
| zebomon wrote:
| Worth pointing out that while watermarking is mathematically
| reliable, the scammers who are selling "AI detection" don't have
| the weight-level access that it requires.
| eitland wrote:
| > watermarking is mathematically reliable
|
| So should accounting be. To a much higher degree.
|
| Yet I hope most of us are aware of the British Post Office
| scandal, where what should have been straightforward accounting
| software falsely accused thousands of subpostmasters of theft,
| over 900 of whom were convicted of theft, fraud and false
| accounting.
|
| If this can happen in something as utterly boring as an
| accounting system should be in this millennium, I don't think
| we should trust AI fraud detection in science and academia
| until we get a few decades of experience with it.
|
| (Do I think _records_ from accounting can be used as evidence?
| Absolutely, given the right circumstances: we can know they
| haven't been tampered with, etc.
|
| What I don't think, however, is that pattern matching or
| "watermarks" that indicate _probability_ should be used as
| evidence. Especially not closed-source systems with secret
| distributions and watermarking algorithms.)
| zebomon wrote:
| I agree with you completely.
|
| In the wild, there are too many variables to use watermarking
| to draw meaningful conclusions about any piece of text, no
| matter the word count. Scott Aaronson described well one of
| those variables, "the pineapple attack," in his Aug. 2023
| talk at Simons [1].
|
| Watermarking is illuminating to the ongoing study of language
| models' functionality, but it doesn't put the genie back in
| the bottle.
|
| 1. https://www.youtube.com/watch?v=2Kx9jbSMZqA
| michaelt wrote:
| As far as I can tell, what they're proposing is:
|
| Today, for each output token the LLM produces a probability for
| each possible output, then a 'sampler' makes a probability-
| weighted random choice. If the next-token probabilities are 90%
| for foo, 9% for bar and 1% for baz, then the sampler draws a
| random number between 0 and 1: if it's below 0.9 it outputs foo,
| between 0.9 and 0.99 it outputs bar, and between 0.99 and 1 it
| outputs baz.
|
| But what if instead of using random numbers, you had a source of
| evenly distributed random numbers that was deterministic, based
| on some secret key?
|
| Each candidate token would remain just as likely as it was before
| - there would still be a 90% chance of foo being chosen. So the
| output shouldn't degrade in quality.
|
| And sure, some tokens will have 99.999% probability and their
| selection doesn't tell you much. But in most real-world use
| multiple wordings are possible and so on. So across a large
| enough sample of the output, you could detect whether the sampler
| was following your secret deterministic pattern.
|
| Of course the downside is you've got to check on exactly the same
| LLM, and only people with the secret key can perform the check.
| And it's only applicable to closed-source LLMs.
|
| I'm also not quite sure if it works when you don't know the exact
| prompt - so maybe my understanding of the paper is all wrong?
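|
| (A minimal sketch of that keyed-sampler idea in Python - the
| hashing scheme and names here are made up for illustration, not
| what the paper actually does:
|
|   import hashlib
|
|   def keyed_uniform(context_tokens, secret_key):
|       # Deterministic stand-in for the sampler's random draw:
|       # hash the recent context plus a secret key into [0, 1).
|       data = secret_key + "::" + " ".join(context_tokens)
|       digest = hashlib.sha256(data.encode()).digest()
|       return int.from_bytes(digest[:8], "big") / 2**64
|
|   def watermarked_sample(probs, context_tokens, secret_key):
|       # probs: e.g. {"foo": 0.90, "bar": 0.09, "baz": 0.01}.
|       # Each token keeps its original probability; only the
|       # "randomness" is reproducible by whoever holds the key.
|       r = keyed_uniform(context_tokens, secret_key)
|       cumulative = 0.0
|       for token, p in probs.items():
|           cumulative += p
|           if r < cumulative:
|               return token
|       return token  # guard against rounding error
|
| Someone with the key and the same model can recompute r at each
| position and check whether the observed tokens keep landing
| where the keyed draw would have put them - which matches the
| point that you have to check against exactly the same LLM.)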
| genrilz wrote:
| My understanding from the "Watermark Detection" section is that
| it only requires the key and the output text in order to do the
| detection. In particular, it seems like the random seed used
| for each token is based only on the previous 4 tokens and the
| LLM-specific key, so for any output longer than 4 tokens you
| can start to get a signal.
|
| I don't think the key actually needs to be secret, as it's not
| trying to cryptographically secure anything. So all closed-
| weights LLM providers could just publicly share the keys they
| use for watermarking, and then anybody could use them to check
| whether a particular piece of text was generated by a particular
| LLM.
|
| That being said, I think you are right about this only really
| being useful for closed-weights models. If you have the
| weights, you can just run the model through a standard sampler
| and it won't be watermarked.
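|
| (Sketching that detection side in Python - the g function and
| scoring here are invented for illustration, not the paper's
| actual Score function:
|
|   import hashlib
|
|   def g(token, window, key):
|       # Keyed pseudo-random value in [0, 1) from the token,
|       # the previous-4-token window, and the key; the
|       # watermarking sampler is assumed to have nudged
|       # generation toward tokens with high g.
|       data = key + "::" + " ".join(window) + "::" + token
|       digest = hashlib.sha256(data.encode()).digest()
|       return int.from_bytes(digest[:8], "big") / 2**64
|
|   def watermark_score(tokens, key, window_size=4):
|       # Needs only the key and the text. Unwatermarked text
|       # should average about 0.5; watermarked text should
|       # drift higher, with more confidence the longer the
|       # text is.
|       vals = [g(tok, tokens[max(0, i - window_size):i], key)
|               for i, tok in enumerate(tokens)]
|       return sum(vals) / len(vals)
|
| watermark_score("some output text".split(), "shared-key") runs
| as-is; in practice you would compare the score to a threshold
| calibrated on unwatermarked text.)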
| danielmarkbruce wrote:
| Why would anyone ever use such a model? And then, given the
| significant reduction in users, why would any closed model
| service do this?
|
| Seems like a cool theoretical trick that has little practical
| implication.
| genrilz wrote:
| There are situations where the model output being
| watermarked doesn't matter. For instance, I hear people on
| HN asking LLMs to explain things to them all the time
| (which I think is a bad idea, but YMMV), and people use LLMs
| to write code quickly (which I think is at least possibly
| a good idea). There are also some content farms which churn
| out low-quality books on Amazon on various topics, and I
| don't think they care if they get caught using LLM outputs.
|
| Thus it might reduce usage some, but it certainly wouldn't
| block all usage. Additionally, there are only a few
| providers of truly massive LLMs on the market right now. If
| they decided that doing this would be a social good, or
| more likely that it would bring bad PR to not do this when
| their competitors do, then they would at least be able to
| watermark all of the massive LLM outputs.
| danielmarkbruce wrote:
| You say that as though there isn't a choice though. There
| will always be a good offering that doesn't watermark.
|
| And there is no good reason for a provider to watermark -
| they aren't helping the customer. They'd be helping some
| other party who isn't paying them.
|
| This will never be a thing.
| IanCal wrote:
| I'm totally happy having huge amounts of my use of LLMs
| identifiable as coming from an LLM. I don't see many important
| cases for me where I need to pretend it wasn't from an LLM.
|
| I will happily lose those cases for increased performance,
| that's the thing I care about.
|
| Are there normal cases where you picture this as an issue?
| eitland wrote:
| Not a problem for me. I am not a student anymore.
|
| And I am not against LLM output being identifiable as
| such. (although I think an argument could be made based
| on the ruling about the monkey and the camera, which IIRC
| would say that the copyright belongs to whoever created
| the situation).
|
| But after the
|
| 1. British Post Office scandal and
|
| 2. some really high profile cases of education
| institutions here in Norway abusing plagiarism detectors
|
| I do not feel ready to trust either
|
| 1. complex software (and especially not closed-source
| software) to tell us who is cheating or not
|
| 2. or any human's ability to use such a system in a
| sensible way
|
| While cheating cases don't usually end up in criminal court,
| students also usually do not get a free defense.
|
| For this reason I suggest cheating should have to be
| _proven to have occurred_, not "suggested to probably
| have occurred" by the same people who create the not
| very reliable and extremely hard-to-reproduce LLMs.
| danielmarkbruce wrote:
| Increased performance? Watermarking will not increase
| performance. They are talking about tilting the decoding
| process in minor ways. It won't help (or hurt much)
| performance.
| zmgsabst wrote:
| What happens if I tell the LLM to reword every other sentence?
| -- or every 5th word?
|
| I must be missing something, because this seems to assume a
| contiguous output.
| genrilz wrote:
| It's possible that this might break the method, but what
| seems most likely to me is that the LLM will simply reword
| every 5th word with some other word that it is more likely to
| use due to the watermark sampling. Thus the resulting output
| would display roughly the same level of "watermarkedness".
|
| You might be able to have one LLM output the original, and
| then another to do a partial rewording though. The resulting
| text would likely have higher than chance "watermarkedness"
| for both LLMs, but less than you would expect from a plain
| output. Perhaps this would be sufficient for short enough
| outputs?
| namrog84 wrote:
| What happens when we are all reading LLM output all the time,
| and simply start to adapt more to LLM writing styles and word
| choices, possibly watermarking our own original writing
| without realizing it?
| genrilz wrote:
| You might be right, but my first instinct is that this
| probably wouldn't happen enough to throw off the
| watermarking too badly.
|
| The most likely word to be used is based on the previous
| four, and the watermark only works if there is enough
| entropy present that any one of multiple words would do.
| Thus it's not a simple matter of humans picking up
| particular word choices. There might be some cases where
| there are 3 tokens in a row that occur with low entropy
| after the first token, and then one token generation with
| high entropy at the end. That would cause a particular
| 5-word phrase to occur. Otherwise, the word choice would
| appear pretty random. I don't think humans pick up on stuff
| like that even subconsciously, but I could be wrong.
|
| I would be interested to see if LLMs pick up the
| watermarks when fed watermarked training data though.
| Evidently ChatGPT can decode base64, [0] so it seems like
| these things can pick up on some pretty subtle patterns.
|
| [0] https://www.reddit.com/r/ChatGPT/comments/1645n6i/i_noticed_...
| Buttons840 wrote:
| I haven't been in school since LLMs became useful, but if I
| were to "cheat", I'd ask the LLM for a very fine grained
| outline, and then just translate the outline into my own words.
| Then ask the LLM to fill in my citations in AMA format.
|
| And honestly, this still retains like 95% of the value of
| writing a paper, because I did write it, the words did flow
| through my brain. I just used the LLM to avoid facing a blank
| page.
|
| I've also thought about asking LLMs to simulate a forum
| conversation about the Civil War (or whatever the topic may
| be), and include a wrong comment that can be countered by
| writing exactly what the assignment requires, because I seem to
| have no trouble writing an essay when duty calls and someone is
| wrong on the internet.
| qeternity wrote:
| These things don't have to be foolproof to be effective
| deterrents.
| awongh wrote:
| After skimming through the paper I can't immediately pick out
| the data that says how much certainty there is in detecting a
| watermark for a given text, or a graph of that certainty as the
| text size grows. (They seem to assert that the certainty grows
| as the token count goes up, but it's not clear by how much.)
|
| I worry (and have already read worrying things) about "cheating
| detection" tools that have been deployed in schools. My intuition
| would be that there's just too much entropy between something
| like an essay prompt and the essay itself. I guess it also
| depends on how specific the teacher's essay prompt is as well.
| eitland wrote:
| > I worry (and have already read worrying things) about
| "cheating detection" tools that have been deployed in schools.
|
| This is my worry as well.
|
| Punishment for cheating can easily set a student back a
| year or more. This is fair if the student has been cheating,
| but it is really harsh.
|
| So while this isn't criminal court, I think schools should
| apply the same principles here: innocent until proven guilty.
|
| And in my view, secret probability distributions aren't exactly
| good proof.
|
| Furthermore, to make it even worse: if someone is actually
| innocent, it will be next to impossible to argue their
| innocence, since everyone will trust the system and, as far as
| I can see, the system cannot actually be verified by a review
| board without disclosing the weights. And that is assuming they
| would care to try to help a student prove their innocence in
| the first place.
|
| AFAIK this is a topic that has been explored to some depth in
| science fiction, but more importantly, we have cases like the
| postal service in the UK, where multiple people lost their jobs
| because nobody could believe the system they had built or paid
| for could make such crazy mistakes.
|
| Back to students: for a less privileged student I guess it can
| easily ruin their studies. TBH, as someone who struggled a lot
| in school, I am not sure I'd have finished if my studies had
| been delayed by a year. Which would have been sad, given how
| well I have managed once I didn't have to juggle full-time
| studies and part-time work.
|
| Recently (last year and this) we (Norway) have had some debates
| that seemed to be way overdue regarding what can be considered
| cheating (with some ridiculous examples of students getting
| punished for "self-plagiarism" for the most absurd things,
| including not citing a previous failed exam, written by
| themselves, as a source).
|
| This could easily have gotten nowhere except for the fact that:
|
| 1. the person in charge of the board of appeals was caught for
| something else
|
| 2. Somebody took the effort to dig out the master's theses of
| two ministers, including the then-sitting Minister of Education,
| and proved that they had clearly been "cheating" according to
| the rules that they were judging students by.
| briandear wrote:
| Add prompt to ChatGPT
|
| Get answer.
|
| Rewrite in your own words.
|
| Feed it back to ChatGPT to check for errors.
|
| Done. Watermarking really doesn't solve any problem a clever
| person can't trivially circumvent.
| dartos wrote:
| Well, spammers will probably skip the manual "rewrite in your
| own words" step.
|
| So it's still useful in reducing spam.
| sim7c00 wrote:
| They will use Google Translate to translate to Chinese and
| back, and feed it back into ChatGPT to fix the grammar
| mistakes that yields :D (sorry, you're right, it's still
| useful, but spammers be spammers!)
| from-nibly wrote:
| Until it gets blocked as spam and then they will get a
| watermark stripping agent and bam. They are back in business.
| genrilz wrote:
| Or easier: Use Llama or some other open weights model with a
| non-watermarking sampler.
| ape4 wrote:
| It's easy to think of non-secure watermark methods to mark LLM-
| generated text for lazy students or lazy copywriters:
| occasional incorrect capitalization, etc.
| compootr wrote:
| One I thought about was zero-width spaces. If you add a
| sequence of them, lazy copiers will paste them, and you'll be
| able to test text with almost no computational overhead!
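|
| (A throwaway sketch of that in Python, purely illustrative:
|
|   ZWSP = "\u200b"  # zero-width space
|
|   def tag(text, step=20):
|       # Drop an invisible zero-width space after every `step`
|       # characters; it survives a lazy copy-and-paste.
|       chunks = range(0, len(text), step)
|       return ZWSP.join(text[i:i + step] for i in chunks)
|
|   def looks_copied(text, threshold=3):
|       # Checking is just counting the invisible characters.
|       return text.count(ZWSP) >= threshold
|
| Of course it vanishes the moment someone retypes or strips the
| text, which is the non-secure part.)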
| mightybyte wrote:
| I have a question for all the LLM and LLM-detection researchers
| out there. Wikipedia says that the Turing test "is a test of a
| machine's ability to exhibit intelligent behaviour equivalent to,
| or indistinguishable from, that of a human."
|
| Three things seem to be in conflict here:
|
| 1. This definition of intelligence...i.e. "behavior
| indistinguishable from a human"
|
| 2. The idea that LLMs are artificial intelligence
|
| 3. The idea that we can detect if something is generated by an
| LLM
|
| This feels to me like one of those trilemmas, where only two of
| the three can be true. Or, if we take #1 as an axiom, then it
| seems like the extent to which we can detect when things are
| generated by an LLM would imply that the LLM is not a "true"
| artificial intelligence. Can anyone deeply familiar with the
| space comment on my reasoning here? I'm particularly interested
| in thoughts from people actually working on LLM detection. Do you
| think that LLM-detection is technically feasible? If so, do you
| think that implies that they're not "true" AI (for whatever
| definition of "true" you think makes sense)?
| roywiggins wrote:
| The original Turing test started by imagining you're trying to
| work out which of two people is a man or woman based on their
| responses to questions alone.
|
| But suppose you ran that test where one of the hidden people is
| a confederate who steganographically embeds a gender marker
| without it being obvious to anyone but you. You would be able
| to break the game even if your confederate was perfectly
| mimicking the other gender.
|
| That is to say, embedding a secret recognition code into a
| stream of responses works on _humans_, too, so it doesn't say
| anything about computer intelligence.
|
| And for that matter, passing the Turing test is supposed to be
| _sufficient_ for proving that something is intelligent, not
| _necessary_. You could imagine all sorts of deeply inhuman but
| intelligent systems that completely fail the Turing test. In
| Blade Runner, we aren 't supposed to conclude that failing the
| Voight-Kampff test makes the androids mindless automatons, even
| if that's what humans in the movie think.
| Joel_Mckay wrote:
| The idea that an LLM can pass itself off as a human author to
| all reviewers is demonstrably false:
|
| https://www.youtube.com/watch?v=zB_OApdxcno
| visarga wrote:
| I think measuring intelligence in isolation is misguided; it
| should always be measured in context. Both the social context
| and the problem context. This removes a lot of mystique and
| unfortunately doesn't make for heated debates.
|
| In its essentialist form it's impossible to define, but in
| context it is nothing but skilled search for solutions. And
| because most problems are more than one can handle, it's a
| social process.
|
| Can you measure the value of a word in isolation from language?
| In the same way, you can't meaningfully measure intelligence in
| a vacuum; you get a very narrow representation of it.
| warkdarrior wrote:
| > 3. The idea that we can detect if something is generated by
| an LLM
|
| The idea behind watermarking (the topic of the paper) is that
| the output of the LLM is specially marked in some way _at the
| time of generation, by the LLM service_. Afterwards, any text
| can be checked for the presence of the watermark. In this case,
| detecting whether something was generated by an LLM means
| checking for the presence of the watermark. This all works if
| the watermark is robust.
| andrewla wrote:
| Several commenters who have not read the abstract of the paper
| are mentioning LLM-detection tools. That is not what is being
| shown here.
|
| Rather, they are showing how to modify the design of an LLM to
| deliberately inject watermarks into generated text, such that
| it is possible to detect that the text came from a particular
| LLM.
|
| While interesting in the abstract, I think I can definitively say
| that absolutely nobody wants this. People trying to pass off LLM
| content (whether students or content providers) as human-written
| are not interested in being detected. People who are using LLMs
| to get information for their own knowledge or amusement or as a
| cybernetic augmentation do not need this. LLM providers want to
| drive adoption, and if you can be exposed as passing off LLM slop
| as your own, then nobody will use their stuff.
| aaroninsf wrote:
| Why is this in nature.com?
|
| Serious question: has that become pay-to-publish a la Forbes etc
| when I wasn't paying attention?
___________________________________________________________________
(page generated 2024-11-04 23:01 UTC)