[HN Gopher] Scalable watermarking for identifying large language...
       ___________________________________________________________________
        
       Scalable watermarking for identifying large language model outputs
        
       Author : ghshephard
       Score  : 55 points
       Date   : 2024-10-31 18:00 UTC (4 days ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | viraptor wrote:
       | What they didn't put in the limitations or other sections (unless
       | I missed it) is that it can only apply to larger creative text,
       | not to structured or repeated output. For example if you want to
       | watermark generated code, you can't produce it as a diff to the
       | existing file - the sampling changes will cause unwanted
       | modifications.
       | 
        | Similarly, tasks like "fix grammar in this long text" will have
        | to tweak random words for no reason, because the existing text
        | can't be reproduced 100% while injecting SynthID.
        
         | jsenn wrote:
          | This is discussed in the "Watermarking with SynthID-Text"
          | section right after they define the Score function:
         | 
         | > There are two primary factors that affect the detection
         | performance of the scoring function. The first is the length of
         | the text x: longer texts contain more watermarking evidence,
         | and so we have more statistical certainty when making a
         | decision. The second is the amount of entropy in the LLM
         | distribution when it generates the watermarked text x. For
         | example, if the LLM distribution is very low entropy, meaning
         | it almost always returns the exact same response to the given
         | prompt, then Tournament sampling cannot choose tokens that
         | score more highly under the g functions. In short, like other
         | generative watermarks, Tournament sampling performs better when
         | there is more entropy in the LLM distribution, and is less
         | effective when there is less entropy.
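          | 
          | (A toy Python sketch of that entropy point, using a simplified
          | 2-candidate tournament and a made-up keyed g function rather
          | than the paper's full multi-layer scheme: with a flat
          | distribution the tournament can favour high-g tokens, so the
          | mean g of the output rises above 0.5; with a near-deterministic
          | distribution it cannot, so there is little evidence per token.)
          | 
          |   import hashlib, random
          | 
          |   def g(token, key="demo-key"):
          |       # Stand-in for a keyed g function: a fixed pseudo-random
          |       # value in [0, 1) per (key, token) pair.
          |       h = hashlib.sha256((key + token).encode()).digest()
          |       return int.from_bytes(h[:8], "big") / 2**64
          | 
          |   def tournament_pick(probs, rng):
          |       # Toy 2-candidate tournament: draw two tokens from the
          |       # LLM's distribution, keep the one with the higher g.
          |       tokens, weights = zip(*probs.items())
          |       a, b = rng.choices(tokens, weights=weights, k=2)
          |       return a if g(a) >= g(b) else b
          | 
          |   def mean_g(probs, n=10000, seed=0):
          |       rng = random.Random(seed)
          |       picks = (tournament_pick(probs, rng) for _ in range(n))
          |       return sum(g(t) for t in picks) / n
          | 
          |   flat = {"foo": 0.25, "bar": 0.25, "baz": 0.25, "qux": 0.25}
          |   print(mean_g(flat))    # well above 0.5: watermark evidence
          |   peaked = {"foo": 0.999, "bar": 0.001}
          |   print(mean_g(peaked))  # just g("foo"): no systematic signal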
        
       | zebomon wrote:
       | Worth pointing out that while watermarking is mathematically
       | reliable, the scammers who are selling "AI detection" don't have
       | the weight-level access that it requires.
        
         | eitland wrote:
         | > watermarking is mathematically reliable
         | 
         | So should accounting be. To a much higher degree.
         | 
          | Yet I hope most of us are aware of the British Post Office
          | scandal, where what should have been straightforward accounting
          | software falsely accused thousands of employees of theft, over
          | 900 of whom were convicted of theft, fraud and false accounting.
         | 
         | If this can happen in something as utterly boring as an
         | accounting system should be in this millennium, I don't think
         | we should trust AI fraud detection in science and academia
         | until we get a few decades of experience with it.
         | 
          | (Do I think _records_ from accounting can be used as evidence?
          | Absolutely, given the right circumstances: we can know they
          | haven't been tampered with, etc.
          | 
          | What I don't think, however, is that pattern matching or
          | "watermarks" that indicate _probability_ should be used as
          | evidence. Especially not closed-source systems with secret
          | distributions and watermarking algorithms.)
        
           | zebomon wrote:
           | I agree with you completely.
           | 
           | In the wild, there are too many variables to use watermarking
           | to draw meaningful conclusions about any piece of text, no
           | matter the word count. Scott Aaronson described well one of
           | those variables, "the pineapple attack," in his Aug. 2023
           | talk at Simons [1].
           | 
           | Watermarking is illuminating to the ongoing study of language
           | models' functionality, but it doesn't put the genie back in
           | the bottle.
           | 
           | 1. https://www.youtube.com/watch?v=2Kx9jbSMZqA
        
       | michaelt wrote:
       | As far as I can tell, what they're proposing is:
       | 
       | Today, for each output token the LLM produces a probability for
       | each possible output, then a 'sampler' makes a probability-
       | weighted random choice. If the next-token probabilities are 90%
        | for foo, 9% for bar and 1% for baz, then the sampler chooses a
        | random number between 0 and 1: if it's <0.9 it outputs foo, if
        | it's 0.9-0.99 it outputs bar, and if it's 0.99-1 it outputs baz.
       | 
       | But what if instead of using random numbers, you had a source of
       | evenly distributed random numbers that was deterministic, based
       | on some secret key?
       | 
       | Each candidate token would remain just as likely as it was before
       | - there would still be a 90% chance of foo being chosen. So the
       | output shouldn't degrade in quality.
       | 
       | And sure, some tokens will have 99.999% probability and their
       | selection doesn't tell you much. But in most real-world use
       | multiple wordings are possible and so on. So across a large
       | enough sample of the output, you could detect whether the sampler
       | was following your secret deterministic pattern.
       | 
       | Of course the downside is you've got to check on exactly the same
       | LLM, and only people with the secret key can perform the check.
       | And it's only applicable to closed-source LLMs.
       | 
       | I'm also not quite sure if it works when you don't know the exact
       | prompt - so maybe my understanding of the paper is all wrong?
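        | 
        | (A minimal Python sketch of the keyed-sampler idea described
        | above. This follows the commenter's framing, not the paper's
        | actual Tournament sampling, and the helper names and the 4-token
        | context window are just assumptions for illustration. Because the
        | keyed draw is still evenly distributed on [0, 1), each token
        | keeps its original probability.)
        | 
        |   import hashlib
        | 
        |   def keyed_unit_interval(context_tokens, secret_key):
        |       # Deterministic but evenly distributed number in [0, 1),
        |       # derived from the recent context plus a secret key.
        |       data = secret_key + "|" + "|".join(context_tokens)
        |       h = hashlib.sha256(data.encode()).digest()
        |       return int.from_bytes(h[:8], "big") / 2**64
        | 
        |   def watermarked_sample(probs, context_tokens, secret_key):
        |       # probs: token -> probability from the LLM, e.g.
        |       # {"foo": 0.90, "bar": 0.09, "baz": 0.01}
        |       u = keyed_unit_interval(context_tokens, secret_key)
        |       cumulative = 0.0
        |       for token, p in probs.items():
        |           cumulative += p
        |           if u < cumulative:
        |               return token
        |       return token  # guard against floating-point rounding
        | 
        |   probs = {"foo": 0.90, "bar": 0.09, "baz": 0.01}
        |   print(watermarked_sample(probs,
        |                            ["the", "quick", "brown", "fox"],
        |                            "my-secret-key"))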
        
         | genrilz wrote:
         | My understanding from the "Watermark Detection" section is that
         | it only requires the key and the output text in order to do the
         | detection. In particular, it seems like the random seed used
         | for each token is only based off the previous 4 tokens and the
         | LLM specific key, so for any output larger than 4 tokens, you
         | can start to get a signal.
         | 
          | I don't think the key actually needs to be secret, as it's not
          | trying to cryptographically secure anything. So all closed-
          | weights LLM providers could just publicly share the keys they
          | use for watermarking, and then anybody could use them to check
          | whether a particular piece of text was generated by a
          | particular LLM.
          | 
          | That being said, I think you are right about this only really
          | being useful for closed-weights models. If you have the
          | weights, you can just run the LLM through a standard sampler
          | and it won't be watermarked.
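          | 
          | (A toy illustration of that detection step in Python: a
          | simplified mean-score check over made-up keyed g values, not
          | the paper's actual scoring function. It assumes generation was
          | tilted toward high-g tokens, as Tournament sampling does, which
          | is a different scheme from the plain keyed-uniform draw
          | sketched in the parent comment. Note it needs only the key and
          | the text, not the model.)
          | 
          |   import hashlib
          | 
          |   def keyed_g_value(prev_tokens, token, key):
          |       # Per-token score in [0, 1), derived only from the key,
          |       # the previous 4 tokens and the candidate token.
          |       data = key + "|" + "|".join(prev_tokens) + "|" + token
          |       h = hashlib.sha256(data.encode()).digest()
          |       return int.from_bytes(h[:8], "big") / 2**64
          | 
          |   def watermark_score(tokens, key, window=4):
          |       # Mean g value over the text. If generation favoured
          |       # high-g tokens, this drifts above the ~0.5 expected of
          |       # unwatermarked text; longer texts give tighter estimates.
          |       scores = [keyed_g_value(tokens[i - window:i], tokens[i], key)
          |                 for i in range(window, len(tokens))]
          |       return sum(scores) / len(scores) if scores else 0.0
          | 
          |   tokens = "some candidate text to check for the mark".split()
          |   print(watermark_score(tokens, "shared-key"))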
        
           | danielmarkbruce wrote:
           | Why would anyone ever use such a model? And then, given the
           | significant reduction in users, why would any closed model
           | service do this?
           | 
           | Seems like a cool theoretical trick that has little practical
           | implication.
        
             | genrilz wrote:
             | There are situations where the model output being
             | watermarked doesn't matter. For instance, I hear people on
              | HN asking LLMs to explain things to them all the time
              | (which I think is a bad idea, but YMMV), and people use
              | LLMs to write code quickly (which I think is at least
              | possibly a good idea). There are also some content farms
              | that churn out low-quality books on Amazon on various
              | topics, and I don't think they care if they get caught
              | using LLM outputs.
             | 
             | Thus it might reduce usage some, but it certainly wouldn't
             | block all usage. Additionally, there are only a few
             | providers of truly massive LLMs on the market right now. If
             | they decided that doing this would be a social good, or
             | more likely that it would bring bad PR to not do this when
             | their competitors do, then they would at least be able to
             | watermark all of the massive LLM outputs.
        
               | danielmarkbruce wrote:
                | You say that as though there isn't a choice, though.
                | There will always be a good offering that doesn't
                | watermark.
               | 
               | And there is no good reason for a provider to watermark -
               | they aren't helping the customer. They'd be helping some
               | other party who isn't paying them.
               | 
               | This will never be a thing.
        
             | IanCal wrote:
              | I'm totally happy having huge amounts of my use of LLMs
              | identifiable as being from an LLM. I don't see many
              | important cases for me where I need to pretend it wasn't
              | from an LLM.
              | 
              | I will happily lose those cases for increased performance;
              | that's the thing I care about.
             | 
             | Are there normal cases where you picture this as an issue?
        
               | eitland wrote:
               | Not a problem for me. I am not a student anymore.
               | 
                | And I am not against LLM output being identifiable as
                | such (although I think an argument could be made based
                | on the ruling about the monkey and the camera, which
                | IIRC would say that the copyright belongs to whoever
                | created the situation).
               | 
               | But after the
               | 
               | 1. British Post Office scandal and
               | 
               | 2. some really high profile cases of education
               | institutions here in Norway abusing plagiarism detectors
               | 
                | I do not feel ready to trust either
                | 
                | 1. complex software (and especially not closed-source
                | software) to tell us who is cheating or not,
                | 
                | 2. or any human's ability to use such a system in a
                | sensible way.
                | 
                | While cheating cases don't usually go to criminal court,
                | students also usually do not get a free defense.
                | 
                | For this reason I suggest cheating should have to be
                | _proven to have occurred_, not "suggested to probably
                | have occurred" by the same people who create the not
                | very reliable and extremely hard-to-reproduce LLMs.
        
               | danielmarkbruce wrote:
               | Increased performance? Watermarking will not increase
               | performance. They are talking about tilting the decoding
               | process in minor ways. It won't help (or hurt much)
               | performance.
        
         | zmgsabst wrote:
         | What happens if I tell the LLM to reword every other sentence?
         | -- or every 5th word?
         | 
         | I must be missing something, because this seems to assume a
         | contiguous output.
        
           | genrilz wrote:
           | It's possible that this might break the method, but what
           | seems most likely to me is that the LLM will simply reword
           | every 5th word with some other word that it is more likely to
           | use due to the watermark sampling. Thus the resulting output
           | would display roughly the same level of "watermarkedness".
           | 
           | You might be able to have one LLM output the original, and
           | then another to do a partial rewording though. The resulting
           | text would likely have higher than chance "watermarkedness"
           | for both LLMs, but less than you would expect from a plain
           | output. Perhaps this would be sufficient for short enough
           | outputs?
        
             | namrog84 wrote:
              | What happens when we are all reading LLM output all the
              | time? Might we simply start to adapt more to LLM writing
              | styles and word choices, and possibly, without realizing
              | it, watermark our own original writing?
        
               | genrilz wrote:
                | You might be right, but my first instinct is that this
                | probably wouldn't happen enough to throw off the
                | watermarking too badly.
                | 
                | The most likely used word is based off the previous four,
                | and the bias only works if there is enough entropy
                | present that one of multiple words would work. Thus it's
                | not a simple matter of humans picking up particular word
                | choices. There might be some cases where there are 3
                | tokens in a row that occur with low entropy after the
                | first token, and then one token generated with high
                | entropy at the end. That would cause a particular 5-word
                | phrase to occur. Otherwise, the word choice would appear
                | pretty random. I don't think humans pick up on stuff
                | like that even subconsciously, but I could be wrong.
               | 
               | I would be interested to see if LLMs pick up the
               | watermarks when fed watermarked training data though.
               | Evidently ChatGPT can decode base64, [0] so it seems like
               | these things can pick up on some pretty subtle patterns.
               | 
               | [0] https://www.reddit.com/r/ChatGPT/comments/1645n6i/i_n
               | oticed_...
        
         | Buttons840 wrote:
         | I haven't been in school since LLMs became useful, but if I
         | were to "cheat", I'd ask the LLM for a very fine grained
         | outline, and then just translate the outline into my own words.
         | Then ask the LLM to fill in my citations in AMA format.
         | 
         | And honestly, this still retains like 95% of the value of
         | writing a paper, because I did write it, the words did flow
         | through my brain. I just used the LLM to avoid facing a blank
         | page.
         | 
         | I've also thought about asking LLMs to simulate a forum
         | conversation about the Civil War (or whatever the topic may
         | be), and include a wrong comment that can be countered by
         | writing exactly what the assignment requires, because I seem to
          | have no trouble writing an essay when duty calls and someone is
          | wrong on the internet.
        
           | qeternity wrote:
           | These things don't have to be foolproof to be effective
           | deterrents.
        
       | awongh wrote:
        | After skimming through the paper I can't immediately pick out
        | the data that says how much certainty there is in detecting a
        | watermark in a given text, or a graph of how that certainty
        | grows with text length. (They seem to assert that the certainty
        | grows as the token count goes up, but it's not clear by how
        | much.)
       | 
       | I worry (and have already read worrying things) about "cheating
       | detection" tools that have been deployed in schools. My intuition
       | would be that there's just too much entropy between something
        | like an essay prompt and the essay itself. I guess it also
        | depends on how specific the teacher's essay prompt is.
        
         | eitland wrote:
         | > I worry (and have already read worrying things) about
         | "cheating detection" tools that have been deployed in schools.
         | 
         | This is my worry as well.
         | 
          | Punishment for cheating can easily set back a student a year
          | or more. This is fair if the student has actually been
          | cheating, but it is really harsh.
         | 
         | So while this isn't criminal court, I think schools should
         | apply the same principles here: innocent until proven guilty.
         | 
          | And in my view, secret probability distributions aren't exactly
          | good proof.
         | 
          | Furthermore, to make it even worse: if someone is actually
          | innocent, it will be next to impossible to argue their
          | innocence, since everyone will trust the system, and as far as
          | I can see the system cannot actually be verified by a board
          | without disclosing the weights. And that is assuming they would
          | care to try to help a student prove their innocence in the
          | first place.
         | 
          | AFAIK this is a topic that has been explored to some depth in
          | science fiction, but more importantly, we have cases like the
          | mail service in the UK, where multiple people lost their jobs
          | because nobody could believe the system they had built or paid
          | for could make such crazy mistakes.
         | 
          | Back to students: for a less privileged student, I guess it can
          | easily ruin their studies. TBH, as someone who struggled a lot
          | in school, I am not sure I'd have finished if my studies had
          | been delayed by a year. Which would have been sad, given how
          | well I have managed once I didn't have to juggle full-time
          | studies and part-time work.
         | 
          | Recently (last year and this) we (Norway) have had some debates
          | that seemed way overdue regarding what can be considered
          | cheating (with some ridiculous examples of students getting
          | punished for "self-plagiarism" for the most absurd things,
          | including not citing a previous failed exam they had written
          | themselves as a source).
         | 
         | This could easily have gotten nowhere except for the fact that:
         | 
         | 1. the person in charge of the board of appeals was caught for
         | something else
         | 
          | 2. somebody took the effort to dig out the master's theses of
          | two ministers, including the then-sitting Minister of
          | Education, and proved that they had clearly been "cheating"
          | according to the rules that they were judging students by.
        
       | briandear wrote:
        | Add prompt to ChatGPT.
       | 
       | Get answer.
       | 
       | Rewrite in your own words.
       | 
        | Feed back to ChatGPT to check for errors.
       | 
       | Done. Watermarking really doesn't solve any problem a clever
       | person can't trivially circumvent.
        
         | dartos wrote:
          | Well, spammers will probably skip the manual "rewrite in your
          | own words" step.
          | 
          | So it's still useful in reducing spam.
        
           | sim7c00 wrote:
            | They will use Google Translate to translate to Chinese and
            | back, and feed it back into ChatGPT to fix the grammar
            | mistakes that yields :D (sorry, you're right, it's still
            | useful, but spammers be spammers!)
        
           | from-nibly wrote:
            | Until it gets blocked as spam, and then they will get a
            | watermark-stripping agent and bam, they are back in business.
        
         | genrilz wrote:
         | Or easier: Use Llama or some other open weights model with a
         | non-watermarking sampler.
        
       | ape4 wrote:
        | It's easy to think of non-secure watermark methods to mark LLM-
        | generated text for lazy students or lazy copywriters: occasional
        | incorrect capitalization, etc.
        
         | compootr wrote:
          | One I thought about was zero-width spaces. If you add a
          | sequence of them, lazy copiers will paste them, and you'll be
          | able to test text with almost no computational overhead!
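          | 
          | (A toy Python sketch of that zero-width-space idea; the marker
          | positions are arbitrary, and a simple replace() call strips the
          | mark entirely, so it only catches the laziest copiers.)
          | 
          |   ZWSP = "\u200b"  # zero-width space
          | 
          |   def mark(text, positions=(10, 25, 40)):
          |       # Insert invisible zero-width spaces at fixed offsets.
          |       chars = list(text)
          |       for p in sorted(positions, reverse=True):
          |           if p < len(chars):
          |               chars.insert(p, ZWSP)
          |       return "".join(chars)
          | 
          |   def looks_copied(text):
          |       # Detection is a plain substring check (near-zero cost).
          |       return ZWSP in text
          | 
          |   marked = mark("an essay a lazy copier might paste verbatim")
          |   print(looks_copied(marked))                     # True
          |   print(looks_copied(marked.replace(ZWSP, "")))   # False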
        
       | mightybyte wrote:
       | I have a question for all the LLM and LLM-detection researchers
       | out there. Wikipedia says that the Turing test "is a test of a
       | machine's ability to exhibit intelligent behaviour equivalent to,
       | or indistinguishable from, that of a human."
       | 
       | Three things seem to be in conflict here:
       | 
       | 1. This definition of intelligence...i.e. "behavior
       | indistinguishable from a human"
       | 
       | 2. The idea that LLMs are artificial intelligence
       | 
       | 3. The idea that we can detect if something is generated by an
       | LLM
       | 
       | This feels to me like one of those trilemmas, where only two of
       | the three can be true. Or, if we take #1 as an axiom, then it
       | seems like the extent to which we can detect when things are
       | generated by an LLM would imply that the LLM is not a "true"
       | artificial intelligence. Can anyone deeply familiar with the
       | space comment on my reasoning here? I'm particularly interested
       | in thoughts from people actually working on LLM detection. Do you
       | think that LLM-detection is technically feasible? If so, do you
       | think that implies that they're not "true" AI (for whatever
       | definition of "true" you think makes sense)?
        
         | roywiggins wrote:
         | The original Turing test started by imagining you're trying to
         | work out which of two people is a man or woman based on their
         | responses to questions alone.
         | 
          | But suppose you ran that test where one of the hidden people
          | is a confederate who steganographically embeds a gender marker
          | without it being obvious to anyone but yourself. You would be
          | able to break the game, even if your confederate was perfectly
          | mimicking the other gender.
         | 
         | That is to say, embedding a secret recognition code into a
          | stream of responses works on _humans_, too, so it doesn't say
         | anything about computer intelligence.
         | 
         | And for that matter, passing the Turing test is supposed to be
         | _sufficient_ for proving that something is intelligent, not
         | _necessary_. You could imagine all sorts of deeply inhuman but
         | intelligent systems that completely fail the Turing test. In
          | Blade Runner, we aren't supposed to conclude that failing the
         | Voight-Kampff test makes the androids mindless automatons, even
         | if that's what humans in the movie think.
        
         | Joel_Mckay wrote:
          | The idea that an LLM can pass itself off as a human author to
          | all reviewers is demonstrably false:
         | 
         | https://www.youtube.com/watch?v=zB_OApdxcno
        
         | visarga wrote:
          | I think measuring intelligence in isolation is misguided; it
          | should always be measured in context, both the social context
          | and the problem context. This removes a lot of the mystique
          | and unfortunately doesn't make for heated debates.
         | 
         | In its essentialist form it's impossible to define, but in
         | context it is nothing but skilled search for solutions. And
         | because most problems are more than one can handle, it's a
         | social process.
         | 
          | Can you measure the value of a word in isolation from language?
          | In the same way, you can't meaningfully measure intelligence in
          | a vacuum; you get a very narrow representation of it.
        
         | warkdarrior wrote:
         | > 3. The idea that we can detect if something is generated by
         | an LLM
         | 
         | The idea behind watermarking (the topic of the paper) is that
         | the output of the LLM is specially marked in some way _at the
         | time of generation, by the LLM service_. Afterwards, any text
         | can be checked for the presence of the watermark. In this case,
          | detecting whether something was generated by an LLM means
          | checking for the presence of the watermark. This all works if
          | the watermark is robust.
        
       | andrewla wrote:
       | Several commenters who have not read the abstract of the paper
       | are mentioning LLM-detection tools. That is not what is being
       | shown here.
       | 
        | Rather, they are showing how to modify the design of an LLM to
        | deliberately inject watermarks into generated text, such that it
        | will be possible to detect that the text came from a particular
        | LLM.
       | 
       | While interesting in the abstract, I think I can definitively say
       | that absolutely nobody wants this. People trying to pass off LLM
       | content (whether students or content providers) as human-written
       | are not interested in being detected. People who are using LLMs
       | to get information for their own knowledge or amusement or as a
       | cybernetic augmentation do not need this. LLM providers want to
       | drive adoption, and if you can be exposed as passing off LLM slop
       | as your own, then nobody will use their stuff.
        
       | aaroninsf wrote:
       | Why is this in nature.com?
       | 
       | Serious question: has that become pay-to-publish a la Forbes etc
       | when I wasn't paying attention?
        
       ___________________________________________________________________
       (page generated 2024-11-04 23:01 UTC)