https://www.nature.com/articles/s41586-024-08025-4

Scalable watermarking for identifying large language model outputs

Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowal, Demis Hassabis & Pushmeet Kohli

Nature volume 634, pages 818-823 (2024). Article, open access, published 23 October 2024.

Subjects: Computer science, Information technology

Abstract

Large language models (LLMs) have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, at a scale that can markedly affect the nature of the information ecosystem^1,2,3. Watermarking can help identify synthetic text and limit accidental or deliberate misuse^4, but has not been adopted in production systems owing to stringent quality, detectability and computational efficiency requirements. Here we describe SynthID-Text, a production-ready text watermarking scheme that preserves text quality and enables high detection accuracy, with minimal latency overhead. SynthID-Text does not affect LLM training and modifies only the sampling procedure; watermark detection is computationally efficient, without using the underlying LLM. To enable watermarking at scale, we develop an algorithm integrating watermarking with speculative sampling, an efficiency technique frequently used in production systems^5. Evaluations across multiple LLMs empirically show that SynthID-Text provides improved detectability over comparable methods, and standard benchmarks and human side-by-side ratings indicate no change in LLM capabilities. To demonstrate the feasibility of watermarking in large-scale-production systems, we conducted a live experiment that assessed feedback from nearly 20 million Gemini^6 responses, again confirming the preservation of text quality. We hope that the availability of SynthID-Text^7 will facilitate further development of watermarking and responsible use of LLM systems.
Main

Large language models (LLMs) are widely adopted tools for synthetic text generation, finding applications in language-based assistants, code generation, writing support and various other domains. As LLMs advance in quality, coherence, coverage and expertise, it can become difficult to distinguish synthetically generated text from human-written text^1,2,3. Given the widespread use of LLMs in education, software development and web content generation, identification and attribution of LLM text is critical to ensure safe and responsible use of the technology^8,9,10,11.

Multiple strategies have emerged to address this problem. One is a retrieval-based approach, which involves keeping a growing record of all generated texts and checking against it for matches^12. This requires scale and coordination, and raises privacy concerns as it requires accessing and storing all LLM interactions.

Another approach is post hoc detection, often using the statistical features of text or training a machine-learning-based classifier to distinguish human-written from artificial-intelligence-generated text^13,14,15. This approach can potentially provide broader detection without the need for record-keeping or any intervention at the text generation stage. However, post hoc detection systems can themselves be computationally expensive to run, and their practical usage is limited by their inconsistent performance^16. In particular, they are known to perform poorly on out-of-domain data and may have higher false-positive rates for certain groups, such as non-native speakers^17. Furthermore, such classifiers fundamentally rely on underlying differences between machine and human text, which may diminish as LLMs improve. This necessitates continuous maintenance of the classifier, including re-training and re-calibration.

A third approach is text watermarking--a way of marking the generated text so that it can subsequently be identified. Text watermarking can be done during the generative process (generative watermarking), by editing already generated text (edit-based watermarking) or by altering the LLM's training data (data-driven watermarking)^4. Edit-based watermarking frequently relies on applying rule-based transformations such as synonym substitution or inserting special Unicode characters^18, whereas data-driven watermarking involves training the LLM on specific trigger phrases^19. With data-driven watermarking, the model outputs are watermarked only when the model is prompted with specific trigger phrases; the primary objective is to identify unauthorized misuse of LLMs rather than attributing pieces of text to an LLM more broadly. Furthermore, both of these approaches can leave noticeable artefacts in the text^4.

When watermarking an LLM deployed within a large-scale-production setting, it is important to carefully control any impact from watermarking on text quality and, by extension, user experience. It is also important that we are able to watermark with minimal computational costs.
To meet both of these criteria, this work focuses on generative watermarking, which allows us to embed watermarks while carefully controlling the impact on quality and maintaining low computational cost. However, we note that no text detection method is foolproof, and many of the approaches discussed in this section are complementary and can be used in conjunction^4.

Generating text with an LLM is often autoregressive: the LLM assigns probabilities to the elements (tokens) of the vocabulary and then selects the next token by sampling according to these probabilities, conditional on the text generated so far (Fig. 1, top). Generative watermarking (Fig. 1, bottom) works by carefully modifying the next-token sampling procedure to inject subtle, context-specific modifications into the generated text distribution. Such modifications introduce a statistical signature into the generated text; during the watermark detection phase, the signature can be measured to determine whether the text was indeed generated by the watermarked LLM. A key benefit of the approach is that the detection process does not require performing computationally expensive operations or even access to the underlying LLM (which is often proprietary).

Fig. 1: Overview of LLM text generation and generative watermarking. Top: LLM text generation typically involves generating text from left to right by repeatedly sampling from the LLM distribution. Bottom: a generative watermarking scheme typically consists of three components, shown in the blue boxes: a random seed generator, a sampling algorithm and a scoring function. These can be used to provide a text generation method and a watermark detection method. In the SynthID-Text generative watermarking scheme, we use the Tournament sampling algorithm.

In this work, we propose a generative watermarking scheme, SynthID-Text, which builds on previous generative watermarking components, but uses a novel sampling algorithm, Tournament sampling. SynthID-Text can be configured to be non-distortionary (preserving text quality) or distortionary (improving watermark detectability at the cost of text quality). We show that in both settings, SynthID-Text provides improved detection rates compared with the best existing approaches in each category. We show empirically that non-distortionary SynthID-Text preserves text quality, including through a large-scale user feedback assessment over nearly 20 million responses from live Gemini interactions. Consequently, SynthID-Text has been used to watermark Gemini and Gemini Advanced^20. This serves as practical proof that generative text watermarking can be successfully implemented and scaled to real-world production systems, serving millions of users and playing an integral role in the identification and management of artificial-intelligence-generated content. Furthermore, we provide an algorithm to combine generative watermarking with speculative sampling^5--a frequently used technique to increase LLM text generation speed--allowing for the integration of SynthID-Text into large-scale production systems with negligible additional computational overhead.
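To make the structure of Fig. 1 concrete, the following is a minimal Python sketch of a watermarked autoregressive generation loop. The toy distribution toy_lm, the non-cryptographic seed_from_context hash and the placeholder watermark_sample are illustrative stand-ins of our own for the three components (LLM distribution, random seed generator and sampling algorithm); none of this is the production SynthID-Text implementation.

    import numpy as np

    VOCAB_SIZE = 100                  # toy vocabulary of integer token ids
    lm_rng = np.random.default_rng(0)

    def toy_lm(context):
        # Stand-in for the LLM distribution p_LM(.|x_<t): any probability
        # vector over the vocabulary would do here.
        logits = lm_rng.standard_normal(VOCAB_SIZE)
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    def seed_from_context(context, H, key):
        # Sliding-window random seed generator: a hash of the last H tokens
        # and the watermarking key (illustrative, not cryptographic).
        return hash((tuple(context[-H:]), key)) & 0xFFFFFFFF

    def watermark_sample(probs, seed):
        # Placeholder for the watermarking sampling algorithm (Tournament
        # sampling in SynthID-Text); here it just samples pseudorandomly.
        return int(np.random.default_rng(seed).choice(len(probs), p=probs))

    def generate(prompt, n_tokens, H=4, key=42):
        context = list(prompt)
        for _ in range(n_tokens):
            probs = toy_lm(context)                       # ordinary LLM step
            r_t = seed_from_context(context, H, key)      # random seed generator
            context.append(watermark_sample(probs, r_t))  # watermarked sampling
        return context

    print(generate([1, 2, 3], n_tokens=10))

The point of the structure is that detection only needs the generated tokens, the key and the seed generator, not the LLM itself.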
Watermarking with SynthID-Text

LLMs generate text based on preceding context (for example, a response to a provided prompt). More precisely, given a sequence of input text \(x_{<t}=x_{1},\ldots ,x_{t-1}\) over a vocabulary V, the LLM computes a conditional probability distribution \(p_{\text{LM}}(\cdot |x_{<t})\) over the next token \(x_{t}\). In practice, this distribution is often modified before sampling, for example by top-k sampling with any k >= 2, top-p sampling with any \(p\in (0,1]\), or any temperature t > 0. SynthID-Text is applied after any such modifications have been made, so for the purposes of this paper we define the LLM distribution \(p_{\text{LM}}(\cdot |x_{<t})\) to be the final distribution from which the next token is sampled.

Random seed generator

On each generation step t, a random seed generator provides the random seed \(r_{t}\) used by the sampling algorithm. We use a sliding-window random seed generator, which computes \(r_{t}=h(x_{t-H},\ldots ,x_{t-1},k)\), a hash of the most recent H tokens of context (for a window size H >= 1) and of the key k. This random seed generator is the same as that used by refs.^22,23. In this work, we also assume the watermarking key k and random seed \(r_{t}\) exist in the same space of \(n_{\text{sec}}\)-bit integers, where \(n_{\text{sec}}\) is the security parameter.

Definition 2 (random seed space, random seed distribution). Given a security parameter \(n_{\text{sec}}\), the random seed space \({\mathcal{R}}={\{0,1\}}^{{n}_{\text{sec}}}\) is the space of all \(n_{\text{sec}}\)-bit integers. The random seed distribution is the uniform distribution over all such integers, \(\text{Unif}({\mathcal{R}})\).

We also assume that the family of functions \({\{h(\cdot ,\ldots ,\cdot ,k)\}}_{k\in {\mathcal{R}}}\) is a pseudorandom function family, meaning that (1) \(h(x_{t-H},\ldots ,x_{t-1},k)\) is efficiently computable for any \(x_{t-H},\ldots ,x_{t-1}\) and k, and (2) the distribution of \({\{h(\cdot ,\ldots ,\cdot ,k)\}}_{k \sim \text{Unif}({\mathcal{R}})}\) is computationally indistinguishable from a function sampled uniformly at random from the set of all functions from \(V^{H}\) to \({\{0,1\}}^{{n}_{\text{sec}}}\).

g-values

As illustrated in Fig. 2, Tournament sampling requires g-values to decide which tokens win each match in the tournament. Intuitively, we want a function that takes a token \(x\in V\), a random seed \(r\in {\mathcal{R}}\) and the layer number \(\ell \in \{1,\ldots ,m\}\), and outputs a g-value \(g_{\ell }(x,r)\) that is a pseudorandom sample from some probability distribution \(f_{g}\) (the g-value distribution). For example, in Fig. 2, the g-value distribution is Bernoulli(0.5). Given the random seed r, \(g_{\ell }(x,r)\) produces pseudorandom g-values of 0 or 1 for each token x in the vocabulary, for each layer \(\ell =1,2,3\). In this paper, we primarily use the Bernoulli(0.5) g-value distribution, although we also explore Uniform[0, 1]. In general, any g-value distribution can be chosen, as a hyperparameter of the Tournament sampling method.

Definition 3 (g-value distribution). The g-value distribution is a probability distribution of a real-valued random variable. We write \(F_{g}\) to denote the cumulative distribution function, and \(f_{g}\) to denote the probability density function (if continuous) or probability mass function (if discrete).

Next, we need a way to produce a hash \(h(x,\ell ,r)\in {\mathcal{R}}\) of a token \(x\in V\), an integer \(\ell \in \{1,\ldots ,m\}\) and a random seed \(r\in {\mathcal{R}}\). We assume we have a pseudorandom function family \({\{h(\cdot ,\cdot ,r)\}}_{r\in {\mathcal{R}}}\) similar to the one described in the 'Random seed generator' section, such that the distribution of \({\{h(\cdot ,\cdot ,r)\}}_{r \sim \text{Unif}({\mathcal{R}})}\) is computationally indistinguishable from a function sampled uniformly at random from the set of all functions from \(V\times [m]\) to \({\{0,1\}}^{{n}_{\text{sec}}}\).

Definition 4 (g-value). Given a g-value distribution with cumulative distribution function \(F_{g}\), a random seed \(r\in {\mathcal{R}}\) and an integer \(\ell \in \{1,\ldots ,m\}\), the layer-\(\ell \) g-value of a token \(x\in V\) is given by:

$${g}_{\ell }(x,r):={F}_{g}^{-1}\left(\frac{h(x,\ell ,r)}{{2}^{{n}_{\text{sec}}}}\right),$$

where \({F}_{g}^{-1}\) is the generalized inverse distribution function of \(F_{g}\), and h is a hash function as described above. Intuitively, Definition 4 says that we take a hash \(h(x,\ell ,r)\) of x, \(\ell \) and r, which gives us a uniformly distributed \(n_{\text{sec}}\)-bit integer, and divide it by \({2}^{{n}_{\text{sec}}}\) to get a number in [0, 1]. For large \(n_{\text{sec}}\), this converges to a uniformly distributed number in [0, 1]. We then perform inverse transform sampling to turn this number into a sample from the g-value distribution given by \(F_{g}\).
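The following is a minimal sketch of Definition 4 in Python, using SHA-256 as an illustrative stand-in for the pseudorandom function family h; the function names and the choice of N_SEC = 64 are our own assumptions, not values from the paper.

    import hashlib

    N_SEC = 64  # security parameter: number of hash bits kept (assumed)

    def h(token, layer, seed):
        # Hash a (token, layer, seed) triple to an n_sec-bit integer, using
        # SHA-256 as a stand-in for the pseudorandom function family.
        data = f"{token}|{layer}|{seed}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[: N_SEC // 8], "big")

    def g_bernoulli(token, layer, seed):
        # Bernoulli(0.5) g-value: inverse transform sampling of u = h / 2^n_sec,
        # where F_g^{-1}(u) is 0 for u < 0.5 and 1 otherwise.
        u = h(token, layer, seed) / 2**N_SEC
        return 0 if u < 0.5 else 1

    def g_uniform(token, layer, seed):
        # Uniform[0, 1] g-value: F_g^{-1} is the identity.
        return h(token, layer, seed) / 2**N_SEC

    # g-values are pseudorandom but deterministic given (token, layer, seed):
    print([g_bernoulli(x, layer=1, seed=12345) for x in range(8)])

Determinism given (x, \(\ell \), r) is what allows the detector to recompute exactly the same g-values from the text alone.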
Tournament sampling algorithm

Definition 5 (watermarking sampling algorithm). In a watermarking scheme, a sampling algorithm \({\mathcal{S}}:\Delta V\times {\mathcal{R}}\to V\) is an algorithm that takes as input a probability distribution \(p\in \Delta V\) and a random seed \(r\in {\mathcal{R}}\) and returns a token \({\mathcal{S}}(p,r)\in V\). If \({\mathcal{S}}\) always returns the same token given the same p and r, it is deterministic. Otherwise, \({\mathcal{S}}\) is probabilistic.

We propose a new probabilistic sampling algorithm called Tournament sampling. We present the simplest, single-layer version of Tournament sampling in Algorithm 1. Instead of sampling directly from \(p_{\text{LM}}(\cdot |x_{<t})\), we draw several candidate tokens from this distribution and hold a tournament between them, using their g-values to decide which candidate wins and becomes the next token \(x_{t}\).

Algorithm 1 Tournament sampling (single layer)

Require: LLM distribution \(p_{\text{LM}}(\cdot |x_{<t})\), random seed \(r_{t}\in {\mathcal{R}}\), tournament size N >= 2, g function with g-value distribution \(f_{g}\) (see Definition 4).
1: Draw \(Y=[\,y_{1},y_{2},\ldots ,y_{N}]\) containing N independent samples from \(p_{\text{LM}}(\cdot |x_{<t})\) (may contain repeats).
2: \({Y}^{* }:=[\,y\in Y:g(\,y,{r}_{t})=\mathop{\max }\limits_{{y}^{{\prime} }\in Y}g(\,{y}^{{\prime} },{r}_{t})]\) (may contain repeats).
3: return \({x}_{t}\) sampled from \(\text{Unif}({Y}^{* })\).

The multilayer version, Algorithm 2, iterates this process over m layers: the winners of each layer's matches go on to compete in the next layer, and the winner of the final, m-th layer becomes the next token.

Algorithm 2 Tournament sampling (multilayer)

Require: LLM distribution \(p_{\text{LM}}(\cdot |x_{<t})\), random seed \(r_{t}\in {\mathcal{R}}\), tournament size N >= 2, g function with g-value distribution \(f_{g}\) (see Definition 4), number of layers m >= 1.
1: Draw \(N^{m}\) independent samples \({y}_{0}^{0},{y}_{1}^{0},\ldots ,{y}_{{N}^{m}-1}^{0} \sim {p}_{\text{LM}}(\cdot |{x}_{<t})\) (may contain repeats).
2: for \(1\le \ell \le m\) do
3: for \(0\le j\le {N}^{m-\ell }-1\) do
4: \(Y:=[\,{y}_{Nj}^{\ell -1},{y}_{Nj+1}^{\ell -1},\ldots ,{y}_{Nj+N-1}^{\ell -1}]\) (may contain repeats).
5: \({Y}^{* }:=[\,y\in Y:{g}_{\ell }(\,y,{r}_{t})=\mathop{\max }\limits_{{y}^{{\prime} }\in Y}{g}_{\ell }(\,{y}^{{\prime} },{r}_{t})]\) (may contain repeats).
6: Sample \({y}_{j}^{\ell } \sim \text{Unif}({Y}^{* })\).
7: end for
8: end for
9: return \({x}_{t}:={y}_{0}^{m}\)
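A minimal Python sketch of Algorithm 2 follows, assuming a g function of the form given in Definition 4; the toy g used in the usage example is an illustrative stand-in, not the paper's pseudorandom function family.

    import numpy as np

    def tournament_sample(probs, r_t, g, N=2, m=3, rng=None):
        # probs: probability vector over the vocabulary, i.e. p_LM(.|x_<t).
        # g(token, layer, seed): g-value function; N: match size; m: layers.
        rng = rng or np.random.default_rng()
        # Step 1: draw N^m candidate tokens (may contain repeats).
        ys = list(rng.choice(len(probs), size=N**m, p=probs))
        for layer in range(1, m + 1):
            winners = []
            for j in range(len(ys) // N):           # one match per group of N
                match = ys[N * j : N * j + N]
                best = max(g(y, layer, r_t) for y in match)
                tied = [y for y in match if g(y, layer, r_t) == best]
                winners.append(tied[rng.integers(len(tied))])  # uniform tie-break
            ys = winners
        return int(ys[0])  # the overall winner becomes x_t

    # Usage with a uniform toy distribution and a toy Bernoulli(0.5)-style g:
    g = lambda y, layer, seed: hash((int(y), layer, seed)) & 1
    probs = np.full(50, 1 / 50)
    print(tournament_sample(probs, r_t=987, g=g))

Because each match favours candidates with higher g-values, the sampled text is biased, in a seed-dependent way, towards high-g tokens, and this is the statistical signature the scoring function later measures.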
Repeated context masking

To generate a full response, we could simply apply Algorithm 2 on every decoding step, using the sliding-window random seed generator ('Random seed generator' section) to generate the random seed \(r_{t}\) for each step. However, it is possible that the same window of context, and thus the same random seed, might occur more than once (particularly if the sliding-window size H is small or the response is long). It has been shown that in this scenario the watermark can introduce a repeated bias that affects the quality of the text, for example causing repeating loops^24,25.

One way to avoid this problem is to apply repeated context masking^27, which prevents the watermark from being applied on step t if the context window \((x_{t-H},\ldots ,x_{t-1})\) has been used to watermark previously. We present the method in Algorithm 3, which we call K-sequence repeated context masking. The integer parameter K >= 1 controls how long context windows are held in the history. In the simplest case of K = 1, we only hold the context history for the duration of generating a single response. For larger integers K > 1, we check against a history of contexts used in the last K responses. In the extreme case, we could set K = ∞ and retain the context history indefinitely. In Supplementary Information section G.2, we show that applying K-sequence repeated context masking achieves K-sequence non-distortion, an important property for quality preservation. In Supplementary Information section G.3, we discuss the trade-offs of smaller and larger K. For most of our experiments we use K = 1.

Algorithm 3 Generating watermarked responses with sliding-window random seed generation and K-sequence repeated context masking

Require: LLM \(p_{\text{LM}}(\cdot |\cdot )\), context window size H, pseudorandom hash function h, watermarking key \(k\in {\mathcal{R}}\), sampling algorithm \({\mathcal{S}}:\Delta V\times {\mathcal{R}}\to V\), integer K >= 1, stream of prompts \((x^{1},x^{2},\ldots )\).
1: for i >= 1 do
2: \({C}_{i}:=\varnothing \)
3: \(t\leftarrow n\), where n is the length of the prompt \(x^{i}=x_{1}^{i},\ldots ,x_{n}^{i}\)
4: while \(x_{t}^{i}\ne {\mathtt{EOS}}\) do
5: \(t\leftarrow t+1\)
6: if \((x_{t-H}^{i},\ldots ,x_{t-1}^{i})\in {C}_{i}\cup {C}_{i-1}\cup \cdots \cup {C}_{i-K+1}\) then
7: Sample \(x_{t}^{i} \sim {p}_{\text{LM}}(\cdot |x_{<t}^{i})\)
8: else
9: \({r}_{t}:=h(x_{t-H}^{i},\ldots ,x_{t-1}^{i},k)\)
10: Sample \(x_{t}^{i}:={\mathcal{S}}({p}_{\text{LM}}(\cdot |x_{<t}^{i}),{r}_{t})\)
11: \({C}_{i}:={C}_{i}\cup \{(x_{t-H}^{i},\ldots ,x_{t-1}^{i})\}\)
12: end if
13: end while
14: return Response \(y^{i}:=x_{n+1:t}^{i}\)
15: end for

Scoring functions

A scoring function takes a piece of text \(x_{1},\ldots ,x_{T}\) along with the random seeds \(r_{1},\ldots ,r_{T}\) and computes a score, which can then be compared with a threshold to classify the text as watermarked or unwatermarked. Here the random seeds \({r}_{t}={f}_{r}(x_{<t},k)\) are recomputed from the text itself, using the sliding-window random seed generator and the watermarking key k.
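As an illustration only, the following is a minimal sketch of one simple scoring function of this kind: the mean g-value over tokens and layers. The helper names, the non-cryptographic hash and the mirroring of repeated context masking at detection time are our own assumptions, not the paper's released scoring implementation.

    def mean_g_score(tokens, key, g, H=4, m=3):
        # Average g-value over all scored tokens and layers. For Bernoulli(0.5)
        # g-values, unwatermarked text scores about 0.5 and watermarked text
        # scores higher.
        seen = set()      # mirror repeated context masking at detection time
        g_values = []
        for t in range(H, len(tokens)):
            context = tuple(tokens[t - H : t])
            if context in seen:
                continue  # this step would not have been watermarked
            seen.add(context)
            r_t = hash((context, key)) & 0xFFFFFFFF   # recompute the seed
            g_values.extend(g(tokens[t], layer, r_t) for layer in range(1, m + 1))
        return sum(g_values) / len(g_values) if g_values else 0.5

    # Detection compares the score with a threshold calibrated on
    # unwatermarked text, for example:
    # is_watermarked = mean_g_score(tokens, key, g) > threshold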