Google DeepMind announced the open-sourcing of its SynthID text watermarking tool on Wednesday, October 23, 2024, via a post on X (formerly Twitter). SynthID is an authentication system that embeds imperceptible watermarks into AI-generated images, video, and text, so that a piece of content's origin (human or machine) can be verified. The tool is free for developers and businesses, who can use it to identify AI-generated content.
SynthID debuted in 2023 as a watermarking tool for AI-generated images, audio, and video. It was first integrated with Imagen, and at I/O 2024 Google announced its incorporation into the Gemini chatbot. Now, SynthID extends its capabilities to text.
The system works by encoding tokens—fundamental data chunks like characters, words, or parts of phrases—with imperceptible watermarks during text generation. As detailed in a DeepMind blog post from May, this is achieved by “introducing additional information in the token distribution at the point of generation by modulating the likelihood of tokens being generated.”
SynthID detects AI-generated text by comparing the model’s word choices and “adjusted probability scores” against expected patterns for watermarked and unwatermarked text. According to a study published in Nature on the same day as the announcement, this process doesn’t affect the response’s accuracy, quality, or speed, and is difficult to bypass. Unlike easily removable metadata, SynthID’s watermark reportedly persists even after content cropping, editing, or modification.
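To make the idea of modulating token likelihoods and then testing for the resulting pattern more concrete, here is a minimal Python sketch. It uses a generic "green list" logit-bias scheme from the research literature, swapped in purely for illustration; it is not SynthID's actual algorithm. The toy vocabulary, the `KEY`, `GAMMA`, and `DELTA` values, the uniform stand-in "model", and every function name are hypothetical assumptions.

```python
import hashlib
import math
import random

# All names and values below are illustrative assumptions, not SynthID's API.
VOCAB = [f"tok{i}" for i in range(50)]  # hypothetical toy vocabulary
KEY = "demo-key"   # hypothetical secret watermarking key
GAMMA = 0.5        # expected fraction of the vocabulary marked "green" per step
DELTA = 4.0        # logit boost applied to green tokens


def is_green(prev_token: str, token: str) -> bool:
    """Pseudorandomly mark `token` as green, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < GAMMA


def sample_next(prev_token: str, rng: random.Random, watermark: bool) -> str:
    """Sample one token; a uniform distribution stands in for a real language model."""
    logits = [0.0] * len(VOCAB)
    if watermark:
        # Nudge the distribution toward green tokens at generation time.
        logits = [
            logit + DELTA if is_green(prev_token, tok) else logit
            for logit, tok in zip(logits, VOCAB)
        ]
    weights = [math.exp(logit) for logit in logits]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n_tokens: int, seed: int, watermark: bool) -> list[str]:
    """Generate `n_tokens` tokens, optionally with the watermark bias applied."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(n_tokens):
        tokens.append(sample_next(tokens[-1], rng, watermark))
    return tokens[1:]


def z_score(tokens: list[str]) -> float:
    """Measure how far the green-token count sits above what chance would predict."""
    prev, hits = "<s>", 0
    for tok in tokens:
        hits += is_green(prev, tok)
        prev = tok
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


watermarked = generate(200, seed=0, watermark=True)
plain = generate(200, seed=0, watermark=False)
print(f"watermarked z-score: {z_score(watermarked):.2f}")
print(f"unwatermarked z-score: {z_score(plain):.2f}")
```

In this sketch the detector needs only the key and the text, not the model: watermarked text lands on green tokens far more often than chance, which shows up as a large z-score, while unwatermarked text should score near zero.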
However, the system isn’t foolproof. While tamper-resistant, SynthID watermarks can be removed through language translation apps or heavy rewriting. Its effectiveness diminishes with short text passages and factual statements where both humans and AI might provide identical answers.
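The effect of text length can be illustrated with some simple arithmetic. The sketch below assumes a generic statistical detector that counts how many tokens fall in a "favored" set and computes a z-score; this is an illustrative assumption, not SynthID's actual scoring function, and the name `detection_z` and the `gamma` parameter are hypothetical.

```python
import math


def detection_z(favored_hits: int, n_tokens: int, gamma: float = 0.5) -> float:
    """z-score of the favored-token count against a chance rate of `gamma`.

    Hypothetical helper for illustration only.
    """
    expected = gamma * n_tokens
    std_dev = math.sqrt(n_tokens * gamma * (1 - gamma))
    return (favored_hits - expected) / std_dev


# Same 75% hit rate, different lengths.
short_passage = detection_z(15, 20)    # 20-token passage
long_passage = detection_z(150, 200)   # 200-token passage
print(f"short: {short_passage:.2f}, long: {long_passage:.2f}")
```

With the same proportion of favored tokens, the longer passage yields a much larger z-score, because the statistical evidence grows with the square root of the token count. A short passage simply offers too few tokens for the signal to stand out from chance.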
Soheil Feizi, an associate professor at the University of Maryland, highlighted how hard it is to make text watermarking both reliable and imperceptible, especially when large language model outputs are near-deterministic, as with answers to factual questions or code generation. He emphasized that open-sourcing matters because it lets the community test the technique and evaluate its robustness in diverse settings, which helps reveal its limitations, as reported by MIT Technology Review.
If you’re interested in exploring SynthID, it’s available for download on Hugging Face as part of Google’s updated Responsible GenAI Toolkit.