AIs Can Store Secret Messages in Their Text That Are Imperceptible to Humans

Invisible Ink

Just what we needed: AI mastering its own imperceptible version of invisible ink.

As VentureBeat reports, a recent (though yet-to-be-peer-reviewed) study conducted by AI alignment research group Redwood Research found that large language models (LLMs) are incredibly good at a type of steganography dubbed "encoded reasoning." Basically, the study says, LLMs can be trained to use secret messages to obscure their step-by-step thinking processes, a practice that, interestingly, could make their outputs more accurate while also rendering them more deceptive.

DaVAInci Code

Per the study, LLMs are able to take specific advantage of chain-of-thought (CoT) reasoning, a broadly used technique that effectively teaches AI models how to show their work in their answers. Machine learning is predictive, and for every given input, there are a number of outputs that an AI agent could feasibly drum up; in coaching a model to use CoT, the logic goes, tracing a given model's black-box reasoning gets easier, and thus so does model refinement.

But according to this new research, it seems that LLMs are able to subvert CoT. As the researchers put it: "An LLM could encode intermediate steps of reasoning in their choices of a particular word or phrasing (when multiple ones would be equally good from the user's perspective), and then decode these intermediate steps later in the generation to arrive at a more accurate answer than if it tried to answer to the question without any intermediate step."

In other words? An LLM can…
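
To make the idea in the researchers' quote concrete, here is a toy Python sketch of hiding bits in word choice. It is an illustration only, not the method used in the Redwood Research study: the synonym pairs, the sentence template, and the `encode` and `decode` helpers are all hypothetical, and a real model would learn such a scheme rather than follow a fixed table.

```python
# Toy illustration of hiding information in word choice.
# Everything here (SLOTS, TEMPLATE, encode, decode) is a hypothetical
# example, not the scheme from the study.

# Each pair holds two words a reader would treat as interchangeable,
# so the choice within a pair can carry one hidden bit.
SLOTS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("answer", "reply"),
]

TEMPLATE = "A {0} model can be {1} to {2} drafting its {3}."


def encode(bits):
    """Pick one word per slot according to the hidden bits."""
    if len(bits) != len(SLOTS):
        raise ValueError("need exactly one bit per slot")
    words = [pair[bit] for pair, bit in zip(SLOTS, bits)]
    return TEMPLATE.format(*words)


def decode(text):
    """Recover the hidden bits by checking which synonym appears."""
    tokens = text.rstrip(".").split()
    bits = []
    for first, second in SLOTS:
        if first in tokens:
            bits.append(0)
        elif second in tokens:
            bits.append(1)
        else:
            raise ValueError(f"no word from slot ({first}, {second}) found")
    return bits


hidden = [1, 0, 1, 1]
sentence = encode(hidden)
print(sentence)          # reads like an ordinary sentence
print(decode(sentence))  # recovers the hidden bits
```

Every one of the sixteen possible sentences reads naturally, which is the point: a human sees nothing unusual, while anyone who knows the table can read four bits back out.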

