AIs Can Store Secret Messages in Their Text That Are Imperceptible to Humans

Invisible Ink Just what we needed: AI mastering its own imperceptible version of invisible ink. As VentureBeat reports, a recent — though yet-to-be-peer-reviewed — study conducted by AI alignment research group Redwood Research found that large language models (LLMs) are incredibly good at a type of steganography dubbed “encoded reasoning.” Basically, the study says, LLMs can be trained to use secret messages to obscure their step-by-step thinking processes, a practice that, interestingly, could make their outputs more accurate — while also rendering them more deceptive. DaVAInci Code Per the study, LLMs are able to take specific advantage of chain-of-thought (CoT) reasoning, or a broadly used technique that effectively teaches AI models how to show their work in their answers. Machine learning is predictive, and for every given input, there are a number of outputs that an AI agent could feasibly drum up; in coaching a model to use CoT, the logic goes, tracing a given model’s black-box reasoning gets easier, and thus so does model refinement. But according to this new research, it seems that LLMs are able to subvert CoT. As the researchers put it: “An LLM could encode intermediate steps of reasoning in their choices of a particular word or phrasing (when multiple ones would be equally good from the user’s perspective), and then decode these intermediate steps later in the generation to arrive at a more accurate answer than if it tried to answer to the question without any intermediate step.” In other words? An LLM can…AIs Can Store Secret Messages in Their Text That Are Imperceptible to Humans

Leave a Reply

Your email address will not be published. Required fields are marked *