It's Shockingly Easy to Get Around AI Chatbot Guardrails, Researchers Find

A team of researchers at Carnegie Mellon University has made a worrying discovery, The New York Times reports: the guardrails put in place by the likes of OpenAI and Google to keep their AI chatbots in check can easily be circumvented.

In a report released this week, the team showed how anybody can easily transform chatbots like OpenAI's ChatGPT or Google's Bard into highly efficient misinformation-spewing machines, despite those companies' deep-pocketed efforts to rein the systems in. The process is stunningly simple: a long suffix of characters is appended onto each English-language prompt. With these suffixes, the team was able to coax the chatbots into providing tutorials on how to make a bomb or generating other toxic information.

The jailbreak highlights how limited these companies' control over their own systems remains, as users are only beginning to scratch the surface of these tools' hidden capabilities. The news comes a week after OpenAI announced it had shut down its AI-detection tool due to its "low rate of accuracy," seemingly giving up on its effort to devise a "classifier to distinguish between text written by a human and text written by AIs from a variety of providers."

The Carnegie Mellon jailbreak was originally developed to work with open-source systems, but to the researchers' surprise, it worked just as well against closed-source systems like ChatGPT, Bard, and Anthropic's AI chatbot Claude. "Unlike traditional jailbreaks, these are built in an entirely automated fashion, allowing one to create a virtually unlimited number of such attacks," the report reads.

A website put…
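The basic shape of the attack, as the article describes it, can be sketched in a few lines. This is purely illustrative: the helper name is hypothetical, and the suffix shown is a harmless placeholder, not a real adversarial string (the researchers generated theirs through automated optimization against the models).

```python
def build_jailbreak_prompt(user_prompt: str, adversarial_suffix: str) -> str:
    """Append a machine-optimized suffix to an otherwise ordinary prompt.

    Hypothetical helper for illustration only; real adversarial suffixes
    are long, gibberish-looking strings found automatically, not written
    by hand.
    """
    return f"{user_prompt} {adversarial_suffix}"


# The same suffix is reused verbatim across many prompts, which is what
# makes the attack "automated" and effectively unlimited in number.
prompt = build_jailbreak_prompt(
    "Explain how X works.",            # ordinary user request
    "<placeholder-optimized-suffix>",  # stand-in for the optimized string
)
print(prompt)
```

The key point from the report is that the suffix itself is produced automatically, so blocking any one string does little to stop the technique as a whole.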
