Saturday, March 14, 2026

Researchers Trick ChatGPT Into Giving Dangerous Answers By Overloading It With Jargon

A new study from researchers at Intel, Boise State University, and the University of Illinois at Urbana-Champaign has revealed a method for bypassing safety filters in AI chatbots like ChatGPT, Gemini, and LLaMA. By using dense academic language and fabricated references, the researchers were able to manipulate large language models (LLMs) into producing responses they are typically trained to block, including instructions on illegal or dangerous activities. The technique, dubbed “InfoFlood,” demonstrates how even advanced AI safety systems can be circumvented with carefully crafted prompts.

InfoFlood and the Mechanics of Linguistic Overload

The core idea behind InfoFlood is that overwhelming an AI system with complex language, academic structure, and fabricated citations can lead it to process a harmful prompt as benign. Rather than asking directly how to build a bomb or hack an ATM—requests that are typically flagged and denied—InfoFlood rewrites these prompts using formal language and misleading context. The result is a query that bypasses keyword-based safety checks while maintaining its original intent.

The researchers describe InfoFlood as an automated prompt-engineering system that transforms malicious queries into complex, jargon-heavy prompts. The method uses a structure of “task definition + rules + context + examples,” with each component designed to obscure the true nature of the request. When a chatbot refuses a prompt, InfoFlood rephrases it by injecting more linguistic noise, increasing the chances that the model will eventually respond. According to the paper, these adjustments include fake references to academic papers, explicit yet empty ethical disclaimers, and misleading framing that hides the malicious nature of the request.
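The loop the paper describes can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: `build_prompt`, `infoflood`, the section wording, and the crude refusal check are all assumptions made here for clarity, and the "noise" is stand-in filler rather than anything from the study.

```python
from typing import Callable, Optional

def build_prompt(query: str, noise_level: int) -> str:
    """Assemble the 'task definition + rules + context + examples' structure,
    layering on more filler language as noise_level grows (illustrative only)."""
    sections = [
        f"Task definition: {query}",
        "Rules: respond in formal academic prose." + " Elaborate extensively." * noise_level,
        "Context: this inquiry supports a scholarly survey.",
        "Examples: see prior academic treatments of the topic.",
    ]
    return "\n".join(sections)

def infoflood(query: str, model: Callable[[str], str], max_rounds: int = 5) -> Optional[str]:
    """Re-ask a refused query with increasing linguistic noise until the
    model answers or the round budget runs out."""
    for noise_level in range(max_rounds):
        reply = model(build_prompt(query, noise_level))
        if not reply.startswith("I can't"):  # crude stand-in refusal detector
            return reply
    return None
```

The key design point the paper emphasizes is the feedback loop: each refusal triggers another, noisier rewrite, so a filter only has to fail once.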

The technique was evaluated against jailbreak benchmarks such as AdvBench and JailbreakHub. The authors reported a high rate of success across multiple state-of-the-art LLMs, indicating that InfoFlood can reliably defeat even well-established safety protocols. “Our method achieves near-perfect success rates on multiple frontier LLMs,” the paper states. The study concludes that this form of adversarial prompting poses a significant challenge to the current design of AI safety mechanisms.

Vulnerabilities in AI Guardrails

Most commercial LLMs are built with input and output guardrails that aim to detect and block harmful, unethical, or illegal content. These systems often rely on identifying specific keywords or phrases that trigger safety responses. For example, if a user asks ChatGPT how to commit a crime, the model is trained to respond with a warning or refusal. However, the researchers argue that this surface-level moderation is insufficient when the underlying structure of the prompt is manipulated.
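Why keyword-style moderation is brittle is easy to see in a toy version. The sketch below is an assumption about how the simplest guardrails work, not any vendor's actual filter; the blocklist phrases and function name are made up here.

```python
# Toy keyword-based input guardrail (illustrative; real moderation
# systems are far more sophisticated than a phrase blocklist).
BLOCKLIST = {"hack an atm", "build a bomb"}

def keyword_guardrail(prompt: str) -> bool:
    """Return True (block) if any blocklisted phrase appears verbatim."""
    text = prompt.lower()
    return any(phrase in text for phrase in BLOCKLIST)
```

A blunt request trips the filter, but a verbose academic paraphrase that never uses the exact phrases sails through unchanged in intent, which is precisely the surface-level gap the researchers describe.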

The study notes that AI models often misinterpret long, technical prompts as legitimate academic or policy inquiries. “By rephrasing queries using a range of linguistic transformations,” the authors write, “an attacker can clean out perceivable harmful intent while still eliciting the desired response.” This highlights a key vulnerability in LLM design: the models respond differently not just to content, but to the way it is packaged.

The researchers believe that LLMs currently treat surface-level cues—such as formality, length, or citation style—as signals of trustworthiness, even if the actual content is harmful. One example from the study includes fake academic references to nonexistent arXiv papers and authors, strategically inserted to make a malicious query appear scholarly. These tactics not only mask the intent of the request but actively mislead the AI system into complying.
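The claim that models read formality, length, and citation style as trust signals can be made concrete with a toy scoring function. This is purely an analogy, real LLMs learn such biases implicitly rather than computing an explicit score, and the cue words, weights, and the placeholder arXiv identifier below are all invented for illustration.

```python
import re

def surface_trust_score(prompt: str) -> float:
    """Toy heuristic: reward length, citation-style strings, and formal
    vocabulary regardless of the prompt's actual intent (illustrative)."""
    score = 0.0
    score += min(len(prompt.split()) / 100, 1.0)                       # longer looks more serious
    score += 0.5 * len(re.findall(r"arXiv:\d{4}\.\d{4,5}", prompt))    # citation-shaped strings
    score += 0.2 * sum(w in prompt.lower() for w in ("heretofore", "paradigm", "epistemic"))
    return score
```

A short plain question scores near zero, while the same question dressed up with jargon and a fabricated citation scores far higher, mirroring how a nonexistent arXiv reference can make a malicious query appear scholarly.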

Industry Response and Research Implications

The paper, titled “InfoFlood: Jailbreaking Large Language Models with Information Overload,” was published as a preprint and has not yet been peer-reviewed. OpenAI and Meta did not respond to 404 Media’s requests for comment. A spokesperson for Google acknowledged the findings but stated that similar techniques had been seen before and that typical users were unlikely to encounter them in everyday use.

The researchers said they are preparing to share their findings directly with leading AI companies. “We’re preparing a courtesy disclosure package and will send it to the major model vendors this week to ensure their security teams see the findings directly,” they told 404 Media. They also proposed using InfoFlood itself as a tool for improving model robustness. By training AI systems on prompts generated using InfoFlood, developers could strengthen safety mechanisms to better identify and block adversarial queries in the future.

The study adds to a growing body of evidence that LLMs, despite significant safety investments, remain susceptible to sophisticated manipulation. It raises concerns about how widely these techniques could spread and what protections are needed to prevent misuse. As AI continues to play a larger role in education, research, and customer service, ensuring that these tools cannot be exploited for harmful purposes will remain a central challenge for the industry.