Cybersecurity researchers at LayerX have discovered an unusual way to bypass protection mechanisms in AI agents. The method, called BioShocking, shows that modern language models do not necessarily need to be hacked directly – they can be tricked into “playing a game” with alternative rules. Once inside this framing, the AI begins performing actions it would normally refuse, including leaking sensitive data.
The name references the video game BioShock, where the protagonist is gradually manipulated into accepting a false version of reality. Researchers found that a similar psychological framing can be applied to AI agents as well.
The attack starts innocently. An AI agent is directed to a specially crafted webpage and told it is entering a game with different rules. For example, it is informed that 2 + 2 is no longer 4, and that “incorrect” answers are considered correct. While absurd to humans, the model treats this as a new rule system. Gradually, it shifts from normal safety behavior to the imposed game logic, where security constraints appear irrelevant.
The agent is then instructed to retrieve a “secret code” from another page. In reality, this refers to real sensitive data such as passwords, cookies, authentication tokens, or SSH keys accessible through granted permissions.
In LayerX’s tests, a controlled GitHub repository was used where the “code” was actually real credentials. All tested agents retrieved and exfiltrated the data.
Notably, some models treated the action as a successful game achievement and even reacted positively after completing the task.
According to LayerX, the attack worked across several AI tools, including ChatGPT Atlas (OpenAI), Comet (Perplexity AI), Fellou, Genspark Browser, Sigma Browser, and the Claude Chrome extension.
Vendor responses varied. OpenAI fixed the issue in Atlas. Anthropic attempted a patch, but it did not fully resolve the vulnerability. Perplexity closed the report without changes, while several other vendors did not respond.
Researchers emphasize that the core issue is not a single bug but the architecture of AI agents themselves: they operate based on context, and that context can be manipulated.
This is why they recommend mandatory user confirmation before any sensitive operation, such as accessing emails, repositories, password managers, or cloud storage, as well as stricter permission controls.
Importantly, BioShocking is not a fundamentally new attack type. It is a variant of prompt injection, a well-known threat that has topped OWASP risk rankings for large language models for several years. In the OWASP Top 10 for LLM Applications 2025, prompt injection (LLM01) remains the leading risk category.
Even more concerning are benchmark results showing how risk scales with repetition: a 4.7% success rate per single attempt can grow to over 60% under repeated adaptive attacks.
The key takeaway is simple: the danger is not in a single attempt, but in persistence.
The BioShocking case once again highlights a fundamental shift in AI security. Modern attacks increasingly target not code, but perception – attempting to reshape what the model believes is real. And for now, the boundary between “game” and reality remains one of the most fragile points in AI systems.
All content provided on this website (https://wildinwest.com/) -including attachments, links, or referenced materials — is for informative and entertainment purposes only and should not be considered as financial advice. Third-party materials remain the property of their respective owners.


