
5 runtime signals for catching a compromised AI agent
In June 2025, Simon Willison, the engineer who coined the term “prompt injection,” published a warning that circulated widely through the security community. He called it the lethal trifecta — three capabilities that, when combined in a single AI agent, create a near-guaranteed path to exploitation through indirect prompt injection: access to private data; exposure to untrusted content; the ability to communicate externally.
The framing was sharp and useful. If your agent reads your email, ingests arbitrary web content, and can make outbound requests, an attacker who embeds malicious instructions anywhere in that content pipeline can direct the agent to exfiltrate your data without you ever ...