When Your AI Agent Is the Attack Surface: The Five Eyes Guidance on Agentic AI

WHY THIS GUIDANCE EXISTS NOW

The agencies that signed this document — CISA and NSA in the United States, the ASD's Australian Cyber Security Centre, Canada's CCCS, and the UK's NCSC — do not publish joint guidance on a whim. Coordinating a document across five national security bureaucracies takes months and requires a shared assessment that a threat is real, present, and not being adequately addressed by the private sector or individual national guidance. The fact that this document exists at all is the first signal worth paying attention to.

The second signal is the framing. The title is "Careful Adoption of Agentic AI Services" — not "How to Use AI Agents" or "AI Agent Best Practices." The word "careful" is doing specific work. The guidance explicitly states that increased autonomy amplifies the impact of design flaws, misconfigurations, and incomplete oversight. That sentence is not a warning about hypothetical future risk. It is a description of what is already happening in production deployments that these agencies have observed.

Agentic AI has moved faster than the security practices surrounding it. A year ago, deploying an AI agent meant giving it access to a few APIs with carefully scoped credentials. Today, production agent deployments routinely have access to email, calendars, codebases, internal documentation, customer records, and the ability to execute code. The attack surface has expanded dramatically. The security thinking has not kept pace. This guidance is an attempt to close that gap — or at least to name it clearly enough that organisations can no longer claim they weren't warned.

THE FIVE RISK CATEGORIES

The guidance organises its findings into five risk categories: privilege, design and configuration, behavioural, structural, and supply chain. Each maps to a distinct failure mode that the agencies have documented in real deployments. Privilege risk is the most straightforward: agents are routinely granted permissions far in excess of what their tasks require, creating a blast radius that extends far beyond any individual action the agent was intended to take. An agent that can read and send email, access file systems, and execute shell commands is not an email assistant with some extra features — it is a general-purpose autonomous system operating under a single set of credentials.

Behavioural risk is more subtle. Language models are non-deterministic systems whose outputs are sensitive to context in ways that are difficult to predict or audit. An agent that behaves correctly across a thousand test scenarios may behave differently when it encounters a prompt constructed to exploit the gap between its intended instructions and its actual decision-making process. Structural risk refers to the interaction effects between multiple agents in a pipeline — a chain of agents where the output of one becomes the input to the next creates compounding uncertainty at each step, and the overall system can behave in ways that none of the individual agents were designed to produce.

Supply chain risk in the agentic context deserves particular attention because it differs from traditional software supply chain risk in a crucial way. When a dependency in a software project is compromised, the compromise is typically stable and detectable — the malicious code does what it does consistently. When an AI model or an MCP server that provides tool access is compromised or manipulated, the effect can be dynamic, context-sensitive, and specifically designed to evade detection by appearing as normal model behaviour. The guidance flags this as an underappreciated risk in current deployment practices, and it is correct.

THE PROMPT INJECTION PROBLEM

The guidance identifies prompt injection as the most persistent and hardest-to-fix threat in agentic AI deployments, and it is worth spending time on why. Prompt injection occurs when an attacker embeds instructions inside content that an agent is expected to process — a document it summarises, a webpage it browses, an email it reads, a database record it retrieves — and those instructions cause the agent to take actions outside its intended scope. The attack is elegant in its simplicity: you do not need access to the model, the system prompt, or the deployment infrastructure. You just need to get malicious content into the agent's input stream.

The reason this is so hard to fix is structural. Language models process all tokens in their context window with the same basic mechanism, regardless of whether those tokens come from a trusted system prompt, a user message, or an untrusted external document. There is no hardware separation between "instructions" and "data" analogous to the separation in traditional computing that prevents code injection attacks. Defences exist — careful formatting, XML tags, explicit instruction hierarchy — but none of them are reliable against a sophisticated adversary who knows the model architecture and the system prompt structure. The guidance acknowledges this directly, noting that language models cannot reliably distinguish between legitimate instructions and malicious data embedded in their inputs.

The practical implication for anyone building with agents is that external content must be treated as adversarial by default. An agent that browses the web, reads customer emails, or processes uploaded documents should be assumed to encounter prompt injection attempts as a matter of course, not as an edge case. The defence is not a better prompt — it is architectural: minimise the actions an agent can take based on external content, require explicit confirmation for high-consequence actions, and design agent permissions around the principle that any single injected instruction should be unable to cause catastrophic or irreversible harm.

PRIVILEGE AND THE BLAST RADIUS PROBLEM

The least-privilege principle is one of the oldest concepts in information security: systems should operate with the minimum permissions required to perform their function. It predates the web, predates modern operating systems, and has been consensus security practice for decades. It has been almost universally ignored in agentic AI deployments. The guidance makes this observation with more diplomatic language, but the substance is the same: agents are routinely over-privileged, and the organisations deploying them have not thought carefully about what happens when an agent with broad access behaves unexpectedly.

The blast radius problem is the consequence. When an agent has write access to a production database, the ability to send email as a real employee, and credentials for third-party services, the scope of potential harm from a single bad decision — whether caused by a model error, a prompt injection, or a misconfigured tool — is enormous. The traditional security model of "break glass" access, where high-privilege actions require explicit human authorisation, has not been systematically applied to agent deployments. Agents frequently have standing access to capabilities that a human employee would only access after explicit approval workflows.

The guidance recommends incremental adoption: start with clearly defined, low-risk tasks with minimal permissions, assess the agent's behaviour against evolving threat models, and expand scope only after building the monitoring and control infrastructure to manage it safely. This is sensible advice that runs directly counter to the incentives of most AI vendor sales cycles, which emphasise breadth of integration as a feature. Teams evaluating agentic AI products should be specifically sceptical of integrations that request broad permissions upfront, and should treat "seamless access to all your systems" as a risk factor rather than a selling point.

WHAT TO ACTUALLY DO

The guidance folds its recommendations into existing security frameworks rather than proposing new ones, which is both practical and correct. Least-privilege, defence-in-depth, zero-trust network design, and software bill of materials practices all apply directly to agentic AI deployments. The translation is not automatic — the specific ways these principles manifest in an agent context are different from how they manifest in traditional software — but the frameworks are sound and the teams that have invested in them are starting from a better position.

Concretely: scope agent credentials to the minimum required for each task and rotate them frequently. Log every tool call an agent makes, including the input it received and the action it took, at a level of detail that allows post-incident forensic reconstruction. Treat external content as untrusted input and design confirmation requirements for any agent action that touches production systems, sends external communications, or modifies persistent data. Test agent pipelines against adversarial inputs, not just representative ones — red-team your own agents with prompt injection payloads before an attacker does. And implement circuit breakers: explicit limits on how many actions an agent can take, how many tokens of external content it can process, and what classes of action it can take autonomously versus with human confirmation.

The deeper point the guidance makes — and the one most worth sitting with — is that the speed of agentic AI adoption has outpaced the security practices required to make it safe. Most organisations that have deployed AI agents have done so by granting access first and thinking about security second, if at all. The Five Eyes document is a formal acknowledgement by the world's most capable intelligence agencies that this is a problem they are already observing consequences of. The question is not whether to take agent security seriously. The question is whether you address it before or after something goes wrong.