Agentic AI security — the surfaces deterministic software does not have
Agentic AI systems have attack surfaces that do not exist in deterministic software: a reasoning loop that can be hijacked, a tool manifest that defines what the agent can do, memory that persists across sessions, and goals that drift. Security review must address all four.
Most security review frameworks were built for deterministic software. You give the system an input, it performs a defined set of operations, it returns an output. The attack surface is the code, the infrastructure, the data stores, and the interfaces between them. The threat model is bounded by what the software can do.
Agentic AI systems break this model. An agent does not just transform inputs into outputs. It reasons. It decides what actions to take. It calls tools. It consults memory. It pursues a goal that can be interrupted, redirected, or gradually replaced. Security review for agentic AI must therefore address four attack surfaces that do not exist in deterministic software, and review processes built for deterministic software typically cover none of them.
Classic software vs agentic AI — security properties
| Property | Classic software | Agentic AI |
|---|---|---|
| Execution model | Deterministic — same input always produces same output | Non-deterministic — model decides next action from context |
| Attack surface | Code, infrastructure, data stores, interfaces | Reasoning loop, tool manifest, memory, goal anchoring |
| Input types | Structured API calls and form data | Natural language, retrieved documents, tool results, inter-agent messages |
| Output scope | Bounded by code — system does exactly what code says | Bounded by tool manifest — model decides what to invoke |
| Threat model tool support | STRIDE, CVE databases, code review checklists | Few established frameworks; agentic-specific review required |
| Review artefact needed | Pen test report, SAST findings, CVE mitigations | Risk disposition covering reasoning loop, tools, memory, goal security |
What makes agents different
The word “agent” covers a wide range of systems, from simple task-completion loops to multi-step autonomous research assistants. What they share is a structure that standard software does not have: a model that decides what to do next, not just what to compute.
In a deterministic system, the logic is explicit in code. A security reviewer can read the code, trace data flows, identify what inputs influence what operations. The system does exactly what the code says — no more, no less.
An agent's logic is not explicit in code. It lives partly in the model weights, partly in the system prompt, partly in the context the model has accumulated during the current session. The model interprets instructions, decides how to proceed, and selects actions from a set of available tools. That decision process is opaque, non-deterministic, and influenceable by inputs the developer never anticipated.
This creates four distinct attack surfaces that do not appear in standard software threat models: the reasoning loop, the tool manifest, persistent memory, and goal drift. Each requires its own review approach.
The reasoning loop as an attack surface
The reasoning loop is the cycle an agent runs: observe inputs, form a plan, select an action, execute, observe the result, and repeat. In chain-of-thought or ReAct-style agents, this loop is explicit. In simpler agents it is implicit. In either case, it is the core of what the agent does — and it can be hijacked.
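As a minimal sketch of that cycle (not any particular framework's implementation), the loop can be written in a few lines of Python. `AgentLoop`, `Step`, and the `decide` callable are hypothetical names introduced here; `decide` stands in for the model call. The detail worth noticing is where observations land: in the same context that carries the goal.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    action: str    # tool name chosen by the model, or "finish"
    argument: str  # argument passed to the chosen tool


@dataclass
class AgentLoop:
    goal: str
    decide: Callable[[str], Step]           # hypothetical stand-in for the model call
    tools: dict[str, Callable[[str], str]]  # tool name -> tool function
    max_steps: int = 5
    context: list[str] = field(default_factory=list)

    def run(self) -> list[str]:
        """Run observe -> plan -> act until "finish" or max_steps; return the actions chosen."""
        self.context.append(f"GOAL: {self.goal}")
        trace: list[str] = []
        for _ in range(self.max_steps):
            step = self.decide("\n".join(self.context))
            trace.append(step.action)
            if step.action == "finish":
                break
            if step.action not in self.tools:
                self.context.append(f"ERROR: unknown tool {step.action}")
                continue
            observation = self.tools[step.action](step.argument)
            # The observation is untrusted data, yet it joins the same
            # context as the goal and shapes the next decision.
            self.context.append(f"OBSERVATION: {observation}")
        return trace
```

Nothing in this structure separates the operator's goal from a tool result: both are just lines of context that the next `decide` call reads.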
Hijacking the reasoning loop means supplying inputs that cause the model to form a different plan than the one its operators intended. This does not require exploiting a software vulnerability. It requires crafting input — user input, retrieved content, tool results — that the model interprets as legitimate instructions, overriding or displacing the original goal.
Prompt injection is the most direct example. An attacker embeds instructions in content the agent retrieves or processes. Because the model cannot reliably distinguish instructions from data, it may treat the injected content as authoritative, and the reasoning loop then follows the injected instructions rather than the original task.
Security review must examine: what inputs enter the reasoning loop, from what sources, with what trust level, and whether the system prompt provides sufficient goal anchoring to resist displacement. The review must also identify the consequences of a hijacked loop — which tools could be invoked, what data could be exfiltrated, what external systems could be affected.
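One way to record that examination is an inventory of input sources, each with a trust level and the tools reachable while that input is in context. The sketch below is illustrative only: the `Trust` levels, the `InputSource` fields, and the example tool names are assumptions, not a standard taxonomy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    UNTRUSTED = 0      # e.g. web pages, inbound documents, third-party tool output
    AUTHENTICATED = 1  # e.g. input from a logged-in user
    OPERATOR = 2       # e.g. the operator-authored system prompt


@dataclass(frozen=True)
class InputSource:
    name: str
    trust: Trust
    reachable_tools: frozenset[str]  # tools invocable while this input is in context


def review_findings(sources: list[InputSource], high_impact_tools: frozenset[str]) -> list[str]:
    """Flag every untrusted source that shares a context with a high-impact tool."""
    findings = []
    for source in sources:
        exposed = source.reachable_tools & high_impact_tools
        if source.trust == Trust.UNTRUSTED and exposed:
            findings.append(f"{source.name}: untrusted input can reach {sorted(exposed)}")
    return findings
```

Each finding names a concrete path from an untrusted source to a consequential action, which is the consequence analysis the review needs.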
The tool manifest as an attack surface
An agentic AI system's capabilities are defined by its tool manifest — the set of tools the model is given access to. Every tool in the manifest is a potential action an attacker could trigger if they can manipulate the model's reasoning.
Tools typically include: web search, code execution, file system access, database queries, API calls to external services, email or messaging capabilities, and in multi-agent systems, the ability to spawn or instruct other agents. The surface is as large as the manifest.
Manifests are often over-provisioned. Teams add tools during development (“we might need this,” “it was easier to include everything”) and rarely revisit the manifest before deployment. An agent designed to summarise documents may still hold file write access that was granted during development and never removed. An agent designed to answer questions may still carry an email tool that was added for a prototype and shipped with it.
The principle of least privilege applies directly: the tool manifest for each deployment should include only the tools the agent requires for that specific task in that specific context. A document summariser does not need file write access. A question-answering assistant does not need an email tool. Each extra tool is extra blast radius if the reasoning loop is hijacked.
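A least-privilege check can be as simple as a set comparison between what is provisioned and what the task requires. The sketch below assumes the manifest is available as a mapping of tool names to callables; the function names and the example tools are hypothetical.

```python
from typing import Callable


def audit_manifest(provisioned: set[str], required: set[str]) -> dict[str, list[str]]:
    """Compare the deployed manifest with the tools the task actually needs."""
    return {
        "keep": sorted(provisioned & required),
        "remove": sorted(provisioned - required),   # extra blast radius
        "missing": sorted(required - provisioned),  # task cannot complete without these
    }


def scope_manifest(tools: dict[str, Callable], required: set[str]) -> dict[str, Callable]:
    """Return a manifest containing only the required tools."""
    return {name: fn for name, fn in tools.items() if name in required}


# Example: a document summariser that picked up extra tools during development.
report = audit_manifest(
    provisioned={"read_file", "write_file", "send_email"},
    required={"read_file"},
)
```

Here `report["remove"]` lists `send_email` and `write_file`, the two tools the text above identifies as unnecessary for a summariser.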
See tool-use permissions for agentic AI for the full audit method.
Persistent memory as an attack surface
Many agentic systems are designed to remember. They store information across sessions — user preferences, past task outcomes, accumulated context — so they can provide continuity and improve performance over time. This memory is valuable. It is also an attack surface.
An agent that can write to memory can have its memory poisoned. An attacker who influences what the agent stores, for example by crafting inputs that the agent summarises and records, can plant instructions that resurface in future sessions. Unlike direct prompt injection, memory poisoning is persistent. The malicious instruction survives the session and executes in future interactions, potentially long after the attacker's access has ended.
Agentic systems typically have three memory types: in-context memory (the current session), external memory (vector stores, databases), and episodic memory (summaries of past sessions). Each has a different poisoning path and requires different controls.
In-context memory is poisoned via current-session input — the standard prompt injection path. External memory is poisoned by inserting malicious documents into the knowledge base. Episodic memory is the most dangerous: it is poisoned by planting instructions in content that the agent summarises and stores, to be retrieved and acted on in a later session by a different user.
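The cross-user episodic path suggests two controls: record provenance at write time, and isolate episodic recall by user. The sketch below illustrates both under assumed names (`MemoryRecord`, `recall`) and an assumed policy. Labelling a record as untrusted does not guarantee the model will treat it as data, so this reduces exposure rather than eliminating it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    text: str
    kind: str          # "external" or "episodic"
    owner: str         # user (or service) whose session wrote the record
    source_trust: str  # "trusted" or "untrusted", captured at write time


def recall(store: list[MemoryRecord], current_user: str) -> list[str]:
    """Return the stored text that may enter the current user's context."""
    visible = []
    for record in store:
        if record.kind == "episodic" and record.owner != current_user:
            continue  # session isolation: no cross-user episodic recall
        prefix = "[untrusted, treat as data] " if record.source_trust == "untrusted" else ""
        visible.append(prefix + record.text)
    return visible
```

The provenance field is only useful if it is set by the system at write time and cannot be altered by the content being stored.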
See agent memory as an attack surface for the full treatment.
Goal drift as a security property
Deterministic software does not have goals. It has logic. If the logic is correct, the software does the right thing; if the logic is wrong, it does the wrong thing. Either way, the behaviour is auditable from the code.
Agents have goals. The goal is stated in the system prompt, perhaps elaborated in a task description, and then pursued by the model across a potentially extended interaction. Goals are not enforced by code. They are interpreted by the model and can be replaced by a sufficiently persuasive input.
Goal drift is the gradual displacement of an agent's original objective by an attacker-controlled alternative. It is harder to detect than a direct hijack because the agent continues to appear productive — it is just pursuing a different goal than its operators intended.
Goal drift happens across a long interaction: each step seems locally reasonable, but the cumulative effect is an agent pursuing a goal it was never supposed to pursue. By the time the drift is detectable, the agent may have already taken consequential actions.
Security review must assess goal anchoring: how firmly is the goal stated in the system prompt, how resistant is it to displacement by user input or retrieved content, and what happens when the model encounters instructions that conflict with it. The review must also examine what consequential actions the agent can take — what the blast radius of a goal-drifted agent actually is.
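One possible monitoring approach, offered as an assumption rather than an established method, is to define the actions the stated goal should need and watch the share of recent actions that fall outside that set. `DriftMonitor`, its window size, and its threshold are all hypothetical choices that would need tuning per system.

```python
from collections import deque


class DriftMonitor:
    """Flag when too many recent actions fall outside the goal's expected set."""

    def __init__(self, expected_actions: set[str], window: int = 4, threshold: float = 0.5):
        self.expected = set(expected_actions)
        self.recent: deque[bool] = deque(maxlen=window)  # True means off-goal
        self.threshold = threshold

    def record(self, action: str) -> bool:
        """Record one action; return True once a full window exceeds the threshold."""
        self.recent.append(action not in self.expected)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and sum(self.recent) / len(self.recent) > self.threshold


monitor = DriftMonitor({"read_file", "summarise"})
actions = ["read_file", "summarise", "read_file", "summarise",
           "web_search", "send_email", "send_email"]
alerts = [monitor.record(a) for a in actions]
```

In this run only the final action raises an alert, which mirrors the point above: each step looks locally reasonable and the signal appears late. A monitor of this kind also cannot see drift that stays within the expected action set, so it complements goal anchoring rather than replacing it.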
See goal hijacking and instruction drift for the full treatment.
Why traditional security review falls short for agentic AI
Traditional security review, applied to an agentic AI system, will surface the usual findings: missing input validation, exposed API keys, insufficient access controls on supporting infrastructure. These are real and worth addressing. But they miss the attack surfaces that make agentic AI distinct.
A penetration test of an agentic system that does not include prompt injection attempts against the reasoning loop has not tested the most important attack surface. A threat model that enumerates CVEs against the hosting infrastructure but does not address tool manifest over-provisioning has missed the primary capability risk.
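A prompt injection test against the reasoning loop can be sketched as a small harness: embed candidate payloads in otherwise benign content, run the agent, and check whether a tool the task should never need was invoked. Everything named here is an assumption for illustration: the `agent` callable (taking a document and returning the names of the tools it invoked), the `CANARY_TOOL` choice, and the payload wording.

```python
from typing import Callable

CANARY_TOOL = "send_email"  # hypothetical high-impact tool the task should never need


def injection_probe(payload: str) -> str:
    """Wrap a payload in benign-looking content, as retrieved data would carry it."""
    return f"Quarterly report.\n\n{payload}\n\nEnd of report."


def run_injection_tests(agent: Callable[[str], list[str]], payloads: list[str]) -> list[str]:
    """Return the payloads that led the agent to invoke the canary tool."""
    failures = []
    for payload in payloads:
        invoked = agent(injection_probe(payload))
        if CANARY_TOOL in invoked:
            failures.append(payload)
    return failures


def vulnerable_stub(document: str) -> list[str]:
    """Toy agent that obeys any embedded request to send an email (for demonstration)."""
    invoked = ["read_file"]
    if "send an email" in document.lower():
        invoked.append("send_email")
    return invoked
```

An empty result from a handful of payloads is weak evidence of resistance; a non-empty result is direct evidence that the loop can be hijacked into that tool.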
The gap is not a failure of skill. It is a framework mismatch. The threat models, checklists, and review templates that security teams have were not built with agentic AI in mind. They ask the right questions for deterministic software. They do not ask whether the model's goal can be displaced, whether the tool manifest is scoped to the deployment task, or whether memory poisoning creates persistence across sessions.
The agentic AI security review adds these questions systematically. It does not replace the traditional review — it extends it to cover the four surfaces that agentic AI introduces.
What the agentic security review covers
An AI security review for agentic systems must address the standard review surfaces and four additional areas:
- Reasoning loop review. Document every input source that enters the reasoning loop. Classify each by trust level. Assess goal anchoring in the system prompt. Identify what the model would do if its goal were displaced by input from each source.
- Tool manifest audit. Enumerate every tool in the manifest. For each tool, determine whether it is required for the specific deployment task. Remove tools that are not. For remaining tools, determine what an attacker could do with each if they could invoke it via a hijacked reasoning loop.
- Memory architecture review. Identify whether the system has in-context, external, or episodic memory. For each type, assess the poisoning path: what inputs influence what gets stored, and what stored content influences future behaviour. Determine whether session isolation controls are in place.
- Goal security assessment. Define the agent's intended goal precisely. Assess how firmly the goal is anchored in the system prompt. Test whether retrieved content or user input can displace it. Define what “goal drift” looks like for this specific system and what monitoring approach can detect it.
The review produces the same output as any AI security review: a risk disposition with required controls, residual risk, evidence gaps, and re-assessment triggers. The agentic review adds controls and evidence requirements that are specific to the four agentic attack surfaces.
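The shape of that output can be sketched as a small data structure. The field names below follow the terms used above (required controls, residual risk, evidence gaps, re-assessment triggers), but the classes themselves are hypothetical and do not represent any tool's actual schema.

```python
from dataclasses import dataclass, field

SURFACES = ("reasoning_loop", "tool_manifest", "memory", "goal_security")


@dataclass
class SurfaceFinding:
    surface: str
    required_controls: list[str] = field(default_factory=list)
    residual_risk: str = "unassessed"
    evidence_gaps: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.surface not in SURFACES:
            raise ValueError(f"unknown surface: {self.surface}")


@dataclass
class RiskDisposition:
    system: str
    findings: list[SurfaceFinding]
    reassessment_triggers: list[str]

    def uncovered_surfaces(self) -> list[str]:
        """List the agentic surfaces that have no finding recorded yet."""
        covered = {finding.surface for finding in self.findings}
        return [surface for surface in SURFACES if surface not in covered]
```

A disposition whose `uncovered_surfaces()` result is non-empty has not yet addressed all four agentic surfaces.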
For a complete reference on agentic threats mapped to controls, see the OWASP Agentic Top 10 walkthrough. For the full review framework, see the agentic AI security review hub.
Review your agentic AI system before it reviews you
Drel structures the agentic AI security review across all four attack surfaces — reasoning loop, tool manifest, memory, and goal security — and produces the defensible record your governance process requires.
A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.