
Security review for multi-agent systems

When agents orchestrate other agents, every trust assumption in the single-agent model multiplies. This piece defines the additional review surface for multi-agent systems: inter-agent trust, capability delegation, and blast-radius containment.

Drel Research · 13 min read

A single-agent system has a bounded attack surface. Its threat model covers one reasoning loop, one tool manifest, one memory architecture, and one set of trust assumptions. Reviewing it is a well-defined problem.

A multi-agent system multiplies each of these. An orchestrator delegates to workers. Workers return results. Results become inputs to the orchestrator's next reasoning step. Each communication boundary introduces new trust questions. Each delegation introduces new capability scope questions. The review surface is not one agent — it is the graph of agents and the edges between them.

Multi-agent trust boundary levels

Orchestrator trust boundary

Verify that worker return values are treated as data, not instructions. Confirm that the orchestrator validates result structure against expected sub-task output format before incorporating results into its planning. Any content from workers that resembles instructions must be treated as untrusted.
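
A minimal sketch of the structural check, assuming each sub-task type declares its expected output as a mapping of field name to Python type. `EXPECTED_SCHEMAS`, `validate_result` and the `research_summary` task type are hypothetical illustrations, not part of any particular framework.

```python
from typing import Any

# Hypothetical per-sub-task output schemas: field name -> required Python type.
EXPECTED_SCHEMAS: dict[str, dict[str, type]] = {
    "research_summary": {"title": str, "summary": str, "sources": list},
}


class ResultValidationError(ValueError):
    """Raised when a worker result does not match its sub-task's schema."""


def validate_result(task_type: str, result: Any) -> dict[str, Any]:
    """Check a worker result against the schema for its assigned sub-task.

    Returns the result unchanged if it conforms; raises otherwise. This is a
    structural check only: it does not judge whether string fields contain
    instruction-like text.
    """
    schema = EXPECTED_SCHEMAS.get(task_type)
    if schema is None:
        raise ResultValidationError(f"no schema for task type {task_type!r}")
    if not isinstance(result, dict):
        raise ResultValidationError("result must be a JSON object")
    unexpected = set(result) - set(schema)
    missing = set(schema) - set(result)
    if unexpected or missing:
        raise ResultValidationError(
            f"unexpected={sorted(unexpected)} missing={sorted(missing)}"
        )
    for field, expected_type in schema.items():
        if not isinstance(result[field], expected_type):
            raise ResultValidationError(
                f"field {field!r} is not {expected_type.__name__}"
            )
    return result
```

Rejecting unexpected fields is the important design choice: an extra key such as `next_step` is exactly where injected instructions would try to ride into the orchestrator's planning.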

Sub-agent trust boundary

Verify that delegated capabilities are scoped to the assigned sub-task only. Confirm that worker agents do not retain delegated credentials beyond the task boundary. Check that each worker can refuse instructions that exceed its declared task scope, regardless of claimed source.
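
The refusal check can be sketched as a scope object fixed at task assignment, assuming the worker's runtime (not the model) mediates every tool request. `TaskScope`, `OutOfScope` and `check_request` are hypothetical names. The claimed source is deliberately ignored for the decision: a message asserting it comes from the orchestrator gains no extra permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskScope:
    """Declared scope for one sub-task, fixed when the task is assigned."""
    task_id: str
    allowed_tools: frozenset[str]


class OutOfScope(PermissionError):
    """Raised when a request falls outside the declared task scope."""


def check_request(scope: TaskScope, tool: str, claimed_source: str) -> None:
    """Refuse any tool request outside the declared task scope.

    `claimed_source` only appears in the refusal message for audit purposes;
    it never widens the scope, whatever it says.
    """
    if tool not in scope.allowed_tools:
        raise OutOfScope(
            f"task {scope.task_id}: {tool!r} not in scope "
            f"(request claimed source {claimed_source!r})"
        )
```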

Tool execution trust boundary

Verify that every tool call enforces authorization at the infrastructure layer, independent of which agent issued it. Confirm that tools cannot be invoked with parameters outside their permitted scope regardless of the instruction source — model-layer authorization is not sufficient here.
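
A minimal sketch of infrastructure-layer enforcement, assuming a gateway sits between all agents and the tools and looks up permissions by the credential presented, not by anything the calling agent asserts. The policy format, the `cred-research-7` credential, the `read_file` tool and the path-prefix constraint are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical policy: credential id -> tool name -> permitted path prefix.
POLICY: dict[str, dict[str, str]] = {
    "cred-research-7": {"read_file": "/data/research"},
}


def authorize_tool_call(credential_id: str, tool: str, params: dict) -> bool:
    """Return True only if this credential may call `tool` with `params`.

    Which agent issued the call, and any justification in the request,
    play no part in the decision.
    """
    prefix = POLICY.get(credential_id, {}).get(tool)
    if prefix is None:
        return False
    path = params.get("path")
    if not isinstance(path, str):
        return False
    # PurePosixPath works on the string only (no filesystem access) and keeps
    # ".." segments, so reject them explicitly before the prefix comparison.
    candidate = PurePosixPath(path)
    if ".." in candidate.parts:
        return False
    return candidate.is_relative_to(prefix)
```

This is a lexical check only; a real gateway would also need to resolve symlinks on the target system before trusting the path.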

Each boundary must be reviewed separately. A control at the orchestrator boundary does not substitute for a control at the tool execution boundary.

Why multi-agent systems require a different review

Multi-agent systems are built for the same reason microservices are built: to decompose complexity. A task that is too complex for a single agent, or too slow when done sequentially, is broken into parallel sub-tasks handled by specialised workers. The orchestrator coordinates; the workers execute.

This decomposition is valuable. It is also where standard security review frameworks fail to provide adequate guidance. Most agentic AI security frameworks address the single-agent model: one system prompt, one tool manifest, one session. The multi-agent model introduces:

  • Multiple trust boundaries — each inter-agent communication crosses a trust boundary that must be explicitly modelled
  • Distributed capability — the system's total capability is the union of all agents' tool manifests, not any single manifest
  • Cascade risk — a compromise at one point in the graph can propagate to all other points, depending on the trust model
  • Audit complexity — attributing an action to a specific causal chain is harder when multiple agents contributed to it
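
One way to keep attribution tractable is to give every agent action an event record that names its parent event, so any tool call can be walked back to the originating user request. The record shape and the `causal_chain` helper below are a hypothetical sketch, not a standard log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    event_id: str
    agent: str                # which agent performed this step
    action: str               # e.g. "dispatch", "tool_call", "return"
    parent_id: Optional[str]  # event that caused this one; None for the user request


def causal_chain(events: dict[str, AuditEvent], event_id: str) -> list[str]:
    """Walk parent links from an event back to its root, returning
    'agent:action' labels ordered from the root cause to the event."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = event_id
    while current is not None:
        if current in seen:  # guard against a corrupted, cyclic log
            raise ValueError(f"cycle at {current}")
        seen.add(current)
        event = events[current]
        chain.append(f"{event.agent}:{event.action}")
        current = event.parent_id
    return list(reversed(chain))
```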

Trust between agents

The most common security mistake in multi-agent systems is applying an implicit trust model: agents trust each other because they are “in the same system.” This is the same mistake as trusting all traffic within a flat network perimeter, and it has the same consequences.

In a correct trust model for multi-agent systems, every inter-agent message is an untrusted input until verified. A worker agent that receives a message claiming to come from the orchestrator should not act on that claim without verification. A message claiming authorization to perform a sensitive action should not be sufficient authorization for that action.
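
Verifying a claimed source can be sketched with a message authentication code, assuming the orchestrator and each worker share a per-pair secret provisioned outside the agents' context windows. This uses Python's standard `hmac` module; the envelope format and key handling are assumptions, and a production system might use asymmetric signatures or workload identities instead. A valid tag only establishes who sent the message; it does not by itself authorize a sensitive action, and this sketch does not address replay.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Stable serialisation so sender and receiver MAC identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_message(key: bytes, payload: dict) -> dict:
    """Wrap a payload with an HMAC-SHA256 tag over its canonical JSON form."""
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_message(key: bytes, envelope: dict) -> dict:
    """Return the payload if the tag matches; raise otherwise.

    The sender is established by the key, never by the payload's own claims
    (such as a "from": "orchestrator" field).
    """
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise PermissionError("message failed verification")
    return envelope["payload"]
```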

Agent-to-agent trust is not binary (fully trusted or untrusted). The correct model assigns different trust levels to different types of inter-agent message: instructions from the orchestrator carry a different trust level from return values from a worker, which in turn differ from data a worker retrieved from an external source.
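
The graded model can be written down as an explicit table from message type to permitted uses, which makes it reviewable. The three message types follow the paragraph above; the use labels (`direct_action`, `inform_planning`, `quote_as_data`) are hypothetical names chosen for illustration.

```python
from enum import Enum


class MessageType(Enum):
    ORCHESTRATOR_INSTRUCTION = "orchestrator_instruction"
    WORKER_RESULT = "worker_result"
    EXTERNAL_CONTENT = "external_content"  # retrieved by a worker from outside the system


# Hypothetical trust table: what each message type may be used for.
PERMITTED_USES: dict[MessageType, frozenset[str]] = {
    # Verified orchestrator instructions may direct in-scope actions.
    MessageType.ORCHESTRATOR_INSTRUCTION: frozenset(
        {"direct_action", "inform_planning", "quote_as_data"}
    ),
    # Worker results may inform planning as data but never direct actions.
    MessageType.WORKER_RESULT: frozenset({"inform_planning", "quote_as_data"}),
    # External content is only ever quoted as data.
    MessageType.EXTERNAL_CONTENT: frozenset({"quote_as_data"}),
}


def is_permitted(message_type: MessageType, use: str) -> bool:
    return use in PERMITTED_USES[message_type]
```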

The practical implication: a worker agent may treat verified instructions from the orchestrator as authoritative within its task scope, but it must also enforce limits on what it will do regardless of those instructions. A compromised orchestrator should not be able to instruct a worker to delete production data; that capability should not exist in the worker's manifest for orchestrator-sourced instructions.

Similarly, the orchestrator must treat worker return values as data, not as instructions. A worker compromised through prompt injection in the content it retrieved to complete its sub-task may return a message to the orchestrator that contains injected instructions. The orchestrator must therefore parse worker outputs as data and never execute them as instructions.
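
A sketch of how an orchestrator might place a worker result into its next planning input as data: the result travels as a JSON value under a labelled field rather than being concatenated as prose, and a crude pattern match flags instruction-like text for review. The `SUSPICIOUS` pattern list and `build_planner_input` are assumptions; delimiting and keyword matching reduce, but do not eliminate, injection risk, so they supplement structural validation and infrastructure-layer authorization rather than replace them.

```python
import json
import re

# Hypothetical, deliberately small list of instruction-like patterns.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|you must now|system prompt",
    re.IGNORECASE,
)


def build_planner_input(task_id: str, result: dict) -> tuple[str, bool]:
    """Return (planner_input, flagged) for one worker result."""
    # Scan the serialised result so nested string fields are covered too.
    flagged = bool(SUSPICIOUS.search(json.dumps(result)))
    planner_input = json.dumps(
        {
            "task_id": task_id,
            # The worker's output is carried as a JSON value inside a labelled
            # data field; it is never spliced into the planner's instructions.
            "worker_result_data": result,
            "instruction_like": flagged,
        }
    )
    return planner_input, flagged
```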

Capability delegation

Capability delegation is what happens when an orchestrator passes some of its capabilities to worker agents so they can complete their assigned sub-tasks. Done correctly, delegation is minimal: the worker receives only the capabilities required for its specific sub-task. Done incorrectly, delegation is the primary amplifier of blast radius in multi-agent systems.

The incorrect delegation pattern: the orchestrator passes its full credential set, access token, or tool manifest to workers for simplicity. Each worker now has the same capabilities as the orchestrator. A compromised worker can do anything the orchestrator can do.

The correct delegation pattern: the orchestrator issues task-scoped credentials or task-scoped capability sets to workers. A worker assigned to “research recent publications on topic X” receives read access to research databases and nothing else. It does not receive the orchestrator's email access, its file write capabilities, or its administrative tool access.

Implementing minimal delegation requires the orchestrator to scope delegated credentials at task-dispatch time, not just at orchestrator-setup time. This is a design-time constraint that must be reviewed before deployment; it is difficult to retrofit once the system is built.
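
Dispatch-time scoping can be sketched as a function that mints a fresh credential per sub-task, containing only the capabilities that sub-task declares and only if the orchestrator itself holds them, with a short expiry. The capability strings, the 15-minute default TTL and `TaskCredential` are hypothetical; in a real deployment the minting would more plausibly live in a separate token service than in the orchestrator's own process.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    task_id: str
    capabilities: frozenset[str]
    expires_at: float  # Unix timestamp


def issue_task_credential(
    task_id: str,
    required: set[str],
    orchestrator_caps: frozenset[str],
    ttl_seconds: float = 900.0,
) -> TaskCredential:
    """Mint a credential scoped to one sub-task at dispatch time.

    Requesting a capability the orchestrator does not hold is an error,
    not a silent drop, so planning mistakes surface during review.
    """
    missing = set(required) - orchestrator_caps
    if missing:
        raise PermissionError(f"cannot delegate {sorted(missing)}")
    return TaskCredential(
        token=secrets.token_urlsafe(32),
        task_id=task_id,
        capabilities=frozenset(required),
        expires_at=time.time() + ttl_seconds,
    )
```

For the research example above, the worker's credential would carry only a read capability for the research database, even though the orchestrator also holds email, file-write and administrative capabilities.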

Blast radius in multi-agent systems

Blast radius in a single-agent system is bounded by the tool manifest. In a multi-agent system, blast radius has an additional dimension: the topology of the agent graph determines how far a compromise propagates.

In a star topology (one orchestrator, many workers, no worker-to-worker communication), a worker compromise can propagate to the orchestrator if the orchestrator trusts worker outputs, and from the orchestrator to every other worker. An orchestrator compromise propagates to all workers, since the orchestrator can issue instructions to all of them.

In a mesh topology — agents that can communicate with each other directly — a compromise at any node can propagate to all adjacent nodes, which can propagate to their adjacent nodes, and so on. The blast radius is potentially the entire mesh.

The review must map the agent graph topology and analyze the blast radius of a compromise at each node. For each node, the question is: if this agent's reasoning loop were hijacked, what is the maximum set of actions that could be taken, considering not just this agent's tool manifest but its ability to influence other agents in the graph?
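
Once the graph is documented, the per-node question can be answered mechanically. A minimal sketch, assuming the graph is recorded as directed "can influence" edges and each agent's tool manifest is known; the agent names, tools and edges below are illustrative, not drawn from any real system.

```python
from collections import deque

# Illustrative star topology. An edge A -> B means a compromised A can influence B's actions.
INFLUENCES: dict[str, set[str]] = {
    "orchestrator": {"researcher", "writer"},
    "researcher": {"orchestrator"},  # present because the orchestrator trusts its output
    "writer": set(),                 # writer output is validated as data: no upstream edge
}
MANIFESTS: dict[str, set[str]] = {
    "orchestrator": {"dispatch_task"},
    "researcher": {"web_search"},
    "writer": {"write_file", "send_email"},
}


def blast_radius(start: str) -> tuple[set[str], set[str]]:
    """Return (agents reachable from `start`, union of their tool manifests)."""
    reached = {start}
    queue = deque([start])
    while queue:  # breadth-first traversal of influence edges
        node = queue.popleft()
        for neighbour in INFLUENCES.get(node, set()):
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append(neighbour)
    tools: set[str] = set()
    for agent in reached:
        tools |= MANIFESTS.get(agent, set())
    return reached, tools
```

The researcher shows the cascade in miniature: its own manifest holds only `web_search`, but because the orchestrator trusts its output, a hijacked researcher can reach the writer's `send_email`. Removing the researcher-to-orchestrator edge, by treating its output strictly as data, shrinks its blast radius to itself.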

Orchestrator vulnerabilities

The orchestrator is the highest-value target in a multi-agent system. It receives the user's original task, plans the decomposition, dispatches to workers, and integrates results. A compromised orchestrator has the highest blast radius in the system.

Orchestrator-specific vulnerabilities:

  • Worker result injection — a compromised worker returns a result containing injected instructions that the orchestrator's reasoning loop processes and acts on
  • Task hijacking via planning — an attacker who can influence the orchestrator's planning step can insert additional sub-tasks that execute with the orchestrator's authority
  • Over-broad capability set — the orchestrator holds capabilities it delegates to workers, meaning it is both a capability hub and an attack target; reducing its capability set reduces the delegated blast radius
  • Insufficient result validation — the orchestrator trusts worker outputs as ground truth without validating that they are consistent with the assigned sub-task

Review controls for orchestrator security: the orchestrator should validate worker results against the expected sub-task output format before incorporating them; it should treat worker results as data, not instructions; its own tool manifest should be scoped to the minimum required for orchestration (which typically excludes the execution-level tools held by workers).
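
One control for task hijacking via planning can be sketched as a check between the planner and the dispatcher: every planned sub-task must be of a type permitted for the user's request, and must not ask for capabilities beyond that type's fixed grant. `ALLOWED_TASK_TYPES`, `PlannedSubTask` and the task types are hypothetical, and the sketch assumes the permitted types are derived from the original request outside the planner, so an injected sub-task cannot expand them.

```python
from dataclasses import dataclass

# Hypothetical mapping from sub-task type to the only capabilities it may receive.
ALLOWED_TASK_TYPES: dict[str, frozenset[str]] = {
    "literature_search": frozenset({"read:research_db"}),
    "draft_summary": frozenset({"write:drafts"}),
}


@dataclass(frozen=True)
class PlannedSubTask:
    task_type: str
    requested_caps: frozenset[str]


def validate_plan(plan: list[PlannedSubTask], permitted_types: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan may be dispatched."""
    problems = []
    for i, sub in enumerate(plan):
        if sub.task_type not in permitted_types or sub.task_type not in ALLOWED_TASK_TYPES:
            problems.append(f"step {i}: task type {sub.task_type!r} not permitted")
            continue
        extra = sub.requested_caps - ALLOWED_TASK_TYPES[sub.task_type]
        if extra:
            problems.append(f"step {i}: excess capabilities {sorted(extra)}")
    return problems
```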

Worker agent vulnerabilities

Worker agents face the same prompt injection risks as any agentic system, with an additional propagation risk: a compromised worker can attempt to inject instructions upstream to the orchestrator.

Worker-specific vulnerabilities:

  • External content injection — a worker that retrieves content from external sources (web, databases, APIs) is vulnerable to indirect prompt injection; the injected content may target the worker's own behavior or may be crafted to survive in the worker's result and target the orchestrator
  • Task scope expansion — a worker that receives ambiguous instructions from the orchestrator may interpret them expansively, invoking capabilities beyond what the sub-task requires
  • Peer-agent impersonation — in systems where workers can communicate with each other, a compromised worker can impersonate another worker to obtain data or trigger actions
  • Credential retention — a worker caches the delegated credentials it received for its sub-task and retains them beyond the task boundary, leaving them usable if the worker is later compromised
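
Because a worker cannot be relied on to forget a credential it has cached, a more robust control revokes the credential at the point of enforcement when the task ends. A minimal sketch, assuming a hypothetical `CredentialRegistry` that the tool gateway consults on every call and a `task_credential` context manager that the dispatcher wraps around each sub-task.

```python
import secrets
from contextlib import contextmanager
from typing import Iterator


class CredentialRegistry:
    """Hypothetical registry the tool gateway consults on every call."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # token -> task_id

    def issue(self, task_id: str) -> str:
        token = secrets.token_urlsafe(16)
        self._active[token] = task_id
        return token

    def revoke(self, token: str) -> None:
        self._active.pop(token, None)

    def is_valid(self, token: str) -> bool:
        return token in self._active


@contextmanager
def task_credential(registry: CredentialRegistry, task_id: str) -> Iterator[str]:
    """Issue a credential for the duration of one sub-task, then revoke it,
    even if the sub-task raises."""
    token = registry.issue(task_id)
    try:
        yield token
    finally:
        registry.revoke(token)
```

A copy the worker keeps after the task is then worthless, because validity is decided by the registry rather than by possession of the token.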

Review additions for multi-agent systems

A security review for a multi-agent system requires all elements of the single-agent review plus the following additional areas:

  1. Agent graph documentation. A complete map of all agents in the system, their roles, their communication paths, and the trust level assigned to each communication type. This is the foundation of all subsequent multi-agent review work.
  2. Inter-agent trust model review. For each communication edge in the graph, what trust level is assigned to messages from the source to the destination? How is the trust level enforced? What does each trust level permit?
  3. Capability delegation audit. For each orchestrator-to-worker delegation, what capabilities are delegated? Are they scoped to the sub-task? Is the delegation mechanism implemented at the application layer (task-scoped credentials) or only at the model layer (system prompt instructions to “stay in scope”)?
  4. Blast radius analysis. For each agent, map the maximum blast radius of a compromise: the actions directly available via the agent's tool manifest, plus the actions available via influenced agents in the graph.
  5. Result validation review. How does the orchestrator validate worker results before incorporating them into its planning and executing further actions? Is validation format-based only, or does it include semantic consistency checks?
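
Items 1 to 3 lend themselves to a machine-checkable record. A hypothetical sketch: each documented edge carries its trust level and enforcement mechanism, and a short check flags edges with no assigned trust level and delegations enforced only at the model layer. The field names and the `"application"`/`"model"` labels are assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    destination: str
    message_type: str           # e.g. "instruction", "result", "delegation"
    trust_level: Optional[str]  # None means the review has not assigned one
    enforcement: str            # "application" (e.g. scoped credential) or "model" (prompt only)


def review_findings(edges: list[Edge]) -> list[str]:
    """Flag undocumented trust levels and prompt-only delegation scoping."""
    findings = []
    for e in edges:
        label = f"{e.source}->{e.destination} ({e.message_type})"
        if e.trust_level is None:
            findings.append(f"{label}: no trust level assigned")
        if e.message_type == "delegation" and e.enforcement == "model":
            findings.append(f"{label}: delegation scoped only by prompt instructions")
    return findings
```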

These additions are structured within the agentic AI security review framework as the multi-agent extension. They apply when the system has more than one agent in its architecture, regardless of whether the second agent is a production component or a utility sub-agent (e.g., an agent that manages memory writes).

Review your multi-agent system before deployment

Drel structures the multi-agent security review — agent graph mapping, inter-agent trust model, capability delegation audit, and blast radius analysis — as part of the design-time agentic AI assessment.

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.