BlogTechnical

RAG security — the three boundaries that matter

Retrieval-augmented generation adds a retrieval layer between the user and the model. That layer has three security boundaries — the data boundary, the retrieval boundary, and the context boundary — and each has distinct failure modes.

Drel Research7 July 202410 min read

Retrieval-Augmented Generation has become the default pattern for any LLM application that needs to answer questions grounded in organisational knowledge. The architecture is straightforward: instead of relying on the model's training data, the system retrieves relevant documents from a knowledge base at query time and passes them to the model alongside the user's question. The model synthesises an answer from what it retrieved. Straightforward in concept, considerably more complex in security terms.

The security complexity comes from what RAG adds between the user and the model: a retrieval layer. That layer is not passive plumbing. It determines what information the model sees, under what access rules, and in what form. It is a layer that can be attacked from multiple directions — and one that most AI security reviews underspecify because they treat it as a data-engineering concern rather than a security boundary.

RAG security boundaries — trust assumptions and controls

Ingestion boundary

Trust assumption: all document sources are treated as partially untrusted. Any write path — including internal contributors, third-party feeds, and automated pipelines — must be explicitly authorised and provenance-tracked.

Control: Restrict write access to authorised sources only; record source, contributor, and ingestion timestamp for every document; run content validation at ingestion to detect out-of-scope or adversarially crafted content.

Retrieval boundary

Trust assumption: a user's authorisation to query the system does not imply authorisation to retrieve every document. Access control must be evaluated per-document at retrieval time, not just at query submission.

Control: Enforce access filters at the vector database layer before results are returned; propagate user identity and access attributes to the retrieval query; test with adversarial queries designed to retrieve out-of-scope documents.

Generation boundary

Trust assumption: retrieved content is data, not instructions. The model cannot reliably distinguish system prompt instructions from retrieved document content — prompt design must enforce this separation explicitly.

Control: Mark retrieved content as data in the prompt template; implement output validation for instruction-like patterns in retrieved content; test with adversarially crafted documents in the knowledge base.

Each boundary is independent. A control at the ingestion boundary does not compensate for a gap at the retrieval or generation boundary.

What RAG actually is

A RAG system has three major components: a knowledge base, a retrieval mechanism, and a generation model. The knowledge base holds the documents — internal policies, support articles, research documents, contract templates, whatever the organisation decides to make queryable. The retrieval mechanism takes the user's query, converts it to a vector embedding, and finds the documents in the knowledge base whose embeddings are most similar. The generation model receives the user's original query plus the retrieved documents and produces an answer.

From a security standpoint, the relevant property is this: the model's prompt is no longer a function of only the user's input and a fixed system prompt. It is also a function of whatever documents the retrieval mechanism returns. Those documents come from a system that other parties — potentially including adversaries — can write to. The model cannot distinguish between instructions in the system prompt and content in a retrieved document. It processes all of it as token sequences.

RAG adds a retrieval layer between the user and the model. That layer is not neutral. It is a trust boundary — one that determines what content the model sees, from what sources, with what access rules. Security review must treat it as such.

The three security boundaries in a RAG system are the data boundary (what enters the knowledge base), the retrieval boundary (what gets retrieved for a given query), and the context boundary (what from the retrieved content the model actually processes). Each boundary has distinct failure modes that do not automatically propagate from the others.

Three security boundaries

Security teams often approach RAG as a single-boundary problem: “secure the data” or “add access control.” That framing is incomplete. A RAG system has three distinct boundaries, and a failure at any one of them can result in a security incident even if the other two are correctly controlled.

The data boundary — the point at which content enters the knowledge base. What documents are allowed in? From what sources? With what provenance verification? Failures here allow malicious or unauthorised content to become part of the retrieval corpus.
The retrieval boundary — the point at which the retrieval mechanism decides which documents to return for a given query. What access control is enforced here? Can a query retrieve documents the querying user should not see? Failures here allow authorisation bypass through retrieval.
The context boundary — the point at which retrieved content is assembled with the system prompt and user query into the final model input. How is retrieved content marked as data rather than instructions? Failures here allow indirect prompt injection.

The three boundaries are independent. A well-controlled data boundary does not compensate for a weak retrieval boundary. Strong retrieval access control does not eliminate context-boundary injection risk. Each boundary must be reviewed separately, with its own control questions and evidence requirements.

Boundary 1 — the data boundary

The data boundary governs what enters the knowledge base. It is a write-side control: the threat model is not primarily about who reads from the knowledge base, but about who can cause content to appear in it — and therefore be retrieved into future prompts.

In assessed systems, the data boundary is typically the least formally controlled. Ingestion pipelines are built to be permissive — the goal is to get documents into the system quickly. Access controls on ingestion paths are often weaker than controls on query paths. Document sources are treated as implicitly trusted.

The security questions for the data boundary are:

Who has write access to the knowledge base, directly or transitively? This includes scheduled ingest jobs, human contributors, third-party connectors, and any process that can trigger an ingestion event.
Is document provenance recorded? Each document in the knowledge base should have a verifiable source record. Without provenance, there is no basis for differential trust between documents.
What classification or sensitivity labels are stored with documents? If every document is treated as equally trusted, a low-quality or adversarially crafted document can outrank an authoritative one in retrieval results.
What content validation runs at ingestion? At minimum, this should include structural validation (format, size, encoding), but for higher-risk deployments it should also include semantic checks against the intended scope of the knowledge base.

Boundary 2 — the retrieval boundary

The retrieval boundary governs what gets returned for a given query. It is an access-control problem: the retrieval mechanism must enforce the same document access rules that the application enforces at every other layer, but at the time of retrieval — not before, and not after.

The characteristic failure at this boundary is retrieval-time access control gap: the application checks whether a user can submit queries, but does not check whether the specific documents retrieved for that query are within the user's authorisation scope. A user who can query the system can craft queries that retrieve documents they would not be permitted to read directly.

Vector databases — the infrastructure that usually backs RAG knowledge bases — have variable support for row-level access control. Some support namespace isolation. Some support metadata filtering at query time. Many require the access control to be implemented at the application layer, outside the vector database itself. In any case, the retrieval boundary is only as strong as the access control mechanism that operates at retrieval time, not the one that operates at query submission time.

The security questions for the retrieval boundary are:

At what point in the retrieval pipeline is the querying user's identity and access scope enforced?
Can a user with query access retrieve documents they do not have explicit read access to? The test for this is adversarial: construct queries designed to retrieve out-of-scope content and verify what the system returns.
If the vector database supports multi-tenancy, are tenant namespaces strictly isolated? Can a query in one tenant's namespace retrieve documents from another's?
Does the reranker or any post-retrieval filter have access to the user's identity and access scope?

Boundary 3 — the context boundary

The context boundary governs what from the retrieved content actually reaches the model, and in what form. It is the boundary where retrieved documents transition from data to prompt tokens — and where the risk of indirect prompt injection lives.

The LLM cannot tell the difference between instructions it was given in the system prompt and content that arrived via retrieval. Both are token sequences in its context window. The context boundary controls rely on the prompt template, the model's instruction-following behaviour, and any output validation that follows generation.

Indirect prompt injection at the context boundary works as follows: an adversary places a document in the knowledge base that contains embedded instructions — for example, text that instructs the model to ignore its system prompt, to reveal the contents of other retrieved documents, or to append a specific URL to its response. That document is retrieved into a prompt where it mingles with trusted instructions. Depending on the model and the prompt design, the embedded instructions may be followed.

The context boundary is where retrieved content meets trusted instructions. A well-designed prompt template marks retrieved content as data and treats it accordingly. A poorly designed one gives retrieved content the same authority as the system prompt.

The security questions for the context boundary are:

Does the prompt template explicitly mark retrieved content as “context to answer from” rather than as instructions?
Is there output validation that checks whether the model's response contains content from outside the retrieval scope?
Is there a maximum context budget that limits how much retrieved content can displace system prompt instructions?
Has the system been tested with adversarially crafted documents in the knowledge base, and what was the result?

How failures cascade

The three boundaries are independent but not isolated. A failure at one boundary creates conditions that make failures at subsequent boundaries more likely or more severe.

A data boundary failure — malicious content in the knowledge base — is only exploitable if the retrieval boundary returns that content. A retrieval boundary failure — returning content the user should not see — is only exploitable if the context boundary renders it in the model's output. But each failure in the chain amplifies the downstream risk.

Consider a cascade: an attacker gains write access to an ingestion pipeline (data boundary failure) and plants a document that appears to be an authoritative internal policy. That document contains embedded instructions. A user whose access scope does not include that document queries the system; the retrieval access control is checked only at query submission, not at retrieval time (retrieval boundary failure). The document is retrieved and passed to the model as context. The context boundary has no output validation for instruction-like content in retrieved documents (context boundary failure). The model follows the embedded instructions.

No single boundary failure caused the incident. All three contributed. A security review that evaluates only one boundary will not catch this.

The review framework

A RAG security review has a specific structure that differs from a general LLM security review. The review must cover all three boundaries, with separate control questions and evidence requirements for each.

For the data boundary: document the ingestion paths, the provenance controls, the content classification scheme, and the content validation at ingestion. Evidence: the ingestion pipeline architecture, the access control policy for write paths, and the results of adversarial ingestion testing.

For the retrieval boundary: document the access control mechanism at retrieval time, the multi-tenancy isolation model, and the test results for retrieval-scope adversarial queries. Evidence: the retrieval access control implementation, the namespace configuration, and the test results.

For the context boundary: document the prompt template design (how retrieved content is labelled), the output validation controls, and the test results for indirect injection via crafted documents. Evidence: the prompt template, the output validation implementation, and the injection test results.

The review produces a disposition that covers all three boundaries independently — not a single disposition for “the RAG system.” Control gaps at each boundary are named, evidenced, and assigned required remediation before the system proceeds. See the Drel RAG security assessment hub for the full review framework with worked examples.

Blog

Get new posts in your inbox

AI security review, OWASP Agentic Top 10, ISO 42001 evidence, and what AI Committees actually need. No cadence promises — we publish when there's something worth reading.

Review all three RAG boundaries

Drel structures a RAG security review across the data boundary, retrieval boundary, and context boundary — producing a disposition that covers each independently.

Request early access See the demo dossier

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.