BlogTechnical

Context-window risks in RAG and how to bound them

The context window is the shared space where user queries and retrieved documents meet the model. Anything in that window can influence model outputs. Context-window risks in RAG are about what gets in — and how to bound it.

Drel Research23 November 202510 min read

The context window is the bounded space in which an LLM processes everything it needs to produce a response: the system prompt, the retrieved documents, and the user's query. In a RAG system, this space is not a clean, controlled input — it is assembled at request time from multiple sources with different trust levels, different owners, and different intended purposes. The assembly process creates risks that are specific to the context window and that do not exist in simpler LLM deployments.

Context-window risks in RAG are underspecified in most security reviews because they are treated as a consequence of retrieval quality rather than a distinct threat surface. When too many irrelevant documents are retrieved, that is treated as a quality problem. When documents from different users appear in the same context in a multi-tenant system, that is treated as an infrastructure configuration issue. When retrieved content appears to override system instructions, that is treated as a prompt-injection incident. Each framing misses the common cause: the context window is where these risks materialise, and the controls must be designed for it specifically.

What enters the context window

In a RAG system, the context window at each request contains:

The system prompt:trusted instructions set by the system operator, defining the model's role, scope, and behavioural constraints. The system prompt is the highest-trust component.
Retrieved document chunks: content selected by the retrieval mechanism from the knowledge base, based on semantic similarity to the query. Trust level varies by document source; most implementations do not differentiate trust within this section.
The user's query: untrusted input from the end user. The query is the lowest-trust component but the most immediate influence on what is retrieved.
Conversation history (in multi-turn systems): prior turns in the conversation, which may include model responses that were themselves grounded in retrieved content.

The context window assembles these components into a single token sequence that the model processes without inherent structural separation between trust levels. The prompt template provides semantic markers, but the model's handling of those markers is learned behaviour, not a structural guarantee.

The context window is the convergence point for every trust domain in the RAG system. System instructions, retrieved content, and user input meet here. Any weakness in how these domains are separated will manifest as a context-window risk.

Context window risks — mechanism and control

Risk	Mechanism	Control
Context stuffing	Excessive retrieved content fills the context window, causing the model to ignore system instructions, lose task coherence, or produce degraded outputs due to attention dilution.	Cap retrieved chunk count and total token budget. Score and filter retrieved chunks before injection. Monitor response quality degradation as a signal.
Data mixing	In multi-tenant systems, retrieved content from different tenants is injected into the same context window, creating cross-tenant information disclosure when the model generates its response.	Tenant-scoped retrieval — filter by tenant ID before returning results. Never mix cross-tenant content in a single context window. Test with cross-tenant query attempts.
Privilege confusion	Retrieved content that appears to carry authority (documents formatted like system prompts, policy statements, or admin instructions) may influence the model to override its operating constraints.	System prompt explicitly labels retrieved content as untrusted external data. Strip formatting that mimics system-level authority. Output validation checks for policy override attempts.
Injection via retrieved payload	A document containing a crafted prompt injection payload is retrieved and injected into context. The model follows the embedded instruction as if it were part of its task context.	Input scanning at ingestion to detect injection patterns. System prompt frames retrieved content as data-only. Behavioral testing with adversarial retrieved payloads.
Context window exhaustion	An adversarial user submits queries that force maximum retrieval, exhausting the context window and preventing legitimate content from being retrieved — a denial-of-relevance attack.	Rate limiting on retrieval per user per session. Hard cap on total retrieved tokens. Detect and alert on anomalous retrieval patterns.

Context stuffing

Context stuffing is the condition where excessive retrieved content occupies the context window in a way that degrades or overrides the safety and scope instructions in the system prompt. It can be adversarial — an attacker crafts a document whose content fills a large portion of the context window, pushing system instructions toward the edges where models give them less weight — or accidental, arising from retrieval configurations that return too many or too large chunks.

The security mechanism behind context stuffing is the attention distribution effect: LLMs do not give equal attention to all positions in the context window. Content near the beginning and end of the context receives relatively more attention. System prompts placed at the beginning compete with the beginning of the retrieved content; if retrieved content is long, the system prompt's relative influence on the output decreases.

Adversarial context stuffing is a variant of indirect injection. A document crafted to be verbose — containing large amounts of plausible but low-value content — can be designed to fill the context window when retrieved, pushing the system prompt's instructions to a position where they receive less model attention. The follow-on injection payload, placed near the end of the verbose document, then receives elevated attention as a “recency” position.

Data mixing

Data mixing is a context-window risk specific to multi-tenant RAG systems. In a system where multiple users or organisations share a knowledge base and query infrastructure, the context window for any given request may contain retrieved content belonging to different tenants — if the retrieval access control is not enforced strictly at the vector database level.

The data mixing scenario: User A and User B are tenants in a shared RAG system. User B has uploaded sensitive documents to their namespace. User A submits a query that semantically overlaps with User B's document content. If the retrieval mechanism does not enforce namespace isolation at query time — or if the namespace isolation has an edge-case gap — User B's documents may appear in User A's context window and therefore in User A's response.

Data mixing through the context window is particularly difficult to detect because it may not be visible in the response — the model may synthesise from both tenants' content without obviously citing one over the other. The mixing has occurred, but the output does not signal it.

The security review for data mixing requires adversarial cross-tenant retrieval testing: submit queries from Tenant A designed to retrieve Tenant B's specific content, and verify that none of that content appears in the context window assembled for Tenant A's query.

Privilege confusion

Privilege confusion is the condition where retrieved content appears to the model as having higher authority than it actually has. The model is designed to give the system prompt authoritative weight; retrieved content is supposed to be data. When a retrieved document is formatted to resemble authoritative instructions — a policy statement, a system directive, a role definition — the model may treat it with system-prompt authority.

Privilege confusion is distinct from explicit injection. An injection attack uses imperative language to override instructions. Privilege confusion uses authoritative framing to elevate the perceived trust level of retrieved content without an explicit override command. The effect is similar — retrieved content influences the model's behaviour beyond its intended role as data — but the mechanism is subtler and harder to detect in output validation.

In assessed systems, privilege confusion most commonly arises from knowledge bases that contain actual internal policies — documents that the system was designed to make queryable, which happen to be written in the imperative “you must” / “the system shall” style. These documents are not adversarially crafted, but their format can trigger privilege confusion for certain model and prompt template combinations.

Context window limits as an accidental control

Context window limits — the maximum number of tokens the model can process in a single request — are an accidental control on some context-window risks. When the context window is small relative to the available retrieved content, the retrieval pipeline must select a subset of documents to include. This selection pressure limits how much adversarially crafted content can enter the context window in any single request.

Smaller context windows are not straightforwardly safer — they may cause important documents to be excluded from context — but they do impose a natural upper bound on the context stuffing risk and limit the amount of potentially injected content that can reach the model at once.

The accidental nature of this control matters. Organisations choosing models with larger context windows for quality reasons (better synthesis from more documents) are simultaneously accepting a larger potential context-window attack surface. The security review must note this tradeoff and ensure that other context-boundary controls are scaled accordingly — retrieval budgets, chunk size limits, and output validation coverage.

Controls

Context-window risks are addressed by controls at the assembly stage (how the context window is constructed) and the validation stage (what is checked after assembly and after generation).

Assembly controls. Retrieval budget: a hard limit on the number of tokens that can be occupied by retrieved content, leaving a guaranteed minimum space for system instructions. System prompt placement and repetition: place system instructions at the beginning of the context and optionally repeat key constraints at the end (the highest-attention positions). Retrieved content framing: explicit semantic marking that distinguishes retrieved content from instructions at every retrieval block.

Multi-tenant isolation controls. Namespace isolation enforced at the vector database level as the primary control; metadata filter verification as the secondary control. Adversarial cross-tenant testing as part of the pre-deployment review, not just at initial deployment.

Output validation controls. Pattern checks on model outputs for privilege confusion indicators — imperative language, system directive format, role redefinition — that may indicate retrieved content was treated as instructions. Context coverage logging: for each response, log which documents were in the context window, enabling audit of what influenced a given output.

Review evidence

A context-window risk review produces evidence covering the assembly controls, the multi-tenant isolation, and the output validation:

Prompt template with retrieval budget, system instruction placement, and retrieved content framing documented.
Retrieval budget configuration — maximum tokens allocated to retrieved content, maximum chunk size, maximum number of chunks.
Context stuffing test results — adversarial documents crafted to fill the retrieval budget, model outputs evaluated for instruction dilution.
Data mixing test results (for multi-tenant deployments) — cross-tenant retrieval test queries and verified absence of cross-tenant content in assembled context windows.
Privilege confusion test results — authoritative-format documents inserted as retrieved content, model outputs evaluated for elevated-trust treatment.
Output validation configuration — what patterns are checked, what happens when patterns match.

See the Drel RAG security assessment hub for the context-window risk review module with assembly control templates and test cases.

Blog

Get new posts in your inbox

AI security review, OWASP Agentic Top 10, ISO 42001 evidence, and what AI Committees actually need. No cadence promises — we publish when there's something worth reading.

Bound context-window risk in your RAG assessment

Drel reviews context-window assembly, multi-tenant isolation, and privilege confusion as named risks in every RAG security assessment — with test cases and required controls for each.

Request early access See the demo dossier

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.