Data poisoning in RAG knowledge bases
A RAG knowledge base is only as trustworthy as the documents in it. Data poisoning attacks insert malicious content into the knowledge base — not to corrupt the index, but to influence model outputs when those documents are retrieved.
The knowledge base in a RAG system is only as trustworthy as the documents it contains. If those documents can be manipulated — even partially, even by a single contributor — the model's outputs can be manipulated too. Data poisoning is the class of attacks that exploits this property: instead of targeting the model or the retrieval mechanism directly, the attacker targets the knowledge base. The payload is a document that looks legitimate, behaves normally in most queries, and produces the adversary's intended output in the specific queries that matter.
This is distinct from traditional data poisoning in machine learning, which targets training data to corrupt a model's weights. RAG data poisoning does not require access to model training. It requires write access to the knowledge base — directly or through any path that leads there — and an understanding of how the retrieval mechanism ranks documents.
RAG data poisoning vectors — ingestion paths and controls
| Poisoning vector | How it reaches the KB | Detection difficulty | Control |
|---|---|---|---|
| Direct document upload | Attacker or insider with upload credentials submits poisoned document via content management interface, shared folder, or ingestion API | Low — typically no semantic content review at upload | Require contributor authentication; log uploader identity and timestamp; implement semantic scope validation at ingestion |
| Crawled web content | Ingestion pipeline fetches content from external web sources; attacker controls or compromises a crawled domain to serve adversarially crafted pages | Medium — requires attacker to control a crawled domain or compromise a legitimate one | Apply provenance trust tiers by source domain; validate crawled content for adversarial patterns before indexing; re-crawl verification against previous versions |
| API-fed documents | Third-party SaaS integration, content feed, or webhook pushes documents into the knowledge base; attacker compromises the upstream provider or the integration credentials | Medium to High — requires upstream provider compromise or credential theft | Treat all API-fed content as lower-trust tier; validate and sanitize before ingestion; monitor for volume anomalies or content drift from expected patterns |
| Shared KB update | Authorised internal contributor — or an AI system that writes back to the KB — inserts content that is structurally legitimate but semantically adversarial | High — content is indistinguishable from legitimate contributions by automated scanning | Review process for high-impact KB updates; flag documents that rank in top-k for high-value query patterns; monitor for unusual retrieval patterns after new content is added |
| Adversarial document from external user | In systems where external users can contribute content (support tickets, submitted documents, user-generated content pipelines), attacker submits a document crafted to rank highly for target queries | Low — no privilege required; available to any user with submission access | Quarantine external user content before ingestion; never ingest user-submitted content into the primary KB without review; apply separate trust tier with retrieval ranking penalty |
What data poisoning is
Data poisoning in a RAG knowledge base is the deliberate insertion of documents crafted to influence model outputs when retrieved. The objective is not to corrupt the knowledge base wholesale — that would be detectable. The objective is to insert a small number of documents that rank highly for specific query patterns and produce specific, intended outputs from the model.
A poisoned document can serve several adversarial goals:
- Misinformation: the document states false facts that the model will synthesise into its answer, presented with the same confidence as authoritative sources.
- Indirect prompt injection: the document contains embedded instructions that the model executes when it processes the document as context. These are not stated as instructions in the user query — they arrive through retrieval.
- Authority impersonation: the document is crafted to appear authoritative — formatted like internal policy, attributed to trusted sources — so the model treats it preferentially in synthesis.
- Scope manipulation:the document expands the model's apparent mandate, causing it to take actions or provide information outside its intended scope.
A poisoned document does not need to dominate the knowledge base. It only needs to rank first for the queries that matter to the attacker — and modern embedding models make it possible to craft documents that do exactly that.
Why RAG is vulnerable
RAG systems are structurally vulnerable to data poisoning for three reasons:
First, the knowledge base has a write surface. Unlike the model itself, which is a fixed artefact that requires controlled retraining to modify, the knowledge base is designed to be written to. It accepts new documents continuously. Any party with access to any ingestion path has a write path to the corpus.
Second, the retrieval mechanism is not content-aware in a security sense. It ranks documents by semantic similarity to the query, not by trustworthiness, provenance, or classification. A crafted document that embeds the right vocabulary will rank alongside — or above — authoritative documents without any additional privilege.
Third, the model treats retrieved content as evidence. When a document appears in the model's context alongside the user's query, the model has no mechanism to evaluate whether that document is trustworthy, recent, or accurate. It synthesises from what it receives. The quality of the output is a direct function of the quality of the retrieved content.
Attack vectors
In assessed systems, data poisoning reaches the knowledge base through several vectors:
Compromised document upload. Many RAG systems allow authorised users to upload documents directly to the knowledge base — through a content management interface, a shared folder, or an API endpoint. An attacker with credentials for any upload account can insert poisoned documents directly.
Insider threat via authorised contributor. A malicious or coerced insider with legitimate write access can insert documents that are structurally indistinguishable from legitimate content. Document review processes rarely include semantic checks against adversarial intent.
Supply chain compromise of document sources. Many RAG systems ingest from external sources: shared document libraries, third-party content feeds, web scraping pipelines, integration connectors to SaaS platforms. A compromise of any of these upstream sources results in poisoned documents reaching the knowledge base without any action by a party inside the organisation.
Transitive write via integration. Automation tools, chatbot pipelines, and AI-assisted content generation often write back to knowledge bases as part of their normal operation. An attacker who can influence the output of any such system has a transitive write path to the corpus.
The poisoning mechanism
A well-crafted poisoned document has two properties that make it effective and hard to detect: it is retrievable for the target query pattern, and it is plausibly authoritative.
Retrievability is a function of embedding similarity. An attacker who understands the embedding model — which is not secret, since most RAG systems use publicly documented embedders — can craft document text that embeds close to the expected query vectors. This does not require mathematical expertise; it requires iterative text crafting against a known embedder. The result is a document that ranks in the top-k results for the target query.
Plausible authority is a function of formatting and framing. The model treats a document formatted like an internal policy, attributed to an internal team, with consistent terminology and structure, as more authoritative than an informal note on the same topic. A poisoned document that mimics the formatting conventions of legitimate documents in the knowledge base will be treated with corresponding weight by the model.
For indirect prompt injection payloads, the mechanism extends one step further: the document contains text that the model interprets as instructions rather than data. The injection payload is typically embedded within otherwise legitimate content — a few sentences in the middle of a plausible policy document — so that a human reviewer does not flag it as suspicious.
Detection difficulty
Data poisoning in RAG is harder to detect than most content security problems for three reasons.
The attack is targeted, not bulk. A poisoned document is designed to produce specific outputs for specific queries. It behaves normally for all other queries. A review of the knowledge base that does not include the target queries will not surface the attack. Bulk content scanning — looking for obvious markers of malicious content — misses targeted poisoning by design.
The document looks legitimate. A well-crafted poisoned document is structurally indistinguishable from legitimate content in the same knowledge base. It uses the correct formatting, the correct terminology, and the correct level of detail. Human review will not flag it unless the reviewer happens to query the knowledge base with the exact query pattern the attacker targeted.
The output looks legitimate. A model answering with content from a poisoned document produces an answer that looks like a normal model answer. The user has no mechanism to distinguish a response grounded in a legitimate document from one grounded in a poisoned document. Without citation transparency — showing users exactly which documents the answer was grounded in — there is no user-side signal.
Controls
Data poisoning in RAG knowledge bases is addressed by controls at the ingestion stage, the retrieval stage, and the output stage.
Ingestion-stage controls. Document provenance verification: every document entering the knowledge base carries a verified source record — who submitted it, from what system, at what time. Documents from lower-trust sources are tagged accordingly and treated with reduced authority in retrieval ranking. Content validation at ingestion: structural checks for format, size, and encoding compliance; semantic scope checks that flag documents far outside the intended knowledge domain.
Retrieval-stage controls. Trust-weighted ranking: retrieval results incorporate document trust tier as a ranking signal alongside embedding similarity. A document from a verified internal source outranks a structurally similar document from an unverified external source. Retrieval diversity: top-k results are drawn from multiple document clusters rather than allowing a single highly-ranked document to dominate the context.
Output-stage controls.Citation transparency: the model's response includes citations to the source documents, allowing users and auditors to verify grounding. Output validation: automated checks for responses that reference out-of-scope content or contain instruction-like text patterns.
Review evidence
A RAG security review must produce specific evidence for the data poisoning threat. That evidence covers the ingestion controls, the detection capability, and the test results.
- Ingestion pipeline architecture showing all write paths and their access controls.
- Document provenance schema — what metadata is stored with each document, including source, contributor, and trust tier.
- Content validation configuration at ingestion — what checks run, what they flag, and what happens to flagged content.
- Results of adversarial ingestion testing: what documents were inserted as part of the review, whether they ranked as intended, and whether the output validation controls caught the poisoned content.
- Remediation record for any control gaps identified during the review.
See the Drel RAG security assessment hub for the full review framework, including the data poisoning threat module with worked test cases.
Blog
Get new posts in your inbox
AI security review, OWASP Agentic Top 10, ISO 42001 evidence, and what AI Committees actually need. No cadence promises — we publish when there's something worth reading.
Address data poisoning in your RAG review
Drel covers the data boundary, ingestion controls, and adversarial testing evidence for RAG knowledge base poisoning as a named threat in every RAG security assessment.
A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.