Security review for fine-tuned models — what changes from base model assessment
Fine-tuned models inherit the base model's risk profile and add their own. Training data provenance, alignment drift, and capability overhang are the three areas a security review must address that base model assessments typically skip.
Why your existing threat modeling tool doesn't model agents
STRIDE, attack trees, and most commercial threat modeling tools were designed for deterministic software. Agentic AI has five properties those tools cannot represent — and each one is an attack surface.
Threat modeling an MCP server — the parts AppSec tools miss
MCP servers have four distinct attack surfaces: transport, tool surface, prompt context injection, and auth boundary. Traditional threat modeling tools model the first and miss the other three. Here is the full threat model with controls.
LLM red-teaming for a security review
Red-teaming an LLM application is not the same as penetration testing it. This piece covers the distinct techniques — goal hijacking, jailbreaking, indirect injection, and exfiltration chains — and how to document findings for a security review.
Context-window risks in RAG and how to bound them
The context window is the shared space where user queries and retrieved documents meet the model. Anything in that window can influence model outputs. Context-window risks in RAG are about what gets in — and how to bound it.
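One way to bound what gets into the window can be sketched in Python under stated assumptions. Each retrieved chunk carries a source label and a similarity score (the `Chunk` fields are hypothetical). Only allowlisted sources are admitted. A crude whitespace word count stands in for a real tokenizer. The result is capped both by chunk count and by token budget.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # label of the corpus the chunk came from (hypothetical field)
    text: str
    score: float  # similarity score from the retriever


def bound_context(chunks, allowed_sources, max_chunks=4, max_tokens=200):
    """Admit allowlisted chunks, best score first, within a chunk cap and a token budget."""
    admitted, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.source not in allowed_sources:
            continue  # unknown sources never reach the context window
        if len(admitted) == max_chunks:
            break
        cost = len(chunk.text.split())  # crude token estimate, not a real tokenizer
        if used + cost > max_tokens:
            continue  # skip oversized chunks instead of truncating them mid-text
        admitted.append(chunk)
        used += cost
    return admitted
```

The allowlist check comes before the ranking cut. Because of that ordering, a high-scoring chunk from an unapproved source cannot displace an approved one.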
Goal hijacking and instruction drift in autonomous agents
Goal hijacking is the attack where a manipulated agent pursues an objective its operators did not intend. Instruction drift is the slow version. Both are harder to detect than traditional attacks because the agent appears to be working.
Guardrails that work vs guardrails that look like they work
Most LLM guardrails are classifiers layered on top of an unguarded model. They can be bypassed. This piece distinguishes the guardrail patterns that provide genuine risk reduction from those that provide the appearance of it.
MCP security vs traditional API security — what changes
MCP looks like a REST API to an infrastructure team. The security model is different: the client is a non-deterministic reasoning engine, the attack surface includes the tool descriptions, and the trust boundary is at the model, not the user.
System prompt leakage and why it matters for security
System prompts encode assumptions, scoping rules, persona instructions, and sometimes credentials. When they leak, they expose the system's trust model. This piece explains why this matters more than most teams believe.
The AI bill of materials (AI-BOM) for security review
An AI bill of materials documents the components of an AI system: base models, fine-tuning datasets, inference infrastructure, plugins, and dependencies. It is the foundation of a security review and a requirement under several emerging governance frameworks.
Securing an internal MCP server exposed to agents
Internal MCP servers — built to expose internal tools, databases, or APIs to agents — have different security requirements than public MCP servers. This piece defines the review checklist for an internally-hosted MCP server.
Evaluating a RAG pipeline for security, not just relevance
RAG evaluation frameworks are designed to measure retrieval quality and answer relevance. Security evaluation asks different questions: what data boundaries does the pipeline cross, what can be extracted, and what controls enforce the intended scope?
Privilege escalation paths in agentic AI
Agentic AI privilege escalation does not require a kernel exploit. It requires a model that can be convinced to invoke a tool it was not intended to invoke. This piece maps the escalation paths and the review controls that block them.
Excessive agency — when an LLM can do too much
Excessive agency (LLM08 in the 2023 OWASP Top 10 for LLM Applications, renumbered LLM06 in the 2025 edition) is the risk that an LLM system has been given more capability than it needs to complete its task, and that a manipulated model can exercise that excess capability. Least privilege is the control.
Reading a model card as a security reviewer
Model cards were designed for ML practitioners. A security reviewer reads them differently — looking for training data, capability claims, limitations, and evaluation methodology. This piece explains how to extract security-relevant information from a model card.
Design-time vs runtime AI security — where review belongs
Runtime AI security tools watch for anomalies in production. Design-time review asks whether the system should go to production in the first place. Both matter, but conflating them creates blind spots at each layer.
Prompt-context injection through MCP tools
MCP tools return data that is injected into the model's context. When that data contains instructions, the tool becomes an injection vector. This piece explains how context injection works through MCP and the controls that prevent it.
Vector database security for RAG pipelines
Vector databases are infrastructure. They inherit all the access-control and injection requirements of any other data store — plus some RAG-specific ones. This piece maps the security requirements for a vector database in a production RAG pipeline.
Model denial of service and cost-exhaustion attacks
LLM denial of service is different from traditional DoS. An attacker does not need to crash the service; they need to make it expensive to run. Cost-exhaustion attacks are under-defended and growing in the systems we review.
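A minimal sketch of one cost-exhaustion control is a per-caller token budget over a fixed window, charged before the model call is made. Assumptions: the caller passes a worst-case cost (an input estimate plus the output cap), and the window length and limits are placeholders. The clock is injectable so the behaviour can be tested.

```python
import time


class TokenBudget:
    """Per-caller token budget over a fixed time window."""

    def __init__(self, max_tokens_per_window, window_seconds=3600, clock=time.monotonic):
        self.max_tokens = max_tokens_per_window
        self.window = window_seconds
        self.clock = clock
        self._usage = {}  # caller_id -> (window_start, tokens_used)

    def try_spend(self, caller_id, requested_tokens):
        """Charge the worst-case cost up front; return False if it would exceed the budget."""
        now = self.clock()
        start, used = self._usage.get(caller_id, (now, 0))
        if now - start >= self.window:
            start, used = now, 0  # window expired: reset this caller's counter
        if used + requested_tokens > self.max_tokens:
            return False  # reject before any tokens are spent on the model
        self._usage[caller_id] = (start, used + requested_tokens)
        return True
```

Charging the worst case rather than the actual usage is deliberate. The attacker's lever is the size of each request, so the check has to run before the expensive call is made, not after.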
Security review for multi-agent systems
When agents orchestrate other agents, every trust assumption in the single-agent model multiplies. This piece defines the additional review surface for multi-agent systems: inter-agent trust, capability delegation, and blast-radius containment.
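Capability delegation can be reviewed against a simple rule: a sub-agent receives only scopes its parent already holds. The sketch below is a minimal illustration of that rule. The scope strings are hypothetical, and a real system would enforce the intersection where credentials are issued, not inside the agent.

```python
def delegate(parent_scopes, requested_scopes):
    """Grant a sub-agent only the scopes its parent already holds.

    Capabilities can narrow as they are delegated down the chain, never widen.
    Returns (granted, dropped) so dropped requests can be logged for review.
    """
    requested = frozenset(requested_scopes)
    granted = requested & frozenset(parent_scopes)
    return granted, requested - granted


# An orchestrator holding CRM read and email send spawns a sub-agent that asks for more.
granted, dropped = delegate({"read:crm", "send:email"}, {"read:crm", "delete:crm"})
print(sorted(granted), sorted(dropped))  # prints ['read:crm'] ['delete:crm']
```

Because each hop can only narrow its parent's scopes, the blast radius of any agent in the chain is bounded by its parent's.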
LLM supply-chain risk — models, weights, and dependencies
LLM applications have a supply chain that extends to pre-trained models, fine-tuning datasets, inference providers, and plugin ecosystems. This piece maps the supply-chain attack surface and the review questions for each layer.
Transport security for MCP servers
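MCP runs over stdio (a local subprocess) or HTTP (originally HTTP with SSE, replaced by Streamable HTTP in the 2025-03-26 specification). Each transport has distinct security requirements. This piece covers TLS, mutual authentication, and the review questions for MCP server transport configuration.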
Indirect prompt injection through retrieved documents
When retrieved documents contain instructions the model executes, the attack surface is anything that ends up in the knowledge base. Indirect prompt injection via documents is harder to detect than direct injection because the attacker is not in the conversation.
Agent memory as an attack surface
Agents that persist memory across sessions carry forward context that can be poisoned. An attacker who controls a past interaction can plant instructions that execute in a future session. This piece maps the memory attack surface and the controls that bound it.
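One control for the memory layer is to record provenance when an entry is written and to check it when memory is loaded into a new session. The sketch below assumes hypothetical provenance labels and a simple policy: only entries from trusted provenance are loaded, and everything else is quarantined for review. It illustrates the idea and is not a complete defence.

```python
from dataclasses import dataclass

# Hypothetical provenance labels; which sources count as trusted is a policy decision.
TRUSTED_PROVENANCE = {"operator", "user_confirmed"}


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    provenance: str  # who or what produced the entry at write time
    session_id: str


def load_memory(entries, max_entries=20):
    """Split persisted memory into entries safe to load and entries to quarantine."""
    trusted = [e for e in entries if e.provenance in TRUSTED_PROVENANCE]
    quarantined = [e for e in entries if e.provenance not in TRUSTED_PROVENANCE]
    return trusted[-max_entries:], quarantined  # keep only the most recent trusted entries
```

An instruction planted through tool output in one session is then held back from the next session instead of being loaded alongside confirmed preferences.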
Sensitive information disclosure in LLM applications
LLM applications disclose sensitive information through three distinct channels: training data memorisation, system prompt leakage, and retrieval boundary failures. Each has different controls and different evidence requirements.
The MCP authentication boundary, reviewed
MCP servers authenticate the client (the agent), not the end user. When a user-facing agent invokes an MCP server, the server cannot enforce per-user authorisation unless end-user authentication is layered in explicitly. This piece maps the gap and the controls.
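The control can be sketched without tying it to any MCP SDK. The tool handler receives the end user's identity, assumed to come from a token the server validates itself, and authorises that user for the specific record. A shared agent credential is not enough. `UserContext`, `RECORD_OWNERS`, and `read_record_tool` are all hypothetical names.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    pass


@dataclass(frozen=True)
class UserContext:
    user_id: str       # end user, taken from a token the server verified (assumption)
    scopes: frozenset  # scopes granted to that user, not to the agent


# Hypothetical per-record ownership; in practice this lives in the system of record.
RECORD_OWNERS = {"rec-1": "alice", "rec-2": "bob"}


def read_record_tool(user, record_id):
    """Tool handler that authorises the end user for this record, not just the calling agent."""
    if "records:read" not in user.scopes:
        raise Forbidden(f"{user.user_id} lacks records:read")
    if RECORD_OWNERS.get(record_id) != user.user_id:
        raise Forbidden(f"{user.user_id} may not read {record_id}")
    return {"id": record_id, "owner": user.user_id}
```

Without the second check, any user who can talk to the agent can read any record the agent's credential can reach.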
Access control for RAG — keeping retrieval inside the line
RAG pipelines retrieve documents and pass them into a model context that the user then queries. Access control must operate at retrieval time, not just at query time — or users can extract documents they would not be permitted to read directly.
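A minimal sketch of retrieval-time access control assumes each document carries the group ACL of its source system, copied at ingestion (`allowed_groups` is a hypothetical field), and that a similarity search upstream has already produced scored candidates. Permission filtering runs before the top-k cut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL copied from the source system at ingestion (assumption)


def retrieve_for_user(scored_candidates, user_groups, top_k=3):
    """Return the top_k documents the user could read directly, best score first.

    scored_candidates: list of (score, Doc) pairs from an upstream similarity search.
    """
    permitted = [(s, d) for s, d in scored_candidates if d.allowed_groups & user_groups]
    permitted.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in permitted[:top_k]]
```

Filtering before the cut keeps forbidden documents from using up the top-k slots. In production the same rule is usually pushed down into the vector store query as a metadata filter. This post-filter version only shows the ordering.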
Tool-use permissions for agentic AI — least privilege for agents
The tool manifest of an agentic AI system defines what the agent can do in the world. Most manifests are over-provisioned. Least privilege for agents means auditing the tool manifest for each deployment scope and removing capabilities the task does not require.
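Auditing a manifest against one deployment scope reduces to a set comparison. The sketch assumes the manifest is a list of tool dictionaries with a `name` key, loosely modelled on an MCP tool listing. The required-tool set for the scope is something the reviewer has to supply.

```python
def audit_manifest(manifest, required_tools):
    """Compare granted tools with the tools this deployment scope needs.

    Returns (excess, missing): excess tools should be removed, missing ones justified.
    """
    granted = {tool["name"] for tool in manifest}
    excess = sorted(granted - set(required_tools))
    missing = sorted(set(required_tools) - granted)
    return excess, missing


def least_privilege_manifest(manifest, required_tools):
    """Return a copy of the manifest containing only the tools the scope requires."""
    return [tool for tool in manifest if tool["name"] in required_tools]
```

The excess list is the review finding. Each name on it is a capability that a manipulated model could exercise without serving the task.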
Insecure output handling — the LLM risk teams underrate
Teams spend significant effort hardening the input to an LLM and very little hardening what the LLM outputs. Insecure output handling is how prompt injection becomes code execution, data exfiltration, or stored injection.
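Treating model output as untrusted input looks like this in a minimal sketch. Output is escaped before it is rendered as HTML. A model-proposed action is parsed as data and checked against an allowlist; it is never passed to `eval` or a shell. The action names are hypothetical.

```python
import html
import json

ALLOWED_ACTIONS = {"lookup_order", "open_ticket"}  # hypothetical action names


def render_reply(model_text):
    """Escape model output before it reaches an HTML page, so injected markup stays inert."""
    return "<p>" + html.escape(model_text) + "</p>"


def parse_action(model_text):
    """Parse a model-proposed action as JSON data and reject anything off the allowlist."""
    action = json.loads(model_text)  # raises ValueError (JSONDecodeError) on malformed output
    if not isinstance(action, dict) or action.get("name") not in ALLOWED_ACTIONS:
        raise ValueError("action not permitted")
    return action
```

Both functions apply the same principle: whatever the model emits is handled like user-supplied data at the point where it is consumed.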
Tool poisoning in MCP servers
MCP tool poisoning is the attack where a malicious tool description manipulates the model into invoking tools or revealing data outside its intended scope. The attack surface is the tool manifest — every tool description is untrusted input.
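One control that follows from treating tool descriptions as untrusted input is to pin them. After a human review, record a hash of each tool's name, description, and input schema (field names follow an MCP tool listing). At connection time, flag any tool that is new or whose hash has changed. The sketch below is a minimal illustration of that idea. Pinning detects change; it does not make the originally reviewed description safe.

```python
import hashlib
import json


def tool_fingerprint(tool):
    """Stable SHA-256 over the fields the model reads: name, description, input schema."""
    canonical = json.dumps(
        {k: tool.get(k) for k in ("name", "description", "inputSchema")},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_against_pins(tools, pins):
    """Return names of tools that are new or whose description changed since review."""
    return sorted(t["name"] for t in tools if pins.get(t["name"]) != tool_fingerprint(t))
```

A non-empty result means the manifest the model will see differs from the one that was reviewed. That is a reason to block the connection until the change is re-reviewed.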
Data poisoning in RAG knowledge bases
A RAG knowledge base is only as trustworthy as the documents in it. Data poisoning attacks insert malicious content into the knowledge base — not to corrupt the index, but to influence model outputs when those documents are retrieved.
Mapping the agentic AI attack surface
The agentic AI attack surface has five distinct layers: the prompt channel, the tool surface, the memory layer, the orchestration boundary, and the output channel. This piece maps each layer with its associated threats and controls.
Prompt injection, explained for security reviewers
Prompt injection is the most widely discussed LLM attack and the most widely misunderstood. This piece cuts through the confusion: what it is, what its variants are, how it differs from SQL injection, and what controls actually reduce the risk.
MCP security — the four attack surfaces of a Model Context Protocol server
MCP servers extend a model's capabilities by exposing tools, resources, and prompts, and each extension point adds attack surface. This piece defines the four surfaces (transport, tool surface, prompt-context injection, and auth boundary) and the review questions for each.
RAG security — the three boundaries that matter
Retrieval-augmented generation adds a retrieval layer between the user and the model. That layer has three security boundaries — the data boundary, the retrieval boundary, and the context boundary — and each has distinct failure modes.
Agentic AI security — the surfaces deterministic software does not have
Agentic AI systems have attack surfaces that do not exist in deterministic software: a reasoning loop that can be hijacked, a tool manifest that defines what the agent can do, memory that persists across sessions, and goals that drift. Security review must address all four.