Blog

Technical

36 technical articles on AI security.

Technical · 12 min

Security review for fine-tuned models — what changes from base model assessment

Fine-tuned models inherit the base model's risk profile and add their own. Training data provenance, alignment drift, and capability overhang are the three areas a security review must address that base model assessments typically skip.

Technical · 13 min

Why your existing threat modeling tool doesn't model agents

STRIDE, attack trees, and most commercial threat modeling tools were designed for deterministic software. Agentic AI has five properties those tools cannot represent — and each one is an attack surface.

Technical · 11 min

Threat modeling an MCP server — the parts AppSec tools miss

MCP servers have four distinct attack surfaces: transport, tool surface, prompt context injection, and auth boundary. Traditional threat modeling tools model the first and miss the other three. Here is the full threat model with controls.

Technical · 13 min

LLM red-teaming for a security review

Red-teaming an LLM application is not the same as penetration testing it. This piece covers the distinct techniques — goal hijacking, jailbreaking, indirect injection, and exfiltration chains — and how to document findings for a security review.

Technical · 10 min

Context-window risks in RAG and how to bound them

The context window is the shared space where user queries and retrieved documents meet the model. Anything in that window can influence model outputs. Context-window risks in RAG are about what gets in — and how to bound it.

Technical · 11 min

Goal hijacking and instruction drift in autonomous agents

Goal hijacking is the attack where a manipulated agent pursues an objective its operators did not intend. Instruction drift is the slow version. Both are harder to detect than traditional attacks because the agent appears to be working.

Technical · 12 min

Guardrails that work vs guardrails that look like they work

Most LLM guardrails are classifiers layered on top of an unguarded model. They can be bypassed. This piece distinguishes the guardrail patterns that provide genuine risk reduction from those that provide the appearance of it.

Technical · 10 min

MCP security vs traditional API security — what changes

MCP looks like a REST API to an infrastructure team. The security model is different: the client is a non-deterministic reasoning engine, the attack surface includes the tool descriptions, and the trust boundary is at the model, not the user.

Technical · 10 min

System prompt leakage and why it matters for security

System prompts encode assumptions, scoping rules, persona instructions, and sometimes credentials. When they leak, they expose the system's trust model. This piece explains why that exposure matters more than most teams believe.

Technical · 11 min

The AI bill of materials (AI-BOM) for security review

An AI bill of materials documents the components of an AI system: base models, fine-tuning datasets, inference infrastructure, plugins, and dependencies. It is the foundation of a security review and a requirement under several emerging governance frameworks.

Technical · 10 min

Securing an internal MCP server exposed to agents

Internal MCP servers, built to expose internal tools, databases, or APIs to agents, have different security requirements than public MCP servers. This piece defines the review checklist for an internally hosted MCP server.

Technical · 11 min

Evaluating a RAG pipeline for security, not just relevance

RAG evaluation frameworks are designed to measure retrieval quality and answer relevance. Security evaluation asks different questions: what data boundaries does the pipeline cross, what can be extracted, and what controls enforce the intended scope?

Technical · 12 min

Privilege escalation paths in agentic AI

Agentic AI privilege escalation does not require a kernel exploit. It requires a model that can be convinced to invoke a tool it was not intended to invoke. This piece maps the escalation paths and the review controls that block them.

Technical · 11 min

Excessive agency — when an LLM can do too much

Excessive agency (LLM08 in the 2023 OWASP Top 10 for LLM Applications, renumbered LLM06 in the 2025 edition) is the risk that an LLM system has been given more capability than it needs to complete its task, and that a manipulated model can exercise the excess. Least privilege is the control.

Technical · 10 min

Reading a model card as a security reviewer

Model cards were designed for ML practitioners. A security reviewer reads them differently — looking for training data, capability claims, limitations, and evaluation methodology. This piece explains how to extract security-relevant information from a model card.

Technical · 10 min

Design-time vs runtime AI security — where review belongs

Runtime AI security tools watch for anomalies in production. Design-time review asks whether the system should go to production in the first place. Both matter, but conflating them creates blind spots at each layer.

Technical · 11 min

Prompt-context injection through MCP tools

MCP tools return data that is injected into the model's context. When that data contains instructions, the tool becomes an injection vector. This piece explains how context injection works through MCP and the controls that prevent it.

Technical · 11 min

Vector database security for RAG pipelines

Vector databases are infrastructure. They inherit all the access-control and injection requirements of any other data store — plus some RAG-specific ones. This piece maps the security requirements for a vector database in a production RAG pipeline.

Technical · 10 min

Model denial of service and cost-exhaustion attacks

LLM denial of service is different from traditional DoS. An attacker does not need to crash the service; they only need to make it expensive to run. Cost-exhaustion attacks are under-defended and increasingly common in the systems we review.

Technical · 13 min

Security review for multi-agent systems

When agents orchestrate other agents, every trust assumption in the single-agent model multiplies. This piece defines the additional review surface for multi-agent systems: inter-agent trust, capability delegation, and blast-radius containment.

Technical · 12 min

LLM supply-chain risk — models, weights, and dependencies

LLM applications have a supply chain that extends to pre-trained models, fine-tuning datasets, inference providers, and plugin ecosystems. This piece maps the supply-chain attack surface and the review questions for each layer.

Technical · 9 min

Transport security for MCP servers

MCP runs over stdio or HTTP (Streamable HTTP in current versions of the specification, HTTP with SSE in earlier ones). The two transports have distinct security requirements. This piece covers TLS, mutual authentication, and the review questions for MCP server transport configuration.

Technical · 12 min

Indirect prompt injection through retrieved documents

When retrieved documents contain instructions the model executes, the attack surface is anything that ends up in the knowledge base. Indirect prompt injection via documents is harder to detect than direct injection because the attacker is not in the conversation.

Technical · 11 min

Agent memory as an attack surface

Agents that persist memory across sessions carry forward context that can be poisoned. An attacker who controls a past interaction can plant instructions that execute in a future session. This piece maps the memory attack surface and the controls that bound it.

Technical · 11 min

Sensitive information disclosure in LLM applications

LLM applications disclose sensitive information through three distinct channels: training data memorisation, system prompt leakage, and retrieval boundary failures. Each has different controls and different evidence requirements.

Technical · 10 min

The MCP authentication boundary, reviewed

MCP servers authenticate the client (the agent), not the end user. When a user-facing agent invokes an MCP server, the server has no way to enforce per-user authorisation unless authentication is layered in explicitly. This piece maps the gap and the controls.

Technical · 11 min

Access control for RAG — keeping retrieval inside the line

RAG pipelines retrieve documents and pass them into a model context that the user then queries. Access control must operate at retrieval time, not just at query time — or users can extract documents they would not be permitted to read directly.

Technical · 11 min

Tool-use permissions for agentic AI — least privilege for agents

The tool manifest of an agentic AI system defines what the agent can do in the world. Most manifests are over-provisioned. Least privilege for agents means auditing the tool manifest for each deployment scope and removing capabilities the task does not require.

Technical · 11 min

Insecure output handling — the LLM risk teams underrate

Teams spend significant effort hardening the input to an LLM and very little hardening what the LLM outputs. Insecure output handling is how prompt injection becomes code execution, data exfiltration, or stored injection.

Technical · 11 min

Tool poisoning in MCP servers

MCP tool poisoning is the attack where a malicious tool description manipulates the model into invoking tools or revealing data outside its intended scope. The attack surface is the tool manifest — every tool description is untrusted input.

Technical · 11 min

Data poisoning in RAG knowledge bases

A RAG knowledge base is only as trustworthy as the documents in it. Data poisoning attacks insert malicious content into the knowledge base — not to corrupt the index, but to influence model outputs when those documents are retrieved.

Technical · 12 min

Mapping the agentic AI attack surface

The agentic AI attack surface has five distinct layers: the prompt channel, the tool surface, the memory layer, the orchestration boundary, and the output channel. This piece maps each layer with its associated threats and controls.

Technical · 12 min

Prompt injection, explained for security reviewers

Prompt injection is the most widely discussed LLM attack and the most widely misunderstood. This piece cuts through the confusion: what it is, what its variants are, how it differs from SQL injection, and what controls actually reduce the risk.

Technical · 11 min

MCP security — the four attack surfaces of a Model Context Protocol server

MCP servers extend a model's capabilities by exposing tools, resources, and prompts. Each extension point is an attack surface. This piece defines the four surfaces and the review questions for each.

Technical · 10 min

RAG security — the three boundaries that matter

Retrieval-augmented generation adds a retrieval layer between the user and the model. That layer has three security boundaries — the data boundary, the retrieval boundary, and the context boundary — and each has distinct failure modes.

Technical · 11 min

Agentic AI security — the surfaces deterministic software does not have

Agentic AI systems have attack surfaces that do not exist in deterministic software: a reasoning loop that can be hijacked, a tool manifest that defines what the agent can do, memory that persists across sessions, and goals that drift. Security review must address all four.