
Scoping an AI security review without boiling the ocean

The most common failure mode in AI security reviews is a scope so wide that nothing gets finished. This piece walks through how to scope a review to the decision you actually need to make by fixing three things: the system boundary, the deployment context, and the review threshold.

Drel Research · 9 min read

The most common failure mode in AI security reviews is not a bad threat model or a missing control. It is a scope so wide that the review never finishes, or so narrow that the important risks are outside its boundary. Both produce the same outcome: a clearance decision that is either impossible to reach or impossible to trust.

Scoping is the first and most consequential decision in an AI security review. Get it wrong and everything downstream is either too expensive to complete or too narrow to be meaningful. Get it right and the review is bounded, tractable, and produces a defensible clearance decision.

The scoping problem

AI systems create genuine scoping ambiguity. A RAG pipeline over an internal knowledge base involves a model, an embedding model, a vector store, a retrieval layer, a prompt template, an API, a user interface, and the underlying infrastructure. Is the scope of the review all of that? Just the model and retrieval layer? Just the API boundary? The answer depends on what decision the review is meant to support.

One tempting response to this ambiguity is the review that scopes everything: “We need to review the entire AI strategy before we can approve any individual system.” This sounds thorough. In practice it means the review never happens, because the scope has no natural boundary and no completion criterion.

Scope is not “everything that is relevant”. Scope is the boundary of the system this specific review is authorising to operate. Everything inside the boundary is assessed. Everything outside it is either in scope for a different review or explicitly excluded with a reason.

Four scoping dimensions — right vs failure mode

| Dimension | Right | Scoping failure mode |
| --- | --- | --- |
| System boundary | Named components with explicit exclusions; each exclusion references the baseline that covers it (e.g. provider SOC 2 report, existing infra programme) | "The AI system": no component list, no exclusion reasons, every participant has a different mental model of what is inside |
| Deployment context | Specific user population, data classification, operational consequence, and regulatory regime, fixed before the threat model begins | "Internal users for now, expanding to customers later": a muddled threat model that is not clearly applicable to either deployment |
| Risk threshold | A stated clearance decision the review is meant to produce: go-live, pilot, or scope-expansion clearance for a named system and context | "Identify all risks": no completion criterion, so the review never produces a decision, only more analysis |
| Review depth | Calibrated to the system's risk profile: higher consequence and wider data access warrant fuller threat modelling and more verification evidence | Applying the same depth regardless of risk: over-engineering a low-risk internal tool or under-reviewing a customer-facing consequential system |

Scoping decision matrix

| Component or question | In scope | Out of scope | Boundary call |
| --- | --- | --- | --- |
| The AI model and its configured capabilities | Yes | | |
| The prompt template and system prompt | Yes | | |
| The tool manifest and tool handler code | Yes | | |
| Data sources the system reads from (RAG corpus, databases) | Yes | | |
| The underlying LLM API provider's infrastructure | | Yes | |
| General cloud infrastructure not specific to the AI system | | Yes | |
| AI features in SaaS tools used by the same team | | | Needs judgment |
| Third-party MCP servers connected to the system | | | Needs judgment |

Boundary calls require an explicit scoping decision before the review begins. Document the decision and its rationale.

What scope actually means for an AI review

An AI security review is scoped to three things: the system boundary, the deployment context, and the review threshold. All three must be defined before the review starts. If any one of them is missing, the review will drift.

  • System boundary defines what components are inside the assessed system and what is outside it (and why). It is a technical boundary.
  • Deployment context defines who uses the system, for what purpose, in what environment. It is a risk context boundary.
  • Review threshold defines what decision the review is meant to support. It is the success criterion for the review.

These three together produce a tractable review. Without all three, you have a risk discussion, not an assessment.

Defining the system boundary

The system boundary is the line around the components the review will assess. Drawing it correctly requires two decisions: what is inside, and what is outside with an explicit exclusion reason.

What is typically inside the boundary:

  • The model(s) being used — provider, version, and configuration.
  • The prompt template and system prompt that govern the model’s behaviour.
  • Any retrieval layer, knowledge base, or data sources the model draws from.
  • The tool manifest for agentic systems — what tools can the model invoke?
  • The output handling layer — how model outputs are rendered or acted upon.
  • The API boundary — how users or systems interact with the AI component.

What is typically outside the boundary (with an explicit exclusion):

  • The model provider’s infrastructure — excluded because the provider’s own security certifications cover this. Document which certification applies.
  • The underlying cloud infrastructure — excluded because covered by the organisation’s existing infrastructure security programme. Document the baseline.
  • Adjacent systems that the AI system integrates with — excluded if they are reviewed under a separate assessment. Document the reference.
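
The boundary rules above can be checked mechanically before the review starts. The following is a minimal sketch in Python, not a prescribed format: the names `Exclusion`, `SystemBoundary`, and `problems` are hypothetical, and the example components are illustrative. It flags an empty component list, any exclusion that lacks a reason or a referenced baseline, and any component listed on both sides of the line.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Exclusion:
    """An out-of-scope item plus why it is excluded and what covers it."""
    component: str
    reason: str
    baseline_ref: str  # e.g. a certification or a separate assessment reference


@dataclass
class SystemBoundary:
    """Hypothetical record of what is inside and outside the assessed system."""
    in_scope: list[str] = field(default_factory=list)
    exclusions: list[Exclusion] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means no gaps were found."""
        issues = []
        if not self.in_scope:
            issues.append("no in-scope components named")
        for ex in self.exclusions:
            if not ex.reason.strip():
                issues.append(f"exclusion '{ex.component}' has no reason")
            if not ex.baseline_ref.strip():
                issues.append(f"exclusion '{ex.component}' has no referenced baseline")
        overlap = set(self.in_scope) & {ex.component for ex in self.exclusions}
        for name in sorted(overlap):
            issues.append(f"'{name}' is listed as both in scope and excluded")
        return issues


boundary = SystemBoundary(
    in_scope=["model", "system prompt", "retrieval layer", "tool manifest"],
    exclusions=[
        Exclusion("provider infrastructure", "covered by provider certification", "provider SOC 2 report"),
        Exclusion("cloud infrastructure", "covered by existing infra programme", ""),
    ],
)
print(boundary.problems())
# ["exclusion 'cloud infrastructure' has no referenced baseline"]
```

The check is deliberately shallow: it confirms that a baseline is named, not that anyone has read what the baseline covers.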

Deployment context shapes the risk profile

The same AI system can have very different risk profiles depending on how it is deployed. A customer service chatbot deployed to anonymous external users has a different risk profile from the same chatbot deployed only to authenticated employees. The threat model, required controls, and risk tolerance all change with the deployment context.

Deployment context includes:

  • User population. Internal employees, authenticated customers, anonymous public users, or a specific subgroup. The adversarial threat surface is very different for each.
  • Data classification. What data does the system process or have access to? Public, internal, confidential, regulated (personal data, financial data, health data)?
  • Operational consequence. Is this system informational (produces text the user reads) or consequential (triggers actions in downstream systems)?
  • Regulatory regime. Does this deployment fall under the EU AI Act, HIPAA, FCA rules, or another framework that sets additional requirements?

The deployment context determines the risk threshold the review must clear. A system that is informational, internal-only, and processes no regulated data has a lower bar than one that is consequential, customer-facing, and processes personal health information. The scope document must record the deployment context explicitly so the clearance decision is understood as specific to that context.
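
One way to make "lower bar" and "higher bar" concrete is to record the four context fields and derive a review depth from them. The sketch below is illustrative only: the enum values mirror the categories listed above, but the scoring weights, the cut-offs, and the tier names ("light", "standard", "full") are assumptions chosen for the example, not a published rubric.

```python
from dataclasses import dataclass
from enum import IntEnum


class UserPopulation(IntEnum):
    INTERNAL = 0
    AUTHENTICATED_CUSTOMERS = 1
    ANONYMOUS_PUBLIC = 2


class DataClass(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    REGULATED = 3


@dataclass(frozen=True)
class DeploymentContext:
    users: UserPopulation
    data: DataClass
    consequential: bool                     # True if outputs trigger downstream actions
    regulatory_regimes: tuple[str, ...] = ()  # e.g. ("HIPAA",)


def review_depth(ctx: DeploymentContext) -> str:
    """Map a context to a depth tier. Weights and cut-offs are assumptions."""
    score = int(ctx.users) + int(ctx.data)
    score += 2 if ctx.consequential else 0
    score += 1 if ctx.regulatory_regimes else 0
    if score <= 1:
        return "light"
    if score <= 4:
        return "standard"
    return "full"


internal_tool = DeploymentContext(UserPopulation.INTERNAL, DataClass.INTERNAL, consequential=False)
health_agent = DeploymentContext(
    UserPopulation.AUTHENTICATED_CUSTOMERS, DataClass.REGULATED, consequential=True,
    regulatory_regimes=("HIPAA",),
)
print(review_depth(internal_tool))  # light
print(review_depth(health_agent))   # full
```

The frozen dataclass reflects the point made above: the context is fixed for the duration of the review, and a different context is a different review.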

The review threshold: what decision does it enable?

The threshold is the decision the review is meant to make possible. It is the success criterion for the whole exercise. Without a clear threshold, reviews produce analysis without producing a decision.

The threshold can be expressed as one of:

  • Go-live clearance. “This review will determine whether the system is cleared to go to production for the stated user population and deployment context.”
  • Pilot clearance. “This review will determine whether the system is cleared for a restricted pilot of [N] internal users, with full production clearance contingent on [conditions].”
  • Scope expansion clearance. “The system has an existing clearance for [previous scope]. This review will determine whether it is cleared for the expanded scope of [new scope].”

The review threshold is not “confirm the system is secure”. No AI security review can certify perfect security. The threshold is: does this system meet the bar to operate in this context, and under what conditions?
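
If scope documents are captured in a form or a file, the threshold field can be restricted to the three clearance decisions so that open-ended goals are rejected at intake. This is a small sketch under that assumption; `Threshold` and `parse_threshold` are hypothetical names.

```python
from enum import Enum


class Threshold(Enum):
    GO_LIVE = "go-live clearance"
    PILOT = "pilot clearance"
    SCOPE_EXPANSION = "scope expansion clearance"


def parse_threshold(text: str) -> Threshold:
    """Accept only a named clearance decision; reject anything else."""
    normalised = text.strip().lower()
    for threshold in Threshold:
        if normalised == threshold.value:
            return threshold
    allowed = ", ".join(t.value for t in Threshold)
    raise ValueError(f"'{text}' is not a clearance decision; expected one of: {allowed}")


print(parse_threshold("Pilot clearance"))  # Threshold.PILOT

try:
    parse_threshold("identify all risks")
except ValueError as err:
    print(err)  # rejected: not one of the three clearance decisions
```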

The scope document

Before the review begins, produce a short scope document. This is not a threat model or a control plan — those come later. It is the agreement about what this review covers. It should fit on one page and contain:

  • System name and version — the specific instance being reviewed.
  • System boundary — components in scope and explicit exclusions with reasons.
  • Deployment context — user population, data classification, operational consequence, regulatory regime.
  • Review threshold — the clearance decision this review is meant to produce.
  • Review participants — who is conducting the review, and in what role.
  • Out-of-scope items — what is explicitly excluded, and the referenced baseline for each exclusion.

This document becomes the first page of the evidence pack. It makes the clearance decision interpretable: the clearance is for this system, in this context, against this threshold. When the system or context changes, the reviewer can look at this document and immediately see whether the change is within the assessed scope or triggers a re-assessment.
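
The re-assessment check described above can be sketched as a comparison between the assessed scope document and a proposed change. Everything in this example is hypothetical: the `ScopeDocument` fields follow the list above, the system name and values are invented for illustration, and the rule that any change to a context field or any new component triggers re-assessment is an assumption that a real programme would tune.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScopeDocument:
    system_name: str
    system_version: str
    in_scope: frozenset          # component names inside the boundary
    exclusions: tuple            # (excluded item, referenced baseline) pairs
    user_population: str
    data_classification: str
    operational_consequence: str  # "informational" or "consequential"
    regulatory_regime: str
    threshold: str               # the clearance decision the review produces
    participants: tuple          # (participant, role) pairs


# Assumption: a change to any of these fields falls outside the assessed scope.
CONTEXT_FIELDS = (
    "user_population",
    "data_classification",
    "operational_consequence",
    "regulatory_regime",
)


def reassessment_triggers(assessed: ScopeDocument, proposed: ScopeDocument) -> list:
    """List the differences that would take the system outside its assessed scope."""
    triggers = [
        f"{name} changed"
        for name in CONTEXT_FIELDS
        if getattr(assessed, name) != getattr(proposed, name)
    ]
    new_components = proposed.in_scope - assessed.in_scope
    triggers += [f"new component: {c}" for c in sorted(new_components)]
    return triggers


assessed = ScopeDocument(
    system_name="support-assistant",
    system_version="1.4",
    in_scope=frozenset({"model", "system prompt", "retrieval layer"}),
    exclusions=(("provider infrastructure", "provider SOC 2 report"),),
    user_population="internal employees",
    data_classification="internal",
    operational_consequence="informational",
    regulatory_regime="none",
    threshold="go-live clearance",
    participants=(("security reviewer", "lead"),),
)
proposed = replace(
    assessed,
    user_population="authenticated customers",
    in_scope=assessed.in_scope | {"third-party MCP server"},
)
print(reassessment_triggers(assessed, proposed))
# ['user_population changed', 'new component: third-party MCP server']
```

An empty list means the change sits inside the assessed scope; anything else is a prompt to reopen the scope document rather than an automatic verdict.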

Common scoping mistakes

Across the systems we have assessed, the scoping errors that cause the most downstream pain are consistent:

  1. Scoping to “the AI system” without defining what that means. Every participant has a different mental model of what “the system” includes. The review produces a threat model for one version of the system while controls are being specified for another.
  2. Excluding infrastructure without documenting the baseline. “Infrastructure is out of scope” is a gap unless there is a documented, referenced baseline that covers it. If the baseline is a SOC 2 report from 18 months ago, say so — and note whether you have reviewed what it covers.
  3. Not fixing the deployment context before the review starts. A review conducted against “internal users eventually expanding to customers” will produce a muddled threat model that is not clearly applicable to either. Review the current deployment. When scope expands, review again.
  4. Setting the threshold as “identify all risks”. This has no completion criterion. The threshold must be a clearance decision. Risk identification is a step in producing it, not the destination.
  5. Not getting the scope document agreed before the review starts. Scope disputes that surface mid-review waste the most time and produce the most coverage gaps. Agree the boundary on paper before the first threat-modelling session.

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.