Assessing third-party AI vendors — the questions procurement skips

Third-party AI vendor assessments typically cover data processing agreements and SOC 2. They miss model governance, incident notification for model updates, and the evidence required to re-assess when the vendor changes the underlying model.

Drel · 10 min read

Third-party AI vendor assessments are not new — most enterprises have been running them for two years now. What is new is the realisation that the standard tooling does not cover the AI-specific risks. SIG and CAIQ ask about operational security. SOC 2 attests to control operation. A signed DPA addresses lawful processing. None of the three asks the questions that matter most when the vendor swaps the underlying model next quarter, or fine-tunes on your data without flagging it as a material change, or fails to flag an incident because its incident-response policy did not anticipate an AI failure mode.

This piece is a reference. It walks through seven questions that the standard kit does not ask, names what a defensible answer looks like, and provides the contract language and evidence artefacts that turn the answers into the basis for a clearance decision. It is not a replacement for SIG, CAIQ, SOC 2 or the DPA — it is the supplement that closes the AI-specific gaps the standard kit leaves open.

What typical vendor assessments cover — and what they miss

A standard enterprise vendor assessment for a software vendor includes some combination of: a Standardized Information Gathering (SIG) questionnaire, the Cloud Security Alliance Consensus Assessments Initiative Questionnaire (CAIQ), a SOC 2 Type II report, a signed Data Processing Agreement under Article 28 of the GDPR or its non-EU equivalent, and, where regulated workloads are involved, a privacy questionnaire and a transfer impact assessment. The kit takes weeks to complete on the vendor side and weeks to review on the customer side. It is well-rehearsed.

The kit covers four things well. It covers operational security — access control, incident response capability at the infrastructure level, encryption in transit and at rest, vulnerability management. It covers data handling — retention, deletion, segregation, backups. It covers legal arrangements — processing purposes, sub-processors, transfers, breach notification, audit rights. And it covers the vendor's overall control environment via the SOC 2 attestation.

What the kit does not cover is model governance. The SIG asks about software development lifecycle but does not ask how the AI model is selected, evaluated, versioned, or updated. The CAIQ asks about service architecture but does not ask which model the AI feature is built on. The SOC 2 attests to control operation but does not, in most reports today, mention AI-specific controls at all. The DPA addresses processing of personal data but does not address the model lifecycle that sits behind the processing.

SIG, CAIQ, and SOC 2 were written for software vendors. AI vendors are software vendors plus a model lifecycle. The model lifecycle is where the residual risk lives, and it is the layer the standard kit is silent on.

The result is a vendor file that looks complete on a procurement dashboard and is, in fact, missing the questions that matter when the vendor changes the AI feature next quarter. The seven questions below are the ones that close that gap. Each one is short, asked early, and answered in a way that the vendor can commit to in the contract.

The seven AI-specific vendor questions — and what the standard questionnaires miss

Q1. What model and what version?
Coverage today: Not in SIG, CAIQ, or SOC 2.
Defensible artefact: Vendor architecture document naming model family, version, configuration.

Q2. What is the model update / change notification policy?
Coverage today: Not in SIG. Partial in some SaaS change-management sections.
Defensible artefact: Contract clause committing to advance notification with a defined window.

Q3. Is customer data used to train or improve any model?
Coverage today: Not in SIG. Not in SOC 2.
Defensible artefact: Contractual clause, not the marketing site, with the default position written down.

Q4. What counts as an AI incident? What is the disclosure window?
Coverage today: SOC 2 covers infrastructure incidents. Not AI-specific incidents.
Defensible artefact: Incident response policy with AI-specific incident categories and notification SLAs.

Q5. Sub-processor list, including model providers?
Coverage today: GDPR Art. 28 requires the list. Older lists predate AI features.
Defensible artefact: Sub-processor list dated within the last twelve months, naming any model provider.

Q6. Right to re-assess when the vendor changes the AI feature?
Coverage today: Not in the standard kit.
Defensible artefact: Contract clause permitting fresh assessment, audit, or pause on material AI change.

Q7. What evidence will the vendor provide in an AI incident?
Coverage today: SOC 2 incident clause covers logs. Not AI behaviour analysis or customer impact scope.
Defensible artefact: Defined list of artefacts the vendor commits to produce post-incident.

All seven are addressable through contract language and a short addendum to the vendor questionnaire. The hard part is asking before the contract is signed.

Question 1 — what model and what version?

The simplest question, and the one most often answered evasively. The vendor says “we use AI,” or “we use large language models,” or “our AI is powered by leading providers.” None of these answers is sufficient to reason about the system the vendor is selling.

A useful answer to Question 1 names four facts. The model provider — for instance OpenAI, Anthropic, Google, Cohere, Mistral, or an in-house build. The model family — for example GPT-4, Claude Opus, Gemini 1.5 Pro, or a specific fine-tune name. The version — the specific snapshot identifier or the date of the snapshot. And the configuration — temperature, sampling, system prompt architecture, any safety filters, and whether the model is fine-tuned on customer or vendor data.

The information matters for three reasons. It is the input to supply chain analysis: who is upstream of the vendor, what do you know about the upstream provider's governance, what risks do they bring into the chain. It is the input to the change analysis: when the vendor upgrades, you need a before-state to compare to. And it is the input to incident correlation: if the model provider publishes a vulnerability advisory, you need to know whether you are downstream of the affected component.
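The four facts, and the before-state they provide for change analysis, can be kept as a small structured record. The following is a minimal sketch rather than a prescribed format: the `ModelRecord` class, its field names, and the provider and snapshot values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class ModelRecord:
    """The four Question 1 facts, as disclosed by the vendor in writing."""
    provider: str   # upstream model provider, or "in-house"
    family: str     # model family or fine-tune name
    version: str    # snapshot identifier or snapshot date
    configuration: dict = field(default_factory=dict)  # temperature, filters, fine-tuning


def changed_fields(before: ModelRecord, after: ModelRecord) -> dict:
    """Return {field_name: (old, new)} for every field that differs."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


# Hypothetical values: the state assessed at procurement vs. the state disclosed later.
assessed = ModelRecord(
    provider="ExampleProvider",
    family="example-family",
    version="2024-01-snapshot",
    configuration={"temperature": 0.2, "fine_tuned_on_customer_data": False},
)
current = ModelRecord(
    provider="ExampleProvider",
    family="example-family",
    version="2024-06-snapshot",
    configuration={"temperature": 0.2, "fine_tuned_on_customer_data": False},
)

print(changed_fields(assessed, current))
# {'version': ('2024-01-snapshot', '2024-06-snapshot')}
```

A non-empty result is the signal that the assessed system and the production system have diverged. The same record can be matched against an upstream provider's advisory to check whether the affected component is in the chain.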

A vendor unwilling to name the model in writing is a vendor unwilling to be accountable for the model. That is a procurement signal in itself, and one that should be raised before the contract is signed rather than discovered at the first incident.

Question 2 — what is the model update / change notification policy?

Models update. Vendors swap the underlying model quarterly in some sectors and annually in others, sometimes on a public schedule and sometimes silently. Without a notification clause, the system you assessed at procurement is not the system in production six months later, and you have no record that the change happened.

Three patterns appear in the market. The first is public versioning: the vendor publishes when a new model version is rolled out, customers see the change date in a release notes feed, and the version of the model in production is independently verifiable. The second is configurable versioning: the vendor allows the customer to pin a model version and to upgrade on their own schedule. The third is silent updates: the vendor decides when to swap models and the customer finds out by reading the marketing announcement or by noticing changed behaviour.

The third pattern is incompatible with a structured AI security review. If the vendor can change the underlying model without notice, the disposition memo you produced at procurement is stale the moment the change happens, and the re-assessment trigger has fired without anyone noticing it.

The specific notification window is negotiable; the existence of a notification clause is not. The clause is also the hook for Question 6, the right to re-assess on change, and the two clauses should be drafted as a pair.

Question 3 — is customer data used to train, fine-tune, or improve any model?

The question SIG misses entirely. It is also the question that most often surprises procurement teams when they ask, because the vendor's marketing page says one thing and the contract says another. The default position in enterprise inference contracts today is that customer data is not used for training; the variance is in how that default is expressed and how robust the expression is.

A defensible answer to Question 3 has three components. The first is a clear statement of the default — typically that no customer data is used to train, fine-tune, or improve any model operated by the vendor or its sub-processors. The second is the exceptions — any opt-in pathways (a customer enabling fine-tuning of a private model on its own data), and any data categories that fall outside the default (for instance, anonymised aggregate metrics). The third is the source of the commitment: which clause in the contract carries it, and which sub-processors have committed to the same default.

A “we don't train on your data” statement on the marketing site is not evidence. It does not bind the vendor, does not bind sub-processors, and does not survive a change of corporate ownership. The same statement in the contract, with sub-processor pass-through language, is evidence.

Where the vendor relies on a third-party foundation model provider, the pass-through is the critical element. The primary vendor commits, but the inference happens at the model provider's endpoint, and the model provider's contractual position determines whether the data is actually safe from training reuse. Most major model providers offer a clear “no training” position for their enterprise inference offerings; verify the pass-through is in place rather than assuming.

Question 4 — what counts as an AI incident, and what is the disclosure window?

Vendor incident-response policies define incidents at the infrastructure level: a breach of confidentiality, a loss of availability, a compromise of the service's integrity at the system layer. The policies are well-rehearsed, and SOC 2 attests to them. They do not define AI-specific incidents — which means an event that everyone outside the vendor would call an AI incident may not trigger the vendor's notification clauses at all.

A useful definition of an AI incident covers at least these categories:

  • Model failure affecting customer-impacting decisions. An output that is materially wrong, that was used as the basis for a customer action, and that the customer would expect to know about.
  • Hallucination affecting customer-facing content. An output that fabricates facts about identifiable individuals, products, prices, or regulatory positions, and that reaches end users.
  • Prompt injection success against customer data. A manipulation of the agent that causes it to disclose customer data or take actions outside its intended scope.
  • Model bias incident. Outputs that are demonstrably differentially worse for one demographic group, surfaced through monitoring or complaint.
  • Training-data exposure. Where output regurgitates content that should not have been in training data, or where a training-data leak is identified by the model provider or the vendor.
  • Model provider incident with downstream effect. Where the upstream model provider has declared an incident and the vendor's service is in scope.

The disclosure window for each category should be tied to the materiality. A confirmed prompt injection against customer data is reasonably notified within a small number of hours; a non-personal bias finding from internal monitoring can be notified on a longer cycle. The point is to have the window defined rather than left to ad hoc judgement at the time of the incident.
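One way to keep the windows defined rather than improvised is to hold them in a single table keyed by incident category. The sketch below uses the six categories listed above. The window values are placeholders invented for illustration, not recommendations; the real figures come out of negotiation. The names `AIIncident`, `NOTIFICATION_WINDOW`, `undefined_windows`, and `notify_by` are likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class AIIncident(Enum):
    MODEL_FAILURE = "model failure affecting customer-impacting decisions"
    HALLUCINATION = "hallucination affecting customer-facing content"
    PROMPT_INJECTION = "prompt injection success against customer data"
    MODEL_BIAS = "model bias incident"
    TRAINING_DATA_EXPOSURE = "training-data exposure"
    PROVIDER_INCIDENT = "model provider incident with downstream effect"


# Placeholder windows for illustration only (assumption, not from any contract).
NOTIFICATION_WINDOW = {
    AIIncident.PROMPT_INJECTION: timedelta(hours=4),
    AIIncident.TRAINING_DATA_EXPOSURE: timedelta(hours=24),
    AIIncident.PROVIDER_INCIDENT: timedelta(hours=24),
    AIIncident.MODEL_FAILURE: timedelta(hours=48),
    AIIncident.HALLUCINATION: timedelta(hours=72),
    AIIncident.MODEL_BIAS: timedelta(days=30),
}


def undefined_windows() -> set:
    """Categories that have no agreed window yet."""
    return set(AIIncident) - set(NOTIFICATION_WINDOW)


def notify_by(category: AIIncident, confirmed_at: datetime) -> datetime:
    """Latest time the vendor should notify, given when the incident was confirmed."""
    return confirmed_at + NOTIFICATION_WINDOW[category]


confirmed = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(notify_by(AIIncident.PROMPT_INJECTION, confirmed))
# 2025-01-01 16:00:00+00:00
```

An empty result from `undefined_windows()` is the property that matters here: every category has a window written down before any incident occurs.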

Question 5 — sub-processor list, including model providers

GDPR Article 28 requires a processor to obtain the controller's prior written authorisation, specific or general, before engaging a sub-processor and, under a general authorisation, to give notice of intended additions or replacements so that the controller can object. In practice this produces a maintained sub-processor list, which is a routine artefact in any vendor file. The problem with AI vendors is that the list often predates the AI feature and does not name the foundation model provider.

The pattern is consistent across vendors that added AI features to existing products in the past two years. The sub-processor list lists hosting providers, analytics tools, CDN providers, and the standard SaaS dependencies. It does not list OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Vertex, or whichever inference endpoint the vendor uses for the AI feature. That endpoint is a sub-processor for any AI-routed personal data — and its omission from the list is a defect, not an oversight.

If the sub-processor list does not name the model provider, either the vendor is operating the model in-house — in which case the list should say so — or the list is incomplete and the vendor has not done the Article 28 housekeeping.

Question 5 has two sub-parts. First: is the sub-processor list current, and does it include the model provider where the vendor uses third-party inference. Second: what is the vendor's notification policy for sub-processor changes, including the addition or substitution of a model provider. The change is material; it should not be quiet.
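The first sub-part lends itself to a mechanical check on the vendor file. The sketch below is one possible shape for it, with several assumptions: the `SubProcessor` record, the `"model provider"` role label, and the function name are hypothetical, and "twelve months" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SubProcessor:
    name: str
    role: str  # e.g. "hosting", "analytics", "model provider" (labels are assumptions)


def review_subprocessor_list(entries, list_dated, uses_third_party_inference, today):
    """Return a list of findings; an empty list means both checks passed."""
    findings = []
    # Currency check: twelve months approximated as 365 days.
    if today - list_dated > timedelta(days=365):
        findings.append("list is older than twelve months")
    # Completeness check: a third-party inference endpoint should appear on the list.
    names_model_provider = any(e.role == "model provider" for e in entries)
    if uses_third_party_inference and not names_model_provider:
        findings.append("no model provider named although third-party inference is used")
    return findings


# Hypothetical vendor file: a list that predates the AI feature.
entries = [SubProcessor("ExampleHost", "hosting"), SubProcessor("ExampleMetrics", "analytics")]
findings = review_subprocessor_list(
    entries,
    list_dated=date(2023, 3, 1),
    uses_third_party_inference=True,
    today=date(2025, 6, 1),
)
for finding in findings:
    print(finding)
```

With these sample values both findings are raised, which is the pattern described above for vendors that added AI features to an existing product.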

Question 6 — right to re-assess on vendor change

Questions 1, 2, and 5 establish what the vendor is using today and what notifications they will give when that changes. Question 6 closes the loop: it establishes the customer's right to act on the notification. Without a re-assessment right, the customer can be told that the model has changed and has no contractual mechanism to do anything about it.

The right to re-assess typically has three elements:

  • Information request. The customer can ask the vendor for documentation describing the change and its impact on the AI feature. The vendor commits to provide reasonable responses within a defined window.
  • Fresh assessment. The customer can repeat the AI security review against the changed system, at the customer's cost, with the vendor's cooperation. Where the vendor's response shows that the change is small enough not to require a full re-assessment, the right falls away.
  • Pause or terminate. Where the change materially worsens the risk profile and the vendor cannot offer a path to mitigation, the customer can pause use of the AI feature or terminate the relevant order without penalty.

The pause-or-terminate right is the most contested in negotiation. Vendors resist it because it gives the customer an exit triggered by a unilateral vendor change. Customers need it because the alternative — being contractually committed to an AI feature that has materially changed and that the security review cannot endorse — is untenable.

Question 7 — what evidence will the vendor provide in an incident?

Question 4 defines what an AI incident is. Question 7 defines what the vendor will produce when one happens. The two questions are paired: a notification without supporting evidence is not enough to enable the customer to scope the impact or to manage their own incident response.

The evidence list for an AI incident typically includes:

  • Logs of the affected interactions. The prompts, the retrieved context, the system prompts, the model outputs, and the actions taken on the basis of the outputs, all with timestamps.
  • Model behaviour analysis. The vendor's assessment of what the model did, why it did it, and whether the behaviour is reproducible.
  • Customer data impact assessment. Which of the customer's data subjects, data categories, or data sets were affected, and to what extent.
  • Containment and remediation actions. What the vendor did to stop the incident, prevent recurrence, and restore confidence in the affected feature.
  • Post-incident review. The vendor's root-cause analysis and the change-of-state in the AI feature, including whether a model version, a configuration, or a control was changed in response.

The evidence list is what allows the customer to write its own incident memo, to update its risk register, and to satisfy its regulator if the incident is reportable on the customer's side. Without a contractual commitment to produce the list, the customer is at the vendor's discretion at the moment when the customer most needs information.
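On the customer side, the committed evidence list can double as a receipt checklist during an incident. The following is a minimal sketch; the identifiers in `REQUIRED_EVIDENCE` are hypothetical short names for the five artefacts listed above.

```python
# Hypothetical short names for the five artefacts in the evidence list.
REQUIRED_EVIDENCE = (
    "interaction_logs",
    "model_behaviour_analysis",
    "customer_data_impact_assessment",
    "containment_and_remediation",
    "post_incident_review",
)


def missing_evidence(received: set) -> list:
    """Artefacts the vendor committed to but has not yet delivered, in list order."""
    return [item for item in REQUIRED_EVIDENCE if item not in received]


received_so_far = {"interaction_logs", "containment_and_remediation"}
print(missing_evidence(received_so_far))
# ['model_behaviour_analysis', 'customer_data_impact_assessment', 'post_incident_review']
```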

The conditional clearance pattern

A vendor AI feature rarely earns unconditional clearance at first review. The standard pattern is conditional clearance — the system can be used for a defined purpose, with defined controls, subject to defined re-assessment triggers, and on the basis of defined contractual commitments. The seven questions in this piece map onto the standard conditional clearance template directly.

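A conditional clearance for a third-party AI vendor typically names the permitted purpose, the controls required, the re-assessment triggers, and the contractual commitments it relies on.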
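As a rough illustration, the four elements of a conditional clearance (purpose, controls, re-assessment triggers, contractual commitments) can be held in one record, with a helper that reports which triggers have fired. This is a sketch under stated assumptions, not a template from any standard: the `ConditionalClearance` class, the trigger identifiers, and every sample value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConditionalClearance:
    vendor: str
    permitted_purpose: str
    required_controls: list[str]
    reassessment_triggers: list[str]
    contractual_commitments: list[str]

    def fired_triggers(self, observed_events: set[str]) -> list[str]:
        """Triggers from this clearance that appear among the observed events."""
        return [t for t in self.reassessment_triggers if t in observed_events]


# Hypothetical clearance; every value below is an illustrative placeholder.
clearance = ConditionalClearance(
    vendor="ExampleVendor",
    permitted_purpose="drafting internal support replies",
    required_controls=[
        "model and version named in writing",
        "no training on customer data, with sub-processor pass-through",
    ],
    reassessment_triggers=["model_change", "model_provider_change", "ai_incident"],
    contractual_commitments=[
        "advance change notification",
        "right to re-assess on change",
        "post-incident evidence list",
    ],
)

print(clearance.fired_triggers({"model_change", "pricing_change"}))
# ['model_change']
```

Any non-empty result means the disposition should be revisited; events outside the trigger list, such as the `pricing_change` placeholder here, are ignored.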

The conditional clearance is the artefact that the AI Committee approves and that the procurement team takes to the vendor. It is also the artefact that triggers the re-assessment cycle when one of the conditions is breached or one of the triggers fires. Without the clearance, the assessment becomes a one-off exercise that ages out of relevance within a year.

The deeper point is that AI vendor assessment is not a procurement-time event. It is the start of a relationship that needs to be maintained through model changes, sub-processor changes, incidents, and the predictable evolution of the AI feature itself. The seven questions in this piece are the questions that make the maintenance possible — by establishing the visibility, the notification, and the response mechanisms at the start, before the relationship is locked in.

Related reading: for the upstream AI risk evidence that informs the vendor file, see the Article 9 evidence piece in this series. For the DPO's parallel evidence package that the vendor file feeds into, see the DPA piece. For the audit perspective that ties both together, see the ISO 42001 audit-readiness piece.

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.