Reviewing how an AI vendor handles your data
What happens to the data you send to an AI vendor? Is it used for training? Who can access it? Where is it stored? The data processing agreement (DPA) does not always answer these questions. This piece defines the data-handling review for AI vendors.
Standard data handling reviews for vendor assessments cover a well-established set of questions: how is data encrypted at rest and in transit, what retention periods apply, where is data stored, and how is it deleted on contract termination. These questions are appropriate for conventional SaaS vendors and remain necessary for AI vendors.
They are not sufficient. AI vendors have a data handling layer that does not exist in conventional SaaS: the inference layer, where customer data is processed by the model. The questions that apply to the inference layer are different in kind from standard data handling questions — and the answers are often not reflected in the DPA terms that cover the vendor's infrastructure.
The core questions
The data handling review for an AI vendor requires answers to six core questions that go beyond standard data handling review:
- Is customer data used to train the model — at the vendor level or at any subprocessor level?
- How long is inference data (prompts and completions) retained, and by whom?
- Who can access retained inference data within the vendor organisation and within any subprocessors?
- What is the full subprocessing chain for the AI feature?
- Where is data processed and stored — and are the residency requirements of applicable regulations met?
- How is customer data deleted on contract termination, including data held by subprocessors?
These six questions address risks that are categorically different from standard data handling risks. They reflect the fact that AI systems process the semantic content of customer data in ways that conventional software does not — and that the outputs of that processing may be retained, used for training, or transferred to parties outside the customer's direct relationship.
Data handling categories — what to check and what evidence to require
| Category | What to check | Evidence required |
|---|---|---|
| Training data | Is customer data — including prompts, completions, uploaded documents, and interactions — used to train or fine-tune the model at the vendor or model-provider level? Is an opt-out in place and verifiable? | Written confirmation from vendor. Model provider DPA terms showing training-use restriction. Annual compliance attestation or opt-out confirmation mechanism. |
| Inference data | How long are prompts and completions retained at the vendor level and at each subprocessor? Who within those organisations can access retained inference data? | DPA retention schedule for vendor and each named subprocessor. Access control documentation showing who can read inference logs. Deletion confirmation on contract termination. |
| Output data | Are model outputs retained separately from inputs? Are they used for model evaluation, quality improvement, or any purpose beyond the immediate feature function? Can they be deleted on request? | Retention policy for completions and generated outputs. Statement on secondary use of outputs. Data subject erasure process for outputs containing personal data. |
| Retention | What is the maximum effective retention period across all parties in the chain — vendor plus all subprocessors? Is data deleted on contract termination, and does the deletion obligation cascade to subprocessors? | Retention schedule for each party in the chain. Deletion-on-termination confirmation including subprocessors. For training-embedded data: vendor position on model-weight erasure and available remedies. |
Training data use
Training data use is the question organisations most frequently fail to ask — and the one with the most significant data protection implications. The question is not just whether the vendor uses customer data for training. It is whether anyone in the subprocessor chain does.
The specific questions for training data use:
- Does the vendor use customer data — including prompts, completions, documents uploaded, or user interactions — to train or fine-tune the model that powers the product? If yes, what is the legal basis and opt-out mechanism?
- Does the model provider use inference data from the vendor's API calls to train or fine-tune their model? Does the vendor's agreement with the model provider opt out of this?
- If an opt-out exists, how does the vendor verify it is in effect? What contractual commitment does the model provider make?
- Has the vendor ever used customer data for model improvement or evaluation purposes, even under a different description?
A vendor response that says “we do not use your data for training” answers one half of the question. The other half is: does your model provider? The two questions are different and require different answers.
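One way to keep the two halves separate is to record an answer per party rather than a single answer for the vendor relationship. The sketch below is illustrative only: the `TrainingUseAnswer` record, its field names, and the party names are assumptions made for this example, not any vendor's or tool's schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingUseAnswer:
    party: str                               # hypothetical party name
    role: str                                # e.g. "vendor" or "model_provider"
    trains_on_customer_data: Optional[bool]  # None means the question was not answered
    opt_out_verified: bool = False           # opt-out confirmed by contract or attestation


def training_use_findings(answers: List[TrainingUseAnswer]) -> List[str]:
    """Return one finding per party whose training use is unanswered or unmitigated."""
    findings = []
    for a in answers:
        if a.trains_on_customer_data is None:
            findings.append(f"{a.party} ({a.role}): training use not answered")
        elif a.trains_on_customer_data and not a.opt_out_verified:
            findings.append(
                f"{a.party} ({a.role}): trains on customer data, opt-out not verified"
            )
    return findings


# The vendor has answered; the model provider has not.
answers = [
    TrainingUseAnswer("ExampleVendor", "vendor", trains_on_customer_data=False),
    TrainingUseAnswer("ExampleModelCo", "model_provider", trains_on_customer_data=None),
]
findings = training_use_findings(answers)
for line in findings:
    print(line)
```

Treating a missing answer as a finding, rather than as a "no", keeps an unasked model-provider question visible in the review record.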
Inference data retention
Inference data retention covers how long prompts and completions are retained by the vendor and by any subprocessors. Retention practices vary significantly across vendors and model providers:
- No retention: The inference request and response are processed in memory and discarded. The ideal from a data protection perspective, but not universal — some vendors need to retain inference data for audit, debugging, or moderation purposes.
- Short-term retention: Inference data is retained for a defined short period (hours to days) for abuse detection, debugging, or customer support purposes, then deleted.
- Long-term retention: Inference data is retained for extended periods — weeks, months, or indefinitely — for quality improvement, model evaluation, or other purposes.
The review must establish the retention period not just for the vendor's systems but for each subprocessor. A vendor may retain inference data for 30 days; the model provider it uses may also retain it for 30 days, counted on its own schedule. The effective retention period for any given inference request is the maximum across all parties in the chain.
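The "maximum across the chain" rule can be written down in a few lines. This is a minimal sketch under stated assumptions: the party names and day counts are made up, and `None` is used here to mean "indefinite or undocumented", a convention chosen for this example.

```python
from typing import Dict, Optional


def effective_retention_days(retention_by_party: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the longest retention period in the chain, in days.

    Returns None when no parties are recorded, or when any party's retention
    is indefinite or undocumented, because no upper bound can then be stated.
    """
    periods = list(retention_by_party.values())
    if not periods or any(p is None for p in periods):
        return None
    return max(periods)


# Hypothetical chain: every party has a documented period.
documented_chain = {"vendor": 30, "model_provider": 30, "moderation_service": 7}
print(effective_retention_days(documented_chain))

# Same chain plus one party with no documented period.
partial_chain = dict(documented_chain, embedding_provider=None)
print(effective_retention_days(partial_chain))
```

A single undocumented party removes the upper bound for the whole chain, which is why the sketch returns `None` rather than ignoring the missing value.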
The subprocessing chain
The complete subprocessing chain for an AI feature must be documented to understand the full data handling picture. For most enterprise AI features, the chain includes at minimum the SaaS vendor and the model provider. It may include additional parties: AI middleware (orchestration frameworks that route requests), embedding model providers (separate from the generation model), vector database providers (for features built on retrieval-augmented generation, or RAG), and moderation service providers.
For each party in the chain, the review must document:
- What data they receive as part of the AI feature's operation.
- Their data retention practices for that data.
- Whether they use the data for training or model improvement.
- The contractual basis for the data transfer to that party.
- Their security certifications and data protection commitments.
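The five items above map naturally onto one record per party. The following is a sketch rather than a prescribed schema: `ChainPartyRecord`, its field names, and the example values are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ChainPartyRecord:
    name: str                                   # hypothetical party name
    data_received: Optional[str] = None         # what data the party receives
    retention_practice: Optional[str] = None    # how long it keeps that data
    training_use: Optional[str] = None          # training or model-improvement use
    transfer_basis: Optional[str] = None        # contractual basis for the transfer
    security_commitments: Optional[str] = None  # certifications and commitments


def undocumented_items(record: ChainPartyRecord) -> List[str]:
    """List the review items that are still empty for this party."""
    return [
        f.name
        for f in fields(record)
        if f.name != "name" and not getattr(record, f.name)
    ]


vector_db = ChainPartyRecord(
    name="ExampleVectorDB",
    data_received="document embeddings and source text chunks",
    retention_practice="retained until customer deletes the index",
)
print(undocumented_items(vector_db))
```

Keeping empty fields explicit makes it easy to see, per party, which of the five items the review has not yet established.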
Data residency
Data residency for AI inference data is complicated by the geographic distribution of model provider infrastructure. Foundation model providers often process inference requests on infrastructure distributed across multiple regions, with routing determined by capacity rather than the customer's residency requirements.
The review must establish:
- Whether the vendor can commit to processing inference requests within a specified geographic region.
- Whether the model provider offers region-specific inference endpoints for enterprise customers.
- Whether the vendor's enterprise agreement includes data residency commitments for the inference layer.
- Whether there are international transfer mechanisms in place for any transfers that do occur across jurisdictions.
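These checks can be combined into a simple comparison of where each party processes data against where it is permitted. The sketch below assumes region labels such as "eu" and "us" and uses made-up party names; none of these come from a real vendor's documentation.

```python
from typing import Dict, List, Set


def residency_findings(
    required_regions: Set[str],
    processing_regions: Dict[str, Set[str]],
    transfer_mechanisms: Dict[str, str],
) -> List[str]:
    """Flag parties that process data outside the required regions.

    A documented transfer mechanism is reported alongside the finding;
    its absence is reported as a gap.
    """
    findings = []
    for party, regions in sorted(processing_regions.items()):
        outside = sorted(regions - required_regions)
        if not outside:
            continue
        mechanism = transfer_mechanisms.get(party)
        detail = (
            f"transfer mechanism: {mechanism}"
            if mechanism
            else "no transfer mechanism documented"
        )
        findings.append(f"{party}: processes in {', '.join(outside)}; {detail}")
    return findings


residency = residency_findings(
    required_regions={"eu"},
    processing_regions={"ExampleVendor": {"eu"}, "ExampleModelCo": {"eu", "us"}},
    transfer_mechanisms={},
)
for line in residency:
    print(line)
```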
Deletion rights
Deletion rights for AI-processed data require attention to several layers. Standard deletion provisions cover data stored in the vendor's platform. The AI-specific deletion concerns are:
- Inference data deletion: Can retained inference data (prompts and completions) be deleted on request or on contract termination? Within what timeframe? Do the vendor's deletion obligations cascade to subprocessors?
- Model training data deletion: If customer data was used in model training or fine-tuning, deletion rights are complex, because data embedded in model weights cannot be deleted in the conventional sense. What is the vendor's position on this? What alternative remedies are available?
- Data subject erasure: How are GDPR erasure requests handled for data that has been processed by the AI feature? Particularly relevant for AI features that process employee data or customer personal data.
Review checklist
The AI data handling review should produce a documented record covering all six core question areas. The checklist:
- Training data use: vendor response documented; subprocessor training data use documented; opt-out mechanism confirmed or gap noted.
- Inference data retention: retention period documented at vendor level; retention period documented at each subprocessor level.
- Access controls: parties and roles with access to retained inference data documented at each level.
- Subprocessing chain: complete chain documented with each party's data handling terms.
- Data residency: commitments or gaps documented; applicable residency requirements confirmed or noted as unmet.
- Deletion rights: vendor deletion obligations documented; cascade to subprocessors confirmed or gap noted; training data deletion position documented.
Any gap in this checklist is a control gap that should appear in the disposition record with a closure plan or a risk acceptance note. For the full scope of AI vendor subprocessor obligations, see AI subprocessor risk in your vendor chain.
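The rule that every gap needs either a closure plan or a risk acceptance note can be checked mechanically. This is a minimal sketch: the area identifiers, the `documented` and `dispositions` structures, and the example values are assumptions made for illustration, not a defined record format.

```python
from typing import Dict, List

# The six core question areas, using identifiers invented for this example.
CHECKLIST_AREAS = [
    "training_data_use",
    "inference_data_retention",
    "access_controls",
    "subprocessing_chain",
    "data_residency",
    "deletion_rights",
]


def unresolved_gaps(documented: Dict[str, bool], dispositions: Dict[str, str]) -> List[str]:
    """Return areas that are neither documented nor covered by a disposition.

    `dispositions` maps an area to its closure plan or risk acceptance note.
    """
    return [
        area
        for area in CHECKLIST_AREAS
        if not documented.get(area, False) and area not in dispositions
    ]


documented = {
    "training_data_use": True,
    "inference_data_retention": True,
    "access_controls": True,
    "subprocessing_chain": True,
    "data_residency": False,
    "deletion_rights": False,
}
dispositions = {"data_residency": "risk accepted pending regional endpoint"}
gaps = unresolved_gaps(documented, dispositions)
print(gaps)
```

An area missing from `documented` altogether is treated the same as an undocumented one, so a forgotten checklist item surfaces as a gap.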
Get clear on how your AI vendors handle data
Drel structures AI vendor data handling reviews to cover training data use, inference retention, the subprocessing chain, and deletion rights — with documented evidence for each.
A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.