The technical documentation the EU AI Act expects
The EU AI Act requires technical documentation before a high-risk AI system is placed on the market. This piece breaks down what Annex IV requires, what it means in practice, and the gaps that appear most often in documentation we have reviewed.
The EU AI Act's Annex IV defines the technical documentation that must exist before a high-risk AI system is placed on the market or put into service. It is not a checklist — it is a set of substantive requirements that collectively add up to a technically coherent account of what the system is, how it was built, and what its known limitations are.
This piece walks through each of the eight Annex IV sections, translates the regulatory language into specific documentation requirements, identifies the evidence that satisfies each section, and notes the gaps that appear most often when organisations try to assemble this documentation for the first time.
Why documentation is the compliance gate
For high-risk AI systems, Annex IV documentation is not an afterthought: it is the primary evidence on which conformity with the EU AI Act is assessed. A provider claiming conformity with the high-risk obligations must be able to produce Annex IV documentation on request from a notified body or market surveillance authority. Without it, the provider cannot demonstrate conformity, regardless of how carefully the system was designed.
This creates a practical problem for organisations that have deployed systems thoughtfully but without the explicit intent of generating regulatory documentation. The engineering team may have made careful decisions about training data, model selection, and testing — but if those decisions were not documented at the time, reconstructing the documentation retrospectively is substantially harder and produces weaker evidence.
Annex IV documentation is evidence of how a system was built and why. It cannot be written from the outside after the fact — it requires access to the design decisions, training records, and test results that existed during development.
Annex IV — the eight sections
Annex IV is structured around eight areas of documentation. The table below maps each section to what the regulation requires, the evidence that satisfies it, and the most common documentation gap. The following sections discuss the most substantive areas in detail.
Annex IV — eight sections, evidence requirements, and common gaps
| Section | What Annex IV requires | Evidence to produce | Typical gap |
|---|---|---|---|
| 1. General description | Intended purpose, version information, hardware/software the system runs on, description of components and their interactions | System description document covering: what the system does, who uses it, what decisions it influences, and the technical components involved | Descriptions written for marketing rather than technical audiences — no component inventory, no version record |
| 2. Design and development | Development methodology, training methodology (for ML systems), choices made in design and why, iterative design history | Training methodology description, model selection rationale, design decision log (for internally developed systems) | No design rationale — the system exists but there is no record of why architectural or training choices were made |
| 3. System architecture | Technical architecture including components, interfaces, data flows, and dependencies — sufficient for a technically qualified reviewer to understand the system | Architecture diagram with component labels, data flow notation, external dependency list, API interface documentation | High-level diagrams that describe the concept rather than the actual system — no data flow, no dependency inventory |
| 4. Data governance | Training, validation, and testing datasets — origin, composition, collection methods, labelling methodology, known limitations, bias analysis | Dataset documentation: provenance record, composition description, known gaps, bias examination results | Training data documentation for third-party foundation models — what the vendor discloses is rarely sufficient for Annex IV |
| 5. Validation and testing | Testing methodology, metrics used, test results, performance across relevant subgroups, red-team or adversarial testing results | Test plan, test results report, performance by subgroup where relevant, adversarial testing results for safety-critical scenarios | Testing documentation for development context only — no evidence of performance in the specific deployment context |
| 6. Known limitations | Documented limitations of the system — scenarios where performance degrades, inputs the system is not designed to handle, failure modes | Limitations register: each known limitation with description, conditions under which it applies, and how deployers should handle it | Limitations stated in generic terms — 'may produce inaccurate results' without specifying under what conditions |
| 7. Instructions for use | The information deployers need to operate the system correctly — intended purpose, acceptable input, output interpretation, human oversight requirements | Instructions for use document satisfying Article 13 requirements — see the transparency section of the high-risk obligations guide | Instructions written as legal disclaimers rather than operational guidance |
| 8. Monitoring and maintenance | Expected lifetime, maintenance requirements, post-market monitoring plan, conditions under which re-review is required | Lifecycle documentation: maintenance schedule, monitoring cadence, conditions triggering re-assessment | No post-deployment monitoring plan — the documentation ends at deployment rather than covering the operational period |
1. General description of the system
The general description section requires a plain-language account of what the system does, what it is designed to achieve, and who uses it. It must cover the system's intended purpose, the hardware and software environment it requires, and the version or version history of the system being documented.
This section is frequently underestimated. The common error is treating it as a marketing summary — describing the system's benefits and intended use in terms designed to reassure rather than inform. Annex IV requires a technical description that would allow a technically qualified reviewer to understand the system without additional context.
A well-constructed general description covers: the system's primary function and how it achieves it; the inputs it receives and the outputs it produces; the decisions it influences and who is affected by those decisions; the user populations who interact with the system; and the technical environment in which it operates.
2. Design and development process
For ML-based systems, the design and development section is the core of the technical documentation. It must describe the training methodology, the choices made during development and why they were made, and the iterative process through which the system reached its current form.
This is the section that most organisations building on top of foundation models struggle with. If you are using a pre-trained model from an external provider (whether open-weight or via API), you did not design the training methodology — the foundation model provider did. Your documentation can cover what you did: the fine-tuning process, the prompt engineering, the system architecture built around the model, and the evaluation methodology. But the upstream training documentation depends on what the model provider discloses.
3. System architecture
The system architecture section requires a technical diagram and description sufficient for a technically qualified reviewer to understand the system's structure. This means a component-level architecture, not a conceptual overview.
The architecture documentation must cover: the main system components and their functions; the interfaces between components, including data flows; external dependencies (APIs, databases, third-party services); and the security boundaries relevant to the system's risk profile.
For agentic systems — systems with multiple components, tool use, and orchestration layers — the architecture documentation is particularly important. The orchestration logic, the tool permissions, and the boundaries between AI-driven and human-verified actions are all relevant to the risk assessment and must be documented.
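One way to keep the architecture description honest is to hold the component and data flow inventory as structured data and check it for holes. The following is a minimal sketch, not a format the regulation prescribes: the class names, field names, and the `inventory_gaps` helper are all hypothetical, and the example components are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    name: str
    function: str
    external: bool = False                        # third-party API, database or service
    ai_driven: bool = False                       # acts on model output
    tool_permissions: Optional[List[str]] = None  # None means "not yet documented"


@dataclass
class DataFlow:
    source: str
    target: str
    data: str


def inventory_gaps(components, flows):
    """Return human-readable gaps in a component and data flow inventory."""
    names = {c.name for c in components}
    gaps = []
    # Every flow endpoint must be a documented component.
    for flow in flows:
        for end in (flow.source, flow.target):
            if end not in names:
                gaps.append(f"flow '{flow.data}' references undocumented component '{end}'")
    connected = {f.source for f in flows} | {f.target for f in flows}
    for c in components:
        # A component with no flows usually means the diagram is conceptual, not actual.
        if c.name not in connected:
            gaps.append(f"component '{c.name}' has no documented data flow")
        if c.ai_driven and c.tool_permissions is None:
            gaps.append(f"AI-driven component '{c.name}' has no tool permissions recorded")
    return gaps


# Illustrative inventory for an assumed agentic system.
components = [
    Component("orchestrator", "plans and routes tool calls",
              ai_driven=True, tool_permissions=["search", "send_email"]),
    Component("llm_api", "hosted foundation model", external=True),
    Component("review_queue", "human approval of outbound actions"),
]
flows = [
    DataFlow("orchestrator", "llm_api", "prompt with user context"),
    DataFlow("orchestrator", "mail_gateway", "draft email"),
]
gaps = inventory_gaps(components, flows)
for gap in gaps:
    print(gap)
```

In this example the check reports the undocumented `mail_gateway` dependency and the `review_queue` component that appears in the inventory but in no data flow, which are the two gaps the table above lists for this section.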
4. Data governance documentation
The data governance section documents the training, validation, and testing datasets used in the system. This includes the origin of the data, the collection methodology, the labelling process, the composition, the known limitations, and the bias examination performed.
Annex IV section 4 and Article 10 overlap significantly. The data governance documentation required by Annex IV is the documentary evidence that the Article 10 data governance obligations have been addressed.
For systems that process personal data in training — which is a substantial proportion of high-risk AI systems — the data governance documentation must also address the GDPR lawful basis for processing, the data protection measures applied, and the intersection with any DPIA conducted under GDPR. This is an area where EU AI Act and GDPR documentation overlap directly.
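The items listed in this section can be captured as one record per dataset, so that missing entries are visible before a reviewer asks for them. This is a sketch under assumptions: the field names below paraphrase the items in the text, and the record layout and `missing_fields` helper are hypothetical rather than anything Annex IV or GDPR specifies. The DPIA reference is treated as optional because a DPIA may not have been conducted.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    name: str
    role: str                        # "training", "validation" or "testing"
    origin: str = ""
    collection_method: str = ""
    labelling_process: str = ""
    composition: str = ""
    known_limitations: str = ""
    bias_examination: str = ""
    contains_personal_data: bool = False
    gdpr_lawful_basis: str = ""      # only required when personal data is present
    protection_measures: str = ""    # only required when personal data is present
    dpia_reference: str = ""         # optional: filled in if a DPIA exists


PERSONAL_DATA_FIELDS = {"gdpr_lawful_basis", "protection_measures"}
OPTIONAL_FIELDS = {"dpia_reference"}


def missing_fields(record):
    """List the text fields of a dataset record that are still empty."""
    missing = []
    for f in fields(record):
        if f.name in OPTIONAL_FIELDS:
            continue
        if f.name in PERSONAL_DATA_FIELDS and not record.contains_personal_data:
            continue
        value = getattr(record, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
    return missing


# Illustrative record with invented values.
record = DatasetRecord(
    name="applications-2023",
    role="training",
    origin="internal applicant tracking system export",
    composition="application records, 2019 to 2023",
    contains_personal_data=True,
)
print(missing_fields(record))
```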
5. Validation and testing results
The validation and testing section requires documentation of how the system was tested, what metrics were used, and what the results were. This must include performance across relevant subgroups — not just aggregate metrics that may hide disparate performance for specific populations.
For high-risk systems in Annex III categories where the affected population is heterogeneous — hiring tools assessing diverse applicant pools, credit scoring models applied across demographic groups, law enforcement tools used across different communities — subgroup performance analysis is not optional. The regulation requires it, and the absence of subgroup analysis is one of the most common documentation gaps in systems reviewed for EU AI Act readiness.
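The difference between aggregate and subgroup metrics is easy to show with a small calculation. The sketch below uses synthetic data and hypothetical group labels (`group_a`, `group_b`), and uses plain accuracy as the metric. A real analysis would use the metrics and subgroup definitions appropriate to the system's intended purpose.

```python
from collections import defaultdict


def accuracy_by_subgroup(records):
    """records: iterable of (subgroup, y_true, y_pred) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / totals[group] for group in totals}


def aggregate_accuracy(records):
    """Accuracy over all records, ignoring subgroup membership."""
    records = list(records)
    return sum(int(y_true == y_pred) for _, y_true, y_pred in records) / len(records)


# Synthetic example: a large group served well, a small group served poorly.
records = (
    [("group_a", 1, 1)] * 90 + [("group_a", 1, 0)] * 10
    + [("group_b", 1, 1)] * 6 + [("group_b", 1, 0)] * 4
)

overall = aggregate_accuracy(records)
per_group = accuracy_by_subgroup(records)
print(f"overall: {overall:.3f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.3f}")
```

Here the aggregate figure is 96 correct out of 110 (about 0.873), while `group_b` sits at 0.600. Reporting only the aggregate would hide exactly the disparity this section asks providers to document.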
For security-relevant systems, validation and testing documentation should also include adversarial testing results. Article 15 requires robustness against adversarial inputs; the testing section is where evidence of that robustness lives.
6. Known limitations and foreseeable misuse
The known limitations section is arguably the most valuable part of Annex IV documentation, because it is where the provider has to be candid about what the system cannot do. It requires the provider to document the conditions under which the system does not perform as intended, the failure modes that have been identified, and the foreseeable misuse scenarios that the system is not designed to handle.
A well-constructed known limitations register covers: performance degradation conditions (input types or contexts where the system is less reliable); edge cases that produce incorrect or unexpected output; foreseeable misuse scenarios identified during design; and the operational conditions under which the system should not be used.
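The table above describes a register entry as a limitation with its description, the conditions under which it applies, and how deployers should handle it. A minimal sketch of that structure follows, with a check that flags entries stated only in generic terms. The class, the field names, and the `generic_entries` helper are hypothetical, and the example entries are invented.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    description: str
    conditions: str       # when the limitation applies
    deployer_action: str  # how deployers should handle it


def generic_entries(register):
    """Return descriptions of entries that lack conditions or deployer guidance."""
    return [
        entry.description
        for entry in register
        if not entry.conditions.strip() or not entry.deployer_action.strip()
    ]


register = [
    Limitation(
        description="Lower reliability on scanned documents",
        conditions="Input is an image-based PDF rather than native text",
        deployer_action="Route to manual review instead of automated scoring",
    ),
    Limitation(
        description="May produce inaccurate results",
        conditions="",
        deployer_action="",
    ),
]
print(generic_entries(register))
```

The second entry is flagged because it says nothing about when the inaccuracy occurs or what a deployer should do about it.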
What a security review produces for each section
An AI security review conducted at design time produces evidence that maps directly to several Annex IV sections. The mapping is not complete: sections 2 (design process), 4 (data governance), and 8 (monitoring and maintenance) require information from the development team, the model provider, or the operational teams that a security review cannot generate independently. But for sections 1, 3, 5, 6, and 7, the review produces documentation that can be used directly or with minor adaptation as Annex IV evidence.
- Section 1 (General description) — the system description component of the security review provides the foundation for the Annex IV general description.
- Section 3 (Architecture) — the architecture diagram produced during the review, with data flow and component labelling, satisfies the Annex IV architecture requirement.
- Section 5 (Validation and testing) — adversarial testing results from the security review contribute to the testing documentation, particularly for Article 15 robustness requirements.
- Section 6 (Known limitations) — the control gap analysis and known limitations identified during the security review map directly to the Annex IV limitations documentation.
- Section 7 (Instructions for use) — the system capabilities and limitations assessment from the security review informs the Article 13 instructions for use document.
The remaining sections require documentation from other teams: design process (section 2) and data governance (section 4) from the development team, and monitoring and maintenance (section 8) from the operational teams. These sections cannot be produced by a security review alone, but the review can identify where they are incomplete and flag the gaps.
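The split between what a review covers and what other teams must supply can be tracked with a simple lookup. This sketch uses the eight section titles from the table in this piece. The `REVIEW_COVERED` set and `open_sections` helper are hypothetical names, and treating a section as "covered" is a simplification: for section 5, for example, the review only contributes part of the testing evidence.

```python
ANNEX_IV_SECTIONS = {
    1: "General description",
    2: "Design and development",
    3: "System architecture",
    4: "Data governance",
    5: "Validation and testing",
    6: "Known limitations",
    7: "Instructions for use",
    8: "Monitoring and maintenance",
}

# Sections a security review can evidence directly or with minor adaptation.
REVIEW_COVERED = {1, 3, 5, 6, 7}


def open_sections(other_evidence):
    """Sections with neither review evidence nor evidence supplied by other teams."""
    covered = REVIEW_COVERED | set(other_evidence)
    return {
        number: title
        for number, title in ANNEX_IV_SECTIONS.items()
        if number not in covered
    }


# Assume the development team has so far delivered data governance documentation only.
remaining = open_sections(other_evidence={4})
print(remaining)
```

With only section 4 supplied by the development team, the open items are sections 2 and 8.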
For a systematic approach to mapping review evidence to EU AI Act obligations and identifying which gaps remain, see the evidence mapping guide.
Produce Annex IV evidence at design time
Drel's AI security review process generates the system description, architecture documentation, and known limitations register that Annex IV requires — structured as reusable regulatory evidence.
A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.