BlogReference

An MCP server security review checklist

A structured checklist for reviewing an MCP server before connecting it to a production agent. Covers transport, authentication, tool manifest, context injection surface, third-party dependencies, and evidence requirements.

Drel Research10 min read

This checklist brings together the review requirements from across the MCP security review cluster into a single structured document. It is designed to be used directly in an AI security review for an MCP server deployment — whether the server is third-party or internally developed, and whether it uses stdio or HTTP/SSE transport.

Each section corresponds to one of the MCP attack surfaces or review domains. Each item specifies the gate at which it must be completed (before pilot, before production, or ongoing) and the evidence required to close the item. An item is not closed by an assertion — it is closed by the specified evidence.

How to use this checklist

The checklist is structured around five security domains:

  1. Transport — how the host and server communicate.
  2. Tool manifest — the tools the server exposes and their descriptions.
  3. Authentication — how the server verifies callers.
  4. Context injection — how tool results and resources are handled before reaching the model.
  5. Supply chain — for third-party servers, the vetting of the server itself.

Each item has a gate: “Before pilot” items must be completed before any production-like testing with real data or real users. “Before production” items must be completed before the server is connected to a production agent. “Ongoing” items must be addressed on a recurring schedule after production deployment.

Gaps at the “Before pilot” gate should be treated as blockers. A gap at this gate means the control is missing from the design — it will not be addressed by operational measures later. “Before production” gaps may be acceptable during a controlled pilot if explicitly tracked and assigned to a remediation owner. “Ongoing” gaps are operational risks that require a remediation plan and a timeline.

MCP server security review checklist

AreaCheckEvidence required
Transport — TLSTLS enforced on all HTTP/SSE endpoints. Protocol version TLS 1.2 minimum. Certificate validation enabled in client and not bypassed.TLS configuration excerpt. Certificate validation test result showing invalid cert is rejected.
Transport — AuthAuthentication required before any connection is accepted. Unauthenticated requests return 401. Credentials stored in secrets manager, not code.Unauthenticated request rejection test. Secrets management policy reference.
Tool manifestAll tool descriptions reviewed for imperative language, unusual length, and encoding anomalies. Approved manifest committed as a version-controlled baseline.Signed manifest baseline. Description review notes for each tool. Manifest drift detection test result.
Tool scopeEvery tool in the manifest is justified by the intended use case. Tools with no in-scope use are removed or disabled from the session.Tool justification record. Configuration showing excess tools are excluded from session.
Context injectionSystem prompt explicitly frames tool results as untrusted data. MCP client validates tool results before injecting into context. Data source write-access controls documented.System prompt review. Output validation test result. Data source access control documentation.
Authentication boundaryPer-user auth is implemented: end-user identity is forwarded to the MCP server as a verifiable token. Tool handlers enforce per-user access controls.Auth architecture review showing user identity flow. Cross-user isolation test result.
Supply chain (third-party servers)Server version pinned. Source review completed. Dependency vulnerability scan clean. Permission scope documented and justified. Re-review cadence registered.Source review notes. Vulnerability scan output. Permission scope documentation. Version pin in deployment config.
Audit loggingFull tool-invocation audit log captures: tool name, parameters, invoking identity, end-user identity, result summary, and timestamp. 90-day retention minimum.Sample audit log entries showing required fields. Log retention policy confirmation.

Transport checklist

For the detailed reasoning behind each transport control, see Transport security for MCP servers.

Transport — HTTP/SSE

Review itemGateEvidence required
TLS 1.2+ enforced for all connections. Plaintext HTTP rejected.Before pilotTLS configuration excerpt showing protocol version and cipher suites.
Certificate validation enabled in the MCP client. Not bypassed for any deployment environment.Before pilotTest result: connection with invalid or expired cert is rejected.
Certificate issued by a trusted CA and within validity period.Before pilotCertificate details (issuer, expiry) and rotation schedule.
Server is not accessible from the public internet (or internet exposure is explicitly reviewed and approved).Before pilotNetwork access control documentation or network scan confirming scope.
Mutual TLS in place for high-sensitivity deployments.Before productionmTLS configuration and client certificate provisioning record.
Transport-level connection events and auth failures logged.Before productionLog sample showing connection and auth events.

Transport — stdio

Review itemGateEvidence required
MCP server subprocess runs under a least-privilege OS account, not the host application account.Before pilotProcess user account and permission review.
Subprocess does not have access to credential files, SSH keys, or cloud configuration outside its stated scope.Before pilotFilesystem permission audit for the subprocess account.
In containerised deployments, subprocess isolation is documented.Before pilotContainer configuration review showing subprocess isolation.

Tool manifest checklist

For the detailed reasoning behind tool manifest controls — including tool poisoning mechanics — see Tool poisoning in MCP servers.

Tool manifest

Review itemGateEvidence required
Every tool in the manifest is justified for this deployment. No excess capabilities.Before pilotManifest review document with one-line justification per tool.
Tool descriptions are accurate and minimal. No imperative language, no embedded instructions to the model.Before pilotDescription review notes per tool. Reviewer sign-off.
Parameter schemas contain no instruction-like content in description fields.Before pilotSchema review notes. Any parameter description containing verbs reviewed.
Manifest is served from a version-controlled, authenticated source. Not dynamically generated at runtime from untrusted input.Before pilotManifest source review showing version control and signing.
Approved manifest baseline is documented and signed. Deviations from the baseline are detectable.Before pilotSigned baseline artefact. Manifest diffing mechanism or test.
Destructive tools (write, delete, execute, send, modify) require an explicit approval step before the model can invoke them.Before pilotApproval boundary design review. Test confirming destructive tool is blocked without approval.
Behavioral test cases covering tool-poisoning scenarios have been executed and results documented.Before productionTest case list and results.
Manifest re-review is scheduled on version change and on a periodic cadence.OngoingRe-review schedule in the security review record.

Authentication checklist

For the detailed reasoning behind authentication controls — including the client-auth vs user-auth gap — see The MCP authentication boundary, reviewed.

Authentication and authorization

Review itemGateEvidence required
Client authentication required and enforced from connection establishment. Unauthenticated connections rejected before any manifest is served.Before pilotAuth enforcement test: unauthenticated connection returns rejection.
Per-user authentication is implemented for multi-user agent deployments. End user identity is verified, not asserted.Before pilotAuthentication architecture review showing user identity flow. Token validation test.
Per-user authorization is enforced at the tool handler layer. User A cannot access User B's data through the agent.Before pilotCross-user access test: tool invocation with User A identity cannot retrieve User B data.
MCP server credentials are managed through a secrets manager. Not hardcoded in source code or deployment configuration.Before pilotSecrets management review. CI secret scanning output showing no hardcoded credentials.
MCP server runs under a least-privilege service identity. Not the host application identity or a broadly-privileged account.Before pilotIAM policy review showing service identity and granted permissions.
Downstream systems are called with user-scoped credentials where per-user access controls exist.Before productionDownstream call implementation review.
Credential rotation schedule is defined and implemented.OngoingRotation schedule and last rotation date.

Context injection checklist

For the detailed reasoning behind context injection controls, see Prompt-context injection through MCP tools.

Context injection

Review itemGateEvidence required
System prompt explicitly frames tool results as untrusted data. Instructions for handling instruction-like content in tool results are present.Before pilotSystem prompt review confirming data/instruction framing.
Tool results are validated before being passed to the model. HTML/markdown stripping or instruction-pattern flagging is in place.Before pilotOutput validation implementation review. Adversarial payload test result.
Data sources accessed by MCP tools are inventoried. Write access to each data source is documented.Before pilotData source inventory with access control model for each source.
MCP tools access only the data sources required for their stated function. No broad filesystem or database access.Before pilotAccess scope review. Access control test: out-of-scope resource returns rejection.
Prompt templates do not contain secrets, API keys, or internal policy text.Before pilotTemplate review. Secret scanning of template content.
Behavioral tests covering context injection scenarios have been executed and results documented.Before productionTest case list and results.

Supply chain checklist

This section applies to third-party MCP servers. For internally developed servers, replace this section with the internal server checklist from Securing an internal MCP server exposed to agents. For the full vetting process, see Vetting a third-party MCP server before you connect it.

Supply chain — third-party MCP servers

Review itemGateEvidence required
Server maintainer is identified and accountable. Security disclosure process is documented.Before pilotSource review notes with maintainer identification.
Source code is reviewable (open source) or vendor has a documented security attestation.Before pilotSource review or vendor security attestation.
Server version is pinned in the deployment configuration. Floating version dependencies are not in use.Before pilotDeployment configuration showing pinned version.
Dependency vulnerability scan has been run against the pinned version. Findings documented and triaged.Before pilotVulnerability scan output for the pinned version.
All server dependencies are sourced from standard package registries. No dependencies from personal or unverified sources.Before pilotDependency provenance review.
Permissions granted to the server process are documented and scoped to the minimum required.Before pilotPermission scope documentation.
Version change review process is defined: who re-reviews on update, and at what cadence.Before productionReview cadence documented in the security review record.
Dependency vulnerability scanning is scheduled to run periodically, not only at initial review.OngoingScheduled scan configuration or CI integration.

Evidence requirements

A completed MCP server security review should produce a structured evidence pack that can be presented to a governance committee, a procurement review, or a post-incident inquiry. The evidence pack should include:

Evidence is not an assertion that a control is in place. Evidence is an artefact that demonstrates it. A configuration file that enables a control, a test result that verifies it, or a log sample that shows it operating — these are evidence. A document that says “TLS is enabled” is an assertion. A TLS configuration excerpt and a certificate validation test result are evidence. The distinction matters when the evidence is examined under governance review or following an incident.

Required artefacts

  • Completed checklist — this checklist with each item marked pass, fail, or not applicable, with evidence reference for each passed item and remediation owner for each failed item.
  • Approved tool manifest baseline — a version-controlled, signed copy of the tool manifest as reviewed. Stored alongside the review record.
  • Test results — results of enforcement tests (unauthenticated rejection, certificate validation, cross-user access) and behavioral tests (tool-poisoning and context-injection scenarios).
  • Configuration excerpts — TLS configuration, IAM policy, secrets manager integration, network access control — as applicable.
  • Data source inventory — listing of data sources the MCP tools access, with access control model for each.
  • Residual risk record — any items that could not be fully remediated, with the residual risk description, the accepted risk owner, and the compensating controls in place.
  • Re-review schedule — the conditions that trigger a re-review (version change, configuration change, incident) and the periodic review cadence.

For deeper analysis of any individual section of this checklist, refer to the corresponding cluster article:

Blog

Get new posts in your inbox

AI security review, OWASP Agentic Top 10, ISO 42001 evidence, and what AI Committees actually need. No cadence promises — we publish when there's something worth reading.

Run a structured MCP server security review with Drel

Drel's AI security review produces a completed version of this checklist, with evidence collected against each item and a structured evidence pack for governance sign-off.

A note on scope: Drel reviews assessed systems against documented architecture, configuration and intent. It does not ingest live telemetry from production environments. Dispositions reflect the assessed system at the time of review and the re-assessment triggers that govern when the disposition must be revisited.