
The decision is human

AI can now write the security review. What it cannot do is decide the system is safe. That is becoming the only step that matters.

Drel · 8 min read

For most of the history of security, the expensive part of reviewing a system was understanding it. Someone had to sit with an architecture until they could see how it would behave, reason through the ways it could be turned against its owner, and write that reasoning down in a form other people could trust. The work was slow. It was scarce. And because it was scarce, most systems were never really reviewed at all — they shipped on the strength of someone's confidence, and the organization hoped that confidence was earned.

That constraint is lifting. The same models now being deployed into production can also be turned on the problem of reviewing those deployments. Point a capable model at an architecture and it will reconstruct the components, trace the data flows, enumerate plausible attack paths, and propose a control plan, all in minutes, at almost no cost, as many times as you like. The reasoning that used to take a senior practitioner days is becoming abundant.

When something that was scarce becomes abundant, the interesting question is what gets scarce in its place. Making the analysis cheap does not make the review free. It moves the cost somewhere else — to the one step that does not get cheaper just because the analysis did.

[Interactive figure: the eras of the security review. Only the first panel is reproduced here.]

Panel: The review was written by hand. A security architect reconstructed the system, reasoned through how it could be abused, and wrote the review themselves. It was slow, it was scarce, and most AI systems never got one.

Caption: Each era folds another part of the security review into the machine. The last one isolates the part that doesn't fold in.

The work that got cheap

It is worth being precise about what has actually changed, because the change is larger than it first appears. A security review is not one task; it is a stack of them. Reconstructing the system from whatever documentation exists. Mapping that reconstruction onto known threat patterns. Reasoning about which of those patterns actually apply to this design rather than to designs in general. Drafting the controls that would close each gap, and grading how much evidence there is that each control is really in place.

Every one of those steps was, until recently, human work — and not junior work. It took someone who had seen enough systems to know which threats were real and which were theater. Today a model does the first draft of all of it, and the draft is good enough that the practitioner's job has quietly shifted from producing the analysis to checking it. That is a smaller job, and it gets smaller every quarter.

The work that didn’t

Here is the step that has not moved. At the end of every review, someone has to look at the assembled evidence and decide: is this system safe enough to put in front of real users, with real data, at real scale? And then they have to put their name to that decision, knowing that if it is wrong, the accountability is theirs.

No amount of analysis takes that step for you. You can hand a committee a flawless threat model and a complete control plan, and the committee still has to do the thing the document cannot do on its own: accept the residual risk. That acceptance is not a computation. It is a judgment about how much uncertainty the organization is willing to carry, who carries it, and what happens if it goes wrong. A model can lay the evidence out perfectly and still not be the one who is accountable for what is decided on the strength of it.

The value of a signature was never the ink. It was that a responsible person read the evidence and chose to stand behind it. Making the evidence cheaper to produce does not remove the person. It raises what is being asked of them.

This is why the regulators, auditors, and boards who will eventually ask about these systems are not going to ask whether they were analyzed. They will ask who decided, on what basis, and whether the basis can still be produced a year later. Those are questions about a decision and its record, not about an analysis — and they are the questions that get harder, not easier, as the analysis underneath becomes something a machine can generate on demand.

Why the constraint moves rather than disappears

There is a general version of this, and it is old. Speeding up one stage of a process speeds up the whole only as far as the untouched stages allow; past that point it just relocates the slowest part. In computing the bound is called Amdahl's law. In an organization it shows up every time a team automates one step and discovers the queue has simply formed somewhere downstream.
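Amdahl's law can be made concrete with a few lines of arithmetic. What follows is a minimal sketch in Python with made-up numbers: it assumes analysis used to take 80% of the time a review consumed and the decision the remaining 20%. Neither figure is a measurement, and the function name `overall_speedup` is ours, chosen only for illustration.

```python
def overall_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: speedup of the whole when one fraction of it gets faster.

    accelerated_fraction: share of the original time spent in the stage
        being sped up (0.0 to 1.0).
    stage_speedup: how many times faster that stage becomes (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    untouched_fraction = 1.0 - accelerated_fraction
    return 1.0 / (untouched_fraction + accelerated_fraction / stage_speedup)


# Hypothetical split: analysis was 80% of review time, the decision 20%.
ANALYSIS_SHARE = 0.8

for factor in (10, 100, 1_000_000):
    # The result approaches, but never reaches, 1 / 0.2 = 5.
    print(factor, round(overall_speedup(ANALYSIS_SHARE, factor), 2))
```

Under these assumptions, making the analysis a hundred times faster makes the review less than five times faster, and no further improvement to the analysis moves it past five. The remaining time is all decision.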

Anthropic, writing about its own work, described the same pattern in defensive security: once finding software vulnerabilities became cheap and abundant, the binding constraint shifted to fixing them fast enough. Discovery stopped being the wall and decision-and-remediation became it. The review of AI systems is now living through its own version of that shift. The cost of seeing the risk is collapsing. The cost of deciding what to do about it, and being able to defend that decision, is not.

So the constraint does not vanish when analysis becomes free. It concentrates. Each reviewer now steers far more systems than before, because the slow part of their old job has been handed to a machine. What used to be a bottleneck made of effort — not enough hours to review everything — becomes a bottleneck made of judgment: not enough trusted decisions, made fast enough, to keep up with everything the organization now wants to ship.

What if we’re wrong

There are two honest objections to all of this, and they pull in opposite directions.

The first is that the analysis is not actually trustworthy yet — that a machine-drafted threat model still has to be checked so carefully that little has been saved. There is something to this today. A draft you cannot interrogate is worth little at any speed, which is exactly why the discipline of marking what is known, what is inferred, and what is merely assumed matters more as the drafting gets faster. But the trend line is not in doubt. The analysis is getting cheaper and more reliable faster than the decision is getting easier, and the gap between the two is widening, not closing.

The second objection runs the other way: that the decision will be automated too, that a sufficiently capable system will eventually be trusted to clear another system on its own. Parts of the assembly will certainly be automated, and should be. But the acceptance of risk is not really a capability problem. It is an accountability arrangement. Someone has to be answerable for the consequences, and answerability is a property of people and institutions, not of models. If that ever changes, it will not be because the analysis got good enough. It will be because we decided, as a matter of governance, to let the answer to “who is responsible” be nobody — and that is a decision no benchmark will make for us.

What this asks of us

If the decision is the part that stays human, then the work worth doing is making that decision a good one — fast to reach, grounded in evidence, and durable enough to defend later. That is a different ambition than building a faster analyzer, and it leads somewhere different.

It means treating the decision as the artifact, not the diagram: a clear disposition, not a risk score. It means every claim carrying its evidence state, so the person signing knows what they are standing behind and what they are taking on faith. It means the gaps being owned by named people with real deadlines, and the whole record surviving long enough to answer the questions that arrive a year after the system shipped. None of that is analysis. All of it is what makes a fast analysis safe to act on.
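One way to picture such a record is as a small data structure. The sketch below is illustrative only and is not Drel's schema: every class, field, and enum value is a hypothetical name, the three evidence states follow the known / inferred / assumed distinction drawn earlier, and the sample system, people, and dates are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class EvidenceState(Enum):
    KNOWN = "known"        # backed by evidence the signer can inspect
    INFERRED = "inferred"  # derived from other evidence, not observed directly
    ASSUMED = "assumed"    # taken on faith


class Disposition(Enum):
    # Hypothetical values; the essay asks only for "a clear disposition".
    APPROVED = "approved"
    APPROVED_WITH_CONDITIONS = "approved_with_conditions"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Claim:
    statement: str
    state: EvidenceState
    evidence_ref: Optional[str] = None  # pointer to the supporting artifact


@dataclass(frozen=True)
class Gap:
    description: str
    owner: str  # a named person, not a team alias
    due: date


@dataclass
class DecisionRecord:
    system: str
    disposition: Disposition
    decided_by: str
    decided_on: date
    claims: List[Claim] = field(default_factory=list)
    gaps: List[Gap] = field(default_factory=list)
    reassessment_triggers: List[str] = field(default_factory=list)

    def taken_on_faith(self) -> List[Claim]:
        """Claims the signer is accepting without direct evidence."""
        return [c for c in self.claims if c.state is not EvidenceState.KNOWN]

    def overdue_gaps(self, today: date) -> List[Gap]:
        """Gaps whose deadline has passed as of `today`."""
        return [g for g in self.gaps if g.due < today]


# Invented example data.
record = DecisionRecord(
    system="support-chat-assistant",
    disposition=Disposition.APPROVED_WITH_CONDITIONS,
    decided_by="A. Reviewer",
    decided_on=date(2025, 3, 1),
    claims=[
        Claim("Prompts are logged with PII redacted", EvidenceState.KNOWN, "config/logging.yaml"),
        Claim("Retrieval index excludes HR documents", EvidenceState.INFERRED),
        Claim("Vendor does not train on customer data", EvidenceState.ASSUMED),
    ],
    gaps=[
        Gap("No rate limit on tool calls", owner="J. Alvarez", due=date(2025, 4, 15)),
        Gap("Red-team exercise not yet run", owner="P. Nair", due=date(2025, 9, 30)),
    ],
    reassessment_triggers=["model version change", "new tool integration"],
)

print([c.statement for c in record.taken_on_faith()])
print([g.owner for g in record.overdue_gaps(date(2025, 6, 1))])
```

The point of the shape is that the questions a signer or an auditor will ask (what was taken on faith, who owns what is still open, what would force a second look) become lookups against the record instead of an archaeology project.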

This is the part of the problem we have chosen to work on. Not because finding risk does not matter — it does — but because in a world where the finding is nearly free, the scarce and valuable thing is a decision a person can stand behind. That is the work that doesn't compress, and it is where Drel is built to live.

A note on scope: Drel assesses systems against their documented architecture, configuration, and intent. It does not ingest live telemetry from production environments. Each disposition reflects the assessed system as it stood at the time of review, together with the re-assessment triggers that govern when that disposition must be revisited.