Operations

How Humans Actually Intervene in an Agent Network

Most discussions of AI oversight treat the human as a theoretical checkpoint. In practice, intervention is a system design problem. Here is how it works when you have hundreds or thousands of agents running at once.

By Graeme Provan · 2026-06-11

The intervention stack

Human in control is an operating model for AI agents in which autonomous agents execute the work while a human retains final decision rights. At scale, this requires a layered intervention architecture - not a single human watching a dashboard.

Layer 1: Guardrails (automatic)

Hard limits that stop an agent before it acts - budget caps, forbidden actions, rate limits, data-access boundaries. These do not require a human in real time; they encode the human's intent into the system. The most common intervention at scale is automatic: the agent hits a guardrail and stops.
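A minimal sketch of what such a pre-action check could look like. The class name, field names, and limits below are illustrative assumptions, not part of any particular agent framework; the point is only that the check runs before the action and needs no human present.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Hard limits checked before an agent acts. All names here are hypothetical."""
    budget_cap: float                          # maximum total spend allowed
    forbidden_actions: frozenset = frozenset()
    max_actions_per_window: int = 100          # simple rate limit
    spent: float = 0.0
    actions_in_window: int = 0

    def check(self, action: str, cost: float) -> tuple[bool, str]:
        """Return (allowed, reason). Nothing is executed or recorded here."""
        if action in self.forbidden_actions:
            return False, "forbidden action"
        if self.spent + cost > self.budget_cap:
            return False, "budget cap exceeded"
        if self.actions_in_window + 1 > self.max_actions_per_window:
            return False, "rate limit exceeded"
        return True, "ok"

    def record(self, cost: float) -> None:
        """Call after an allowed action has actually run."""
        self.spent += cost
        self.actions_in_window += 1


rails = Guardrails(budget_cap=50.0, forbidden_actions=frozenset({"delete_database"}))
print(rails.check("send_email", cost=1.0))
print(rails.check("delete_database", cost=0.0))
```

The agent calls check before acting and record afterwards; a blocked action never reaches execution, which is what makes this layer automatic.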

Layer 2: Escalation (agent-initiated)

The agent recognises that it is outside its mandate - ambiguity, high cost, unfamiliar context - and hands the decision upward. A well-designed agent escalates before it makes a mistake, not after. Escalation rules are part of the agent's design: "If confidence < 0.85, queue for human review."
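The quoted rule can be sketched as a small routing function. The 0.85 threshold comes from the rule above; the Proposal type and the cost limit are assumptions added for illustration, standing in for the "high cost" trigger.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # from the rule quoted above


@dataclass
class Proposal:
    """What the agent intends to do (hypothetical structure)."""
    action: str
    confidence: float      # agent's self-reported confidence, 0.0 to 1.0
    estimated_cost: float


def route(proposal: Proposal, cost_limit: float = 1000.0) -> str:
    """Return "execute" or "escalate". Decided before the action runs."""
    if proposal.confidence < CONFIDENCE_THRESHOLD:
        return "escalate"
    if proposal.estimated_cost > cost_limit:   # assumed high-cost trigger
        return "escalate"
    return "execute"


print(route(Proposal("refund_customer", confidence=0.72, estimated_cost=40.0)))
print(route(Proposal("refund_customer", confidence=0.93, estimated_cost=40.0)))
```

Because the routing happens on the proposal rather than on the outcome, the hand-off occurs before the mistake, not after.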

Layer 3: Exception monitoring (human-initiated, reactive)

A human overseer reviews alerts, anomalies, or flagged outcomes and intervenes. This is the classic "human on the loop" pattern - watching for what the automatic layers missed. The challenge is alert fatigue: too many exceptions and the human stops seeing them.

Layer 4: Strategic override (human-initiated, proactive)

The human changes the objective, revokes an agent's autonomy, or pauses an entire class of operations. This is the kill switch and the policy change. It happens rarely but must work instantly. Strategic override is the difference between governance and theatre.
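One way to picture the override is as a registry that every agent consults immediately before acting. This is a single-process sketch under stated assumptions: the class and method names are hypothetical, and a real agent network would need the same state held in a shared store that all agents can read.

```python
import threading


class OverrideRegistry:
    """Tracks which operation classes a human has paused. Names are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self._paused = set()
        self._global_stop = False

    def pause(self, operation_class: str) -> None:
        with self._lock:
            self._paused.add(operation_class)

    def resume(self, operation_class: str) -> None:
        with self._lock:
            self._paused.discard(operation_class)

    def stop_everything(self) -> None:
        """The kill switch: blocks every operation class at once."""
        with self._lock:
            self._global_stop = True

    def is_allowed(self, operation_class: str) -> bool:
        """Agents call this immediately before every action."""
        with self._lock:
            return not self._global_stop and operation_class not in self._paused


overrides = OverrideRegistry()
overrides.pause("payments")
print(overrides.is_allowed("payments"))   # False
print(overrides.is_allowed("reporting"))  # True
```

Checking at the moment of action, rather than at task start, is what lets a pause take effect on work already in flight.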

What the data shows

In the production agent networks we have observed, interventions break down by type roughly as follows (shares of all interventions, not of all decisions):

  • 85–95% automatic: Guardrails stop the action before a human is ever involved.
  • 4–12% escalation: The agent hands off because it recognises its own limits.
  • 1–3% exception monitoring: A human catches something the automatic layers missed.
  • <1% strategic override: The human changes the rules of the game.

The key insight: the human does not review 10,000 decisions. The human designs the system so that, to take an illustrative split, 9,900 of those decisions need no review, 90 escalate with context, and 10 receive full attention. The art is in the escalation design - what to surface, and how.

Designing escalation

Good escalation has three properties:

  1. Context-rich: The human receives not just the decision but the reasoning, the inputs, and the alternatives considered.
  2. Actionable: The human can approve, modify, or reject with a single interaction - not a five-email thread.
  3. Learned: The system records what the human did and why, so the escalation rules improve over time.
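The three properties above map naturally onto a data structure. The sketch below is one possible shape, with every type and field name invented for illustration: the escalation carries its context, the human resolves it in one call, and the outcome is logged with a reason.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass
class Escalation:
    """Context-rich: the decision plus how the agent got there."""
    decision: str
    reasoning: str
    inputs: dict
    alternatives: list


@dataclass
class ReviewLog:
    """Learned: keeps what the human did and why."""
    entries: list = field(default_factory=list)

    def resolve(self, escalation, verdict, reason, modified_decision=None):
        """Actionable: one call records the response and returns the final decision."""
        if verdict is Verdict.MODIFY and modified_decision is None:
            raise ValueError("a MODIFY verdict needs a modified_decision")
        final = {
            Verdict.APPROVE: escalation.decision,
            Verdict.MODIFY: modified_decision,
            Verdict.REJECT: None,
        }[verdict]
        self.entries.append(
            {"escalation": escalation, "verdict": verdict, "reason": reason, "final": final}
        )
        return final


log = ReviewLog()
item = Escalation(
    decision="refund 120",
    reasoning="order arrived damaged; policy allows a full refund",
    inputs={"order_value": 120},
    alternatives=["refund 60", "send replacement"],
)
print(log.resolve(item, Verdict.MODIFY, "replacement is cheaper", "send replacement"))
```

The logged reasons are the raw material for improving the escalation rules; how they feed back into those rules is left open here.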

The failure mode: alert fatigue

The most common failure of Layer 3 is noise. If the system escalates 500 items per day, the human will batch-process them at 4pm, spending 12 seconds each. At that rate, the oversight layer is worse than useless - it creates a false sense of security while burying the one signal that mattered.
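The arithmetic behind that example is worth making explicit. The 500 items and 12 seconds come from the text above; the five minutes per genuine review is an assumed figure, used only to show how far apart the two numbers are.

```python
def total_review_minutes(items_per_day: int, seconds_per_item: float) -> float:
    """Total time the human spends on the queue, in minutes."""
    return items_per_day * seconds_per_item / 60


def items_reviewable(budget_minutes: float, minutes_per_genuine_review: float) -> int:
    """How many items fit in the same time if each gets a genuine review."""
    return int(budget_minutes // minutes_per_genuine_review)


budget = total_review_minutes(500, 12)
print(budget)                        # 100.0 minutes for the whole queue
print(items_reviewable(budget, 5))   # 20 items at an assumed 5 minutes each
```

Under that assumption the same 100 minutes covers 20 items properly, not 500, which is the gap the escalation threshold has to close.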

The fix is to tune the escalation threshold ruthlessly. Start conservative - escalate everything - and measure the human's actual review depth. Then narrow the escalation criteria until the volume is small enough for the human to engage genuinely with every escalated item.
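A minimal sketch of that tuning loop, assuming escalation is driven by a confidence threshold and that the human's real capacity has already been measured as a number of items per day. Both the function and the sample data are hypothetical; measuring review depth itself is outside this sketch.

```python
def tune_threshold(confidences, daily_capacity):
    """Lower the threshold step by step until the escalated volume fits capacity.

    confidences: one day's agent confidence scores, each between 0.0 and 1.0.
    An item escalates when its confidence is below the threshold.
    """
    # Start at 1.01 so that every item escalates, then step down by 0.01.
    for hundredths in range(101, -1, -1):
        threshold = hundredths / 100
        escalated = sum(1 for c in confidences if c < threshold)
        if escalated <= daily_capacity:
            return threshold
    return 0.0


day = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
print(tune_threshold(day, daily_capacity=2))   # 0.7
```

With a capacity of two items, the threshold settles at 0.7 and only the two least confident decisions reach the human. In practice the capacity figure would be revisited as review depth is re-measured.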
