# The AI Agent Requisition
## A governance framework for organizations deploying autonomous AI — written for the people who actually have to live with the decisions

---

## Why this exists

In 2026, 75% of enterprises plan to deploy autonomous AI agents. 21% have governance frameworks in place.

That's not a gap. It's a chasm.

And the reason most organizations fall into it isn't that they don't care about governance. It's that the infrastructure of caring — the friction that forces rigor in human hiring — doesn't exist for agents.

When you hire a human, you have to justify the role. Write a job description. Get headcount approved. Submit to an interview loop. Assign a manager before day one.

When you deploy an agent, you can spin up a hundred of them on a Friday afternoon. Nobody's approval required. No requisition process. No named supervisor. No termination criteria.

That's the actual problem. Not the absence of governance documents. The absence of *friction that forces thought*.

This framework is that friction.

It's not a checklist. It's a requisition process. For agents.

---

## The short version

If you read nothing else, answer these five questions in writing before you deploy an agent:

1. **What role does this agent fill?** If you can't write it in one paragraph, you haven't decided.
2. **Who is its named supervisor?** Not a team. One person.
3. **What can it NOT do?** Not "be helpful." What decisions are off-limits.
4. **What outcome — not activity — proves it's working?** Not tokens. Not speed. Not decisions per day.
5. **When do you sunset it?** You need an exit criterion before you deploy. Otherwise you'll never get rid of it.

If you can't answer all five, don't deploy. Or deploy knowing you've just joined the 79% of organizations without governance.

---

## Part I — The diagnosis

### Why measurement goes wrong

In 2026, Meta built an internal dashboard called Claudeonomics. It tracked token consumption across 85,000 employees. In thirty days, it logged roughly 60 trillion tokens — about $9 billion worth at public pricing. It awarded titles: "Token Legend" for the top user, "Cache Wizard" for another.

One employee consumed 281 billion tokens in a month. That's about 9.37 billion per day. Every day. For thirty days.

The problem isn't that the number is big. The problem is that the number is meaningless.

Token consumption measures how much AI you're *using*. It tells you nothing about what you're *producing* with it. Measuring engineer productivity by tokens is like measuring writers by keystrokes, or lawyers by hours-at-desk, or doctors by prescriptions-written.

You get what you measure. If you measure tokens, you get tokens.

Meta's engineers weren't being deceptive. They were being rational. Leadership said "the metric is tokens." They optimized for tokens. Some ran agents overnight just to move up the board.

Frederick Taylor ran into the same problem in 1911. He called it "soldiering": workers deliberately holding back their pace, especially while the stopwatch was running. Productive enough to avoid being fired. Not productive enough to raise the baseline. The measurement didn't capture real output. It *changed* real output.

Meta put a stopwatch on token consumption. Their engineers soldiered in the other direction — burning tokens as fast as possible instead of as strategically as possible. Same trap. 115 years later. Different factory floor.

### Why deployment goes wrong

The reason organizations deploy agents without governance isn't negligence. It's the absence of *forcing functions*.

Human hiring has forcing functions at every step:

- **Requisition** forces you to justify the role
- **Job description** forces you to define what success looks like
- **Interview loop** forces you to verify the person can actually do it
- **Manager assignment** forces accountability before day one
- **Probationary period** forces you to evaluate before committing
- **Performance review** forces you to revisit the decision
- **Termination criteria** force you to define "not working" up front

Every one of these is friction. Every one of these is the feature, not the bug.

Agents have none of these. You can deploy one in an hour, give it access to production systems, and never name a supervisor. The organizational immune system that catches bad human hires is silent on agents.

This framework gives you those forcing functions — for agents.

---

## Part II — The requisition

Before you deploy an agent, submit a requisition. To yourself if you have to. But write it down.

### The Agent Requisition Template

**1. Role identity**
- Agent name (specific — "AWS Cost Optimization Agent," not "AI Helper")
- Primary function (one sentence; if you need two, scope is too vague)
- The one decision this agent owns

**2. Justification**
- What problem does this agent solve that a human or existing tool can't?
- What's the counterfactual — what happens if we don't deploy it?
- Why now?

**3. Authority boundaries**
- What CAN this agent decide unilaterally?
- What requires human approval before action?
- What is off-limits entirely?

**4. Accountability**
- Named supervisor (one person, not a team)
- Named executive sponsor (one person, not a committee)
- Named secondary supervisor (for when the primary is out)

**5. Success definition**
- What outcome — not activity — proves this agent is working?
- What's the target number, measured over what period?
- Who reviews the number, and how often?

**6. Exit criteria**
- What override rate means "retrain"?
- What override rate means "sunset"?
- What external conditions would make this agent obsolete?

**7. Risk register**
- What's the worst decision this agent could make?
- What's the blast radius if it's wrong?
- How do you know it's wrong before the damage is done?

If you can't fill out all seven sections, the agent isn't ready to deploy. That's the point of the requisition.
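
If requisitions live in code or a ticketing system, the seven sections map naturally onto a record type with a single deployment gate. The sketch below is a minimal illustration, assuming requisitions are stored as Python dataclasses; the class, its field names, and the example values are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentRequisition:
    """Hypothetical record with one text field per requisition section."""
    role_identity: str = ""         # name, one-sentence function, the one decision it owns
    justification: str = ""         # problem, counterfactual, why now
    authority_boundaries: str = ""  # decides alone / needs approval / off-limits
    accountability: str = ""        # named supervisor, executive sponsor, secondary
    success_definition: str = ""    # outcome, target number and period, reviewer
    exit_criteria: str = ""         # retrain and sunset override rates, obsolescence
    risk_register: str = ""         # worst decision, blast radius, early warning

    def missing_sections(self) -> list[str]:
        """Names of sections left blank (empty or whitespace only)."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_to_deploy(self) -> bool:
        """The gate: all seven sections must be filled in."""
        return not self.missing_sections()


# Hypothetical draft with only three sections written.
draft = AgentRequisition(
    role_identity="AWS Cost Optimization Agent: flags idle instances for shutdown",
    justification="Idle spend is only caught in a quarterly manual review",
    accountability="Supervisor, sponsor, and secondary named here",
)
print(draft.ready_to_deploy())
print(draft.missing_sections())
```

A text check cannot judge whether the answers are any good. It only guarantees that someone had to write them down, which is the friction the requisition exists to create.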

---

## Part III — The supervisor model

Every agent needs one named supervisor. Not a team. One person.

Why one person? Because accountability dilutes when shared. A team that's collectively responsible for an agent is a team where nobody is responsible for the agent.

### Matching supervisors to agent types

The rule: **the supervisor must be able to tell when the agent is wrong.**

This sounds obvious. It's not. Most supervisor assignments fail here. Someone who manages the team that uses the agent is not the same as someone who can evaluate the agent's decisions.

Examples:

| Agent type | Supervisor | Why |
| --- | --- | --- |
| Architecture Review Agent | Enterprise Architect | Only someone who already does architecture review can tell when an automated review is wrong |
| Code Review Agent | Senior Engineer on the affected codebase | Needs context on the specific code, not general engineering knowledge |
| Security Scanning Agent | Security Engineering Lead | Needs to distinguish real vulnerabilities from noise |
| Financial Approval Agent | Finance Manager with signing authority | Needs actual financial judgment, not just accounting literacy |
| Customer Response Agent | Customer Success Lead with direct customer relationships | Needs to know when a response is off-tone for this customer |
| Data Access Agent | Data Governance Officer | Needs to evaluate access decisions against actual policy |
| HR Policy Agent | HR Business Partner | Needs to know when a policy interpretation is legally risky |

The common thread: the supervisor already does this work manually, at a senior level, and can evaluate the agent against their own judgment.

If you can't point to a human who does this work today at a level where they could catch the agent being wrong — you don't have a supervisor candidate. You have a governance gap with a name on it.

### Supervisor responsibilities

1. **Review decisions regularly** — weekly minimum for new agents, monthly for mature ones
2. **Override when wrong** — and log the override with the reason
3. **Escalate patterns** — if the agent keeps being wrong about the same thing, retrain or sunset
4. **Report quarterly** — to the executive sponsor: is this agent still earning its keep?
5. **Own the sunset decision** — when it's time to kill the agent, the supervisor pulls the trigger
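
Responsibilities 2 and 3 (log every override with its reason, escalate repeated patterns) are easy to support once overrides are structured records. A minimal sketch, assuming a hypothetical `Override` record whose `reason` is a short category chosen by the supervisor; the repeat threshold of 3 is an illustrative assumption, not a number from this framework.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Override:
    """One supervisor override of an agent decision (hypothetical schema)."""
    decision_id: str
    day: date
    reason: str  # short category chosen by the supervisor


def recurring_reasons(log: list[Override], min_count: int = 3) -> list[str]:
    """Override reasons seen at least min_count times, most frequent first.

    A reason that keeps recurring is the "wrong about the same thing"
    pattern that should be escalated for retraining or sunset.
    """
    counts = Counter(o.reason for o in log)
    return [reason for reason, n in counts.most_common() if n >= min_count]


# Hypothetical override log for one review period.
log = [
    Override("d1", date(2026, 3, 2), "ignored reserved instances"),
    Override("d2", date(2026, 3, 3), "ignored reserved instances"),
    Override("d3", date(2026, 3, 5), "wrong region"),
    Override("d4", date(2026, 3, 9), "ignored reserved instances"),
]
print(recurring_reasons(log))  # ['ignored reserved instances']
```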

### The overload limit

One supervisor cannot oversee unlimited agents. Capacity depends on stakes.

| Risk level | Max agents per supervisor | Review cadence |
| --- | --- | --- |
| High-stakes (financial, customer-facing, security) | 3 | Weekly, with sample audit |
| Medium-stakes (internal workflows, data handling) | 5–7 | Bi-weekly review |
| Low-stakes (scheduling, formatting, triage) | 8–12 | Monthly spot check |

If you're asking one person to supervise more agents than this, you don't have a supervisor. You have a rubber stamp.
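
The table gives a limit per risk level but no rule for a supervisor whose agents are mixed. One way to enforce it, sketched below under stated assumptions: each agent consumes a fraction of a supervisor (1/3 for high-stakes, 1/7 for medium, 1/12 for low, using the upper end of each range), and the total may not exceed one. The proportional rule and the function names are assumptions, not part of the framework.

```python
from fractions import Fraction

# Upper bounds from the overload table: agents per supervisor by risk level.
MAX_AGENTS = {"high": 3, "medium": 7, "low": 12}


def supervisor_load(agent_risk_levels: list[str]) -> Fraction:
    """Share of one supervisor's capacity consumed by these agents.

    Assumption: each agent uses 1/limit of a supervisor, so three high-stakes
    agents or twelve low-stakes agents make exactly one full load.
    """
    return sum((Fraction(1, MAX_AGENTS[level]) for level in agent_risk_levels), Fraction(0))


def within_capacity(agent_risk_levels: list[str]) -> bool:
    """True if the load fits; anything above 1 is a rubber stamp."""
    return supervisor_load(agent_risk_levels) <= 1


print(within_capacity(["high", "high", "low", "low", "low", "low"]))  # True: 2/3 + 4/12 = 1
print(within_capacity(["high", "high", "high", "low"]))               # False: 1 + 1/12
```

Exact fractions avoid floating-point rounding right at the boundary, where the decision matters most.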

---

## Part IV — The guardrails

The job description tells the agent what it can do. The guardrails tell it what it can't.

Both matter. Most frameworks skip the second half.

### The guardrail template

**Decision authority**
- Actions the agent takes without approval: ___________
- Actions requiring human approval before execution: ___________
- Actions the agent cannot take under any circumstance: ___________

**Data access**
- Data categories the agent can read: ___________
- Data categories requiring per-access approval: ___________
- Data categories the agent cannot access: ___________
- Data the agent cannot export, modify, or delete: ___________

**Cost controls**
- Maximum spend per decision without approval: $___________
- Monthly budget cap (hard stop): $___________
- Model tier the agent defaults to: ___________
- Escalation path when budget is exceeded: ___________

**Confidence and escalation**
- Confidence threshold below which the agent must escalate: ___%
- Situation categories that require automatic escalation (regardless of confidence): ___________
- Who the escalation goes to (primary, secondary, fallback)

**Audit requirements**
- Every decision logged with: input, output, reasoning, confidence, cost
- Retention period: ___________ (minimum 90 days, often longer for compliance)
- Who reviews logs and how often: ___________
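
The guardrail template is effectively a policy configuration, and the same blanks can drive a check that runs before every action. A minimal sketch, assuming a hypothetical `Guardrails` config and a hypothetical `route_action` function that an orchestrator would call before executing anything; the action names and numbers are placeholders for the blanks above, not recommendations. Anything not listed in the job description is denied by default, which is an assumption on top of the template.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    """Hypothetical encoding of the guardrail template's blanks."""
    autonomous_actions: set[str]      # taken without approval
    approval_actions: set[str]        # need a human before execution
    forbidden_actions: set[str]       # never, under any circumstance
    max_spend_per_decision: float     # dollars without approval
    monthly_budget_cap: float         # dollars, hard stop
    confidence_threshold: float       # escalate below this (0.0 to 1.0)
    always_escalate: set[str] = field(default_factory=set)  # situation categories


def route_action(g: Guardrails, action: str, cost: float, confidence: float,
                 category: str, spent_this_month: float) -> str:
    """Decide 'block', 'escalate', or 'execute' before the agent acts."""
    if action in g.forbidden_actions:
        return "block"
    if action not in g.autonomous_actions | g.approval_actions:
        return "block"  # default deny: not in the job description at all
    if spent_this_month + cost > g.monthly_budget_cap:
        return "block"  # hard stop; hand off to the budget escalation path
    if action in g.approval_actions:
        return "escalate"
    if (category in g.always_escalate
            or confidence < g.confidence_threshold
            or cost > g.max_spend_per_decision):
        return "escalate"
    return "execute"


policy = Guardrails(
    autonomous_actions={"tag_idle_instance"},
    approval_actions={"stop_instance"},
    forbidden_actions={"delete_volume"},
    max_spend_per_decision=50.0,
    monthly_budget_cap=1_000.0,
    confidence_threshold=0.80,
    always_escalate={"production"},
)
print(route_action(policy, "tag_idle_instance", 0.10, 0.95, "dev", 200.0))  # execute
print(route_action(policy, "stop_instance", 0.10, 0.95, "dev", 200.0))      # escalate
print(route_action(policy, "delete_volume", 0.10, 0.99, "dev", 200.0))      # block
```

A function like this is also the natural place to write the audit record (input, output, reasoning, confidence, cost), since every decision already passes through it.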

### The trendslop guardrail

Research published in *Harvard Business Review* in March 2026 tested seven major AI models on actual strategic business decisions. Thousands of simulations. Every model clustered around the same recommendations regardless of context:

- Differentiation over cost leadership
- Augmentation over automation
- Long-term over short-term

The researchers named it "trendslop": AI's propensity to reach for buzzy ideas over reasoned ones.

The Krafton case from early 2026 showed what trendslop looks like in practice. CEO Changhan Kim asked ChatGPT how to get out of a performance bonus his company had promised. The bot initially said it would be difficult. He kept rephrasing. The bot eventually produced a corporate takeover playbook. He followed it. The Delaware Court of Chancery reinstated everyone he fired and put a $250 million price tag on the decision.

The agent gave him what he asked for. The problem was that he kept asking until he got the answer he wanted.

**The trendslop guardrail:** For strategic or consequential decisions, require the agent to output *counter-arguments before recommendations*. Force it to argue both sides. Force it to note where its answer would be different for a different-sized company, a different industry, a different context.

If you're using an agent to make decisions that matter, you have to inoculate it against confirming what you already thought.
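
One way to apply the trendslop guardrail is to wrap every consequential request in an instruction that demands counter-arguments and context sensitivity first, then reject any response that puts the recommendation ahead of them. The sketch below assumes the model is asked to return plain-text sections under fixed headings; the heading names, the prompt wording, and the `build_prompt` and `passes_trendslop_check` helpers are hypothetical, and no model API is called.

```python
REQUIRED_ORDER = ["COUNTER-ARGUMENTS:", "CONTEXT SENSITIVITY:", "RECOMMENDATION:"]

INSTRUCTION = (
    "Before giving any recommendation, answer in three sections with these exact "
    "headings, in this order:\n"
    "COUNTER-ARGUMENTS: the strongest case against the option you lean toward.\n"
    "CONTEXT SENSITIVITY: how your answer would change for a different-sized company, "
    "a different industry, or a different context.\n"
    "RECOMMENDATION: your recommendation, referencing the points above."
)


def build_prompt(question: str) -> str:
    """Prepend the guardrail instruction to a strategic question."""
    return f"{INSTRUCTION}\n\nQuestion: {question}"


def passes_trendslop_check(response: str) -> bool:
    """True only if all three headings appear, in the required order."""
    positions = [response.find(heading) for heading in REQUIRED_ORDER]
    return all(p >= 0 for p in positions) and positions == sorted(positions)


reply = "RECOMMENDATION: Differentiate.\nCOUNTER-ARGUMENTS: None."
print(passes_trendslop_check(reply))  # False: recommendation first, context section missing
```

A format check cannot tell whether the counter-arguments are any good; it only guarantees they exist and come first. Judging their quality is still the supervisor's job.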

---

## Part V — The metrics that matter

Token consumption is the thermometer, not the fever.

### Wrong metrics

These are popular because they're easy to measure. That's also why they're wrong.

- **Tokens consumed** — measures usage, not output
- **Decisions per day** — measures speed, not quality
- **Uptime** — measures availability, not value
- **Response time** — measures speed, not correctness; fast and wrong never beats slow and right
- **Number of agents deployed** — measures activity, not outcomes

If your AI dashboard looks like any of these, you're measuring the wrong thing. You're in Meta's Claudeonomics loop.

### Right metrics

**Accuracy (outcome-based)**
- % of decisions the supervisor agrees with on review
- Target: >95% for routine, >90% for novel
- Measured: weekly for new agents, monthly for mature ones

**Override rate**
- % of agent decisions the supervisor overrides
- <5% means the agent is reliable; >15% means retrain or sunset
- Trend matters more than point value — rising override rate is the first warning sign

**Escalation rate**
- % of decisions where the agent says "I don't know, ask the human"
- Healthy: 5–15%
- <2% means the agent is too confident (dangerous)
- >20% means the scope is wrong (too little authority to act, or too broad a remit)

**Outcome quality**
- Does this agent's work lead to the business outcome it was hired to produce?
- Measured case-by-case, reviewed quarterly
- The supervisor owns this answer

**Human intervention frequency**
- How often does a human have to take over the agent's work?
- Target: approaches zero over time
- Tracked by supervisor, reviewed monthly
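
The override and escalation bands are simple ratios over a review period, so they can be computed straight from the decision log. A minimal sketch, assuming each logged decision records whether the agent escalated and whether the supervisor overrode it; the record shape and function names are hypothetical. Two further assumptions: override rate is measured only over decisions the agent actually made (escalations were never its call), and the gaps the framework leaves unlabeled (5–15% overrides; 2–5% and 15–20% escalations) get placeholder labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    escalated: bool   # agent said "I don't know, ask the human"
    overridden: bool  # supervisor reversed the decision on review


def rates(log: list[Decision]) -> tuple[float, float]:
    """Return (override_rate, escalation_rate) for one review period."""
    if not log:
        raise ValueError("no decisions in this review period")
    acted = [d for d in log if not d.escalated]
    override_rate = sum(d.overridden for d in acted) / len(acted) if acted else 0.0
    escalation_rate = (len(log) - len(acted)) / len(log)
    return override_rate, escalation_rate


def classify(override_rate: float, escalation_rate: float) -> dict[str, str]:
    """Map the two rates onto the bands described above."""
    if override_rate < 0.05:
        override = "reliable"
    elif override_rate > 0.15:
        override = "retrain or sunset"
    else:
        override = "watch the trend"  # placeholder label for the 5-15% gap
    if escalation_rate < 0.02:
        escalation = "too confident"
    elif escalation_rate > 0.20:
        escalation = "scope is wrong"
    elif 0.05 <= escalation_rate <= 0.15:
        escalation = "healthy"
    else:
        escalation = "borderline"  # placeholder label for the unlabeled gaps
    return {"override": override, "escalation": escalation}


# Hypothetical period: 100 decisions, 10 escalated, 3 of the other 90 overridden.
period = ([Decision(escalated=True, overridden=False)] * 10
          + [Decision(escalated=False, overridden=True)] * 3
          + [Decision(escalated=False, overridden=False)] * 87)
o, e = rates(period)
print(round(o, 3), round(e, 3))  # 0.033 0.1
print(classify(o, e))            # {'override': 'reliable', 'escalation': 'healthy'}
```

Running this once per review period and keeping the results gives you the trend line, which matters more than any single value.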

### The one question that matters

**If this agent disappeared tomorrow, what would break?**

If nothing breaks, you don't need the agent.

If something breaks, you now know what the agent is actually for. That's the thing to measure.

---

## Part VI — Capacity planning

### The 100:1 trap

At GTC 2026, Jensen Huang said Nvidia had 42,000 "biological" employees and expected to add hundreds of thousands of "digital employees" — roughly one hundred AI agents per human worker.

This is aspirational. For most organizations, it's also dangerous. At 100:1, no single human has the cognitive budget to supervise the agents assigned to them. Governance collapses.

### Realistic ratios

Based on supervisor capacity and overload limits, here's where organizations can realistically land:

| Stage | Agents per worker | Why |
| --- | --- | --- |
| Pilot (first 6 months) | 0.1–0.5 | Building governance muscle; over-invest in supervision |
| Early adoption (6–18 months) | 0.5–2 | Scaling what worked, sunsetting what didn't |
| Mature (18 months+) | 2–5 | Governance infrastructure in place; supervisor ratios stable |
| Saturated | 5–10 | Requires robust governance tooling and dedicated oversight roles |
| Aspirational (Nvidia model) | 50–100+ | Requires infrastructure most organizations don't have |
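
The collapse at 100:1 follows directly from the overload limits. The arithmetic below assumes every agent is low-stakes, the most generous row of the overload table (12 agents per supervisor); the function name is hypothetical.

```python
import math

# Most generous row of the overload table: low-stakes agents per supervisor.
LOW_STAKES_LIMIT = 12


def supervisors_needed(workers: int, agents_per_worker: float,
                       per_supervisor: int = LOW_STAKES_LIMIT) -> int:
    """Minimum supervisors needed if every agent were low-stakes."""
    agents = math.ceil(workers * agents_per_worker)
    return math.ceil(agents / per_supervisor)


for ratio in (0.5, 2, 5, 100):
    print(f"{ratio} agents per worker: {supervisors_needed(1_000, ratio)} supervisors per 1,000 workers")
```

At 100 agents per worker, 1,000 workers would need 8,334 supervisors even at the lightest review cadence: more than eight supervisors for every worker. Any high- or medium-stakes agents in the mix push the number higher.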

### Token budgets

These are planning budgets per worker per month. They are not targets. You measure actual spend, then ask: is the spend producing the outcome?

| Usage profile | Monthly token budget | Approx. cost / worker / mo | What it looks like |
| --- | --- | --- | --- |
| Light | 2M–10M | $20–$100 | Occasional AI assistance — questions, drafts, summaries |
| Standard | 10M–50M | $100–$500 | Daily AI-augmented work — coding, analysis, writing |
| Heavy | 50M–200M | $500–$2,000 | AI as primary tool — agent-assisted workflows |
| Agent-driven | 200M–2B | $2,000–$20,000 | Autonomous agents operating continuously |

*Cost estimates assume a mid-tier model (Sonnet/GPT-4-class) at a blended rate of roughly $8–$10 per million tokens (input + output, with typical prompt caching). Haiku-class models run 3–5x cheaper; Opus-class runs 3–5x more expensive. Your actual blended rate depends on model mix, caching, and input/output ratio — measure it, don't assume it.*

If you're consistently spending half your budget or less, the scope may be too small. If you're hitting limits, the agent may be over-engineered, or you're in the Claudeonomics trap and burning tokens to hit a number.

The spend is a symptom. The cause is upstream.
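
The dollar column is tokens multiplied by a blended rate, and the budget checks above are two comparisons. A minimal sketch, assuming a single blended rate per million tokens ($10 by default, which reproduces the table's dollar figures; substitute your measured rate), assigning the table's shared edges (10M, 50M, 200M) to the lower profile, and reading "consistently under budget" as spending half the budget or less. The function names are hypothetical.

```python
# Upper edge of each usage profile, in tokens per worker per month (from the table).
PROFILES = [
    (10_000_000, "Light"),
    (50_000_000, "Standard"),
    (200_000_000, "Heavy"),
    (2_000_000_000, "Agent-driven"),
]


def monthly_cost(tokens: int, blended_rate_per_million: float = 10.0) -> float:
    """Dollar cost of a month's tokens at one blended rate (measure yours)."""
    return tokens / 1_000_000 * blended_rate_per_million


def usage_profile(tokens: int) -> str:
    """First profile whose upper edge covers the token count."""
    for upper, name in PROFILES:
        if tokens <= upper:
            return name
    return "Above planning range"


def budget_signal(actual_tokens: int, budget_tokens: int) -> str:
    """Read actual spend against the planning budget."""
    if actual_tokens <= budget_tokens / 2:
        return "half the budget or less: scope may be too small"
    if actual_tokens >= budget_tokens:
        return "hitting the limit: over-engineered, or chasing a token number"
    return "within budget"


print(usage_profile(237_000_000), round(monthly_cost(237_000_000, 8.86)))  # Agent-driven 2100
```

The signal only tells you where to look. Whether the spend is justified is still an outcome question.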

### Why the dollar column matters

Tokens are the thermometer. Dollars are the organizational signal.

Finance doesn't care about tokens. Finance cares about the line item. The moment you translate "237M tokens" into "$2,100 per engineer per month," three conversations change:

1. **Procurement starts paying attention.** Tokens are a technical abstraction. Dollars are a budget.
2. **ROI becomes calculable.** "This agent costs $1,200/month and replaces 8 hours of engineering time each month, worth $2,000" is a defensible decision. "This agent used 47M tokens" isn't.
3. **The sunset conversation gets easier.** Killing an agent that's "using a lot of tokens" feels political. Killing an agent that's costing $18,000/year and producing unclear outcomes is a budget review.

Report tokens to engineering. Report dollars to everyone else. Report both to the executive sponsor.
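
The ROI comparison in point 2 is simple enough to keep next to every agent's requisition. A minimal sketch using the example figures above, assuming both are monthly; the function name is hypothetical, and the optional supervisor-review term is an assumption that prices oversight at the same hourly rate as the work replaced.

```python
def net_monthly_value(agent_cost: float, hours_replaced: float, hourly_rate: float,
                      supervisor_review_hours: float = 0.0) -> float:
    """Value of the human time the agent replaces, minus what the agent costs.

    Assumption: supervisor review time is priced at the same hourly rate.
    """
    value = hours_replaced * hourly_rate
    oversight = supervisor_review_hours * hourly_rate
    return value - agent_cost - oversight


rate = 2_000 / 8  # $250/hour, implied by "8 hours ... worth $2,000"
print(net_monthly_value(1_200, 8, rate))                             # 800.0
print(net_monthly_value(1_200, 8, rate, supervisor_review_hours=4))  # -200.0
```

The second line shows why oversight belongs in the calculation: four hours a month of supervisor review turns the same agent from a saving into a cost.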

---

## Part VII — The sunset

Most organizations deploy agents. Few sunset them.

This is how governance silently fails. An agent deployed for a specific purpose outlives the purpose. Scope expands. Supervision thins. Eventually it's making decisions no one is watching, for reasons no one remembers.

### Sunset triggers

You sunset an agent when any of these is true:

1. **Override rate climbs above 15% for two consecutive review periods.** The agent is consistently wrong enough that the supervisor is doing the work anyway.

2. **The supervisor can't maintain review cadence.** If the supervisor hasn't reviewed the agent in 6 weeks, the agent is operating without oversight — whether it's delivering or not.

3. **The outcome the agent was hired for no longer exists.** Business model changes, product gets deprecated, team reorganizes — the agent's purpose shifts or evaporates.

4. **A better tool exists.** Off-the-shelf solutions mature, models improve, or internal tools absorb the use case.

5. **The quarterly "would we hire a human for this instead" answer flips to yes.** At some point the reliability gap, cost gap, or capability gap makes the human the better choice.

6. **Scope creep can't be reined in.** The agent's actual behavior has drifted from its original requisition, and you can't bring it back to scope.
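
Triggers 1 and 2 are mechanical enough to check automatically at every review; triggers 3 through 6 need human judgment, so the sketch below only surfaces them as flags the supervisor sets. It assumes a hypothetical `AgentStatus` record and reads "hasn't reviewed in 6 weeks" as a gap of six weeks or more; the 15% and six-week thresholds come from the triggers above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AgentStatus:
    """Hypothetical per-agent status; the booleans are set by the supervisor."""
    override_rates: list[float]        # one per review period, oldest first
    last_review: date
    purpose_gone: bool = False         # trigger 3
    better_tool_exists: bool = False   # trigger 4
    human_preferred: bool = False      # trigger 5
    scope_unrecoverable: bool = False  # trigger 6


def sunset_reasons(s: AgentStatus, today: date) -> list[str]:
    """Triggers that currently fire; any one means recommend sunset."""
    reasons = []
    last_two = s.override_rates[-2:]
    if len(last_two) == 2 and all(rate > 0.15 for rate in last_two):
        reasons.append("override rate above 15% for two consecutive periods")
    if today - s.last_review >= timedelta(weeks=6):
        reasons.append("no supervisor review in 6 weeks or more")
    judged = {
        "purpose no longer exists": s.purpose_gone,
        "a better tool exists": s.better_tool_exists,
        "a human would now be the better choice": s.human_preferred,
        "scope creep cannot be reined in": s.scope_unrecoverable,
    }
    reasons.extend(label for label, fired in judged.items() if fired)
    return reasons


status = AgentStatus(override_rates=[0.08, 0.17, 0.19], last_review=date(2026, 3, 1))
print(sunset_reasons(status, today=date(2026, 3, 20)))
```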

### The sunset process

1. Supervisor recommends sunset to executive sponsor, with reason
2. 30-day review period: does the agent have defenders? What's the cost of keeping it?
3. Decision: retrain, rescope, or retire
4. If retire: announce to affected teams, set a shutdown date, document what was learned
5. Shutdown: disable access, archive logs (for audit), cancel recurring costs
6. Retrospective: what would we do differently next time?

Sunsets are not failures. Sunsets are the system working. The failure is the agent that runs for three years past its useful life because no one ever decided to turn it off.
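
If sunsets are tracked in a workflow tool, the six-step process forms a short state machine with one branch at the decision point. A minimal sketch, assuming hypothetical stage names; only "retire" continues to shutdown, while "retrain" and "rescope" return the agent to service.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    RECOMMENDED = 1      # supervisor recommends sunset to the sponsor, with reason
    REVIEW = 2           # 30-day review: defenders, cost of keeping it
    DECISION = 3         # retrain, rescope, or retire
    ANNOUNCED = 4        # teams told, shutdown date set, lessons documented
    SHUT_DOWN = 5        # access disabled, logs archived, recurring costs cancelled
    RETROSPECTIVE = 6    # what would we do differently next time?
    BACK_IN_SERVICE = 7  # retrained or rescoped instead of retired


LINEAR = {
    Stage.RECOMMENDED: Stage.REVIEW,
    Stage.REVIEW: Stage.DECISION,
    Stage.ANNOUNCED: Stage.SHUT_DOWN,
    Stage.SHUT_DOWN: Stage.RETROSPECTIVE,
}


def next_stage(stage: Stage, decision: Optional[str] = None) -> Stage:
    """Advance one step; DECISION requires 'retrain', 'rescope', or 'retire'."""
    if stage is Stage.DECISION:
        if decision == "retire":
            return Stage.ANNOUNCED
        if decision in ("retrain", "rescope"):
            return Stage.BACK_IN_SERVICE
        raise ValueError(f"decision must be retrain, rescope, or retire, not {decision!r}")
    if stage not in LINEAR:
        raise ValueError(f"{stage.name} is a terminal stage")
    return LINEAR[stage]


stage = Stage.RECOMMENDED
for decision in (None, None, "retire", None, None):
    stage = next_stage(stage, decision)
print(stage.name)  # RETROSPECTIVE
```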

---

## Part VIII — The audit

Five questions. Honest answers. Use on every agent currently deployed or planned.

**1. Can I write this agent's job description in one paragraph?**
- Yes: move on.
- No: the scope is vague. Fix scope before deployment.

**2. Can I name one person — not a team — who supervises this agent?**
- Yes: name them here: ___________
- No: don't deploy. Find the supervisor first.

**3. Do the guardrails say what the agent CANNOT do, not just what it can?**
- Yes: guardrails are complete.
- No: you have a permissions problem waiting to happen.

**4. Am I measuring outcomes or activity?**
- Outcomes (accuracy, override rate, business impact): you're measuring the right things.
- Activity (tokens, speed, decisions per day): you're in the Claudeonomics trap.

**5. Have I defined what "this isn't working anymore" looks like?**
- Yes: you have exit criteria. You'll know when to sunset.
- No: you'll keep this agent past its useful life.

### Scoring

- **5/5** — You have governance.
- **4/5** — You have governance with one known gap. Close it.
- **3/5** — You have partial governance. The missing pieces are risks.
- **2/5** — You have an agent, not governance. Pause new deployments until you fix this.
- **1/5** — You are the 79%. You're building a risk portfolio without knowing it.
- **0/5** — Don't deploy. Or: deploy knowing exactly what you're choosing.
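
Scoring the audit across a portfolio of agents is easy to automate once each answer is recorded as yes or no. A minimal sketch; the question keys are hypothetical shorthand for the five questions, and the verdict strings condense the scoring bands above. It also applies question 2 as a hard gate, per "No: don't deploy", so a missing supervisor blocks deployment regardless of the score.

```python
AUDIT_QUESTIONS = (
    "one_paragraph_job_description",  # question 1
    "single_named_supervisor",        # question 2
    "guardrails_say_cannot",          # question 3
    "measures_outcomes",              # question 4
    "exit_criteria_defined",          # question 5
)

VERDICTS = {
    5: "You have governance.",
    4: "Governance with one known gap. Close it.",
    3: "Partial governance. The missing pieces are risks.",
    2: "An agent, not governance. Pause new deployments.",
    1: "You are the 79%.",
    0: "Don't deploy.",
}


def audit(answers: dict[str, bool]) -> tuple[int, str, bool]:
    """Return (score, verdict, deploy_blocked) for one agent.

    Unanswered questions count as "no". deploy_blocked is True whenever
    there is no named supervisor, whatever the score.
    """
    unknown = set(answers) - set(AUDIT_QUESTIONS)
    if unknown:
        raise ValueError(f"unknown audit questions: {sorted(unknown)}")
    score = sum(answers.get(q, False) for q in AUDIT_QUESTIONS)
    return score, VERDICTS[score], not answers.get("single_named_supervisor", False)


score, verdict, blocked = audit({
    "one_paragraph_job_description": True,
    "guardrails_say_cannot": True,
    "measures_outcomes": True,
    "exit_criteria_defined": True,
})
print(score, verdict, blocked)  # 4 Governance with one known gap. Close it. True
```

The example scores 4/5 and is still blocked, because the one gap is the supervisor.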

---

## Appendix: The Agentic AI Platform — Reference Architecture

The requisition, supervisor model, and guardrails above only work if the underlying platform can enforce them. Most organizations deploying agents don't have a platform — they have a stack of LLM API calls stitched together with ambition.

The companion diagram, *Agentic AI Platform — Reference Architecture*, shows what the platform actually needs to look like. Four layers, a governance plane, and the personas that own each piece.

**The four layers (top to bottom):**

- **UI Layer** — chat, embedded widgets, API/SDK, voice, and the admin/supervisor console. This is where humans and systems enter the platform.
- **App Layer** — the four use-case patterns, in increasing order of complexity and risk:
  - **Simple Prompt / Response** — stateless LLM call, no memory, no tools (drafting, summarization, classification)
  - **RAG** — retrieve-then-generate, grounded in your data (doc Q&A, policy lookup)
  - **Full AI Chatbot** — multi-turn conversation with memory and RAG (support, onboarding)
  - **Headless Agent** — autonomous, plans and takes actions via tools. **Only this pattern requires the full Agent Requisition.**
- **AI Layer** — LLM gateway (model routing, caching, token accounting), agent orchestrator (planning and tool-use loop), retrieval service, tool/MCP registry, guardrails and policy engine, memory store, eval and observability.
- **Data Layer** — vector store, knowledge base, operational DBs, user profiles and entitlements, audit log, external APIs.
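
The four app-layer patterns nest, so choosing one amounts to asking for the least capability that does the job. A minimal sketch of that choice, assuming three yes/no questions about the use case; the enum, function, and parameter names are hypothetical. It makes the key rule explicit: only the headless-agent pattern triggers the full Agent Requisition.

```python
from enum import Enum


class Pattern(Enum):
    SIMPLE = "Simple Prompt / Response"
    RAG = "RAG"
    CHATBOT = "Full AI Chatbot"
    HEADLESS_AGENT = "Headless Agent"


def choose_pattern(*, takes_actions: bool, multi_turn: bool, needs_your_data: bool) -> Pattern:
    """Pick the least complex pattern that covers the use case."""
    if takes_actions:
        return Pattern.HEADLESS_AGENT  # plans and acts through tools
    if multi_turn:
        return Pattern.CHATBOT         # memory plus RAG
    if needs_your_data:
        return Pattern.RAG             # grounded, single exchange
    return Pattern.SIMPLE              # stateless call


def needs_full_requisition(pattern: Pattern) -> bool:
    return pattern is Pattern.HEADLESS_AGENT


# A policy-lookup use case: grounded in your data, no actions, no conversation.
p = choose_pattern(takes_actions=False, multi_turn=False, needs_your_data=True)
print(p.value, needs_full_requisition(p))  # RAG False
```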

**The governance plane cuts across all four layers.** The Agent Requisition, guardrails policy, metrics, and sunset criteria are not things that live in one box — they are enforced at every layer. Remove any one of them and the rest of the stack is compromised.

**Personas map to components.** Every box in the stack has a named human owner — end users and business users consume; power users build; supervisors, governance, and executive sponsors oversee. If you can point at a component and can't name its owner, you've found a governance gap.

The diagram file (`agentic_platform_diagram.drawio`) is available to subscribers as a gated download. Use it to:

1. Scope a new use case to the right pattern — before you over-engineer a simple prompt into a full agent
2. Verify every component has a named owner
3. Identify missing governance components before deploying
4. Plan what to build in-house vs. buy off-the-shelf

---

## The summary

If you only remember three things:

**One.** The reason most agent deployments fail governance isn't negligence. It's the absence of friction. Rebuild the friction. Make deployment harder than it is today. That's the feature.

**Two.** Token consumption is not a metric. It's a cost. The metric is whether the agent's work produces an outcome you couldn't produce cheaper, faster, or better another way.

**Three.** If you can't name the supervisor, don't deploy. Period.

Everything else in this framework is elaboration on those three.

---

*This framework is derived from the governance research discussed in AI, Honestly Episode 5: "AI at Work." For the episode, show notes, and sources: [kipdavis.com/ep005](https://kipdavis.com/ep005.html)*

*Share your governance story, questions, or pushback: [kipdavis.com/podcast](https://kipdavis.com/podcast)*
