Cadence Advisors Group

Post-Mortems

Why Most AI Pilots Stall at Week Six

A post-mortem methodology for the quietly dying pilot.

Marcus Whitfield · Director, Operations · 7 min read

The six-week cliff

If you have sat in enough AI steering committees, you recognize the shape. Week one is kickoff. Week two is the first demo, better than anyone expected. Week four is the first real pilot run against real data. Week six is silence — not failure, silence. The dashboards still exist, but stand-ups get shorter, demos slip, and the champion starts forwarding meetings to a direct report. By week ten, the pilot has been "extended." By week sixteen, it is quietly absorbed into a different initiative, so the original ROI case never has to be closed.

We have run post-mortems on roughly forty of these over the last two years. The causes cluster.

The post-mortem structure

We borrow heavily from the US Forest Service's Facilitated Learning Analysis model — the approach wildland fire crews use after a near-miss. It separates three questions: what actually happened, in order (reconstruction, not judgment); what each participant knew at each decision point (local rationality — every decision made sense to the person making it at the time); and what conditions made the failure possible, regardless of who was in the room (system, not individuals).

We run these with the full pilot team, including skeptics. One session, three hours, a facilitator who did not work on the pilot. No slides, a whiteboard timeline. The artifact is a written narrative. The most important rule: the report never names a person as the cause. If your post-mortem reads like a performance review, the next pilot will fail the same way.

The five patterns

1. The scope was a feature list, not a loop

Pilots that stalled at week six usually had scope expressed as a bullet list — "agent handles refunds, exceptions, appeals." Pilots that survived had scope expressed as a closed loop — "intake → triage → decision → confirmation → learning signal back to intake." A feature list can be partially delivered forever. A loop either runs or it does not. Loops force the hard architectural questions in week two, when they are cheap. From the OODA loop literature and John Boyd's original papers: if you cannot draw the pilot as a closed cycle on a single page, do not start it.
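To make the loop test concrete, here is a minimal sketch of scope expressed as a cycle, with a check that the last stage feeds the first. The stage names follow the example above; the dictionary structure and the `loop_is_closed` check are illustrative assumptions, not part of any tool we use.

```python
# Illustrative sketch: pilot scope as a closed loop rather than a feature list.
# Stage names follow the example in the text; the structure and the check are
# assumptions for illustration.

PILOT_LOOP = {
    "intake": "triage",
    "triage": "decision",
    "decision": "confirmation",
    "confirmation": "learning_signal",
    "learning_signal": "intake",   # the learning signal feeds back to intake: the loop closes
}

def loop_is_closed(loop: dict, start: str = "intake") -> bool:
    """Walk the stages from `start`; scope is a loop only if every stage is visited and we return to start."""
    seen, stage = set(), start
    while stage not in seen:
        seen.add(stage)
        stage = loop.get(stage)
        if stage is None:          # a stage with no successor is a feature list, not a loop
            return False
    return stage == start and seen == set(loop)

assert loop_is_closed(PILOT_LOOP)
```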

2. The champion was a sponsor, not an operator

Every dead pilot had a sponsor. Most also had an executive sponsor. Very few had an operator champion — the person who lived the old process and will live the new one. When that person does not exist, the pilot builds the thing the sponsor imagines the process to be, which is almost never the process as it actually runs.

We insist on at least one named operator, at any seniority level, who holds a veto over the final design. That veto is more useful than the sponsor's approval.

3. The evaluation was qualitative for too long

Pilots die in the gap between "it feels right in the demo" and "it measurably outperforms the status quo." Teams that close that gap by week three tend to survive. Teams still doing show-and-tell at week six are already dead. We ask clients to write the eval harness before the agent. Goodhart's law applies — the metric will distort behavior — but no metric distorts behavior worse.
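What writing the harness before the agent can look like, as a rough sketch: a set of labeled cases from the old process, one scoring function, and a comparison of the agent against the status quo. The case format, the placeholder `baseline_decide` and `agent_decide` functions, and the example values are all invented for illustration.

```python
# Minimal eval-harness sketch, written before the agent exists. The case format,
# the placeholder decide() functions, and the values are illustrative assumptions.

CASES = [
    # labeled cases drawn from the old process; in practice, hundreds of these
    {"input": {"amount": 40, "reason": "damaged"}, "expected": "refund"},
    {"input": {"amount": 900, "reason": "changed_mind"}, "expected": "escalate"},
]

def baseline_decide(case_input):
    # placeholder for the status quo (the rule or human process being replaced)
    return "refund" if case_input["amount"] < 100 else "escalate"

def agent_decide(case_input):
    # placeholder for the pilot agent; swap in the real call once it exists
    return baseline_decide(case_input)

def evaluate(decide, cases):
    """Fraction of labeled cases where the decision matches the expected outcome."""
    return sum(decide(c["input"]) == c["expected"] for c in cases) / len(cases)

print(f"baseline={evaluate(baseline_decide, CASES):.0%}  agent={evaluate(agent_decide, CASES):.0%}")
```

The point of writing this in week one is that "measurably outperforms the status quo" stops being a debate and becomes a number the whole team can watch.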

4. The data was clean in the pilot and dirty in production

The most common technical cause we see. A pilot runs against a curated slice that the team hand-assembled because it was easier than plumbing the real pipeline. The agent looks spectacular. Then production data reveals duplicates with subtle variations, missing fields silently filled with zeros, and schema changes nobody was told about. By the time it is diagnosed, the budget is spent and the mood has turned. We ask clients, on day one, to run against a random production slice, not a representative one. Representative slices lie. Random slices do not.
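A random slice can be as simple as sampling record IDs uniformly from the full production range with a fixed seed, then pulling those records exactly as they are. The ID range, slice size, and seed below are assumptions for illustration; the point is the sampling discipline, not the numbers.

```python
# Illustrative sketch of a random production slice: sample record IDs uniformly
# from the full production range rather than hand-curating "representative" ones.
import random

ALL_PRODUCTION_IDS = range(1, 250_001)   # however the team enumerates production records
SLICE_SIZE = 500

random.seed(42)                          # fixed seed so the slice is reproducible in the post-mortem
pilot_slice_ids = random.sample(ALL_PRODUCTION_IDS, SLICE_SIZE)

# Fetch those records as-is: no de-duplication, no filling of missing fields,
# no schema normalization. The point is to meet the dirt on day one.
```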

5. There was no explicit kill criterion

The pilots that died at week six rarely had a written condition under which they would be stopped. Without one, a stall becomes an extension becomes a quiet absorption. The ROI case never gets closed because it never gets evaluated.

Every engagement we run starts with a one-sentence kill criterion: "If by week X, metric Y has not reached threshold Z, we stop." Not "we review." Not "we consider." We stop. The discipline of writing that sentence, and having the sponsor sign it, produces about half the value of the pilot itself.
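The sentence is short enough to write as a record the sponsor signs and the team checks mechanically. A hedged sketch; the field names and the example week, metric, and threshold are invented for illustration.

```python
# Illustrative sketch: the kill criterion as a checkable record rather than a
# sentence in a deck. Field names and example values are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class KillCriterion:
    week: int            # "if by week X..."
    metric: str          # "...metric Y..."
    threshold: float     # "...has not reached threshold Z, we stop."

CRITERION = KillCriterion(week=6, metric="resolution_rate", threshold=0.70)

def should_stop(current_week: int, measured: float, c: KillCriterion = CRITERION) -> bool:
    """True means stop. Not review, not consider: stop."""
    return current_week >= c.week and measured < c.threshold

print(should_stop(current_week=6, measured=0.64))   # True: the pilot stops
```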

What survives

Pilots that make it past week six share five properties: scope expressed as a loop, an operator champion with design veto, a written eval harness by week three, a random production data slice by week four, and a written kill criterion with a specific date and threshold. The absence of any one is not fatal. The absence of three or more nearly always is.

The reason pilots stall at week six and nobody does a proper post-mortem is that the pilot did not technically fail. It was extended, rescoped, absorbed, or quietly mothballed. This is the single biggest learning gap we see — the typical enterprise runs four to seven AI pilots per year that end this way, and learns almost nothing from any of them. The same mistakes repeat. If you suspect a pilot is stalling, do not wait for the official failure. Run the post-mortem now.

Cadence Advisors Group runs structured post-mortems on stalled AI pilots. If you have one that is quietly ending, [we would rather help you document what happened than watch you repeat it](/contact).

pilot design · implementation · post-mortem · change management