Frameworks
The Unit Economics of AI Agents in 2026
A cost-per-task framework for buyers who have outgrown the demo stage.
The quiet repricing of intelligence
In 2023, the unit of measure for AI spend was the seat. By 2024, it was the token. By the end of 2025, most of our clients stopped measuring either. The only number their finance teams cared about was cost per completed task — an envelope, a claim, a reconciliation, a resolved ticket. Everything above that number is abstraction. Everything below it is procurement.
The agents that look expensive on a per-token chart often look cheap on a per-task chart, and vice versa. Vendors know this. They price accordingly. The shift we have watched with clients over the last eighteen months is not technical. It is accounting.
Six costs, not one
When we rebuild a client's agent economics from scratch, we decompose the cost of a single completed task into six layers. Missing any one produces a model that is off by 2–5x.
- Inference cost — the marginal LLM call. Frontier models in 2026 cost roughly $3–$15 per million input tokens and $15–$75 per million output tokens. A well-scoped task lands at $0.02–$0.40.
- Orchestration cost — the durable execution layer (Temporal, LangGraph, Inngest). In-house orchestrators are cheaper per run, more expensive per incident. Budget 10–20% of inference.
- Retrieval cost — vector search, hybrid retrieval, reranking, and the storage beneath it (Pinecone, Weaviate, pgvector). For agents that answer from proprietary corpora, this is the second-largest cost line and the one most often forgotten.
- Verification cost — evaluator, human reviewer, or judge model. In regulated work it is non-optional; in unregulated work it is what separates agents that hold up from agents that quietly degrade. Plan for 30–80% of inference wherever being wrong has real downside.
- Integration cost — per-call API cost of tools the agent invokes. Stripe fees, carrier APIs, SERP APIs, document parsing. We have seen a misconfigured retry policy multiply a client's integration bill by 11x in a weekend.
- Oversight cost — amortized cost of humans and infrastructure that watch the agents. Observability (Arize, Helicone, Langfuse), red-team cycles, the engineer paged at 3am. Routinely externalized; mature organizations allocate it to the product.
Sum them. Divide by completed tasks over a representative window. That is your real unit cost. It is almost never the number on the vendor's slide.
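The sum-and-divide step is simple enough to sketch. A minimal model of the six-line breakdown, in which every dollar figure and the task count are illustrative assumptions, not benchmarks:

```python
from dataclasses import dataclass

@dataclass
class TaskCosts:
    """Dollar totals for each of the six layers over one representative window."""
    inference: float      # marginal LLM calls
    orchestration: float  # durable execution layer
    retrieval: float      # vector search, reranking, storage
    verification: float   # evaluator / reviewer / judge model
    integration: float    # per-call tool and API fees
    oversight: float      # amortized humans + observability

    def total(self) -> float:
        return (self.inference + self.orchestration + self.retrieval
                + self.verification + self.integration + self.oversight)

def unit_cost(window: TaskCosts, completed_tasks: int) -> float:
    """Real unit cost: sum the six layers, divide by completed tasks."""
    return window.total() / completed_tasks

# Hypothetical month of spend, $ per layer, against 10,000 completed tasks.
month = TaskCosts(inference=1200, orchestration=180, retrieval=600,
                  verification=700, integration=450, oversight=900)
print(f"${unit_cost(month, 10_000):.4f} per completed task")
```

Note that the denominator is completed tasks, not attempted ones; abandoned and retried runs inflate the numerator but never the denominator.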
Where margin actually comes from
Once you have honest unit cost, the second question is where margin comes from. A useful mental model, borrowed loosely from Simon Wardley's mapping work, is to ask which component is moving from genesis toward commodity and which is moving the other way. Inference is commoditizing fast — price per token is falling roughly 4x per year on the trailing frontier. Retrieval is bifurcating: generic RAG is commoditizing, but domain-specific retrieval still confers a moat. Orchestration and verification are where durable advantage is quietly forming; the teams whose agent systems survive two years in production won on eval infrastructure, not on model choice.
A worked example
A client in commercial insurance came to us with a submission-triage agent priced at $0.84 per completed submission. The vendor model assumed 2,300 input tokens, 450 output tokens, a single retrieval call, and a 90-second human review at $0.00.
What we found, after instrumenting three weeks of production runs:
- The average input, after the retrieval payload was stitched in, was 14,600 tokens. Not 2,300.
- The agent made 3.2 tool calls per submission on average, not one. Each was billed.
- 18% of submissions triggered a retry loop averaging 2.4 additional calls.
- The human reviewer spent a real 4–7 minutes per submission, at a fully loaded $52/hour.
- Observability was free for the first 10,000 traces. They were doing 85,000 a month.
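The gap in the inference and review lines alone is easy to reproduce. A sketch using the token counts and review times above, with hypothetical mid-range token prices ($5 per million input, $25 per million output) standing in for the actual contract rates, and retries modeled as an expected-value multiplier:

```python
def inference_cost(in_tokens, out_tokens, in_price_m, out_price_m,
                   retry_rate=0.0, extra_calls=0.0):
    """Per-task inference cost; retries scale the base cost in expectation."""
    base = (in_tokens * in_price_m + out_tokens * out_price_m) / 1e6
    return base * (1 + retry_rate * extra_calls)

# Vendor model: 2,300 input / 450 output tokens, no retries.
vendor = inference_cost(2_300, 450, 5, 25)
# Production: 14,600 input tokens; 18% of runs loop for 2.4 extra calls.
observed = inference_cost(14_600, 450, 5, 25, retry_rate=0.18, extra_calls=2.4)
# The review line the vendor priced at $0.00: ~5.5 min at a loaded $52/hour.
review = 5.5 / 60 * 52

print(f"vendor ${vendor:.3f}  observed ${observed:.3f}  review ${review:.2f}")
```

Under these assumed prices the observed inference line comes out roughly 5x the vendor's model before retrieval and tool billing are even counted, and the review line dwarfs both.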
True cost per completed submission: $3.11. The ROI case changed materially. The client did not cancel; they renegotiated the contract to a per-completed-task price rather than a per-seat license, passing the cost risk back to the vendor. The vendor accepted, because the alternative was losing the account. Both sides ended up better aligned.
What we recommend to buyers
Before signing anything longer than a quarter, run three tests.
- The 10x test — what happens to unit economics if volume grows 10x? Most vendor pricing has quiet thresholds at scale.
- The halved-inference test — what if inference costs fall 50% over the next twelve months, as they probably will? If the vendor captures all the savings, you have a pricing problem.
- The replacement test — what would it cost to rebuild the same task on commodity primitives in nine months? If the gap is small, you are paying for convenience. If large, you are paying for a moat. Both are fine; knowing which is the point.
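The three tests reduce to arithmetic you can run against any quote. A sketch, where the tiered price schedule, the cost share, and every dollar figure are entirely hypothetical:

```python
def ten_x_test(price_per_task, monthly_volume):
    """Unit price at current volume vs at 10x, for a volume-dependent schedule."""
    return price_per_task(monthly_volume), price_per_task(monthly_volume * 10)

def halved_inference_test(unit_cost, inference_share, passthrough):
    """Unit cost if inference falls 50% and the vendor passes through
    `passthrough` (0..1) of the savings."""
    return unit_cost - 0.5 * inference_share * unit_cost * passthrough

def replacement_test(vendor_annual, rebuild_cost, inhouse_annual_run_rate):
    """First-year gap between the vendor and an in-house rebuild; a small
    positive gap means you are paying for convenience, a large one for a moat."""
    return vendor_annual - (rebuild_cost + inhouse_annual_run_rate)

# Hypothetical schedule: $0.80/task up to 50k tasks/month, $1.10 above it
# (a quiet threshold of the kind vendor pricing often hides).
tiered = lambda v: 0.80 if v <= 50_000 else 1.10

print(ten_x_test(tiered, 20_000))                     # price jumps at 10x volume
print(halved_inference_test(0.80, 0.35, passthrough=0.0))  # vendor keeps all savings
print(replacement_test(450_000, 300_000, 90_000))     # year-one vendor premium, $
```

With `passthrough=0.0` the halved-inference test returns the unchanged unit cost, which is exactly the pricing problem the test is designed to surface.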
The conversation has moved. Vendors worth working with show per-task economics unprompted. The ones still leading with tokens or seats are either behind or hoping you are. Ask for the six-line breakdown.
Cadence Advisors Group helps leadership teams scope, price, and govern AI systems. If you are in the middle of a procurement decision and the numbers feel off, the numbers probably are off. [Schedule a diagnostic](/contact).