The Token Bill Is the New Cloud Bill, and Enterprises Are Not Ready

Enterprise AI · Cost Architecture

Token spend is arriving as an operational cost before enterprises have built the governance to manage it. The applied AI platforms that understand this will own the next cycle of differentiation.

By Shashi Bellamkonda · June 7, 2026

30x: cost increase per interaction, agentic AI, 2023 vs. 2026

60–80%: potential cost reduction via intelligent model routing

4–11x: budget overrun in the first 90 days of broad agent deployment

The cloud bill surprised every CFO once, too. Token costs are following the same arc, just faster and with less warning. Box CEO Aaron Levie made this point plainly in a LinkedIn post this week: token costs are now one of the hottest budget topics in every enterprise conversation he has. His framing was bullish, reading rising spend as proof of real-world scale that nobody anticipated. That read is correct. It is also, for most enterprise technology leaders, the wrong place to focus.

The more consequential observation came second. Levie identified model routing as the differentiator that will separate applied AI platforms over the next several years. The logic is sound: once tokens represent a significant share of workflow cost, the ability to direct each task to the cheapest model that can handle it competently becomes a genuine strategic capability, not an infrastructure footnote.

Key Takeaway

Token spend is moving from engineering experiments to organization-wide operating cost faster than enterprises can build governance for it. The applied AI platforms with domain-specific evals and intelligent routing will capture the value that frontier model providers cannot.

The cost structure that changed without a memo

A simple AI interaction in 2023 (input, retrieval, response) cost roughly four cents. An orchestrated agentic workflow in 2026, with tool calls, iterative reasoning loops, and multi-step output generation, runs closer to $1.20. That is a 30-times increase per interaction, according to analysis published this month by EY. The number comes from structural complexity, not price inflation. The models themselves are getting cheaper. The workflows are getting more expensive because they are doing more work.
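The arithmetic behind that kind of multiple can be sketched in a few lines. This is a minimal illustration, not the EY methodology: the token counts, the per-million-token prices, and the function names `call_cost` and `workflow_cost` are all assumptions chosen for the example. It models an agent loop that re-sends a growing context on every step, which is one plausible reason per-interaction cost rises even while per-token prices fall.

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one model call; prices are quoted per million tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


def workflow_cost(steps: int, base_context: int, added_per_step: int,
                  output_per_step: int, in_price_per_m: float,
                  out_price_per_m: float) -> float:
    """Cost of an agent loop in which every step re-sends the accumulated context."""
    total = 0.0
    context = base_context
    for _ in range(steps):
        total += call_cost(context, output_per_step, in_price_per_m, out_price_per_m)
        # Tool results and the model's own output are appended to the context.
        context += added_per_step + output_per_step
    return total


# Hypothetical inputs: $3 / $15 per million input / output tokens.
single = call_cost(2_000, 500, 3.0, 15.0)
agentic = workflow_cost(10, 2_000, 1_500, 500, 3.0, 15.0)
print(f"single call: ${single:.4f}, 10-step workflow: ${agentic:.4f}, "
      f"ratio: {agentic / single:.1f}x")
```

With these made-up inputs, ten steps cost about thirty times one call rather than ten times, because the input side of the bill grows with every step. The inputs were picked to show the mechanism, not to reproduce any published figure.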

The pattern Levie described is already showing up in enterprise budgets. One healthcare organization consumed one trillion tokens over six months, generating more than $6 million in unplanned costs before the finance team could trace the source. Uber and Microsoft have reportedly burned through their 2026 token budgets in a matter of months. This is not a story about profligate spending. It is a story about enterprises deploying agentic systems without corresponding visibility into consumption.

Most enterprise agent rollouts exceed their pilot budget by four to eleven times within the first 90 days of broad deployment, driven primarily by uncapped tool-call recursion and retrieval breadth that nobody scoped at the pilot stage. By then the CFO is asking pointed questions about per-workflow unit economics, and the organization's control over consumption is weakest at exactly the moment it needs to be strongest.
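One way to put a ceiling on tool-call recursion is a per-run budget object that every tool call must pass through. The sketch below is an assumption about how such a guard could look, not a description of any vendor's product; `RunBudget`, `BudgetExceeded`, `run_tool`, and the limits themselves are hypothetical names and values.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run would exceed one of its configured limits."""


@dataclass
class RunBudget:
    max_tokens: int
    max_tool_calls: int
    max_depth: int
    tokens_used: int = 0
    tool_calls: int = 0

    def charge(self, tokens: int, depth: int) -> None:
        """Record one tool call, or refuse it if any limit would be breached."""
        if depth > self.max_depth:
            raise BudgetExceeded(f"recursion depth {depth} exceeds {self.max_depth}")
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call limit {self.max_tool_calls} reached")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} would be exceeded")
        self.tool_calls += 1
        self.tokens_used += tokens


def run_tool(budget: RunBudget, depth: int = 1) -> int:
    """Stand-in for a tool that keeps calling itself; returns the depth reached."""
    budget.charge(tokens=1_000, depth=depth)
    try:
        return run_tool(budget, depth + 1)
    except BudgetExceeded:
        return depth  # the nested call was refused, so stop here


budget = RunBudget(max_tokens=50_000, max_tool_calls=20, max_depth=5)
deepest = run_tool(budget)
print(deepest, budget.tool_calls, budget.tokens_used)  # prints: 5 5 5000
```

The design choice worth noting is that the check happens before the call is charged, so a refused call never consumes budget, and the limits are set per run rather than per month, which is where pilot-stage scoping tends to be missing.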

Routing is not a feature. It is the product

The conventional read of this situation treats model routing as a cost-optimization layer, something you bolt on when bills get uncomfortable. Levie's argument is different: routing is where the applied AI layer builds a moat that neither frontier model providers nor hyperscalers can easily replicate.

Frontier models will keep improving at complex reasoning tasks such as legal analysis, healthcare diagnostics, and financial modeling, and spending on these will increase. But the majority of enterprise AI volume is not frontier work. It is document extraction, classification, summarization, form completion, and first-draft generation. These tasks do not require the most capable model available, only a model that is capable enough, and whose confidence is reliable enough that the routing system knows when to escalate.

The companies with the best evals, the best ability to route workloads, and business models aligned to customers' financial goals will be in a great position. The question is whether they can build that before their customers build it themselves.

That is the capability Levie is describing: domain-specific evaluation frameworks that know, for a given document type or legal task subclass, which model performs best at what price point. The applied AI layer that builds this evals infrastructure owns the routing decision. It can substitute cheaper models, route to open-weight alternatives, or escalate to frontier intelligence when the task genuinely demands it. The platform that cannot make this determination sends everything to the frontier and passes the cost to the customer.
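A routing decision of this kind can be sketched as a lookup over eval results: for a given task class, pick the cheapest model that clears a quality bar, and escalate to the strongest model when none does. Everything in the example is hypothetical, including the model names, task classes, accuracy scores, and prices; a real eval table would come from measured runs on domain data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model: str                 # hypothetical model name
    task_class: str
    accuracy: float            # score on a domain-specific eval set, 0..1
    price_per_m_tokens: float  # dollars per million tokens


# Hypothetical eval table for two task classes.
EVALS = [
    EvalResult("small-open-weight", "invoice_extraction", 0.97, 0.20),
    EvalResult("mid-tier", "invoice_extraction", 0.98, 3.00),
    EvalResult("frontier", "invoice_extraction", 0.99, 15.00),
    EvalResult("small-open-weight", "contract_risk_review", 0.71, 0.20),
    EvalResult("mid-tier", "contract_risk_review", 0.88, 3.00),
    EvalResult("frontier", "contract_risk_review", 0.96, 15.00),
]


def route(task_class: str, min_accuracy: float, evals=EVALS) -> str:
    """Pick the cheapest model whose eval score clears the bar for this task class.

    If no model clears the bar, escalate to the highest-scoring model available.
    """
    candidates = [e for e in evals if e.task_class == task_class]
    if not candidates:
        raise KeyError(f"no evals recorded for task class {task_class!r}")
    good_enough = [e for e in candidates if e.accuracy >= min_accuracy]
    if good_enough:
        return min(good_enough, key=lambda e: e.price_per_m_tokens).model
    return max(candidates, key=lambda e: e.accuracy).model


print(route("invoice_extraction", min_accuracy=0.95))    # small-open-weight
print(route("contract_risk_review", min_accuracy=0.95))  # frontier
```

The router itself is trivial. The hard part, and the part the argument above treats as the moat, is populating the eval table with scores that are trustworthy for each document type or task subclass.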

The pricing spread makes the stakes concrete. Budget-tier models run at roughly $0.10 to $1 per million tokens. Frontier models with extended reasoning capabilities run $15 to $30 per million or higher. A platform routing 70 percent of volume to the appropriate tier is not offering a slightly better deal. It is offering a fundamentally different cost structure.
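The effect of that spread on a blended bill is simple arithmetic. The sketch below assumes the 70 percent share is measured in tokens and that the remaining 30 percent goes to a frontier model; the function names `blended_price` and `savings_vs_frontier` are illustrative, and the prices are the endpoints of the ranges quoted above.

```python
def blended_price(share_to_budget: float, budget_price: float,
                  frontier_price: float) -> float:
    """Average dollars per million tokens when a share of volume goes to the budget tier."""
    if not 0.0 <= share_to_budget <= 1.0:
        raise ValueError("share_to_budget must be between 0 and 1")
    return share_to_budget * budget_price + (1.0 - share_to_budget) * frontier_price


def savings_vs_frontier(share_to_budget: float, budget_price: float,
                        frontier_price: float) -> float:
    """Fractional saving relative to sending every token to the frontier tier."""
    blended = blended_price(share_to_budget, budget_price, frontier_price)
    return 1.0 - blended / frontier_price


# Endpoints of the price ranges in the text: ($1, $15) and ($0.10, $30) per million tokens.
for budget_price, frontier_price in [(1.00, 15.00), (0.10, 30.00)]:
    blended = blended_price(0.70, budget_price, frontier_price)
    saving = savings_vs_frontier(0.70, budget_price, frontier_price)
    print(f"budget ${budget_price:.2f}, frontier ${frontier_price:.2f}: "
          f"blended ${blended:.2f}/M, saving {saving:.1%}")
```

Under these assumptions the saving lands at roughly 65 to 70 percent across the quoted price ranges. Because the budget tier is so much cheaper, the saving is driven almost entirely by the share of volume routed away from the frontier tier, not by which budget model is chosen.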

Context portability is the other half of the argument

Levie's second LinkedIn post, on context architecture, connects directly to the routing thesis. His argument: the enterprise AI platform problem is not model quality but getting the right context to agents reliably and at scale. Enterprises have knowledge fragmented across legacy systems, with access controls that do not map to actual workflows. Agents either have too much access or too little. Neither works.

The portability point carries a specific warning for CIOs. Locking context into a single agent architecture means long-term dependency on that vendor's model choices, pricing decisions, and roadmap. The organizations that build context infrastructure capable of serving Claude Cowork, Codex, ChatGPT, Cursor, and whatever agent platform emerges next are not placing a bet on today's model leaders. They are buying optionality in a market that is still pricing in its own uncertainty.

Box is building toward that platform position. What makes the argument worth watching is that Levie is describing a structural opportunity, not a product announcement. The companies that can establish domain evals, build routing logic, and keep context portable will capture value that neither model providers nor cloud infrastructure vendors can absorb. That is a large enough space to matter.

Key Takeaway

Context portability and model routing are not independent problems. The enterprise AI layer that solves both, with domain-calibrated evals, intelligent routing, and governance at the workflow level, is the one that absorbs margin the frontier providers cannot defend.

What the CFO conversation requires now

The missing piece in most enterprise token governance conversations is instrumentation before deployment, not after budget overrun. The organizations managing this well treat token spend as an operational metric with the same rigor as latency and accuracy, not as a billing surprise to be explained in quarterly reviews.

That requires three things that most enterprises do not yet have. Visibility into consumption by workflow and department. Routing policies that direct low-complexity tasks to cheaper models without manual intervention. And escalation logic that is calibrated to actual task requirements, not to the default assumption that more expensive means better.
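The first of those three, visibility by workflow and department, amounts to tagging every model call with its owner and aggregating. The following is a minimal sketch under stated assumptions: the `UsageRecord` fields, the tier names, the prices in `PRICES`, and the sample records are all hypothetical, and a production system would read these records from gateway or provider logs instead of a hard-coded list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    department: str
    workflow: str
    tier: str           # key into PRICES
    input_tokens: int
    output_tokens: int


# Hypothetical (input, output) prices in dollars per million tokens.
PRICES = {"budget": (0.50, 1.00), "frontier": (15.00, 30.00)}


def record_cost(r: UsageRecord) -> float:
    """Dollar cost of one usage record at the configured tier prices."""
    in_price, out_price = PRICES[r.tier]
    return (r.input_tokens * in_price + r.output_tokens * out_price) / 1_000_000


def spend_by(records, key) -> dict:
    """Total spend grouped by whatever `key` extracts from each record."""
    totals = defaultdict(float)
    for r in records:
        totals[key(r)] += record_cost(r)
    return dict(totals)


records = [
    UsageRecord("claims", "intake_triage", "budget", 4_000_000, 1_000_000),
    UsageRecord("claims", "appeal_drafting", "frontier", 2_000_000, 1_000_000),
    UsageRecord("legal", "contract_review", "frontier", 1_000_000, 500_000),
]

print(spend_by(records, key=lambda r: r.department))
print(spend_by(records, key=lambda r: (r.department, r.workflow)))
```

Grouping by a caller-supplied key keeps the same ledger usable for the finance view (by department) and the engineering view (by workflow), which is the per-workflow unit economics a CFO will ask for.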

The applied AI platform that delivers all three, and can demonstrate per-workflow unit economics to a CFO who has stopped asking polite questions, is in a different conversation from the one about features and integrations.

CIO / CTO Viability Question

Before renewing or expanding your applied AI platform contract, ask the vendor to show you their model routing architecture and their domain-specific evals framework. If they cannot demonstrate how they decide which model handles which task class, and what that decision costs per workflow, they are passing the entire cost optimization problem back to you. That is not a platform. That is an API wrapper with a dashboard.

Sources

Levie, Aaron. LinkedIn posts on AI token costs and context architecture. June 2026.
EY. "Agentic AI Enterprise Token Cost." ey.com, 1 Jun. 2026. https://ey.com
Griffiths, Brent D. "Box CEO Says Companies Will Need to Figure Out How to Budget for Workers Running Up AI Token Bills." aol.com, 20 Mar. 2026. https://aol.com
Griffiths, Brent D. "Box CEO Explains Why He's OK with Engineers Wasting Some AI Tokens Right Now." aol.com, 11 Apr. 2026. https://aol.com
Thompson, Derek. "The AI Boom Has Entered Its 'Wait, Is This Worth It?' Phase." derekthompson.org, Jun. 2026. https://derekthompson.org
Elvex. "AI Token Cost Enterprise: Stop Budget Blowouts in 2026." elvex.com, Jun. 2026. https://elvex.com
isimplifyme. "AI Agent Cost Governance: Cap Token Spend Smartly." isimplifyme.com, 1 May 2026. https://isimplifyme.com

Disclaimer: This blog reflects my personal views only. Content does not represent the views of my employer, Info-Tech Research Group. AI tools may have been used for brevity, structure, or research support. Please independently verify any information before relying on it.
