Your Highest Token Spenders Might Be Your Best People

Enterprise AI / Technology Strategy

This week, The Wall Street Journal covered companies scrambling to control AI token spend. Buried in the same article is a finding that inverts the entire premise. The governance tools being built right now may be solving the wrong problem.

The Journal's piece centered on companies building dashboards to track AI token consumption. The frame was familiar: costs are accumulating, finance teams are getting surprised at month end, and nobody has a clear picture of who is spending what. Zapier was cited as an example of a company that built internal tooling to surface these metrics because the old way of thinking, budgeting per seat and auditing per user, no longer maps to how AI systems generate costs. Every prompt, every automated workflow, every background agent burns tokens. The invoice reflects consumption patterns that no approval chain anticipated.

All of that is accurate. But about two-thirds of the way through the article, there is a data point that quietly dismantles the premise. Guillermo Rauch, the chief executive of Vercel, looked at his company's token consumption data and found that his highest token spenders are his top performers. A team of AI agents analyzed a research paper and built a critical infrastructure service in a single day, work that would have taken months conventionally. The bill was approximately $10,000. His estimate is that it saved him millions. His response was not to cap the spend. It was to get comfortable letting it run.

If token consumption correlates with your best work, a cap is not governance. It is a tax on performance dressed up as financial discipline.

That finding sits in direct tension with the instinct driving most IT conversations about AI spend right now. The questions I hear most often are: how do we stop usage from blowing through the budget before the invoice lands, who is actually producing results, and what are we measuring? Those are reasonable operational questions. But they assume high consumption is a problem to contain. Rauch's data suggests it may be a signal of exactly the work you should be funding more of.

The Governance Model Arrived from the Wrong Category

The mental model most companies are applying to AI spend was built for seat-based software. You bought a fixed number of licenses, tracked utilization against headcount, and renewed based on whether the organization grew. The logic was linear. AI tools don't work that way. Billing accumulates by token, by API call, by credit, across every team and every workflow, compounding daily. The cost structure is consumption-based, but the governance instinct is still seat-based, and that mismatch is where the confusion lives.
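To make the mismatch concrete, here is a toy sketch of how the two cost models behave. All prices, users, and usage figures are hypothetical illustrations, not any vendor's actual rates.

```python
# Toy comparison of seat-based vs consumption-based billing.
# All prices and usage figures below are hypothetical.

def seat_cost(seats: int, price_per_seat: float) -> float:
    """Seat-based billing: flat and predictable, scales only with headcount."""
    return seats * price_per_seat

def token_cost(usage_events: list[dict], price_per_1k_tokens: float) -> float:
    """Consumption billing: every prompt, workflow, and agent run adds to the bill."""
    total_tokens = sum(e["tokens"] for e in usage_events)
    return total_tokens / 1000 * price_per_1k_tokens

# 50 seats at $30/month is a known quantity before the month starts.
print(seat_cost(50, 30.0))  # 1500.0

# The same 50 people driving agents and background workflows is not.
events = [
    {"user": "alice", "tokens": 1_200_000},        # interactive prompting
    {"user": "batch-agent", "tokens": 9_000_000},  # unattended agent run
]
print(token_cost(events, 0.01))  # ~$102 for this small sample
```

The seat number is knowable in advance; the token number depends on behavior that no license count predicts, which is exactly why month-end invoices surprise finance teams.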

A category of tooling is emerging to close that gap: platforms that pull live usage data from AI providers and map consumption to teams, cost centers, and contract commitments. The operational value is real. Knowing what you are spending and where it is concentrated is the precondition for any decision that follows. But here is an observation worth sitting with: in conversations with companies building this visibility layer, the friction is almost never technical. Connecting to a system takes minutes. Getting the organizational permissions to do so takes weeks. The Salesforce administrator sits in the sales organization. Workday requires a different approval chain entirely. The governance gap is not primarily a technology problem. It is a structural one, and a dashboard does not resolve it.
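A minimal sketch of what that mapping layer computes, assuming a hypothetical usage-record shape and a hand-built user-to-cost-center lookup (the field names are illustrative, not any vendor's schema):

```python
from collections import defaultdict

# Hypothetical usage records as a visibility platform might pull them
# from AI providers; field names are illustrative only.
usage = [
    {"provider": "openai", "user": "alice", "spend": 420.0},
    {"provider": "anthropic", "user": "bob", "spend": 1310.0},
    {"provider": "openai", "user": "carol", "spend": 75.0},
]

# Building this lookup is the hard part in practice: it requires
# permissions that span sales, HR, and finance systems.
cost_center = {"alice": "platform-eng", "bob": "platform-eng", "carol": "marketing"}

def spend_by_cost_center(records, lookup):
    """Aggregate provider spend per cost center; unknown users go to 'unmapped'."""
    totals = defaultdict(float)
    for r in records:
        totals[lookup.get(r["user"], "unmapped")] += r["spend"]
    return dict(totals)

print(spend_by_cost_center(usage, cost_center))
# {'platform-eng': 1730.0, 'marketing': 75.0}
```

The aggregation itself is trivial, which is the point: the code takes minutes, while populating the `cost_center` table across organizational boundaries takes weeks.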

Jensen Huang's Reframe at GTC

At NVIDIA's GPU Technology Conference earlier this month, Jensen Huang pushed a different lens on this question. Rather than treating tokens as an expense to minimize, he argued that companies should measure token consumption against revenue generated. Every inference that contributes to a customer outcome, a closed deal, a prevented churn, a faster resolution, has economic value attached to it. Measured that way, tokens are not a cost line. They are a productivity signal.

That reframe connects directly to what Rauch found at Vercel. The governance question changes entirely depending on which frame you adopt. If tokens are an expense, the right response is caps, thresholds, and controls. If tokens are a signal, the right response is to ask which consumption connects to a business outcome and which does not, and to invest in the first category while examining the second. Those are different management decisions with different organizational consequences.
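One way to operationalize the signal frame, using entirely hypothetical workflows and attributed outcome values, is to triage consumption by spend per outcome rather than by spend alone:

```python
# Hypothetical workflows with token spend and attributed business outcomes.
# The numbers are illustrative; attribution itself is the hard problem.
workflows = [
    {"name": "support-resolution-agent", "token_spend": 8_000.0, "outcome_value": 240_000.0},
    {"name": "infra-research-agents", "token_spend": 10_000.0, "outcome_value": 2_000_000.0},
    {"name": "autogenerated-summaries", "token_spend": 6_000.0, "outcome_value": 0.0},
]

def triage(workflows):
    """Split consumption into outcome-linked work worth funding and
    spend with no measured outcome, which warrants examination."""
    fund, examine = [], []
    for w in workflows:
        if w["outcome_value"] > 0:
            ratio = w["outcome_value"] / w["token_spend"]  # value per token dollar
            fund.append((w["name"], ratio))
        else:
            examine.append(w["name"])
    return fund, examine

fund, examine = triage(workflows)
print(fund)     # [('support-resolution-agent', 30.0), ('infra-research-agents', 200.0)]
print(examine)  # ['autogenerated-summaries']
```

Note that a flat spend cap would have hit the $10,000 workflow first, the one returning two hundred times its cost in this sketch. Outcome-linked spend gets funded; unmeasured spend gets examined rather than capped.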

The Divide That Already Exists

Companies that built machine learning capabilities for prediction, pattern recognition, and decision support over the past several years already have the infrastructure to answer the outcome question. They have baselines. They know what a model deployment costs and what it returns. They argue about which outcome metrics matter because they have outcome metrics to argue about.

Companies deploying generative AI as a productivity layer right now are often starting without that foundation. The tools are accessible, the results feel immediate, and the measurement layer has not been built. Usage climbs. Whether output climbed alongside it is harder to answer than it should be. Getting visibility into spend is a start. It is not the same as knowing what the spend produced.

The CIO / CTO Question

Before building a token governance framework oriented around control, the more important question is whether your organization can connect consumption to outcomes at all. If you cannot, you are managing a number you do not yet understand. The Vercel finding suggests that caps applied before you know what the consumption represents may be cutting into your highest-value work.

The companies that will look back on this period as a turning point are not the ones that controlled AI spend most tightly. They are the ones that built the measurement layer first and let the governance follow from what the data actually showed. Visibility is the necessary first step. It is not the answer.

Bindley, Katherine. "Companies Learn to Track AI Use." The Wall Street Journal, 19 Mar. 2026, p. B4.

Huang, Jensen. Keynote address. NVIDIA GPU Technology Conference, San Jose, CA, Mar. 2026.

Disclaimer: This blog reflects my personal views only. Content does not represent the views of my employer, Info-Tech Research Group. AI tools may have been used for brevity, structure, or research support. Please independently verify any information before relying on it.