Every vendor briefing on agentic artificial intelligence eventually arrives at the same slide. It shows a human at the top of a pyramid, liberated from repetitive tasks, freed to do higher-value work while agents handle the rest. I have sat through versions of this slide from nearly every major platform vendor over the past two years. The slide is not wrong, exactly. It just leaves out what happens in the hours between kicking off an agent and reading its output.
In January 2026, BCG surveyed 1,488 full-time US workers at large companies on their experience with artificial intelligence. The number that should concern technology leaders is not an adoption rate. It is this: among all the ways workers interact with AI, overseeing agents ranked as the most mentally taxing activity. Using AI to replace a repetitive, well-defined task does reduce burnout, the researchers found. But pair that task relief with a heavier overall workload and the added responsibility of monitoring what the agent is actually doing, and mental fatigue goes up, not down.
Victor Dibia, a software engineer at Microsoft and an early adopter of Claude Code from Anthropic, put it plainly in his newsletter. He had expected the experience to feel like handing work to a capable junior and checking back later. Instead, he found he could not step away from the screen. If he did, he risked losing the thread of what the agent had done and whether any of it had gone sideways.
He described feeling not like a maestro directing the work, but like someone being dragged along by it. The agent was running. He was watching the agent run.
That is the dynamic the productivity slide does not show. The human is still in the loop. The loop just got longer, less predictable, and harder to step away from than the original task ever was.
I have been using these tools myself, mainly for code on my personal blogs, and the pattern is consistent: the agent makes decisions without fully understanding the context, adds something that did not need to be there, removes something that did. You catch it, you correct it, you keep watching. There is a low-grade anxiety that runs the whole time, a background awareness that something could go wrong in a way you will not notice until it has already propagated three steps further. The only reason it has not been catastrophic for me is that a broken blog post is a recoverable problem. The cost of failure is not devastating at that scale. At enterprise scale, that calculus changes considerably.
Researchers writing in Harvard Business Review have named this AI brain fry: the specific cognitive load that comes from sustained supervision of systems that are mostly correct, most of the time. Mostly is the operative word. A procurement analyst who processes invoices manually knows the error states. The job has shape. The same analyst supervising an agent doing the same invoices has a murkier assignment: stay alert enough to catch mistakes she no longer has the hands-on work to see coming. That requires a different kind of attention, and it does not get easier with practice the way a structured task does.
Most agent errors are not obvious crashes. They are quiet misreadings: the agent removes a line of code that looks redundant but is load-bearing, or adds a configuration that is technically valid but wrong for your specific environment. These errors do not announce themselves. They surface later, in production, or when something downstream behaves unexpectedly.
The reason is structural. Agents operate on the instructions and context they are given at the start of a task. When the task evolves, or when domain knowledge lives in someone's head rather than in the prompt, the agent keeps moving forward on its original read of the situation. It is not being reckless. It simply does not know what it does not know. Supervision is not a backup option in this scenario. It is the primary error-detection mechanism.
The scale question makes this harder. Anthropic has said publicly it expects its general-purpose agent, Claude Cowork, to surpass Claude Code in adoption. More agents in more workflows means more employees in Dibia's position, running the same uncomfortable calculation: how far can I let this run before I need to intervene?
Mandates are making it worse
Alongside the fatigue data sits a separate problem that technology leaders are creating for themselves. Employees told to use AI tools because their organization issued a mandate are, according to analysis in Bloomberg Opinion this spring, more likely to push back than employees who found the tools useful on their own terms. Encouragement and requirement produce different psychology. That distinction matters a great deal when the tool you are requiring people to use is also generating cognitive overhead they did not sign up for.
You are, in other words, deploying agents into an environment where the people watching those agents are burning more energy than before, and then adding a mandate that signals the organization either does not know that or does not care. That sequence does not produce engagement. It produces quiet workarounds and usage metrics that look fine in dashboards and mean nothing about actual adoption.
The productivity gains are real in the right conditions. But the right conditions require some honest accounting that most business cases for agentic deployments are not doing. Supervision time is a cost. The employees carrying that load are a resource being drawn down, not a free variable. If the return-on-investment calculation for your agent deployment is built on the assumption that oversight costs nothing because it does not show up on a budget line, the math has a gap in it.
Match autonomy to reversibility. An agent summarizing meeting notes and drafting follow-up emails can run with light oversight; the cost of a bad summary is low and visible. An agent touching pricing logic, customer commitments, or regulated data needs checkpoints designed into the workflow from day one, before a mistake compounds into something that takes three people and two weeks to unwind. The useful question when scoping a deployment is not "what can the agent do?" It is "what does a mistake here actually cost, and who catches it?"
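As a rough illustration of what that scoping question can look like in practice, here is a minimal sketch, in Python, of an autonomy policy keyed to reversibility. The action names, risk tiers, and approval check are all hypothetical; this does not correspond to any particular agent framework.

```python
# A minimal sketch of matching agent autonomy to reversibility.
# All names here are illustrative placeholders, not a real framework API.

from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    CHEAP = "cheap"          # easy to undo, and failure is visible quickly
    COSTLY = "costly"        # recoverable, but cleanup takes real effort
    IRREVERSIBLE = "hard"    # customer-facing, financial, or regulated


@dataclass
class AgentAction:
    description: str
    reversibility: Reversibility


def requires_human_checkpoint(action: AgentAction) -> bool:
    """Let cheap-to-undo actions run; gate everything else on a person."""
    return action.reversibility is not Reversibility.CHEAP


if __name__ == "__main__":
    actions = [
        AgentAction("draft follow-up email from meeting notes", Reversibility.CHEAP),
        AgentAction("update pricing rule in billing config", Reversibility.IRREVERSIBLE),
    ]
    for a in actions:
        gate = "needs sign-off" if requires_human_checkpoint(a) else "can run autonomously"
        print(f"{a.description}: {gate}")
```

The point of the sketch is not the mechanism but the default: anything that is not cheap to undo gets a person in the loop before it executes, not after.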
Measure what supervision costs and put it in the business case. If an agent handles a significant volume of tasks weekly and the person monitoring it spends several hours doing so, that time belongs in the model. It is not overhead. It is the job you created when you deployed the agent.
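To make that concrete, here is a back-of-the-envelope sketch with invented numbers. The task volume, minutes saved, supervision hours, and hourly cost are all placeholders to be swapped for your own figures.

```python
# Illustrative arithmetic only: what happens to an agent deployment's
# weekly savings once supervision time is counted. Every number is made up.

TASKS_PER_WEEK = 300             # invoices, tickets, drafts the agent handles
MINUTES_SAVED_PER_TASK = 4       # time a person no longer spends per task
SUPERVISION_HOURS_PER_WEEK = 6   # hours someone spends watching and correcting
HOURLY_COST = 65.0               # loaded cost of the people involved, in dollars

gross_savings = TASKS_PER_WEEK * MINUTES_SAVED_PER_TASK / 60 * HOURLY_COST
supervision_cost = SUPERVISION_HOURS_PER_WEEK * HOURLY_COST
net_savings = gross_savings - supervision_cost

print(f"Gross weekly savings: ${gross_savings:,.0f}")
print(f"Supervision cost:     ${supervision_cost:,.0f}")
print(f"Net weekly savings:   ${net_savings:,.0f}")
```

With these particular numbers the deployment still pays for itself, but supervision consumes roughly a third of the gross savings, which is exactly the kind of line the model should show rather than hide.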
The employees who will get the most out of agentic tools over time are the ones who can direct agents clearly, catch errors before they propagate, and hold enough domain expertise to know when the output is subtly wrong rather than obviously wrong. That is a real capability, and building it takes deliberate investment. It is also not the same job those employees were doing before, which means that treating the transition as automatic will leave people doing neither the old job well nor the new one.
The warning signs that supervision load is becoming a problem are not hard to spot:
- Employees describe feeling they cannot fully step away from a running agent, even during low-stakes tasks
- Adoption metrics look healthy but time-to-completion on supervised tasks has not improved significantly
- Errors are being caught, but mostly after the fact rather than before propagation
- The people supervising agents are also the people with the deepest domain expertise, creating a concentration risk
- Your business case for the deployment does not include a line for supervision time
Before the next agent deployment goes to a business case review: who specifically is doing the oversight, how many hours a week does it realistically require, and does that cost appear anywhere in the return-on-investment model? If the answer to the last question is no, the model is incomplete and the productivity gain you are planning for may not arrive on the schedule you expect.
Sources
- BCG Survey on AI and Worker Mental Fatigue, January 2026. Reported via Bloomberg Weekend, "The Forecast: When AI FOMO Takes Over," April 5, 2026. bloomberg.com
- Victor Dibia, "Upgrade Or," Newsletter. newsletter.victordibia.com
- Bedard, Kropp, Hsu, Karaman, Hawes, and Kellerman (Boston Consulting Group / University of California, Riverside), "When Using AI Leads to Brain Fry," Harvard Business Review, March 5, 2026. hbr.org
- Gautam Mukunda, "AI: Why Corporate Mandates to Use It Won't Work," Bloomberg Opinion, March 30, 2026. bloomberg.com
- Anthropic, Claude Cowork expected to surpass Claude Code. Bloomberg, April 1, 2026. bloomberg.com
