Every vendor briefing on agentic artificial intelligence eventually arrives at the same slide. It shows a human at the top of a pyramid, liberated from repetitive tasks, freed to do higher-value work while agents handle the rest. I have sat through versions of this slide from nearly every major platform vendor over the past two years. The slide is not wrong, exactly. It just leaves out what happens in the hours between kicking off an agent and reading its output.
In January 2026, BCG surveyed 1,488 full-time US workers at large companies on their experience with artificial intelligence. The number that should concern technology leaders is not an adoption rate. It is this: among all the ways workers interact with AI, overseeing agents ranked as the most mentally taxing activity. Using AI to replace a repetitive, well-defined task does reduce burnout, the researchers found. But pair that task relief with a heavier overall workload and the added responsibility of monitoring what the agent is actually doing, and mental fatigue goes up, not down.
Victor Dibia, a software engineer at Microsoft and an early adopter of Claude Code from Anthropic, put it plainly in his newsletter. He had expected the experience to feel like handing work to a capable junior and checking back later. Instead, he found he could not step away from the screen. If he did, he risked losing the thread of what the agent had done and whether any of it had gone sideways.
He described feeling not like a maestro directing the work, but like someone being dragged along by it. The agent was running. He was watching the agent run.
That is the dynamic the productivity slide does not show. The human is still in the loop. The loop just got longer, less predictable, and harder to step away from than the original task ever was.
I have been using these tools myself, mainly for code on my personal blogs, and the pattern is consistent: the agent makes decisions without fully understanding the context, adds something that did not need to be there, removes something that did. You catch it, you correct it, you keep watching. There is a low-grade anxiety that runs the whole time, a background awareness that something could go wrong in a way you will not notice until it has already propagated three steps further. The only reason it has not been catastrophic for me is that a broken blog post is a recoverable problem. The cost of failure is not devastating at that scale. At enterprise scale, that calculus changes considerably.
Researchers writing in Harvard Business Review have named this AI brain fry: the specific cognitive load that comes from sustained supervision of systems that are mostly correct, most of the time. Mostly is the operative word. A procurement analyst who processes invoices manually knows the error states. The job has shape. The same analyst supervising an agent doing the same invoices has a murkier assignment: stay alert enough to catch mistakes she no longer has the hands-on work to see coming. That requires a different kind of attention, and it does not get easier with practice the way a structured task does.
Most agent errors are not obvious crashes. They are quiet misreadings: the agent removes a line of code that looks redundant but is load-bearing, or adds a configuration that is technically valid but wrong for your specific environment. These errors do not announce themselves. They surface later, in production, or when something downstream behaves unexpectedly.
The reason is structural. Agents operate on the instructions and context they are given at the start of a task. When the task evolves, or when domain knowledge lives in someone's head rather than in the prompt, the agent keeps moving forward on its original read of the situation. It is not being reckless. It simply does not know what it does not know. Supervision is not a backup option in this scenario. It is the primary error-detection mechanism.
The scale question makes this harder. Anthropic has said publicly it expects its general-purpose agent, Claude Cowork, to surpass Claude Code in adoption. More agents in more workflows means more employees in Dibia's position, running the same uncomfortable calculation: how far can I let this run before I need to intervene?
Mandates are making it worse
Alongside the fatigue data sits a separate problem that technology leaders are creating for themselves. Employees told to use AI tools because their organization issued a mandate are, according to analysis in Bloomberg Opinion this spring, more likely to push back than employees who found the tools useful on their own terms. Encouragement and requirement produce different psychology. That distinction matters a great deal when the tool you are requiring people to use is also generating cognitive overhead they did not sign up for.
You are, in other words, deploying agents into an environment where the people watching those agents are burning more energy than before, and then adding a mandate that signals the organization either does not know that or does not care. That sequence does not produce engagement. It produces quiet workarounds and usage metrics that look fine in dashboards and mean nothing about actual adoption.
The productivity gains are real in the right conditions. But the right conditions require some honest accounting that most business cases for agentic deployments are not doing. Supervision time is a cost. The employees carrying that load are a resource being drawn down, not a free variable. If the return-on-investment calculation for your agent deployment is built on the assumption that oversight costs nothing because it does not show up on a budget line, the math has a gap in it.
Match autonomy to reversibility. An agent summarizing meeting notes and drafting follow-up emails can run with light oversight; the cost of a bad summary is low and visible. An agent touching pricing logic, customer commitments, or regulated data needs checkpoints designed into the workflow from day one, before a mistake compounds into something that takes three people and two weeks to unwind. The useful question when scoping a deployment is not "what can the agent do?" It is "what does a mistake here actually cost, and who catches it?"
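As a rough illustration of what that scoping question can look like in practice, here is a minimal sketch, in Python, of an autonomy policy keyed to reversibility. The action names, risk tiers, and approval check are all hypothetical; this does not correspond to any particular agent framework.

```python
# A minimal sketch of matching agent autonomy to reversibility.
# All names here are illustrative placeholders, not a real framework API.

from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    CHEAP = "cheap"          # easy to undo, and failure is visible quickly
    COSTLY = "costly"        # recoverable, but cleanup takes real effort
    IRREVERSIBLE = "hard"    # customer-facing, financial, or regulated


@dataclass
class AgentAction:
    description: str
    reversibility: Reversibility


def requires_human_checkpoint(action: AgentAction) -> bool:
    """Let cheap-to-undo actions run; gate everything else on a person."""
    return action.reversibility is not Reversibility.CHEAP


if __name__ == "__main__":
    actions = [
        AgentAction("draft follow-up email from meeting notes", Reversibility.CHEAP),
        AgentAction("update pricing rule in billing config", Reversibility.IRREVERSIBLE),
    ]
    for a in actions:
        gate = "needs sign-off" if requires_human_checkpoint(a) else "can run autonomously"
        print(f"{a.description}: {gate}")
```

The point of the sketch is not the mechanism but the default: anything that is not cheap to undo gets a person in the loop before it executes, not after.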
Measure what supervision costs and put it in the business case. If an agent handles a significant volume of tasks weekly and the person monitoring it spends several hours doing so, that time belongs in the model. It is not overhead. It is the job you created when you deployed the agent.
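To make that concrete, here is a back-of-the-envelope sketch with invented numbers. The task volume, minutes saved, supervision hours, and hourly cost are all placeholders to be swapped for your own figures.

```python
# Illustrative arithmetic only: what happens to an agent deployment's
# weekly savings once supervision time is counted. Every number is made up.

TASKS_PER_WEEK = 300             # invoices, tickets, drafts the agent handles
MINUTES_SAVED_PER_TASK = 4       # time a person no longer spends per task
SUPERVISION_HOURS_PER_WEEK = 6   # hours someone spends watching and correcting
HOURLY_COST = 65.0               # loaded cost of the people involved, in dollars

gross_savings = TASKS_PER_WEEK * MINUTES_SAVED_PER_TASK / 60 * HOURLY_COST
supervision_cost = SUPERVISION_HOURS_PER_WEEK * HOURLY_COST
net_savings = gross_savings - supervision_cost

print(f"Gross weekly savings: ${gross_savings:,.0f}")
print(f"Supervision cost:     ${supervision_cost:,.0f}")
print(f"Net weekly savings:   ${net_savings:,.0f}")
```

With these particular numbers the deployment still pays for itself, but supervision consumes roughly a third of the gross savings, which is exactly the kind of line the model should show rather than hide.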
The employees who will get the most out of agentic tools over time are the ones who can direct agents clearly, catch errors before they propagate, and hold enough domain expertise to know when the output is subtly wrong rather than obviously wrong. That is a real capability, and building it takes deliberate investment. It is also not the same job those employees were doing before, which means that treating the transition as automatic will leave people doing neither the old job well nor the new one.
The warning signs that supervision load is becoming a problem are not hard to spot:
- Employees describe feeling they cannot fully step away from a running agent, even during low-stakes tasks
- Adoption metrics look healthy but time-to-completion on supervised tasks has not improved significantly
- Errors are being caught, but mostly after the fact rather than before propagation
- The people supervising agents are also the people with the deepest domain expertise, creating a concentration risk
- Your business case for the deployment does not include a line for supervision time
Before the next agent deployment goes to a business case review: who specifically is doing the oversight, how many hours a week does it realistically require, and does that cost appear anywhere in the return-on-investment model? If the answer to the last question is no, the model is incomplete and the productivity gain you are planning for may not arrive on the schedule you expect.
Sources
- BCG Survey on AI and Worker Mental Fatigue, January 2026. Reported via Bloomberg Weekend, "The Forecast: When AI FOMO Takes Over," April 5, 2026. bloomberg.com
- Victor Dibia, "Upgrade Or," Newsletter. newsletter.victordibia.com
- Bedard, Kropp, Hsu, Karaman, Hawes, and Kellerman (Boston Consulting Group / University of California, Riverside), "When Using AI Leads to Brain Fry," Harvard Business Review, March 5, 2026. hbr.org
- Gautam Mukunda, "AI: Why Corporate Mandates to Use It Won't Work," Bloomberg Opinion, March 30, 2026. bloomberg.com
- Anthropic, Claude Cowork expected to surpass Claude Code. Bloomberg, April 1, 2026. bloomberg.com
