MAI-Image-2.5 Hits a Leaderboard. Copilot Still Can't Help You Build a Power Automate Flow.

Enterprise AI · Microsoft
Microsoft's new image model ranks third on Arena.AI. The product that enterprise users actually touch every day still can't reliably explain its own workflows.
By Shashi Bellamkonda  ·  May 26, 2026  ·  shashi.co
#3 · Arena.AI text-to-image rank
3rd · MAI-Image release in the series
0 · Power Automate flows Copilot helped me build reliably

The number is real. MAI-Image-2.5 debuted at third on the Arena.AI text-to-image leaderboard today, with visible gains in text rendering, commercial imagery, and stylized illustration. Mustafa Suleyman's team is shipping.

Now try asking Microsoft Copilot how to build a Power Automate flow. At some point in that conversation, you will open another tab. Not a competitor's product. Just another Microsoft page, because Copilot gave you something generic or wrong about a tool Microsoft owns.

That gap is not new. What's new is Microsoft announcing model wins on one side while the daily product experience stays flat on the other.

A leaderboard rank is not a product delivery

MAI-Image-2.5 scoring on Arena.AI means Microsoft's research side can compete. It says nothing about whether that work reaches the person in front of Microsoft 365 trying to automate something before their next meeting.

Getting a model to perform on a benchmark and getting Copilot to walk someone through a Power Automate trigger condition are two separate engineering problems. Microsoft has put money into the first one. The second one is where people feel the gap every day, and a leaderboard result does not close it.

"Strong models and a useful product are not the same thing. Microsoft keeps announcing the first while users keep leaving the second to go find help somewhere else."

Amazon Quick started with the friction, not the capability

Amazon Quick is a useful comparison here. Amazon built it around a simple premise: your work is already scattered across Slack, your inbox, your CRM, your documents, and nothing ties it together. Quick connects those tools, grounds its answers in your actual data, and does things like schedule, draft, and build dashboards without you switching windows. The product starts with what a person is already frustrated about.

Copilot's pitch has been the reverse. Microsoft announces model improvements and benchmark positions, then asks users to discover the value themselves. For someone who just wants Copilot to know how Power Automate works, that approach does not land.

Microsoft owns Power Automate. Microsoft owns Power BI. Microsoft 365 is the product surface. No outside company has more context about how these tools work and what users get stuck on. The problem is not access to that knowledge. It is whether fixing that experience is being prioritized alongside the model work.

Suleyman is doing the right work at the wrong speed for users

The optimism about Suleyman holds. He took over Microsoft AI with a mandate to build in-house and cut the OpenAI dependence. MAI-Image-2.5 is the third MAI-Image release, part of an MAI series that now spans image, voice, and text. That is a portfolio being built with intention.

But habits form fast. Every week Copilot sends a user to another tab is a week that user learns to skip Copilot. MAI-Image-2.5 goes to the MAI Playground and Microsoft Foundry next week. That serves developers. It does not change what happens when a Microsoft 365 user needs help with a workflow today.

Who the announcement is actually for

Text-to-image is visible. It generates shareable output and ranks on a public leaderboard. Fixing Power Automate guidance inside Copilot is invisible work with no press release. The MAI team is choosing what to announce, and the audience for a third-place Arena.AI finish is developers, journalists, and investors.

Microsoft's installed Power Automate base is enormous. Those users are not a new acquisition story. They are an existing relationship that gets weaker every time Copilot lets them down on something Microsoft built. That is the retention problem, and no benchmark covers it.

Prior coverage
In February 2026, this site covered Microsoft's move toward a multi-model Copilot approach, including Claude models inside Copilot Studio. The flexibility is real. It still does not fix what happens when a user asks Copilot about their own Microsoft tools.

The MAI-Image-2.5 result is worth noting. But the more useful number is the one in the stat strip above: how many Power Automate flows Copilot actually helped complete without sending the user somewhere else.

CIO/CTO Viability Question
Before your next Microsoft Copilot license renewal, run this test: ask Copilot to walk you through building a non-trivial Power Automate flow. Count how many steps require you to go elsewhere. That number tells you more about where Microsoft's AI investment is actually landing than any leaderboard ranking will.
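
If you want to keep score while running that test, a few lines of Python will do. This is only a sketch of the counting, not a Microsoft tool: the `Step` class, the `tally` function, and the sample steps are all hypothetical names and data made up for illustration. Log each step of your own session and mark whether you had to leave Copilot to finish it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step of the flow-building session (hypothetical record format)."""
    description: str
    left_copilot: bool  # True if you had to open another page to finish the step


def tally(steps):
    """Count the steps that sent you elsewhere and their share of the total."""
    total = len(steps)
    detours = sum(1 for step in steps if step.left_copilot)
    share = detours / total if total else 0.0
    return {"total": total, "detours": detours, "share": share}


# Illustrative log only. Replace with the steps from your own test.
steps = [
    Step("Pick the trigger", False),
    Step("Write the trigger condition", True),
    Step("Add an approval action", False),
    Step("Handle the failure branch", True),
]

result = tally(steps)
print(f"{result['detours']} of {result['total']} steps needed another tab "
      f"({result['share']:.0%})")
```

Run the same flow again after the next Copilot update and compare the two tallies. The trend matters more than any single run.
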
Sources
MAI Superintelligence Team. "MAI-Image-2.5 Launches at No. 3 on Arena.AI." microsoft.ai, 26 May 2026.
Bellamkonda, Shashi. "The Vertical Integration of Intelligence: Why Microsoft Is Moving Toward Self-Sufficiency." shashi.co, 14 Feb. 2026.
Bellamkonda, Shashi. "Microsoft Copilot Tasks: AI That Works, Not Just Talks." shashi.co, 28 Feb. 2026.
Shashi Bellamkonda · Principal Research Director, Info-Tech Research Group · Former Adjunct Professor, Georgetown University; Entrepreneur in Residence, Stony Brook University, NY.
Disclaimer: This blog reflects my personal views only. Content does not represent the views of my employer, Info-Tech Research Group. AI tools may have been used for brevity, structure, or research support. Please independently verify any information before relying on it.