On November 24, 2025, Anthropic released Claude Opus 4.5, the first language model to exceed 80% accuracy on SWE-Bench Verified, a respected benchmark measuring real-world software engineering capabilities. The model scored 80.9%, setting a new standard for AI coding performance.
The release comes days after Google's Gemini 3 (November 18) and OpenAI's GPT-5.1 (November 12), marking an intense period of competition among frontier AI models. Anthropic also significantly reduced pricing from the previous Opus model, making it more accessible for production use.
Source: Anthropic announcement, TechCrunch, CNBC, November 24, 2025
What Changed
Claude Opus 4.5 is Anthropic's flagship model, completing the 4.5 series that includes Sonnet 4.5 (September) and Haiku 4.5 (October). According to Anthropic, the model demonstrates improvements across multiple areas:
Coding Performance: First model to achieve over 80% on SWE-Bench Verified. According to internal benchmarks, it outperforms competing models from Google and OpenAI on coding tasks.
Agent Capabilities: Improved performance on long-horizon autonomous tasks requiring sustained reasoning and multi-step execution. According to Anthropic, the model handles complex workflows with fewer dead ends.
Computer Use: Achieved 66.3% on OSWorld, a benchmark measuring AI's ability to control computer interfaces. Includes a new zoom tool allowing the model to request magnified screen regions for inspection.
Efficiency: According to customer feedback reported by Anthropic, Opus 4.5 uses fewer tokens to solve the same problems compared to previous models.
Memory Improvements: Better context management for long operations. According to Dianne Na Penn, Anthropic's head of product management for research, quoted in TechCrunch: "Knowing the right details to remember is really important in complement to just having a longer context window."
The Pricing Change
Anthropic reduced Opus pricing significantly:
Previous Opus: $15 per million input tokens, $75 per million output tokens
Opus 4.5: $5 per million input tokens, $25 per million output tokens
This represents a 67% price reduction on both input and output tokens. The new pricing makes Opus more competitive with Google's Gemini 3 Pro ($2/$12, or $4/$18 for contexts over 200K tokens) and OpenAI's GPT-5.1 ($1.25/$10), though it remains more expensive than both.
For comparison, Anthropic's other models are priced at: Sonnet 4.5 ($3/$15) and Haiku 4.5 ($1/$5).
Additional cost savings are available through prompt caching (up to 90%) and batch processing (50%).
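The pricing above can be turned into a rough per-request estimate. The sketch below is illustrative only: the workload sizes are assumptions, and it treats the quoted discounts simply (cached input billed at a 90% discount, batch processing halving the total), which may not match actual billing rules exactly.

```python
# Rough cost estimator using the per-million-token prices quoted above.
# Workload sizes and discount handling are illustrative assumptions.

PRICING = {
    "previous_opus": {"input": 15.00, "output": 75.00},
    "opus_4_5":      {"input": 5.00,  "output": 25.00},
}

def request_cost(model, input_tokens, output_tokens,
                 cached_fraction=0.0, batch=False):
    """Estimate the dollar cost of one request.

    cached_fraction: share of input tokens served from the prompt cache,
    assumed to be billed at a 90% discount.
    batch: apply the 50% batch-processing discount to the whole request.
    """
    p = PRICING[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh * p["input"]
            + cached * p["input"] * 0.10
            + output_tokens * p["output"]) / 1_000_000
    return cost * 0.5 if batch else cost

# A hypothetical request with 100K input and 10K output tokens:
old = request_cost("previous_opus", 100_000, 10_000)
new = request_cost("opus_4_5", 100_000, 10_000)
print(f"previous Opus: ${old:.2f}, Opus 4.5: ${new:.2f}")  # $2.25 vs $0.75
```

At these assumed sizes the same request drops from $2.25 to $0.75, before any caching or batch discounts.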
What Customers Are Saying
Anthropic shared feedback from early access customers:
GitHub reported that Claude Opus 4.5 "delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot," with internal testing showing it "surpasses internal coding benchmarks while cutting token usage in half."
Lovable noted that Opus 4.5 "delivers frontier reasoning within Lovable's chat mode, where users plan and iterate on projects. Its reasoning depth transforms planning—and great planning makes code generation even better."
Cursor stated that "Claude Opus 4.5 is a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks."
According to Anthropic, internal testers who evaluated the model before release reported that "Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding" and that "tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach."
New Product Releases
Alongside Opus 4.5, Anthropic expanded several products:
Claude for Chrome: Browser extension expanding to all Max users, allowing Claude to take action across browser tabs.
Claude for Excel: Now generally available to Max, Team, and Enterprise users. The model can understand and edit spreadsheets directly.
Claude Code Desktop: Claude Code, previously available only on mobile and web, is now available in the desktop app, including on Mac.
Conversation Limits: Usage caps removed for Opus users. Max and Team Premium members received increased overall usage limits.
The Competitive Context
Opus 4.5 arrives during an intense period of model releases. Google announced Gemini 3 on November 18, positioning it as their most capable model with day-one deployment to 2 billion Search users. OpenAI released GPT-5.1 on November 12 with personality customization features.
Anthropic's timing suggests deliberate competition. According to CNBC reporting, the company's valuation recently increased to $350 billion following multi-billion-dollar investments from Microsoft and Nvidia announced last week.
The SWE-Bench Verified achievement provides concrete differentiation. Being first to exceed 80% accuracy on a respected coding benchmark gives Anthropic a specific claim against competitors.
Who This Targets
According to Mike White, Anthropic's head of business development quoted in CNBC, ideal users for Claude Opus 4.5 include "professional software developers and knowledge workers like financial analysts, consultants and accountants" as well as people who are "excited to push their own creativity, build new things, expand their professional purview."
Specific use cases highlighted by Anthropic:
Software engineering teams working on complex codebases, code migration, and refactoring projects.
Development tool companies like GitHub, Cursor, and Lovable integrating AI coding capabilities into their platforms.
Knowledge workers in finance and consulting creating spreadsheets, presentations, and conducting deep research.
Teams building autonomous agents requiring sustained multi-step reasoning and execution across complex workflows.
Availability and Access
Claude Opus 4.5 is available through multiple channels:
For Consumers: Available in Claude apps for Pro, Max, Team, and Enterprise users. Desktop apps now include Claude Code.
For Developers: Available via Claude API using model identifier claude-opus-4-5-20251101. Also available on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
Context Window: 200,000 tokens (same as Sonnet 4.5)
Output Limit: 64,000 tokens
Knowledge Cutoff: March 2025
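For developers, a minimal sketch of a Messages API request body using the model identifier above might look like the following. The payload shape (model, max_tokens, messages) follows the public Anthropic Messages API; the prompt text and token limit are illustrative assumptions, and actually sending the request requires an API key.

```python
# Sketch of a request body for the Anthropic Messages API.
# The model identifier comes from the announcement; the prompt text
# and max_tokens value are illustrative assumptions.
import json

payload = {
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 1024,  # must stay within the 64,000-token output limit
    "messages": [
        {"role": "user",
         "content": "Summarize the failing test cases in this repository."}
    ],
}

# Serialized, this is the JSON body POSTed to the Messages endpoint:
body = json.dumps(payload)
print(payload["model"])
```

The same payload works unchanged through the official SDKs or a raw HTTPS request, with authentication headers added by the client.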
Real-World Testing
Simon Willison, a developer with preview access, documented his experience using Opus 4.5 in Claude Code over a weekend. According to his blog post, the model was "responsible for most of the work across 20 commits, 39 files changed, 2,022 additions and 1,173 deletions in a two day period" for a major refactoring project.
Anthropic also tested Opus 4.5 on a difficult take-home exam given to prospective performance engineers. According to the company, the model scored higher than any human candidate.
Security and Safety
Anthropic emphasized improvements in robustness against prompt injection attacks. According to the company's announcement, "Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry."
However, benchmarks show single attempts at prompt injection still succeed approximately 5% of the time, with success rates increasing to roughly 33% if attackers can try ten different approaches.
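Those two figures are worth a quick sanity check: if each of ten attempts succeeded independently at the single-attempt rate of 5%, the combined success rate would be about 40%, somewhat above the roughly 33% reported, which suggests repeated attempts are not fully independent trials.

```python
# Sanity-check the reported prompt-injection rates.
# If each of 10 attempts succeeded independently with p = 0.05,
# at least one would succeed with probability 1 - (1 - p)**10.
p_single = 0.05
p_ten_independent = 1 - (1 - p_single) ** 10
print(f"{p_ten_independent:.3f}")  # ≈ 0.401, above the ~0.33 reported
```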
The company describes Opus 4.5 as its "most robustly aligned model to date" while acknowledging that no system is completely immune to sophisticated attacks.
What This Signals
The rapid succession of flagship model releases—Gemini 3, GPT-5.1, and now Opus 4.5 within two weeks—demonstrates accelerating competition among AI labs. Each company is claiming superiority on different benchmarks and use cases.
Anthropic's focus on coding benchmarks and developer tools positions Claude as the preferred choice for software engineering applications. The significant price reduction makes Opus viable for production use cases where previous pricing was prohibitive.
The expansion into Chrome, Excel, and desktop applications shows Anthropic moving beyond API-only distribution to embed Claude directly in productivity workflows.
With a $350 billion valuation following recent Microsoft and Nvidia investments, Anthropic has capital to sustain development and compete with OpenAI and Google long-term.
The Bottom Line
Claude Opus 4.5 achieves measurable improvements on coding benchmarks and delivers significant price reductions compared to previous Opus models. Being first to exceed 80% on SWE-Bench Verified provides concrete differentiation in a crowded market.
For developers already using Claude, Opus 4.5 offers better performance at lower cost. For teams evaluating AI coding assistants, the SWE-Bench results and customer testimonials from GitHub and Cursor provide evidence of production capabilities.
The timing—days after Google and OpenAI releases—demonstrates Anthropic's ability to compete at the frontier of AI development while maintaining a distinct focus on safety, reliability, and developer experience.