AI News 5 min read

Claude Sonnet 5: Anthropic's Most Agentic Mid-Tier Model Yet

Claude Sonnet 5, released June 30, 2026, is Anthropic's most agentic mid-tier model. It beats Sonnet 4.6 on every published benchmark (63.2% SWE-bench Pro, 80.4% Terminal-Bench 2.1, 81.2% OSWorld) and edges Opus 4.8 on GDPval-AA v2 knowledge work. Intro pricing is /0 per million tokens through Aug 31, 2026, then /5. A new tokenizer can raise token counts up to 1.35x, and xhigh effort can cost more than Opus 4.8.

Sarah Chen

Jul 1, 2026

Anthropic shipped Claude Sonnet 5 on June 30, 2026, and the framing is deliberate: this is the company's "most agentic Sonnet yet." Not the smartest model it makes — that's still Opus 4.8 — but the one built to plan, drive browsers and terminals, and keep its footing across long, autonomous task chains without falling over. If you live inside Claude Code, Cursor, or a homegrown agent loop, this is the release that matters most to you this month.

It's already the default for Free and Pro users on Claude.ai, selectable for Max, Team, and Enterprise, and live on the Claude Platform via the model string claude-sonnet-5.

What actually changed

Sonnet 5 is an upgrade to Sonnet 4.6, which landed in February 2026. Anthropic didn't build the launch around one headline benchmark. It built it around agentic reliability — the unglamorous stuff that decides whether an agent finishes a job or wanders off a cliff halfway through.

In practice that means longer task chains without losing the thread, better self-correction when a tool call fails, and steadier behavior across extended sessions. Anthropic also reports lower hallucination, sycophancy, and undesirable-behavior rates than 4.6. One early tester asked it to investigate a bug; it wrote a reproducing test, implemented the fix, then confirmed the bug came back without the change — all in a single pass. That's the kind of closed-loop discipline agents usually fake.

The model exposes four effort levels — low, medium, high, and xhigh. Higher effort spends more tokens on reasoning, buying quality at the cost of latency and money. That dial turns out to be the whole story on pricing, which we'll get to.

The benchmarks

Anthropic published a comparison table against Sonnet 4.6 and Opus 4.8. Sonnet 5 beats its predecessor in every tested category and closes much of the gap to the flagship.

Metric	Sonnet 4.6	Sonnet 5	Opus 4.8
Agentic coding (SWE-bench Pro)	58.1%	63.2%	69.2%
Terminal-Bench 2.1	67.0%	80.4%	not reported
Computer use (OSWorld-Verified)	78.5%	81.2%	not reported
Humanity's Last Exam (with tools)	46.8%	57.4%	57.9%
Knowledge work (GDPval-AA v2)	not reported	1,618	1,615

Two numbers deserve a second look. The 13-point jump on Terminal-Bench 2.1 (67.0% to 80.4%) is the biggest single gain, and it lines up exactly with the "drive the terminal autonomously" pitch. And on GDPval-AA v2 knowledge work, Sonnet 5's 1,618 actually edges out Opus 4.8's 1,615 — the one place the mid-tier model nudges ahead of the flagship.

On Humanity's Last Exam with tools, Sonnet 5's 57.4% nearly matches Opus 4.8's 57.9%. On SWE-bench Pro, though, Opus 4.8 still leads clearly at 69.2%. The pattern is consistent: Sonnet 5 gets you most of Opus for agentic and knowledge work, and the last few points still cost you the flagship.

The pricing catch

Here's where you need to read the fine print. Sonnet 5 launches at $2 per million input tokens and $10 per million output — but that's introductory pricing that runs only through August 31, 2026. After that it reverts to $3/$15, the same as Sonnet 4.6. For comparison, Opus 4.8 is $5/$25.

The intro price is genuinely compelling. The standard price is merely fine. Budget for $3/$15, and treat the next two months as a discount window, not the new normal.

There's a second, sneakier catch: Sonnet 5 uses an updated tokenizer (the same one introduced with Opus 4.7). The same text can map to roughly 1.0 to 1.35 times more tokens than on older models. So a naive "same price as 4.6 after August" comparison undersells the real cost — your token counts can quietly climb 15–35% on identical inputs. Model your spend on your own traffic before you migrate.

And the effort dial cuts both ways. Sonnet 5 is a strict improvement over 4.6 at every effort level, but the clear value shows up at low and medium effort, where you get quality earlier Sonnet pricing simply couldn't buy. Crank it to xhigh and the cost can exceed Opus 4.8 for comparable quality — at which point you should just use Opus.

Where it fits

The routing policy writes itself. Send the bulk of your agentic coding, tool use, and knowledge work to Sonnet 5. Reserve Opus 4.8 for accuracy-critical tasks and the hardest problems. Keep Haiku 4.5 for high-volume, latency-sensitive calls where you don't need the reasoning.

One caveat worth flagging: Sonnet 5's cyber capability is intentionally low. If you're doing sanctioned security work, Anthropic points you back to Opus. That's a deliberate safety choice, not an oversight.

Migration is close to free. The API call is identical to any other Anthropic model — you swap the model string to claude-sonnet-5 and you're done. It also carries the 1M-token context window, enough to load a full codebase into a single prompt, which pairs naturally with the long-horizon agent use case.

The Bottom Line

Claude Sonnet 5 is the most sensible default Anthropic has shipped for agents and everyday coding. It closes most of the gap to Opus 4.8, edges it on knowledge work, and — at the intro price — does it for a fraction of the cost. The asterisks are real, though: the $2/$10 window closes August 31, the new tokenizer can inflate your token counts by up to 35%, and xhigh effort erases the savings. Run the math on your own workload, ride the discount while it lasts, and keep Opus in your back pocket for the tasks where the last few points actually matter.

anthropic ai-models ai-agents benchmarks reasoning-models

More in AI News

AI News

Grok 4.3: xAI's Frontier Model Hits Amazon Bedrock

Grok 4.3 is generally available on Amazon Bedrock with a 1M-token context window, $1.25/$2.50 pricing, and a top hallucination-rate score.

By Sarah Chen · 4 min · Jun 29, 2026

AI News

GLM-5.2: Zhipu's Open-Weight Model Beats GPT-5.5 at 1/6 the Cost

Z.AI released GLM-5.2 on June 16, 2026: a 753B-parameter MoE model under an MIT license with a 1M-token context. It tops open-weight coding benchmarks, beating GPT-5.5 on SWE-bench Pro, FrontierSWE and PostTrainBench at roughly one-sixth the cost.

By Sarah Chen · 5 min · Jun 26, 2026

AI News

MAI-Thinking-1: Microsoft's First In-House Reasoning Model

Microsoft unveiled MAI-Thinking-1 at Build 2026, its first reasoning model trained in-house without distillation. The 35B-active, ~1T-total MoE has a 256k context window, scores 97.0% on AIME 2025 and matches Claude Opus 4.6 on SWE-Bench Pro. It's in private preview on Microsoft Foundry.

By Sarah Chen · 5 min · Jun 23, 2026