DeepSeek V4 Pro: 1.6T Open-Weights Model Hits #2 on the Index

Sarah Chen
Apr 29, 2026

A year after DeepSeek R1 rattled Silicon Valley with a $5M training budget that embarrassed labs spending billions, the Hangzhou-based outfit is back at the front of the pack. On April 24, 2026, DeepSeek released V4 Pro and V4 Flash under an MIT license — its first new architecture since V3, and its first two-tier release.

The headline result: V4 Pro is the #2 open-weights reasoning model on the Artificial Analysis Intelligence Index, behind only Moonshot's Kimi K2.6.

A 10-Point Jump in Intelligence

Numbers, not vibes. Artificial Analysis benchmarked V4 Pro at 52 on the Intelligence Index in Max Effort reasoning mode — a 10-point gain over V3.2's 42. V4 Flash lands at 47, putting a 284B-parameter model roughly at Claude Sonnet 4.6 (Max)-level intelligence.

Model                    | Intelligence Index | Total / Active Params | Context
Kimi K2.6                | 54                 | (open weights)        |
DeepSeek V4 Pro          | 52                 | 1.6T / 49B            | 1M tokens
DeepSeek V4 Flash        | 47                 | 284B / 13B            | 1M tokens
DeepSeek V3.2            | 42                 | 671B / 37B            | 128K
Claude Sonnet 4.6 (Max)  | ~47                | (closed)              |

V4 Pro is DeepSeek's largest model to date — a 1.6T parameter mixture-of-experts that activates 49B per token. The architecture is a clean break from V3's 671B / 37B layout. Both V4 variants are hybrid thinking/non-thinking models with a 1-million-token context window, an 8x expansion over V3.2's 128K.

"V4 introduces a new architecture... DeepSeek's first two-tier lineup, with Pro positioned for maximum capability and Flash for faster, lower-cost inference." — Artificial Analysis

Where V4 Pro Actually Wins: Agents

The Intelligence Index is one number. The more interesting story is GDPval-AA, Artificial Analysis's benchmark for agentic real-world work tasks. V4 Pro scored 1554, finishing ahead of the entire open-weights field:

  • DeepSeek V4 Pro: 1554
  • GLM-5.1: 1535
  • MiniMax-M2.7: 1514
  • Kimi K2.6: 1484
  • GLM-5: 1402
  • DeepSeek V4 Flash: 1388

That's not a small margin. On real, scored work — software engineering, data analysis, document workflows — V4 Pro is the best open-weights model available. If you're building agents and you cannot ship closed weights, this is now the model to beat.

The Pricing Trade-Off

V4 Pro costs $1.74 / $3.48 per 1M input/output tokens on DeepSeek's first-party API. V4 Flash is dramatically cheaper at $0.14 / $0.28 per 1M.

Running the full Intelligence Index evaluation cost $1,071 with V4 Pro: more than 4x cheaper than Claude Opus 4.7 ($4,811), but noticeably more expensive than its open peers Kimi K2.6 ($948), GLM-5.1 ($544), V3.2 ($71), and gpt-oss-120B ($67).

The reason isn't the per-token rate — it's token sprawl. V4 Pro emitted 190M output tokens to finish the eval. V4 Flash burned 240M. Both are reasoning-heavy, both think out loud, and both will inflate your bill if you turn Max Effort on for everything.
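The arithmetic is easy to sanity-check. Here is a minimal sketch using the list prices quoted above; the model keys and helper function are illustrative, not an official SDK, and input tokens are left at zero because the article only reports the output-token counts:

```python
# Sketch: estimate API spend from token counts and the list prices above.
# The function is generic; the example uses the 190M output tokens reported
# for V4 Pro's eval run. Input tokens are omitted (not broken out here).

PRICES = {
    "v4-pro":   {"input": 1.74, "output": 3.48},   # $ per 1M tokens
    "v4-flash": {"input": 0.14, "output": 0.28},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Output tokens alone account for roughly $661 of V4 Pro's $1,071 eval bill:
print(round(estimate_cost("v4-pro", 0, 190_000_000), 2))  # 661.2
```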

The Catch: 94% Hallucination Rate

Here's the part that should make every production team pause. On Artificial Analysis's AA-Omniscience benchmark, V4 Pro scored -10 — an 11-point improvement over V3.2's -21, driven mostly by higher accuracy. V4 Flash scored -23, roughly tied with V3.2.

But the hallucination rate is brutal: 94% for V4 Pro, 96% for V4 Flash. That number measures how often the model answers anyway when it doesn't know the answer. Translation: V4 will confidently make things up almost every time it hits a knowledge gap.

This is the headline risk for V4 in production. If your application surfaces model outputs to end users without a retrieval layer or a verification step, V4 will embarrass you. Pair it with retrieval. Add a verifier. Don't trust freestanding facts.
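The mitigation isn't exotic. Below is a minimal sketch of the retrieve-then-verify gate described above, with the retrieval store and model call stubbed out as placeholders and a deliberately crude term-overlap check standing in for a real verifier; nothing in it is DeepSeek-specific:

```python
# Sketch of a grounded-answer gate: only surface the model's answer if the
# passages it was shown actually contain most of the terms it uses.
# retrieve() and call_model() are stand-ins for your own retrieval layer
# and inference client; the overlap check is intentionally crude.

from typing import List

def retrieve(query: str) -> List[str]:
    # Placeholder: swap in your vector store or search index.
    return ["DeepSeek released V4 Pro and V4 Flash under an MIT license on April 24, 2026."]

def call_model(prompt: str) -> str:
    # Placeholder: swap in your inference client.
    return "V4 Pro and V4 Flash were released under an MIT license."

def grounded_answer(query: str, min_overlap: float = 0.5) -> str:
    passages = retrieve(query)
    prompt = "Answer ONLY from the context below.\n\n" + "\n".join(passages) + "\n\nQ: " + query
    answer = call_model(prompt)
    context = " ".join(passages).lower()
    terms = [w for w in answer.lower().split() if len(w) > 4]
    overlap = sum(t in context for t in terms) / max(len(terms), 1)
    if overlap < min_overlap:
        return "Not enough supporting context to answer confidently."
    return answer

print(grounded_answer("What license are the V4 models released under?"))
```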

Architecture and License

The V4 paper introduces a hybrid attention scheme — Compressed Sparse Attention (CSA) combined with Heavily Compressed Attention (HCA) — designed to push down inference cost on long contexts. DeepSeek reports V4 Pro needs roughly 27% of single-token inference FLOPs and 10% of the KV cache versus V3.2 at equivalent positions.
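Taken at face value, the KV-cache number is the one that matters for long-context serving. A rough back-of-the-envelope follows, in which every shape is a made-up placeholder (DeepSeek's actual cache layout isn't covered here) and only the 10% ratio comes from the reported figure:

```python
# Back-of-the-envelope KV-cache sizing at a 1M-token position.
# All shape numbers are placeholders, not DeepSeek's published config;
# only the ~10% ratio is taken from the article's summary of the V4 paper.

def kv_cache_gib(tokens, layers, kv_dim_per_layer, bytes_per_value=2):
    return tokens * layers * kv_dim_per_layer * bytes_per_value / 2**30

baseline = kv_cache_gib(tokens=1_000_000, layers=61, kv_dim_per_layer=1024)  # hypothetical V3.2-like layout
v4_like = baseline * 0.10                                                    # the reported ~10% figure
print(f"baseline ~{baseline:.0f} GiB, V4-like ~{v4_like:.0f} GiB per request")
```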

Both models are MIT-licensed and available immediately on DeepSeek's first-party API. Hugging Face weights are live for both V4 Pro and V4 Flash, and most major inference providers (Together, Fireworks, OpenRouter) are expected to host within days.
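DeepSeek's existing first-party API speaks the OpenAI-compatible chat format; assuming V4 keeps that convention, a minimal call would look like the sketch below. The model identifier is a guess for illustration, not a confirmed name:

```python
# Minimal chat call against DeepSeek's OpenAI-compatible endpoint.
# "deepseek-v4-pro" is an assumed model id for illustration only.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # assumption: replace with the published model id
    messages=[{"role": "user", "content": "Summarize this diff and flag risky changes: ..."}],
)
print(resp.choices[0].message.content)
```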

Both models remain text-input/text-output only — no vision, no audio. If you need multimodal, look at NVIDIA's Nemotron 3 Nano Omni or Qwen 3.6 VL.

Who Should Use What

Use V4 Pro if you're building agents, you can absorb the token cost, and you have retrieval or verification in front of it. The GDPval-AA result is the strongest agentic signal from any open-weights model right now.

Use V4 Flash if you want roughly Claude Sonnet 4.6 intelligence at $0.28 per million output tokens. It's the cheapest "good enough" reasoning model in the open-weights catalog, full stop.

Don't use either for raw factual recall without grounding. The hallucination rate isn't a rounding error — it's a design choice.

The Bottom Line

DeepSeek V4 doesn't reclaim the open-weights crown — Kimi K2.6 still holds that on intelligence — but it puts DeepSeek back at the negotiating table after a year of being lapped. V4 Pro is the best open-weights agent model that's been released, and V4 Flash is the cheapest competent reasoner money can buy. The architecture is new, the license is permissive, and the 1M context is finally table stakes.

Just don't ask it questions you can't verify.