MiniMax M3: Open-Weight Frontier Coding Model With 1M Context

MiniMax M3 is an open-weight model pairing a 1M-token context and revived sparse attention with frontier coding benchmarks at 15x lower cost than Claude Opus 4.7.

Sarah Chen
Jun 16, 2026

For years, the verdict on Chinese open-weight models versus Western flagships was "close enough on price, not quite there on quality." On June 1, 2026, MiniMax shipped a model that argues the quality gap has narrowed to a sliver, at a price more than 15× lower.

What MiniMax Actually Shipped

MiniMax M3 is the Shanghai lab's new flagship, and the pitch is unusually concrete: MiniMax claims it is the first open-weight model to combine three capabilities in a single architecture: frontier-level coding, a 1-million-token context window, and native multimodality (image and video understanding).

You can use M3 today through three channels: the standard API, MiniMax's monthly token plans, and MiniMax Code, the company's agent product. The catch worth flagging up front: at launch the model is hosted-only. MiniMax says the open weights and a full technical report will land on Hugging Face and GitHub within roughly ten days — so the "open-weight" label is a promise the company hasn't fully cashed yet.

Watch huggingface.co/MiniMaxAI for a model card titled MiniMax-M3.

The Real Story Is the Attention Architecture

The headline isn't the benchmarks — it's how M3 gets them. The model is built on MiniMax Sparse Attention (MSA), and the interesting part is that MiniMax abandoned sparse attention for its entire M2 generation before bringing it back here.

A quick refresher on why this matters:

  • Full attention lets every token in the context attend to every other token. It's accurate but expensive, and the cost scales quadratically as context grows.
  • Sparse attention skips most of those connections, focusing compute only on the tokens that matter.
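
To make the scaling difference concrete, here is a rough sketch that counts query-key score computations for causal full attention versus a block-sparse scheme. The block size and the number of blocks each query keeps are illustrative assumptions, not published M3 settings, and the count ignores whatever the block-selection step itself costs.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by causal full attention (token i sees tokens 0..i)."""
    return n_tokens * (n_tokens + 1) // 2


def sparse_attention_pairs(n_tokens: int, block_size: int, blocks_per_query: int) -> int:
    """Upper bound on pairs when each query attends to at most `blocks_per_query` blocks."""
    cap = block_size * blocks_per_query  # most keys any single query can see
    if n_tokens <= cap:
        # Context is shorter than the cap, so sparsity saves nothing.
        return full_attention_pairs(n_tokens)
    # The first `cap` tokens behave like full attention; every later token sees `cap` keys.
    return full_attention_pairs(cap) + (n_tokens - cap) * cap


if __name__ == "__main__":
    n = 1_000_000
    full = full_attention_pairs(n)
    # Hypothetical settings: 64-token blocks, 32 blocks kept per query.
    sparse = sparse_attention_pairs(n, block_size=64, blocks_per_query=32)
    print(f"full: {full:,}  sparse: {sparse:,}  ratio: {full / sparse:.0f}x")
```

With those made-up settings, the sparse count at 1M tokens is roughly 240 times smaller than the full count. That number illustrates the linear-versus-quadratic gap only; it is not a reconstruction of MiniMax's reported figures.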

M3's design uses a lightweight index branch to scan incoming tokens and select which blocks of past tokens deserve attention, then runs attention only on those blocks. Crucially, it does block selection on the real, uncompressed key-values rather than a compressed approximation — which is how MiniMax claims to keep precision while cutting cost.
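
The select-then-attend idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not MiniMax's implementation: the scoring rule (the best query-key dot product inside each block) is a stand-in, since the announcement does not say how the index branch scores blocks, and the function names are hypothetical. The one property it mirrors from the description is that attention runs over the original keys and values of the chosen blocks.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, keys, block_size, top_k):
    """Return the indices of the top_k highest-scoring key blocks, in positional order.

    Assumed scoring rule: a block's score is the largest query-key dot product in it.
    """
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        scores.append((max(dot(query, k) for k in block), b))
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(b for _, b in best)


def sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to the selected blocks, using the original keys/values."""
    chosen = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
            for d in range(dim)]
```

In this toy the selection pass touches every key, so it saves nothing by itself; a real index branch would need to be far cheaper than the attention it replaces for the approach to pay off.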

This is a public self-correction worth appreciating. In its own M2 engineering notes, the MiniMax team wrote that "the infrastructure for linear and sparse attention is much less mature" than full attention. A year later, they shipped production sparse attention with order-of-magnitude speedups. The bet paid off.

The efficiency numbers

All figures are measured at 1M-token context against the prior generation:

Metric                          | MiniMax M3
Per-token compute at 1M context | 1/20 of prior generation
Prefill (processing input)      | >9× faster
Decoding (generating output)    | >15× faster
Output speed                    | ~100 tokens/sec
Context window                  | Up to 1M tokens (512K guaranteed minimum)

The business angle here is sharper than "big context window." Most teams don't need to dump an entire monorepo into one prompt. What they need is long-context inference that's cheap enough to run in a loop — and cutting per-token compute to a twentieth is what makes long-running agents economical instead of a demo-only luxury.
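
A back-of-the-envelope sketch shows why loops are the expensive case. It assumes the simplest possible agent: every step re-reads the whole context, and the context grows by a fixed amount per step. The function name and the example numbers are illustrative, and the model ignores prompt caching.

```python
def loop_input_tokens(initial_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens processed when each step re-reads the full, growing context."""
    # Step s (0-based) reads initial_context + s * tokens_added_per_step tokens,
    # so the total is an arithmetic series.
    return steps * initial_context + tokens_added_per_step * steps * (steps - 1) // 2


if __name__ == "__main__":
    # Hypothetical run: 200K-token starting context, 2K tokens added per step, 50 steps.
    print(f"{loop_input_tokens(200_000, 2_000, 50):,} input tokens")
```

Under those assumptions a 50-step run processes 12,450,000 input tokens, more than 60 times the starting context, which is why per-token cost at long context dominates the bill.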

The Benchmarks — and the Asterisk

On coding and agentic tasks, M3's self-reported numbers put it in frontier company:

Benchmark          | MiniMax M3 | Measures
SWE-Bench Pro      | 59.0%      | Real-world software fixes
Terminal-Bench 2.1 | 66.0%      | Command-line agent tasks
SWE-fficiency      | 34.8%      | Efficient code changes
KernelBench Hard   | 28.8%      | Low-level kernel optimization
MCP Atlas          | 74.2%      | Tool use via MCP
BrowseComp         | 83.5       | Web search and browsing

The standout claims: on SWE-Bench Pro, M3's 59.0% beats GPT-5.5 and Gemini 3.1 Pro and approaches Claude Opus 4.7. On BrowseComp, its 83.5 actually surpasses Opus 4.7's 79.3.

Here's the honest caveat, and MiniMax is upfront about it: several of these results were run on MiniMax's own infrastructure with agent scaffolding (tools like Claude Code, Mini-SWE-Agent, or Terminus). A model that looks strong on a curated leaderboard can behave very differently inside a messy internal repo. M3 also isn't yet on the neutral DeepSWE board for long-horizon software tasks.

Bottom line on benchmarks: strong enough to take seriously, not yet independent enough to bet production on. Pilot it; don't migrate blind.
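
One simple way to structure such a pilot is to run both models on the same internal tasks and compare them task by task, not just by overall pass rate. The sketch below assumes you already have a pass/fail result per task for each model; the function and key names are hypothetical.

```python
def compare_pilot(baseline_passed, candidate_passed):
    """Compare two models on the same task list; inputs are per-task pass/fail booleans."""
    if len(baseline_passed) != len(candidate_passed) or not baseline_passed:
        raise ValueError("both result lists must be non-empty and the same length")
    n = len(baseline_passed)
    pairs = list(zip(baseline_passed, candidate_passed))
    return {
        "baseline_pass_rate": sum(baseline_passed) / n,
        "candidate_pass_rate": sum(candidate_passed) / n,
        # Tasks only the candidate solved, and tasks only the baseline solved.
        "candidate_only": sum(1 for b, c in pairs if c and not b),
        "baseline_only": sum(1 for b, c in pairs if b and not c),
    }
```

The two "only" counts matter as much as the headline rates: a candidate with a higher pass rate can still regress on tasks your current model handles.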

Price Is the Loudest Argument

This is where M3 stops being interesting and starts being disruptive. API input pricing starts around $0.30 per million tokens, dropping to a blended ~$0.06 per million with cache optimization. Output runs roughly $1.20 per million during the launch promo.

For comparison, Claude Opus 4.7 runs about $5 per million input and $25 per million output. If M3's quality holds up under independent testing, that's more than 15× cheaper on input and about 20× cheaper on output at the promo rate, with open weights on top.
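
The ratios are easy to check from the list prices quoted above. The sketch below uses only those figures; the workload in the usage note (1M input tokens, 100K output tokens) is an arbitrary example, and cache discounts are ignored.

```python
# Prices in USD per million tokens, as quoted in this article.
M3_INPUT, M3_OUTPUT = 0.30, 1.20        # M3 output is the launch-promo rate
OPUS_INPUT, OPUS_OUTPUT = 5.00, 25.00


def job_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one job at the given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


input_ratio = OPUS_INPUT / M3_INPUT      # about 16.7
output_ratio = OPUS_OUTPUT / M3_OUTPUT   # about 20.8

if __name__ == "__main__":
    m3 = job_cost(1_000_000, 100_000, M3_INPUT, M3_OUTPUT)
    opus = job_cost(1_000_000, 100_000, OPUS_INPUT, OPUS_OUTPUT)
    print(f"M3: ${m3:.2f}  Opus 4.7: ${opus:.2f}  ratio: {opus / m3:.1f}x")
```

For that example job the totals come to about $0.42 versus $7.50, a blended gap of roughly 18×.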

For heavier users, MiniMax sells monthly token plans: $20 Plus (~1.7B tokens), $50 Max (~5.1B tokens), and $120 Ultra (~9.8B tokens).
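
Taking the approximate allowances at face value, the effective per-token price of each plan works out as follows. This is simple division on the figures above; how MiniMax actually meters plan tokens (for example, cached versus uncached input) is not stated here, so treat the results as rough.

```python
PLANS = {  # name: (monthly price in USD, approximate tokens included)
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_million(price_usd, tokens):
    """Effective price in USD per million tokens if the full allowance is used."""
    return price_usd / (tokens / 1_000_000)


effective = {name: usd_per_million(price, tokens) for name, (price, tokens) in PLANS.items()}

if __name__ == "__main__":
    for name, value in effective.items():
        print(f"{name}: ${value:.4f} per million tokens")
```

All three land near one cent per million tokens, and on these numbers the middle Max tier is the cheapest per token, while Ultra is slightly more expensive per token than Plus.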

One Honest Limitation: "Open" Has Fine Print

Don't confuse open-weight with open-source. Open-weight means you can download and run the files. Open-source, strictly, also means the license lets anyone use, modify, and redistribute the work for any purpose, commercial use included.

MiniMax's earlier M2 shipped under a modified-MIT license, but the M2.7 license restricts commercial use without prior written authorization. If M3 follows that precedent, expect downloadable weights with a non-commercial default and enterprise licensing sold separately. Until the technical report drops, this is unsettled — and it matters a lot if you're planning to build a product on top.

The Bottom Line

If your workload genuinely needs 1M-token context (whole-codebase analysis, multi-document research agents, long-running session memory), MiniMax M3 is the model to test first, because it's the one engineered to make that context affordable. For standard coding pipelines, pilot it against your current model, but wait for independent benchmarks and the license terms before committing.

The bigger signal is strategic: Chinese open-weight labs keep widening the cost advantage, and now they're doing it with architecture the closed labs haven't publicly matched. The premium on Western flagships gets harder to defend with every launch like this one.