MiniMax M3: Open-Weight Frontier Coding Model With 1M Context
For years, the gap between Chinese open-weight models and Western flagships was "close enough on price, not quite there on quality." On June 1, 2026, MiniMax shipped a model that argues the quality gap has narrowed to a sliver, at an input price more than 15× lower than a Western flagship's.
What MiniMax Actually Shipped
MiniMax M3 is the Shanghai lab's new flagship, and the pitch is unusually concrete: it claims to be the first open-weight model to combine three frontier capabilities in a single architecture — frontier-level coding, a 1-million-token context window, and native multimodality (image and video understanding).
You can use M3 today through three channels: the standard API, MiniMax's monthly token plans, and MiniMax Code, the company's agent product. The catch worth flagging up front: at launch the model is hosted-only. MiniMax says the open weights and a full technical report will land on Hugging Face and GitHub within roughly ten days — so the "open-weight" label is a promise the company hasn't fully cashed yet.
The repo to watch is huggingface.co/MiniMaxAI, for a model card titled MiniMax-M3.
The Real Story Is the Attention Architecture
The headline isn't the benchmarks — it's how M3 gets them. The model is built on MiniMax Sparse Attention (MSA), and the interesting part is that MiniMax abandoned sparse attention for its entire M2 generation before bringing it back here.
A quick refresher on why this matters:
- Full attention lets every token in the context attend to every other token. It's accurate but expensive, and the cost scales quadratically as context grows.
- Sparse attention skips most of those connections, focusing compute only on the tokens that matter.
M3's design uses a lightweight index branch to scan incoming tokens and select which blocks of past tokens deserve attention, then runs attention only on those blocks. Crucially, it does block selection on the real, uncompressed key-values rather than a compressed approximation — which is how MiniMax claims to keep precision while cutting cost.
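The select-then-attend structure described above can be sketched in a few lines of plain Python. This is a toy illustration, not MiniMax's implementation: the block scoring rule (best query-key dot product inside each block), the function names, and the `block_size` and `top_k` parameters are all assumptions made for the example. The toy scorer also touches every key, so it shows the control flow rather than the real savings of a lightweight index branch.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_blocks(query, keys, block_size, top_k):
    """Toy index branch: score each block of past keys, keep the top_k blocks.

    Assumption: a block's score is the largest query-key dot product inside
    it, computed on the real (uncompressed) keys. The scoring function M3
    actually uses is not described in the article.
    """
    n_blocks = math.ceil(len(keys) / block_size)
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        scored.append((max(dot(query, k) for k in block), b))
    best = sorted(scored, reverse=True)[:top_k]
    return sorted(b for _, b in best)

def sparse_attention(query, keys, values, block_size=4, top_k=2):
    """Run softmax attention over the selected blocks only."""
    blocks = select_blocks(query, keys, block_size, top_k)
    idx = [
        i
        for b in blocks
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, blocks

# Eight past tokens in two blocks; the query matches the second block.
keys = [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
values = [[0.0]] * 4 + [[1.0]] * 4
output, chosen = sparse_attention([1.0, 0.0], keys, values, block_size=4, top_k=1)
print(chosen, output)
```

With `n` past tokens, the attention step here reads at most `top_k * block_size` keys per query instead of all `n`, which is where the saving over full attention comes from once the block scorer itself is cheap.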
This is a public self-correction worth appreciating. In its own M2 engineering notes, the MiniMax team wrote that "the infrastructure for linear and sparse attention is much less mature" than full attention. A year later, they shipped production sparse attention with order-of-magnitude speedups. The bet paid off.
The efficiency numbers
The relative figures below are measured at 1M-token context against the prior generation; output speed and context window are absolute values:
| Metric | MiniMax M3 |
|---|---|
| Per-token compute at 1M context | 1/20 of prior generation |
| Prefill (processing input) | >9× faster |
| Decoding (generating output) | >15× faster |
| Output speed | ~100 tokens/sec |
| Context window | Up to 1M tokens (512K guaranteed minimum) |
The business angle here is sharper than "big context window." Most teams don't need to dump an entire monorepo into one prompt. What they need is long-context inference that's cheap enough to run in a loop — and cutting per-token compute to a twentieth is what makes long-running agents economical instead of a demo-only luxury.
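A back-of-the-envelope calculation shows why the 1/20 figure matters for loops. The sketch below uses arbitrary compute units and a deliberately crude assumption: every agent step processes the full context from scratch, with no caching. The 50-step loop and the function names are hypothetical; only the 20× reduction comes from the table above.

```python
# Arbitrary units: M3 costs 1 unit per token, the prior generation 20,
# matching the "1/20 of prior generation" figure at 1M context.
M3_COST = 1
PRIOR_GEN_COST = 20

def loop_compute(steps, context_tokens, cost_per_token):
    """Total compute for an agent that re-reads its whole context each step."""
    return steps * context_tokens * cost_per_token

def affordable_steps(budget, context_tokens, cost_per_token):
    """How many full-context steps a fixed compute budget pays for."""
    return budget // (context_tokens * cost_per_token)

context = 1_000_000
budget = loop_compute(50, context, PRIOR_GEN_COST)  # hypothetical 50-step loop

print(affordable_steps(budget, context, PRIOR_GEN_COST))
print(affordable_steps(budget, context, M3_COST))
```

Under these assumptions, the budget that buys 50 full-context steps on the prior generation buys 1,000 on M3. Real deployments with prompt caching will see a different ratio.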
The Benchmarks — and the Asterisk
On coding and agentic tasks, M3's self-reported numbers put it in frontier company:
| Benchmark | MiniMax M3 | Measures |
|---|---|---|
| SWE-Bench Pro | 59.0% | Real-world software fixes |
| Terminal-Bench 2.1 | 66.0% | Command-line agent tasks |
| SWE-fficiency | 34.8% | Efficient code changes |
| KernelBench Hard | 28.8% | Low-level kernel optimization |
| MCP Atlas | 74.2% | Tool use via MCP |
| BrowseComp | 83.5 | Web search and browsing |
The standout claims: on SWE-Bench Pro, M3's 59.0% beats GPT-5.5 and Gemini 3.1 Pro and approaches Claude Opus 4.7. On BrowseComp, its 83.5 actually surpasses Opus 4.7's 79.3.
Here's the honest caveat, and MiniMax discloses it themselves: several of these results were run on MiniMax's own infrastructure with agent scaffolding (tools like Claude Code, Mini-SWE-Agent, or Terminus). A model that looks strong on a curated leaderboard can behave very differently inside a messy internal repo. M3 also isn't yet on the neutral DeepSWE board for long-horizon software tasks.
Bottom line on benchmarks: strong enough to take seriously, not yet independent enough to bet production on. Pilot it; don't migrate blind.
Price Is the Loudest Argument
This is where M3 stops being interesting and starts being disruptive. API input pricing starts around $0.30 per million tokens, dropping to a blended ~$0.06 per million with cache optimization. Output runs roughly $1.20 per million during the launch promo.
For comparison, Claude Opus 4.7 runs about $5 per million input and $25 per million output. If M3's quality holds up under independent testing, that's more than 15× cheaper on input — and open-weight on top of it.
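To check the multiple yourself, the arithmetic is short. The prices below are the ones quoted in this article (M3's output price is the launch-promo rate); the 10M-input, 1M-output job is a made-up workload for illustration, and the function names are hypothetical.

```python
# USD per million tokens, as quoted in this article.
PRICES = {
    "MiniMax M3": {"input": 0.30, "output": 1.20},      # output: launch promo
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
}

def price_ratio(kind):
    """How many times more Opus 4.7 costs than M3 for 'input' or 'output'."""
    return PRICES["Claude Opus 4.7"][kind] / PRICES["MiniMax M3"][kind]

def job_cost(model, input_tokens, output_tokens):
    """List-price cost in USD for one job, ignoring caching discounts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(f"input ratio:  {price_ratio('input'):.1f}x")
print(f"output ratio: {price_ratio('output'):.1f}x")
for model in PRICES:
    print(model, f"${job_cost(model, 10_000_000, 1_000_000):.2f}")
```

At these list prices the gap is roughly 16.7× on input and 20.8× on output, so the "more than 15×" claim holds on both sides, before any cache discount.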
For heavier users, MiniMax sells monthly token plans: $20 Plus (~1.7B tokens), $50 Max (~5.1B tokens), and $120 Ultra (~9.8B tokens).
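The plans are easier to compare as an effective price per million tokens. The sketch below uses the plan prices and approximate token allowances quoted above; because the allowances are rounded ("~"), treat the results as rough. The article does not say how plan tokens are counted (input, output, or cached), so these rates are not directly comparable to the API list prices.

```python
# (monthly price in USD, approximate token allowance) as quoted above.
PLANS = {
    "Plus": (20, 1_700_000_000),
    "Max": (50, 5_100_000_000),
    "Ultra": (120, 9_800_000_000),
}

def usd_per_million(price_usd, tokens):
    """Effective price per million tokens if the allowance is fully used."""
    return price_usd / (tokens / 1_000_000)

rates = {name: usd_per_million(price, tokens) for name, (price, tokens) in PLANS.items()}
for name, rate in rates.items():
    print(f"{name}: ${rate:.4f} per million tokens")
```

On these numbers the mid-tier Max plan has the lowest effective rate (about $0.0098 per million tokens), with Plus and Ultra both near $0.012, assuming the full allowance is consumed.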
One Honest Limitation: "Open" Has Fine Print
Don't confuse open-weight with open-source. Open-weight means you can download and run the model weights. Open-source, in the strict sense, also requires a license that lets anyone use, modify, and redistribute the work for any purpose, commercial use included.
MiniMax's earlier M2 shipped under a modified-MIT license, but the M2.7 license restricts commercial use without prior written authorization. If M3 follows that precedent, expect downloadable weights with a non-commercial default and enterprise licensing sold separately. Until the technical report drops, this is unsettled — and it matters a lot if you're planning to build a product on top.
The Bottom Line
If your workload genuinely needs 1M-token context (whole-codebase analysis, multi-document research agents, long-running session memory), MiniMax M3 is the model to test first, because it's the one engineered to make that context affordable. For standard coding pipelines, pilot it against your current model but wait for independent benchmarks and the license before committing.
The bigger signal is strategic: Chinese open-weight labs keep widening the cost advantage, and now they're doing it with architecture the closed labs haven't publicly matched. The premium on Western flagships gets harder to defend with every launch like this one.


