GLM-5.2: Zhipu's Open-Weight Model Beats GPT-5.5 at 1/6 the Cost
AI News 5 min read intermediate

GLM-5.2: Zhipu's Open-Weight Model Beats GPT-5.5 at 1/6 the Cost

Z.AI released GLM-5.2 on June 16, 2026: a 753B-parameter MoE model under an MIT license with a 1M-token context. It tops open-weight coding benchmarks, beating GPT-5.5 on SWE-bench Pro, FrontierSWE and PostTrainBench at roughly one-sixth the cost.

Sarah Chen
Sarah Chen
Jun 26, 2026

On June 16, 2026, Beijing-based Z.AI (formerly Zhipu AI) released GLM-5.2, and the open-weight world shifted under our feet again. This is not another "good for a free model" story. On several of the hardest long-horizon coding benchmarks, GLM-5.2 edges past OpenAI's GPT-5.5 — and it does so at roughly one-sixth of the cost. The weights ship under a plain MIT license, which means you can download them, run them on your own hardware, and never pay a per-token bill again.

If 2025 was the year open weights caught up to the mid-tier, GLM-5.2 is the moment they started trading blows with the frontier.

What Z.AI actually shipped

GLM-5.2 is a 753-billion-parameter Mixture-of-Experts (MoE) model with roughly 40 billion parameters active per token. It carries a 1-million-token context window — about five times the effective working memory of its predecessor, GLM-5.1 — and can emit up to 128K tokens of output in a single response. The weights are published on both Hugging Face and ModelScope, and the model runs on the usual open inference stacks: transformers, vLLM, SGLang, and ktransformers.

Two architectural choices stand out. The first, which Z.AI calls IndexShare, reuses a single lightweight indexer across every four sparse-attention layers, cutting per-token compute by roughly 2.9x at full 1M context — the trick that makes a million-token window economically sane rather than a marketing number. The second is an improved multi-token-prediction layer for speculative decoding that lifts acceptance length by up to 20%, which is a fancy way of saying it generates tokens faster.

The model also ships with the table stakes that actually matter for agents: thinking mode, function calling, structured output, context caching, and native MCP integration.

The benchmarks that matter

Standard chatbot leaderboards are noisy. The interesting story is in long-horizon coding — the multi-step, multi-hour engineering tasks where models usually fall apart. Here is how GLM-5.2 stacks up against GPT-5.5, per independent benchmarks reported by VentureBeat:

Benchmark GLM-5.2 GPT-5.5
SWE-bench Pro 62.1 58.6
FrontierSWE 74.4% 72.6%
PostTrainBench 34.3% 25.0%
SWE-Marathon 13.0% 12.0%
Terminal-Bench 2.1 81.0

The gap on PostTrainBench is the one to watch: a 9-point lead on a benchmark built around sustained, real-world engineering trajectories. GLM-5.2 is now the top-ranked open-weight model on long-horizon coding, sitting just behind Claude Opus 4.8 among all models, open or closed.

A note of caution worth keeping: these are the numbers Z.AI and early testers are reporting, and SWE-Marathon's 13.0-vs-12.0 margin is well inside the noise. The headline is not "GLM-5.2 is the best coder alive." It is that an MIT-licensed model is now genuinely in the same conversation.

The cost story is the real headline

Capability you can match is interesting. Capability you can match at a fraction of the price is disruptive.

Hosted GLM-5.2 runs about $1.40 per million input tokens and $4.40 per million output tokens. GPT-5.5 lists at $5.00 input and $30.00 output. On a typical workload that is roughly one-sixth the price — before you account for the option to self-host and pay nothing per token at all.

For a startup running an autonomous coding agent through thousands of iterations a day, that is the difference between a viable product and a burned runway. And because the weights are open under MIT, the pricing ceiling is your own GPU bill, not a vendor's roadmap.

Why MIT matters more than the score

Plenty of "open" models arrive wrapped in custom licenses full of usage carve-outs. GLM-5.2 ships under the MIT license — the same permissive, four-paragraph license that governs countless production codebases. Commercial use is allowed, redistribution is allowed, and there is no acceptable-use appendix waiting to trip up your legal team.

That is a deliberate strategic move. Z.AI is not trying to win a single benchmark cycle; it is trying to make GLM the default substrate that other companies build on, the way Meta hoped Llama would be. An MIT license is how you do that.

The catch

Self-hosting a 753B-parameter model is not a weekend project. Even with MoE sparsity keeping active parameters around 40B, you need serious multi-GPU hardware to serve it at full 1M context. For most teams, the practical path is a hosted provider like FriendliAI or Novita — which still lands you at one-sixth of GPT-5.5's price, just without the "zero marginal cost" dream.

And as always, benchmark leadership is a snapshot. The frontier labs ship monthly. The durable advantage here is not the score; it is the license and the price floor it sets for everyone else.

The Bottom Line

GLM-5.2 is the clearest sign yet that the open-weight frontier is no longer a generation behind. It matches or beats GPT-5.5 on the coding benchmarks that reflect real engineering work, it does so at roughly one-sixth the cost, and it hands you the weights under an MIT license to do with as you please. The headline is not that one model dethroned another — it is that the price of frontier-grade coding intelligence just collapsed, and an open model set the new floor.