AI News 5 min read intermediate

GLM-5.2: Zhipu's Open-Weight Model Beats GPT-5.5 at 1/6 the Cost

Z.AI released GLM-5.2 on June 16, 2026: a 753B-parameter MoE model under an MIT license with a 1M-token context. It tops open-weight coding benchmarks, beating GPT-5.5 on SWE-bench Pro, FrontierSWE and PostTrainBench at roughly one-sixth the cost.

Sarah Chen

Jun 26, 2026

On June 16, 2026, Beijing-based Z.AI (formerly Zhipu AI) released GLM-5.2, and the open-weight world shifted under our feet again. This is not another "good for a free model" story. On several of the hardest long-horizon coding benchmarks, GLM-5.2 edges past OpenAI's GPT-5.5 — and it does so at roughly one-sixth of the cost. The weights ship under a plain MIT license, which means you can download them, run them on your own hardware, and never pay a per-token bill again.

If 2025 was the year open weights caught up to the mid-tier, GLM-5.2 is the moment they started trading blows with the frontier.

What Z.AI actually shipped

GLM-5.2 is a 753-billion-parameter Mixture-of-Experts (MoE) model with roughly 40 billion parameters active per token. It carries a 1-million-token context window — about five times the effective working memory of its predecessor, GLM-5.1 — and can emit up to 128K tokens of output in a single response. The weights are published on both Hugging Face and ModelScope, and the model runs on the usual open inference stacks: transformers, vLLM, SGLang, and ktransformers.

Two architectural choices stand out. The first, which Z.AI calls IndexShare, reuses a single lightweight indexer across every four sparse-attention layers, cutting per-token compute by roughly 2.9x at full 1M context — the trick that makes a million-token window economically sane rather than a marketing number. The second is an improved multi-token-prediction layer for speculative decoding that lifts acceptance length by up to 20%, which is a fancy way of saying it generates tokens faster.

The model also ships with the table stakes that actually matter for agents: thinking mode, function calling, structured output, context caching, and native MCP integration.

The benchmarks that matter

Standard chatbot leaderboards are noisy. The interesting story is in long-horizon coding — the multi-step, multi-hour engineering tasks where models usually fall apart. Here is how GLM-5.2 stacks up against GPT-5.5, per independent benchmarks reported by VentureBeat:

Benchmark	GLM-5.2	GPT-5.5
SWE-bench Pro	62.1	58.6
FrontierSWE	74.4%	72.6%
PostTrainBench	34.3%	25.0%
SWE-Marathon	13.0%	12.0%
Terminal-Bench 2.1	81.0	—

The gap on PostTrainBench is the one to watch: a 9-point lead on a benchmark built around sustained, real-world engineering trajectories. GLM-5.2 is now the top-ranked open-weight model on long-horizon coding, sitting just behind Claude Opus 4.8 among all models, open or closed.

A note of caution worth keeping: these are the numbers Z.AI and early testers are reporting, and SWE-Marathon's 13.0-vs-12.0 margin is well inside the noise. The headline is not "GLM-5.2 is the best coder alive." It is that an MIT-licensed model is now genuinely in the same conversation.

The cost story is the real headline

Capability you can match is interesting. Capability you can match at a fraction of the price is disruptive.

Hosted GLM-5.2 runs about $1.40 per million input tokens and $4.40 per million output tokens. GPT-5.5 lists at $5.00 input and $30.00 output. On a typical workload that is roughly one-sixth the price — before you account for the option to self-host and pay nothing per token at all.

For a startup running an autonomous coding agent through thousands of iterations a day, that is the difference between a viable product and a burned runway. And because the weights are open under MIT, the pricing ceiling is your own GPU bill, not a vendor's roadmap.

Why MIT matters more than the score

Plenty of "open" models arrive wrapped in custom licenses full of usage carve-outs. GLM-5.2 ships under the MIT license — the same permissive, four-paragraph license that governs countless production codebases. Commercial use is allowed, redistribution is allowed, and there is no acceptable-use appendix waiting to trip up your legal team.

That is a deliberate strategic move. Z.AI is not trying to win a single benchmark cycle; it is trying to make GLM the default substrate that other companies build on, the way Meta hoped Llama would be. An MIT license is how you do that.

The catch

Self-hosting a 753B-parameter model is not a weekend project. Even with MoE sparsity keeping active parameters around 40B, you need serious multi-GPU hardware to serve it at full 1M context. For most teams, the practical path is a hosted provider like FriendliAI or Novita — which still lands you at one-sixth of GPT-5.5's price, just without the "zero marginal cost" dream.

And as always, benchmark leadership is a snapshot. The frontier labs ship monthly. The durable advantage here is not the score; it is the license and the price floor it sets for everyone else.

The Bottom Line

GLM-5.2 is the clearest sign yet that the open-weight frontier is no longer a generation behind. It matches or beats GPT-5.5 on the coding benchmarks that reflect real engineering work, it does so at roughly one-sixth the cost, and it hands you the weights under an MIT license to do with as you please. The headline is not that one model dethroned another — it is that the price of frontier-grade coding intelligence just collapsed, and an open model set the new floor.

open-weights llm benchmarks mixture-of-experts ai-coding-agents

More in AI News

AI News

MAI-Thinking-1: Microsoft's First In-House Reasoning Model

Microsoft unveiled MAI-Thinking-1 at Build 2026, its first reasoning model trained in-house without distillation. The 35B-active, ~1T-total MoE has a 256k context window, scores 97.0% on AIME 2025 and matches Claude Opus 4.6 on SWE-Bench Pro. It's in private preview on Microsoft Foundry.

By Sarah Chen · 5 min · Jun 23, 2026

AI News

Mistral: The Industrial AI Pivot Behind Airbus and BMW Deals

Mistral AI used its May 2026 AI Now Summit to pivot toward industrial engineering, announcing a physics-AI stack, the Emmi acquisition, partnerships with Airbus, BMW (crash simulation) and ASML, the unified Vibe agent, and a 10 MW Les Ulis inference data center opening Q3 2026.

By Sarah Chen · 5 min · Jun 19, 2026

AI News

Meta Business Agent: Now Global on WhatsApp & Instagram

On June 3, 2026, Meta made Meta Business Agent globally available to businesses of all sizes across WhatsApp, Messenger, and Instagram. The agent answers questions, recommends catalog products, books appointments, qualifies leads, and closes sales, with human handoff. A new Business Agent Platform connects to hundreds of systems like Shopify, Zendesk, and Shopee. It's free to start, with token-based pricing for larger businesses.

By Sarah Chen · 5 min · Jun 17, 2026