For two years, the rule was simple: Flash models are cheap. Pro models are smart. Pick your trade-off and pay the bill. Google just broke that rule.
Released at Google I/O on May 19, 2026, Gemini 3.5 Flash is the first Flash-tier model that beats its own Pro counterpart on the benchmarks that actually look like work — agentic coding, tool use, multi-step automation. It is also three times more expensive per token than the Flash model it replaces. That contradiction tells you everything about where Google thinks the market is going.
The benchmarks that matter
Google did not bury the numbers. On Terminal-Bench 2.1, the agentic terminal coding suite, 3.5 Flash hits 76.2% against Gemini 3.1 Pro's 70.3%. On MCP Atlas, the multi-step tool-orchestration benchmark, it scores 83.6% — leading Claude Opus 4.7 by 4.5 points and GPT-5.5 by 8.3.
The pattern repeats across the workloads that translate to dollars: Finance Agent v2 jumps from 43.0% to 57.9%, and GDPval-AA climbs from 1314 Elo to 1656. On output speed, the model runs roughly 4x faster than other frontier models per second.
The Flash brand used to mean "the model you use when latency matters more than answers." With 3.5, it means "the model you use when agents matter more than chat."
It is not a uniform win. On reasoning-only benchmarks where there is no tool loop to exploit, 3.1 Pro still leads. Humanity's Last Exam: 40.2% vs 44.4%. ARC-AGI-2: 72.1% vs 77.1%. If your workload is pure deduction with no environment to act on, Pro is still the call.
The pricing reset nobody saw coming
Here is where the story gets uncomfortable. The new pricing is:
| Tier | Input ($/1M tokens) | Output ($/1M tokens) | Cached input |
|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 |
| Gemini 3.1 Pro | $2.00 | $12.00 | — |
| Gemini 3 Flash (prior) | $0.50 | $3.00 | — |
Read it twice. 3.5 Flash is 25% cheaper than 3.1 Pro, but roughly 3x more expensive than the Flash model it replaces. Google is not subsidizing Flash anymore — it is repositioning it.
The cached-input rate is the genuine bargain: $0.15 per million tokens, a 90% cut from the standard input rate. For retrieval-augmented workloads where the same system prompt or document context hits the model thousands of times, this is the line item that changes the unit economics.
One more catch worth flagging: Google bills internal thinking tokens at the output rate. A reasoning-heavy job with thinking: high will run up a bill far larger than the sticker price suggests. The new minimal, low, medium, high knobs are not cosmetic — they are cost controls.
The context window is the headline
Specs that matter for builders:
- 1,048,576 input tokens — over one million in a single call
- 65,536 maximum output tokens
- Knowledge cutoff: January 2025
- Full multimodal input: text, image, video, audio
- Available in the Gemini app, AI Mode in Google Search, the Gemini API, Google Antigravity, and Vertex AI
A 1M-token context plus a 65K-token output ceiling is what an agent needs to read a real codebase, plan a multi-step refactor, and write back the resulting changes in one round trip. That is what the benchmark numbers reflect — and it is the reason the price climbed.
Who this is actually for
If you are running a chat product on Flash-tier economics, this release stings. The cheap workhorse you priced your unit margins around just tripled in cost. Google's bet is that you will eat it, because the agent capability is now worth more than the cost delta.
If you are building agents — terminal automation, browser control, multi-tool workflows — this is the moment Flash graduates. MCP Atlas at 83.6% means the failure rate on tool sequences just dropped enough to ship products that previously fell apart in production.
If you are doing pure reasoning research, stay on 3.1 Pro until Gemini 3.5 Pro ships — Google has confirmed it is internal-only today and rolling out in June 2026.
The Bottom Line
Gemini 3.5 Flash is not a faster, cheaper Flash. It is a new product category: a Pro-tier agentic model wearing a Flash badge, priced between the two tiers it disrupts. The naming is misleading on purpose. Google wants developers to default to 3.5 Flash for everything that involves tools, and reserve Pro for the long-tail of pure reasoning work.
The real question is what comes next. If 3.5 Flash already beats 3.1 Pro on agent benchmarks, what does 3.5 Pro look like when it lands next month? Either it leaps the frontier again, or it quietly confirms that Flash is now the line that matters — and Pro is the prestige tier nobody actually deploys.
Either answer reshapes the price-per-token wars that defined 2024 and 2025. Watch June.


