For nine years, nearly every serious frontier model has been a transformer with dense attention, and all of them have been bumping against the same wall: doubling the input quadruples the attention work. A Miami startup called Subquadratic says it has a model that gets past that wall, and on May 5, 2026, it walked out of stealth with $29 million in seed funding and a flagship model named SubQ.
The pitch is audacious: a 12-million-token context window — roughly 9 million words, or about 120 books — running at a fraction of the cost of dense-attention frontier models. The AI research community is not sure whether to call it the biggest architectural shift since the transformer or a marketing exercise dressed up as physics. Both camps have a point.
The numbers Subquadratic put on the table
Co-founders Justin Dangel (CEO) and Alexander Whedon (CTO, ex-Head of Generative AI at Meta) describe SubQ as a transformer built on Subquadratic Sparse Attention (SSA). Instead of comparing every token with every other token, the model picks, for each token, a subset of the most relevant tokens and computes attention only inside that subset. The math is what the name promises: compute that grows roughly linearly with sequence length rather than with its square.
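To make "pick a subset, attend inside it" concrete, here is a minimal, hypothetical sketch of one generic family of sparse attention, block-sparse top-k selection. Subquadratic has not published how SSA chooses its subset, so this is not SubQ's method. Keys are grouped into contiguous blocks, each query scores only a cheap per-block summary (the mean key, an assumption made here), and exact attention runs over the few highest-scoring blocks. The function names, `block_size`, and `top_k_blocks` are illustrative; causal masking and multiple heads are omitted, and plain Python lists are used for readability.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Scaled dot-product attention for a single query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, x in enumerate(v):
            out[j] += (w / total) * x
    return out


def dense_attention(queries, keys, values):
    """Dense attention: every query is scored against every key (n * n dot products)."""
    outputs = [attend(q, keys, values) for q in queries]
    return outputs, len(queries) * len(keys)


def block_sparse_attention(queries, keys, values, block_size, top_k_blocks):
    """Hypothetical block-sparse attention (illustrative, not SubQ's unpublished SSA).

    1. Split the keys into contiguous blocks and summarize each block by its mean key.
    2. For each query, score only the block summaries and keep the top_k_blocks best.
    3. Run ordinary attention over the keys inside the selected blocks only.
    Returns the outputs and the number of query-key dot products computed.
    """
    n, d = len(keys), len(keys[0])
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    summaries = [[sum(keys[i][j] for i in blk) / len(blk) for j in range(d)] for blk in blocks]

    outputs, dot_products = [], 0
    for q in queries:
        block_scores = [dot(q, s) for s in summaries]  # cheap selection pass
        dot_products += len(summaries)
        ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
        selected = [i for b in ranked[:top_k_blocks] for i in blocks[b]]
        outputs.append(attend(q, [keys[i] for i in selected], [values[i] for i in selected]))
        dot_products += len(selected)  # exact attention inside the chosen subset
    return outputs, dot_products


if __name__ == "__main__":
    random.seed(0)
    n, d = 256, 8
    Q, K, V = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)] for _ in range(3))
    _, dense_ops = dense_attention(Q, K, V)
    _, sparse_ops = block_sparse_attention(Q, K, V, block_size=16, top_k_blocks=4)
    print(f"dense: {dense_ops} dot products, block-sparse: {sparse_ops}")
```

With n = 256, 16 blocks of 16 keys, and 4 blocks kept per query, the script prints `dense: 65536 dot products, block-sparse: 20480`. The design point is that selection itself has to be cheap: if the block size B grows like √n while `top_k_blocks` (k) stays fixed, each query costs about n/B + k·B, or roughly (1 + k)√n dot products, so total work is O(n√n). That is subquadratic but not linear; reaching the "roughly linear" scaling SubQ claims needs an even cheaper selection step, which this sketch does not attempt to reproduce.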
Their headline claims:
| Metric | SubQ (company claim) | Baseline |
|---|---|---|
| Max context | 12,000,000 tokens | ~1–2M tokens for frontier cloud models (Claude Opus 4.7, Gemini 3.1 Pro) |
| Speed at 1M tokens | ~50× faster | Frontier dense-attention models |
| Cost at 1M tokens | ~50× cheaper | Frontier dense-attention models |
| Compute at 12M tokens | ~1,000× less | Dense attention at the same context length |
| RULER 128K accuracy | 95% at ~$8 | Claude Opus: 94% at ~$2,600 |
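Taken at face value, the 12M-token compute figure can be turned into a rough attention budget. The worked equation below assumes a simple cost model that Subquadratic has not confirmed: dense attention computes about n² query-key scores per layer, a sparse scheme computes about n·k (each of n tokens attends to k others), selection overhead is ignored, and attention dominates total compute. The first expression is the same model behind "doubling the input quadruples the work."

```latex
\frac{C_{\text{dense}}(2n)}{C_{\text{dense}}(n)} = \frac{(2n)^2}{n^2} = 4,
\qquad
\frac{C_{\text{dense}}(n)}{C_{\text{sparse}}(n)} \approx \frac{n^2}{n\,k} = \frac{n}{k}

\frac{n}{k} = 1000,\quad n = 1.2 \times 10^{7}
\;\Longrightarrow\;
k \approx 1.2 \times 10^{4}\ \text{tokens attended per query}
```

Read this way, the 1,000× claim implies each token attends to roughly 12,000 others out of 12 million. The ~50× speed and cost figures at 1M tokens cannot be inverted the same way, since they describe whole-model behavior, which includes feed-forward layers whose cost already grows linearly with n.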
"The fundamental scaling laws imposed by the transformer architecture and dense attention have been broken through," Dangel told SiliconANGLE.
The investor lineup is unusually loud for a seed round: Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex. This is the round you raise when you want benchmark numbers to travel.
Why a 12M context window actually matters
If SubQ holds up, the immediate beneficiary is every product that currently leans on retrieval-augmented generation (RAG), agentic retrieval, and prompt-curation gymnastics to squeeze information into a 128K- or 1M-token window. Those systems work, but they add latency, add a second failure mode, and bias what the model gets to see.
Whedon framed the problem bluntly: "I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows. That is kind of a waste of human intelligence and also limiting to the product quality."
A model that can swallow an entire codebase, a full legal corpus, or a quarter's worth of meeting transcripts in a single pass doesn't kill RAG — but it changes the cost calculus for a lot of products that exist mostly to work around the context limit.
Two SubQ products ship with the launch:
- SubQ API — direct access to the 12M-token window for developers and enterprise teams.
- SubQ Code — a CLI coding agent that loads a whole repository into a single context window, so a developer can plan, execute, and review across a codebase without orchestrating multiple agents.
A free SubQ Search product is also coming, hinting at the land-and-expand strategy you'd expect from a long-context bet. Access today is waitlist-only — the model is not public, and Dangel said it will not be open-weight or open-source in the near term.
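For a sense of whether a given repository would even approach a 12M-token window, a character-count estimate is usually enough. The sketch below is hypothetical: it assumes about 4 characters per token (a common rule of thumb for English text under BPE tokenizers; source code often tokenizes less efficiently, and SubQ's tokenizer is unpublished), counts only files with the listed extensions, and skips hidden directories. It does not reflect how SubQ Code actually selects or loads files.

```python
import os

# Illustrative assumptions, not SubQ parameters: ~4 characters per token is a
# rough rule of thumb, and the extension list is an arbitrary choice.
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 12_000_000  # SubQ's advertised maximum context
SOURCE_EXTENSIONS = (".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".h", ".md")


def estimate_tokens(text):
    """Rough token estimate from character count (ceiling division)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def estimate_repo_tokens(root):
    """Walk a directory tree and sum estimated tokens over source-like files."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git or .venv in place (an assumption
        # about what a repo-loading agent would skip).
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(SOURCE_EXTENSIONS):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"~{tokens:,} tokens ({tokens / CONTEXT_TOKENS:.1%} of a 12M-token window)")
```

Run from a repository root, it prints an order-of-magnitude estimate; an exact count would require the model's own tokenizer.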
The skeptics are not being unreasonable
The response from researchers within hours of launch was, charitably, spirited. AI commentator Dan McAteer summed up the mood: "SubQ is either the biggest breakthrough since the Transformer… or it's AI Theranos."
The substantive objections are worth taking seriously:
- Provenance. Engineer Will Depue initially speculated that SubQ is "almost surely a sparse attention finetune of Kimi or DeepSeek." Subquadratic has not published architectural details or weights that would let anyone check.
- Single-run benchmarks. Each model was evaluated only once per benchmark because of inference cost, and the company's own paper concedes that its SWE-Bench margin is "harness as much as model."
- Lab vs. production gap. On MRCR v2, the research model scored 83, but the production model scored 65.9 in third-party verification: a 17-point gap between what the paper reports and what actually ships.
- Cost transparency. Subquadratic has not publicly disclosed API pricing, so the cost-per-task comparisons against Opus are not independently verifiable.
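Why single runs matter can be made concrete with a back-of-the-envelope noise estimate. The sketch below treats each benchmark item as an independent pass/fail trial and the two models' scores as independent samples (a simplification), then checks whether an accuracy gap falls inside a ~95% normal-approximation interval. The 500-item test size in the example is a hypothetical placeholder, since RULER 128K's item count is not given here, and the calculation captures only item-sampling noise, not the run-to-run variance that a single run cannot measure at all.

```python
import math


def accuracy_standard_error(p, n_items):
    """Binomial standard error of an accuracy p measured on n_items independent items."""
    return math.sqrt(p * (1 - p) / n_items)


def gap_within_noise(p_a, p_b, n_items, z=1.96):
    """True if |p_a - p_b| falls inside a ~95% normal-approximation interval for the difference."""
    se_diff = math.hypot(accuracy_standard_error(p_a, n_items),
                         accuracy_standard_error(p_b, n_items))
    return abs(p_a - p_b) < z * se_diff


if __name__ == "__main__":
    # Hypothetical: 500 items per model; not the real RULER 128K item count.
    print(gap_within_noise(0.95, 0.94, n_items=500))  # prints True
```

Under that assumption, the 1-point RULER accuracy edge (95% vs. 94%) sits well inside the noise band, which is why the row's cost gap, not its accuracy gap, is the claim that needs independent replication.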
Not everyone is dismissive. AI researcher John Rysana pushed back on the Theranos framing, arguing the work is "just subquadratic attention done well which is very meaningful for long context workloads," and that "odds of it being BS are extremely low."
The bottom line
Sparse attention is not new; it has appeared in academic papers for years. What is new is (a) a well-funded team claiming to have made it work at frontier scale, and (b) the willingness to claim 1,000× less compute in a press release before the model is in anyone else's hands. Until the API opens up and independent labs can replicate the RULER 128K and MRCR v2 numbers on a fixed harness, treat the headline figures as marketing, not science.
But the underlying problem — quadratic attention is the load-bearing wall of the entire LLM economy — is real, and the first company to credibly knock it down will reshape pricing across the stack. SubQ deserves the scrutiny it is getting. It also deserves to be checked, not laughed off. The next move belongs to whoever gets API access first.


