Meta just drew a line under the Llama era. Muse Spark, the first model out of Meta Superintelligence Labs, launched on April 8, 2026 — and it signals a fundamentally different approach to how the company builds and ships AI.
This is not a Llama successor. It is not open-weight. And it is, by several measures, the most capable free frontier model available today.
The Backstory Matters
Muse Spark exists because Mark Zuckerberg was reportedly unhappy with the progress of Meta's AI efforts. The response was dramatic: Meta invested $14.3 billion in Scale AI for a 49% stake, recruited Scale's co-founder Alexandr Wang to lead a new division called Meta Superintelligence Labs, and gave him a mandate to start from scratch.
Wang, who became the world's youngest self-made billionaire at 24, stepped down as Scale AI CEO but remained on the board. Nine months later, Muse Spark — internally codenamed Avocado — is the result.
What Muse Spark Can Do
The model accepts text, voice, and image inputs but produces text-only output. It runs in two modes today, with a third coming:
- Instant mode — fast answers for straightforward questions
- Thinking mode — complex reasoning with multiple subagents working in parallel
- Contemplating mode — coming soon — extended reasoning comparable to Gemini Deep Think
Thinking mode is where things get interesting. Meta describes it as deploying multiple subagents simultaneously — for example, when planning a trip, the system can draft itineraries, compare destinations, and identify activities in parallel rather than sequentially.
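To make the parallel-versus-sequential distinction concrete, here is a minimal sketch using Python's `asyncio`. It is an illustration only, not Meta's implementation: the `run_subagent` helper, the subtask names, and the simulated delays are all hypothetical stand-ins.

```python
import asyncio
import time


async def run_subagent(task: str, seconds: float) -> str:
    """Hypothetical subagent: sleeps to simulate work, then reports back."""
    await asyncio.sleep(seconds)
    return f"{task}: done"


async def plan_trip_parallel() -> list:
    # Hypothetical subtasks, borrowed from the trip-planning example above.
    subtasks = [
        ("draft itineraries", 0.2),
        ("compare destinations", 0.2),
        ("identify activities", 0.2),
    ]
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(run_subagent(t, s) for t, s in subtasks))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(plan_trip_parallel())
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three simulated subtasks run concurrently, the total wall-clock time is roughly that of the slowest subtask rather than the sum of all three.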
The Benchmarks Tell a Nuanced Story
Muse Spark scores 52 on the Artificial Analysis Intelligence Index v4.0, placing it in the top 5 behind GPT-5.4 (57), Gemini 3.1 Pro (57), and Claude Opus 4.6 (53). For a free model, that is remarkable.
But the headline numbers hide where Muse Spark truly excels — and where it falls short.
Where it leads or comes close:
| Benchmark | Muse Spark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| HealthBench Hard | 42.8% | 40.1% | 14.8% | 20.6% |
| MMMU-Pro (vision) | 80.5% | — | — | 82.4% |
The HealthBench result is striking. Muse Spark edges out GPT-5.4 on medical reasoning by 2.7 points and scores nearly three times as high as Claude Opus 4.6. Meta says the model was developed with physician input, and the numbers suggest that investment paid off.
On multimodal perception via MMMU-Pro, Muse Spark ranks second among the models with reported scores, trailing Gemini 3.1 Pro by less than two percentage points.
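The comparisons above are simple arithmetic on the table, and a short Python snippet makes them easy to check. The scores are copied from the table; the variable names are just for illustration.

```python
# HealthBench Hard scores (percent), copied from the table above.
healthbench_hard = {
    "Muse Spark": 42.8,
    "GPT-5.4": 40.1,
    "Claude Opus 4.6": 14.8,
    "Gemini 3.1 Pro": 20.6,
}
# MMMU-Pro scores (percent); only the two reported values are included.
mmmu_pro = {"Muse Spark": 80.5, "Gemini 3.1 Pro": 82.4}

muse = healthbench_hard["Muse Spark"]
# How many times higher Muse Spark scores than each competitor.
ratios = {
    name: round(muse / score, 2)
    for name, score in healthbench_hard.items()
    if name != "Muse Spark"
}
# Gap to Gemini 3.1 Pro on MMMU-Pro, in percentage points.
mmmu_gap = round(mmmu_pro["Gemini 3.1 Pro"] - mmmu_pro["Muse Spark"], 1)

print(ratios)
print(mmmu_gap)
```

The ratio against Claude Opus 4.6 works out to about 2.9, the ratio against Gemini 3.1 Pro to about 2.1, and the MMMU-Pro gap to 1.9 points.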
Where it trails:
| Benchmark | Muse Spark | GPT-5.4 | Claude Sonnet 4.6 |
|---|---|---|---|
| Terminal-Bench (coding) | 59.0 | 75.1 | — |
| GDPval-AA (agentic) | 1,427 | 1,676 | 1,648 |
Coding and agentic tasks remain weak spots. Meta acknowledges gaps in "long-horizon agentic systems and coding workflows." If you need a model for software engineering, Muse Spark is not your best option today.
Token Efficiency Is the Sleeper Advantage
One number jumped out of the Artificial Analysis evaluation: Muse Spark used just 58 million output tokens to complete the full Intelligence Index benchmark suite. For comparison, Claude Opus 4.6 used 157 million tokens and GPT-5.4 used 120 million.
Fewer tokens for comparable performance means faster responses and lower computational cost at scale. For Meta, which will deploy this across WhatsApp, Instagram, Facebook, Messenger, and AI glasses, that efficiency is not academic — it is the difference between viable and prohibitively expensive.
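To put the token counts in proportion, the following snippet computes how many output tokens each model used relative to Muse Spark. The figures are the ones quoted above; the variable names are illustrative.

```python
# Output tokens (millions) used on the Intelligence Index, as quoted above.
output_tokens_m = {"Muse Spark": 58, "GPT-5.4": 120, "Claude Opus 4.6": 157}

baseline = output_tokens_m["Muse Spark"]
# Each model's token usage as a multiple of Muse Spark's.
token_ratios = {
    model: round(tokens / baseline, 2)
    for model, tokens in output_tokens_m.items()
}

for model, ratio in token_ratios.items():
    print(f"{model}: {output_tokens_m[model]}M tokens, {ratio}x Muse Spark")
```

On these figures, GPT-5.4 used roughly 2.1 times as many output tokens as Muse Spark, and Claude Opus 4.6 roughly 2.7 times as many.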
16 Built-In Tools
Muse Spark ships with a surprisingly complete tool set, as documented by developer Simon Willison. Grouped by function, the tools cover:
- Search and browse — web search, page opening, and content finding
- Meta content search — semantic search across Instagram, Threads, and Facebook
- Image generation — artistic and realistic modes in multiple aspect ratios
- Python execution — sandboxed Python 3.9 with pandas, NumPy, matplotlib, scikit-learn, and more
- Visual grounding — object detection with bounding box, point, and count formats
- Subagent spawning — the model can create specialized sub-agents for complex tasks
- Third-party integrations — Google and Outlook calendars, Gmail
The Meta content search is unique to this model. No competitor can run a semantic search across Instagram, Threads, and Facebook as part of a conversation.
The Open-Source Question
Here is the elephant in the room: Muse Spark is not open-weight. After years of positioning Llama as the open-source alternative to GPT and Claude, Meta has released a proprietary model. API access is currently limited to a private preview for select partners.
Wang has indicated that future versions may be open-sourced. But for now, the only way to use Muse Spark is through Meta's own apps and the meta.ai website, which requires a Facebook or Instagram login.
This is a strategic pivot. Meta is no longer trying to win the open-source AI race. It is trying to win the consumer AI race — and it is willing to keep its best model closed to do it.
The Bottom Line
Muse Spark is a strong first showing from Meta Superintelligence Labs. The medical reasoning benchmarks are best-in-class, the vision capabilities are near the top, and the token efficiency suggests a model built with deployment scale in mind. The weaknesses in coding and agentic tasks are real but unsurprising for a consumer-focused model.
The bigger story is what Muse Spark represents: Meta betting that the next phase of AI competition is not about open weights or developer mindshare, but about embedding intelligence so deeply into consumer products that switching costs become insurmountable. Whether that bet pays off depends on whether Contemplating mode and future Muse models can close the gap on the benchmarks where Spark still trails.


