Sarah Chen

AI researcher and tech journalist covering the frontier of machine intelligence. Previously at MIT Tech Review.

50 articles

GLM-5.2: Zhipu's Open-Weight Model Beats GPT-5.5 at 1/6 the Cost

Z.ai released GLM-5.2 on June 16, 2026: a 753B-parameter MoE model under an MIT license with a 1M-token context. It tops open-weight coding benchmarks, beating GPT-5.5 on SWE-Bench Pro, FrontierSWE and PostTrainBench at roughly one-sixth the cost.

By Sarah Chen · 5 min · Jun 26, 2026

AI News

MAI-Thinking-1: Microsoft's First In-House Reasoning Model

Microsoft unveiled MAI-Thinking-1 at Build 2026, its first reasoning model trained in-house without distillation. The 35B-active, ~1T-total MoE has a 256k context window, scores 97.0% on AIME 2025 and matches Claude Opus 4.6 on SWE-Bench Pro. It's in private preview on Microsoft Foundry.

By Sarah Chen · 5 min · Jun 23, 2026

AI News

Mistral: The Industrial AI Pivot Behind Airbus and BMW Deals

Mistral AI used its May 2026 AI Now Summit to pivot toward industrial engineering, announcing a physics-AI stack, the Emmi acquisition, partnerships with Airbus, BMW (crash simulation) and ASML, the unified Vibe agent, and a 10 MW Les Ulis inference data center opening Q3 2026.

By Sarah Chen · 5 min · Jun 19, 2026

AI News

Meta Business Agent: Now Global on WhatsApp & Instagram

On June 3, 2026, Meta made Meta Business Agent globally available to businesses of all sizes across WhatsApp, Messenger, and Instagram. The agent answers questions, recommends catalog products, books appointments, qualifies leads, and closes sales, with human handoff. A new Business Agent Platform connects to hundreds of systems like Shopify, Zendesk, and Shopee. It's free to start, with token-based pricing for larger businesses.

By Sarah Chen · 5 min · Jun 17, 2026

AI News

MiniMax M3: Open-Weight Frontier Coding Model With 1M Context

MiniMax M3 is an open-weight model pairing a 1M-token context and revived sparse attention with frontier coding benchmarks at 15x lower cost than Claude Opus 4.7.

By Sarah Chen · 6 min · Jun 16, 2026

AI News

Anthropic IPO: The $965B Filing That Beat OpenAI to Wall Street

On June 1, 2026, Anthropic confidentially filed a draft S-1 with the SEC at a roughly $965B valuation, backed by a $65B raise and a ~$47B May run-rate. OpenAI followed on June 8. Both target public listings as soon as fall 2026.

By Sarah Chen · 5 min · Jun 15, 2026

AI News

Kimi K2.7-Code: A 30% Token Cut With a Benchmark Asterisk

Moonshot AI's Kimi K2.7-Code is an open-weights, OpenAI-compatible coding model (1T-param MoE, 32B active, 256K context) claiming a 30% cut in reasoning tokens and a narrow win over Claude Opus 4.8. But all published benchmarks are Moonshot's own proprietary suites, with no independent results yet, so the efficiency claims remain unverified.

By Sarah Chen · 5 min · Jun 14, 2026

AI News

Apple Siri: Why Apple Is Paying Google $1B for Gemini

At WWDC 2026, Apple unveiled a rebuilt Siri powered by a custom, Apple-tuned Google Gemini model, reportedly a 1.2-trillion-parameter mixture-of-experts system for which Apple pays roughly $1 billion a year. On-device Apple Silicon models handle quick private tasks, while complex reasoning routes to the Gemini model inside Apple's Private Cloud Compute, with a contract barring Google from training on Apple user data.

By Sarah Chen · 5 min · Jun 11, 2026

AI News

Gemma 4 12B: Google's Encoder-Free Multimodal Laptop Model

Google released Gemma 4 12B on June 3, 2026, a multimodal open model with an encoder-free architecture that feeds vision and audio directly into the LLM backbone. It runs locally on 16GB of memory, approaches the 26B MoE on benchmarks, uses Multi-Token Prediction drafters for low latency, and ships under Apache 2.0 with broad tooling support.

By Sarah Chen · 5 min · Jun 9, 2026

AI News

MAI-Code-1-Flash: Microsoft's Lean Coding Model Hits Copilot

Microsoft launched MAI-Code-1-Flash on June 2, 2026, a lightweight, agentic coding model built end-to-end in-house and rolling out to GitHub Copilot users in VS Code. It outperforms Claude Haiku 4.5 across four coding benchmarks (including 51.2% vs 35.2% on SWE-Bench Pro) while using up to 60% fewer tokens, signaling Microsoft's push for AI independence from OpenAI.

By Sarah Chen · 5 min · Jun 6, 2026

AI News

DeepSeek V4-Pro: 75% Price Cut Becomes Permanent

On May 22, 2026, DeepSeek made its 75% promotional discount on V4-Pro permanent rather than letting it expire May 31. New permanent rates: $0.435/M input, $0.87/M output, $0.003625/M cache hit. That puts V4-Pro output roughly 34x cheaper than GPT-5.5 and 17x cheaper than Claude Opus 4.7, while landing within 3-7 points on coding and reasoning benchmarks. The underrated detail is the cache-hit price, which can cut input cost ~88% for agents with stable prefixes. Teams should re-run their build math and route the easy majority of traffic to V4-Pro.

By Sarah Chen · 5 min · Jun 1, 2026

AI News

Claude Opus 4.8: Anthropic's Honest, Parallel-Agent Flagship

Anthropic released Claude Opus 4.8 on May 28, 2026, 41 days after Opus 4.7. It scores 69.2% on SWE-Bench Pro, emphasizes calibrated honesty and longer autonomy, adds Dynamic Workflows for hundreds of parallel subagents, runs fast mode ~2.5x quicker, and holds pricing flat from 4.7.

By Sarah Chen · 4 min · May 30, 2026

Reviews

Vivago Video Agent: A Swarm of AI Directors Replaces Your Prompt

Vivago Video Agent uses a swarm of AI directors to generate 1-minute 1080p videos from a single storyline.

By Sarah Chen · 4 min · May 29, 2026

AI News

Gemini 3.5 Flash: Google's Flash Tier Eats Pro on Agent Benchmarks

Gemini 3.5 Flash outperforms the Pro tier on agent benchmarks with superior speed and efficiency.

By Sarah Chen · 5 min · May 28, 2026

AI News

Gemini Spark: Google's 24/7 Agent Runs Even When You Close Your Laptop

Gemini Spark is Google's 24/7 agent that continues working even when your laptop is closed.

By Sarah Chen · 6 min · May 27, 2026

AI News

Qwen3.7-Max: Alibaba's 35-Hour Agent Run Resets the Frontier

Alibaba's Qwen3.7-Max agent achieved a 35-hour autonomous run, setting new performance and cost benchmarks.

By Sarah Chen · 5 min · May 25, 2026

Reviews

PollyReach Review: AI Voice Agent With a Real Phone Number

PollyReach provides AI agents with real phone numbers, enabling multi-language calls and skill distribution.

By Sarah Chen · 7 min · May 20, 2026

AI News

Gemini Intelligence: Google Moves AI From the App to the Android OS

Google's Gemini Intelligence brings OS-level AI to Android, transforming how devices integrate artificial intelligence.

By Sarah Chen · 5 min · May 19, 2026

AI News

Claude for Small Business: Anthropic Targets 36M U.S. SMBs

Anthropic's 'Claude for Small Business' integrates AI into SMB tools like QuickBooks, targeting 36M U.S. small businesses.

By Sarah Chen · 6 min · May 17, 2026

AI News

SubQ: The 12M-Token Subquadratic LLM Splitting AI Researchers

SubQ is a new 12M-token subquadratic LLM claiming massive context and low compute, sparking debate among researchers.

By Sarah Chen · 5 min · May 16, 2026

AI News

Lightfield: The AI-Native CRM Tome's Founders Built Next

Lightfield is an AI-native CRM by Tome's founders, using agents to automate sales tasks like prospecting and coaching.

By Sarah Chen · 5 min · May 15, 2026

AI News

GPT-Realtime-2: OpenAI's Voice Model Gets GPT-5 Reasoning

OpenAI's GPT-Realtime-2 voice model now boasts GPT-5 reasoning and advanced features.

By Sarah Chen · 6 min · May 14, 2026

AI News

Claude Dreaming: Anthropic's Agents Now Learn While They Sleep

Anthropic's Claude agents now 'dream' to learn and improve task completion overnight.

By Sarah Chen · 5 min · May 13, 2026

AI News

Kimi K2.6: Moonshot's Open-Weights Model Beats GPT-5.4 on SWE-Bench Pro

Moonshot's Kimi K2.6, an open-weights model, surpasses GPT-5.4 on SWE-Bench Pro.

By Sarah Chen · 6 min · May 12, 2026

AI News

Codex 3.0: OpenAI's Autonomous Build-Test-Debug Loop Hits Product Hunt

OpenAI's Codex 3.0 offers an autonomous build-test-debug loop powered by GPT-5.5.

By Sarah Chen · 5 min · May 11, 2026

AI News

GPT-5.5-Cyber: OpenAI Hands Verified Defenders a Less-Restricted Model

OpenAI's GPT-5.5-Cyber, a less-restricted model, is now available to vetted cyber defenders.

By Sarah Chen · 6 min · May 8, 2026

AI News

Anthropic's $1.5B AI Services Firm Takes Aim at Big Consulting

Anthropic launches a $1.5B AI services firm, directly challenging big consulting.

By Sarah Chen · 6 min · May 7, 2026

AI News

Vision Banana: DeepMind Beats SAM 3 and Depth Anything V3

DeepMind's Vision Banana outperforms leading models, suggesting generation is key for vision pretraining.

By Sarah Chen · 4 min · May 6, 2026

AI News

GPT-5.5: OpenAI's First Full Retrain Since GPT-4.5 Bets on Agents

OpenAI's GPT-5.5 is a fully retrained model, focusing on agentic computer use, not just benchmarks.

By Sarah Chen · 5 min · May 5, 2026

AI News

Mistral Medium 3.5: 128B Open-Weight Model That Opens PRs

Mistral Medium 3.5 is a powerful 128B open-weight model capable of opening GitHub pull requests.

By Sarah Chen · 7 min · May 4, 2026

AI News

Microsoft Agent 365: $15-Per-Seat Control Plane for Your AI Agents

Microsoft Agent 365 offers a control plane to observe, govern, and secure all your AI agents.

By Sarah Chen · 6 min · May 2, 2026

AI News

DeepSeek V4 Pro: 1.6T Open-Weights Model Hits #2 on the Index

DeepSeek V4 Pro is a top 1.6T open-weights model for agents, but has a high hallucination rate.

By Sarah Chen · 5 min · Apr 29, 2026

AI News

Coinbase's Fred and Balaji AI Agents Arrive in Slack

Coinbase launched AI agents modeled on Fred Ehrsam and Balaji Srinivasan in Slack and email.

By Sarah Chen · 5 min · Apr 21, 2026

AI News

OpenAI Agents SDK: Sandboxes Land for Long-Horizon Agents

OpenAI's Agents SDK now features sandboxes, built-in providers, and durable state for long-horizon agents.

By Sarah Chen · 5 min · Apr 20, 2026

AI News

Claude Opus 4.7: Anthropic's New Flagship Clears SWE-Bench Pro

Anthropic's Claude Opus 4.7 excels on SWE-Bench Pro with enhanced vision and new features.

By Sarah Chen · 6 min · Apr 19, 2026

AI News

MAI-Transcribe-1: Microsoft's Whisper Killer Hits 3.8% WER at $0.36/Hour

Microsoft's MAI-Transcribe-1 beats Whisper with 3.8% WER and lower costs, signaling independence from OpenAI.

By Sarah Chen · 6 min · Apr 17, 2026

AI News

Qwen 3.6 Plus: Alibaba's Free Preview Beats Claude Opus on Agent Tasks

Alibaba's Qwen 3.6 Plus Preview surpasses Claude Opus on agent tasks with impressive speed and context.

By Sarah Chen · 5 min · Apr 15, 2026

AI News

Figma for Agents: AI Now Designs Directly on Your Canvas

Figma now enables AI agents to design and modify directly on its canvas, leveraging your design system.

By Sarah Chen · 4 min · Apr 15, 2026

AI News

Atlassian Remix: AI Visuals and MCP Agents Come to Confluence

Atlassian Remix brings AI visuals and MCP agents to Confluence, transforming pages into dynamic content.

By Sarah Chen · 4 min · Apr 10, 2026

AI News

Meta Muse Spark: The First Model From Superintelligence Labs Is a Strategic Reset

Meta Muse Spark, from Superintelligence Labs, marks a strategic AI reset with top benchmarks and medical reasoning.

By Sarah Chen · 5 min · Apr 9, 2026

AI News

Bluesky Attie: The AI Feed Builder That 125,000 Users Blocked on Sight

Bluesky's Attie AI feed builder, powered by Claude, was quickly blocked by 125,000 users.

By Sarah Chen · 4 min · Apr 8, 2026

Reviews

MindsDB Anton: The Open-Source BI Agent That Replaces Your Dashboard

MindsDB Anton is an open-source BI agent that replaces dashboards by answering questions in plain English.

By Sarah Chen · 4 min · Apr 8, 2026

AI News

Denovo Turns a Business Idea Into a Running Startup in 8 Minutes

Denovo's AI platform turns a business idea into a fully running startup in just eight minutes.

By Sarah Chen · 5 min · Apr 3, 2026

AI News

GLM-5V-Turbo: Z.ai's 744B Vision Model Turns Screenshots Into Code

Z.ai's GLM-5V-Turbo vision model efficiently converts screenshots directly into executable code.

By Sarah Chen · 4 min · Apr 3, 2026

AI News

Tobira.ai: The AI Agent Network Where Bots Find You Business

Tobira.ai is an AI agent network where bots find clients, partners, and investors for you.

By Sarah Chen · 5 min · Apr 2, 2026

AI News

Google Stitch 2.0: The Free AI Design Tool That Topped Product Hunt

Google Stitch 2.0, a free AI design tool, topped Product Hunt with new vibe design and voice canvas.

By Sarah Chen · 4 min · Apr 2, 2026

AI News

LillyPod: Eli Lilly's 9,000-Petaflop Supercomputer Bets Big on AI Drug Discovery

LillyPod, Eli Lilly's 9,000-petaflop AI supercomputer, is the company's big bet on AI drug discovery.

By Sarah Chen · 4 min · Apr 1, 2026

AI News

NVIDIA Nemotron 3 Super: The Hybrid Architecture That Rewrites the Agent Playbook

NVIDIA's Nemotron 3 Super, built on a hybrid architecture, delivers 5x throughput and top agentic benchmarks.

By Sarah Chen · 4 min · Mar 31, 2026

Open Source

Qwen 3.5 Small: Alibaba's 9B Model That Beats GPT-OSS-120B

Alibaba's Qwen 3.5 Small, a 9B multimodal model, beats models 13x its size.

By Sarah Chen · 5 min · Mar 29, 2026

AI News

GPT-5.4: OpenAI's Five-Variant Strategy Reshapes the AI Market

OpenAI's GPT-5.4, with five variants and expert-level computer use, is reshaping the AI market.

By Sarah Chen · 5 min · Mar 29, 2026