Sarah Chen
AI researcher and tech journalist covering the frontier of machine intelligence. Previously at MIT Tech Review.
23 articles

Kimi K2.6: Moonshot's Open-Weights Model Beats GPT-5.4 on SWE-Bench Pro
Moonshot's Kimi K2.6 ships 1T parameters, 256K context, and a 300-agent swarm at $0.60/M tokens. It tops SWE-Bench Pro and DeepSearchQA against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro.
By Sarah Chen · 6 min · May 12, 2026

Codex 3.0: OpenAI's Autonomous Build-Test-Debug Loop Hits Product Hunt
Codex 3.0 launched on Product Hunt at #3, powered by GPT-5.5 with browser automation, cross-app file editing, 90+ plugins, and an 82.7% score on Terminal-Bench 2.0.
By Sarah Chen · 5 min · May 11, 2026

GPT-5.5-Cyber: OpenAI Hands Verified Defenders a Less-Restricted Model
OpenAI rolled out GPT-5.5-Cyber to vetted defenders this week. AISI says the underlying model just topped its hardest cyber benchmarks — and cracked a 12-hour reverse-engineering challenge in 10 minutes.
By Sarah Chen · 6 min · May 8, 2026

Anthropic's $1.5B AI Services Firm Takes Aim at Big Consulting
Anthropic just announced a standalone enterprise services firm with Blackstone, Hellman & Friedman, and Goldman Sachs — backed by $1.5 billion. It is the most direct shot the consulting industry has ever taken from a foundation model lab.
By Sarah Chen · 6 min · May 7, 2026

Vision Banana: DeepMind Beats SAM 3 and Depth Anything V3
Google DeepMind's Vision Banana — a tuned image generator — outperforms SAM 3 on segmentation and Depth Anything V3 on metric depth, suggesting generation may be the best vision pretraining signal yet.
By Sarah Chen · 4 min · May 6, 2026

GPT-5.5: OpenAI's First Full Retrain Since GPT-4.5 Bets on Agents
GPT-5.5 hits 82.7% on Terminal-Bench 2.0 with a 1M context window and powers OpenAI's coming Super App.
By Sarah Chen · 5 min · May 5, 2026

Mistral Medium 3.5: 128B Open-Weight Model That Opens PRs
Mistral Medium 3.5 is a 128B dense open-weight model with a 256k context, 77.6% on SWE-Bench Verified, and remote agents in Vibe that open pull requests on GitHub.
By Sarah Chen · 7 min · May 4, 2026

Microsoft Agent 365: $15-Per-Seat Control Plane for Your AI Agents
Microsoft Agent 365 hits general availability May 1, 2026 at $15 per user. Here's how it observes, governs, and secures every AI agent in your tenant—including OpenClaw and Claude Code.
By Sarah Chen · 6 min · May 2, 2026

DeepSeek V4 Pro: 1.6T Open-Weights Model Hits #2 on the Index
DeepSeek V4 Pro is the new #2 open-weights reasoning model and the best agent model in the open-weights field. The catch: it hallucinates 94% of the time on questions it can't answer.
By Sarah Chen · 5 min · Apr 29, 2026

Coinbase's Fred and Balaji AI Agents Arrive in Slack
Coinbase deployed two AI agents named Fred and Balaji into Slack and email, modeled on its co-founder and former CTO.
By Sarah Chen · 5 min · Apr 21, 2026

OpenAI Agents SDK: Sandboxes Land for Long-Horizon Agents
OpenAI's April 2026 Agents SDK update makes sandboxing a first-class SDK feature, adds 7 built-in providers, and gives long-horizon agents durable state.
By Sarah Chen · 5 min · Apr 20, 2026

Claude Opus 4.7: Anthropic's New Flagship Clears SWE-Bench Pro
Anthropic's Opus 4.7 hits 64.3% on SWE-bench Pro, adds an xhigh effort level, and ships with 3x sharper vision. But the new tokenizer quietly shifts your bill, and Mythos still sits in the drawer.
By Sarah Chen · 6 min · Apr 19, 2026

MAI-Transcribe-1: Microsoft's Whisper Killer Hits 3.8% WER at $0.36/Hour
Microsoft's first fully in-house speech model hits 3.8% Word Error Rate across 25 languages — beating Whisper-large-v3 on every one. At $0.36/hour and 50% lower GPU cost, MAI-Transcribe-1 is the clearest signal yet that Microsoft is done renting AI from OpenAI.
By Sarah Chen · 6 min · Apr 17, 2026
AI NewsQwen 3.6 Plus: Alibaba's Free Preview Beats Claude Opus on Agent Tasks
Alibaba's Qwen 3.6 Plus Preview posts 61.6 on Terminal-Bench 2.0, runs at 158 tok/s, and ships with a 1M-token context window — all free during the preview window.
By Sarah Chen · 5 min · Apr 15, 2026

Figma for Agents: AI Now Designs Directly on Your Canvas
Figma opened its canvas to AI agents via MCP, letting tools like Claude Code and Cursor create and modify designs using your real design system.
By Sarah Chen · 4 min · Apr 15, 2026

Atlassian Remix: AI Visuals and MCP Agents Come to Confluence
Atlassian launches Remix in open beta, turning Confluence pages into charts and infographics. MCP partner agents from Lovable, Replit, and Gamma arrive April 13.
By Sarah Chen · 4 min · Apr 10, 2026

Meta Muse Spark: The First Model From Superintelligence Labs Is a Strategic Reset
Meta's first model from Superintelligence Labs scores top-5 on AI benchmarks, leads in medical reasoning, and signals a strategic pivot away from open-weight Llama models.
By Sarah Chen · 5 min · Apr 9, 2026

Bluesky Attie: The AI Feed Builder That 125,000 Users Blocked on Sight
Bluesky launched Attie, an AI assistant powered by Claude that lets anyone build custom feeds using plain English. Over 125,000 users blocked it within days.
By Sarah Chen · 4 min · Apr 8, 2026

MindsDB Anton: The Open-Source BI Agent That Replaces Your Dashboard
MindsDB launches Anton, an open-source autonomous BI agent that turns plain-English questions into verified dashboards and charts — replacing traditional BI workflows.
By Sarah Chen · 4 min · Apr 8, 2026

LillyPod: Eli Lilly's 9,000-Petaflop Supercomputer Bets Big on AI Drug Discovery
Eli Lilly's LillyPod supercomputer brings 9,000 petaflops of AI power to drug discovery with 1,016 NVIDIA Blackwell Ultra GPUs.
By Sarah Chen · 4 min · Apr 1, 2026

NVIDIA Nemotron 3 Super: The Hybrid Architecture That Rewrites the Agent Playbook
NVIDIA Nemotron 3 Super combines three neural-network architectures into one efficient open model for enterprise AI agents.
By Sarah Chen · 4 min · Mar 31, 2026

Qwen 3.5 Small: Alibaba's 9B Model That Beats GPT-OSS-120B
Alibaba's Qwen 3.5 Small series delivers natively multimodal AI models from 0.8B to 9B parameters under Apache 2.0, with the 9B scoring 81.7 on GPQA Diamond — beating OpenAI's gpt-oss-120B.
By Sarah Chen · 5 min · Mar 29, 2026

GPT-5.4: OpenAI's Five-Variant Strategy Reshapes the AI Market
OpenAI releases GPT-5.4 in five variants with native computer use surpassing human experts and a new Tool Search architecture for agent systems.
By Sarah Chen · 5 min · Mar 29, 2026