AI News
Latest artificial intelligence breakthroughs, model releases, and industry moves

MAI-Thinking-1: Microsoft's First In-House Reasoning Model
Microsoft unveiled MAI-Thinking-1 at Build 2026, its first reasoning model trained in-house without distillation. The 35B-active, ~1T-total MoE has a 256k context window, scores 97.0% on AIME 2025 and matches Claude Opus 4.6 on SWE-Bench Pro. It's in private preview on Microsoft Foundry.
By Sarah Chen · 5 min · Jun 23, 2026

Mistral: The Industrial AI Pivot Behind Airbus and BMW Deals
Mistral AI used its May 2026 AI Now Summit to pivot toward industrial engineering, announcing a physics-AI stack, the Emmi acquisition, partnerships with Airbus, BMW (crash simulation) and ASML, the unified Vibe agent, and a 10 MW Les Ulis inference data center opening Q3 2026.
By Sarah Chen · 5 min · Jun 19, 2026

Meta Business Agent: Now Global on WhatsApp & Instagram
On June 3, 2026, Meta made Meta Business Agent globally available to businesses of all sizes across WhatsApp, Messenger, and Instagram. The agent answers questions, recommends catalog products, books appointments, qualifies leads, and closes sales, with human handoff. A new Business Agent Platform connects to hundreds of systems like Shopify, Zendesk, and Shopee. It's free to start, with token-based pricing for larger businesses.
By Sarah Chen · 5 min · Jun 17, 2026

MiniMax M3: Open-Weight Frontier Coding Model With 1M Context
MiniMax M3 is an open-weight model pairing a 1M-token context and revived sparse attention with frontier coding benchmarks at 15x lower cost than Claude Opus 4.7.
By Sarah Chen · 6 min · Jun 16, 2026

Anthropic IPO: The $965B Filing That Beat OpenAI to Wall Street
On June 1, 2026, Anthropic confidentially filed a draft S-1 with the SEC at a roughly $965B valuation, backed by a $65B raise and a ~$47B May run-rate. OpenAI followed on June 8. Both target public listings as soon as fall 2026.
By Sarah Chen · 5 min · Jun 15, 2026

Kimi K2.7-Code: A 30% Token Cut With a Benchmark Asterisk
Moonshot AI's Kimi K2.7-Code is an open-weights, OpenAI-compatible coding model (1T-param MoE, 32B active, 256K context) claiming a 30% cut in reasoning tokens and a narrow win over Claude Opus 4.8. But all published benchmarks are Moonshot's own proprietary suites, with no independent results yet, so the efficiency claims remain unverified.
By Sarah Chen · 5 min · Jun 14, 2026

Apple Siri: Why Apple Is Paying Google $1B for Gemini
At WWDC 2026, Apple unveiled a rebuilt Siri powered by a custom, Apple-tuned Google Gemini model, reportedly a 1.2-trillion-parameter mixture-of-experts system costing roughly $1 billion a year. On-device Apple Silicon models handle quick private tasks, while complex reasoning routes to the Gemini model inside Apple's Private Cloud Compute, with a contract barring Google from training on Apple user data.
By Sarah Chen · 5 min · Jun 11, 2026

Gemma 4 12B: Google's Encoder-Free Multimodal Laptop Model
Google released Gemma 4 12B on June 3, 2026, a multimodal open model with an encoder-free architecture that feeds vision and audio directly into the LLM backbone. It runs locally on 16GB of memory, approaches the 26B MoE on benchmarks, uses Multi-Token Prediction drafters for low latency, and ships under Apache 2.0 with broad tooling support.
By Sarah Chen · 5 min · Jun 9, 2026

MAI-Code-1-Flash: Microsoft's Lean Coding Model Hits Copilot
Microsoft launched MAI-Code-1-Flash on June 2, 2026, a lightweight, agentic coding model built end-to-end in-house and rolling out to GitHub Copilot users in VS Code. It outperforms Claude Haiku 4.5 across four coding benchmarks (including 51.2% vs 35.2% on SWE-Bench Pro) while using up to 60% fewer tokens, signaling Microsoft's push for AI independence from OpenAI.
By Sarah Chen · 5 min · Jun 6, 2026

DeepSeek V4-Pro: 75% Price Cut Becomes Permanent
On May 22, 2026, DeepSeek made its 75% promotional discount on V4-Pro permanent rather than letting it expire May 31. New permanent rates: $0.435/M input, $0.87/M output, $0.003625/M cache hit. That puts V4-Pro output roughly 34x cheaper than GPT-5.5 and 17x cheaper than Claude Opus 4.7, while landing within 3-7 points on coding and reasoning benchmarks. The underrated detail is the cache-hit price, which can cut input cost ~88% for agents with stable prefixes. Teams should re-run their build math and route the easy majority of traffic to V4-Pro.
By Sarah Chen · 5 min · Jun 1, 2026
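
For readers re-running that math, here is a minimal sketch of how the cache-hit price feeds into a blended input rate. The two prices are the ones quoted in the summary above; the cache-hit rates and the function names are hypothetical, chosen only for illustration, and real savings depend on how much of each prompt is actually served from cache.

```python
# Illustrative only: prices are the per-million-token rates quoted in the
# summary above; the cache-hit rates below are hypothetical inputs.
INPUT_PER_M = 0.435          # USD per 1M uncached input tokens
CACHE_HIT_PER_M = 0.003625   # USD per 1M cached input tokens


def blended_input_price(hit_rate: float) -> float:
    """Average USD per 1M input tokens when `hit_rate` of them are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHE_HIT_PER_M + (1.0 - hit_rate) * INPUT_PER_M


def input_savings(hit_rate: float) -> float:
    """Fraction of input cost saved versus paying the uncached rate throughout."""
    return 1.0 - blended_input_price(hit_rate) / INPUT_PER_M


if __name__ == "__main__":
    for rate in (0.5, 0.8, 0.9):
        print(
            f"hit rate {rate:.0%}: ${blended_input_price(rate):.4f}/M input, "
            f"{input_savings(rate):.1%} saved"
        )
```

Under this simple model, the ~88% input saving corresponds to a cache-hit rate of roughly 89%, which is plausible only for agents whose prompts share a long, stable prefix.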

Claude Opus 4.8: Anthropic's Honest, Parallel-Agent Flagship
Anthropic released Claude Opus 4.8 on May 28, 2026, 41 days after Opus 4.7. It scores 69.2% on SWE-Bench Pro, emphasizes calibrated honesty and longer autonomy, adds Dynamic Workflows for hundreds of parallel subagents, offers a fast mode that runs ~2.5x quicker, and keeps pricing flat relative to 4.7.
By Sarah Chen · 4 min · May 30, 2026

Gemini 3.5 Flash: Google's Flash Tier Eats Pro on Agent Benchmarks
Gemini 3.5 Flash outperforms the Pro tier on agent benchmarks with superior speed and efficiency.
By Sarah Chen · 5 min · May 28, 2026

Gemini Spark: Google's 24/7 Agent Runs Even When You Close Your Laptop
Gemini Spark is Google's 24/7 agent that continues working even when your laptop is closed.
By Sarah Chen · 6 min · May 27, 2026

Qwen3.7-Max: Alibaba's 35-Hour Agent Run Resets the Frontier
Alibaba's Qwen3.7-Max agent achieved a 35-hour autonomous run, setting new performance and cost benchmarks.
By Sarah Chen · 5 min · May 25, 2026

Gemini Intelligence: Google Moves AI From the App to the Android OS
Google's Gemini Intelligence brings OS-level AI to Android, transforming how devices integrate artificial intelligence.
By Sarah Chen · 5 min · May 19, 2026

Claude for Small Business: Anthropic Targets 36M U.S. SMBs
Anthropic's 'Claude for Small Business' integrates AI into SMB tools like QuickBooks, targeting 36M businesses.
By Sarah Chen · 6 min · May 17, 2026

SubQ: The 12M-Token Subquadratic LLM Splitting AI Researchers
SubQ is a new 12M-token subquadratic LLM claiming massive context and low compute, sparking debate among researchers.
By Sarah Chen · 5 min · May 16, 2026

Lightfield: The AI-Native CRM Tome's Founders Built Next
Lightfield is an AI-native CRM by Tome's founders, using agents to automate sales tasks like prospecting and coaching.
By Sarah Chen · 5 min · May 15, 2026

GPT-Realtime-2: OpenAI's Voice Model Gets GPT-5 Reasoning
OpenAI's GPT-Realtime-2 voice model gains GPT-5 reasoning alongside other new features.
By Sarah Chen · 6 min · May 14, 2026

Claude Dreaming: Anthropic's Agents Now Learn While They Sleep
Anthropic's Claude agents now 'dream' to learn and improve task completion overnight.
By Sarah Chen · 5 min · May 13, 2026

Kimi K2.6: Moonshot's Open-Weights Model Beats GPT-5.4 on SWE-Bench Pro
Moonshot's Kimi K2.6, an open-weights model, surpasses GPT-5.4 on SWE-Bench Pro.
By Sarah Chen · 6 min · May 12, 2026

Codex 3.0: OpenAI's Autonomous Build-Test-Debug Loop Hits Product Hunt
OpenAI's Codex 3.0 offers an autonomous build-test-debug loop powered by GPT-5.5.
By Sarah Chen · 5 min · May 11, 2026

GPT-5.5-Cyber: OpenAI Hands Verified Defenders a Less-Restricted Model
OpenAI's GPT-5.5-Cyber, a less-restricted model, is now available to vetted cyber defenders.
By Sarah Chen · 6 min · May 8, 2026

Anthropic's $1.5B AI Services Firm Takes Aim at Big Consulting
Anthropic launches a $1.5B AI services firm, directly challenging big consulting.
By Sarah Chen · 6 min · May 7, 2026

Vision Banana: DeepMind Beats SAM 3 and Depth Anything V3
DeepMind's Vision Banana outperforms SAM 3 and Depth Anything V3, suggesting that generation is key to vision pretraining.
By Sarah Chen · 4 min · May 6, 2026

GPT-5.5: OpenAI's First Full Retrain Since GPT-4.5 Bets on Agents
OpenAI's GPT-5.5 is a fully retrained model, focusing on agentic computer use, not just benchmarks.
By Sarah Chen · 5 min · May 5, 2026

Mistral Medium 3.5: 128B Open-Weight Model That Opens PRs
Mistral Medium 3.5 is a powerful 128B open-weight model capable of opening GitHub pull requests.
By Sarah Chen · 7 min · May 4, 2026

Microsoft Agent 365: $15-Per-Seat Control Plane for Your AI Agents
Microsoft Agent 365 offers a control plane to observe, govern, and secure all your AI agents.
By Sarah Chen · 6 min · May 2, 2026

DeepSeek V4 Pro: 1.6T Open-Weights Model Hits #2 on the Index
DeepSeek V4 Pro, a 1.6T open-weights model, ranks among the top models for agents but has a high hallucination rate.
By Sarah Chen · 5 min · Apr 29, 2026

Coinbase's Fred and Balaji AI Agents Arrive in Slack
Coinbase launched AI agents modeled on Fred Ehrsam and Balaji Srinivasan in Slack and email.
By Sarah Chen · 5 min · Apr 21, 2026

OpenAI Agents SDK: Sandboxes Land for Long-Horizon Agents
OpenAI's Agents SDK now features sandboxes, built-in providers, and durable state for long-horizon agents.
By Sarah Chen · 5 min · Apr 20, 2026

Claude Opus 4.7: Anthropic's New Flagship Clears SWE-Bench Pro
Anthropic's Claude Opus 4.7 excels on SWE-Bench Pro with enhanced vision and new features.
By Sarah Chen · 6 min · Apr 19, 2026

MAI-Transcribe-1: Microsoft's Whisper Killer Hits 3.8% WER at $0.36/Hour
Microsoft's MAI-Transcribe-1 beats Whisper with 3.8% WER and lower costs, signaling independence from OpenAI.
By Sarah Chen · 6 min · Apr 17, 2026

Qwen 3.6 Plus: Alibaba's Free Preview Beats Claude Opus on Agent Tasks
Alibaba's Qwen 3.6 Plus Preview surpasses Claude Opus on agent tasks with impressive speed and context.
By Sarah Chen · 5 min · Apr 15, 2026

Figma for Agents: AI Now Designs Directly on Your Canvas
Figma now enables AI agents to design and modify directly on its canvas, leveraging your design system.
By Sarah Chen · 4 min · Apr 15, 2026

Atlassian Remix: AI Visuals and MCP Agents Come to Confluence
Atlassian Remix brings AI visuals and MCP agents to Confluence, transforming pages into dynamic content.
By Sarah Chen · 4 min · Apr 10, 2026

Meta Muse Spark: The First Model From Superintelligence Labs Is a Strategic Reset
Meta Muse Spark, from Superintelligence Labs, marks a strategic AI reset with top benchmarks and medical reasoning.
By Sarah Chen · 5 min · Apr 9, 2026

Bluesky Attie: The AI Feed Builder That 125,000 Users Blocked on Sight
Bluesky's Attie AI feed builder, powered by Claude, was quickly blocked by 125,000 users.
By Sarah Chen · 4 min · Apr 8, 2026

Denovo Turns a Business Idea Into a Running Startup in 8 Minutes
Denovo's AI platform turns a business idea into a fully running startup in just eight minutes.
By Sarah Chen · 5 min · Apr 3, 2026

GLM-5V-Turbo: Z.ai's 744B Vision Model Turns Screenshots Into Code
Z.ai's GLM-5V-Turbo vision model converts screenshots directly into executable code efficiently.
By Sarah Chen · 4 min · Apr 3, 2026

Tobira.ai: The AI Agent Network Where Bots Find You Business
Tobira.ai is an AI agent network where bots find clients, partners, and investors for you.
By Sarah Chen · 5 min · Apr 2, 2026

Google Stitch 2.0: The Free AI Design Tool That Topped Product Hunt
Google Stitch 2.0, a free AI design tool, topped Product Hunt with new vibe design and voice canvas.
By Sarah Chen · 4 min · Apr 2, 2026

LillyPod: Eli Lilly's 9,000-Petaflop Supercomputer Bets Big on AI Drug Discovery
Eli Lilly is betting big on AI drug discovery with LillyPod, a 9,000-petaflop AI supercomputer.
By Sarah Chen · 4 min · Apr 1, 2026

NVIDIA Nemotron 3 Super: The Hybrid Architecture That Rewrites the Agent Playbook
NVIDIA's Nemotron 3 Super, a hybrid architecture, delivers 5x throughput and top agentic benchmarks.
By Sarah Chen · 4 min · Mar 31, 2026

LTX 2.3: Lightricks' Open-Source Model Generates 4K Video with Synced Audio
Lightricks' LTX 2.3 is an open-source model generating native 4K video with perfectly synced audio.
By Marcus Rivera · 6 min · Mar 29, 2026

GPT-5.4: OpenAI's Five-Variant Strategy Reshapes the AI Market
OpenAI's GPT-5.4, with five variants and expert-level computer use, is reshaping the AI market.
By Sarah Chen · 5 min · Mar 29, 2026
