Claude Dreaming: Anthropic's Agents Now Learn While They Sleep
AI News · 5 min read

Sarah Chen
May 13, 2026

For most of the agent era, every Claude session has started from zero. Tools forgotten. Filetype quirks unlearned. The same mistake, made again on Monday because nobody remembered Friday. That changes now.

At Code with Claude 2026 in San Francisco on May 6, Anthropic's Chief Product Officer Ami Vora walked on stage and announced something the company has been quietly testing for months: dreaming. It's a scheduled background process for Claude Managed Agents that reviews previous sessions, extracts patterns, and writes new memory entries the next session can use. Anthropic frames it as the hippocampal-consolidation analogue for software: a brain replaying the day's events at night and deciding what to keep.

The early numbers are not subtle. Legal AI company Harvey says task completion rates rose roughly 6x once dreaming was enabled on its internal agents — and crucially, without any change to the underlying model.

What Dreaming Actually Does

Strip away the metaphor and dreaming is three things stacked together (a minimal sketch follows the list):

  • A scheduler. Dreaming runs between sessions, not during them. Your agent finishes a job, the lights go down, dreaming wakes up.
  • A pattern extractor. It reads through what the agent did — successes, dead ends, tool calls that failed three times before they worked — and surfaces structure a single session can't see on its own.
  • A memory curator. It updates the agent's memory store with new entries: recurring mistakes to avoid, workflows the team converges on, preferences shared across users.
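
Anthropic hasn't published an interface for any of this, so the following is a shape, not a spec: a minimal Python sketch of the three stages, with every name in it (MemoryEntry, extract_patterns, dream) invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    """One curated lesson carried into future sessions."""
    kind: str      # e.g. "recurring_mistake", "workflow", "preference"
    content: str

@dataclass
class MemoryStore:
    """The agent's persistent memory: the only thing dreaming mutates."""
    entries: list[MemoryEntry] = field(default_factory=list)

def extract_patterns(transcripts: list[str]) -> list[MemoryEntry]:
    """Stand-in for the pattern extractor. The real system presumably
    runs a model over full session transcripts; this toy version just
    flags tool failures that repeat across sessions."""
    seen: dict[str, int] = {}
    for line in transcripts:
        if line.startswith("tool_error:"):
            seen[line] = seen.get(line, 0) + 1
    return [
        MemoryEntry("recurring_mistake", f"{msg} (seen {n}x)")
        for msg, n in seen.items() if n >= 2
    ]

def dream(transcripts: list[str], memory: MemoryStore) -> None:
    """Runs between sessions, never during them. Model weights are
    untouched; only the memory store changes."""
    memory.entries.extend(extract_patterns(transcripts))

memory = MemoryStore()
dream(["tool_error: parser timeout", "tool_error: parser timeout"], memory)
# memory.entries now holds one "recurring_mistake" entry
```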

"We're not changing the model itself through dreaming — it's not doing updates to the weights or anything like that," an Anthropic engineer told VentureBeat.

That distinction matters. Dreaming is not fine-tuning. The base model — whether that's Claude Opus 4.7, Sonnet, or Haiku — stays frozen. What changes is the context the agent walks into next morning. Think of it less as "the model got smarter" and more as "the model finally has a notebook it actually reads."
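
Concretely, "reading the notebook" is just context assembly. A sketch of how overnight memory entries might land in the next session's prompt, again with invented names:

```python
def build_session_context(memory_entries: list[str], task: str) -> str:
    """Illustrative only: the frozen base model wakes up to a prompt
    that now includes whatever dreaming curated overnight."""
    notebook = "\n".join(f"- {entry}" for entry in memory_entries)
    return (
        "Lessons from previous sessions:\n"
        f"{notebook}\n\n"
        f"Current task: {task}"
    )

prompt = build_session_context(
    ["Citation lookup times out under load; retry once with backoff."],
    "Redline the draft services agreement.",
)
```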

The Control Knob Most Teams Will Care About

Anthropic shipped dreaming with two operating modes, and the choice between them is the real product decision:

  • Auto-curate: dreaming updates memory automatically between sessions. Best for high-volume internal agents where you trust the loop.
  • Review-before-land: proposed memory edits queue up for a human to approve. Best for customer-facing agents, regulated industries, anything load-bearing.

The review mode is the one regulated teams have been quietly asking for. A dreaming agent that silently rewrites its own instructions is a compliance problem; a dreaming agent that proposes changes and waits for a human nod is a productivity tool.
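
No public API for the review queue exists yet, so here is one plausible shape for review-before-land, every name hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ProposedEdit:
    """A memory change dreaming wants to make, awaiting a human nod."""
    summary: str
    approved: bool = False

@dataclass
class ReviewQueue:
    pending: list[ProposedEdit] = field(default_factory=list)

    def propose(self, summary: str) -> None:
        self.pending.append(ProposedEdit(summary))

    def land_approved(self, live_memory: list[str]) -> None:
        """Only edits a human has approved reach the live memory store;
        the rest stay queued. This is the compliance story in one method."""
        still_pending = []
        for edit in self.pending:
            if edit.approved:
                live_memory.append(edit.summary)
            else:
                still_pending.append(edit)
        self.pending = still_pending
```

Auto-curate is the same loop with the approval check removed, which is exactly why it belongs behind a trust decision.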

Why Harvey's 6x Matters (And Why It Doesn't)

Harvey's headline number — a sixfold lift in task completion rate — is the soundbite, and it's a real result from real production traffic on real legal workflows. But the texture of why it lifted is more interesting than the multiplier itself.

According to Anthropic's announcement and follow-up reporting, Harvey's agents previously kept rediscovering the same things every session: how a specific firm prefers contract redlines, which document parser handles which file format, how to recover when a citation lookup tool times out. Each of those is a small thing. Stacked across a thousand sessions a day, they become the difference between agents that almost work and agents that do the job.

Dreaming doesn't make the agent smarter. It makes the agent stop forgetting. For a class of long-running, multi-session work — legal review, financial analysis, anything where the same client gets the same agent next week — that distinction is the entire product.
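
To make that concrete, each of those rediscoveries maps onto a single memory entry. Purely illustrative data; the field names and wording are invented:

```python
# Illustrative only: the lessons described above, written down once
# instead of rediscovered every session. Names and values are invented.
harvey_style_memory = [
    {"kind": "preference",
     "content": "This firm wants contract redlines as tracked changes."},
    {"kind": "tool_routing",
     "content": "Parser A handles scanned PDFs; parser B handles native DOCX."},
    {"kind": "recovery",
     "content": "If citation lookup times out, retry once, then flag for manual check."},
]
```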

What Shipped Alongside

Vora's Code with Claude announcement was a three-feature bundle, not a one-off:

  • Outcomes — a self-grading loop where agents evaluate their own runs against the user's stated goal (a toy sketch follows this list). Moved from research preview into public beta.
  • Multi-agent orchestration — letting one Claude agent spawn and coordinate sub-agents on complex jobs. Also into public beta.
  • Dreaming — the only one of the three still gated. Available through a developer access program, not yet generally available.
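
Of the three, Outcomes is the easiest to sketch. A toy self-grading loop that assumes nothing about Anthropic's actual implementation; judge stands in for a model call:

```python
def grade_run(goal: str, transcript: str, judge) -> float:
    """Toy 'Outcomes' loop: score a finished run against the user's
    stated goal. A real judge would be another model call."""
    prompt = (
        f"Goal: {goal}\n"
        f"Transcript: {transcript}\n"
        "Reply with a score from 0.0 to 1.0."
    )
    return float(judge(prompt))

def keyword_judge(prompt: str) -> str:
    # Crude stand-in so the sketch runs; real grading needs an LLM.
    return "1.0" if "done" in prompt.lower() else "0.0"

score = grade_run("Summarize the brief", "Summary written. Done.", keyword_judge)
```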

The bundle telegraphs Anthropic's read on the next year: single agents are not the unit anymore. Teams of agents that grade themselves and remember what they learned — that's the unit. Dreaming is the memory layer for the bet.

The Skeptic's Case

Three things worth flagging before anyone gets too excited:

  1. It's not GA. Dreaming is in developer access. There's no published pricing, no SLA, and no public benchmark beyond Harvey's number. Treat the 6x as a real signal, not a guarantee.
  2. Memory drift is a real failure mode. An agent that curates its own memory can also curate it badly — overfitting on one weird session, baking in a workaround that becomes the new wrong answer. The review-before-land mode exists for a reason.
  3. It only matters if your agent has continuity. A one-shot agent that handles a single request and never sees the user again has nothing to dream about. Dreaming is a feature for operated agents, not chatbots.

The Bottom Line

Dreaming is not a model release. It's an admission that the bottleneck on production agents in 2026 was never raw capability — it was the fact that every session started amnesiac. Anthropic just shipped the memory layer that makes Claude Managed Agents accumulate institutional knowledge the way a long-tenured employee does. Harvey's 6x is the first proof point. The interesting question is which industry posts the second one, and how quickly OpenAI and Google ship their answer.