May 2026 • 9 min read

Anthropic Teaches AI Agents to “Dream”

Inside Anthropic's new dreaming technique — letting agents review prior behavior offline to improve long-running workflows

A New Word for an Old Idea

Anthropic introduced a technique it calls “dreaming” — a between-tasks process where an agent reviews its own trajectories, looks for patterns and mistakes, and writes updated policies back into its working memory. It is framed as part of Anthropic's broader push to build managed agents that can run for hours or days without going off the rails.

The name is new; the idea isn't. Reinforcement learning has long used experience replay. What changes here is that the loop runs at the agent layer, in natural language, against the same model the agent uses at runtime — not against gradients in a training cluster.

How It Works

1. Collect Trajectories

During a work session, the agent emits structured traces — tool calls, observations, intermediate plans, outcomes. These are stored alongside the surrounding task definition and the human verdict on whether the task succeeded.
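Concretely, a trace record could be as simple as the sketch below. The schema and field names are assumptions for illustration; Anthropic hasn't published an exact format.

```python
# A minimal sketch of a trajectory record, assuming a JSON-serializable schema.
# Field names are illustrative, not Anthropic's published format.
from dataclasses import dataclass, field, asdict
from pathlib import Path
import json
import time

@dataclass
class Step:
    tool: str          # e.g. "search", "edit_file"
    args: dict         # tool-call arguments
    observation: str   # what the tool returned
    plan_note: str     # the agent's intermediate plan at this step

@dataclass
class Trajectory:
    task: str                          # the surrounding task definition
    steps: list[Step] = field(default_factory=list)
    outcome: str = ""                  # final result summary
    human_verdict: bool | None = None  # did the task succeed?
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

# During a session, the agent appends steps and persists the record at the end:
traj = Trajectory(task="Migrate CI config to the new runner image")
traj.steps.append(Step(tool="read_file", args={"path": ".ci/config.yml"},
                       observation="...", plan_note="inspect current config"))
traj.human_verdict = True
Path("traces").mkdir(exist_ok=True)
Path(f"traces/{int(traj.timestamp)}.json").write_text(traj.to_json())
```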

2. Dream Pass

Offline (often overnight), a second pass walks back through recent trajectories, flags where things drifted, generalizes the lesson, and rewrites a small set of working-memory rules. Crucially, dreaming doesn't retrain the base model — it edits the playbook that gets injected at runtime.
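A minimal sketch of what that pass might look like, assuming file-based traces and a plain-text playbook. `complete()` is a stand-in for whatever model call the stack uses, and the prompt wording and file layout are illustrative, not Anthropic's implementation.

```python
# Sketch of an offline "dream pass": load recent trajectories, ask the model
# to revise the playbook against them, and write the revision back.
import glob
import json

def complete(prompt: str) -> str:
    """Placeholder for any chat-completion call; wire to your model provider."""
    raise NotImplementedError

DREAM_PROMPT = """You are reviewing an agent's recent work sessions.

Current playbook rules:
{playbook}

Recent trajectories (JSON, newest last):
{trajectories}

Identify where the agent drifted from the task or repeated a known mistake.
Generalize each finding into a short, durable rule. Return the full revised
playbook: keep rules that still hold, rewrite ones that don't, add new ones.
Return only the rule list, one rule per line."""

def dream_pass(trace_dir: str = "traces",
               playbook_path: str = "playbook.txt") -> str:
    with open(playbook_path) as f:
        playbook = f.read()
    traces = []
    for p in sorted(glob.glob(f"{trace_dir}/*.json")):
        with open(p) as f:
            traces.append(json.load(f))
    revised = complete(DREAM_PROMPT.format(
        playbook=playbook,
        trajectories=json.dumps(traces[-20:], indent=2),  # last N sessions
    ))
    with open(playbook_path, "w") as f:
        f.write(revised)
    return revised
```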

3. Replay

At the start of the next session, the updated playbook is part of the prompt. Anthropic reports meaningful improvements on multi-step tasks, particularly the ones where the failure mode was "agent didn't remember the constraint we already established."
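The replay side is plain prompt assembly: read the current playbook and prepend it to the session prompt. A sketch, continuing the file layout assumed above:

```python
# Sketch: inject the dreamed playbook into the next session's system prompt.
def build_system_prompt(task: str, playbook_path: str = "playbook.txt") -> str:
    with open(playbook_path) as f:
        rules = f.read().strip()
    return (
        "You are a long-running agent.\n\n"
        "Lessons from your own prior sessions (follow these unless the task "
        "explicitly overrides them):\n"
        f"{rules}\n\n"
        f"Current task:\n{task}"
    )
```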

Why It's Interesting

The bottleneck for long-running agents has never been raw model capability; it has been the difficulty of accumulating durable knowledge across sessions without devolving into a stale or contradictory rulebook. Dreaming is a candidate mechanism for that — lightweight, model-driven, auditable.

It also lines up with how Anthropic has been framing its agent work for the past year: keep the base model stable, push the dynamics into structured context and tools, and treat the agent as a system you can debug rather than a black box you retrain.

The Open Questions

The biggest unknown is robustness. A bad dream pass can teach an agent the wrong lesson and propagate it. Anthropic's answer is human-in-the-loop review for rule updates that touch high-stakes workflows, and a rollback path on every playbook change.
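The rollback half of that answer is cheap to build: snapshot every playbook write so a bad dream can be reverted with one copy. A sketch, again assuming the file-based playbook from the earlier examples:

```python
# Sketch: versioned playbook writes with a one-call rollback.
import shutil
import time
from pathlib import Path

HISTORY = Path("playbook_history")

def write_playbook(new_rules: str, playbook_path: str = "playbook.txt") -> Path:
    """Archive the current playbook, then overwrite it with the dreamed revision."""
    HISTORY.mkdir(exist_ok=True)
    snapshot = HISTORY / f"playbook-{int(time.time())}.txt"
    if Path(playbook_path).exists():
        shutil.copy(playbook_path, snapshot)
    Path(playbook_path).write_text(new_rules)
    return snapshot  # hand this to a reviewer; restore it to roll back

def rollback(snapshot: Path, playbook_path: str = "playbook.txt") -> None:
    shutil.copy(snapshot, playbook_path)
```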

The second question is portability. If the dreamed playbook is a chunk of natural-language context, it should travel between models. If it relies on Claude-specific behaviors, less so. Anthropic hasn't said much yet about cross-model evaluation.

The agents that survive in production aren't the smartest single-shot performers; they're the ones that learn from their own logs. Dreaming is a clean name for that workflow.

Tags: Anthropic • AI Agents • Research