1.7 Context Management for AI-Native Workflows

Your AI coding agent is only as good as the context it's working with. This section teaches you to manage your context window like a professional — keeping your agent sharp, focused, and productive throughout long development sessions.

Why Context Management Matters

Every interaction with an AI model accumulates tokens: your prompts, the model's responses, tool outputs, file contents read into context, and error messages. The context window — the model's "working memory" — is finite. And unlike a simple hard limit where things just stop working, the reality is worse: output quality degrades gradually as context fills.

This means your agent can slowly become less effective without any obvious signal that something is wrong. It still responds. It still writes code. But it starts missing details, ignoring instructions, and producing subtler bugs. Understanding this dynamic — and knowing how to counter it — is one of the most important skills you'll develop in this program.

What This Means for You

You don't need to know transformer mechanics to use Claude Code well, but five facts about how the model relates to context are worth carrying around as analogies. Each one drives a concrete habit:

The model has a finite scratchpad. As it fills, output quality drops silently before any error appears. Take this as an instruction to compact early — at the warning signs, not at the limit.
Edges win, middles lose. The model attends most strongly to what's at the top and bottom of a long prompt; what gets buried in the middle is often ignored. Put your ask last, your supporting context up top, and any constraints you really need followed at both ends.
The model has no memory between turns. The client re-sends the entire conversation each time you hit enter; the model itself remembers nothing. You own memory — write decisions, plans, and progress to files, not just to the chat.
Long prompts are not free. Doubling the context roughly quadruples the work and dilutes the model's focus across more tokens. Treat every frontloaded token (system prompt, tool definitions, steering files) as RAM you can't use for the actual task.
Wrong information in context is worse than missing information. Failed attempts and bad conclusions survive compaction and mislead future turns. The order of harm, from worst to most tolerable, is: incorrect information → missing information → noisy-but-accurate information. Prune dead ends with /rewind before you /compact.

If you want the mechanics behind these — what attention is, why filling the window costs more than you think, why information in the middle gets ignored — they're spelled out in the Context Internals appendix. The rest of 1.7 is the operator's manual.

Anatomy of a Prompt

When you send "Can you refactor this function?" to Claude Code, the model doesn't just see those six words. Andrej Karpathy's mental model — widely cited in this space — is useful here: the LLM is a CPU, the context window is its RAM. Every byte of that RAM that's already spoken for before your message arrives is RAM you can't use for the actual task.

Here's what gets frontloaded into the window, in order:

Layer	Approximate Tokens (Claude Code)
Base system prompt / harness preamble	~2,900
Built-in tool definitions (18–24 tools, with structured tool schemas)	~3,000
MCP (Model Context Protocol) server tool definitions	varies — often hundreds to thousands per server
Skill descriptions, system reminders	~300–500
Steering docs (`CLAUDE.md`, `AGENTS.md`)	500–2,000 (target <500)
Memory files, environment info (current working directory, git status, OS)	~200–500
Conversation history (compacted or raw)	grows each turn
Subtotal before your first message	~8,000–12,000

(Figures from Drew Breunig's reverse-engineering of Claude Code's system prompt.)

At Anthropic's own scale, a five-MCP-server setup consumed ~55K tokens in tool definitions alone. Internal measurements found 134K tokens of tool definitions in some configurations before the new Tool Search Tool was introduced to load tools on demand. On a 200K context window, that's more than half your RAM spoken for before your first prompt.

Structured prompting: Markdown first

The most reliable way to make a long prompt easy for the model to navigate is to give it structure. Use clear headings, numbered or bulleted lists, and labeled code fences for any code or output blocks. Markdown structure is "structured prompting" for both humans and the model — it gives the model anchors that resist dilution and makes the document easy for you to read back later.

## Goal
Refactor the `UserService` to remove the duplicated validation.

## Constraints
- Don't change the public API
- Keep the existing tests passing

## Files in scope
- `src/services/user.ts`
- `src/services/__tests__/user.test.ts`

That's all the structure most Claude Code prompts need.

XML tags (<instructions>, <context>, <document>) are an additional Anthropic-trained convention that's especially useful for programmatic API prompts, templates with variable substitution, and multi-document retrieval. The full XML treatment lives in the Context Internals appendix. For chatting in Claude Code, Markdown first.

Lost-in-the-middle-aware structure

The model attends most strongly to the top and bottom of a long prompt. Information in the middle gets disproportionately ignored. Two practical rules fall out of this:

Critical instructions at both ends. If a constraint really matters, state it once near the top and once near the bottom. The cost is a few dozen tokens; the benefit is reliability under long-context conditions.
Long documents above the query. When you're feeding in source files, logs, or reference material, put the documents above and the actual ask below. Anthropic's own long-context guidance reports up to 30% quality improvements from this ordering on 20K+ token inputs.

For ranked retrievals, reorder so the strongest content sits at the top and bottom of the list, weakest in the middle.

Token economics

Every frontloaded token is RAM you can't use for working memory. Three mitigations:

Prompt caching — Anthropic auto-caches stable system prompts and tool definitions, dramatically reducing per-turn cost when your prefix doesn't change.
MCP hygiene — only load Model Context Protocol servers you're actively using. Treat tool definitions as a context budget line item.
Tight CLAUDE.md — aim for under 500 tokens of steering content. Anything longer belongs in skills or referenced files loaded on demand.

Context Management Strategies

Once you're aware of what's in your context and why it matters, four strategies consistently come up across the practitioner literature. Liatrio's Spec-Driven Development workflow encodes most of them by design — specifically task decomposition, serialized external state, and structured outputs. Sub-agents are the exception: SDD is intentionally harness-agnostic, so it doesn't orchestrate sub-agents for you. That's a tool you apply on top of SDD, particularly to shape a high-quality input into the first SDD step.

Feeding SDD well. SDD produces better specs when its input is already a well-reasoned idea, not a vague prompt. A clear, researched problem statement fed into the spec step yields dramatically better specs than "hey, build me X." This is where sub-agents earn their keep: before you kick off SDD step 1, use sub-agents and meta-prompting to research the problem space, summarize prior art, inventory constraints, and draft a crisp prompt. That up-front context curation — which never needs to live in your main window — is often the difference between a spec you can execute and a spec you have to rewrite.

Compare the naive ask:

/sdd-skill Make me an audit-logging feature for the pet clinic.

…with a meta-prompt that does the legwork first:

Goal: Produce a tight, researched problem statement I can hand to /sdd-skill
for an audit-logging feature in the Emerald Grove Pet Clinic codebase.

# Research
- Use a research sub-agent to inventory existing logging, audit, and
  persistence patterns in this repo. Return only the relevant findings.
- Use context7 MCP in a sub-agent to pull current docs for the persistence + auth
  libraries actually in use here.
- Use exploratory sub-agents to identify likely entity types that need
  audit coverage, retention constraints, and any compliance hints in
  CLAUDE.md or related docs.

Then compose a single meta-prompt — goal, in-scope entities, constraints,
out-of-scope, and any open questions for me — and copy it to the clipboard
so I can paste it straight into the SDD spec phase.

Task decomposition

Splitting work into discrete phases — research → plan → implement → review — prevents a single context window from carrying raw exploration, design rationale, and code edits simultaneously. Each phase has a different "shape" of useful context, and compacting between phases lets you carry forward only the conclusions.

Aider operationalizes this with architect mode: a reasoning model thinks about the change, a separate editor model emits the diff. Decoupling reasoning from edit-format compliance measurably improves results.

SDD is exactly this pattern at the workflow level: spec → task breakdown → implement → validation. Each step has a dedicated phase, a dedicated artifact, and a clean handoff.

Serialized external state

The most powerful technique in this whole document, and the hardest habit to build: write state to disk. A progress.md, decisions.md, or spec file is memory that doesn't decay. You can clear your context window, start fresh, and reload only what's relevant for the next phase.

Anthropic's own multi-agent research system does this explicitly — from their engineering blog:

The LeadResearcher… saves its plan to Memory to persist the context, since if the context window exceeds 200,000 tokens it will be truncated.

The file is the compaction. Every SDD artifact (spec, task list, validation report) is a serialized state boundary by design.

Emerald Grove walkthrough — starting a feature

You've cloned the Emerald Grove Pet Clinic and you want to add an appointment-reminder email feature. Instead of one long session that holds the spec discussion, the task breakdown, and the implementation simultaneously:

Spec phase. Run SDD step 1, get a spec.md written to disk, review it, commit. /clear.
Task phase. Open a fresh window. SDD step 2 reads the spec from disk and writes the task list. Commit. /clear.
Implementation phase. Fresh window. SDD step 3 reads the task list and the spec it needs, makes its edits, commits the code.

At no point is your main window carrying the spec discussion and the task discussion and the code edits. Each phase gets a clean window with only the disk artifacts it actually needs.

Structured outputs

Free-form prose drifts; constrained formats don't. Asking the model to emit output in a defined shape — a template, a checklist, a markdown document with required sections, a typed artifact like a spec or task list — gives you something you can rely on, re-read, and reload later without losing fidelity. Narrative summaries lose details every time they're compacted; structured artifacts survive intact.

This is less about any specific feature (though providers like Anthropic offer strict tool-call schemas) and more about a habit: when you want the model's output to be reusable context on a future turn — a plan, a progress log, a decision record — ask for it in a format you can parse back in.

Sub-agents for context isolation

Claude Code's Agent tool spawns a sub-agent with its own system prompt, tool list, permissions, and — crucially — its own fresh context window. The only channel in is the prompt string; the only channel out is the final message. Fifty thousand tokens of grep output, file reads, or MCP responses never enter your main window; only the conclusion does.

Use sub-agents liberally for token-intensive work:

Codebase exploration — grep and glob searches (which run through the Bash tool on native macOS/Linux builds), reading many files to answer a single question
Research and summarization — web searches, long document digestion
MCP tool orchestration — when tool responses are large or numerous
Investigation — "find all callers of X," "audit this migration"

Be cautious about delegating coding itself to sub-agents. This is Cognition AI's core argument in "Don't Build Multi-Agents": when the task is tightly coupled and decisions build on each other — as in implementing a feature — parallel sub-agents lose each other's design choices and produce incoherent output. Their example: sub-agents asked to build a Flappy Bird game independently designed Mario-style graphics because neither saw the other's stylistic decisions.

The takeaway: sub-agents shine when the task is fan-out (many independent lookups, each returning a small conclusion) and stumble when the task is cohesive (a single thread of design judgment). Research, exploration, and summarization are perfect. Writing the feature itself is usually best kept in the main agent, with sub-agents feeding it curated inputs.

SDD ties most of these together

You don't need to invent this discipline yourself. Liatrio's SDD workflow encodes three of the four strategies directly:

Task decomposition → spec, task breakdown, implement, validation as distinct phases
Serialized external state → spec files, task lists, validation reports on disk
Structured outputs → spec documents, task entries, and validation reports follow defined shapes
Natural compaction checkpoints → each phase transition is a clean handoff

Sub-agents are not built into SDD (it's harness-agnostic) — that's where you add your own layer, especially at the front end to produce a strong prompt for the first SDD step. We'll cover this more concretely in SDD as a Context Management Strategy below.

Your Context Management Toolkit

Claude Code provides several mechanisms for managing context. Here's when and how to use each one.

`/compact` — Summarize and Continue

The /compact command summarizes your current conversation to free context space while preserving key decisions, state, and progress.

/compact

You can also provide focus instructions to guide what the summary preserves:

/compact focus on the database schema changes and migration plan

When to use: When your status line shows context climbing into the tens of thousands of tokens — well before ~100K — or when you notice the agent starting to lose track of earlier decisions. Compact before quality degrades, not after.

Compaction Preserves Everything — Including Mistakes

/compact faithfully summarizes whatever is in your context, including failed attempts, wrong conclusions, and error loops. If your context is polluted, compaction serializes that pollution into the summary that guides all future work. See Intentional Compaction below for how to clean your context before compacting.

`/rewind` (`Esc+Esc`) — Selective Undo

The /rewind command (or the Esc+Esc keyboard shortcut) lets you undo back to a specific point in the conversation and compact from there. This is more surgical than /compact — it lets you discard a wrong direction entirely.

/rewind

When to use: When the agent has gone down a wrong path and you want to back up to before the mistake. Rather than trying to correct accumulated errors, rewind to a clean state and redirect.

/rewind isn't just an undo button — it's a context quality tool. Every wrong path the agent explores leaves tokens in your context that will survive compaction. By rewinding past failed attempts before you compact, you ensure only validated, correct work gets summarized. Think of /rewind as protecting the quality of your next /compact.

`/clear` — Full Reset

The /clear command wipes the entire conversation history and starts fresh.

/clear

When to use: When switching to an entirely unrelated task, or when context has become so polluted that compaction won't help. Starting fresh is often faster than fighting accumulated confusion.

When using SDD, /clear between steps is a habit worth building. Because SDD serializes state to disk at every phase transition (spec file, task list, implementation commits, validation report), you can safely /clear between steps and reload only the artifact you need. The spec exists on disk; the task list exists on disk; nothing is lost. Treat each SDD phase boundary as a free /clear — it's one of the easiest ways to keep your token count low throughout a feature.

Status Line — Your Early Warning System

If you configured your status line in 1.2 Verify Installation, you have a live token counter at the bottom of your terminal. This is your dashboard — glance at it regularly.

When to use: Always. The status line is passive monitoring. When you see token usage climbing into yellow territory (~80K tokens) — and certainly red (~120K) — that's your cue to compact or clear. Watch tokens, not percent: quality degrades with the absolute number of tokens in context, so a percent gauge means different things on a 200K window than on a 1M window (see the Context Internals appendix for the evidence).

CLAUDE.md — Persistent Context That Survives Compaction

Your project's CLAUDE.md file is loaded into context at the start of every session and after every compaction. This makes it the ideal place for instructions that must always be present: coding conventions, project architecture notes, workflow rules, and behavioral guidance.

When to use: For any instruction you find yourself repeating across sessions or after compaction. If you've told the agent the same thing three times, put it in CLAUDE.md.

`@` File References — Precision Context Loading

Instead of letting the agent read entire files into context, you can use @ references to load specific files precisely when needed.

@src/components/Header.tsx What props does this component accept?

When to use: When you need the agent to work with specific files but want to minimize unnecessary context consumption. This is especially valuable when working with large codebases where reading the wrong files can quickly eat your context budget.

Intentional Compaction — Curate Before You Compact

The tools above — /compact, /rewind, /clear — are powerful individually. But the real skill is knowing how to combine them. This is the idea behind Intentional Compaction, a concept articulated by Dex Horthy at HumanLayer.

The Problem with Naive Compaction

When you hit /compact after a messy session — one full of failed attempts, error loops, and wrong directions — the summarizer doesn't know which parts were mistakes. It faithfully condenses everything, including wrong conclusions and abandoned approaches. Your compacted context now carries that misinformation forward, subtly poisoning every subsequent interaction.

Not all context problems are equal. Dex Horthy identifies a clear hierarchy of harm:

Context Problem	Severity	Why
Incorrect information	Worst	Wrong conclusions survive compaction and actively mislead the agent into repeating mistakes
Missing information	Bad	Gaps can be re-discovered by reading files or asking questions; wrong conclusions cannot be easily detected
Too much noise	Tolerable	Verbose but at least accurate — the signal is still in there somewhere

This hierarchy is why naive compaction is dangerous: it converts noise (tolerable) into incorrect information (worst) by summarizing wrong paths as if they were valid work.

The `/rewind` → `/compact` Workflow

The fix is simple: curate your context before you compact it.

Recognize a dead end — the agent went down a wrong path, a fix didn't work, or you realize the approach was flawed
/rewind back to before the mistake — surgically remove the bad tokens from context
Continue from the clean checkpoint — redirect the agent on the correct path
/compact when ready — now the summary contains only validated, correct work

This two-step pattern — rewind then compact — ensures your compacted context is high-quality. You're not just saving space; you're curating what the agent remembers.

Serialized External State

The most powerful form of compaction isn't /compact at all — it's writing state to a file and clearing your window entirely. The pattern is:

Capture the meaningful state to disk (goal, decisions, current status, next steps)
/clear for a completely fresh context
Start the next session by loading only the curated state back in

This is the basis of Dex Horthy's Frequent Intentional Compaction (FIC) methodology — rather than letting context grow until it degrades, you proactively externalize state and restart with a clean window.

For most bootcamp work, you get this for free. SDD already serializes the right state at every phase transition (spec, tasks, commits, validation report). You don't need to hand-write a progress.md — you just /clear between SDD steps and reload the artifact you need. The workflow is doing FIC on your behalf.

Where you'll still do this by hand: long-running research and exploration that sits outside the SDD loop. When you've spent an hour with sub-agents exploring a codebase, reading docs, and weighing approaches, that accumulated understanding isn't captured by any SDD artifact yet. Before handing off to SDD step 1, have the agent write a short research summary to disk — then /clear and feed that summary into the spec step along with your actual prompt. You'll get a better spec and a leaner starting context.

Attribution

The concepts of Intentional Compaction and Frequent Intentional Compaction (FIC) were developed by Dex Horthy at HumanLayer. For a deeper dive, see the Advanced Context Engineering for Coding Agents repository.

SDD as a Context Management Strategy

The Spec-Driven Development workflow isn't just a development methodology — it's the context management strategy from the previous section, made concrete. Each SDD step serializes state to an external artifact, creating a natural compaction checkpoint:

SDD Step	Artifact Created	Context Implication
Specification	`spec.md`	Decisions captured in a file — safe to compact
Task Breakdown	Task list with proof artifacts	Plan exists outside context — safe to compact
Implementation	Code + tests committed to git	Work is saved — safe to compact
Validation	Validation report	Results recorded — safe to compact

After completing any SDD step, you can safely compact (or even clear) your context and reload from the artifact. The spec file, task list, or committed code contains everything the agent needs to continue. Treat each SDD step as an opportunity to reset your context.

In practice on Emerald Grove, this looks like running /clear between every SDD step and then kicking the next step off with an identifier (/sdd-3-implement spec 3). The SDD command locates the spec and task list on disk itself — you don't need to manually @-reference anything. You start each phase with a fresh window and get the full token budget for the work that phase actually does.

Best Practices

Watch your status line — compact proactively as context climbs toward ~80K tokens; by ~100K most models are measurably degraded, regardless of window size. (The older "40–60% utilization" rule of thumb is this same guidance on a 200K window — but percent rules break down on 1M-window models.) These numbers reflect today's attention architectures and will shift as models improve; the durable habit is proactive compaction, not any specific number — see the Context Internals appendix for why.
Start fresh rather than fighting accumulated context — if the agent seems confused or is ignoring instructions, a /clear and fresh start is often faster than trying to course-correct
"After two failed corrections, start fresh" — this is Anthropic's own guidance; if the agent isn't responding to feedback, the context is likely too polluted
Use sub-agents for token-intensive, fan-out tasks — research, exploration, summarization, and MCP-heavy work run in their own context windows. Keep coding itself in the main agent.
Use Plan Mode to reduce token consumption — Plan Mode (Shift+Tab) is a Claude Code mode that lets the agent read and reason but blocks edits, so exploration and planning generate fewer tokens than regular interactive sessions
Treat MCP and tool definitions as a context budget line item — only load MCP servers you're actively using; prune tools you don't need
Structure prompts with clear sections and put the ask last — Markdown headings, lists, and code fences cooperate with primacy and recency biases; place long documents above the query, the actual ask at the bottom
Configure PreCompact hooks to preserve critical instructions — Claude Code's hook system can run a script before each /compact to inject must-keep context (e.g. re-stating critical conventions), ensuring nothing essential is lost in the summary
Break long tasks into phases with explicit context boundaries — don't try to implement an entire feature in one session; use SDD steps or other natural breakpoints to segment work
Use context markers to detect instruction loss — add a directive to your CLAUDE.md requiring the agent to include a specific emoji (e.g., 🤖) at the start of every response. When the marker disappears, your agent has lost its instructions — time to compact or clear. This technique, popularized by Lada Kesseler, complements the status line: the status line monitors token count while context markers monitor instruction adherence. Liatrio's SDD workflow incorporates context markers as a best practice — check this project's own CLAUDE.md for a working example. For complex workflows, use different markers for different instruction layers so you can see at a glance exactly which instructions were lost.

Think of Context Like RAM

Treat your context window like RAM, not a hard drive. It's fast, powerful working memory — but it's finite and volatile. Your files, specs, and CLAUDE.md are your "disk storage." Save state to disk (artifacts, commits, CLAUDE.md) and keep your RAM (context) lean and focused on the immediate task.

Knowledge Check

Question 1 of 10

Context Management Knowledge Check

Q1. What makes context degradation particularly dangerous compared to a hard context limit?

It causes Claude to stop responding entirely when the limit is approached
It deletes your conversation history without warning when the window fills
Quality degrades gradually and silently — the agent keeps working but produces subtler bugs
It forces a mandatory session restart, causing you to lose all uncommitted work

Going Deeper

Everything in this section is the operator's manual: when to compact, when to clear, how to structure prompts, how SDD turns context management into something you mostly get for free. None of it requires knowing how attention works under the hood.

If you're curious about why it works the way it does — what attention actually is, why filling the context window costs more than it looks like it should, why information in the middle of a long prompt gets ignored, what "BOS token" and "softmax" mean, the deep XML-tag conventions Anthropic trains Claude on — head over to the Context Internals appendix. It's optional reading for the Forge, but it'll change how you see every prompt you write.

Why Context Management Matters​

What This Means for You​

Anatomy of a Prompt​

Structured prompting: Markdown first​

Lost-in-the-middle-aware structure​

Token economics​

Context Management Strategies​

Task decomposition​

Serialized external state​

Structured outputs​

Sub-agents for context isolation​

SDD ties most of these together​

Your Context Management Toolkit​

/compact — Summarize and Continue​

/rewind (Esc+Esc) — Selective Undo​

/clear — Full Reset​

Status Line — Your Early Warning System​

CLAUDE.md — Persistent Context That Survives Compaction​

@ File References — Precision Context Loading​

Intentional Compaction — Curate Before You Compact​

The Problem with Naive Compaction​

The /rewind → /compact Workflow​

Serialized External State​

SDD as a Context Management Strategy​

Best Practices​

Knowledge Check​