# 04: Context Engineering
30 minutes | You need: Claude Code running with some conversation history
## How Context Works

Everything Claude knows during a session lives in one place: the context window. It's a fixed-size buffer (200K tokens by default; 1M tokens on Max, Team, and Enterprise plans via opt-in) that holds everything:
```
┌───────────────────────────────────────────────┐
│ System prompt (~2K tokens, fixed)             │
│ CLAUDE.md (always loaded)                     │
│ Tool descriptions (MCP + built-in)            │
│ Memory index (~/.claude/MEMORY.md)            │
│ Memory topic files (loaded on demand)         │
│ ───────────────────────────────────────────── │
│ Conversation history (grows with usage)       │
│ Tool outputs (file reads, search results,     │
│   command output — this is the big one)       │
│ ───────────────────────────────────────────── │
│ Free space (what's left for reasoning)        │
└───────────────────────────────────────────────┘
```

When the window fills, quality degrades. The symptoms:
- Claude repeats information it already gave you
- Claude mixes up files from different parts of the conversation
- Claude applies conventions from a previous task that don’t apply now
Context engineering is the art of the tradeoff: Claude needs enough context to do the job right (relevant code, conventions, tool access) but every token you add leaves less room for reasoning. Too little context and Claude guesses wrong. Too much and Claude drowns in noise, forgets instructions, and loses coherence. The goal is the minimum context that produces the maximum quality output.
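The budget arithmetic behind that window breakdown can be sketched in a few lines. The overhead figures below are illustrative assumptions, not measured values (only the ~2K-token system prompt is stated above):

```python
# Rough accounting of a 200K-token context window.
WINDOW = 200_000

OVERHEAD = {
    "system_prompt": 2_000,      # fixed (stated above)
    "claude_md": 1_500,          # assumption: a medium-sized CLAUDE.md
    "tool_descriptions": 5_000,  # assumption: built-in + a few MCP servers
    "memory": 1_000,             # assumption: index + loaded topic files
}

def free_space(history: int, tool_output: int) -> int:
    """Tokens left for reasoning after overhead, history, and tool output."""
    return WINDOW - sum(OVERHEAD.values()) - history - tool_output

# Early in a session: most of the window is still free.
print(free_space(history=5_000, tool_output=10_000))    # 175500
# After heavy exploration: tool outputs dominate, reasoning room shrinks.
print(free_space(history=30_000, tool_output=120_000))  # 40500
```

The second call illustrates why tool outputs are "the big one": a single heavy investigation can consume more of the window than everything else combined.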
## Do This

### 1. See where your tokens go

Run `/context` and study the breakdown. Notice how much is consumed before you've typed anything — system prompt, CLAUDE.md, tool descriptions.
### 2. Understand the context tax of tools

Every MCP server you connect adds tool descriptions to your context. A typical MCP server adds 10-50 tools, each with a description. Ten MCP servers can consume thousands of tokens of your window before you've asked a single question.
Tool Search (automatic on Sonnet 4+ and Opus 4+) helps by lazy-loading tool descriptions only when they seem relevant. But each search query still costs tokens — it’s not free, just deferred.
| Approach | Idle context cost | When to use |
|---|---|---|
| Skill wrapping a CLI (`gh`, `aws`, `docker`) | ~100 tokens (name + description only) | Tool has a good CLI |
| MCP server | Hundreds to thousands of tokens | No CLI exists, needs persistent connection, or needs structured I/O |
You’ll build skills in Module 6. For now, remember: fewer MCP servers = more room for actual work.
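The gap between the two rows of that table is easy to quantify. A minimal sketch, where the per-item figures are assumptions drawn from the rough ranges above (10-50 tools per server, ~100 tokens per skill entry):

```python
# Idle context cost: MCP servers vs skill-wrapped CLIs.
TOKENS_PER_TOOL_DESC = 300  # assumption: average MCP tool description size
TOOLS_PER_SERVER = 25       # midpoint of the 10-50 range
TOKENS_PER_SKILL = 100      # name + description only

def mcp_idle_cost(servers: int) -> int:
    """Tokens consumed before you ask anything, with every tool loaded."""
    return servers * TOOLS_PER_SERVER * TOKENS_PER_TOOL_DESC

def skill_idle_cost(skills: int) -> int:
    """Tokens consumed by skill stubs: just names and descriptions."""
    return skills * TOKENS_PER_SKILL

print(mcp_idle_cost(10))    # 75000 — a large slice of the window, sitting idle
print(skill_idle_cost(10))  # 1000 — roughly two orders of magnitude cheaper
```

Even if the real per-description sizes differ, the shape of the comparison holds: MCP cost scales with tool count times description size, while skill stubs stay flat and small.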
### 3. Compact with intent

After a research-heavy conversation:

`/compact Focus on the auth implementation decisions and test patterns`

Run `/context` before and after. Compaction compresses conversation history while preserving what you tell it to focus on.
Key facts:
- CLAUDE.md survives compaction — it’s always re-injected
- Conversation history does not — it gets summarized
- `/clear` is a nuclear option — use it when switching to a completely unrelated task
### 4. Save knowledge, start fresh

After a long investigation or design discussion, your context is full of exploration noise — file reads, dead ends, back-and-forth. The useful output is a fraction of the tokens consumed. Instead of compacting (which lossily summarizes everything), externalize the knowledge and start clean:
`Summarize everything we've decided about the auth refactor — architecture, API contracts, migration plan, open questions — into docs/auth-refactor-decisions.md`

Then `/clear` and start a new session:
`@docs/auth-refactor-decisions.md Implement the auth refactor based on these decisions.`

You now have a fresh context window with 100% free space, loaded with exactly the knowledge that matters — no investigation noise, no dead ends, no stale tool outputs.
### 5. Delegate to save context

`Use a subagent to investigate how our authentication system handles token refresh, map all the files involved, and summarize what I need to know to add a new token type.`

Run `/context` before and after. Your context barely grew, but you have a complete analysis.
Different subagent types exist:
- Explore agents — fast, read-only, typically use Haiku by default (configurable via `CLAUDE_CODE_SUBAGENT_MODEL`). Good for "find X" tasks
- General agents — full tool access, same model as you. Good for complex research
- Plan agents — read-only research for planning
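The context arithmetic of delegation is worth making explicit: the subagent's file reads land in its own window, and only the summary it returns lands in yours. A sketch with illustrative numbers (all assumptions):

```python
# Why delegating research to a subagent saves context.
def investigate_yourself(files_read: int, avg_file_tokens: int) -> int:
    """Your window grows by every file read during the investigation."""
    return files_read * avg_file_tokens

def delegate(summary_tokens: int) -> int:
    """Your window grows only by the subagent's returned summary."""
    return summary_tokens

print(investigate_yourself(20, 2_000))  # 40000 tokens of raw reads in your window
print(delegate(1_500))                  # 1500 tokens for the distilled answer
```

The subagent still spends those 40K tokens — but in a disposable window that is thrown away when the task completes, not in the one you need for the rest of your session.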
### 6. Model selection as strategy

Models are a cost/quality/speed tradeoff:
| Command | Cost | When to use |
|---|---|---|
| `/model sonnet` | See anthropic.com/pricing | 80% of daily work — fast, good enough |
| `/model opus` | Higher than Sonnet | Complex debugging, architecture, subtle bugs |
| `/effort low` | Less reasoning | Simple edits, formatting, quick answers |
| `/effort high` | More reasoning | Hard problems, multi-step logic |
## Power moves

| Command | What it does |
|---|---|
| `/btw [question]` | Ask a side question with no context cost — doesn't pollute your session |
| `/cost` | See real-time token spend for this session |
| `--max-budget-usd 5` | Set a cost cap (print mode only) — use as `claude -p --max-budget-usd 5 "query"` |
| `/fork` | Branch the conversation — try two approaches without losing either |
## Artifact

A context management practice: check `/context` regularly, compact with focus at breakpoints, delegate research to subagents, and match model to task difficulty.
## Go Deeper

See Playbook M04 — Context Engineering for the four failure modes of large contexts, the research-plan-implement workflow, and advanced compaction strategies.