# 04: Context Engineering
30 minutes | You need: Claude Code running with some conversation history
## How Context Works

Everything Claude knows during a session lives in one place: the context window. It's a fixed-size buffer (200K tokens by default; 1M tokens on Max, Team, and Enterprise plans via opt-in) that holds everything:
```
┌───────────────────────────────────────────────┐
│ System prompt (~2K tokens, fixed)             │
│ CLAUDE.md (always loaded)                     │
│ Tool descriptions (MCP + built-in)            │
│ Memory index (~/.claude/MEMORY.md)            │
│ Memory topic files (loaded on demand)         │
│ ───────────────────────────────────────────── │
│ Conversation history (grows with usage)       │
│ Tool outputs (file reads, search results,     │
│   command output — this is the big one)       │
│ ───────────────────────────────────────────── │
│ Free space (what's left for reasoning)        │
└───────────────────────────────────────────────┘
```

When the window fills, quality degrades. The symptoms:
- Claude repeats information it already gave you
- Claude mixes up files from different parts of the conversation
- Claude applies conventions from a previous task that don’t apply now
Context engineering is the art of the tradeoff: Claude needs enough context to do the job right (relevant code, conventions, tool access) but every token you add leaves less room for reasoning. Too little context and Claude guesses wrong. Too much and Claude drowns in noise, forgets instructions, and loses coherence. The goal is the minimum context that produces the maximum quality output.
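The budget arithmetic behind that window breakdown can be sketched in a few lines. The overhead figures below are illustrative assumptions, not measured values (only the ~2K-token system prompt is stated above):

```python
# Rough accounting of a 200K-token context window.
WINDOW = 200_000

OVERHEAD = {
    "system_prompt": 2_000,      # fixed (stated above)
    "claude_md": 1_500,          # assumption: a medium-sized CLAUDE.md
    "tool_descriptions": 5_000,  # assumption: built-in + a few MCP servers
    "memory": 1_000,             # assumption: index + loaded topic files
}

def free_space(history: int, tool_output: int) -> int:
    """Tokens left for reasoning after overhead, history, and tool output."""
    return WINDOW - sum(OVERHEAD.values()) - history - tool_output

# Early in a session: most of the window is still free.
print(free_space(history=5_000, tool_output=10_000))    # 175500
# After heavy exploration: tool outputs dominate, reasoning room shrinks.
print(free_space(history=30_000, tool_output=120_000))  # 40500
```

The second call illustrates why tool outputs are "the big one": a single heavy investigation can consume more of the window than everything else combined.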
## Do This

### 1. See where your tokens go

Run `/context` and study the breakdown. Notice how much is consumed before you've typed anything — system prompt, CLAUDE.md, tool descriptions.
### 2. Understand the context tax of tools

Every MCP server you connect adds tool descriptions to your context. A typical MCP server adds 10-50 tools, each with a description. Ten MCP servers can consume thousands of tokens of your window before you've asked a single question.
Tool Search (automatic on Sonnet 4+ and Opus 4+) helps by lazy-loading tool descriptions only when they seem relevant. But each search query still costs tokens — it’s not free, just deferred.
| Approach | Idle context cost | When to use |
|---|---|---|
| Skill wrapping a CLI (`gh`, `aws`, `docker`) | ~100 tokens (name + description only) | Tool has a good CLI |
| MCP server | Hundreds to thousands of tokens | No CLI exists, needs persistent connection, or needs structured I/O |
You’ll build skills in Module 6. For now, remember: fewer MCP servers = more room for actual work.
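The gap between the two rows of that table is easy to quantify. A minimal sketch, where the per-item figures are assumptions drawn from the rough ranges above (10-50 tools per server, ~100 tokens per skill entry):

```python
# Idle context cost: MCP servers vs skill-wrapped CLIs.
TOKENS_PER_TOOL_DESC = 300  # assumption: average MCP tool description size
TOOLS_PER_SERVER = 25       # midpoint of the 10-50 range
TOKENS_PER_SKILL = 100      # name + description only

def mcp_idle_cost(servers: int) -> int:
    """Tokens consumed before you ask anything, with every tool loaded."""
    return servers * TOOLS_PER_SERVER * TOKENS_PER_TOOL_DESC

def skill_idle_cost(skills: int) -> int:
    """Tokens consumed by skill stubs: just names and descriptions."""
    return skills * TOKENS_PER_SKILL

print(mcp_idle_cost(10))    # 75000 — a large slice of the window, sitting idle
print(skill_idle_cost(10))  # 1000 — roughly two orders of magnitude cheaper
```

Even if the real per-description sizes differ, the shape of the comparison holds: MCP cost scales with tool count times description size, while skill stubs stay flat and small.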
### 3. Compact with intent

After a research-heavy conversation:

`/compact Focus on the auth implementation decisions and test patterns`

Run `/context` before and after. Compaction compresses conversation history while preserving what you tell it to focus on.
Key facts:
- CLAUDE.md survives compaction — it’s always re-injected
- Conversation history does not — it gets summarized
- `/clear` is a nuclear option — use it when switching to a completely unrelated task
### 4. Save knowledge, start fresh

After a long investigation or design discussion, your context is full of exploration noise — file reads, dead ends, back-and-forth. The useful output is a fraction of the tokens consumed. Instead of compacting (which lossily summarizes everything), externalize the knowledge and start clean:
`Summarize everything we've decided about the auth refactor — architecture, API contracts, migration plan, open questions — into docs/auth-refactor-decisions.md`

Then `/clear` and start a new session:
`@docs/auth-refactor-decisions.md Implement the auth refactor based on these decisions.`

You now have a fresh context window with 100% free space, loaded with exactly the knowledge that matters — no investigation noise, no dead ends, no stale tool outputs.
### 5. Delegate to save context

`Use a subagent to investigate how our authentication system handles token refresh, map all the files involved, and summarize what I need to know to add a new token type.`

Run `/context` before and after. Your context barely grew, but you have a complete analysis.
Different subagent types exist:
- Explore agents — fast, read-only, typically use Haiku by default (configurable via `CLAUDE_CODE_SUBAGENT_MODEL`). Good for "find X" tasks
- General agents — full tool access, same model as you. Good for complex research
- Plan agents — read-only research for planning
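The context arithmetic of delegation is worth making explicit: the subagent's file reads land in its own window, and only the summary it returns lands in yours. A sketch with illustrative numbers (all assumptions):

```python
# Why delegating research to a subagent saves context.
def investigate_yourself(files_read: int, avg_file_tokens: int) -> int:
    """Your window grows by every file read during the investigation."""
    return files_read * avg_file_tokens

def delegate(summary_tokens: int) -> int:
    """Your window grows only by the subagent's returned summary."""
    return summary_tokens

print(investigate_yourself(20, 2_000))  # 40000 tokens of raw reads in your window
print(delegate(1_500))                  # 1500 tokens for the distilled answer
```

The subagent still spends those 40K tokens — but in a disposable window that is thrown away when the task completes, not in the one you need for the rest of your session.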
### 6. Model selection as strategy

Models are a cost/quality/speed tradeoff:
| Command | Cost | When to use |
|---|---|---|
| `/model sonnet` | See anthropic.com/pricing | 80% of daily work — fast, good enough |
| `/model opus` | Higher than Sonnet | Complex debugging, architecture, subtle bugs |
| `/effort low` | Less reasoning | Simple edits, formatting, quick answers |
| `/effort high` | More reasoning | Hard problems, multi-step logic |
## Power moves

| Command | What it does |
|---|---|
| `/btw [question]` | Ask a side question with no context cost — doesn't pollute your session |
| `/cost` | See real-time token spend for this session |
| `--max-budget-usd 5` | Set a cost cap (print mode only) — use as `claude -p --max-budget-usd 5 "query"` |
| `/fork` | Branch the conversation — try two approaches without losing either |
## Artifact

A context management practice: check `/context` regularly, compact with focus at breakpoints, delegate research to subagents, and match model to task difficulty.
## Go Deeper

See Playbook M04 — Context Engineering for the four failure modes of large contexts, the research-plan-implement workflow, and advanced compaction strategies.