# Token Optimization Mastery

## Learning Objectives
- Apply model switching strategies by task type
- Practice session hygiene for minimal token waste
- Calculate real savings from optimization techniques
- Track costs with /cost and /insights
## The Token Budget Mindset
Claude Code on a Max subscription feels unlimited — and compared to API pricing, it is. But tokens still matter. Not because of cost (your subscription covers it), but because of quality. A context window filled with irrelevant history, unused MCP definitions, and stale file references produces worse results than a clean, focused context.
Token optimization is quality optimization.
## Strategy 1: Model Switching
Different tasks need different models. Don't use Opus for renaming variables, and don't use Haiku for architecture review.
### The Assignment Matrix
| Task | Model | Why |
|---|---|---|
| Architecture design | Opus | Complex reasoning, multi-factor analysis |
| Security review | Opus | Deep pattern recognition, threat modeling |
| Complex debugging | Opus | Subtle bug analysis, multi-file reasoning |
| Feature implementation | Sonnet | Balanced capability and speed |
| Writing tests | Sonnet | Good at pattern following |
| Code review | Sonnet | Catches issues without over-reasoning |
| Boilerplate generation | Haiku | Fast, pattern-based |
| Code exploration | Haiku | Quick lookups, simple queries |
| Renaming/formatting | Haiku | Mechanical tasks |
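The matrix above can be encoded as a small lookup table. This is an illustrative sketch, not a Claude Code API — the task labels and the Sonnet fallback are assumptions:

```python
# Hypothetical task-to-model routing mirroring the assignment matrix.
# Task labels are made up for illustration; only the model tiers come
# from the text.
MODEL_FOR_TASK = {
    "architecture": "opus",
    "security-review": "opus",
    "debugging": "opus",
    "feature": "sonnet",
    "tests": "sonnet",
    "code-review": "sonnet",
    "boilerplate": "haiku",
    "exploration": "haiku",
    "rename": "haiku",
}

def pick_model(task: str) -> str:
    """Default to the balanced tier when a task doesn't match a known category."""
    return MODEL_FOR_TASK.get(task, "sonnet")
```

The fallback choice matters: an unknown task defaulting to Sonnet errs toward capability without paying Opus prices.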
### Switching During a Session

```
/model opus
"Analyze the auth module architecture and identify single points of failure"

/model sonnet
"Implement the changes we discussed"

/model haiku
"Add JSDoc comments to all public functions"
```
Three model switches in one session — each task gets the right level of capability.
## Strategy 2: Session Hygiene

### One Task, One Session
The single most impactful habit:
```bash
# Session 1: Debug auth
claude -n "debug-auth"
# ... 30 messages ...
# Done. Exit.

# Session 2: Implement notifications
claude -n "notifications"
# Fresh context, no auth debugging history
```
### Compact at 80%

Don't wait for the 95% auto-compact:

```
/context                                          # Check usage
/compact "keep the payment refund logic details"  # Manual compact with hint
```
### Write, Then Compact

Save important findings before compacting:

```
"Add to CLAUDE.md: The payment timeout is caused by connection pool
exhaustion under high webhook load. Fix: increase pool size to 25."

/compact
```
## Strategy 3: Minimize Passive Token Costs

### Disconnect Unused MCP Servers
```
Connected servers — token cost per message:
  GitHub:      ~3,000 tokens
  PostgreSQL:  ~1,600 tokens
  Slack:       ~2,400 tokens
  Notion:      ~2,000 tokens
  ─────────────────────────────
  Total: ~9,000 tokens × 50 messages = 450,000 tokens

After disconnecting unused:
  GitHub:      ~3,000 tokens (actively using)
  ─────────────────────────────
  Total: ~3,000 tokens × 50 messages = 150,000 tokens

Saved: 300,000 tokens
```
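The arithmetic above, worked out in a few lines (the per-server token figures are the text's illustrative numbers, not measured values):

```python
# Passive MCP token cost: every connected server's tool definitions are
# included in every message, whether or not the server is used.
servers = {"github": 3000, "postgresql": 1600, "slack": 2400, "notion": 2000}
messages = 50

all_connected = sum(servers.values()) * messages  # 9,000 × 50
only_github = servers["github"] * messages        # 3,000 × 50
saved = all_connected - only_github

print(all_connected, only_github, saved)
# → 450000 150000 300000
```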
### Enable LSP

```json
{ "env": { "ENABLE_LSP_TOOL": "1" } }
```
```
Without LSP:
  Claude: "Let me find where createOrder is defined"
  → grep "createOrder" src/ → 45 results → ~3,000 tokens in context

With LSP:
  Claude: "Let me find where createOrder is defined"
  → go to definition → 1 result → ~50 tokens in context
```
Savings: ~2,950 tokens per navigation × 20 navigations per session = ~59,000 tokens saved
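Multiplying out the per-navigation figures from the text:

```python
# grep dumps ~3,000 tokens of matches into context; LSP go-to-definition
# returns one precise result at ~50 tokens.
grep_tokens, lsp_tokens = 3000, 50
navigations = 20  # per session

saved = (grep_tokens - lsp_tokens) * navigations
print(saved)
# → 59000
```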
### Reference Files Late

```
# Expensive: file tokens re-processed 50 times
Message 1: "Look at @src/services/auth.ts"
Messages 1-50: Auth file carried in all 50 messages

# Efficient: file tokens re-processed 5 times
Message 46: "Now look at @src/services/auth.ts"
Messages 46-50: Auth file carried in only 5 messages
```
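The cost difference follows from one fact: a file referenced at message k stays in context for every message from k to the end of the session. A minimal sketch, assuming a 2,000-token file for illustration:

```python
# Cumulative cost of an @file reference across a session: the file's
# tokens are re-processed in every message from first reference onward.
file_tokens, total_messages = 2000, 50

def reference_cost(first_message: int) -> int:
    """Total tokens spent carrying the file from first reference to session end."""
    return file_tokens * (total_messages - first_message + 1)

print(reference_cost(1), reference_cost(46))
# → 100000 10000
```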
## Strategy 4: Effort Matching

```
/effort low     # Renaming, formatting, boilerplate
/effort medium  # Standard features, tests
/effort high    # Complex debugging, architecture
```
Low effort uses ~1/3 the tokens of medium. For 10 simple tasks at low instead of medium, you save 60-70% of those tokens.
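Using the ~1/3 ratio from the text and an assumed 3,000-token medium task (an illustrative figure):

```python
# Savings from running simple tasks at /effort low instead of medium,
# given the text's claim that low uses roughly a third of medium's tokens.
medium_tokens_per_task = 3000  # assumed size of a simple task at medium effort
low_tokens_per_task = medium_tokens_per_task // 3
tasks = 10

saved = (medium_tokens_per_task - low_tokens_per_task) * tasks
fraction_saved = saved / (medium_tokens_per_task * tasks)
print(saved, round(fraction_saved, 2))
# → 20000 0.67
```

A ~67% saving on those tasks, squarely in the 60-70% range claimed above.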
## Tracking Costs

### /cost

Shows the cost of the current session:

```
/cost
# Session cost: $0.42 (15,200 input tokens, 3,400 output tokens)
```
### /insights

Shows usage analytics:

```
/insights
# This week:
#   Sessions: 23
#   Total tokens: 2.4M
#   Average session: 104K tokens
#   Most expensive session: "database migration" (340K tokens)
```
## The Real Savings Math
For a Max subscriber ($200/month), here's what optimization saves in equivalent API costs:
Baseline (no optimization):

```
50 messages/session × 3 sessions/day × 22 days
Average 150K tokens/session
Total: 9.9M tokens/month
API equivalent: ~$3,000/month
```

With optimization:

```
30 messages/session × 4 sessions/day × 22 days
(shorter sessions, more of them, less wasted context)
Average 80K tokens/session
Total: ~7.0M tokens/month
API equivalent: ~$2,100/month
```
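Reproducing the monthly totals (the dollar rate is implied by the text's "~$3,000 for 9.9M tokens" figure, not an official price):

```python
# Monthly token totals for the baseline vs. optimized workflow above.
def monthly_tokens(sessions_per_day: int, tokens_per_session: int, days: int = 22) -> int:
    return sessions_per_day * tokens_per_session * days

baseline = monthly_tokens(3, 150_000)   # 9.9M tokens
optimized = monthly_tokens(4, 80_000)   # ~7.0M tokens
rate = 3000 / baseline                  # $/token implied by the baseline figure

print(baseline, optimized, round(optimized * rate))
# → 9900000 7040000 2133
```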
Better still, a focused context produces better results:

- More accurate code (fewer retries)
- Fewer corrections needed
- Better architecture decisions
The token savings are real, but the quality improvement is worth even more.
## Quick Optimization Checklist
1. Enable ENABLE_LSP_TOOL=1 (biggest single win)
2. One task = one session
3. Disconnect unused MCP servers
4. Match model to task (Opus/Sonnet/Haiku)
5. Match effort to task (low/medium/high)
6. Compact at 80%, not 95%
7. Reference files late, not early
8. Write findings to CLAUDE.md before compacting
9. Use brief mode (CLAUDE_CODE_BRIEF=1) if you don't need explanations
10. Track with /cost and /insights
## Key Takeaway
Token optimization improves quality, not just cost. The biggest wins are: enabling LSP (saves thousands of tokens per session), one task per session (clean context), disconnecting unused MCP servers (eliminates passive token costs), and matching model and effort to the task. Track your usage with /cost and /insights. A Max subscription ($200/month) delivers $5,000-15,000+ in equivalent API value — optimization ensures you get the most quality from every token.