Why Does Claude Burn Tokens So Fast? Tool Verbosity, Thinking, CLAUDE.md
You are coding in Claude Code. Twenty minutes in, you have already burned 200k tokens. Where did they all go? It turns out Claude consumes tokens from sources you never see: thinking tokens, system prompts, CLAUDE.md content, and noisy tool results. This guide explains where tokens really go and how Token Limits compresses the biggest offenders.
Sources of hidden token consumption
- ✓ System prompts: Claude's internal instructions (usually 20-50k tokens)
- ✓ Thinking tokens: Extended thinking adds 10-30k per request
- ✓ Tool results: grep, ls, file reads return thousands of lines
- ✓ CLAUDE.md: Project instructions added to context (often 5-20k tokens)
- ✓ Repeated tool calls: The same file read twice = double tokens
Tool results: The biggest culprit (50%+ of token usage)
A single grep search with 50 matches can return 15,000 tokens of output. Much of that output is:
- ✓ 30+ lines of repetitive paths and headers
- ✓ Line numbers (one per match)
- ✓ Decorative brackets and formatting
- ✓ Blank lines for visual spacing
- ✓ Color codes (ANSI escape sequences)
Only about 25% of those 15,000 tokens carry actual information; the other 75% is waste. Token Limits strips the waste.
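If you want a rough sense of that ratio in your own output, the sketch below (a hypothetical script; it counts characters rather than real tokens, so treat the numbers as approximate) tallies how much of a raw grep result is color codes, path/line-number prefixes, and blank lines:

```python
import re
import sys

# Rough waste breakdown for raw grep output. Character counts, not tokens,
# so the percentages are only indicative.
# Usage (hypothetical filename): grep -rn --color=always PATTERN src/ | python grep_waste.py
raw = sys.stdin.read()

ansi_chars = sum(len(m) for m in re.findall(r"\x1b\[[0-9;]*m", raw))
prefix_chars = sum(len(m.group(0)) for m in re.finditer(r"^\S+?:\d+:", raw, re.MULTILINE))
blank_lines = sum(1 for line in raw.splitlines() if not line.strip())

total = max(len(raw), 1)
print(f"color codes:        {ansi_chars} chars ({ansi_chars / total:.0%})")
print(f"path:line prefixes: {prefix_chars} chars ({prefix_chars / total:.0%})")
print(f"blank spacer lines: {blank_lines}")
```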
System prompts and safety tokens (15-20% of usage)
Claude has internal system prompts that guide its behavior. They are added to every request and count against your token limit. A reasonable estimate is 20-50k tokens per session for system prompts plus safety margin.
Thinking tokens: The expensive feature (if enabled)
If you have extended thinking enabled, Claude uses "thinking tokens" for internal reasoning. These cost tokens but do not appear in the output. A single request might use 10-30k thinking tokens.
Thinking is powerful for complex problems but expensive. For routine coding, consider disabling it.
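For context, when you call the API directly (outside Claude Code), extended thinking is an explicit, budgeted parameter. The sketch below uses the Anthropic Python SDK; the model name is an assumption, and in Claude Code itself thinking is toggled through its own settings rather than this parameter.

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking is opt-in and budgeted; every thinking token is billed
# even though the reasoning never appears in the final answer.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumption: any thinking-capable model
    max_tokens=16000,                  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Refactor this parser for readability."}],
)
print(response.usage)  # token accounting for the request

# Omitting the `thinking` parameter leaves thinking disabled, the cheaper
# default for routine edits.
```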
CLAUDE.md and project configuration (5-15% of usage)
Many projects have CLAUDE.md files with instructions, architecture notes, and workflow guidelines. Claude Code reads these and includes them in context. Large CLAUDE.md files (5-20k tokens) add up across a session.
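A quick way to see what your own CLAUDE.md costs per session, using the rough 4-characters-per-token heuristic for English text (an approximation, not a real tokenizer):

```python
from pathlib import Path

# Approximate the per-session cost of CLAUDE.md. The 4 chars/token figure is a
# rough heuristic for English prose, not an exact tokenizer.
text = Path("CLAUDE.md").read_text()
approx_tokens = len(text) / 4
print(f"{len(text):,} chars ≈ {approx_tokens:,.0f} tokens loaded into every session")
```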
Repeated tool calls burning double tokens
If you read the same file twice (once for context, once to confirm changes), you use 2x tokens. Token Limits caches and deduplicates: the second read costs 0 tokens.
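A minimal sketch of how read-deduplication can work (not Token Limits' actual implementation): hash each tool call and its result, and replace an exact repeat with a short stub.

```python
import hashlib

_seen: dict[str, str] = {}  # call fingerprint -> hash of last result

def dedupe_tool_result(tool_name: str, args: str, result: str) -> str:
    """Return the full result the first time, a short stub on exact repeats."""
    call_key = hashlib.sha256(f"{tool_name}:{args}".encode()).hexdigest()
    result_hash = hashlib.sha256(result.encode()).hexdigest()
    if _seen.get(call_key) == result_hash:
        # Same call, same content: a stub costs a handful of tokens instead of thousands.
        return f"[unchanged since previous {tool_name} call]"
    _seen[call_key] = result_hash
    return result
```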
Real token budget breakdown: Typical 30-minute session
| Source | Tokens | % of Total |
|---|---|---|
| Tool outputs (uncompressed) | 180k | 50% |
| System prompts + safety | 72k | 20% |
| Conversation messages | 54k | 15% |
| CLAUDE.md and config | 36k | 10% |
| Thinking (if enabled) | 18k | 5% |
| Total | 360k | 100% |
How Token Limits compression works
Token Limits targets the biggest offender: tool results. It strips or collapses the following (a minimal sketch of this kind of filtering follows the list):
- ✓ Timestamps: Strips dates, times (often repeated 50+ times per result)
- ✓ Blank lines: Removes visual spacing (costs tokens, adds no info)
- ✓ Line numbers: Strips leading numbers (Claude only needs content)
- ✓ Repeated headers: Collapses duplicate column labels
- ✓ Decorative formatting: Brackets, quotes, extra spaces
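Here is that sketch, assuming plain-text tool output; Token Limits' real rules are more careful about not dropping meaningful content:

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*m")                        # color escape sequences
TIMESTAMP = re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(?::\d{2})?\b")
LINE_NO = re.compile(r"^\s*\d+[:|]\s?")                     # leading line numbers

def compress_tool_result(raw: str) -> str:
    out: list[str] = []
    last_kept = None
    for line in raw.splitlines():
        line = ANSI.sub("", line)        # strip color codes
        line = TIMESTAMP.sub("", line)   # strip repeated dates/times
        line = LINE_NO.sub("", line)     # strip leading line numbers
        line = line.rstrip()
        if not line.strip():             # drop blank spacer lines
            continue
        if line == last_kept:            # collapse duplicate headers
            continue
        out.append(line)
        last_kept = line
    return "\n".join(out)
```

The whitespace and duplicate-header rules matter most for grep and ls output, where the same structure repeats on every line.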
Real impact: With Token Limits compression
Same 30-minute session with Token Limits proxy installed:
| Source | Tokens Before | Tokens After | Savings |
|---|---|---|---|
| Tool outputs | 180k | 36k | 80% |
| System prompts | 72k | 72k | 0% (not compressed) |
| Conversation | 54k | 54k | 0% (not compressed) |
| CLAUDE.md | 36k | 36k | 0% (not compressed) |
| Thinking | 18k | 18k | 0% (not compressed) |
| Total | 360k | 216k | 40% |
Cut token consumption by 40-60% with Token Limits
Automatically compress the biggest token offender: tool results. Same information, fraction of the tokens. Install in 2 minutes.
FAQ
Can I reduce CLAUDE.md size to save tokens?
Yes. Trim instructions that are not actively needed. Keep what is critical for the current project.
Should I disable thinking tokens?
If you are hitting limits frequently, try disabling thinking. You lose some reasoning power but save the 10-30k thinking tokens each request can consume.
Why does Token Limits only compress tool results, not system prompts?
System prompts are part of Claude's core behavior; Token Limits cannot modify them. But compressing tool results (the largest share of usage) is enough to fix most limit problems.
Does deduplication really help that much?
Yes. In typical coding sessions, the same file is read 2-5 times. Dedup saves 5-20k tokens per session.
What other sources of token waste can I fix myself?
Keep your conversation focused (avoid long tangents). Clear old chat history. Disable thinking if you do not need it. But Token Limits automates the biggest savings.