Why Does Claude Burn Tokens So Fast? Tool Verbosity, Thinking, CLAUDE.md
You are coding in Claude Code. Twenty minutes in, you have already burned 200k tokens. Where did they all go? It turns out Claude consumes tokens from sources you never see: thinking tokens, system prompts, CLAUDE.md content, and noisy tool results. This guide explains where tokens really go and how Token Limits compresses the biggest offenders.
Sources of hidden token consumption
- ✓ System prompts: Claude's internal instructions (usually 20-50k tokens)
- ✓ Thinking tokens: Extended thinking adds 10-30k per request
- ✓ Tool results: grep, ls, file reads return thousands of lines
- ✓ CLAUDE.md: Project instructions added to context (often 5-20k tokens)
- ✓ Repeated tool calls: The same file read twice = double tokens
Tool results: The biggest culprit (50%+ of token usage)
A single grep search with 50 matches can return 15,000 tokens of output. Much of that output is:
- ✓ 30+ lines of repetitive paths and headers
- ✓ Line numbers (one per match)
- ✓ Decorative brackets and formatting
- ✓ Blank lines for visual spacing
- ✓ Color codes (ANSI escape sequences)
Only about 25% of those 15,000 tokens carry actual information; the other 75% is waste. Token Limits strips the waste.
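If you want a rough sense of that ratio in your own output, the sketch below (a hypothetical script; it counts characters rather than real tokens, so treat the numbers as approximate) tallies how much of a raw grep result is color codes, path/line-number prefixes, and blank lines:

```python
import re
import sys

# Rough waste breakdown for raw grep output. Character counts, not tokens,
# so the percentages are only indicative.
# Usage (hypothetical filename): grep -rn --color=always PATTERN src/ | python grep_waste.py
raw = sys.stdin.read()

ansi_chars = sum(len(m) for m in re.findall(r"\x1b\[[0-9;]*m", raw))
prefix_chars = sum(len(m.group(0)) for m in re.finditer(r"^\S+?:\d+:", raw, re.MULTILINE))
blank_lines = sum(1 for line in raw.splitlines() if not line.strip())

total = max(len(raw), 1)
print(f"color codes:        {ansi_chars} chars ({ansi_chars / total:.0%})")
print(f"path:line prefixes: {prefix_chars} chars ({prefix_chars / total:.0%})")
print(f"blank spacer lines: {blank_lines}")
```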
System prompts and safety tokens (15-20% of usage)
Claude has internal system prompts that guide its behavior. They are added to every request and count against your token limit. A reasonable estimate is 20-50k tokens per session for system prompts plus safety margin.
Thinking tokens: The expensive feature (if enabled)
If you have extended thinking enabled, Claude uses "thinking tokens" for internal reasoning. These cost tokens but do not appear in the output. A single request might use 10-30k thinking tokens.
Thinking is powerful for complex problems but expensive. For routine coding, consider disabling it.
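For context, when you call the API directly (outside Claude Code), extended thinking is an explicit, budgeted parameter. The sketch below uses the Anthropic Python SDK; the model name is an assumption, and in Claude Code itself thinking is toggled through its own settings rather than this parameter.

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking is opt-in and budgeted; every thinking token is billed
# even though the reasoning never appears in the final answer.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumption: any thinking-capable model
    max_tokens=16000,                  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Refactor this parser for readability."}],
)
print(response.usage)  # token accounting for the request

# Omitting the `thinking` parameter leaves thinking disabled, the cheaper
# default for routine edits.
```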
CLAUDE.md and project configuration (5-15% of usage)
Many projects have CLAUDE.md files with instructions, architecture notes, and workflow guidelines. Claude Code reads these and includes them in context. Large CLAUDE.md files (5-20k tokens) add up across a session.
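A quick way to see what your own CLAUDE.md costs per session, using the rough 4-characters-per-token heuristic for English text (an approximation, not a real tokenizer):

```python
from pathlib import Path

# Approximate the per-session cost of CLAUDE.md. The 4 chars/token figure is a
# rough heuristic for English prose, not an exact tokenizer.
text = Path("CLAUDE.md").read_text()
approx_tokens = len(text) / 4
print(f"{len(text):,} chars ≈ {approx_tokens:,.0f} tokens loaded into every session")
```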
Repeated tool calls burning double tokens
If you read the same file twice (once for context, once to confirm changes), you use 2x tokens. Token Limits caches and deduplicates: the second read costs 0 tokens.
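A minimal sketch of how read-deduplication can work (not Token Limits' actual implementation): hash each tool call and its result, and replace an exact repeat with a short stub.

```python
import hashlib

_seen: dict[str, str] = {}  # call fingerprint -> hash of last result

def dedupe_tool_result(tool_name: str, args: str, result: str) -> str:
    """Return the full result the first time, a short stub on exact repeats."""
    call_key = hashlib.sha256(f"{tool_name}:{args}".encode()).hexdigest()
    result_hash = hashlib.sha256(result.encode()).hexdigest()
    if _seen.get(call_key) == result_hash:
        # Same call, same content: a stub costs a handful of tokens instead of thousands.
        return f"[unchanged since previous {tool_name} call]"
    _seen[call_key] = result_hash
    return result
```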
Real token budget breakdown: Typical 30-minute session
| Source | Tokens | % of Total |
|---|---|---|
| Tool outputs (uncompressed) | 180k | 50% |
| System prompts + safety | 72k | 20% |
| Conversation messages | 54k | 15% |
| CLAUDE.md and config | 36k | 10% |
| Thinking (if enabled) | 18k | 5% |
| Total | 360k | 100% |
How Token Limits compression works
Token Limits targets the biggest offender: tool results. It strips or collapses the following (a minimal sketch of this kind of filtering follows the list):
- ✓ Timestamps: Strips dates, times (often repeated 50+ times per result)
- ✓ Blank lines: Removes visual spacing (costs tokens, adds no info)
- ✓ Line numbers: Strips leading numbers (Claude only needs content)
- ✓ Repeated headers: Collapses duplicate column labels
- ✓ Decorative formatting: Brackets, quotes, extra spaces
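Here is that sketch, assuming plain-text tool output; Token Limits' real rules are more careful about not dropping meaningful content:

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*m")                        # color escape sequences
TIMESTAMP = re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(?::\d{2})?\b")
LINE_NO = re.compile(r"^\s*\d+[:|]\s?")                     # leading line numbers

def compress_tool_result(raw: str) -> str:
    out: list[str] = []
    last_kept = None
    for line in raw.splitlines():
        line = ANSI.sub("", line)        # strip color codes
        line = TIMESTAMP.sub("", line)   # strip repeated dates/times
        line = LINE_NO.sub("", line)     # strip leading line numbers
        line = line.rstrip()
        if not line.strip():             # drop blank spacer lines
            continue
        if line == last_kept:            # collapse duplicate headers
            continue
        out.append(line)
        last_kept = line
    return "\n".join(out)
```

The whitespace and duplicate-header rules matter most for grep and ls output, where the same structure repeats on every line.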
Real impact: With Token Limits compression
Same 30-minute session with Token Limits proxy installed:
| Source | Tokens Before | Tokens After | Savings |
|---|---|---|---|
| Tool outputs | 180k | 36k | 80% |
| System prompts | 72k | 72k | 0% (not compressed) |
| Conversation | 54k | 54k | 0% (not compressed) |
| CLAUDE.md | 36k | 36k | 0% (not compressed) |
| Thinking | 18k | 18k | 0% (not compressed) |
| Total | 360k | 216k | 40% |
Cut token consumption by 40-60% with Token Limits
Automatically compress the biggest token offender: tool results. Same information, fraction of the tokens. Install in 2 minutes.
FAQ
Can I reduce CLAUDE.md size to save tokens?
Yes. Trim instructions that are not actively needed. Keep what is critical for the current project.
Should I disable thinking tokens?
If you are hitting limits frequently, try disabling thinking. You lose some reasoning power but save the 10-30k thinking tokens each request can consume.
Why does Token Limits only compress tool results, not system prompts?
System prompts are part of Claude's core behavior; Token Limits cannot modify them. But compressing tool results (the largest share of usage) is enough to fix most limit problems.
Does deduplication really help that much?
Yes. In typical coding sessions, the same file is read 2-5 times. Dedup saves 5-20k tokens per session.
What other sources of token waste can I fix myself?
Keep your conversation focused (avoid long tangents). Clear old chat history. Disable thinking if you do not need it. But Token Limits automates the biggest savings.