Stop Wasting Tokens in Claude Code
Most Claude Code sessions waste 60-80% of tokens on bloated tool outputs. Each grep, file read, and ls command returns kilobytes of formatting, paths, and duplicate content. Token Limits proxy strips this noise automatically, giving you more conversation per token.
Claude Code relies on tools: file reads, searches, listings, diffs. These tools are verbose by design—they return complete, structured output. But they also return tons of noise: repeated headers, paths you already know, duplicate lines, verbose formatting.
Where are your tokens being wasted?
| Tool | Typical Output Size | Noise % | Compressible Tokens |
|---|---|---|---|
| grep (50 matches) | 15,000 tokens | 75% | 11,250 tokens |
| file read (medium) | 12,000 tokens | 70% | 8,400 tokens |
| ls (500 items) | 8,000 tokens | 80% | 6,400 tokens |
| git diff | 16,000 tokens | 75% | 12,000 tokens |
| npm list | 10,000 tokens | 75% | 7,500 tokens |
A single grep search with 50 matches can return 15,000 tokens. Of those, roughly 75% is duplicative—repeated file paths, line numbers, formatting brackets. Remove that noise and you save 11,250 tokens in one tool call.
What exactly is being wasted?
- Repeated file paths: seen in every match; the path is known context
- Line numbers: helpful for humans; the AI only needs the content
- Decorative formatting: brackets, quotes, extra spaces
- Duplicate headers: the same column headers repeated in lists
- Blank lines: visual spacing that costs tokens with zero information
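As an illustration only—not Token Limits' actual algorithm—here is a minimal Python sketch that removes three of the noise sources above (repeated paths, line numbers, and blank lines) from `grep -n`-style output:

```python
import re

def strip_grep_noise(raw: str) -> str:
    """Illustrative sketch: emit each file path once, drop line
    numbers and blank lines from grep -n style output."""
    out = []
    last_path = None
    for line in raw.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        m = re.match(r"^(.+?):(\d+):(.*)$", line)
        if not m:
            out.append(line)
            continue
        path, _lineno, content = m.groups()
        if path != last_path:
            out.append(f"{path}:")  # emit each path only once
            last_path = path
        out.append(content.strip())
    return "\n".join(out)

raw = (
    "src/app.ts:12:  const user = getUser(id);\n"
    "src/app.ts:47:  const user = cache.get(id);\n"
    "\n"
    "src/db.ts:3:  export function getUser(id: string) {\n"
)
compressed = strip_grep_noise(raw)
print(compressed)
```

Even on this four-line sample the output is noticeably shorter; real grep output with dozens of matches per file compresses far more, because the per-match overhead dominates.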
How much can you save?
A typical Claude Code session with 20 tool calls might consume 200k-400k tokens in total, most of them from tool outputs. With compression, that drops by 60-80%: same session, same results, a fraction of the usage.
The fix: Token Limits proxy
Install Token Limits as a proxy between Claude Code and the API. It intercepts tool results, compresses them, and forwards the compressed version to Claude. All compression happens locally on your machine.
1. Install: npm install -g token-limits
2. Run: token-limits start
3. In Claude Code settings (cmd+comma): Tools > API URL > http://localhost:4800
4. Done. All future tool calls are compressed.
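Once started, you can sanity-check that the proxy is listening before pointing Claude Code at it. A minimal sketch using only the Python standard library (port 4800 is the default from the steps above):

```python
import socket

def proxy_is_up(host: str = "localhost", port: int = 4800) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False

if proxy_is_up():
    print("Token Limits proxy is listening on http://localhost:4800")
else:
    print("Proxy not reachable; run `token-limits start` first")
```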
Real compression numbers
| Tool | Before Compression | After Compression | Savings |
|---|---|---|---|
| grep (50 matches) | 15,000 tokens | 2,700 tokens | 82% |
| file read | 12,000 tokens | 2,100 tokens | 82% |
| ls (500 items) | 8,000 tokens | 1,200 tokens | 85% |
| git diff | 16,000 tokens | 2,400 tokens | 85% |
| npm list | 10,000 tokens | 1,500 tokens | 85% |
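The savings column follows directly from the before/after counts. A quick check in Python (note the file-read row is 82.5% exact; the table rounds it to 82%):

```python
# Before/after token counts from the table above.
rows = {
    "grep (50 matches)": (15_000, 2_700),
    "file read": (12_000, 2_100),
    "ls (500 items)": (8_000, 1_200),
    "git diff": (16_000, 2_400),
    "npm list": (10_000, 1_500),
}

# Savings % = (before - after) / before
savings = {tool: round(100 * (before - after) / before, 1)
           for tool, (before, after) in rows.items()}

for tool, pct in savings.items():
    print(f"{tool}: {pct}% saved")
```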
Cut your Claude Code token usage by 80%
Token Limits proxy automatically compresses every grep, file read, ls, and diff before Claude reads it. Set it up in 2 minutes — runs locally, no API key needed for compression.
FAQ
What exactly does Token Limits compress?
Timestamps, blank lines, repeated paths, line numbers, duplicate headers, emoji, and verbose formatting. It preserves file contents, error messages, search matches, and all meaningful information.
Does compression affect accuracy?
No. Compression removes format noise that does not contain information. Claude still sees the actual content—files, errors, search results—just without the padding.
How much bandwidth does the proxy use?
Almost none. The proxy compresses tool results locally before forwarding them to the API, so request payloads actually get slightly smaller. The meaningful savings, though, are in tokens, not bandwidth.
Is it secure? Does it store my data?
The proxy runs locally on your machine. No data is sent to external servers. It is open-source and auditable.
What about Cursor, Windsurf, or other IDEs?
For IDEs other than Claude Code, use the Token Limits MCP server instead. It provides 8 compressed tools (local_read, expand, search, etc.) that replace the defaults.
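For IDEs that use the standard MCP server config format, an entry might look like the sketch below. Both the server name and the `mcp` subcommand are assumptions for illustration—check the Token Limits documentation for the exact invocation:

```json
{
  "mcpServers": {
    "token-limits": {
      "command": "token-limits",
      "args": ["mcp"]
    }
  }
}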