Claude Code Token Usage: What It Costs and How to Cut It
Claude Code counts both input and output tokens against your 5-hour rolling limit. Each tool call (grep, file read, ls, diff) can return thousands of tokens of output. The Token Limits proxy compresses each tool output before it counts against your limit, cutting tool output tokens by 60-85%.
Claude Code charges tokens for every request: the prompt you type, the tool output Claude reads, and the response it generates. Many developers are surprised by how fast tokens add up. A single grep with 100 results can use 18,000 tokens. Ten file reads can use 100,000 tokens.
What contributes to Claude Code token usage?
- Searches (grep, search): 10-20k tokens per call with many results
- File reads: 8-15k tokens depending on file size
- Command execution (exec): 5-18k tokens depending on output length
- Directory listings (ls): 2-6k tokens depending on directory size
- Diffs (diff): 8-18k tokens depending on change size
- Your chat history: grows with every message exchange
Token cost breakdown by tool type
| Tool Call Type | Avg Tokens Sent (request) | Avg Tokens Returned (tool output) | Total |
|---|---|---|---|
| grep (50 matches) | 150 | 15,000 | 15,150 |
| file read (medium) | 80 | 12,000 | 12,080 |
| ls (300+ items) | 60 | 4,500 | 4,560 |
| exec (command) | 100 | 8,000 | 8,100 |
| diff (large file) | 120 | 13,000 | 13,120 |
How Claude Code counts usage
Claude Code counts both input and output tokens against your limit. A tool call that sends 100 tokens and receives 15,000 tokens counts as 15,100 tokens total. Verbose tool outputs can therefore count twice: once on the way in (Claude reading the output) and potentially again in Claude's reply (when it quotes or references that output).
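The arithmetic above can be sketched in a few lines of Python. This is an illustrative tally only, not Claude Code's actual accounting; the `ToolCall` class and `session_total` function are hypothetical names, and the figures are the averages from the breakdown table.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One tool call, measured in tokens (illustrative figures only)."""
    name: str
    sent_tokens: int      # tokens in the request that triggers the tool
    returned_tokens: int  # tokens in the tool output Claude reads

    @property
    def total(self) -> int:
        # Both directions count against the limit.
        return self.sent_tokens + self.returned_tokens


def session_total(calls: list[ToolCall]) -> int:
    """Sum the tokens counted for a list of tool calls."""
    return sum(call.total for call in calls)


calls = [
    ToolCall("grep", 150, 15_000),
    ToolCall("file read", 80, 12_000),
    ToolCall("ls", 60, 4_500),
]
print(session_total(calls))  # 31790
```

Three routine calls already add up to more than 30k tokens, before any chat history or model responses are counted.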
The 5-hour rolling window
Claude Code has a rolling window limit that resets every 5 hours, not every day. If you use 100k tokens at 2pm, your counter resets at 7pm. This means heavy coding sessions can hit limits quickly. The exact limit depends on your plan (Claude Pro vs Max), but the rolling window is consistent.
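A rough way to reason about the window, assuming it opens at your first use and lasts exactly five hours. The function names and the 500k limit below are hypothetical placeholders, since the real limit depends on your plan.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # window length stated in this article


def reset_time(window_start: datetime) -> datetime:
    """Return when a window that opened at window_start resets."""
    return window_start + WINDOW


def remaining(limit: int, used: int) -> int:
    """Tokens left in the current window, never below zero."""
    return max(limit - used, 0)


start = datetime(2025, 1, 6, 14, 0)  # 2pm on an arbitrary example date
print(reset_time(start).strftime("%H:%M"))     # 19:00
print(remaining(limit=500_000, used=100_000))  # 400000 (the limit is a made-up figure)
```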
How Token Limits proxy cuts costs
The Token Limits proxy intercepts every tool call and compresses the output before Claude reads it. It removes duplicate lines, strips timestamps and verbose formatting, collapses blank lines, and summarizes repetitive content. The information Claude needs stays intact; the noise disappears.
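To make those steps concrete, here is a minimal sketch of three of them: stripping timestamps, dropping duplicate lines, and collapsing blank lines. It is not the proxy's actual implementation; the `compress` function and the `TIMESTAMP` pattern are assumptions made for illustration, and the summarization step is left out.

```python
import re

# Assumed format: a leading ISO-style timestamp such as
# "2025-01-06 14:03:22 " or "2025-01-06T14:03:22.123Z ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\s*")


def compress(output: str) -> str:
    """Strip timestamps, drop repeated lines, and collapse blank-line runs."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in output.splitlines():
        line = TIMESTAMP.sub("", raw).rstrip()
        if not line:
            # Keep at most one blank line in a row, and none at the start.
            if result and result[-1] != "":
                result.append("")
            continue
        if line in seen:
            continue  # exact duplicate of an earlier line
        seen.add(line)
        result.append(line)
    return "\n".join(result).strip()


raw = "2025-01-06 14:03:22 build ok\n\n\n2025-01-06 14:03:23 build ok\nwarning: x\n\n"
print(compress(raw))
```

Global de-duplication like this suits logs and search results. For source files, where repeated lines such as closing braces carry meaning, a real tool would likely need a gentler rule.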
Before and after compression
| Tool Type | Before Compression | After Compression | Reduction |
|---|---|---|---|
| grep (100 matches) | 18,000 tokens | 3,200 tokens | 82% |
| file read (large) | 15,000 tokens | 2,700 tokens | 82% |
| ls (500+ items) | 12,000 tokens | 1,800 tokens | 85% |
| exec output | 9,000 tokens | 1,600 tokens | 82% |
| diff (large change) | 16,000 tokens | 2,400 tokens | 85% |
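The Reduction column is the share of tokens removed, rounded to the nearest percent. A small sketch for checking the figures or applying the same formula to your own measurements; `reduction_pct` is a hypothetical helper name.

```python
def reduction_pct(before: int, after: int) -> int:
    """Percentage of tokens removed, rounded to the nearest whole number."""
    if before <= 0:
        raise ValueError("before must be positive")
    return round(100 * (before - after) / before)


# (before, after) pairs taken from the table above
rows = {
    "grep": (18_000, 3_200),
    "file read": (15_000, 2_700),
    "ls": (12_000, 1_800),
    "exec": (9_000, 1_600),
    "diff": (16_000, 2_400),
}
for tool, (before, after) in rows.items():
    print(f"{tool}: {reduction_pct(before, after)}%")
```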
Haiku subagents for background tasks
Token Limits automatically routes mechanical tool calls (file reads, searches, command execution) to Haiku, a faster and cheaper model. Complex reasoning stays on Claude Sonnet. This hybrid approach cuts background token costs by 40-50% without sacrificing accuracy.
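The routing rule can be pictured as a simple lookup. This sketch only illustrates the idea; the tool names, the model labels, and the `pick_model` function are hypothetical and do not reflect how Token Limits is actually configured.

```python
# Hypothetical set of "mechanical" task kinds; the real list is not documented here.
MECHANICAL_TOOLS = {"read", "grep", "search", "exec"}


def pick_model(task_kind: str) -> str:
    """Send mechanical tool calls to the cheaper model, everything else to Sonnet."""
    return "haiku" if task_kind in MECHANICAL_TOOLS else "sonnet"


print(pick_model("grep"))           # haiku
print(pick_model("plan-refactor"))  # sonnet
```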
Prewritten scripts vs conversational tasks
| Task | Conversation Tokens | Script Tokens | Savings |
|---|---|---|---|
| Deploy and verify | 8,000 | 1,200 | 85% |
| Check failed build | 6,500 | 800 | 88% |
| GitHub checks | 5,000 | 600 | 88% |
| Database check | 4,200 | 500 | 88% |
| Infrastructure audit | 7,000 | 900 | 87% |
Token Limits includes prewritten scripts for common tasks: /ship (deploy), /check-deploy (verify), /github-check (status), and more. These scripts run directly without Claude planning—no back-and-forth, no token-heavy exchanges. Total savings per session: 20-30k tokens.
Understand your Claude Code usage — then cut it 80%
Token Limits logs token counts per tool call so you can see exactly where usage goes, then compresses every call automatically. It runs as a proxy for Claude Code and as an MCP server for everything else.
FAQ
How does Claude Code count tokens?
Both input tokens (your prompt + tool outputs Claude reads) and output tokens (Claude's response) count against your limit. A 150-token prompt + 15,000-token tool output + 500-token response = 15,650 tokens.
What is the Claude Code 5 hour limit?
Claude Code has a rolling usage window that resets every 5 hours, not every 24 hours. Heavy usage early in the window means hitting limits faster.
Does Claude Pro have a higher token limit than Max?
No. Both plans have rolling window limits, but Max allows higher overall usage. The Token Limits proxy benefits both equally by reducing tool output bloat.
Can compression reduce my tool call costs?
Yes. Compression reduces tool output tokens by 60-85%, directly lowering your total usage. A day of coding with compression might save 50-100k tokens.
How much does Token Limits cost?
$5/month for the proxy and all features. Pays for itself in a single session if you run many tool calls.