Claude Code Token Usage: What It Costs and How to Cut It

April 5, 2026 · 7 min read

Claude Code counts both input and output tokens against your 5-hour rolling limit. Each tool call (grep, file read, ls, diff) can return thousands of tokens of output. The Token Limits proxy compresses every call's output before it counts, cutting costs by 60-80%.

Claude Code consumes tokens on every request: the prompt you type, the tool output Claude reads, and the response it generates. Most developers are surprised by how fast those tokens add up. A single grep with 100 results can use 18,000 tokens; ten file reads can use 100,000.

What contributes to Claude Code token usage?

  • Search calls (grep, search): 10-20k tokens per call when there are many results
  • File reads: 8-15k tokens depending on file size
  • Command execution (exec): 5-18k tokens depending on output length
  • Directory listings (ls): 2-6k tokens depending on directory size
  • Diffs (diff): 8-18k tokens depending on change size
  • Your chat history: grows with every message exchange, and Claude rereads it on each new turn

Token cost breakdown by tool type

Tool call type        Avg tokens sent   Avg tokens returned   Total
grep (50 matches)     150               15,000                15,150
file read (medium)    80                12,000                12,080
ls (300+ items)       60                4,500                 4,560
exec (command)        100               8,000                 8,100
diff (large file)     120               13,000                13,120

How Claude Code counts usage

Claude Code counts both input and output tokens against your limit. A tool call that sends 100 tokens and receives 15,000 tokens counts as 15,100 tokens total. Verbose tool output can therefore have a double impact: once on the way in (Claude reads the output) and potentially again on the way out (Claude quotes or summarizes it in its response).
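To make the counting rule concrete, here is a minimal Python sketch. It assumes, as this article does, that every token sent and received counts equally toward the limit; `counted_tokens` and `TABLE` are hypothetical names, and the numbers come from the examples in the text and the cost-breakdown table.

```python
def counted_tokens(sent: int, returned: int, response: int = 0) -> int:
    """Tokens one exchange adds to your usage, assuming all tokens weigh the same.

    sent:     tokens going out (your prompt, or the tool call itself)
    returned: tokens the tool sends back, which Claude then reads
    response: tokens in Claude's reply
    """
    return sent + returned + response


# The tool call from the text: 100 tokens sent, 15,000 returned.
print(counted_tokens(100, 15_000))  # 15100
# The FAQ example: 150-token prompt, 15,000-token tool output, 500-token response.
print(counted_tokens(150, 15_000, 500))  # 15650

# One call of each type from the cost-breakdown table: (tokens sent, tokens returned).
TABLE = {
    "grep (50 matches)": (150, 15_000),
    "file read (medium)": (80, 12_000),
    "ls (300+ items)": (60, 4_500),
    "exec (command)": (100, 8_000),
    "diff (large file)": (120, 13_000),
}
session_total = sum(counted_tokens(sent, returned) for sent, returned in TABLE.values())
print(f"{session_total:,}")  # 53,010
```

In other words, running just one call of each type from the table already adds about 53k tokens to your usage.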

The 5-hour rolling window

Claude Code has a rolling window limit that resets every 5 hours, not every day. If you use 100k tokens at 2pm, that usage stops counting against you at 7pm. Because the budget is set per 5-hour window rather than per day, a heavy coding session can use it up well before the window ends. The exact limit depends on your plan (Claude Pro vs. Max), but the 5-hour window applies to both.
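The window as described here can be modeled as a sliding window in which each chunk of usage stops counting 5 hours after it was spent. This is a sketch of that description, not Anthropic's published accounting; the `RollingWindow` class is hypothetical, and the limit is an arbitrary placeholder because real limits vary by plan.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


class RollingWindow:
    """Hypothetical model: each usage event ages out WINDOW after it was recorded."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # placeholder value; actual plan limits are not modeled here
        self._events: deque[tuple[datetime, int]] = deque()  # (time, tokens), oldest first

    def _expire(self, now: datetime) -> None:
        # Drop usage recorded 5 or more hours ago (assumes times arrive in order).
        while self._events and now - self._events[0][0] >= WINDOW:
            self._events.popleft()

    def used(self, now: datetime) -> int:
        self._expire(now)
        return sum(tokens for _, tokens in self._events)

    def record(self, now: datetime, tokens: int) -> bool:
        """Record usage; return False and record nothing if it would exceed the limit."""
        if self.used(now) + tokens > self.limit:
            return False
        self._events.append((now, tokens))
        return True


w = RollingWindow(limit=200_000)
two_pm = datetime(2026, 4, 5, 14, 0)
w.record(two_pm, 100_000)
print(w.used(two_pm + timedelta(hours=4)))  # 100000 at 6pm
print(w.used(two_pm + timedelta(hours=5)))  # 0 at 7pm, the 2pm usage has aged out
```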

How Token Limits proxy cuts costs

Token Limits proxy intercepts every tool call and compresses the output before Claude reads it. It removes duplicate lines, strips timestamps and verbose formatting, collapses blank lines, and intelligently summarizes repetitive content. The information Claude needs stays intact; the noise disappears.
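The transformations listed above can be sketched in a few lines of Python. This is not Token Limits' actual implementation: the timestamp pattern and the `[xN]` marker are assumptions, only consecutive duplicate lines are folded, and the summarization step is left out.

```python
import re

# Assumed timestamp shape: ISO-8601-like, optionally bracketed, e.g.
# "2026-04-05T14:03:22Z" or "[2026-04-05 14:03:22.123]".
TIMESTAMP = re.compile(r"\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\]?\s*")


def compress(output: str) -> str:
    """Strip timestamps, collapse blank runs, and fold consecutive duplicate lines."""
    # Remove timestamps and trailing whitespace so near-identical log lines compare equal.
    cleaned = [TIMESTAMP.sub("", line).rstrip() for line in output.splitlines()]
    out: list[str] = []
    i = 0
    while i < len(cleaned):
        line = cleaned[i]
        # Advance j past every consecutive copy of this line.
        j = i
        while j < len(cleaned) and cleaned[j] == line:
            j += 1
        run = j - i
        if line == "":
            out.append("")  # a run of blank (or whitespace-only) lines becomes one blank line
        elif run > 1:
            out.append(f"{line}  [x{run}]")  # fold repeats into one annotated line
        else:
            out.append(line)
        i = j
    return "\n".join(out)


raw = "\n".join([
    "2026-04-05T14:03:22Z WARN retrying connection",
    "2026-04-05T14:03:23Z WARN retrying connection",
    "2026-04-05T14:03:24Z WARN retrying connection",
    "",
    "",
    "",
    "[2026-04-05 14:03:30] build ok",
])
print(compress(raw))
```

Seven lines collapse to three: the three retry warnings become one annotated line, the blank run becomes a single blank line, and the timestamps are gone.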

Before and after compression

Tool type             Before compression   After compression   Reduction
grep (100 matches)    18,000 tokens        3,200 tokens        82%
file read (large)     15,000 tokens        2,700 tokens        82%
ls (500+ items)       12,000 tokens        1,800 tokens        85%
exec output           9,000 tokens         1,600 tokens        82%
diff (large change)   16,000 tokens        2,400 tokens        85%
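As a quick consistency check, the Reduction column can be recomputed from the before and after counts in the compression table. Combined across all five rows, 70,000 tokens shrink to 11,700, a reduction of about 83%. The names below are illustrative only.

```python
# (before, after) token counts copied from the compression table.
MEASUREMENTS = {
    "grep (100 matches)": (18_000, 3_200),
    "file read (large)": (15_000, 2_700),
    "ls (500+ items)": (12_000, 1_800),
    "exec output": (9_000, 1_600),
    "diff (large change)": (16_000, 2_400),
}


def reduction_pct(before: int, after: int) -> int:
    """Percent of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (before - after) / before)


for name, (before, after) in MEASUREMENTS.items():
    print(f"{name}: {reduction_pct(before, after)}%")

total_before = sum(b for b, _ in MEASUREMENTS.values())
total_after = sum(a for _, a in MEASUREMENTS.values())
print(f"all five: {reduction_pct(total_before, total_after)}%")  # all five: 83%
```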

Haiku subagents for background tasks

Token Limits automatically routes mechanical tool calls (file reads, searches, command execution) to Haiku, a faster and cheaper model. Complex reasoning stays on Claude Sonnet. This hybrid approach cuts background token costs by 40-50% without sacrificing accuracy.
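The routing rule can be sketched as a simple classifier. Token Limits' real classification logic isn't described here, so the tool names in `MECHANICAL_TOOLS`, the `needs_reasoning` flag, and the model labels below are all hypothetical.

```python
from enum import Enum


class Model(Enum):
    # Labels only; these are not real API model identifiers.
    HAIKU = "haiku"
    SONNET = "sonnet"


# Hypothetical set of tool types treated as "mechanical".
MECHANICAL_TOOLS = {"read", "grep", "search", "exec"}


def route(tool: str, needs_reasoning: bool = False) -> Model:
    """Send mechanical tool work to Haiku; keep everything else on Sonnet."""
    if tool in MECHANICAL_TOOLS and not needs_reasoning:
        return Model.HAIKU
    return Model.SONNET


print(route("grep").value)                         # haiku
print(route("plan").value)                         # sonnet
print(route("exec", needs_reasoning=True).value)   # sonnet
```

Defaulting to the stronger model for anything unrecognized is the conservative choice: a misrouted reasoning task costs accuracy, while a misrouted mechanical task only costs tokens.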

Prewritten scripts vs conversational tasks

Task                   Conversation tokens   Script tokens   Savings
Deploy and verify      8,000                 1,200           85%
Check failed build     6,500                 800             88%
GitHub checks          5,000                 600             88%
Database check         4,200                 500             88%
Infrastructure audit   7,000                 900             87%

Token Limits includes prewritten scripts for common tasks: /ship (deploy), /check-deploy (verify), /github-check (status), and more. These scripts run directly, without Claude planning each step, so there is no back-and-forth and no token-heavy exchange. Total savings: 20-30k tokens per session.
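The session figure is consistent with the table: running each of the five tasks once as a script instead of a conversation saves 26,700 tokens. The sketch below assumes exactly one run per task; real sessions will vary, and `TASKS` is an illustrative name.

```python
# (conversation tokens, script tokens) per task, from the scripts table.
TASKS = {
    "Deploy and verify": (8_000, 1_200),
    "Check failed build": (6_500, 800),
    "GitHub checks": (5_000, 600),
    "Database check": (4_200, 500),
    "Infrastructure audit": (7_000, 900),
}

# Tokens saved per task by running the script instead of a conversation.
saved = {task: conv - script for task, (conv, script) in TASKS.items()}

for task, tokens in saved.items():
    conv, _ = TASKS[task]
    print(f"{task}: {tokens:,} tokens saved ({round(100 * tokens / conv)}%)")

print(f"one run of each task: {sum(saved.values()):,} tokens saved")  # 26,700
```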

Understand your Claude Code usage, then cut it by 80%

Token Limits logs token counts per tool call so you can see exactly where usage goes, then compresses every call automatically. It works as a proxy for Claude Code and through MCP for everything else.

FAQ

How does Claude Code count tokens?

Both input tokens (your prompt + tool outputs Claude reads) and output tokens (Claude's response) count against your limit. A 150-token prompt + 15,000-token tool output + 500-token response = 15,650 tokens.

What is the Claude Code 5-hour limit?

Claude Code has a rolling usage window that resets every 5 hours, not every 24 hours. Heavy usage early in the window means hitting limits faster.

Does Claude Pro have a higher token limit than Max?

Both have rolling window limits; Max allows higher overall usage. Token Limits proxy benefits both equally by reducing tool output bloat.

Can compression reduce my tool call costs?

Yes. Compression reduces tool output tokens by 60-85%, which directly lowers your total usage. A day of coding with compression might save 50-100k tokens.

How much does Token Limits cost?

$5/month for the proxy and all features. Pays for itself in a single session if you run many tool calls.