Claude Code Token Usage: What It Costs and How to Cut It

April 5, 2026—Token Limits Team—7 min read

Claude Code counts both input and output tokens against your 5-hour rolling limit. Each tool call (grep, file read, ls, diff) can return thousands of tokens of output. Token Limits proxy compresses each call's output before it counts, cutting costs by 60-80%.

Claude Code charges tokens for every request: the prompt you type, the tool output Claude reads, and the response it generates. Most developers are surprised by how fast tokens add up. A single grep with 100 results can use 18,000 tokens. Ten file reads can use 100,000 tokens.

What contributes to Claude Code token usage?

✓Search calls (grep, search): 10-20k tokens per call with many results
✓File reads: 8-15k tokens depending on file size
✓Command execution (exec): 5-18k tokens depending on output length
✓Directory listings (ls): 2-6k tokens depending on directory size
✓Diffs (diff): 8-18k tokens depending on change size
✓Your chat history: grows with every message exchange

Token cost breakdown by tool type

Tool Call Type	Avg Input Tokens	Avg Output Tokens	Total
grep (50 matches)	150	15,000	15,150
file read (medium)	80	12,000	12,080
ls (300+ items)	60	4,500	4,560
exec (command)	100	8,000	8,100
diff (large file)	120	13,000	13,120

How Claude Code counts usage

Claude Code counts both input and output tokens against your limit. A tool call that sends 100 tokens and receives 15,000 tokens counts as 15,100 tokens total. Verbose tool outputs can therefore cost you twice: once on the way in (Claude reading the output) and potentially again in Claude's reply (when it quotes or references that output).
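As a rough illustration of that arithmetic, the sketch below adds up the average figures from the breakdown table for a session that makes each call once. The ToolCall class and its field names are made up for this example; they are not part of Claude Code or Token Limits.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool call (illustrative names only)."""
    name: str
    input_tokens: int   # tokens sent with the call
    output_tokens: int  # tokens of tool output Claude reads

    @property
    def total(self) -> int:
        # Both directions count against the limit.
        return self.input_tokens + self.output_tokens


# Average figures copied from the breakdown table in this article.
calls = [
    ToolCall("grep (50 matches)", 150, 15_000),
    ToolCall("file read (medium)", 80, 12_000),
    ToolCall("ls (300+ items)", 60, 4_500),
    ToolCall("exec (command)", 100, 8_000),
    ToolCall("diff (large file)", 120, 13_000),
]

session_total = sum(call.total for call in calls)
output_share = sum(call.output_tokens for call in calls) / session_total

print(f"session total: {session_total:,} tokens")
print(f"share from tool output: {output_share:.1%}")
```

With these averages, nearly all of the total comes from tool output rather than from the calls themselves, which is why output size is the number worth watching.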

The 5-hour rolling window

Claude Code has a rolling window limit that resets every 5 hours, not every day. If you use 100k tokens at 2pm, those tokens count against your limit until 7pm. A heavy coding session can therefore exhaust the limit well before the window clears. The exact limit depends on your plan (Claude Pro vs Max), but the rolling window is consistent.
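One way to picture the window is as a running sum over the last 5 hours of usage. The sketch below is a simplified model under that assumption, not a description of how Claude Code actually tracks usage; the tokens_in_window function and the event list are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


def tokens_in_window(events, now):
    """Sum the tokens of events that are less than 5 hours old at `now`.

    `events` is a list of (timestamp, tokens) pairs. This is an assumed
    model of a rolling window, for illustration only.
    """
    cutoff = now - WINDOW
    return sum(tokens for timestamp, tokens in events if timestamp > cutoff)


# Hypothetical usage: 100k tokens at 2pm, another 40k at 4:30pm.
events = [
    (datetime(2026, 4, 5, 14, 0), 100_000),
    (datetime(2026, 4, 5, 16, 30), 40_000),
]

at_6pm = tokens_in_window(events, datetime(2026, 4, 5, 18, 0))
at_7pm = tokens_in_window(events, datetime(2026, 4, 5, 19, 0))

print(at_6pm)  # both events still inside the window
print(at_7pm)  # the 2pm usage has aged out
```

In this model the 2pm usage stops counting at 7pm, while the 4:30pm usage keeps counting until 9:30pm.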

How Token Limits proxy cuts costs

Token Limits proxy intercepts every tool call and compresses the output before Claude reads it. It removes duplicate lines, strips timestamps and verbose formatting, collapses blank lines, and intelligently summarizes repetitive content. The information Claude needs stays intact; the noise disappears.
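To make those steps concrete, here is a minimal sketch of three of them: stripping timestamps, dropping duplicate lines, and collapsing blank lines. It is not the Token Limits implementation. The compress function, the timestamp pattern, and the sample log are assumptions for illustration, and the summarization step is left out.

```python
import re

# Assumed timestamp shape: "2026-04-05 14:03:22" or "2026-04-05T14:03:22Z",
# optionally wrapped in square brackets. Real tool output may differ.
TIMESTAMP = re.compile(
    r"\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\]?\s*"
)


def compress(output: str) -> str:
    """Strip timestamps, drop repeated lines, collapse blank-line runs."""
    seen = set()
    kept = []
    for line in output.splitlines():
        line = TIMESTAMP.sub("", line).rstrip()
        if not line:
            # Keep at most one blank line in a row, and none at the start.
            if kept and kept[-1] != "":
                kept.append("")
            continue
        if line in seen:
            continue  # exact duplicate of an earlier line
        seen.add(line)
        kept.append(line)
    # Drop a trailing blank line left over from the input.
    while kept and kept[-1] == "":
        kept.pop()
    return "\n".join(kept)


raw = (
    "2026-04-05 14:03:22 build started\n"
    "2026-04-05 14:03:23 build started\n"
    "\n\n\n"
    "warning: unused variable\n"
    "warning: unused variable\n"
    "\n"
    "done\n"
)
compressed = compress(raw)
print(compressed)
```

Global deduplication like this is lossy. It suits logs and search results, but it would corrupt a file read where repeated lines (such as closing braces) matter, so a real compressor would need per-tool rules.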

Before and after compression

Tool Type	Before Compression	After Compression	Reduction
grep (100 matches)	18,000 tokens	3,200 tokens	82%
file read (large)	15,000 tokens	2,700 tokens	82%
ls (500+ items)	12,000 tokens	1,800 tokens	85%
exec output	9,000 tokens	1,600 tokens	82%
diff (large change)	16,000 tokens	2,400 tokens	85%
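The Reduction column can be reproduced from the before and after counts. This short sketch recomputes each row and adds an overall figure for a session that makes each call once; the reduction_pct helper is a name introduced for this example.

```python
def reduction_pct(before: int, after: int) -> int:
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round((1 - after / before) * 100)


# (before, after) token counts copied from the table above.
rows = {
    "grep (100 matches)": (18_000, 3_200),
    "file read (large)": (15_000, 2_700),
    "ls (500+ items)": (12_000, 1_800),
    "exec output": (9_000, 1_600),
    "diff (large change)": (16_000, 2_400),
}

per_tool = {name: reduction_pct(b, a) for name, (b, a) in rows.items()}

total_before = sum(b for b, _ in rows.values())
total_after = sum(a for _, a in rows.values())
overall = reduction_pct(total_before, total_after)

for name, pct in per_tool.items():
    print(f"{name}: {pct}%")
print(f"overall: {total_before:,} -> {total_after:,} tokens ({overall}%)")
```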

Haiku subagents for background tasks

Token Limits automatically routes mechanical tool calls (file reads, searches, command execution) to Haiku, a faster and cheaper model. Complex reasoning stays on Claude Sonnet. This hybrid approach cuts background token costs by 40-50% without sacrificing accuracy.
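The routing rule can be pictured as a simple lookup. The sketch below is an assumption about the general shape of such a rule, not Token Limits' actual logic; the task labels, the pick_model function, and the plain "haiku" and "sonnet" strings are placeholders rather than real model identifiers.

```python
# Hypothetical labels for the mechanical work named above:
# file reads, searches, and command execution.
MECHANICAL_TASKS = {"file_read", "search", "exec"}


def pick_model(task_kind: str) -> str:
    """Send mechanical tasks to the cheaper model, everything else to Sonnet."""
    return "haiku" if task_kind in MECHANICAL_TASKS else "sonnet"


for kind in ("file_read", "search", "exec", "design_review"):
    print(f"{kind} -> {pick_model(kind)}")
```

Defaulting unknown task kinds to the stronger model is a deliberate choice in this sketch: a misrouted mechanical task costs a few extra tokens, while a misrouted reasoning task could cost accuracy.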

Prewritten scripts vs conversational tasks

Task	Conversation Tokens	Script Tokens	Savings
Deploy and verify	8,000	1,200	85%
Check failed build	6,500	800	88%
GitHub checks	5,000	600	88%
Database check	4,200	500	88%
Infrastructure audit	7,000	900	87%

Token Limits includes prewritten scripts for common tasks: /ship (deploy), /check-deploy (verify), /github-check (status), and more. These scripts run directly without Claude planning—no back-and-forth, no token-heavy exchanges. Total savings per session: 20-30k tokens.
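As a sanity check on the per-session figure, the sketch below sums the table for a hypothetical session that runs each of the five tasks once. The one-run-per-task assumption is ours; real sessions will vary.

```python
# (conversation tokens, script tokens) copied from the table above.
tasks = {
    "Deploy and verify": (8_000, 1_200),
    "Check failed build": (6_500, 800),
    "GitHub checks": (5_000, 600),
    "Database check": (4_200, 500),
    "Infrastructure audit": (7_000, 900),
}

conversation_total = sum(conv for conv, _ in tasks.values())
script_total = sum(script for _, script in tasks.values())
saved = conversation_total - script_total

print(f"conversational: {conversation_total:,} tokens")
print(f"scripted:       {script_total:,} tokens")
print(f"saved:          {saved:,} tokens")
```

Under that assumption the saving lands inside the 20-30k range quoted above.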

Understand your Claude Code usage — then cut it 80%

Token Limits logs token counts per tool call so you can see exactly where usage goes, then compresses every call automatically. Proxy for Claude Code, MCP for everything else.


FAQ

How does Claude Code count tokens?

Both input tokens (your prompt + tool outputs Claude reads) and output tokens (Claude's response) count against your limit. A 150-token prompt + 15,000-token tool output + 500-token response = 15,650 tokens.

What is the Claude Code 5 hour limit?

Claude Code has a rolling usage window that resets every 5 hours, not every 24 hours. Heavy usage early in the window means hitting limits faster.

Does Claude Pro have a higher token limit than Max?

No. Both plans have rolling window limits, but Max allows higher overall usage. Token Limits proxy benefits both equally by reducing tool output bloat.

Can compression reduce my tool call costs?

Yes. Compression reduces tool output tokens by 60-85%, directly lowering your total usage. A day of coding with compression might save 50-100k tokens.

How much does Token Limits cost?

$5/month for the proxy and all features. Pays for itself in a single session if you run many tool calls.