AI Does Not Read Like You Do — And It Is Costing You Tokens
An emoji costs 3-4 tokens. A plain word costs 1. AI does not need visual formatting, repeated headings, or decorative whitespace — it parses meaning, not layout. Stripping that noise from your prompts, files, and tool outputs is the fastest way to cut token usage by 40-80%.
Humans need visual structure. Bullet points help you skim. Blank lines give your eyes a rest. Emojis signal tone at a glance. But LLMs like Claude, GPT-4o, and Gemini do not skim. They read every single token — including the ones that add zero information.
How much does formatting actually cost?
| Element | Tokens per Instance | Instances per 1000 lines | Total Waste |
|---|---|---|---|
| Emoji | 3-4 | 50-100 | 150-400 tokens |
| Blank line | 1-2 | 100-300 | 100-600 tokens |
| Timestamp | 5-7 | 10-50 | 50-350 tokens |
| Repeated header | 4-8 | 10-30 | 40-240 tokens |
| Tab/indentation | 1-2 | 100-500 | 100-1000 tokens |
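The table's per-category figures multiply out directly. A minimal sketch of the arithmetic, where the per-instance ranges come from the table above and the instance counts are hypothetical mid-range values for one 1000-line document:

```python
# (tokens_per_instance_low, tokens_per_instance_high), instance_count
# Ranges come from the waste table; counts are hypothetical examples.
WASTE = {
    "emoji":           ((3, 4), 75),
    "blank_line":      ((1, 2), 200),
    "timestamp":       ((5, 7), 30),
    "repeated_header": ((4, 8), 20),
    "indentation":     ((1, 2), 300),
}

def estimate_waste(table):
    """Return (low, high) total wasted tokens across all categories."""
    low = sum(lo * count for (lo, _hi), count in table.values())
    high = sum(hi * count for (_lo, hi), count in table.values())
    return low, high

print(estimate_waste(WASTE))  # → (955, 1670)
```

Even with conservative counts, a single 1000-line document carries roughly a thousand tokens of pure formatting.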
What counts as waste?
- ✓ Decorative emoji: 🚀, ⚡, ✨ add tone but no information
- ✓ Excessive blank lines: More than one per section is padding
- ✓ Timestamps: Almost never needed by the AI for current tasks
- ✓ Repeated column headers: In long lists, headers repeat unnecessarily
- ✓ Over-quoted content: Extra quotes, brackets, or asterisks
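Each of these categories can be spotted mechanically. A toy detector using only stdlib regexes (the patterns are illustrative, not Token Limits' actual rules):

```python
import re

# Illustrative patterns only; a real compressor uses more robust rules.
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")
TIMESTAMP = re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\b")

def count_waste(text):
    """Count instances of three common waste categories."""
    lines = text.splitlines()
    return {
        "emoji": len(EMOJI.findall(text)),
        "blank_lines": sum(1 for line in lines if not line.strip()),
        "timestamps": len(TIMESTAMP.findall(text)),
    }

log = "2024-05-01 12:00:03 build start 🚀\n\n\n2024-05-01 12:00:09 done ✨\n"
print(count_waste(log))  # → {'emoji': 2, 'blank_lines': 2, 'timestamps': 2}
```

Six waste instances in a four-line log; real build output scales the same way.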
Real example: A typical build log
A build log might have 3000 lines. Of those, 400-600 are blank lines (400-600 tokens), 200-300 are timestamps (1400-2100 tokens), and 100+ are repeated headers (400-800 tokens). Total waste: 2000-3500 tokens. Remove that noise, and the same log reaches the model 2000-3500 tokens lighter before it reads a single line of real output.
How compression removes waste automatically
The Token Limits compression pipeline removes formatting noise in a fixed order: timestamps first (before processing lines), then blank lines (before detecting lists), then duplicates (before summarizing), then emoji last. This ordering is critical: run the passes in the wrong order and compression is missed.
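The ordering above can be sketched as a chain of passes. A toy version with simple regex rules (assumed for illustration; the real pipeline's rules are not shown here):

```python
import re

def strip_timestamps(text):
    # First: remove timestamps so later line-based passes compare clean lines.
    return re.sub(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\s*", "", text)

def strip_blank_lines(text):
    # Second: drop blank lines before any list/duplicate detection.
    return "\n".join(line for line in text.splitlines() if line.strip())

def strip_duplicates(text):
    # Third: collapse consecutive duplicate lines before summarizing.
    out, prev = [], object()
    for line in text.splitlines():
        if line != prev:
            out.append(line)
        prev = line
    return "\n".join(out)

def strip_emoji(text):
    # Last: emoji, once the structural passes are done.
    return re.sub(r"\s*[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)

def compress(text):
    for stage in (strip_timestamps, strip_blank_lines,
                  strip_duplicates, strip_emoji):
        text = stage(text)
    return text

log = "2024-05-01 10:00:00 warn: retry 🚀\n\n2024-05-01 10:00:01 warn: retry 🚀\n"
print(compress(log))  # → warn: retry
```

The order is what makes the duplicate pass work: the two `warn: retry` lines only become identical after their timestamps are stripped. Dedupe first and nothing matches.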
Three ways to eliminate formatting waste
Approach 1: Proxy compression (Claude Code)
Install the Token Limits proxy. Every tool output is compressed automatically before Claude reads it.
Approach 2: MCP server compression (Cursor, Windsurf, etc.)
Add the Token Limits MCP server to your IDE. All tool calls return compressed output by default.
Approach 3: Paste compressor (manual)
For one-off logs or files, use tokenlimits.app/compress. Paste, compress, copy the result.
Why compression works better than rewriting
You could manually edit every log and config before pasting. Or you could install compression once and stop thinking about it. Token Limits does the work automatically — every file read, grep result, and exec output is stripped of noise before Claude reads it. One install, every tool call, forever.
What Token Limits actually does
- ✓Proxy (Claude Code): intercepts every Anthropic API call, strips noise from tool outputs before they hit your context. Set ANTHROPIC_BASE_URL once, compression is automatic.
- ✓MCP server (Cursor, Windsurf, VS Code, JetBrains, Claude Desktop): 8 compressed tools (local_read, local_exec, local_search, local_ls, local_diff, local_json, local_map, expand). Same results, 60-85% fewer tokens.
- ✓Paste compressor: browser-based, no install. Paste noisy content in, get clean content out.
- ✓Haiku subagent routing: cheap tasks (file reads, directory scans) route to Haiku automatically — Sonnet/Opus budget reserved for reasoning.
- ✓Prewritten scripts: /ship, /check-deploy, /logs replace entire planning conversations with a single command call.
- ✓Runs entirely locally. Your code, logs, and prompts never leave your machine.
Stop paying for tokens you do not need
Token Limits compresses every tool output automatically. Works with Claude Code, Cursor, Windsurf, VS Code, JetBrains, and Claude Desktop. Runs locally — your code never leaves your machine.
FAQ
Does AI really care about formatting?
No. AI parses tokens, not visual layout. Formatting is noise that costs tokens without adding meaning.
How much can compression save?
Typical compression is 60-80%, depending on content type. Build logs and grep results compress best (85%+). Sparse code compresses less (40-50%).
Does compression ever lose information?
No. Timestamps, blank lines, and emoji are removed because they add zero information. File content, error messages, and search results are preserved. The expand tool lets the model request full content on-demand when needed.
Can I use compression on sensitive content?
Yes. Paste compressor runs in-browser — no external calls, nothing sent anywhere. Proxy and MCP server run locally on your machine. Your code and logs never leave your environment.
Does it work with all AI coding tools?
Yes. The proxy works with Claude Code (intercepts Anthropic API calls). The MCP server works with Cursor, Windsurf, VS Code, JetBrains, and Claude Desktop — any tool that supports MCP.