AI Does Not Read Like You Do — And It Is Costing You Tokens

March 30, 2026 · 6 min read

An emoji costs 3-4 tokens. A plain word costs 1. AI does not need visual formatting, repeated headings, or decorative whitespace — it parses meaning, not layout. Stripping that noise from your prompts, files, and tool outputs is the fastest way to cut token usage by 40-80%.

Humans need visual structure. Bullet points help you skim. Blank lines give your eyes a rest. Emojis signal tone at a glance. But LLMs like Claude, GPT-4o, and Gemini do not skim. They read every single token — including the ones that add zero information.

How much does formatting actually cost?

Element         | Tokens per instance | Instances per 1000 lines | Total waste
Emoji           | 3-4                 | 50-100                   | 150-400 tokens
Blank line      | 1-2                 | 100-300                  | 100-600 tokens
Timestamp       | 5-7                 | 10-50                    | 50-350 tokens
Repeated header | 4-8                 | 10-30                    | 40-240 tokens
Tab/indentation | 1-2                 | 100-500                  | 100-1000 tokens

What counts as waste?

  • Decorative emoji: 🚀, ⚡, ✨ add tone but no information
  • Excessive blank lines: More than one per section is padding
  • Timestamps: Almost never needed by the AI for current tasks
  • Repeated column headers: In long lists, headers repeat unnecessarily
  • Over-quoted content: Extra quotes, brackets, or asterisks
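
Every category in that list can be stripped mechanically. A minimal sketch in Python (the regexes, the ISO-timestamp pattern, and the emoji ranges are illustrative assumptions, not Token Limits' actual rules):

```python
import re

# Illustrative patterns; real compressors use broader rules.
TIMESTAMP = re.compile(r"^\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\]?\s*")
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def strip_noise(text: str) -> str:
    out = []
    prev_blank = False
    for line in text.splitlines():
        line = TIMESTAMP.sub("", line)       # drop leading timestamps
        line = EMOJI.sub("", line).rstrip()  # drop decorative emoji
        blank = not line.strip()
        if blank and prev_blank:             # collapse runs of blank lines
            continue
        prev_blank = blank
        out.append(line)
    return "\n".join(out)

noisy = "2026-03-30 12:00:01 Build started 🚀\n\n\n2026-03-30 12:00:02 Done ✨\n"
print(strip_noise(noisy))  # → "Build started\n\nDone"
```

Three of the five waste categories disappear in a dozen lines; a real pipeline adds deduplication and summarization on top.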

Real example: A typical build log

A build log might have 3000 lines. Of those, 400-600 are blank lines (400-600 tokens), 200-300 carry timestamps (1400-2100 tokens), and 100+ are repeated headers (400-800 tokens). Total waste: roughly 2200-3500 tokens. Strip that noise and deduplicate what remains, and the same log fits in 500-1000 tokens.
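
The arithmetic is easy to reproduce. A back-of-envelope estimate using mid-range counts for a hypothetical 3000-line log and mid-range token costs from the table above (the specific counts are assumptions, not measurements):

```python
# Back-of-envelope waste estimate for a hypothetical 3000-line build log.
blank_lines = 500        # ~1 token each
timestamps = 250         # ~6 tokens each
repeated_headers = 100   # ~6 tokens each

waste = blank_lines * 1 + timestamps * 6 + repeated_headers * 6
print(waste)  # → 2600 tokens of pure formatting waste
```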

How compression removes waste automatically

The Token Limits compression pipeline removes formatting noise in a fixed order: timestamps first (before processing lines), then blank lines (before detecting lists), then duplicates (before summarizing), then emoji last. The ordering is critical: run the passes out of order and later stages miss patterns; for example, two log lines that differ only in their timestamps deduplicate only after the timestamps are gone.

Three ways to eliminate formatting waste

Approach 1: Proxy compression (Claude Code)

Install Token Limits proxy. Every tool output is compressed automatically before Claude reads it.

Approach 2: MCP server compression (Cursor, Windsurf, etc.)

Add Token Limits MCP server to your IDE. All tool calls return compressed output by default.

Approach 3: Paste compressor (manual)

For one-off logs or files, use tokenlimits.app/compress. Paste, compress, copy the result.

Why compression works better than rewriting

You could manually edit every log and config before pasting. Or you could install compression once and stop thinking about it. Token Limits does the work automatically — every file read, grep result, and exec output is stripped of noise before Claude reads it. One install, every tool call, forever.

Token compression is not a new feature — it is how AI coding should work. Every platform (Claude Code, Cursor, Windsurf, VS Code) should compress by default. Token Limits is that layer, built and ready to install.

What Token Limits actually does

  • Proxy (Claude Code): intercepts every Anthropic API call, strips noise from tool outputs before they hit your context. Set ANTHROPIC_BASE_URL once and compression is automatic.
  • MCP server (Cursor, Windsurf, VS Code, JetBrains, Claude Desktop): 8 compressed tools (local_read, local_exec, local_search, local_ls, local_diff, local_json, local_map, expand). Same results, 60-85% fewer tokens.
  • Paste compressor: browser-based, no install. Paste noisy content in, get clean content out.
  • Haiku subagent routing: cheap tasks (file reads, directory scans) route to Haiku automatically — Sonnet/Opus budget reserved for reasoning.
  • Prewritten scripts: /ship, /check-deploy, /logs replace entire planning conversations with a single command call.
  • Runs entirely locally. Your code, logs, and prompts never leave your machine.

Stop paying for tokens you do not need

Token Limits compresses every tool output automatically. Works with Claude Code, Cursor, Windsurf, VS Code, JetBrains, and Claude Desktop. Runs locally — your code never leaves your machine.

FAQ

Does AI really care about formatting?

No. AI parses tokens, not visual layout. Formatting is noise that costs tokens without adding meaning.

How much can compression save?

Typical compression is 60-80%, depending on content type. Build logs and grep results compress best (85%+). Already-terse code compresses less (40-50%).

Does compression ever lose information?

No. Timestamps, blank lines, and emoji are removed because they add zero information. File content, error messages, and search results are preserved. The expand tool lets the model request full content on-demand when needed.

Can I use compression on sensitive content?

Yes. Paste compressor runs in-browser — no external calls, nothing sent anywhere. Proxy and MCP server run locally on your machine. Your code and logs never leave your environment.

Does it work with all AI coding tools?

Yes. The proxy works with Claude Code (intercepts Anthropic API calls). The MCP server works with Cursor, Windsurf, VS Code, JetBrains, and Claude Desktop — any tool that supports MCP.