How to Compress Tokens Before They Hit Your AI Context
Token compression strips noise from tool outputs, logs, and pastes before the AI reads them. Timestamps, blank lines, repeated headers, and emoji typically cost between 1 and 8 tokens apiece without providing information. Compression pipelines can reduce context size by 60-85% while preserving all meaningful details.
Most of what Claude reads is noise. Tool outputs include timestamps, verbose formatting, repeated headers, blank lines, and decorative emoji. These elements add tokens without adding information. Token compression removes the noise, keeping the signal.
What gets compressed?
- ✓ Timestamps and dates (rarely relevant to current tasks)
- ✓ Blank lines and redundant spacing
- ✓ Duplicate or repeated headers
- ✓ Decorative emoji and ASCII art
- ✓ Repeated file paths
- ✓ Verbose formatting and extra quotes
- ✓ Redundant line numbers or prefixes
Token cost of noise elements
| Element | Tokens per Instance | Information Value | Impact per 1000 lines |
|---|---|---|---|
| Emoji | 3-4 | None | 300-400 tokens wasted |
| Blank line | 1 | None | 100-200 tokens wasted |
| Timestamp | 5-7 | Rarely relevant | 500-700 tokens wasted |
| Repeated header | 4-8 | Redundant | 400-800 tokens wasted |
| Path prefix | 2-3 | Known context | 200-300 tokens wasted |
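The impact column follows from simple arithmetic. As a hedged sketch, assuming roughly 100 instances of each element per 1,000 lines (an illustrative rate, not a measured one):

```python
# Tokens per instance (low, high), taken from the table above.
NOISE_COST = {
    "emoji": (3, 4),
    "timestamp": (5, 7),
    "repeated header": (4, 8),
    "path prefix": (2, 3),
}

def waste_per_1000_lines(element: str, instances: int = 100) -> tuple[int, int]:
    """Estimated token waste per 1,000 lines for one noise element."""
    low, high = NOISE_COST[element]
    return low * instances, high * instances

# waste_per_1000_lines("timestamp") → (500, 700), matching the table row
```

Blank lines are the exception: at 1 token each, the 100-200 range in the table reflects how many blank lines a typical log contains rather than a per-instance cost.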
Three compression approaches
Approach 1: Claude Code proxy (automatic)
Token Limits proxy intercepts all tool outputs and compresses them before Claude reads them. Install once, get automatic compression for all requests. No manual work needed.
Approach 2: MCP server (Claude Desktop, Cursor, etc.)
Token Limits MCP server provides 8 compressed tools (local_read, expand, search, ls, exec, json, diff, map) that replace defaults. Set up once in your IDE, compress all future tool calls.
Approach 3: Paste compressor (manual, browser)
Paste logs, error messages, or files into tokenlimits.app/compress. Get compressed output back. No account needed, runs in-browser, always free.
How the compression pipeline works
Compression order matters. Token Limits uses this sequence: (1) remove timestamps before processing line numbers, (2) collapse blank lines before detecting lists, (3) deduplicate repeated lines before summarizing sections, (4) strip emoji after the structural passes. The wrong order means missed compression opportunities.
- ✓ Step 1: Timestamps → Remove dates, times, ISO strings
- ✓ Step 2: Blank lines → Collapse multiple blank lines into one
- ✓ Step 3: Duplicate detection → Find repeated lines/blocks
- ✓ Step 4: List compression → Collapse repeated patterns
- ✓ Step 5: Emoji removal → Strip decorative characters
- ✓ Step 6: Smart summarization → Add section headers for long lists
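Steps 1-3 and 5 can be sketched in a few lines of Python. The regexes and per-line handling below are illustrative assumptions, not Token Limits' actual implementation; list compression and summarization (steps 4 and 6) are omitted:

```python
import re

# Illustrative patterns; a real pipeline handles many more timestamp formats.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
)
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2700-\u27BF]")

def compress(text: str) -> str:
    out, seen = [], set()
    for line in text.splitlines():
        line = TIMESTAMP.sub("", line).strip()  # step 1: timestamps
        if not line:                            # step 2: blank lines
            continue
        if line in seen:                        # step 3: duplicates
            continue
        seen.add(line)
        out.append(line)
    # step 5: emoji stripped last, after the structural passes
    return EMOJI.sub("", "\n".join(out))
```

Running this on a two-line build log with timestamps, a blank line, and a repeated status line collapses it to the unique content with dates and emoji removed.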
Real compression ratios by content type
| Content Type | Before Compression | After Compression | Reduction |
|---|---|---|---|
| Build log output | 24,000 tokens | 3,600 tokens | 85% |
| npm list (deep tree) | 18,000 tokens | 2,200 tokens | 88% |
| grep results (100 matches) | 18,000 tokens | 3,200 tokens | 82% |
| ls output (500+ files) | 12,000 tokens | 1,800 tokens | 85% |
| Stack trace + context | 15,000 tokens | 2,100 tokens | 86% |
| JSON API response | 10,000 tokens | 1,400 tokens | 86% |
| git diff (large file) | 16,000 tokens | 2,400 tokens | 85% |
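The reduction column is just 1 − after/before, rounded. A one-function sketch to sanity-check the figures (the token counts are the table's, not newly measured):

```python
def reduction_pct(before: int, after: int) -> int:
    """Percentage reduction, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))

# reduction_pct(24_000, 3_600) → 85, matching the build-log row
```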
Haiku AI summaries for large, inactive content
For very large content that stays in context but isn't actively being edited (old logs, archived discussions, build histories), Token Limits can optionally summarize it with Haiku instead of stripping noise. This preserves important details while cutting tokens by 70-80%. Enable it by adding an API key in the proxy settings.
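The strip-vs-summarize decision can be sketched as a small policy function. The token threshold and the names below are hypothetical illustrations, not Token Limits' actual settings:

```python
def choose_strategy(token_count: int, actively_edited: bool,
                    haiku_enabled: bool) -> str:
    """Pick a compression strategy for one block of context (hypothetical)."""
    if actively_edited:
        return "keep"          # live content is left untouched
    if haiku_enabled and token_count > 5_000:
        return "summarize"     # Haiku summary: ~70-80% reduction
    return "strip"             # rule-based pipeline: noise removal only
```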
Compression best practices
- ✓ Use proxy/MCP for automatic compression on every tool call (best option)
- ✓ Use the paste compressor for one-off logs or large pastes
- ✓ Compress before pasting if tools aren't configured
- ✓ Clear old compressed content periodically (a fresh context resets better than stale summaries)
- ✓ Test compression on sensitive logs first if privacy is a concern
Compression built for AI coding tools
Token Limits is purpose-built compression for AI coding workflows — not a generic text minifier. The pipeline is tuned for logs, code, diffs, and JSON. Proxy for Claude Code, MCP for everything else, paste compressor for free.
FAQ
How does token compression work?
Compression removes non-informative elements (timestamps, blank lines, emoji) before the AI reads the output. The signal stays; the noise disappears.
Does compression lose important information?
No. Timestamps, blank lines, and emoji add noise without information value. Compression preserves file names, paths, error messages, and actual content.
What is the biggest source of token waste?
Timestamps and blank lines in logs. A typical build log has 200-400 blank lines; each wastes one token. Timestamps add 5-7 tokens each. Total: 1,500-2,500 tokens wasted per log.
Can I use compression on sensitive content?
With paste compressor: yes, it runs in-browser. With proxy/MCP: yes, compression happens locally on your machine. Nothing is sent to servers.
How much does compression help?
Typical reduction is 60-85% depending on content type. Build logs and grep results compress best (85%+). Sparse content compresses less (40-50%).