Token Compression Protocol v3.0

Stop Wasting Tokens
Every Token Counts. Most Are Wasted.

Every file read, search result, and command output is full of noise your AI pays for but never uses. Token Limits strips it automatically. 60-80% smaller, zero effort.

Live counters: avg. token reduction · tokens saved · added proxy latency
The Problem

Your AI is Reading Noise

Every tool call returns more text than your AI actually needs. You're paying for all of it.

Verbose Output

Stack traces, build logs, and error messages burn thousands of tokens. Your AI only needs the key lines -- the rest is expensive filler.

Repeated Reads

Context gets compacted, the AI forgets, and it re-reads the same file -- 5x the tokens for the same information, over and over.

Hidden Waste

Did you know a single emoji can cost 3-4 tokens while a common word is just one? Timestamps, ANSI codes, formatting, duplicate lines -- it all counts against your budget and adds zero value.

Compression Engine

How Token Compression Works

Token Limits compresses tool outputs, file reads, search results, and verbose logs before they reach your AI. Code is automatically detected and left intact. Repeated content is deduplicated. Old context is intelligently summarized.

  • CONTENT-AWARE COMPRESSION
  • AI-POWERED SUMMARIZATION
  • SMART DEDUPLICATION
  • CODE-SAFE BY DEFAULT
  • ONE-COMMAND .MD CLEANUP
(Diagram: compression architecture)
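The kind of noise removal described above can be sketched as a shell pipeline. This is an illustrative approximation, not the actual Token Limits engine: it strips ANSI color codes, drops leading ISO timestamps, and deduplicates repeated lines.

```shell
# Illustrative noise-removal pass (not the real engine): strip ANSI color
# codes, drop leading ISO timestamps, keep only the first copy of each line.
clean_output() {
  esc=$(printf '\033')   # literal ESC character, portable across sed variants
  sed -E -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
         -e 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}[T ][0-9:.]+Z? //' |
    awk '!seen[$0]++'    # dedupe: print a line only the first time it appears
}
```

A real content-aware pass would also detect code blocks and leave them untouched, as the bullets above note.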

Any LLM, Any Tool

Works with Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, JetBrains, and any MCP-compatible tool. Runs a local OpenAI-compatible proxy on port 4801 for Ollama, llama.cpp, and LM Studio -- same compression, any model.

AI Summaries

Optional AI-powered summaries condense old context, error traces, and verbose output into concise overviews. Cached so repeated content is free.

Local & Private

Core compression runs on your machine. License checks go to Token Limits, and optional AI summaries send large content directly to Anthropic using your API key.

.MD File Cleanup

Your CLAUDE.md and memory files load every session. One command strips emoji, formatting, and whitespace so they cost fewer tokens. Originals saved as backups.

Real Session Data

Performance Metrics

Live counters: tool output compression · tokens saved (single session) · requests · deduped

By Content Type
  • File reads: 82%
  • Command output: 55%
  • Search results: 48%
  • Context & plans: 90%
  • Web content: 7%

Based on estimated token counts (~4 chars/token) from a single developer session. Actual savings vary by content type and usage patterns.
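The ~4 chars/token heuristic is easy to apply yourself. A quick way to estimate a file's token cost (a coarse approximation; real tokenizers vary by model):

```shell
# Estimate token count from byte count using the ~4 chars/token rule of thumb.
estimate_tokens() {
  wc -c < "$1" | awk '{ print int($1 / 4) }'
}
```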

Live Dashboard

See It Working

Compression Relay -- localhost:4800 -- 79% token savings
  • File reads: 82%
  • Command output: 55%
  • Search results: 48%
  • Context & plans: 90%

Real-time dashboard included with every install

Local LLM Support

Compress for Ollama, llama.cpp, LM Studio

Local models have the tightest context limits and no cloud to absorb the cost. Token Limits runs a local OpenAI-compatible proxy on port 4801 -- point your tool there and every request is compressed before it reaches your model.

  • DROP-IN REPLACEMENT FOR OPENAI BASE URL
  • WORKS WITH OLLAMA, LLAMA.CPP, LM STUDIO, AIDER
  • AUTO-DETECTS OLLAMA ON STARTUP
  • SAME DASHBOARD AND STATS ON PORT 4801
3-line setup
1. Install
curl -fsSL https://tokenlimits.app/api/install | bash
2. Set your local LLM URL
token-limits config --openai-url http://localhost:11434
3. Point your tool at port 4801
http://localhost:4801/v1/chat/completions

Any app that accepts a custom OpenAI base URL works; the API key field is ignored.
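Once the proxy is running, a standard OpenAI-style chat request works unchanged -- only the base URL differs. The model name `llama3` here is just an example; use whatever your local server exposes.

```shell
# Standard OpenAI-style chat request, pointed at the local proxy.
payload='{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Summarize this build log: ..."}]
}'
curl -s http://localhost:4801/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer ignored' \
  -d "$payload" || true   # no-op if the proxy is not running yet
```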

Join developers saving thousands of tokens every session.

Pricing

Simple, transparent pricing

Developer Pro
$5/month
  • 100 free requests to start
  • Unlimited compression
  • AI-powered summaries
  • Live usage dashboard
  • All tools & integrations
  • Works with any LLM
  • Cancel anytime
Start Free Trial
Community
Free
  • Paste compressor tool
  • Strips formatting & noise
  • Works with any LLM
  • No account required
Open Compressor
The Story

Why We Made Token Limits

Stop Overpaying for Tokens

Install the CLI, run the setup wizard, then follow the guide for your tool. Most setups take about a minute.

macOS, Linux, WSL · Claude Code, Codex, Cursor, Windsurf, VS Code, JetBrains, Claude Desktop, Ollama, LM Studio

FAQ