
Every file read, search result, and command output is full of noise your AI pays for but never uses. Token Limits strips it automatically. 60-80% smaller, zero effort.

Every tool call returns more text than your AI actually needs. You're paying for all of it.
Stack traces, build logs, and error messages burn thousands of tokens. Your AI only needs the key lines -- the rest is expensive filler.
Context gets compacted, the AI forgets, and it re-reads the same file -- 5x the tokens for the same information, over and over.
Did you know a single emoji costs 3-4 tokens while a word is just 1? Timestamps, ANSI codes, formatting, duplicate lines -- it all counts against your budget and adds zero value.
Token Limits compresses tool outputs, file reads, search results, and verbose logs before they reach your AI. Code is automatically detected and left intact. Repeated content is deduplicated. Old context is intelligently summarized.

Works with Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, JetBrains, and any MCP-compatible tool. Runs a local OpenAI-compatible proxy on port 4801 for Ollama, llama.cpp, and LM Studio -- same compression, any model.
Optional AI-powered summaries condense old context, error traces, and verbose output into concise overviews. Cached so repeated content is free.
Core compression runs on your machine. License checks go to Token Limits, and optional AI summaries send large content directly to Anthropic using your API key.
Your CLAUDE.md and memory files load every session. One command strips emoji, formatting, and whitespace so they cost fewer tokens. Originals saved as backups.
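As an illustration only -- not Token Limits' actual implementation -- the kind of normalization described above might look like this sketch:

```python
import re

def slim_memory_file(text: str) -> str:
    """Illustrative sketch: strip emoji and collapse excess whitespace.

    The character ranges below are a rough approximation of common
    emoji blocks, not an exhaustive list.
    """
    text = re.sub(r"[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]", "", text)
    text = re.sub(r"[ \t]+$", "", text, flags=re.M)   # trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)            # runs of blank lines
    return text
```

Every emoji removed saves a few tokens; over a memory file loaded at the start of every session, that adds up.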
Based on estimated token counts (~4 chars/token) from a single developer session. Actual savings vary by content type and usage patterns.
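The ~4 chars/token rule of thumb used for that estimate is easy to sketch -- an approximation only, since real tokenizers vary by model and content:

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate using the ~4 chars/token heuristic."""
    return max(1, len(text) // chars_per_token)
```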
Real-time dashboard included with every install
Local models have the tightest context limits and no cloud to absorb the cost. Token Limits runs a local OpenAI-compatible proxy on port 4801 -- point your tool there and every request is compressed before it reaches your model.
Any app that accepts a custom OpenAI base URL works; the API key field is ignored.
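Assuming your tool honors the standard OpenAI SDK environment variables (the variable names below are the OpenAI SDK's, not Token Limits'), pointing it at the proxy might look like:

```shell
# Route OpenAI-compatible requests through the local Token Limits proxy.
# The /v1 path suffix is an assumption -- check your tool's base-URL format.
export OPENAI_BASE_URL="http://localhost:4801/v1"
# The proxy ignores the key, but some clients require a non-empty value.
export OPENAI_API_KEY="ignored"
```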
Join developers saving thousands of tokens every session.
Simple, transparent pricing

Install the CLI, run the setup wizard, then follow the guide for your tool. Most setups take about a minute.
macOS, Linux, WSL · Claude Code, Codex, Cursor, Windsurf, VS Code, JetBrains, Claude Desktop, Ollama, LM Studio