How to Increase Context Window in Claude, Cursor & More
You cannot increase the context window — that is a hard model limit. But you can make the same window hold 3-5x more useful information by compressing what goes into it. For Claude Code, Cursor, Windsurf, VS Code Cline, and JetBrains, Token Limits does this automatically on every tool call.
Every AI coding tool has a context window — a hard ceiling on how much text the model can process at once. Claude Sonnet 4.6 has 1 million tokens. GPT-4o has 128k. These are fixed by the model provider and cannot be changed by the user. What you can change is how efficiently you use the space you have.
Why your context fills up faster than it should
The average tool call in Claude Code, Cursor, or Cline returns 5-10x more tokens than the model actually needs. A grep search with 50 matches can return 15,000 tokens, of which roughly 80% is repeated file paths, line numbers, and formatting. The model only needs the matched content, about 3,000 tokens. The rest is noise that eats your context window.
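As an illustration of where those tokens go, here is a minimal sketch (not Token Limits' actual algorithm) that compresses `grep -n` style output by emitting each file path once and keeping only the matched content:

```python
import re

def compress_grep(raw: str) -> str:
    """Illustrative only: collapse grep -n output (path:lineno:content)
    by dropping line numbers and repeating each file path only once."""
    out, last_path = [], None
    for line in raw.splitlines():
        m = re.match(r"^(.+?):(\d+):(.*)$", line)
        if not m:
            continue  # drop blank lines and separators
        path, _lineno, content = m.groups()
        if path != last_path:
            out.append(path)  # emit each file path once
            last_path = path
        out.append("  " + content.strip())
    return "\n".join(out)

raw = (
    "src/app.py:10:def handler():\n"
    "src/app.py:42:    handler()\n"
    "src/util.py:7:handler = None\n"
)
print(compress_grep(raw))
```

On real grep output, where the same long path repeats on every match line, this alone removes most of the structural overhead while keeping every matched line intact.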
| Tool | Raw context used | Compressed context | Effective window increase |
|---|---|---|---|
| Claude Code (1M window) | 200k tokens/session | 40k tokens/session | 5x longer sessions |
| Cursor (128k window) | 80k tokens/session | 20k tokens/session | 4x longer sessions |
| Windsurf (200k window) | 120k tokens/session | 30k tokens/session | 4x longer sessions |
| VS Code Cline (1M window) | 180k tokens/session | 36k tokens/session | 5x longer tasks |
| JetBrains AI Assistant | 100k tokens/session | 25k tokens/session | 4x longer sessions |
The only real way to increase your effective context window
Compression. Strip the noise before it hits the model. Token Limits intercepts every tool call — file reads, searches, terminal output, diffs — and removes timestamps, blank lines, repeated paths, duplicate content, and verbose formatting. The model gets the same information in 20-40% of the original token count.
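A minimal sketch of the idea, assuming plain-text tool output; the real proxy handles far more formats, but the principle is the same:

```python
import re

# Matches leading timestamps like "2024-05-01 12:00:01" or "[2024-05-01T12:00:01Z]"
TIMESTAMP = re.compile(r"^\[?\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[^ ]*\]?\s*")

def strip_noise(text: str) -> str:
    """Illustrative sketch of the rewrites a compressing proxy can apply
    to tool output before it reaches the model; not Token Limits' actual
    pipeline."""
    seen, out = set(), []
    for line in text.splitlines():
        line = TIMESTAMP.sub("", line).rstrip()  # drop leading timestamps
        if not line:
            continue  # drop blank lines
        if line in seen:
            continue  # drop exact duplicate lines
        seen.add(line)
        out.append(line)
    return "\n".join(out)
```

Because code lines and error messages are rarely exact duplicates, a pass like this removes log chatter and formatting while leaving the signal untouched.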
How to set it up for your tool
| Tool | Install method | Config |
|---|---|---|
| Claude Code | curl -fsSL https://tokenlimits.app/api/install | bash | token-limits setup (sets ANTHROPIC_BASE_URL) |
| Cursor | curl install + token-limits setup-cursor | Auto-writes ~/.cursor/mcp.json |
| Windsurf | curl install + token-limits setup-windsurf | Auto-writes ~/.codeium/windsurf/mcp_config.json |
| VS Code / Cline | curl install + token-limits setup-vscode | Writes .vscode/mcp.json (manual paste) |
| JetBrains | curl install + token-limits setup-jetbrains | Auto for Junie; manual for AI Assistant |
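For example, the Claude Code row above amounts to two commands. Treat this as a sketch and confirm the current flow against Token Limits' own documentation:

```shell
# Install the Token Limits CLI (command from the table above)
curl -fsSL https://tokenlimits.app/api/install | bash

# Configure Claude Code: per the table, this points ANTHROPIC_BASE_URL
# at the compressing proxy so every tool call is compressed in transit
token-limits setup
```

The other tools follow the same pattern: one install command, then a per-tool `setup-*` subcommand that writes the MCP config file listed in the table.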
Other things that reduce effective context (and what to do)
- Long chat history: Start new sessions per feature; do not let one session grow unbounded
- Reading entire files: Ask for specific functions or sections rather than whole files
- Verbose error pastes: Use tokenlimits.app/compress to strip noise before pasting
- Repeated context: Do not re-explain your codebase in every message; store it in a project system prompt
- Large diff reviews: Review file by file rather than entire PRs in one go
Does upgrading to a bigger model help?
Switching from GPT-4o (128k) to Claude Sonnet 4.6 (1M) increases your window significantly and is worth doing. But the core problem persists: if every tool call sends 10x more tokens than needed, a bigger window only delays the point where you hit the ceiling. Compression and a larger window together give the best results.
Make your context window 3-5x more effective
Token Limits compresses every tool call automatically — 60-80% fewer tokens per session without losing any useful information. Works with Claude Code, Cursor, Windsurf, VS Code, and JetBrains. Free trial, no credit card.
FAQ
Can I actually increase the context window size?
No. The context window is a hard limit set by the model provider. You can choose a model with a larger window (Claude Sonnet 4.6 at 1M tokens is currently the largest available in coding tools), but you cannot increase the limit of a given model.
What is the biggest context window available in AI coding tools?
Claude Opus 4.6 and Sonnet 4.6 both offer 1 million token context windows, available in Claude Code, Cursor, Windsurf, and VS Code Cline. This is currently the largest context window available in production AI coding tools.
How much does compression actually help?
In practice, 60-80% of most tool outputs is removable noise. Compressing it gives you 3-5x more useful work per session. A developer who previously hit limits every 2 hours typically runs full-day sessions after installing Token Limits.
Does compression lose important information?
No. Compression removes structural noise — timestamps, blank lines, repeated paths, duplicate content. Code, error messages, and file contents are preserved in full. The model gets the same signal with less noise.
Is there a free way to compress context?
Yes. tokenlimits.app/compress is a free in-browser paste compressor with no account required. For automatic compression on every tool call, Token Limits MCP/proxy is $5/month with a free 50-request trial.