How to Increase Context Window in Claude, Cursor & More
You cannot increase the context window — that is a hard model limit. But you can make the same window hold 3-5x more useful information by compressing what goes into it. For Claude Code, Cursor, Windsurf, VS Code Cline, and JetBrains, Token Limits does this automatically on every tool call.
Every AI coding tool has a context window — a hard ceiling on how much text the model can process at once. Claude Sonnet 4.6 has 1 million tokens. GPT-4o has 128k. These are fixed by the model provider and cannot be changed by the user. What you can change is how efficiently you use the space you have.
Why your context fills up faster than it should
The average tool call in Claude Code, Cursor, or Cline returns 5-10x more tokens than the model actually needs. A grep search with 50 matches can return 15,000 tokens, of which roughly 80% is repeated file paths, line numbers, and formatting. The model only needs the matched content, about 3,000 tokens. The rest is noise that eats your context window.
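As an illustration of where those tokens go, here is a minimal sketch (not Token Limits' actual algorithm) that compresses `grep -n` style output by emitting each file path once and keeping only the matched content:

```python
import re

def compress_grep(raw: str) -> str:
    """Illustrative only: collapse grep -n output (path:lineno:content)
    by dropping line numbers and repeating each file path only once."""
    out, last_path = [], None
    for line in raw.splitlines():
        m = re.match(r"^(.+?):(\d+):(.*)$", line)
        if not m:
            continue  # drop blank lines and separators
        path, _lineno, content = m.groups()
        if path != last_path:
            out.append(path)  # emit each file path once
            last_path = path
        out.append("  " + content.strip())
    return "\n".join(out)

raw = (
    "src/app.py:10:def handler():\n"
    "src/app.py:42:    handler()\n"
    "src/util.py:7:handler = None\n"
)
print(compress_grep(raw))
```

On real grep output, where the same long path repeats on every match line, this alone removes most of the structural overhead while keeping every matched line intact.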
| Tool | Raw context used | Compressed context | Effective window increase |
|---|---|---|---|
| Claude Code (1M window) | 200k tokens/session | 40k tokens/session | 5x longer sessions |
| Cursor (128k window) | 80k tokens/session | 20k tokens/session | 4x longer sessions |
| Windsurf (200k window) | 120k tokens/session | 30k tokens/session | 4x longer sessions |
| VS Code Cline (1M window) | 180k tokens/session | 36k tokens/session | 5x longer tasks |
| JetBrains AI Assistant | 100k tokens/session | 25k tokens/session | 4x longer sessions |
The only real way to increase your effective context window
Compression. Strip the noise before it hits the model. Token Limits intercepts every tool call — file reads, searches, terminal output, diffs — and removes timestamps, blank lines, repeated paths, duplicate content, and verbose formatting. The model gets the same information in 20-40% of the original token count.
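A minimal sketch of the idea, assuming plain-text tool output; the real proxy handles far more formats, but the principle is the same:

```python
import re

# Matches leading timestamps like "2024-05-01 12:00:01" or "[2024-05-01T12:00:01Z]"
TIMESTAMP = re.compile(r"^\[?\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[^ ]*\]?\s*")

def strip_noise(text: str) -> str:
    """Illustrative sketch of the rewrites a compressing proxy can apply
    to tool output before it reaches the model; not Token Limits' actual
    pipeline."""
    seen, out = set(), []
    for line in text.splitlines():
        line = TIMESTAMP.sub("", line).rstrip()  # drop leading timestamps
        if not line:
            continue  # drop blank lines
        if line in seen:
            continue  # drop exact duplicate lines
        seen.add(line)
        out.append(line)
    return "\n".join(out)
```

Because code lines and error messages are rarely exact duplicates, a pass like this removes log chatter and formatting while leaving the signal untouched.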
How to set it up for your tool
| Tool | Install method | Config |
|---|---|---|
| Claude Code | curl -fsSL https://tokenlimits.app/api/install | bash | token-limits setup (sets ANTHROPIC_BASE_URL) |
| Cursor | curl install + token-limits setup-cursor | Auto-writes ~/.cursor/mcp.json |
| Windsurf | curl install + token-limits setup-windsurf | Auto-writes ~/.codeium/windsurf/mcp_config.json |
| VS Code / Cline | curl install + token-limits setup-vscode | Writes .vscode/mcp.json (manual paste) |
| JetBrains | curl install + token-limits setup-jetbrains | Auto for Junie; manual for AI Assistant |
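For example, the Claude Code row above amounts to two commands. Treat this as a sketch and confirm the current flow against Token Limits' own documentation:

```shell
# Install the Token Limits CLI (command from the table above)
curl -fsSL https://tokenlimits.app/api/install | bash

# Configure Claude Code: per the table, this points ANTHROPIC_BASE_URL
# at the compressing proxy so every tool call is compressed in transit
token-limits setup
```

The other tools follow the same pattern: one install command, then a per-tool `setup-*` subcommand that writes the MCP config file listed in the table.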
Other things that reduce effective context (and what to do)
- Long chat history: Start new sessions per feature; do not let one session grow unbounded
- Reading entire files: Ask for specific functions or sections rather than whole files
- Verbose error pastes: Use tokenlimits.app/compress to strip noise before pasting
- Repeated context: Do not re-explain your codebase in every message; store it in a project system prompt
- Large diff reviews: Review file by file rather than entire PRs in one go
Does upgrading to a bigger model help?
Switching from GPT-4o (128k) to Claude Sonnet 4.6 (1M) increases your window significantly and is worth doing. But the core problem persists: if every tool call sends 10x more tokens than needed, a bigger window only delays the point where you hit the ceiling. Compression and a larger window together give the best results.
Make your context window 3-5x more effective
Token Limits compresses every tool call automatically — 60-80% fewer tokens per session without losing any useful information. Works with Claude Code, Cursor, Windsurf, VS Code, and JetBrains. Free trial, no credit card.
FAQ
Can I actually increase the context window size?
No. The context window is a hard limit set by the model provider. You can choose a model with a larger window (Claude Sonnet 4.6 at 1M tokens is currently the largest available in coding tools), but you cannot increase the limit of a given model.
What is the biggest context window available in AI coding tools?
Claude Opus 4.6 and Sonnet 4.6 both offer 1 million token context windows, available in Claude Code, Cursor, Windsurf, and VS Code Cline. This is currently the largest context window available in production AI coding tools.
How much does compression actually help?
In practice, 60-80% of most tool outputs is removable noise. Compressing it gives you 3-5x more useful work per session. A developer who previously hit limits every 2 hours typically runs full-day sessions after installing Token Limits.
Does compression lose important information?
No. Compression removes structural noise — timestamps, blank lines, repeated paths, duplicate content. Code, error messages, and file contents are preserved in full. The model gets the same signal with less noise.
Is there a free way to compress context?
Yes. tokenlimits.app/compress is a free in-browser paste compressor with no account required. For automatic compression on every tool call, Token Limits MCP/proxy is $5/month with a free 50-request trial.