How to Fix Claude Code Context Limit Exceeded
Claude Code now has a 1 million token context window (up from 200k as of March 2026), but noisy tool outputs can still burn through it fast. When you hit the limit, Claude stops responding. The best fix is the Token Limits proxy, which automatically compresses every tool call by 60-80%. This guide also covers /compact, clearing history, and smaller file operations.
The 1 million token window, up from 200k, has been generally available since March 2026 for Opus 4.6 and Sonnet 4.6. But 1M tokens does not mean unlimited. Large codebases, long chat histories, and verbose tool outputs still fill it fast, and the 5-hour rolling usage window means you can hit rate limits well before the context window is full.
What causes context limits?
- Tool outputs (grep results, file listings, and build errors can each run to hundreds of lines)
- File reads (reading 10+ files adds up quickly)
- Search results (full file contents for every match)
- Chat history (conversations grow with every exchange)
- Error logs and stack traces (often kilobytes of noise)
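To get a feel for how quickly these sources add up, here is a minimal sketch that estimates how much of the window a single piece of output would take. It assumes roughly 4 characters per token, which is only a common rule of thumb and not Claude's actual tokenizer. The function names and the sample log are made up for illustration.

```python
# Rough token estimate for a piece of tool output.
# ASSUMPTION: ~4 characters per token is a rule of thumb; the real
# tokenizer will produce different counts.
CHARS_PER_TOKEN = 4  # hypothetical heuristic, not Claude's tokenizer


def estimate_tokens(text: str) -> int:
    """Return an approximate token count for `text` (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def share_of_window(text: str, window_tokens: int = 1_000_000) -> float:
    """Return the fraction of the context window that `text` would occupy."""
    return estimate_tokens(text) / window_tokens


if __name__ == "__main__":
    # A made-up build log: 2,000 lines of 120 characters each.
    fake_log = ("x" * 119 + "\n") * 2000
    print(f"estimated tokens: {estimate_tokens(fake_log)}")
    print(f"share of a 1M window: {share_of_window(fake_log):.2%}")
```

Under this heuristic, one large log is a small share of the window, but a session that produces dozens of them will fill it.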
Which strategy should you use?
| Strategy | Effort | Effectiveness | Best for |
|---|---|---|---|
| Token Limits proxy | Low | Very High (60-80%) | Long-term — the permanent fix |
| Use /compact | Very Low | High (40-50%) | Immediate relief while you install |
| Clear chat history | Low | High | Starting fresh on old chats |
| Smaller file operations | Medium | Medium | New sessions, ongoing work |
| AI summaries | Low (with key) | High | Large old context |
Strategy 1: Install Token Limits proxy (the permanent fix)
The Token Limits proxy intercepts all your tool outputs, file reads, and search results, and strips noise and redundancy before they reach your context. Install it once and compression happens automatically on every request, with no manual intervention and no commands to type mid-session.
- Install: npm install -g token-limits
- Start: token-limits start
- Configure Claude Code: cmd+comma, scroll to "Tools", set API URL to http://localhost:4800
- Done. All tool outputs are now compressed by 60-80%.
The proxy caches repeated reads and searches, detects and removes duplicate content, and collapses large outputs. You get the same information in compressed form. It runs locally, so your code never leaves your machine.
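To make the idea of removing duplicates and collapsing large outputs concrete, here is a minimal sketch of two such passes over a tool's output. This is not the Token Limits implementation, whose internals are not described here. The function names, marker strings, and the 20-line defaults are all assumptions chosen for illustration.

```python
import itertools


def collapse_repeats(lines):
    """Collapse each run of identical consecutive lines into one line plus a count."""
    out = []
    for line, group in itertools.groupby(lines):
        count = sum(1 for _ in group)
        out.append(line)
        if count > 1:
            out.append(f"[previous line repeated {count - 1} more times]")
    return out


def truncate_middle(lines, head=20, tail=20):
    """Keep the first `head` and last `tail` lines; replace the middle with a marker."""
    if len(lines) <= head + tail:
        return list(lines)
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    return lines[:head] + [marker] + lines[len(lines) - tail:]


def compress_output(text, head=20, tail=20):
    """Apply both passes to raw tool output (hypothetical pipeline)."""
    lines = text.splitlines()
    return "\n".join(truncate_middle(collapse_repeats(lines), head, tail))


if __name__ == "__main__":
    noisy = "warning: unused variable\n" * 500 + "error: build failed\n"
    print(compress_output(noisy))
```

Collapsing repeats loses nothing, since the count is preserved. Truncating the middle is lossy, so a real tool would need a smarter rule for deciding what is safe to drop.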
Strategy 2: Use /compact for immediate relief
If you are already mid-session and hitting the limit, /compact gives quick relief. Run /compact in your Claude Code chat. This command compresses your entire conversation history in place, shrinking it by 40-50% while keeping the important details.
Other strategies: Clear history, smaller files, AI summaries
- Clear chat history: Start a new chat when the current one approaches 150k tokens
- Work with smaller files: Avoid reading entire 1000+ line files; ask Claude to work with specific functions or sections
- Enable AI summaries (with API key): Summarize old content that's still relevant but verbose
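As an illustration of working with a specific function rather than a whole file, the sketch below pulls one function's source out of a Python module using the standard library `ast` module. The helper name `extract_function` is hypothetical, and the approach only applies to Python source. Other languages would need their own parser.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of the function called `name`, or None if absent.

    Decorators are not included, because ast.get_source_segment starts at
    the `def` keyword.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            return ast.get_source_segment(source, node)
    return None


if __name__ == "__main__":
    module = "def a():\n    return 1\n\n\ndef b():\n    return 2\n"
    print(extract_function(module, "b"))
```

Pasting or pointing at just the extracted function keeps the rest of a long file out of the context.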
Why compression beats clearing context
Clearing context means losing information. Compression keeps it. In a favorable case, a 50k token file read becomes 5k tokens after compression: same information, 90% fewer tokens (the typical range is 60-80%). That is why installing the proxy once is better than managing context manually every session.
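The arithmetic behind these percentages can be checked with a short sketch. The function names are made up for illustration. The inputs are the figures quoted in this guide (a 50k token read shrinking to 5k, and the 60-80% range).

```python
def reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Return the percentage of tokens removed by compression."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 100 * (before_tokens - after_tokens) / before_tokens


def tokens_after(before_tokens: int, reduction_pct: float) -> int:
    """Return the token count left after removing `reduction_pct` percent."""
    return round(before_tokens * (1 - reduction_pct / 100))


if __name__ == "__main__":
    # The 50k -> 5k example from the text.
    print(reduction_percent(50_000, 5_000))  # 90.0
    # The same 50k read at each end of the 60-80% range.
    print(tokens_after(50_000, 60), tokens_after(50_000, 80))
```

At the quoted 60-80% range, the same 50k token read would land between 10k and 20k tokens, so the 5k figure corresponds to a better-than-typical case.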
Fix context limits once, permanently
Token Limits proxy compresses every Claude Code tool output automatically — 60-80% less context per session. Install once, never manually manage context again.
FAQ
Does Claude Code have unlimited context?
No. Claude Code uses Opus 4.6 and Sonnet 4.6, both with a 1 million token context window (generally available as of March 2026). But the 5-hour rolling usage window means you can still hit rate limits before you fill the context window.
How do I know if I am about to hit the limit?
Claude Code tends to slow down as context approaches the limit, so responses take noticeably longer. If you see that happening, start a new chat or run /compact.
Does /compact actually work?
Yes. It rewrites your chat history in a more concise format. Results vary (typically 40-50% compression), and it works best early, before the context is nearly full.
Is Token Limits proxy better than /compact?
Yes. The proxy compresses every new tool call (60-80%), while /compact is a one-time pass over existing history. Using both is ideal: /compact condenses old history, and the proxy keeps new outputs lean.
Can I use Claude Code without hitting limits?
With the Token Limits proxy installed, most people can run full-day sessions without hitting limits. Without it, you typically hit limits every 1-3 hours of heavy tool usage.