GitHub Copilot Context Window Limit: What You Can Do
GitHub Copilot uses GPT-4o under the hood for most agent tasks, giving you a 128k-token context window. Copilot manages what goes into that window; you do not control it directly. But in Copilot agent mode with MCP enabled (VS Code 1.99+), Token Limits can compress every tool call, effectively giving you 3-4x longer agent sessions.
GitHub Copilot has two modes that handle context differently. Inline completions use a small local context window (usually 8-16k tokens) around your cursor. Agent mode (Copilot Chat in VS Code running agentic tasks) uses the full model context window: 128k for GPT-4o, or up to 1M if Claude Sonnet is selected. The limit you are likely hitting is in agent mode during long multi-file tasks.
Copilot context window by model
| Model in Copilot | Context window | Best for |
|---|---|---|
| GPT-4o (default) | 128,000 tokens | General tasks |
| Claude Sonnet 4.6 | 1,048,576 tokens | Large codebases, long sessions |
| Claude Haiku 4.5 | 200,000 tokens | Fast, lightweight tasks |
| o3 / o4-mini | 200,000 tokens | Reasoning-heavy tasks |
How Copilot fills the context window
In agent mode, Copilot autonomously reads files, runs searches, and executes commands to complete tasks. Each of these operations returns verbose output that is added to the context. Over a long task, the accumulation of file reads, error messages, and search results fills the window faster than the actual conversation history.
- ✓ File reads: Copilot reads entire files rather than targeted sections
- ✓ Search results: Full file contents returned for every match
- ✓ Terminal output: Build logs, test output, package install noise
- ✓ Conversation history: Every previous exchange stays in context
- ✓ System prompts: Copilot adds its own system instructions that consume context
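As a rough illustration of how fast this accumulation eats a 128k window, the sketch below uses the common ~4 characters per token heuristic. This is an assumption for illustration, not Copilot's actual tokenizer, and the per-operation sizes are made up to be plausible, not measured.

```python
# Rough estimate of how quickly agent tool output fills a context window.
# Assumes ~4 characters per token (a common heuristic, not Copilot's
# actual tokenizer); all operation sizes below are illustrative.

CHARS_PER_TOKEN = 4
WINDOW_TOKENS = 128_000  # GPT-4o context window

def estimate_tokens(text_chars: int) -> int:
    """Approximate token count from raw character count."""
    return text_chars // CHARS_PER_TOKEN

# Illustrative per-operation costs during a long agent task.
operations = {
    "file read (800 lines, ~40 chars/line)": estimate_tokens(800 * 40),
    "search results across matches": estimate_tokens(50_000),
    "test run / build output": estimate_tokens(30_000),
}

for name, cost in operations.items():
    print(f"{name}: ~{cost:,} tokens")

# File reads alone: how many exhaust a 128k window?
reads_to_fill = WINDOW_TOKENS // estimate_tokens(800 * 40)
print(f"~{reads_to_fill} full file reads exhaust a 128k window")
```

Under these assumptions, around sixteen full file reads fill the entire window before the conversation itself uses a single token, which is why long agent sessions stall.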
How to compress Copilot agent tool calls with Token Limits MCP
VS Code 1.99 added native MCP server support. In Copilot agent mode, the agent can use workspace MCP servers registered in .vscode/mcp.json. Adding Token Limits as an MCP server means Copilot agent tasks use compressed file reads and search results instead of raw output. Install with token-limits setup-vscode and restart VS Code.
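A workspace MCP registration might look like the sketch below. The server name, command, and arguments here are assumptions based on the token-limits CLI mentioned above; running token-limits setup-vscode generates the actual entry for you, so treat this only as a shape reference for .vscode/mcp.json.

```json
{
  "servers": {
    "token-limits": {
      "command": "token-limits",
      "args": ["mcp"]
    }
  }
}
```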
Copilot vs alternatives when context is the bottleneck
| Tool | Default context | With Token Limits | Setup |
|---|---|---|---|
| GitHub Copilot (GPT-4o) | 128k | ~500k effective | VS Code MCP |
| GitHub Copilot (Claude Sonnet) | 1M | ~3-5M effective | VS Code MCP |
| Cursor | 200k | ~800k effective | Auto via setup-cursor |
| Cline (Claude Sonnet) | 1M | ~3-5M effective | Auto via setup-vscode |
| Windsurf | 200k | ~800k effective | Auto via setup-windsurf |
Tips to get more from Copilot's context window without MCP
- ✓ Use #file references to include only the specific files needed rather than letting Copilot read everything
- ✓ Start new Copilot Chat sessions per task — do not carry long conversation history
- ✓ Switch the model to Claude Sonnet 4.6 for large-codebase tasks
- ✓ Use @workspace sparingly — it loads a large index into context
- ✓ Compress pastes manually at tokenlimits.app/compress before including error logs or build output
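For the last tip, even simple shell filtering cuts paste size before you reach for a compressor. The sketch below simulates a noisy build log and keeps only the error and warning lines; this is generic shell filtering, not a Token Limits feature.

```shell
# Simulate a noisy build log, then keep only error/warning lines.
# Generic shell filtering -- not a Token Limits feature.
printf 'compiling a.c\nwarning: unused variable x\ncompiling b.c\nerror: expected ";"\nlinking app\n' > build.log
grep -iE 'error|warn' build.log > build-errors.log
cat build-errors.log   # paste this, not the full build.log
```

On real build logs the ratio is usually far better than the 2-of-5 lines shown here, since most output is progress noise.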
Extend Copilot agent mode context by 3-4x
Token Limits MCP compresses every Copilot agent tool call in VS Code 1.99+. Free trial, no credit card. Pairs with any model Copilot supports.
FAQ
What is the GitHub Copilot context window size?
It depends on the model. GPT-4o gives 128k tokens. Claude Sonnet 4.6 gives 1 million tokens. You can switch models inside Copilot Chat. For long agentic tasks, Claude Sonnet 4.6 is the better choice.
Why does Copilot stop mid-task?
Copilot agent mode fills the context window with file reads, search results, and terminal output. When the window is full, the agent cannot continue. Compressing tool outputs with Token Limits MCP extends sessions significantly.
Does Copilot support MCP servers?
Yes, in VS Code 1.99+ with Copilot agent mode. The agent reads workspace MCP servers from .vscode/mcp.json. Token Limits works as a workspace MCP server for Copilot agent tasks.
Is Cline better than Copilot for large codebase tasks?
Cline gives you more direct control over what gets read and when, which can be more efficient on large codebases. Both tools benefit equally from context compression via Token Limits MCP.