GitHub Copilot Context Window Limit: What You Can Do
GitHub Copilot uses GPT-4o under the hood for most agent tasks, giving you a 128k-token context window. Copilot manages what goes into that window; you do not control it directly. But in Copilot agent mode with MCP enabled (VS Code 1.99+), Token Limits can compress every tool call, effectively giving you 3-4x longer agent sessions.
GitHub Copilot has two modes that handle context differently. Inline completions use a small local context window (usually 8-16k tokens) around your cursor. Agent mode (Copilot Chat in VS Code running agentic tasks) uses the full model context window: 128k for GPT-4o, or up to 1M if Claude Sonnet is selected. The limit you are likely hitting is in agent mode during long multi-file tasks.
Copilot context window by model
| Model in Copilot | Context window | Best for |
|---|---|---|
| GPT-4o (default) | 128,000 tokens | General tasks |
| Claude Sonnet 4.6 | 1,048,576 tokens | Large codebases, long sessions |
| Claude Haiku 4.5 | 200,000 tokens | Fast, lightweight tasks |
| o3 / o4-mini | 200,000 tokens | Reasoning-heavy tasks |
How Copilot fills the context window
In agent mode, Copilot autonomously reads files, runs searches, and executes commands to complete tasks. Each of these operations returns verbose output that is added to the context. Over a long task, the accumulation of file reads, error messages, and search results fills the window faster than the actual conversation history.
- ✓ File reads: Copilot reads entire files rather than targeted sections
- ✓ Search results: Full file contents returned for every match
- ✓ Terminal output: Build logs, test output, package install noise
- ✓ Conversation history: Every previous exchange stays in context
- ✓ System prompts: Copilot adds its own system instructions that consume context
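As a rough illustration of how fast this accumulation eats a 128k window, the sketch below uses the common ~4 characters per token heuristic. This is an assumption for illustration, not Copilot's actual tokenizer, and the per-operation sizes are made up to be plausible, not measured.

```python
# Rough estimate of how quickly agent tool output fills a context window.
# Assumes ~4 characters per token (a common heuristic, not Copilot's
# actual tokenizer); all operation sizes below are illustrative.

CHARS_PER_TOKEN = 4
WINDOW_TOKENS = 128_000  # GPT-4o context window

def estimate_tokens(text_chars: int) -> int:
    """Approximate token count from raw character count."""
    return text_chars // CHARS_PER_TOKEN

# Illustrative per-operation costs during a long agent task.
operations = {
    "file read (800 lines, ~40 chars/line)": estimate_tokens(800 * 40),
    "search results across matches": estimate_tokens(50_000),
    "test run / build output": estimate_tokens(30_000),
}

for name, cost in operations.items():
    print(f"{name}: ~{cost:,} tokens")

# File reads alone: how many exhaust a 128k window?
reads_to_fill = WINDOW_TOKENS // estimate_tokens(800 * 40)
print(f"~{reads_to_fill} full file reads exhaust a 128k window")
```

Under these assumptions, around sixteen full file reads fill the entire window before the conversation itself uses a single token, which is why long agent sessions stall.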
How to compress Copilot agent tool calls with Token Limits MCP
VS Code 1.99 added native MCP server support. In Copilot agent mode, the agent can use workspace MCP servers registered in .vscode/mcp.json. Adding Token Limits as an MCP server means Copilot agent tasks use compressed file reads and search results instead of raw output. Install with token-limits setup-vscode and restart VS Code.
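A workspace MCP registration might look like the sketch below. The server name, command, and arguments here are assumptions based on the token-limits CLI mentioned above; running token-limits setup-vscode generates the actual entry for you, so treat this only as a shape reference for .vscode/mcp.json.

```json
{
  "servers": {
    "token-limits": {
      "command": "token-limits",
      "args": ["mcp"]
    }
  }
}
```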
Copilot vs alternatives when context is the bottleneck
| Tool | Default context | With Token Limits | Setup |
|---|---|---|---|
| GitHub Copilot (GPT-4o) | 128k | ~500k effective | VS Code MCP |
| GitHub Copilot (Claude Sonnet) | 1M | ~3-5M effective | VS Code MCP |
| Cursor | 200k | ~800k effective | Auto via setup-cursor |
| Cline (Claude Sonnet) | 1M | ~3-5M effective | Auto via setup-vscode |
| Windsurf | 200k | ~800k effective | Auto via setup-windsurf |
Tips to get more from Copilot's context window without MCP
- ✓ Use #file references to include only the specific files needed rather than letting Copilot read everything
- ✓ Start new Copilot Chat sessions per task — do not carry long conversation history
- ✓ Switch the model to Claude Sonnet 4.6 for large-codebase tasks
- ✓ Use @workspace sparingly — it loads a large index into context
- ✓ Compress pastes manually at tokenlimits.app/compress before including error logs or build output
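For the last tip, even simple shell filtering cuts paste size before you reach for a compressor. The sketch below simulates a noisy build log and keeps only the error and warning lines; this is generic shell filtering, not a Token Limits feature.

```shell
# Simulate a noisy build log, then keep only error/warning lines.
# Generic shell filtering -- not a Token Limits feature.
printf 'compiling a.c\nwarning: unused variable x\ncompiling b.c\nerror: expected ";"\nlinking app\n' > build.log
grep -iE 'error|warn' build.log > build-errors.log
cat build-errors.log   # paste this, not the full build.log
```

On real build logs the ratio is usually far better than the 2-of-5 lines shown here, since most output is progress noise.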
Extend Copilot agent mode context by 3-4x
Token Limits MCP compresses every Copilot agent tool call in VS Code 1.99+. Free trial, no credit card. Pairs with any model Copilot supports.
FAQ
What is the GitHub Copilot context window size?
It depends on the model. GPT-4o gives 128k tokens. Claude Sonnet 4.6 gives 1 million tokens. You can switch models inside Copilot Chat. For long agentic tasks, Claude Sonnet 4.6 is the better choice.
Why does Copilot stop mid-task?
Copilot agent mode fills the context window with file reads, search results, and terminal output. When the window is full, the agent cannot continue. Compressing tool outputs with Token Limits MCP extends sessions significantly.
Does Copilot support MCP servers?
Yes, in VS Code 1.99+ with Copilot agent mode. The agent reads workspace MCP servers from .vscode/mcp.json. Token Limits works as a workspace MCP server for Copilot agent tasks.
Is Cline better than Copilot for large codebase tasks?
Cline gives you more direct control over what gets read and when, which can be more efficient on large codebases. Both tools benefit equally from context compression via Token Limits MCP.