VS Code MCP Server: Stop Running Out of Context Window

April 12, 2026 · 4 min read

VS Code supports MCP servers through .vscode/mcp.json, and Token Limits uses this to compress tool outputs before they reach your AI assistant. One setup command writes the config file. After that, Cline, GitHub Copilot agent mode, and any other MCP-aware extension in your workspace automatically get compressed tool outputs.

VS Code version 1.99 added native MCP server support. When you add an MCP server to .vscode/mcp.json, any AI extension in that workspace that supports MCP can use its tools. Token Limits registers as an MCP server that intercepts file reads, searches, and command outputs and compresses them, typically by 60-80%, before the model sees them.

Which VS Code AI extensions benefit from Token Limits MCP?

  • Cline — reads .vscode/mcp.json and uses registered MCP servers for all tool calls
  • GitHub Copilot agent mode — uses workspace MCP servers in agent tasks (VS Code 1.99+)
  • Continue.dev — supports MCP servers for context tools
  • Any extension that reads VS Code workspace MCP configuration

How to install Token Limits MCP for VS Code

  1. Install the binary: curl -fsSL https://tokenlimits.app/api/install | bash
  2. Run setup: token-limits setup-vscode (writes .vscode/mcp.json to your current workspace)
  3. Enter your license key when prompted (free trial available, no credit card)
  4. Restart VS Code to pick up the new MCP server
  5. Open Cline or Copilot agent mode; tool calls are now compressed automatically

The .vscode/mcp.json file is workspace-scoped, and setup writes your license key into it. Add it to your .gitignore if you do not want to commit it, or commit it to share the Token Limits config with your team, keeping in mind that committing it also shares the key.
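
If you decide not to commit the file, the .gitignore entry can be added by hand or with a small script. The following is a minimal sketch, assuming a standard .gitignore at the workspace root; the function name ignore_mcp_config is hypothetical and is not part of the Token Limits CLI.

```python
from pathlib import Path

MCP_CONFIG = ".vscode/mcp.json"  # path named in the article


def ignore_mcp_config(workspace: Path) -> bool:
    """Append .vscode/mcp.json to the workspace .gitignore if it is not listed yet.

    Returns True when .gitignore was modified, False when the entry already existed.
    Only exact entries are detected; a broader pattern such as ".vscode/" is not
    recognized by this simple check.
    """
    gitignore = workspace / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    entries = {line.strip() for line in existing.splitlines()}
    if MCP_CONFIG in entries or "/" + MCP_CONFIG in entries:
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + separator + MCP_CONFIG + "\n", encoding="utf-8")
    return True
```

Calling ignore_mcp_config(Path(".")) from the workspace root is safe to repeat, because the second call finds the entry and leaves the file alone.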

What the .vscode/mcp.json config looks like

After running token-limits setup-vscode, your .vscode/mcp.json will contain the Token Limits server configuration with your license key pre-filled. The server runs locally on your machine — your code never leaves your environment. All compression happens on-device before tool outputs are returned to the model.
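
To see what setup registered, you can read the file back and list its servers. The sketch below is illustrative only: the top-level "servers" key and the "type", "command", "args", and "env" fields follow VS Code's mcp.json format, but the server name, the "serve" argument, and the TOKEN_LIMITS_LICENSE_KEY variable are placeholders assumed for this example, not the values that token-limits setup-vscode actually writes. It also assumes the file is plain JSON without comments.

```python
import json

# Hypothetical sample config. Field names follow VS Code's mcp.json format;
# the server name, args, and env variable name are assumed placeholders.
SAMPLE_CONFIG = """
{
  "servers": {
    "token-limits": {
      "type": "stdio",
      "command": "token-limits",
      "args": ["serve"],
      "env": {"TOKEN_LIMITS_LICENSE_KEY": "<your-license-key>"}
    }
  }
}
"""


def summarize_servers(config_text: str) -> dict[str, str]:
    """Map each registered server name to the command line it would be launched with."""
    config = json.loads(config_text)
    summary = {}
    for name, server in config.get("servers", {}).items():
        parts = [server.get("command", "")] + list(server.get("args", []))
        summary[name] = " ".join(parts).strip()
    return summary


print(summarize_servers(SAMPLE_CONFIG))
```

To inspect a real workspace, pass the contents of .vscode/mcp.json instead of SAMPLE_CONFIG.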

What Token Limits compresses in VS Code tool calls

Tool call type                    Typical savings
File reads (local_read)           50-70%
Search results (local_search)     70-80%
Directory listings (local_ls)     60-75%
Command output (local_exec)       75-90%
Diffs (local_diff)                50-65%
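
To turn these percentages into token counts, a rough calculation helps. The sketch below copies the savings ranges from the table and applies them to raw token counts; the function names and the sample token counts are made up for illustration, and actual savings will depend on the content being compressed.

```python
# Savings ranges (low %, high %) copied from the table above.
SAVINGS_PERCENT = {
    "local_read": (50, 70),
    "local_search": (70, 80),
    "local_ls": (60, 75),
    "local_exec": (75, 90),
    "local_diff": (50, 65),
}


def tokens_after_compression(tool: str, raw_tokens: int) -> tuple[int, int]:
    """Return (best_case, worst_case) token counts after compression."""
    low, high = SAVINGS_PERCENT[tool]
    best_case = raw_tokens * (100 - high) // 100   # highest savings applied
    worst_case = raw_tokens * (100 - low) // 100   # lowest savings applied
    return best_case, worst_case


def session_estimate(calls: list[tuple[str, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case estimates over a list of (tool, raw_tokens)."""
    best_total = worst_total = 0
    for tool, raw_tokens in calls:
        best, worst = tokens_after_compression(tool, raw_tokens)
        best_total += best
        worst_total += worst
    return best_total, worst_total
```

For example, a session with a 10,000-token file read, a 5,000-token search, and a 4,000-token command output (19,000 raw tokens) would come out between 4,400 and 7,500 tokens under these ranges.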

Node.js version requirement

Token Limits MCP requires Node.js 22 or later. Run node --version to check. If you are on an older version, update via nvm or the official Node.js installer. The binary installer checks for this and will warn you if the version is too low.
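
If you want to script the same check, for instance in a team onboarding script, the version string can be parsed directly. This is a minimal sketch assuming node --version prints a string such as v22.3.0; the helper names are hypothetical, and this is not how the Token Limits installer itself performs the check.

```python
import re
import subprocess


def node_major_version(version_output: str) -> int:
    """Extract the major version from `node --version` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {version_output!r}")
    return int(match.group(1))


def meets_requirement(version_output: str, minimum_major: int = 22) -> bool:
    """True when the reported Node.js major version is at least minimum_major."""
    return node_major_version(version_output) >= minimum_major


def installed_node_version() -> str:
    """Run `node --version` and return its output (raises if node is not on PATH)."""
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With Node.js installed, meets_requirement(installed_node_version()) reports whether the machine satisfies the Node.js 22 requirement.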

Install Token Limits MCP for VS Code in 2 minutes

Compress every Cline, Copilot, and Continue tool call by 60-80%. Free trial, no credit card. One command installs and configures everything.

FAQ

Does Token Limits MCP work with GitHub Copilot in VS Code?

Yes, in VS Code 1.99+ with Copilot agent mode enabled. Copilot agent mode reads workspace MCP servers from .vscode/mcp.json. Token Limits compresses tool outputs used during agent tasks.

Does the MCP server need to run continuously?

Yes. The Token Limits server process needs to be running for VS Code to use it. It starts automatically when you open VS Code if configured correctly. You can check its status with token-limits status.

Is .vscode/mcp.json workspace-specific or global?

Workspace-specific. Each project has its own .vscode/mcp.json. Run token-limits setup-vscode in each workspace where you want compression enabled, or copy the config file between projects.
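
Copying the file between projects can be scripted. The following is a minimal sketch, assuming both workspaces live on the same machine; copy_mcp_config is a hypothetical helper, not a Token Limits command, and it refuses to overwrite an existing config unless told to.

```python
import shutil
from pathlib import Path


def copy_mcp_config(
    source_workspace: Path, target_workspace: Path, overwrite: bool = False
) -> Path:
    """Copy .vscode/mcp.json from one workspace to another and return the new path."""
    source = source_workspace / ".vscode" / "mcp.json"
    target = target_workspace / ".vscode" / "mcp.json"
    if not source.is_file():
        raise FileNotFoundError(f"no MCP config found at {source}")
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    # Create the .vscode directory in the target workspace if it is missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target
```

Because the copied file carries whatever the original contains, the same care about committing it applies in the target project.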

Does Token Limits MCP slow down VS Code tool calls?

No noticeable difference. Compression runs locally in milliseconds, and the time saved by sending smaller payloads to the model typically exceeds the compression overhead.

What if my VS Code version is older than 1.99?

Cline has its own MCP support independent of VS Code native MCP, so Cline users on older VS Code versions can still use Token Limits. GitHub Copilot agent mode MCP requires VS Code 1.99+.