Claude Sonnet 4's 1M Token Context: What It Means, How to Use It, Tiered Pricing
Claude Sonnet 4's context window was expanded from 200k to 1M tokens in 2025, and it can output up to 64k tokens in a single response. This opens new possibilities: load entire projects, keep long conversations going, generate comprehensive responses. But what does 1M tokens actually mean in practice? How many files can you load? How long can a conversation run? And how does the tiered pricing above 200k tokens work? This guide answers all of it.
What 1M tokens means in practical terms
| Measure | Tokens | Rough Equivalent |
|---|---|---|
| 1 page of text | 400 tokens | Single function, small doc |
| 100 lines of code | 700-1,000 tokens | Average method/utility |
| 1 file (average) | 1,200 tokens | One source file |
| 100 files | 120,000 tokens | Small-medium project |
| 1,000 files | 1,200,000 tokens | Large codebase (exceeds 1M) |
| 8-hour conversation | 200,000-400,000 tokens | Long back-and-forth session |
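A quick way to sanity-check these numbers is the common rule of thumb of roughly 4 characters per token for English text and code. A minimal sketch (the 4-chars-per-token ratio is a rough heuristic, not an exact tokenizer):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return round(len(text) / chars_per_token)

def estimate_project_tokens(file_sizes_bytes: list[int]) -> int:
    """Estimate total tokens for a project from file sizes in bytes."""
    return round(sum(file_sizes_bytes) / 4.0)

# A ~4,800-character file estimates to ~1,200 tokens, matching the
# "average file" row in the table above.
print(estimate_tokens("x" * 4800))
```

For a real count, use your provider's tokenizer or token-counting endpoint; the heuristic is only for back-of-the-envelope budgeting.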
Real-world project sizes and context usage
- ✓Small project (10-20 files): ~50,000 tokens. Can load entire codebase + 8-hour conversation.
- ✓Medium project (50-200 files): ~200,000 tokens. Can load full codebase with full conversation history.
- ✓Large project (500+ files): ~500,000-1M tokens. Load full codebase but less conversation history.
Extended output: Up to 64k tokens per response
Sonnet 4 can output up to 64k tokens in a single response. Earlier Claude models were capped at 4k-8k output tokens. What does 64k tokens of output mean?
- ✓64k tokens ≈ 48,000 words ≈ 100-200 pages ≈ 2,000-4,000 lines of code
- ✓One response can contain an entire module rewrite, comprehensive documentation, or full architecture redesign
- ✓Useful for: Large code generation, detailed step-by-step guides, full project refactors
Tiered pricing: What changes above 200k tokens?
Sonnet 4 pricing is tiered by request size. Requests with up to 200k input tokens are billed at the standard rate; once a request exceeds 200k input tokens, the entire request is billed at the higher long-context rate.
| Token Range | Input Price (per 1M) | Output Price (per 1M) |
|---|---|---|
| Up to 200k input tokens per request | $3 | $15 |
| Above 200k input tokens per request | $6 | $22.50 |
This means: a 150k-token request is billed entirely at the standard rate. A 250k-token request crosses the threshold, so the whole request is billed at the long-context rate: input doubles to $6 per 1M and output rises to $22.50 per 1M.
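The tier logic can be sketched as a small cost calculator. This assumes Anthropic's published long-context rates for Sonnet 4 ($3/$15 per 1M up to 200k input tokens, $6/$22.50 above, with the higher rate applied to the entire request):

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for one Sonnet 4 request under tiered pricing.

    Assumption: once input exceeds 200k tokens, the whole request is
    billed at the long-context rate.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 3.00, 15.00    # standard tier, per 1M tokens
    else:
        in_rate, out_rate = 6.00, 22.50    # long-context tier, per 1M tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(request_cost(150_000, 0))  # 0.45 -> standard tier
print(request_cost(250_000, 0))  # 1.5  -> whole request at the higher rate
```

Check current rates against Anthropic's pricing page before relying on the hard-coded numbers.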
When you hit the 200k threshold (and it costs more)
- ✓Loading a huge codebase (500+ files) + full conversation = likely above 200k
- ✓Asking Claude to analyze an entire large project in one go
- ✓Pasting multiple large files at once for comparison or refactoring
- ✓Most typical sessions stay under 200k and use the lower rate
Cost example: Small vs large request
Scenario: You load a codebase and ask Claude to analyze it.
- ✓Small request (80k tokens): $0.24 input cost
- ✓Medium request (150k tokens): $0.45 input cost
- ✓Large request (250k tokens): $1.50 input cost (250k @ $6 per 1M, since crossing the 200k threshold bills the whole request at the long-context rate)
Managing context efficiently to stay under 200k
- Load files strategically: Open only files you need right now
- Separate concerns: Analyze one module at a time instead of the whole codebase
- Clear history: After a task finishes, /clear to reset context
- Use Token Limits: Compress tool outputs so files use fewer tokens
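The "load files strategically" tip above can be automated: rank candidate files and stop before a token budget is exceeded. A minimal sketch (the file names, the 180k safety budget, and the 4-chars-per-token estimate are illustrative assumptions):

```python
def pick_files_within_budget(files: dict[str, int],
                             budget_tokens: int = 180_000) -> list[str]:
    """Select files (name -> size in bytes), smallest first, stopping
    before the estimated token budget (~4 chars/token) is exceeded."""
    chosen, used = [], 0
    for name, size in sorted(files.items(), key=lambda kv: kv[1]):
        tokens = size // 4
        if used + tokens > budget_tokens:
            break
        chosen.append(name)
        used += tokens
    return chosen

# Hypothetical project: the huge generated file would blow the budget,
# so only the two small files are loaded.
project = {"utils.py": 2_000, "models.py": 8_000, "huge_generated.py": 900_000}
print(pick_files_within_budget(project))
```

In practice you would rank by relevance to the current task rather than by size; the budget cutoff is the part that keeps you under the 200k tier.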
Token Limits makes large contexts cheaper
If you load a 500-file codebase (roughly 600k tokens, well above the tier threshold), Token Limits compression can shrink it to around 200k tokens. You stay in the standard tier and save money.
- npm install -g token-limits
- token-limits start
- Claude Code: Tools → API URL → http://localhost:4800
- File reads and tool outputs now 60-80% smaller
- Same amount of context for lower cost
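The savings claim can be estimated by combining a compression ratio with the tiered rates. A sketch, assuming the published long-context rates ($3 per 1M up to 200k input tokens, $6 above, applied to the whole request) and a ~67% reduction, which sits inside the 60-80% range quoted above:

```python
def input_cost(tokens: int) -> float:
    """Input-only USD cost: $3/1M up to 200k tokens, $6/1M above
    (higher rate assumed to apply to the entire request)."""
    rate = 3.00 if tokens <= 200_000 else 6.00
    return tokens * rate / 1_000_000

raw = 600_000          # uncompressed 500-file codebase (estimate)
compressed = 198_000   # after ~67% reduction, within the claimed 60-80%

print(input_cost(raw))         # 3.6   -> billed at the long-context rate
print(input_cost(compressed))  # 0.594 -> fits in the standard tier
```

The saving comes from two effects at once: fewer tokens, and dropping back to the cheaper per-token rate.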
Use Sonnet 4's 1M tokens efficiently with Token Limits
Compression keeps you in the cheap 0-200k token tier. Load bigger codebases, longer conversations, more files—without hitting expensive higher tiers.
FAQ
How do I know if I am above the 200k tier threshold?
You can estimate based on file count and conversation length, but there is no real-time counter shown while you work. Token Limits compression helps keep typical sessions under the threshold.
Is output pricing also tiered?
Yes. The output tier follows the input size of the request: when a request's input exceeds 200k tokens, its output is billed at $22.50 per 1M instead of $15.
Can I use all 1M tokens in one conversation?
Technically yes, but practically: one massive request costs much more than multiple smaller requests. Better to split work into multiple smaller conversations.
Does the 64k output token limit apply to every request?
Yes. Sonnet 4 can output up to 64k tokens per response, but you pay for every output token, so very large responses are expensive.
Should I load all my code at once or load files as needed?
Load as needed. You pay per token anyway. Smaller requests stay under the 200k tier and avoid the 3x price increase.