Claude Sonnet 4's 1M Token Context: What It Means, How to Use It, Tiered Pricing
Claude Sonnet 4's context window was expanded from 200k to 1M tokens in 2025, and it can output up to 64k tokens in a single response. This opens new possibilities: load entire projects, keep long conversations going, generate comprehensive responses. But what does 1M tokens actually mean in practice? How many files can you load? How long can a conversation run? And how does the tiered pricing above 200k tokens work? This guide answers all of it.
What 1M tokens means in practical terms
| Measure | Tokens | Rough Equivalent |
|---|---|---|
| 1 page of text | 400 tokens | Single function, small doc |
| 100 lines of code | 700-1,000 tokens | Average method/utility |
| 1 file (average) | 1,200 tokens | One source file |
| 100 files | 120,000 tokens | Small-medium project |
| 1,000 files | 1,200,000 tokens | Large codebase (exceeds 1M) |
| 8-hour conversation | 200,000-400,000 tokens | Long back-and-forth session |
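A quick way to sanity-check these numbers is the common rule of thumb of roughly 4 characters per token for English text and code. A minimal sketch (the 4-chars-per-token ratio is a rough heuristic, not an exact tokenizer):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return round(len(text) / chars_per_token)

def estimate_project_tokens(file_sizes_bytes: list[int]) -> int:
    """Estimate total tokens for a project from file sizes in bytes."""
    return round(sum(file_sizes_bytes) / 4.0)

# A ~4,800-character file estimates to ~1,200 tokens, matching the
# "average file" row in the table above.
print(estimate_tokens("x" * 4800))
```

For a real count, use your provider's tokenizer or token-counting endpoint; the heuristic is only for back-of-the-envelope budgeting.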
Real-world project sizes and context usage
- ✓Small project (10-20 files): ~50,000 tokens. Can load entire codebase + 8-hour conversation.
- ✓Medium project (50-200 files): ~200,000 tokens. Can load full codebase with full conversation history.
- ✓Large project (500+ files): ~500,000-1M tokens. Load full codebase but less conversation history.
Extended output: Up to 64k tokens per response
Sonnet 4 can output up to 64k tokens in a single response. Earlier Claude models were capped at 4k-8k output tokens. What does 64k tokens of output mean?
- ✓64k tokens ≈ 48,000 words ≈ 100-200 pages ≈ 2,000-4,000 lines of code
- ✓One response can contain an entire module rewrite, comprehensive documentation, or full architecture redesign
- ✓Useful for: Large code generation, detailed step-by-step guides, full project refactors
Tiered pricing: What changes above 200k tokens?
Sonnet 4 pricing is tiered by request size. Requests with up to 200k input tokens are billed at the standard rate; once a request exceeds 200k input tokens, the entire request is billed at the higher long-context rate.
| Token Range | Input Price (per 1M) | Output Price (per 1M) |
|---|---|---|
| Up to 200k input tokens per request | $3 | $15 |
| Above 200k input tokens per request | $6 | $22.50 |
This means: a 150k-token request is billed entirely at the standard rate. A 250k-token request crosses the threshold, so the whole request is billed at the long-context rate: input doubles to $6 per 1M and output rises to $22.50 per 1M.
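The tier logic can be sketched as a small cost calculator. This assumes Anthropic's published long-context rates for Sonnet 4 ($3/$15 per 1M up to 200k input tokens, $6/$22.50 above, with the higher rate applied to the entire request):

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for one Sonnet 4 request under tiered pricing.

    Assumption: once input exceeds 200k tokens, the whole request is
    billed at the long-context rate.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 3.00, 15.00    # standard tier, per 1M tokens
    else:
        in_rate, out_rate = 6.00, 22.50    # long-context tier, per 1M tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(request_cost(150_000, 0))  # 0.45 -> standard tier
print(request_cost(250_000, 0))  # 1.5  -> whole request at the higher rate
```

Check current rates against Anthropic's pricing page before relying on the hard-coded numbers.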
When you hit the 200k threshold (and it costs more)
- ✓Loading a huge codebase (500+ files) + full conversation = likely above 200k
- ✓Asking Claude to analyze an entire large project in one go
- ✓Pasting multiple large files at once for comparison or refactoring
- ✓Most typical sessions stay under 200k and use the lower rate
Cost example: Small vs large request
Scenario: You load a codebase and ask Claude to analyze it.
- ✓Small request (80k tokens): $0.24 input cost
- ✓Medium request (150k tokens): $0.45 input cost
- ✓Large request (250k tokens): $1.50 input cost (250k @ $6 per 1M, since crossing the 200k threshold bills the whole request at the long-context rate)
Managing context efficiently to stay under 200k
- Load files strategically: Open only files you need right now
- Separate concerns: Analyze one module at a time instead of the whole codebase
- Clear history: After a task finishes, /clear to reset context
- Use Token Limits: Compress tool outputs so files use fewer tokens
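The "load files strategically" tip above can be automated: rank candidate files and stop before a token budget is exceeded. A minimal sketch (the file names, the 180k safety budget, and the 4-chars-per-token estimate are illustrative assumptions):

```python
def pick_files_within_budget(files: dict[str, int],
                             budget_tokens: int = 180_000) -> list[str]:
    """Select files (name -> size in bytes), smallest first, stopping
    before the estimated token budget (~4 chars/token) is exceeded."""
    chosen, used = [], 0
    for name, size in sorted(files.items(), key=lambda kv: kv[1]):
        tokens = size // 4
        if used + tokens > budget_tokens:
            break
        chosen.append(name)
        used += tokens
    return chosen

# Hypothetical project: the huge generated file would blow the budget,
# so only the two small files are loaded.
project = {"utils.py": 2_000, "models.py": 8_000, "huge_generated.py": 900_000}
print(pick_files_within_budget(project))
```

In practice you would rank by relevance to the current task rather than by size; the budget cutoff is the part that keeps you under the 200k tier.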
Token Limits makes large contexts cheaper
If you load a 500-file codebase (roughly 600k tokens, well above the tier threshold), Token Limits compression can shrink it to around 200k tokens. You stay in the standard tier and save money.
- npm install -g token-limits
- token-limits start
- Claude Code: Tools → API URL → http://localhost:4800
- File reads and tool outputs now 60-80% smaller
- Same amount of context for lower cost
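The savings claim can be estimated by combining a compression ratio with the tiered rates. A sketch, assuming the published long-context rates ($3 per 1M up to 200k input tokens, $6 above, applied to the whole request) and a ~67% reduction, which sits inside the 60-80% range quoted above:

```python
def input_cost(tokens: int) -> float:
    """Input-only USD cost: $3/1M up to 200k tokens, $6/1M above
    (higher rate assumed to apply to the entire request)."""
    rate = 3.00 if tokens <= 200_000 else 6.00
    return tokens * rate / 1_000_000

raw = 600_000          # uncompressed 500-file codebase (estimate)
compressed = 198_000   # after ~67% reduction, within the claimed 60-80%

print(input_cost(raw))         # 3.6   -> billed at the long-context rate
print(input_cost(compressed))  # 0.594 -> fits in the standard tier
```

The saving comes from two effects at once: fewer tokens, and dropping back to the cheaper per-token rate.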
Use Sonnet 4's 1M tokens efficiently with Token Limits
Compression keeps you in the cheap 0-200k token tier. Load bigger codebases, longer conversations, more files—without hitting expensive higher tiers.
FAQ
How do I know if I am above the 200k tier threshold?
You can estimate based on file count and conversation length, but there is no real-time counter shown while you work. Token Limits compression helps keep typical sessions under the threshold.
Is output pricing also tiered?
Yes. The output tier follows the input size of the request: when a request's input exceeds 200k tokens, its output is billed at $22.50 per 1M instead of $15.
Can I use all 1M tokens in one conversation?
Technically yes, but practically: one massive request costs much more than multiple smaller requests. Better to split work into multiple smaller conversations.
Does the 64k output token limit apply to every request?
Yes. Sonnet 4 can output up to 64k tokens per response, but you pay for every output token, so very large responses are expensive.
Should I load all my code at once or load files as needed?
Load as needed. You pay per token anyway. Smaller requests stay under the 200k tier and avoid the 3x price increase.