Context Window
Definition
The context window is the maximum number of tokens an LLM can process in a single request, including both input (prompt, documents, history) and output, typically ranging from 4K to 1M+ tokens.
Why It Matters
Context windows define what’s possible. A 4K token limit means your LLM can only “see” about 3,000 words at once, not enough for long documents. A 200K window changes everything: entire codebases, book chapters, or hours of conversation history can fit in a single request.
But bigger isn’t always better. Longer contexts cost more, run slower, and models can lose track of information (“lost in the middle”). The right context size balances capability against cost and reliability.
For AI engineers, context window management is a daily consideration. How much history to keep? How many RAG documents to include? When to summarize vs. include full content? These decisions shape your application’s behavior and costs.
Implementation Basics
Current Landscape (2026)
- GPT-5: 400K+ tokens
- Claude 4.5: 200K-1M tokens (with extended thinking)
- Gemini 3: 2M+ tokens
- Llama 4: 256K tokens
- Open source: Widely variable (32K-256K typical)
Token Budget Planning
Allocate your context window deliberately (a minimal budgeting sketch follows this list):
- System prompt: 500-2000 tokens
- Retrieved documents: 2000-8000 tokens
- Conversation history: Variable
- Current input: Variable
- Output buffer: Reserve 1000-4000 tokens
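One way to turn these allocations into code, assuming a 128K-token model, the tiktoken `cl100k_base` encoding, and the budget numbers above; all three are illustrative choices, not requirements of any particular API:

```python
# A minimal token-budgeting sketch: reserve an output buffer, cap documents,
# and fill remaining space with the most recent conversation history.
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

MODEL_LIMIT = 128_000      # assumed context window for the target model
OUTPUT_BUFFER = 4_000      # reserved for the model's response
SYSTEM_BUDGET = 2_000      # system prompt allocation
DOC_BUDGET = 8_000         # retrieved-document allocation


def count_tokens(text: str) -> int:
    return len(ENC.encode(text))


def build_context(system: str, docs: list[str],
                  history: list[str], user_input: str) -> list[str]:
    """Assemble prompt parts without exceeding MODEL_LIMIT - OUTPUT_BUFFER."""
    available = MODEL_LIMIT - OUTPUT_BUFFER

    assert count_tokens(system) <= SYSTEM_BUDGET, "system prompt over budget"
    parts = [system]
    used = count_tokens(system)

    # Retrieved documents, most relevant first, up to DOC_BUDGET.
    doc_used = 0
    for doc in docs:
        t = count_tokens(doc)
        if doc_used + t > DOC_BUDGET or used + t > available:
            break
        parts.append(doc)
        doc_used += t
        used += t

    # The current input is non-negotiable; account for it before history.
    used += count_tokens(user_input)

    # Fill whatever remains with the most recent conversation turns.
    kept = []
    for msg in reversed(history):
        t = count_tokens(msg)
        if used + t > available:
            break
        kept.append(msg)
        used += t

    parts.extend(reversed(kept))
    parts.append(user_input)
    return parts
```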
Strategies for Limited Context
- Summarization: Compress old messages instead of dropping them
- Selective retrieval: Only include most relevant documents
- Chunking: Process long inputs in segments
- History pruning: Keep recent messages plus the most relevant older ones (a pruning-with-summarization sketch follows this list)
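A sketch that combines the first and last strategies: keep recent turns verbatim and compress everything older into a single summary message instead of dropping it. The `summarize` callable is a placeholder for whatever summarizer you use (an LLM call, an extractive method), and `count_tokens` is the helper from the budgeting sketch above:

```python
def prune_history(history: list[str], max_tokens: int,
                  summarize, count_tokens) -> list[str]:
    """Keep the most recent messages that fit; summarize the rest."""
    # Walk backwards, keeping the newest messages within the token budget.
    kept, used = [], 0
    for msg in reversed(history):
        t = count_tokens(msg)
        if used + t > max_tokens:
            break
        kept.append(msg)
        used += t
    kept.reverse()

    # Older messages are compressed into one summary turn rather than vanishing.
    dropped = history[: len(history) - len(kept)]
    if dropped:
        kept.insert(0, "Summary of earlier conversation: " + summarize(dropped))
    return kept
```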
Common Mistakes
- Filling context completely (leaves no room for output)
- Including irrelevant documents that dilute important context
- Not testing how models handle information placement
- Ignoring the cost implications of long contexts
Practical Tips
Track your token usage. Build monitoring to see how much context you’re actually using (a minimal logging sketch appears below). If you’re consistently at 10% of capacity, you’re likely overpaying for context you don’t need. If you’re hitting limits, you need smarter context management.
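A minimal monitoring sketch. The `usage.prompt_tokens` / `usage.completion_tokens` fields follow the OpenAI-style response shape; for other providers, swap in their equivalent fields. The context limit is again an assumed value:

```python
import logging

CONTEXT_LIMIT = 128_000  # assumed context window for the target model
logger = logging.getLogger("context_usage")


def log_context_usage(response) -> float:
    """Record what fraction of the context window a request actually used."""
    used = response.usage.prompt_tokens + response.usage.completion_tokens
    ratio = used / CONTEXT_LIMIT
    logger.info("context usage: %d/%d tokens (%.1f%%)",
                used, CONTEXT_LIMIT, 100 * ratio)
    return ratio
```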
Source
Long-context models show degraded performance on retrieval tasks when the relevant information sits in the middle of the input context (the 'Lost in the Middle' phenomenon).
https://arxiv.org/abs/2307.03172