Context Length

Definition

The maximum number of tokens an LLM can process in a single request, covering both the input prompt and the generated output; it determines how much information the model can consider at once.

Context length (or context window) is shared between input and output, so the two compete for the same budget: with a 128K-token window, a 120K-token prompt leaves roughly 8K tokens for the response.

Why It Matters

Context length determines what you can accomplish in a single LLM call:

  • Document processing: How much text can be analyzed at once
  • Conversation history: How much chat context can be retained
  • RAG effectiveness: How many retrieved chunks can be included
  • Code understanding: How much codebase context fits in one prompt

Modern context lengths have expanded dramatically:

  • GPT-3 (2020): 2K tokens
  • GPT-4 (2023): 8K-32K tokens; GPT-4 Turbo: 128K tokens
  • GPT-5 (2025): 400K tokens
  • Claude Sonnet 4.5 (2025): 200K tokens (up to 1M in beta)
  • Gemini 1.5 Pro (2024): up to 2M tokens

Longer contexts enable new use cases but come with trade-offs in cost, latency, and attention quality.

Implementation Basics

Working with context length:

  1. Token counting: Use tiktoken (OpenAI) or model-specific tokenizers
  2. Budget allocation: Reserve space for output (typically 4K-8K tokens)
  3. Truncation strategies: Remove oldest messages, summarize, or chunk

Context management patterns:

# Estimate token usage (OpenAI models; other vendors ship their own tokenizers)
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # raises KeyError for unknown model names
text = "How many tokens does this prompt use?"
token_count = len(enc.encode(text))
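
Budget allocation and truncation can reuse the same tokenizer. Below is a minimal sketch, not a production recipe: it assumes OpenAI-style chat message dicts and the `enc` encoder from the snippet above, and the limits and per-message padding are illustrative assumptions, not API constants.

# Sketch: fit chat history into a token budget (limits are illustrative)
def fit_to_budget(messages, context_limit=128_000, output_reserve=8_000):
    """Drop the oldest non-system messages until the prompt fits the input budget."""
    input_budget = context_limit - output_reserve

    def prompt_tokens(msgs):
        # Rough estimate: per-message framing overhead varies by API, so pad a little
        return sum(len(enc.encode(m["content"])) + 4 for m in msgs)

    msgs = list(messages)
    while prompt_tokens(msgs) > input_budget and len(msgs) > 1:
        msgs.pop(1)  # index 1 skips the system prompt at index 0
    return msgs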

Typical context tiers:

  • Short context (4K-8K): Basic Q&A, simple tasks
  • Medium context (32K-64K): Document analysis, multi-turn conversations
  • Long context (128K+): Large document processing, codebase understanding

Considerations:

  • Cost: APIs bill per token for both input and output, so longer prompts make every call more expensive
  • Latency: Longer prompts increase prefill time
  • Attention degradation: Models may lose focus with very long contexts
  • Lost in the middle: Information in the middle of long contexts is often missed
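
A common mitigation for the last two issues is to place the most important material at the edges of the prompt rather than in the middle. A minimal sketch, assuming retrieved chunks already sorted best-first (the function name and chunk format are illustrative):

# Sketch: interleave chunks so the top-ranked ones land at the start and
# end of the context, pushing weaker ones toward the middle.
def order_for_long_context(chunks_best_first):
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

ordered = order_for_long_context(["A", "B", "C", "D", "E"])
# -> ["A", "C", "E", "D", "B"]: the top two chunks ("A", "B") sit at the edges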

For most applications, design for efficient context use rather than relying on maximum length. Focused, relevant context beats raw volume.

Source

GPT-5 has a 400K+ token context window

https://platform.openai.com/docs/models