Long Context

Definition

LLM capabilities for processing very large inputs (typically 100K to 1M+ tokens), enabling analysis of entire documents, codebases, and extended conversations without chunking or summarization.

Why It Matters

Long context fundamentally changes what’s possible with LLMs:

  • No chunking needed: Process entire documents in one call (see the sketch after this list)
  • Full codebase understanding: Analyze repositories without splitting
  • Book-length analysis: Summarize, answer questions about, or compare long texts
  • Extended conversations: Maintain context across long interactions
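
To make the first point concrete, here is a minimal sketch of a single long-context call using the OpenAI Python SDK; the model name, file path, and question are illustrative placeholders:

```python
# Minimal sketch: pass an entire document in one request -- no chunking.
# The model name, file path, and question are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("annual_report.txt") as f:
    document = f.read()  # the whole document, however long

response = client.chat.completions.create(
    model="gpt-4o",  # any model with a large context window
    messages=[
        {"role": "system", "content": "You are a careful document analyst."},
        {"role": "user",
         "content": f"{document}\n\nSummarize the key risks discussed above."},
    ],
)
print(response.choices[0].message.content)
```

The same pattern applies to codebases or transcripts: concatenate the files and send them as one message.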

This enables use cases previously requiring complex RAG pipelines:

  • Legal document review with full context
  • Technical specification analysis
  • Multi-file code review and refactoring
  • Long-form content editing and consistency checking

However, long context isn’t a silver bullet: it comes with significant cost, latency, and attention-quality trade-offs.

Implementation Basics

Practical considerations:

Cost and latency:

  • Token-based pricing applies to the full context, on every request (see the cost sketch after this list)
  • Prefill time scales with input length
  • 100K-token requests may take 10-30+ seconds
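
As a back-of-the-envelope illustration of the pricing point, the per-token rates below are assumed placeholders, not any provider’s actual prices:

```python
# Illustrative cost math for long-context calls.
# Prices are assumptions -- check your provider's current rates.
INPUT_USD_PER_1M = 3.00    # assumed input price per 1M tokens
OUTPUT_USD_PER_1M = 15.00  # assumed output price per 1M tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one call; the full context is billed on every request."""
    return (input_tokens * INPUT_USD_PER_1M
            + output_tokens * OUTPUT_USD_PER_1M) / 1_000_000

one_call = request_cost(100_000, 1_000)  # 100K-token document, short answer
print(f"single call: ${one_call:.2f}")
print(f"50 repeated queries: ${one_call * 50:.2f}")  # context re-billed each time
```

This re-billing on every call is why repeated queries over the same corpus favor RAG (or prompt caching, where available).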

Attention quality:

  • “Lost in the middle” effect: Models struggle with mid-context information
  • Front and end positions get better attention
  • Critical information should be positioned strategically (see the sketch after this list)
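
One common mitigation, sketched below as a hypothetical helper, is to place the question at both the start and the end of the prompt, where attention is strongest:

```python
def build_prompt(question: str, document: str) -> str:
    """Sandwich the document between two copies of the question.
    Models attend best to the start and end of the context, so this
    works around the "lost in the middle" effect."""
    return (
        f"Question: {question}\n\n"
        f"--- DOCUMENT START ---\n{document}\n--- DOCUMENT END ---\n\n"
        f"Using only the document above, answer:\n{question}"
    )
```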

When to use long context vs RAG:

| Long Context                     | RAG                             |
| -------------------------------- | ------------------------------- |
| One-time analysis                | Repeated queries                |
| Need full context for reasoning  | Specific information retrieval  |
| Coherence across document        | Many documents to search        |
| Codebase understanding           | Knowledge base search           |

Best practices:

  • Structure prompts to highlight key sections
  • Use clear section markers and headers
  • Test information retrieval with needle-in-haystack tests (see the harness after this list)
  • Consider hybrid approaches: RAG to select context, long context to reason (see the sketch at the end of this section)
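
A minimal needle-in-a-haystack harness might look like the following; the OpenAI client, model name, filler text, and "needle" fact are all illustrative:

```python
# Needle-in-a-haystack probe: bury one fact at varying depths in filler
# text and check whether the model retrieves it. All inputs are synthetic.
from openai import OpenAI

client = OpenAI()
NEEDLE = "The access code for Project Dawn is 7294."
FILLER = "The sky was clear and the market stayed quiet that day. " * 2000

for depth in (0.0, 0.5, 1.0):  # needle at the start, middle, and end
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder long-context model
        messages=[{"role": "user",
                   "content": f"{haystack}\n\nWhat is the access code "
                              f"for Project Dawn?"}],
    )
    answer = response.choices[0].message.content or ""
    print(f"depth {depth:.0%}: {'PASS' if '7294' in answer else 'FAIL'}")
```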

Long context is powerful but expensive, so use it strategically for tasks that genuinely require comprehensive understanding.
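
Finally, the hybrid best practice can be sketched as a two-stage pipeline; the keyword-overlap retriever below is a naive stand-in for a real embedding or vector-store lookup:

```python
# Hybrid sketch: a retriever (stage 1) narrows a corpus to the most
# relevant documents, then one long-context call (stage 2) reasons over
# them together.
from openai import OpenAI

client = OpenAI()

def retrieve(query: str, corpus: dict[str, str], k: int = 5) -> list[str]:
    """Rank documents by keyword overlap with the query (placeholder for
    an embedding / vector-database lookup)."""
    terms = set(query.lower().split())
    ranked = sorted(corpus.items(),
                    key=lambda item: -len(terms & set(item[1].lower().split())))
    return [text for _, text in ranked[:k]]

def hybrid_answer(query: str, corpus: dict[str, str]) -> str:
    context = "\n\n---\n\n".join(retrieve(query, corpus))
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder long-context model
        messages=[{"role": "user",
                   "content": f"{context}\n\nQuestion: {query}"}],
    )
    return response.choices[0].message.content
```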

Source

Long-context language models can process sequences of hundreds of thousands of tokens.

https://arxiv.org/abs/2404.02060