Long Context
Definition
Long context refers to an LLM’s ability to process very large inputs (typically 100K to 1M+ tokens), enabling analysis of entire documents, codebases, and extended conversations without chunking or summarization.
Why It Matters
Long context fundamentally changes what’s possible with LLMs:
- No chunking needed: Process entire documents in one call (sketched at the end of this section)
- Full codebase understanding: Analyze repositories without splitting
- Book-length analysis: Summarize, answer questions about, or compare long texts
- Extended conversations: Maintain context across long interactions
This enables use cases previously requiring complex RAG pipelines:
- Legal document review with full context
- Technical specification analysis
- Multi-file code review and refactoring
- Long-form content editing and consistency checking
However, long context isn’t a silver bullet: it comes with significant cost, latency, and attention-quality trade-offs, covered below.
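As a concrete illustration of the single-call pattern above, here is a minimal sketch that sends an entire document to a long-context model in one request. It assumes the OpenAI Python SDK and an OpenAI-compatible endpoint; the model name and file path are placeholders, so substitute whatever long-context model and SDK you actually use.

```python
# Minimal sketch: analyze an entire document in a single request.
# Assumes the OpenAI Python SDK; the model name below is a placeholder
# for any model whose context window fits the document.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("contract.txt", encoding="utf-8") as f:
    document = f.read()  # the full document, no chunking

response = client.chat.completions.create(
    model="gpt-4.1",  # placeholder long-context model
    messages=[
        {"role": "system", "content": "You are a careful legal analyst."},
        {
            "role": "user",
            "content": (
                "Review the following contract in full and list any "
                "clauses that conflict with each other.\n\n"
                f"<document>\n{document}\n</document>"
            ),
        },
    ],
)
print(response.choices[0].message.content)
```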
Implementation Basics
Practical considerations:
Cost and latency:
- Token-based pricing applies to the full context (a rough cost estimate is sketched below)
- Prefill time scales with input length
- 100K token requests may take 10-30+ seconds
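Because pricing and prefill latency both scale with input length, it is worth estimating token counts before sending a request. Here is a rough sketch using the tiktoken tokenizer; the per-token prices are made-up placeholders, so check your provider’s current rate card.

```python
# Rough cost estimate for a long-context request.
# Uses the tiktoken tokenizer; the prices below are placeholders,
# NOT real rates - substitute your provider's current pricing.
import tiktoken

PRICE_PER_INPUT_TOKEN = 2.00 / 1_000_000   # placeholder: $2 per 1M input tokens
PRICE_PER_OUTPUT_TOKEN = 8.00 / 1_000_000  # placeholder: $8 per 1M output tokens

def estimate_request(text: str, expected_output_tokens: int = 1_000) -> None:
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer choice is model-dependent
    input_tokens = len(enc.encode(text))
    cost = (input_tokens * PRICE_PER_INPUT_TOKEN
            + expected_output_tokens * PRICE_PER_OUTPUT_TOKEN)
    print(f"input tokens:   {input_tokens:,}")
    print(f"estimated cost: ${cost:.4f}")

with open("contract.txt", encoding="utf-8") as f:
    estimate_request(f.read())
```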
Attention quality:
- “Lost in the middle” effect: Models recall information buried mid-context less reliably
- Content at the beginning and end of the prompt receives the most reliable attention
- Position critical information at the start or end of the prompt (see the sketch below)
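One mitigation for the lost-in-the-middle effect is to assemble prompts so that the task and key facts sit at the edges of the context, with bulk material in the middle. A minimal sketch of such a prompt builder; the section markers are an arbitrary convention, not a required format.

```python
# Sketch: assemble a long-context prompt with critical content at the
# edges, where models attend most reliably. Markers are arbitrary.
def build_prompt(task: str, key_facts: list[str], bulk_context: str) -> str:
    facts = "\n".join(f"- {fact}" for fact in key_facts)
    return (
        f"## Task\n{task}\n\n"                        # critical: placed first
        f"## Key facts\n{facts}\n\n"                  # critical: still near the front
        f"## Reference material\n{bulk_context}\n\n"  # bulk goes in the middle
        f"## Reminder\nRe-read the task above and answer it: {task}\n"  # repeated at the end
    )

prompt = build_prompt(
    task="Identify every termination clause and its notice period.",
    key_facts=["The contract was signed on 2023-01-15.", "Governing law: Delaware."],
    bulk_context=open("contract.txt", encoding="utf-8").read(),
)
```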
When to use long context vs RAG (a rough decision heuristic follows the table):
| Prefer long context when… | Prefer RAG when… |
|---|---|
| One-time analysis of a corpus | Repeated queries over the same corpus |
| Reasoning requires the full context at once | Retrieving specific pieces of information |
| Coherence across an entire document matters | Many documents must be searched |
| Understanding a whole codebase | Searching a knowledge base |
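The table can be collapsed into a simple decision rule. This is a sketch of one reasonable heuristic, not a universal policy; the thresholds are illustrative assumptions.

```python
# Rough heuristic mirroring the table above. The thresholds are
# illustrative assumptions, not tuned values.
def choose_strategy(total_tokens: int, repeated_queries: bool,
                    needs_global_reasoning: bool,
                    context_window: int = 200_000) -> str:
    if total_tokens > context_window:
        return "rag"            # corpus cannot fit in one prompt; retrieve instead
    if repeated_queries and not needs_global_reasoning:
        return "rag"            # amortize indexing cost over many lookups
    return "long_context"       # one-off or coherence-heavy: send it all

print(choose_strategy(total_tokens=120_000, repeated_queries=False,
                      needs_global_reasoning=True))  # -> long_context
```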
Best practices:
- Structure prompts to highlight key sections
- Use clear section markers and headers
- Test information retrieval with needle-in-haystack tests (see the sketch after this list)
- Consider hybrid approaches: RAG to select context, long context to reason
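To validate retrieval quality before relying on a long-context setup, run a needle-in-a-haystack test: plant a unique fact at varying depths in filler text and check whether the model recalls it. A minimal sketch, again assuming the OpenAI SDK; the filler, the needle, and the model name are illustrative, and the filler should be scaled to the context length you actually want to test.

```python
# Needle-in-a-haystack sketch: plant a unique fact at several depths in
# filler text and check the model recalls it. Assumes the OpenAI SDK;
# scale FILLER up to match the context length under test.
from openai import OpenAI

client = OpenAI()
NEEDLE = "The secret launch code is MAGENTA-742."
FILLER = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 2_000

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4.1",  # placeholder long-context model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):  # fraction of the way into the context
    pos = int(len(FILLER) * depth)
    haystack = FILLER[:pos] + " " + NEEDLE + " " + FILLER[pos:]
    answer = ask(f"{haystack}\n\nWhat is the secret launch code?")
    print(f"depth={depth:.2f} recalled={'MAGENTA-742' in answer}")
```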
Long context is powerful but expensive, so use it strategically for tasks that genuinely require comprehensive understanding.
Source
Long-context language models can process sequences of hundreds of thousands of tokens:
https://arxiv.org/abs/2404.02060