Chunking
Definition
Chunking is the process of splitting documents into smaller pieces for embedding and retrieval. It balances preserving enough context for each piece to be meaningful against fitting within token limits, while keeping each piece semantically coherent.
Why It Matters
Raw documents are rarely the right unit for retrieval. A 50-page PDF embedded as one vector loses nuance, because the entire document gets one representation that averages all concepts. A query about pricing finds the document but not the specific section.
Chunking creates granular units for precise retrieval. Each chunk becomes a searchable item. When users ask questions, they get the specific paragraph that answers them rather than an entire document to sift through.
But chunk size is a critical trade-off. Too small (single sentences) loses context, as you retrieve fragments that don’t make sense alone. Too large (full pages) wastes context window tokens on irrelevant content and dilutes the embedding’s semantic focus.
Implementation Basics
Common Strategies
- Fixed-size: Split every N characters/tokens with optional overlap
- Sentence-based: Split on sentence boundaries using NLP
- Paragraph-based: Preserve natural document structure
- Semantic: Use embedding similarity to find natural break points
- Document-specific: Different rules for code, markdown, HTML
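The simplest of these, fixed-size splitting with overlap, can be sketched in a few lines. This is an illustrative implementation (the function name `chunk_fixed` is mine, not from any library), and it counts characters rather than tokens for simplicity; production code would count tokens with the embedding model's tokenizer.

```python
def chunk_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap.

    Sizes are in characters here for simplicity; swap in a tokenizer
    to count tokens instead.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Note that the last `overlap` characters of each chunk are repeated at the start of the next, so a sentence that straddles a boundary appears whole in at least one chunk.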
Typical Parameters
- Chunk size: 200-1000 tokens is common; 500 is a reasonable default
- Overlap: 10-20% overlap helps preserve context across boundaries
- Separators: Prioritize paragraph breaks, then sentences, then characters
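The separator-priority idea can be sketched as a recursive splitter: try paragraph breaks first, fall back to sentence boundaries, and only hard-split on character count as a last resort. This is a simplified sketch (the function name `split_recursive` and the default separator list are my own choices, not a library API).

```python
def split_recursive(text, chunk_size=500, separators=("\n\n", ". ", " ")):
    """Split on the highest-priority separator whose pieces fit in
    chunk_size, falling back to finer separators, then raw slicing."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-split on character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            # Greedily pack pieces into the current chunk.
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece itself is too long: recurse with finer separators.
                chunks.extend(split_recursive(piece, chunk_size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing keeps chunks as close to the target size as possible while never splitting inside a higher-priority unit unless it is itself oversized.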
Practical Guidelines
Start with simple fixed-size chunking (500 tokens, 50-token overlap). Measure retrieval quality before optimizing. Add semantic chunking only if you see clear failure modes.
Consider your content type: code needs different handling than prose. Headers and metadata should often stay attached to their content. Tables may need special treatment.
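As one example of keeping headers attached, a markdown document can be split at headings so each retrieved chunk carries its own section title. A minimal sketch (the function name `chunk_by_heading` is hypothetical, and it only handles `## ` headings):

```python
import re

def chunk_by_heading(markdown: str) -> list[str]:
    """Split a markdown document at '## ' headings, keeping each
    heading attached to the text beneath it so retrieved chunks
    carry their own context."""
    # Lookahead split: break *before* each heading without consuming it.
    sections = re.split(r"(?m)^(?=## )", markdown)
    return [s.strip() for s in sections if s.strip()]
```

A real implementation would also track nested heading levels and prepend parent headings as metadata, but even this flat version prevents a chunk like "Plans start at $10" from arriving with no indication that it came from a Pricing section.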
Common Mistakes
- Chunking too small in an attempt to be “precise”
- Ignoring document structure (splitting mid-table, mid-code-block)
- Using the same strategy for all content types
- Not including metadata that helps users understand the retrieved chunk
Source
LlamaIndex's node parsers implement various chunking strategies, including sentence-, paragraph-, and semantic-based splitting.
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/