Chunking
Definition
Chunking is the process of splitting documents into smaller pieces for embedding and retrieval. It balances preserving enough context for each piece to be meaningful against fitting within token limits, while keeping each piece semantically coherent.
Why It Matters
Raw documents are rarely the right unit for retrieval. A 50-page PDF embedded as one vector loses nuance, because the entire document gets one representation that averages all concepts. A query about pricing finds the document but not the specific section.
Chunking creates granular units for precise retrieval. Each chunk becomes a searchable item. When users ask questions, they get the specific paragraph that answers them rather than an entire document to sift through.
But chunk size is a critical trade-off. Too small (single sentences) loses context, as you retrieve fragments that don’t make sense alone. Too large (full pages) wastes context window tokens on irrelevant content and dilutes the embedding’s semantic focus.
Implementation Basics
Common Strategies
- Fixed-size: Split every N characters/tokens with optional overlap
- Sentence-based: Split on sentence boundaries using NLP
- Paragraph-based: Preserve natural document structure
- Semantic: Use embedding similarity to find natural break points
- Document-specific: Different rules for code, markdown, HTML
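The simplest of these, fixed-size splitting with overlap, can be sketched in a few lines. This is an illustrative implementation (the function name `chunk_fixed` is mine, not from any library), and it counts characters rather than tokens for simplicity; production code would count tokens with the embedding model's tokenizer.

```python
def chunk_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap.

    Sizes are in characters here for simplicity; swap in a tokenizer
    to count tokens instead.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Note that the last `overlap` characters of each chunk are repeated at the start of the next, so a sentence that straddles a boundary appears whole in at least one chunk.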
Typical Parameters
- Chunk size: 200-1000 tokens is common; 500 is a reasonable default
- Overlap: 10-20% overlap helps preserve context across boundaries
- Separators: Prioritize paragraph breaks, then sentences, then characters
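The separator-priority idea can be sketched as a recursive splitter: try paragraph breaks first, fall back to sentence boundaries, and only hard-split on character count as a last resort. This is a simplified sketch (the function name `split_recursive` and the default separator list are my own choices, not a library API).

```python
def split_recursive(text, chunk_size=500, separators=("\n\n", ". ", " ")):
    """Split on the highest-priority separator whose pieces fit in
    chunk_size, falling back to finer separators, then raw slicing."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-split on character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            # Greedily pack pieces into the current chunk.
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece itself is too long: recurse with finer separators.
                chunks.extend(split_recursive(piece, chunk_size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing keeps chunks as close to the target size as possible while never splitting inside a higher-priority unit unless it is itself oversized.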
Practical Guidelines
Start with simple fixed-size chunking (500 tokens, 50-token overlap). Measure retrieval quality before optimizing. Add semantic chunking only if you see clear failure modes.
Consider your content type: code needs different handling than prose. Headers and metadata should often stay attached to their content. Tables may need special treatment.
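As one example of keeping headers attached, a markdown document can be split at headings so each retrieved chunk carries its own section title. A minimal sketch (the function name `chunk_by_heading` is hypothetical, and it only handles `## ` headings):

```python
import re

def chunk_by_heading(markdown: str) -> list[str]:
    """Split a markdown document at '## ' headings, keeping each
    heading attached to the text beneath it so retrieved chunks
    carry their own context."""
    # Lookahead split: break *before* each heading without consuming it.
    sections = re.split(r"(?m)^(?=## )", markdown)
    return [s.strip() for s in sections if s.strip()]
```

A real implementation would also track nested heading levels and prepend parent headings as metadata, but even this flat version prevents a chunk like "Plans start at $10" from arriving with no indication that it came from a Pricing section.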
Common Mistakes
- Chunking too small in an attempt to be “precise”
- Ignoring document structure (splitting mid-table, mid-code-block)
- Using the same strategy for all content types
- Not including metadata that helps users understand the retrieved chunk
Source
LlamaIndex's node parsers implement various chunking strategies, including sentence-, paragraph-, and semantic-based splitting.
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/