Chunking Strategies for RAG Systems: A Practical Engineering Guide
Chunking strategy is the most underestimated factor in RAG system quality. Across dozens of RAG implementations, I’ve seen teams spend weeks optimizing embedding models and retrieval algorithms while naive chunking quietly destroyed their results. The truth is that how you split documents matters more than which vector database you choose.
Most tutorials show you character-based chunking because it’s easy to implement. They don’t show you why your retrieval quality tanks when a key concept gets split across two chunks, or why your system hallucinates when chunks lose their original context. This guide fixes that.
Why Chunking Matters So Much
Every RAG system follows the same flow: split documents into chunks, embed those chunks, retrieve relevant chunks, generate responses from them. Chunking is the foundation that everything else depends on.
Bad chunking corrupts your embeddings. An embedding represents the semantic meaning of a chunk. If your chunk contains half of two unrelated paragraphs, the embedding captures neither meaning well.
Bad chunking breaks retrieval. When the answer to a question spans two chunks, you might retrieve neither. Or retrieve one without the context needed to understand it.
Bad chunking causes hallucinations. When the LLM receives incomplete context, it fills gaps with plausible-sounding fabrications instead of admitting uncertainty.
I’ve seen retrieval quality improve by 30-40% from chunking changes alone: no model changes, no algorithm changes, just smarter splitting. That’s why this matters.
The Naive Approach and Its Problems
The most common chunking approach splits text at fixed character or token counts. Tutorials show this because it’s three lines of code:
Split every 500 characters with 50-character overlap. Done.
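As a minimal Python sketch (the parameter values are the illustrative ones above, not recommendations):

```python
def naive_chunk(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character chunking with overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```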
This approach fails for several reasons:
It ignores semantic boundaries. Sentences get cut mid-thought. Paragraphs split at arbitrary points. The resulting chunks don’t represent coherent ideas.
It loses document structure. Headers, sections, and hierarchies disappear. A chunk might contain the end of one section and the beginning of another, stitching together completely unrelated content.
It creates orphaned context. A chunk might say “this approach” or “the following method” without including what those references point to.
It handles code terribly. A function split across chunks is useless. Half a code block doesn’t compile or make sense.
If you’re using character-based chunking in production, you’re leaving significant retrieval quality on the table.
Semantic Chunking Principles
Good chunking respects the inherent structure and meaning of your documents. Here are the principles I follow:
Respect Natural Boundaries
Documents have natural break points: paragraph endings, section breaks, topic transitions. Split at these boundaries, not arbitrary character counts.
Paragraphs usually contain single coherent thoughts. They’re natural chunk boundaries.
Sections group related content. Keep section content together when possible.
Lists and enumerations should stay complete. A partial list confuses more than it helps.
Code blocks must remain intact. Never split mid-function or mid-class.
Preserve Context
Chunks need context to be useful. Include surrounding information that gives meaning:
Headers and section titles tell you what content is about. Include relevant headers with each chunk.
Parent context helps with hierarchical documents. A chunk from “Chapter 3 > Section 3.2 > Installation” should carry that context.
Preceding sentences are sometimes necessary to resolve pronouns and references. “This method requires…” makes no sense without knowing which method.
Size for Your Use Case
Optimal chunk size depends on your application:
Small chunks (200-400 tokens) work well for:
- Precise fact retrieval
- FAQ systems
- Dense, information-rich content
Medium chunks (400-800 tokens) balance precision and context for:
- General Q&A systems
- Documentation search
- Most production use cases
Large chunks (800-1500 tokens) suit:
- Complex reasoning tasks
- Analysis requiring broader context
- Narrative content
Test different sizes against your actual queries. The right size emerges from empirical testing, not theory. I discuss testing approaches in my RAG implementation guide.
Chunking Strategies in Practice
Here are specific strategies I use for different content types:
Strategy 1: Recursive Character Splitting with Semantic Awareness
This improves naive character splitting by using a hierarchy of separators:
- First try to split on section breaks (double newlines, headers)
- If chunks are still too large, split on paragraph breaks (single newlines)
- If still too large, split on sentence boundaries (periods, question marks)
- Last resort: split on words
This approach respects document structure while ensuring chunks don’t exceed size limits. The hierarchy means you only use more aggressive splitting when necessary.
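Here is a hand-rolled sketch of that hierarchy; the separator list and size limit are assumptions to tune for your content, and libraries such as LangChain ship a comparable RecursiveCharacterTextSplitter if you’d rather not maintain this yourself:

```python
def recursive_split(text: str, max_chars: int = 1000,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split text into chunks under max_chars, preferring coarse
    separators (sections) and falling back to finer ones (words)."""
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # Last resort: a hard character split.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > max_chars:
            # This piece alone is too big: recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, max_chars, rest))
        elif not current:
            current = piece
        elif len(current) + len(sep) + len(piece) <= max_chars:
            current += sep + piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```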
Strategy 2: Structure-Aware Document Parsing
For structured documents (PDFs, HTML, Markdown), use the document structure directly:
Parse the document tree to identify headers, sections, paragraphs.
Chunk by section when sections are appropriately sized.
Combine adjacent sections when they are related and individually too small.
Split large sections using semantic principles (paragraph breaks, topic shifts).
This requires more upfront work but produces dramatically better chunks. The document author already organized content logically, so use that organization.
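For Markdown, a small pass that tracks the header path goes a long way. This sketch assumes ATX-style `#` headers and skips edge cases like fenced code blocks containing `#` lines:

```python
import re

def chunk_markdown_by_section(md: str) -> list[dict]:
    """Chunk a Markdown document by header structure, attaching the
    full header path to each chunk as context."""
    chunks: list[dict] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"context": " > ".join(path), "text": text})
        body.clear()

    for line in md.splitlines():
        header = re.match(r"^(#{1,6})\s+(.*)", line)
        if header:
            flush()
            level = len(header.group(1))
            del path[level - 1:]  # pop headers at this level or deeper
            path.append(header.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```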
Strategy 3: Topic-Based Chunking
For unstructured text, identify topic boundaries algorithmically:
Sentence embedding clustering groups semantically similar sentences together. Chunks form around naturally occurring topics.
Topic modeling (LDA, BERTopic) identifies themes and splits when topics shift.
Sliding window comparison detects topic changes by comparing embedding similarity between adjacent windows.
These approaches are computationally expensive but valuable for content like transcripts, emails, or chat logs where structural formatting is missing.
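As one sketch of the sliding-window variant using sentence-transformers; the model name, window size, and similarity threshold are placeholder choices you would tune empirically:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def topic_boundaries(sentences: list[str], window: int = 3,
                     threshold: float = 0.55) -> list[int]:
    """Indices where adjacent sentence windows diverge enough to
    suggest a topic shift."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
    emb = model.encode(sentences, normalize_embeddings=True)
    boundaries = []
    for i in range(window, len(sentences) - window + 1):
        left = emb[i - window:i].mean(axis=0)
        right = emb[i:i + window].mean(axis=0)
        sim = float(np.dot(left, right)
                    / (np.linalg.norm(left) * np.linalg.norm(right)))
        if sim < threshold:
            boundaries.append(i)  # sentence i starts a new topic
    return boundaries
```

Chunks are then formed by cutting the sentence list at the returned indices.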
Strategy 4: Entity-Centric Chunking
For content about distinct entities (products, people, locations), chunk around entities:
Identify entities using NER or keyword extraction.
Group content by entity so all information about a single entity stays together.
Create entity summaries that serve as chunk metadata for better retrieval.
This works well for knowledge bases, CRM data, and catalogs where users query about specific entities.
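A minimal sketch of the grouping step using spaCy’s off-the-shelf NER (assuming the en_core_web_sm model is installed; production pipelines typically add coreference resolution and entity normalization on top):

```python
import spacy

def group_by_entity(text: str) -> dict[str, list[str]]:
    """Collect every sentence mentioning an entity under that entity,
    so all content about it can be assembled into one chunk."""
    nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
    doc = nlp(text)
    groups: dict[str, list[str]] = {}
    for sent in doc.sents:
        for ent in sent.ents:
            groups.setdefault(ent.text, []).append(sent.text)
    return groups
```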
Handling Overlap
Overlap prevents information loss at chunk boundaries. But how much overlap?
The Case for Overlap
Without overlap, information that spans chunk boundaries might never be retrieved:
User asks: “What are the prerequisites for installing the module?”
Chunk 1 ends: “…to install the module.” Chunk 2 starts: “Prerequisites include Python 3.8…”
Neither chunk matches “prerequisites for installing” well on its own, because the answer straddles the chunk boundary.
Overlap ensures boundary-spanning content exists in at least one chunk.
How Much Overlap
10-20% overlap handles most boundary issues without excessive redundancy.
Higher overlap (25-30%) makes sense when content is highly interconnected.
Lower overlap (5-10%) works for well-structured documents with clear boundaries.
Smart Overlap Implementation
Don’t overlap blindly by character count. Overlap at semantic boundaries:
Sentence-level overlap ensures complete sentences appear in multiple chunks.
Paragraph-level overlap keeps full paragraphs together across boundaries.
Context header overlap repeats section headers in overlapping chunks for context.
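A sketch of sentence-level overlap, assuming sentences are already segmented (with spaCy, NLTK, or a regex); the size limit and overlap count are illustrative:

```python
def pack_with_sentence_overlap(sentences: list[str], max_chars: int = 1200,
                               overlap_sentences: int = 2) -> list[str]:
    """Pack sentences into chunks, repeating the last few sentences of
    each chunk at the start of the next so boundary content survives."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for sent in sentences:
        if current and size + len(sent) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # carry overlap forward
            size = sum(len(s) for s in current)
        current.append(sent)
        size += len(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```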
Metadata Enrichment
Raw chunk text isn’t enough. Enrich chunks with metadata that improves retrieval:
Source Information
Every chunk should track its origin:
- Document ID and title
- Page numbers or section identifiers
- Document type and category
- Creation and modification dates
This metadata enables filtering and helps with response attribution.
Hierarchical Context
For hierarchical documents, store the path:
- Document > Chapter > Section > Subsection
This enables queries like “find information about X in Chapter 3” and helps users understand where answers come from.
Semantic Tags
Add extracted information:
- Key entities mentioned
- Topics covered
- Question types this chunk answers
These tags enable metadata filtering that dramatically improves retrieval precision. My hybrid database solutions guide covers combining vector search with metadata filtering.
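Putting the three categories together, an enriched chunk record might look like this sketch; the field names are illustrative and should follow whatever schema your vector store expects:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A chunk plus the metadata that makes it filterable and attributable."""
    text: str
    doc_id: str
    title: str
    hierarchy: list[str]          # e.g. ["Chapter 3", "Section 3.2", "Installation"]
    page: int | None = None
    doc_type: str = "unknown"
    modified: str | None = None   # ISO-8601 date string
    entities: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)

chunk = Chunk(
    text="Prerequisites include Python 3.8 and a C compiler.",
    doc_id="docs-042",
    title="Installation Guide",
    hierarchy=["Chapter 3", "Section 3.2", "Installation"],
    page=14,
    doc_type="documentation",
    entities=["Python 3.8"],
    topics=["installation", "prerequisites"],
)
```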
Content-Specific Strategies
Different content types need different treatment:
Technical Documentation
Preserve code blocks as complete units. Never split code mid-function.
Include surrounding explanation with code since the code alone often isn’t useful.
Maintain hierarchy from headers. “Method: authenticate()” should include the class it belongs to.
Link prerequisites and dependencies to the features they enable.
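As a sketch of enforcing the never-split-code rule for fenced Markdown blocks: separate code from prose first, then run only the prose segments through your normal splitter.

```python
import re

FENCE = re.compile(r"(```.*?```)", re.DOTALL)

def split_code_from_prose(md: str) -> list[str]:
    """Separate fenced code blocks from prose. Code segments stay
    whole; only the prose segments go through the normal splitter."""
    return [seg for seg in FENCE.split(md) if seg.strip()]
```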
Legal and Policy Documents
Keep clauses together. Legal text derives meaning from complete clauses.
Preserve numbered sections with their context. “Section 3.2.1” needs sections 3 and 3.2 for context.
Track definitions. Legal documents define terms that appear throughout, so link definitions to usage.
Conversational Content
Keep speaker turns intact. Splitting mid-conversation destroys context.
Group topic threads. When conversations jump between topics, chunk by topic, not by time.
Include question-answer pairs. Keep questions with their answers in the same chunk.
Tables and Structured Data
Keep tables as single chunks when they fit. Tables split by rows lose their meaning.
Convert to prose for complex tables. “The price for Product A is $100” retrieves better than cell coordinates.
Include headers with every row when you must split tables. Each chunk needs column context.
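A sketch of the row-to-prose conversion; because every sentence repeats the column headers, each chunk keeps its context even if the table must be split:

```python
def table_rows_to_prose(headers: list[str], rows: list[list[str]]) -> list[str]:
    """Render each row as a self-contained sentence that repeats
    the column headers."""
    return ["; ".join(f"{h}: {v}" for h, v in zip(headers, row)) + "."
            for row in rows]

# table_rows_to_prose(["Product", "Price"], [["Product A", "$100"]])
# -> ["Product: Product A; Price: $100."]
```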
Quality Validation
After chunking, validate that your chunks support good retrieval:
Chunk Quality Metrics
Semantic coherence measures whether chunks represent single coherent ideas. High embedding variance within a chunk indicates mixed content.
Context completeness checks whether chunks are self-contained or reference undefined context.
Size distribution analyzes whether chunks cluster around target size or vary wildly.
Boundary quality evaluates whether splits occur at natural boundaries or mid-sentence.
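As a sketch of the coherence metric: score each chunk by how closely its sentence embeddings agree with their centroid (producing the per-sentence embeddings is left to whatever model your stack already uses):

```python
import numpy as np

def coherence_score(sentence_embeddings: np.ndarray) -> float:
    """Mean cosine similarity between each sentence embedding and the
    chunk centroid. Low scores flag chunks that mix unrelated content."""
    unit = sentence_embeddings / np.linalg.norm(
        sentence_embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    return float((unit @ centroid).mean())
```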
Test with Real Queries
The ultimate test is retrieval quality on actual queries:
Create evaluation sets of questions with known answers and source chunks.
Measure retrieval accuracy with different chunking strategies.
Analyze failures to understand when and why chunking causes retrieval misses.
This empirical testing beats theoretical optimization every time.
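A minimal hit-rate harness as a sketch; `retrieve` and the eval-set keys are placeholders for whatever your pipeline exposes:

```python
def retrieval_hit_rate(eval_set: list[dict], retrieve) -> float:
    """Fraction of questions whose known source chunk shows up in the
    retrieved results. `retrieve(question)` is assumed to return a
    list of chunk IDs from your pipeline."""
    hits = sum(1 for item in eval_set
               if item["source_chunk_id"] in retrieve(item["question"]))
    return hits / len(eval_set)

# eval_set = [{"question": "What are the prerequisites?",
#              "source_chunk_id": "docs-042#3"}, ...]
```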
Iterative Refinement
Chunking strategy should evolve as you learn from production:
Monitor retrieval failures and analyze whether better chunking would help.
A/B test chunking changes on a subset of traffic before full rollout.
Collect user feedback on answer quality and trace issues to retrieval.
Profile query types and optimize chunking for your actual query distribution.
Production feedback reveals chunking issues that testing misses. Build feedback loops into your system.
From Theory to Practice
Start with semantic-aware chunking using document structure. Add overlap at sentence boundaries. Enrich with source metadata. Test against real queries.
Most importantly, don’t stop there. Chunking is an ongoing optimization, not a one-time decision. The patterns that work for your content and queries will differ from anyone else’s.
For complete RAG implementation patterns including chunking, explore my production RAG systems guide and vector databases guide.
Ready to implement production-grade chunking? Join the AI Engineering community where engineers share chunking strategies that work for different content types and help each other debug retrieval issues.