
Self-RAG

Definition

Self-RAG is an advanced RAG pattern where the model decides whether retrieval is needed, retrieves if necessary, grades retrieved documents for relevance, and self-critiques its generated answer.

Why It Matters

Not every query needs retrieval, and not every retrieved document is relevant. Self-RAG makes each stage of the RAG pipeline conditional: the model decides when to retrieve, evaluates whether the retrieved content helps, and checks whether its own answer is supported by that content. This reduces unnecessary retrieval and improves answer quality.

How It Works

Self-RAG trains the model to emit special reflection tokens, each of which answers one of four questions:

  1. Retrieve Decision: Should I retrieve for this query?
  2. Relevance Check: Is this retrieved passage useful?
  3. Support Check: Is my answer supported by the evidence?
  4. Utility Check: Is my answer actually useful?

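The model generates these checks as part of its own output, so the verdicts can be used to skip retrieval, discard irrelevant passages, and prefer answers that the evidence supports.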
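
The four checks above can be sketched as a small control loop. This is a minimal illustration, not the paper's implementation: in Self-RAG a single trained model emits the reflection tokens inline, whereas here each check is modeled as a separate callable so the flow is easy to follow. All names (self_rag, Critique, the toy_* functions, CORPUS) are hypothetical, and the toy checks are simple string heuristics standing in for model judgments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Critique:
    supported: Optional[bool]  # Support Check; None when no evidence was used
    useful: bool               # Utility Check


@dataclass
class SelfRagResult:
    answer: str
    retrieved: bool
    evidence: List[str]
    critique: Critique


def self_rag(
    query: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str, List[str]], Critique],
) -> SelfRagResult:
    """Run the four checks in order; every callable is an assumed stand-in."""
    evidence: List[str] = []
    retrieved = needs_retrieval(query)  # 1. Retrieve Decision
    if retrieved:
        candidates = retrieve(query)
        # 2. Relevance Check: keep only passages judged useful for the query
        evidence = [p for p in candidates if is_relevant(query, p)]
    answer = generate(query, evidence)
    verdict = critique(query, answer, evidence)  # 3. Support and 4. Utility
    return SelfRagResult(answer, retrieved, evidence, verdict)


# Toy stand-ins (assumptions for illustration only).
CORPUS = [
    "Self-RAG was introduced in arXiv:2310.11511.",
    "Bananas are rich in potassium.",
]


def toy_needs_retrieval(query: str) -> bool:
    return "self-rag" in query.lower()


def toy_retrieve(query: str) -> List[str]:
    return list(CORPUS)  # a real retriever would rank passages by the query


def toy_is_relevant(query: str, passage: str) -> bool:
    return "self-rag" in passage.lower()


def toy_generate(query: str, evidence: List[str]) -> str:
    return evidence[0] if evidence else "No lookup needed for this query."


def toy_critique(query: str, answer: str, evidence: List[str]) -> Critique:
    supported = any(answer in p for p in evidence) if evidence else None
    return Critique(supported=supported, useful=bool(answer.strip()))


def run_toy(query: str) -> SelfRagResult:
    return self_rag(query, toy_needs_retrieval, toy_retrieve,
                    toy_is_relevant, toy_generate, toy_critique)


if __name__ == "__main__":
    print(run_toy("Where was Self-RAG introduced?"))
    print(run_toy("What is 2 + 2?"))
```

Passing the checks in as callables keeps the control flow separate from how each judgment is made, so the string heuristics could be swapped for model calls without changing self_rag itself.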

When to Use

Use Self-RAG when answer quality is critical, when queries are mixed (some need retrieval, some don’t), when the document collection is noisy or varied, or when you want to reduce hallucinations. The added complexity is worthwhile for high-stakes applications.

Source

Self-RAG enables LLMs to adaptively retrieve passages and self-reflect on retrieved content and generated responses.

https://arxiv.org/abs/2310.11511