
Prefill

Definition

The initial phase of LLM inference where the model processes all input tokens to build the key-value cache, determining time-to-first-token latency.

Unlike the decode phase, which generates output tokens one at a time, prefill processes the entire prompt in a single parallel pass, so its cost grows with prompt length and it accounts for most of the delay before the first output token appears.

Why It Matters

LLM inference has two distinct phases with different performance characteristics (a toy sketch of both follows the two lists below):

Prefill phase:

  • Processes entire input prompt
  • Compute-bound (parallelizable)
  • Duration scales with input length
  • Determines time-to-first-token

Decode phase:

  • Generates output tokens one at a time
  • Memory-bound (sequential)
  • Uses cached KV values
  • Determines tokens-per-second
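
As a concrete illustration, here is a minimal sketch of both phases using a single toy attention head with random weights; every shape and value below is invented for illustration and stands in for a real model. Prefill computes attention for all prompt tokens in one batched pass and stores the resulting keys and values, while each decode step processes only the newest token against that cache.

```python
# Toy single-head attention with a KV cache: prefill is one parallel pass,
# decode is one token at a time. All weights/shapes here are illustrative only.
import numpy as np

d = 8                                                   # toy head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Causal attention of the query rows against all cached keys/values."""
    scores = q @ K.T / np.sqrt(d)                       # (Tq, Tk)
    Tq, Tk = scores.shape
    mask = np.tril(np.ones((Tq, Tk)), k=Tk - Tq).astype(bool)   # causal mask
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def prefill(prompt_emb):
    """Process every prompt token in one batched matmul; return outputs + KV cache."""
    K, V = prompt_emb @ Wk, prompt_emb @ Wv
    return attend(prompt_emb @ Wq, K, V), (K, V)        # cache K/V for decode

def decode_step(tok_emb, cache):
    """Process exactly one new token, extending and reusing the KV cache."""
    K, V = cache
    K = np.vstack([K, tok_emb[None, :] @ Wk])           # append the new key
    V = np.vstack([V, tok_emb[None, :] @ Wv])           # append the new value
    return attend(tok_emb[None, :] @ Wq, K, V)[0], (K, V)

prompt = rng.standard_normal((16, d))                   # 16 prompt "tokens"
_, cache = prefill(prompt)                              # compute-bound: one big pass
tok = rng.standard_normal(d)                            # stand-in for the sampled token
for _ in range(4):                                      # memory-bound: one token per step
    tok, cache = decode_step(tok, cache)                # real models sample + embed here
```

The asymmetry is visible in the shapes: prefill pushes a (16, d) matrix through the weights at once, while each decode step issues tiny (1, d) operations whose cost is dominated by reading the growing cache.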

Understanding prefill matters because:

  • Long prompts increase initial latency significantly
  • RAG systems with large contexts are prefill-heavy
  • Optimization strategies differ between phases

Implementation Basics

Prefill optimization strategies (a prefix-caching sketch follows the list):

  1. Prompt caching: Reuse prefill results for common prefixes (system prompts, few-shot examples)
  2. Chunked prefill: Split long prompts across iterations to interleave with decode
  3. Speculative decoding: Reduces decode latency only; prefill cost is unchanged
  4. Flash Attention: Reduces attention memory traffic during prefill; total compute stays the same
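
To make the first strategy concrete, here is a minimal prefix-caching sketch. The KV cache is represented abstractly and `fake_prefill` only counts tokens processed, so the saving is visible without a real model; the function and variable names are invented for this example.

```python
# Minimal sketch of prompt (prefix) caching: pay the prefill cost of a shared
# prefix such as a system prompt once, then prefill only each request's suffix.
import hashlib

_prefix_cache = {}          # prefix hash -> cached "KV state" (toy store, no eviction)
tokens_prefilled = 0        # running count of prefill work, to show the saving

def fake_prefill(tokens, past=None):
    """Stand-in for a real prefill call; returns an opaque 'KV state'."""
    global tokens_prefilled
    tokens_prefilled += len(tokens)
    return (past or ()) + tuple(tokens)

def prefill_request(prefix, suffix):
    key = hashlib.sha256(" ".join(prefix).encode()).hexdigest()
    if key not in _prefix_cache:
        _prefix_cache[key] = fake_prefill(prefix)            # miss: pay prefix cost once
    return fake_prefill(suffix, past=_prefix_cache[key])     # every call: suffix only

system = ["You", "are", "a", "helpful", "assistant"] * 50    # 250-token shared prefix
for user_msg in (["hello"], ["summarize", "this"], ["thanks"]):
    prefill_request(system, user_msg)
print(tokens_prefilled)   # 254 (250 prefix tokens once + 4 suffix tokens), not 754
```

Production serving stacks typically key the cache on fixed-size token blocks rather than whole-prefix hashes and handle eviction, but the core idea is the same.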

Measuring prefill performance (a timing sketch follows the list):

  • TTFT (Time to First Token): Primary metric, includes prefill
  • Prefill throughput: Tokens processed per second during prefill
  • KV cache memory: Memory needed scales with sequence length
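
A hedged sketch of how these metrics can be collected around a streaming client follows. `stream_generate` is a hypothetical stand-in whose dummy body merely simulates prefill and decode delays so the script runs end to end; replace it with your actual streaming call.

```python
# Measure TTFT, approximate prefill throughput, and decode speed around a
# streaming generation call. The generator below is a dummy for illustration.
import time

def stream_generate(prompt_tokens):
    """Hypothetical streaming API: yields one output token at a time."""
    time.sleep(0.001 * len(prompt_tokens))        # pretend prefill scales with prompt
    for tok in ("Hello", ",", " world"):
        time.sleep(0.02)                          # pretend per-token decode cost
        yield tok

def measure(prompt_tokens):
    start = time.perf_counter()
    stream = stream_generate(prompt_tokens)
    first_token = next(stream)                    # blocks for prefill + first decode step
    ttft = time.perf_counter() - start
    rest = list(stream)
    total = time.perf_counter() - start
    return {
        "ttft_s": ttft,                                      # dominated by prefill
        "prefill_tok_per_s": len(prompt_tokens) / ttft,      # slight underestimate:
                                                             # TTFT includes one decode step
        "decode_tok_per_s": len(rest) / (total - ttft),
    }

print(measure(["tok"] * 2000))
```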

Practical implications (a back-of-envelope calculation follows the list):

  • System prompts add to every request's prefill time
  • Long context windows (100K+ tokens) can have multi-second prefill
  • Batch similar prompt lengths to avoid prefill variance
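
To put numbers on the last two points, here is a back-of-envelope sketch of how KV cache memory grows with prompt length. The model shape used (32 layers, 32 KV heads, head dimension 128, fp16 values) is an illustrative assumption roughly matching a 7B-class dense model without grouped-query attention; substitute your model's real configuration.

```python
# KV cache size = 2 (K and V) x layers x KV heads x head_dim x seq_len x bytes/value.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_val=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

for tokens in (1_000, 8_000, 100_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} prompt tokens -> {gib:5.1f} GiB of KV cache")
```

At roughly 0.5 MiB per token under these assumptions, a 100K-token prompt needs about 49 GiB of KV cache for a single request, which is why long-context, prefill-heavy workloads are provisioned and batched differently from chat-style, decode-heavy ones.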

When optimizing LLM performance, distinguish between prefill-bound and decode-bound workloads. They require different optimization approaches.

Source

The prefill phase processes the prompt to generate the KV cache entries (Pope et al., Efficiently Scaling Transformer Inference):

https://arxiv.org/abs/2211.05102