Prefill
Definition
The initial phase of LLM inference in which the model processes all input tokens in parallel to populate the key-value (KV) cache; its duration largely determines time-to-first-token latency.
Why It Matters
LLM inference has two distinct phases with different characteristics:
Prefill phase:
- Processes entire input prompt
- Compute-bound (parallelizable)
- Duration scales with input length
- Determines time-to-first-token
Decode phase:
- Generates output tokens one at a time
- Memory-bound (sequential)
- Uses cached KV values
- Determines tokens-per-second
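The split between the two phases shows up directly in a hand-rolled generation loop. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and using "gpt2" purely as an example model: one parallel forward pass over the prompt builds the KV cache (prefill), then tokens are generated one at a time against that cache (decode).
```python
# Minimal prefill/decode sketch; "gpt2" is only an example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt_ids = tokenizer("Explain prefill vs decode:", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: one parallel forward pass over the whole prompt.
    # Builds the KV cache for every prompt token; cost grows with prompt length.
    out = model(input_ids=prompt_ids, use_cache=True)
    past = out.past_key_values
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # first generated token

    # Decode: one token per forward pass, reusing the cached K/V (greedy here).
    generated = [next_token]
    for _ in range(16):
        out = model(input_ids=next_token, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated.append(next_token)

print(tokenizer.decode(torch.cat(generated, dim=-1)[0]))
```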
Understanding prefill matters because:
- Long prompts increase initial latency significantly
- RAG systems with large contexts are prefill-heavy
- Optimization strategies differ between phases
Implementation Basics
Prefill optimization strategies:
- Prompt caching: Reuse prefill results for common prefixes (system prompts, few-shot examples); see the caching sketch after this list
- Chunked prefill: Split long prompts across iterations to interleave with decode
- Speculative decoding: Reduces decode latency; the prefill cost is unchanged
- FlashAttention: Avoids materializing the full attention matrix during prefill, reducing memory use and memory traffic; total compute (FLOPs) stays the same
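As a rough illustration of prompt caching, the sketch below prefills a shared system prompt once and reuses the resulting KV cache for each request. The model name, the helper function, and the deep-copy step are assumptions for this example; production servers (e.g. vLLM's automatic prefix caching) implement the same idea far more efficiently.
```python
# Prompt-caching sketch: prefill a shared system prompt once, reuse its KV cache.
# Assumes a Hugging Face causal LM; "gpt2" is an example model only.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

system_prompt = "You are a helpful assistant. Answer briefly.\n"
system_ids = tokenizer(system_prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Pay the system-prompt prefill cost once, up front.
    prefix_cache = model(input_ids=system_ids, use_cache=True).past_key_values

def prefill_request(user_text: str):
    """Prefill only the user-specific suffix, reusing the cached prefix."""
    user_ids = tokenizer(user_text, return_tensors="pt").input_ids
    cache = copy.deepcopy(prefix_cache)  # copy so the shared prefix cache is not mutated
    with torch.no_grad():
        out = model(input_ids=user_ids, past_key_values=cache, use_cache=True)
    return out.logits[:, -1, :], out.past_key_values

last_logits, kv_cache = prefill_request("What is prefill?")
```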
Measuring prefill performance (a timing sketch follows this list):
- TTFT (Time to First Token): Primary metric, includes prefill
- Prefill throughput: Tokens processed per second during prefill
- KV cache memory: Memory needed scales with sequence length
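A rough way to measure prefill cost on a local model is to time the prompt forward pass directly; in a deployed system, TTFT is normally measured client-side as the time until the first streamed token arrives. The model choice and prompt length below are arbitrary assumptions for the sketch.
```python
# Rough local measurement of prefill cost. The timed forward pass dominates
# TTFT for long prompts; queueing and sampling add to it in a real server.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # example model only
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "word " * 512                                  # deliberately long prompt
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    model(input_ids=ids[:, :8], use_cache=True)         # warm-up pass

    start = time.perf_counter()
    out = model(input_ids=ids, use_cache=True)          # prefill pass
    if torch.cuda.is_available():
        torch.cuda.synchronize()                        # wait for any GPU work
    prefill_s = time.perf_counter() - start

n_tokens = ids.shape[1]
print(f"prefill: {prefill_s * 1000:.1f} ms for {n_tokens} tokens "
      f"({n_tokens / prefill_s:.0f} tokens/s)")
```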
Practical implications:
- System prompts add to every request's prefill time
- Long context windows (100K+ tokens) can have multi-second prefill; see the KV-cache estimate after this list
- Batch similar prompt lengths to avoid prefill variance
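A back-of-envelope estimate makes the memory side of this concrete. The sketch below assumes illustrative Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dimension 128, fp16); models with grouped-query attention store proportionally fewer KV heads.
```python
# Back-of-envelope KV-cache sizing. The default shape (32 layers, 32 KV heads,
# head_dim 128, fp16) is an illustrative assumption, not a measured figure.
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 128, bytes_per_elem: int = 2, batch: int = 1) -> int:
    # Factor of 2 accounts for storing both K and V in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch

for seq_len in (4_096, 32_768, 100_000):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> ~{gib:.1f} GiB of KV cache")
```
Under these assumptions that is roughly 0.5 MiB per token, so a 100K-token context needs on the order of 50 GiB of KV cache before decoding even starts.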
When optimizing LLM performance, distinguish between prefill-bound and decode-bound workloads. They require different optimization approaches.
Source
Pope et al., "Efficiently Scaling Transformer Inference" (the prefill phase processes the prompt to populate the KV cache)
https://arxiv.org/abs/2211.05102