In-Context Learning

Definition

In-context learning is the ability of large language models to learn new tasks from examples provided in the prompt without any weight updates, enabling task adaptation through demonstration rather than fine-tuning.

Why It Matters

In-context learning is why prompting works at all. Before GPT-3, adapting a model to a new task meant fine-tuning, which involved collecting data, running training, and deploying a new model. In-context learning lets you adapt the same model to countless tasks just by changing the prompt.

This capability emerged unexpectedly with scale. Small models don’t do it well; large models do it remarkably well. The mechanism isn’t fully understood, but the practical implications are clear: you can solve many problems by showing the model what you want rather than training it.

For AI engineers, in-context learning is the foundation of prompt engineering. Every time you provide examples in a prompt, you’re leveraging in-context learning. Understanding its strengths and limitations helps you design effective prompts and know when fine-tuning becomes necessary.

Implementation Basics

In-context learning happens entirely in the forward pass with no weight changes:

1. Few-Shot Prompting. The most direct use: include examples in your prompt that demonstrate the desired input-output mapping. The model learns the pattern from the examples and applies it to new inputs.

Classify sentiment:
"I love this!" -> Positive
"This is terrible" -> Negative
"Great product" ->

2. Instruction Following. Describe the task in natural language instead of demonstrating it. Models trained with instruction tuning (ChatGPT, Claude) excel at this, and it often works better than few-shot prompting for well-defined tasks.
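
For comparison, the same sentiment task phrased as an instruction rather than demonstrations:

Classify the sentiment of the following text as Positive or Negative.
Text: "Great product"
Sentiment: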

3. Format Learning. Models learn output formats from examples. Show JSON output examples and the model produces JSON; show markdown tables and it produces tables. Format consistency comes from in-context patterns.
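
A minimal sketch of teaching a JSON format by demonstration and validating the result. complete() is a hypothetical stand-in for your model call, not a real library function:

import json

def complete(prompt: str) -> str:
    # Hypothetical helper: send `prompt` to your model, return its text output.
    raise NotImplementedError

prompt = (
    "Extract the product and sentiment as JSON.\n"
    'Review: "I love this phone" -> {"product": "phone", "sentiment": "positive"}\n'
    'Review: "The laptop is terrible" -> {"product": "laptop", "sentiment": "negative"}\n'
    'Review: "Great headphones" ->'
)

reply = complete(prompt)
data = json.loads(reply)  # fails loudly if the model broke the demonstrated format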

4. Limitations. In-context learning has bounds. Very complex tasks may not fit in context. The model learns poorly from examples in novel domains with no analogous training data. Consistency across many requests is harder to achieve than with fine-tuning.

Practical patterns:

Example Selection - Which examples you choose matters more than how many you include. Diverse examples that cover edge cases outperform many similar ones; three to five well-chosen examples often saturate performance.
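
One way to operationalize diversity is greedy farthest-point selection over example embeddings. A minimal sketch, assuming you already have one embedding vector per candidate example (how you embed them is up to you):

import numpy as np

def select_diverse(example_vecs: np.ndarray, k: int = 4) -> list[int]:
    # Greedy farthest-point selection: start with one example, then
    # repeatedly add the example farthest from everything chosen so far.
    chosen = [0]
    for _ in range(k - 1):
        dists = np.linalg.norm(example_vecs[:, None] - example_vecs[chosen], axis=2)
        nearest = dists.min(axis=1)   # distance to the closest chosen example
        nearest[chosen] = -1.0        # never re-pick a chosen example
        chosen.append(int(nearest.argmax()))
    return chosen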

Example Order - Later examples influence the output more than earlier ones due to recency effects in attention. Put your most representative examples near the end.

Context Budget - Examples consume tokens. Balance the number and length of examples against the room left for the actual task; long examples reduce the maximum output length.
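
A minimal budgeting sketch using the tiktoken tokenizer; the context limit and output reserve are placeholder numbers you would set per model:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fits_budget(examples: list[str], task: str,
                context_limit: int = 8192, output_reserve: int = 1024) -> bool:
    # Count prompt tokens and keep headroom for the model's response.
    prompt = "\n".join(examples) + "\n" + task
    return len(enc.encode(prompt)) <= context_limit - output_reserve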

Validation - Test on held-out examples. In-context learning can be brittle, and small prompt changes sometimes cause large output changes. Validate that your prompts generalize.
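
A minimal validation sketch: measure exact-match accuracy on held-out examples. complete() is again a hypothetical model-call helper you would supply:

def accuracy(prompt_template: str, held_out: list[dict], complete) -> float:
    # Run each held-out input through the prompt and score exact matches.
    correct = 0
    for item in held_out:
        output = complete(prompt_template.format(input=item["input"]))
        correct += output.strip() == item["expected"]
    return correct / len(held_out)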

When in-context learning isn't enough (inconsistent outputs, poor edge-case handling, or specific style requirements), consider whether fine-tuning or retrieval-augmented generation (RAG) would better serve your needs. In-context learning is powerful but not unlimited.

Source

Brown et al., "Language Models are Few-Shot Learners" (the GPT-3 paper) demonstrates that large language models can perform tasks given only a natural language prompt and a few examples, without any gradient updates, an ability that emerges with scale.

https://arxiv.org/abs/2005.14165