
Few-Shot Learning

Definition

Few-shot learning is the technique of providing an LLM with a small number of examples (typically 2-5) in the prompt to demonstrate the desired task, so the model can generalize the pattern to new inputs.

Why It Matters

Sometimes instructions aren’t enough. “Extract the company name from this email” seems clear, but edge cases abound. What about abbreviated names? Multiple companies mentioned? The sender’s company vs. the subject company?

Few-shot examples resolve ambiguity that words can’t. Instead of writing increasingly complex rules, you show the model what you want. Three good examples often beat a page of instructions.

For AI engineers, few-shot learning is a go-to technique for classification, extraction, and formatting tasks. It’s faster than fine-tuning, requires no training data infrastructure, and can be iterated in minutes.

Implementation Basics

Structure

Here are examples of the task:

Input: [example 1 input]
Output: [example 1 output]

Input: [example 2 input]
Output: [example 2 output]

Input: [actual input]
Output:
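Assembled programmatically, the structure above might look like this minimal sketch (the helper name and the extraction examples are illustrative, not a standard API):

```python
# Build a few-shot prompt from (input, output) example pairs,
# following the "Input:/Output:" structure shown above.
def build_few_shot_prompt(examples, query):
    lines = ["Here are examples of the task:", ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")  # blank line between examples
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)

# Hypothetical company-name-extraction examples.
examples = [
    ("Invoice from Acme Corp for Q3 services", "Acme Corp"),
    ("Meeting notes: Globex merger discussion", "Globex"),
]
prompt = build_few_shot_prompt(examples, "Reminder: Initech payment due Friday")
```

The resulting string is sent as the prompt; the model's completion after the final `Output:` is the answer.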

Choosing Examples

  • Cover diverse cases, don’t just show easy ones
  • Include edge cases you want handled correctly
  • Match the distribution of real inputs
  • Order can matter: models show primacy and recency effects, so put the most representative examples first or last

How Many Examples?

  • 2-3 examples: Often sufficient for simple tasks
  • 5+ examples: Complex tasks or when precision matters
  • Too many: Wastes tokens, may confuse the model
  • Diminishing returns typically after 5-10 examples

Advanced Techniques

  • Dynamic few-shot: Select examples similar to the current input
  • Chain-of-thought examples: Show reasoning steps, not just answers
  • Negative examples: Show what NOT to do (use sparingly)
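Dynamic few-shot selection can be sketched as follows. Production systems typically rank stored examples by embedding similarity; word-overlap (Jaccard) similarity is assumed here only to keep the sketch dependency-free:

```python
# Dynamic few-shot: pick the stored examples most similar to the
# current input, then include only those in the prompt.
def jaccard(a, b):
    # Word-overlap similarity; a stand-in for embedding cosine similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select_examples(pool, query, k=3):
    # pool: list of (input, output) pairs from a curated example bank
    return sorted(pool, key=lambda ex: jaccard(ex[0], query), reverse=True)[:k]

pool = [
    ("Invoice from Acme Corp", "Acme Corp"),
    ("Lunch plans for Friday", "None"),
    ("Acme Corp invoice overdue", "Acme Corp"),
    ("Quarterly report from Initech", "Initech"),
]
best = select_examples(pool, "Acme Corp invoice attached", k=2)
# best now holds the two Acme-related examples
```

The selected pairs would then feed the prompt-building step, so each request carries only the most relevant examples.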

Trade-offs

Few-shot uses context window tokens. Each example costs space that could hold more input or RAG context. Balance example count against other context needs. For high-volume tasks, consider whether fine-tuning would be more cost-effective.
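One way to enforce that balance is a token budget for the example section. A real system would count tokens with the model's own tokenizer; the word-count heuristic below is an assumption for illustration:

```python
# Keep adding examples until an assumed token budget is exhausted.
def rough_tokens(text):
    # Crude estimate: ~1.3 tokens per whitespace-separated word.
    # A production system would use the model's actual tokenizer.
    return int(len(text.split()) * 1.3)

def fit_examples(examples, budget):
    chosen, used = [], 0
    for inp, out in examples:
        cost = rough_tokens(f"Input: {inp}\nOutput: {out}")
        if used + cost > budget:
            break  # no room for this example; stop here
        chosen.append((inp, out))
        used += cost
    return chosen
```

Whatever budget remains after instructions and retrieved context caps how many examples `fit_examples` keeps.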

Source

GPT-3 demonstrated that large language models can perform few-shot learning by conditioning on examples provided in context, without any gradient updates.

https://arxiv.org/abs/2005.14165