
Few-Shot Learning

Definition

Few-shot learning is the technique of providing an LLM with a small number of examples (typically 2-5) in the prompt to demonstrate the desired task, so the model can generalize the pattern to new inputs.

Why It Matters

Sometimes instructions aren’t enough. “Extract the company name from this email” seems clear, but edge cases abound. What about abbreviated names? Multiple companies mentioned? The sender’s company vs. the subject company?

Few-shot examples resolve ambiguity that words can’t. Instead of writing increasingly complex rules, you show the model what you want. Three good examples often beat a page of instructions.

For AI engineers, few-shot learning is a go-to technique for classification, extraction, and formatting tasks. It’s faster than fine-tuning, requires no training data infrastructure, and can be iterated in minutes.

Implementation Basics

Structure

Here are examples of the task:

Input: [example 1 input]
Output: [example 1 output]

Input: [example 2 input]
Output: [example 2 output]

Input: [actual input]
Output:
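Assembled programmatically, the structure above might look like this minimal sketch (the helper name and the extraction examples are illustrative, not a standard API):

```python
# Build a few-shot prompt from (input, output) example pairs,
# following the "Input:/Output:" structure shown above.
def build_few_shot_prompt(examples, query):
    lines = ["Here are examples of the task:", ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")  # blank line between examples
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)

# Hypothetical company-name-extraction examples.
examples = [
    ("Invoice from Acme Corp for Q3 services", "Acme Corp"),
    ("Meeting notes: Globex merger discussion", "Globex"),
]
prompt = build_few_shot_prompt(examples, "Reminder: Initech payment due Friday")
```

The resulting string is sent as the prompt; the model's completion after the final `Output:` is the answer.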

Choosing Examples

  • Cover diverse cases, don’t just show easy ones
  • Include edge cases you want handled correctly
  • Match the distribution of real inputs
  • Order can matter: models show primacy and recency effects, so put the most representative examples first or last

How Many Examples?

  • 2-3 examples: Often sufficient for simple tasks
  • 5+ examples: Complex tasks or when precision matters
  • Too many: Wastes tokens, may confuse the model
  • Diminishing returns typically after 5-10 examples

Advanced Techniques

  • Dynamic few-shot: Select examples similar to the current input
  • Chain-of-thought examples: Show reasoning steps, not just answers
  • Negative examples: Show what NOT to do (use sparingly)
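Dynamic few-shot selection can be sketched as follows. Production systems typically rank stored examples by embedding similarity; word-overlap (Jaccard) similarity is assumed here only to keep the sketch dependency-free:

```python
# Dynamic few-shot: pick the stored examples most similar to the
# current input, then include only those in the prompt.
def jaccard(a, b):
    # Word-overlap similarity; a stand-in for embedding cosine similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select_examples(pool, query, k=3):
    # pool: list of (input, output) pairs from a curated example bank
    return sorted(pool, key=lambda ex: jaccard(ex[0], query), reverse=True)[:k]

pool = [
    ("Invoice from Acme Corp", "Acme Corp"),
    ("Lunch plans for Friday", "None"),
    ("Acme Corp invoice overdue", "Acme Corp"),
    ("Quarterly report from Initech", "Initech"),
]
best = select_examples(pool, "Acme Corp invoice attached", k=2)
# best now holds the two Acme-related examples
```

The selected pairs would then feed the prompt-building step, so each request carries only the most relevant examples.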

Trade-offs

Few-shot uses context window tokens. Each example costs space that could hold more input or RAG context. Balance example count against other context needs. For high-volume tasks, consider whether fine-tuning would be more cost-effective.
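One way to enforce that balance is a token budget for the example section. A real system would count tokens with the model's own tokenizer; the word-count heuristic below is an assumption for illustration:

```python
# Keep adding examples until an assumed token budget is exhausted.
def rough_tokens(text):
    # Crude estimate: ~1.3 tokens per whitespace-separated word.
    # A production system would use the model's actual tokenizer.
    return int(len(text.split()) * 1.3)

def fit_examples(examples, budget):
    chosen, used = [], 0
    for inp, out in examples:
        cost = rough_tokens(f"Input: {inp}\nOutput: {out}")
        if used + cost > budget:
            break  # no room for this example; stop here
        chosen.append((inp, out))
        used += cost
    return chosen
```

Whatever budget remains after instructions and retrieved context caps how many examples `fit_examples` keeps.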

Source

GPT-3 demonstrated that large language models can perform few-shot learning by conditioning on examples provided in context, without any gradient updates.

https://arxiv.org/abs/2005.14165