
Small Language Models (SLMs)

Definition

Small language models (SLMs) are language models with relatively few parameters (typically under 10B), optimized for efficiency. Their smaller size enables deployment on edge devices, lower inference costs, and faster responses.

Why It Matters

Bigger isn’t always better. Small language models offer lower inference costs, faster responses, the ability to run locally or on-device, reduced memory requirements, and often “good enough” quality for many tasks. The SLM trend challenges the assumption that capability requires scale.
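To make the cost argument concrete, here is a back-of-envelope comparison. The per-token prices below are hypothetical placeholders chosen for illustration, not actual vendor rates:

```python
# Rough monthly-cost comparison for an app serving many short queries.
# All prices are hypothetical placeholders, not real vendor rates.

def monthly_cost(queries_per_day: int, tokens_per_query: int,
                 price_per_million_tokens: float) -> float:
    """Estimated monthly spend in dollars (30-day month)."""
    tokens_per_month = queries_per_day * 30 * tokens_per_query
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Assumed rates: $10/M tokens for a large hosted model, $0.25/M for a small one.
large = monthly_cost(100_000, 500, 10.00)  # 1.5B tokens/month
small = monthly_cost(100_000, 500, 0.25)   # same traffic, small model
print(f"large: ${large:,.0f}/mo, small: ${small:,.0f}/mo, ratio: {large/small:.0f}x")
```

Under these assumed rates, the same traffic costs $15,000/month on the large model versus $375/month on the small one; the exact ratio depends entirely on real pricing, but the order-of-magnitude gap is the point.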

Notable SLMs

  • Phi-4 (14B): Microsoft’s efficiency-focused reasoning model (slightly above the usual 10B cutoff)
  • Llama 3 8B: Meta’s smallest Llama 3
  • Mistral 7B: Performs well above its size class
  • Qwen 2.5 (0.5B-7B): Various small sizes
  • Gemma 2 (2B, 9B): Google’s open small models

When to Use SLMs

Good Fit:

  • Edge/mobile deployment
  • High-volume, cost-sensitive applications
  • Low-latency requirements
  • Privacy-sensitive (local processing)
  • Specific, well-defined tasks

Consider Larger:

  • Complex reasoning tasks
  • Multi-step planning
  • Creative or nuanced generation
  • Tasks where quality is paramount

Optimization Strategies

SLMs work best when combined with other techniques:

  • Task-specific fine-tuning
  • RAG for external knowledge
  • Careful prompt engineering
  • Hybrid routing: an SLM handles most queries, a large model handles hard ones
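The hybrid approach can be sketched as a simple router. The heuristic and the `call_slm`/`call_llm` stand-ins below are illustrative placeholders; a production router would typically use a trained classifier or model confidence scores rather than keyword matching:

```python
# Minimal sketch of hybrid routing: a cheap heuristic decides whether a query
# is served by a small local model or escalated to a large hosted one.
# call_slm / call_llm are placeholders for real inference calls.

HARD_MARKERS = ("step by step", "prove", "plan", "compare and contrast")

def is_hard(query: str) -> bool:
    """Crude complexity check: long queries or reasoning keywords escalate."""
    q = query.lower()
    return len(q.split()) > 50 or any(m in q for m in HARD_MARKERS)

def call_slm(query: str) -> str:
    return f"[slm] answer to: {query}"   # placeholder for local SLM inference

def call_llm(query: str) -> str:
    return f"[llm] answer to: {query}"   # placeholder for a hosted-API call

def route(query: str) -> str:
    """Send hard queries to the large model, everything else to the SLM."""
    return call_llm(query) if is_hard(query) else call_slm(query)

print(route("What is the capital of France?"))                    # stays on the SLM
print(route("Prove that the sum of two even numbers is even."))   # escalated
```

The design point is that the router itself must be far cheaper than the models it chooses between; even a rough heuristic captures most of the savings if the majority of traffic is simple.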