
Foundation Model

Definition

A foundation model is a large AI model trained on broad data that can be adapted to many downstream tasks through fine-tuning or prompting, serving as the base for specialized applications.

Why It Matters

Foundation models changed AI development economics. Instead of training task-specific models from scratch (expensive, slow, requires lots of data), you start with a pre-trained foundation and adapt it. Fine-tuning GPT for your use case costs orders of magnitude less than training GPT from nothing.

This “pre-train then adapt” paradigm is why AI engineering exploded. You don’t need massive compute budgets or ML research expertise. You need skills in prompting, fine-tuning, and system design around these capable base models.

For AI engineers, foundation models are your building blocks. Choosing the right one (considering cost, capability, license, and deployment constraints) is a key architectural decision.

Implementation Basics

Types of Foundation Models

  • Language: GPT-5, Claude 4.5, Llama 4 (text in, text out)
  • Vision-Language: GPT-5, Claude 4.5, LLaVA (image + text)
  • Code: Codex, CodeLlama, StarCoder (optimized for programming)
  • Embedding: text-embedding-3, BGE, E5 (text to vectors)
  • Image Generation: DALL-E, Stable Diffusion, Midjourney
  • Speech: Whisper (speech-to-text), ElevenLabs (text-to-speech)
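
To make the embedding category concrete, here is a minimal sketch that maps a couple of strings to vectors. It assumes the OpenAI Python SDK and an API key in the environment; the model name text-embedding-3-small is one of the text-embedding-3 variants listed above, and other embedding providers follow the same pattern.

```python
# Minimal sketch: turning text into vectors with an embedding model.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

docs = [
    "Foundation models are adapted to downstream tasks.",
    "Embeddings map text to dense vectors for search and clustering.",
]

# One API call returns one vector per input string.
response = client.embeddings.create(model="text-embedding-3-small", input=docs)
vectors = [item.embedding for item in response.data]

print(len(vectors), "vectors of dimension", len(vectors[0]))
```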

Adaptation Strategies

  1. Prompting: Use as-is with well-crafted prompts (easiest)
  2. Few-shot: Include examples in prompts
  3. Fine-tuning: Train on task-specific data (more effort, better results)
  4. RAG: Augment with external knowledge
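
As a quick illustration of strategies 1 and 2, the sketch below wraps a sentiment-classification task in a few-shot prompt instead of fine-tuning for it. It assumes the OpenAI Python SDK; the model string and the example reviews are placeholders, not recommendations.

```python
# Minimal few-shot sketch: demonstrate the task inside the prompt itself.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY; "gpt-5" is a placeholder
# model name -- use whichever foundation model you have access to.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Classify the sentiment of a review as positive or negative."},
    # Few-shot examples: show the expected behavior instead of training for it.
    {"role": "user", "content": "Review: The battery dies in an hour."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Setup took two minutes and it just works."},
    {"role": "assistant", "content": "positive"},
    # The actual input to classify.
    {"role": "user", "content": "Review: Screen is gorgeous but the hinge feels flimsy."},
]

response = client.chat.completions.create(model="gpt-5", messages=messages)
print(response.choices[0].message.content)
```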

Selection Criteria

  • Capability: Can it do what you need?
  • Cost: Per-token pricing and expected volume
  • Latency: Response time requirements
  • Privacy: Can data leave your infrastructure?
  • License: Commercial use restrictions
  • Context window: How much input can it handle?
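
The cost criterion usually comes down to simple arithmetic: per-token price times expected traffic. The sketch below runs that calculation with made-up prices and volumes; substitute the published rates of the model you are actually evaluating.

```python
# Back-of-envelope cost estimate for the "Cost" criterion above.
# All prices and traffic figures are hypothetical placeholders.
price_per_1m_input_tokens = 3.00    # USD, hypothetical
price_per_1m_output_tokens = 15.00  # USD, hypothetical

requests_per_day = 50_000
avg_input_tokens = 1_200   # prompt + retrieved context
avg_output_tokens = 300

daily_cost = requests_per_day * (
    avg_input_tokens / 1e6 * price_per_1m_input_tokens
    + avg_output_tokens / 1e6 * price_per_1m_output_tokens
)
print(f"~${daily_cost:,.0f}/day, ~${daily_cost * 30:,.0f}/month")
```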

Practical Approach

Start with a capable general model (Claude 4.5, GPT-5). Validate that it can do the task via prompting. If prompting isn’t sufficient, try fine-tuning. Consider smaller or cheaper models only after validating that the approach works, since premature optimization wastes time.
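
A lightweight way to run that validation step is to score a handful of hand-labeled cases against the model’s prompted output, as in the sketch below. It assumes the Anthropic Python SDK; the model name, labels, and test cases are placeholders.

```python
# Minimal sketch of "validate via prompting first": check a small labeled set
# before investing in fine-tuning. Assumes `pip install anthropic` and
# ANTHROPIC_API_KEY; the model name and test cases are placeholders.
import anthropic

client = anthropic.Anthropic()

test_cases = [
    ("Refund me, this arrived broken.", "complaint"),
    ("Where do I change my shipping address?", "question"),
]

correct = 0
for text, expected in test_cases:
    message = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=10,
        system="Label the message as 'complaint' or 'question'. Reply with one word.",
        messages=[{"role": "user", "content": text}],
    )
    prediction = message.content[0].text.strip().lower()
    correct += prediction == expected

print(f"{correct}/{len(test_cases)} correct")
```

If the prompted model clears your accuracy bar on a set like this, fine-tuning may be unnecessary; if it falls short, the same test set doubles as your fine-tuning evaluation.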

Source

Stanford researchers coined the term “foundation model” to describe models trained on broad data at scale that can be adapted to a wide range of downstream tasks.

https://arxiv.org/abs/2108.07258