
Foundation Model

Definition

A foundation model is a large AI model trained on broad data that can be adapted to many downstream tasks through fine-tuning or prompting, serving as the base for specialized applications.

Why It Matters

Foundation models changed AI development economics. Instead of training task-specific models from scratch (expensive, slow, requires lots of data), you start with a pre-trained foundation and adapt it. Fine-tuning GPT for your use case costs orders of magnitude less than training GPT from nothing.

This “pre-train then adapt” paradigm is why AI engineering exploded. You don’t need massive compute budgets or ML research expertise. You need skills in prompting, fine-tuning, and system design around these capable base models.

For AI engineers, foundation models are your building blocks. Choosing the right one (considering cost, capability, license, and deployment constraints) is a key architectural decision.

Implementation Basics

Types of Foundation Models

  • Language: GPT-5, Claude 4.5, Llama 4 (text in, text out)
  • Vision-Language: GPT-5, Claude 4.5, LLaVA (image + text)
  • Code: Codex, CodeLlama, StarCoder (optimized for programming)
  • Embedding: text-embedding-3, BGE, E5 (text to vectors)
  • Image Generation: DALL-E, Stable Diffusion, Midjourney
  • Speech: Whisper (speech-to-text), ElevenLabs (text-to-speech)
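
To make the embedding category concrete, here is a minimal sketch that maps a couple of strings to vectors. It assumes the OpenAI Python SDK and an API key in the environment; the model name text-embedding-3-small is one of the text-embedding-3 variants listed above, and other embedding providers follow the same pattern.

```python
# Minimal sketch: turning text into vectors with an embedding model.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

docs = [
    "Foundation models are adapted to downstream tasks.",
    "Embeddings map text to dense vectors for search and clustering.",
]

# One API call returns one vector per input string.
response = client.embeddings.create(model="text-embedding-3-small", input=docs)
vectors = [item.embedding for item in response.data]

print(len(vectors), "vectors of dimension", len(vectors[0]))
```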

Adaptation Strategies

  1. Prompting: Use as-is with well-crafted prompts (easiest)
  2. Few-shot: Include examples in prompts
  3. Fine-tuning: Train on task-specific data (more effort, better results)
  4. RAG: Augment with external knowledge
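
As a quick illustration of strategies 1 and 2, the sketch below wraps a sentiment-classification task in a few-shot prompt instead of fine-tuning for it. It assumes the OpenAI Python SDK; the model string and the example reviews are placeholders, not recommendations.

```python
# Minimal few-shot sketch: demonstrate the task inside the prompt itself.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY; "gpt-5" is a placeholder
# model name -- use whichever foundation model you have access to.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Classify the sentiment of a review as positive or negative."},
    # Few-shot examples: show the expected behavior instead of training for it.
    {"role": "user", "content": "Review: The battery dies in an hour."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Setup took two minutes and it just works."},
    {"role": "assistant", "content": "positive"},
    # The actual input to classify.
    {"role": "user", "content": "Review: Screen is gorgeous but the hinge feels flimsy."},
]

response = client.chat.completions.create(model="gpt-5", messages=messages)
print(response.choices[0].message.content)
```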

Selection Criteria

  • Capability: Can it do what you need?
  • Cost: Per-token pricing and expected volume
  • Latency: Response time requirements
  • Privacy: Can data leave your infrastructure?
  • License: Commercial use restrictions
  • Context window: How much input can it handle?
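
The cost criterion usually comes down to simple arithmetic: per-token price times expected traffic. The sketch below runs that calculation with made-up prices and volumes; substitute the published rates of the model you are actually evaluating.

```python
# Back-of-envelope cost estimate for the "Cost" criterion above.
# All prices and traffic figures are hypothetical placeholders.
price_per_1m_input_tokens = 3.00    # USD, hypothetical
price_per_1m_output_tokens = 15.00  # USD, hypothetical

requests_per_day = 50_000
avg_input_tokens = 1_200   # prompt + retrieved context
avg_output_tokens = 300

daily_cost = requests_per_day * (
    avg_input_tokens / 1e6 * price_per_1m_input_tokens
    + avg_output_tokens / 1e6 * price_per_1m_output_tokens
)
print(f"~${daily_cost:,.0f}/day, ~${daily_cost * 30:,.0f}/month")
```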

Practical Approach

Start with a capable general model (Claude 4.5, GPT-5). Validate that it can do the task via prompting. If prompting isn’t sufficient, try fine-tuning. Consider smaller or cheaper models only after validating that the approach works, since premature optimization wastes time.
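
A lightweight way to run that validation step is to score a handful of hand-labeled cases against the model’s prompted output, as in the sketch below. It assumes the Anthropic Python SDK; the model name, labels, and test cases are placeholders.

```python
# Minimal sketch of "validate via prompting first": check a small labeled set
# before investing in fine-tuning. Assumes `pip install anthropic` and
# ANTHROPIC_API_KEY; the model name and test cases are placeholders.
import anthropic

client = anthropic.Anthropic()

test_cases = [
    ("Refund me, this arrived broken.", "complaint"),
    ("Where do I change my shipping address?", "question"),
]

correct = 0
for text, expected in test_cases:
    message = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=10,
        system="Label the message as 'complaint' or 'question'. Reply with one word.",
        messages=[{"role": "user", "content": text}],
    )
    prediction = message.content[0].text.strip().lower()
    correct += prediction == expected

print(f"{correct}/{len(test_cases)} correct")
```

If the prompted model clears your accuracy bar on a set like this, fine-tuning may be unnecessary; if it falls short, the same test set doubles as your fine-tuning evaluation.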

Source

Stanford researchers coined the term “foundation model” to describe models trained on broad data at scale that can be adapted to a wide range of downstream tasks.

https://arxiv.org/abs/2108.07258