LLM (Large Language Model)

Definition

A Large Language Model is a neural network with billions of parameters trained on massive text datasets to predict the next token, enabling capabilities like text generation, reasoning, and instruction following.

Why It Matters

LLMs are the foundation of modern AI engineering. They power chatbots, code assistants, document analysis, and countless other applications. Understanding how they work (and their limitations) is essential for building reliable AI systems.

The key insight: LLMs don’t “understand” in the human sense. They predict likely next tokens based on patterns in training data. This explains both their capabilities (generating fluent, contextually appropriate text) and their failure modes (hallucinations, inconsistency, confidently wrong answers).

For AI engineers, LLMs are both tool and subject matter. You’ll use them to build products while developing expertise in their quirks, capabilities, and optimal usage patterns.

Implementation Basics

How LLMs Work

  1. Tokenization: Input text is split into tokens (words or subword pieces)
  2. Embedding: Each token is mapped to a numerical vector
  3. Attention: The model weighs relationships between all tokens in the context
  4. Prediction: The model outputs a probability distribution over its vocabulary
  5. Sampling: The next token is selected from that distribution
  6. Repeat: Steps 2-5 run until a stop token or length limit is reached (see the sketch below)
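
A minimal sketch of this loop, using the Hugging Face transformers library with GPT-2 as an illustrative model (any causal language model follows the same pattern):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  model.eval()

  # 1. Tokenization: text -> token ids
  ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

  with torch.no_grad():
      for _ in range(10):                               # 6. Repeat
          logits = model(ids).logits                    # 2-4. Embedding, attention, prediction
          probs = torch.softmax(logits[0, -1], dim=-1)  # probabilities over the vocabulary
          next_id = torch.multinomial(probs, 1)         # 5. Sampling
          ids = torch.cat([ids, next_id.unsqueeze(0)], dim=1)

  print(tokenizer.decode(ids[0]))

In practice you would call model.generate() rather than writing the loop by hand, but the hand-rolled version makes the six steps explicit.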

Key Concepts

  • Parameters: Learned weights (billions in modern models)
  • Context window: How much text the model can “see” at once, measured in tokens (see the token-counting sketch after this list)
  • Training data: Where the model’s knowledge comes from
  • Fine-tuning: Adapting pre-trained models to specific tasks
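
Because context windows are measured in tokens rather than characters or words, it helps to count tokens before sending a prompt. A quick sketch using OpenAI’s tiktoken library (the encoding name is illustrative; match it to the model you call):

  import tiktoken

  # cl100k_base is one of several encodings; pick the one matching your model.
  enc = tiktoken.get_encoding("cl100k_base")

  prompt = "Summarize the attached report in three bullet points."
  n_tokens = len(enc.encode(prompt))
  print(n_tokens)  # compare against the model's context window before sending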

Popular Models (2026)

  • OpenAI: GPT-5 (flagship reasoning), o3 (advanced reasoning), o4-mini (efficient reasoning)
  • Anthropic: Claude 4.5 (Opus, Sonnet) with extended thinking
  • Google: Gemini 3 with 2M+ token context windows
  • Meta: Llama 4 (open weights, competitive with frontier models)
  • Mistral: Mistral Large 2
  • DeepSeek: DeepSeek-V3, DeepSeek-R1 (open weights, MoE architecture)

Working with LLMs

  • They’re probabilistic: the same input can produce different outputs
  • They have no access to live or private data unless you supply it (e.g., via RAG)
  • They can be confidently wrong (hallucination)
  • They work best with clear, structured prompts
  • They improve with examples (few-shot learning), as the sketch below shows
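
A sketch of few-shot prompting with the OpenAI Python SDK; the model name is illustrative, and temperature=0 reduces, but does not eliminate, run-to-run variation:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  messages = [
      {"role": "system", "content": "Classify the sentiment as positive or negative."},
      # Few-shot examples: show the model the expected input/output pattern.
      {"role": "user", "content": "The checkout flow is delightful."},
      {"role": "assistant", "content": "positive"},
      {"role": "user", "content": "The app crashes every time I open it."},
      {"role": "assistant", "content": "negative"},
      # The actual query.
      {"role": "user", "content": "Support resolved my issue in minutes."},
  ]

  response = client.chat.completions.create(
      model="gpt-4o-mini",  # illustrative; use the smallest model that meets your bar
      messages=messages,
      temperature=0,
  )
  print(response.choices[0].message.content)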

Cost Structure

APIs charge per token, with input and output priced separately; output typically costs more. Model size correlates with capability and cost, so choose the smallest model that meets your quality requirements.
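
To make the arithmetic concrete, a sketch with placeholder prices (not real rates; check your provider’s pricing page):

  # Estimate the cost of one request given per-token prices.
  input_price_per_1m = 3.00    # USD per 1M input tokens (placeholder)
  output_price_per_1m = 15.00  # USD per 1M output tokens (placeholder; output costs more)

  input_tokens = 1_200
  output_tokens = 400

  cost = (input_tokens * input_price_per_1m
          + output_tokens * output_price_per_1m) / 1_000_000
  print(f"${cost:.4f}")  # -> $0.0096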

Source

GPT-3, with 175 billion parameters, demonstrated that scaling language models improves performance across diverse NLP tasks without task-specific training.

Brown et al., “Language Models are Few-Shot Learners” (2020). https://arxiv.org/abs/2005.14165