
Tokens

Definition

Tokens are the basic units of text that LLMs process: subword pieces typically 3-4 characters long in English. They determine context limits, API costs, and processing speed, with 1,000 tokens roughly equal to 750 words.

Why It Matters

Tokens are the currency of LLMs. Everything costs tokens: your prompts, retrieved documents, conversation history, and generated output. Understanding tokens helps you predict costs, optimize performance, and debug unexpected behavior.

The token count often surprises people: “Hello, how are you?” is 4 words but 6 tokens. Code is especially token-heavy because variable names, syntax, and whitespace all consume tokens; a 100-line Python file might use 500+ tokens.
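
To see the split for yourself, you can run the text through a tokenizer. The sketch below uses OpenAI's tiktoken library with the cl100k_base encoding as an example; other tokenizers split the same text differently.

```python
# Inspect how a short sentence is split into tokens.
# cl100k_base is used here as an example encoding; splits vary by tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Hello, how are you?")

print(len(tokens))                        # 6 tokens for this encoding
print([enc.decode([t]) for t in tokens])  # the pieces, e.g. ['Hello', ',', ' how', ' are', ' you', '?']
```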

For AI engineers, token awareness is essential. You’ll estimate costs, fit content within context limits, and optimize prompts, all of which require token fluency.

Implementation Basics

Tokenization Varies by Model

Different models use different tokenizers:

  • OpenAI: o200k_base for recent models such as GPT-5 and o3; older models and the text-embedding-3 family use cl100k_base
  • Anthropic: Custom tokenizer
  • Llama: SentencePiece

The same text produces different token counts across models. Always use the specific model’s tokenizer for accurate counts.
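
A quick way to check this is to count the same string with two different encodings. The sketch below uses tiktoken, which only covers OpenAI encodings, so treat it as an illustration rather than a universal counter.

```python
# Count the same text with two OpenAI encodings; counts often differ between encodings.
import tiktoken

text = "The same text produces different token counts across models."

for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")
```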

Counting Tokens

  • OpenAI: tiktoken library
  • Anthropic: the Messages API’s count_tokens endpoint
  • General: Model provider’s tokenizer or API
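
For OpenAI models, a minimal local counter with tiktoken might look like the sketch below; the model name and fallback encoding are illustrative assumptions. Anthropic counts come from their server-side count_tokens endpoint rather than a local library.

```python
# Minimal token counter for OpenAI models via tiktoken.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens for `text` using the encoding associated with `model`."""
    try:
        enc = tiktoken.encoding_for_model(model)   # model-specific encoding, if tiktoken knows it
    except KeyError:
        enc = tiktoken.get_encoding("o200k_base")  # assumed fallback for unknown model names
    return len(enc.encode(text))

print(count_tokens("Estimate token counts before you send the request."))
```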

Rough Estimates

  • 1 token ≈ 4 characters (English)
  • 1 token ≈ 0.75 words
  • 1,000 tokens ≈ 750 words
  • Code: roughly 1 token per 2-3 characters
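
These ratios translate directly into quick back-of-the-envelope helpers, as in the sketch below; they are heuristics for English prose, not substitutes for a real tokenizer.

```python
# Heuristic estimators based on the ratios above (English prose only;
# code and non-English text usually need more tokens per character).
def estimate_tokens_from_chars(text: str) -> int:
    return max(1, len(text) // 4)                    # ~4 characters per token

def estimate_tokens_from_words(text: str) -> int:
    return max(1, round(len(text.split()) / 0.75))   # ~0.75 words per token

sample = "Rough estimates are fine for budgeting, not for enforcing hard limits."
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))
```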

Cost Implications

API pricing is per-token, with input and output often priced differently. Output tokens typically cost 2-4x more than input tokens. Long responses cost more than short ones, regardless of input length.
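
The arithmetic is straightforward once you have token counts; the prices in the sketch below are made-up placeholders, not any provider’s actual rates.

```python
# Per-request cost from token counts. Prices are hypothetical examples;
# check your provider's current price sheet for real numbers.
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (placeholder)
OUTPUT_PRICE_PER_MTOK = 12.00  # USD per 1M output tokens (placeholder, 4x input)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Example: a 2,000-token prompt with an 800-token response.
print(f"${request_cost(2_000, 800):.4f}")
```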

Optimization Strategies

  • Shorten system prompts without losing meaning
  • Use concise output formats (JSON vs. verbose explanations)
  • Truncate retrieved documents to relevant sections (see the sketch after this list)
  • Consider model choice: smaller models have lower per-token costs
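
For the truncation strategy, one simple approach is to cap each retrieved document at a token budget before it goes into the prompt. The sketch below uses tiktoken and a hard cut, so it only matches OpenAI encodings and ignores sentence boundaries.

```python
# Cap a document at a fixed token budget before adding it to the prompt.
import tiktoken

def truncate_to_budget(text: str, max_tokens: int, encoding: str = "o200k_base") -> str:
    enc = tiktoken.get_encoding(encoding)
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    # Hard cut at the budget; smarter chunking would respect sentence boundaries.
    return enc.decode(tokens[:max_tokens])

doc = "A long retrieved passage about tokens. " * 200  # stand-in for a real document
snippet = truncate_to_budget(doc, max_tokens=64)
```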

Source

Byte-Pair Encoding (BPE) tokenization iteratively merges frequent character pairs into tokens, creating a vocabulary that balances efficiency and coverage.

https://arxiv.org/abs/1508.07909
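
For intuition, the sketch below is a toy, character-level version of that merge loop; real BPE tokenizers operate on bytes, learn tens of thousands of merges, and handle many practical details this ignores.

```python
# Toy, character-level BPE training loop: repeatedly merge the most
# frequent adjacent pair of symbols across a tiny corpus.
from collections import Counter

def bpe_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    corpus = [list(w) for w in words]  # each word as a list of single-character symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += 1
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_corpus = []
        for symbols in corpus:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return merges

print(bpe_merges(["lower", "lowest", "newer", "wider"], num_merges=4))
```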