Tokens
Definition
Tokens are the basic units that LLMs process: subword pieces typically 3-4 characters long in English. They determine context limits, API costs, and processing speed, with 1,000 tokens roughly equal to 750 words.
Why It Matters
Tokens are the currency of LLMs. Everything costs tokens: your prompts, retrieved documents, conversation history, and generated output. Understanding tokens helps you predict costs, optimize performance, and debug unexpected behavior.
The token count often surprises people. “Hello, how are you?” is 4 words but 6 tokens. Code is especially token-heavy: variable names, syntax, and whitespace all consume tokens, so a 100-line Python file might use 500+ tokens.
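A quick way to see this is to tokenize the sentence yourself. The sketch below assumes OpenAI's tiktoken library and the o200k_base encoding; other encodings may split the text slightly differently.

```python
# Minimal sketch using tiktoken (pip install tiktoken); the encoding choice is
# an assumption -- other encodings may split the text differently.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
ids = enc.encode("Hello, how are you?")

print(len(ids))                        # 6 tokens for 4 words
print([enc.decode([i]) for i in ids])  # subword pieces, e.g. ['Hello', ',', ' how', ...]
```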
For AI engineers, token awareness is essential. You’ll estimate costs, fit content within context limits, and optimize prompts, all of which require token fluency.
Implementation Basics
Tokenization Varies by Model
Different models use different tokenizers:
- OpenAI: o200k_base (GPT-4o, GPT-5, o3); older models and text-embedding-3 use cl100k_base
- Anthropic: Custom tokenizer
- Llama: SentencePiece (Llama 2); Llama 3 moved to a tiktoken-style BPE tokenizer
The same text produces different token counts across models. Always use the specific model’s tokenizer for accurate counts.
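As a rough illustration, the sketch below counts the same string under two OpenAI encodings with tiktoken; counts for Anthropic or Llama models would require their own tokenizers, which this sketch does not cover.

```python
# Sketch: the same text yields different token counts under different encodings.
import tiktoken

text = "Tokenization varies by model, so always count with the matching tokenizer."
for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))
```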
Counting Tokens
- OpenAI: tiktoken library
- Anthropic: anthropic.count_tokens()
- General: the model provider’s tokenizer or API
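A minimal counting helper for OpenAI models might look like the sketch below; the o200k_base fallback for unrecognized model names is an assumption, and Anthropic models should be counted with their own SDK rather than tiktoken.

```python
# Sketch: exact token counting for OpenAI models via tiktoken.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Assumption: fall back to o200k_base for model names tiktoken doesn't know.
        enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))

print(count_tokens("Hello, how are you?"))  # 6
```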
Rough Estimates
- 1 token ≈ 4 characters (English)
- 1 token ≈ 0.75 words
- 1,000 tokens ≈ 750 words
- Code: roughly 1 token per 2-3 characters
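When no tokenizer is at hand, these heuristics can be turned into a quick estimator. Averaging the character-based and word-based guesses, as below, is just one reasonable choice for English prose, not an exact count.

```python
# Sketch: rule-of-thumb token estimate for English prose (approximate, not exact).
def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    return round((by_chars + by_words) / 2)  # assumption: average the two guesses

print(estimate_tokens("Tokens are the currency of LLMs."))
```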
Cost Implications
API pricing is per-token, with input and output often priced differently. Output tokens typically cost 2-4x more than input tokens. Long responses cost more than short ones, regardless of input length.
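Putting that into numbers, the sketch below estimates per-request cost; the prices are placeholders in dollars per million tokens, not any provider's actual rates.

```python
# Sketch: per-request cost estimate. Prices are hypothetical ($ per 1M tokens);
# substitute your provider's current rates.
INPUT_PRICE_PER_M = 2.50    # placeholder input price
OUTPUT_PRICE_PER_M = 10.00  # placeholder output price (4x input here)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A long answer to a short prompt is dominated by output cost.
print(f"${request_cost(500, 2000):.4f}")
```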
Optimization Strategies
- Shorten system prompts without losing meaning
- Use concise output formats (JSON vs. verbose explanations)
- Truncate retrieved documents to relevant sections (see the sketch after this list)
- Consider model choice; smaller models have lower per-token costs
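For the truncation strategy, one simple approach is to cut a retrieved document at a fixed token budget before it goes into the prompt. The budget, encoding, and cut-at-the-prefix behavior below are illustrative choices, not a canonical recipe.

```python
# Sketch: trim a document to a token budget using tiktoken. Budget and
# encoding are example values; smarter truncation would keep the most
# relevant sections rather than just the prefix.
import tiktoken

def truncate_to_budget(text: str, budget: int = 1000,
                       encoding: str = "o200k_base") -> str:
    enc = tiktoken.get_encoding(encoding)
    ids = enc.encode(text)
    if len(ids) <= budget:
        return text
    return enc.decode(ids[:budget])
```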
Source
Byte-Pair Encoding (BPE) tokenization iteratively merges frequent character pairs into tokens, creating a vocabulary that balances efficiency and coverage.
https://arxiv.org/abs/1508.07909
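For intuition, here is a toy BPE learner adapted from the worked example in the cited paper: each word is a sequence of symbols, and the most frequent adjacent pair is merged into a new symbol on every iteration.

```python
# Toy BPE learner, adapted from the example in Sennrich et al. (2015).
import collections
import re

def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Merge every occurrence of the pair into a single new symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(5):
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)  # most frequent pair becomes a new token
    vocab = merge_vocab(best, vocab)
    print(best)
```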