Temperature
Definition
Temperature is a parameter that controls LLM output randomness, where lower values (0-0.3) produce deterministic, focused responses and higher values (0.7-1.0) increase creativity and variation.
Why It Matters
Temperature is your reliability dial. At temperature 0, the model always picks the most likely next token: deterministic and consistent, but potentially repetitive. At temperature 1, it samples more broadly from probable tokens: varied and creative, but less predictable.
This matters enormously for production systems. Code generation, data extraction, and factual Q&A need low temperature (0-0.3) for consistency. Creative writing, brainstorming, and chat applications often benefit from higher temperatures (0.7-0.9).
For AI engineers, temperature is one of the first parameters to tune. The wrong setting produces either robotic repetition or unreliable randomness; the right one balances consistency with natural variation.
Implementation Basics
Typical Settings
- 0.0: Deterministic, same input always produces same output
- 0.2-0.3: Slight variation, mostly consistent (code, extraction)
- 0.5-0.7: Balanced (general chat, assistance)
- 0.8-1.0: More creative (writing, brainstorming)
- >1.0: High randomness (usually too chaotic for production)
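To show where this knob sits in practice, here is the shape of a chat-completion request body in the style of the OpenAI API cited below; the model name and prompt are placeholders for illustration, not recommendations.

```python
# Request parameters for a chat-completion call; `temperature` sits alongside
# the model and messages. The model name here is a placeholder.
request = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Extract the date as JSON."}],
    "temperature": 0.2,  # low: extraction output should stay consistent
}

print(request["temperature"])
```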
How It Works
Before selecting each token, the model calculates probabilities for all possible next tokens. Temperature scales these probabilities:
- Low temperature: High-probability tokens dominate
- High temperature: Lower-probability tokens get more chance
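This scaling can be sketched numerically. The snippet below applies temperature to a softmax over three hypothetical token scores (the logit values are made up for illustration): dividing logits by a small temperature sharpens the distribution, while a large temperature flattens it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.2)  # low temp: top token dominates
hot = softmax_with_temperature(logits, 1.5)   # high temp: mass spreads out

print(round(cold[0], 3))  # almost all probability on the top token
print(round(hot[0], 3))   # lower-probability tokens get more chance
```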
Practical Guidelines
- Structured output (JSON, code): Temperature 0-0.2
- Factual answers: Temperature 0-0.3
- Conversational responses: Temperature 0.5-0.7
- Creative writing: Temperature 0.7-0.9
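One way to encode these guidelines is a per-task lookup with a balanced fallback; the task labels and helper name below are this sketch's own, not a standard API.

```python
# Illustrative defaults following the guidelines above.
TEMPERATURE_DEFAULTS = {
    "structured_output": 0.0,
    "factual_qa": 0.2,
    "conversation": 0.6,
    "creative_writing": 0.8,
}

def temperature_for(task: str) -> float:
    # Fall back to a balanced middle value for unrecognized tasks
    return TEMPERATURE_DEFAULTS.get(task, 0.5)

print(temperature_for("structured_output"))  # 0.0
```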
Testing Approach
Run the same prompt 10 times at your chosen temperature and inspect the variation. If outputs are too similar for your use case, raise the temperature; if too unpredictable, lower it. Find the sweet spot for your specific task.
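A minimal sketch of that testing loop, with a stubbed `sample_completion` standing in for a real API call (the function, its crude temperature model, and the candidate strings are all hypothetical):

```python
import random

def sample_completion(prompt, temperature, rng):
    """Stand-in for an LLM call: higher temperature yields more varied output."""
    candidates = ["Paris", "Paris.", "The capital is Paris", "paris"]
    if temperature == 0.0:
        return candidates[0]  # deterministic: always the top candidate
    # Crude stand-in: sample from more candidates as temperature rises
    k = min(len(candidates), 1 + int(temperature * 4))
    return rng.choice(candidates[:k])

def variation(prompt, temperature, runs=10, seed=0):
    """Count distinct outputs across repeated runs of the same prompt."""
    rng = random.Random(seed)
    outputs = {sample_completion(prompt, temperature, rng) for _ in range(runs)}
    return len(outputs)

print(variation("Capital of France?", 0.0))  # 1 distinct output
print(variation("Capital of France?", 0.9))  # more variation at high temp
```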
Interaction with Top-P
Temperature and top_p both control randomness, and adjusting both can create unexpected behavior. Pick one to tune and keep the other at its default. Most practitioners prefer temperature for its intuitive behavior.
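To make the overlap concrete, here is a sketch of nucleus (top_p) filtering over an already-computed, sorted probability list (the probabilities are made up): it truncates the candidate set much as low temperature concentrates it, which is why stacking both controls compounds in hard-to-predict ways.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest prefix of descending-sorted probabilities whose
    cumulative sum reaches top_p, then renormalize the survivors."""
    kept, cumulative = [], 0.0
    for p in probs:
        kept.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.5, 0.3, 0.1, 0.06, 0.04]  # hypothetical sorted token probabilities
print(len(top_p_filter(probs, 0.9)))  # 3 tokens survive: 0.5 + 0.3 + 0.1 = 0.9
```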
Source
Temperature values between 0 and 2 control sampling randomness, with higher values making output more random and lower values more deterministic.
https://platform.openai.com/docs/api-reference/chat/create